On March 22, Lao Huang, whom players and AI practitioners alike love and hate, arrived with his new “nuclear bomb”. Unfortunately, this one has little to do with gamers; it is aimed mainly at the enterprise and industrial markets. The player-facing RTX 40 series is not expected to see any news until September at the earliest.
Well, without further ado, let’s see what kind of “big baby” Lao Huang brought out this time. First up is the successor to the A100: the new-generation compute card, the H100. It uses the new Hopper architecture and TSMC’s latest 4nm process, and its specifications improve significantly over the previous-generation A100 across the board.
Nvidia’s super server chip Grace also made another appearance. Compared with the figures given last time, the performance of the Grace chip shown here is surprisingly improved. Judging from the presentation, Nvidia seems to be taking the same path as Apple: assembling a processor out of multiple chips.
In addition to the hardware reveals, NVIDIA also brought plenty of news on the software side, such as Omniverse Cloud, a cloud-collaboration service that lets multiple users edit and render the same media file together in the cloud.
NVIDIA also demonstrated a number of industrial and traffic simulation cases built on virtual environments, as well as an AI-driven virtual character system. The system learns actions through deep learning; once trained, it needs no additional skeletal animation work and can perform the appropriate action in response to a command. That is exciting news not only for AI practitioners, but also for film and game developers.
I have to say that Lao Huang brought a lot this time, and each item could bring obvious changes to AI and other industries. Let’s take a closer look at what Nvidia released.
H100 and Grace
Since last year there has been news that Nvidia would release a new generation of compute cards this year using a new Hopper architecture. That much turned out to be accurate, but most speculation pointed to TSMC’s 5nm process; instead, Nvidia went a step further and chose the latest 4nm process. Although 4nm is essentially an enhanced 5nm (“5nm+”), it offers better power efficiency and allows a higher transistor density.
In fact, judging from the H100’s core specifications, it is not hard to see why Nvidia ultimately chose 4nm: the chip packs 80 billion transistors, 26 billion more than the previous-generation A100. The core count rises to 16896, the most of any chip in the world and about 2.5 times that of the previous-generation A100.
Such exaggerated core specifications bring equally exaggerated performance gains. According to NVIDIA’s official figures, the H100’s floating-point and Tensor Core throughput are at least 3 times those of the previous generation; FP32 reaches up to 60 TFLOPS, compared with 19.5 TFLOPS for the previous-generation A100.
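A quick back-of-the-envelope check shows that the quoted FP32 numbers do support the “at least 3x” claim. The figures below are only the 60 and 19.5 TFLOPS values quoted above, nothing more:

```python
# Ratio of the quoted FP32 throughput figures: H100 vs. A100.
h100_fp32_tflops = 60.0   # quoted H100 peak FP32
a100_fp32_tflops = 19.5   # quoted A100 peak FP32

ratio = h100_fp32_tflops / a100_fp32_tflops
print(f"FP32 ratio: {ratio:.2f}x")  # ~3.08x, consistent with "at least 3x"
```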
The H100 is also the first to support PCIe 5.0 and HBM3, pushing memory bandwidth to an astonishing 3TB/s. Lao Huang claimed that just 20 H100s could handle the equivalent of global internet traffic. That sounds exaggerated, but it does reflect how extreme the H100’s specifications are.
Powerful performance comes with equally exaggerated power consumption: NVIDIA rates the H100 at 700W (a true “nuclear bomb” card), versus just 400W for the previous-generation A100. Still, trading a 1.75x increase in power for roughly 3x the performance is, on the whole, not a bad deal.
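The perf-per-watt argument above can be made concrete. Taking only the figures quoted in the article (700W vs. 400W, and the claimed best-case 3x performance), the efficiency still improves:

```python
# Perf-per-watt sketch from the quoted figures; the 3x gain is
# NVIDIA's claimed best case, not an independently measured number.
a100_power_w = 400
h100_power_w = 700
perf_gain = 3.0

power_ratio = h100_power_w / a100_power_w     # 1.75x more power
perf_per_watt_gain = perf_gain / power_ratio  # ~1.71x better efficiency

print(f"Power ratio: {power_ratio:.2f}x")
print(f"Perf-per-watt gain: {perf_per_watt_gain:.2f}x")
```

Even under the more conservative 2x-2.5x real-world estimate discussed below, perf-per-watt would still be at or above parity with the A100.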
The H100 is also optimized for the models used in AI training: it includes a Transformer Engine that can speed up the training of large models by as much as 6 times, greatly reducing the time needed to train large AI models. This feature also ties in with the AI virtual character system discussed below.
In NVIDIA’s test data, training a GPT-3 model with 175 billion parameters dropped from one week to just 19 hours, and a 395-billion-parameter Transformer model completed training in only 21 hours, an efficiency gain of nearly 9 times.
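The “nearly 9 times” figure checks out against the quoted times, taking “one week” as 168 hours:

```python
# Sanity check of the quoted GPT-3 (175B) training times.
a100_hours = 7 * 24  # "one week" on the previous generation
h100_hours = 19      # quoted H100 time

speedup = a100_hours / h100_hours
print(f"Speedup: {speedup:.1f}x")  # ~8.8x, i.e. "nearly 9 times"
```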
Although the paper specifications look good, actual performance will have to wait for real-world test results. Judging from past experience with the RTX 30 series and the A100, the final real-world improvement is more likely to land between 2x and 2.5x than a full 3x. Even at only 2x, though, that is already quite good; in AI workloads at least, it comprehensively outclasses AMD’s compute cards.
In addition, the H100 introduces NVIDIA’s fourth-generation NVLink interconnect, which further improves the efficiency of linking multiple GPUs together. According to NVIDIA’s figures, the interconnect bandwidth reaches 900GB/s, 50% higher than the previous generation.
Next, let’s take a look at Nvidia’s new “toy”, Grace, its super chip for the server business. The Grace chip uses the latest Arm v9 architecture, and Nvidia has built two super chips around it: Grace Hopper and the Grace CPU superchip.
Grace Hopper combines a Grace CPU with a Hopper-architecture GPU to form a complete computing system: a single chip is enough to build a powerful compute server, and multiple chips can be linked together to form even larger computing arrays.
The Grace CPU superchip consists of two Grace CPUs interconnected via NVIDIA’s NVLink-C2C technology, forming a giant chip (a “Grace CPU Ultra”, if you like) with 144 Arm cores and 1TB/s of memory bandwidth.
Let’s be honest: it is hard to look at Nvidia’s Grace CPU superchip and not be reminded of the M1 Ultra from Apple’s spring event. It, too, is based on the Arm architecture, is made of two dies joined together, and boasts exaggerated memory bandwidth and performance.
Obviously, chip interconnection and packaging has become one of the industry’s trends. AMD has also revealed that a CPU using similar technology is in development and could arrive as early as 2023. It seems that single-chip performance scaling is approaching its limit; for bigger gains, similar interconnect technology for die stacking may be the way forward.
However, the Grace CPU superchip’s power consumption is not low either: NVIDIA’s official figure is 500W, far beyond traditional x86 CPUs. Then again, considering its exaggerated performance (a SPECrate score of 740, 60% higher than the runner-up), such power consumption is not unacceptable.
Clearly, Nvidia has big ambitions in the Arm server space.
Nvidia’s virtual world
Beyond the pile of high-performance hardware, NVIDIA also showed many software demonstrations, including using hardware such as the H100 to simulate virtual environments for all kinds of tests and simulations. In NVIDIA’s vision, enterprises will be able to build realistic virtual test environments on powerful NVIDIA hardware and use them to test autonomous driving, smart factory operations, and more.
With a virtual test environment, researchers can more easily test how an autonomous driving system responds to all kinds of emergencies, pinpoint problems during testing, and reduce overall test costs. It is also possible to build a 1:1 “digital factory” and simulate its operation in advance, looking for efficiency improvements and potential problems, and reducing the probability of issues once the real factory goes into operation.
Nvidia calls this set of applications a “digital twin,” which could significantly reduce research and testing investment in automated factories and autonomous driving.
Omniverse Cloud is NVIDIA’s new cloud creation service. Through it, users can access and edit large 3D scenes anytime, anywhere, without waiting for huge data transfers, and can collaborate online to build 3D models directly.
Previously, collaborative work on 3D models and scenes had to be done on a dedicated server. Once Omniverse Cloud goes live, creators will be able to join the shared workspace directly from any Omniverse Cloud-enabled device, greatly improving their responsiveness and freedom to work.
NVIDIA also prepared a second surprise for creators: the AI-driven virtual character system mentioned earlier, which lets an AI complete training in a short time and learn the actions corresponding to various commands. Take a simple slashing action: in a normal production pipeline, an animator first has to pose the action skeleton step by step (commonly known as keyframing) and then test it in the scene. The whole process takes a lot of time, and every different action has to be tuned all over again.
With this AI virtual character system, when you want the virtual model to perform a slashing action, a single command is enough: the AI picks out the relevant moves from what it has learned and runs them automatically, directly saving a great deal of time and manpower. For game developers and VFX creators, this system lets them focus more of their energy elsewhere.
Although NVIDIA’s press conference did not dwell on the Metaverse, everything from the hardware to the software lays the groundwork for building it. There are two main reasons the Metaverse is not yet a reality: hardware performance cannot meet the requirements, and the software side is not yet mature enough to provide the technical foundation for real-time simulation of real environments.
Before that day comes, what we need first is more powerful computing hardware and smarter artificial intelligence systems. The arrival of Nvidia’s H100, its virtual environments, and its AI virtual character system brings us one step closer to a true Metaverse.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/nvidias-spring-launch-brings-hope-to-the-metaverse/