On March 22, Jensen Huang (affectionately "Lao Huang"), whom gamers and AI practitioners alike love and hate, arrived with a new "nuclear bomb". Unfortunately, this one has little to do with gamers; it is aimed mainly at the enterprise and industrial markets. The RTX 40 series that gamers care about is not expected to see any news until September at the earliest.
Well, without further ado, let's see what kind of "big toy" Lao Huang brought out this time. First up is the successor to the A100: the new-generation compute card H100. The H100 adopts the new Hopper architecture and TSMC's latest 4nm process, and compared with the previous-generation A100, its specifications are significantly improved across the board.
Nvidia's super server chip Grace has also resurfaced. Compared with the figures given last time, the performance of the Grace chip shown this time is astonishingly improved. Judging from the presentation, Nvidia seems to be heading down the same path as Apple: assembling a processor out of multiple chips.
In addition to hardware announcements, NVIDIA also brought plenty of news on the software side, such as Omniverse Cloud, a cloud-collaboration service that lets multiple users edit and render the same media file together directly in the cloud.
NVIDIA also demonstrated a number of industrial and traffic simulation cases based on virtual environments, as well as an AI-driven virtual character system. The system learns actions through deep-learning training; once trained, no additional skeletal animation work is required, and the character can perform the corresponding action on command. This is exciting not only for AI practitioners, but also for film and game developers.
I have to say, Lao Huang brought a lot this time, and each item could bring obvious changes to the development of AI and other industries. Let's take a detailed look at what NVIDIA has released.
H100 and Grace
There have been reports since last year that Nvidia would release a new generation of compute cards this year, built on a new Hopper architecture. That news has now been confirmed, but while everyone guessed the new cards would use TSMC's 5nm process, NVIDIA has instead jumped straight to the latest 4nm process. Although it is essentially an enhanced 5nm node, it offers better power efficiency and higher transistor density.
In fact, judging from the H100's core specifications, it is not hard to see why Nvidia chose 4nm: the chip integrates up to 80 billion transistors, 26 billion more than the previous-generation A100, and the core count rises to 16,896. This is the highest core count of any chip in the world, roughly 2.5 times that of the A100.
Such exaggerated core specifications bring an equally exaggerated performance jump. According to NVIDIA's official figures, the H100's floating-point and Tensor Core throughput are at least 3 times those of the previous generation, with FP32 reaching up to 60 teraflops, versus 19.5 teraflops for the A100.
The H100 will also be the first to support PCIe 5.0 and HBM3, pushing memory bandwidth to an astonishing 3TB/s. Lao Huang claimed that just 20 H100s could handle the entirety of current global network traffic. That sounds exaggerated, but it does reflect the H100's extreme performance specifications.
Powerful performance comes with exaggerated power consumption: NVIDIA rates the H100 at up to 700W (a true "nuclear bomb" card), compared with just 400W for the previous-generation A100. Still, trading less than twice the power for roughly 3 times the performance is not a bad deal overall.
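Taking the article's own figures at face value, a quick back-of-the-envelope comparison (using the peak FP32 numbers quoted above as a rough proxy for "performance") supports the "not a loss" conclusion:

```python
# Rough generational perf-per-watt comparison using the figures quoted above.
a100 = {"fp32_tflops": 19.5, "watts": 400}
h100 = {"fp32_tflops": 60.0, "watts": 700}

perf_ratio = h100["fp32_tflops"] / a100["fp32_tflops"]   # ~3.1x raw FP32
power_ratio = h100["watts"] / a100["watts"]              # 1.75x power
efficiency_gain = perf_ratio / power_ratio               # ~1.76x perf per watt
print(f"perf {perf_ratio:.2f}x, power {power_ratio:.2f}x, "
      f"perf/watt {efficiency_gain:.2f}x")
```

Even on this crude measure, efficiency per watt still improves by roughly three quarters, generation over generation.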
The H100 also includes targeted optimizations for AI training workloads: it is equipped with a dedicated optimization engine for Transformers that can raise large-model training speed to 6 times the original, greatly reducing the time needed to train large AI models. This feature also ties into the AI avatar system discussed below.
In NVIDIA's test data, training a 175-billion-parameter GPT-3 model drops from about one week to just 19 hours, and a 395-billion-parameter Transformer model can be trained in only 21 hours, an efficiency improvement of nearly 9 times.
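The "nearly 9 times" figure checks out arithmetically if "one week" is taken literally:

```python
# Sanity check of the claimed ~9x training speedup, using the article's
# figures and reading "one week" as 7 * 24 = 168 hours.
hours_before = 7 * 24   # ~1 week on the previous generation
hours_after = 19        # claimed H100 time for the 175B-parameter GPT-3 run
speedup = hours_before / hours_after
print(f"{speedup:.1f}x")  # ~8.8x, consistent with "nearly 9 times"
```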
Although the paper specs look very good, actual performance remains to be revealed by real-world testing. Going by past experience with the RTX 30 series and the A100, the final improvement may land between 2 and 2.5 times rather than a full 3 times. But even 2 times would be quite good; at least in AI workloads, it would still completely crush AMD's compute cards.
Moreover, the H100 introduces NVIDIA's fourth-generation NVLink interconnect, which further improves the efficiency of linking multiple GPUs together. According to NVIDIA, the interconnect I/O bandwidth reaches 900GB/s, 50% higher than the previous generation.
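The previous-generation figure implied by that claim is easy to recover, and it matches the 600GB/s of third-generation NVLink on the A100:

```python
# Implied previous-generation NVLink bandwidth from the article's claim:
# 900 GB/s is stated to be 50% higher than the prior generation.
new_bw = 900             # GB/s, fourth-generation NVLink (per the article)
prev_bw = new_bw / 1.5   # -> 600 GB/s, matching third-gen NVLink on the A100
print(prev_bw)
```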
Next, let's look at Nvidia's new "toy", Grace, a super server chip Nvidia has prepared for its server business. The Grace chip uses the latest Arm v9 architecture, and Nvidia is using it as the basis for two superchips: the Grace Hopper superchip and the Grace CPU superchip.
Grace Hopper consists of a Grace CPU plus a Hopper-architecture GPU, forming a complete computing system: a single chip is enough to build a powerful compute server, and multiple chips can be linked together into a larger computing array.
The Grace CPU superchip is composed of two Grace CPUs interconnected via NVIDIA NVLink-C2C, forming a giant chip with 144 Arm cores and 1TB/s of memory bandwidth (a Grace CPU "Ultra"?).
To be honest, Nvidia's Grace CPU superchip is hard not to compare with the M1 Ultra that Apple released at its spring event: it is also based on the Arm architecture, also composed of two dies, and also boasts exaggerated memory bandwidth and performance.
Clearly, chip interconnection and packaging has become one of the industry's trends. AMD has also revealed that CPUs using similar technology are in development and could arrive as early as 2023. It seems single-chip performance is approaching its limits; for bigger gains, this kind of interconnect-based chip stacking may be the way forward.
However, the Grace CPU superchip's power consumption is not low either: NVIDIA's official figure is 500W, far beyond a traditional x86 CPU. Of course, considering its exaggerated performance, a SPECrate score of 740, 60% higher than the second place, that power draw is not unacceptable.
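For scale, the runner-up score implied by that "60% higher" claim works out as follows:

```python
# Implied score of the runner-up from the article's SPECrate claim:
# Grace's 740 is stated to be 60% higher than second place.
grace_score = 740
runner_up = grace_score / 1.6   # -> 462.5
print(runner_up)
```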
Obviously, in the field of Arm servers, Nvidia’s ambitions are very big.
Nvidia’s virtual world
In addition to a pile of high-performance hardware, NVIDIA also showed many software demonstrations this time, including using the H100 and other hardware to simulate virtual environments for various tests and simulations. In NVIDIA's demos, future enterprises can build realistic virtual test environments on top of powerful NVIDIA hardware and test autonomous driving, smart factory operations, and more inside them.
With a virtual test environment, researchers can more easily test how an autonomous driving system responds to various emergencies and pinpoint problems directly during testing, reducing overall test cost. A 1:1 "digital factory" can also be built to simulate operations in advance, looking for efficiency improvements and potential problems, so as to reduce the probability of issues once the factory officially starts running.
Nvidia calls this set of applications a “digital twin”, which can greatly reduce research and testing investment in automated factories and autonomous driving.
Omniverse Cloud is NVIDIA's new cloud creation service. Through it, users can access and edit large 3D scenes from anywhere without waiting for huge data transfers, and can collaborate online to build 3D models directly.
In the past, collaborative construction of 3D models and scenes had to be done on a dedicated server. With Omniverse Cloud, creators can join the collaborative space and participate through any terminal that supports it, greatly improving their responsiveness and freedom of work.
NVIDIA also prepared a second surprise for creators: an AI-driven virtual character system that lets an AI complete training in a short time and learn the actions corresponding to various commands. Take a simple slashing action: in a normal production pipeline, an animator first adjusts the action skeleton step by step (commonly known as keyframing), then tests it in the scene. The whole process takes a lot of time, and every different action must be debugged from scratch.
With this AI virtual character system, when you want the virtual model to perform a slash, a single command is enough: the AI finds the relevant motion among the actions it has learned and plays it automatically, saving a great deal of time and manpower. For game developers and VFX creators, this system will let them focus their energy elsewhere.
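The core idea, mapping a text command to the nearest learned motion, can be sketched in a few lines. This is purely an illustrative toy, not NVIDIA's actual system: the embeddings below are made-up vectors, whereas a real system would learn motions from motion-capture data with deep learning.

```python
# Toy sketch of command-to-action lookup: embed the command, then pick the
# learned motion whose embedding is most similar. All vectors here are
# hypothetical; a real system would learn them from motion-capture data.
import math

# Hypothetical learned motion library: action name -> embedding
motion_library = {
    "slash": [0.9, 0.1, 0.0],
    "jump":  [0.0, 0.8, 0.2],
    "walk":  [0.1, 0.2, 0.9],
}

# Hypothetical text-encoder outputs for a few user commands
command_embeddings = {
    "swing your sword": [0.85, 0.15, 0.05],
    "leap over it":     [0.05, 0.90, 0.10],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pick_action(command):
    """Return the learned motion closest to the command's embedding."""
    query = command_embeddings[command]
    return max(motion_library, key=lambda name: cosine(query, motion_library[name]))

print(pick_action("swing your sword"))  # -> slash
```

The appeal for animators is that adding a new action only means adding a new learned motion; no per-command keyframing is required.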
Although NVIDIA's conference did not dwell on the Metaverse, everything shown, from hardware to software, is groundwork for building it. There are two main reasons the Metaverse cannot become reality yet: hardware performance cannot meet the requirements, and the software side is not mature enough to provide the technical foundation for real-time, realistic environment simulation.
Before that changes, what we need first is more powerful computing hardware and smarter AI systems. Nvidia's H100, its virtual environments, and its AI virtual character system take us a big step closer to a true Metaverse.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/nvidias-spring-launch-gives-hope-to-the-metaverse/