LAS VEGAS, May 17, 2023 – Tachyum™ today released the second edition of its “Tachyum Prodigy on the Leading Edge of AI Industry Trends” white paper, which adds updates such as improved 8-bit floating-point (FP8) quantization-aware techniques with adaptive scaling that achieve 32-bit floating-point (FP32) accuracy.
In the white paper, Tachyum demonstrates the optimality of the FP8 format for quantizing deep neural networks, including weights, activations and gradients, exploiting the fact that floating-point numbers provide better coverage than the 8-bit integer (INT8) data type. The results show that FP8-quantized networks can match, or even exceed, the accuracy of baseline FP32 models.
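The coverage argument can be illustrated with a minimal sketch. It is not Tachyum's implementation: the announcement does not say which FP8 encoding Prodigy uses, so the code assumes the common E4M3 layout (4 exponent bits, 3 mantissa bits, bias 7, maximum 448), and it uses simple per-tensor max scaling as a stand-in for the adaptive scaling mentioned above. All function names are hypothetical.

```python
import math
from bisect import bisect_left


def e4m3_grid():
    """Non-negative values representable in an assumed E4M3 format (bias 7)."""
    vals = set()
    for e in range(16):
        for m in range(8):
            if e == 15 and m == 7:
                continue  # this bit pattern is reserved for NaN
            if e == 0:
                vals.add((m / 8) * 2.0 ** -6)          # subnormals and zero
            else:
                vals.add((1 + m / 8) * 2.0 ** (e - 7))  # normal numbers
    return sorted(vals)


GRID = e4m3_grid()
FP8_MAX = GRID[-1]  # 448.0 under the E4M3 assumption


def round_to_grid(a):
    """Round a non-negative float to the nearest grid value, saturating at FP8_MAX."""
    i = bisect_left(GRID, a)
    if i == 0:
        return GRID[0]
    if i == len(GRID):
        return GRID[-1]
    lo, hi = GRID[i - 1], GRID[i]
    return lo if a - lo <= hi - a else hi


def fake_quant_fp8(xs):
    """Quantize to FP8 and back, scaling so the largest magnitude maps to FP8_MAX."""
    scale = (max(abs(x) for x in xs) or 1.0) / FP8_MAX
    return [math.copysign(round_to_grid(abs(x) / scale), x) * scale for x in xs]


def fake_quant_int8(xs):
    """Symmetric INT8 quantize and dequantize with the same max-based scaling."""
    scale = (max(abs(x) for x in xs) or 1.0) / 127
    return [max(-127, min(127, round(x / scale))) * scale for x in xs]


def mean_rel_error(xs, qs):
    """Mean relative round-trip error; xs must not contain zeros."""
    return sum(abs(q - x) / abs(x) for x, q in zip(xs, qs)) / len(xs)


# Synthetic tensor spanning ten octaves of magnitude, with alternating signs.
xs = [(-1) ** i * 2.0 ** (-10 * i / 999) for i in range(1000)]
fp8_err = mean_rel_error(xs, fake_quant_fp8(xs))
int8_err = mean_rel_error(xs, fake_quant_int8(xs))
print(f"FP8 mean relative error:  {fp8_err:.4f}")
print(f"INT8 mean relative error: {int8_err:.4f}")
```

On data with a wide dynamic range like this synthetic tensor, INT8's uniform step size rounds the smallest values to zero, while FP8's exponent bits keep the relative error roughly constant across magnitudes. Real network tensors have different distributions, so the gap will vary.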
FP8 is essential for being able to do more with less. It achieves much higher performance at much lower power consumption and chip area than legacy technology like BFLOAT16. FP8 reduces not only the cost of computation but also the memory requirements of large and rapidly growing AI models. The white paper features an analysis of quantization errors for different models and datasets, and describes how Tachyum amplifies the benefits of FP8, delivering twice the performance along with gains in power efficiency and bandwidth reduction.
Visual models, large language models and generative AI are increasingly being included in software applications, making AI an essential part of data processing and requiring tighter, lower-latency integration into software. With FP8 capable of performing mainstream AI functions, leaders like Tachyum are poised to help accelerate the rapid evolution of AI hardware technology. This will lead to the unification of specialized HPC and AI hardware modules into a single processing engine, rather than the integration of disparate chips into one package, which is a more costly and less satisfactory solution.
“Our experimental results show that FP8 enables faster training and reduced power consumption without any degradation in accuracy for a range of deep learning models,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “This is one of the most significant AI milestones Tachyum wanted to achieve before tape-out to ensure we have what it takes to make FP8 with sparsity and super-sparsity mainstream AI technology.”
Having completed the infrastructure around FP8 that delivers nearly the same precision as FP32 for training and inference, Tachyum is shifting its engineering focus to a yet-to-be-announced Tachyum AI (TAI) infrastructure, which provides the next level of AI beyond FP8. TAI is part of Tachyum’s current hardware and will be introduced later this year with results obtained from today’s mainstream AI applications.
Prodigy delivers a revolutionary new architecture that unifies the functionality of CPU, GPGPU, and TPU into a single chip. As a Universal Processor, Prodigy provides the high performance required for both cloud and HPC/AI workloads within a single architecture. Because of its utility for all workloads, Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains.
By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 128 high-performance, custom-designed 64-bit compute cores to deliver up to 4x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and 6x for AI applications.
To read more about Prodigy’s capabilities in AI, including results from its implementation of FP8, interested parties can download Tachyum’s latest white paper.
Tachyum is transforming AI, HPC, public and private cloud data center markets with its recently launched flagship product. Prodigy, the world’s first Universal Processor, unifies the functionality of a CPU, a GPU, and a TPU into a single processor that delivers industry-leading performance, cost, and power efficiency for both specialty and general-purpose computing. When Prodigy processors are provisioned in a hyperscale data center, they enable all AI, HPC, and general-purpose applications to run on one hardware infrastructure, saving companies billions of dollars per year. With data centers currently consuming over 4% of the planet’s electricity, a share predicted to reach 10% by 2030, the ultra-low-power Prodigy Universal Processor is critical to continuing to double worldwide data center capacity every four years. Tachyum, co-founded by Dr. Radoslav Danilak, is building the world’s fastest AI supercomputer (128 AI exaflops) in the EU based on Prodigy processors. Tachyum has offices in the United States and Slovakia. For more information, visit https://www.tachyum.com/.