- Date of publishing: May 17, 2023
- Length: 14 pages
Tachyum Prodigy is the industry’s first universal processor, unifying the functionality of a CPU, a GPGPU, and a TPU in a single monolithic chip. On AI training and inference workloads, Prodigy’s revolutionary new architecture delivers 6x the raw performance of the industry’s highest-performing GPU, and up to 10x the performance at the same power. Prodigy has 128 high-performance 64-bit processor cores running at up to 5.7 GHz. Each core integrates a cutting-edge AI subsystem that includes a 4096-bit matrix processor supporting 16x16, 8x8, and 4x4 matrix operations. Prodigy’s memory subsystem integrates 16 DDR5 memory controllers running at up to DDR5-7200, which provide the memory bandwidth and capacity needed to process the most complex AI models at the highest performance.
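As a rough illustration of what 16 memory controllers at DDR5-7200 imply, the theoretical peak bandwidth can be estimated as below. This is a back-of-the-envelope sketch, not a figure from Tachyum. It assumes that each controller drives one 64-bit (8-byte) data channel, excluding ECC bits, and it ignores refresh and other protocol overheads, so sustained bandwidth will be lower. Here $N_{\text{ctrl}}$ is the number of controllers, $R$ is the per-channel transfer rate (DDR5-7200 means 7200 MT/s), and $W$ is the assumed number of data bytes per transfer.

```latex
B_{\text{peak}} = N_{\text{ctrl}} \times R \times W
= 16 \times \left(7200 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}}\right) \times 8\ \tfrac{\text{B}}{\text{transfer}}
= 921.6 \times 10^{9}\ \tfrac{\text{B}}{\text{s}} \approx 921.6\ \text{GB/s}
```

If the actual channel width per controller differs from the assumed 64 bits, $W$ and the result scale accordingly.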
This paper takes a deep dive into Prodigy’s AI subsystem and architecture and explains how its AI features deliver the highest performance for today’s demanding applications and workloads. Topics covered include the industry trends driving the need for higher performance, mixed-precision training, FP8 quantization, and sparsity, including Tachyum-invented super-sparsity.