Tachyum Details 4-bit and 2-bit New AI Frontier Formats

October 24, 2023 · 4 min read

LAS VEGAS, October 24, 2023 – Tachyum®, creator of Prodigy®, the world’s first Universal Processor, today released a new white paper on the use of the 4-bit Tachyum AI (TAI) format, including the 2-bit effective per weight (TAI2) format, in both inference and Deep Neural Network (DNN) learning.

“Image AI Processing at the Next Level With 4b TAI & 2b Effective Weights” presents the 4-bit TAI format, an extremely silicon- and power-efficient approach that limits memory and bandwidth requirements even for large models.

To reduce the computational complexity of neural networks, a process of quantization and pruning is used to reduce the number of parameters (weights) of a training model. Low-bit floating point formats have recently emerged as promising for Deep Neural Network (DNN) quantization. While INT8-level quantization is currently used, INT4—which would double throughput compared to INT8—is currently being evaluated by industry for feasibility and potential loss of accuracy and quality.
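As a rough illustration of the INT8 versus INT4 trade-off described above, the following minimal sketch applies generic symmetric uniform quantization to a handful of weights and compares storage cost. It is not Tachyum's method; the function names (`quantize_symmetric`, `dequantize`, `weight_bytes`) and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_symmetric(weights, bits):
    """Uniform symmetric quantization: map floats to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1              # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def weight_bytes(num_weights, bits):
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_weights * bits / 8


# Hypothetical sample weights, for illustration only.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.0]
codes8, scale8 = quantize_symmetric(weights, 8)
codes4, scale4 = quantize_symmetric(weights, 4)

# Largest reconstruction error at each width.
err8 = max(abs(w - r) for w, r in zip(weights, dequantize(codes8, scale8)))
err4 = max(abs(w - r) for w, r in zip(weights, dequantize(codes4, scale4)))
print("INT8 max error:", err8, "INT4 max error:", err4)
print("bytes for 1M weights:", weight_bytes(1_000_000, 8), weight_bytes(1_000_000, 4))
```

Halving the bit width halves the bytes that must be stored and moved per weight, at the cost of a coarser grid of representable values, which is why accuracy has to be evaluated before adopting INT4.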

The TAI format is capable of significantly exceeding INT4 with a logarithmic 4-bit format and 2-bit effective weight format. Tachyum’s AI team has published experimental results showing that TAI2 effectively runs on Prodigy.
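The announcement does not specify the bit layout of the TAI format. To show what a logarithmic 4-bit format can look like in general, the sketch below assumes a purely hypothetical layout of 1 sign bit plus a 3-bit exponent, so each code represents a signed power of two scaled by the largest weight magnitude. The names `log4_encode` and `log4_decode` are illustrative and are not part of any Tachyum software.

```python
import math


def log4_encode(w, max_abs):
    """Pack a weight into 4 bits: 1 sign bit and a 3-bit exponent (assumed layout)."""
    sign = 1 if w < 0 else 0
    mag = abs(w) / max_abs if max_abs > 0 else 0.0
    if mag <= 0.0:
        exponent = 7  # no exact zero here; the smallest magnitude stands in for it
    else:
        # Nearest power of two in the log domain, clamped to the 3-bit range.
        exponent = min(7, max(0, round(-math.log2(mag))))
    return (sign << 3) | exponent


def log4_decode(code, max_abs):
    """Unpack a 4-bit code to sign * max_abs * 2**(-exponent)."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = code & 0b0111
    return sign * max_abs * 2.0 ** (-exponent)


# Hypothetical sample weights, for illustration only.
weights = [1.0, 0.5, -0.25, 0.3, -0.01, 0.0]
max_abs = max(abs(w) for w in weights)
codes = [log4_encode(w, max_abs) for w in weights]
restored = [log4_decode(c, max_abs) for c in codes]
print(codes)
print(restored)
```

Compared with a uniform 4-bit integer grid, a logarithmic grid spends its codes on small magnitudes, where neural network weights tend to cluster, and multiplication by a power of two can be implemented as a shift. This toy layout cannot represent zero exactly, which a practical format would need to address.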

To verify the usability of the TAI formats with 4 bits and 2 bits effective per weight, Tachyum selected several models from the field of image processing and computer vision. The image classifiers selected were ResNet20, ResNet32, ResNet34 and SWIN transformer; the segmentation models were UNet, FastSCNN and ConvMixer; and the detector was SSD. The team tested them on well-known datasets: CIFAR10, CIFAR100, and Imagenet for image classification; Cityscapes and Kits19 for segmentation; and VOC and COCO for object detection.

“The TAI format is far more powerful and efficient than INT4, and 2-bit per weight is ahead of any other technology of our time for reduced bandwidth and compute requirements,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “We are seeing the future of AI now, and it’s truly a new frontier, even for large training models and complex image processing.”

Tachyum’s experiments with training models in TAI with 2b effective weight formats have shown that it is a usable format for general models that are not specially optimized for specific tasks or for specific datasets. The degradation of the models was at an acceptable level, even in the case of post-training pruning. The TAI format can also be used for pre-training the model.
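For readers unfamiliar with post-training pruning, the following minimal sketch shows one common generic approach, magnitude pruning, which zeroes the smallest weights of an already trained model. The announcement does not say which pruning method Tachyum used, so this is an assumption for illustration; `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical sample weights, for illustration only.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, -0.02]
pruned = magnitude_prune(weights, 0.5)
print(pruned)
```

In practice the pruned model is re-evaluated on its validation set to measure how much accuracy was lost, which is the kind of degradation the paragraph above refers to.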

Large models perform well even with 2b per weight; smaller models do not experience performance issues, so they are not a primary target.

Tachyum TPU® (Tachyum Processing Unit) Inference intellectual property is available as a licensable core for models trained on Tachyum’s Prodigy Universal Processor chip.

The “Image AI Processing at the Next Level With 4b TAI & 2b Effective Weights” white paper is now available on Tachyum’s website.

As a Universal Processor offering utility for all workloads, Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) on a single architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 192 high-performance custom-designed 64-bit compute cores, to deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest performing GPU for HPC, and 6x for AI applications.

Follow Tachyum

https://twitter.com/tachyum

https://www.linkedin.com/company/tachyum

https://www.facebook.com/Tachyum/

About Tachyum

Tachyum is transforming the economics of AI, HPC, public and private cloud workloads with Prodigy, the world’s first Universal Processor. Prodigy unifies the functionality of a CPU, a GPU, and a TPU in a single processor to deliver industry-leading performance, cost and power efficiency for both specialty and general-purpose computing. As global data center emissions continue to contribute to a changing climate, with projections of their consuming 10 percent of the world’s electricity by 2030, the ultra-low power Prodigy is positioned to help balance the world’s appetite for computing at a lower environmental cost. Tachyum recently received a major purchase order from a US company to build a large-scale system that can deliver more than 50 exaflops performance, which will exponentially exceed the computational capabilities of the fastest inference or generative AI supercomputers available anywhere in the world today. When complete in 2025, the Prodigy-powered system will deliver a 25x multiplier vs. the world’s fastest conventional supercomputer – built just this year – and will achieve AI capabilities 25,000x larger than models for ChatGPT4. Tachyum has offices in the United States and Slovakia. For more information, visit https://www.tachyum.com/.