DeepSeek’s model release was a watershed moment in the AI industry, creating turmoil in the financial markets and prompting accusations of theft from competitors, among other hot topics. However, several misconceptions are floating around that should be clarified.
- What does this release signify?
- What is DeepSeek’s motivation?
- What were the key developments, and what do they mean for AI compute architectures?
- Does this mean more or less AI compute is needed?
Significance of DeepSeek Release
This release, and the turmoil and conversations it sparked, shows that AI models and AI compute represent a generational shift in the semiconductor industry. AI has moved beyond the mainstream and now dominates the business news cycle, because it has the potential to outstrip the economic impact of almost every other industry. Billions of dollars are being invested in new AI approaches, and thousands of engineers and scientists around the world are working on new techniques. This virtuous cycle will propel AI into every aspect of business and personal life, and AI compute acceleration will need to be everywhere, from datacenters to handhelds.
DeepSeek’s Motivation – Efficiency
DeepSeek’s results stem from the company’s drive to find more efficient ways of developing, training, and deploying generative AI models. Before this, many companies scaled their datacenters by brute force, adding compute at the cost of efficiency and skyrocketing power consumption. Current AI acceleration approaches are inherently inefficient in cost and power because CPU and GPU architectures were developed well before the advent of modern AI models. DeepSeek set out to overcome the inherent inefficiencies of GPU-based compute.
Efficiency Drives Innovation – and Aligns with Untether AI’s Vision
Many of DeepSeek’s innovations were driven by a pursuit of AI model execution efficiency – the same goal of Untether AI’s At-Memory compute architecture. Some of the key innovations include:
- Spatial architecture: DeepSeek uses a Mixture of Experts (MoE), a spatial neural network architecture that activates only a small subset of many expert models, depending on the data and the task requested. This concept of having many models resident in silicon awaiting activation aligns with the spatial compute architecture developed by Untether AI.
- Streaming data: DeepSeek bypassed CUDA and programmed directly in bare-metal PTX in order to circumvent the cache-based structure of GPUs, sacrificing programming efficiency to gain compute efficiency. With Untether AI’s At-Memory compute architecture, this is not required: its efficient cache-less structure naturally supports streaming data, having been designed from the ground up for spatial, streaming AI model architectures.
- Keeping data on-chip to reduce DRAM bandwidth usage: DeepSeek strove to keep its data on-chip, avoiding DRAM bandwidth bottlenecks and the power they consume. One technique it used was Multi-Head Latent Attention (MLA), which shrinks the KV cache and therefore external DRAM usage. This is the same philosophy as Untether AI’s At-Memory compute: keep the data as close to the compute as possible.
- FP8 datatypes: DeepSeek used an 8-bit floating point datatype for training and inference, in order to reduce memory footprint and improve compute efficiency. Untether AI embraced FP8 datatypes with the introduction of its speedAI devices in 2022.
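The expert-routing idea behind MoE can be sketched in a few lines of numpy. This is an illustrative top-k router, not DeepSeek’s actual implementation; the layer sizes, expert count, and ReLU feed-forward experts are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N_EXPERTS, TOP_K = 16, 32, 4, 2  # hypothetical sizes

# Each expert is a small feed-forward network; only the routed
# experts execute for a given token, so most weights stay idle.
experts = [
    (rng.standard_normal((D, H)) * 0.1, rng.standard_normal((H, D)) * 0.1)
    for _ in range(N_EXPERTS)
]
router = rng.standard_normal((D, N_EXPERTS)) * 0.1  # gating weights

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]      # indices of the chosen experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                   # softmax over the selected experts
    out = np.zeros_like(x)
    for gate, idx in zip(gates, top):
        w_in, w_out = experts[idx]
        out += gate * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU FFN expert
    return out

token = rng.standard_normal(D)
y = moe_forward(token)
```

The spatial appeal is visible even in this toy: all experts sit resident in memory, but only `TOP_K` of them do any work per token.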
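The KV-cache savings from Multi-Head Latent Attention come down to simple arithmetic: instead of caching full per-head keys and values, each token stores one compressed latent vector per layer, from which K and V are re-projected on the fly. A back-of-the-envelope comparison, with model dimensions that are purely illustrative (not DeepSeek’s actual configuration):

```python
# Illustrative KV-cache sizing; all dimensions are assumptions.
N_LAYERS   = 60      # transformer depth
N_HEADS    = 128     # attention heads
HEAD_DIM   = 128     # per-head dimension
LATENT_DIM = 512     # compressed latent dimension
SEQ_LEN    = 4096    # cached tokens
BYTES      = 2       # FP16 element size

# Standard multi-head attention caches a full K and V per head.
mha_cache = SEQ_LEN * N_LAYERS * N_HEADS * HEAD_DIM * 2 * BYTES

# Multi-Head Latent Attention caches one shared latent vector per
# token per layer; K and V are reconstructed from it at compute time.
mla_cache = SEQ_LEN * N_LAYERS * LATENT_DIM * BYTES

print(f"MHA cache: {mha_cache / 2**30:.1f} GiB")
print(f"MLA cache: {mla_cache / 2**30:.2f} GiB")
print(f"reduction: {mha_cache / mla_cache:.0f}x")
```

With these assumed dimensions the cache shrinks from roughly 15 GiB to under a quarter of a GiB, which is the difference between spilling to external DRAM and staying on-chip.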
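The memory savings of FP8 come from keeping only four significant bits (one implicit plus three stored) and a 4-bit exponent in the E4M3 format. A rough numpy simulation of E4M3 rounding, assuming a maximum normal value of 448 and ignoring subnormals and NaN encodings:

```python
import numpy as np

def quantize_e4m3(x):
    """Simulate rounding float values to FP8 E4M3 (4 exponent bits,
    3 mantissa bits, max normal value 448). Subnormals and special
    values are ignored for simplicity."""
    x = np.clip(x, -448.0, 448.0)
    mant, exp = np.frexp(x)          # x = mant * 2**exp, 0.5 <= |mant| < 1
    mant = np.round(mant * 16) / 16  # keep 1 implicit + 3 stored bits
    return np.ldexp(mant, exp)
```

Quantizing 0.1 this way yields 0.1015625, a relative error of about 1.6%; in practice, FP8 training pipelines pair the narrow format with per-tensor scaling factors to keep such errors in check.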
Does DeepSeek’s innovation decrease the market for AI Acceleration?
This myth could not be further from the truth. The DeepSeek “moment” is actually a signpost that the world needs more AI acceleration, not less.
- Worldwide AI ecosystem: AI has been built on open-source models, and there are now thousands of well-funded practitioners around the globe. Innovation will continue, and AI will spread around the world, from datacenters to the edge, fueling demand for AI acceleration.
- The rise of regional and sovereign AI: Localization and data-security requirements will drive the need for regional AI compute, whether in the datacenter or at the edge. DeepSeek shows that a handful of hyperscalers will not service all the world’s AI compute needs. Rather, AI compute will be deployed throughout the world, driving the need for more efficient AI acceleration devices. See the recent announcement between Untether AI and Ola/Krutrim on the joint development of next-generation AI acceleration for India.
The Future of AI is Efficiency
DeepSeek’s innovations reinforce the industry-wide shift toward AI solutions that prioritize efficiency, scalability, and precision—values that have always been at the core of Untether AI’s philosophy. Untether AI’s At-Memory Compute architecture, built for spatial neural networks and optimized data streaming, is a testament to the future of AI: a future where efficiency leads the way.