We have received requests for AIStorm’s take on Deepseek. Our views here are supplemented by discussions with investors, academics, and corporations.
In summary, the fear underlying Deepseek, and further expressed by huge equity losses, is twofold: that the air has come out of AI, and that the oversized valuations of a few companies like nVidia or OpenAI misled investors as to their vulnerabilities.
As to the first, the consensus is rather the opposite: reducing the hardware costs of AI will expand the total available market (TAM).
As to the second, this is true. AIStorm has long questioned the wisdom of over-inflating the valuation of a company designing video game chips. We predicted that there would be disruption, and here we are. nVidia’s chips are too complex, too overburdened, and too expensive.
Deepseek is just one in a chain of significant model improvements. AlexNet, an early image-classification model, required 60M weights; today it has been replaced by models that do the job with fewer than 1M. This is due to ongoing innovation in model mathematics.
The problem is associating “breakthrough” functionality with the efforts of the few huge companies or groups that introduce or market it. In reality: i) the mathematics of model improvement is a parallel effort by an enormous number of players (the research herd); and ii) these efforts are evolutionary.
The generative pre-trained transformer (GPT) which preceded a sudden public awakening did not spring into existence through the efforts of one company. An evolution including RNNs, LSTMs, GRUs, early attention, BERT, etc. brought feedforward, residual, feedback & transformer innovations. The landmark paper, “Attention is all you need,” sums up the eventual epiphany that a group of researchers culled from the efforts of the herd. ChatGPT caught the public’s attention by making the technology available, not by inventing the plumbing.
History warns against overfunding a few companies without questioning the specifics of their contribution, while ignoring the chance that an innovation from the herd could quickly undermine such efforts.
Deepseek highlighted:
1. That a precision of 8 bits (FP8), or even less, is sufficient (instead of 32 or more), significantly reducing processing and memory overhead; nVidia wastes a lot of silicon handling unnecessarily large numbers (see the sketch after this list).
2. That one can bypass the CUDA ecosystem, overcoming the argument that nVidia had created a moat akin to Microsoft Windows.
3. That far fewer weights, and hence far less memory, are required.
4. That no single vendor or group of vendors is going to dominate model development.
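To make items 1 and 3 concrete, here is a minimal back-of-the-envelope sketch in Python. The parameter counts (70B and 7B) are illustrative assumptions chosen only for comparison, not Deepseek’s or any vendor’s actual figures; the point is simply that weight storage scales with both precision and weight count.

```python
# Rough sketch (illustrative only): how weight precision and weight count
# drive a model's weight-storage footprint. Parameter counts are assumptions.

def model_memory_gb(num_weights: int, bytes_per_weight: int) -> float:
    """Return the weight-storage footprint in gigabytes."""
    return num_weights * bytes_per_weight / 1e9

configs = {
    "FP32, 70B weights": (70_000_000_000, 4),  # 32-bit floats: 4 bytes per weight
    "FP8,  70B weights": (70_000_000_000, 1),  # 8-bit floats: 1 byte per weight
    "FP8,   7B weights": (7_000_000_000, 1),   # fewer weights AND lower precision
}

for name, (n, b) in configs.items():
    print(f"{name}: {model_memory_gb(n, b):,.0f} GB")
```

Dropping from FP32 to FP8 alone cuts the weight footprint by 4x; combine that with a ten-fold reduction in weight count and the footprint shrinks 40x, which is the kind of compounding saving Deepseek drew attention to.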
Ultimately, this disruption is the air coming out of nVidia and going into companies like AIStorm. AIStorm’s technology can run models like Deepseek far more efficiently due to our charge-domain processing. For more details, check out our documentation.