the broken, on 08 March 2025 - 07:31 PM, said:
QuickTidal, on 03 March 2025 - 01:27 PM, said:
DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts – SemiAnalysis
Quote
DeepSeek's price and efficiencies caused the frenzy this week, with the main headline being the "$6M" training cost of DeepSeek V3. This is wrong. It is akin to pointing to one line item on a product's bill of materials and attributing it as the entire cost. The pre-training cost is a very narrow portion of the total cost.
Training Cost
We believe the pre-training number is nowhere near the actual amount spent on the model. We are confident their hardware spend is well above $500M over the company's history. Developing new architectural innovations requires considerable spend during model development on testing new ideas, new architecture ideas, and ablations. Multi-Head Latent Attention, a key innovation of DeepSeek's, took several months to develop and cost a whole team's worth of man-hours and GPU hours.
The $6M cost in the paper is attributed to just the GPU cost of the pre-training run, which is only a portion of the total cost of the model. Excluded are important pieces of the puzzle like R&D and the total cost of ownership (TCO) of the hardware itself. For reference, Claude 3.5 Sonnet cost tens of millions of dollars to train, and if that were the total cost Anthropic needed, they would not be raising billions from Google and tens of billions from Amazon. It's because they have to experiment, come up with new architectures, gather and clean data, pay employees, and much more.
People ran with the $6M headline even though it was just the cost of the final training run. Without that price tag, it's just another AI like the others.
First of all, you claimed they lied, but they did in fact specify from the beginning that they were referring to training cost.
Second, the cost of training the model was in fact shockingly low, and a major breakthrough. (And it's not called a "training run" in English: it's training the model, the process by which the artificial neural network "learns" from the training data.)
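For context on where that number even comes from: the DeepSeek-V3 technical report itself derives it as roughly 2.788M H800 GPU-hours priced at an assumed rental rate of $2 per GPU-hour. Here's a minimal back-of-envelope sketch of that arithmetic in Python; the per-hour rate is the report's stated assumption, not a measured expense, and the hour breakdown follows the report's published figures:

# Reproduce the "$6M" headline from the DeepSeek-V3 technical
# report's own numbers. The $2/GPU-hour H800 rental rate is the
# report's stated assumption, not an audited cost.
gpu_hours = {
    "pre_training": 2_664_000,      # H800 GPU-hours
    "context_extension": 119_000,
    "post_training": 5_000,
}
rate = 2.0  # assumed USD per GPU-hour (per the report)

total_hours = sum(gpu_hours.values())   # 2,788,000 GPU-hours
cost = total_hours * rate               # $5,576,000 -- the "~$6M" figure
print(f"{total_hours:,} GPU-hours x ${rate}/hr = ${cost:,.0f}")

Note what this does and doesn't cover: it's the rental cost of the final training compute only, excluding R&D, ablation experiments, salaries, and hardware ownership costs, which is exactly the gap SemiAnalysis is pointing at.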
On that note, in case anyone missed it:
Quote
Like DeepSeek, the new Qwen model is open source, and both Chinese projects are reasoning models that excel at technical work.
The Qwen release is one more sign that plenty of further progress is possible in making this kind of AI more efficient.
Reasoning models are making the transition from "amazing new breakthrough" to "cheap and widely available commodity" in record time.
[... Meanwhile, in the West:]
[...] After a week of hands-on experience with OpenAI's latest and biggest model, GPT-4.5, AI experts remain a little puzzled by it, given that it costs a fortune to use yet doesn't break benchmark records.
But one consensus has emerged among fans of GPT-4.5: The new model has "taste."
https://www.axios.co...-45-qwen-sesame
lol, oh yes... a taste of poppycockshitstuffedgarbageinferno (that's a German word, isn't it?...).
And Trump is doing a lot to sabotage the US advantage in AI hardware, and not just by saying he wants to revoke the CHIPS Act (which would be difficult, though I suppose he could withhold further funds or come up with some other totalitarian bullshit to sabotage it):
Quote
[...] rebuilding the industrial ecosystem after decades of offshoring is a time-consuming and complex process. There is a shortage of skilled labour in the US[...] In discouraging imports, the US would not be able to produce many products due to a lack of capability to manufacture all necessary components and parts.
[...] Making 2-nanometre semiconductors needs Zeiss lenses from Germany, extreme ultraviolet light from ASML in the Netherlands and speciality gases from Japan, as well as materials from Applied Materials and testing equipment from KLA in the US. Tariffs [...] would drive up costs and slow technological progress.
While the US is burning bridges with its protectionist trade policies, China is building bridges
This post has been edited by Azath Vitr (D'ivers): 08 March 2025 - 09:05 PM