
DeepSeek-V3 vs. V2: How the new model outperforms its predecessor

The upgrade positions the Chinese AI startup competitively against OpenAI in the AI landscape

Chinese artificial intelligence startup DeepSeek has upgraded its open-source large language model (LLM) to DeepSeek-V3, significantly improving on the performance and capabilities of its predecessor, DeepSeek-V2. The upgrade brings gains in both speed and efficiency that put the model in direct competition with OpenAI’s offerings, making it a noteworthy development in the AI landscape.

Key features of DeepSeek-V3

The core features implemented in DeepSeek-V3 bring substantial increases to its operational capabilities. The model generates output at 60 tokens per second, roughly three times faster than V2. That higher throughput matters most for applications that demand real-time processing and responsiveness.

DeepSeek-V3 is built on a mixture-of-experts (MoE) architecture with 671 billion total parameters. During inference, the MoE router activates only a small subset of experts for each token, so far fewer parameters are exercised per forward pass and compute costs stay lower than V2’s.
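
The routing idea can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration of top-k expert gating in plain NumPy; the `moe_forward` helper and all dimensions are invented for the example and are not DeepSeek’s actual implementation:

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, k=2):
    """Sketch of mixture-of-experts routing: only the top-k experts
    (by gate score) are evaluated for each token, so most of the
    model's parameters stay inactive on any single forward pass."""
    scores = x @ gate_weights                  # one score per expert
    top_k = np.argsort(scores)[-k:]            # indices of the k best experts
    probs = np.exp(scores[top_k] - scores[top_k].max())
    probs /= probs.sum()                       # softmax over the selected experts
    # Combine only the chosen experts' outputs, weighted by gate probability.
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top_k))

# Toy dimensions: 8 experts, hidden size 4 (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
experts = rng.standard_normal((8, 4, 4))
gates = rng.standard_normal((4, 8))
y = moe_forward(x, experts, gates, k=2)
print(y.shape)  # (4,)
```

With k=2 of 8 experts selected, six expert matrices are never touched on this pass, which is the source of the inference savings the article describes.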


DeepSeek-V3 was trained on an enormous corpus of 14.8 trillion high-quality tokens. A training dataset of this scale helps the model produce text that reads like natural human writing, improving output quality over V2.

The training run for DeepSeek-V3 cost roughly $6 million, far less than most proprietary models. That makes it a cost-effective option for organizations that want to deploy advanced AI systems without the exorbitant investment V2-era alternatives demanded.

The model’s improved load-balancing mechanism uses an auxiliary-loss-free approach to make expert selection more efficient during training. It prevents routing collapse, the failure mode in which some experts receive excessive traffic while others sit nearly idle, without the extra auxiliary loss term that V2 relied on.
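
A toy sketch of the idea, assuming a simple sign-based bias update (the actual update rule and hyperparameters in DeepSeek-V3 differ): each expert carries a routing bias that is nudged down when the expert is overloaded and up when it is underused, steering traffic toward balance with no change to the training loss.

```python
import numpy as np

def biased_top_k(scores, bias, k):
    """Select experts by gate score plus a per-expert bias term.
    The bias influences selection only, not the training loss."""
    return np.argsort(scores + bias)[-k:]

def update_bias(bias, counts, target, step=0.01):
    """After each batch, push overloaded experts' biases down and
    underloaded experts' biases up by a fixed step."""
    return bias - step * np.sign(counts - target)

# Toy run: 8 experts, top-2 routing, expert 0 artificially favored.
rng = np.random.default_rng(1)
n_experts, k = 8, 2
bias = np.zeros(n_experts)
for _ in range(200):
    counts = np.zeros(n_experts)
    for _ in range(100):
        scores = rng.standard_normal(n_experts)
        scores[0] += 2.0                      # expert 0 gets inflated scores
        counts[biased_top_k(scores, bias, k)] += 1
    bias = update_bias(bias, counts, target=counts.sum() / n_experts)
print(bias[0] < 0)  # True: the overloaded expert's bias has been pushed down
```

After a few hundred steps, the favored expert’s negative bias offsets its inflated gate scores, so the other experts start receiving a fair share of tokens.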

DeepSeek-V3 also trains with a multi-token prediction objective, learning to predict several upcoming tokens at once rather than a single next token. The method deepens the model’s understanding of context and helps it produce text that stays meaningful and relevant, advancing beyond V2.
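
The difference between the two objectives shows up in the training targets they produce. `mtp_targets` below is a hypothetical helper written for this article, not code from DeepSeek:

```python
def mtp_targets(tokens, depth=2):
    """For each position i, a multi-token prediction objective asks the
    model to predict the next `depth` tokens, not just tokens[i + 1].
    Returns (position, [target tokens]) pairs."""
    pairs = []
    for i in range(len(tokens) - depth):
        pairs.append((i, tokens[i + 1 : i + 1 + depth]))
    return pairs

seq = ["the", "cat", "sat", "on", "the", "mat"]
for pos, targets in mtp_targets(seq, depth=2):
    print(pos, targets)
# 0 ['cat', 'sat']
# 1 ['sat', 'on']
# 2 ['on', 'the']
# 3 ['the', 'mat']
```

A standard next-token objective would use only the first element of each target list; predicting the second as well forces the model to plan further ahead at every position.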

DeepSeek maintains its commitment to open-source principles through unrestricted access to both its models and its research papers, including those for V2. This transparency fosters collaboration and innovation within the AI community.


Read more: DeepSeek’s national security threats trigger bans in Australia and Taiwan

API pricing

The API pricing for DeepSeek-V3 has been updated: $0.27 per million input tokens on a cache miss, $0.07 per million on a cache hit, and $1.10 per million output tokens. This structure remains competitive within the market, especially considering the model’s capabilities relative to V2.
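
At those rates, estimating a bill is simple arithmetic. The `api_cost` helper below is an illustration written for this article, not part of DeepSeek’s API:

```python
def api_cost(cache_miss_tokens, cache_hit_tokens, output_tokens):
    """Estimate a request's cost in dollars from the listed per-million
    token rates: $0.27 (input, cache miss), $0.07 (input, cache hit),
    $1.10 (output)."""
    rates = {"miss": 0.27, "hit": 0.07, "out": 1.10}
    return (cache_miss_tokens * rates["miss"]
            + cache_hit_tokens * rates["hit"]
            + output_tokens * rates["out"]) / 1_000_000

# Example: 2M fresh input tokens, 1M cached input tokens, 0.5M output tokens.
print(f"${api_cost(2_000_000, 1_000_000, 500_000):.2f}")  # $1.16
```

Note how heavily output tokens dominate the bill: at $1.10 per million they cost roughly four times as much as cache-miss input and over fifteen times as much as cache-hit input.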

Future developments

DeepSeek has indicated that this is just the beginning, with multimodal support and other advanced features in the pipeline. The ongoing development aims to further close the gap between open-source models such as DeepSeek-V3 and closed-source models, promoting inclusivity in the field of artificial intelligence.
