Maria Deutscher
2024-12-26 18:36:00
siliconangle.com
Chinese artificial intelligence developer DeepSeek today open-sourced DeepSeek-V3, a new large language model with 671 billion parameters.
The LLM can generate text, craft software code and perform related tasks. DeepSeek says it outperforms two of the most advanced open-source LLMs on the market across more than a half-dozen benchmark tests.
DeepSeek-V3 is based on a so-called mixture of experts, or MoE, architecture. It comprises multiple neural networks that are each optimized for a different set of tasks. When DeepSeek-V3 receives a prompt, a component known as a router sends the request to the neural network best-equipped to answer it.
The MoE architecture’s main benefit is that it reduces hardware costs. Sending a prompt to DeepSeek-V3 doesn’t activate the entire LLM, but only the specific neural networks to which the request is routed. As a result, each request activates about 37 billion of the model’s 671 billion parameters, which means it requires a relatively limited amount of infrastructure to run.
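The routing idea can be sketched in a few lines of code. This is a minimal, illustrative top-1 router over toy linear "experts," not DeepSeek's actual implementation (which routes each token to several experts out of hundreds); all dimensions and weights here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 4   # toy number of experts
d_model = 8     # toy hidden dimension

# Hypothetical experts: one small linear layer each.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

# Router: a linear layer that scores each expert for a given input.
router_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route input x to the single highest-scoring expert."""
    scores = x @ router_w               # one score per expert
    chosen = int(np.argmax(scores))     # top-1 routing
    return experts[chosen] @ x, chosen  # only the chosen expert runs

x = rng.standard_normal(d_model)
y, expert_id = moe_forward(x)
```

The cost saving comes from the last line of `moe_forward`: only one expert's weights are multiplied against the input, while the other experts' parameters sit idle for that request.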
Alongside its benefits, the MoE architecture also introduces certain challenges. During the training process, some of a MoE model’s neural networks receive more training data than the others, which can create inconsistencies in the LLM’s output quality. DeepSeek says it has developed a new method of mitigating this challenge and implemented it in DeepSeek-V3.
The LLM was trained on 14.8 trillion tokens’ worth of information. One token corresponds to a few letters or numbers. The training process took 2.788 million graphics processing unit hours, a relatively modest amount of compute by frontier-model standards: the industry’s most advanced AI clusters, which comprise tens of thousands of GPUs or more, could complete a training run of that size in a few days.
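A quick back-of-the-envelope check supports that claim. The cluster sizes below are illustrative assumptions, not figures from the article; only the 2.788 million GPU-hour total comes from the source.

```python
# Convert the reported training budget into wall-clock time for a
# few hypothetical cluster sizes (assumed, not from the article).
gpu_hours = 2_788_000  # total GPU hours reported for training

for n_gpus in (10_000, 50_000, 100_000):
    days = gpu_hours / n_gpus / 24
    print(f"{n_gpus:>7} GPUs -> {days:.1f} days")
```

At tens of thousands of GPUs the run shrinks to days; at 100,000 GPUs it is roughly a day.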
Alongside its MoE architecture, DeepSeek-V3 is equipped with several optimizations designed to boost its output quality.
LLMs use a technique called attention to identify the most important details in a sentence. DeepSeek-V3 implements multihead latent attention, an improved version of the technique that allows it to extract key details from a text snippet several times rather than only once. This makes the LLM less likely to overlook important information.
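The "several times" refers to running attention across multiple heads in parallel. The sketch below shows generic multihead attention, the baseline that multihead latent attention builds on; MLA's distinguishing step, compressing keys and values into a smaller latent vector, is not shown, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads

x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x):
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Split into heads: each head attends over the sequence independently,
    # so the same text is examined n_heads times from different "views."
    q = q.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = k.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = v.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v  # (n_heads, seq_len, d_head)
    # Recombine the heads into a single output per token.
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

y = multi_head_attention(x)
```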
DeepSeek-V3 also features a so-called multitoken prediction capability. Language models usually generate text one token at a time. DeepSeek-V3, in contrast, generates several at once, which speeds up inference.
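The speedup comes from needing fewer forward passes for the same output length. The toy sketch below contrasts the two decoding loops; the stand-in "model" is a hypothetical deterministic stub, and real multitoken prediction uses extra prediction heads not shown here.

```python
# Contrast one-token-at-a-time decoding with multitoken decoding.
def toy_model(context, k=1):
    """Stand-in for an LLM: 'predicts' the next k tokens deterministically."""
    return [f"tok{len(context) + i}" for i in range(k)]

def generate(n_tokens, tokens_per_step):
    context, steps = [], 0
    while len(context) < n_tokens:
        context += toy_model(context, k=tokens_per_step)
        steps += 1  # each step is one (expensive) forward pass
    return context[:n_tokens], steps

_, steps_single = generate(8, tokens_per_step=1)  # 8 forward passes
_, steps_multi = generate(8, tokens_per_step=2)   # 4 forward passes
```

Halving the number of forward passes roughly halves the per-response latency, which is where the inference speedup comes from.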
DeepSeek put its algorithm to the test by comparing it with three other open-source LLMs: the previous-generation DeepSeek-V2, Llama 3.1 405B and Qwen2.5 72B. DeepSeek-V3 achieved higher scores across all nine of the coding and math benchmarks that were used in the evaluation. It also proved better at a range of text processing tasks.
The code for DeepSeek-V3 is available on Hugging Face.
Image: Unsplash