
Distillation can make AI models smaller and cheaper

AI | By Gavin Wallace | 20/09/2025 | 5 Mins Read

The original version of this story appeared in Quanta Magazine.

Earlier this year, the Chinese AI firm DeepSeek launched a chatbot called R1, which attracted enormous attention. Most of it focused on the fact that a relatively small and little-known company said it had built a chatbot rivaling those from the world’s most famous AI companies, using a fraction of the computing power and cost. The stock prices of many Western tech companies plummeted as a result; Nvidia, which sells the chips that run leading AI models, lost more stock value in a single day than any company in history.

Some of that attention involved accusations. Sources alleged that DeepSeek had used a technique called distillation to extract knowledge, without authorization, from OpenAI’s proprietary o1 model. Much of the news coverage framed this possibility as a shock to the AI industry, implying that DeepSeek had found a new, more efficient way to build AI.

But distillation, also known as knowledge distillation, is no secret weapon: it has been a subject of computer science research for over a decade, and it’s a tool that big tech firms already use on their own models. “Distillation is one of the most important tools that companies have today to make models more efficient,” said Enric Boix-Adsera, a researcher who studies distillation at the Wharton School of the University of Pennsylvania.

Dark Knowledge

The idea of distillation began with a 2015 paper by three Google researchers, including Geoffrey Hinton, the so-called godfather of AI and a 2024 Nobel laureate. At the time, researchers often ran ensembles of models, “many models glued together,” said Oriol Vinyals, a principal scientist at Google DeepMind and one of the paper’s authors, to improve their performance. “But it was incredibly cumbersome and expensive to run all the models in parallel,” Vinyals said. “We were intrigued with the idea of distilling that onto a single model.”


The researchers thought they could make progress by addressing a weak point of machine-learning algorithms: wrong answers were all treated as equally bad, no matter how wrong they were. In an image-classification model, for instance, “confusing a dog with a fox was penalized the same way as confusing a dog with a pizza,” Vinyals said. The researchers suspected that ensemble models did contain information about which wrong answers were less bad than others. Perhaps a smaller “student” model could use the information from the large “teacher” model to more quickly grasp the categories it was supposed to sort pictures into. Hinton called this “dark knowledge,” invoking an analogy with cosmological dark matter.

After discussing this possibility with Hinton, Vinyals developed a way for a large teacher model to convey more information about image categories to a student model. The key was homing in on “soft targets” in the teacher model, where it assigns probabilities to each possibility rather than firm this-or-that answers. One model, for example, calculated a 30% chance that an image showed a dog and a 20% chance that it showed a cat. By using these probabilities, the teacher effectively revealed to the student that dogs are quite similar to cats but quite distinct from cows and cars. The researchers found that this information helped the student learn to identify images of dogs, cats, cows, and cars more efficiently: a big, complicated model could be reduced to a leaner one with barely any loss of accuracy.
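To make the mechanics concrete, here is a minimal sketch of a soft-target distillation loss in PyTorch. The function name, temperature, class count, and loss weighting are illustrative assumptions rather than details from the paper; what the sketch does follow is the general recipe of matching the teacher’s temperature-softened probabilities alongside the usual hard-label loss.

```python
# Minimal sketch of a soft-target distillation loss (after Hinton et al., 2015).
# Names, shapes, and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend a soft-target term (match the teacher's softened probabilities)
    with the ordinary hard-label cross-entropy."""
    # A temperature T > 1 flattens both distributions so that small
    # probabilities (the "dark knowledge") carry more training signal.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the two softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example: a batch of 8 images over 4 classes (say dog, cat, cow, car).
student_logits = torch.randn(8, 4, requires_grad=True)
teacher_logits = torch.randn(8, 4)  # produced by the frozen teacher
labels = torch.randint(0, 4, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Raising the temperature spreads probability mass onto the wrong-but-plausible classes, which is exactly where the similarity information described above lives.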

Explosive Growth

The idea was not an immediate hit. The paper was rejected from a conference, and Vinyals, discouraged, turned to other topics. But distillation arrived at an important moment. Around this time, engineers were discovering that the more training data they fed into neural networks, the more effective those networks became. Models soon exploded in size, and so did their capabilities, but the costs of running them climbed in step with their size.

Many researchers turned to distillation as a way to make smaller models. In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was costly to run, so developers distilled a more manageable version called DistilBERT, which became widely used in both business and research. Distillation gradually became ubiquitous, and it is now offered as a service by companies such as Google, OpenAI, and Amazon. The original distillation paper, still published only on the arxiv.org preprint server, has now been cited more than 25,000 times.

Because distillation requires access to the inner workings of the teacher model, it is not possible for a third party to sneakily distill a closed-source model in this way, as DeepSeek was alleged to have done. That said, a student model could still learn quite a bit from a teacher model just through prompting the teacher with certain questions and using the answers to train its own models, an almost Socratic approach to distillation.
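As a rough illustration of that prompt-and-train approach, the sketch below collects a teacher’s answers into a small fine-tuning dataset. Everything in it is hypothetical: query_teacher stands in for whatever client actually returns the closed teacher model’s output, and the prompts and file format are invented for the example.

```python
# Hypothetical sketch of "Socratic" distillation: learn from a teacher's
# answers alone, with no access to its weights or internal probabilities.
import json

def query_teacher(prompt: str) -> str:
    # Stand-in for a call to the teacher model's API (an assumption,
    # not a real client); replace with an actual request in practice.
    return f"(teacher's answer to: {prompt})"

prompts = [
    "Explain why the sky is blue in two sentences.",
    "Summarize knowledge distillation in one paragraph.",
]

# Save (prompt, completion) pairs; a smaller student model can then be
# fine-tuned on this file with ordinary supervised learning.
with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        pair = {"prompt": prompt, "completion": query_teacher(prompt)}
        f.write(json.dumps(pair) + "\n")
```

Unlike the soft-target recipe above, this approach sees only the teacher’s final text, not its probability distributions, so it captures less of the teacher’s “dark knowledge,” but it works even when the teacher’s internals are off-limits.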

Other researchers keep finding new applications, too. In January, the NovaSky lab at UC Berkeley showed that distillation works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions. The lab says its fully open-source Sky-T1 model cost less than $400 to train and achieved results similar to those of a much larger open-source model. “We were genuinely surprised by how well distillation worked in this setting,” said Dacheng Li, a doctoral student at Berkeley and co-student lead of the NovaSky team. “Distillation is a fundamental technique in AI.”


Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation, whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
