AI firms follow DeepSeek’s lead, create cheaper models with “distillation” – Ars Technica
Through distillation, companies take a large language model, dubbed a "teacher" model, which generates the next likely word in a sentence. The teacher model generates data that then trains a smaller "student" model, quickly transferring the knowledge and predictions of the bigger model to the smaller one. While distillation has been widely used for years, recent advances have led industry experts to believe the process will increasingly be a boon for start-ups seeking cost-effective ways to build applications based on the technology. […]
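The transfer described above is commonly framed as training the student to match the teacher's probability distribution over next tokens rather than only a single "correct" word. The sketch below illustrates that soft-label objective in plain Python; the function names, temperature value, and toy logits are illustrative assumptions, not taken from any particular framework or from the article.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; a higher temperature yields a
    # "softer" distribution that exposes more of the teacher's knowledge.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the teacher's softened next-token
    # distribution and the student's: the core "soft label" objective.
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(t, s))

# Toy next-token logits over a 4-word vocabulary (hypothetical values).
teacher = [4.0, 1.0, 0.5, 0.2]
aligned_student = [3.8, 1.1, 0.4, 0.3]  # predictions close to the teacher's
random_student = [0.1, 2.5, 0.3, 3.0]   # predictions far from the teacher's

# Training drives the student toward the lower-loss (aligned) behavior.
print(distillation_loss(teacher, aligned_student) <
      distillation_loss(teacher, random_student))  # True
```

In a real pipeline the teacher's outputs over a large corpus serve as the training data, and the student's weights are updated by gradient descent on a loss of this form.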
Thanks to distillation, developers and businesses can access these models' capabilities at a fraction of the price, and run AI models quickly on devices such as laptops and smartphones.