
Microsoft and NVIDIA Revolutionise AI with New Advanced Language Models


Microsoft and NVIDIA have each introduced pioneering language models, leveraging innovative techniques to enhance performance and efficiency. Microsoft’s Phi-3.5 series and NVIDIA’s Mistral-NeMo-Minitron 8B stand out as leading examples of the latest advancements in AI technology.

Microsoft’s Phi-3.5: Innovating with Mixture of Experts

Microsoft has unveiled its new Phi-3.5 family of language models, marking a notable advancement in AI capabilities. The series includes three variants: Phi-3.5-Vision, Phi-3.5-MoE (Mixture of Experts), and Phi-3.5-Mini. Notably, Phi-3.5-MoE is Microsoft’s first foray into using Mixture of Experts technology, a method that allows the model to selectively engage different parts of its neural network, enhancing efficiency and output quality.

The Mixture of Experts approach enables the Phi-3.5-MoE model to activate only 6.6 billion parameters for any given input, even though the network as a whole comprises sixteen underlying sub-networks, or “experts.” This selective activation allows the model to perform at a level comparable to more complex systems, such as GPT-4o-mini, while remaining leaner and more computationally efficient. This technological innovation not only reduces the computational power required for inference but also offers significant cost savings. For instance, the Phi-3.5-MoE was trained on 4.9 trillion tokens using 512 H100 GPUs, demonstrating its capability to handle extensive datasets with relative ease.
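The routing idea behind this can be illustrated with a minimal sketch. The code below is a toy top-K gating layer in plain NumPy, not Microsoft's implementation: all weights are random stand-ins, and the dimensions (8-dimensional tokens, 16 experts, 2 active per token) are illustrative only. It shows the key property the paragraph describes: for each input, a router scores all experts but only a small subset is ever evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)

D, E, K = 8, 16, 2          # hidden size, number of experts, experts activated per token

# Router and expert weights (random stand-ins for trained parameters)
W_router = rng.normal(size=(D, E))
experts = [rng.normal(size=(D, D)) for _ in range(E)]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def moe_forward(x):
    """Route token x to its top-K experts and mix their weighted outputs."""
    scores = softmax(x @ W_router)          # gating probabilities over all E experts
    top = np.argsort(scores)[-K:]           # indices of the K highest-scoring experts
    # Only K of the E expert networks run; the rest stay idle for this token.
    out = sum(scores[e] * (experts[e] @ x) for e in top)
    return out, top

x = rng.normal(size=D)
y, active = moe_forward(x)
print(len(active))          # 2 of the 16 experts were evaluated for this token
```

This is how "6.6 billion active parameters" can coexist with a much larger total model: the parameter count per forward pass is set by the experts actually selected, not by the full roster.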

NVIDIA’s Mistral-NeMo-Minitron 8B: Efficiency Through Pruning and Distillation

Meanwhile, NVIDIA has introduced the Mistral-NeMo-Minitron 8B, a streamlined version of its earlier Mistral NeMo 12B model. The Mistral-NeMo-Minitron 8B employs a sophisticated method of model optimisation known as width pruning, combined with knowledge distillation. This technique refines the model by reducing its complexity without sacrificing performance.

Width pruning works by narrowing down the neural network, focusing on essential components while eliminating redundancies. NVIDIA achieved this by pruning both the embedding and MLP intermediate dimensions of the Mistral NeMo 12B model. Subsequent knowledge distillation allowed NVIDIA to train the smaller Minitron 8B model to retain much of the predictive accuracy of its larger predecessor. This method reduces the training dataset size by a factor of more than 40, making the process both cost-effective and environmentally friendly.
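The two-step recipe — score and drop intermediate neurons, then distil the smaller model against the original — can be sketched as follows. This is a simplified NumPy illustration of the general technique on a single MLP layer, not NVIDIA's pipeline: the importance score (mean activation over a calibration batch), the dimensions, and the plain mean-squared-error distillation step are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 64            # embedding dim, MLP intermediate dim (teacher)
H_small = 40             # pruned intermediate dim (student)

# Teacher MLP: x -> relu(x @ W1) @ W2
W1 = rng.normal(size=(D, H))
W2 = rng.normal(size=(H, D))

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2

# --- Width pruning: score each intermediate neuron by its mean activation
# over a calibration batch, then keep only the highest-scoring ones. ---
calib = rng.normal(size=(256, D))
importance = np.maximum(calib @ W1, 0.0).mean(axis=0)    # one score per neuron
keep = np.argsort(importance)[-H_small:]                 # neurons to retain
S1, S2 = W1[:, keep].copy(), W2[keep, :].copy()          # student initialised from teacher

# --- Knowledge distillation: nudge the student to match teacher outputs ---
lr = 1e-4
for _ in range(200):
    x = rng.normal(size=(32, D))
    h = np.maximum(x @ S1, 0.0)
    err = h @ S2 - mlp(x, W1, W2)          # student output minus teacher output
    S2 -= lr * h.T @ err / len(x)          # gradient step on the student's output layer

print(S1.shape, S2.shape)                  # pruned network is narrower but same I/O shape
```

Because the student starts from the teacher's own (surviving) weights and is trained only to imitate the teacher's outputs, far less data is needed than training a model of that size from scratch — the source of the 40x reduction in training tokens the article cites.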

Implications for the Future of AI

Both Microsoft and NVIDIA’s approaches represent significant progress in AI model development, focusing on creating more powerful, efficient, and adaptable systems. Microsoft’s Phi-3.5-MoE showcases the potential of Mixture of Experts technology in improving model performance while keeping resource demands manageable. Similarly, NVIDIA’s Mistral-NeMo-Minitron 8B highlights the effectiveness of pruning and distillation in developing scalable, high-performance AI models.


These advancements indicate a promising future where AI systems can be tailored to specific tasks, improving accuracy and efficiency while minimising resource consumption. The widespread availability of these models on platforms like Hugging Face also suggests a democratisation of AI technology, enabling developers to build and customise applications more readily.

As the AI landscape continues to evolve, the innovations introduced by Microsoft and NVIDIA set the stage for even more advanced models, pushing the boundaries of what artificial intelligence can achieve. With these developments, AI is poised to become even more integral to various industries, driving innovation and transforming how we interact with technology.
