The Phi-3.5 Family

Microsoft released the Phi-3.5 model family in August 2024, comprising Phi-3.5 Mini (3.8B parameters), Phi-3.5 MoE (16x3.8B parameters, 6.6B active), and Phi-3.5 Vision (multimodal, 4.2B parameters). All three were released as open-weight models on Hugging Face and through Azure AI Studio. The family continued Microsoft's Phi series strategy: demonstrating that smaller models trained on higher-quality data can match or exceed much larger models on many practical tasks.

Performance vs Size

Phi-3.5 Mini (3.8B parameters) achieved 69.0% on MMLU, competitive with models three to five times its size from 2023. On HumanEval coding, it scored 62.8% — remarkable for a model small enough to run on a modern smartphone's NPU. The key insight behind Phi's performance was training data curation: Microsoft's "textbook quality" data philosophy filtered training data to include only high-quality educational content, code, and reasoning examples, compensating for smaller model capacity with better data.

Device Deployment

Phi-3.5 Mini could run at 30+ tokens/second on an iPhone 15 Pro using Apple's Core ML with Neural Engine acceleration, and at similar speeds on Qualcomm Snapdragon 8 Gen 3 devices. Memory requirements of approximately 2.5GB (4-bit quantised) fit comfortably within mobile device constraints. Microsoft published integration guides for iOS, Android, and Windows applications, with sample apps demonstrating on-device translation, summarisation, and Q&A without any API calls.
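The ~2.5GB figure follows from straightforward arithmetic. A back-of-envelope sketch below shows where it comes from; the overhead percentages and runtime allowance are illustrative assumptions (actual figures vary by quantisation format and inference framework), not values published by Microsoft:

```python
# Back-of-envelope memory estimate for a 4-bit quantised 3.8B-parameter model.
# Overhead figures here are rough assumptions, not framework-specific values.

PARAMS = 3.8e9          # Phi-3.5 Mini parameter count
BITS_PER_WEIGHT = 4     # 4-bit quantisation

# Raw packed weights: 4 bits = 0.5 bytes per parameter.
weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9

# Quantisation metadata (per-group scales and zero-points) typically adds
# on the order of 10-15%; KV cache plus activations need a few hundred MB
# at mobile-scale context lengths. Both are assumed values.
metadata_gb = weights_gb * 0.12
runtime_gb = 0.35

total_gb = weights_gb + metadata_gb + runtime_gb
print(f"weights: {weights_gb:.2f} GB, total: {total_gb:.2f} GB")
# Raw weights come to 1.9 GB; the total lands near the ~2.5 GB cited above.
```

The same arithmetic explains why 4-bit quantisation is the standard choice for phones: at 8 bits the weights alone would be 3.8GB, pushing past what most mobile devices can comfortably dedicate to a single app.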

Phi-3.5 Vision

Phi-3.5 Vision added image understanding to the Phi family, achieving performance competitive with GPT-4V on several vision benchmarks while being small enough for on-device deployment. Use cases included document understanding from photos, product identification from images, and visual Q&A — all without sending images to cloud servers.

What This Means for Indian Businesses

The Phi-3.5 family is strategically important for India's tier-2 and tier-3 markets, where connectivity is intermittent and device memory is limited. A 3.8B parameter model that runs at 30+ tokens/second on a modern smartphone enables AI-powered applications that function fully offline — particularly relevant for India's agricultural sector, rural healthcare workers, and field sales teams who need intelligent assistance in areas with poor network coverage.