A smartphone can now process complex logic by activating only a fraction of its digital brain
The latest generation of mobile artificial intelligence reaches cloud-level reasoning by engaging only a small fraction of its network at each step, sharply reducing the energy that complex logic requires.
Google's Gemma 4 model can match the performance of massive systems like GPT-4 while running locally on a smartphone, thanks to a Mixture-of-Experts (MoE) architecture. Instead of passing every token through its entire neural network, the system routes each one among 8 specialized 'experts,' activating only 375 million of its 2 billion total parameters per token, or about 19 percent. This strategy is credited with cutting computational requirements by 75 percent, allowing the iPhone 16's neural processing unit to sustain 40 tokens per second.
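To make the routing idea concrete, here is a toy sketch of top-k expert routing in plain Python. It is not Gemma's implementation: the expert count of 8 and the parameter figures come from the article, while the number of experts that fire per token (`TOP_K = 2`), the tiny vector size, the random weights, and every function name are assumptions chosen only for illustration.

```python
import math
import random

NUM_EXPERTS = 8   # figure quoted in the article
TOP_K = 2         # assumption: the article does not say how many experts fire per token
DIM = 4           # toy hidden size, illustrative only


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def moe_forward(token, router_w, experts, top_k=TOP_K):
    """Send one token vector to its top_k experts and mix their outputs."""
    scores = matvec(router_w, token)  # one routing score per expert
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([scores[i] for i in chosen])  # renormalise over chosen experts
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = matvec(experts[idx], token)  # unchosen experts are never evaluated
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen


rng = random.Random(0)
router_w = make_matrix(NUM_EXPERTS, DIM, rng)
experts = [make_matrix(DIM, DIM, rng) for _ in range(NUM_EXPERTS)]
token = [0.5, -0.2, 0.1, 0.9]
output, chosen = moe_forward(token, router_w, experts)

# Share of weights touched per token, using the article's figures.
ACTIVE_PARAMS = 375_000_000
TOTAL_PARAMS = 2_000_000_000
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.1875
```

The skipped experts are where the savings come from: only the chosen experts run a matrix multiply for a given token. With the article's figures, the active share works out to 0.1875 of the model's weights per token. The article does not spell out how that share maps onto its 75 percent compute figure.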