News Highlights:
- Arm Lumex CSS platform enables real-time on-device AI applications, featuring SME2-enabled Arm CPUs that offer up to 5x faster AI performance.
- Developers can leverage SME2 performance with KleidiAI, now integrated into major mobile OSes and AI frameworks, including PyTorch ExecuTorch, Google LiteRT, Alibaba MNN, and Microsoft ONNX Runtime.
- The Arm Lumex CSS platform marks six years of double-digit IPC performance gains for flagship devices.
- New Mali G1-Ultra revolutionizes mobile entertainment for gamers, delivering a 2x improvement in ray tracing capabilities.
AI has emerged as the cornerstone of next-generation mobile and consumer technology. Users increasingly expect real-time assistance, seamless communication, and personalized content that is instant, private, and delivered on the device itself. Meeting these expectations requires advances that improve performance, privacy, and efficiency together. Arm’s latest platform addresses these needs, reports 24brussels.
Introducing Arm Lumex
To meet these demands, we introduce Arm Lumex, our most advanced compute subsystem (CSS) platform, designed to accelerate AI on flagship smartphones and next-gen PCs.
Lumex integrates our highest-performing CPUs with Scalable Matrix Extension version 2 (SME2), along with GPUs and system IP. This helps partners bring AI devices to market faster and enables experiences ranging from desktop-class mobile gaming to real-time translation, advanced assistants, and customized applications.
We are implementing SME2 across every CPU platform. By 2030, SME and SME2 are expected to contribute over 10 billion TOPS of computing power across more than 3 billion devices, marking a significant leap in on-device AI capabilities.
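As a rough sanity check on these projections, the two headline figures imply an average per-device contribution. The sketch below is illustrative arithmetic only: it treats "over 10 billion TOPS" and "more than 3 billion devices" as exact values, which they are not, and the function and constant names are hypothetical.

```python
def average_tops_per_device(total_tops: float, device_count: float) -> float:
    """Return the mean TOPS contributed per device."""
    if device_count <= 0:
        raise ValueError("device_count must be positive")
    return total_tops / device_count


# Figures quoted in the text, treated as exact for illustration.
PROJECTED_TOTAL_TOPS = 10e9
PROJECTED_DEVICES = 3e9

avg_tops = average_tops_per_device(PROJECTED_TOTAL_TOPS, PROJECTED_DEVICES)
print(f"Average: about {avg_tops:.1f} TOPS per device")
```

Because both quoted figures are lower bounds, the result is an order-of-magnitude indication rather than a specification of any single device.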
Partners can choose how they incorporate Lumex into their SoCs: they can use the platform as delivered for rapid commercialization, or customize the RTL platform for their specific needs. Either route offers significant benefits in performance and time to market.
The Lumex platform and our streamlined product naming conventions were announced earlier this year.
The platform comprises:
- A next-generation SME2-enabled Armv9.3 CPU cluster, including C1-Ultra and C1-Pro, powering flagship devices.
- The new C1-Premium, designed for sub-flagship markets, providing unmatched area efficiency.
- A new Mali G1-Ultra GPU, featuring next-gen ray tracing for advanced gaming graphics and a boost in AI performance.
- The most flexible and power-efficient DynamIQ Shared Unit (DSU) Arm has developed to date: C1-DSU.
- Optimized implementations for 3nm technology nodes.
- Deep integration within the software ecosystem that facilitates seamless AI acceleration for developers using KleidiAI libraries.
Accelerated AI Everywhere with SME2-Enabled CPUs
The SME2-enabled Arm C1 CPU cluster provides substantial AI performance enhancements for real-world, AI-centric applications:
- Up to a 5x increase in AI performance.
- 4.7x reduction in latency for speech-related tasks.
- 2.8x acceleration in audio generation.
This advance in CPU AI compute enables real-time, on-device AI inference, giving users richer and faster interactions in audio generation, computer vision, and contextual assistants.
For instance, SME2-powered applications such as the Smart Yoga Tutor show a 2.4x improvement in text-to-speech performance, allowing instantaneous feedback on poses without significantly draining the battery. A collaboration with Alipay and vivo produced a 40% reduction in LLM response time, demonstrating that SME2 delivers rapid on-device generative AI responses.
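The figures in this section mix two conventions: "Nx" factors and percentage reductions. A minimal sketch of how to convert between them follows, assuming an "Nx reduction in latency" means the task takes 1/N of the original time. The helper names are hypothetical and the inputs are the numbers quoted above.

```python
def speedup_from_reduction(reduction_fraction: float) -> float:
    """Convert a fractional time reduction (0.40 for 40%) into a speedup factor."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


def reduction_from_speedup(speedup: float) -> float:
    """Convert a speedup factor (4.7 for 4.7x) into a fractional time reduction."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# 40% shorter LLM response time, expressed as a speedup factor.
llm_speedup = speedup_from_reduction(0.40)
# 4.7x lower speech latency, expressed as a share of time removed.
speech_reduction = reduction_from_speedup(4.7)

print(f"40% less time is a {llm_speedup:.2f}x speedup")
print(f"4.7x lower latency removes {speech_reduction:.0%} of the time")
```

Under that assumption, a 40% reduction corresponds to roughly a 1.67x speedup, and a 4.7x latency reduction removes about 79% of the original time.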
SME2 is also unlocking powerful AI capabilities that conventional CPUs cannot achieve. Neural camera denoising, for example, operates at over 120fps in 1080p and 30fps in 4K, all on a single core, enabling smartphone users to capture detailed images even in low-light conditions.
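The two denoising figures can be compared by pixel throughput. The sketch below assumes "1080p" means 1920x1080 and "4K" means 3840x2160 (UHD); neither resolution is stated in the text, and the function name is hypothetical.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Return the number of pixels processed per second at a given frame rate."""
    return width * height * fps


# Assumed resolutions: 1080p = 1920x1080, 4K = 3840x2160 (UHD).
full_hd_rate = pixels_per_second(1920, 1080, 120)
uhd_rate = pixels_per_second(3840, 2160, 30)

print(f"1080p at 120fps: {full_hd_rate:,} pixels/s")
print(f"4K at 30fps:     {uhd_rate:,} pixels/s")
```

With these assumptions both cases come to 248,832,000 pixels per second, since a UHD frame has four times the pixels of a 1080p frame and runs at a quarter of the frame rate. The two quoted figures therefore describe the same single-core pixel throughput.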
Unlike cloud-based AI, which typically contends with latency, cost, and privacy issues, Lumex brings intelligence directly to the device, making it faster, safer, and always available. Major ecosystem players, including Alibaba, Alipay, Samsung LSI, Tencent, and vivo, are adopting SME2 in their technologies.
Architectural Freedom for Every Product Tier
Lumex grants partners the latitude to optimize performance, efficiency, and silicon area across a variety of products from high-end smartphones to emerging AI-centric form factors:
CPU | Key benefit | Performance and efficiency gains | Ideal use cases |
C1-Ultra | Flagship performance | +25% single-thread efficiency; annual double-digit IPC gains | Large-model inference, computational photography, content creation, generative AI |
C1-Premium | High performance with lower area usage | 35% smaller area than C1-Ultra | Sub-flagship mobile, voice assistants, multitasking |
C1-Pro | Efficient performance | +16% sustained performance | Video playback, streaming inference |
C1-Nano | Highly efficient | +26% efficiency with reduced area | Wearables, compact designs |
Enabling Desktop-Class Gaming and Enhanced AI Inference on Mali GPU
With over 12 billion Arm GPUs delivered to date, Arm is integral to mobile gaming experiences. The new Arm Mali G1-Ultra GPU continues to elevate mobile gaming with high-fidelity, console-level graphics, supported by the new Ray Tracing Unit v2 (RTUv2), which delivers a 2x improvement in ray tracing performance over its predecessor. For AI tasks, the G1-Ultra also provides up to 20% faster inference, improving responsiveness across real-time applications.
The Mali G1-Ultra delivers a 20% improvement in graphics benchmarks compared to the prior generation, with substantial gains in popular titles such as Arena Breakout, Fortnite, Genshin Impact, and Honkai: Star Rail. Meanwhile, the G1-Premium and G1-Pro GPUs deliver strong performance and power efficiency for more resource-constrained devices.