Apple Releases Core AI: On-Device LLMs for Everyone

Apple released Core AI, a framework for on-device machine learning. It supports models of up to 3.5 billion parameters, which is enough for small LLMs. Phi-3-mini (3.8B) and Llama-2-7B (7B) sit above that limit even when quantized, because quantization reduces memory use, not parameter count. The framework integrates with Core ML and Metal Performance Shaders.

What Core AI Does

Core AI handles inference and training on Apple Silicon, dispatching work across the Neural Engine, GPU, and CPU. Apple claims a 3B-parameter model runs at 50 tokens per second on an M2 Ultra, which works out to about 20 ms per token.

Code Example: Running a Model

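import Foundation
import CoreAI

// Load the compiled model from the app bundle; stop early if it is missing.
guard let modelURL = Bundle.main.url(forResource: "my_model", withExtension: "mlmodelc")
else { fatalError("my_model.mlmodelc is missing from the app bundle") }
do {
    let context = try CAIModelContext(url: modelURL)
    let input = CAITensor(shape: [1, 512], dataType: .float16)
    let output = try context.predict(input)
    print(output)
} catch {
    print("Core AI inference failed: \(error)")
}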

Why It Matters

Core AI is Apple's push into on-device AI. Because inference runs locally, no cloud calls are needed and user data stays on the device, so developers can build LLM-powered apps without sending anything to servers. The framework also supports LoRA fine-tuning directly on device.
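
To give a sense of why LoRA is a plausible fit for on-device fine-tuning, here is a minimal sketch of the parameter arithmetic. It does not use any Core AI API. The layer size (4096 x 4096) and rank (8) are hypothetical values chosen for illustration, and loraParameterCount is a helper name invented here.

```swift
import Foundation

/// Trainable parameters when a (rows x cols) weight matrix is adapted with rank-r LoRA.
/// The low-rank update is B * A, where B is rows x r and A is r x cols.
func loraParameterCount(rows: Int, cols: Int, rank: Int) -> Int {
    return rank * (rows + cols)
}

// Hypothetical layer size and rank, for illustration only.
let rows = 4096, cols = 4096, rank = 8
let full = rows * cols
let lora = loraParameterCount(rows: rows, cols: cols, rank: rank)
print(full, lora)  // 16777216 65536
print(Double(lora) / Double(full))  // fraction of the layer's weights that is trained
```

Under these assumptions the adapter trains well under 1% of the layer's weights, which is the property that makes fine-tuning on a memory-constrained device realistic.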

Technical Details

  • Model Formats: Core ML (.mlpackage) and the new .caimodel format
  • Precision: INT8 (quantized), FP16, and FP32
  • Operators: Attention, FeedForward, RMSNorm, RoPE
  • Memory: Up to 8 GB for a 3.5B-parameter model
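
The memory figure can be sanity-checked with simple arithmetic: weight storage is roughly parameter count times bytes per weight. The sketch below is not part of Core AI. The Precision enum and weightMemoryGB are names made up for this example, and the estimate ignores activations, the KV cache, and runtime overhead.

```swift
import Foundation

/// Bytes used to store one weight at each listed precision.
enum Precision: Double {
    case int8 = 1.0
    case fp16 = 2.0
    case fp32 = 4.0
}

/// Estimated weight memory in GB (10^9 bytes). Weights only, no runtime overhead.
func weightMemoryGB(parameters: Double, precision: Precision) -> Double {
    return parameters * precision.rawValue / 1e9
}

let params = 3.5e9
print(weightMemoryGB(parameters: params, precision: .int8))  // 3.5
print(weightMemoryGB(parameters: params, precision: .fp16))  // 7.0
print(weightMemoryGB(parameters: params, precision: .fp32))  // 14.0
```

At FP16 the weights alone come to about 7 GB, which is consistent with an 8 GB ceiling once overhead is included. The source does not say which precision the 8 GB figure refers to, so treat this as a plausibility check only.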

Performance Benchmarks

Apple published benchmarks on M2 Ultra:

  • 3B model: 50 tokens/s
  • 1.5B model: 100 tokens/s
  • 350M model: 300 tokens/s

All tests used FP16 precision.
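
Tokens per second is easier to reason about when converted to per-token latency and time per reply. The sketch below does that conversion for the reported numbers. The helper names and the 300-token reply length are assumptions for illustration, and the calculation ignores prompt processing time.

```swift
import Foundation

/// Average time per generated token, in milliseconds.
func millisecondsPerToken(tokensPerSecond: Double) -> Double {
    return 1000.0 / tokensPerSecond
}

/// Seconds to generate `tokenCount` tokens at a steady decode rate.
func secondsToGenerate(tokenCount: Int, tokensPerSecond: Double) -> Double {
    return Double(tokenCount) / tokensPerSecond
}

// Throughput figures as reported in the benchmark list above.
let reported: [(model: String, tokensPerSecond: Double)] = [
    (model: "3B", tokensPerSecond: 50.0),
    (model: "1.5B", tokensPerSecond: 100.0),
    (model: "350M", tokensPerSecond: 300.0),
]

for entry in reported {
    let perToken = millisecondsPerToken(tokensPerSecond: entry.tokensPerSecond)
    let reply = secondsToGenerate(tokenCount: 300, tokensPerSecond: entry.tokensPerSecond)
    print(entry.model, "ms/token:", perToken, "seconds for 300 tokens:", reply)
}
```

For the 3B model this gives 20 ms per token and 6 seconds for a 300-token reply.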

Comparison to Core ML

Core AI is not a replacement for Core ML. It is a higher-level API for transformer models, while Core ML remains the choice for vision and classical ML. Core AI adds:

  • Dynamic batching
  • KV cache management
  • Streaming outputs
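
KV cache management matters because the cache grows linearly with context length and can rival the weights in size. The sketch below estimates that growth. It is not Core AI's API: TransformerShape and kvCacheBytes are invented for this example, and the model dimensions are hypothetical.

```swift
import Foundation

/// Hypothetical transformer dimensions, chosen only for illustration.
struct TransformerShape {
    let layers: Int
    let kvHeads: Int
    let headDim: Int
}

/// Bytes needed to cache keys and values for `sequenceLength` tokens.
func kvCacheBytes(shape: TransformerShape, sequenceLength: Int,
                  batchSize: Int, bytesPerElement: Int) -> Int {
    // Factor of 2: one key vector and one value vector per head, per token, per layer.
    let perTokenPerLayer = 2 * shape.kvHeads * shape.headDim * bytesPerElement
    return perTokenPerLayer * shape.layers * sequenceLength * batchSize
}

let shape = TransformerShape(layers: 32, kvHeads: 32, headDim: 96)
let bytes = kvCacheBytes(shape: shape, sequenceLength: 4096, batchSize: 1, bytesPerElement: 2)
print(Double(bytes) / Double(1 << 30))  // 1.5 (GiB at FP16)
```

With these assumed dimensions, a single 4096-token sequence at FP16 needs 1.5 GiB of cache, and the figure scales with batch size, which is why batching and cache handling tend to be managed together.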

Getting Started

You need Xcode 16 beta and macOS 15 Sequoia. Import CoreAI and add your model. Apple provides sample models for text generation, summarization, and code completion.

Limitations

  • Requires Apple Silicon (M1 or later)
  • Intel Macs are not supported, including their GPUs
  • Models from other frameworks must be converted to a supported format before use
  • No distributed inference across devices

What Developers Should Do Now

  1. Download Xcode 16 beta
  2. Try the sample project on GitHub
  3. Convert your PyTorch model using the coreai-convert tool
  4. Profile with Instruments (Core AI template)

Core AI is in beta. Expect changes. But the direction is clear: Apple wants on-device AI to be a first-class citizen.