On-device LLMs in your Flutter app.

No servers. No cloud. Just Dart.

$ flutter pub add flutter_gemma

runs in your browser

6 platforms · multimodal · private · MIT
160/160 pub.dev points | DeepWiki: AI-indexed codebase docs

Everything you need for on-device AI

🧠

Multimodal

Vision + audio input with Gemma 4, Gemma 3n, and FastVLM

📞

Function Calling

Models call your Dart functions — structured tool use on-device

💭

Thinking Mode

See the reasoning chains of DeepSeek R1 & Gemma 4

🔍

On-device RAG

qdrant-edge native vector store, wa-sqlite on web

GPU Acceleration

Metal / Vulkan / WebGPU / DX12 — all backends covered

🔌

Modular

Core + 5 opt-in packages — ship only what you use
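
The retrieval step behind on-device RAG can be sketched in plain Dart: rank stored embeddings by cosine similarity to a query embedding and keep the best matches. `Doc`, `cosine`, and `topK` are hypothetical names for this illustration, and the vectors are toy values. A real vector store such as qdrant-edge relies on its own indexing, not this linear scan.

```dart
import 'dart:math' as math;

/// A stored text chunk with its embedding (hypothetical type for this sketch).
class Doc {
  const Doc(this.text, this.embedding);
  final String text;
  final List<double> embedding;
}

/// Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
double cosine(List<double> a, List<double> b) {
  assert(a.length == b.length);
  var dot = 0.0, na = 0.0, nb = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na == 0.0 || nb == 0.0) return 0.0;
  return dot / (math.sqrt(na) * math.sqrt(nb));
}

/// Returns the k documents most similar to the query, best match first.
List<Doc> topK(List<Doc> docs, List<double> query, int k) {
  final ranked = [...docs]
    ..sort((x, y) =>
        cosine(y.embedding, query).compareTo(cosine(x.embedding, query)));
  return ranked.take(k).toList();
}

void main() {
  const docs = [
    Doc('Flutter widgets', [1.0, 0.0]),
    Doc('Vector search', [0.0, 1.0]),
    Doc('Mixed topic', [0.7, 0.7]),
  ];
  // Print the two chunks closest to the toy query embedding.
  for (final d in topK(docs, [0.1, 0.9], 2)) {
    print(d.text);
  }
}
```

The retrieved chunks would then be placed in the prompt so the model can answer from them.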

See it in action

Real on-device features — recorded on a phone, not a server.

Thinking mode

Watch the model reason step by step before it answers — fully on-device.
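
To illustrate what separating reasoning from the final answer can look like, here is a minimal sketch in plain Dart. It assumes the raw model output wraps its reasoning in `<think>` and `</think>` tags, as DeepSeek R1 does. `ThinkingSplit` and `splitThinking` are hypothetical names for this example and are not part of flutter_gemma, which may expose thinking output differently.

```dart
/// Result of separating a model's reasoning from its final answer
/// (hypothetical type for this sketch).
class ThinkingSplit {
  const ThinkingSplit(this.thinking, this.answer);
  final String thinking;
  final String answer;
}

/// Splits raw output of the form `<think>reasoning</think>answer`.
/// If no complete think block is found, the whole text is treated as the answer.
ThinkingSplit splitThinking(String raw) {
  const open = '<think>';
  const close = '</think>';
  final start = raw.indexOf(open);
  final end = start == -1 ? -1 : raw.indexOf(close, start + open.length);
  if (start == -1 || end == -1) {
    return ThinkingSplit('', raw.trim());
  }
  final thinking = raw.substring(start + open.length, end).trim();
  // Everything outside the think block counts as the answer.
  final answer =
      (raw.substring(0, start) + raw.substring(end + close.length)).trim();
  return ThinkingSplit(thinking, answer);
}

void main() {
  final result = splitThinking('<think>2 + 2 is 4.</think>The answer is 4.');
  print('Thinking: ${result.thinking}');
  print('Answer: ${result.answer}');
}
```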

Function calling

The model calls your Dart functions with structured arguments — no server in the loop.
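
As a rough illustration of structured tool use, the sketch below dispatches a model-emitted call to a Dart function. The JSON shape (`name` plus `args`), the `tools` registry, and `getWeather` are assumptions made for this example, not flutter_gemma's actual API.

```dart
import 'dart:convert';

/// Signature for a tool the model may call (hypothetical for this sketch).
typedef ToolHandler = Object? Function(Map<String, dynamic> args);

/// Hypothetical tool: returns a canned forecast for the requested city.
Object? getWeather(Map<String, dynamic> args) {
  final city = args['city'] as String;
  return {'city': city, 'forecast': 'sunny'};
}

/// Registry mapping tool names to Dart functions.
final Map<String, ToolHandler> tools = {'get_weather': getWeather};

/// Decodes a call such as {"name": "...", "args": {...}} and runs the
/// matching tool. Throws an ArgumentError if the tool name is unknown.
Object? dispatch(String callJson) {
  final call = jsonDecode(callJson) as Map<String, dynamic>;
  final name = call['name'] as String;
  final handler = tools[name];
  if (handler == null) {
    throw ArgumentError('Unknown tool: $name');
  }
  final args =
      (call['args'] as Map<String, dynamic>?) ?? const <String, dynamic>{};
  return handler(args);
}

void main() {
  final result =
      dispatch('{"name": "get_weather", "args": {"city": "Paris"}}');
  print(jsonEncode(result));
}
```

In a full loop, the tool's result would be passed back to the model so it can phrase the final reply.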

Multimodal with Gemma 4

Send an image and chat about it — vision and audio input, running locally with Gemma 4.

Platform support matrix

Platform Vision Audio Embeddings NPU
Android
iOS
Web
macOS
Windows
Linux

NPU support requires an Intel Lunar Lake or Panther Lake processor (Windows). iOS GPU support is pending the upstream libLiteRtMetalAccelerator.dylib.

5 minutes to on-device inference

Register your engines once, install a model, create a chat session — then generate. The same Dart API across all six platforms.

```dart
// Register inference engines, embedding backends, and the vector store once at startup.
FlutterGemma.initialize(
  inferenceEngines: [LiteRtLmEngine(), MediaPipeEngine()],
  embeddingBackends: [LiteRtEmbeddingBackend()],
  vectorStore: QdrantVectorStore(),
);

// Download the model file from a URL and install it.
await FlutterGemma.installModel(modelType: ModelType.gemma4)
    .fromNetwork('https://.../gemma-4-E2B-it.litertlm')
    .install();

// Load the active model and open a chat session.
final model = await FlutterGemma.getActiveModel(maxTokens: 2048);
final chat = await model.createChat();

// Add the user's message, then generate the reply.
await chat.addQueryChunk(Message.text(text: 'Hello!', isUser: true));
final response = await chat.generateChatResponse();
```

Need step-by-step setup? Read the full guide →

Supported models

All models run entirely on-device. Pick by capability, size, or platform support.

Gemma 4 E2B

Next-gen multimodal — text, image & audio

Gemma 4 E4B

Next-gen multimodal — higher capacity

Gemma 3n E2B/E4B

Multimodal chat — image & audio

FastVLM 0.5B

Fast vision-language on desktop

Phi-4 Mini

Reasoning & instruction following

DeepSeek R1

Reasoning & code generation

Qwen3 0.6B

Compact multilingual with thinking

Qwen 2.5

Multilingual chat

Gemma 3 1B

Balanced text — all platforms

Gemma 3 270M

LoRA fine-tuning base

FunctionGemma 270M

On-device function calling

SmolLM 135M

Ultra-compact for edge devices

Why on-device?

🔒

Privacy

Data never leaves the device

✈️

Offline

Works with no network connection

💸

Zero cost

No API bills, no rate limits

Low latency

No round-trip to a server

Ship AI that never leaves the device.

Open source, MIT licensed, maintained by the community. Star the repo and help spread on-device AI for Flutter.