flutter_gemma supports Gemma 4, Gemma3n, FastVLM, Gemma 3, FunctionGemma, Qwen3,
Qwen 2.5, Phi-4, DeepSeek R1, SmolLM and more. Desktop platforms (macOS, Windows,
Linux) require the .litertlm model format.
Model file types
Flutter Gemma supports different model file formats, grouped into two types based on how chat templates are handled.
Type 1: MediaPipe-managed templates
- .task files: MediaPipe-optimized format for mobile (Android/iOS).
- .litertlm files: LiteRT-LM format for Android, iOS, and Desktop.
Both formats behave identically: chat templates are handled internally.
Type 2: Manual template formatting
- .bin files: standard binary format.
- .tflite files: LiteRT format (formerly TensorFlow Lite).
Both formats require manual chat template formatting in your code.
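The two groups above can be told apart by file extension. The following is a minimal sketch of that classification in plain Dart; `TemplateHandling` and `templateHandlingFor` are hypothetical names introduced for illustration and are not part of flutter_gemma.

```dart
/// How a model file format handles chat templates, per the two types above.
enum TemplateHandling { managed, manual }

/// Hypothetical helper (not part of flutter_gemma): maps a model file name
/// to its template handling, or returns null for an unrecognized extension.
TemplateHandling? templateHandlingFor(String fileName) {
  final name = fileName.toLowerCase();
  // Type 1: templates are handled internally.
  if (name.endsWith('.task') || name.endsWith('.litertlm')) {
    return TemplateHandling.managed;
  }
  // Type 2: the app must format the chat template itself.
  if (name.endsWith('.bin') || name.endsWith('.tflite')) {
    return TemplateHandling.manual;
  }
  return null;
}

void main() {
  print(templateHandlingFor('gemma.litertlm')); // TemplateHandling.managed
  print(templateHandlingFor('model.bin')); // TemplateHandling.manual
  print(templateHandlingFor('notes.txt')); // null
}
```

A check like this could run before installation so the app knows whether it has to apply its own prompt formatting.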
Format by platform
| Format | Android | iOS | Web | Desktop | Use Case |
|---|---|---|---|---|---|
| .task | ✅ | ✅ | ✅ | ❌ | Older models (Gemma3n, Gemma 3, DeepSeek, Qwen 2.5, Phi-4) |
| .litertlm | ✅ | ✅ ¹ | ❌ | ✅ | Newer models (Gemma 4, Qwen3, FastVLM + desktop for all) |
| -web.task | ❌ | ❌ | ✅ | ❌ | Web-specific builds (e.g. Gemma 4, Gemma3n) |
| .bin | ✅ | ✅ | ✅ | ❌ | Manual chat template formatting required |
| .tflite | ✅ | ✅ | ✅ | ✅ | Embeddings only (EmbeddingGemma, Gecko) |
¹ iOS .litertlm runs on the FFI engine, with vision and audio supported on physical
devices. The Simulator stays CPU-only because Metal on the Simulator caps a single
allocation at 256 MB.
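When an app targets several platforms, the table can be read as a lookup: which formats work on every target? The sketch below encodes the table as a constant map in plain Dart. `Target`, `formatSupport`, and `formatsFor` are hypothetical names used only for this illustration.

```dart
enum Target { android, ios, web, desktop }

/// Platform support per format, copied from the table above.
/// Note that .tflite is listed there for embedding models only.
const Map<String, Set<Target>> formatSupport = {
  '.task': {Target.android, Target.ios, Target.web},
  '.litertlm': {Target.android, Target.ios, Target.desktop},
  '-web.task': {Target.web},
  '.bin': {Target.android, Target.ios, Target.web},
  '.tflite': {Target.android, Target.ios, Target.web, Target.desktop},
};

/// Hypothetical helper: lists the formats usable on every platform in [targets].
List<String> formatsFor(Set<Target> targets) => [
      for (final entry in formatSupport.entries)
        if (entry.value.containsAll(targets)) entry.key,
    ];

void main() {
  print(formatsFor({Target.android, Target.desktop})); // [.litertlm, .tflite]
  print(formatsFor({Target.web})); // [.task, -web.task, .bin, .tflite]
}
```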
Model capabilities
| Model Family | Best For | Function Calling | Thinking Mode | Vision | Languages | Size |
|---|---|---|---|---|---|---|
| Gemma 4 E2B | Next-gen multimodal chat — text, image, audio | ✅ | ✅ | ✅ | Multilingual | 2.4GB |
| Gemma 4 E4B | Next-gen multimodal chat — text, image, audio | ✅ | ✅ | ✅ | Multilingual | 4.3GB |
| Gemma3n | On-device multimodal chat and image analysis | ✅ | ❌ | ✅ | Multilingual | 3-6GB |
| FastVLM 0.5B | Fast vision-language inference | ❌ | ❌ | ✅ | Multilingual | 0.5GB |
| Phi-4 Mini | Advanced reasoning and instruction following | ✅ | ❌ | ❌ | Multilingual | 3.9GB |
| DeepSeek R1 | High-performance reasoning and code generation | ✅ | ✅ | ❌ | Multilingual | 1.7GB |
| Qwen3 0.6B | Compact multilingual chat with function calling | ✅ | ✅ | ❌ | Multilingual | 586MB |
| Qwen 2.5 | Strong multilingual chat and instruction following | ✅ | ❌ | ❌ | Multilingual | 0.5-1.6GB |
| Gemma 3 1B | Balanced and efficient text generation | ✅ | ❌ | ❌ | Multilingual | 0.5GB |
| Gemma 3 270M | Ideal for fine-tuning (LoRA) for specific tasks | ❌ | ❌ | ❌ | Multilingual | 0.3GB |
| FunctionGemma 270M | Specialized for function calling on-device | ✅ | ❌ | ❌ | Multilingual | 284MB |
| SmolLM 135M | Ultra-compact, resource-constrained devices | ❌ | ❌ | ❌ | English | 135MB |
| TranslateGemma 4B † | Single-shot 55-language translation | ❌ | ❌ | ❌ | 55 languages | 2-4GB |
ModelType reference
When installing models, specify the correct ModelType:
| Model Family | ModelType | Examples |
|---|---|---|
| Gemma 4 | ModelType.gemma4 | Gemma 4 E2B, Gemma 4 E4B (native function-call tokens) |
| Gemma 3 / Gemma3n | ModelType.gemmaIt | Gemma 3 1B, Gemma 3 270M, Gemma3n E2B/E4B |
| DeepSeek | ModelType.deepSeek | DeepSeek R1 |
| Qwen 2.5 | ModelType.qwen | Qwen 2.5 1.5B, Qwen 2.5 0.5B |
| Qwen3 | ModelType.qwen3 | Qwen3 0.6B |
| FunctionGemma | ModelType.functionGemma | FunctionGemma 270M IT |
| Phi | ModelType.phi | Phi-4 Mini |
| General | ModelType.general | FastVLM 0.5B, SmolLM 135M |
Usage example:
```dart
// Gemma models
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromNetwork(url).install();
// DeepSeek models
await FlutterGemma.installModel(modelType: ModelType.deepSeek)
    .fromNetwork(url).install();
// Phi-4 (uses general type)
await FlutterGemma.installModel(modelType: ModelType.general)
    .fromNetwork(url).install();
```
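If model names come from user input or a catalogue, the ModelType could be derived from the name rather than hard-coded. The sketch below mirrors the ModelType column of the reference table. `modelTypeForName` is a hypothetical helper, not part of flutter_gemma. The import path and the fallback to `ModelType.general` for unrecognized names are assumptions, and the sketch does not cover embedding or translation models.

```dart
// Assumption: ModelType is exported from the package's main library.
import 'package:flutter_gemma/flutter_gemma.dart';

/// Hypothetical helper: guesses the ModelType from a model name, following
/// the families listed in the ModelType reference table.
ModelType modelTypeForName(String modelName) {
  // Normalize "Gemma 4 E2B", "gemma-4-e2b", "Gemma_4_E2B" to "gemma4e2b".
  final name = modelName.toLowerCase().replaceAll(RegExp(r'[\s_\-]'), '');
  // Check the more specific names first so that "functiongemma" and
  // "gemma4" are not caught by the generic "gemma" branch.
  if (name.contains('functiongemma')) return ModelType.functionGemma;
  if (name.contains('gemma4')) return ModelType.gemma4;
  if (name.contains('gemma')) return ModelType.gemmaIt;
  if (name.contains('deepseek')) return ModelType.deepSeek;
  if (name.contains('qwen3')) return ModelType.qwen3;
  if (name.contains('qwen')) return ModelType.qwen;
  if (name.contains('phi')) return ModelType.phi;
  // FastVLM and SmolLM use the general type; treating every other
  // unrecognized name the same way is an assumption of this sketch.
  return ModelType.general;
}
```

For example, `modelTypeForName('Gemma3n E4B')` yields `ModelType.gemmaIt`, while `modelTypeForName('Qwen3 0.6B')` yields `ModelType.qwen3`.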
Supported models & platforms
| Model | Size | Desktop | Mobile | Web |
|---|---|---|---|---|
| Gemma 4 E2B | 2.4GB | ✅ | ✅ | ✅ |
| Gemma 4 E4B | 4.3GB | ✅ | ✅ | ✅ |
| Gemma3n E2B | 3.1GB | ✅ | ✅ | ✅ |
| Gemma3n E4B | 6.5GB | ✅ | ✅ | ✅ |
| FastVLM 0.5B | 0.5GB | ✅ | ❌ | ❌ |
| Gemma 3 1B | 0.5GB | ✅ | ✅ | ✅ |
| Gemma 3 270M | 0.3GB | ✅ | ✅ | ✅ |
| FunctionGemma 270M | 284MB | ✅ | ✅ | ❌ |
| Qwen3 0.6B | 586MB | ✅ | ✅ | ✅ |
| Qwen 2.5 1.5B | 1.6GB | ✅ | ✅ | ❌ |
| Qwen 2.5 0.5B | 0.5GB | ❌ | ✅ | ❌ |
| SmolLM 135M | 135MB | ❌ | ✅ | ❌ |
| Phi-4 Mini | 3.9GB | ✅ | ✅ | ✅ |
| DeepSeek R1 | 1.7GB | ❌ | ✅ | ❌ |
Installation sources
```dart
// Network: .litertlm is the cross-platform default (Android/iOS/Desktop).
// For mobile-only or web-only apps you can substitute a .task URL.
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromNetwork('https://example.com/model.litertlm', token: 'optional')
    .install();
// Flutter assets
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromAsset('assets/models/model.litertlm')
    .install();
// Native bundle
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromBundled('model.litertlm')
    .install();
// External file (native only)
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromFile('/path/to/model.litertlm')
    .install();
```
Source capabilities
| Source Type | Platform | Progress | Resume | Authentication | Use Case |
|---|---|---|---|---|---|
| NetworkSource | All | ✅ Detailed | ⚠️ Server-dependent | ✅ Supported | HuggingFace, CDNs, private servers |
| AssetSource | All | ⚠️ End only | ❌ No | ❌ N/A | Models bundled in app assets |
| BundledSource | All | ⚠️ End only | ❌ No | ❌ N/A | Native platform resources |
| FileSource | Native (no Web) | ⚠️ End only | ❌ No | ❌ N/A | User-selected files (file picker) |
Android foreground service (large downloads)
Android limits background execution to 9 minutes. For large models (>500MB), the plugin detects the file size automatically and downloads through a foreground service (which shows a notification) so the download is not cut off by that limit:
```dart
// Auto-detect based on file size (>500MB = foreground); this is the default
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromNetwork(url) // foreground: null (auto-detect)
    .install();
// Force foreground mode
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromNetwork(url, foreground: true)
    .install();
```
iOS uses native URLSession, which handles long downloads automatically, so no foreground service is needed.
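The auto-detect rule for Android can be modelled as a small decision function. This is an illustrative sketch of the behaviour described above, not the plugin's actual code. `shouldUseForeground` and `foregroundThresholdBytes` are hypothetical names, reading "500MB" as 500 × 1024 × 1024 bytes is an assumption, and so is treating `foreground: false` as forcing a background download.

```dart
/// Size above which the plugin is described as switching to a foreground
/// service. Reading "500MB" as 500 * 1024 * 1024 bytes is an assumption.
const int foregroundThresholdBytes = 500 * 1024 * 1024;

/// Illustrative model of the described rule, not the plugin's actual code.
/// [foreground] mirrors the `foreground` argument: null means auto-detect.
bool shouldUseForeground({bool? foreground, required int fileSizeBytes}) {
  // An explicit choice wins over auto-detection (false is assumed to
  // force a background download).
  if (foreground != null) return foreground;
  return fileSizeBytes > foregroundThresholdBytes;
}

void main() {
  const mib = 1024 * 1024;
  print(shouldUseForeground(fileSizeBytes: 2048 * mib)); // true
  print(shouldUseForeground(fileSizeBytes: 100 * mib)); // false
  print(shouldUseForeground(foreground: true, fileSizeBytes: 100 * mib)); // true
}
```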
Cancelling downloads
```dart
import 'package:flutter_gemma/core/model_management/cancel_token.dart';

final cancelToken = CancelToken();
final future = FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromNetwork(url)
    .withCancelToken(cancelToken)
    .withProgress((progress) => print('Progress: $progress%'))
    .install();

// Cancel from elsewhere (e.g. user pressed a cancel button)
cancelToken.cancel('User cancelled download');

try {
  await future;
} catch (e) {
  if (CancelToken.isCancel(e)) {
    print('Download was cancelled by user');
  }
}
```
Cancelling a CancelToken stops every file in a multi-file download (e.g. embedding model +
tokenizer) and makes the pending install throw DownloadCancelledException. It works on mobile and web.
Text embedding models
All embedding models generate 768-dimensional vectors. The numbers in names (64/256/512/1024/2048) indicate maximum input sequence length in tokens, not embedding dimension. See Embeddings & RAG for usage.
| Model | Parameters | Dimensions | Max Seq Length | Size | Auth Required |
|---|---|---|---|---|---|
| Gecko 64 | 110M | 768D | 64 tokens | 110MB | ❌ |
| Gecko 256 | 110M | 768D | 256 tokens | 114MB | ❌ |
| Gecko 512 | 110M | 768D | 512 tokens | 116MB | ❌ |
| EmbeddingGemma 256 | 300M | 768D | 256 tokens | 179MB | ✅ |
| EmbeddingGemma 512 | 300M | 768D | 512 tokens | 179MB | ✅ |
| EmbeddingGemma 1024 | 300M | 768D | 1024 tokens | 183MB | ✅ |
| EmbeddingGemma 2048 | 300M | 768D | 2048 tokens | 196MB | ✅ |
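Because the number in each model name is a maximum input length, one way to choose a variant is to pick the smallest one that still fits the longest text you expect to embed. The sketch below shows that selection in plain Dart. `smallestFittingVariant` and the two constant lists are hypothetical names, and counting tokens is left to the caller.

```dart
/// Maximum input lengths (in tokens) from the table above.
const List<int> geckoMaxTokens = [64, 256, 512];
const List<int> embeddingGemmaMaxTokens = [256, 512, 1024, 2048];

/// Hypothetical helper: returns the smallest variant whose maximum sequence
/// length fits [tokenCount], or null when the input is too long for all.
int? smallestFittingVariant(List<int> variants, int tokenCount) {
  final sorted = [...variants]..sort();
  for (final maxTokens in sorted) {
    if (tokenCount <= maxTokens) return maxTokens;
  }
  return null;
}

void main() {
  print(smallestFittingVariant(geckoMaxTokens, 200)); // 256
  print(smallestFittingVariant(geckoMaxTokens, 600)); // null
  print(smallestFittingVariant(embeddingGemmaMaxTokens, 600)); // 1024
}
```

Since every variant produces 768-dimensional vectors, the choice affects only how much input text each embedding can cover, not the vector size.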
Performance (Android Pixel 8, GPU acceleration):
- Gecko 64: ~109 ms/doc embedding, 130 ms search (fastest — 2.6× faster than EmbeddingGemma).
- EmbeddingGemma 256: ~286 ms/doc embedding, 342 ms search (more accurate — 300M vs 110M params).