Models

Supported models, file formats, capabilities, ModelType reference, and download URLs.

flutter_gemma supports Gemma 4, Gemma3n, FastVLM, Gemma 3, FunctionGemma, Qwen3, Qwen 2.5, Phi-4, DeepSeek R1, SmolLM, and more. Desktop platforms (macOS, Windows, Linux) require the `.litertlm` format for inference models; `.tflite` is used on desktop only for embedding models.

Model file types

flutter_gemma supports several model file formats, grouped into two types by how chat templates are handled.

Type 1: MediaPipe-managed templates

.task files: MediaPipe-optimized format for mobile (Android/iOS).
.litertlm files: LiteRT-LM format for Android, iOS, and Desktop.

Both formats have identical behavior — chat templates are handled internally.

Type 2: Manual template formatting

.bin files: standard binary format.
.tflite files: LiteRT format (formerly TensorFlow Lite).

Both formats require manual chat template formatting in your code.

The plugin detects the file extension automatically and applies the matching behavior. When you specify `ModelFileType` in code, use `ModelFileType.task` for `.task` and `.litertlm` files (same behavior) and `ModelFileType.binary` for `.bin` and `.tflite` files (same behavior).
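The extension-to-type rule above can be sketched in plain Dart. This is only an illustration under stated assumptions: `FileKind` and `fileKindFor` are hypothetical names standing in for the plugin's `ModelFileType` and its internal detection, which may work differently.

```dart
// Hypothetical local stand-in for the plugin's ModelFileType; only the two
// values named in the text are modeled here.
enum FileKind { task, binary }

/// Maps a model path or URL to the file kind described above.
/// Throws [ArgumentError] for extensions the text does not mention.
FileKind fileKindFor(String pathOrUrl) {
  // Drop any query string or fragment so download URLs work too.
  final clean = pathOrUrl.split('?').first.split('#').first.toLowerCase();
  if (clean.endsWith('.task') || clean.endsWith('.litertlm')) {
    return FileKind.task; // chat templates handled internally
  }
  if (clean.endsWith('.bin') || clean.endsWith('.tflite')) {
    return FileKind.binary; // manual chat template formatting
  }
  throw ArgumentError.value(
      pathOrUrl, 'pathOrUrl', 'Unrecognized model file extension');
}

void main() {
  print(fileKindFor('https://example.com/model.litertlm?download=true')); // FileKind.task
  print(fileKindFor('assets/models/model.bin')); // FileKind.binary
}
```

Stripping the query string first matters for hosts that append download parameters to the URL.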

Format by platform

Format	Android	iOS	Web	Desktop	Use Case
`.task`	✅	✅	✅	❌	Older models (Gemma3n, Gemma 3, DeepSeek, Qwen 2.5, Phi-4)
`.litertlm`	✅	✅ ¹	❌	✅	Newer models (Gemma 4, Qwen3, FastVLM + desktop for all)
`-web.task`	❌	❌	✅	❌	Web-specific builds (e.g. Gemma 4, Gemma3n)
`.bin`	✅	✅	✅	❌	Manual chat template formatting required
`.tflite`	✅	✅	✅	✅	Embeddings only (EmbeddingGemma, Gecko)

¹ On iOS, `.litertlm` runs on the FFI engine; vision and audio are supported on physical devices. The Simulator stays CPU-only because Metal on the Simulator has a 256 MB single-allocation cap.
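If an app needs to validate a model file against its target before downloading, the table above can be encoded as a lookup. The sketch below is illustrative only: `HostPlatform`, `supportedFormats`, and `supportsFormat` are hypothetical names, not part of the plugin, and the data simply mirrors the table.

```dart
// Hypothetical platform labels for this sketch (not plugin constants).
enum HostPlatform { android, ios, web, desktop }

// Format support per platform, copied from the table above.
const Map<HostPlatform, Set<String>> supportedFormats = {
  HostPlatform.android: {'.task', '.litertlm', '.bin', '.tflite'},
  HostPlatform.ios: {'.task', '.litertlm', '.bin', '.tflite'},
  HostPlatform.web: {'.task', '-web.task', '.bin', '.tflite'},
  HostPlatform.desktop: {'.litertlm', '.tflite'},
};

/// Returns true when the table lists [format] as supported on [platform].
bool supportsFormat(HostPlatform platform, String format) =>
    supportedFormats[platform]!.contains(format);

void main() {
  print(supportsFormat(HostPlatform.desktop, '.task')); // false
  print(supportsFormat(HostPlatform.desktop, '.litertlm')); // true
  print(supportsFormat(HostPlatform.web, '.litertlm')); // false
}
```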

Model capabilities

Model Family	Best For	Function Calling	Thinking Mode	Vision	Languages	Size
Gemma 4 E2B	Next-gen multimodal chat — text, image, audio	✅	✅	✅	Multilingual	2.4GB
Gemma 4 E4B	Next-gen multimodal chat — text, image, audio	✅	✅	✅	Multilingual	4.3GB
Gemma3n	On-device multimodal chat and image analysis	✅	❌	✅	Multilingual	3.1-6.5GB
FastVLM 0.5B	Fast vision-language inference	❌	❌	✅	Multilingual	0.5GB
Phi-4 Mini	Advanced reasoning and instruction following	✅	❌	❌	Multilingual	3.9GB
DeepSeek R1	High-performance reasoning and code generation	✅	✅	❌	Multilingual	1.7GB
Qwen3 0.6B	Compact multilingual chat with function calling	✅	✅	❌	Multilingual	586MB
Qwen 2.5	Strong multilingual chat and instruction following	✅	❌	❌	Multilingual	0.5-1.6GB
Gemma 3 1B	Balanced and efficient text generation	✅	❌	❌	Multilingual	0.5GB
Gemma 3 270M	Ideal for fine-tuning (LoRA) for specific tasks	❌	❌	❌	Multilingual	0.3GB
FunctionGemma 270M	Specialized for function calling on-device	✅	❌	❌	Multilingual	284MB
SmolLM 135M	Ultra-compact, resource-constrained devices	❌	❌	❌	English	135MB
TranslateGemma 4B †	Single-shot 55-language translation	❌	❌	❌	55 languages	2-4GB

† **TranslateGemma is CPU-only for now.** Google hasn't released a mobile/desktop `.litertlm` bundle ([HF discussion #5](https://huggingface.co/google/translategemma-4b-it/discussions/5)). The community-converted bundle from [`barakplasma/translategemma-4b-it-android-task-quantized`](https://huggingface.co/barakplasma/translategemma-4b-it-android-task-quantized) keeps its `EMBEDDING_LOOKUP` weights in float32 for MediaPipe `.task` compatibility. That choice crashes the LiteRT GPU partitioner on Metal and WebGPU on every platform (tracked at [LiteRT-LM#1748](https://github.com/google-ai-edge/LiteRT-LM/issues/1748)). Until Google ships the `litert-lm` quantization CLI, translation runs on CPU only (≈90 s prefill for a 4B int4 bundle on M-series Macs).

ModelType reference

When installing models, specify the correct ModelType:

Model Family	ModelType	Examples
Gemma 4	`ModelType.gemma4`	Gemma 4 E2B, Gemma 4 E4B (native function-call tokens)
Gemma 3 / Gemma3n	`ModelType.gemmaIt`	Gemma 3 1B, Gemma 3 270M, Gemma3n E2B/E4B
DeepSeek	`ModelType.deepSeek`	DeepSeek R1
Qwen 2.5	`ModelType.qwen`	Qwen 2.5 1.5B, Qwen 2.5 0.5B
Qwen 3	`ModelType.qwen3`	Qwen3 0.6B
FunctionGemma	`ModelType.functionGemma`	FunctionGemma 270M IT
Phi	`ModelType.phi`	Phi-4 Mini
General	`ModelType.general`	FastVLM 0.5B, SmolLM 135M

Gemma 4 uses `ModelType.gemma4` so that its native tool-call tokens are routed through the LiteRT-LM SDK's chat-template path. For Gemma 3, Gemma3n, and earlier Gemma models, keep `ModelType.gemmaIt`.

Usage example:

// Gemma 3 / Gemma3n models (`url` is a placeholder for the model's download URL)
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromNetwork(url).install();

// DeepSeek models
await FlutterGemma.installModel(modelType: ModelType.deepSeek)
  .fromNetwork(url).install();

// Models without a dedicated type, e.g. FastVLM or SmolLM (general type)
await FlutterGemma.installModel(modelType: ModelType.general)
  .fromNetwork(url).install();

Supported models & platforms

Model	Size	Desktop	Mobile	Web
Gemma 4 E2B	2.4GB	✅	✅	✅
Gemma 4 E4B	4.3GB	✅	✅	✅
Gemma3n E2B	3.1GB	✅	✅	✅
Gemma3n E4B	6.5GB	✅	✅	✅
FastVLM 0.5B	0.5GB	✅	❌	❌
Gemma 3 1B	0.5GB	✅	✅	✅
Gemma 3 270M	0.3GB	✅	✅	✅
FunctionGemma 270M	284MB	✅	✅	❌
Qwen3 0.6B	586MB	✅	✅	✅
Qwen 2.5 1.5B	1.6GB	✅	✅	❌
Qwen 2.5 0.5B	0.5GB	❌	✅	❌
SmolLM 135M	135MB	❌	✅	❌
Phi-4 Mini	3.9GB	✅	✅	✅
DeepSeek R1	1.7GB	❌	✅	❌
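To pick candidates for a given target and download budget, the table above can be filtered in code. This is a sketch with the table hard-coded: `DeployTarget`, `ModelEntry`, `catalog`, and `modelsFor` are hypothetical names, not plugin APIs, and sizes are the approximate values from the table expressed in GB.

```dart
// Hypothetical labels matching the table's three platform columns.
enum DeployTarget { desktop, mobile, web }

class ModelEntry {
  const ModelEntry(this.name, this.sizeGb, this.targets);
  final String name;
  final double sizeGb;
  final Set<DeployTarget> targets;
}

const everywhere = {DeployTarget.desktop, DeployTarget.mobile, DeployTarget.web};
const desktopAndMobile = {DeployTarget.desktop, DeployTarget.mobile};

// Rows copied from the table above, in the same order.
const catalog = <ModelEntry>[
  ModelEntry('Gemma 4 E2B', 2.4, everywhere),
  ModelEntry('Gemma 4 E4B', 4.3, everywhere),
  ModelEntry('Gemma3n E2B', 3.1, everywhere),
  ModelEntry('Gemma3n E4B', 6.5, everywhere),
  ModelEntry('FastVLM 0.5B', 0.5, {DeployTarget.desktop}),
  ModelEntry('Gemma 3 1B', 0.5, everywhere),
  ModelEntry('Gemma 3 270M', 0.3, everywhere),
  ModelEntry('FunctionGemma 270M', 0.284, desktopAndMobile),
  ModelEntry('Qwen3 0.6B', 0.586, everywhere),
  ModelEntry('Qwen 2.5 1.5B', 1.6, desktopAndMobile),
  ModelEntry('Qwen 2.5 0.5B', 0.5, {DeployTarget.mobile}),
  ModelEntry('SmolLM 135M', 0.135, {DeployTarget.mobile}),
  ModelEntry('Phi-4 Mini', 3.9, everywhere),
  ModelEntry('DeepSeek R1', 1.7, {DeployTarget.mobile}),
];

/// Names of models that run on [target] and are at most [maxSizeGb] in size.
List<String> modelsFor(DeployTarget target, {required double maxSizeGb}) => [
      for (final m in catalog)
        if (m.targets.contains(target) && m.sizeGb <= maxSizeGb) m.name,
    ];

void main() {
  print(modelsFor(DeployTarget.web, maxSizeGb: 1.0));
  // [Gemma 3 1B, Gemma 3 270M, Qwen3 0.6B]
}
```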

Installation sources

// Network — .litertlm is the cross-platform default (Android/iOS/Desktop).
// For mobile-only or web-only apps you can substitute a .task URL.
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromNetwork('https://example.com/model.litertlm', token: 'optional')
  .install();

// Flutter assets
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromAsset('assets/models/model.litertlm')
  .install();

// Native bundle
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromBundled('model.litertlm')
  .install();

// External file (native only)
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromFile('/path/to/model.litertlm')
  .install();

Source capabilities

Source Type	Platform	Progress	Resume	Authentication	Use Case
NetworkSource	All	✅ Detailed	⚠️ Server-dependent	✅ Supported	HuggingFace, CDNs, private servers
AssetSource	All	⚠️ End only	❌ No	❌ N/A	Models bundled in app assets
BundledSource	All	⚠️ End only	❌ No	❌ N/A	Native platform resources
FileSource	Native (no Web)	⚠️ End only	❌ No	❌ N/A	User-selected files (file picker)

Resume after interruption is server-dependent and **not supported by the HuggingFace CDN**. When a download cannot be resumed, flutter_gemma retries with exponential backoff and restarts the download automatically. See [Troubleshooting](/docs/troubleshooting).
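For readers unfamiliar with the pattern, retry with exponential backoff can be sketched generically as below. This is not the plugin's implementation: `retryWithBackoff`, its parameters, and the default values are assumptions chosen for illustration. Each attempt starts the operation from scratch, which matches a restart rather than a resume.

```dart
import 'dart:async';

/// Runs [attempt] up to [maxAttempts] times, doubling the wait after each
/// failure. Hypothetical helper; names and defaults are illustrative.
Future<T> retryWithBackoff<T>(
  Future<T> Function() attempt, {
  int maxAttempts = 5,
  Duration initialDelay = const Duration(seconds: 1),
}) async {
  var delay = initialDelay;
  var attemptNumber = 0;
  while (true) {
    attemptNumber++;
    try {
      return await attempt(); // every attempt restarts from the beginning
    } catch (_) {
      if (attemptNumber >= maxAttempts) rethrow; // give up, surface last error
      await Future<void>.delayed(delay);
      delay *= 2; // 1s, 2s, 4s, ...
    }
  }
}

Future<void> main() async {
  var calls = 0;
  final result = await retryWithBackoff(() async {
    calls++;
    if (calls < 3) throw StateError('simulated network failure');
    return 'downloaded';
  }, initialDelay: const Duration(milliseconds: 10));
  print('$result after $calls attempts'); // downloaded after 3 attempts
}
```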

Android foreground service (large downloads)

Android has a 9-minute background execution limit. For large models (>500MB), the plugin detects the file size and automatically uses a foreground service, which shows a notification, to avoid that limit:

// Auto-detect based on file size (>500MB = foreground) — DEFAULT
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromNetwork(url)  // foreground: null (auto-detect)
  .install();

// Force foreground mode
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromNetwork(url, foreground: true)
  .install();

iOS uses the native `URLSession`, which handles long downloads automatically, so no foreground service is needed.
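The decision rule for the `foreground` parameter (explicit value wins, otherwise the size threshold decides) can be written as a small function. This is a sketch, not the plugin's code: `shouldUseForeground` and `foregroundThresholdBytes` are hypothetical names, reading "500MB" as 500 × 1024 × 1024 bytes is an assumption, and so is staying in the background when the size is unknown.

```dart
// Assumption: "500MB" interpreted as 500 MiB.
const int foregroundThresholdBytes = 500 * 1024 * 1024;

/// Mirrors the rule described above: an explicit [foreground] value wins,
/// otherwise downloads larger than the threshold use a foreground service.
bool shouldUseForeground({bool? foreground, int? sizeBytes}) {
  if (foreground != null) return foreground; // explicit override
  if (sizeBytes == null) return false; // assumption: unknown size stays in background
  return sizeBytes > foregroundThresholdBytes;
}

void main() {
  const largeModel = 2400 * 1024 * 1024; // roughly 2.4GB
  const smallModel = 135 * 1024 * 1024; // roughly 135MB
  print(shouldUseForeground(sizeBytes: largeModel)); // true
  print(shouldUseForeground(sizeBytes: smallModel)); // false
  print(shouldUseForeground(foreground: true, sizeBytes: smallModel)); // true
}
```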

Cancelling downloads

import 'package:flutter_gemma/core/model_management/cancel_token.dart';

final cancelToken = CancelToken();

final future = FlutterGemma.installModel(modelType: ModelType.gemmaIt)
  .fromNetwork(url)
  .withCancelToken(cancelToken)
  .withProgress((progress) => print('Progress: $progress%'))
  .install();

// Cancel from elsewhere (e.g. user pressed a cancel button)
cancelToken.cancel('User cancelled download');

try {
  await future;
} catch (e) {
  if (CancelToken.isCancel(e)) {
    print('Download was cancelled by user');
  }
}

A `CancelToken` cancels every file in a multi-file download (e.g. embedding model + tokenizer) and works on mobile and web. A cancelled install throws `DownloadCancelledException`.

Text embedding models

All embedding models generate 768-dimensional vectors. The numbers in names (64/256/512/1024/2048) indicate maximum input sequence length in tokens, not embedding dimension. See Embeddings & RAG for usage.

Model	Parameters	Dimensions	Max Seq Length	Size	Auth Required
Gecko 64	110M	768D	64 tokens	110MB	❌
Gecko 256	110M	768D	256 tokens	114MB	❌
Gecko 512	110M	768D	512 tokens	116MB	❌
EmbeddingGemma 256	300M	768D	256 tokens	179MB	✅
EmbeddingGemma 512	300M	768D	512 tokens	179MB	✅
EmbeddingGemma 1024	300M	768D	1024 tokens	183MB	✅
EmbeddingGemma 2048	300M	768D	2048 tokens	196MB	✅
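Because every model in the table produces vectors of the same length, embeddings can be compared directly. Cosine similarity is one common metric for this; whether the plugin's own search uses exactly this metric is not stated here, so treat the following as a generic sketch. `cosineSimilarity` is a hypothetical helper, and the vectors in `main` are synthetic rather than real model output.

```dart
import 'dart:math' as math;

/// Cosine similarity between two equal-length vectors, in the range [-1, 1].
/// Returns 0.0 if either vector is all zeros.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

void main() {
  // Synthetic 768-dimensional unit vectors along different axes.
  final a = List<double>.filled(768, 0.0)..[0] = 1.0;
  final b = List<double>.filled(768, 0.0)..[1] = 1.0;
  print(cosineSimilarity(a, a).toStringAsFixed(3)); // 1.000
  print(cosineSimilarity(a, b).toStringAsFixed(3)); // 0.000
}
```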

Performance (Android Pixel 8, GPU acceleration):

Gecko 64: ~109 ms/doc embedding, 130 ms search (fastest; about 2.6× faster than EmbeddingGemma 256).
EmbeddingGemma 256: ~286 ms/doc embedding, 342 ms search (more accurate; 300M vs. 110M parameters).