As of 1.0, the monolithic flutter_gemma plugin is split into a small
core package plus opt-in packages for each engine / backend. Your app
ships only the native weight it actually uses. All packages live in one monorepo
(a Dart pub workspace) and the opt-in packages depend on core one-directionally.

## The six packages

| Package | What it does | Platforms |
|---|---|---|
| `flutter_gemma` | Core: registry, contracts, model management, sessions, chat. No engine on its own. Always required. | All |
| `flutter_gemma_litertlm` | `.litertlm` inference via `dart:ffi` (LiteRT-LM C API). Owns the shared native library. | Mobile + Desktop + Web |
| `flutter_gemma_mediapipe` | `.task` / `.bin` inference via MediaPipe. | Mobile + Web |
| `flutter_gemma_embeddings` | Text embeddings (EmbeddingGemma / Gecko) via LiteRT C API. | All |
| `flutter_gemma_rag_qdrant` | On-device RAG vector store (qdrant-edge, native Rust FFI). | Native (no Web) |
| `flutter_gemma_rag_sqlite` | On-device RAG vector store (wa-sqlite on Web; sqlite3 on native). | Web + native |


## How it works

- Core registers no engine by itself. You wire the packages you added through `FlutterGemma.initialize(inferenceEngines:, embeddingBackends:, vectorStore:)`. See Installation.
- Probe-chain registry. Engines and backends are pure factories that declare `canHandle(spec)` plus a priority. The registry selects a provider per model by file type: `.task` / `.bin` / `.tflite` → MediaPipe, `.litertlm` → LiteRT-LM.
- One app can run both formats. Register both `LiteRtLmEngine()` and `MediaPipeEngine()`, and the registry routes each model to the engine that handles its extension.
- Shared native library. `flutter_gemma_litertlm` and `flutter_gemma_embeddings` share one native LiteRT library, fetched at build time via each package's Native-Assets hook (no manual download/bundling).

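
The probe-chain idea can be illustrated with a small, self-contained Dart sketch. This is not the plugin's actual code: `ModelSpec`, `EngineProvider`, `ExtensionEngine`, `EngineRegistry`, and `demoRegistry` are hypothetical names invented for this example, and the real registry may weigh more than the file extension. It only shows how providers that declare `canHandle(spec)` plus a priority could be probed in order.

```dart
// Hypothetical types for illustration only; not flutter_gemma's real classes.

/// Minimal description of a model: here, just its file name.
class ModelSpec {
  const ModelSpec(this.fileName);
  final String fileName;

  /// Lower-cased extension including the dot, or '' when there is none.
  String get extension {
    final dot = fileName.lastIndexOf('.');
    return dot < 0 ? '' : fileName.substring(dot).toLowerCase();
  }
}

/// A provider declares what it can handle and how strongly it is preferred.
abstract class EngineProvider {
  String get name;
  int get priority;
  bool canHandle(ModelSpec spec);
}

/// A provider that matches purely on file extension.
class ExtensionEngine implements EngineProvider {
  ExtensionEngine(this.name, this.extensions, {this.priority = 0});

  @override
  final String name;
  final Set<String> extensions;
  @override
  final int priority;

  @override
  bool canHandle(ModelSpec spec) => extensions.contains(spec.extension);
}

class EngineRegistry {
  /// Copies the providers and orders them from highest to lowest priority.
  /// The relative order of providers with equal priority is unspecified.
  EngineRegistry(Iterable<EngineProvider> providers)
      : _providers = providers.toList()
          ..sort((a, b) => b.priority.compareTo(a.priority));

  final List<EngineProvider> _providers;

  /// Probes each provider in turn and returns the first one that matches.
  EngineProvider resolve(ModelSpec spec) {
    for (final provider in _providers) {
      if (provider.canHandle(spec)) return provider;
    }
    throw StateError('No registered engine can handle ${spec.fileName}');
  }
}

/// Builds a registry mirroring the extension mapping described above.
EngineRegistry demoRegistry() => EngineRegistry([
      ExtensionEngine('MediaPipe', {'.task', '.bin', '.tflite'}),
      ExtensionEngine('LiteRT-LM', {'.litertlm'}),
    ]);
```

With this sketch, `demoRegistry().resolve(ModelSpec('chat.litertlm')).name` evaluates to `LiteRT-LM`, a `.task` file resolves to `MediaPipe`, and an unregistered extension throws a `StateError`. The file names are made up for the example.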

## Choosing packages

| You want to… | Add |
|---|---|
| Run `.litertlm` models (Gemma 4, Qwen3, FastVLM, + all desktop) | `flutter_gemma_litertlm` |
| Run `.task` / `.bin` models (Gemma3n, Gemma 3, DeepSeek, Qwen 2.5, Phi-4) | `flutter_gemma_mediapipe` |
| Generate text embeddings | `flutter_gemma_embeddings` |
| On-device RAG on native (Android/iOS/desktop) | `flutter_gemma_rag_qdrant` |
| On-device RAG on web | `flutter_gemma_rag_sqlite` |

Migrating from the 0.16.x monolith only requires adding these packages plus one
`initialize(...)` call; every model / session / chat / embedding / RAG API is
unchanged. See Migration (0.x → 1.0).

## Genkit integration

Two companion packages integrate flutter_gemma with Genkit, Google's framework for building AI features:

| Package | What it does | Depends on |
|---|---|---|
| `genkit_flutter_gemma` | Exposes flutter_gemma as a Genkit model/embedder provider: call `ai.generate(model: flutterGemma.model(...))` and `ai.embed(...)` through the standard Genkit API. | flutter_gemma + genkit |
| `genkit_hybrid` | Provider-agnostic hybrid routing: combine an on-device and a cloud model behind one routing policy, with correct streaming + before-first-token fallback. | genkit only (no flutter_gemma) |

See Genkit for setup and examples.
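
To make "before-first-token fallback" concrete, here is a minimal pure-Dart sketch of the general technique. It is not the `genkit_hybrid` implementation and uses no Genkit API: `TokenStream` and `generateWithFallback` are hypothetical names, and a real routing policy would likely also consider timeouts, model availability, and similar signals. The assumption is simply that each model is exposed as a function returning a stream of text chunks.

```dart
/// Hypothetical signature: streams generated text chunks for a prompt.
typedef TokenStream = Stream<String> Function(String prompt);

/// Streams from [primary]; switches to [fallback] only if [primary] fails
/// before emitting its first chunk. Once a chunk has been delivered, a later
/// error is rethrown instead, so output from two models is never mixed.
Stream<String> generateWithFallback(
  String prompt, {
  required TokenStream primary,
  required TokenStream fallback,
}) async* {
  var emitted = false;
  try {
    await for (final chunk in primary(prompt)) {
      emitted = true;
      yield chunk;
    }
    return; // Primary completed normally; no fallback needed.
  } catch (_) {
    // Too late to switch once the caller has already seen partial output.
    if (emitted) rethrow;
  }
  // Primary failed before its first chunk: replay the prompt on the fallback.
  yield* fallback(prompt);
}
```

The design choice worth noting is the `emitted` flag: falling back after partial output would append a second model's answer to the first model's fragment, so in this sketch a mid-stream failure surfaces to the caller as an error.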