
Packages

The 1.0 modular architecture — a small core plus opt-in engine, embedding, and RAG packages.

As of 1.0, the monolithic flutter_gemma plugin is split into a small core package plus opt-in packages for each engine and backend, so your app ships only the native code it actually uses. All packages live in one monorepo (a Dart pub workspace), and each opt-in package depends on core, never the reverse.

The six packages

| Package | What it does | Platforms |
| --- | --- | --- |
| flutter_gemma | Core: registry, contracts, model management, sessions, chat. No engine on its own. Always required. | All |
| flutter_gemma_litertlm | .litertlm inference via dart:ffi (LiteRT-LM C API). Owns the shared native library. | Mobile + Desktop + Web |
| flutter_gemma_mediapipe | .task / .bin inference via MediaPipe. | Mobile + Web |
| flutter_gemma_embeddings | Text embeddings (EmbeddingGemma / Gecko) via LiteRT C API. | All |
| flutter_gemma_rag_qdrant | On-device RAG vector store (qdrant-edge, native Rust FFI). | Native (no Web) |
| flutter_gemma_rag_sqlite | On-device RAG vector store (wa-sqlite on Web; sqlite3 on native). | Web + native |

How it works

  • Core registers no engine by itself. You wire the packages you added through FlutterGemma.initialize(inferenceEngines:, embeddingBackends:, vectorStore:). See Installation.
  • Probe-chain registry. Engines and backends are pure factories that declare canHandle(spec) + a priority. The registry selects a provider per model by file type — .task / .bin / .tflite → MediaPipe, .litertlm → LiteRT-LM.
  • One app can run both formats. Register both LiteRtLmEngine() and MediaPipeEngine(), and the registry routes each model to the engine that handles its extension.
  • Shared native library. flutter_gemma_litertlm and flutter_gemma_embeddings share one native LiteRT library, fetched at build time via each package's Native-Assets hook (no manual download/bundling).
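
The probe-chain selection described above can be pictured with a minimal, self-contained Dart sketch. Every type here (`ModelSpec`, `EngineFactory`, `ExtensionEngine`, `EngineRegistry`) is a hypothetical stand-in written for illustration, not the flutter_gemma API; only the idea of `canHandle(spec)` plus a priority comes from the text. The sketch assumes higher priority wins and that matching is done on the file extension.

```dart
// Illustrative only: hypothetical types, not the flutter_gemma classes.

/// Describes a model by its file path.
class ModelSpec {
  const ModelSpec(this.path);
  final String path;

  /// Lower-cased extension including the dot, or '' if there is none.
  String get extension {
    final name = path.split('/').last;
    final dot = name.lastIndexOf('.');
    return dot < 0 ? '' : name.substring(dot).toLowerCase();
  }
}

/// A pure factory that declares what it can handle and how eagerly.
abstract class EngineFactory {
  String get name;
  int get priority;
  bool canHandle(ModelSpec spec);
}

/// An engine that claims models by file extension.
class ExtensionEngine implements EngineFactory {
  const ExtensionEngine(this.name, this.priority, this.extensions);

  @override
  final String name;
  @override
  final int priority;
  final Set<String> extensions;

  @override
  bool canHandle(ModelSpec spec) => extensions.contains(spec.extension);
}

/// Probes engines from highest to lowest priority and returns the first match.
/// The order among engines with equal priority is unspecified in this sketch.
class EngineRegistry {
  EngineRegistry(Iterable<EngineFactory> engines)
      : _engines = engines.toList()
          ..sort((a, b) => b.priority.compareTo(a.priority));

  final List<EngineFactory> _engines;

  EngineFactory resolve(ModelSpec spec) {
    for (final engine in _engines) {
      if (engine.canHandle(spec)) return engine;
    }
    throw StateError('No registered engine can handle ${spec.path}');
  }
}

void main() {
  final registry = EngineRegistry([
    ExtensionEngine('mediapipe', 0, {'.task', '.bin', '.tflite'}),
    ExtensionEngine('litertlm', 0, {'.litertlm'}),
  ]);

  // File names below are made up for the example.
  print(registry.resolve(ModelSpec('models/a.litertlm')).name); // litertlm
  print(registry.resolve(ModelSpec('models/b.task')).name); // mediapipe
}
```

Because each factory only answers "can I handle this spec?", registering both engines lets one app serve both formats, and a spec that no registered engine claims fails fast with an explicit error rather than being routed to the wrong runtime.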

Choosing packages

| You want to… | Add |
| --- | --- |
| Run .litertlm models (Gemma 4, Qwen3, FastVLM, + all desktop) | flutter_gemma_litertlm |
| Run .task / .bin models (Gemma3n, Gemma 3, DeepSeek, Qwen 2.5, Phi-4) | flutter_gemma_mediapipe |
| Generate text embeddings | flutter_gemma_embeddings |
| On-device RAG on native (Android/iOS/desktop) | flutter_gemma_rag_qdrant |
| On-device RAG on web | flutter_gemma_rag_sqlite |

Desktop is served exclusively by `flutter_gemma_litertlm` and uses LiteRT-LM format only. There is no MediaPipe engine on desktop. See [Desktop Support](/docs/desktop).

Migrating from the 0.16.x monolith takes two steps: add the packages you need, then make one initialize(...) call. Every model, session, chat, embedding, and RAG API is unchanged. See Migration (0.x → 1.0).

Genkit integration

Two companion packages integrate flutter_gemma with Genkit, Google's framework for building AI features:

| Package | What it does | Depends on |
| --- | --- | --- |
| genkit_flutter_gemma | Exposes flutter_gemma as a Genkit model/embedder provider: call ai.generate(model: flutterGemma.model(...)) and ai.embed(...) through the standard Genkit API. | flutter_gemma + genkit |
| genkit_hybrid | Provider-agnostic hybrid routing: combine an on-device and a cloud model behind one routing policy, with correct streaming + before-first-token fallback. | genkit only (no flutter_gemma) |

See Genkit for setup and examples.
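
To make "before-first-token fallback" concrete, here is a minimal Dart sketch of the general technique. It is not the genkit_hybrid implementation: the helper `withFirstTokenFallback` and the two demo streams are hypothetical names invented for this example. The assumption is that a failure before any token is emitted can be hidden by switching providers, while a failure after output has started must surface, because the caller has already seen partial text.

```dart
// Illustrative only: hypothetical helper, not the genkit_hybrid API.

/// Streams tokens from [primary]. If [primary] fails before emitting
/// anything, switches to [fallback]. Errors raised after the first token
/// are rethrown, since partial output has already reached the caller.
Stream<String> withFirstTokenFallback(
  Stream<String> Function() primary,
  Stream<String> Function() fallback,
) async* {
  var emitted = false;
  try {
    await for (final token in primary()) {
      emitted = true;
      yield token;
    }
    return; // Primary finished normally; no fallback needed.
  } catch (_) {
    if (emitted) rethrow; // Too late to switch providers.
  }
  yield* fallback();
}

/// Stand-in for an on-device model that fails before producing output.
Stream<String> failingOnDevice() async* {
  throw StateError('model not loaded');
}

/// Stand-in for a cloud model that streams two tokens.
Stream<String> cloud() async* {
  yield 'hello';
  yield ' world';
}

Future<void> main() async {
  final tokens = await withFirstTokenFallback(failingOnDevice, cloud).toList();
  print(tokens.join()); // hello world
}
```

Taking stream factories instead of stream instances keeps the fallback lazy: in this sketch the cloud stream is only created if the on-device stream fails before its first token.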