flutter_gemma can generate vector embeddings from text (EmbeddingGemma / Gecko) and run on-device RAG with a vector store: qdrant-edge on native, wa-sqlite on Web. The same Dart API works on both, so your code is portable across platforms.
Setup#
Embeddings need the flutter_gemma_embeddings package, and RAG needs a vector store package: flutter_gemma_rag_qdrant (native) or flutter_gemma_rag_sqlite (web). Register them in FlutterGemma.initialize(...):
FlutterGemma.initialize(
  inferenceEngines: const [LiteRtLmEngine()],
  embeddingBackends: const [LiteRtEmbeddingBackend()], // flutter_gemma_embeddings
  vectorStore: QdrantVectorStore(), // or WebSqliteVectorStore() on web
);
See Installation for the full registration reference.
Text embeddings#
All embedding models generate 768-dimensional vectors. The number in a model name (64/256/512/1024/2048) is the max input sequence length in tokens, not the embedding dimension. See Models for the full list.
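To make the naming rule concrete, the sketch below reads the sequence length out of a model file name. It assumes the `_seq<N>_` pattern that appears in the EmbeddingGemma file name used in this section; other models may be named differently, and `maxSequenceLength` is a hypothetical helper, not part of flutter_gemma.

```dart
/// Hypothetical helper (not part of flutter_gemma): extracts the max input
/// sequence length from a file name that contains a `_seq<N>_` segment.
/// Returns null when the name does not follow that pattern.
int? maxSequenceLength(String fileName) {
  final match = RegExp(r'_seq(\d+)_').firstMatch(fileName);
  return match == null ? null : int.parse(match.group(1)!);
}

void main() {
  const fileName = 'embeddinggemma-300M_seq256_mixed-precision.tflite';

  // The number in the name is the input limit in tokens, not the vector size.
  print(maxSequenceLength(fileName)); // 256

  // A name without the pattern yields null instead of a guess.
  print(maxSequenceLength('model.tflite')); // null
}
```

Whatever this helper returns, the embedding vector itself still has 768 dimensions.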
Install an embedding model#
await FlutterGemma.installEmbedder()
    .modelFromNetwork(
      'https://huggingface.co/litert-community/embeddinggemma-300m/resolve/main/embeddinggemma-300M_seq256_mixed-precision.tflite',
      token: 'hf_...',
    )
    .tokenizerFromNetwork(
      'https://huggingface.co/litert-community/embeddinggemma-300m/resolve/main/sentencepiece.model',
      token: 'hf_...',
    )
    .install();
Generate embeddings#
final embedder = FlutterGemmaPlugin.instance.initializedEmbeddingModel!;
final embeddings = await embedder.generateEmbeddings(
  docs.map((d) => d.content).toList(),
  taskType: TaskType.retrievalDocument,
);
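Embeddings become useful once you compare them. The following is a minimal, plugin-independent sketch of cosine similarity, a common way to score how close two vectors are. It assumes each embedding is a `List<double>`; the source does not state which metric the vector store uses internally, so treat this as an illustration rather than the plugin's implementation. `cosineSimilarity` is a hypothetical helper.

```dart
import 'dart:math' as math;

/// Hypothetical helper: cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0;
  var normA = 0.0;
  var normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

void main() {
  // Toy 3-dimensional vectors stand in for real 768-dimensional embeddings.
  final a = [1.0, 0.0, 0.0];
  final b = [0.0, 1.0, 0.0];

  print(cosineSimilarity(a, a)); // 1.0 (same direction)
  print(cosineSimilarity(a, b)); // 0.0 (orthogonal)
}
```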
On-device RAG / vector store#
import 'package:flutter_gemma/flutter_gemma.dart';

// `docs` is your own list of items exposing `id` and `content`.

// 1. Install an embedding model (Gecko or EmbeddingGemma); see above.

// 2. Initialize the vector store (one shard per database path).
await FlutterGemmaPlugin.instance.initializeVectorStore('rag_store');

// 3. Add documents and let the plugin compute embeddings for you.
for (final doc in docs) {
  await FlutterGemmaPlugin.instance.addDocument(
    id: doc.id,
    content: doc.content,
    metadata: '{"category":"science","lang":"en"}',
  );
}

// 3b. Alternatively (instead of step 3), batch-embed yourself and feed the
// pre-computed vectors via addDocumentWithEmbedding(...) for higher throughput.
final embedder = FlutterGemmaPlugin.instance.initializedEmbeddingModel!;
final embeddings = await embedder.generateEmbeddings(
  docs.map((d) => d.content).toList(),
  taskType: TaskType.retrievalDocument,
);
for (var i = 0; i < docs.length; i++) {
  await FlutterGemmaPlugin.instance.addDocumentWithEmbedding(
    id: docs[i].id,
    content: docs[i].content,
    embedding: embeddings[i],
    metadata: '{"category":"science","lang":"en"}',
  );
}

// 4. Semantic search, with an optional payload-aware Filter (native only).
final results = await FlutterGemmaPlugin.instance.searchSimilar(
  query: 'quantum entanglement',
  topK: 10,
  threshold: 0.0,
  filter: Filter(
    must: [FieldEquals(key: 'category', value: 'science')],
    mustNot: [FieldEquals(key: 'lang', value: 'fr')],
  ),
);
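The `metadata` argument in the example above is a hand-written JSON string. As a small sketch, you could build it from a map with `jsonEncode` from `dart:convert` instead, which avoids quoting mistakes. This assumes the plugin expects a JSON object string, as the example suggests; `encodeMetadata` is a hypothetical helper, not part of flutter_gemma.

```dart
import 'dart:convert';

/// Hypothetical helper: serializes payload fields to a compact JSON string
/// that can be passed as the `metadata` argument.
String encodeMetadata(Map<String, Object?> fields) => jsonEncode(fields);

void main() {
  final metadata = encodeMetadata({'category': 'science', 'lang': 'en'});
  print(metadata); // {"category":"science","lang":"en"}
}
```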
The Filter API#
Filter supports must / should / mustNot lists of conditions:
- FieldEquals: exact match on a payload field.
- FieldRange: numeric range on a payload field.
- FieldMatchAny: match against any value in a set.
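As a rough illustration of combining the lists, the sketch below pairs `must` with `should`, using only `FieldEquals` in the form shown earlier. It assumes `should` is a named parameter like `must` and `mustNot`, and that it matches when at least one of its conditions holds (the usual Qdrant convention); neither detail is confirmed in this section. It is meant to run inside an app where flutter_gemma is already initialized on a native platform.

```dart
import 'package:flutter_gemma/flutter_gemma.dart';

/// Sketch: restrict results to science documents written in English or German.
Future<void> searchScienceInEnglishOrGerman() async {
  final results = await FlutterGemmaPlugin.instance.searchSimilar(
    query: 'quantum entanglement',
    topK: 5,
    threshold: 0.0,
    filter: Filter(
      // Every `must` condition has to hold.
      must: [FieldEquals(key: 'category', value: 'science')],
      // Assumption: at least one `should` condition has to hold.
      should: [
        FieldEquals(key: 'lang', value: 'en'),
        FieldEquals(key: 'lang', value: 'de'),
      ],
    ),
  );
  print(results);
}
```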
Platform support#
| Feature | Android | iOS | Web | Desktop |
|---|---|---|---|---|
| Text Embeddings | ✅ | ✅ | ✅ | ✅ |
| VectorStore (RAG) | ✅ qdrant-edge | ✅ qdrant-edge | ✅ wa-sqlite (WASM) | ✅ qdrant-edge |
| Payload Filter | ✅ | ✅ | ❌ | ✅ |
Benchmarks comparing qdrant-edge with the legacy sqlite + local_hnsw backend across 5 platforms (5,000 documents, EmbeddingGemma 300M, 768-dim) are available in the repo benchmarks.
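Because the table marks payload filters as unsupported on Web, shared code may want to pass a filter only on native platforms. The following sketch uses Flutter's `kIsWeb` constant for that. It assumes the `filter` parameter of `searchSimilar` accepts `null` (the example describes it as optional); how the Web backend reacts to a non-null filter is not stated here. `searchScience` is a hypothetical wrapper.

```dart
import 'package:flutter/foundation.dart' show kIsWeb;
import 'package:flutter_gemma/flutter_gemma.dart';

/// Hypothetical wrapper: applies a category filter on native platforms and
/// falls back to an unfiltered search on Web.
Future<void> searchScience(String query) async {
  // Assumption: `filter` is nullable, so null means "no filter".
  final filter = kIsWeb
      ? null
      : Filter(must: [FieldEquals(key: 'category', value: 'science')]);

  final results = await FlutterGemmaPlugin.instance.searchSimilar(
    query: query,
    topK: 10,
    threshold: 0.0,
    filter: filter,
  );
  print(results);
}
```

On Web the results are then unfiltered, so any category restriction would have to be applied to them afterwards.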