
Agent Skills

On-device agentic skills for flutter_gemma — give the model a SKILL.md catalog and let it pick and run skills through the tool-calling loop, fully offline.

flutter_gemma_agent is an opt-in satellite package that turns the inference core into an on-device agent: you give the model a set of skills (SKILL.md files), and it decides which to invoke through flutter_gemma's existing function-calling, runs them, and feeds the results back — fully offline.

It is reverse-engineered from google-ai-edge/gallery (Apache-2.0) and is Gallery-compatible: the Gallery SKILL.md catalog parses unmodified, and its JavaScript skills run as-is.

Agent skills build on function calling, so they need a function-calling-capable model (Gemma 4 E2B/E4B recommended). See Function Calling for the model support matrix.
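
The pick, run, and feed-back cycle described above can be pictured roughly as follows. This is a minimal sketch in plain Dart, not the package's implementation: `ModelTurn`, `ToolCall`, `FinalAnswer`, `runAgent`, and the `maxSteps` cap are hypothetical names introduced for illustration, and the model is stubbed as a plain function.

```dart
// Hypothetical types for illustration; not the flutter_gemma_agent API.
sealed class ModelTurn {}

class ToolCall extends ModelTurn {
  ToolCall(this.skill, this.args);
  final String skill;
  final Map<String, Object?> args;
}

class FinalAnswer extends ModelTurn {
  FinalAnswer(this.text);
  final String text;
}

typedef Model = Future<ModelTurn> Function(List<String> transcript);
typedef Skill = Future<String> Function(Map<String, Object?> args);

/// Runs the model until it answers in plain text or [maxSteps] is reached.
Future<String> runAgent(
  Model model,
  Map<String, Skill> skills,
  String userMessage, {
  int maxSteps = 4,
}) async {
  final transcript = <String>['user: $userMessage'];
  for (var step = 0; step < maxSteps; step++) {
    final turn = await model(transcript);
    switch (turn) {
      case FinalAnswer(:final text):
        return text;
      case ToolCall(:final skill, :final args):
        final run = skills[skill];
        // Unknown skills are reported back so the model can try again.
        final result =
            run == null ? 'error: unknown skill $skill' : await run(args);
        // The tool result becomes part of the context for the next turn.
        transcript.add('tool($skill): $result');
    }
  }
  return 'error: step limit reached';
}
```

Capping the number of steps is a design choice of this sketch: it keeps a model that never produces a final answer from looping indefinitely.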

The four skill mechanisms

A skill is a Markdown file (SKILL.md) with YAML frontmatter and one of four execution mechanisms:

| Skill type | What it does | Android | iOS | macOS | Windows | Web | Linux |
|---|---|---|---|---|---|---|---|
| text-only | A persona / instruction the model follows directly | | | | | | |
| MCP | Calls remote MCP tools over Streamable HTTP | | | | | | |
| native-intent | Opens an OS surface (mail, SMS, calendar, notification) | | | | | | |
| JS | Runs the skill's JavaScript in a sandboxed headless webview | | | | ✅¹ | | ❌² |

¹ Windows JS skills need the WebView2 Runtime (pre-installed on Windows 11).

² Linux has no embeddable webview, so JS skills return an ErrorResult; text / native-intent / MCP skills work on Linux.

JS skills run in a headless, sandboxed webview. To grant a secure context (so skills using crypto.subtle and other secure-context Web APIs work), the package serves each skill's assets over a loopback HTTP server (http://127.0.0.1, a W3C "potentially trustworthy" origin) — one mechanism that works identically across WebView2 / WKWebView / Android WebView, verified on hardware. On the web the skill runs in a sandboxed <iframe>.
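
To make the loopback idea concrete, here is a minimal sketch of a static file server bound to 127.0.0.1 using `dart:io`. It is not the package's server: `serveSkillAssets` is a hypothetical helper, and the path checks and content types are assumptions chosen for the example.

```dart
import 'dart:io';

/// Serves files under [root] on 127.0.0.1 and returns the bound server.
/// Port 0 asks the OS to pick any free port.
Future<HttpServer> serveSkillAssets(Directory root) async {
  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, 0);
  final rootPath = root.absolute.path;

  server.listen((request) async {
    final response = request.response;
    final segments = request.uri.pathSegments;

    // Reject empty paths and anything that could escape the root directory.
    final safe = segments.isNotEmpty &&
        segments.every((s) =>
            s.isNotEmpty &&
            s != '.' &&
            s != '..' &&
            !s.contains('/') &&
            !s.contains('\\'));
    if (!safe) {
      response.statusCode = HttpStatus.forbidden;
      await response.close();
      return;
    }

    final file = File([rootPath, ...segments].join(Platform.pathSeparator));
    if (!await file.exists()) {
      response.statusCode = HttpStatus.notFound;
      await response.close();
      return;
    }

    response.headers.contentType =
        file.path.endsWith('.html') ? ContentType.html : ContentType.binary;
    await response.addStream(file.openRead());
    await response.close();
  });

  return server;
}
```

A webview would then load `http://127.0.0.1:<port>/index.html`, where `<port>` is `server.port`. Binding to the loopback address means the server is not reachable from other machines on the network.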

Web is not supported yet. The columns above describe where each skill type can execute once invoked, but the agent loop depends on the model reliably emitting well-formed tool calls, and today's browser LLM runtimes don't. The LiteRT-LM web runtime (@litert-lm/core) doesn't consistently emit the tool-call tokens, and even when it does, the small models miss required arguments or produce malformed JSON, so skills don't run end-to-end. The agent is verified on Android, iOS, macOS, and Windows, so use those platforms. The example app disables the agent on web accordingly.
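
The two failure modes named above, malformed JSON and missing required arguments, can be detected before a skill is dispatched. The following is a minimal sketch of such a check; `validateToolArgs` is a hypothetical helper, not part of flutter_gemma or flutter_gemma_agent.

```dart
import 'dart:convert';

/// Returns null when [rawArgs] is a JSON object containing every key in
/// [required]; otherwise returns a short reason describing the problem.
String? validateToolArgs(String rawArgs, List<String> required) {
  try {
    final decoded = jsonDecode(rawArgs);
    if (decoded is! Map<String, dynamic>) {
      return 'arguments must be a JSON object';
    }
    final missing = [
      for (final key in required)
        if (!decoded.containsKey(key)) key,
    ];
    return missing.isEmpty
        ? null
        : 'missing required arguments: ${missing.join(', ')}';
  } on FormatException catch (e) {
    // jsonDecode throws FormatException on malformed input.
    return 'malformed JSON: ${e.message}';
  }
}
```

Returning a reason string rather than throwing lets a caller hand the message back to the model as a tool result, which gives it a chance to correct the call.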

Quick start

import 'package:flutter_gemma/flutter_gemma.dart';
import 'package:flutter_gemma_agent/flutter_gemma_agent.dart';

// 1. Register the inference engine (and, optionally, the skill executors).
await FlutterGemma.initialize(
  inferenceEngines: [LiteRtLmEngine()],
);

// 2. Install + load a function-calling model (Gemma 4 E2B/E4B recommended).
await FlutterGemma
    .installModel(modelType: ModelType.gemma4, fileType: ModelFileType.litertlm)
    .fromNetwork(gemma4E2BUrl)
    .install();
final model = await FlutterGemma.getActiveModel(maxTokens: 4096);

// 3. Load the bundled starter skills and build the agent session.
final source = AssetSkillSource();
final registry = SkillRegistry()..addAll(await source.load(), selected: true);

final session = await AgentSession.fromModel(
  model,
  registry: registry,
  executors: [
    TextSkillExecutor(),
    JsSkillExecutor(sourceFor: source.jsSkillSourceFor),
    NativeIntentExecutor(),
    // McpSkillExecutor(...) to also call remote MCP tools.
  ],
);

// 4. Mount the chat view.
//   AgentChatView(session: session)
// e.g. "Calculate the hash of hello" or "Show Paris on interactive map".

The package also ships an adaptive UI: AgentChatView, SkillManagerView, McpManagerView, SecretEditorDialog, and SkillTesterView.

Bundled starter skills

Seven starter skills ship as package assets and cover the JS, native-intent, and text-only mechanisms:

| Skill | Type | What it does |
|---|---|---|
| calculate-hash | JS | Hash a piece of text |
| qr-code | JS (image) | Generate a QR code |
| query-wikipedia | JS (data) | Summarize a Wikipedia topic |
| interactive-map | JS (webview) | Show a location on an embedded map |
| send-email | intent | Open the OS mail composer |
| create-calendar-event | intent | Open the calendar event editor |
| kitchen-adventure | text-only | A text-adventure dungeon-master persona |

SKILL.md format

---
name: kebab-case-id
description: One-line summary the model uses to pick the skill.
metadata:
  homepage: https://optional
  require-secret: true
  require-secret-description: how to obtain the key
---
# Title
## Instructions
Call the `run_js` tool with: script name: index.html, data: { field: Type }
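
To show how the layout above separates into metadata and instructions, here is a minimal sketch that splits a SKILL.md string into frontmatter fields and a Markdown body. It is not the package's parser: `splitSkillMd` is a hypothetical helper that handles only flat `key: value` lines and one level of nesting (flattened to `metadata.homepage` style keys). Real frontmatter is YAML, so a full YAML parser is the safer choice in practice.

```dart
/// Splits a SKILL.md document into frontmatter fields and the Markdown body.
/// Nested keys are flattened as `parent.child`.
({Map<String, String> fields, String body}) splitSkillMd(String source) {
  final lines = source.split('\n');
  if (lines.first.trim() != '---') {
    throw const FormatException('SKILL.md must start with a --- line');
  }
  // Find the closing delimiter, searching from the second line onwards.
  final end = lines.indexWhere((line) => line.trim() == '---', 1);
  if (end == -1) {
    throw const FormatException('unterminated frontmatter');
  }

  final fields = <String, String>{};
  var parent = '';
  for (final line in lines.sublist(1, end)) {
    final colon = line.indexOf(':');
    if (colon == -1) continue; // This sketch skips anything but key: value.
    final indented = line.startsWith(' ');
    final key = line.substring(0, colon).trim();
    final value = line.substring(colon + 1).trim();
    // A top-level key with no value (such as `metadata:`) opens a group.
    if (!indented) parent = value.isEmpty ? key : '';
    if (value.isNotEmpty) {
      fields[indented && parent.isNotEmpty ? '$parent.$key' : key] = value;
    }
  }
  return (fields: fields, body: lines.sublist(end + 1).join('\n'));
}
```

Only the first colon on a line is treated as the separator, so URL values such as `https://optional` stay intact.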

JS skills additionally ship scripts/index.html exposing window.ai_edge_gallery_get_result(data, secret) returning a JSON string ({ result | image | webview | error }). Secrets are injected as the JS secret argument and never placed in the model prompt.
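
On the Dart side, the JSON string returned by a JS skill has to be decoded and classified into one of the four documented keys. A minimal sketch follows, assuming the payload under each key can be treated as opaque; `SkillOutcome` and `parseSkillOutcome` are hypothetical names, and checking `error` before the other keys is a choice made for this example rather than documented behavior.

```dart
import 'dart:convert';

/// One of the four documented outcomes of a JS skill call.
class SkillOutcome {
  const SkillOutcome(this.kind, this.value);

  /// 'result', 'image', 'webview', or 'error'.
  final String kind;

  /// Payload for [kind]; its exact shape is not assumed here.
  final Object? value;
}

/// Classifies the JSON string returned by a skill's entry point.
SkillOutcome parseSkillOutcome(String json) {
  // 'error' is checked first so a failure is never masked by another key.
  const kinds = ['error', 'result', 'image', 'webview'];
  try {
    final decoded = jsonDecode(json);
    if (decoded is Map<String, dynamic>) {
      for (final kind in kinds) {
        if (decoded.containsKey(kind)) {
          return SkillOutcome(kind, decoded[kind]);
        }
      }
    }
    return const SkillOutcome('error', 'unrecognized skill output');
  } on FormatException {
    return const SkillOutcome('error', 'skill returned invalid JSON');
  }
}
```

Mapping every failure to an `error` outcome means the caller handles a single type, whether the skill reported a problem itself or returned something that could not be parsed.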

Registering executors

Two equivalent ways to wire executors — pass them per session, or register them once globally and let fromModel read the core registry (mirrors how inference engines are registered):

// Global registration (then omit `executors:` on fromModel):
await FlutterGemma.initialize(
  inferenceEngines: [LiteRtLmEngine()],
  skillExecutors: [TextSkillExecutor(), JsSkillExecutor(sourceFor: ...), NativeIntentExecutor()],
);
final session = await AgentSession.fromModel(model, registry: registry);

Setup

Most skills need no platform setup. For the platform-specific bits:

  • Windows: JS skills require the WebView2 Runtime (pre-installed on Windows 11; bundle the bootstrapper for Windows 10).
  • iOS: the create-calendar-event intent needs a usage description in ios/Runner/Info.plist:
    <key>NSCalendarsUsageDescription</key>
    <string>Create calendar events from the agent.</string>
    
  • Android: schedule_notification requires core-library desugaring in android/app/build.gradle(.kts):
    android { compileOptions { isCoreLibraryDesugaringEnabled = true } }
    dependencies { coreLibraryDesugaring("com.android.tools:desugar_jdk_libs:2.1.4") }
    
Adding a skill grants the model the ability to run that skill's code or open OS surfaces. Only load skills you trust — `require-secret` skill keys are stored in memory and passed to the skill, never to the model prompt.

Third-party attribution

The bundled starter skills and the SKILL.md format are derived from google-ai-edge/gallery, licensed under the Apache License 2.0. flutter_gemma_agent itself is MIT-licensed.