{
  "access": "public",
  "type": "reference",
  "format": "markdown",
  "title": "Semio Types",
  "chunked": true,
  "url": "https://library.datagrout.ai/semio",
  "summary": "**The type system that makes cross-system workflows deterministic**",
  "content_markdown": "# Semio: Semantic Types\n\n**The type system that makes cross-system workflows deterministic**\n\nSemio is DataGrout's semantic interface layer. It gives tools a typed contract — declaring what kind of data they consume and produce — so the planning engine can verify workflow compatibility before anything runs, and route data between systems without LLM guesswork.\n\nThe full formal treatment is in the lab paper: [Semio: A Semantic Interface Layer for Tool-Oriented AI Systems](https://labs.datagrout.ai/papers/semio).\n\n---\n\n## The Problem Semio Solves\n\nA `customer` in Salesforce looks different from a `customer` in QuickBooks. Different field names, different IDs, different schemas. Bridging them traditionally means:\n\n- Writing hand-coded glue for every pair of systems (O(N²) complexity)\n- Or asking an LLM to figure it out at runtime (probabilistic, fragile, token-expensive)\n\nSemio takes a third path: tools declare their semantic types, and the planner reasons about compatibility symbolically before execution. The LLM describes intent; Semio handles schema matching.\n\n---\n\n## Semantic Types\n\nEvery Semio type follows the pattern:\n\n```\n<family>.<entity>@<version>\n```\n\nExamples:\n\n| Type | Meaning |\n|------|---------|\n| `crm.lead@1` | A CRM lead record |\n| `billing.invoice@1` | A billing invoice |\n| `billing.customer@1` | A billing customer |\n| `core.email@1` | An email address (primitive) |\n| `crm.lead.list@1` | A list of CRM leads |\n\nThese types exist independently of any vendor. A Salesforce lead and a HubSpot contact are both `crm.lead@1`.\n\n---\n\n## Tool Contracts\n\nTools declare their inputs and outputs using Semio types. 
When you look at a tool in the Playground or via `discovery.discover`, you see its semantic contract:\n\n```yaml\ntool: salesforce@1/get_lead@1\noutputs:\n  - type: crm.lead@1\n    keys: [id, email]\n```\n\n```yaml\ntool: quickbooks@1/create_invoice@1\ninputs:\n  - name: customer\n    type: billing.customer@1\n    required: true\noutputs:\n  - type: billing.invoice@1\n```\n\nThe planning engine uses these contracts to verify that workflow steps connect — that what one tool outputs is compatible with what the next tool expects.\n\n---\n\n## Adapters: Type Bridges\n\nWhen two tools use related but different types, Semio adapters bridge the gap. An adapter declares that one type can be transformed into another, using a shared identity key (like `email`).\n\n```\ncrm.lead@1 ──[adapter]──▶ billing.customer@1\n                anchor: email\n```\n\nWhen the planner finds a workflow that requires `billing.customer@1` but you only have `crm.lead@1`, it inserts the adapter automatically. No LLM reasoning needed at execution time.\n\n---\n\n## How This Affects Planning\n\nWhen you call `discovery.plan` or use `flow.into`, the planner works with Semio types:\n\n1. **Input types**: What data do you have to start with?\n2. **Goal type**: What type does the final step need to produce?\n3. **Path search**: Find a chain of tools and adapters that bridges the gap\n4. 
**Verification**: Check type safety at every step before execution\n\nThis is what allows Cognitive Trust Certificates to assert \"type safe\" as a compile-time proof: the planner checked every type transformation before any tool ran.\n\n**Example**: The goal is to create an invoice given only an email address:\n\n```\ncore.email@1\n  → [salesforce@1/get_lead@1]\n  → crm.lead@1\n  → [adapter: crm.lead@1 → billing.customer@1]\n  → billing.customer@1\n  → [quickbooks@1/create_invoice@1]\n  → billing.invoice@1\n```\n\nThe entire path is verified before execution begins.\n\n---\n\n## Type Tiers\n\nFields within a Semio type are categorized into tiers that guide planning and PII handling:\n\n| Tier | Meaning |\n|------|---------|\n| **Core** | Required for basic operations (`id`, `name`, `email`) |\n| **Useful** | Enhance workflows but aren't strictly required (`company`, `status`) |\n| **PII** | Personally identifiable; triggers Dynamic Redaction (`email`, `phone`) |\n| **Index** | Optimized for search and lookup (`email`, `company`) |\n\nTiers are not mutually exclusive: `email`, for instance, is listed under Core, PII, and Index.\n\nThe planner uses tiers to request only the fields a workflow actually needs, and to flag when PII fields require policy clearance.\n\n---\n\n## Identity Anchors (Keys)\n\nCross-system entity resolution uses shared keys, not system-specific IDs. When a Salesforce lead and a QuickBooks customer represent the same person, they're matched via a shared key like `email`, not their respective internal IDs.\n\n```\nSalesforce lead: { id: \"00Q...\", email: \"jane@acme.com\" }\nQuickBooks customer: { id: \"cust_99\", email: \"jane@acme.com\" }\n        └── matched via email anchor ──┘\n```\n\nThis is why Semio adapters specify an `anchor` key: it's the identity field that survives the type transformation.\n\n---\n\n## Automatic Enrichment\n\nWhen a workflow step needs a field that the previous step didn't return, the planner searches for enrichment tools automatically. 
If you have a lead with only `{id, email}` but the next step needs `status`, the planner finds a tool that can look up `status` by `id` and inserts it before proceeding.\n\nThis happens transparently: you describe the goal, and the planner works out which data needs to be filled in.\n\n---\n\n## Related\n\n- [Core Concepts](core-concepts) -- How discovery and planning use Semio types\n- [Discovery Tools](discovery-tools) -- Semantic search over the type graph\n- [Cognitive Trust Certificates](cognitive-trust-certificates) -- Type safety as a compile-time proof\n- [Lab paper: Semio](https://labs.datagrout.ai/papers/semio) -- Full formal treatment of the type system\n"
}