Semantic Search

Semantic search enables AI-powered discovery of data across your app and other apps. When you write data to storage, embeddings are automatically generated in the background, making your data searchable by meaning rather than just keywords.

How It Works

Your App                    Infrastructure              AI/Orchestrator
   │                             │                           │
   │  storage.write(data)        │                           │
   │────────────────────────────>│                           │
   │                             │                           │
   │  { success: true }          │                           │
   │<────────────────────────────│                           │
   │                             │                           │
   │              [Background: Generate embedding]           │
   │                             │                           │
   │                             │  search("customer info")  │
   │                             │<──────────────────────────│
   │                             │                           │
   │                             │  [Filter by permissions]  │
   │                             │  [Return matches]         │
   │                             │──────────────────────────>│

Key points:

Zero configuration - Indexing works automatically; no code changes are needed to make stored data searchable
Non-blocking - Storage operations return immediately; embeddings are generated asynchronously
Permission-aware - Results are filtered based on the user's tokens
Cross-app - Find data from any app the user has access to
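
The non-blocking behavior can be pictured with a small in-memory model. This is only a sketch of the idea, not the platform's implementation: `writeRecord`, `processEmbeddingQueue`, and `fakeEmbed` are hypothetical names, and the "embedding" is a placeholder rather than a real model call.

```typescript
// Hypothetical in-memory model of the flow above; none of these names are platform APIs.
interface EmbeddingJob {
  path: string;
  data: unknown;
}

const recordStore = new Map<string, unknown>();      // stands in for storage
const embeddingQueue: EmbeddingJob[] = [];           // work deferred to the background
const embeddingIndex = new Map<string, number[]>();  // path -> embedding vector

// Placeholder embedding: a real system would call an embedding model here.
function fakeEmbed(text: string): number[] {
  return [text.length, text.split(/\s+/).length];
}

// Persists the data and queues the embedding job, so the caller is not kept waiting.
function writeRecord(path: string, data: unknown): { success: boolean } {
  recordStore.set(path, data);
  embeddingQueue.push({ path, data });
  return { success: true };
}

// Runs later, outside the write path, and makes queued records searchable.
function processEmbeddingQueue(): number {
  let processed = 0;
  for (const job of embeddingQueue.splice(0)) {
    embeddingIndex.set(job.path, fakeEmbed(JSON.stringify(job.data) ?? ''));
    processed++;
  }
  return processed;
}
```

One consequence of this design is that a record may not show up in search results for a short time after the write succeeds.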

Using Search in Tools

Declare the Capability

Add search to your tool's capabilities:

{
  "name": "find_relevant_data",
  "description": "Finds data relevant to a query",
  "capabilities": ["search"],
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string" }
    },
    "required": ["query"]
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "results": { "type": "array" }
    }
  }
}

Basic Search

export default async function find_relevant_data(
  input: Input,
  capabilities: Capabilities
): Promise<Output> {
  const results = await capabilities.search(input.query, {
    types: ['storage'],
    limit: 10,
    minSimilarity: 0.7
  });
  
  return { results };
}
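
The `Input` and `Output` types used above are not defined in this section. A minimal sketch, assuming they simply mirror the tool's `input_schema` and `output_schema`, might look like this; `isInput` is a hypothetical helper, not part of the platform.

```typescript
// Assumed shapes, mirroring input_schema and output_schema from the tool definition.
interface Input {
  query: string;
}

interface Output {
  results: unknown[];
}

// Runtime guard for the one required field: "query" must be a string.
function isInput(value: unknown): value is Input {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { query?: unknown }).query === 'string'
  );
}
```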

Search Options

Option	Type	Default	Description
`types`	`string[]`	all	What to search: `'tools'`, `'objects'`, `'examples'`, `'storage'`
`limit`	`number`	20	Maximum results
`minSimilarity`	`number`	0.5	Minimum similarity score (0-1)
`locations`	`object`	-	Map of app name to path pattern; restricts the search to those paths
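
The options and their defaults can be written down as a type. The sketch below is an assumption based only on the table: `SearchOptions`, `ResolvedSearchOptions`, and `resolveSearchOptions` are illustrative names, and "all" is taken to mean the four listed types.

```typescript
type SearchType = 'tools' | 'objects' | 'examples' | 'storage';

interface SearchOptions {
  types?: SearchType[];
  limit?: number;
  minSimilarity?: number;
  locations?: Record<string, string>; // app name -> path pattern
}

interface ResolvedSearchOptions {
  types: SearchType[];
  limit: number;
  minSimilarity: number;
  locations?: Record<string, string>;
}

// Assumption: the "all" default means every type listed in the table.
const ALL_SEARCH_TYPES: SearchType[] = ['tools', 'objects', 'examples', 'storage'];

// Hypothetical helper that fills in the documented defaults.
function resolveSearchOptions(options: SearchOptions = {}): ResolvedSearchOptions {
  const minSimilarity = options.minSimilarity ?? 0.5;
  if (minSimilarity < 0 || minSimilarity > 1) {
    throw new RangeError('minSimilarity must be between 0 and 1');
  }
  return {
    types: options.types ?? ALL_SEARCH_TYPES,
    limit: options.limit ?? 20,
    minSimilarity,
    locations: options.locations,
  };
}
```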

Search Results

interface StorageSearchResult {
  type: 'storage';
  app: string;           // App that owns the data
  path: string;          // Storage path
  securityKey: string;   // Permission boundary
  description?: string;  // From storage.json
  preview?: string;      // First ~200 chars
  similarity: number;    // Match score 0-1
}
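
As an illustration of consuming these fields, the sketch below turns results into short context lines, best match first. `summarizeResults` is a hypothetical helper, and sorting is done explicitly because this section does not state whether results arrive ordered.

```typescript
interface StorageSearchResult {
  type: 'storage';
  app: string;
  path: string;
  securityKey: string;
  description?: string;
  preview?: string;
  similarity: number;
}

// Hypothetical helper: best matches first, one line of context per result.
function summarizeResults(results: StorageSearchResult[], max: number): string[] {
  return [...results] // copy so the caller's array is not reordered
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, max)
    .map(r => {
      const text = r.preview ?? r.description ?? 'no preview';
      return `${r.app}${r.path} (${r.similarity.toFixed(2)}): ${text}`;
    });
}
```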

Location Filtering

Narrow searches to specific paths within an app:

// Search only in Slack conversations
const results = await capabilities.search('meeting notes', {
  types: ['storage'],
  locations: {
    '@malv/slack': '/teams/team-123/conversations'
  }
});

Location Patterns

Pattern	Meaning
`''` or `'/'`	All paths in the app
`'/teams/team-123/*'`	All paths under this prefix
`'/teams/team-123/conversations'`	Paths that begin with this exact path
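
One way to read the table is as segment-aware prefix matching. The sketch below is an interpretation, not the platform's matcher: `matchesLocation` is a hypothetical function, and treating a trailing `/*` and a bare path identically is an assumption.

```typescript
// Hypothetical matcher for the location patterns in the table above.
function matchesLocation(pattern: string, path: string): boolean {
  // '' and '/' select the whole app; '*' is used the same way in this page's multi-app example.
  if (pattern === '' || pattern === '/' || pattern === '*') return true;

  // Assumption: "/prefix/*" and "/prefix" both mean "this path and everything under it".
  const prefix = pattern.endsWith('/*') ? pattern.slice(0, -2) : pattern.replace(/\/$/, '');

  // Compare on a segment boundary so "/a/b" does not match "/a/b-old".
  return path === prefix || path.startsWith(prefix + '/');
}
```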

Multiple Locations

Search across multiple apps or paths:

const results = await capabilities.search('quarterly report', {
  types: ['storage'],
  locations: {
    '@malv/slack': '/teams/team-123/conversations',
    '@malv/drive': '/teams/team-123/documents',
    '@malv/notes': '*'  // All paths in notes
  }
});

Making Data Searchable

Add Descriptions

Add descriptions to storage paths in storage.json to improve search relevance:

{
  "same_app": {
    "/teams/<token.team>/contacts/": {
      "operations": ["read", "write"],
      "description": "Customer contact information including email, phone, and address"
    },
    "/teams/<token.team>/projects/": {
      "operations": ["read", "write", "list"],
      "description": "Project details, timelines, and team assignments"
    }
  }
}

The description is included in the embedding, making it easier to match queries like "customer phone numbers" or "project timelines."
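
To see why a description helps, consider how the text that gets embedded might be assembled. How the platform actually builds this text is not documented here; `buildEmbeddingText` is a hypothetical sketch that prepends the description to the record's string fields.

```typescript
// Hypothetical: flatten a description and a record into the text that would be embedded.
function buildEmbeddingText(
  description: string | undefined,
  data: Record<string, unknown>
): string {
  // Only string fields are kept; numbers and nested objects are ignored in this sketch.
  const fieldLines = Object.keys(data)
    .filter(key => typeof data[key] === 'string')
    .map(key => `${key}: ${String(data[key])}`);
  const lines = description ? [description, ...fieldLines] : fieldLines;
  return lines.join('\n');
}
```

Under this assumption, a query such as "customer phone numbers" can match on the description even when the record itself never uses those words.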

Write Searchable Data

Include meaningful text in your stored data:

// Good - searchable fields
await storage.put('/teams/team-123/contacts/john.json', {
  name: 'John Doe',
  role: 'Sales Manager',
  company: 'Acme Corp',
  notes: 'Met at conference, interested in enterprise plan'
});

// Less searchable - mostly IDs
await storage.put('/teams/team-123/contacts/john.json', {
  id: 'contact-123',
  roleId: 'role-456',
  companyId: 'company-789'
});

Exclude Sensitive Data

For data that shouldn't be searchable, disable embedding generation:

{
  "same_app": {
    "/cache/": {
      "operations": ["read", "write"],
      "skipEmbedding": true
    },
    "/users/<token.accountId>/private/": {
      "operations": ["read", "write"],
      "skipEmbedding": true
    }
  }
}

Use skipEmbedding: true for:

Sensitive data (API keys, passwords, personal info)
Temporary/cache data
Large binary files
Frequently changing data that doesn't need to be searchable
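
The effect of `skipEmbedding` can be sketched as a lookup from a concrete path to its rule. This is an illustration only: `PathRule`, `templateToRegExp`, and `shouldEmbed` are hypothetical, and both the first-match-wins order and the "embed when no rule matches" fallback are assumptions.

```typescript
interface PathRule {
  operations: string[];
  skipEmbedding?: boolean;
}

type StorageRules = Record<string, PathRule>;

// Turns "/users/<token.accountId>/private/" into a prefix regex in which
// each <...> placeholder matches exactly one path segment.
function templateToRegExp(template: string): RegExp {
  const source = template
    .split(/(<[^>]+>)/)
    .map(part =>
      /^<[^>]+>$/.test(part)
        ? '[^/]+'
        : part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    )
    .join('');
  return new RegExp('^' + source);
}

// Returns false when the first matching rule sets skipEmbedding: true.
function shouldEmbed(rules: StorageRules, path: string): boolean {
  for (const template of Object.keys(rules)) {
    if (templateToRegExp(template).test(path)) {
      return rules[template].skipEmbedding !== true;
    }
  }
  return true; // assumption: paths with no matching rule are embedded
}
```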

Security

Search results are automatically filtered based on the user's tokens. The security model works like this:

1. When data is written, a security key is derived from the token requirements
2. When a user searches, their tokens generate matching security keys
3. Only results with matching keys are returned

This means:

Users only see data they have permission to access
No additional permission checks needed in your tool
Cross-app data sharing works automatically when apps use the same token structure
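
The final filtering step of this model amounts to a set-membership check. The sketch below assumes the user's security keys have already been derived from their tokens; `IndexedEntry` and `filterByUserKeys` are hypothetical names used for illustration.

```typescript
interface IndexedEntry {
  app: string;
  path: string;
  securityKey: string; // assigned when the data was written
}

// Keeps only the entries whose security key the searching user can also produce.
function filterByUserKeys(
  entries: IndexedEntry[],
  userKeys: ReadonlySet<string>
): IndexedEntry[] {
  return entries.filter(entry => userKeys.has(entry.securityKey));
}
```

Because the check happens before results are returned, a tool never receives entries the user cannot access.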

How Security Keys Work

Storage paths with token requirements generate security keys:

{
  "same_app": {
    "/teams/<token.teamId>/data/": {
      "tokenType": "account",
      "tokenFromApp": "@malv/auth"
    }
  }
}

If two apps use the same token structure, their data shares security keys and can be searched together:

// @malv/notes storage.json
{ "teamId": { "app": "@malv/auth", "token": "account" } }

// @malv/files storage.json
{ "teamId": { "app": "@malv/auth", "token": "account" } }

Both apps' team data is searchable together because they share the same permission boundary.
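
The reason the two apps share a boundary can be shown with a toy key derivation. The real derivation is not documented in this section, so `deriveSecurityKey` and its string format are purely hypothetical. The point is only that the key depends on the token structure and value, not on which app owns the data.

```typescript
interface TokenRequirement {
  app: string;   // app that issues the token, e.g. '@malv/auth'
  token: string; // token type, e.g. 'account'
}

// Hypothetical key format; the owning app's name is deliberately not part of the key.
function deriveSecurityKey(
  requirement: TokenRequirement,
  field: string,
  value: string
): string {
  return `${requirement.app}:${requirement.token}:${field}=${value}`;
}

// Both apps declare the same requirement for teamId, as in the snippets above.
const notesRequirement: TokenRequirement = { app: '@malv/auth', token: 'account' };
const filesRequirement: TokenRequirement = { app: '@malv/auth', token: 'account' };

const notesKey = deriveSecurityKey(notesRequirement, 'teamId', 'team-123');
const filesKey = deriveSecurityKey(filesRequirement, 'teamId', 'team-123');
// notesKey and filesKey are identical, so one search can cover both apps.
```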

Example: Cross-App Discovery

export default async function find_context(
  input: Input,
  capabilities: Capabilities
): Promise<Output> {
  // Search across all apps the user has access to
  const results = await capabilities.search(input.query, {
    types: ['storage'],
    limit: 5
  });
  
  // Results might include:
  // - @malv/tables: /teams/team-123/tables/customers/rows/row-1.json
  // - @malv/contacts: /teams/team-123/contacts/john-doe.json
  // - @malv/crm: /teams/team-123/deals/deal-456/notes.json
  
  // Use the results to provide context
  return {
    relevantData: results.map(r => ({
      app: r.app,
      path: r.path,
      preview: r.preview,
      similarity: r.similarity
    }))
  };
}

Best Practices

Write specific descriptions - The description directly impacts search quality:

// Good - specific and descriptive
"description": "Customer contact information including email, phone, and address"

// Bad - too generic
"description": "Data file"

Think about search use cases - Consider how users might search for your data:

Data Type	Likely Queries	Suggested Description
Projects	"my projects", "project settings"	"Project configuration and metadata"
Contacts	"customer info", "phone numbers"	"Contact details including name, email, phone"
Notes	"meeting notes", "ideas about X"	"User notes and documentation"

Use appropriate boundaries - Design storage paths so that each permission boundary matches who should be able to search the data:

// Team-wide data - all team members can search
"/teams/<token.teamId>/shared/"

// User-specific data - only that user can search
"/users/<token.accountId>/private/"

Consider search in your data model - When designing what to store, think about what text will be meaningful for search. Store human-readable descriptions alongside IDs when possible.
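
As a sketch of storing readable text alongside IDs, a record can be enriched with labels before it is written. `withReadableLabels` and the lookup tables are hypothetical; in a real app the labels would come from your own data.

```typescript
interface ContactRefs {
  id: string;
  roleId: string;
  companyId: string;
}

interface SearchableContact extends ContactRefs {
  role: string;
  company: string;
}

// Keeps the IDs for lookups and adds human-readable text that search can match on.
function withReadableLabels(
  contact: ContactRefs,
  roles: Record<string, string>,
  companies: Record<string, string>
): SearchableContact {
  return {
    ...contact,
    role: roles[contact.roleId] ?? 'Unknown role',
    company: companies[contact.companyId] ?? 'Unknown company',
  };
}
```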