Semantic Search
Semantic search enables AI-powered discovery of data across your app and other apps. When you write data to storage, embeddings are automatically generated in the background, making your data searchable by meaning rather than just keywords.
How It Works
Your App                   Infrastructure              AI/Orchestrator
    │                             │                           │
    │  storage.write(data)        │                           │
    │────────────────────────────>│                           │
    │                             │                           │
    │  { success: true }          │                           │
    │<────────────────────────────│                           │
    │                             │                           │
    │             [Background: Generate embedding]            │
    │                             │                           │
    │                             │  search("customer info")  │
    │                             │<──────────────────────────│
    │                             │                           │
    │                             │  [Filter by permissions]  │
    │                             │  [Return matches]         │
    │                             │──────────────────────────>│
Key points:
- Zero configuration - Works automatically, no code changes needed
- Non-blocking - Storage operations return immediately; embeddings are generated asynchronously
- Permission-aware - Results are filtered based on the user's tokens
- Cross-app - Find data from any app the user has access to
Using Search in Tools
Declare the Capability
Add search to your tool's capabilities:
{
"name": "find_relevant_data",
"description": "Finds data relevant to a query",
"capabilities": ["search"],
"input_schema": {
"type": "object",
"properties": {
"query": { "type": "string" }
},
"required": ["query"]
},
"output_schema": {
"type": "object",
"properties": {
"results": { "type": "array" }
}
}
}
Basic Search
export default async function find_relevant_data(
input: Input,
capabilities: Capabilities
): Promise<Output> {
const results = await capabilities.search(input.query, {
types: ['storage'],
limit: 10,
minSimilarity: 0.7
});
return { results };
}
Search Options
| Option | Type | Default | Description |
|---|---|---|---|
| types | string[] | all | What to search: 'tools', 'objects', 'examples', 'storage' |
| limit | number | 20 | Maximum results |
| minSimilarity | number | 0.5 | Minimum similarity score (0-1) |
| locations | object | - | Filter to specific app paths |
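To make the defaults concrete, here is a minimal sketch of how omitted options could be filled in. The names SearchOptions, ResolvedSearchOptions, and resolveSearchOptions are hypothetical and used only for illustration; the platform applies its defaults internally and may do so differently.

```typescript
// Hypothetical option shape mirroring the table above; the platform's real
// type may differ.
interface SearchOptions {
  types?: string[];
  limit?: number;
  minSimilarity?: number;
  locations?: Record<string, string>;
}

interface ResolvedSearchOptions {
  types: string[];
  limit: number;
  minSimilarity: number;
  locations: Record<string, string> | undefined;
}

// The four searchable types listed in the table (the default is all of them).
const ALL_TYPES: string[] = ['tools', 'objects', 'examples', 'storage'];

// Fill in the documented defaults for any option the caller left out.
function resolveSearchOptions(options: SearchOptions = {}): ResolvedSearchOptions {
  const minSimilarity = options.minSimilarity ?? 0.5;
  // The table documents the score range as 0-1, so reject anything outside it.
  if (minSimilarity < 0 || minSimilarity > 1) {
    throw new RangeError('minSimilarity must be between 0 and 1');
  }
  return {
    types: options.types ?? ALL_TYPES.slice(),
    limit: options.limit ?? 20,
    minSimilarity,
    locations: options.locations,
  };
}
```

Calling resolveSearchOptions({ limit: 5 }) keeps the explicit limit and falls back to the documented defaults for everything else.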
Search Results
interface StorageSearchResult {
type: 'storage';
app: string; // App that owns the data
path: string; // Storage path
securityKey: string; // Permission boundary
description?: string; // From storage.json
preview?: string; // First ~200 chars
similarity: number; // Match score 0-1
}
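Because preview and description are optional, code that consumes results should handle their absence. The sketch below shows one way to do that; bestMatchPerApp and summarize are hypothetical helper names, not part of the platform, and the interface is repeated only so the block is self-contained.

```typescript
// Result shape copied from the interface above.
interface StorageSearchResult {
  type: 'storage';
  app: string;
  path: string;
  securityKey: string;
  description?: string;
  preview?: string;
  similarity: number;
}

// Keep only the highest-scoring result for each app.
function bestMatchPerApp(
  results: StorageSearchResult[]
): Map<string, StorageSearchResult> {
  const best = new Map<string, StorageSearchResult>();
  for (const result of results) {
    const current = best.get(result.app);
    if (current === undefined || result.similarity > current.similarity) {
      best.set(result.app, result);
    }
  }
  return best;
}

// Render one result as a single line. Prefer the preview, then the
// description, then a fixed placeholder when neither is present.
function summarize(result: StorageSearchResult): string {
  const text = result.preview ?? result.description ?? '(no preview)';
  return `${result.app}:${result.path} (${result.similarity.toFixed(2)}): ${text}`;
}
```

A tool could pass the summarized lines to the orchestrator as context instead of returning raw result objects.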
Location Filtering
Narrow searches to specific paths within an app:
// Search only in Slack conversations
const results = await capabilities.search('meeting notes', {
types: ['storage'],
locations: {
'@malv/slack': '/teams/team-123/conversations'
}
});
Location Patterns
| Pattern | Meaning |
|---|---|
| '*' or '/*' | All paths in the app |
| '/teams/team-123/*' | All paths under this prefix |
| '/teams/team-123/conversations' | Paths starting with this structure |
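The patterns above can be read as prefix matching. The following is a rough sketch of that reading, not the platform's actual matcher: matchesLocation and inLocations are hypothetical names, and the code assumes that prefixes match on whole path segments and that a trailing '/*' behaves the same as the bare prefix.

```typescript
// Decide whether a storage path falls under a location pattern.
// Assumption: matching is by whole path segments, so the pattern
// '/teams/team-123' does not match '/teams/team-1234'.
function matchesLocation(pattern: string, path: string): boolean {
  // '*' and '/*' select every path in the app.
  if (pattern === '*' || pattern === '/*') {
    return true;
  }
  // Drop a trailing wildcard or slash to get the bare prefix.
  let prefix = pattern.endsWith('/*') ? pattern.slice(0, -2) : pattern;
  if (prefix.endsWith('/')) {
    prefix = prefix.slice(0, -1);
  }
  return path === prefix || path.startsWith(prefix + '/');
}

// Check a result against a locations map keyed by app name.
// Apps that are not listed in the map are treated as excluded.
function inLocations(
  locations: Record<string, string>,
  app: string,
  path: string
): boolean {
  const pattern = locations[app];
  return pattern !== undefined && matchesLocation(pattern, path);
}
```

Under these assumptions, '/teams/team-123/conversations' matches '/teams/team-123/conversations/c1.json' but not '/teams/team-123/documents/d.json'.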
Multiple Locations
Search across multiple apps or paths:
const results = await capabilities.search('quarterly report', {
types: ['storage'],
locations: {
'@malv/slack': '/teams/team-123/conversations',
'@malv/drive': '/teams/team-123/documents',
'@malv/notes': '*' // All paths in notes
}
});
Making Data Searchable
Add Descriptions
Add descriptions to storage paths in storage.json to improve search relevance:
{
"same_app": {
"/teams/<token.team>/contacts/": {
"operations": ["read", "write"],
"description": "Customer contact information including email, phone, and address"
},
"/teams/<token.team>/projects/": {
"operations": ["read", "write", "list"],
"description": "Project details, timelines, and team assignments"
}
}
}
The description is included in the embedding, making it easier to match queries like "customer phone numbers" or "project timelines."
Write Searchable Data
Include meaningful text in your stored data:
// Good - searchable fields
await storage.put('/teams/team-123/contacts/john.json', {
name: 'John Doe',
role: 'Sales Manager',
company: 'Acme Corp',
notes: 'Met at conference, interested in enterprise plan'
});
// Less searchable - mostly IDs
await storage.put('/teams/team-123/contacts/john.json', {
id: 'contact-123',
roleId: 'role-456',
companyId: 'company-789'
});
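When your data model is ID-based, one option is to resolve the IDs into readable text just before writing. Here is a small sketch of that idea; ContactRecord, Lookups, and toSearchableContact are hypothetical names, and how your app looks up role and company names is up to you.

```typescript
// Hypothetical ID-only record, as in the "less searchable" example.
interface ContactRecord {
  id: string;
  roleId: string;
  companyId: string;
}

// Hypothetical lookup tables mapping IDs to human-readable names.
interface Lookups {
  roles: Record<string, string | undefined>;
  companies: Record<string, string | undefined>;
}

// Keep the IDs for your own code, and add readable fields next to them
// so the stored document carries text that is meaningful to search.
function toSearchableContact(name: string, record: ContactRecord, lookups: Lookups) {
  return {
    ...record,
    name,
    // Fall back to the raw ID when no readable name is known.
    role: lookups.roles[record.roleId] ?? record.roleId,
    company: lookups.companies[record.companyId] ?? record.companyId,
  };
}
```

The returned object can then be passed to storage.put in place of the ID-only record.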
Exclude Sensitive Data
For data that shouldn't be searchable, disable embedding generation:
{
"same_app": {
"/cache/": {
"operations": ["read", "write"],
"skipEmbedding": true
},
"/users/<token.accountId>/private/": {
"operations": ["read", "write"],
"skipEmbedding": true
}
}
}
Use skipEmbedding: true for:
- Sensitive data (API keys, passwords, personal info)
- Temporary/cache data
- Large binary files
- Frequently changing data that doesn't need to be searchable
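To check which paths a configuration like the one above would exclude, you can model the rule lookup yourself. The sketch below is an illustration only: PathRule, templateToRegExp, and shouldSkipEmbedding are hypothetical names, and it assumes that a `<token.x>` placeholder stands for exactly one path segment and that the first matching rule wins. The platform's real resolution logic may differ.

```typescript
// Hypothetical shape of one entry under "same_app" in storage.json.
interface PathRule {
  operations: string[];
  description?: string;
  skipEmbedding?: boolean;
}

// Turn a path template such as '/users/<token.accountId>/private/' into a
// prefix-matching RegExp. Each placeholder matches one path segment.
function templateToRegExp(template: string): RegExp {
  const placeholder = /^<token\.[A-Za-z0-9_]+>$/;
  const source = template
    .split(/(<token\.[A-Za-z0-9_]+>)/)
    .map((part) =>
      placeholder.test(part)
        ? '[^/]+'
        : part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') // escape literal text
    )
    .join('');
  return new RegExp('^' + source);
}

// Return true when the first rule matching the path sets skipEmbedding.
function shouldSkipEmbedding(rules: Record<string, PathRule>, path: string): boolean {
  for (const template of Object.keys(rules)) {
    if (templateToRegExp(template).test(path)) {
      return rules[template].skipEmbedding === true;
    }
  }
  return false; // no rule matched: embeddings are generated by default
}
```

A check like this can run in a unit test to confirm that sensitive paths in your storage.json are excluded before you ship a change.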
Security
Search results are automatically filtered based on the user's tokens. The security model works like this:
- When data is written, a security key is derived from the token requirements
- When a user searches, their tokens generate matching security keys
- Only results with matching keys are returned
This means:
- Users only see data they have permission to access
- No additional permission checks needed in your tool
- Cross-app data sharing works automatically if tokens align
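The three steps above can be modelled in a few lines. This is a toy sketch under stated assumptions, not the platform's implementation: Token, deriveSecurityKey, keysForTokens, and filterByPermission are hypothetical names, and the key here is a readable string, whereas the real key format is not documented in this section.

```typescript
// Hypothetical token shape: issued by an app, of a given type, with claims.
interface Token {
  app: string;                    // e.g. '@malv/auth'
  type: string;                   // e.g. 'account'
  claims: Record<string, string>; // e.g. { teamId: 'team-123' }
}

// Toy key derivation: the same issuing app, token type, claim name, and
// claim value always produce the same key.
function deriveSecurityKey(app: string, type: string, claim: string, value: string): string {
  return `${app}|${type}|${claim}=${value}`;
}

// Step 2: the searching user's tokens generate the keys they can match.
function keysForTokens(tokens: Token[]): Set<string> {
  const keys = new Set<string>();
  for (const token of tokens) {
    for (const [claim, value] of Object.entries(token.claims)) {
      keys.add(deriveSecurityKey(token.app, token.type, claim, value));
    }
  }
  return keys;
}

// Step 3: only results whose key the user can generate are returned.
function filterByPermission<T extends { securityKey: string }>(
  results: T[],
  tokens: Token[]
): T[] {
  const allowed = keysForTokens(tokens);
  return results.filter((result) => allowed.has(result.securityKey));
}
```

In this model, data written by two different apps under the same token requirements ends up with the same key, which is why a single search can return both.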
How Security Keys Work
Storage paths with token requirements generate security keys:
{
"same_app": {
"/teams/<token.teamId>/data/": {
"tokenType": "account",
"tokenFromApp": "@malv/auth"
}
}
}
If two apps use the same token structure, their data shares security keys and can be searched together:
// @malv/notes storage.json
{ "teamId": { "app": "@malv/auth", "token": "account" } }
// @malv/files storage.json
{ "teamId": { "app": "@malv/auth", "token": "account" } }
Both apps' team data is searchable together because they share the same permission boundary.
Example: Cross-App Discovery
export default async function find_context(
input: Input,
capabilities: Capabilities
): Promise<Output> {
// Search across all apps the user has access to
const results = await capabilities.search(input.query, {
types: ['storage'],
limit: 5
});
// Results might include:
// - @malv/tables: /teams/team-123/tables/customers/rows/row-1.json
// - @malv/contacts: /teams/team-123/contacts/john-doe.json
// - @malv/crm: /teams/team-123/deals/deal-456/notes.json
// Use the results to provide context
return {
relevantData: results.map(r => ({
app: r.app,
path: r.path,
preview: r.preview,
similarity: r.similarity
}))
};
}
Best Practices
Write specific descriptions - The description directly impacts search quality:
// Good - specific and descriptive
"description": "Customer contact information including email, phone, and address"
// Bad - too generic
"description": "Data file"
Think about search use cases - Consider how users might search for your data:
| Data Type | Likely Queries | Description |
|---|---|---|
| Projects | "my projects", "project settings" | "Project configuration and metadata" |
| Contacts | "customer info", "phone numbers" | "Contact details including name, email, phone" |
| Notes | "meeting notes", "ideas about X" | "User notes and documentation" |
Use appropriate boundaries - Design storage paths so that each prefix maps to the permission boundary you intend:
// Team-wide data - all team members can search
"/teams/<token.teamId>/shared/"
// User-specific data - only that user can search
"/users/<token.accountId>/private/"
Consider search in your data model - When designing what to store, think about what text will be meaningful for search. Store human-readable descriptions alongside IDs when possible.