screenpipe-api

name: screenpipe-api description: Query the user's data via the local screenpipe REST API at localhost:3030 — screen recordings, audio, UI elements, usage analytics, meetings, connected services, and the user's persistent memory store. Use for questions about screen activity, meetings, apps, productivity, media export, retranscription, connections, OR to save / remember / store information for later (POST /memories — survives across sessions, queryable by external agents).

Screenpipe API

Local REST API at http://localhost:3030.

Authentication

Every request needs auth. $SCREENPIPE_LOCAL_API_KEY is already in your env; without it you get 403.

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/..."

No-auth endpoints: /health, /ws/health, /audio/device/status, /connections/oauth/callback, /frames/*, /notify, /pipes/store/*.

Context Window Protection

Responses can be large. Write curl output to a file (-o /tmp/sp.json), check size (wc -c), and if over ~5KB read only the first 50-100 lines / extract with jq. Never dump full large responses into context.

Cut tokens at the source on list endpoints (/search, /elements): add &format=csv (or tsv) for a columnar table (column names written once instead of per-row keys — ~70% cheaper on uniform rows like elements), and &fields=a,b,c for only the columns you need (dotted paths like content.text). Text-heavy ocr/audio barely benefit — use fields + max_content_length there instead.

1. Activity Summary — `GET /activity-summary`

Default broad-context call. Bundles apps, windows, key_texts, audio, edited_files, recording health, top memories, deduped screen+audio snippets, and a data_status/query_status/guidance triple.

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/activity-summary?start_time=30m%20ago&end_time=now"

Required: start_time, end_time. Optional: app_name, q (filters memories+snippets, drives query_status); include_recording|memories|snippets|guidance=false to slim (each defaults true); max_snippets, max_snippet_chars, max_memories. For a lean time-tracking sweep also set include_key_texts=false (biggest win), include_apps=false, include_windows=false — total_active_minutes + per-app/window minutes + the status triple still return.

data_status ∈ ok|empty_but_recording|no_capture_in_range|not_recording — check before claiming "no activity".
query_status ∈ not_requested|matched|no_query_matches; guidance.next_best_query is a ready hint when empty.
Escalate to /search only for verbatim quotes / frame_ids.

2. Search — `GET /search`

Use when /activity-summary says ok but you need verbatim quotes, media paths, frame IDs, or a specific match.

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/search?q=QUERY&content_type=all&limit=10&start_time=1h%20ago"

Parameter	Required	Description
`q`	No	Keywords. Avoid for audio — transcriptions are noisy, `q` over-filters.
`content_type`	No	`all` (default), `accessibility`, `audio`, `input`, `ocr`, `memory`. Screen text is primarily the accessibility tree; OCR is the fallback for apps without it (videos, games, remote desktops).
`limit`	No	Default 20. Keep ≤20 to protect context.
`offset`	No	Pagination. Default 0.
`start_time`	Yes	ISO 8601 or relative (`16h ago`, `2d ago`, `30m ago`).
`end_time`	No	Defaults to now (`now`, `1h ago`).
`app_name`	No	Substring, e.g. "Google Chrome", "Slack".
`window_name`	No	Window title substring.
`speaker_name`	No	Filter audio by speaker (case-insensitive partial).
`focused`	No	Only focused windows.
`tags`	No	Comma-separated; returns items carrying ALL of them (`person:ada,project:atlas`). Exact match.
`include_related`	No	With `tags`, also return a `related` map of co-occurring tags (people/projects/workflows), most-frequent first.
`max_content_length`	No	Middle-truncate each result's text.
`format`	No	`json` (default), `csv`, `tsv`/`table`. CSV is lossless; TSV collapses newlines.
`fields`	No	Column allowlist of dotted paths, e.g. `type,content.app_name,content.text`.

Critical rules: always include start_time (unbounded queries timeout) · "recent" = 30 min, "today" = since midnight, "yesterday" = yesterday's range · if /search is empty, fall back to /activity-summary and check data_status before saying "no data" · on timeout, narrow the range.

Tags link people/projects/topics across screen, audio, and memories under one namespace (person:ada, project:atlas, topic:pricing). Add to a frame/audio: POST /tags/vision/{frame_id} or POST /tags/audio/{chunk_id} body {"tags":["person:ada"]}; to a memory: tags in POST /memories. Retrieve: GET /search?tags=person:ada&start_time=30d%20ago (add content_type=memory for memories). Frames are pruned by retention — tag a memory for durable links (memories carry created_at + a frame_id back to the moment). include_related=true returns co-occurring tags grouped by namespace, replacing 2-3 follow-up calls.

Response: {"data": [{"type":"OCR","content":{"frame_id":...,"text":...,"app_name":...}}, {"type":"Audio","content":{"chunk_id":...,"transcription":...,"speaker":{"name":...}}}, {"type":"Input","content":{...}}], "pagination":{"limit":10,"offset":0,"total":42}}.

3. Elements — `GET /elements`

Lightweight FTS over UI elements (~100-500 bytes each vs 5-20KB from /search). Uniform rows, so format=csv pays off most.

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/elements?frame_id=12345&format=csv&fields=role,text,bounds.left,bounds.top"

Params: q, frame_id, source (accessibility|ocr), role, start_time, end_time, app_name, limit, offset, format, fields.

Frame context (accessibility text, parsed nodes, extracted URLs): GET /frames/{id}/context.

Roles are not normalized across platforms — use the right one for the user's OS:

Concept	macOS	Windows	Linux
Button	`AXButton`	`Button`	`Button`
Static text	`AXStaticText`	`Text`	`Label`
Link	`AXLink`	`Hyperlink`	`Link`
Text field	`AXTextField`	`Edit`	`Entry`
Menu item	`AXMenuItem`	`MenuItem`	`MenuItem`
Checkbox	`AXCheckBox`	`CheckBox`	`CheckBox`
Web area	`AXWebArea`	`Pane`	`DocumentWeb`
Heading	`AXHeading`	`Header`	`Heading`
List item	`AXRow`	`ListItem`	`ListItem`

OCR-only roles (accessibility-unavailable fallback): line, word, block, paragraph, page.

4. Frames (Screenshots) — `GET /frames/{frame_id}`

curl -o /tmp/frame.png "http://localhost:3030/frames/12345"

Raw PNG. Never fetch more than 2-3 frames per query (~1000-2000 tokens each).

5. Media Export — `POST /export`

Real-time MP4 (screen frames at true timestamps + synced mic audio). Duration matches the wall-clock span — NOT a timelapse.

curl -X POST http://localhost:3030/export -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" -d '{"start": "5m ago", "end": "now"}'

Fields: start+end (ISO 8601 or relative; end defaults to now), OR meeting_id for a whole meeting. Optional output_path (absolute, e.g. ~/Downloads/clip.mp4); else lands in the data dir's exports/. Returns {output_path, frame_count, audio_chunk_count, duration_secs, file_size_bytes} — show output_path as inline code. Long ranges take minutes.

ffmpeg on audio file_path from search results (always -y, save to ~/.screenpipe/exports/):

ffmpeg -y -i audio.mp4 -q:a 2 out.mp3                              # convert
ffmpeg -y -i in.mp4 -ss 00:01:00 -to 00:05:00 -q:a 2 clip.mp3      # trim
ffmpeg -y -i in.mp4 -t 10 -vf "fps=10,scale=640:-1" out.gif        # GIF

6. Retranscribe — `POST /audio/retranscribe`

curl -X POST http://localhost:3030/audio/retranscribe -H "Content-Type: application/json" \
  -d '{"start": "1h ago", "end": "now"}'

Optional: engine (deepgram, screenpipe-cloud, whisper-large, whisper-large-v3-turbo, whisper-large-v3-turbo-quantized, qwen3-asr, parakeet, parakeet-mlx, openai-compatible), vocabulary (array of {"word","replacement"}), prompt (Whisper topic context). Keep ranges ≤1h. Show old vs new.

7. Raw SQL — `POST /raw_sql`

curl -X POST http://localhost:3030/raw_sql -H "Content-Type: application/json" \
  -d '{"query": "SELECT ... LIMIT 100"}'

Rules: every SELECT needs LIMIT · always filter by time (datetime('now','-24 hours')) · read-only. Never use frame counts for time estimates — frames are event-driven; use /activity-summary for screen time.

Table	Key Columns	Time Column
`frames`	`app_name`, `window_name`, `browser_url`, `focused`	`timestamp`
`ocr_text`	`text`, `app_name`, `window_name`	join via `frame_id`
`elements`	`source`, `role`, `text`, `bounds_*`	join via `frame_id`
`audio_transcriptions`	`transcription`, `device`, `speaker_id`, `is_input_device`	`timestamp`
`audio_chunks`	`file_path`	`timestamp`
`speakers`	`name`, `metadata`	—
`ui_events`	`event_type`, `app_name`, `window_title`, `browser_url`	`timestamp`
`accessibility`	`app_name`, `window_name`, `text_content`, `browser_url`	`timestamp`
`meetings`	`meeting_app`, `title`, `attendees`, `detection_source`	`meeting_start`
`memories`	`content`, `source`, `tags`, `importance`	`created_at`

-- Most used apps (last 24h)
SELECT app_name, COUNT(*) AS frames FROM frames
WHERE timestamp > datetime('now','-24 hours') AND app_name IS NOT NULL
GROUP BY app_name ORDER BY frames DESC LIMIT 20;

-- Context switches per hour
SELECT strftime('%H:00', timestamp) AS hour, COUNT(*) AS switches
FROM ui_events WHERE event_type='app_switch' AND timestamp > datetime('now','-24 hours')
GROUP BY hour ORDER BY hour LIMIT 24;

Patterns: GROUP BY date(timestamp) (daily), GROUP BY strftime('%H:00', timestamp) (hourly), HAVING frames > 5 (filter noise).

8. Connections — `GET /connections`

curl http://localhost:3030/connections            # list all integrations (40+)
curl http://localhost:3030/connections/telegram   # saved creds for a webhook/token integration

Each entry's description is self-describing — for control surfaces (browsers, gateways, OAuth proxies) it includes the exact endpoint + body shape. Read it before guessing. If not connected, tell the user to set it up in Settings > Connections.

Credential integrations — GET /connections/<id> returns fields to call the service directly:

Telegram: bot_token+chat_id → POST https://api.telegram.org/bot{token}/sendMessage
Slack / Teams: webhook_url → POST {webhook_url} with {"text":...}
Discord: webhook_url → POST {webhook_url} with {"content":...}
Todoist: api_token → POST https://api.todoist.com/api/v1/tasks (Bearer)
Email: smtp_host, smtp_port, smtp_user, smtp_pass, from_address

OAuth/proxy integrations — tokens live in SecretStore, never exposed via GET. Call the local proxy; it injects auth and forwards upstream. There is no /connections/<id>/token endpoint.

# GitHub create issue (repo from pipe settings). Same shape for comments: .../issues/42/comments {"body":...}
curl -X POST http://localhost:3030/connections/github/proxy/repos/OWNER/REPO/issues \
  -H "Content-Type: application/json" -d '{"title":"Bug","body":"Steps..."}'

# Generic OAuth proxy (Zoom, Vercel, Google Docs, Microsoft 365, ...)
curl -X POST http://localhost:3030/connections/<id>/proxy/<upstream-api-path> \
  -H "Content-Type: application/json" -d '{...}'

Don't call https://api.github.com/... directly from a pipe — use the proxy.

Calendar — use calendar endpoints for appointments/upcoming events. If /connections shows ics-calendar.connected: true, include ICS results too before saying the calendar is empty:

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  "http://localhost:3030/connections/calendar/events?hours_back=0&hours_ahead=72"
# also: /connections/google-calendar/events , /connections/ics-calendar/events

Browser control (owned-default) — an embedded browser, shown in the chat. Cookies persist (isolated profile); password fields are stripped from snapshots. Try snapshot first; reach for eval only when needed.

# Navigate → {"ok":true,"url":"<final>"}
curl -X POST -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" -H "Content-Type: application/json" \
  -d '{"url":"https://en.wikipedia.org/wiki/Giraffe"}' \
  http://localhost:3030/connections/browsers/owned-default/navigate

# Snapshot (no JS) → {title, url, tree:"[h1] ...\n  [a] ... → /href", truncated}. Best for "what's on the page?".
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  http://localhost:3030/connections/browsers/owned-default/snapshot

# Eval (escape hatch) — arbitrary JS return value, for clicks / values the snapshot tree omits.
curl -X POST -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" -H "Content-Type: application/json" \
  -d '{"code":"return [...document.querySelectorAll(\".title>a\")].slice(0,5).map(a=>a.innerText)"}' \
  http://localhost:3030/connections/browsers/owned-default/eval

9. Meetings — `GET /meetings`, `PUT /meetings/:id`

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings?start_time=1d%20ago&end_time=now&limit=10"
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings/42"

# Partial update — omitted fields stay as-is. Read first and re-include existing `note` so user notes survive.
curl -X PUT http://localhost:3030/meetings/42 -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" -d '{"title":"Q3 planning","note":"<existing>\n\n## Summary\n<summary>"}'

Detected from calendar, app detection, window titles, UI elements, multi-speaker audio. q is a case-insensitive substring over title/attendees/notes. Uses PUT, not PATCH. Fields: id, meeting_start, meeting_end (null if ongoing), meeting_app, title?, attendees?, note?, detection_source. Also queryable via raw SQL on the meetings table.

10. Speakers — `POST /speakers/*`

All POST with Content-Type: application/json unless noted:

GET /speakers/search?name=John — search by name
GET /speakers/unnamed?limit=20 — unnamed speakers (for labeling)
GET /speakers/similar?speaker_id=29&limit=5 — similar by voice embedding
/speakers/update {"id":29,"name":"Jordan"} — rename/metadata
/speakers/reassign {"audio_chunk_id":456,"new_speaker_name":"Jordan","propagate_similar":true} — returns new_speaker_id, transcriptions_updated, old_assignments (for undo)
/speakers/undo-reassign {"old_assignments":[{"transcription_id":1,"old_speaker_id":29}]}
/speakers/merge {"speaker_to_keep_id":5,"speaker_to_merge_id":29}
/speakers/hallucination {"speaker_id":29} — mark false detection
/speakers/delete {"id":29} — also removes audio chunk files

"That was actually Jordan, not Karishma": find the audio result's chunk_id → POST /speakers/reassign with audio_chunk_id + new_speaker_name; propagate_similar:true (default) also fixes similar chunks.

11. Memories — High-Signal Persistent Knowledge

Memories are the highest-signal source — curated facts, preferences, decisions, project context distilled from hours of data. If you're calling /search, also query /memories: search gives you what happened, memories give you what matters and why. Query memories first when answering about preferences/decisions/past context, building background on a project/person/workflow, or generating any summary/recommendation/plan.

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?q=preference&limit=20"          # FTS search
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?min_importance=0.5&limit=20"    # recent, high importance
curl -X POST http://localhost:3030/memories -H "Content-Type: application/json" \
  -d '{"content":"User prefers dark mode","source":"user","tags":["preference","ui"],"importance":0.7}'                   # create
curl -X PUT http://localhost:3030/memories/1 -H "Content-Type: application/json" -d '{"content":"...","importance":0.8}' # update
curl -X DELETE http://localhost:3030/memories/1                                                                          # delete

GET /memories params: q, source, tags, min_importance, start_time, end_time, limit, offset. Memories also come via GET /search?content_type=memory (NOT included in content_type=all — ask explicitly), which adds tags + include_related. When you learn a genuinely useful long-lived fact, store it with importance 0.0-1.0 — not transient observations.

12. Notifications — `POST http://localhost:11435/notify`

Notify the desktop UI. This is the Tauri sidecar (port 11435), not the main API. body supports markdown (**bold**, `code`, [text](url)).

curl -X POST http://localhost:11435/notify -H "Content-Type: application/json" \
  -d '{"title":"3 new voice memos","body":"found recordings from today"}'

# Markdown body + action buttons. action types: "link" (web), "deeplink" (screenpipe://), "dismiss".
curl -X POST http://localhost:11435/notify -H "Content-Type: application/json" \
  -d '{"title":"Meeting summary","body":"**Q3 Planning** saved\n\nopen [notes](~/Documents/q3.md)","actions":[{"id":"view","label":"view","type":"deeplink","url":"screenpipe://timeline"},{"id":"skip","label":"skip","type":"dismiss"}]}'

Fields: title* , body* (markdown), type (default "pipe"), timeout/autoDismissMs (ms, default 20000), actions (buttons). Body links: web URL → browser, file path (~/notes.md, /var/log/app.log) → default app, screenpipe://... → in-app. Returns {"success":true}.

13. Other Endpoints

curl http://localhost:3030/health        # health check
curl http://localhost:3030/audio/list    # audio devices
curl http://localhost:3030/vision/list   # monitors

Deep Links & Videos

Reference real moments with clickable links (only IDs/timestamps from actual results — never fabricate):

[10:30 AM — Chrome](screenpipe://frame/12345) — screen results (use frame_id)
[meeting at 3pm](screenpipe://timeline?timestamp=ISO8601) — audio results (use timestamp)

Show a search result's file_path as inline code to make it a playable video: `/Users/name/.screenpipe/data/monitor_1_..._10-30-00.mp4`.

Screenpipe API

Authentication

Context Window Protection

1. Activity Summary — GET /activity-summary

2. Search — GET /search

3. Elements — GET /elements

4. Frames (Screenshots) — GET /frames/{frame_id}

5. Media Export — POST /export

6. Retranscribe — POST /audio/retranscribe

7. Raw SQL — POST /raw_sql

8. Connections — GET /connections

9. Meetings — GET /meetings, PUT /meetings/:id

10. Speakers — POST /speakers/*