Universal social search
Single endpoint that fans out across 12 social platforms in parallel — ranked, clustered, and enriched with real-people comments.
GET /v1/search/everywhere
One endpoint, 12 sources. Send a query, get a ranked + clustered set of results pulled from Reddit, X, YouTube, TikTok, Instagram, Hacker News, Polymarket, GitHub, Threads, Pinterest, Perplexity, and Tavily — with the top comments from each post included so you see real people's sentiment, not just titles.
Credit cost: flat 20 credits per call regardless of how many sources respond or how many comments are enriched.
Auth: x-api-key header, same as every other endpoint.
Quick example
curl 'https://api.socialcrawl.dev/v1/search/everywhere?query=kanye+west' \
-H 'x-api-key: sc_...'{
"success": true,
"data": {
"plan": { "intent": "breaking_news", "freshness_mode": "strict_recent", "subqueries": [...] },
"items": [
{
"candidate_id": "https://reddit.com/r/Music/comments/.../...",
"source": "reddit",
"title": "Kanye West travel to UK blocked by government",
"url": "https://www.reddit.com/r/Music/comments/.../...",
"rerank_score": 88,
"final_score": 67.16,
"source_items": [
{
"engagement": { "score": 17508, "num_comments": 1470, "upvote_ratio": 0.96 },
"media": { "thumbnail_url": "https://..." },
"metadata": {
"top_comments": [
{
"score": 4823,
"excerpt": "Surprised it took this long honestly. Wireless made the right call — they couldn't afford the fallout if he started ranting on stage.",
"author": "u/some_real_user",
"url": "https://reddit.com/r/Music/comments/.../c1",
"date": "2026-04-08T12:34:56Z"
}
]
}
}
]
}
],
"items_by_source": { "reddit": [...], "hackernews": [...], ... },
"clusters": [...],
"sources_called": [...],
"sources_failed": {}
}
}Real-people comments — metadata.top_comments[]
This is the headline feature. For every result whose source has a comments endpoint we expose, we automatically fetch the top-scoring comments and attach them under data.items[i].source_items[0].metadata.top_comments[].
| Source | Comments populate from | Auth |
|---|---|---|
/v1/reddit/post/comments | bundled | |
| TikTok | /v1/tiktok/video/comments | bundled |
/v2/instagram/post/comments | bundled | |
| YouTube | /v1/youtube/video/comments | bundled |
| HackerNews | Algolia HN thread tree (/api/v1/items/{id}) | none |
| GitHub | /repos/{o}/{r}/issues/{n}/comments?sort=reactions | bundled |
Sources without a comments endpoint render no top_comments: twitter-ai-search (synthesis), threads, pinterest, polymarket, perplexity, tavily.
TopComment shape
{
score: number | null, // upvote / like / points count, source-specific
excerpt: string, // up to 300 chars, HTML stripped
author: string | null, // null if [deleted]/[removed]
url: string | null, // direct comment URL when source exposes one
date: string | null // ISO timestamp when available
}Sorted by score descending. Capped at 5 comments per result. Max 300 characters per excerpt.
Why this matters
Search engines that don't expose comments are reading the chrome of the social web. The post body is the headline; the discussion underneath is where actual opinions, contrarian takes, lived experience, and signal vs noise live. By default, every /v1/search/everywhere response surfaces what real people actually said about each result — no second API call needed.
Streaming
Add Accept: text/event-stream to get an SSE stream instead of a JSON envelope. You'll receive these chunks in order:
{ "type": "meta", "request_id": "...", "query": "...", "plan": {...}, "sources_planned": [...] }
{ "type": "source_started", "source": "reddit" }
{ "type": "items", "source": "reddit", "items": [...], "duration_ms": 1234 }
{ "type": "source_failed", "source": "twitter-ai-search", "error": {...} }
{ "type": "ranked_partial", "scores": [{ "candidate_id": "...", "rerank_score": 88 }, ...] }
{ "type": "ranked_final", "items": [...] }
{ "type": "comments_enriched", "candidate_id": "...", "source": "reddit",
"comments": [{ "score": 4823, "excerpt": "...", "author": "...", "url": "...", "date": "..." }] }
{ "type": "clusters", "clusters": [...] }
{ "type": "done", "summary": {...} }The comments_enriched chunk is per-candidate — one fires for each result that had its comments fetched successfully. Frontends can key on candidate_id to merge the comments into the row that's already on screen, no layout reflow.
Order guarantees:
- All
comments_enrichedchunks land before the terminaldone. - They may interleave with
ranked_partial/ranked_final. - Empty results (source has no enrichment, fetch failed, or all comments were
[deleted]) don't emit a chunk.
Parameters
| Param | Required | Type | Description |
|---|---|---|---|
query | yes | string (1–512 chars) | The search query. |
lookback_days | no | integer | Days to look back. Default 30. Mutually exclusive with from_date / to_date. |
from_date | no | ISO YYYY-MM-DD | Lower bound. Mutually exclusive with lookback_days. |
to_date | no | ISO YYYY-MM-DD | Upper bound. Defaults to today when from_date is set alone. |
sources | no | CSV string | Allowlist of source names (e.g. reddit,youtube,github). Mutually exclusive with exclude. |
exclude | no | CSV string | Blocklist of source names. Mutually exclusive with sources. |
Latency expectations
The endpoint runs through a multi-stage pipeline:
- Plan — deterministic (~5ms) + parallel LLM refinement (8s timeout).
- Fan-out — all 12 sources fire concurrently. Per-source timeout 8s (12s for slow sources).
- Annotate + dedupe + fusion — pure compute, ~10ms.
- Comment enrichment — runs in parallel with rerank. Per-candidate timeout 5s. Top 40 candidates × ~6 enrichable sources = up to ~25–40 internal calls. Failures are non-fatal (the affected row just renders no
top_comments). - Rerank — LLM call against gpt-5.4-nano. 15s timeout, partial scores kept on timeout.
- Cluster — pure compute, ~10ms.
Streaming consumers see results progressively (within ~1s of the first source finishing). Sync consumers wait for the full pipeline, typically 12–25s.
Failure modes
- One source times out — pipeline continues with the others; the source appears in
sources_failed: { source: errorMessage }. SSE consumers receive asource_failedchunk. - Comment fetch fails for a candidate — non-fatal. The row's
top_commentsis empty; the response is otherwise unchanged. - Every source fails (zero-floor) — sync returns 502 + auto-refunds the 20cr; stream emits
done(refunded: true). - LLM rerank times out — partial scores already streamed are kept; un-scored candidates fall back to local relevance × freshness × source-quality heuristic.
Source coverage notes
- Pinterest — upstream currently returns
pins: []for every query (vendor issue, seeLAST30DAYS-PARITY-NOTES.md). The Pinterest section may be empty even on broad queries. - Twitter — uses
/v1/twitter/ai-search(Grok-backed synthesis) rather than raw tweet search. The synthesis becomes its own rankable item alongside per-source items.
Related
- Architecture:
docs/features/social-search/UNIVERSAL-SEARCH.md - Comment enrichment deep dive:
docs/features/API/new-additions/COMMENT-ENRICHMENT.md - Frontend SERP architecture:
docs/features/social-search/FRONTEND-ARCHITECTURE.md
