
Universal social search

Single endpoint that fans out across 12 social platforms in parallel — ranked, clustered, and enriched with real-people comments.

GET /v1/search/everywhere

One endpoint, 12 sources. Send a query, get a ranked + clustered set of results pulled from Reddit, X, YouTube, TikTok, Instagram, Hacker News, Polymarket, GitHub, Threads, Pinterest, Perplexity, and Tavily — with the top comments from each post included so you see real people's sentiment, not just titles.

Credit cost: flat 20 credits per call regardless of how many sources respond or how many comments are enriched.

Auth: x-api-key header, same as every other endpoint.


Quick example

curl 'https://api.socialcrawl.dev/v1/search/everywhere?query=kanye+west' \
  -H 'x-api-key: sc_...'
{
  "success": true,
  "data": {
    "plan": { "intent": "breaking_news", "freshness_mode": "strict_recent", "subqueries": [...] },
    "items": [
      {
        "candidate_id": "https://reddit.com/r/Music/comments/.../...",
        "source": "reddit",
        "title": "Kanye West travel to UK blocked by government",
        "url": "https://www.reddit.com/r/Music/comments/.../...",
        "rerank_score": 88,
        "final_score": 67.16,
        "source_items": [
          {
            "engagement": { "score": 17508, "num_comments": 1470, "upvote_ratio": 0.96 },
            "media": { "thumbnail_url": "https://..." },
            "metadata": {
              "top_comments": [
                {
                  "score": 4823,
                  "excerpt": "Surprised it took this long honestly. Wireless made the right call — they couldn't afford the fallout if he started ranting on stage.",
                  "author": "u/some_real_user",
                  "url": "https://reddit.com/r/Music/comments/.../c1",
                  "date": "2026-04-08T12:34:56Z"
                }
              ]
            }
          }
        ]
      }
    ],
    "items_by_source": { "reddit": [...], "hackernews": [...], ... },
    "clusters": [...],
    "sources_called": [...],
    "sources_failed": {}
  }
}
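A minimal client-side sketch for building that request in Python (stdlib only). The helper name `build_search_request` and its keyword-passthrough are illustrative, not part of the API; only the URL, the `query` parameter, and the `x-api-key` header come from the docs above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.socialcrawl.dev/v1/search/everywhere"

def build_search_request(query: str, api_key: str, **params) -> tuple[str, dict]:
    """Build the GET URL and headers for /v1/search/everywhere.

    `params` may carry the optional filters documented below
    (lookback_days, from_date, to_date, sources, exclude).
    """
    # Drop unset filters so the query string stays minimal.
    qs = urlencode({"query": query, **{k: v for k, v in params.items() if v is not None}})
    headers = {"x-api-key": api_key}
    return f"{BASE_URL}?{qs}", headers

url, headers = build_search_request("kanye west", "sc_...", lookback_days=7)
# url e.g. "https://api.socialcrawl.dev/v1/search/everywhere?query=kanye+west&lookback_days=7"
```

Pass the returned URL and headers to any HTTP client; the response envelope matches the JSON shown above.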

Real-people comments — metadata.top_comments[]

This is the headline feature. For every result whose source has a comments endpoint we expose, we automatically fetch the top-scoring comments and attach them under data.items[i].source_items[0].metadata.top_comments[].

| Source      | Comments populate from                            | Auth    |
|-------------|---------------------------------------------------|---------|
| Reddit      | /v1/reddit/post/comments                          | bundled |
| TikTok      | /v1/tiktok/video/comments                         | bundled |
| Instagram   | /v2/instagram/post/comments                       | bundled |
| YouTube     | /v1/youtube/video/comments                        | bundled |
| Hacker News | Algolia HN thread tree (/api/v1/items/{id})       | none    |
| GitHub      | /repos/{o}/{r}/issues/{n}/comments?sort=reactions | bundled |

Sources without a comments endpoint render no top_comments: twitter-ai-search (synthesis), threads, pinterest, polymarket, perplexity, tavily.

TopComment shape

{
  score: number | null,    // upvote / like / points count, source-specific
  excerpt: string,         // up to 300 chars, HTML stripped
  author: string | null,   // null if [deleted]/[removed]
  url: string | null,      // direct comment URL when source exposes one
  date: string | null      // ISO timestamp when available
}

Sorted by score descending. Capped at 5 comments per result. Max 300 characters per excerpt.
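The normalization rules above (HTML stripped, 300-char excerpts, null author for deleted accounts, top 5 by score) can be sketched like this. The raw field names (`body`, `author`, etc.) are assumptions about an upstream comment payload, not a documented shape.

```python
import html
import re

MAX_EXCERPT = 300
MAX_COMMENTS = 5

def to_top_comment(raw: dict) -> dict:
    """Normalize one raw source comment into the TopComment shape."""
    # Strip HTML tags, then decode entities, then cap at 300 chars.
    text = html.unescape(re.sub(r"<[^>]+>", "", raw.get("body") or ""))
    author = raw.get("author")
    if author in ("[deleted]", "[removed]"):
        author = None
    return {
        "score": raw.get("score"),
        "excerpt": text[:MAX_EXCERPT],
        "author": author,
        "url": raw.get("url"),
        "date": raw.get("date"),
    }

def top_comments(raws: list[dict]) -> list[dict]:
    """Sort by score descending (nulls last) and cap at 5 per result."""
    out = [to_top_comment(r) for r in raws]
    out.sort(key=lambda c: c["score"] or 0, reverse=True)
    return out[:MAX_COMMENTS]
```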

Why this matters

Search engines that don't expose comments are reading the chrome of the social web. The post body is the headline; the discussion underneath is where actual opinions, contrarian takes, lived experience, and signal vs noise live. By default, every /v1/search/everywhere response surfaces what real people actually said about each result — no second API call needed.


Streaming

Add Accept: text/event-stream to get an SSE stream instead of a JSON envelope. You'll receive these chunks in order:

{ "type": "meta",              "request_id": "...", "query": "...", "plan": {...}, "sources_planned": [...] }
{ "type": "source_started",    "source": "reddit" }
{ "type": "items",             "source": "reddit", "items": [...], "duration_ms": 1234 }
{ "type": "source_failed",     "source": "twitter-ai-search", "error": {...} }
{ "type": "ranked_partial",    "scores": [{ "candidate_id": "...", "rerank_score": 88 }, ...] }
{ "type": "ranked_final",      "items": [...] }
{ "type": "comments_enriched", "candidate_id": "...", "source": "reddit",
                               "comments": [{ "score": 4823, "excerpt": "...", "author": "...", "url": "...", "date": "..." }] }
{ "type": "clusters",          "clusters": [...] }
{ "type": "done",              "summary": {...} }

The comments_enriched chunk is per-candidate — one fires for each result that had its comments fetched successfully. Frontends can key on candidate_id to merge the comments into the row that's already on screen, no layout reflow.

Order guarantees:

  • All comments_enriched chunks land before the terminal done.
  • They may interleave with ranked_partial / ranked_final.
  • Empty results (source has no enrichment, fetch failed, or all comments were [deleted]) don't emit a chunk.
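A sketch of the merge a streaming frontend would do: keep rows keyed by candidate_id and fold each comments_enriched chunk into the matching row. Only the chunk shapes come from the docs; the `apply_chunk` helper and `top_comments` key on the row are illustrative.

```python
def apply_chunk(rows: dict, chunk: dict) -> None:
    """Fold one SSE chunk into an in-memory row map keyed by candidate_id."""
    if chunk["type"] == "ranked_final":
        for item in chunk["items"]:
            rows[item["candidate_id"]] = item
    elif chunk["type"] == "comments_enriched":
        # Per-candidate chunk: merge comments into the row already on screen.
        row = rows.get(chunk["candidate_id"])
        if row is not None:
            row.setdefault("top_comments", []).extend(chunk["comments"])
```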

Parameters

| Param         | Required | Type                  | Description |
|---------------|----------|-----------------------|-------------|
| query         | yes      | string (1–512 chars)  | The search query. |
| lookback_days | no       | integer               | Days to look back. Default 30. Mutually exclusive with from_date / to_date. |
| from_date     | no       | ISO YYYY-MM-DD        | Lower bound. Mutually exclusive with lookback_days. |
| to_date       | no       | ISO YYYY-MM-DD        | Upper bound. Defaults to today when from_date is set alone. |
| sources       | no       | CSV string            | Allowlist of source names (e.g. reddit,youtube,github). Mutually exclusive with exclude. |
| exclude       | no       | CSV string            | Blocklist of source names. Mutually exclusive with sources. |
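The mutual-exclusivity rules above can be enforced client-side before spending credits. A sketch (the validator itself is not part of the API; the server performs its own validation):

```python
def validate_params(params: dict) -> None:
    """Reject parameter combinations the endpoint documents as invalid."""
    if "lookback_days" in params and ("from_date" in params or "to_date" in params):
        raise ValueError("lookback_days is mutually exclusive with from_date / to_date")
    if "sources" in params and "exclude" in params:
        raise ValueError("sources is mutually exclusive with exclude")
    q = params.get("query", "")
    if not 1 <= len(q) <= 512:
        raise ValueError("query must be 1-512 characters")
```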

Latency expectations

The endpoint runs through a multi-stage pipeline:

  1. Plan — deterministic (~5ms) + parallel LLM refinement (8s timeout).
  2. Fan-out — all 12 sources fire concurrently. Per-source timeout 8s (12s for slow sources).
  3. Annotate + dedupe + fusion — pure compute, ~10ms.
  4. Comment enrichment — runs in parallel with rerank. Per-candidate timeout 5s. The top 40 candidates span ~6 enrichable sources, yielding up to ~25–40 internal calls (one per enrichable candidate). Failures are non-fatal (the affected row just renders no top_comments).
  5. Rerank — LLM call against gpt-5.4-nano. 15s timeout, partial scores kept on timeout.
  6. Cluster — pure compute, ~10ms.

Streaming consumers see results progressively (within ~1s of the first source finishing). Sync consumers wait for the full pipeline, typically 12–25s.
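The fan-out stage's behavior — all sources fire concurrently, each with its own timeout, and a failure in one never blocks the others — can be sketched with asyncio. The helper and its return shape mirror items_by_source / sources_failed but are illustrative, not the actual server code.

```python
import asyncio

async def fan_out(sources: dict, timeout_s: float = 8.0) -> tuple[dict, dict]:
    """Run all source fetchers concurrently with a per-source timeout.

    `sources` maps a source name to a zero-arg coroutine function.
    Returns (results, failed), mirroring items_by_source / sources_failed.
    """
    async def guarded(name, coro):
        # Errors and timeouts are captured per-source, never propagated.
        try:
            return name, await asyncio.wait_for(coro, timeout_s), None
        except Exception as e:
            return name, None, str(e) or type(e).__name__

    done = await asyncio.gather(*(guarded(n, fn()) for n, fn in sources.items()))
    results = {n: r for n, r, err in done if err is None}
    failed = {n: err for n, r, err in done if err is not None}
    return results, failed
```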


Failure modes

  • One source times out — pipeline continues with the others; the source appears in sources_failed: { source: errorMessage }. SSE consumers receive a source_failed chunk.
  • Comment fetch fails for a candidate — non-fatal. The row's top_comments is empty; the response is otherwise unchanged.
  • Every source fails (zero-floor) — sync returns 502 + auto-refunds the 20 credits; stream emits done(refunded: true).
  • LLM rerank times out — partial scores already streamed are kept; un-scored candidates fall back to local relevance × freshness × source-quality heuristic.
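The local fallback for un-scored candidates can be sketched as a simple product of the three named factors. The weight table and the 14-day freshness half-life are invented for illustration; the docs only specify relevance × freshness × source-quality.

```python
# Illustrative source-quality weights; the real table is internal.
SOURCE_QUALITY = {"reddit": 1.0, "hackernews": 0.9, "tiktok": 0.8}

def fallback_score(relevance: float, age_days: float, source: str) -> float:
    """Local relevance x freshness x source-quality heuristic (sketch).

    Freshness decays exponentially with a hypothetical 14-day half-life.
    Unknown sources get a neutral 0.7 weight.
    """
    freshness = 0.5 ** (age_days / 14)
    return relevance * freshness * SOURCE_QUALITY.get(source, 0.7)
```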

Source coverage notes

  • Pinterest — upstream currently returns pins: [] for every query (vendor issue, see LAST30DAYS-PARITY-NOTES.md). The Pinterest section may be empty even on broad queries.
  • Twitter — uses /v1/twitter/ai-search (Grok-backed synthesis) rather than raw tweet search. The synthesis becomes its own rankable item alongside per-source items.
