Social Search Engine API: One Call, 12+ Platforms

A social search engine API searches 12+ social platforms in one call, fusing and reranking results via RRF + LLM into 40 items. See real captured responses.

A social search engine calls TikTok, Reddit, YouTube, and other platform APIs directly, fusing results from all of them into one ranked response in a single API call. Searching across social platforms today means opening Reddit, then YouTube, then TikTok: one tab at a time, one schema at a time. /v1/search/everywhere is a single GET request that fans out across up to 17 source lanes in parallel, runs LLM query planning, fuses results via RRF, reranks with a second LLM pass, clusters findings semantically, and returns everything in one JSON envelope. One API key. 20 credits per live call.

This is a social search engine in the form of an API endpoint. Below are three real captured calls made on 2026-06-29: not marketing copy, not a demo, actual request and response pairs. Coverage ranged from 59% to 82% across the three calls. Three source lanes (instagram-hashtag, youtube-hashtag, linkedin) consistently returned errors in our test window. That is not hidden: every response includes sources_failed with per-lane status and sources_succeeded with the lanes that actually contributed data.

What does one social search engine API call actually return?

The query was open source LLMs with lookback_days=60. This particular call was a cache hit (0 credits) because an identical query had been made within the 2-minute TTL window.

| Field | Value |
|---|---|
| Sources called | 17 |
| Sources succeeded | 14 (82.4% coverage) |
| Sources failed | 3 (instagram-hashtag 401, linkedin 401, youtube-hashtag 400) |
| Total raw items collected | 179 |
| Fused and reranked items returned | 40 |
| Semantic clusters | 40 |
| Plan intent | concept |
| Plan freshness mode | balanced_recent |
| Plan cluster mode | debate |
| LLM subqueries planned | 6 |
| Credits used | 0 (cache hit) |

The minimal curl to reproduce this:

curl -s -H "x-api-key: $SOCIALCRAWL_API_KEY" \
  "https://www.socialcrawl.dev/v1/search/everywhere?query=open+source+LLMs&lookback_days=60"

Here is item #1 from the fused result, verbatim from the captured response:

{
  "candidate_id": "https://reddit.com/r/localllm/comments/1u9gohq/open_source_is_starting_to_beat_frontier_on",
  "source": "reddit",
  "title": "Open source is starting to beat frontier on cost/performance",
  "url": "https://www.reddit.com/r/LocalLLM/comments/1u9gohq/open_source_is_starting_to_beat_frontier_on/",
  "published_at": "2026-06-18T19:58:33.000Z",
  "engagement": { "score": 405, "num_comments": 43 },
  "final_score": 66.64,
  "rerank_score": 85,
  "explanation": "Explicit",
  "cluster_id": "cluster-1",
  "subquery_labels": ["primary"],
  "top_comments": [
    { "score": 48, "excerpt": "Starting to? It's been years.", "author": "StupidScaredSquirrel" },
    { "score": 45, "excerpt": "I can run the local model that is as good as the best one from a couple of years ago. \"Frontier\" models are not that fantastic.", "author": "readmond" },
    { "score": 12, "excerpt": "This is really just stupid. You can't possibly represent this concept in a single chart plotting two benchmarks.", "author": "Uninterested_Viewer" }
  ]
}

final_score vs rerank_score. The rerank_score (85 here) is the LLM reranker's 0-100 relevance signal for this specific item. The final_score (66.64) is a composite of RRF rank position, rerank score, freshness, and engagement. It is not bounded to 100 and is not a simple percentage. The RRF formula applied during fusion is RRF(d) = Σ_i 1/(k + r_i(d)), where r_i(d) is the rank of item d in the i-th ranked list being fused, the sum runs over every list that contains d, and k=60. The MongoDB explainer has a readable walkthrough of the math.
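To make the fusion step concrete, here is a minimal sketch of plain reciprocal rank fusion with k=60. It is not SocialCrawl's implementation: the lane lists and post IDs are hypothetical, and the real final_score also blends rerank score, freshness, and engagement, which this sketch leaves out.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists (best item first) with reciprocal rank fusion.

    Each item scores the sum of 1 / (k + rank) over every list it appears in,
    with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical per-lane rankings, for illustration only.
reddit_lane = ["post-a", "post-b", "post-c"]
hackernews_lane = ["post-b", "post-d", "post-a"]

fused = rrf_fuse([reddit_lane, hackernews_lane])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

An item that ranks reasonably well in two lanes (post-b) ends up above an item that ranks first in only one of them, which is the behavior that makes RRF a good fit for merging lanes with incomparable native scores.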

top_comments at no extra cost. The three inline Reddit comments arrive in the same response as the post itself, without a separate call. The community voice is part of the fused result, including the dissenting commenter who calls the chart "stupid." For a social media aggregator API, that is the difference between a title and the conversation around it.

cluster_id groups related results. Items sharing the same cluster_id form a semantic topic group across whatever platforms contributed. The debate cluster mode here reflects the planner's concept intent: conceptual queries tend to surface opposing views worth grouping separately.
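A small sketch of how a client might regroup the flat item list by cluster_id and see which platforms each cluster spans. The sample items are hypothetical and trimmed to the two fields used here. Only cluster_id and source are taken from the captured response above.

```python
from collections import defaultdict


def sources_by_cluster(items):
    """Map each cluster_id to the sorted list of platforms that contributed to it."""
    clusters = defaultdict(set)
    for item in items:
        # Items without a cluster_id are bucketed together (an assumption of this sketch).
        cluster_id = item.get("cluster_id") or "unclustered"
        clusters[cluster_id].add(item["source"])
    return {cluster_id: sorted(sources) for cluster_id, sources in clusters.items()}


# Hypothetical, trimmed items for illustration.
sample_items = [
    {"source": "reddit", "cluster_id": "cluster-1"},
    {"source": "hackernews", "cluster_id": "cluster-1"},
    {"source": "youtube", "cluster_id": "cluster-2"},
    {"source": "tavily"},
]

print(sources_by_cluster(sample_items))
```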

The LLM planner generated 6 subqueries from one input. Before calling any upstream source, the planner expanded open source LLMs into six targeted queries, each routed to the platforms most likely to carry that specific angle:

  1. open source LLMs: all 14 active platforms (primary sweep)
  2. open source LLM llama mistral mixtral qwen gemma openweights: reddit, hackernews, twitter-ai-search, threads, youtube
  3. open source LLM license apache mit gpl lgpl llama license model weights redistribution: reddit, hackernews, twitter-ai-search, linkedin
  4. fine-tuning open source LLMs lora qlora vllm llama.cpp ollama deployment gpu inference: github, reddit, hackernews, youtube
  5. open source LLM safety jailbreaks alignment moderation refusal policy hallucinations: reddit, twitter-ai-search, perplexity, youtube
  6. open source LLM benchmark leaderboards evals lm-eval harness mmlu hellaswag: hackernews, github, perplexity, youtube

The licensing debate routes to reddit and hackernews. Fine-tuning guides route to github and youtube. Benchmarks route to hackernews, github, and perplexity. From 179 raw items collected this way, RRF fusion and LLM reranking produced 40 ranked results.

Honest caveat on failures. Three lanes failed on this call: instagram-hashtag (401), linkedin (401), youtube-hashtag (400). These are upstream authentication and format errors at the time of capture. They are reported transparently in sources_failed. Partial failures do not trigger an automatic credit refund. Auto-refund only fires when all sources fail simultaneously.

How does the LLM planner route a single query across 17 sources?

Hitting all 17 sources for every query would waste upstream resources and return low-signal noise. For a dev-tools query like AI agent frameworks 2026, calling Pinterest and Rumble would produce almost nothing useful. The planner handles this by scoring topical affinity per platform and pruning low-affinity lanes before the fanout begins.

Here is sources_failed from Call 2 (AI agent frameworks 2026, lookback_days=30, 20 credits, captured 2026-06-29):

"sources_failed": {
  "instagram": "pruned: low_affinity (0.15)",
  "polymarket": "pruned: low_affinity (0.10)",
  "pinterest": "pruned: low_affinity (0.10)",
  "rumble": "pruned: low_affinity (0.05)",
  "instagram-hashtag": "upstream instagram-hashtag returned 401",
  "linkedin": "upstream linkedin returned 401",
  "youtube-hashtag": "upstream youtube-hashtag returned 400"
}

Four sources were proactively pruned before any upstream calls were made. Three others hit upstream errors. The 10 that contributed (hackernews, reddit, tavily, youtube, github, tiktok, tiktok-hashtag, threads, perplexity, twitter-ai-search) are exactly the right platforms for a query about AI frameworks in 2026. The 58.8% coverage reflects correct behavior, not a degraded result. More sources is not always better.
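Because sources_failed mixes deliberate pruning with genuine upstream errors, it can help to split the two before alerting on anything. The sketch below parses the reason strings exactly as they appear in the captured response above. Those string formats are an assumption drawn from this one capture, not a documented contract, so anything that does not match is kept in a separate bucket rather than guessed at.

```python
import re

# Patterns inferred from the captured sources_failed values; treat them as assumptions.
PRUNED_RE = re.compile(r"^pruned: low_affinity \((\d*\.?\d+)\)$")
UPSTREAM_RE = re.compile(r"returned (\d{3})$")


def classify_failures(sources_failed):
    """Split sources_failed into planner prunes, upstream HTTP errors, and everything else."""
    pruned, upstream_errors, other = {}, {}, {}
    for lane, reason in sources_failed.items():
        match = PRUNED_RE.match(reason)
        if match:
            pruned[lane] = float(match.group(1))  # affinity score
            continue
        match = UPSTREAM_RE.search(reason)
        if match:
            upstream_errors[lane] = int(match.group(1))  # HTTP status
            continue
        other[lane] = reason  # unknown format: keep the raw text
    return pruned, upstream_errors, other


# sources_failed from Call 2, copied from the captured response.
call_2_failed = {
    "instagram": "pruned: low_affinity (0.15)",
    "polymarket": "pruned: low_affinity (0.10)",
    "pinterest": "pruned: low_affinity (0.10)",
    "rumble": "pruned: low_affinity (0.05)",
    "instagram-hashtag": "upstream instagram-hashtag returned 401",
    "linkedin": "upstream linkedin returned 401",
    "youtube-hashtag": "upstream youtube-hashtag returned 400",
}

pruned, upstream_errors, other = classify_failures(call_2_failed)
print("pruned:", pruned)
print("upstream errors:", upstream_errors)
print("unrecognized:", other)
```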

The full pipeline for this call:

User query: "AI agent frameworks 2026"
         |
         v
[LLM Query Planner]
 → intent: "prediction"
 → 6 subqueries × targeted platform subsets
 → prunes low-affinity sources (instagram 0.15, polymarket 0.10)
         |
         v
[17-source parallel fanout]
 → 10 sources respond within timeout
 → ~155 raw items collected
         |
         v
[RRF Fusion + LLM Reranking]
 → 40 fused candidates
 → each gets: final_score, rerank_score, explanation, cluster_id
 → top items enriched with inline community comments
         |
         v
[Semantic Clustering]
 → 27 named topic clusters
 → cross-platform grouping (cluster-1 spans HN + Reddit + Tavily + Perplexity)
         |
         v
GET /v1/search/everywhere response
 → 40 ranked items, 27 clusters, plan, coverage meta
 → 20 credits

Cluster-1 from this call shows the aggregate social media approach at work:

{
  "cluster_id": "cluster-1",
  "title": "AI Agent Frameworks Comparison",
  "sources": ["hackernews", "perplexity", "reddit", "tavily"],
  "score": 72.04,
  "candidate_count": 6
}

Six results from four different platforms, grouped under one topic, in one response. The planner chose cluster_mode=market because intent=prediction. A "2026" query asks about direction and future bets, which calls for market-sentiment grouping rather than debate grouping. The clustering strategy adapts to the detected query intent automatically.

Why does a consumer query activate more platforms than a dev tools query?

For Stanley cup tumbler (lookback_days=30), 13 of 17 source lanes succeeded (76.5% coverage). Pinterest and Rumble, both pruned for the AI frameworks query, contributed here alongside TikTok.

| Field | Value |
|---|---|
| Sources succeeded | 13 (76.5%) |
| Sources failed | 4 (linkedin 401, instagram-hashtag 401, youtube-hashtag 400, instagram timeout) |
| Fused items returned | 40 |
| Semantic clusters | 40 |
| Plan intent | factual |
| Plan cluster mode | none |

The planner assigned high affinity to Pinterest, TikTok, and Rumble for a consumer product query because visual and video content carries strong signal for product research. The routing is query-aware, not fixed. The same social search endpoint adapts to very different query types without any configuration change on your end.

Here is item #6 from the Stanley cup fused result, a TikTok video, verbatim:

{
  "source": "tiktok",
  "title": "Stanley tumbler special edition Fifa world cup Mexico 2026 #stanleytumbler #tumblerstanley",
  "engagement": { "views": 34313, "likes": 227, "comments": 22 },
  "final_score": 58.61,
  "rerank_score": 75
}

Platform-native video metrics appear in the same envelope as Reddit upvotes and Hacker News points. The engagement object is platform-specific: Reddit carries score and num_comments; TikTok carries views, likes, comments; YouTube carries views. Sources like tavily and perplexity return empty engagement objects because they do not expose that data.
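Since the engagement keys differ per platform, a client that wants to compare items side by side needs a small mapping layer. The sketch below is one possible mapping, not part of the API: the output keys (views, reactions, comments) and the choice to treat Reddit score and TikTok likes as the same kind of signal are illustrative assumptions.

```python
def normalize_engagement(item):
    """Project a platform-specific engagement object onto three common keys.

    Missing metrics come back as None rather than 0, so "not exposed" stays
    distinguishable from "zero".
    """
    engagement = item.get("engagement") or {}

    def first_present(*keys):
        for key in keys:
            if key in engagement:
                return engagement[key]
        return None

    return {
        "views": first_present("views"),
        "reactions": first_present("score", "likes"),
        "comments": first_present("num_comments", "comments"),
    }


# Engagement objects copied from the two captured items in this post.
reddit_item = {"source": "reddit", "engagement": {"score": 405, "num_comments": 43}}
tiktok_item = {"source": "tiktok", "engagement": {"views": 34313, "likes": 227, "comments": 22}}
# Sources such as tavily return an empty engagement object.
tavily_item = {"source": "tavily", "engagement": {}}

for item in (reddit_item, tiktok_item, tavily_item):
    print(item["source"], normalize_engagement(item))
```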

The cluster_mode=none reflects the factual intent. A product query does not produce opinion groupings. The planner skips clustering and returns results ranked by composite score alone. For brand monitoring, trend detection, or product research, the endpoint adapts to serve the query. No per-platform configuration required.

What can Exa and Tavily not search that this social search engine can?

Exa describes itself as "Web search, built for AI agents" and "One API for search, crawling, and research agents" (exa.ai, retrieved 2026-06-29). Tavily is "the real-time search engine for AI agents and RAG workflows," claiming 300M+ monthly requests, 2M+ developers, and a p50 latency of 180ms (tavily.com, retrieved 2026-06-29; $25M Series A, August 2025). Both are well-built and widely used for grounding LLMs with live web content. Neither calls TikTok, Reddit, YouTube, Instagram, Bluesky, Threads, Hacker News, or GitHub APIs directly. They index web pages, including pages from those platforms that search engines have already crawled. The Reddit thread from four hours ago, the Hacker News Show HN from this morning, the TikTok with 34K views: none of that appears in Exa's or Tavily's index until hours or days later, if ever.

Social Searcher (social-searcher.com, retrieved 2026-06-29) is the one consumer tool explicitly targeting the "social search" concept. Its footer discloses it uses "publicly indexed data from major search engines," not direct platform API access. Octolens (Jan 14, 2026) summarizes it: "Best for: Quick, one-off social media searches without signing up for anything. Skip if: You need reliable API access, developer community coverage, or consistent data quality." Social Searcher has no developer API.

| | Exa | Tavily | SocialCrawl /v1/search/everywhere |
|---|---|---|---|
| What it searches | Open web index | Open web index | 12+ social platform APIs directly |
| Social engagement data | None | None | views, likes, comments, upvotes (platform-native) |
| Developer API | Yes | Yes | Yes (one key, flat 20cr/call) |

Every platform that succeeds in this aggregate social media call returns the same envelope fields: source, title, url, published_at, engagement, final_score, rerank_score, cluster_id, top_comments. No per-platform parsing logic. The unified schema is identical whether the result came from Reddit or TikTok or GitHub.
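One way to take advantage of the shared envelope is to load every item into a single typed record. The sketch below uses the field names listed above. The Python types and the optional defaults are assumptions, since the post does not publish a formal schema, and the FusedItem class name is hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class FusedItem:
    """Typed view of one fused result; field types are assumed, not documented."""
    source: str
    title: str
    url: Optional[str] = None
    published_at: Optional[str] = None
    engagement: dict = field(default_factory=dict)
    final_score: Optional[float] = None
    rerank_score: Optional[float] = None
    cluster_id: Optional[str] = None
    top_comments: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw):
        # Ignore keys this sketch does not model (for example candidate_id).
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})


# Trimmed copy of item #1 from the captured "open source LLMs" response.
raw_item = {
    "candidate_id": "https://reddit.com/r/localllm/comments/1u9gohq/open_source_is_starting_to_beat_frontier_on",
    "source": "reddit",
    "title": "Open source is starting to beat frontier on cost/performance",
    "engagement": {"score": 405, "num_comments": 43},
    "final_score": 66.64,
    "rerank_score": 85,
    "cluster_id": "cluster-1",
}

item = FusedItem.from_dict(raw_item)
print(item.source, item.final_score, item.cluster_id)
```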

The shift in user search behavior makes this relevant to more products than brand monitoring alone. In July 2022, Google SVP Prabhakar Raghavan said at Fortune's Brainstorm Tech conference: "In our studies, something like almost 40% of young people, when they're looking for a place for lunch, they don't go to Google Maps or Search... They go to TikTok or Instagram." (TechCrunch, July 12 2022). Econsultancy (Jan 2024) added the qualifiers: the "almost 40%" covered TikTok and Instagram combined, applied to US users ages 18-24, and was specific to restaurant and dining discovery. Google never released the underlying data beyond Raghavan's spoken remarks. The directional signal stands.

How do you make your first social media search API call?

Three steps.

Step 1: Get a key. Sign up at socialcrawl.dev. Free trial credits are included. No prerequisites beyond an account.

Step 2: The minimal call.

curl -s -H "x-api-key: YOUR_KEY" \
  "https://www.socialcrawl.dev/v1/search/everywhere?query=open+source+LLMs&lookback_days=60"

Step 3: Python snippet.

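import os
import requests

# Read the key from the environment rather than hard-coding it.
api_key = os.environ["SOCIALCRAWL_API_KEY"]
url = "https://www.socialcrawl.dev/v1/search/everywhere"
params = {"query": "open source LLMs", "lookback_days": 60}
headers = {"x-api-key": api_key}

# The call fans out to many upstream sources, so allow a generous timeout.
response = requests.get(url, params=params, headers=headers, timeout=90)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = response.json()

# Print the top-ranked fused item.
print(data["data"]["items"][0])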

SSE streaming. Pass Accept: text/event-stream to receive typed server-sent events as the pipeline runs (meta, source_started, items, ranked_partial, clusters, done) instead of waiting for the complete JSON response. This is useful for building UIs that show results platform by platform as they arrive. Note that SSE calls skip the 2-minute cache.
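For readers who have not consumed server-sent events by hand, here is a minimal sketch of a parser for the standard text/event-stream framing, plus a hypothetical stream_search helper that feeds it from requests. The event names come from the list above. The payload format of each event is not shown in this post, so the sketch hands back the raw data string and leaves decoding to the caller.

```python
def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of text/event-stream lines."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event = value
        elif name == "data":
            data_lines.append(value)


def stream_search(api_key, query, lookback_days=30):
    """Hypothetical helper: stream events from the endpoint described above."""
    import requests  # imported here so parse_sse can be used without requests installed

    response = requests.get(
        "https://www.socialcrawl.dev/v1/search/everywhere",
        params={"query": query, "lookback_days": lookback_days},
        headers={"x-api-key": api_key, "Accept": "text/event-stream"},
        stream=True,
        timeout=90,
    )
    response.raise_for_status()
    response.encoding = "utf-8"  # SSE is UTF-8; avoid requests' text/* fallback encoding
    yield from parse_sse(response.iter_lines(decode_unicode=True))


# Offline demonstration with made-up payloads.
sample_stream = [
    "event: meta",
    'data: {"query": "open source LLMs"}',
    "",
    ": keep-alive",
    "event: done",
    "data: {}",
    "",
]
events = list(parse_sse(sample_stream))
print(events)
```

In a UI, each (event, data) pair can be dispatched on the event name, for example rendering a lane as soon as its items event arrives and replacing the list when ranked_partial comes in.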

Billing. 20 credits per live call. An identical query within the 2-minute cache window costs 0 credits (Call 1 in this post was a cache hit at 0cr). SocialCrawl credit packs are one-time purchases that do not expire.

Coverage. Every response includes data.sources_succeeded, data.coverage, and data.sources_failed. Check these in production. During our test runs, three lanes consistently returned errors: instagram-hashtag (401), youtube-hashtag (400), and linkedin (401). Coverage per call ranged from 58.8% to 82.4%. The API reference has the full parameter list.
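A production check along these lines could look like the sketch below. It recomputes coverage from the two source collections rather than reading data.coverage, because the exact format of that field is not shown in this post. The 50% threshold and the coverage_report name are illustrative assumptions.

```python
def coverage_report(data, min_coverage=0.5):
    """Summarize how many lanes contributed and flag calls below a chosen threshold."""
    succeeded = data.get("sources_succeeded") or []
    failed = data.get("sources_failed") or {}
    attempted = len(succeeded) + len(failed)
    ratio = len(succeeded) / attempted if attempted else 0.0
    return {
        "attempted": attempted,
        "succeeded": len(succeeded),
        "coverage_pct": round(100 * ratio, 1),
        "ok": ratio >= min_coverage,
    }


# Lane names from Call 2 in this post: 10 contributed, 7 were pruned or errored.
call_2_data = {
    "sources_succeeded": [
        "hackernews", "reddit", "tavily", "youtube", "github",
        "tiktok", "tiktok-hashtag", "threads", "perplexity", "twitter-ai-search",
    ],
    "sources_failed": {
        "instagram": "pruned: low_affinity (0.15)",
        "polymarket": "pruned: low_affinity (0.10)",
        "pinterest": "pruned: low_affinity (0.10)",
        "rumble": "pruned: low_affinity (0.05)",
        "instagram-hashtag": "upstream instagram-hashtag returned 401",
        "linkedin": "upstream linkedin returned 401",
        "youtube-hashtag": "upstream youtube-hashtag returned 400",
    },
}

report = coverage_report(call_2_data)
print(report)
```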

Frequently asked questions

How is a social search engine different from Google?

Google indexes the open web via crawlers. A social search engine calls platform APIs directly. A Reddit thread published four hours ago, a Hacker News Show HN from this morning, a TikTok with 34K views: none of that shows up in Google's index until hours or days later, if the platform exposes it publicly at all. Google SVP Prabhakar Raghavan noted in July 2022 that "almost 40% of young people" were going to TikTok or Instagram for dining searches instead of Google (TechCrunch, July 12 2022). Econsultancy (Jan 2024) adds the nuance: the figure covered TikTok and Instagram combined, applied to US users ages 18-24, and was specific to restaurant discovery. Social platforms increasingly carry discovery intent that web crawlers cannot index in real time.

What is the difference between a social search API and a web search API?

Web search APIs (Exa, Tavily, Bing) return pages that crawlers have already indexed. A social search API calls platform endpoints directly. The difference shows up in two places: freshness and engagement data. On freshness: a Reddit thread from four hours ago or a Hacker News post from this morning is not yet in a web crawler's index. A social search API returns it immediately. On engagement data: web search APIs return no upvotes, view counts, or comment threads. A social search API returns platform-native metrics in the same envelope: Reddit score and num_comments, TikTok views and likes, inline top_comments. Query routing also differs. /v1/search/everywhere runs an LLM planner that assigns per-platform affinity scores and prunes low-signal sources before the fanout. A web search API searches one homogeneous index regardless of query type.

Is there an API to search all social media at once?

Yes. /v1/search/everywhere from the SocialCrawl API is a single social media search API endpoint that attempts up to 17 source lanes in one call: reddit, hackernews, youtube, youtube-hashtag, tiktok, tiktok-hashtag, instagram, instagram-hashtag, linkedin, github, threads, pinterest, rumble, polymarket, perplexity, tavily, and twitter-ai-search. Results are fused via RRF and reranked via LLM into up to 40 items. Not every lane succeeds on every call. During our test window, linkedin and instagram-hashtag consistently returned auth errors (401), and youtube-hashtag consistently returned a 400. Check data.sources_succeeded per call to see exactly what contributed.

What platforms does the social search engine cover?

The 17 configured source lanes (as of 2026-06-29): instagram-hashtag, linkedin, youtube-hashtag, hackernews, polymarket, github, tavily, threads, youtube, pinterest, tiktok-hashtag, tiktok, rumble, perplexity, reddit, instagram, twitter-ai-search. The LLM planner may prune low-affinity lanes per query. In our test calls, pinterest and rumble were pruned for a dev-tools query but contributed for a consumer product query. Actual sources returning data depend on query affinity scores and upstream availability. Coverage across our three test calls ranged from 58.8% to 82.4%.

How much does the social media search API cost?

20 credits per live call. Identical queries within the 2-minute cache window cost 0 credits. The "open source LLMs" call in this post was a cache hit at 0cr. SocialCrawl credit packs are one-time purchases that do not expire. See pricing for pack sizes and rates.

Can this work as a social listening API for brand monitoring?

The endpoint is optimized for ad-hoc search across platforms. Ongoing monitoring with continuous keyword tracking, alerting, and backfill is a distinct use case from what /v1/search/everywhere is designed for (see how to build a social media monitoring API for that pattern). What it does cover: a lookback_days parameter (tested up to 60 days in this post) gives a meaningful historical window for brand mention sampling, and sources_succeeded metadata shows exactly which platforms contributed. Each call returns up to 40 fused, ranked, clustered results from across the social web. The API reference has the full parameter list.
