Hybrid search-then-enrich
Find every relevant TikTok and Instagram post about a topic with universal search, then pull all their transcripts into one LLM-ready corpus — 120 credits per run.
Hybrid search-then-enrich
Build a deep-research agent that finds every relevant TikTok and Instagram post about a topic, then pulls the transcripts of all of them, then concatenates them into one corpus ready for LLM analysis.
How do you build a deep-research pipeline over social media?
Chain SocialCrawl's two big primitives: GET /v1/search/everywhere finds where the conversation is happening across 12 platforms, and per-platform transcript endpoints pull what was actually said. Universal search returns the canonical post URL for every result, which feeds straight into the transcript endpoints — no string-massaging required.
The problem
Search results tell you a conversation exists; they don't tell you what's in it. A snippet and a title aren't enough for real research — you need the full spoken content of the top posts, across platforms, in one document an LLM can chew through.
The solution
Three endpoints, one pipeline:
GET /v1/search/everywhere— universal social search (20 credits flat)GET /v1/tiktok/post/transcript— TikTok transcript (10 credits each)GET /v1/instagram/media/transcript— Instagram transcript (10 credits each)
// recipe-deep-research.ts
// Finds top posts on a topic, fetches their transcripts, returns the corpus.
// Run with: SOCIALCRAWL_KEY=sc_... npx tsx recipe-deep-research.ts
const KEY = process.env.SOCIALCRAWL_KEY;
if (!KEY) throw new Error("Set SOCIALCRAWL_KEY");
const BASE = "https://www.socialcrawl.dev/v1";
const topic = "ozempic side effects";
const POSTS_PER_SOURCE = 5;
// ── Step 1: Universal search, sync mode, scoped to TikTok + Instagram ─────
async function search(query: string) {
const url = new URL(`${BASE}/search/everywhere`);
url.searchParams.set("query", query);
url.searchParams.set("sources", "tiktok,instagram");
url.searchParams.set("lookback_days", "30");
const res = await fetch(url, {
headers: { "x-api-key": KEY!, accept: "application/json" },
});
const json = (await res.json()) as {
success: boolean;
data: { items: Array<{ source: string; url: string; title: string }> };
credits_used: number;
};
if (!json.success) throw new Error("search failed");
return json.data.items;
}
// ── Step 2: Per-platform transcript fan-out ───────────────────────────────
type Platform = "tiktok" | "instagram";
const TRANSCRIPT_PATH: Record<Platform, string> = {
tiktok: "tiktok/post/transcript",
instagram: "instagram/media/transcript",
};
// See the video transcription recipe — transcript shapes differ per platform.
// This adapter covers the two used here; extend for the others as needed.
function extractText(
platform: Platform,
data: Record<string, unknown>,
): string {
if (platform === "tiktok") return (data.transcript as string) ?? "";
if (platform === "instagram") {
const t = data.transcripts as Array<{ text?: string }> | undefined;
return t?.[0]?.text ?? "";
}
return (data.transcript_only_text as string) ?? "";
}
async function getTranscript(platform: Platform, postUrl: string) {
const url = new URL(`${BASE}/${TRANSCRIPT_PATH[platform]}`);
url.searchParams.set("url", postUrl);
const res = await fetch(url, { headers: { "x-api-key": KEY! } });
const json = (await res.json()) as {
success: boolean;
data?: Record<string, unknown>;
error?: { message: string };
};
if (!json.success) {
console.warn(`skipped ${postUrl}: ${json.error?.message}`);
return null;
}
return { text: extractText(platform, json.data!) };
}
// ── Step 3: Pipeline ──────────────────────────────────────────────────────
const items = await search(topic);
const tiktoks = items
.filter((i) => i.source === "tiktok")
.slice(0, POSTS_PER_SOURCE);
const instagrams = items
.filter((i) => i.source === "instagram")
.slice(0, POSTS_PER_SOURCE);
const transcripts = await Promise.all([
...tiktoks.map(async (item) => ({
title: item.title,
url: item.url,
transcript: await getTranscript("tiktok", item.url),
})),
...instagrams.map(async (item) => ({
title: item.title,
url: item.url,
transcript: await getTranscript("instagram", item.url),
})),
]);
const corpus = transcripts
.filter((t) => t.transcript !== null && t.transcript.text.length > 0)
.map((t) => `# ${t.title}\n${t.url}\n\n${t.transcript!.text}`)
.join("\n\n---\n\n");
console.log(`corpus length: ${corpus.length} chars`);
console.log(corpus.slice(0, 500), "…");What you get back
// The final `corpus` string after Step 3 — markdown-formatted, ready for an LLM:
// # I tried Ozempic for 90 days — here's what happened
// https://www.tiktok.com/@drmike/video/7314...
//
// So I want to talk about what nobody tells you about Ozempic...
//
// ---
//
// # The Ozempic side effect doctors won't mention
// https://www.instagram.com/reel/CzA1234abcd/
//
// Three weeks in I started experiencing what they call...
//
// ---
//
// (8 more transcripts concatenated)Credits cost
Cost per run: 120 credits (20 for the search + 10 enriched posts × 10 credits each) — a "deep research" budget that produces a real corpus, not just a feed. Tune
POSTS_PER_SOURCEto trade depth for cost.
Take it further
- See Universal social search for advanced filters (
from_date,to_date,exclude) you can layer onto step 1 to narrow the candidate set. - Swap
sources=tiktok,instagramforsources=youtube,redditand updateTRANSCRIPT_PATH— the same pipeline works for any combination of the seven transcript-capable platforms (see Video transcription). - Next: feed
corpusto your LLM of choice and ask for a structured summary, sentiment breakdown, or claim list — that's where the 120-credit budget pays off. Sentiment analysis shows that exact step on comments instead of transcripts. - New here? Quickstart.
