SocialCrawl

Hybrid search-then-enrich

Find the top TikTok and Instagram posts about a topic with universal search, then pull their transcripts into one LLM-ready corpus, for up to 120 credits per run.


Build a deep-research agent that finds the most relevant TikTok and Instagram posts about a topic, pulls each post's transcript, and concatenates them into one corpus ready for LLM analysis.

How do you build a deep-research pipeline over social media?

Chain SocialCrawl's two big primitives: GET /v1/search/everywhere finds where the conversation is happening across 12 platforms, and per-platform transcript endpoints pull what was actually said. Universal search returns the canonical post URL for every result, which feeds straight into the transcript endpoints — no string-massaging required.
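Because each search result carries its canonical post URL and a `source` field, routing a result to the right transcript endpoint is a lookup plus one query parameter. A minimal sketch, assuming the `{ source, url, title }` item shape used in the recipe below; `transcriptRequestFor` and `TRANSCRIPT_ENDPOINTS` are hypothetical names, and only the two transcript endpoints listed on this page are mapped.

```typescript
// Hypothetical helper: map a universal-search result to its transcript request URL.
// Only the two endpoints documented on this page are mapped; other sources return null.
type SearchItem = { source: string; url: string; title: string };

const TRANSCRIPT_ENDPOINTS: Record<string, string> = {
  tiktok: "tiktok/post/transcript",
  instagram: "instagram/media/transcript",
};

function transcriptRequestFor(base: string, item: SearchItem): URL | null {
  const path = TRANSCRIPT_ENDPOINTS[item.source];
  if (!path) return null; // no transcript endpoint mapped for this source
  const url = new URL(`${base}/${path}`);
  // The canonical post URL is passed through unchanged; URLSearchParams encodes it.
  url.searchParams.set("url", item.url);
  return url;
}

const req = transcriptRequestFor("https://www.socialcrawl.dev/v1", {
  source: "tiktok",
  url: "https://www.tiktok.com/@creator/video/123",
  title: "example",
});
console.log(req?.toString());
```

The only per-platform knowledge is the path; the post URL itself is never parsed or rewritten.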

The problem

Search results tell you a conversation exists; they don't tell you what's in it. A snippet and a title aren't enough for real research — you need the full spoken content of the top posts, across platforms, in one document an LLM can chew through.

The solution

Three endpoints, one pipeline:

  • GET /v1/search/everywhere — universal social search (20 credits flat)
  • GET /v1/tiktok/post/transcript — TikTok transcript (10 credits each)
  • GET /v1/instagram/media/transcript — Instagram transcript (10 credits each)
// recipe-deep-research.ts
// Finds top posts on a topic, fetches their transcripts, returns the corpus.
// Run with: SOCIALCRAWL_KEY=sc_... npx tsx recipe-deep-research.ts

const KEY = process.env.SOCIALCRAWL_KEY;
if (!KEY) throw new Error("Set SOCIALCRAWL_KEY");

const BASE = "https://www.socialcrawl.dev/v1";
const topic = "ozempic side effects";
const POSTS_PER_SOURCE = 5;

// ── Step 1: Universal search, sync mode, scoped to TikTok + Instagram ─────
async function search(query: string) {
  const url = new URL(`${BASE}/search/everywhere`);
  url.searchParams.set("query", query);
  url.searchParams.set("sources", "tiktok,instagram");
  url.searchParams.set("lookback_days", "30");

  const res = await fetch(url, {
    headers: { "x-api-key": KEY!, accept: "application/json" },
  });
  const json = (await res.json()) as {
    success: boolean;
    data: { items: Array<{ source: string; url: string; title: string }> };
    credits_used: number;
  };
  if (!json.success) throw new Error("search failed");
  return json.data.items;
}

// ── Step 2: Per-platform transcript fan-out ───────────────────────────────
type Platform = "tiktok" | "instagram";
const TRANSCRIPT_PATH: Record<Platform, string> = {
  tiktok: "tiktok/post/transcript",
  instagram: "instagram/media/transcript",
};

// See the video transcription recipe — transcript shapes differ per platform.
// This adapter covers the two used here; extend for the others as needed.
function extractText(
  platform: Platform,
  data: Record<string, unknown>,
): string {
  if (platform === "tiktok") return (data.transcript as string) ?? "";
  if (platform === "instagram") {
    const t = data.transcripts as Array<{ text?: string }> | undefined;
    return t?.[0]?.text ?? "";
  }
  return (data.transcript_only_text as string) ?? "";
}

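async function getTranscript(platform: Platform, postUrl: string) {
  const url = new URL(`${BASE}/${TRANSCRIPT_PATH[platform]}`);
  url.searchParams.set("url", postUrl);
  try {
    const res = await fetch(url, { headers: { "x-api-key": KEY! } });
    const json = (await res.json()) as {
      success: boolean;
      data?: Record<string, unknown>;
      error?: { message: string };
    };
    if (!json.success) {
      console.warn(`skipped ${postUrl}: ${json.error?.message}`);
      return null;
    }
    return { text: extractText(platform, json.data ?? {}) };
  } catch (err) {
    // Network or JSON-parse failure: skip this post instead of rejecting
    // the whole Promise.all fan-out in Step 3.
    const reason = err instanceof Error ? err.message : String(err);
    console.warn(`skipped ${postUrl}: ${reason}`);
    return null;
  }
}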

// ── Step 3: Pipeline ──────────────────────────────────────────────────────
// Wrapped in main() because top-level await fails if tsx compiles this
// file as CommonJS (no "type": "module" in package.json).
async function main() {
  const items = await search(topic);

  const tiktoks = items
    .filter((i) => i.source === "tiktok")
    .slice(0, POSTS_PER_SOURCE);
  const instagrams = items
    .filter((i) => i.source === "instagram")
    .slice(0, POSTS_PER_SOURCE);

  // Fire all transcript requests in parallel; skipped posts come back as null.
  const transcripts = await Promise.all([
    ...tiktoks.map(async (item) => ({
      title: item.title,
      url: item.url,
      transcript: await getTranscript("tiktok", item.url),
    })),
    ...instagrams.map(async (item) => ({
      title: item.title,
      url: item.url,
      transcript: await getTranscript("instagram", item.url),
    })),
  ]);

  // Drop skipped or empty transcripts, then join into one markdown corpus.
  const corpus = transcripts
    .filter((t) => t.transcript !== null && t.transcript.text.length > 0)
    .map((t) => `# ${t.title}\n${t.url}\n\n${t.transcript!.text}`)
    .join("\n\n---\n\n");

  console.log(`corpus length: ${corpus.length} chars`);
  console.log(corpus.slice(0, 500), "…");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

What you get back

// The final `corpus` string after Step 3 — markdown-formatted, ready for an LLM:
// # I tried Ozempic for 90 days — here's what happened
// https://www.tiktok.com/@drmike/video/7314...
//
// So I want to talk about what nobody tells you about Ozempic...
//
// ---
//
// # The Ozempic side effect doctors won't mention
// https://www.instagram.com/reel/CzA1234abcd/
//
// Three weeks in I started experiencing what they call...
//
// ---
//
// (8 more transcripts concatenated)
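Every entry follows the same layout (a `# title` line, the post URL, a blank line, then the transcript), and entries are joined with `\n\n---\n\n`, so the corpus can be split back into records, for example to cite sources in an LLM answer. A minimal sketch; `parseCorpus` and `CorpusEntry` are hypothetical names, and it assumes no title contains a newline and no transcript contains the separator string itself.

```typescript
// Hypothetical parser for the corpus format built in Step 3:
//   "# <title>\n<url>\n\n<text>" entries joined by "\n\n---\n\n".
// Assumes titles are single-line and no transcript contains the separator.
type CorpusEntry = { title: string; url: string; text: string };

const SEPARATOR = "\n\n---\n\n";

function parseCorpus(corpus: string): CorpusEntry[] {
  if (corpus.length === 0) return [];
  return corpus.split(SEPARATOR).map((section) => {
    // Header is "# title\nurl"; everything after the first blank line is transcript text.
    const blank = section.indexOf("\n\n");
    const header = blank === -1 ? section : section.slice(0, blank);
    const text = blank === -1 ? "" : section.slice(blank + 2);
    const [titleLine = "", url = ""] = header.split("\n");
    return { title: titleLine.replace(/^# /, ""), url, text };
  });
}

const sample =
  "# First post\nhttps://www.tiktok.com/@a/video/1\n\nHello there." +
  SEPARATOR +
  "# Second post\nhttps://www.instagram.com/reel/b/\n\nSecond transcript.";
console.log(parseCorpus(sample).map((e) => e.title));
```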

Credits cost

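Cost per run: up to 120 credits, i.e. 20 for the search plus 10 enriched posts (POSTS_PER_SOURCE = 5 on each of two platforms) at 10 credits each. That is a "deep research" budget that produces a real corpus, not just a feed. If search returns fewer than five posts for a platform, fewer transcripts are fetched. Tune POSTS_PER_SOURCE to trade depth for cost.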
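The budget is linear in the number of posts enriched. A small sketch of the arithmetic, assuming the flat prices listed on this page (20 credits per search, 10 per transcript) and one transcript call per selected post; `estimateCredits` is a hypothetical helper, and it does not model whether a failed transcript call is billed.

```typescript
// Hypothetical estimator using the prices listed on this page.
const SEARCH_CREDITS = 20; // GET /v1/search/everywhere, flat
const TRANSCRIPT_CREDITS = 10; // per TikTok or Instagram transcript

// Upper bound: assumes search returns at least `postsPerSource` results per source.
function estimateCredits(postsPerSource: number, sourceCount: number): number {
  return SEARCH_CREDITS + postsPerSource * sourceCount * TRANSCRIPT_CREDITS;
}

console.log(estimateCredits(5, 2)); // 120
console.log(estimateCredits(3, 2)); // 80
```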

Take it further

  • See Universal social search for advanced filters (from_date, to_date, exclude) you can layer onto step 1 to narrow the candidate set.
  • Swap sources=tiktok,instagram for sources=youtube,reddit and update the Platform type, TRANSCRIPT_PATH, extractText, and the source filters in Step 3; the same pipeline works for any combination of the seven transcript-capable platforms (see Video transcription).
  • Next: feed corpus to your LLM of choice and ask for a structured summary, sentiment breakdown, or claim list — that's where the 120-credit budget pays off. Sentiment analysis shows that exact step on comments instead of transcripts.
  • New here? Quickstart.
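When the corpus outgrows a model's context window, one option is to send it in batches that respect entry boundaries. A minimal sketch, assuming the `\n\n---\n\n` separator from Step 3 and using character count as a rough stand-in for tokens; `chunkCorpus` is a hypothetical helper, not part of the SocialCrawl API.

```typescript
// Hypothetical batching helper: pack corpus sections into chunks of at most
// maxChars characters (measured with String.length), preserving order.
// A single section longer than maxChars is kept whole in its own chunk.
const SECTION_SEPARATOR = "\n\n---\n\n";

function chunkCorpus(corpus: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const section of corpus.split(SECTION_SEPARATOR)) {
    if (section.length === 0) continue;
    const candidate = current ? current + SECTION_SEPARATOR + section : section;
    if (candidate.length <= maxChars || current === "") {
      current = candidate; // fits, or starts a new chunk
    } else {
      chunks.push(current); // close the full chunk and start a new one
      current = section;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(
  chunkCorpus("a" + SECTION_SEPARATOR + "b" + SECTION_SEPARATOR + "c", 9).length,
); // 2
```

Each chunk can then go to the model with the same instruction, and the per-chunk answers can be merged in a final call.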