Video transcription

One function that returns the spoken content of any social video — TikTok, Instagram, YouTube, Facebook, X, Reddit, or Rumble — for 10 credits per video.

Build a podcast / video-research tool that takes any social video URL and returns its spoken content, regardless of which platform it lives on.

How do you get a transcript of a social media video?

Pass the video URL to the platform's transcript endpoint — GET /v1/tiktok/post/transcript, /v1/youtube/video/transcript, /v1/instagram/media/transcript, and four more. Each costs 10 credits and returns the spoken content as text. A small adapter normalises the per-platform response shapes into one string, so your application code stays platform-agnostic.

The problem

Transcripts are the highest-density signal in social video — captions lie, thumbnails bait, but the spoken word is what the creator actually said. Each platform exposes (or hides) transcripts differently, and the response shapes don't match: YouTube returns timestamped segments, TikTok returns a plain string, Instagram returns a plural transcripts array.

The solution

Seven transcript endpoints behind one auth model, plus a 20-line adapter:

  • GET /v1/tiktok/post/transcript — TikTok video transcript
  • GET /v1/instagram/media/transcript — Instagram reel / video transcript
  • GET /v1/youtube/video/transcript — YouTube / Shorts transcript
  • GET /v1/facebook/post/transcript — Facebook post / reel transcript
  • GET /v1/twitter/tweet/transcript — X video tweet transcript
  • GET /v1/reddit/post/transcript — Reddit video-post transcript
  • GET /v1/rumble/video/transcript — Rumble video transcript

All seven are premium-tier endpoints (10 credits each). They pass the upstream response through unchanged (the canonical Transcript archetype keeps the upstream shape so you don't lose platform-specific fields), which is why the shapes differ by platform. Three shapes have been verified live:

  • YouTube: data.transcript[] (array of {text, startMs, endMs, startTimeText} segments) + data.transcript_only_text (string) + data.language (string)
  • TikTok: data.transcript is a string (the full text directly)
  • Instagram: data.transcripts[] (note the plural), an array of {id, shortcode, text}

The other four platforms (Facebook, Twitter, Reddit, Rumble) follow the same upstream-pass-through pattern. The extractText() adapter in the snippet below handles the variance.

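// recipe-transcribe.ts
// Returns the transcript of any social video URL.
// Run with: SOCIALCRAWL_KEY=sc_... npx tsx recipe-transcribe.ts
// Requires Node 18+ (global fetch). The file uses top-level await, so it must
// run as an ES module ("type": "module" in package.json, or a .mts extension).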

const KEY = process.env.SOCIALCRAWL_KEY;
if (!KEY) throw new Error("Set SOCIALCRAWL_KEY");

type Platform =
  | "tiktok"
  | "instagram"
  | "youtube"
  | "facebook"
  | "twitter"
  | "reddit"
  | "rumble";

const ENDPOINT: Record<Platform, string> = {
  tiktok: "tiktok/post/transcript",
  instagram: "instagram/media/transcript",
  youtube: "youtube/video/transcript",
  facebook: "facebook/post/transcript",
  twitter: "twitter/tweet/transcript",
  reddit: "reddit/post/transcript",
  rumble: "rumble/video/transcript",
};

// Adapter that normalises the per-platform shape into a single text string.
// Each branch reflects the upstream's actual response (verified live).
function extractText(
  platform: Platform,
  data: Record<string, unknown>,
): string {
  if (platform === "youtube") {
    return (data.transcript_only_text as string) ?? "";
  }
  if (platform === "tiktok") {
    return (data.transcript as string) ?? "";
  }
  if (platform === "instagram") {
    const t = data.transcripts as Array<{ text?: string }> | undefined;
    return t?.[0]?.text ?? "";
  }
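  // Facebook / Twitter / Reddit / Rumble: same upstream-pass-through pattern.
  // Inspect the response on first call and add a branch above as needed.
  // Only string values are returned, so an unexpected array never leaks out.
  if (typeof data.transcript_only_text === "string")
    return data.transcript_only_text;
  if (typeof data.transcript === "string") return data.transcript;
  return "";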
}

async function transcribe(platform: Platform, videoUrl: string) {
  const url = new URL(`https://www.socialcrawl.dev/v1/${ENDPOINT[platform]}`);
  url.searchParams.set("url", videoUrl);

  const res = await fetch(url, { headers: { "x-api-key": KEY! } });
  const json = (await res.json()) as {
    success: boolean;
    data?: Record<string, unknown>;
    credits_remaining: number;
    error?: { message: string };
  };

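  // Treat HTTP errors and a missing data payload as failures too.
  if (!res.ok || !json.success || !json.data)
    throw new Error(
      json.error?.message ?? `transcript failed (HTTP ${res.status})`,
    );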
  return {
    platform,
    text: extractText(platform, json.data!),
    credits_remaining: json.credits_remaining,
  };
}

// Same input shape, same output shape, three different platforms.
const a = await transcribe(
  "youtube",
  "https://www.youtube.com/watch?v=erLbbextvlY",
);
const b = await transcribe(
  "tiktok",
  "https://www.tiktok.com/@mrbeast/video/7283145247503961371",
);
const c = await transcribe(
  "rumble",
  "https://rumble.com/v5o1eum-the-rubin-report-with-elon-musk.html",
);

for (const t of [a, b, c]) {
  console.log(`[${t.platform}] ${t.text.length} chars`);
  console.log(t.text.slice(0, 200), "…");
  console.log("---");
}
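The transcribe() function above takes the platform as an explicit argument. When callers only have a URL, one option is to infer the platform from the hostname. The sketch below is a hypothetical helper (detectPlatform is not part of the SocialCrawl API), and its hostname table is an assumption that you would extend with whatever short-link domains you need.

```typescript
type Platform =
  | "tiktok"
  | "instagram"
  | "youtube"
  | "facebook"
  | "twitter"
  | "reddit"
  | "rumble";

// Domain -> platform. This table is an assumption for illustration only.
const HOSTS: Array<[string, Platform]> = [
  ["youtube.com", "youtube"],
  ["youtu.be", "youtube"],
  ["tiktok.com", "tiktok"],
  ["instagram.com", "instagram"],
  ["facebook.com", "facebook"],
  ["fb.watch", "facebook"],
  ["twitter.com", "twitter"],
  ["x.com", "twitter"],
  ["reddit.com", "reddit"],
  ["rumble.com", "rumble"],
];

// Hypothetical helper: returns the platform for a video URL, or null when the
// URL cannot be parsed or the host is not in the table.
function detectPlatform(videoUrl: string): Platform | null {
  let host: string;
  try {
    host = new URL(videoUrl).hostname.toLowerCase();
  } catch {
    return null; // not a parseable absolute URL
  }
  for (const [domain, platform] of HOSTS) {
    // Match the domain itself or any subdomain (www., m., vm., ...).
    if (host === domain || host.endsWith(`.${domain}`)) return platform;
  }
  return null;
}
```

With this in place, a caller could run detectPlatform(url) first and pass the result to transcribe(), rejecting the request when it returns null. Matching on a leading dot avoids false positives such as a host that merely ends in "x.com".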

What you get back

// YouTube: segments + full string + language tag (e.g. en-US)
{
  "data": {
    "transcript": [                                                              // <-- array of segments
      { "text": "we are now stranded on this deserted", "startMs": "80", "endMs": "4000", "startTimeText": "0:00" }
      // ... 620 more
    ],
    "transcript_only_text": "we are now stranded on this deserted island...",   // <-- concatenated string
    "language": "en-US"
  }
}

// TikTok — transcript is the string directly
{
  "data": {
    "id": "7647161577057258775",
    "url": "https://www.tiktok.com/@...",
    "transcript": "claude is shockingly good at refactors..."                   // <-- full text as string
  }
}

// Instagram — note the PLURAL key
{
  "data": {
    "transcripts": [                                                             // <-- plural, array
      { "id": "DY7SXBFtEpC", "shortcode": "DY7SXBFtEpC", "text": "..." }
    ]
  }
}
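The three shapes above can be told apart at runtime, so the adapter can also be keyed on the shape of data instead of the platform name. That may be a useful fallback for the four platforms whose shapes are not listed here. This is a minimal sketch: textFromShape and joinText are hypothetical names, and the choice of separators (a space between segments, a newline between Instagram entries) is an assumption.

```typescript
// Hypothetical helper: derive plain text from the `data` object by inspecting
// its shape. Field names come from the three verified responses; any other
// shape yields an empty string.
function textFromShape(data: Record<string, unknown>): string {
  // YouTube-style: a ready-made concatenated string.
  if (typeof data.transcript_only_text === "string") {
    return data.transcript_only_text;
  }
  // TikTok-style: `transcript` is the text itself.
  if (typeof data.transcript === "string") {
    return data.transcript;
  }
  // Segment array without a concatenated string: join the `text` fields.
  if (Array.isArray(data.transcript)) {
    return joinText(data.transcript, " ");
  }
  // Instagram-style: plural `transcripts` array of { id, shortcode, text }.
  if (Array.isArray(data.transcripts)) {
    return joinText(data.transcripts, "\n");
  }
  return "";
}

// Collect every string-valued `text` field from an array of unknown items.
function joinText(items: unknown[], separator: string): string {
  return items
    .map((item) =>
      typeof item === "object" && item !== null
        ? (item as { text?: unknown }).text
        : undefined,
    )
    .filter((t): t is string => typeof t === "string")
    .join(separator);
}
```

Unlike the platform-keyed adapter, this version joins every Instagram entry instead of taking only the first, which may or may not be what you want for multi-video posts.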

Credits cost

Cost per run: 10 credits per video. Fanning out a single video across all seven platforms would cost 70 credits — in practice you call exactly one of these per video because each platform only hosts its own content.
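Because each call has a flat price, a batch can be checked against the credits_remaining value before it starts. The sketch below assumes every transcript call is billed at 10 credits; the source does not say whether failed calls are billed. planBatch is a hypothetical helper, not part of the API.

```typescript
const CREDITS_PER_TRANSCRIPT = 10; // per-call cost stated above

// Hypothetical helper: how many of `videoCount` videos fit in the current
// balance, what that batch costs, and how many videos are left over.
function planBatch(videoCount: number, creditsRemaining: number) {
  const affordable = Math.max(
    0,
    Math.min(
      videoCount,
      Math.floor(creditsRemaining / CREDITS_PER_TRANSCRIPT),
    ),
  );
  return {
    affordable,
    cost: affordable * CREDITS_PER_TRANSCRIPT,
    skipped: videoCount - affordable,
  };
}

console.log(planBatch(25, 180));
// { affordable: 18, cost: 180, skipped: 7 }
```

The balance could come from the credits_remaining field of any earlier response, so no extra call is needed to read it.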

Take it further

How to Transcribe TikTok, YouTube, and Instagram Videos with an API | Socialcrawl