Every SocialCrawl response carries a `computed` block — engagement_rate, language, content_category, estimated_reach. Here's exactly how each is calculated.

Computed fields

Most APIs hand you the raw numbers and walk away. SocialCrawl runs every payload through a transformer that attaches a computed block alongside the upstream data — the same shape on every platform, so a TikTok creator's engagement rate is directly comparable to an Instagram one.

Four fields live in computed:

| Field | Type | Range | When it's `null` |
| --- | --- | --- | --- |
| `engagement_rate` | `number \| null` | `0.0` – `1.0` | Divisor is missing or zero |
| `language` | `string \| null` | ISO 639-1 (`en`, `ko`, `ja`, ...) | Input text < 10 chars, or unrecognised |
| `content_category` | `string \| null` | 14 categories or `"other"` | Input text < 10 chars |
| `estimated_reach` | `number \| null` | integer ≥ 0 | Underlying `engagement_rate` is `null` |

Every value is either a real number or honestly null. We never substitute 0 for "we don't know" — the difference matters when you're sorting or filtering. If a value was forced into range (typically engagement_rate exceeding 1.0), an explanatory string lands in data._warnings so you can see it happened.
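Because `null` and `0` mean different things, client-side sorting needs to treat them differently. The following is a minimal sketch, assuming each item is a parsed JSON object with an optional `computed.engagement_rate`; the helper name `sort_by_engagement` is hypothetical and not part of the API.

```python
from typing import Any, Dict, List


def sort_by_engagement(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort descending by computed.engagement_rate, keeping null rates last."""

    def key(item: Dict[str, Any]):
        rate = (item.get("computed") or {}).get("engagement_rate")
        # False sorts before True, so items with a real rate (including 0.0)
        # come before items whose rate is unknown.
        return (rate is None, -(rate or 0.0))

    return sorted(items, key=key)
```

A rate of `0.0` stays ahead of a `null` rate here, which is the distinction the paragraph above describes.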

Computed fields attach to two archetypes:

Author — data.computed on profile responses.
Post — data.computed on single-post responses and on every item of a PostList.

CommentList, Audience, Transcript, and SearchResult archetypes don't carry a computed block.

`engagement_rate`

A normalised, comparable engagement signal in the range [0.0, 1.0]. Rounded to 6 decimals.

Author variant (profiles)

engagement_rate = author.likes_count / author.followers

Returns null when followers is 0 or likes_count is absent. "Zero engagement" and "we don't have the data" are different states; we report the second one honestly.
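As a rough illustration of the author formula and its null rules, here is a sketch in Python. The function name is hypothetical, the `[0, 1]` clamp described under "Clamping" is deliberately left out, and treating a missing `followers` like `0` is an assumption.

```python
from typing import Optional


def author_engagement_rate(
    likes_count: Optional[float], followers: Optional[float]
) -> Optional[float]:
    """likes_count / followers, or None when the inputs cannot support a rate."""
    # Missing likes or a zero (or missing) divisor means "unknown", not 0.
    if likes_count is None or not followers:
        return None
    return round(likes_count / followers, 6)
```

Note that `likes_count = 0` with a non-zero `followers` yields `0.0`, a real value, while an absent `likes_count` yields `None`.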

Instagram fallback. Instagram's profile payload never populates author.likes_count. When you call a profile endpoint on Instagram, we instead read up to ~12 recent posts embedded in the same response and compute:

engagement_rate = (mean(post_likes) + mean(post_comments)) / followers

A post counts only if both likes_count and comments_count are present. Zero usable posts → null. This fallback is Instagram-only today.
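The fallback can be sketched as follows. This assumes the embedded posts arrive as a list of dicts with `likes_count` and `comments_count` keys and that "up to ~12" means the first 12 entries; both are assumptions for illustration, as is the function name.

```python
from statistics import mean
from typing import Any, Dict, List, Optional


def instagram_fallback_rate(
    posts: List[Dict[str, Any]], followers: Optional[int]
) -> Optional[float]:
    """(mean(post_likes) + mean(post_comments)) / followers over usable posts."""
    if not followers:
        return None
    # A post is usable only if both counts are present.
    usable = [
        p for p in posts[:12]
        if p.get("likes_count") is not None and p.get("comments_count") is not None
    ]
    if not usable:
        return None
    likes = mean([p["likes_count"] for p in usable])
    comments = mean([p["comments_count"] for p in usable])
    return round((likes + comments) / followers, 6)
```

In this sketch a post missing either count is dropped from both means, so the two averages are always taken over the same set of posts.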

Post variant (single posts and PostList items)

engagement_rate = (likes + comments + shares) / views

Returns null when views is missing or 0. This is intentional: for pre-views-era tweets and for platforms that don't report views, null is the only honest answer. The previous implementation fell back to divisor = 1, which silently surfaced the numerator (e.g. 26,573) as an engagement rate. That's worse than null.

One formula, everywhere. This is the same calculation on every platform. There is no per-platform variant of the post-level formula. What changes between platforms is not the formula, it's which inputs the platform exposes:

likes, comments, shares are treated as 0 when absent. If a platform does not report a shares count (Instagram and YouTube do not), the numerator simply becomes likes + comments. We do not invent a shares number, and we do not null the whole rate just because one addend is missing. This is the single most common reason a hand-recomputed rate won't match ours: if you reproduce the formula assuming a non-null shares on Instagram or YouTube, your result will differ from the value we return, because on our side shares contributed 0.
views is treated as null when absent, and a null divisor makes the whole rate null. Unlike the addends, a missing views is not coerced to 0 (dividing by zero is meaningless), so the rate is honestly null.
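The asymmetry between the addends and the divisor can be sketched like this. The function name and keyword arguments are illustrative only, and the `[0, 1]` clamp is omitted for brevity.

```python
from typing import Optional


def post_engagement_rate(
    likes: Optional[int] = None,
    comments: Optional[int] = None,
    shares: Optional[int] = None,
    views: Optional[int] = None,
) -> Optional[float]:
    """(likes + comments + shares) / views with the null rules described above."""
    # A missing or zero divisor makes the whole rate unknown.
    if not views:
        return None
    # Missing addends contribute 0 instead of nulling the rate.
    numerator = (likes or 0) + (comments or 0) + (shares or 0)
    return round(numerator / views, 6)
```

For an Instagram Reel with 120 likes, 30 comments, no `shares` field and 3,000 views, this gives `150 / 3000 = 0.05`, the same value as passing `shares=0` explicitly.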

Why your recomputation might not match ours (per-platform)

If you validate our engagement_rate by recomputing it from the raw fields, this table explains every case of disagreement. In each case the value we return is correct; the mismatch comes from an input that is missing on your side, on ours, or on both.

| Platform | Reproducible from `(likes+comments+shares)/views`? | Why |
| --- | --- | --- |
| TikTok | Yes, exactly | All five engagement fields (`views, likes, comments, shares, saves`) are present. Your recomputation will match. |
| Twitter/X | Yes, on rows where `engagement_rate` is not `null` | Where we return a rate, it reproduces exactly. A large share of tweets return `engagement_rate: null` because Twitter did not report a native `views` count for that tweet (older tweets and many replies). `null` here means "no view count available", not "zero engagement". Filter on `engagement_rate IS NOT NULL` before using it as a feature. |
| Instagram | No | `shares` (and `saves`) are not exposed on the public upstream, so on our side `shares = 0` and the rate is effectively `(likes + comments) / views`. Photo posts also have no view count, so their rate is `null`; Reels have views and do get a rate. A shares-inclusive reproduction cannot match either branch. Use our value directly. |
| YouTube | No | Same mechanism as Instagram: `shares` is not returned, so the rate is `(likes + comments) / views`. |
| Facebook | Partially | We map the upstream `like_count` verbatim into `likes`. That field appears to be the aggregate reaction count (like + love + haha + ...), not likes alone, so our numerator can run higher than a like-only reproduction. We are confirming this upstream; treat our Facebook rate as reaction-inclusive for now. |
| YouTube livestreams | No (rate is `0` or `null`) | The live/upcoming list shape carries a view count but no like or comment counts, so the numerator is `0` (rate `0`) or the view count is absent pre-broadcast (rate `null`). Hydrate a finished stream through the single-video endpoint to get full engagement. |
| Reddit, Threads (list items) | No (rate is `null`) | Neither exposes a per-post `views` count on list/feed items, so no rate is possible. For Reddit, use `score` (upvotes) as the engagement proxy. A single Threads post fetched by URL does carry a view count and will get a rate. |

The small share of rows where the raw math exceeds 1.0 is clamped to 1.0 with a _warnings note (see below); this affects a fraction of a percent of rows, so it has a negligible effect on aggregate statistics.

Clamping to `[0, 1]`

If the raw computation exceeds 1.0 (common when likes_count > followers on accounts that lost followers, or when an old tweet's likes count outweighs its under-reported views), the field is pinned to 1.0 and a warning is appended:

{
  "data": {
    "computed": { "engagement_rate": 1.0 },
    "_warnings": [
      "computed.engagement_rate: value exceeded 1.0 (raw: 1.42); clamped"
    ]
  }
}

The defensive lower clamp at 0 exists for the same reason (keeping the field inside [0, 1]) but rarely fires, since our arithmetic can't produce a negative from non-negative inputs.
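A minimal sketch of the clamp, assuming the warning text follows the format in the JSON example above and that the lower clamp does not emit a warning (the source does not say either way). `clamp_rate` is a hypothetical name.

```python
from typing import List


def clamp_rate(raw: float, warnings: List[str]) -> float:
    """Pin raw into [0.0, 1.0]; record a warning when the upper bound is hit."""
    if raw > 1.0:
        warnings.append(
            f"computed.engagement_rate: value exceeded 1.0 (raw: {raw:.2f}); clamped"
        )
        return 1.0
    if raw < 0.0:
        # Defensive only: non-negative counts cannot produce a negative ratio.
        return 0.0
    return round(raw, 6)
```

Passing the warnings list in explicitly keeps the function free of hidden state, so the caller decides where the messages end up (here, `data._warnings`).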

`language`

ISO 639-1 two-letter code (en, ko, ja, ...). Region-tagged forms such as pt-BR are not used; we emit pt.

Input on Author: author.bio.
Input on Post: post.content.text.
Returns null if the input text is missing, non-string, or shorter than 10 trimmed characters. The 10-character floor is a confidence gate — anything shorter is more likely to misclassify than help you.

Detection strategy

Two passes:

Unicode fast-path for non-Latin scripts. If the input contains characters in these ranges, we return the language code directly:
- Korean (가–힯, Hangul Jamo) → ko
- Japanese (Hiragana / Katakana) → ja
- CJK Unified Ideographs → zh
- Arabic → ar
- Devanagari → hi
- Thai → th
Trigram classification via franc-min for everything else. franc returns und when below its internal confidence threshold; we map that to null rather than guessing.
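The length gate and the Unicode fast-path can be approximated as below. The code-point ranges are the standard Unicode blocks for each script, and checking them in the order listed above is an assumption; the trigram pass is not reproduced, so Latin-script text simply falls through to `None` in this sketch.

```python
import re
from typing import Optional

# (pattern, language code), checked in the order the list above gives them.
SCRIPT_RULES = [
    (re.compile(r"[\uac00-\ud7af\u1100-\u11ff]"), "ko"),  # Hangul Syllables block, Hangul Jamo
    (re.compile(r"[\u3040-\u30ff]"), "ja"),               # Hiragana, Katakana
    (re.compile(r"[\u4e00-\u9fff]"), "zh"),               # CJK Unified Ideographs
    (re.compile(r"[\u0600-\u06ff]"), "ar"),               # Arabic
    (re.compile(r"[\u0900-\u097f]"), "hi"),               # Devanagari
    (re.compile(r"[\u0e00-\u0e7f]"), "th"),               # Thai
]


def script_fast_path(text: object) -> Optional[str]:
    """Return a code for non-Latin scripts, or None if the gate or rules miss."""
    # Confidence gate: missing, non-string, or under 10 trimmed characters.
    if not isinstance(text, str) or len(text.strip()) < 10:
        return None
    for pattern, code in SCRIPT_RULES:
        if pattern.search(text):
            return code
    return None  # a trigram classifier would take over from here
```

With this ordering, Japanese text that mixes kana and kanji resolves to `ja` because the kana rule is checked before the CJK ideograph rule.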

Supported codes

If franc's ISO 639-3 code maps to one of the codes below, you get the two-letter form. Anything else collapses to null — we'd rather emit nothing than leak an obscure 3-letter code through the public surface.

ar  bg  ca  cs  da  de  el  en  es  fa  fi  fr  he  hi  hu
id  it  ja  ko  nl  no  pl  pt  ro  ru  sv  th  tr  uk  vi  zh

31 codes total. If you need a language we don't surface, let us know.
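The collapse from three-letter to two-letter codes might look like the sketch below. The `ISO3_TO_ISO1` table is a deliberately partial, illustrative subset of ISO 639-3 to ISO 639-1 pairs and not the service's actual mapping; `to_public_code` is a hypothetical name.

```python
from typing import Optional

# The two-letter codes listed above.
SUPPORTED = set(
    "ar bg ca cs da de el en es fa fi fr he hi hu "
    "id it ja ko nl no pl pt ro ru sv th tr uk vi zh".split()
)

# Partial ISO 639-3 -> ISO 639-1 table, for illustration only.
ISO3_TO_ISO1 = {
    "eng": "en", "spa": "es", "fra": "fr", "deu": "de", "por": "pt",
    "ita": "it", "nld": "nl", "rus": "ru", "tur": "tr", "vie": "vi",
    "ind": "id", "pol": "pl", "swe": "sv", "ukr": "uk",
}


def to_public_code(code3: str) -> Optional[str]:
    """Map a three-letter code to a supported two-letter code, else None."""
    code1 = ISO3_TO_ISO1.get(code3)
    # "und" and any unmapped or unsupported language collapse to None.
    return code1 if code1 in SUPPORTED else None
```

Anything outside the table, including the classifier's `und`, comes back as `None` and never reaches the public surface as a three-letter code.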

`content_category`

One of 14 hand-curated categories or "other". Returns null if the input text is missing or < 10 chars (same gate as language).

| Category | Sample matched keywords |
| --- | --- |
| `tech` | programming, developer, software, ai, saas, blockchain, frontend |
| `food` | cooking, recipe, chef, restaurant, baking, vegan |
| `gaming` | gaming, esports, twitch, fortnite, valorant, fps |
| `fashion` | fashion, outfit, designer, ootd, streetwear |
| `beauty` | makeup, skincare, lipstick, serum, moisturizer |
| `fitness` | workout, gym, cardio, yoga, marathon, protein |
| `travel` | adventure, destination, vacation, backpacking, wanderlust |
| `music` | song, artist, album, concert, producer, spotify |
| `education` | learning, course, tutorial, university, lecture |
| `entertainment` | movie, tv, netflix, celebrity, comedy, series |
| `sports` | football, basketball, nba, olympics, championship |
| `business` | entrepreneur, ceo, marketing, finance, fundraising |
| `news` | politics, economy, election, journalist, parliament |
| `lifestyle` | wellness, mindfulness, productivity, minimalism, diy |
| `other` | matched no keywords (but the input was long enough to evaluate) |

Matching rules

Short keywords (≤3 characters, single word) — e.g. "ai", "tv", "dj" — require an exact token match. They match "building with ai" but not "hair" or "said".
Longer or multi-word keywords use a Unicode word-boundary regex. "machine learning" matches inside "I love machine learning!" but doesn't bleed into adjacent words.
The highest-scoring category wins. Ties resolve to whichever category was scored first (alphabetic by category name as authored in the source).
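These rules can be sketched with a tiny keyword subset. Several details are assumptions made for illustration: the score is the number of distinct keywords matched, categories are scored in alphabetical order, and input is lowercased before matching. The real keyword lists are much longer than the three shown.

```python
import re
from typing import Optional

# Tiny illustrative subset of the keyword table.
KEYWORDS = {
    "tech": ["programming", "developer", "software", "ai", "machine learning"],
    "food": ["cooking", "recipe", "chef"],
    "entertainment": ["movie", "tv", "netflix"],
}


def categorize(text: object) -> Optional[str]:
    """Return the best-scoring category, "other", or None below the length gate."""
    if not isinstance(text, str) or len(text.strip()) < 10:
        return None
    lowered = text.lower()
    tokens = set(re.findall(r"\w+", lowered))
    best, best_score = "other", 0
    for category in sorted(KEYWORDS):  # assumed scoring order
        score = 0
        for kw in KEYWORDS[category]:
            if len(kw) <= 3 and " " not in kw:
                # Short keywords need an exact token match.
                score += 1 if kw in tokens else 0
            elif re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", lowered):
                # Longer or multi-word keywords match on word boundaries.
                score += 1
        if score > best_score:  # strict ">" keeps the first-scored category on ties
            best, best_score = category, score
    return best
```

The exact-token rule is what keeps `ai` from matching inside `hair` or `said`, while the lookaround pattern lets `machine learning` match next to punctuation.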

If you build product on top of content_category, treat it as a rough first-pass classifier, not a taxonomy. It's keyword-based — fast, deterministic, and noisy. For nuanced classification, layer your own model on top of the bio/text fields.

`estimated_reach`

A rough upper-bound estimate of how many distinct accounts a post or profile is reaching. Always an integer ≥ 0, or null.

Author variant

estimated_reach = round(followers * engagement_rate * 0.1)

The * 0.1 factor is a conservative damping based on typical impression-to-reach ratios. Because engagement_rate is clamped to [0, 1], the output is bounded by followers / 10 — there's no extra ceiling to apply.

Returns null whenever engagement_rate is null. We don't manufacture a reach number from a missing engagement signal.

Post variant

estimated_reach = round(views * 1.2)

For posts, views is already a stronger reach signal than impressions, so we apply a modest multiplier to estimate unique-account reach (assuming a small fraction of repeat views).

Returns null whenever views is missing or 0.
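Both reach variants fit in a few lines. This sketch uses Python's built-in `round`, which rounds exact halves to the nearest even integer; whether the service rounds halves the same way is not stated, so treat that as an assumption. The function names are hypothetical.

```python
from typing import Optional


def author_estimated_reach(
    followers: Optional[int], engagement_rate: Optional[float]
) -> Optional[int]:
    """round(followers * engagement_rate * 0.1), or None without a rate."""
    if followers is None or engagement_rate is None:
        return None
    return round(followers * engagement_rate * 0.1)


def post_estimated_reach(views: Optional[int]) -> Optional[int]:
    """round(views * 1.2), or None when views is missing or 0."""
    if not views:
        return None
    return round(views * 1.2)
```

With `followers = 95_000_000` and a clamped rate of `1.0`, the author variant gives `9_500_000`, which is the `followers / 10` bound mentioned above.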

Caveats

This is a heuristic, not a measurement. It's useful for:

Sorting posts or creators by approximate reach when the platform doesn't expose reach directly.
Estimating order-of-magnitude impact for influencer outreach.

It is not useful for:

Forecasting paid-media ROI.
Comparing reach across platforms with very different view-counting rules (TikTok's auto-loop views vs YouTube's 30-second threshold, for example).

Example response

{
  "success": true,
  "platform": "tiktok",
  "endpoint": "/v1/tiktok/profile",
  "data": {
    "author": {
      "username": "mrbeast",
      "followers": 95000000,
      "likes_count": 8200000000,
      "bio": "I want to make the world a better place before I die."
    },
    "computed": {
      "engagement_rate": 1.0,
      "language": "en",
      "content_category": "lifestyle",
      "estimated_reach": 9500000
    },
    "_warnings": [
      "computed.engagement_rate: value exceeded 1.0 (raw: 86.32); clamped"
    ]
  },
  "credits_used": 1,
  "credits_remaining": 8431
}

(The likes_count / followers ratio is unrealistically high here because TikTok reports cumulative lifetime hearts against current followers — a known idiosyncrasy that triggers the clamp warning.)

When you shouldn't trust the value

Three patterns mean "look at the warnings before using this number":

_warnings mentions clamped — the raw value blew through [0, 1]. Decide whether you want the clamped value or to recompute from the raw upstream fields yourself.
computed.engagement_rate is null on a Post archetype — the post lacks a views field. Pre-2020 tweets, some Reddit endpoints, and a few Facebook surfaces are common offenders.
computed.language is null on a clearly-textual bio — the text was probably under 10 characters or was an emoji-only string. Read author.bio directly to confirm.
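On the consumer side, the first two checks can be folded into one guard. This is a sketch only: it assumes `data` is the parsed `data` object of a response and that clamp warnings always start with `computed.engagement_rate` and contain the word `clamped`, as in the examples on this page. `usable_engagement_rate` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def usable_engagement_rate(
    data: Dict[str, Any], allow_clamped: bool = False
) -> Optional[float]:
    """Return the rate only when it is present and, by default, not clamped."""
    rate = (data.get("computed") or {}).get("engagement_rate")
    if rate is None:
        return None
    clamped = any(
        w.startswith("computed.engagement_rate") and "clamped" in w
        for w in (data.get("_warnings") or [])
    )
    if clamped and not allow_clamped:
        return None
    return rate
```

Set `allow_clamped=True` when a pinned `1.0` is acceptable, for example when ranking, and leave it off when the exact magnitude matters.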
