Computed fields
SocialCrawl profile and post responses carry a `computed` block: engagement_rate, language, content_category, estimated_reach. Here's exactly how each is calculated.
Most APIs hand you the raw numbers and walk away. SocialCrawl runs every payload through a transformer that attaches a computed block alongside the upstream data — the same shape on every platform, so a TikTok creator's engagement rate is directly comparable to an Instagram one.
Four fields live in computed:
| Field | Type | Range | When it's null |
|---|---|---|---|
| `engagement_rate` | `number \| null` | 0.0 – 1.0 | Divisor is missing or zero |
| `language` | `string \| null` | ISO 639-1 (`en`, `ko`, `ja`, ...) | Input text < 10 chars, or unrecognised |
| `content_category` | `string \| null` | 14 categories or `"other"` | Input text < 10 chars |
| `estimated_reach` | `number \| null` | integer ≥ 0 | Underlying `engagement_rate` is null |
Every value is either a real number or honestly null. We never substitute 0 for "we don't know" — the difference matters when you're sorting or filtering. If a value was forced into range (typically engagement_rate exceeding 1.0), an explanatory string lands in data._warnings so you can see it happened.
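If you sort on a computed field client-side, it helps to keep `null` out of the numeric comparison instead of coercing it to 0. A minimal Python sketch, assuming each item is a dict shaped like the `computed` block described here; the helper name `sort_by_engagement` is made up for illustration and is not part of any SocialCrawl SDK:

```python
from typing import Any, Dict, List


def sort_by_engagement(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Highest engagement first; items with an unknown (None) rate go last."""
    def key(item: Dict[str, Any]):
        rate = item["computed"]["engagement_rate"]
        # Tuple sort: known rates (False) sort before unknown ones (True),
        # then by descending rate. A real 0.0 still ranks above None.
        return (rate is None, -(rate or 0.0))

    return sorted(items, key=key)
```

With this ordering, a creator with a genuine rate of 0.0 is listed ahead of one whose rate could not be computed.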
Computed fields attach to two archetypes:
- `Author`: `data.computed` on profile responses.
- `Post`: `data.computed` on single-post responses and on every item of a `PostList`.
CommentList, Audience, Transcript, and SearchResult archetypes don't carry a computed block.
engagement_rate
A normalised, comparable engagement signal in the range [0.0, 1.0]. Rounded to 6 decimals.
Author variant (profiles)

    engagement_rate = author.likes_count / author.followers

Returns null when followers is 0 or likes_count is absent. "Zero engagement" and "we don't have the data" are different states; we report the second one honestly.
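As a rough illustration of the author formula and its null rule, here is a minimal Python sketch. The function name is hypothetical, `None` stands in for JSON `null`, and clamping to [0, 1] is deliberately left out:

```python
from typing import Optional


def author_engagement_rate(likes_count: Optional[int],
                           followers: Optional[int]) -> Optional[float]:
    """Raw author-level rate rounded to 6 decimals; None means "unknown"."""
    if likes_count is None or not followers:
        # Absent likes, or a missing/zero divisor: report unknown, not 0.
        return None
    return round(likes_count / followers, 6)
```

Note that `likes_count == 0` with a non-zero follower count yields `0.0`, which is a real measurement and not the same thing as `None`.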
Instagram fallback. Instagram's profile payload never populates author.likes_count. When you call a profile endpoint on Instagram, we instead read up to ~12 recent posts embedded in the same response and compute:

    engagement_rate = (mean(post_likes) + mean(post_comments)) / followers

A post counts only if both likes_count and comments_count are present. Zero usable posts → null. This fallback is Instagram-only today.
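The fallback can be sketched in a few lines of Python. This assumes the embedded posts arrive as a list of dicts carrying `likes_count` and `comments_count`; the function name and the list shape are illustrative assumptions, and clamping is again omitted:

```python
from statistics import mean
from typing import Any, Dict, List, Optional


def instagram_fallback_rate(posts: List[Dict[str, Any]],
                            followers: Optional[int]) -> Optional[float]:
    """(mean likes + mean comments) / followers over usable posts only."""
    if not followers:
        return None
    # A post is usable only when both counters are present.
    usable = [p for p in posts
              if p.get("likes_count") is not None
              and p.get("comments_count") is not None]
    if not usable:
        return None  # zero usable posts: unknown
    avg_likes = mean(p["likes_count"] for p in usable)
    avg_comments = mean(p["comments_count"] for p in usable)
    return round((avg_likes + avg_comments) / followers, 6)
```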
Post variant (single posts and PostList items)

    engagement_rate = (likes + comments + shares) / views

Returns null when views is missing or 0. This is intentional: for pre-views-era tweets and platforms that don't report views, null is the only honest answer. The previous implementation fell back to divisor = 1, which silently surfaced the numerator (e.g. 26,573) as an engagement rate. That's worse than null.
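A minimal Python sketch of the post formula, showing why a missing divisor returns `None` instead of leaking the numerator. The function name is hypothetical, and treating a missing numerator counter as 0 is an assumption that the text above does not confirm:

```python
from typing import Optional


def post_engagement_rate(likes: Optional[int], comments: Optional[int],
                         shares: Optional[int],
                         views: Optional[int]) -> Optional[float]:
    """Raw post-level rate rounded to 6 decimals; None means "unknown"."""
    if not views:
        # Missing or zero views: never fall back to a divisor of 1.
        return None
    # Assumption: an absent counter in the numerator counts as 0.
    interactions = (likes or 0) + (comments or 0) + (shares or 0)
    return round(interactions / views, 6)
```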
Clamping to [0, 1]
If the raw computation exceeds 1.0 (common when likes_count > followers on accounts that lost followers, or when an old tweet's likes count outweighs its under-reported views), the field is pinned to 1.0 and a warning is appended:

    {
      "data": {
        "computed": { "engagement_rate": 1.0 },
        "_warnings": ["computed.engagement_rate: value exceeded 1.0 (raw: 1.42); clamped"]
      }
    }

The defensive lower clamp at 0 exists for the same reason but rarely fires: our arithmetic can't produce negatives from positive inputs.
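The clamp-and-warn behaviour might look like the following Python sketch. The upper-bound warning text mirrors the example above; the wording of the lower-bound warning is not documented, so the string used here is invented for illustration, as is the function name:

```python
from typing import List, Optional


def clamp_rate(raw: Optional[float], warnings: List[str]) -> Optional[float]:
    """Pin a raw rate into [0, 1], recording a warning when it was forced."""
    if raw is None:
        return None
    if raw > 1.0:
        warnings.append(
            f"computed.engagement_rate: value exceeded 1.0 (raw: {raw:.2f}); clamped"
        )
        return 1.0
    if raw < 0.0:
        # Hypothetical wording: the real lower-bound message is not documented.
        warnings.append(
            f"computed.engagement_rate: value below 0.0 (raw: {raw:.2f}); clamped"
        )
        return 0.0
    return round(raw, 6)
```

Passing the warnings list in explicitly keeps the function free of hidden state, so the caller decides where the strings end up.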
language
ISO 639-1 two-letter code (`en`, `ko`, `ja`). Regional variants such as `pt-BR` are not used; we emit `pt`.
- Input on `Author`: `author.bio`.
- Input on `Post`: `post.content.text`.
- Returns `null` if the input text is missing, non-string, or shorter than 10 trimmed characters. The 10-character floor is a confidence gate: anything shorter is more likely to misclassify than help you.
Detection strategy
Two passes:
- Unicode fast-path for non-Latin scripts. If the input contains characters in these ranges, we return the language code directly:
  - Korean (가–, Hangul Jamo) → `ko`
  - Japanese (Hiragana / Katakana) → `ja`
  - CJK Unified Ideographs → `zh`
  - Arabic → `ar`
  - Devanagari → `hi`
  - Thai → `th`
- Trigram classification via `franc-min` for everything else. franc returns `und` when below its internal confidence threshold; we map that to `null` rather than guessing.
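To make the first pass concrete, here is a Python sketch of the 10-character gate plus a Unicode range check. The block boundaries are the standard Unicode blocks for each script and are an assumption about which ranges the service actually tests; checking scripts in the order listed (so kana wins over ideographs in mixed Japanese text) is also an assumption. The trigram pass is left out, and `fast_path_language` is a hypothetical name:

```python
from typing import Optional

# (first code point, last code point, language code), checked in listed order.
# Boundaries are standard Unicode blocks, assumed rather than confirmed.
SCRIPT_RANGES = [
    (0xAC00, 0xD7A3, "ko"),  # Hangul Syllables
    (0x1100, 0x11FF, "ko"),  # Hangul Jamo
    (0x3040, 0x309F, "ja"),  # Hiragana
    (0x30A0, 0x30FF, "ja"),  # Katakana
    (0x4E00, 0x9FFF, "zh"),  # CJK Unified Ideographs
    (0x0600, 0x06FF, "ar"),  # Arabic
    (0x0900, 0x097F, "hi"),  # Devanagari
    (0x0E00, 0x0E7F, "th"),  # Thai
]


def fast_path_language(text: object) -> Optional[str]:
    """Return a code from the script fast-path, or None if it does not apply."""
    # Confidence gate: missing, non-string, or under 10 trimmed characters.
    if not isinstance(text, str) or len(text.strip()) < 10:
        return None
    for first, last, code in SCRIPT_RANGES:
        if any(first <= ord(ch) <= last for ch in text):
            return code
    return None  # Latin-script text would continue to the trigram pass
```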
Supported codes
If franc's ISO 639-3 code maps to one of the codes below, you get the two-letter form. Anything else collapses to null — we'd rather emit nothing than leak an obscure 3-letter code through the public surface.

    ar bg ca cs da de el en es fa fi fr he hi hu
    id it ja ko nl no pl pt ro ru sv th tr uk vi zh

31 codes total. If you need a language we don't surface, let us know.
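The collapse-to-null rule can be sketched as a lookup with no fallthrough. The table below is a partial, illustrative subset of ISO 639-3 to ISO 639-1 pairs, not the service's actual mapping, and `to_public_code` is a hypothetical name:

```python
from typing import Optional

# Partial, illustrative ISO 639-3 -> ISO 639-1 table. A complete version
# would have one entry for every supported two-letter code.
ISO3_TO_ISO1 = {
    "eng": "en", "spa": "es", "fra": "fr", "deu": "de", "por": "pt",
    "ita": "it", "nld": "nl", "rus": "ru", "tur": "tr", "vie": "vi",
}


def to_public_code(iso3: Optional[str]) -> Optional[str]:
    """Map a classifier result to a two-letter code, or None if unsupported."""
    if iso3 is None or iso3 == "und":
        return None  # classifier was not confident enough
    # Unknown three-letter codes collapse to None instead of leaking through.
    return ISO3_TO_ISO1.get(iso3)
```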
content_category
One of 14 hand-curated categories or "other". Returns null if the input text is missing or < 10 chars (same gate as language).
| Category | Sample matched keywords |
|---|---|
| `tech` | programming, developer, software, ai, saas, blockchain, frontend |
| `food` | cooking, recipe, chef, restaurant, baking, vegan |
| `gaming` | gaming, esports, twitch, fortnite, valorant, fps |
| `fashion` | fashion, outfit, designer, ootd, streetwear |
| `beauty` | makeup, skincare, lipstick, serum, moisturizer |
| `fitness` | workout, gym, cardio, yoga, marathon, protein |
| `travel` | adventure, destination, vacation, backpacking, wanderlust |
| `music` | song, artist, album, concert, producer, spotify |
| `education` | learning, course, tutorial, university, lecture |
| `entertainment` | movie, tv, netflix, celebrity, comedy, series |
| `sports` | football, basketball, nba, olympics, championship |
| `business` | entrepreneur, ceo, marketing, finance, fundraising |
| `news` | politics, economy, election, journalist, parliament |
| `lifestyle` | wellness, mindfulness, productivity, minimalism, diy |
| `other` | matched no keywords (but the input was long enough to evaluate) |
Matching rules
- Short keywords (≤3 characters, single word), e.g. `"ai"`, `"tv"`, `"dj"`, require an exact token match. They match `"building with ai"` but not `"hair"` or `"said"`.
- Longer or multi-word keywords use a Unicode word-boundary regex. `"machine learning"` matches inside `"I love machine learning!"` but doesn't bleed into adjacent words.
- The highest-scoring category wins. Ties resolve to whichever category was scored first (alphabetic by category name as authored in the source).
If you build product on top of content_category, treat it as a rough first-pass classifier, not a taxonomy. It's keyword-based — fast, deterministic, and noisy. For nuanced classification, layer your own model on top of the bio/text fields.
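The matching rules can be approximated with the Python sketch below. It uses a tiny two-category keyword table in place of the real one, counts one point per matched keyword (the actual scoring scheme is not documented), and assumes `"machine learning"` belongs to `tech`. All names are illustrative:

```python
import re
from typing import Optional

# Tiny illustrative subset, in alphabetical category order.
KEYWORDS = {
    "entertainment": ["tv", "movie", "comedy"],
    "tech": ["ai", "software", "machine learning"],  # last entry is assumed
}


def categorize(text: object) -> Optional[str]:
    """Keyword-based first-pass category, "other", or None for short input."""
    if not isinstance(text, str) or len(text.strip()) < 10:
        return None
    lowered = text.lower()
    tokens = set(re.findall(r"\w+", lowered))
    best, best_score = "other", 0
    for category, keywords in KEYWORDS.items():
        score = 0
        for kw in keywords:
            if len(kw) <= 3 and " " not in kw:
                hit = kw in tokens  # short keyword: exact token match
            else:
                # Longer or multi-word keyword: word-boundary regex.
                hit = re.search(rf"\b{re.escape(kw)}\b", lowered) is not None
            score += int(hit)
        if score > best_score:  # strict ">" keeps the first category on ties
            best, best_score = category, score
    return best
```

Because the comparison is strict, a later category only wins by scoring strictly higher, which reproduces the "first scored wins" tie rule.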
estimated_reach
A rough upper-bound estimate of how many distinct accounts a post or profile is reaching. Always an integer ≥ 0, or null.
Author variant

    estimated_reach = round(followers * engagement_rate * 0.1)

The * 0.1 factor is a conservative damping based on typical impression-to-reach ratios. Because engagement_rate is clamped to [0, 1], the output is bounded by followers / 10, so there's no extra ceiling to apply.
Returns null whenever engagement_rate is null. We don't manufacture a reach number from a missing engagement signal.
Post variant

    estimated_reach = round(views * 1.2)

For posts, views is already a stronger reach signal than impressions, so we apply a modest multiplier to estimate unique-account reach (assuming a small fraction of repeat views).
Returns null whenever views is missing or 0.
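Both reach variants fit in a few lines of Python. This is a sketch with hypothetical function names. One caveat: Python's `round` uses round-half-to-even, which may differ from the service's rounding on exact `.5` values:

```python
from typing import Optional


def author_estimated_reach(followers: Optional[int],
                           engagement_rate: Optional[float]) -> Optional[int]:
    """round(followers * engagement_rate * 0.1), or None without a rate."""
    if followers is None or engagement_rate is None:
        return None  # no reach number from a missing engagement signal
    return round(followers * engagement_rate * 0.1)


def post_estimated_reach(views: Optional[int]) -> Optional[int]:
    """round(views * 1.2), or None when views is missing or 0."""
    if not views:
        return None
    return round(views * 1.2)
```

Since the author variant takes the already-clamped rate, its result cannot exceed `followers / 10`.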
Caveats
This is a heuristic, not a measurement. It's useful for:
- Sorting posts or creators by approximate reach when the platform doesn't expose reach directly.
- Estimating order-of-magnitude impact for influencer outreach.
It is not useful for:
- Forecasting paid-media ROI.
- Comparing reach across platforms with very different view-counting rules (TikTok's auto-loop views vs YouTube's 30-second threshold, for example).
Example response

    {
      "success": true,
      "platform": "tiktok",
      "endpoint": "/v1/tiktok/profile",
      "data": {
        "author": {
          "username": "mrbeast",
          "followers": 95000000,
          "likes_count": 8200000000,
          "bio": "I want to make the world a better place before I die."
        },
        "computed": {
          "engagement_rate": 1.0,
          "language": "en",
          "content_category": "lifestyle",
          "estimated_reach": 9500000
        },
        "_warnings": [
          "computed.engagement_rate: value exceeded 1.0 (raw: 86.32); clamped"
        ]
      },
      "credits_used": 1,
      "credits_remaining": 8431
    }

(The likes_count / followers ratio is unrealistically high here because TikTok reports cumulative lifetime hearts against current followers, a known idiosyncrasy that triggers the clamp warning.)
When you shouldn't trust the value
Three patterns mean "look at the warnings before using this number":
- `_warnings` mentions `clamped`: the raw value blew through `[0, 1]`. Decide whether you want the clamped value or to recompute from the raw upstream fields yourself.
- `computed.engagement_rate` is `null` on a `Post` archetype: the post lacks a `views` field. Pre-2020 tweets, some Reddit endpoints, and a few Facebook surfaces are common offenders.
- `computed.language` is `null` on a clearly-textual bio: the text was probably under 10 characters or was an emoji-only string. Read `author.bio` directly to confirm.
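A client could screen for these patterns before using a value. The Python sketch below takes the `data` object of a response as a plain dict; the function name and the flag strings are invented for illustration:

```python
from typing import Any, Dict, List


def trust_flags(data: Dict[str, Any]) -> List[str]:
    """List reasons to double-check the computed block of one response."""
    flags: List[str] = []
    computed = data.get("computed") or {}
    warnings = data.get("_warnings") or []
    if any("clamped" in w for w in warnings):
        flags.append("clamped")  # raw value was forced into [0, 1]
    if computed.get("engagement_rate") is None:
        flags.append("engagement_rate_null")  # divisor missing or zero
    if computed.get("language") is None:
        flags.append("language_null")  # short, emoji-only, or unrecognised text
    return flags
```

An empty list means none of the three patterns applied.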
See also
- Response schema → Standard envelope: where `computed` sits in the response.
- Response schema → `data._warnings[]`: how warning strings are surfaced.
- Endpoint pricing: every endpoint and what each call costs.
