Integrate once. Every platform returns the same shape.
Every social platform speaks a different language. SocialCrawl normalizes 42 platforms and 325 endpoints into one canonical schema, validated on every response, so you write one integration instead of a dozen.
Why is social data so hard to work with?
The same engagement number hides in a different place on every platform: Reddit calls it score, TikTok nests it under aweme_info.statistics, and Instagram returns reels flat in one endpoint and nested in another. A developer who wants the engagement on a post across five platforms normally writes five integrations and five sets of types.
How does SocialCrawl normalize every platform into one shape?
Every raw payload runs through the same five-stage pipeline before it reaches you. Nothing is guessed, and nothing undeclared slips through.
1. Strip the envelope. Unwrap each platform's raw payload and normalize lists to one { items, next_cursor, total } shape.
2. Field map. A declarative source-to-target rename. Subtractive by design, so junk and surprises never reach the customer.
3. Enrich hook. An additive per-platform step for what the map cannot express: absolute-URL construction, carousel flattening, boolean coercion.
4. Normalize + null backstop. Null backstops, ID-prefix stripping, tombstone collapse. We never substitute a zero for we-do-not-know.
5. Validate (Zod gate). Every response is checked against the canonical Zod schema at the wire. Invalid rows are dropped, never passed on.
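The five stages above can be sketched in a few small functions. This is a minimal illustration, not SocialCrawl's implementation: the raw payload shape, the field names (name, score, permalink), the t3_ ID prefix, the example.com base URL, and every function name are assumptions made for the example, and a hand-written type guard stands in for the Zod gate so the snippet has no dependencies.

```typescript
// Every payload shape and field name here is a hypothetical stand-in.
type Raw = Record<string, unknown>;
type Post = { id: string; url: string | null; like_count: number | null };
type Page<T> = { items: T[]; next_cursor: string | null; total: number | null };

// Stage 1: unwrap an assumed { data: { children, after } } envelope.
function stripEnvelope(payload: Raw): { rows: Raw[]; next_cursor: string | null } {
  const data = (payload.data ?? {}) as Raw;
  const children = data.children;
  const after = data.after;
  return {
    rows: Array.isArray(children) ? (children as Raw[]) : [],
    next_cursor: typeof after === "string" ? after : null,
  };
}

// Stage 2: declarative source-to-target rename. Anything unmapped is dropped.
const FIELD_MAP: Record<string, string> = {
  name: "id",
  score: "like_count",
  permalink: "url",
};

function applyFieldMap(row: Raw): Raw {
  const out: Raw = {};
  for (const [source, target] of Object.entries(FIELD_MAP)) {
    if (source in row) out[target] = row[source];
  }
  return out;
}

// Stage 3: additive enrich hook, here building an absolute URL.
function enrich(row: Raw): Raw {
  const url = row.url;
  if (typeof url === "string" && url.startsWith("/")) {
    return { ...row, url: "https://example.com" + url };
  }
  return row;
}

// Stage 4: strip the ID prefix and backstop missing values with null, never zero.
function normalize(row: Raw): Raw {
  const rawId = row.id;
  const id = typeof rawId === "string" ? rawId.replace(/^t3_/, "") : rawId;
  return { id, url: row.url ?? null, like_count: row.like_count ?? null };
}

// Stage 5: validation gate (stand-in for a Zod schema check).
function isPost(row: Raw): row is Post {
  const { id, url, like_count } = row;
  return (
    typeof id === "string" && id.length > 0 &&
    (url === null || typeof url === "string") &&
    (like_count === null || typeof like_count === "number")
  );
}

function normalizePage(payload: Raw): Page<Post> {
  const { rows, next_cursor } = stripEnvelope(payload);
  // Rows that fail the gate are dropped rather than passed on.
  const items = rows.map(applyFieldMap).map(enrich).map(normalize).filter(isPost);
  return { items, next_cursor, total: null };
}

const page = normalizePage({
  data: {
    after: "abc123",
    children: [
      { name: "t3_1a2b", score: 42, permalink: "/p/1a2b", tracking_blob: "x" },
      { name: "t3_3c4d", permalink: "/p/3c4d" }, // score missing: like_count stays null
      { score: 7 }, // no id: fails the gate and is dropped
    ],
  },
});
```

In this sketch the unmapped tracking_blob field never reaches the output, the row with a missing score keeps like_count as null, and the row without an ID is removed at the gate.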
“Junk and surprises never reach the customer.”
What does SocialCrawl add on top of the raw data?
Every response carries a deterministic computed block: an engagement_rate normalized from 0 to 1 and comparable across platforms, a detected language drawn from 33 ISO codes, a content_category, and an estimated_reach.
"computed": {
  "engagement_rate": 0.043,
  "language": "en",
  "content_category": "entertainment",
  "estimated_reach": 128400
}
These values are deterministic arithmetic, not machine learning. Below a confidence floor we return null rather than a guess, and we never substitute a zero for an unknown value.
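To make the null-instead-of-zero rule concrete, here is a small sketch of how an engagement rate could be computed. The formula (interactions divided by views, capped at 1) and the count field names are assumptions chosen for illustration; this page does not state the formula SocialCrawl actually uses.

```typescript
type Counts = {
  like_count: number | null;
  comment_count: number | null;
  share_count: number | null;
  view_count: number | null;
};

// Assumed formula, not SocialCrawl's: (likes + comments + shares) / views, capped at 1.
function engagementRate(counts: Counts): number | null {
  const views = counts.view_count;
  const parts = [counts.like_count, counts.comment_count, counts.share_count];

  // An unknown input makes the result unknown. A missing count is not a zero.
  if (views === null || parts.some((p) => p === null)) return null;
  // A non-positive denominator gives no meaningful ratio, so return null, not a guess.
  if (views <= 0) return null;

  const interactions = (parts as number[]).reduce((sum, p) => sum + p, 0);
  return Math.min(1, interactions / views);
}

const known = engagementRate({
  like_count: 400, comment_count: 20, share_count: 10, view_count: 10000,
});
const unknown = engagementRate({
  like_count: 400, comment_count: 20, share_count: null, view_count: 10000,
});
```

Under these assumptions, known is 430 / 10000 = 0.043, while unknown is null because the share count is missing. Treating that missing count as zero would understate engagement without any sign that the figure was incomplete.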
LLMs narrate, code computes.
How do you keep the schema and the docs from drifting apart?
One canonical schema, written once in Zod, is the source of truth for every endpoint on every platform. Rename a single field and the docs and validation update themselves.
One source of truth
The canonical schema lives once, in Zod. Every endpoint on every platform conforms to it, from PostObject to QuoteObject.
Mechanical cascade
One function walks the schema tree and drives the normalizer, the CI coverage gate, and the OpenAPI docs from the same source.
Validated on every response
Strict in CI, so drift fails the build. Forgiving in production, so a bad row is dropped and the customer never receives a malformed record.
Tombstones rejected
Deleted and removed sentinels fail a schema check, so a deleted comment can never masquerade as content in your response.
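The mechanical cascade can be illustrated with a single walker that feeds the docs, a coverage check, and validation. This is a sketch under stated assumptions: the tiny Schema type stands in for a real Zod schema tree, and the function names, the example fields, and the "[deleted]" and "[removed]" sentinel strings are all hypothetical.

```typescript
// Minimal stand-in for a schema tree. A real setup would walk the Zod schema instead.
type Leaf = { kind: "string" | "number"; nullable?: boolean };
type Schema = Leaf | { kind: "object"; fields: Record<string, Schema> };

const PostSchema: Schema = {
  kind: "object",
  fields: {
    id: { kind: "string" },
    like_count: { kind: "number", nullable: true },
    author: { kind: "object", fields: { handle: { kind: "string" } } },
  },
};

// One walker: every consumer below sees the same list of leaf paths.
function walk(schema: Schema, prefix = ""): { path: string; leaf: Leaf }[] {
  if (schema.kind !== "object") return [{ path: prefix, leaf: schema }];
  const out: { path: string; leaf: Leaf }[] = [];
  for (const [name, child] of Object.entries(schema.fields)) {
    out.push(...walk(child, prefix ? `${prefix}.${name}` : name));
  }
  return out;
}

// Consumer 1: documentation lines.
function docsLines(schema: Schema): string[] {
  return walk(schema).map(
    ({ path, leaf }) => `${path}: ${leaf.kind}${leaf.nullable ? " | null" : ""}`,
  );
}

// Consumer 2: coverage gate listing schema paths the normalizer does not produce.
function uncoveredPaths(schema: Schema, mapped: string[]): string[] {
  const have = new Set(mapped);
  return walk(schema).map((f) => f.path).filter((p) => !have.has(p));
}

function getPath(row: unknown, path: string): unknown {
  let current: unknown = row;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Assumed sentinel values for deleted or removed content.
const TOMBSTONES = new Set(["[deleted]", "[removed]"]);

// Consumer 3: validation, with tombstones failing the check.
function conforms(schema: Schema, row: unknown): boolean {
  return walk(schema).every(({ path, leaf }) => {
    const value = getPath(row, path);
    if (value === null) return leaf.nullable === true;
    if (typeof value === "string" && TOMBSTONES.has(value)) return false;
    return typeof value === leaf.kind;
  });
}

// Strict in CI (throw on any bad row), forgiving in production (drop bad rows).
function gate(schema: Schema, rows: unknown[], mode: "ci" | "production"): unknown[] {
  const valid = rows.filter((row) => conforms(schema, row));
  if (mode === "ci" && valid.length !== rows.length) {
    throw new Error(`${rows.length - valid.length} row(s) violate the schema`);
  }
  return valid;
}

const good = { id: "p1", like_count: null, author: { handle: "nasa" } };
const tombstone = { id: "p2", like_count: 3, author: { handle: "[deleted]" } };
const served = gate(PostSchema, [good, tombstone], "production");
```

Because docsLines, uncoveredPaths, and conforms all call walk, renaming a field in PostSchema changes all three at once. That is the property the cascade relies on.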
“The documentation cannot drift from the implementation, because both derive from the same source.”
Does the unified schema cover more than social posts?
The same discipline extends across every object we return. A Google Play app and an App Store app deserialize into the same AppObject. QuoteObject spans stocks, ETFs, crypto, and forex in one shape.
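As an illustration of two stores landing in one object, the sketch below maps two differently shaped app records into a single AppObject. The AppObject fields, the raw field names for each store, and the toAppObject helper are assumptions for the example, not the published schema.

```typescript
type Raw = Record<string, unknown>;
type Store = "google_play" | "app_store";

// Hypothetical subset of an AppObject. The real schema is not shown on this page.
type AppObject = { id: string; name: string; rating: number | null; store: Store };

// One declarative map per store. The source field names are illustrative.
const APP_FIELD_MAPS: Record<Store, Record<string, "id" | "name" | "rating">> = {
  google_play: { appId: "id", title: "name", score: "rating" },
  app_store: { bundleId: "id", trackName: "name", averageUserRating: "rating" },
};

function toAppObject(store: Store, raw: Raw): AppObject | null {
  const mapped: Raw = {};
  for (const [source, target] of Object.entries(APP_FIELD_MAPS[store])) {
    mapped[target] = raw[source];
  }
  const { id, name, rating } = mapped;
  // Rows without the required fields fail validation and are dropped.
  if (typeof id !== "string" || typeof name !== "string") return null;
  // An unknown rating stays null. It is never replaced with 0.
  return { id, name, rating: typeof rating === "number" ? rating : null, store };
}

const fromPlay = toAppObject("google_play", {
  appId: "com.example.notes", title: "Notes", score: 4.6, installs: "1M+",
});
const fromAppStore = toAppObject("app_store", {
  bundleId: "com.example.notes", trackName: "Notes",
});
```

Both results expose the same keys, so a typed consumer written against AppObject handles either store without branching on where the record came from.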
Once you build typed consumers against these objects across 42 platforms, you never rewrite five integrations again.
One schema, holding across everything we return.
Counts are read live from the registry, so this page can never quote a stale number.
SocialCrawl versus wiring up scrapers yourself
The difference is not the raw data. It is the normalization, validation, and trust layer on top of it.
| | SocialCrawl | Wiring up scrapers yourself |
|---|---|---|
| Integration | One key, one schema | N vendors, N schemas |
| Field naming | One canonical shape across every platform | A different JSON shape per platform |
| Pagination | One opaque cursor everywhere | A different pagination model per source |
| Data quality | Validated on every response, tombstones rejected | Raw blobs, deleted rows leak through |
| Docs | Cannot drift, CI-enforced | Hand-maintained, drifts silently |
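The pagination row is the easiest to show in code. With one { items, next_cursor, total } shape, a single loop can page through any endpoint. This is a sketch: collectAll and fakeSource are hypothetical helpers, and the in-memory source stands in for a real API call.

```typescript
type Page<T> = { items: T[]; next_cursor: string | null; total: number | null };
type FetchPage<T> = (cursor: string | null) => Promise<Page<T>>;

// Follows next_cursor until it is null. The cursor is passed back untouched.
async function collectAll<T>(fetchPage: FetchPage<T>, maxPages = 100): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const page: Page<T> = await fetchPage(cursor);
    all.push(...page.items);
    if (page.next_cursor === null) break;
    cursor = page.next_cursor;
  }
  return all;
}

// In-memory stand-in for an endpoint. Only the server side interprets the cursor.
function fakeSource(pages: string[][]): FetchPage<string> {
  return async (cursor) => {
    const index = cursor === null ? 0 : Number(cursor);
    const hasNext = index + 1 < pages.length;
    return {
      items: pages[index] ?? [],
      next_cursor: hasNext ? String(index + 1) : null,
      total: null,
    };
  };
}

const allItems = collectAll(fakeSource([["a", "b"], ["c"], ["d"]]));
```

The maxPages cap is a defensive choice for the sketch, so that a source which never returns a null cursor cannot loop forever.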
Start free.
Get an API key and see the same validated shape come back from every platform you call.
curl https://www.socialcrawl.dev/v1/tiktok/profile \
-G --data-urlencode "handle=nasa" \
-H "x-api-key: $SC_KEY"
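The same request can be prepared from TypeScript. The sketch below only builds the URL and headers from the curl command above. buildProfileRequest is a hypothetical helper, and the response shape is not shown on this page, so the snippet stops short of parsing one.

```typescript
type PreparedRequest = { url: string; headers: Record<string, string> };

// Mirrors the curl call: GET with a percent-encoded handle and an x-api-key header.
function buildProfileRequest(handle: string, apiKey: string): PreparedRequest {
  const base = "https://www.socialcrawl.dev/v1/tiktok/profile";
  return {
    url: `${base}?handle=${encodeURIComponent(handle)}`,
    headers: { "x-api-key": apiKey },
  };
}

const request = buildProfileRequest("nasa", "YOUR_API_KEY");

// To send it (needs network access and a real key):
//   const response = await fetch(request.url, { headers: request.headers });
//   const body = await response.json();
```

Separating request construction from the network call keeps the part that can be checked offline testable on its own.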