Building an AI Agent for Social Media Monitoring
Learn to build an AI agent for social media monitoring in under 150 lines of Python. Step-by-step tutorial: fetch posts, classify sentiment, send alerts.

Building an AI Agent for Social Media Monitoring
A working AI social media monitoring agent fits in under 150 lines of Python. This tutorial walks from zero to a running agent that uses Claude's tool-calling API to reason over posts and the SocialCrawl unified API to fetch brand mentions from Instagram and TikTok across 21 platforms with a single auth header, then classifies sentiment and fires a Slack alert when it finds something worth escalating.
There are now 5.66 billion active social media identities worldwide, and the average person uses 6.83 platforms every month (DataReportal Digital 2026). Monitoring 21 platforms manually means 21 different JSON schemas, 21 auth flows, and 21 rate-limit policies. One API key, one normalized schema, one agent loop fixes that.
"Agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks." — Anthropic Research, Building Effective Agents
Stack: Python 3.11+ · Anthropic SDK (Claude) · SocialCrawl REST API · httpx · python-dotenv
Prerequisites
Before running any code, you need:
- Python 3.11+
- An Anthropic API key from console.anthropic.com
- A SocialCrawl API key from the SocialCrawl docs (free tier includes 100 credits)
- A Slack webhook URL for alerts (optional but recommended)
Install dependencies:
pip install anthropic httpx python-dotenv
Create a .env file:
ANTHROPIC_API_KEY=sk-ant-...
SOCIALCRAWL_API_KEY=sc-...
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
How Does the Agent Architecture Work?
The agent is a while loop where Claude decides which tool to call next based on the result of the previous call. Anthropic's engineering team distinguishes workflows (predefined code paths where LLMs are orchestrated through fixed steps) from agents (the LLM directs its own tool usage, choosing what to do and when to stop). This tutorial builds the latter — the orchestration logic lives inside Claude, not in Python.
| Dimension | Workflow | Agent |
|---|---|---|
| Control flow | Predefined in code | LLM decides at runtime |
| Predictability | High | Lower (but recoverable) |
| Best for | Narrow, repeatable tasks | Open-ended tasks with branching |
| Iteration guard | Implicit (fixed steps) | Explicit (max_iterations) |
| Social monitoring fit | Poor — each platform branches | Excellent — LLM routes posts dynamically |
The loop:
System prompt + tool definitions (cache-anchored)
|
Claude reasons
|
stop_reason: "tool_use"
→ fetch_posts(platform="instagram", query="yourbrand")
|
SocialCrawl API
→ [{id, content, engagement_rate, estimated_reach, platform, author}]
|
tool_result appended to messages
|
Claude reasons again
→ stop_reason: "tool_use"
→ classify_post(post={...})
|
→ stop_reason: "tool_use"
→ send_alert(post={...}, classification={...})
|
stop_reason: "end_turn"
SocialCrawl normalizes every platform to the same schema (content, engagement_rate, platform, author, timestamp), so the system prompt needs zero platform-specific logic. Instagram and TikTok look identical to the reasoning loop.
Three tool definitions, each with strict: True which guarantees calls always match the schema:
import anthropic
import httpx
import json
import os
import time
import random
from dotenv import load_dotenv
load_dotenv()
SOCIALCRAWL_BASE_URL = "https://socialcrawl.dev/v1"
SOCIALCRAWL_API_KEY = os.environ["SOCIALCRAWL_API_KEY"]
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")
# Tool definitions — cache-anchored (stable, passed as system-level content)
TOOLS = [
{
"name": "fetch_posts",
"description": (
"Fetch recent posts from a social media platform matching a search query. "
"Returns a list of posts with normalized fields: id, content, engagement_rate, "
"estimated_reach, platform, author, timestamp. "
"Use platform='instagram' or platform='tiktok'. "
"Example: fetch_posts(platform='instagram', query='yourbrand')"
),
"input_schema": {
"type": "object",
"properties": {
"platform": {
"type": "string",
"enum": ["instagram", "tiktok"],
"description": "The social platform to search.",
},
"query": {
"type": "string",
"description": "Hashtag or keyword to search for.",
},
},
"required": ["platform", "query"],
},
"strict": True,
},
{
"name": "classify_post",
"description": (
"Classify a single social media post for sentiment, urgency, and escalation. "
"Returns {sentiment, urgency, escalate, reason}. "
"Pass the full post dict from fetch_posts."
),
"input_schema": {
"type": "object",
"properties": {
"post": {
"type": "object",
"description": "A post dict with fields: id, content, engagement_rate, estimated_reach, platform, author.",
"properties": {
"id": {"type": "string"},
"content": {"type": "string"},
"engagement_rate": {"type": "number"},
"estimated_reach": {"type": "number"},
"platform": {"type": "string"},
"author": {"type": "string"},
},
"required": ["id", "content", "platform"],
}
},
"required": ["post"],
},
"strict": True,
},
{
"name": "send_alert",
"description": (
"Send a Slack alert for a post that requires attention. "
"Only call this when classify_post returns escalate=true."
),
"input_schema": {
"type": "object",
"properties": {
"post": {
"type": "object",
"description": "The post dict from fetch_posts.",
"properties": {
"id": {"type": "string"},
"content": {"type": "string"},
"engagement_rate": {"type": "number"},
"platform": {"type": "string"},
"author": {"type": "string"},
},
"required": ["id", "content", "platform", "author"],
},
"classification": {
"type": "object",
"description": "The classification dict from classify_post.",
"properties": {
"sentiment": {"type": "string"},
"urgency": {"type": "string"},
"escalate": {"type": "boolean"},
"reason": {"type": "string"},
},
"required": ["sentiment", "urgency", "escalate", "reason"],
},
},
"required": ["post", "classification"],
},
"strict": True,
},
]
What API Endpoints Power the Agent?
Two SocialCrawl endpoints do all the data fetching, and both return a unified schema so the agent reads identical fields regardless of platform. The fetch_posts function wraps these SocialCrawl endpoints:
GET /v1/instagram/hashtag?tag=<query>(Instagram docs)GET /v1/tiktok/search?q=<query>(TikTok docs)
Both use the same x-api-key header. The response already includes pre-computed engagement_rate, estimated_reach, and language, so the agent reads a field rather than doing arithmetic.
def fetch_posts(platform: str, query: str) -> list[dict]:
"""
Fetch posts from SocialCrawl's unified API.
Returns a list of normalized post dicts.
"""
endpoint_map = {
"instagram": f"{SOCIALCRAWL_BASE_URL}/instagram/hashtag",
"tiktok": f"{SOCIALCRAWL_BASE_URL}/tiktok/search",
}
param_map = {
"instagram": {"tag": query},
"tiktok": {"q": query},
}
url = endpoint_map[platform]
params = param_map[platform]
headers = {"x-api-key": SOCIALCRAWL_API_KEY}
for attempt in range(4):
try:
response = httpx.get(url, params=params, headers=headers, timeout=15.0)
response.raise_for_status()
data = response.json()
posts = data.get("data", data.get("posts", []))
return [
{
"id": p.get("id", ""),
"content": p.get("content", p.get("caption", p.get("text", ""))),
"engagement_rate": p.get("engagement_rate", 0.0),
"estimated_reach": p.get("estimated_reach", 0),
"platform": platform,
"author": p.get("author", p.get("username", "unknown")),
"timestamp": p.get("timestamp", p.get("created_at", "")),
}
for p in posts
]
except httpx.HTTPStatusError as e:
if e.response.status_code == 429:
# Exponential backoff with jitter for rate limiting
wait = 2 ** attempt + random.random()
time.sleep(wait)
else:
raise
except httpx.RequestError:
if attempt == 3:
raise
time.sleep(1.5)
return []
How Do AI Agents Classify Sentiment?
AI agents classify sentiment by passing each post through a focused LLM call that returns structured JSON with sentiment, urgency, and escalation fields. classify_post makes a direct Claude API call with a tight system prompt. Because engagement_rate and estimated_reach are already in the post payload, the prompt reasons over three fields only: sentiment, urgency, escalate. One inference call, no extra enrichment step.
Transformer-based classifiers reach up to 94% accuracy on social media sentiment benchmarks, substantially higher than traditional lexicon-based tools like VADER on informal or emoji-heavy text (ACL Anthology, 2024).
CLASSIFICATION_SYSTEM_PROMPT = (
"You are a social media brand monitoring analyst. "
"Given a social media post with pre-computed engagement metrics, classify it. "
"Return ONLY a JSON object with these exact fields:\n"
" sentiment: 'positive' | 'negative' | 'neutral'\n"
" urgency: 'immediate' | 'routine' | 'ignore'\n"
" escalate: true | false\n"
" reason: one sentence explaining your decision\n\n"
"Escalate if: sentiment is negative AND (urgency is immediate OR engagement_rate > 0.05). "
"Urgency is 'immediate' if the post requires brand response within 2 hours. "
"Do not include any text outside the JSON object."
)
client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
def classify_post(post: dict) -> dict:
"""
Classify a post's sentiment, urgency, and escalation priority.
Uses a focused Claude call, not the full agent loop.
"""
post_summary = (
f"Platform: {post['platform']}\n"
f"Author: {post['author']}\n"
f"Content: {post['content'][:500]}\n"
f"Engagement rate: {post.get('engagement_rate', 0):.4f}\n"
f"Estimated reach: {post.get('estimated_reach', 0):,}"
)
response = client.messages.create(
model="claude-haiku-4-5", # Fast + cost-efficient for high-volume classification
max_tokens=256,
system=CLASSIFICATION_SYSTEM_PROMPT,
messages=[{"role": "user", "content": post_summary}],
)
raw = response.content[0].text.strip()
try:
return json.loads(raw)
except json.JSONDecodeError:
# Fallback if the model adds preamble
start = raw.find("{")
end = raw.rfind("}") + 1
return json.loads(raw[start:end])
How Does the Monitoring Loop Work?
The monitoring loop is a while loop that keeps calling Claude until it emits stop_reason == "end_turn" or hits a max_iterations = 10 ceiling. Two things worth calling out: a seen_ids set deduplicates posts within each polling cycle, and the system prompt is cache-anchored so repeated runs don't pay full input token costs. Prompt caching reduces input token cost by up to 90% and latency by up to 85% for cached prefixes (Anthropic Prompt Caching docs). Anthropic recommends the iteration guard for all production agents.
AGENT_SYSTEM_PROMPT = (
"You are a social media monitoring agent for brand reputation management. "
"Your job:\n"
"1. Use fetch_posts to retrieve recent mentions for each keyword on each platform.\n"
"2. Use classify_post on each post to determine sentiment and urgency.\n"
"3. Use send_alert for any post where classify_post returns escalate=true.\n"
"4. After processing all posts, stop (do not loop indefinitely).\n\n"
"Be systematic: process one platform+keyword combination at a time. "
"Do not classify the same post twice."
)
def run_monitoring_agent(keywords: list[str], platforms: list[str]) -> dict:
"""
Run one monitoring cycle. Returns a summary dict with counts.
"""
seen_ids: set[str] = set()
alerts_fired = 0
posts_processed = 0
max_iterations = 10
iteration = 0
# Build the initial user message listing what to monitor
task_description = (
f"Monitor these keywords: {', '.join(keywords)}\n"
f"On these platforms: {', '.join(platforms)}\n"
"Fetch posts, classify each one, and alert on anything that needs escalation."
)
messages = [{"role": "user", "content": task_description}]
# Tool dispatch table
tool_dispatch = {
"fetch_posts": lambda inp: fetch_posts(**inp),
"classify_post": lambda inp: classify_post(inp["post"]),
"send_alert": lambda inp: send_alert(inp["post"], inp["classification"]),
}
while iteration < max_iterations:
iteration += 1
response = client.messages.create(
model="claude-haiku-4-5",
max_tokens=4096,
system=[
{
"type": "text",
"text": AGENT_SYSTEM_PROMPT,
# Cache the system prompt — it never changes between calls
"cache_control": {"type": "ephemeral"},
}
],
tools=TOOLS,
# Cache the tool definitions — stable across the entire loop
# Prompt caching reduces cost on repeated polling cycles
messages=messages,
)
if response.stop_reason == "end_turn":
break
if response.stop_reason != "tool_use":
# Unexpected stop reason — bail safely
break
# Append the full assistant turn (required for tool_use blocks to be valid)
messages.append({"role": "assistant", "content": response.content})
# Execute all tool calls in this turn
tool_results = []
for block in response.content:
if block.type != "tool_use":
continue
tool_name = block.name
tool_input = block.input
tool_use_id = block.id
# Deduplication: skip posts we've already seen this cycle
if tool_name == "fetch_posts":
raw_posts = tool_dispatch[tool_name](tool_input)
new_posts = [p for p in raw_posts if p["id"] not in seen_ids]
for p in new_posts:
seen_ids.add(p["id"])
posts_processed += len(new_posts)
result_content = json.dumps(new_posts)
elif tool_name == "send_alert":
tool_dispatch[tool_name](tool_input)
alerts_fired += 1
result_content = json.dumps({"status": "alert_sent"})
else:
result = tool_dispatch[tool_name](tool_input)
result_content = json.dumps(result)
tool_results.append(
{
"type": "tool_result",
"tool_use_id": tool_use_id,
"content": result_content,
}
)
messages.append({"role": "user", "content": tool_results})
return {
"posts_processed": posts_processed,
"alerts_fired": alerts_fired,
"iterations": iteration,
}
How Do You Send Alerts and Run the Agent on a Schedule?
Alerts go out through a Slack webhook, and scheduling is a one-liner with the schedule library or APScheduler. send_alert closes the loop: the Slack payload includes platform, author, content, sentiment, urgency, and engagement rate, so the recipient has enough context to act without opening a separate dashboard.
def send_alert(post: dict, classification: dict) -> dict:
"""
Post a Slack alert for an escalated mention.
Returns {"status": "sent"} on success.
"""
if not SLACK_WEBHOOK_URL:
print(f"[ALERT] {post['platform']} | {post['author']} | "
f"{classification['sentiment'].upper()} | {post['content'][:120]}")
return {"status": "printed_to_stdout"}
message = {
"text": (
f":rotating_light: *Brand mention requires attention*\n"
f"*Platform:* {post['platform'].capitalize()}\n"
f"*Author:* @{post['author']}\n"
f"*Sentiment:* {classification['sentiment']} | "
f"*Urgency:* {classification['urgency']}\n"
f"*Reason:* {classification['reason']}\n"
f"*Engagement rate:* {post.get('engagement_rate', 0):.2%}\n"
f"*Content:* {post['content'][:200]}"
)
}
response = httpx.post(SLACK_WEBHOOK_URL, json=message, timeout=10.0)
response.raise_for_status()
return {"status": "sent"}
# Entry point
if __name__ == "__main__":
result = run_monitoring_agent(
keywords=["yourbrand", "yourproduct"],
platforms=["instagram", "tiktok"],
)
print(f"Done. Posts processed: {result['posts_processed']}, "
f"alerts fired: {result['alerts_fired']}, "
f"loop iterations: {result['iterations']}")
To run on a schedule, wrap the call in a simple polling loop or use APScheduler:
import schedule
schedule.every(5).minutes.do(
run_monitoring_agent,
keywords=["yourbrand"],
platforms=["instagram", "tiktok"],
)
while True:
schedule.run_pending()
time.sleep(30)
Five-minute intervals work for most cases. For crisis-sensitive categories, drop to 1-2 minutes and batch classification calls to keep inference costs flat.
What Could Go Wrong?
Six failure modes are worth planning for before running this in production, with concrete runtime numbers so you can size your budget and retries.
- Inference cost drift. A single polling cycle against two platforms and two keywords uses roughly 1,500-2,500 input tokens (system prompt + tool defs + conversation) and 400-800 output tokens per Claude turn. Claude Haiku 4.5 is priced at $1 per million input tokens and $5 per million output tokens (Anthropic Pricing). With prompt caching enabled, expect under $0.01 per polling cycle — roughly $2-$4 per day at 5-minute intervals.
429from SocialCrawl. Handled infetch_postswith exponential backoff (2s, 4s, 8s + jitter, 4 attempts). For a decorator approach:pip install tenacityand@retry(wait=wait_exponential(multiplier=1, min=2, max=30), stop=stop_after_attempt(4)). SocialCrawl doesn't impose per-second rate limits — throttling happens at the credit layer, so429almost always means an upstream platform limit has been hit.- Agent loops forever. The
max_iterations = 10guard prevents runaway inference. If you hit it consistently, verifysend_alertis returning a validtool_resultwith content. Anthropic's guidance is explicit: always set an iteration ceiling and log the finalstop_reason(Building Effective Agents). - Duplicate alerts on restart.
seen_idsresets in memory. Persist to Redis (SADD seen_posts {id}) or SQLite (INSERT OR IGNORE INTO seen_posts VALUES (?)) and restore at startup. For a 7-day retention window, SQLite with acreated_atcolumn and a nightlyDELETE WHERE created_at < datetime('now', '-7 days')is usually sufficient. - Empty results. Query too narrow, or the platform uses hashtag matching. For Instagram, omit the
#. See the SocialCrawl API reference for per-platform syntax. - Slow classification at volume. Batch up to 10 posts per call by changing
post: objecttoposts: arrayin the tool schema and returning a JSON array. Typical single-post classification runs ~400ms round-trip; a 10-post batch completes in ~600ms — a 6-7x throughput win on classification-heavy workloads.
What's Next?
This agent covers Instagram and TikTok. Adding YouTube or Reddit is a one-line change to endpoint_map inside fetch_posts. The reasoning loop, system prompt, and tool definitions stay identical. That's the direct payoff of a unified schema.
Before wiring up the agent, paste any social URL into the SocialCrawl Data Explorer and see your data before writing a single line of code. It confirms that engagement_rate and estimated_reach are populated for your target accounts.
If you need a deeper look at Instagram's current API landscape — rate limits, what broke in 2024, and which scraping options are legally defensible — read Instagram API & Scrapers in 2026: What Still Works.
For the full endpoint list, rate limits, and credit costs, see the SocialCrawl API docs.
Frequently Asked Questions
What's the difference between social media monitoring and social listening?
Social media monitoring is reactive: you track specific mentions, keywords, and tags as they occur. Social listening is the broader layer, analyzing patterns to extract intent and sentiment trends. This agent does both. fetch_posts handles monitoring (raw mentions), and classify_post handles listening (structured meaning from each post).
How do AI agents handle multiple social media platforms?
Without a unified API, each platform needs its own tool function, auth flow, and schema mapping. With SocialCrawl's unified schema, one fetch_posts tool covers all 21 platforms using the same fields, the same x-api-key header, and the same response shape. The reasoning loop never needs to know which platform it's reading, so adding platforms requires zero changes to agent logic.
What are the most common mistakes when building social monitoring agents?
Five issues that kill production agents: (1) platform-specific parsing that breaks on schema changes; (2) no deduplication, causing repeated alerts; (3) missing rate-limit handling, so 429 errors silently kill the loop; (4) no iteration guard, risking runaway inference costs; (5) feeding raw unstructured data to the LLM instead of pre-normalized fields, which inflates token usage and hurts classification accuracy.
How much does it cost to run this AI monitoring agent?
The two variable costs are inference and data. For classification, claude-haiku-4-5 processes roughly 500 posts per dollar at typical post lengths. SocialCrawl charges 1 credit per Standard endpoint call on the free tier (100 credits included); each fetch_posts call to Instagram hashtag or TikTok search consumes 1 credit and returns up to 20 posts. A 5-minute polling loop monitoring two platforms and two keywords runs fewer than 30 credits per hour at normal cadence.
Related posts

How to Use Claude Agents: A Developer's Guide to Managed Agents, Sub-agents, and Real-time Data
A developer's guide to Claude Managed Agents, Claude Code sub-agents, and the Messages API. Setup in 4 steps, parallel execution, MCP, rate limits.

The OpenAI Assistants API in 2026: A Field Guide to the Shutdown, the Migration, and What Comes Next
OpenAI shuts down the Assistants API on 26 August 2026. A practical migration guide: Responses API, costs, frameworks, and the realistic timeline.

Browse AI Web Scraper: Where It Wins, and Where Social Data APIs Fit Better
Browse AI is great for web pages; social platforms need a specialist. Decision guide for developers choosing scrapers vs unified social-data APIs.
