
How to Scrape Twitter (X) in 2026: Methods, Tools, and What Works

Practical guide to scraping X.com in 2026. Covers headless browsers, open-source libraries, managed tools, costs, legal risks, and when an API is the smarter choice.


Key Takeaway: You can scrape X.com using a headless browser (Playwright or Puppeteer), open-source libraries like Twikit or TweeterPy, or managed scraping services. But X.com actively breaks scrapers every 2-4 weeks by rotating tokens, changing GraphQL identifiers, and tightening bot detection. If your goal is getting structured Twitter data rather than building a scraper, a third-party API like Sorsa API returns the same data through simple REST calls with zero maintenance.

Last updated: March 24, 2026



Why People Scrape X.com (And Whether You Need To)

X.com remains one of the richest sources of real-time public data on the internet. Developers, researchers, and businesses scrape it for brand monitoring, sentiment analysis, competitor research, lead generation, trend detection, and academic studies.

But here is a question worth asking before you write a single line of scraping code: do you need to scrape, or do you need the data?

Scraping means building and maintaining infrastructure that extracts data from a website designed to resist exactly that. If you want hands-on control over the extraction process, if you are learning web scraping as a skill, or if you have a highly custom workflow that no existing tool covers, then yes, scraping makes sense.

For everyone else, there are faster paths. Third-party APIs and managed scraping services return the same tweets, profiles, and engagement metrics without the proxy bills, token headaches, and weekly maintenance. We will cover those options later in this guide. For now, let's dig into how scraping X.com actually works.


How X.com Works Under the Hood

Understanding X.com's architecture explains why every scraper eventually breaks. If you have built scrapers for other sites, X.com's defenses are in a different league.

X.com is a React single-page application. When you load a profile or tweet URL, the server sends a minimal HTML shell. JavaScript takes over, requests a guest token from the backend, and then fires GraphQL queries to fetch the actual data. The browser renders the response. There is almost no useful data in the initial HTML.

This architecture creates three chokepoints that X.com uses to block scrapers.

Guest tokens are temporary credentials required for every GraphQL call. They are tied to your IP address, expire every 2-4 hours, and the acquisition method changes every few weeks. When X.com shifts how tokens are issued, every scraper that relies on the old method stops working instantly.
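To make that first chokepoint concrete, here is a minimal sketch of how a scraper might activate a guest session using only Python's standard library. The activation endpoint reflects the historically observed web-client flow, and the bearer value below is a placeholder for the public token embedded in X.com's JavaScript bundle; both have changed without notice in the past.

```python
import json
import urllib.request

# Placeholder: the real value is a long public bearer token embedded
# in X.com's JavaScript bundle; it changes when the bundle is rebuilt.
PUBLIC_BEARER = "AAAA...copy-from-js-bundle"

def guest_headers(bearer: str) -> dict:
    return {"Authorization": f"Bearer {bearer}"}

def get_guest_token() -> str:
    # Historically observed activation endpoint; the flow shifts
    # every few weeks, so expect to re-verify this regularly.
    req = urllib.request.Request(
        "https://api.twitter.com/1.1/guest/activate.json",
        data=b"",  # POST with an empty body
        headers=guest_headers(PUBLIC_BEARER),
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["guest_token"]
```

The returned token then rides along on every GraphQL call until it expires and the dance starts over.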

doc_ids are identifiers embedded in X.com's JavaScript bundle that tell the GraphQL backend which operation to execute. Fetching a user profile, searching tweets, and loading a timeline each require a different doc_id. X.com rotates these every 2-4 weeks, and you need to track 8-12 of them simultaneously. There is no documentation. You reverse-engineer them from minified JavaScript, and then you do it again two weeks later.
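What does tracking doc_ids look like in practice? A sketch: download the current JavaScript bundle and regex out the (operation name, doc_id) pairs. The pattern below matches the bundle format as commonly observed; treat it as a starting point, since the format drifts between releases.

```python
import re

# The bundle embeds entries like: queryId:"AbC-123",operationName:"UserTweets"
QUERY_RE = re.compile(
    r'queryId:"(?P<doc_id>[\w-]+)".{0,80}?operationName:"(?P<op>\w+)"'
)

def extract_doc_ids(bundle_js: str) -> dict:
    """Map operation names to their current doc_ids from a JS bundle."""
    return {m.group("op"): m.group("doc_id") for m in QUERY_RE.finditer(bundle_js)}
```

Run this against the fresh bundle every time requests start returning empty results, and diff the output against what your scraper is currently using.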

Rate limits and detection form the third layer. X.com enforces roughly 300 requests per hour per IP address. Datacenter IPs get flagged within 1-2 requests. TLS fingerprinting catches headless browsers that do not perfectly mimic a real browser's network stack. Cookie validation flags suspicious session patterns. If you suspect your account has been flagged, you can use a shadowban checker to verify.

Having worked with Twitter's API since the v1.1 days, I have watched these defenses evolve from basic rate limiting to a multi-layered detection system that updates faster than most teams can respond. The 2023 API shutdown accelerated this dramatically, and the changes have only gotten more aggressive since then.


Method 1: Headless Browser Scraping (Playwright / Puppeteer)

The most common DIY approach is to automate a real browser, load X.com pages, and intercept the GraphQL responses that contain the data you want.

Here is a minimal Python example using Playwright that scrapes a single tweet:

from playwright.sync_api import sync_playwright
import json

def scrape_tweet(url: str) -> dict:
    xhr_calls = []

    def capture_response(response):
        # X.com's GraphQL data arrives via background xhr/fetch requests
        if response.request.resource_type in ("xhr", "fetch"):
            xhr_calls.append(response)

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", capture_response)
        page.goto(url)
        page.wait_for_selector("[data-testid='tweet']", timeout=15000)

        try:
            for xhr in xhr_calls:
                if "TweetResultByRestId" in xhr.url:
                    data = xhr.json()
                    return data["data"]["tweetResult"]["result"]
        finally:
            browser.close()

    return {}

tweet = scrape_tweet("https://x.com/elonmusk/status/1234567890")
print(json.dumps(tweet, indent=2))

The script launches a Chromium instance, navigates to a tweet URL, waits for the tweet to render, then filters the background XHR calls to find the one containing tweet data. You get the full tweet object: text, timestamps, engagement counts, media URLs, and the author's profile.

The same pattern works for profiles (look for UserBy in XHR URLs), search results (SearchTimeline), and timelines (UserTweets).
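One way to generalize that pattern is a small router keyed on those operation-name substrings. The names below are as of this writing; expect them to drift between bundle releases.

```python
from typing import Optional

# Operation-name substrings observed in X.com's GraphQL request URLs.
OPERATIONS = {
    "TweetResultByRestId": "tweet",
    "UserBy": "profile",
    "SearchTimeline": "search",
    "UserTweets": "timeline",
}

def classify(url: str) -> Optional[str]:
    """Map a captured response URL to the kind of data it carries."""
    for op, kind in OPERATIONS.items():
        if op in url:
            return kind
    return None
```

Feed every captured response URL through `classify` and dispatch to a per-kind parser, so one scraper session can harvest profiles, searches, and timelines in a single pass.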

What You Can Get

Profiles (username, bio, follower counts, verification status), tweets (text, media, engagement metrics), search results, threads, quote tweets, and replies. Essentially anything visible on the public X.com interface.

What You Cannot Get

Private or protected accounts, DMs, bookmarks, full historical archives without extensive scrolling automation, and any data that requires a logged-in session unless you use authenticated scraping (which risks account bans). For historical data access without scraping, third-party APIs like Sorsa can retrieve tweets back to 2006 through search endpoints.

What You Need to Make It Work

Residential proxies are non-negotiable. Datacenter IPs are blocked almost instantly. Budget $1-3 per gigabyte, and plan for $50-200/month depending on volume. Sticky sessions of 10-15 minutes work best to keep guest tokens and IP sessions aligned.

Anti-detection measures matter. Vanilla headless Chromium gets fingerprinted. You need proper browser fingerprint spoofing, realistic viewport sizes, and human-like request timing with randomized delays.

Error handling and retry logic. Expect failures. Tokens expire mid-session. Rate limits hit without warning. GraphQL endpoints return empty results when doc_ids rotate. A production scraper needs robust retry mechanisms and monitoring.
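Putting the proxy, pacing, and retry pieces together, a sketch might look like this. The proxy credentials are placeholders, and the delay range mirrors the sticky-session guidance above.

```python
import random
import time

# Placeholder sticky-session residential proxy config. Pass it to
# Playwright as: pw.chromium.launch(headless=True, proxy=PROXY)
PROXY = {
    "server": "http://proxy.example.com:8000",  # placeholder endpoint
    "username": "user-session-abc123",          # placeholder sticky session id
    "password": "secret",
}

def polite_delay(base: float = 2.0, jitter: float = 3.0) -> float:
    """Sleep a random 2-5 seconds (by default) to mimic human pacing."""
    delay = base + random.random() * jitter
    time.sleep(delay)
    return delay

def with_retries(fn, attempts: int = 3, pause=polite_delay):
    """Call fn(); on failure, back off and try again."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            pause()
    raise RuntimeError(f"all {attempts} attempts failed: {last_error}")
```

Wrap each page fetch in `with_retries` and call `polite_delay` between navigations; a production version would add per-error-type handling (token refresh on 403s, longer backoff on 429s) and alerting when failures cluster.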

The Maintenance Reality

This approach works right now. It will break within 2-4 weeks when X.com pushes its next update. Keeping a Playwright scraper running against X.com is a recurring commitment of 10-15 hours per month: monitoring for failures, reverse-engineering new doc_ids, updating token acquisition logic, and adjusting proxy rotation strategies.


Method 2: Open-Source Libraries

Instead of building a scraper from scratch, you can use libraries that wrap X.com's internal APIs. Some are actively maintained. Many are dead. Here is the honest picture as of March 2026.

What Is Actually Working

Library   | Language | Auth Required                        | Key Features                                                        | Notes
Twikit    | Python   | Yes (login)                          | Search, scrape tweets, post, DMs, trends. Async.                    | Most popular option. Active development, frequent updates. Large community on Discord.
TweeterPy | Python   | Yes (login)                          | Profiles, tweets, followers, followings.                            | Simpler API surface. Good for data extraction. Active maintenance.
XActions  | Node.js  | Browser scripts: no. Full API: yes.  | 140+ MCP tools, CLI, browser scripts, dashboard, sentiment analysis. | Most feature-rich. Works with AI agents (Claude, GPT) via MCP server. Also supports Bluesky and Mastodon.

Here is what working code looks like for each library.

Twikit is the go-to for most Python developers. It is async, well-documented, and has the largest community. Here is a search example:

import asyncio
from twikit import Client

client = Client('en-US')

async def main():
    await client.login(
        auth_info_1='username',
        auth_info_2='email@example.com',
        password='password',
        cookies_file='cookies.json'
    )

    # Search latest tweets
    tweets = await client.search_tweet('web scraping', 'Latest')
    for tweet in tweets:
        print(tweet.user.name, tweet.text, tweet.created_at)

    # Get a user's tweets
    user_tweets = await client.get_user_tweets('123456', 'Tweets')
    for t in user_tweets:
        print(t.text)

asyncio.run(main())

Notice that login is required. Twikit saves cookies to a file, so subsequent runs can skip the login step and reuse the session.

TweeterPy has a simpler, synchronous API that is easier to pick up if you just need to pull profiles and tweets:

from tweeterpy import TweeterPy

twitter = TweeterPy()

# Get a user's numeric ID
user_id = twitter.get_user_id('elonmusk')
print(user_id)

# Get full profile data
profile = twitter.get_user_data('elonmusk')
print(profile)

Less feature-rich than Twikit, but straightforward for extraction tasks. Supports proxies out of the box via a proxies parameter.

XActions is the Node.js option. It stands out by offering a CLI, browser console scripts, and an MCP server for AI agent integration:

# Install globally
npm install -g xactions

# Scrape a profile from the command line
xactions profile elonmusk

# Search tweets
xactions search "web scraping"

# Detect who unfollowed you
xactions unfollowers

For programmatic use in Node.js:

import { scrapeProfile, scrapeTweets } from 'xactions';

const profile = await scrapeProfile('elonmusk');
console.log(profile);

const tweets = await scrapeTweets('elonmusk', { count: 20 });
tweets.forEach(t => console.log(t.text));

XActions also ships 50+ browser console scripts you can paste directly into Chrome DevTools on x.com, no install needed. That makes it the lowest-friction option for quick one-off scrapes.

The Graveyard (Do Not Waste Your Time)

Library   | What Happened
Twint     | Archived in 2022. Completely dead. Still referenced in tutorials that should know better.
snscrape  | No updates in over three years. Broken against current X.com.
twscrape  | 11 months without a commit. Almost certainly broken.
ntscraper | Depends on Nitter frontend instances, which are shutting down. Unreliable at best.

If you find a tutorial recommending any of these, check the publication date. The X.com scraping landscape changes fast, and guides from even 12 months ago may point you to tools that no longer function.

The Catch with Open-Source Libraries

Every working library in the table above requires logging in with an X.com account. This means:

  • Account ban risk. X.com suspends accounts that exhibit automated behavior. Do not use your personal account. Ever.
  • Account rotation. For any serious volume, you need multiple accounts and a system to rotate between them.
  • Proxies are still necessary. Same residential proxy requirements as DIY scraping.
  • Breakage happens. Even actively maintained libraries break when X.com pushes updates. You are dependent on the maintainer's response time.
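A minimal account-rotation scaffold (the account entries and file names are hypothetical) might look like:

```python
import itertools

# Hypothetical pool of dedicated scraping accounts -- never personal
# ones. Each pairs saved cookies with its own sticky proxy session.
ACCOUNTS = [
    {"cookies": "acct_a.json", "proxy_session": "sess-a"},
    {"cookies": "acct_b.json", "proxy_session": "sess-b"},
    {"cookies": "acct_c.json", "proxy_session": "sess-c"},
]

_rotation = itertools.cycle(ACCOUNTS)

def next_account() -> dict:
    # Round-robin selection; a production system would also track
    # per-account rate-limit state and cool-downs after failures.
    return next(_rotation)
```

Keeping each account pinned to the same proxy session matters: an account that hops IPs mid-session is one of the clearest automation signals X.com looks for.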

Method 3: Managed Scraping Services

Services like Apify, ScrapFly, and Bright Data handle the scraping infrastructure for you. You provide a query or URL, they return structured data. Proxy rotation, token management, and anti-bot bypass are their problem, not yours.

The upside is obvious: no code to maintain, no proxies to manage. The downside is cost at scale and vendor dependency. If their scraper breaks, you wait for their fix. And pricing models vary wildly: some charge per tweet, others per compute unit, others per GB of proxy traffic.

This is a big enough topic to deserve its own guide. For a detailed breakdown of managed scraping services, see our Twitter scrapers comparison.


What Scraping Actually Costs (The Full Picture)

Most scraping tutorials skip the real costs. Here is what each method actually runs when you factor in everything: not just the tool itself, but proxies, developer time, and the hidden cost of things breaking.

                    | DIY (Playwright/Puppeteer)   | Open-Source Library            | Managed Scraper       | Sorsa API
Setup time          | Days to weeks                | Hours                          | Minutes               | Minutes
Monthly maintenance | 10-15 hours                  | 5-10 hours                     | Near zero             | Zero
Proxy cost          | $50-200/mo                   | $50-200/mo                     | Included              | Not needed
Service cost        | $0                           | $0                             | $50-500/mo            | $49-899/mo
Account ban risk    | High                         | High                           | None (their accounts) | None
Data completeness   | Limited (public view only)   | Moderate (with auth)           | Good                  | Full (profiles, tweets, search, followers, engagement, communities)
Reliability         | Low (breaks every 2-4 weeks) | Medium (depends on maintainer) | High                  | High
Rate limit          | ~300 req/hr per IP           | Varies                         | Varies                | 20 req/s on all plans

The numbers that matter are in the maintenance row. Developer time is expensive. One of my clients spent over 15 hours a month patching a Playwright-based Twitter scraper that broke every time X.com rotated its doc_ids. After three months, the total cost (developer hours plus residential proxies) exceeded $2,000/month for data they could have pulled through an API for under $200. They switched and never looked back.

The free tools are not free when you account for your time.
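That math is worth writing down. A toy cost model, with the hourly rate as an assumption:

```python
def monthly_cost(dev_hours: float, hourly_rate: float,
                 proxy_cost: float = 0.0, service_cost: float = 0.0) -> float:
    """Total monthly cost: developer time plus proxies plus any service fee."""
    return dev_hours * hourly_rate + proxy_cost + service_cost

# Illustrative figures only -- $120/hr is an assumed contractor rate.
diy = monthly_cost(dev_hours=15, hourly_rate=120, proxy_cost=150)
api = monthly_cost(dev_hours=0, hourly_rate=120, service_cost=199)
print(diy, api)  # 1950.0 199.0
```

Plug in your own team's rate and maintenance hours; for most teams, the developer-time term dominates everything else.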


Is Scraping X.com Legal?

This is the section everyone skips to. Here is what you need to know.

U.S. case law is broadly favorable to scraping public data. In 2022, the Ninth Circuit Court of Appeals upheld that scraping publicly available information does not violate the Computer Fraud and Abuse Act (CFAA). The hiQ v. LinkedIn decision is the most cited precedent.

X.com's Terms of Service explicitly prohibit scraping. Violating ToS can result in account suspension and IP blocks. It is not a criminal offense, but X.com has the right to deny you access to their platform.

The liquidated damages clause. X.com's current ToS include a provision stating that anyone who accesses more than 1,000,000 posts in a 24-hour period via automated means without permission is liable for $15,000 in liquidated damages per million posts. As of March 2026, there are no publicly known enforcement cases under this clause, and major scraping companies like Bright Data and Apify continue to operate openly. But the provision exists and is worth knowing about if you are operating at scale.

Practical guidance:

  • Scrape only publicly available data.
  • Do not collect personally identifiable information without a legitimate purpose.
  • Do not overload X.com's servers with aggressive request rates.
  • For production systems, using a third-party API provider shifts the compliance burden. They manage data acquisition on their infrastructure and are responsible for their own legal posture.

This is not legal advice. If your use case involves sensitive data or high volume, consult a lawyer.


When an API Makes More Sense Than Scraping

If you have read this far, you understand that scraping X.com is possible but expensive in time, money, and ongoing effort. For many use cases, the honest answer is: you do not need to scrape at all.

Third-party X data APIs return the same information through simple REST endpoints. Profiles, tweets, search results, followers, engagement metrics, community data. All structured as clean JSON. One API key in the header. No proxies. No guest tokens. No doc_ids. No maintenance.

Here is what a profile lookup looks like with Sorsa API:

curl -H "ApiKey: YOUR_KEY" \
  "https://api.sorsa.io/v3/info?username=elonmusk"

That returns the full profile object: ID, username, display name, bio, follower count, following count, tweet count, verification status, profile image, banner, creation date. One request, one line, structured JSON.

Compare that to the 20+ lines of Playwright code earlier in this guide, plus the proxy setup, token management, and retry logic that are not even shown.

Sorsa API covers 38 endpoints across user data, tweets, search, verification, communities, lists, and crypto analytics. The rate limit is a flat 20 requests per second on all plans. Pricing starts at $49/month for 10,000 requests, and batch endpoints like /info-batch (up to 100 profiles) and /tweet-info-bulk (up to 100 tweets) each count as a single request. You can test endpoints without writing code using the API playground. If you are coming from the official X API, the migration guide covers the switch step by step.
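The same lookup translates to a few lines of Python using only the standard library. The base URL, header name, and endpoint paths come from the documentation above; the batch parameter name is an assumption, so check the playground for the exact shape.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.sorsa.io/v3"

def build_request(endpoint: str, params: dict, api_key: str) -> urllib.request.Request:
    """Construct an authenticated GET against the Sorsa API."""
    url = f"{BASE}/{endpoint}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"ApiKey": api_key})

def sorsa_get(endpoint: str, params: dict, api_key: str) -> dict:
    with urllib.request.urlopen(build_request(endpoint, params, api_key),
                                timeout=10) as resp:
        return json.load(resp)

# Single profile (matches the curl call above):
#   profile = sorsa_get("info", {"username": "elonmusk"}, "YOUR_KEY")
# Batch lookup -- "usernames" is an assumed parameter name:
#   profiles = sorsa_get("info-batch", {"usernames": "elonmusk,jack"}, "YOUR_KEY")
```

Note what is absent: no proxy configuration, no token refresh, no retry-on-rotation logic. The request you write today is the request that still works next month.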

When to scrape: You need write access (posting, liking, following). You want full control over the extraction process. You are building a scraping tool as the product itself. You are learning.

When to use an API: You need read-only data (profiles, tweets, search, followers, engagement). You want reliability without maintenance. You are building a product on top of X data and cannot afford weekly breakages.

For a deeper comparison of API providers, see our guide on X (Twitter) API alternatives. And for a breakdown of what the official X API costs in 2026, see our pricing analysis.


FAQ

Can you scrape Twitter without logging in?

Yes, but with significant limitations. Without authentication, you can access basic public profiles and individual tweets through headless browser scraping. Search results, full timelines, threads, and engagement details are restricted or incomplete without a logged-in session. Every open-source library that provides full data access (Twikit, TweeterPy) requires authentication.

What is the best tool to scrape Twitter in 2026?

It depends on your stack and requirements. For Python developers who want full control, Twikit is the most actively maintained library with the largest community. For Node.js and AI agent workflows, XActions offers the broadest feature set including an MCP server. For zero-maintenance data access, a third-party API like Sorsa API or a managed scraping service removes the infrastructure burden entirely.

How do you scrape Twitter without getting blocked?

Use residential proxies, not datacenter IPs. Add randomized delays between requests (2-5 seconds minimum). Rotate browser fingerprints and user agents. Use separate, dedicated accounts for scraping (never your personal account). Implement sticky proxy sessions of 10-15 minutes to keep guest tokens valid. Even with all of this, expect periodic blocks. X.com's detection improves continuously.

Does the official X API still have a free tier?

No. As of early 2026, X replaced all subscription tiers with a pay-per-use model. There are no free credits. You buy credits upfront and pay per resource: $0.005 per post read, $0.01 per user profile, $0.01 per post created. There is also a hard cap of 2 million post reads per month on standard accounts. For details, see our full pricing breakdown.

How often does X.com break scrapers?

Every 2-4 weeks on average. The main causes are guest token acquisition changes, doc_id rotations, and new anti-bot detection layers. Any scraper, whether DIY or open-source, requires regular updates to keep functioning. This is the single biggest hidden cost of the scraping approach.

Can you scrape Twitter with Python?

Yes. Python is the most common language for X.com scraping. You can use Playwright (headless browser automation), Twikit (async library wrapping X.com's internal APIs), or TweeterPy (simpler extraction-focused library). All three are actively maintained as of March 2026. For a no-code option, managed services like Apify offer point-and-click Twitter scrapers with Python SDK integrations. If you decide an API fits your use case better, see our Twitter API Python integration guide.


Daniel Kolbassen is a data engineer and API infrastructure consultant with 12+ years of experience building data pipelines around social media platforms. He has worked with the Twitter/X API since the v1.1 era and has helped over 40 companies restructure their data infrastructure after the 2023 pricing overhaul. Follow him on Twitter/X or connect on LinkedIn.
