
How to Scrape Twitter (X) in 2026: Methods, Tools, and What Works

Practical guide to scraping X.com in 2026. Covers headless browsers, open-source libraries, managed tools, costs, legal risks, and when an API is the smarter choice.


Key Takeaway: You can scrape X.com using a headless browser (Playwright or Puppeteer), open-source libraries like Twikit or TweeterPy, or managed scraping services. But X.com actively breaks scrapers every 2-4 weeks by rotating tokens, changing GraphQL identifiers, and tightening bot detection. If your goal is getting structured Twitter data rather than building a scraper, a third-party API like Sorsa API returns the same data through simple REST calls with zero maintenance.

Last updated: March 24, 2026



Why People Scrape X.com (And Whether You Need To)

X.com remains one of the richest sources of real-time public data on the internet. Developers, researchers, and businesses scrape it for brand monitoring, sentiment analysis, competitor research, lead generation, trend detection, and academic studies.

But here is a question worth asking before you write a single line of scraping code: do you need to scrape, or do you need the data?

Scraping means building and maintaining infrastructure that extracts data from a website designed to resist exactly that. If you want hands-on control over the extraction process, if you are learning web scraping as a skill, or if you have a highly custom workflow that no existing tool covers, then yes, scraping makes sense.

For everyone else, there are faster paths. Third-party APIs and managed scraping services return the same tweets, profiles, and engagement metrics without the proxy bills, token headaches, and weekly maintenance. We will cover those options later in this guide. For now, let's dig into how scraping X.com actually works.


How X.com Works Under the Hood

Understanding X.com's architecture explains why every scraper eventually breaks. If you have built scrapers for other sites, X.com's defenses are in a different league.

X.com is a React single-page application. When you load a profile or tweet URL, the server sends a minimal HTML shell. JavaScript takes over, requests a guest token from the backend, and then fires GraphQL queries to fetch the actual data. The browser renders the response. There is almost no useful data in the initial HTML.

This architecture creates three chokepoints that X.com uses to block scrapers.

Guest tokens are temporary credentials required for every GraphQL call. They are tied to your IP address, expire every 2-4 hours, and the acquisition method changes every few weeks. When X.com shifts how tokens are issued, every scraper that relies on the old method stops working instantly.
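To make that first chokepoint concrete, here is a minimal sketch of how a scraper might activate a guest session using only Python's standard library. The activation endpoint reflects the historically observed web-client flow, and the bearer value below is a placeholder for the public token embedded in X.com's JavaScript bundle; both have changed without notice in the past.

```python
import json
import urllib.request

# Placeholder: the real value is a long public bearer token embedded
# in X.com's JavaScript bundle; it changes when the bundle is rebuilt.
PUBLIC_BEARER = "AAAA...copy-from-js-bundle"

def guest_headers(bearer: str) -> dict:
    return {"Authorization": f"Bearer {bearer}"}

def get_guest_token() -> str:
    # Historically observed activation endpoint; the flow shifts
    # every few weeks, so expect to re-verify this regularly.
    req = urllib.request.Request(
        "https://api.twitter.com/1.1/guest/activate.json",
        data=b"",  # POST with an empty body
        headers=guest_headers(PUBLIC_BEARER),
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["guest_token"]
```

The returned token then rides along on every GraphQL call until it expires and the dance starts over.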

doc_ids are identifiers embedded in X.com's JavaScript bundle that tell the GraphQL backend which operation to execute. Fetching a user profile, searching tweets, and loading a timeline each require a different doc_id. X.com rotates these every 2-4 weeks, and you need to track 8-12 of them simultaneously. There is no documentation. You reverse-engineer them from minified JavaScript, and then you do it again two weeks later.
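What does tracking doc_ids look like in practice? A sketch: download the current JavaScript bundle and regex out the (operation name, doc_id) pairs. The pattern below matches the bundle format as commonly observed; treat it as a starting point, since the format drifts between releases.

```python
import re

# The bundle embeds entries like: queryId:"AbC-123",operationName:"UserTweets"
QUERY_RE = re.compile(
    r'queryId:"(?P<doc_id>[\w-]+)".{0,80}?operationName:"(?P<op>\w+)"'
)

def extract_doc_ids(bundle_js: str) -> dict:
    """Map operation names to their current doc_ids from a JS bundle."""
    return {m.group("op"): m.group("doc_id") for m in QUERY_RE.finditer(bundle_js)}
```

Run this against the fresh bundle every time requests start returning empty results, and diff the output against what your scraper is currently using.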

Rate limits and detection form the third layer. X.com enforces roughly 300 requests per hour per IP address. Datacenter IPs get flagged within 1-2 requests. TLS fingerprinting catches headless browsers that do not perfectly mimic a real browser's network stack. Cookie validation flags suspicious session patterns. If you suspect your account has been flagged, you can use a shadowban checker to verify.

Having worked with Twitter's API since the v1.1 days, I have watched these defenses evolve from basic rate limiting to a multi-layered detection system that updates faster than most teams can respond. The 2023 API shutdown accelerated this dramatically, and the changes have only gotten more aggressive since then.


Method 1: Headless Browser Scraping (Playwright / Puppeteer)

The most common DIY approach is to automate a real browser, load X.com pages, and intercept the GraphQL responses that contain the data you want.

Here is a minimal Python example using Playwright that scrapes a single tweet:

from playwright.sync_api import sync_playwright
import json

def scrape_tweet(url: str) -> dict:
    xhr_calls = []

    def capture_response(response):
        # X.com's GraphQL data arrives via background xhr/fetch requests
        if response.request.resource_type in ("xhr", "fetch"):
            xhr_calls.append(response)

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", capture_response)
        page.goto(url)
        page.wait_for_selector("[data-testid='tweet']", timeout=15000)

        try:
            for xhr in xhr_calls:
                if "TweetResultByRestId" in xhr.url:
                    data = xhr.json()
                    return data["data"]["tweetResult"]["result"]
        finally:
            browser.close()

    return {}

tweet = scrape_tweet("https://x.com/elonmusk/status/1234567890")
print(json.dumps(tweet, indent=2))

The script launches a Chromium instance, navigates to a tweet URL, waits for the tweet to render, then filters the background XHR calls to find the one containing tweet data. You get the full tweet object: text, timestamps, engagement counts, media URLs, and the author's profile.

The same pattern works for profiles (look for UserBy in XHR URLs), search results (SearchTimeline), and timelines (UserTweets).
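One way to generalize that pattern is a small router keyed on those operation-name substrings. The names below are as of this writing; expect them to drift between bundle releases.

```python
from typing import Optional

# Operation-name substrings observed in X.com's GraphQL request URLs.
OPERATIONS = {
    "TweetResultByRestId": "tweet",
    "UserBy": "profile",
    "SearchTimeline": "search",
    "UserTweets": "timeline",
}

def classify(url: str) -> Optional[str]:
    """Map a captured response URL to the kind of data it carries."""
    for op, kind in OPERATIONS.items():
        if op in url:
            return kind
    return None
```

Feed every captured response URL through `classify` and dispatch to a per-kind parser, so one scraper session can harvest profiles, searches, and timelines in a single pass.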

What You Can Get

Profiles (username, bio, follower counts, verification status), tweets (text, media, engagement metrics), search results, threads, quote tweets, and replies. Essentially anything visible on the public X.com interface.

What You Cannot Get

Private or protected accounts, DMs, bookmarks, full historical archives without extensive scrolling automation, and any data that requires a logged-in session unless you use authenticated scraping (which risks account bans). For historical data access without scraping, third-party APIs like Sorsa can retrieve tweets back to 2006 through search endpoints.

What You Need to Make It Work

Residential proxies are non-negotiable. Datacenter IPs are blocked almost instantly. Budget $1-3 per gigabyte, and plan for $50-200/month depending on volume. Sticky sessions of 10-15 minutes work best to keep guest tokens and IP sessions aligned.

Anti-detection measures matter. Vanilla headless Chromium gets fingerprinted. You need proper browser fingerprint spoofing, realistic viewport sizes, and human-like request timing with randomized delays.

Error handling and retry logic. Expect failures. Tokens expire mid-session. Rate limits hit without warning. GraphQL endpoints return empty results when doc_ids rotate. A production scraper needs robust retry mechanisms and monitoring.
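Putting the proxy, pacing, and retry pieces together, a sketch might look like this. The proxy credentials are placeholders, and the delay range mirrors the sticky-session guidance above.

```python
import random
import time

# Placeholder sticky-session residential proxy config. Pass it to
# Playwright as: pw.chromium.launch(headless=True, proxy=PROXY)
PROXY = {
    "server": "http://proxy.example.com:8000",  # placeholder endpoint
    "username": "user-session-abc123",          # placeholder sticky session id
    "password": "secret",
}

def polite_delay(base: float = 2.0, jitter: float = 3.0) -> float:
    """Sleep a random 2-5 seconds (by default) to mimic human pacing."""
    delay = base + random.random() * jitter
    time.sleep(delay)
    return delay

def with_retries(fn, attempts: int = 3, pause=polite_delay):
    """Call fn(); on failure, back off and try again."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            pause()
    raise RuntimeError(f"all {attempts} attempts failed: {last_error}")
```

Wrap each page fetch in `with_retries` and call `polite_delay` between navigations; a production version would add per-error-type handling (token refresh on 403s, longer backoff on 429s) and alerting when failures cluster.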

The Maintenance Reality

This approach works right now. It will break within 2-4 weeks when X.com pushes its next update. Keeping a Playwright scraper running against X.com is a recurring commitment of 10-15 hours per month: monitoring for failures, reverse-engineering new doc_ids, updating token acquisition logic, and adjusting proxy rotation strategies.


Method 2: Open-Source Libraries

Instead of building a scraper from scratch, you can use libraries that wrap X.com's internal APIs. Some are actively maintained. Many are dead. Here is the honest picture as of March 2026.

What Is Actually Working

Library   | Language | Auth Required                        | Key Features                                                        | Notes
Twikit    | Python   | Yes (login)                          | Search, scrape tweets, post, DMs, trends. Async.                    | Most popular option. Active development, frequent updates. Large community on Discord.
TweeterPy | Python   | Yes (login)                          | Profiles, tweets, followers, followings.                            | Simpler API surface. Good for data extraction. Active maintenance.
XActions  | Node.js  | Browser scripts: no. Full API: yes.  | 140+ MCP tools, CLI, browser scripts, dashboard, sentiment analysis. | Most feature-rich. Works with AI agents (Claude, GPT) via MCP server. Also supports Bluesky and Mastodon.

Here is what working code looks like for each library.

Twikit is the go-to for most Python developers. It is async, well-documented, and has the largest community. Here is a search example:

import asyncio
from twikit import Client

client = Client('en-US')

async def main():
    await client.login(
        auth_info_1='username',
        auth_info_2='email@example.com',
        password='password',
        cookies_file='cookies.json'
    )

    # Search latest tweets
    tweets = await client.search_tweet('web scraping', 'Latest')
    for tweet in tweets:
        print(tweet.user.name, tweet.text, tweet.created_at)

    # Get a user's tweets
    user_tweets = await client.get_user_tweets('123456', 'Tweets')
    for t in user_tweets:
        print(t.text)

asyncio.run(main())

Notice that login is required. Twikit saves cookies to a file, so subsequent runs can skip the login step and reuse the session.

TweeterPy has a simpler, synchronous API that is easier to pick up if you just need to pull profiles and tweets:

from tweeterpy import TweeterPy

twitter = TweeterPy()

# Get a user's numeric ID
user_id = twitter.get_user_id('elonmusk')
print(user_id)

# Get full profile data
profile = twitter.get_user_data('elonmusk')
print(profile)

Less feature-rich than Twikit, but straightforward for extraction tasks. Supports proxies out of the box via a proxies parameter.

XActions is the Node.js option. It stands out by offering a CLI, browser console scripts, and an MCP server for AI agent integration:

# Install globally
npm install -g xactions

# Scrape a profile from the command line
xactions profile elonmusk

# Search tweets
xactions search "web scraping"

# Detect who unfollowed you
xactions unfollowers

For programmatic use in Node.js:

import { scrapeProfile, scrapeTweets } from 'xactions';

const profile = await scrapeProfile('elonmusk');
console.log(profile);

const tweets = await scrapeTweets('elonmusk', { count: 20 });
tweets.forEach(t => console.log(t.text));

XActions also ships 50+ browser console scripts you can paste directly into Chrome DevTools on x.com, no install needed. That makes it the lowest-friction option for quick one-off scrapes.

The Graveyard (Do Not Waste Your Time)

Library   | What Happened
Twint     | Archived in 2022. Completely dead. Still referenced in tutorials that should know better.
snscrape  | No updates in over three years. Broken against current X.com.
twscrape  | 11 months without a commit. Almost certainly broken.
ntscraper | Depends on Nitter frontend instances, which are shutting down. Unreliable at best.

If you find a tutorial recommending any of these, check the publication date. The X.com scraping landscape changes fast, and guides from even 12 months ago may point you to tools that no longer function.

The Catch with Open-Source Libraries

Every working library in the table above requires logging in with an X.com account. This means:

  • Account ban risk. X.com suspends accounts that exhibit automated behavior. Do not use your personal account. Ever.
  • Account rotation. For any serious volume, you need multiple accounts and a system to rotate between them.
  • Proxies are still necessary. Same residential proxy requirements as DIY scraping.
  • Breakage happens. Even actively maintained libraries break when X.com pushes updates. You are dependent on the maintainer's response time.
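A minimal account-rotation scaffold (the account entries and file names are hypothetical) might look like:

```python
import itertools

# Hypothetical pool of dedicated scraping accounts -- never personal
# ones. Each pairs saved cookies with its own sticky proxy session.
ACCOUNTS = [
    {"cookies": "acct_a.json", "proxy_session": "sess-a"},
    {"cookies": "acct_b.json", "proxy_session": "sess-b"},
    {"cookies": "acct_c.json", "proxy_session": "sess-c"},
]

_rotation = itertools.cycle(ACCOUNTS)

def next_account() -> dict:
    # Round-robin selection; a production system would also track
    # per-account rate-limit state and cool-downs after failures.
    return next(_rotation)
```

Keeping each account pinned to the same proxy session matters: an account that hops IPs mid-session is one of the clearest automation signals X.com looks for.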

Method 3: Managed Scraping Services

Services like Apify, ScrapFly, and Bright Data handle the scraping infrastructure for you. You provide a query or URL, they return structured data. Proxy rotation, token management, and anti-bot bypass are their problem, not yours.

The upside is obvious: no code to maintain, no proxies to manage. The downside is cost at scale and vendor dependency. If their scraper breaks, you wait for their fix. And pricing models vary wildly: some charge per tweet, others per compute unit, others per GB of proxy traffic.

This is a big enough topic to deserve its own guide. For a detailed breakdown of managed scraping services, see our Twitter scrapers comparison.


What Scraping Actually Costs (The Full Picture)

Most scraping tutorials skip the real costs. Here is what each method actually runs when you factor in everything: not just the tool itself, but proxies, developer time, and the hidden cost of things breaking.

                    | DIY (Playwright/Puppeteer)   | Open-Source Library            | Managed Scraper       | Sorsa API
Setup time          | Days to weeks                | Hours                          | Minutes               | Minutes
Monthly maintenance | 10-15 hours                  | 5-10 hours                     | Near zero             | Zero
Proxy cost          | $50-200/mo                   | $50-200/mo                     | Included              | Not needed
Service cost        | $0                           | $0                             | $50-500/mo            | $49-899/mo
Account ban risk    | High                         | High                           | None (their accounts) | None
Data completeness   | Limited (public view only)   | Moderate (with auth)           | Good                  | Full (profiles, tweets, search, followers, engagement, communities)
Reliability         | Low (breaks every 2-4 weeks) | Medium (depends on maintainer) | High                  | High
Rate limit          | ~300 req/hr per IP           | Varies                         | Varies                | 20 req/s on all plans

The numbers that matter are in the maintenance row. Developer time is expensive. One of my clients spent over 15 hours a month patching a Playwright-based Twitter scraper that broke every time X.com rotated its doc_ids. After three months, the total cost (developer hours plus residential proxies) exceeded $2,000/month for data they could have pulled through an API for under $200. They switched and never looked back.

The free tools are not free when you account for your time.
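That math is worth writing down. A toy cost model, with the hourly rate as an assumption:

```python
def monthly_cost(dev_hours: float, hourly_rate: float,
                 proxy_cost: float = 0.0, service_cost: float = 0.0) -> float:
    """Total monthly cost: developer time plus proxies plus any service fee."""
    return dev_hours * hourly_rate + proxy_cost + service_cost

# Illustrative figures only -- $120/hr is an assumed contractor rate.
diy = monthly_cost(dev_hours=15, hourly_rate=120, proxy_cost=150)
api = monthly_cost(dev_hours=0, hourly_rate=120, service_cost=199)
print(diy, api)  # 1950.0 199.0
```

Plug in your own team's rate and maintenance hours; for most teams, the developer-time term dominates everything else.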


Is Scraping X.com Legal?

This is the section everyone skips to. Here is what you need to know.

U.S. case law is broadly favorable to scraping public data. In 2022, the Ninth Circuit Court of Appeals upheld that scraping publicly available information does not violate the Computer Fraud and Abuse Act (CFAA). The hiQ v. LinkedIn decision is the most cited precedent.

X.com's Terms of Service explicitly prohibit scraping. Violating ToS can result in account suspension and IP blocks. It is not a criminal offense, but X.com has the right to deny you access to their platform.

The liquidated damages clause. X.com's current ToS include a provision stating that anyone who accesses more than 1,000,000 posts in a 24-hour period via automated means without permission is liable for $15,000 in liquidated damages per million posts. As of March 2026, there are no publicly known enforcement cases under this clause, and major scraping companies like Bright Data and Apify continue to operate openly. But the provision exists and is worth knowing about if you are operating at scale.

Practical guidance:

  • Scrape only publicly available data.
  • Do not collect personally identifiable information without a legitimate purpose.
  • Do not overload X.com's servers with aggressive request rates.
  • For production systems, using a third-party API provider shifts the compliance burden. They manage data acquisition on their infrastructure and are responsible for their own legal posture.

This is not legal advice. If your use case involves sensitive data or high volume, consult a lawyer.


When an API Makes More Sense Than Scraping

If you have read this far, you understand that scraping X.com is possible but expensive in time, money, and ongoing effort. For many use cases, the honest answer is: you do not need to scrape at all.

Third-party X data APIs return the same information through simple REST endpoints. Profiles, tweets, search results, followers, engagement metrics, community data. All structured as clean JSON. One API key in the header. No proxies. No guest tokens. No doc_ids. No maintenance.

Here is what a profile lookup looks like with Sorsa API:

curl -H "ApiKey: YOUR_KEY" \
  "https://api.sorsa.io/v3/info?username=elonmusk"

That returns the full profile object: ID, username, display name, bio, follower count, following count, tweet count, verification status, profile image, banner, creation date. One request, one line, structured JSON.

Compare that to the 20+ lines of Playwright code earlier in this guide, plus the proxy setup, token management, and retry logic that are not even shown.

Sorsa API covers 38 endpoints across user data, tweets, search, verification, communities, lists, and crypto analytics. The rate limit is a flat 20 requests per second on all plans. Pricing starts at $49/month for 10,000 requests, and batch endpoints like /info-batch (up to 100 profiles) and /tweet-info-bulk (up to 100 tweets) each count as a single request. You can test endpoints without writing code using the API playground. If you are coming from the official X API, the migration guide covers the switch step by step.
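The same lookup translates to a few lines of Python using only the standard library. The base URL, header name, and endpoint paths come from the documentation above; the batch parameter name is an assumption, so check the playground for the exact shape.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.sorsa.io/v3"

def build_request(endpoint: str, params: dict, api_key: str) -> urllib.request.Request:
    """Construct an authenticated GET against the Sorsa API."""
    url = f"{BASE}/{endpoint}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"ApiKey": api_key})

def sorsa_get(endpoint: str, params: dict, api_key: str) -> dict:
    with urllib.request.urlopen(build_request(endpoint, params, api_key),
                                timeout=10) as resp:
        return json.load(resp)

# Single profile (matches the curl call above):
#   profile = sorsa_get("info", {"username": "elonmusk"}, "YOUR_KEY")
# Batch lookup -- "usernames" is an assumed parameter name:
#   profiles = sorsa_get("info-batch", {"usernames": "elonmusk,jack"}, "YOUR_KEY")
```

Note what is absent: no proxy configuration, no token refresh, no retry-on-rotation logic. The request you write today is the request that still works next month.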

When to scrape: You need write access (posting, liking, following). You want full control over the extraction process. You are building a scraping tool as the product itself. You are learning.

When to use an API: You need read-only data (profiles, tweets, search, followers, engagement). You want reliability without maintenance. You are building a product on top of X data and cannot afford weekly breakages.

For a deeper comparison of API providers, see our guide on X (Twitter) API alternatives. And for a breakdown of what the official X API costs in 2026, see our pricing analysis.


FAQ

Can you scrape Twitter without logging in?

Yes, but with significant limitations. Without authentication, you can access basic public profiles and individual tweets through headless browser scraping. Search results, full timelines, threads, and engagement details are restricted or incomplete without a logged-in session. Every open-source library that provides full data access (Twikit, TweeterPy) requires authentication.

What is the best tool to scrape Twitter in 2026?

It depends on your stack and requirements. For Python developers who want full control, Twikit is the most actively maintained library with the largest community. For Node.js and AI agent workflows, XActions offers the broadest feature set including an MCP server. For zero-maintenance data access, a third-party API like Sorsa API or a managed scraping service removes the infrastructure burden entirely.

How do you scrape Twitter without getting blocked?

Use residential proxies, not datacenter IPs. Add randomized delays between requests (2-5 seconds minimum). Rotate browser fingerprints and user agents. Use separate, dedicated accounts for scraping (never your personal account). Implement sticky proxy sessions of 10-15 minutes to keep guest tokens valid. Even with all of this, expect periodic blocks. X.com's detection improves continuously.

Does the official X API still have a free tier?

No. As of early 2026, X replaced all subscription tiers with a pay-per-use model. There are no free credits. You buy credits upfront and pay per resource: $0.005 per post read, $0.01 per user profile, $0.01 per post created. There is also a hard cap of 2 million post reads per month on standard accounts. For details, see our full pricing breakdown.

How often does X.com break scrapers?

Every 2-4 weeks on average. The main causes are guest token acquisition changes, doc_id rotations, and new anti-bot detection layers. Any scraper, whether DIY or open-source, requires regular updates to keep functioning. This is the single biggest hidden cost of the scraping approach.

Can you scrape Twitter with Python?

Yes. Python is the most common language for X.com scraping. You can use Playwright (headless browser automation), Twikit (async library wrapping X.com's internal APIs), or TweeterPy (simpler extraction-focused library). All three are actively maintained as of March 2026. For a no-code option, managed services like Apify offer point-and-click Twitter scrapers with Python SDK integrations. If you decide an API fits your use case better, see our Twitter API Python integration guide.


Daniel Kolbassen is a data engineer and API infrastructure consultant with 12+ years of experience building data pipelines around social media platforms. He has worked with the Twitter/X API since the v1.1 era and has helped over 40 companies restructure their data infrastructure after the 2023 pricing overhaul. Follow him on Twitter/X or connect on LinkedIn.
