Detect Website Tech Stacks with Python: Automate Sales Prospecting
If you sell developer tools, SaaS integrations, or anything where the buyer's technology matters, you already know the pain: you have a list of 500 company domains and zero idea which ones are actually worth reaching out to. Cold outreach without technographic context is spam. Outreach with technographic context is relevant.
This article shows you how to build a Python pipeline that scans your prospect list, detects each site's technology stack, loads the results into a pandas DataFrame, and filters leads by technology. By the end you will have a script that turns a flat CSV of domains into a qualified, technology-filtered lead list — automatically, for $9/month instead of Wappalyzer's $250/month.
The Problem with Manual Tech Stack Research
The typical sales workflow looks like this: open a prospect's website, right-click → View Source, squint at script tags, maybe install the Wappalyzer browser extension, tab over to your CRM, manually enter "React, Stripe, Cloudflare." Repeat 499 times.
That is not research, that is data entry. It is also error-prone — the browser extension misses server-side signals, HTTP headers, and meta tags that the HTML source doesn't expose directly. And it doesn't scale. A sales engineer spending two hours on technographic research per 50 leads is burning significant time that could go into actual selling.
The fix is to make tech stack detection a programmatic step in your prospecting pipeline, not a manual step in your sales workflow. You already have a Python environment. You already have a prospect list in a spreadsheet. The only missing piece is a reliable detection API.
The StackPeek API in 30 Seconds
The StackPeek API takes a URL and returns a JSON array of technologies. No authentication required for the free tier. Here is the simplest possible Python call:
import requests

resp = requests.get(
    "https://us-central1-todd-agent-prod.cloudfunctions.net/stackpeekApi/api/v1/detect",
    params={"url": "https://shopify.com"},
    timeout=15,
)
data = resp.json()
for tech in data["technologies"]:
    print(f"{tech['name']} ({tech['category']}) — confidence: {tech['confidence']:.0%}")
Output:
React (framework) — confidence: 97%
Next.js (framework) — confidence: 94%
Cloudflare (cdn) — confidence: 99%
Google Analytics (analytics) — confidence: 88%
Stripe (payments) — confidence: 91%
Ruby on Rails (framework) — confidence: 76%
The response includes the technology name, category (framework, analytics, cms, payments, cdn, hosting), and a confidence score between 0 and 1. Detection runs against HTTP headers, HTML source, script URLs, and meta tags — not a browser render, so it is fast and headless-friendly.
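To make the static-signal idea concrete, here is a toy fingerprint check of the same general kind. This is a simplified illustration of the technique, not StackPeek's actual rule set; the patterns below are examples I chose, and real detection engines maintain far larger, regularly updated databases:

```python
import re

# Toy fingerprints: technology -> regex matched against raw HTML or headers.
# These three patterns are illustrative examples only.
FINGERPRINTS = {
    "Next.js": re.compile(r"/_next/static/"),
    "WordPress": re.compile(r"/wp-content/"),
    "Google Analytics": re.compile(r"googletagmanager\.com/gtag"),
}

def match_fingerprints(html: str) -> list[str]:
    """Return the names of technologies whose fingerprint appears in the text."""
    return [name for name, pattern in FINGERPRINTS.items() if pattern.search(html)]

sample = '<script src="/_next/static/chunks/main.js"></script>'
print(match_fingerprints(sample))  # ['Next.js']
```

Because the signals are static strings in the fetched response, no browser render is needed, which is why this style of detection stays fast.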
Free tier: 100 scans per day, no API key, no account. For a prospect list of 100 companies, you can scan the entire list every day at zero cost. For larger lists, the Starter plan at $9/month gives you 5,000 scans.
Step 1: Scan a Prospect List with requests
Assume your prospect list is a CSV with at minimum a domain column. Here is a script that reads it, scans each domain, and writes a new CSV with technology data appended:
# scan_prospects.py
import csv
import json
import time

import requests

API_URL = "https://us-central1-todd-agent-prod.cloudfunctions.net/stackpeekApi/api/v1/detect"
INPUT_CSV = "prospects.csv"
OUTPUT_CSV = "prospects_with_stacks.csv"

def detect_stack(domain: str) -> list:
    """Return list of technology dicts for a domain. Returns [] on any error."""
    url = domain if domain.startswith("http") else f"https://{domain}"
    try:
        resp = requests.get(API_URL, params={"url": url}, timeout=15)
        resp.raise_for_status()
        return resp.json().get("technologies", [])
    except Exception as e:
        print(f"  [warn] {domain}: {e}")
        return []

with open(INPUT_CSV, newline="") as f:
    rows = list(csv.DictReader(f))
print(f"Scanning {len(rows)} prospects...")

with open(OUTPUT_CSV, "w", newline="") as f:
    # Keep the original column order and append the two new columns
    fieldnames = list(rows[0].keys()) + ["technologies_json", "tech_names"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for i, row in enumerate(rows, 1):
        print(f"  [{i}/{len(rows)}] {row['domain']}", end=" ... ", flush=True)
        techs = detect_stack(row["domain"])
        names = [t["name"] for t in techs]
        print(", ".join(names) if names else "(none detected)")
        row["technologies_json"] = json.dumps(techs)
        row["tech_names"] = "|".join(names)
        writer.writerow(row)
        time.sleep(0.3)  # be polite to the API

print(f"\nDone. Results saved to {OUTPUT_CSV}")
The output CSV has every original column plus two new ones: technologies_json (the full API response, serialized) and tech_names (a pipe-delimited list of technology names for easy filtering in Excel or Google Sheets). Run it on your prospect list the night before a sales push and the results are ready by morning.
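If you just need a quick filter and do not want to reach for pandas yet, the pipe-delimited tech_names column is enough on its own. A minimal sketch, assuming the enriched CSV produced by the script above:

```python
import csv
import io

# Filter enriched rows for prospects whose stack includes a given technology,
# using only the pipe-delimited tech_names column from scan_prospects.py.
def filter_by_tech(rows, tech: str) -> list[dict]:
    wanted = tech.lower()
    return [
        row for row in rows
        # Exact-match each delimited name so "React" does not also match "React Native"
        if wanted in (name.lower() for name in row["tech_names"].split("|"))
    ]

# Works the same whether rows come from csv.DictReader over a file or anywhere else
sample = io.StringIO(
    "domain,tech_names\n"
    "a.com,Shopify|Klaviyo\n"
    "b.com,WordPress\n"
)
matches = filter_by_tech(csv.DictReader(sample), "shopify")
print([row["domain"] for row in matches])  # ['a.com']
```

Splitting on the delimiter before comparing avoids the substring false positives you would get from a naive "Shopify" in row["tech_names"] check.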
Step 2: Analyze Results with pandas
Once you have the enriched CSV, pandas makes the filtering effortless. Load it back and start slicing:
# analyze_stacks.py
import json

import pandas as pd

df = pd.read_csv("prospects_with_stacks.csv")

# Deserialize the JSON column back to Python objects
df["technologies"] = df["technologies_json"].apply(
    lambda x: json.loads(x) if pd.notna(x) else []
)

# Helper: does this prospect use a specific technology?
def uses(df: pd.DataFrame, tech_name: str) -> pd.Series:
    return df["technologies"].apply(
        lambda techs: any(t["name"].lower() == tech_name.lower() for t in techs)
    )

# --- Use case 1: Find all Shopify stores (sell them your Shopify app)
shopify_leads = df[uses(df, "Shopify")]
print(f"Shopify sites: {len(shopify_leads)}")

# --- Use case 2: React sites without Sentry (sell them error monitoring)
needs_monitoring = df[uses(df, "React") & ~uses(df, "Sentry")]
print(f"React sites missing error monitoring: {len(needs_monitoring)}")

# --- Use case 3: WordPress sites (sell them a WP plugin or migration service)
wp_sites = df[uses(df, "WordPress")]
print(f"WordPress sites: {len(wp_sites)}")

# --- Use case 4: Sites with no CDN (sell them performance services)
cdn_techs = {"Cloudflare", "Fastly", "Akamai", "Amazon CloudFront"}

def has_cdn(techs):
    return any(t["name"] in cdn_techs for t in techs)

no_cdn = df[~df["technologies"].apply(has_cdn)]
print(f"Sites with no CDN: {len(no_cdn)}")

# --- Technology frequency across entire prospect list
all_techs = [t["name"] for techs in df["technologies"] for t in techs]
freq = pd.Series(all_techs).value_counts()
print("\nTop 10 technologies in your prospect list:")
print(freq.head(10).to_string())
The frequency table at the bottom is particularly useful before you start building integrations or writing outreach copy. If 70% of your prospects use Shopify, lead with Shopify. If only 8% use Magento, don't waste a sequence on it.
Exporting Filtered Leads to CSV
Once you have your filtered DataFrame, export it for your CRM or outreach tool:
# Export Shopify leads with clean columns for outreach
export_cols = ["company", "domain", "contact_email", "tech_names"]
shopify_leads[export_cols].to_csv("shopify_leads.csv", index=False)
# Or push straight to a dict list for your CRM API
leads_for_crm = shopify_leads[export_cols].to_dict(orient="records")
print(f"Ready to import {len(leads_for_crm)} leads")
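If your CRM has a bulk-import API, the records can go straight there. A hypothetical sketch of shaping the payload; the endpoint, field names, and auth scheme below are placeholders for whatever your CRM actually expects, not a real API:

```python
# Hypothetical CRM bulk-import payload. KEEP_FIELDS and the payload shape
# are placeholders; adapt both to your CRM's actual import API.
KEEP_FIELDS = ("company", "domain", "contact_email", "tech_names")

def build_payload(leads: list[dict]) -> dict:
    """Strip each lead down to the fields the (hypothetical) CRM accepts."""
    return {"contacts": [{k: lead.get(k, "") for k in KEEP_FIELDS} for lead in leads]}

# Then POST it with the requests library used throughout this article, e.g.:
#   requests.post("https://api.example-crm.com/v1/contacts/bulk",
#                 json=build_payload(leads_for_crm),
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 timeout=30)
```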
Step 3: Batch Processing at Scale with asyncio and aiohttp
The sequential requests approach works fine for lists up to a few hundred domains. Beyond that, the wall-clock time gets painful. Each API call takes roughly 1.5–3 seconds; at 1,000 prospects, sequential scanning takes 25–50 minutes. With async concurrency set to 10, the same list finishes in 3–5 minutes.
# async_scan.py
import asyncio
import csv
import json

import aiohttp

API_URL = "https://us-central1-todd-agent-prod.cloudfunctions.net/stackpeekApi/api/v1/detect"
CONCURRENCY = 10  # stay well under rate limits

async def detect_one(session: aiohttp.ClientSession, domain: str) -> dict:
    url = domain if domain.startswith("http") else f"https://{domain}"
    try:
        async with session.get(
            API_URL, params={"url": url}, timeout=aiohttp.ClientTimeout(total=20)
        ) as resp:
            data = await resp.json()
            return {"domain": domain, "technologies": data.get("technologies", []), "error": None}
    except Exception as e:
        return {"domain": domain, "technologies": [], "error": str(e)}

async def scan_all(domains: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(CONCURRENCY)

    async def bounded(session, domain):
        async with sem:
            result = await detect_one(session, domain)
            print(f"  {domain}: {len(result['technologies'])} technologies")
            return result

    connector = aiohttp.TCPConnector(limit=CONCURRENCY)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [bounded(session, d) for d in domains]
        return await asyncio.gather(*tasks)

# Load domains from CSV
domains = [row["domain"] for row in csv.DictReader(open("prospects.csv"))]
print(f"Scanning {len(domains)} domains with concurrency={CONCURRENCY}...")

results = asyncio.run(scan_all(domains))

# Write output
with open("prospects_async.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"Done. {len(results)} results written to prospects_async.json")
Install the dependency with pip install aiohttp. The semaphore prevents you from overwhelming the API with simultaneous requests. At CONCURRENCY=10, you are making at most 10 requests at once, which is well within the Starter plan's rate limits.
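One refinement worth adding before you scale up: transient failures (timeouts, flaky DNS, occasional 5xx responses) are common at this volume, and a single retry with backoff recovers most of them. A generic sketch you could wrap around the raw request; the attempt count and delays are arbitrary choices on my part, not API requirements:

```python
import asyncio

async def with_retries(coro_fn, *args, attempts: int = 3, base_delay: float = 1.0):
    """Call an async function, retrying on exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await coro_fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller handle it
            # Wait 1s, 2s, 4s, ... between attempts
            await asyncio.sleep(base_delay * (2 ** attempt))
```

Note that detect_one as written above swallows exceptions and returns an error dict, so to use this wrapper you would either move the try/except out of detect_one or retry whenever result["error"] is set.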
Save money on re-runs: Cache results to disk by domain. Before scanning, check if a cache/{domain}.json file exists. If it does and it is less than 7 days old, skip the API call and load from disk. This is especially useful during development when you are iterating on your filtering logic.
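A minimal sketch of that cache; the cache/{domain}.json layout and the 7-day TTL are just the conventions described above, and detect_fn stands in for the detect_stack function from Step 1:

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path("cache")
CACHE_TTL = 7 * 24 * 3600  # seconds: re-scan anything older than 7 days

def cached_detect(domain: str, detect_fn) -> list:
    """Return cached technologies for a domain, calling detect_fn on a miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{domain}.json"
    # Cache hit: file exists and is fresh enough
    if path.exists() and time.time() - path.stat().st_mtime < CACHE_TTL:
        return json.loads(path.read_text())
    # Cache miss: hit the API (or whatever detect_fn does) and store the result
    techs = detect_fn(domain)
    path.write_text(json.dumps(techs))
    return techs
```

During development you can then re-run your filtering logic as often as you like; only the first scan of each domain costs an API call.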
Practical Filtering Recipes
Here are filters for common sales scenarios. All assume you have loaded the results into a pandas DataFrame as shown above.
Find prospects using a competitor's product
# You sell a Segment alternative — find sites using Segment
segment_users = df[uses(df, "Segment")]
# You sell a Mixpanel alternative — find Mixpanel + no Amplitude
mixpanel_only = df[uses(df, "Mixpanel") & ~uses(df, "Amplitude")]
Find high-intent leads by stack combination
# Shopify stores with no email marketing tool
# (warm leads if you sell email or SMS marketing)
email_tools = {"Klaviyo", "Mailchimp", "Drip", "ConvertKit"}

def has_email_tool(techs):
    return any(t["name"] in email_tools for t in techs)

shopify_no_email = df[uses(df, "Shopify") & ~df["technologies"].apply(has_email_tool)]
print(f"High-intent leads (Shopify, no email tool): {len(shopify_no_email)}")
Score prospects by stack sophistication
# Assign a "sophistication score" — more modern tools = better fit
modern_indicators = {"React", "Next.js", "Vue", "Svelte", "Stripe",
                     "Segment", "Cloudflare", "Vercel", "Netlify"}

def sophistication_score(techs):
    names = {t["name"] for t in techs}
    return len(names & modern_indicators)

df["score"] = df["technologies"].apply(sophistication_score)
top_prospects = df.nlargest(20, "score")
print("Top 20 prospects by stack sophistication:")
print(top_prospects[["company", "domain", "score", "tech_names"]].to_string(index=False))
The score-based approach is useful when your ICP (ideal customer profile) correlates with technical sophistication. If you sell a developer tool, a company already using React + Stripe + Segment is a better fit than a company still on jQuery + PayPal.
Wappalyzer vs. StackPeek for Python Workflows
If you have searched for "python website technology detection api" before, you have probably landed on Wappalyzer. Let's compare them honestly:
| Feature | Wappalyzer API | StackPeek API |
|---|---|---|
| Monthly price | $250/mo (Business) | $9/mo (Starter) |
| Free tier | 50 lookups/mo | 100 scans/day |
| Monthly lookups | 5,000 (Business) | 5,000 (Starter) / 25,000 (Pro) |
| Technologies detected | 1,200+ | 120+ (core categories) |
| API key required | Yes (all tiers) | No (free tier) |
| JSON response format | Yes | Yes |
| Python-friendly REST API | Yes | Yes |
| Annual cost | $3,000/yr | $108/yr |
Wappalyzer detects more technologies — if you need coverage of obscure or niche tools, that breadth matters. But for the 80% use case in sales prospecting — frameworks, CMS platforms, analytics, payments, CDNs, hosting — StackPeek's 120+ technology coverage is sufficient, and the $2,892/year difference is hard to argue with for a scrappy sales team or a solo founder.
There is also an unmaintained open-source Python package, python-Wappalyzer, that wraps a local copy of Wappalyzer's fingerprint database. It still works for basic cases, but the fingerprint data goes stale fast and it requires Playwright or Puppeteer to handle JavaScript-rendered sites. For production use, a maintained API is more reliable.
Putting It Into Your Sales Workflow
The full pipeline looks like this:
- Export domains from your CRM as a CSV. Most CRMs (HubSpot, Salesforce, Pipedrive) support this in two clicks.
- Run the async scanner overnight or as a weekly cron job. For 1,000 domains at concurrency 10, it takes under 5 minutes.
- Load into pandas, apply filters based on your ICP criteria. Export filtered leads back to CSV.
- Import back into your CRM or outreach tool (Apollo, Outreach, Lemlist) with technographic tags attached.
- Personalize your sequences based on the detected stack. "I noticed you're on Shopify but not using Klaviyo yet..." converts significantly better than generic copy.
If you want to automate the scan step on a schedule, see our article on building a competitive intelligence dashboard — the same scheduling patterns apply to a prospecting pipeline. A weekly cron job that re-scans your prospect list catches companies that migrate platforms, add new tools, or launch payment infrastructure, all of which are strong buying signals.
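Spotting those changes is just a set difference between two scan runs. A sketch, assuming you save each weekly run as a mapping of domain to detected technology names:

```python
def new_technologies(previous: dict, current: dict) -> dict:
    """Return technologies that appeared since the last scan, keyed by domain."""
    changes = {}
    for domain, techs in current.items():
        added = set(techs) - set(previous.get(domain, []))
        if added:
            changes[domain] = sorted(added)
    return changes

# Example: acme.com added Stripe between scans -- a payments-launch buying signal
last_week = {"acme.com": ["WordPress"], "globex.com": ["React"]}
this_week = {"acme.com": ["WordPress", "Stripe"], "globex.com": ["React"]}
print(new_technologies(last_week, this_week))  # {'acme.com': ['Stripe']}
```

Feed the diff into your outreach tool each week and you are reaching out precisely when something on the prospect's site changed.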
Start Scanning Your Prospect List Today
100 free scans per day. No API key. No signup. Copy the Python code above and run it against your prospect list right now.
Get Started →

Frequently Asked Questions
Do I need an API key?
No. The free tier gives you 100 scans per day with zero authentication. Just make a GET request to the endpoint with a url query parameter. For higher volumes, the Starter plan at $9/month adds an API key and raises the limit to 5,000 scans per month.
What if a site blocks scraping?
The StackPeek API does its own fetching server-side; it is not running in a browser on your machine. Sites that block scrapers based on user-agent or IP rate-limiting may still have reduced detection accuracy, but this affects all tech detection services equally, including Wappalyzer. For protected sites, confidence scores will be lower and some technologies may not be detected.
Can I detect technologies on sites that require JavaScript rendering?
StackPeek analyzes HTTP headers, static HTML, and script tag sources — it does not execute JavaScript. Technologies that only appear after JS execution (some single-page apps, lazy-loaded analytics) may not be detected. However, most technologies leave fingerprints in static signals: script source URLs, HTTP headers (X-Powered-By, Server), and meta tags. In practice, this covers the majority of commercially relevant tools.