Python Guide

Detect Website Tech Stacks in Python with StackPeek API

Published March 29, 2026 · 13 min read

Python is the go-to language for web scraping, data pipelines, automation scripts, and backend APIs. If you need to detect what technologies a website is running — its framework, CMS, analytics tools, CDN, and hosting provider — you want an API that returns clean JSON you can work with using standard Python libraries. No headless browsers, no Selenium dependencies, no JavaScript rendering.

The StackPeek API gives you exactly that: send a URL, get back structured JSON listing every detected technology with its category, confidence score, and version when detectable. It works with requests, aiohttp, httpx, or any HTTP library in the Python ecosystem. One GET request, one JSON response, zero complexity.

This guide covers production-ready Python code for every common pattern: basic detection with requests, async scanning with aiohttp and asyncio, a Flask route that wraps StackPeek as a caching proxy, a Django management command for batch scanning, a polished CLI tool with Click, and high-throughput concurrent scanning with asyncio.Semaphore. Every example runs against the live API with no API key required on the free tier.

Why Python Developers Need Programmatic Tech Stack Detection

Python dominates in automation, data engineering, and rapid prototyping. When you are building a competitive intelligence tool, a lead enrichment pipeline, a security scanner, or a web scraping operation, knowing what technologies a target website uses is critical metadata that shapes your entire workflow.

Common scenarios where Python developers reach for tech stack detection include competitive intelligence dashboards, CRM lead enrichment, security reconnaissance, and choosing a scraping strategy based on how a target site renders its pages.

How the StackPeek API Works

The API is a single GET endpoint. You pass the target URL as a query parameter and receive a JSON response containing every detected technology with its category, confidence score, and version (when detectable).

GET https://us-central1-todd-agent-prod.cloudfunctions.net/stackpeekApi/api/v1/detect?url=https://stripe.com

{
  "url": "https://stripe.com",
  "technologies": [
    {
      "name": "React",
      "category": "JavaScript Framework",
      "confidence": 95,
      "version": "18.2.0",
      "website": "https://reactjs.org"
    },
    {
      "name": "Next.js",
      "category": "Web Framework",
      "confidence": 90,
      "version": "13.4",
      "website": "https://nextjs.org"
    }
  ],
  "scanTime": 1240
}

The free tier gives you 100 scans per day with no API key required. Paid plans start at $9/month for 5,000 scans. The response is standard JSON that Python's json module or requests' built-in .json() method handles natively — no custom parsing, no XML, no pagination tokens.
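
Because the endpoint is a plain GET with one query parameter, you can build the request URL with nothing but the standard library. The sketch below shows the exact percent-encoded query that any HTTP client sends on the wire:

```python
from urllib.parse import urlencode

BASE = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)

# urlencode percent-encodes the target URL: ':' becomes %3A, '/' becomes %2F
query = urlencode({"url": "https://stripe.com"})
print(f"{BASE}?{query}")
```

Libraries like requests produce the same encoding automatically when you pass a params dictionary, so you never need to build this string by hand.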

Basic Detection with Requests

The requests library is the most popular HTTP client in Python. Here is the simplest possible call to the StackPeek API — a three-statement function that returns structured technology data:

import requests

STACKPEEK_URL = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)


def detect_tech_stack(target_url: str) -> dict:
    """Detect the technology stack of a given URL."""
    response = requests.get(
        STACKPEEK_URL,
        params={"url": target_url},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    result = detect_tech_stack("https://stripe.com")

    print(f"Detected {len(result['technologies'])} technologies on {result['url']}")
    print(f"Scan completed in {result.get('scanTime', 'N/A')}ms\n")

    for tech in result["technologies"]:
        version = tech.get("version", "-")
        print(f"  {tech['name']:<20} {tech['category']:<20} v{version:<10} {tech['confidence']}%")

Key details: requests.get() handles URL encoding of query parameters automatically when you use the params dictionary — no manual percent-encoding needed. The raise_for_status() call raises an HTTPError for non-2xx responses. The .json() method deserializes the response body into a Python dictionary using the standard json module. Always set a timeout to prevent your script from hanging indefinitely on slow responses.
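
Once deserialized, the response is plain Python data, so standard-library tools can slice it however you need. For example, a small helper (hypothetical, not part of any SDK) that groups detections by category, shown against the sample payload from earlier:

```python
from collections import defaultdict

# Sample payload shaped like the API response shown above
result = {
    "url": "https://stripe.com",
    "technologies": [
        {"name": "React", "category": "JavaScript Framework", "confidence": 95},
        {"name": "Next.js", "category": "Web Framework", "confidence": 90},
    ],
}


def group_by_category(result: dict) -> dict[str, list[str]]:
    """Group detected technology names by their category."""
    groups: dict[str, list[str]] = defaultdict(list)
    for tech in result.get("technologies", []):
        groups[tech["category"]].append(tech["name"])
    return dict(groups)


print(group_by_category(result))
```

The same pattern works for filtering by confidence, extracting only CMS detections, or building a flat list of names for a report.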

Install the dependency:

pip install requests

Async Scanning with aiohttp and asyncio

When you need to scan multiple websites, sequential requests calls are slow because each one blocks until the response arrives. Python's asyncio with aiohttp lets you fire off multiple requests concurrently and process responses as they arrive. This is essential for batch scanning jobs where you have hundreds or thousands of URLs to check.

import asyncio
import aiohttp


STACKPEEK_URL = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)


async def detect_async(
    session: aiohttp.ClientSession,
    target_url: str,
) -> dict:
    """Detect tech stack asynchronously using aiohttp."""
    async with session.get(
        STACKPEEK_URL,
        params={"url": target_url},
        timeout=aiohttp.ClientTimeout(total=30),
    ) as response:
        response.raise_for_status()
        return await response.json()


async def scan_multiple(urls: list[str], max_concurrency: int = 5) -> list[dict]:
    """Scan multiple URLs concurrently with a concurrency limit."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async with aiohttp.ClientSession() as session:

        async def scan_one(url: str) -> dict:
            async with semaphore:
                try:
                    data = await detect_async(session, url)
                    return {"url": url, "success": True, "data": data}
                except Exception as e:
                    return {"url": url, "success": False, "error": str(e)}

        tasks = [asyncio.create_task(scan_one(url)) for url in urls]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    domains = [
        "https://stripe.com",
        "https://linear.app",
        "https://vercel.com",
        "https://notion.so",
        "https://figma.com",
        "https://github.com",
        "https://shopify.com",
        "https://slack.com",
    ]

    results = asyncio.run(scan_multiple(domains, max_concurrency=5))

    for r in results:
        if r["success"]:
            techs = [t["name"] for t in r["data"]["technologies"][:5]]
            extra = len(r["data"]["technologies"]) - 5
            suffix = f" (+{extra} more)" if extra > 0 else ""
            print(f"  {r['url']:<30} {', '.join(techs)}{suffix}")
        else:
            print(f"  {r['url']:<30} ERROR: {r['error']}")

The asyncio.Semaphore is the key concurrency primitive. Each task acquires the semaphore before making its HTTP request. With max_concurrency set to 5, at most 5 requests are in flight at any time. The remaining tasks wait asynchronously — they do not block a thread or spin-loop. When a request completes and the semaphore is released, the next waiting task proceeds immediately.
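
You can see the cap in action without touching the network. This self-contained sketch replaces the HTTP call with asyncio.sleep and records the peak number of tasks holding the semaphore at once:

```python
import asyncio


async def demo(max_concurrency: int = 3, n_tasks: int = 10) -> int:
    """Return the peak number of tasks that held the semaphore at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def task() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the HTTP request
            in_flight -= 1

    await asyncio.gather(*(task() for _ in range(n_tasks)))
    return peak


peak_seen = asyncio.run(demo())
print(peak_seen)  # 3, never exceeds max_concurrency
```

However many tasks you launch, at most max_concurrency are ever inside the `async with semaphore` block at the same time.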

Install the dependency:

pip install aiohttp

For most batch scanning jobs, aiohttp with a semaphore delivers 5-10x throughput improvements over sequential requests calls. Scanning 100 domains with a concurrency limit of 10 takes roughly the time of 10 sequential scans.
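
That throughput claim is easy to sanity-check with a back-of-envelope model: with a concurrency limit of c, n URLs complete in roughly ceil(n / c) waves of requests. The 1.5-second per-scan figure below is illustrative, not a measured value:

```python
import math


def estimated_wall_time(n_urls: int, concurrency: int, avg_scan_s: float) -> float:
    """Rough lower bound: URLs complete in ceil(n / concurrency) waves."""
    return math.ceil(n_urls / concurrency) * avg_scan_s


# 100 domains at concurrency 10 vs. fully sequential
print(estimated_wall_time(100, 10, 1.5))  # 15.0 seconds
print(estimated_wall_time(100, 1, 1.5))   # 150.0 seconds
```

Real runs will vary with per-site latency, but the ratio between the two numbers is where the 5-10x improvement comes from.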

Flask Integration with Caching

If you are building a web application with Flask, you can expose StackPeek through your own API endpoint with caching to avoid redundant API calls. This is useful when multiple frontend components need tech stack data for the same domain within a short time window.

import time
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

STACKPEEK_URL = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)

# Simple in-memory cache with TTL
_cache: dict[str, tuple[dict, float]] = {}
CACHE_TTL = 86400  # 24 hours


def get_cached_or_fetch(target_url: str) -> dict:
    """Return cached result or fetch from StackPeek API."""
    now = time.time()

    if target_url in _cache:
        data, expires_at = _cache[target_url]
        if now < expires_at:
            return data

    response = requests.get(
        STACKPEEK_URL,
        params={"url": target_url},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()

    _cache[target_url] = (data, now + CACHE_TTL)
    return data


@app.route("/api/detect")
def detect():
    """Proxy endpoint for tech stack detection."""
    target_url = request.args.get("url")
    if not target_url:
        return jsonify({"error": "Missing 'url' query parameter"}), 400

    if not target_url.startswith(("http://", "https://")):
        return jsonify({"error": "URL must start with http:// or https://"}), 400

    try:
        data = get_cached_or_fetch(target_url)
        return jsonify(data)
    except requests.HTTPError as e:
        return jsonify({"error": f"StackPeek API error: {e}"}), 502
    except requests.Timeout:
        return jsonify({"error": "StackPeek API timed out"}), 504
    except requests.RequestException as e:
        return jsonify({"error": f"Could not reach StackPeek: {e}"}), 502


if __name__ == "__main__":
    app.run(debug=True, port=5000)

This pattern gives you a caching proxy in front of StackPeek. On cache hits, responses return instantly with zero API calls consumed. The 24-hour TTL matches the API's data freshness — a website's tech stack rarely changes more than once a day. For production deployments, swap the dictionary cache for Redis or Flask-Caching with a Memcached backend to persist the cache across process restarts.
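
Before reaching for Redis, it helps to factor the dictionary-and-TTL logic into its own class so the backend can be swapped without touching the route. A minimal sketch (the class name and API are illustrative, not part of StackPeek or Flask):

```python
import time


class TTLCache:
    """Minimal TTL cache with the same get-or-fetch shape as the Flask example."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[object, float]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # monotonic() is immune to wall-clock adjustments, unlike time.time()
        if time.monotonic() >= expires_at:
            del self._store[key]  # evict stale entries on read
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (value, time.monotonic() + self.ttl)


cache = TTLCache(ttl_seconds=86400)
cache.set("https://stripe.com", {"technologies": []})
print(cache.get("https://stripe.com") is not None)  # True
```

Moving to Redis then means reimplementing get and set on top of redis-py (GET plus SETEX for the TTL) while the Flask route stays unchanged.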

Install the dependencies:

pip install flask requests

Django Management Command

Django management commands are the standard way to run one-off scripts and batch jobs in a Django project. Here is a management command that scans a list of URLs from a file and stores the results in your database or outputs them as JSON:

import json
import requests
from django.core.management.base import BaseCommand, CommandError


STACKPEEK_URL = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)


class Command(BaseCommand):
    help = "Detect the technology stack of one or more websites using StackPeek"

    def add_arguments(self, parser):
        parser.add_argument(
            "urls",
            nargs="*",
            help="URLs to scan (e.g., https://stripe.com)",
        )
        parser.add_argument(
            "--file",
            type=str,
            help="Path to a file containing URLs, one per line",
        )
        parser.add_argument(
            "--json",
            action="store_true",
            help="Output results as JSON",
        )
        parser.add_argument(
            "--min-confidence",
            type=float,
            default=0.0,
            help="Only show detections at or above this confidence threshold",
        )

    def handle(self, *args, **options):
        urls = list(options["urls"])

        if options["file"]:
            with open(options["file"]) as f:
                urls.extend(line.strip() for line in f if line.strip())

        if not urls:
            raise CommandError("Provide at least one URL or use --file")

        results = []
        for url in urls:
            self.stderr.write(f"Scanning {url}...")
            try:
                resp = requests.get(
                    STACKPEEK_URL,
                    params={"url": url},
                    timeout=30,
                )
                resp.raise_for_status()
                data = resp.json()

                techs = [
                    t for t in data.get("technologies", [])
                    if t.get("confidence", 0) >= options["min_confidence"]
                ]
                data["technologies"] = techs
                results.append({"url": url, "success": True, "data": data})

                self.stderr.write(
                    self.style.SUCCESS(f"  Found {len(techs)} technologies")
                )
            except Exception as e:
                results.append({"url": url, "success": False, "error": str(e)})
                self.stderr.write(self.style.ERROR(f"  Failed: {e}"))

        if options["json"]:
            self.stdout.write(json.dumps(results, indent=2))
        else:
            for r in results:
                if r["success"]:
                    self.stdout.write(f"\n{r['url']}:")
                    for t in r["data"]["technologies"]:
                        ver = t.get("version", "-")
                        self.stdout.write(
                            f"  {t['name']:<22} {t['category']:<20} {ver:<10} {t['confidence']}%"
                        )

Save this file as your_app/management/commands/stackpeek_scan.py in your Django project. Usage from the command line:

# Scan a single URL
python manage.py stackpeek_scan https://stripe.com

# Scan from a file
python manage.py stackpeek_scan --file domains.txt

# JSON output for piping to jq
python manage.py stackpeek_scan https://vercel.com --json | jq '.[]'

# Only high-confidence detections
python manage.py stackpeek_scan https://github.com --min-confidence 80

Django management commands handle argument parsing, help text, and colored terminal output automatically. The --file flag lets your team scan entire domain lists without modifying the script. The --json flag makes it composable with other Unix tools.

CLI Tool with Click

The Click library is the most popular choice for building Python CLI tools. It provides decorators for defining commands, arguments, and options with automatic help text, colored output, and progress bars. Here is a complete CLI tool for tech stack detection:

import json
import sys
import click
import requests


STACKPEEK_URL = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)


@click.command()
@click.argument("url")
@click.option("--json-output", "-j", is_flag=True, help="Output as JSON")
@click.option(
    "--min-confidence",
    "-c",
    default=0.0,
    type=float,
    help="Minimum confidence threshold (0-100)",
)
@click.option("--timeout", "-t", default=30, type=int, help="Request timeout in seconds")
def detect(url: str, json_output: bool, min_confidence: float, timeout: int):
    """Detect the technology stack of any website.

    Example: stackpeek https://stripe.com
    """
    if not url.startswith(("http://", "https://")):
        url = f"https://{url}"

    try:
        with click.progressbar(length=1, label="Scanning") as bar:
            response = requests.get(
                STACKPEEK_URL,
                params={"url": url},
                timeout=timeout,
            )
            response.raise_for_status()
            data = response.json()
            bar.update(1)

    except requests.HTTPError as e:
        click.secho(f"API error: {e}", fg="red", err=True)
        sys.exit(1)
    except requests.Timeout:
        click.secho(f"Request timed out after {timeout}s", fg="red", err=True)
        sys.exit(1)
    except requests.RequestException as e:
        click.secho(f"Request failed: {e}", fg="red", err=True)
        sys.exit(1)

    techs = [
        t for t in data.get("technologies", [])
        if t.get("confidence", 0) >= min_confidence
    ]
    data["technologies"] = techs

    if json_output:
        click.echo(json.dumps(data, indent=2))
        return

    click.secho(f"\nTechnologies on {data['url']}:\n", fg="cyan", bold=True)
    click.echo(f"  {'NAME':<22} {'CATEGORY':<20} {'VERSION':<10} CONFIDENCE")
    click.echo(f"  {'-' * 66}")

    for tech in techs:
        ver = tech.get("version", "-")
        conf = tech.get("confidence", 0)
        color = "green" if conf >= 80 else "yellow" if conf >= 50 else "white"
        click.echo(
            f"  {tech['name']:<22} {tech['category']:<20} {ver:<10} "
            + click.style(f"{conf}%", fg=color)
        )

    click.echo(f"\n  {len(techs)} technologies detected")
    if scan_time := data.get("scanTime"):
        click.echo(f"  Scan completed in {scan_time}ms")


if __name__ == "__main__":
    detect()

Usage from the command line:

# Basic scan (auto-adds https://)
$ stackpeek stripe.com

# JSON output for piping to jq
$ stackpeek vercel.com --json-output | jq '.technologies[].name'

# Only high-confidence detections
$ stackpeek github.com --min-confidence 80

# Custom timeout
$ stackpeek slow-site.com --timeout 60

Install the dependencies:

pip install click requests

Click automatically generates --help output from the docstring and option descriptions. The progress bar provides visual feedback during the API call. Color-coded confidence levels make it easy to distinguish strong detections from speculative ones at a glance. Package this as a standalone tool with pipx install or distribute it as a PyPI package for your team.
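
To ship the tool as the stackpeek command shown in the usage examples, declare a console entry point in pyproject.toml. A sketch, assuming the script lives in a module named stackpeek.py (the package name and version are placeholders):

```toml
[project]
name = "stackpeek-cli"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = ["click", "requests"]

[project.scripts]
# maps the `stackpeek` command to the detect() Click entry point
stackpeek = "stackpeek:detect"
```

After running pipx install . in the project directory, stackpeek stripe.com works from anywhere without activating a virtualenv.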

Batch Scanning with asyncio and Result Aggregation

For large-scale scanning jobs — monitoring thousands of competitor websites, enriching a CRM database, or auditing an enterprise domain portfolio — you need a robust batch scanner that handles errors gracefully, respects rate limits, and produces structured output. Here is a production-ready batch scanner:

import asyncio
import time
from dataclasses import dataclass, field

import aiohttp


STACKPEEK_URL = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/stackpeekApi/api/v1/detect"
)


@dataclass
class ScanResult:
    url: str
    success: bool
    tech_count: int = 0
    technologies: list = field(default_factory=list)
    scan_time_ms: int = 0
    error: str = ""


async def scan_with_retry(
    session: aiohttp.ClientSession,
    url: str,
    semaphore: asyncio.Semaphore,
    max_retries: int = 3,
) -> ScanResult:
    """Scan a single URL with retry logic and exponential backoff."""
    async with semaphore:
        for attempt in range(max_retries):
            try:
                async with session.get(
                    STACKPEEK_URL,
                    params={"url": url},
                    timeout=aiohttp.ClientTimeout(total=30),
                ) as response:
                    if response.status == 429:
                        wait = 2 ** attempt * 5
                        await asyncio.sleep(wait)
                        continue

                    response.raise_for_status()
                    data = await response.json()

                    return ScanResult(
                        url=url,
                        success=True,
                        tech_count=len(data.get("technologies", [])),
                        technologies=data.get("technologies", []),
                        scan_time_ms=data.get("scanTime", 0),
                    )

            except asyncio.TimeoutError:
                if attempt == max_retries - 1:
                    return ScanResult(url=url, success=False, error="Timed out")
            except Exception as e:
                if attempt == max_retries - 1:
                    return ScanResult(url=url, success=False, error=str(e))

            await asyncio.sleep(2 ** attempt)

    return ScanResult(url=url, success=False, error="Max retries exceeded")


async def batch_scan(
    urls: list[str],
    max_concurrency: int = 5,
) -> list[ScanResult]:
    """Scan a list of URLs with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async with aiohttp.ClientSession() as session:
        tasks = [
            scan_with_retry(session, url, semaphore)
            for url in urls
        ]
        return await asyncio.gather(*tasks)


def print_summary(results: list[ScanResult]):
    """Print a summary of batch scan results."""
    successful = [r for r in results if r.success]
    failed = [r for r in results if not r.success]

    print(f"\nScanned {len(results)} domains")
    print(f"  Successful: {len(successful)}")
    print(f"  Failed:     {len(failed)}")

    if successful:
        total_techs = sum(r.tech_count for r in successful)
        avg_techs = total_techs / len(successful)
        print(f"  Avg technologies per site: {avg_techs:.1f}")

    # Technology frequency analysis
    tech_counts: dict[str, int] = {}
    for r in successful:
        for t in r.technologies:
            name = t["name"]
            tech_counts[name] = tech_counts.get(name, 0) + 1

    if tech_counts:
        print("\n  Most common technologies:")
        for name, count in sorted(
            tech_counts.items(), key=lambda x: x[1], reverse=True
        )[:10]:
            pct = count / len(successful) * 100
            print(f"    {name:<25} {count:>3} sites ({pct:.0f}%)")


if __name__ == "__main__":
    urls = [
        "https://stripe.com",
        "https://linear.app",
        "https://vercel.com",
        "https://notion.so",
        "https://figma.com",
        "https://github.com",
        "https://shopify.com",
        "https://slack.com",
        "https://gitlab.com",
        "https://netlify.com",
    ]

    start = time.time()
    results = asyncio.run(batch_scan(urls, max_concurrency=5))
    elapsed = time.time() - start

    print(f"Batch scan completed in {elapsed:.1f}s")
    print_summary(results)

This scanner includes exponential backoff for rate limiting (status 429), configurable retry logic, and a summary report that shows technology frequency across all scanned domains. The dataclass makes results easy to serialize to JSON or CSV for downstream analysis. For very large jobs, pipe the output into pandas or write results to a database as they arrive.
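
Because ScanResult is a dataclass, dataclasses.asdict plus csv.DictWriter turns a result list into a spreadsheet-ready file. A sketch (the output filename and the choice to drop the nested technologies list are illustrative; the dataclass is re-declared here so the snippet is self-contained):

```python
import csv
from dataclasses import dataclass, asdict, field


@dataclass
class ScanResult:  # same shape as the batch scanner's result record
    url: str
    success: bool
    tech_count: int = 0
    technologies: list = field(default_factory=list)
    scan_time_ms: int = 0
    error: str = ""


def write_csv(results: list[ScanResult], path: str) -> None:
    """Flatten results to CSV; the nested technologies list is dropped."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["url", "success", "tech_count", "scan_time_ms", "error"]
        )
        writer.writeheader()
        for r in results:
            row = asdict(r)
            row.pop("technologies")  # keep the CSV flat
            writer.writerow(row)


write_csv(
    [ScanResult(url="https://stripe.com", success=True, tech_count=12)],
    "scan_results.csv",
)
```

For richer analysis, load the same rows into pandas with read_csv, or keep the technologies list and write JSON instead.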

Pricing: StackPeek vs Wappalyzer

If you have looked at Wappalyzer's API pricing for a Python project, here is how the two compare:

Feature                   StackPeek                   Wappalyzer
Free tier                 100 scans/day               50 scans/month
Paid plan (entry)         $9/month                    $250/month
Scans on entry plan       5,000/month                 50,000/month
Pro plan                  $29/month (25,000 scans)    $450/month (100,000 scans)
Cost per scan (entry)     $0.0018                     $0.005
API key required          No (free tier)              Yes
Technologies detected     120+                        1,400+
Response format           JSON (Python-native)        JSON

At the entry tier, StackPeek costs 28x less per month than Wappalyzer ($9 vs $250) and roughly 2.8x less per scan. For most Python projects — scripts, Flask APIs, Django apps, data pipelines, CLI tools — you need to detect the major frameworks, CMS platforms, and analytics tools, and StackPeek covers these at a fraction of the cost. If you need to identify niche JavaScript libraries or obscure WordPress plugins, Wappalyzer's broader fingerprint database may justify the premium. But for 90% of use cases, StackPeek at $9/month versus Wappalyzer at $250/month is the obvious choice.
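
The per-scan figures in the table follow directly from plan price divided by included scans. A quick check, using the prices listed above:

```python
# entry plan: (monthly price in USD, included scans per month)
entry_plans = {"StackPeek": (9.0, 5_000), "Wappalyzer": (250.0, 50_000)}

per_scan = {name: price / scans for name, (price, scans) in entry_plans.items()}
print(per_scan)               # {'StackPeek': 0.0018, 'Wappalyzer': 0.005}
print(round(250.0 / 9.0, 1))  # 27.8, the monthly plan price ratio
```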

Use Cases for Python Developers

The most common scenarios mirror the patterns in this guide: competitive intelligence pipelines built on batch scanning, lead enrichment jobs that tag CRM records with each prospect's stack, security scanners that flag outdated frameworks across a domain portfolio, and scrapers that adapt their strategy to how a target renders its pages.

Start detecting tech stacks from Python today

100 free scans/day. No API key required. JSON works natively with requests and aiohttp.

Try StackPeek Free →

Getting Started in 60 Seconds

The fastest way to test StackPeek from Python requires just one dependency — requests:

  1. Run pip install requests
  2. Copy the basic detection function into a Python file
  3. Call detect_tech_stack("https://stripe.com") and print the result
  4. See detected technologies in your terminal in under 5 seconds

From there, add aiohttp for async batch scanning, wrap the client in a Flask route for a caching proxy, build a Django management command for team-wide usage, or create a polished CLI with Click. The API is the same regardless of how you call it — a single GET request with a URL query parameter that returns standard JSON.

For production deployments, add a caching layer (Redis, dictionary with TTL, or Flask-Caching) to avoid redundant API calls for the same domain. Add retry logic with exponential backoff for transient network failures. Use asyncio.Semaphore to control concurrency in batch jobs. Every pattern in this guide is copy-paste ready and runs on Python 3.10 or later.
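
The batch scanner above has retries built in; for the synchronous examples, a generic decorator covers the same "retry with exponential backoff" advice. This is a sketch you could wrap around detect_tech_stack, not part of StackPeek; it is demonstrated here with a deliberately flaky local function rather than a live request:

```python
import time
from functools import wraps


def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a function with exponential backoff: base_delay, 2x, 4x, ..."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts, propagate the error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


calls = 0


@with_retries(max_attempts=3, base_delay=0.01)
def flaky():
    """Stand-in for a request that fails twice, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky()
print(result)  # ok
```

In production you would catch requests.RequestException rather than bare Exception, and log each retry so transient failures stay visible.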

Part of the Peek Suite

StackPeek is one tool in a growing suite of developer APIs. If you find it useful, check out the rest: