Technical

Detecting AI Traffic: A Technical Guide

Panxo Team
January 16, 2025
8 min read

Technical guide to identifying AI crawlers, bots, and browser traffic. Learn about User-Agent patterns, referrer analysis, and why advanced detection requires multi-signal approaches.

Understanding AI Traffic Detection

AI traffic comes in two distinct forms, each requiring different detection approaches:

  1. AI Crawlers/Bots: Used by AI companies to index content for training or search. These have identifiable User-Agent strings.
  2. AI Browser Traffic: Human users browsing through AI-powered browsers (ChatGPT Atlas, Perplexity Comet, etc.). Detection is significantly more complex.

Part 1: AI Crawler Detection (User-Agent Based)

AI companies operate web crawlers with documented User-Agent tokens. As long as a crawler identifies itself honestly, these are straightforward to detect:

OpenAI Crawlers

OpenAI operates several bots with distinct purposes:

// Shown abbreviated: real requests embed these tokens in a longer
// Mozilla-style string, and the version numbers change over time.

// GPTBot - Used for training data collection
User-Agent: GPTBot/1.0 (+https://openai.com/gptbot)

// ChatGPT-User - When ChatGPT browses on behalf of users
User-Agent: ChatGPT-User/1.0 (+https://openai.com/bot)

// OAI-SearchBot - For OpenAI search features
User-Agent: OAI-SearchBot/1.0 (+https://openai.com/searchbot)

Detection logic:

function isOpenAIBot(userAgent) {
  return /GPTBot|ChatGPT-User|OAI-SearchBot/i.test(userAgent);
}
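Because the three bots serve different purposes, it can be useful to know which one matched rather than only that one did. The following is a minimal sketch of that idea; the function name `classifyOpenAIBot` and the purpose labels are illustrative choices, not part of any OpenAI API.

```javascript
// Purposes as described in this article; the label strings are illustrative.
const OPENAI_BOT_PURPOSES = [
  { token: /GPTBot/i, name: 'GPTBot', purpose: 'training' },
  { token: /ChatGPT-User/i, name: 'ChatGPT-User', purpose: 'user-initiated' },
  { token: /OAI-SearchBot/i, name: 'OAI-SearchBot', purpose: 'search' },
];

// Returns { name, purpose } for the first matching token, or null.
function classifyOpenAIBot(userAgent) {
  // A missing header arrives as undefined; treat it as "no match".
  if (typeof userAgent !== 'string') return null;
  const hit = OPENAI_BOT_PURPOSES.find(({ token }) => token.test(userAgent));
  return hit ? { name: hit.name, purpose: hit.purpose } : null;
}
```

A site could use the returned purpose to block training traffic while still serving user-initiated fetches.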

Anthropic Crawlers

// ClaudeBot - Anthropic's web crawler
User-Agent: ClaudeBot/1.0 (+https://anthropic.com/claudebot)

// Claude-Web - Claude's browsing capability
User-Agent: Claude-Web/1.0

Perplexity Crawler

// PerplexityBot - For Perplexity's search index
User-Agent: PerplexityBot/1.0 (+https://perplexity.ai/bot)

Other AI Crawlers

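// Google-Extended (controls use of content for Gemini, formerly Bard, training)
// This is a robots.txt token only. Google crawls with its regular
// user agents, so "Google-Extended" does not appear in request headers.
User-agent: Google-Extended

// CCBot (Common Crawl, used by many AI companies)
User-Agent: CCBot/2.0

// Applebot-Extended (controls use of content for Apple Intelligence)
// Also a robots.txt token only; the crawling itself is done by Applebot.
User-agent: Applebot-Extended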

Complete Bot Detection Function

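function detectAICrawler(userAgent) {
  const patterns = {
    openai: /GPTBot|ChatGPT-User|OAI-SearchBot/i,
    anthropic: /ClaudeBot|Claude-Web/i,
    perplexity: /PerplexityBot/i,
    // Google-Extended and Applebot-Extended are robots.txt tokens,
    // so these two patterns are unlikely to match a request header.
    google: /Google-Extended/i,
    apple: /Applebot-Extended/i,
    commoncrawl: /CCBot/i,
  };

  // A missing header arrives as undefined; treat it as an empty string.
  const ua = typeof userAgent === 'string' ? userAgent : '';

  for (const [source, pattern] of Object.entries(patterns)) {
    if (pattern.test(ua)) {
      return {
        isAICrawler: true,
        source,
        // Nominal value: the header is self-reported and can be spoofed.
        confidence: 0.99
      };
    }
  }

  return { isAICrawler: false, source: null, confidence: 0 };
}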
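A User-Agent header is supplied by the client, so anything can claim to be GPTBot. One common way to raise confidence is to check the connecting IP address against ranges the crawler operator publishes. The sketch below assumes you have already obtained such a list yourself; `isVerifiedCrawler` and its `publishedRanges` parameter are hypothetical names, the code handles IPv4 only, and the addresses in the usage note are reserved documentation ranges, not real crawler ranges.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer, or null if malformed.
function ipv4ToInt(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr` (for example "192.0.2.0/24").
function inCidr(ip, cidr) {
  const [base, bitsText] = String(cidr).split('/');
  if (!/^\d{1,2}$/.test(bitsText || '')) return false;
  const bits = Number(bitsText);
  if (bits > 32) return false;
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  // Two addresses share a prefix when they fall in the same block of 2^(32 - bits).
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// The request must both claim a crawler token and come from a listed range.
function isVerifiedCrawler(userAgent, remoteIp, publishedRanges) {
  const claimsCrawler = /GPTBot|ClaudeBot|PerplexityBot/i.test(userAgent || '');
  return claimsCrawler && publishedRanges.some((cidr) => inCidr(remoteIp, cidr));
}
```

For example, `isVerifiedCrawler('GPTBot/1.0', '192.0.2.55', ['192.0.2.0/24'])` returns `true`, while the same User-Agent arriving from `198.51.100.7` returns `false`.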

Part 2: AI Browser Detection (The Hard Problem)

Important: AI Browsers Don't Announce Themselves

Unlike AI crawlers, AI-powered browsers like ChatGPT Atlas and Perplexity Comet do NOT use distinctive User-Agent strings. They typically appear as standard Chrome or Safari browsers, making User-Agent detection ineffective.

This is by design. AI browsers want to provide a seamless browsing experience, and websites often block or serve different content to non-standard User-Agents.

Why User-Agent Detection Fails for AI Browsers

  • Standard User-Agents: Atlas and Comet are both Chromium-based, so they present Chrome-style User-Agent strings
  • Privacy considerations: Identifying as an "AI browser" could lead to discrimination
  • Web compatibility: Non-standard User-Agents break many websites

Multi-Signal Detection Approach

Effective AI browser detection requires analyzing multiple signals simultaneously:

1. Referrer Analysis

When users click links from AI chat interfaces, the referrer header may indicate the source:

// Potential referrer patterns
Referer: https://chatgpt.com/...
Referer: https://chat.openai.com/...
Referer: https://perplexity.ai/...
Referer: https://claude.ai/...

Limitation: Direct URL entries and bookmarks send no referrer at all, and privacy settings or a restrictive Referrer-Policy on the linking site can strip it.
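The referrer check can be sketched as a small classifier. It parses the header with the standard `URL` class and compares the hostname exactly, so a page that merely mentions `chatgpt.com` in its path or query string does not match. The function name `classifyReferrer` and the returned labels are illustrative; the hostnames are the ones listed above.

```javascript
// Hostnames taken from the referrer patterns listed above; the labels are illustrative.
const AI_REFERRER_HOSTS = new Map([
  ['chatgpt.com', 'chatgpt'],
  ['chat.openai.com', 'chatgpt'],
  ['perplexity.ai', 'perplexity'],
  ['claude.ai', 'claude'],
]);

// Returns a source label for a known AI chat referrer, or null.
function classifyReferrer(referer) {
  if (!referer) return null; // no referrer: direct visit, bookmark, or stripped header
  let host;
  try {
    host = new URL(referer).hostname.toLowerCase();
  } catch {
    return null; // malformed header value
  }
  // Treat "www.perplexity.ai" the same as "perplexity.ai".
  host = host.replace(/^www\./, '');
  return AI_REFERRER_HOSTS.get(host) ?? null;
}
```

A `null` result only means the referrer gave no evidence; for the reasons listed above, it does not show that the visit came from somewhere else.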

2. Behavioral Analysis

Sessions driven through an AI browser may differ from ordinary sessions in ways that can be measured:

  • Navigation patterns (click paths, scroll behavior)
  • Session timing characteristics
  • Interaction with page elements
  • Mouse movement patterns

3. Context Signals

Various browser and network signals can indicate AI-assisted browsing:

  • Client hints and capabilities
  • JavaScript API availability
  • Timing characteristics
  • Network fingerprinting
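The simplest way to combine the three signal families above is a weighted score with a threshold. The sketch below illustrates only that general idea. It is not Panxo's method, which is not disclosed. The signal names, point values, and threshold are hypothetical placeholders, and each boolean is assumed to be computed elsewhere.

```javascript
// Hypothetical point values (out of 10): placeholders, not tuned or measured.
const SIGNAL_POINTS = {
  aiReferrer: 5,      // referrer analysis found a known AI chat host
  behaviorAnomaly: 3, // behavioral analysis flagged the session
  contextAnomaly: 2,  // browser or network context looked unusual
};

// Adds up the points for every signal that fired and compares to a threshold.
function scoreAIBrowserSignals(signals, threshold = 6) {
  let score = 0;
  const fired = [];
  for (const [name, points] of Object.entries(SIGNAL_POINTS)) {
    if (signals[name] === true) {
      score += points;
      fired.push(name);
    }
  }
  return { score, fired, likelyAIBrowser: score >= threshold };
}
```

With these placeholder values no single signal reaches the threshold on its own, which mirrors the point that each signal alone produces false positives and negatives.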

The Detection Challenge

Building reliable AI browser detection is a significant technical challenge:

  • No single reliable signal: Each method has false positives and negatives
  • Constantly evolving: AI browsers update frequently, changing detection vectors
  • Privacy vs Detection: More invasive techniques face legal/ethical issues
  • Spoofing risk: Bad actors can fake AI traffic signals

Accuracy Requirements

For monetization purposes, detection needs to be highly accurate. False positives mean serving wrong ads to regular users. False negatives mean missing premium CPMs on AI traffic. Both cost money.
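A short calculation shows why false positives matter most when AI browser traffic is a small share of all traffic. The function below is standard precision arithmetic (Bayes' rule). Every number in the usage note is a hypothetical input chosen for illustration, not a measurement of any product.

```javascript
// Share of "AI browser" verdicts that are correct, given:
//   sensitivity: fraction of real AI-browser sessions that get flagged
//   specificity: fraction of ordinary sessions correctly left unflagged
//   prevalence:  fraction of all sessions that really are AI-browser sessions
function precision(sensitivity, specificity, prevalence) {
  const truePositives = sensitivity * prevalence;
  const falsePositives = (1 - specificity) * (1 - prevalence);
  const flagged = truePositives + falsePositives;
  return flagged === 0 ? 0 : truePositives / flagged;
}
```

With hypothetical inputs `precision(0.96, 0.96, 0.02)`, the result is about 0.33: if only 2% of sessions are AI-browser sessions, roughly two out of three flagged sessions are regular users. The same detector at 50% prevalence, `precision(0.96, 0.96, 0.5)`, gives about 0.96.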

Panxo's Approach: Patent-Pending Detection

At Panxo, we've developed proprietary detection technology specifically designed for AI browser identification:

Multi-Layer Detection Engine

Patent Pending: US 63/930,757

Our proprietary technology combines multiple detection vectors with machine learning to achieve industry-leading accuracy:

  • 96%+ detection accuracy for ChatGPT Atlas traffic
  • <10ms latency - detection happens at the edge
  • Continuous learning - adapts as AI browsers evolve
  • Privacy-compliant - no personal data collection

Specific detection methods are proprietary and not disclosed.

What You Can Detect Yourself

For basic AI traffic awareness, you can implement crawler detection:

// Basic AI crawler detection (server-side)
// handleAICrawler and handleNormalRequest are your application's own handlers.
function handleRequest(request) {
  // The header may be absent, so fall back to an empty string
  const userAgent = request.headers['user-agent'] || '';

  // Detect known AI crawlers
  if (/GPTBot|ClaudeBot|PerplexityBot/i.test(userAgent)) {
    // This is an AI crawler - serve or block accordingly
    return handleAICrawler(request);
  }

  // For AI browser detection, you need Panxo
  // User-Agent analysis alone won't work
  return handleNormalRequest(request);
}

What Requires Panxo

  • ChatGPT Atlas detection: No User-Agent indicator, requires advanced signals
  • Perplexity Comet detection: Standard Chrome-style User-Agent, needs behavioral analysis
  • Intent scoring: Understanding user's purpose from AI context
  • Real-time monetization: Serving optimized ads in <50ms

robots.txt for AI Crawlers

While you can't detect AI browsers via User-Agent, you CAN tell AI crawlers what they may fetch. robots.txt is advisory, so this works only for crawlers that choose to honor it:

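# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# PerplexityBot builds a search index rather than a training set;
# remove this group if you want to stay in Perplexity results
User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allow OpenAI search indexing and user-initiated ChatGPT fetches
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /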
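If the list of crawlers changes often, the file can be generated from a small policy object instead of being edited by hand. This is a minimal sketch; `buildRobotsTxt` and the shape of its `policy` argument are hypothetical, and it only emits whole-site `Allow: /` or `Disallow: /` groups.

```javascript
// Builds robots.txt text from a map of crawler token -> 'allow' | 'disallow'.
function buildRobotsTxt(policy) {
  const groups = [];
  for (const [agent, rule] of Object.entries(policy)) {
    if (rule !== 'allow' && rule !== 'disallow') {
      throw new Error(`Unknown rule "${rule}" for ${agent}`);
    }
    const directive = rule === 'allow' ? 'Allow' : 'Disallow';
    groups.push(`User-agent: ${agent}\n${directive}: /`);
  }
  // Groups are separated by a blank line; the file ends with a newline.
  return groups.join('\n\n') + '\n';
}
```

Calling `buildRobotsTxt({ GPTBot: 'disallow', 'ChatGPT-User': 'allow' })` produces one `Disallow: /` group for GPTBot followed by one `Allow: /` group for ChatGPT-User.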

Summary: Detection Tiers

Traffic Type                 | Detection Method      | DIY Feasibility
AI Crawlers (GPTBot, etc.)   | User-Agent matching   | Easy
ChatGPT referral traffic     | Referrer analysis     | Partial
ChatGPT Atlas browser        | Multi-signal analysis | Requires Panxo
Perplexity Comet browser     | Multi-signal analysis | Requires Panxo

Get Accurate AI Detection

Stop guessing about your AI traffic. Panxo's patent-pending detection technology identifies AI browser visitors with 96%+ accuracy:

<script async 
        src="https://cdn.panxo.ai/o/{your-unique-hash}">
</script>

Panxo provides:

  • Accurate ChatGPT Atlas and AI browser detection
  • Real-time intent scoring
  • Dedicated AI traffic analytics
  • Premium monetization ($35-50 CPMs)

Start monetizing your AI traffic today

Join publishers earning premium CPMs from ChatGPT Atlas and other AI browsers.
