How Websites Can Detect Bots and Automation

From Resist Together Wiki

How Google Detects Abuse

🧠 Behavioral & Content-Based Signals

  • Timing patterns across users: Do multiple accounts submit at the same timestamp or in regular intervals (e.g. every 10 seconds)?
  • Clickstream analysis: Real users tend to click around, scroll, hover, switch window focus, etc. Bots don't simulate this well.
  • Device & OS fingerprints: Real users have unique browser fingerprints (screen size, OS, fonts, GPU, etc). Bots often reuse the same ones.
  • Review sentiment & emotional tone: Abusive reviews may lack emotional nuance or show suspicious sentiment spikes (overly positive or negative).
  • Review diversity: Genuine users leave different types of reviews (long, short, neutral), while fake reviews often follow a pattern.
  • Interaction diversity: Real users don't just leave reviews: they upload photos, click on maps, check hours, etc. Bots may skip this.
  • Grammar/style matching: Matching writing style across different accounts can indicate a single author (stylometry).
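
The interval-timing signal above can be sketched in a few lines: if the gaps between an account's submissions are suspiciously uniform, a human is unlikely to be behind them. This is a minimal illustration, not Google's actual method; the function names and the 0.05 threshold are invented for the example.

```python
from statistics import mean, pstdev

def regular_interval_score(timestamps):
    """Coefficient of variation of inter-arrival times.

    Values near zero mean near-perfectly regular submissions,
    which human users almost never produce.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough data to judge
    return pstdev(gaps) / mean(gaps)

def looks_automated(timestamps, threshold=0.05):
    # Threshold is illustrative; a real system would score, not hard-flag.
    score = regular_interval_score(timestamps)
    return score is not None and score < threshold
```

A stream submitting exactly every 10 seconds scores 0.0 and is flagged; a human's irregular gaps produce a much larger ratio.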

๐ŸŒ Network & Infrastructure Signals[edit | edit source]

  • VPN/proxy detection: Known VPN endpoints or Tor exit nodes are flagged.
  • Residential vs datacenter IPs: Bots often come from cloud IPs (AWS, GCP, Hetzner).
  • IP reputation databases: Google tracks previously abusive IPs or IPs linked to botnets.
  • ASN (Autonomous System Number) data: Certain networks are more likely to host bots or VPN exit nodes.
  • DNS request patterns: Mass automated bots may generate unnatural DNS behavior.
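
The residential-vs-datacenter distinction above amounts to checking a client IP against known cloud prefixes. A minimal sketch, assuming a hand-maintained prefix list; a production system would instead pull the providers' published ranges (e.g. AWS's ip-ranges.json) and keep them updated:

```python
import ipaddress

# Example prefixes only, chosen for illustration; real lists are
# large and change frequently.
DATACENTER_RANGES = [
    ipaddress.ip_network("3.0.0.0/8"),      # AWS (example prefix)
    ipaddress.ip_network("34.64.0.0/10"),   # GCP (example prefix)
    ipaddress.ip_network("65.108.0.0/15"),  # Hetzner (example prefix)
]

def is_datacenter_ip(ip: str) -> bool:
    """True if the address falls inside any known datacenter prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)
```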

👤 Account & Identity Signals

  • Email/phone verification fraud: Temporary/disposable emails or SMS services are flagged.
  • Cross-account similarities: Similar usernames, passwords (if leaked), or recovery emails across fake accounts.
  • Linked data anomalies: Using the same device, cookies, or recovery info across many accounts.
  • Lack of profile enrichment: No profile photo, no search history, no YouTube activity, no app installs = suspicious.
  • Account velocity: How fast the account goes from creation to activity. Real users usually ramp up slowly.
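
The account-velocity signal above is simple to express: measure the time from account creation to first activity. A minimal sketch, with an invented one-day threshold (a real system would score this continuously and combine it with other signals):

```python
from datetime import datetime, timedelta

def velocity_flag(created_at: datetime, first_review_at: datetime,
                  min_age: timedelta = timedelta(days=1)) -> bool:
    """Flag accounts that start reviewing almost immediately
    after creation; genuine users usually ramp up slowly."""
    return first_review_at - created_at < min_age
```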

๐Ÿ› ๏ธ Automation Detection[edit | edit source]

  • JavaScript behavior hooks: Google runs hidden JS challenges to test for bot behaviors (e.g., navigator.webdriver, hidden canvases).
  • Sensor data: Real phones emit gyroscope, accelerometer, and orientation data. Bots/VMs lack these.
  • CPU/GPU fingerprinting: Google can test WebGL performance to spot emulators or VMs.
  • Hidden honeypot fields: Bots fill out form fields invisible to humans (CSS-hidden), which real users ignore.
  • TLS/SSL fingerprinting: The way a bot negotiates HTTPS (cipher suite order, JA3 fingerprint) can be a giveaway.
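
The honeypot-field trick above needs only a server-side check: the form includes a field hidden from humans with CSS (e.g. display:none), so only a bot that fills every field will populate it. A minimal sketch; the field name "website" is a hypothetical choice for this example:

```python
def honeypot_triggered(form_data: dict) -> bool:
    """True if the CSS-hidden decoy field was filled in.

    Humans never see the field and leave it empty; naive bots
    that auto-fill every input will populate it.
    """
    # "website" is a hypothetical honeypot field name for this sketch.
    return bool(form_data.get("website", "").strip())
```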

📊 Anomaly & Graph-Based Detection

  • Graph analysis of account behavior: Google may analyze connections between users, businesses, IPs, and devices.
  • Clustering analysis: Groups of accounts with similar patterns can be flagged even if individually subtle.
  • Temporal anomaly detection: Google tracks seasonal patterns and flags reviews outside expected rhythms (e.g. 50 reviews on a gas station at 3AM).
  • Geo-spatial correlation: Are people reviewing a Thai restaurant in Bangkok and a New York pizzeria within 10 minutes?
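
The geo-spatial check above can be framed as an "impossible travel" test: compute the great-circle distance between two reviews and the travel speed it implies. A minimal sketch; the 900 km/h (roughly airliner-speed) threshold is invented for illustration:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def impossible_travel(review_a, review_b, max_kmh=900):
    """Flag two reviews whose implied travel speed exceeds max_kmh.

    Each review is a (lat, lon, unix_seconds) tuple.
    """
    (lat1, lon1, t1), (lat2, lon2, t2) = review_a, review_b
    dist = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs(t2 - t1) / 3600
    if hours == 0:
        return dist > 1  # simultaneous reviews from distinct places
    return dist / hours > max_kmh
```

Reviewing a Bangkok restaurant and a New York pizzeria ten minutes apart implies a speed of tens of thousands of km/h and gets flagged.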

🧬 Advanced Techniques

  • ML models trained on past abuse: Google likely trains machine learning classifiers on labeled abusive vs. normal behavior.
  • Honeypot listings: Fake businesses or places added to detect bots or spammers (if someone reviews it, they're flagged).
  • Decoy reviews: Certain listings might contain hidden markers to detect LLM-generated content or copy-paste patterns.
  • Noise injection / adversarial review tests: Google might inject minor changes to see how bots react (e.g. reCAPTCHA triggers, field reshuffling).
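
To make the ML-classifier idea concrete: at inference time such a model reduces to weighting behavioral features and squashing the sum into a probability. This toy logistic-regression sketch uses hand-set weights purely for illustration; a real system would learn them from labelled abusive vs. normal accounts:

```python
from math import exp

# Illustrative hand-set weights, not learned values.
WEIGHTS = {
    "is_datacenter_ip": 2.0,
    "account_age_days": -0.1,   # older accounts are less suspicious
    "reviews_last_hour": 0.8,
    "has_profile_photo": -1.5,
}
BIAS = -1.0

def abuse_probability(features: dict) -> float:
    """Logistic-regression-style abuse score in [0, 1]."""
    z = BIAS + sum(WEIGHTS[k] * float(v) for k, v in features.items())
    return 1 / (1 + exp(-z))
```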

🧩 Optional/Advanced Detection Avenues

  • Browser entropy testing: Measuring performance or timing inconsistencies that betray automation.
  • Side-channel detection: Power usage, timing attacks, or keyboard latency patterns (for high-security use cases).
  • Captcha behavior metrics: Not just if you solve a CAPTCHA, but how you solve it (mouse movement during drag, solve time, etc.).
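
One concrete instance of the CAPTCHA behavior metric above: compare the straight-line distance of a mouse drag to the total path length. Human traces wobble; naive automation moves in a perfectly straight line. A minimal sketch with an invented threshold:

```python
from math import hypot

def path_linearity(points):
    """Ratio of straight-line distance to total path length
    for a mouse trace; 1.0 means a perfectly straight drag."""
    if len(points) < 2:
        return None
    total = sum(hypot(x2 - x1, y2 - y1)
                for (x1, y1), (x2, y2) in zip(points, points[1:]))
    if total == 0:
        return None
    (x0, y0), (xn, yn) = points[0], points[-1]
    return hypot(xn - x0, yn - y0) / total

def drag_looks_scripted(points, threshold=0.999):
    # Threshold is illustrative; real metrics also use timing and velocity.
    ratio = path_linearity(points)
    return ratio is not None and ratio > threshold
```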

Teaching Point

Google's abuse detection is like a massive puzzle, combining user behavior, device fingerprinting, ML models, and traffic analysis. It's not just "don't use the same text": they monitor everything from how your mouse moves to what network you're on, and whether your review matches real-world behavior patterns.