WebDevCalc

Robots.txt Explained: How to Control Search Engine Crawling

The robots.txt file sits at your site root and tells search engine crawlers which paths they may crawl and which to skip. A misconfigured robots.txt can accidentally block crawlers from your entire site, which can drop your pages out of search results.

Basic Syntax

User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml

User-agent: * means these rules apply to all crawlers. To target a specific crawler (Googlebot, Bingbot), start a separate group with that crawler's name on the User-agent line and list its rules underneath.
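
As a rough illustration of per-crawler groups, the sketch below uses Python's standard urllib.robotparser module to evaluate a file with one group for Googlebot and one for everyone else. The /drafts/ path and the yoursite.com URLs are made-up placeholders, and this parser only approximates how real search engines read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for Googlebot, one fallback group for all others.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


# Print the decision for each crawler and path combination.
for agent in ("Googlebot", "Bingbot"):
    for path in ("/drafts/post", "/admin/"):
        print(agent, path, may_crawl(agent, path))
```

Under this parser, Googlebot follows only its own group, so /admin/ stays crawlable for it unless that rule is repeated there. Bingbot has no group of its own and falls back to the * group.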

Disallow: tells the crawler not to visit any URL whose path starts with the given value. An empty Disallow (or omitting it) means everything is allowed.
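
To see how Disallow values change the outcome, here is a minimal sketch that runs three small rule sets through Python's urllib.robotparser. The helper name allowed and the test paths are assumptions for illustration, and other crawlers may handle edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Three illustrative rule sets for the same wildcard group.
BLOCK_SOME = "User-agent: *\nDisallow: /admin/\nDisallow: /private/\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash blocks every path


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse `robots_txt` and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://yoursite.com" + path)


print(allowed(BLOCK_SOME, "/admin/users"))  # False: path starts with /admin/
print(allowed(BLOCK_SOME, "/blog/post"))    # True: no rule matches
print(allowed(ALLOW_ALL, "/admin/users"))   # True
print(allowed(BLOCK_ALL, "/blog/post"))     # False
```

The BLOCK_ALL case is the one-character difference that shuts crawlers out of a whole site, so it is worth checking for before deploying a new file.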

Common Mistakes

robots.txt is not a security mechanism. It tells well-behaved bots what NOT to crawl, and malicious bots ignore it. The file itself is publicly readable, so listing a path in it advertises that the path exists. Never use it to hide sensitive data.

Best Practices

  1. Always include a Sitemap: directive pointing to your sitemap.xml
  2. Disallow admin, login, and dashboard paths
  3. Disallow duplicate content paths (print versions, sorted listings)
  4. Never block CSS, JavaScript, or images that are needed for rendering
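
The checklist above can be turned into a quick pre-deploy check. The following is a sketch only: audit_robots, the sample paths, and the choice of Googlebot as the test crawler are all assumptions, and it relies on Python's urllib.robotparser (site_maps() needs Python 3.8 or later) rather than on any search engine's own parser.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt, must_crawl, must_block,
                 base="https://yoursite.com", agent="Googlebot"):
    """Return a list of warnings for a robots.txt string.

    must_crawl: paths needed for rendering (CSS, JS, images).
    must_block: paths that should stay out of the crawl (admin, login).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    warnings = []
    # Practice 1: a Sitemap: directive should be present.
    if not parser.site_maps():
        warnings.append("no Sitemap: directive found")
    # Practice 4: rendering assets must remain crawlable.
    for path in must_crawl:
        if not parser.can_fetch(agent, base + path):
            warnings.append(f"blocked but needed for rendering: {path}")
    # Practices 2 and 3: admin and duplicate-content paths should be blocked.
    for path in must_block:
        if parser.can_fetch(agent, base + path):
            warnings.append(f"crawlable but should be blocked: {path}")
    return warnings


# Hypothetical file that blocks assets and forgets the sitemap.
BAD = "User-agent: *\nDisallow: /assets/\n"
for warning in audit_robots(BAD, ["/assets/site.css"], ["/admin/"]):
    print(warning)
```

An empty list means the file passed these particular checks; it does not prove the file is correct for every crawler.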

Bottom Line

Every site needs a robots.txt. Keep it simple: allow everything by default, block admin/private paths, include your sitemap URL. Use our generator to create it correctly.