Robots.txt Explained: How to Control Search Engine Crawling
The robots.txt file sits at your site root and tells search engine crawlers which paths they may crawl and which to skip. A misconfigured robots.txt can accidentally shut crawlers out of your entire site and cost you your search visibility.
Basic Syntax
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml
User-agent: * means these rules apply to all crawlers. You can target specific ones (Googlebot, Bingbot) with specific rules.
Disallow: tells the crawler not to visit any URL whose path starts with the given value. An empty Disallow (or omitting it) means everything is allowed.
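If you want to see how these rules are read in practice, here is a minimal sketch using Python's standard-library urllib.robotparser (site_maps() needs Python 3.8 or newer). It parses the example file from this section, plus a second hypothetical file: the Bingbot group and the /search/ path are assumptions made up for illustration. Note that this parser does simple prefix matching, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The example file from the Basic Syntax section.
BASIC_RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml
"""

# Hypothetical variant: the Bingbot group and the /search/ path are
# illustrative assumptions, not rules taken from the article.
AGENT_RULES = """\
User-agent: Bingbot
Disallow: /search/

User-agent: *
Disallow: /admin/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


basic = load_rules(BASIC_RULES)
print(basic.can_fetch("Googlebot", "https://yoursite.com/admin/users"))  # False
print(basic.can_fetch("Googlebot", "https://yoursite.com/blog/post"))    # True
print(basic.site_maps())  # ['https://yoursite.com/sitemap.xml']

by_agent = load_rules(AGENT_RULES)
# Bingbot matches its own group; other crawlers fall back to the * group.
print(by_agent.can_fetch("Bingbot", "https://yoursite.com/search/q"))    # False
print(by_agent.can_fetch("Googlebot", "https://yoursite.com/search/q"))  # True
```

The last two lines show why agent-specific groups need care: a crawler that matches a named group follows that group's rules, and only crawlers without a match use the * rules.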
Common Mistakes
- Disallow: / blocks the entire site. Only do this intentionally during development.
- Blocking CSS/JS: If you block Googlebot from CSS or JS, Google can't render your page properly for ranking.
- Using robots.txt for security: It's a polite request, not a firewall. Anyone can ignore it. Use authentication, not robots.txt, to protect sensitive data.
robots.txt is not a security mechanism. It tells well-behaved bots what NOT to crawl. Malicious bots ignore it. Never use it to hide sensitive data.
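The first two mistakes above are easy to catch automatically. The following is a rough sketch of a checker, not a full robots.txt validator: the function name find_common_mistakes and the ASSET_HINTS list are hypothetical, the asset check is a simple substring heuristic that can produce false positives (for example on a /json/ path), and it ignores which User-agent group a rule belongs to.

```python
# Substrings that suggest a rule touches stylesheets or scripts.
# This list is an assumption for illustration; adjust it to your site.
ASSET_HINTS = ("/css", "/js", ".css", ".js")


def find_common_mistakes(robots_txt: str) -> list[str]:
    """Return warnings for a site-wide block and for blocked CSS/JS paths."""
    warnings = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() != "disallow":
            continue
        path = value.strip()
        if path == "/":
            warnings.append("Disallow: / blocks the entire site")
        elif any(hint in path.lower() for hint in ASSET_HINTS):
            warnings.append(f"Disallow: {path} may block rendering assets")
    return warnings


print(find_common_mistakes("User-agent: *\nDisallow: /\n"))
print(find_common_mistakes("User-agent: Googlebot\nDisallow: /assets/css/\n"))
```

Running a check like this before deploying is one way to avoid shipping a development-only Disallow: / to production.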
Best Practices
- Always include a Sitemap: directive pointing to your sitemap.xml
- Disallow admin, login, and dashboard paths
- Disallow duplicate content paths (print versions, sorted listings)
- Never block CSS, JavaScript, or images that are needed for rendering
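The practices in this list can be turned into a small generator. This is a minimal sketch under stated assumptions: build_robots_txt is a hypothetical helper, it only writes a single User-agent: * group, and the /login/ and /dashboard/ paths in the usage line are placeholders you would replace with your own.

```python
def build_robots_txt(disallow_paths: list[str], sitemap_url: str) -> str:
    """Build a robots.txt with one wildcard group and a Sitemap line."""
    lines = ["User-agent: *"]
    # An empty Disallow means "allow everything", so use it when no paths are given.
    lines += [f"Disallow: {path}" for path in disallow_paths] or ["Disallow:"]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Placeholder paths and the article's example sitemap URL.
print(build_robots_txt(
    ["/admin/", "/login/", "/dashboard/"],
    "https://yoursite.com/sitemap.xml",
))
```

The helper deliberately has no option for blocking CSS, JavaScript, or image paths, which keeps the output in line with the last item in the list.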
Bottom Line
Every site needs a robots.txt. Keep it simple: allow everything by default, block admin/private paths, include your sitemap URL. Use our generator to create it correctly.