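The robots exclusion protocol (robots.txt) tells search engine crawlers which paths on your site they may and may not crawl. It controls crawling, not indexing: a blocked URL can still show up in search results if other pages link to it. The file lives at your domain root (e.g., example.com/robots.txt) and is one of the first files bots check before crawling.
The main directives are: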
User-agent: specifies which crawler the rules that follow apply to (* = all)
Disallow: tells matching crawlers not to fetch paths that begin with the given prefix (compliance is voluntary, so this is not access control)
Allow: permits access to specific paths, typically inside a disallowed directory (under RFC 9309 the longest matching rule wins, and Allow wins a tie)
Sitemap: gives the full URL of your XML sitemap (it is independent of any User-agent group)
Crawl-delay: sets minimum seconds between requests (non-standard and not honored by all bots; Googlebot, for example, ignores it)
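
To see how these directives combine, here is a minimal sketch that feeds a small sample file to Python's standard urllib.robotparser module and asks what a crawler may fetch. The bot name ExampleBot, the /private/ paths, and the sitemap URL are made up for illustration. Note that Python's parser applies the first matching rule rather than the longest one, so the Allow line is placed before the Disallow line; that ordering gives the same answer under either matching strategy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths and URLs are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-report.html
Disallow: /private/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Disallow: everything under /private/ is off limits...
print(rules.can_fetch("ExampleBot", "https://example.com/private/data.html"))           # False
# ...except the one page carved out by the Allow rule.
print(rules.can_fetch("ExampleBot", "https://example.com/private/public-report.html"))  # True
# Paths that match no rule are allowed by default.
print(rules.can_fetch("ExampleBot", "https://example.com/blog/"))                       # True

# Crawl-delay is exposed as a hint; the crawler decides whether to honor it.
print(rules.crawl_delay("ExampleBot"))  # 5
# Sitemap entries are collected separately from the rule groups (Python 3.8+).
print(rules.site_maps())                # ['https://example.com/sitemap.xml']
```

A real crawler would usually call set_url() and read() to download the live file instead of parsing a string; the string form is used here only to keep the example self-contained.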