Question 1

What happens if I have no robots.txt at all?

Accepted Answer

If there is no robots.txt (the request returns a 404), crawlers assume they can crawl everything that is publicly linked. For most sites this is the right default. You only need a robots.txt if you want to block specific paths from crawling, request throttling with Crawl-delay (a non-standard directive that some crawlers, including Googlebot, ignore), or point crawlers at your sitemap.
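
As a rough illustration of those three uses, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and asks what a well-behaved crawler would conclude. The domain example.com, the /private/ path, and the crawler name ExampleBot are placeholders chosen for this example, and site_maps() assumes Python 3.8 or later. An empty rule set stands in for the "no restrictions" case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one blocked path, a crawl delay, and a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_robots(ROBOTS_TXT)
print(rules.can_fetch("ExampleBot", "https://example.com/private/report.html"))  # False
print(rules.can_fetch("ExampleBot", "https://example.com/blog/post.html"))       # True
print(rules.crawl_delay("ExampleBot"))                                           # 10
print(rules.site_maps())

# With no rules at all, nothing is restricted.
empty_rules = parse_robots("")
print(empty_rules.can_fetch("ExampleBot", "https://example.com/private/report.html"))  # True
```

This only shows how one parser interprets the file. Individual crawlers may handle Crawl-delay differently or ignore it.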

Question 2

Does disallow remove pages from Google's index?

Accepted Answer

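No. Disallow only blocks crawling, not indexing. Pages already in the index stay there, and a blocked URL can still be indexed if other pages link to it. Adding a noindex meta tag to a disallowed page does not help, because Google cannot see the tag on a page it is not allowed to crawl. To remove a page, leave it crawlable and add a noindex meta tag, or use the Removals tool in Search Console, which hides a URL temporarily.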

Question 3

Can I block AI training crawlers like GPTBot?

Accepted Answer

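Yes. Add a separate User-agent: GPTBot section with Disallow: / to block OpenAI from crawling your content for training. ClaudeBot, Google-Extended (the token that controls use of your content for Google's Gemini models, formerly Bard), Bytespider, and other AI crawlers have their own user-agent names, and you can target each individually. Compliance is voluntary, so this only stops crawlers that honor robots.txt.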
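
A minimal sketch of such a file, checked with Python's standard urllib.robotparser, might look like the following. The example.com URL is a placeholder, and the set of crawlers blocked here is just an illustration, not a complete list.

```python
from urllib.robotparser import RobotFileParser

# Block two AI training crawlers by name; leave everything else open.
AI_BLOCK_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

ai_rules = RobotFileParser()
ai_rules.parse(AI_BLOCK_ROBOTS_TXT.splitlines())

page = "https://example.com/articles/hello.html"  # placeholder URL
for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    verdict = "allowed" if ai_rules.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {verdict}")
```

Each crawler gets its own group, so a rule for one name never affects another. Crawlers that match no named group fall back to the User-agent: * group.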

Question 4

Is the file case sensitive?

Accepted Answer

Partly. Directive names (User-agent, Allow, Disallow) and user-agent values are matched case-insensitively in practice, but most operators capitalize the directive names for readability. Path patterns are case-sensitive: /About is different from /about. The file name itself must be lowercase: robots.txt.
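
The difference can be seen with Python's standard urllib.robotparser, used here as one example of a parser. Other crawlers may not behave identically. The crawler name ExampleBot and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Directive names and the user-agent value are deliberately written in odd case.
CASE_ROBOTS_TXT = """\
user-agent: examplebot
DISALLOW: /About
"""

case_rules = RobotFileParser()
case_rules.parse(CASE_ROBOTS_TXT.splitlines())

# The user-agent value matches regardless of case.
print(case_rules.can_fetch("ExampleBot", "https://example.com/About"))  # False
print(case_rules.can_fetch("EXAMPLEBOT", "https://example.com/About"))  # False

# The path does not: /about is a different URL from /About.
print(case_rules.can_fetch("ExampleBot", "https://example.com/about"))  # True
```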

Question 5

Should I list my sitemap here?

Accepted Answer

Yes. List it explicitly with the Sitemap: directive, using the full absolute URL. This matters most when the sitemap is not at the conventional /sitemap.xml location. Multiple Sitemap: lines are allowed for sites with multiple sitemap files.
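
If you generate the file from code, a small helper along these lines could emit one Sitemap: line per file and reject relative paths. The function name sitemap_lines and the example.com URLs are made up for this sketch.

```python
from urllib.parse import urlparse


def sitemap_lines(urls):
    """Return one 'Sitemap:' line per URL, rejecting non-absolute URLs."""
    lines = []
    for url in urls:
        parts = urlparse(url)
        # A Sitemap entry needs a scheme and a host, not just a path.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Sitemap URL must be absolute: {url!r}")
        lines.append(f"Sitemap: {url}")
    return lines


print("\n".join(sitemap_lines([
    "https://example.com/sitemap-pages.xml",
    "https://example.com/sitemap-posts.xml",
])))
```

Sitemap: lines are independent of any User-agent group, so they can go anywhere in the file. Placing them at the end is a common convention.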
