
How to Create a Robots.txt File (Complete Guide)

Learn how to create, configure, and test your robots.txt file. This complete guide covers syntax, common directives, best practices, and mistakes to avoid for better SEO.

By ToolScout Team

The robots.txt file is one of the most important—yet often overlooked—files on your website. This simple text file tells search engine crawlers which pages they can and cannot access. Get it wrong, and you could accidentally block search engines from indexing your entire site.

In this complete guide, we will walk you through everything you need to know about robots.txt: what it is, how to create one, common directives and patterns, best practices, and the tools that make configuration easy.

What Is a Robots.txt File?

The robots.txt file is a plain text file that sits at the root of your website (e.g., https://example.com/robots.txt). It follows the Robots Exclusion Protocol, a standard that tells web crawlers which parts of your site they should not visit.

Key Points About Robots.txt

  • Location: Must be at the root of your domain
  • Format: Plain text file
  • Purpose: Guides crawler behavior
  • Access: Publicly visible to anyone

Important Limitations

Before we go further, understand what robots.txt cannot do:

  1. It is not a security measure: Robots.txt is advisory. Malicious bots may ignore it.
  2. It does not hide content: Anyone can view your robots.txt and see what you are blocking.
  3. It does not prevent indexing: If other sites link to blocked pages, they may still appear in search results (without content).

For truly private content, use proper authentication or password protection.

Why Robots.txt Matters for SEO

Despite its simplicity, robots.txt plays a crucial role in SEO:

Crawl Budget Optimization

Search engines have limited resources to crawl your site. Robots.txt helps you direct crawlers to your most important pages by blocking less important ones.

Prevent Duplicate Content

Block pages that create duplicate content issues, like print versions, filtered views, or parameter-heavy URLs.

Protect Sensitive Areas

Keep crawlers away from admin areas, staging environments, or internal tools (while remembering this is not actual security).

Control Server Load

Prevent crawlers from overwhelming your server by accessing resource-intensive pages.

Robots.txt Syntax

The robots.txt file uses a simple syntax. Let us break it down.

Basic Structure

User-agent: [crawler name]
Disallow: [path to block]
Allow: [path to allow]

User-Agent Directive

The User-agent specifies which crawler the rules apply to.

# All crawlers
User-agent: *

# Just Google
User-agent: Googlebot

# Just Bing
User-agent: Bingbot

Disallow Directive

The Disallow directive blocks access to specified paths.

# Block a specific page
Disallow: /private-page.html

# Block a directory
Disallow: /admin/

# Block all pages
Disallow: /

Allow Directive

The Allow directive permits access to paths that would otherwise be blocked. Google and Bing both support it; when Allow and Disallow rules conflict, the most specific (longest) matching rule wins.

# Block the directory but allow one file
User-agent: *
Disallow: /private/
Allow: /private/public-file.html
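You can sanity-check this precedence locally with Python's standard-library urllib.robotparser. One caveat: Python's parser applies rules in file order rather than using Google's longest-match rule, so the illustrative rules below list the Allow line first:

```python
from urllib import robotparser

# Illustrative rules; Allow is listed first because urllib.robotparser
# applies rules in file order (Google instead picks the longest match)
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-file.html
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("TestBot", "https://example.com/private/secret.html"))       # False
print(rp.can_fetch("TestBot", "https://example.com/private/public-file.html"))  # True
print(rp.can_fetch("TestBot", "https://example.com/"))                          # True
```

The user-agent string ("TestBot") is arbitrary here; the generic `User-agent: *` group applies to it.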

Sitemap Directive

The Sitemap directive tells crawlers where to find your sitemap.

Sitemap: https://example.com/sitemap.xml

Crawl-Delay Directive

The Crawl-delay directive asks crawlers to wait between requests. Note: Google ignores this directive.

User-agent: *
Crawl-delay: 10
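Both the Sitemap and Crawl-delay values can be read back programmatically. A sketch using Python's standard-library parser (site_maps() requires Python 3.8+; the URLs are placeholders):

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.crawl_delay("TestBot"))  # 10
print(rp.site_maps())             # ['https://example.com/sitemap.xml']
```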

Common Robots.txt Patterns

Let us look at real-world patterns for different scenarios.

Block All Crawlers

Use this for development or staging sites:

User-agent: *
Disallow: /

Warning: This will remove your site from search results. Use carefully.

Allow All Crawlers

The most permissive robots.txt:

User-agent: *
Disallow:

Or simply have an empty robots.txt file.

Block Specific Directories

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Disallow: /cgi-bin/

Block Specific File Types

User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /*.xls$

Block Query Parameters

User-agent: *
Disallow: /*?*

Caution: this blocks every URL with a query string, including pages you may want crawled (such as legitimate paginated or filtered views). Use it only when parameters never produce unique content.

Block Specific Crawlers

# Block bad bots
User-agent: BadBot
Disallow: /

# Block specific scrapers
User-agent: AhrefsBot
Disallow: /

WordPress Robots.txt

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt

Sitemap: https://example.com/sitemap_index.xml

E-commerce Robots.txt

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /search/
Disallow: /*?s=*
Disallow: /*?orderby=*
Disallow: /*?filter_*

Sitemap: https://example.com/sitemap.xml

Complete Robots.txt Example

Here is a comprehensive example for a typical website:

# Robots.txt for example.com
# Last updated: April 2026

# Default rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /search/
Disallow: /*?*print=*
Disallow: /*?*preview=*

# Allow Google to access CSS, JS, and images for rendering.
# Note: Googlebot follows only its most specific matching group,
# so the general disallows are repeated here.
User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /search/
Allow: /*.css$
Allow: /*.js$
Allow: /*.png$
Allow: /*.jpg$

# Slow down Bing (it likewise follows only its most specific group,
# so repeat any disallows it should obey)
User-agent: Bingbot
Crawl-delay: 5
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /search/

# Block known bad bots
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml

Creating Your Robots.txt File

Method 1: Manual Creation

  1. Open a text editor (Notepad, VS Code, etc.)
  2. Add your directives
  3. Save as robots.txt (not robots.txt.txt)
  4. Upload to your website root
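The four steps above can be scripted. A minimal sketch that writes a starter file locally (the directives and sitemap URL are placeholders to replace with your own):

```python
# Write a minimal starter robots.txt; upload the result to your web root
directives = [
    "User-agent: *",
    "Disallow: /admin/",
    "",
    "Sitemap: https://example.com/sitemap.xml",  # placeholder: use your real sitemap URL
]

with open("robots.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write("\n".join(directives) + "\n")

print(open("robots.txt", encoding="utf-8").read())
```

Saving through a script also sidesteps the classic robots.txt.txt double-extension mistake.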

Method 2: Using RobotsTxtGen

RobotsTxtGen makes creating robots.txt files easy, even for beginners.

Why We Recommend RobotsTxtGen

  • Visual interface: No need to memorize syntax
  • Common patterns: Pre-built templates for common scenarios
  • Validation: Catches syntax errors before deployment
  • Best practices: Suggests improvements automatically
  • Export ready: Download the finished file

How to Use RobotsTxtGen

  1. Visit RobotsTxtGen
  2. Select your website type (blog, e-commerce, etc.)
  3. Choose directories to block
  4. Add your sitemap URL
  5. Download the generated file
  6. Upload to your server

Method 3: CMS-Specific Solutions

WordPress

  • Yoast SEO: Provides robots.txt editor
  • Rank Math: Built-in robots.txt management
  • Manual: Upload to public_html/ or WordPress root

Shopify

Edit by creating or updating the robots.txt.liquid template in your theme code editor (Online Store > Themes > Edit code).

Wix

Edit through SEO Settings > Advanced > robots.txt Editor

Testing Your Robots.txt

Before deploying changes, always test your robots.txt file.

Google Search Console

  1. Go to Google Search Console
  2. Open Settings > robots.txt report to see the version Google last fetched and any parse errors
  3. Use the URL Inspection tool to check whether a specific URL is blocked

Test Commands

Check whether a specific URL is blocked by fetching your live file and tracing which rule matches:

# View your live robots.txt
curl https://example.com/robots.txt

# Example URL to trace against the rules
https://example.com/admin/

Online Validators

Several online tools can validate your robots.txt:

  • Google Rich Results Test
  • Merkle Robots.txt Tester
  • RobotsTxtGen validator

Common Testing Scenarios

Test these scenarios before deploying:

URL Type          Expected Result
Homepage          Allowed
Blog posts        Allowed
Admin pages       Blocked
Search results    Blocked
Product pages     Allowed
Cart/checkout     Blocked
CSS/JS files      Allowed
Sitemap           Allowed
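A checklist like this can be automated before each deployment. A sketch using Python's standard-library parser (the rules and URLs are illustrative, and only plain prefix paths are used because urllib.robotparser does not understand * or $ wildcards):

```python
from urllib import robotparser

# Illustrative rules using plain prefix paths only
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /cart/
"""

# URL -> expected "allowed?" result, mirroring the checklist above
EXPECTATIONS = {
    "https://example.com/": True,                   # homepage: allowed
    "https://example.com/blog/post-1": True,        # blog posts: allowed
    "https://example.com/admin/users": False,       # admin pages: blocked
    "https://example.com/search/?q=shoes": False,   # search results: blocked
    "https://example.com/cart/": False,             # cart: blocked
}

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

failures = [url for url, expected in EXPECTATIONS.items()
            if rp.can_fetch("TestBot", url) != expected]
print("All checks passed" if not failures else f"Unexpected results: {failures}")
```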

Common Robots.txt Mistakes

Avoid these common mistakes that can hurt your SEO:

1. Blocking Your Entire Site

Mistake:

User-agent: *
Disallow: /

Impact: Your entire site disappears from search results.

When it is okay: Development or staging sites only.

2. Blocking CSS and JavaScript

Mistake:

User-agent: *
Disallow: /css/
Disallow: /js/

Impact: Google cannot render your pages properly, potentially hurting rankings.

Fix: Always allow access to CSS and JS files.

3. Blocking Important Images

Mistake:

User-agent: *
Disallow: /images/

Impact: Google Image Search traffic disappears.

Fix: Only block images you truly do not want indexed.

4. Case Sensitivity Confusion

Important: Robots.txt paths are case-sensitive.

# This only blocks /Admin/ not /admin/
Disallow: /Admin/

5. Trailing Slash Mistakes

# Blocks /admin/ and everything inside it, but not /admin itself
Disallow: /admin/

# Blocks /admin, /admin/, and any other path starting with /admin
# (including /administrator)
Disallow: /admin
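You can see the difference directly with Python's standard-library parser (TestBot is an arbitrary user-agent string):

```python
from urllib import robotparser

with_slash = robotparser.RobotFileParser()
with_slash.parse(["User-agent: *", "Disallow: /admin/"])

no_slash = robotparser.RobotFileParser()
no_slash.parse(["User-agent: *", "Disallow: /admin"])

print(with_slash.can_fetch("TestBot", "https://example.com/admin"))        # True (not matched)
print(with_slash.can_fetch("TestBot", "https://example.com/admin/users"))  # False
print(no_slash.can_fetch("TestBot", "https://example.com/admin"))          # False
print(no_slash.can_fetch("TestBot", "https://example.com/administrator"))  # False (prefix match)
```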

6. Forgetting the Sitemap

Always include your sitemap location:

Sitemap: https://example.com/sitemap.xml

7. Using Robots.txt Instead of Noindex

Scenario: You want a page to not appear in search results.

Wrong approach: Block with robots.txt (page may still be indexed via links)

Right approach: Use noindex meta tag or X-Robots-Tag header.

Robots.txt vs Meta Robots vs X-Robots-Tag

Understanding when to use each:

Method        Scope                         Use Case
Robots.txt    Entire directories/patterns   Block crawling of large sections
Meta Robots   Individual pages              Control indexing of specific pages
X-Robots-Tag  HTTP header                   PDFs, images, non-HTML content

When to Use Each

Use robots.txt when:

  • Blocking entire directories
  • Saving crawl budget
  • Blocking non-HTML resources

Use meta robots when:

  • Controlling indexing of specific pages
  • You want content crawled but not indexed
  • Different robots need different instructions

Use X-Robots-Tag when:

  • Controlling PDF or image indexing
  • You cannot add meta tags (non-HTML files)
  • Server-level control is needed

Advanced Robots.txt Techniques

Pattern Matching

Robots.txt supports simple pattern matching:

# Block URLs containing "search"
Disallow: /*search*

# Block URLs ending in .pdf
Disallow: /*.pdf$

# Block URLs with specific parameters
Disallow: /*?sessionid=*

Pattern Matching Characters

Character   Meaning
*           Matches any sequence of characters
$           Matches the end of the URL
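Google-style matching with these two characters can be emulated with a regular expression. A sketch under that assumption (robots_pattern_to_regex and blocks are my own helper names, not part of any robots.txt library):

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex:
    '*' matches any character sequence, a trailing '$' pins the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + (r"\Z" if anchored else ""))

def blocks(pattern: str, path: str) -> bool:
    # Rules match from the start of the path, like a Disallow prefix
    return robots_pattern_to_regex(pattern).match(path) is not None

print(blocks("/*.pdf$", "/docs/report.pdf"))      # True
print(blocks("/*.pdf$", "/docs/report.pdf?v=2"))  # False (does not end in .pdf)
print(blocks("/*search*", "/en/search/results"))  # True
print(blocks("/admin/", "/admin/users"))          # True (plain prefix)
```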

Handling Multiple Sitemaps

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
Sitemap: https://example.com/image-sitemap.xml

Host Directive (Deprecated)

The Host directive was used to specify the preferred domain but is now largely ignored. Use canonical URLs instead.

Monitoring and Maintenance

Regular Audits

Review your robots.txt quarterly to ensure it still matches your site structure.

Check for Issues

Use Google Search Console to monitor:

  • Crawl errors related to blocked resources
  • Important pages being blocked
  • Sitemap accessibility

Version Control

Consider keeping your robots.txt in version control to track changes over time.

Frequently Asked Questions

Where do I put the robots.txt file?

Always at the root of your domain: https://example.com/robots.txt

Can I have different robots.txt for subdomains?

Yes, each subdomain can have its own robots.txt:

  • https://example.com/robots.txt
  • https://blog.example.com/robots.txt

Does robots.txt affect page speed?

No, robots.txt is only read by crawlers, not by browsers loading your pages.

How long until changes take effect?

Crawlers cache robots.txt for up to 24 hours. Changes may not be immediate.

Should I block bad bots?

You can try, but bad bots often ignore robots.txt. Use server-level blocking for actual protection.

Can I password protect with robots.txt?

No. Robots.txt is advisory only and provides no actual access control.

Robots.txt for Different Platforms

Different website platforms have different ways of managing robots.txt. Here is how to handle common platforms:

Static Sites (HTML/Next.js/Gatsby)

For static sites, simply create a robots.txt file in your public or static folder. It will be served at the root URL automatically.

Apache Server

Place your robots.txt file in the web root directory (usually public_html or www). Ensure the file permissions allow it to be read (typically 644).

Nginx Server

Similar to Apache, place the file in your web root. No special configuration is needed—Nginx serves it automatically.

Content Delivery Networks (CDNs)

If you use a CDN like Cloudflare or Fastly, ensure your robots.txt is being served correctly. Some CDNs cache the file aggressively, so changes may take time to propagate.

Real-World Examples

Let us look at how major websites configure their robots.txt:

News Sites

News sites typically allow crawling of articles but block administrative pages and internal search:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search/
Disallow: /print/

Sitemap: https://news-site.com/sitemap.xml
Sitemap: https://news-site.com/news-sitemap.xml

E-Learning Platforms

E-learning sites need to protect course content while allowing indexing of promotional pages:

User-agent: *
Disallow: /dashboard/
Disallow: /courses/content/
Disallow: /my-account/
Allow: /courses/
Allow: /blog/

Sitemap: https://learning-site.com/sitemap.xml

These examples show how different business needs lead to different robots.txt configurations. Always tailor your robots.txt to your specific requirements.

Conclusion

A properly configured robots.txt file is essential for effective SEO. It helps search engines crawl your site efficiently, prevents duplicate content issues, and keeps your crawl budget focused on important pages.

Key takeaways:

  1. Keep it simple: Start with basic rules and add complexity only as needed
  2. Test before deploying: Always verify your changes do not block important content
  3. Remember limitations: Robots.txt is not security—it is guidance
  4. Include your sitemap: Make it easy for crawlers to find your content map
  5. Review regularly: As your site evolves, so should your robots.txt

Need help creating your robots.txt file? Try RobotsTxtGen to generate a properly formatted file in minutes, with validation and best practices built in.


Your robots.txt is only as useful as the server it lives on. Make sure you're hosting on a fast, reliable platform:

Xserver — Japan's No.1 web hosting. Lightning-fast servers, free SSL, 99.99% uptime. Trusted by 2.5 million websites.

ConoHa WING — Ranked Japan's fastest hosting. No setup fee, WordPress-optimized environment, free domain included.


Have questions about robots.txt? Drop us a line—we are happy to help.

Last updated: April 2026

About ToolScout Team

The ToolScout team reviews and compares the best free tools for freelancers and creators. Our mission is to help you find the perfect tools to grow your business without breaking the bank.