How to Create a Robots.txt File (Complete Guide)
Learn how to create, configure, and test your robots.txt file. This complete guide covers syntax, common directives, best practices, and mistakes to avoid for better SEO.
The robots.txt file is one of the most important—yet often overlooked—files on your website. This simple text file tells search engine crawlers which pages they can and cannot access. Get it wrong, and you could accidentally block search engines from crawling your entire site.
In this complete guide, we will walk you through everything you need to know about robots.txt: what it is, how to create one, common directives and patterns, best practices, and the tools that make configuration easy.
What Is a Robots.txt File?
The robots.txt file is a plain text file that sits at the root of your website (e.g., https://example.com/robots.txt). It follows the Robots Exclusion Protocol, a standard that tells web crawlers which parts of your site they should not visit.
Key Points About Robots.txt
- Location: Must be at the root of your domain
- Format: Plain text file
- Purpose: Guides crawler behavior
- Access: Publicly visible to anyone
Important Limitations
Before we go further, understand what robots.txt cannot do:
- It is not a security measure: Robots.txt is advisory. Malicious bots may ignore it.
- It does not hide content: Anyone can view your robots.txt and see what you are blocking.
- It does not prevent indexing: If other sites link to blocked pages, they may still appear in search results (without content).
For truly private content, use proper authentication or password protection.
Why Robots.txt Matters for SEO
Despite its simplicity, robots.txt plays a crucial role in SEO:
Crawl Budget Optimization
Search engines have limited resources to crawl your site. Robots.txt helps you direct crawlers to your most important pages by blocking less important ones.
Prevent Duplicate Content
Block pages that create duplicate content issues, like print versions, filtered views, or parameter-heavy URLs.
Protect Sensitive Areas
Keep crawlers away from admin areas, staging environments, or internal tools (while remembering this is not actual security).
Control Server Load
Prevent crawlers from overwhelming your server by accessing resource-intensive pages.
Robots.txt Syntax
The robots.txt file uses a simple syntax. Let us break it down.
Basic Structure
User-agent: [crawler name]
Disallow: [path to block]
Allow: [path to allow]
User-Agent Directive
The User-agent specifies which crawler the rules apply to.
# All crawlers
User-agent: *
# Just Google
User-agent: Googlebot
# Just Bing
User-agent: Bingbot
Disallow Directive
The Disallow directive blocks access to specified paths.
# Block a specific page
Disallow: /private-page.html
# Block a directory
Disallow: /admin/
# Block all pages
Disallow: /
Allow Directive
The Allow directive permits access to paths that would otherwise be blocked. Originally a Google extension, it is now part of the robots.txt standard (RFC 9309) and is supported by the major crawlers.
# Block the directory but allow one file
User-agent: *
Disallow: /private/
Allow: /private/public-file.html
Sitemap Directive
The Sitemap directive tells crawlers where to find your sitemap.
Sitemap: https://example.com/sitemap.xml
Crawl-Delay Directive
The Crawl-delay directive asks crawlers to wait between requests. Note: Google ignores this directive.
User-agent: *
Crawl-delay: 10
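Because Crawl-delay support varies by crawler, it can be useful to confirm how a parser reads the value. A minimal sketch using Python's built-in urllib.robotparser (the bot name MyBot is just a placeholder):

```python
import urllib.robotparser

# Parse a minimal robots.txt with a crawl delay for all crawlers
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

# crawl_delay() returns the delay that applies to the given user agent
print(rp.crawl_delay("MyBot"))  # → 10
```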
Common Robots.txt Patterns
Let us look at real-world patterns for different scenarios.
Block All Crawlers
Use this for development or staging sites:
User-agent: *
Disallow: /
Warning: This will remove your site from search results. Use carefully.
Allow All Crawlers
The most permissive robots.txt:
User-agent: *
Disallow:
Or simply have an empty robots.txt file.
Block Specific Directories
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Disallow: /cgi-bin/
Block Specific File Types
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /*.xls$
Block Query Parameters
User-agent: *
Disallow: /*?*
Block Specific Crawlers
# Block bad bots
User-agent: BadBot
Disallow: /
# Block specific scrapers
User-agent: AhrefsBot
Disallow: /
WordPress Robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
Sitemap: https://example.com/sitemap_index.xml
E-commerce Robots.txt
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /search/
Disallow: /*?s=*
Disallow: /*?orderby=*
Disallow: /*?filter_*
Sitemap: https://example.com/sitemap.xml
Complete Robots.txt Example
Here is a comprehensive example for a typical website:
# Robots.txt for example.com
# Last updated: April 2026
# Default rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /search/
Disallow: /*?*print=*
Disallow: /*?*preview=*
# Keep CSS, JS, and images crawlable for rendering
# (a crawler that matches a more specific group ignores the * group entirely,
# so these Allow rules belong here rather than in a Googlebot-only group)
Allow: /*.css$
Allow: /*.js$
Allow: /*.png$
Allow: /*.jpg$
# Slow down aggressive crawlers
User-agent: Bingbot
Crawl-delay: 5
# Block known bad bots
User-agent: MJ12bot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
# Sitemap location
Sitemap: https://example.com/sitemap.xml
Creating Your Robots.txt File
Method 1: Manual Creation
- Open a text editor (Notepad, VS Code, etc.)
- Add your directives
- Save the file as robots.txt (not robots.txt.txt)
- Upload it to your website root
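The manual steps above can also be scripted. This minimal sketch writes a robots.txt file locally; the rules are illustrative, so swap in your own before uploading:

```python
# Assemble a small robots.txt line by line and save it as plain text
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
]

with open("robots.txt", "w", encoding="ascii") as f:
    f.write("\n".join(rules) + "\n")
```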
Method 2: Using RobotsTxtGen
RobotsTxtGen makes creating robots.txt files easy, even for beginners.
Why We Recommend RobotsTxtGen
- Visual interface: No need to memorize syntax
- Common patterns: Pre-built templates for common scenarios
- Validation: Catches syntax errors before deployment
- Best practices: Suggests improvements automatically
- Export ready: Download the finished file
How to Use RobotsTxtGen
- Visit RobotsTxtGen
- Select your website type (blog, e-commerce, etc.)
- Choose directories to block
- Add your sitemap URL
- Download the generated file
- Upload to your server
Method 3: CMS-Specific Solutions
WordPress
- Yoast SEO: Provides robots.txt editor
- Rank Math: Built-in robots.txt management
- Manual: Upload to public_html/ or the WordPress root
Shopify
Edit through Settings > Files > robots.txt.liquid
Wix
Edit through SEO Settings > Advanced > robots.txt Editor
Testing Your Robots.txt
Before deploying changes, always test your robots.txt file.
Google Search Console
- Go to Google Search Console
- Navigate to Settings > robots.txt report (the standalone robots.txt Tester has been retired)
- Test specific URLs
Spot-Check Specific URLs
Pick the URLs that matter most and verify each one against your rules, for example a blocked admin path:
https://example.com/admin/
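Such URL checks can also be scripted with Python's built-in urllib.robotparser. Note that it only performs simple prefix matching and ignores the * and $ wildcards; the rules and bot name below are illustrative:

```python
import urllib.robotparser

# Rules mirroring the common patterns discussed above
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /search/",
])

# can_fetch() reports whether a given user agent may crawl a URL
print(rp.can_fetch("MyBot", "https://example.com/admin/login"))  # → False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))    # → True
```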
Online Validators
Several online tools can validate your robots.txt:
- Google Rich Results Test
- Merkle Robots.txt Tester
- RobotsTxtGen validator
Common Testing Scenarios
Test these scenarios before deploying:
| URL Type | Expected Result |
|---|---|
| Homepage | Allowed |
| Blog posts | Allowed |
| Admin pages | Blocked |
| Search results | Blocked |
| Product pages | Allowed |
| Cart/checkout | Blocked |
| CSS/JS files | Allowed |
| Sitemap | Allowed |
Common Robots.txt Mistakes
Avoid these common mistakes that can hurt your SEO:
1. Blocking Your Entire Site
Mistake:
User-agent: *
Disallow: /
Impact: Your entire site disappears from search results.
When it is okay: Development or staging sites only.
2. Blocking CSS and JavaScript
Mistake:
User-agent: *
Disallow: /css/
Disallow: /js/
Impact: Google cannot render your pages properly, potentially hurting rankings.
Fix: Always allow access to CSS and JS files.
3. Blocking Important Images
Mistake:
User-agent: *
Disallow: /images/
Impact: Google Image Search traffic disappears.
Fix: Only block images you truly do not want indexed.
4. Case Sensitivity Confusion
Important: Robots.txt paths are case-sensitive.
# This only blocks /Admin/ not /admin/
Disallow: /Admin/
5. Trailing Slash Mistakes
# Blocks /admin/ but not /admin
Disallow: /admin/
# Blocks both /admin and /admin/
Disallow: /admin
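The trailing-slash difference is easy to verify with Python's built-in urllib.robotparser (the helper function and bot name are illustrative):

```python
import urllib.robotparser

def blocked(disallow_path, url_path):
    # Build a one-rule robots.txt and ask whether the path is blocked
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not rp.can_fetch("MyBot", f"https://example.com{url_path}")

print(blocked("/admin/", "/admin"))   # → False (/admin stays crawlable)
print(blocked("/admin/", "/admin/"))  # → True
print(blocked("/admin", "/admin"))    # → True
print(blocked("/admin", "/admin/"))   # → True
```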
6. Forgetting the Sitemap
Always include your sitemap location:
Sitemap: https://example.com/sitemap.xml
7. Using Robots.txt Instead of Noindex
Scenario: You want a page to not appear in search results.
Wrong approach: Block with robots.txt (the page may still be indexed via links).
Right approach: Use a noindex meta tag or X-Robots-Tag header, and leave the page crawlable so search engines can actually see the directive.
Robots.txt vs Meta Robots vs X-Robots-Tag
Understanding when to use each:
| Method | Scope | Use Case |
|---|---|---|
| Robots.txt | Entire directories/patterns | Block crawling of large sections |
| Meta Robots | Individual pages | Control indexing of specific pages |
| X-Robots-Tag | HTTP header | PDF, images, non-HTML content |
When to Use Each
Use robots.txt when:
- Blocking entire directories
- Saving crawl budget
- Blocking non-HTML resources
Use meta robots when:
- Controlling indexing of specific pages
- You want content crawled but not indexed
- Different robots need different instructions
Use X-Robots-Tag when:
- Controlling PDF or image indexing
- You cannot add meta tags (non-HTML files)
- Server-level control is needed
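As a sketch of server-level control, here is how an X-Robots-Tag header might be attached to PDFs in an nginx configuration (assuming nginx; Apache can do the same with Header set inside a FilesMatch block):

```nginx
# Ask crawlers not to index any PDF served by this site
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```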
Advanced Robots.txt Techniques
Pattern Matching
Robots.txt supports simple pattern matching:
# Block URLs containing "search"
Disallow: /*search*
# Block URLs ending in .pdf
Disallow: /*.pdf$
# Block URLs with specific parameters
Disallow: /*?sessionid=*
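Because Python's standard robotparser ignores these wildcards, a small translation to regular expressions shows how * and $ behave. This is a sketch of the matching logic, not an official implementation:

```python
import re

def robots_pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the URL end
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def matches(pattern, path):
    return bool(robots_pattern_to_regex(pattern).match(path))

print(matches("/*.pdf$", "/files/report.pdf"))      # → True
print(matches("/*.pdf$", "/files/report.pdf?v=2"))  # → False (the $ anchor)
print(matches("/*search*", "/site-search/results")) # → True
```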
Pattern Matching Characters
| Character | Meaning |
|---|---|
| * | Matches any sequence of characters |
| $ | Matches the end of the URL |
Handling Multiple Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
Sitemap: https://example.com/image-sitemap.xml
Host Directive (Deprecated)
The Host directive was used to specify preferred domain but is now largely ignored. Use canonical URLs instead.
Monitoring and Maintenance
Regular Audits
Review your robots.txt quarterly to ensure it still matches your site structure.
Check for Issues
Use Google Search Console to monitor:
- Crawl errors related to blocked resources
- Important pages being blocked
- Sitemap accessibility
Version Control
Consider keeping your robots.txt in version control to track changes over time.
Frequently Asked Questions
Where do I put the robots.txt file?
Always at the root of your domain: https://example.com/robots.txt
Can I have different robots.txt for subdomains?
Yes, each subdomain can have its own robots.txt:
- https://example.com/robots.txt
- https://blog.example.com/robots.txt
Does robots.txt affect page speed?
No, robots.txt is only read by crawlers, not by browsers loading your pages.
How long until changes take effect?
Crawlers cache robots.txt for up to 24 hours. Changes may not be immediate.
Should I block bad bots?
You can try, but bad bots often ignore robots.txt. Use server-level blocking for actual protection.
Can I password protect with robots.txt?
No. Robots.txt is advisory only and provides no actual access control.
Robots.txt for Different Platforms
Different website platforms have different ways of managing robots.txt. Here is how to handle common platforms:
Static Sites (HTML/Next.js/Gatsby)
For static sites, simply create a robots.txt file in your public or static folder. It will be served at the root URL automatically.
Apache Server
Place your robots.txt file in the web root directory (usually public_html or www). Ensure the file permissions allow it to be read (typically 644).
Nginx Server
Similar to Apache, place the file in your web root. No special configuration is needed—Nginx serves it automatically.
Content Delivery Networks (CDNs)
If you use a CDN like Cloudflare or Fastly, ensure your robots.txt is being served correctly. Some CDNs cache the file aggressively, so changes may take time to propagate.
Real-World Examples
Let us look at how major websites configure their robots.txt:
News Sites
News sites typically allow crawling of articles but block administrative pages and internal search:
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search/
Disallow: /print/
Sitemap: https://news-site.com/sitemap.xml
Sitemap: https://news-site.com/news-sitemap.xml
E-Learning Platforms
E-learning sites need to protect course content while allowing indexing of promotional pages:
User-agent: *
Disallow: /dashboard/
Disallow: /courses/content/
Disallow: /my-account/
Allow: /courses/
Allow: /blog/
Sitemap: https://learning-site.com/sitemap.xml
These examples show how different business needs lead to different robots.txt configurations. Always tailor your robots.txt to your specific requirements.
Conclusion
A properly configured robots.txt file is essential for effective SEO. It helps search engines crawl your site efficiently, prevents duplicate content issues, and keeps your crawl budget focused on important pages.
Key takeaways:
- Keep it simple: Start with basic rules and add complexity only as needed
- Test before deploying: Always verify your changes do not block important content
- Remember limitations: Robots.txt is not security—it is guidance
- Include your sitemap: Make it easy for crawlers to find your content map
- Review regularly: As your site evolves, so should your robots.txt
Need help creating your robots.txt file? Try RobotsTxtGen to generate a properly formatted file in minutes, with validation and best practices built in.
Recommended Web Hosting
Your robots.txt is only as useful as the server it lives on. Make sure you are hosting on a fast, reliable platform:
Xserver — Japan's No.1 web hosting. Lightning-fast servers, free SSL, 99.99% uptime. Trusted by 2.5 million websites.
ConoHa WING — Ranked Japan's fastest hosting. No setup fee, WordPress-optimized environment, free domain included.
Have questions about robots.txt? Drop us a line—we are happy to help.
Last updated: April 2026
About ToolScout Team
The ToolScout team reviews and compares the best free tools for freelancers and creators. Our mission is to help you find the perfect tools to grow your business without breaking the bank.