How to Create a Robots.txt File (Complete Guide)
Learn how to create, configure, and test your robots.txt file. This complete guide covers syntax, common directives, best practices, and mistakes to avoid for better SEO.
The robots.txt file is one of the most important—yet often overlooked—files on your website. This simple text file tells search engine crawlers which pages they can and cannot access. Get it wrong, and you could accidentally block search engines from crawling your entire site.
In this complete guide, we will walk you through everything you need to know about robots.txt: what it is, how to create one, common directives and patterns, best practices, and the tools that make configuration easy.
What Is a Robots.txt File?
The robots.txt file is a plain text file that sits at the root of your website (e.g., https://example.com/robots.txt). It follows the Robots Exclusion Protocol, a standard that tells web crawlers which parts of your site they should not visit.
Key Points About Robots.txt
- Location: Must be at the root of your domain
- Format: Plain text file
- Purpose: Guides crawler behavior
- Access: Publicly visible to anyone
Important Limitations
Before we go further, understand what robots.txt cannot do:
- It is not a security measure: Robots.txt is advisory. Malicious bots may ignore it.
- It does not hide content: Anyone can view your robots.txt and see what you are blocking.
- It does not prevent indexing: If other sites link to blocked pages, they may still appear in search results (without content).
For truly private content, use proper authentication or password protection.
Why Robots.txt Matters for SEO
Despite its simplicity, robots.txt plays a crucial role in SEO:
Crawl Budget Optimization
Search engines have limited resources to crawl your site. Robots.txt helps you direct crawlers to your most important pages by blocking less important ones.
Prevent Duplicate Content
Block pages that create duplicate content issues, like print versions, filtered views, or parameter-heavy URLs.
Protect Sensitive Areas
Keep crawlers away from admin areas, staging environments, or internal tools (while remembering this is not actual security).
Control Server Load
Prevent crawlers from overwhelming your server by accessing resource-intensive pages.
Robots.txt Syntax
The robots.txt file uses a simple syntax. Let us break it down.
Basic Structure
User-agent: [crawler name]
Disallow: [path to block]
Allow: [path to allow]
User-Agent Directive
The User-agent specifies which crawler the rules apply to.
# All crawlers
User-agent: *
# Just Google
User-agent: Googlebot
# Just Bing
User-agent: Bingbot
Disallow Directive
The Disallow directive blocks access to specified paths.
# Block a specific page
Disallow: /private-page.html
# Block a directory
Disallow: /admin/
# Block all pages
Disallow: /
Allow Directive
The Allow directive permits access to paths that would otherwise be blocked. Originally a Google extension, it is now part of the robots.txt standard (RFC 9309) and is supported by the major crawlers.
# Block the directory but allow one file
User-agent: *
Disallow: /private/
Allow: /private/public-file.html
Sitemap Directive
The Sitemap directive tells crawlers where to find your sitemap.
Sitemap: https://example.com/sitemap.xml
Crawl-Delay Directive
The Crawl-delay directive asks crawlers to wait between requests. Note: Google ignores this directive.
User-agent: *
Crawl-delay: 10
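Because Crawl-delay support varies by crawler, it can be useful to confirm how a parser reads the value. A minimal sketch using Python's built-in urllib.robotparser (the bot name MyBot is just a placeholder):

```python
import urllib.robotparser

# Parse a minimal robots.txt with a crawl delay for all crawlers
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

# crawl_delay() returns the delay that applies to the given user agent
print(rp.crawl_delay("MyBot"))  # → 10
```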
Common Robots.txt Patterns
Let us look at real-world patterns for different scenarios.
Block All Crawlers
Use this for development or staging sites:
User-agent: *
Disallow: /
Warning: This will remove your site from search results. Use carefully.
Allow All Crawlers
The most permissive robots.txt:
User-agent: *
Disallow:
Or simply have an empty robots.txt file.
Block Specific Directories
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Disallow: /cgi-bin/
Block Specific File Types
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /*.xls$
Block Query Parameters
User-agent: *
Disallow: /*?*
Block Specific Crawlers
# Block bad bots
User-agent: BadBot
Disallow: /
# Block specific scrapers
User-agent: AhrefsBot
Disallow: /
WordPress Robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
Sitemap: https://example.com/sitemap_index.xml
E-commerce Robots.txt
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /search/
Disallow: /*?s=*
Disallow: /*?orderby=*
Disallow: /*?filter_*
Sitemap: https://example.com/sitemap.xml
Complete Robots.txt Example
Here is a comprehensive example for a typical website:
# Robots.txt for example.com
# Last updated: April 2026
# Default rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /search/
Disallow: /*?*print=*
Disallow: /*?*preview=*
# Keep CSS, JS, and images crawlable for rendering
# (a crawler that matches a more specific group ignores the * group entirely,
# so these Allow rules belong here rather than in a Googlebot-only group)
Allow: /*.css$
Allow: /*.js$
Allow: /*.png$
Allow: /*.jpg$
# Slow down aggressive crawlers
User-agent: Bingbot
Crawl-delay: 5
# Block known bad bots
User-agent: MJ12bot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
# Sitemap location
Sitemap: https://example.com/sitemap.xml
Creating Your Robots.txt File
Method 1: Manual Creation
- Open a text editor (Notepad, VS Code, etc.)
- Add your directives
- Save the file as robots.txt (not robots.txt.txt)
- Upload it to your website root
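The manual steps above can also be scripted. This minimal sketch writes a robots.txt file locally; the rules are illustrative, so swap in your own before uploading:

```python
# Assemble a small robots.txt line by line and save it as plain text
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
]

with open("robots.txt", "w", encoding="ascii") as f:
    f.write("\n".join(rules) + "\n")
```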
Method 2: Using RobotsTxtGen
RobotsTxtGen makes creating robots.txt files easy, even for beginners.
Why We Recommend RobotsTxtGen
- Visual interface: No need to memorize syntax
- Common patterns: Pre-built templates for common scenarios
- Validation: Catches syntax errors before deployment
- Best practices: Suggests improvements automatically
- Export ready: Download the finished file
How to Use RobotsTxtGen
- Visit RobotsTxtGen
- Select your website type (blog, e-commerce, etc.)
- Choose directories to block
- Add your sitemap URL
- Download the generated file
- Upload to your server
Method 3: CMS-Specific Solutions
WordPress
- Yoast SEO: Provides robots.txt editor
- Rank Math: Built-in robots.txt management
- Manual: Upload to public_html/ or the WordPress root
Shopify
Edit through Settings > Files > robots.txt.liquid
Wix
Edit through SEO Settings > Advanced > robots.txt Editor
Testing Your Robots.txt
Before deploying changes, always test your robots.txt file.
Google Search Console
- Go to Google Search Console
- Navigate to Settings > robots.txt report (the standalone robots.txt Tester has been retired)
- Test specific URLs
Spot-Check Specific URLs
Pick the URLs that matter most and verify each one against your rules, for example a blocked admin path:
https://example.com/admin/
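Such URL checks can also be scripted with Python's built-in urllib.robotparser. Note that it only performs simple prefix matching and ignores the * and $ wildcards; the rules and bot name below are illustrative:

```python
import urllib.robotparser

# Rules mirroring the common patterns discussed above
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /search/",
])

# can_fetch() reports whether a given user agent may crawl a URL
print(rp.can_fetch("MyBot", "https://example.com/admin/login"))  # → False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))    # → True
```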
Online Validators
Several online tools can validate your robots.txt:
- Google Rich Results Test
- Merkle Robots.txt Tester
- RobotsTxtGen validator
Common Testing Scenarios
Test these scenarios before deploying:
| URL Type | Expected Result |
|---|---|
| Homepage | Allowed |
| Blog posts | Allowed |
| Admin pages | Blocked |
| Search results | Blocked |
| Product pages | Allowed |
| Cart/checkout | Blocked |
| CSS/JS files | Allowed |
| Sitemap | Allowed |
Common Robots.txt Mistakes
Avoid these common mistakes that can hurt your SEO:
1. Blocking Your Entire Site
Mistake:
User-agent: *
Disallow: /
Impact: Your entire site disappears from search results.
When it is okay: Development or staging sites only.
2. Blocking CSS and JavaScript
Mistake:
User-agent: *
Disallow: /css/
Disallow: /js/
Impact: Google cannot render your pages properly, potentially hurting rankings.
Fix: Always allow access to CSS and JS files.
3. Blocking Important Images
Mistake:
User-agent: *
Disallow: /images/
Impact: Google Image Search traffic disappears.
Fix: Only block images you truly do not want indexed.
4. Case Sensitivity Confusion
Important: Robots.txt paths are case-sensitive.
# This only blocks /Admin/ not /admin/
Disallow: /Admin/
5. Trailing Slash Mistakes
# Blocks /admin/ but not /admin
Disallow: /admin/
# Blocks both /admin and /admin/
Disallow: /admin
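The trailing-slash difference is easy to verify with Python's built-in urllib.robotparser (the helper function and bot name are illustrative):

```python
import urllib.robotparser

def blocked(disallow_path, url_path):
    # Build a one-rule robots.txt and ask whether the path is blocked
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not rp.can_fetch("MyBot", f"https://example.com{url_path}")

print(blocked("/admin/", "/admin"))   # → False (/admin stays crawlable)
print(blocked("/admin/", "/admin/"))  # → True
print(blocked("/admin", "/admin"))    # → True
print(blocked("/admin", "/admin/"))   # → True
```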
6. Forgetting the Sitemap
Always include your sitemap location:
Sitemap: https://example.com/sitemap.xml
7. Using Robots.txt Instead of Noindex
Scenario: You want a page to not appear in search results.
Wrong approach: Block with robots.txt (the page may still be indexed via links).
Right approach: Use a noindex meta tag or X-Robots-Tag header, and leave the page crawlable so search engines can actually see the directive.
Robots.txt vs Meta Robots vs X-Robots-Tag
Understanding when to use each:
| Method | Scope | Use Case |
|---|---|---|
| Robots.txt | Entire directories/patterns | Block crawling of large sections |
| Meta Robots | Individual pages | Control indexing of specific pages |
| X-Robots-Tag | HTTP header | PDF, images, non-HTML content |
When to Use Each
Use robots.txt when:
- Blocking entire directories
- Saving crawl budget
- Blocking non-HTML resources
Use meta robots when:
- Controlling indexing of specific pages
- You want content crawled but not indexed
- Different robots need different instructions
Use X-Robots-Tag when:
- Controlling PDF or image indexing
- You cannot add meta tags (non-HTML files)
- Server-level control is needed
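As a sketch of server-level control, here is how an X-Robots-Tag header might be attached to PDFs in an nginx configuration (assuming nginx; Apache can do the same with Header set inside a FilesMatch block):

```nginx
# Ask crawlers not to index any PDF served by this site
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```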
Advanced Robots.txt Techniques
Pattern Matching
Robots.txt supports simple pattern matching:
# Block URLs containing "search"
Disallow: /*search*
# Block URLs ending in .pdf
Disallow: /*.pdf$
# Block URLs with specific parameters
Disallow: /*?sessionid=*
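Because Python's standard robotparser ignores these wildcards, a small translation to regular expressions shows how * and $ behave. This is a sketch of the matching logic, not an official implementation:

```python
import re

def robots_pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the URL end
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def matches(pattern, path):
    return bool(robots_pattern_to_regex(pattern).match(path))

print(matches("/*.pdf$", "/files/report.pdf"))      # → True
print(matches("/*.pdf$", "/files/report.pdf?v=2"))  # → False (the $ anchor)
print(matches("/*search*", "/site-search/results")) # → True
```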
Pattern Matching Characters
| Character | Meaning |
|---|---|
| * | Matches any sequence of characters |
| $ | Matches the end of the URL |
Handling Multiple Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
Sitemap: https://example.com/image-sitemap.xml
Host Directive (Deprecated)
The Host directive was used to specify preferred domain but is now largely ignored. Use canonical URLs instead.
Monitoring and Maintenance
Regular Audits
Review your robots.txt quarterly to ensure it still matches your site structure.
Check for Issues
Use Google Search Console to monitor:
- Crawl errors related to blocked resources
- Important pages being blocked
- Sitemap accessibility
Version Control
Consider keeping your robots.txt in version control to track changes over time.
Frequently Asked Questions
Where do I put the robots.txt file?
Always at the root of your domain: https://example.com/robots.txt
Can I have different robots.txt for subdomains?
Yes, each subdomain can have its own robots.txt:
- https://example.com/robots.txt
- https://blog.example.com/robots.txt
Does robots.txt affect page speed?
No, robots.txt is only read by crawlers, not by browsers loading your pages.
How long until changes take effect?
Crawlers cache robots.txt for up to 24 hours. Changes may not be immediate.
Should I block bad bots?
You can try, but bad bots often ignore robots.txt. Use server-level blocking for actual protection.
Can I password protect with robots.txt?
No. Robots.txt is advisory only and provides no actual access control.
Robots.txt for Different Platforms
Different website platforms have different ways of managing robots.txt. Here is how to handle common platforms:
Static Sites (HTML/Next.js/Gatsby)
For static sites, simply create a robots.txt file in your public or static folder. It will be served at the root URL automatically.
Apache Server
Place your robots.txt file in the web root directory (usually public_html or www). Ensure the file permissions allow it to be read (typically 644).
Nginx Server
Similar to Apache, place the file in your web root. No special configuration is needed—Nginx serves it automatically.
Content Delivery Networks (CDNs)
If you use a CDN like Cloudflare or Fastly, ensure your robots.txt is being served correctly. Some CDNs cache the file aggressively, so changes may take time to propagate.
Real-World Examples
Let us look at how major websites configure their robots.txt:
News Sites
News sites typically allow crawling of articles but block administrative pages and internal search:
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search/
Disallow: /print/
Sitemap: https://news-site.com/sitemap.xml
Sitemap: https://news-site.com/news-sitemap.xml
E-Learning Platforms
E-learning sites need to protect course content while allowing indexing of promotional pages:
User-agent: *
Disallow: /dashboard/
Disallow: /courses/content/
Disallow: /my-account/
Allow: /courses/
Allow: /blog/
Sitemap: https://learning-site.com/sitemap.xml
These examples show how different business needs lead to different robots.txt configurations. Always tailor your robots.txt to your specific requirements.
Conclusion
A properly configured robots.txt file is essential for effective SEO. It helps search engines crawl your site efficiently, prevents duplicate content issues, and keeps your crawl budget focused on important pages.
Key takeaways:
- Keep it simple: Start with basic rules and add complexity only as needed
- Test before deploying: Always verify your changes do not block important content
- Remember limitations: Robots.txt is not security—it is guidance
- Include your sitemap: Make it easy for crawlers to find your content map
- Review regularly: As your site evolves, so should your robots.txt
Need help creating your robots.txt file? Try RobotsTxtGen to generate a properly formatted file in minutes, with validation and best practices built in.
Recommended Web Hosting
Your robots.txt is only as useful as the server it lives on. Make sure you are hosting on a fast, reliable platform:
Xserver — Japan's No.1 web hosting. Lightning-fast servers, free SSL, 99.99% uptime. Trusted by 2.5 million websites.
ConoHa WING — Ranked Japan's fastest hosting. No setup fee, WordPress-optimized environment, free domain included.
Have questions about robots.txt? Drop us a line—we are happy to help.
Last updated: April 2026
About ToolScout Team
The ToolScout team reviews and compares the best free tools for freelancers and creators. Our mission is to help you find the perfect tools to grow your business without breaking the bank.