If you are a webmaster, you have probably come across the term ‘robots.txt’ here and there. Ever wondered what it means? How does it apply to your web content, and what role does it play?
Almost every website has a robots.txt file, but only a few owners know what it is used for. The purpose of this article is to equip you with the necessary information so that you can use the file to improve your site.
So fasten your seat belt as we take you through the WordPress robots.txt file. Once you know how the file works, you will be able to manage access to your site better, use the file to your advantage, and set your site apart from the crowd.
What exactly is WordPress Robots.txt?
Before we get into WordPress robots.txt, we need to understand a broader term: robot. On the internet, this word refers to any kind of ‘bot’ that visits websites.
Take the example of the Google search engine crawler bots, which roam the web and help Google index billions of web pages from an enormous number of websites worldwide.
Bots are a handy tool, and they replace the workforce their tasks would otherwise require pretty efficiently. You can call them “digital helpers”. They are crucial to the structure of the internet and its proper functioning.
However, if you are a website owner, you don’t necessarily want these bots roaming around your website unchecked. You want to be in control of how your site is crawled and presented. This need gave rise to the robots exclusion standard in the mid-1990s.
Now coming to our subject, robots.txt: you can call it an implementation of the above-mentioned standard. With the help of the WordPress robots.txt file, you can control how these bots move through your web pages.
By using this file, you can restrict bots to certain areas of your site or ask them to stay out of it entirely.
However, bot participation is not that simple to control. The robots.txt file is only a set of instructions, so ‘evil genius’ bots can simply ignore it and get into your site anyway. Once they are in, robots.txt cannot make them behave the way you want.
Also, even reputable crawlers ignore some of the commands you can add to robots.txt. Google, for example, does not honor rules about how frequently its crawlers may visit your website. If you still cannot deal with these mischievous beings, you can turn to one of the security services available online.
Why bother to get a Robots.txt File for your Website?
As a website owner, this question should be popping into your mind. Why the extra effort and investment? You want your business to run as cost-efficiently as it can, so it is natural to think twice before adding anything.
You may be wondering how keeping bots out is going to change the way your website runs. What you have seen so far is only a drop in the ocean.
But don’t worry, as we are here for this very reason. By the end of this article, you’ll be able to see for yourself what’s best for you. In a nutshell, WordPress robots.txt helps your website in two ways:
- Optimizing your resource usage by blocking bots whose visits only waste useful server resources.
- Optimizing search engine crawl resources by instructing crawlers not to spend time on pages that do not need to be indexed. In this way, they can focus on the pages you think are the best.
Hence, crawlers spend their time on the content you actually want to appear in search results, which can benefit your site’s rankings. Any sane webmaster would desire this.
Keep in mind that robots.txt does not actually stop search engines from indexing certain pages of your website. If that is your primary goal, you had better choose a meta noindex tag. It is a more direct way of dealing with this particular problem.
The main goal of WordPress robots.txt is not to stop search engines from indexing a page. Rather, it just instructs bots not to crawl it. While Google will not crawl a page you marked, it states clearly that it may still index the page if an external link points to it.
This is something that a Google Webmaster analyst pointed out in a Webmaster Central hangout. He explained that if people want to keep a certain page out of the index, they should use a noindex tag rather than a robots.txt file: when a blocked page is linked from other sites, the search engine never gets to see that the page should not be indexed.
So, if you are planning on using the file for the purpose mentioned above, step back. You will not get the result you are expecting.
Now that you know what WordPress robots.txt is and how you can use it to your benefit, you must be wondering how to create and edit a WordPress robots.txt file.
By default, WordPress creates a virtual robots.txt file for your site. So even if you have made no effort to set one up, it should be there already. To check, type ‘/robots.txt’ at the end of your domain in the browser’s address bar and press Enter.
As the file is virtual, you cannot edit it. If you want to change it, you’ll have to create a physical file which you can edit to your liking. Here are three simple methods to do this:
If you are using the famous Yoast SEO plugin, you can create the robots.txt file and later edit it directly from the plugin’s interface. To do that, you first need to enable Yoast SEO’s advanced features by going to SEO > Dashboard > Features and toggling on Advanced Settings Pages.
After you are done with the task above, go to SEO > Tools and click on File Editor. The WordPress robots.txt location is usually the site’s root folder.
Let’s assume you do not already have a physical file on your website. Yoast gives you a button to create one from its interface.
After clicking the button, you can edit the WordPress robots.txt file directly from the interface.
Further down, we’ll also discuss which commands you can put in your robots.txt file to make your website better optimized and regulated.
All in One SEO is almost as popular among users as Yoast, and it is a decent platform. If you are using this plugin, you can create and edit the WordPress robots.txt file from its interface too.
All you need to do is go to All in One SEO > Feature Manager and activate the Robots.txt feature.
After you have activated the feature, you can edit the file by going to All in One SEO > Robots.txt.
If you are not using either of the above plugins, you can create the file yourself and upload it with an SFTP client. For this, you need to know where robots.txt lives: it should be located in the root folder of the website.
First, create a text file on your computer with any text editor and name it ‘robots.txt’.
Then connect to your website via SFTP and transfer the file to the root folder of your site.
You can make further changes by editing the file over SFTP or by uploading a new version of it. This is a perfectly good method of editing the WordPress robots.txt file.
What can you input in Your Robots.txt File?
It is important to ask this question before setting up your WordPress robots.txt file. You need to be perfectly aware of what you are doing and how you are going to do it.
You now have a physical robots.txt file on your WordPress site and can change it at will. But you should know which commands belong in the file. Robots.txt controls how crawlers interact with the website using two core commands:
- Disallow tells robots not to access certain designated pages of your website.
- User-agent targets a specific type of bot. Bots use user agents to identify themselves. With their help, you can establish a rule which applies to Yahoo but not to Google.
There is also an ‘Allow’ command, which you will use in niche situations. Everything on the site is allowed by default, so you only need to disallow the areas you want to keep bots out of. Allow is useful when you block access to a folder and its subfolders but still want to permit access to one particular subfolder or file inside it.
You basically add rules by first stating which user agent the rule applies to and then listing what to permit or block with Allow and Disallow. Crawl-delay and Sitemap are secondary commands which are also available. For most sites they are of limited use:
- Crawl-delay is ignored by big-time crawlers such as Google.
- Sitemap can be made redundant by tools such as Google Search Console.
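For reference, the two secondary commands could be written as in the sketch below. The ten-second delay and the example.com sitemap URL are placeholders chosen for illustration, not recommendations:

```txt
User-agent: *
# Asks crawlers that support it to wait 10 seconds between requests (Google ignores this)
Crawl-delay: 10

# Absolute URL of the XML sitemap (placeholder domain)
Sitemap: https://example.com/sitemap.xml
```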
You can understand the drill better by walking through some specific use cases:
The first case is when you want no crawlers on your website at all. A complete lockdown places your site in forced quarantine from outside crawlers. For this, you can write the following code in your WordPress robots.txt file:
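A minimal sketch of that rule, assuming it should apply to every crawler that honors robots.txt:

```txt
# Applies to every user agent
User-agent: *
# Blocks every path on the site
Disallow: /
```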
The ‘*’ next to User-agent is a wildcard, meaning the rule applies to all user agents. The ‘/’ after Disallow signifies that access to all pages is restricted. This rule, however, is unlikely to be appropriate on a live site. It is better suited to a development site.
Blocking a single bot is often more desirable than the complete lockdown discussed above, and it is used more widely. As an example, say you don’t want Bingbot crawling your website.
In order to block just Bing from entering your website, you will need to input this code in the file:
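A minimal sketch of that rule, assuming Bing's crawler identifies itself with the user-agent token Bingbot:

```txt
# Applies only to Bing's crawler
User-agent: Bingbot
# Blocks every path on the site for that crawler
Disallow: /
```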
This code tells only bots with the Bingbot user agent to stay out of the site. This method comes in handy when you want one specific bot kept out of your pages. Lists of the known user-agent names of most services are available online.
You might also want to keep bots out of a single folder or file. In such a case, this section is for you. Say you wish to block access to the following:
- Complete wp-admin folder
Use the following code for this command:
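A minimal sketch, assuming the rule should apply to all bots. Each additional path you want to block would get its own Disallow line:

```txt
User-agent: *
# Blocks the wp-admin folder and everything inside it
Disallow: /wp-admin/
```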
Now consider the case where you wish to block a complete folder, but there is a particular file inside it that you still want bots to reach. This is showtime for the ‘Allow’ command, which is very useful for WordPress.
The perfect illustration of this command is the WordPress virtual robots.txt file itself:
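The default virtual file typically looks like the sketch below. The exact contents can vary with your WordPress version and installed plugins:

```txt
User-agent: *
# Blocks the whole wp-admin folder...
Disallow: /wp-admin/
# ...except this one file inside it
Allow: /wp-admin/admin-ajax.php
```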
This means you are blocking entry to the whole ‘wp-admin’ folder but you are excluding ‘admin-ajax.php’ from the list. This is indeed a very useful command.
The next tweak is specific to WordPress: you can control bots’ access to your search result pages. By default, WordPress uses the query parameter ‘?s=’ for searches. To stop bots from crawling your search result pages, you can add the following rule under the user agent:
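A minimal sketch, assuming the default search parameter and a rule that applies to all bots. The /search/ line is an assumption that only matters if your site also serves search results under that path:

```txt
User-agent: *
# Blocks default WordPress search result URLs such as /?s=keyword
Disallow: /?s=
# Assumption: blocks search results served under /search/
Disallow: /search/
```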
This rule can also help against soft 404 errors if you are getting them, and it keeps crawlers from spending their time on search result pages.
A more general scenario is one where you wish to apply different rules to different bots. For this, you add a separate user-agent declaration for each bot and list the rules for each one beneath it. Say you want a common rule for all bots and a specific rule that applies to Bingbot only. Input this code:
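A minimal sketch of such a file, assuming Bing's crawler identifies itself as Bingbot:

```txt
# Common rule for every bot
User-agent: *
Disallow: /wp-admin/

# Stricter rule for Bing's crawler only
User-agent: Bingbot
Disallow: /
```

Note that a bot which finds a group addressed to it by name follows that group rather than the ‘*’ group.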
With these rules, all bots are blocked from the ‘wp-admin’ folder, while Bingbot is blocked from the whole website.
Testing Your Robots.txt File
A little maintenance always proves beneficial in the long haul. Frequent testing makes sure the WordPress robots.txt file keeps doing what you intend. Google Search Console can perform the examination on your file.
Open Search Console and click on ‘robots.txt Tester’ under ‘Crawl’. After that, submit the URL of the page you want to test the file against. If the page is crawlable, you should see a green ‘Allowed’. You can also test the blocked URLs to make sure they are properly blocked.
Stay Safe from the UTF-8 BOM
The Byte Order Mark, or BOM, is essentially an invisible character that some old text editors add to the start of a file. Google may not read your WordPress robots.txt file correctly if it falls victim to this sorcery.
For this reason, you should check your file for errors regularly so that problems are diagnosed in time. When a BOM is present, the invisible character sits in front of the first rule, and Google cannot understand the syntax of that line.
Hence, the first line of your file is treated as invalid, which is not good at all. A UTF-8 BOM can be a nightmare to track down.
Google Bot is Predominantly US-based
You should never block Googlebot from the United States, even if you are targeting a local region outside the US. Googlebot sometimes crawls from local addresses, but it is mostly US-based.
How to make the best of Your WordPress Robots.txt File
We are positive that by now you have digested a great deal of knowledge and even applied some of it to your benefit. We aspire to make your website bigger and better with our informative and comprehensive articles.
To conclude, you should always be wary of what you are doing. In this case, you need to decide what your main purpose is in setting up a robots.txt file. Do you want to keep certain pages out of search results? Then you should use a noindex tag rather than robots.txt. That is the best solution for preventing indexing.
If you are a casual WordPress user, you probably don’t need to change your WordPress robots.txt file at all. However, if you are facing problems with a bot or want to optimize how search engines interact with your site, robots.txt is your go-to.
Hopefully, this article gave you sufficient information about the robots.txt file. You can contact us in case of any queries. Good luck with your site.