Why and How to Create a Robots.txt File?
Written by Robert on August 17, 2008 – 7:41 pm
Proper Web site design and structure is an integral part of any search-engine optimization campaign. Many Webmasters are not even aware that these issues exist, let alone know how to deal with them when they arise. Mastering these tactics requires a willingness to look beneath the surface and take on some fairly technical tasks. Ultimately, the world’s premier search-engine optimization experts must develop advanced skills and techniques to keep their Web sites at the top of the search-engine results pages.
Search-engine ranking algorithms are constantly evolving. More and more Web site owners are losing their spots in the rankings as their competitors become more adept at the technical side of search-engine optimization. The techniques described here can also streamline much of the tedious work that search-engine optimization involves.
For example, using the mod_rewrite module discussed in the “Using Mod_Rewrite to Rewrite URLs” task can, depending on the size of your Web site, shave hours or even days off the amount of time normally required to rewrite every URL on your Web site.
Adding a simple line of text to a robots.txt file, as discussed in the “Create a Robots.txt File” task, can save your Web server from a flood of search-engine spider traffic. This traffic can waste a tremendous amount of server resources and can even lead your Web host to bill you for exceeding your bandwidth allowance. These are just a few examples of potentially daunting tasks that these techniques can simplify.
Create a Robots.txt File
Creating a robots.txt file is a way of speaking directly to the search-engine spiders when they arrive at your site. These spiders are simply robots programmed to follow certain rules, and well-behaved spiders check robots.txt before crawling anything else. There are numerous scenarios where such an exchange is useful. Perhaps you would rather the spiders not visit certain sections of your site. Or maybe you want to make clear that they are welcome to crawl every single page. Other times you may want to limit how often the spiders request pages from your site. A robots.txt file lets you tell the spiders what they may and may not do once they arrive at your domain.
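To make those scenarios concrete, here is a minimal sketch of how a compliant spider interprets allow and disallow rules, using the robots.txt parser in Python's standard library (urllib.robotparser). The robots.txt content, the paths, and the spider names ExampleBot and FriendlyBot are hypothetical; only the directive syntax (User-agent, Disallow) is standard. Parsing the rules locally like this shows what an obedient spider would do; it does not force any real spider to obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the ExampleBot/FriendlyBot names are illustrative.
ROBOTS_TXT = """\
# Default rules for every spider
User-agent: *
Disallow: /private/
Disallow: /tmp/

# Keep one (hypothetical) spider out of the whole site
User-agent: ExampleBot
Disallow: /

# An empty Disallow value lets this (hypothetical) spider crawl every page
User-agent: FriendlyBot
Disallow:
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

checks = [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/private/report.html"),
    ("ExampleBot", "/index.html"),
    ("FriendlyBot", "/private/report.html"),
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, "http://www.example.com" + path)
    print(f"{agent}: {path} -> {'allowed' if allowed else 'blocked'}")
```

A spider follows the group that names it and falls back to the User-agent: * group otherwise; an empty Disallow: line is how you say that every page may be crawled.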
You can create a robots.txt file to prevent search-engine spiders from consuming excessive amounts of bandwidth on your server and to help guard against potential copyright infringement. A robots.txt file tells search-engine spiders which pages they should and should not crawl and index. It is a plain text file that must reside in the root directory of your Web server, so that it is reachable at an address such as http://www.example.com/robots.txt; spiders do not look for it in subdirectories.
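Because spiders request the file only from the root of each host, you can work out which robots.txt governs a page from the page URL alone. The following is a small sketch using Python's standard urllib.parse module; the function name robots_txt_url and the example URLs are hypothetical. Each combination of scheme, host (including each subdomain), and port has its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Spiders only request robots.txt from the root of a host, so the page's
    path, query string, and fragment are discarded; scheme, host, and port are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/products/widget.html?id=7"))
# http://www.example.com/robots.txt
```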
If you do not provide a robots.txt file, search-engine spiders assume that they may crawl and index the entire site. Depending on the content of your Web site, you may have a large number of images stored on your server. Search-engine spiders may crawl these images and index them in image search engines such as Google Images. If visitors then find and view your images through a search engine, your server's bandwidth usage could rise unexpectedly.
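The no-file case can be checked with the same standard-library parser: a parser given no rules at all reports every URL as fetchable. (When urllib.robotparser downloads a robots.txt and gets a 404, it likewise treats the whole site as allowed.) The image URL below is hypothetical; Googlebot-Image is the name Google's image crawler uses in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# No rules at all: equivalent to a site with no robots.txt file.
parser = RobotFileParser()
parser.parse([])

# Hypothetical image URL; Googlebot-Image is Google's image-crawler token.
image_url = "http://www.example.com/images/product-photo.jpg"
print(parser.can_fetch("Googlebot-Image", image_url))  # True
```

With no rules in place, nothing keeps spiders away from your image files.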
Prevent this by creating a robots.txt file that disallows search-engine spiders from crawling and indexing your /images directory.
If you sell a copyrighted informational product or piece of software on your Web site, the search engines may be able to find and index your intellectual property. Instead of paying for your product and then downloading it, a savvy Internet user could potentially download it for free.
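Prevent this by creating a robots.txt file that disallows search-engine spiders from crawling and indexing the directory where your product is located, or that disallows a single file. Keep in mind that robots.txt is only a request that well-behaved spiders honor: it does not stop a person who already knows a file's URL from downloading it, so paid downloads still need real access control such as password protection.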
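Applied to the image and product scenarios, the rules might look like the following sketch, checked again with Python's urllib.robotparser. The directory and file names (/images/, /downloads/, /ebook.pdf) are hypothetical placeholders for your own paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths: /images/ and /downloads/ stand in for your own image and
# product directories; /ebook.pdf stands in for a single protected file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /downloads/
Disallow: /ebook.pdf
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.example.com"
for path in ["/images/logo.png", "/downloads/setup.exe", "/ebook.pdf", "/about.html"]:
    allowed = parser.can_fetch("Googlebot", base + path)
    print(path, "allowed" if allowed else "blocked")
```

Disallow values are prefix matches: Disallow: /ebook.pdf would also block /ebook.pdf.zip, and leaving the trailing slash off Disallow: /images would also block /images-archive/. Because anyone can read robots.txt, listing a path also tells curious visitors where it is.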
To create a robots.txt file, you need nothing more than a simple text editor such as Windows Notepad, plus a thorough understanding of which parts of your Web site the search-engine spiders should and should not crawl and index. Save the file as plain text named robots.txt, in lowercase.
A robots.txt file can also tell the search-engine spiders where your XML sitemap is located, using a line such as Sitemap: http://www.example.com/yoursitemap.xml. Extensions to the original robots.txt standard add more directives, such as Crawl-delay, which asks a spider to wait a set number of seconds between requests. Some proposals also let you request that spiders crawl only one page per given time period, or crawl only during certain hours of the day. These directives are requests rather than commands, and support varies: some search engines honor Crawl-delay, while Google ignores it.
Because some search-engine spiders crawl pages very quickly, a crawl delay can help conserve bandwidth. You can use the robots.txt generator at www.mcanerin.com/EN/search-engine/robots-txt.asp to simplify the task of creating a robots.txt file.
You specify a crawl delay and the location of your sitemap, then either disallow all spiders or choose from a list of the most common ones. The tool then generates the text to paste into your robots.txt file, which you upload to the root directory of your server via FTP.
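To see what a generator like this produces, and how a parser reads the result back, here is a hypothetical Python helper. It is not the mcanerin.com tool's code; the function name generate_robots_txt, its parameters, the BadBot spider name, and the URLs are assumptions for illustration. Crawl-delay is a nonstandard directive, so treat the value as a request that only some spiders honor. The site_maps() call requires Python 3.8 or later.

```python
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser

def generate_robots_txt(
    crawl_delay: Optional[int] = None,
    sitemap_url: Optional[str] = None,
    blocked_agents: Iterable[str] = (),
    disallow_all: bool = False,
) -> str:
    """Build robots.txt text from a few settings (hypothetical helper)."""
    lines = []
    # One group per spider that should stay out of the whole site.
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Default group for every other spider.
    lines.append("User-agent: *")
    lines.append("Disallow: /" if disallow_all else "Disallow:")
    if crawl_delay is not None:
        # Seconds between requests; honored by some spiders, ignored by others.
        lines.append(f"Crawl-delay: {int(crawl_delay)}")
    if sitemap_url:
        # Sitemap lines apply to all spiders, regardless of group.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"

robots_txt = generate_robots_txt(
    crawl_delay=10,
    sitemap_url="http://www.example.com/sitemap.xml",
    blocked_agents=["BadBot"],
)
print(robots_txt)

# Read the generated text back with the standard-library parser (Python 3.8+).
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.crawl_delay("Googlebot"))                           # 10
print(parser.site_maps())                                        # ['http://www.example.com/sitemap.xml']
print(parser.can_fetch("BadBot", "http://www.example.com/"))     # False
print(parser.can_fetch("Googlebot", "http://www.example.com/"))  # True
```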