Writing an Appropriate robots.txt File

A robots.txt file lets you tell search engine spiders which parts of your site to leave out of their crawl. Conversely, spiders treat every area you have not excluded as available to be crawled, indexed and included in the Search Engine Result Pages (SERPs). Writing an effective robots.txt can help ensure that search engines maintain confidence in your domain.

Creating a Robots.txt file

Simply create a plain text document and save it as robots; a text editor that adds the .txt extension for you will name the file robots.txt. Upload this file to the root directory of your domain, so that it is served at /robots.txt, to start instructing the search engine spiders.
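
Because spiders only look for the file at the root of a domain, it can help to check where that is for a given page. The following is a minimal sketch in Python, assuming a hypothetical helper named robots_url and the placeholder domain www.example.com; neither comes from the original article.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves page_url.

    robots_url is a hypothetical helper written for this illustration.
    """
    # A leading slash makes urljoin discard the page's own path, so the
    # result always points at the root of the same scheme and host.
    return urljoin(page_url, "/robots.txt")


# www.example.com is a placeholder domain, not a real site from the article.
print(robots_url("https://www.example.com/articles/seo/robots.html"))
```

Whatever page on the site you start from, the result points at the same single file in the root directory, which is why a robots.txt placed in a subdirectory is not picked up.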

Anatomy of a robots.txt file

  1. User-agent : This line of your robots.txt file names the individual spider that is to obey the set of rules that follows it. Below is a list of the most common user agents you are likely to use:
    • * - wildcard user agent which means all
    • googlebot - the Google spider
    • msnbot - the MSN spider
    • slurp - the Yahoo! spider
    • For a more complete list of user agents and the search engines or crawlers they are used by refer to the information located at psychedelix.
  2. Syntax : Name the spider that the rules apply to in the following format
    User-agent: *
    This, for example, informs all search engine spiders and crawlers that the rules that follow apply to them.
  3. Disallow : This second part of a robots.txt record lets you dictate which directories and files within your domain you wish to exclude.
    • Disallowing image files is often useful for conserving your server's bandwidth, especially on image-dense sites.
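  4. Syntax : Having said which spider the rule applies to, we set the directories or files that are not to be spidered.
    Disallow: /images
    Each rule is matched against the start of the URL path, so this also covers everything inside /images. It is important to note that all paths are relative to the root of your site and begin with a forward slash. Just as * selects all spiders, you can disallow all of a site, in this case by using the root path itself:
    Disallow: /
    Note that the value is / and not *; the original robots exclusion standard does not define a wildcard for Disallow paths.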
  5. White Space and Comments
    • White space is allowed in a robots.txt file but is not encouraged.
    • To comment out a line in a robots.txt file, simply start it with the '#' character. Spiders ignore everything from the '#' to the end of the line.
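
The pieces described above (User-agent lines, Disallow lines and comments) can be tried out before uploading anything. The sketch below uses Python's standard urllib.robotparser module to read two sample rule sets and ask whether a given spider may fetch a given URL. The rule sets, the /private directory, the www.example.com domain and the is_allowed helper are all assumptions made for illustration; real search engine spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Sample rules, invented for this illustration.
SAMPLE_RULES = """\
# Rules for every crawler
User-agent: *
Disallow: /images

# Extra rule for Google's spider only. With this parser a spider that has
# its own record follows that record instead of the * record, so /images
# is repeated here.
User-agent: googlebot
Disallow: /images
Disallow: /private
"""

# Shutting every spider out of the whole site uses the root path.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

SITE = "https://www.example.com"  # placeholder domain


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules text.

    is_allowed is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse() expects a list of lines
    return parser.can_fetch(agent, url)


# Print the verdict for a few spider and path combinations.
for agent, path in [
    ("msnbot", "/index.html"),
    ("msnbot", "/images/logo.png"),
    ("msnbot", "/private/notes.html"),
    ("googlebot", "/private/notes.html"),
]:
    print(agent, path, is_allowed(SAMPLE_RULES, agent, SITE + path))

print("slurp", "/index.html", is_allowed(BLOCK_EVERYTHING, "slurp", SITE + "/index.html"))
```

In this sketch msnbot falls back to the * record, so only /images is closed to it, while googlebot is also kept out of /private. The comment lines in the sample are skipped by the parser, as described in the list above.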

Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 2.5 License.