Writing an Appropriate Robots.txt file
A robots.txt file lets you tell spiders which parts of your site
to exclude from spidering. Conversely, spiders know that all other
areas of your site are available to be crawled, indexed and
included in the Search Engine Result Pages (SERPs). Writing an
effective robots.txt can help ensure that search engines maintain
confidence in your domain.
Creating a Robots.txt file
Simply create a plain text document and save it with the name
robots.txt. Upload this file to the root directory of your domain,
so that it is reachable at /robots.txt, to start instructing the
search engine spiders.
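As a rough sketch of the step above, the file can also be generated with a few lines of Python. The function name write_robots_txt and the temporary directory standing in for your site's root directory are assumptions made for illustration; uploading the result to your server is left out.

```python
from pathlib import Path
import tempfile


def write_robots_txt(site_root: Path, lines: list[str]) -> Path:
    """Write the given rule lines to <site_root>/robots.txt and return its path."""
    target = site_root / "robots.txt"
    # One directive per line, with a trailing newline at the end of the file.
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


# Hypothetical usage: a temporary directory stands in for the site root.
with tempfile.TemporaryDirectory() as root:
    path = write_robots_txt(Path(root), ["User-agent: *", "Disallow: /images"])
    print(path.name)
    print(path.read_text(encoding="utf-8"))
```

The file name is fixed as robots.txt because spiders look for exactly that name at the root of the domain.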
Anatomy of a robots.txt file
- User-agent : This line of your robots.txt file
specifies the individual spider that is to obey a given set of
rules. Below is a list of the most common user agents you are
likely to use:
- * - wildcard user agent, which means all spiders
- googlebot - The Google spider
- msnbot - The MSN spider
- slurp - The Yahoo! spider
- For a more complete list of user agents and the search engines
or crawlers they are used by refer to the information located at
psychedelix.
- Syntax : Name the spider that the rules apply to
in the following format
User-agent: *
This, for example, informs all search engine spiders and crawlers
that the rules that follow apply to them.
- Disallow : This second part of a robots
exclusion protocol allows you to dictate which directories and
files within your domain you wish to exclude.
- Disallowing image files is often useful for conserving the
bandwidth of your server, especially on image-dense sites.
- Syntax : Having said which spider the rule
applies to, we set the directories or folders that are not to be
spidered.
Disallow: /images
It is important to note that all paths are relative to the root
of your site and begin with a forward slash. As with spiders, you
can disallow all of a site by simply using a command such as
Disallow: /
- White Space and Comments
- White space is allowed in a robots.txt file but is
not encouraged.
- To comment out a line of the code in a robots exclusion
protocol file, simply start the line with the '#' character.
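One way to see how the User-agent, Disallow and comment lines fit together is to run a sample file through a parser. The sketch below uses Python's standard urllib.robotparser module. The sample rules and the helper name is_allowed are assumptions made for illustration, and real search engine spiders may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, assumed for illustration only.
ROBOTS_TXT = """\
# Keep every spider out of the image directory
User-agent: *
Disallow: /images

# Keep googlebot out of the whole site
User-agent: googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the rules let the given user agent fetch the path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; '#' comments are ignored.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for agent, path in [
    ("msnbot", "/images/logo.png"),
    ("msnbot", "/about.html"),
    ("googlebot", "/about.html"),
]:
    print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

In this sample, msnbot has no rules of its own and so falls back to the wildcard group, while googlebot follows only the group that names it.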
 This work is licensed under a Creative Commons Attribution-No Derivative Works 2.5 License.