What Are Search Engine Spiders? Part Two

Controlling Search Engine Spiders whilst on your website

Upon arrival at your website, a search engine spider first looks for a robots.txt file. This text file is used to limit the behaviour of the spider whilst it indexes your site. In this file you can specify the particular web documents, or groups of web documents, that you do not wish to be indexed. The spider reads this information before any of the pages within the site are indexed. More information on the use of robots.txt files can be found at http://www.conseoquences.com/articles/robots.php, http://www.conseoquences.com/articles/writing-a-robots-file.php and http://www.robotstxt.org/wc/exclusion.html
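As a rough illustration of how a polite spider interprets these instructions, the sketch below feeds a small robots.txt file to Python's standard urllib.robotparser module and asks whether particular pages may be fetched. The file contents, the paths, the www.example.com address and the spider names "ExampleSpider" and "BadBot" are all made up for this example; they are assumptions, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The paths and the "BadBot" name are
# illustrative assumptions only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/report.html

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite spider asks this question before requesting each page.
for agent, url in [
    ("ExampleSpider", "http://www.example.com/index.html"),
    ("ExampleSpider", "http://www.example.com/private/data.html"),
    ("BadBot", "http://www.example.com/index.html"),
]:
    verdict = "may fetch" if parser.can_fetch(agent, url) else "must skip"
    print(f"{agent} {verdict} {url}")
```

Note that the file only expresses a request. Nothing in this check prevents a spider from ignoring the answer and fetching the page anyway.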

Impolite Spiders and Scrapers

Not all web spiders are polite enough to obey the instructions listed in the robots.txt file. Sometimes it may be necessary to exclude them by the use of .htaccess files. Particularly impolite spiders are designed to scrape e-mail addresses, which are later used to send unsolicited spam e-mails or to spoof your e-mail address when sending out malicious e-mails.

Which Search Engine Spiders have visited your Site?

By inspecting your server logs you can find out exactly which search engine spiders and other crawlers have been visiting your website. Spiders can be identified in the logs by their User Agents: every legitimate spider leaves a note in these log files saying who it is. The logs will also give you an idea of which pages are being spidered.
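The following sketch shows one way such a log inspection might be automated. It assumes the server writes its access log in the common "combined" format, where the User Agent is the last quoted field on each line. The sample log lines, the IP addresses and the short list of spider names are invented for illustration, so adjust both the pattern and the list to suit your own server.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ip - - [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative list of substrings that identify well-known spiders.
SPIDER_TOKENS = ("googlebot", "bingbot", "slurp")


def spider_hits(lines):
    """Count requests per (spider, page) for lines with a known User Agent."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        for token in SPIDER_TOKENS:
            if token in agent:
                hits[(token, match.group("path"))] += 1
                break
    return hits


# Made-up sample lines; real logs would be read from the server's log file.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" '
    '200 1200 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '192.0.2.1 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
]

hits = spider_hits(SAMPLE_LOG)
for (spider, path), count in sorted(hits.items()):
    print(f"{spider} requested {path} {count} time(s)")
```

Bear in mind that a User Agent is simply whatever the visitor chooses to send, so an impolite crawler can claim to be a well-known spider. Treat counts like these as a guide rather than proof.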

If your website is for business use, it is often a good idea to implement a good quality web statistics package. This will give you not only information regarding search engine spiders but, in some cases, also user behaviour and visiting patterns. If you are looking for an inexpensive package, Indicium Web Design have a simple-to-use package which will give you basic information; you may need to call their office for a quote, as their analysis packages are bespoke. For a cost-effective package offering spidering information, user information and search terms analysis, Open Tracker is good value, and with the free trial you can find out how much benefit your business will gain from its use. If you are a larger business, the best analysis package we have seen on the Internet is Urchin; this user and search engine analysis company was bought by Google™ in 2005.


Copyright Information

This article or tutorial relating to Search Engine Optimization is provided by conSEOquences. This article and the copyright relating to it remain the sole property of its owners. We do allow our articles to be reproduced by other web site owners, but only when the article is credited as being from the conSEOquences site and a link to the original article is provided.