Ensuring that your site is Spiderable.

Your site can only be ranked for pages that have been indexed, so make sure that it is spiderable.

Within ethical SEO, the core concern is to make your web documents as relevant as possible for particular queries. Relevancy is determined by two central factors: links and content. Search engines can only index, and consequently rank, pages that they can find, process and cache. If a search engine spider cannot get to the content within your web documents, it cannot process that content, and your website will not receive the rankings it might otherwise achieve. It is therefore essential to ensure that every search engine spider can find every page within your site, and that each page is re-indexed as soon as possible after it has been changed.

What will prevent a Search Engine Spider finding your web pages?

One of the major factors determining whether a page can be spidered is the URL of that page. Often you will see that the web addresses of pages contain variables. It is the use of these variables within the page URL that can affect whether or not the pages within a particular site are correctly indexed by the search engines.

The Google Information for Webmasters states that:

If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.

This advice has been borne out on many websites we have seen in the past. Unfortunately, repairing these URLs can be problematic, depending upon how they are created.
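Before repairing anything, it helps to know which of your URLs are dynamic and how many parameters they carry. The following is a minimal Python sketch of such a check, not a definitive tool. The function name `describe_url`, the example addresses and the `max_params` threshold of 2 are all assumptions made for illustration; Google's advice above gives no specific number.

```python
from urllib.parse import urlsplit, parse_qsl


def describe_url(url, max_params=2):
    """Report whether a URL is dynamic and how many parameters it carries.

    max_params is an arbitrary threshold chosen for this sketch only.
    """
    # Everything after the "?" in the address is the query string.
    query = urlsplit(url).query
    # Split the query string into (name, value) pairs.
    params = parse_qsl(query, keep_blank_values=True)
    return {
        "dynamic": bool(query),
        "param_count": len(params),
        "too_many": len(params) > max_params,
    }


if __name__ == "__main__":
    for address in (
        "http://www.example.com/widgets/blue/",
        "http://www.example.com/page.php?cat=3&id=42&sess=abc",
    ):
        print(address, describe_url(address))
```

Running a check like this over a list of your own page addresses gives a quick picture of which sets of pages are most likely to need attention.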

Database Driven Websites with Dynamic URLs

Dynamic websites that we have worked on in the past fall into two main categories: those on Apache servers and those on Microsoft servers. On Apache servers the best way is to create a system of URL rewriting rules in the .htaccess file. This is a simple enough task for a competent web developer. All they need to do is create rewrite rules that utilise regular expressions, as each set of dynamically created pages will utilise URLs that have the same order of variables in them.

On Microsoft servers the process is generally more complicated. The most effective way to do the redirect is to use your 404 error page and redirect from there to the correct page with a new URL, as Microsoft servers have traditionally had no simple built-in equivalent of these URL rewriting rules. This process is likely to demand a far higher level of skill from your web developer than is usually found, and is more likely to require someone who is highly experienced in Microsoft servers, your web-based programming language and regular expressions. Companies such as Oyster Web or Indicium Web Design should be able to help.
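To illustrate the regular expression idea behind the Apache rewrite rules described above, here is a minimal sketch in Python rather than in .htaccess syntax. It maps a static-looking address onto the dynamic URL that the underlying script expects. The path layout `/products/<category>/<id>/`, the script name `product.php` and the variable names `cat` and `id` are hypothetical, chosen only for this example; your own site will have its own.

```python
import re

# Hypothetical static-looking address: /products/<category>/<id>/
# The two groups capture the numeric category and item id, in that order.
STATIC_PATTERN = re.compile(r"^/products/(\d+)/(\d+)/?$")


def rewrite(path):
    """Map a static-looking path to the dynamic URL the script expects.

    Returns None when the path is not one of the rewritten pages.
    """
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None
    category, item = match.groups()
    # The variables always appear in the same order, which is what makes
    # a single pattern enough for the whole set of pages.
    return f"/product.php?cat={category}&id={item}"


if __name__ == "__main__":
    print(rewrite("/products/3/42/"))
    print(rewrite("/about.html"))
```

The spider only ever sees the address without a "?" character, while the server quietly translates it back into the dynamic form. On Apache the same pattern would be written as a rewrite rule in the .htaccess file rather than in application code.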

Using Appropriate HTTP headers to ensure Correct Spidering.

Websites that use dynamic web-based technologies can often send out misleading or incorrect HTTP headers. The HTTP headers of central concern for web spidering are the Last-Modified and If-Modified-Since headers. Last-Modified is a response header that the server sends with a page to say when that page last changed, whereas If-Modified-Since is a request header that a spider sends on a later visit, turning its request into a conditional GET: if the page has not changed since the date given, the server can answer 304 Not Modified instead of sending the whole page again. It is the former of these two that is most important, because the date a spider sends in If-Modified-Since is taken from the Last-Modified value it received earlier. Flat HTML pages send a correct Last-Modified header automatically whenever the file is altered. Dynamic pages do not do this. Again, it is best to consult an SEO-trained web developer on how best to make this change. Please note: do not spam by changing the headers every time a page is visited, or every time that Google or another search engine spider visits it.
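The exchange between these two headers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a drop-in implementation: the function name `respond` is hypothetical, and it assumes your application can supply `last_changed`, the time the page content genuinely last changed, as a UTC datetime (for example from a database timestamp).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_changed, if_modified_since=None):
    """Return (status, headers) for a page last changed at last_changed.

    last_changed must be a timezone-aware datetime in UTC.
    if_modified_since is the raw request header value, if the spider sent one.
    """
    # HTTP dates have one-second resolution, so drop the microseconds.
    last_changed = last_changed.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_changed, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the page
        if since is not None and since.tzinfo is not None and last_changed <= since:
            # Nothing has changed since the spider's last visit.
            return 304, headers
    return 200, headers


if __name__ == "__main__":
    changed = datetime(2006, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(respond(changed))
    print(respond(changed, "Sat, 01 Apr 2006 12:00:00 GMT"))
```

Note that the date comes from when the content really changed, not from the time of the request, which is what keeps the header honest.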

If you would like more information on making your site spiderable give us a call and see how we can help you enhance your search engine rankings ethically.



Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 2.5 License.