Latent Semantic Indexing and Google

Latent Semantic Indexing is a development in the search engine industry that allows more accurate analysis of text and helps search engines determine which web documents are 'keyword spamming'. Both MSN and Google have filed patents regarding the development of systems for semantic indexing.

What is Latent Semantic Indexing?

Latent
Present or potential but not evident or active.
Semantic
Of or relating to meaning, especially meaning in language.
Indexing
Creating indexes based on key data fields or keywords.

Taken together, these definitions tell us that Latent Semantic Indexing, also referred to as LSI, means understanding the meaning of a text from the concepts related to it, even when that meaning is not explicit in the text, and then storing those meanings in an index.

Why are modern search engines concerned with Latent Semantic Indexing?

One of the most commonly used unethical SEO techniques is what is referred to as 'Keyword Spamming'. Keyword Spamming is where you repeat a keyword within your text ad nauseam, purely to increase the density of that keyword on the page and so raise the search engine rankings for that document. Latent Semantic Indexing counters this by using all the words in a document, not just the repeated ones, to ascertain what the underlying focus of that document is.
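
To make the idea of keyword density concrete, here is a minimal Python sketch. The function name `keyword_density`, the simple tokenizer, and the two sample texts are assumptions made purely for illustration; they are not taken from any search engine's implementation.

```python
import re


def keyword_density(text, keyword):
    """Return the fraction of words in `text` that equal `keyword` (case-insensitive)."""
    # Naive tokenizer (an assumption): lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Hypothetical sample texts, invented for this example.
spammy = "Cheap hotels. Hotels near you. Best hotels, top hotels, hotels hotels."
natural = "We compared room rates at several hotels and guest houses in the city."

print(keyword_density(spammy, "hotels"))
print(keyword_density(natural, "hotels"))
```

In the first text more than half of the words are the keyword, while in the second it appears only once; a density that high is the kind of signal a spam check might look for.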

In which ways do search engines determine the underlying meaning of a page?

All search engines determine the relevance, or focus, of a web document in two ways: from the document itself (On Document) and from sources outside it (Off Document). Latent Semantic Indexing could operate upon both of these relevancy indicators.

The On Document content could be analysed in the following way; this process is inferred from reading the patents regarding Latent Semantic Indexing. First the page is analysed for keyword spam: Does the document have a high level of repeated keywords? Are these repeated keywords spread throughout the document? Are all these related keywords in close proximity to each other? If any of these indicators of keyword spam is detected, the document could be re-evaluated for keyword focus by removing the most common keywords or phrases and all stop words; stop words are commonly occurring words that search engines choose to disregard, such as 'it' and 'and'. For example, if a document has been subjected to optimized copywriting procedures for the phrase "Search Engine Optimization", this phrase could be removed from the text. The search engine would then re-evaluate the document by analysing the remaining words and using the words that are related to that term.
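
The re-evaluation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the stop-word list, the `spam_threshold` value, the sample page, and the function names are all hypothetical, and nothing here confirms that any search engine actually works this way.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "it", "is", "of", "to", "in", "for", "on", "our"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remaining_terms(text, spam_threshold=0.2):
    """Drop stop words and over-repeated keywords, then count what is left.

    spam_threshold is an assumed cut-off: any non-stop word that makes up
    more than this share of the content words is treated as a spammed keyword.
    """
    content = [w for w in tokenize(text) if w not in STOP_WORDS]
    if not content:
        return Counter()
    counts = Counter(content)
    spammed = {w for w, n in counts.items() if n / len(content) > spam_threshold}
    return Counter(w for w in content if w not in spammed)


# Hypothetical page text, invented for this example.
page = (
    "Hotels hotels hotels. Book hotels in Dublin. Our hotels offer rooms, "
    "breakfast and parking. Hotels with rooms and breakfast in Dublin."
)
print(remaining_terms(page).most_common(3))
```

Once the over-repeated keyword and the stop words are set aside, the words that remain give a rough picture of what the page is actually about, which is the kind of evidence the re-evaluation would rely on.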

More information on Latent Semantic Indexing and Google can be found on the following page:

Oyster Web Article on Latent Semantic Indexing



Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 2.5 License.