

Ethical SEO Providers
Duplicate and Near Duplicate Content

All search engine optimization should include a programme for inspecting and revising website copy. Think about it: a search engine lists sites purely on the basis of relevancy. This means your site should give the search engine what it wants, in the way it wants it. One of the main ways this is done is by analysing the content on a page to discover what that page should be relevant for.

Google™ and Content Storage and Analysis

Any search engine would find it difficult, if not impossible, to store a complete representation of every page of content it knew of. As a solution, search engines use a reference system for the content instead. In Google this is done by hashing the content. In simple terms, hashing is a reference system in which a single reference can be used many times to represent a specific value. This helps save memory and makes analysing the data more efficient. Think of the Google index as a data set containing a predetermined set of hash values. Each hash value contains specific references to words and sections of web content, which means that hash values reference each other.
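To make the reference idea concrete, here is a minimal Python sketch of a content-addressed store in which hash values reference each other: each distinct word is stored once under its hash, and a sentence is stored as the tuple of word hashes it references. It assumes SHA-256 as the hash function and an in-memory dictionary as the "index"; the class and method names (`HashIndex`, `add_word`, `add_section`, `add_sentence`) are hypothetical, and Google has not published how its own index is built.

```python
import hashlib
import re


class HashIndex:
    """Hypothetical content-addressed store: each distinct value is kept once, keyed by its hash."""

    def __init__(self) -> None:
        # hash -> stored value (a word, or a tuple of the hashes a section references)
        self.entries: dict[str, object] = {}

    @staticmethod
    def _digest(data: str) -> str:
        return hashlib.sha256(data.encode("utf-8")).hexdigest()

    def add_word(self, word: str) -> str:
        """Store a word once (the "w:" prefix keeps words apart from sections) and return its hash."""
        h = self._digest("w:" + word.lower())
        self.entries.setdefault(h, word.lower())
        return h

    def add_section(self, child_hashes: list[str]) -> str:
        """Store a section (sentence, paragraph or page) as the hashes it references."""
        h = self._digest("s:" + ",".join(child_hashes))
        self.entries.setdefault(h, tuple(child_hashes))
        return h

    def add_sentence(self, sentence: str) -> str:
        """Split a sentence into lower-case words, ignoring punctuation, and store it."""
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        return self.add_section([self.add_word(w) for w in words])


index = HashIndex()
s1 = index.add_sentence("Duplicate content is bad.")
s2 = index.add_sentence("Near duplicate content is bad too.")
print(len(index.entries))  # 8: six distinct words plus two sentences
print(index.add_sentence("Duplicate content is bad!") == s1)  # True: same words, same hash
```

Each repeated word ("duplicate", "content", "is", "bad") is stored only once and is referenced by its hash from both sentences. Passing a list of sentence hashes to `add_section` would build a paragraph hash in the same way, and so on up to a single hash for the page.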
Nb: It has been thought that Google stores information as hash values in base 62, e.g. hash values drawn from the set (a-z, A-Z, 0-9) = 26 + 26 + 10 = 62 characters. (This is an idea rather than something Google has disclosed, to our knowledge.)

If so, Google will tear your web documents apart and reduce your content to hash values. Every word can be hashed, and the word hashes combined into a larger, more representative hash for each sentence. Sentence hashes can then be combined into a hash for each paragraph, and paragraph hashes into a hash for the page. By this point your page of content has been reduced to a single value. We think this hash value may be analogous to the document IDs used by Google.

Google now has a simply referenced piece of information. If, when it analyses your page, it reduces it to a value that is exactly the same as that of a pre-existing piece of content, whether an entire page or a smaller unit such as a paragraph tag, the match indicates that you have placed duplicate content on your site. By the same logic, if a competitor steals your content, they should not gain any benefit from it.

Google and Near Duplicate Content

By referencing sections and pages of content against a semantically defined copy of that content, it would be possible to compare content to see whether it is near duplicate content: content that is not exactly the same but is similar enough to be treated as a duplicate. Every word has semantic relationships; that is, other words exist that are closely related to it. This information is stored in what are known as 'terminal nodes' of the Google Latent Semantic Index. Each terminal node represents a word and contains its related words and the strength of the relevancy between them.
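The two checks described above, an exact match on a page-level hash and a looser similarity test for near duplicates, can be sketched as follows. This is an illustration under stated assumptions, not Google's method. The page fingerprint is the first 8 bytes of a SHA-256 digest written in base 62 with a 62-character alphabet (a-z, A-Z, 0-9); for brevity it hashes the whole normalised page in one step rather than word by word. Near duplicates are detected with word shingles and Jaccard similarity, a standard published technique swapped in here for the undisclosed semantic index. The 0.5 threshold, the shingle size of 3 and all function names are hypothetical.

```python
import hashlib
import re

# 62-character alphabet: a-z, A-Z, 0-9 (this ordering is an assumption)
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def to_base62(n: int) -> str:
    """Write a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def words(text: str) -> list[str]:
    """Lower-case the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def page_id(text: str) -> str:
    """Hypothetical exact fingerprint: 8 bytes of SHA-256 over the normalised words, in base 62."""
    digest = hashlib.sha256(" ".join(words(text)).encode("utf-8")).digest()
    return to_base62(int.from_bytes(digest[:8], "big"))


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All runs of k consecutive words (word shingles)."""
    w = words(text)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles of the two texts."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(text: str, known: dict[str, str], threshold: float = 0.5) -> str:
    """Compare text against known pages, given as {page_id: text}."""
    if page_id(text) in known:
        return "duplicate"
    s = shingles(text)
    if any(jaccard(s, shingles(other)) >= threshold for other in known.values()):
        return "near duplicate"
    return "unique"


original = "Stay away from duplicate content and near duplicate content on your site."
known = {page_id(original): original}
print(classify("Stay away from duplicate content and near duplicate content on your site!", known))  # duplicate
print(classify("Stay away from duplicate content and near duplicate content on any site.", known))  # near duplicate
print(classify("Write your own content so it is unique and original.", known))  # unique
```

In the second example 8 of the 12 distinct shingles are shared, a Jaccard similarity of about 0.67, so it is flagged as a near duplicate even though its fingerprint differs. Note that shingling compares shared word sequences, not word meanings, so unlike the semantic approach described above it would not catch a rewrite that swaps in synonyms.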
By storing a semantic representation of the text, it is possible to compare the similarity of sentences, paragraphs and pages more easily than was possible before the advent of the semantic index.

Why does Google care about duplicate and near duplicate content?

Search engines try their best to return relevant results for every individual query. Content is a major factor when assessing the relevancy of a web document. If two documents have the same or highly similar content, they will have the same content-based relevancy score. This would skew the results: all the high-ranking web documents could have the same content and would offer search engine users little of use after the first result.

What will Google do to a site for having duplicate or near duplicate content?

Google will apply a penalty to a web document that posts content duplicating, or nearly duplicating, that of another web document. This content-based penalty is likely to be a point-score penalty.

Conclusion

Stay away from duplicate and near duplicate content. By all means get an SEO copywriter to edit, refine or create the content for your site, but remember that even the best SEO copywriter may not have expertise in your business area. It is often best to supply them with a draft of the page and ask them to edit it. The best solution is to write your own content: you will know it is not duplicated and is completely unique and original, so you will not encounter the penalties associated with duplicate and near duplicate content.