The Shingles algorithm compares two texts and returns a numerical value that reflects their level of similarity.
What can happen if a search engine such as Google or Yahoo
decides that your text was "borrowed" from another site?
Your resource may be excluded from the search results.
How do search engines determine the similarity of texts?
One approach is the Shingles algorithm, which provides a simple duplicate-content check
to establish whether two texts are similar.
How does the Shingles algorithm work?
Each text is split into words, and the resulting arrays of shingles are then compared.
As a result, simply rearranging words or sentences makes no difference
(when the text is split into single words). The text can be split
either one word at a time or into shingles of several consecutive words.
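
The splitting and comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code behind this service: the function names shingles and similarity are hypothetical, and the score is computed as the Jaccard ratio of the two shingle sets (shared shingles divided by all distinct shingles), which is one common choice and only an assumption here.

```python
def shingles(text, size=1):
    """Return the set of word shingles of the given size (assumes size >= 1)."""
    words = text.split()
    # Slide a window of `size` words across the text; fewer words than
    # `size` yields an empty set.
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=1):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0 (assumed metric)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


original = "the quick brown fox"
rewrite = "fox brown quick the"
print(similarity(original, rewrite, size=1))  # word order is ignored
print(similarity(original, rewrite, size=2))  # word order now matters
```

With one-word shingles the reordered text scores as identical, while two-word shingles also take word order into account, so a larger shingle size makes the check stricter.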
This service makes it possible to check content for uniqueness after a document has been changed.
To run a comparison you need the original text and the altered (rewritten) copy.
News
24.08.2009 v1.4
- Full Screen Button
25.07.2009 v1.3
- Add English translation
Before comparison, the text goes through minimal cleaning and normalization:
- Strip HTML tags such as <strong>
- Convert the string to lowercase
- Strip commas, periods, apostrophes, newline characters, double spaces, and slashes
- Remove "stop words"
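
The cleaning steps listed above might look roughly like this in code. It is only a sketch under stated assumptions: the function name clean and the STOP_WORDS set are hypothetical (the actual stop-word list used by this service is not given), and the regular expressions are a simple approximation rather than a full HTML parser.

```python
import re

# Illustrative stop-word list; the real list used by the service is unknown.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in"}


def clean(text):
    """Normalize a text before it is split into shingles."""
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags such as <strong>
    text = text.lower()                        # convert to lowercase
    text = text.replace("'", "")               # drop apostrophes
    text = re.sub(r"[,./\\\r\n]", " ", text)   # commas, periods, slashes, newlines
    # split() collapses repeated spaces; stop words are filtered out here
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)


print(clean("<strong>The</strong> Quick, brown fox.\nIt's in/out"))
```

Running the cleanup on both texts before comparison keeps markup, case, and punctuation differences from affecting the similarity score.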