In its ongoing effort to make the internet an ideal environment for users, Google has asked webmasters to report sites that duplicate content. Known as scraper sites, these sites can often rank higher than the original site because they also employ other SEO tactics. To report a scraper site, Google asks that the original URL, the URL of the duplicate site, and the keywords used by the scraper be included in the Scraper Report.
Google also requests that webmasters submitting Scraper Reports optimize their own websites in accordance with Google's webmaster guidelines. These fall into three categories: design, technical, and quality. Under the design standards, websites should use techniques like a clear site hierarchy and limit the number of links on a page. The technical guidelines cover items such as proper use of robots.txt and ensuring pages work across browsers. The quality guidelines emphasize value to the visitor.
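As a minimal sketch of the robots.txt portion of those technical guidelines, the fragment below tells all crawlers which directories to skip and where to find the sitemap. The paths and sitemap URL are placeholders for illustration, not taken from the article:

```
# Hypothetical robots.txt: allow crawlers to reach content pages
# while keeping utility directories out of the index.
User-agent: *
Disallow: /admin/
Disallow: /search-results/
Allow: /

# Point crawlers at the sitemap (placeholder URL).
Sitemap: https://www.example.com/sitemap.xml
```

A file like this helps search engines crawl a site predictably, which is one of the behaviors Google's technical guidelines ask webmasters to get right before filing a report.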
There are a number of reasons why scraper sites are drawing the ire of Google administrators. First, these sites prevent the authors of outstanding content from receiving their due recognition: scrapers steal content and artificially inflate their own search rankings while obscuring the authorship of the genuine creators. Google firmly believes in authorship that recognizes the efforts of premier content creators, so that they are encouraged to continue producing outstanding web pages for internet users.
Secondly, scraper sites are anathema to search engines. They often usurp the positions of authentic sites, shoving them off the front pages of search results. Because Google has difficulty differentiating between the originator of content and the duplicator, users see muddled results, which undermines Google's credibility and degrades the user experience.
Like vultures, many scraper sites sweep in and steal the most valuable content from new sites, preventing these high quality (at least in content) organizations from growing. In some rare cases, these websites fail to survive once their best material has been scavenged by scrapers. This is partially because new sites can't compete technically with more established organizations, and partially because search engines often grant more authenticity to an established site.
The horrifying irony is that if Google can't differentiate between the author and the scraper, the original author could be at risk of being penalized by Google. Unless the original website can definitively prove it created the content, it may be designated as the scraper!
The new Google Scraper Report system doesn't produce immediate results, but it should help in the long term. There is also a Digital Millennium Copyright Act (DMCA) process that can get duplicated content removed, but this is a more involved and time-consuming process.
This effort is the latest in a series of attempts to identify and penalize content duplicators. In 2011, Google updated its ranking algorithm to try to eliminate scrapers from search results. Last year, Google tried to rectify the situation in which smaller sites producing outstanding content were not ranking highly.