What are the Causes of Duplicate Content?

Duplicate content is content that appears, as an exact or near-exact copy, in more than one place, either on different pages of the same website or on other sites. Search engines such as Google handle duplicate content poorly, which can hurt your visibility and rankings. In this article, our SEO pros from Perth will help you understand the issues and causes of duplicate content.

What are the issues caused by duplicate content?

  • Search engines cannot tell which version of the content they should index or rank in search results, so your website can lose visibility and ranking.
  • Search engines cannot tell which version should be assigned the link equity.
  • Inbound links from other sites get spread across multiple pages instead of pointing to one, which dilutes the link equity of every version.

What are the causes of duplicate content?

WWW vs non-WWW and HTTP vs HTTPS

When the same website is accessible at both its www and non-www addresses (or over both HTTP and HTTPS), every webpage exists at more than one URL and is treated as duplicated.
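As a rough illustration, the sketch below maps every www/non-www and HTTP/HTTPS variant of a page to one preferred URL. The preferred scheme and host (HTTPS and www.example.com) and the function names are assumptions made for this example; your own site may prefer the non-www form.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this sketch: HTTPS and the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def _strip_www(host: str) -> str:
    """Drop a leading 'www.' so both host variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def preferred_url(url: str) -> str:
    """Map any www/non-www, HTTP/HTTPS variant to the preferred URL.

    Ports and credentials in the URL are ignored in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if _strip_www(host) != _strip_www(PREFERRED_HOST):
        return url  # a different site: leave it untouched
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )
```

A server could compare the requested URL with preferred_url(requested) and redirect whenever the two differ.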

Session IDs

In online stores, the visitor’s history is tracked using sessions. The sessions need to be stored, and cookies are usually used for this. However, search engine crawlers often don’t store cookies. So, as an alternative, some systems use unique identifiers called Session IDs to differentiate sessions.

Every session ID is unique, and every internal link on the website gets that session’s ID added to its URL. Search engines treat each of these URLs as a new page, which creates duplicate content.
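To make this concrete, here is a minimal sketch that removes a session ID parameter from a URL so that all sessions share one address. The parameter names in SESSION_PARAMS are assumptions for illustration; check which name your own system actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; your platform may use a different one.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL without any session ID query parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, strip_session_id("https://www.example.com/cart?sid=abc123&item=5") keeps only the item parameter.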

Order of URL variables

The order of URL parameters in a CMS can lead to duplicate content. A CMS may create URLs like /?P1=1&P2=2 and /?P2=2&P1=1, where “P1” is parameter 1 and “P2” is parameter 2. Even though both URLs return the same result in most cases, search engines treat them as separate URLs.
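One way to avoid this is to always emit parameters in a fixed order. The sketch below sorts them alphabetically; the function name sort_query is hypothetical, and it assumes that parameter order does not change what the page returns.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query(url: str) -> str:
    """Return the URL with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

With this in place, both /?P1=1&P2=2 and /?P2=2&P1=1 reduce to the same URL.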

URL variables

URL variables used for tracking and sorting can cause duplicate content. For example, look at the URLs below.

https://www.example.com/product-1?
https://www.example.com/product-1?source=rss

Both these URLs lead to the same page. However, search engines may not detect this and can treat the URLs as two different pages containing duplicate content.
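If you have a list of URLs from your own site, a short script can point out which ones differ only by tracking parameters. This is a sketch under assumptions: the TRACKING_PARAMS set and the function name group_duplicates are illustrative, and "source" is taken from the example above.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend this set to match your own site.
TRACKING_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def group_duplicates(urls):
    """Group URLs that are identical once tracking parameters are removed."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        kept = [
            (key, value)
            for key, value in parse_qsl(parts.query)
            if key not in TRACKING_PARAMS
        ]
        key_url = urlunsplit(parts._replace(query=urlencode(kept), fragment=""))
        groups[key_url].append(url)
    # Keep only the groups that contain more than one URL.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each key in the result is a candidate for the single URL that the other variants should point to.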

Scraped or copied content

Duplicate content can also be caused by other websites copying your content. These websites don’t always ask for your approval or link back to your original content, so the search engine treats it as duplicate content.

Likewise, when multiple online shops sell the same products, they tend to use the brand’s original description for those products on their websites. As a result, identical content appears on different websites.

Comment pagination

Many CMSs paginate comments. Each page of comments gets its own URL, which duplicates the article content.

E.g., article URL and article URL/comment-page-1/
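As a small sketch, a paginated comment URL can be mapped back to the article URL by removing the trailing page segment. It assumes the /comment-page-N/ pattern shown above and a URL without a query string; the function name article_url is hypothetical.

```python
import re

# Matches a trailing /comment-page-N/ segment (pattern assumed from the example above).
COMMENT_PAGE = re.compile(r"/comment-page-\d+/?$")


def article_url(url: str) -> str:
    """Return the article URL for a paginated comment URL."""
    return COMMENT_PAGE.sub("/", url)
```

URLs that do not end in a comment page segment are returned unchanged.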

Printer-friendly versions

Printer-friendly versions created by a CMS can also cause duplicate content issues when both the original and the printer-friendly version get indexed. Unless you specifically block these versions, Google may include them while indexing.

These are the common causes of duplicate content. You can fix duplicate content by specifying which version on your website is the original, either by redirecting the duplicates to it or by using canonical tags.
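The two fixes can be sketched as follows. The helper names canonical_tag and permanent_redirect are hypothetical, and the redirect is shown only as a status code and header pair rather than tied to any particular web server or framework.

```python
from html import escape


def canonical_tag(original_url: str) -> str:
    """Build the tag a duplicate page places in its <head> to name the original."""
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


def permanent_redirect(original_url: str):
    """Describe a 301 (permanent) redirect from a duplicate URL to the original."""
    return 301, {"Location": original_url}
```

A canonical tag suits duplicates that visitors still need to reach, such as a printer-friendly page, while a redirect suits duplicates that should not be visited at all, such as the HTTP version of an HTTPS site.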

Jargon Buster

Content Management System (CMS) – Software used to create and manage digital content.
Hypertext Transfer Protocol Secure (HTTPS) – An extension of the Hypertext Transfer Protocol used for secure communication over the internet.
Uniform Resource Locator (URL) – The web address of a specific page or file on the internet. It includes the protocol, the domain name, and additional path information.
Session ID – A unique identifier assigned to a specific user by the site’s server for the duration of that user’s visit. It can be stored in a cookie, URL or form field.
Cookies – Text files containing small pieces of identification data sent by the server to your browser when you visit a website.

Peter Machalski | Blog author | Computing Australia

Peter

Peter is the Systems Operations Manager at The Computing Australia Group. He is responsible for managing and maintaining uptime for thousands of client servers. It is a busy portfolio with a lot of responsibility because clients depend on their systems being accessible practically 24 hours a day. It is a far cry from when he started in the industry, when most people just worked Monday to Friday, 9 to 5, and we had plenty of time to maintain systems after hours. He also works across other portfolios at The CAG, including projects and service delivery.
