When using Google Analytics (GA), it is very difficult to tell whether you are analyzing your data correctly. Several factors need to be taken into account to make your data as accurate as possible. You achieve this by removing the data and traffic sources that you don't need or that don't reflect your customers' traffic behaviour.

Good data has a story to tell, but great data tells a story. That is why you must make sure your data is clean and set up correctly: it makes your story easier to read.

We will first look at how to clean your data in Google Analytics, and then at how to set up filters and segments to find more answers about your audience.

Data cleaning in Google Analytics

Filtering out spam

Although GA recognizes some common spam sources, there are still sources it cannot detect, and these need to be excluded manually. You want to remove spam because this bad data hides the story your analytics is trying to tell: the cleaner the data, the clearer the story. Common spam sources include:

  • Referral sources. Check which websites are referring traffic to your site. You may find links that make no sense or are not useful. You want to remove them, because this is not real traffic but traffic created by bots.
  • Language. The language report is another place where injected entries show up. Spammers send fake hits through the Measurement Protocol and use this technique to insert links to irrelevant articles.

To solve these common issues you can:

  • In the view settings, enable bot filtering ("Exclude all hits from known bots and spiders") to catch the first wave of spam.
  • Create a filter that excludes known unreliable traffic sources.
  • Set up a custom dimension and build a filter based on that dimension.
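GA view filters match a field against a regular expression and then include or exclude the hit. The Python sketch below mimics an exclude filter to show that logic; the hit dictionaries, the field names `source` and `language`, and the spam domains are hypothetical, and in GA itself you would type the pattern into the filter settings rather than run code.

```python
import re

# Hypothetical spam domains; replace them with the sources you actually see in your reports.
# "(^|\.)" matches the domain itself or any of its subdomains.
REFERRAL_SPAM = re.compile(r"(^|\.)(spam-example\.com|free-traffic\.example)$", re.IGNORECASE)
# A typical language value looks like "en" or "en-us"; free text or URLs do not fit this shape.
VALID_LANGUAGE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$", re.IGNORECASE)


def is_spam(hit: dict) -> bool:
    """Return True if the hit matches a referral-spam or language-spam pattern."""
    source = hit.get("source", "")
    language = hit.get("language", "")
    if REFERRAL_SPAM.search(source):
        return True
    # Injected Measurement Protocol hits often carry a message or URL in the language field.
    if language and not VALID_LANGUAGE.match(language):
        return True
    return False


def exclude_spam(hits: list[dict]) -> list[dict]:
    """Mimic an exclude filter: keep only the hits that are not spam."""
    return [hit for hit in hits if not is_spam(hit)]


hits = [
    {"source": "facebook.com", "language": "en-us"},
    {"source": "free-traffic.example", "language": "en-us"},
    {"source": "google", "language": "Visit spam-example.com now"},
]
print(exclude_spam(hits))  # [{'source': 'facebook.com', 'language': 'en-us'}]
```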

Removing internal hits

This traffic is not spam and is not generated by bots; it comes from your own team. It is not bad per se, but it does not give an accurate picture of who your visitors are. If you are not aware of it, you might make bad marketing decisions based on unreliable data. Because GA records every interaction as a hit, your team's activity inflates the data, so you must make sure it is excluded. There are several ways to remove it; add the following filters in test views first:

  • The easiest option is the Chrome version of the Google Analytics Opt-out Browser Add-on. Once it is installed, that browser stops sending data to GA, which makes it ideal for small teams.
  • Filter by location. Exclude the traffic from the city where you and your team are located.
  • Filter by IP address. If your IP address is known and static, you can exclude all traffic from it. This does not work with a dynamic IP address, because the address keeps changing.
  • Filter by country. Exclude traffic from the country where your team is located, or from countries you are not targeting.
  • Filter by campaign source. Mark internal visits by passing a cookie, then filter the data by that source.
  • Finally, you can clean the data with Google Tag Manager.
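An IP-address filter only works because a static address, or a fixed range, can be matched exactly. This minimal Python sketch shows that matching with the standard `ipaddress` module; the networks are placeholder documentation ranges standing in for your office IPs, and the hit format is an assumption.

```python
import ipaddress

# Placeholder office networks (documentation-only ranges); substitute your own static IPs.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # hypothetical office range
    ipaddress.ip_network("198.51.100.7/32"),  # hypothetical single static IP
]


def is_internal(ip: str) -> bool:
    """Return True if the visitor IP falls inside one of the internal networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)


def exclude_internal(hits: list[dict]) -> list[dict]:
    """Drop hits that come from internal IPs, as an IP exclude filter would."""
    return [hit for hit in hits if not is_internal(hit["ip"])]


hits = [{"ip": "203.0.113.42"}, {"ip": "198.51.100.7"}, {"ip": "192.0.2.10"}]
print(exclude_internal(hits))  # [{'ip': '192.0.2.10'}]
```

With a dynamic IP address, `INTERNAL_NETWORKS` would go stale as soon as the address changes, which is why this option fails in that case.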

Cross-Domain Tracking

A more complex cleaning task arises when your customer journey spans different domains and you need to attribute the traffic source accurately.

  • Traffic attribution: GA stores information under a client ID that is assigned to each user. Users have sessions, and once their ID is stored, GA can recognize their repeat sessions. For instance, a user clicks one of your ads on Facebook, visits your site and buys your product. In an ideal world where your sales funnel is set up perfectly, you now know that the traffic source for that sale is Facebook.

The difficulty comes when the journey crosses different domains, which makes the traffic source harder to track. For instance, someone clicks an ad, arrives on your website and then moves on to the cart page. Imagine that the cart is built on a subdomain, or is a landing page built with ClickFunnels. How do you track that without losing the original source of traffic?

These are cross-domain problems. GA assigns each session a single traffic source, so when a visit moves to another domain and is recorded as a new session, it is no longer clear where the traffic originally came from. To track the actual source, you need to tell Google Analytics how traffic reaches each domain, so that it connects the two sessions as if they were one.
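The role of the client ID can be illustrated with a deliberately simplified Python model that credits every hit to the source of the first hit with the same client ID. This is an assumption made for illustration, not GA's actual attribution logic, which also involves sessions and campaign timeouts; the client IDs, domains and sources are made up.

```python
from typing import Optional

# Each hit is (client_id, page, source), listed in time order. All values are made up.
Hit = tuple[str, str, str]


def first_sources(hits: list[Hit]) -> dict[str, str]:
    """Credit every client ID with the source of its first hit."""
    credited: dict[str, str] = {}
    for client_id, _page, source in hits:
        credited.setdefault(client_id, source)
    return credited


def source_of_page(hits: list[Hit], page: str) -> Optional[str]:
    """Return the source credited to the visitor who viewed `page`."""
    credited = first_sources(hits)
    for client_id, hit_page, _source in hits:
        if hit_page == page:
            return credited[client_id]
    return None


# Same client ID on shop.com and cart.com: the checkout keeps its Facebook source.
linked = [
    ("111.222", "shop.com/product", "facebook"),
    ("111.222", "cart.com/checkout", "shop.com"),
]
# cart.com assigns a new client ID: the checkout looks like a referral from your own site.
unlinked = [
    ("111.222", "shop.com/product", "facebook"),
    ("333.444", "cart.com/checkout", "shop.com"),
]
print(source_of_page(linked, "cart.com/checkout"))    # facebook
print(source_of_page(unlinked, "cart.com/checkout"))  # shop.com
```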

One way to diagnose this is the Google Analytics Debugger Chrome extension, which shows a breakdown of each hit in the console. Each hit carries the client ID, and as long as that ID stays the same regardless of the domain you are on, all the information is tracked under one client. If the client ID changes between domains, you lose the session and its attribution, and that is when you need to set up cross-domain tracking.

For instance, you need to pass the client ID to the cart.com page. This is called decorating the link. If you don't do it, Google Analytics creates a new client ID on cart.com and cannot identify the traffic source correctly. The same principles apply to subdomains.

In Tracking Info > Referral Exclusion List, you list the domains that GA should ignore as referral sources, so that their traffic is credited to the original source on your main domain. You can set this up without cross-domain tracking, but it is not as effective as the first option.
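Link decoration can be sketched with Python's standard `urllib.parse`: the source domain appends the client ID to the outgoing URL, and the destination domain reads it back instead of generating a new one. The parameter name `_ga` mirrors the one used by the analytics.js linker, but the value format here is simplified (the real linker value encodes more than the bare client ID), and `decorate` and `read_client_id` are hypothetical helpers. In practice you would let the GA or Google Tag Manager cross-domain settings do this for you.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def decorate(url: str, client_id: str, param: str = "_ga") -> str:
    """Append the client ID to a cross-domain link as a query parameter (simplified)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[param] = [client_id]  # overwrite any existing value
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def read_client_id(url: str, param: str = "_ga") -> Optional[str]:
    """On the destination domain, recover the client ID from the decorated link."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None


link = decorate("https://cart.com/checkout?item=42", "111.222")
print(link)                  # https://cart.com/checkout?item=42&_ga=111.222
print(read_client_id(link))  # 111.222
```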

Finding answers with Google Analytics

Once your data is clean, you can start analysing it and drawing conclusions. There are three common methods for understanding your visitors as they move through their customer journey: funnel tracking, goal flow and segmentation.

Funnel Tracking vs Goal Flow

To track your customers' journey, you can set up a funnel with destination pages and see how users behave. To set one up, go to Goals and define the funnel steps in the goal details. Funnels are very limited when it comes to actions that happened earlier: a funnel only collects information from a defined point onwards, so you cannot analyze what happened before it.
As a solution, you can use the Goal Flow report in GA to explore where traffic comes from and which path it follows until it exits the website. This gives you an in-depth analysis of your users' traffic as they move through the customer journey.
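The difference between the two approaches can be shown with a toy Python model. `funnel_counts` only starts counting at the first defined step, while `goal_entry_pages` looks back over each converting session to see where it began. The page paths are hypothetical, and the strict in-order matching is a simplification rather than GA's exact funnel rules.

```python
from collections import Counter

# Hypothetical sessions: the ordered pages each visitor viewed.
SESSIONS = [
    ["/blog", "/product", "/cart", "/thanks"],
    ["/product", "/cart"],
    ["/home", "/blog"],
    ["/cart", "/thanks"],
]
FUNNEL = ["/product", "/cart", "/thanks"]


def funnel_counts(sessions: list[list[str]], steps: list[str]) -> list[int]:
    """Toy funnel: count sessions reaching each step in order, starting at step 1.

    Nothing before step 1 is recorded, and sessions that skip step 1 are ignored.
    """
    counts = [0] * len(steps)
    for pages in sessions:
        position = -1
        for i, step in enumerate(steps):
            try:
                # The next step must appear after the previous one.
                position = pages.index(step, position + 1)
            except ValueError:
                break  # the visitor dropped out before this step
            counts[i] += 1
    return counts


def goal_entry_pages(sessions: list[list[str]], goal: str) -> Counter:
    """Flow-style view: for sessions that reached the goal, where did they start?"""
    return Counter(pages[0] for pages in sessions if goal in pages)


print(funnel_counts(SESSIONS, FUNNEL))              # [2, 2, 1]
print(dict(goal_entry_pages(SESSIONS, "/thanks")))  # {'/blog': 1, '/cart': 1}
```

The toy funnel reports one completed purchase but says nothing about where it came from, and it never sees the visitor who entered directly at `/cart`; the flow-style view recovers both.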


Segmentation

Segmentation allows you to separate a subset of your visitors and analyze their data. A segment is similar to a filter, but it is temporary and applies only within a report. You can create segments to analyze and compare specific information, and removing them does not affect the underlying traffic data.
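A segment behaves like a filter that is applied only while a report is built. The Python sketch below models that idea: `session_count` narrows the count with a segment function but never modifies the stored list, so segments can be compared and discarded freely. The session fields and values are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical sessions already stored in a view.
STORED_SESSIONS = [
    {"country": "Spain", "device": "mobile"},
    {"country": "Spain", "device": "desktop"},
    {"country": "France", "device": "mobile"},
]

Segment = Callable[[dict], bool]


def session_count(sessions: list[dict], segment: Optional[Segment] = None) -> int:
    """Build a report; the segment narrows the count but never modifies `sessions`."""
    if segment is None:
        return len(sessions)
    return sum(1 for session in sessions if segment(session))


def is_mobile(session: dict) -> bool:
    return session["device"] == "mobile"


def is_spain(session: dict) -> bool:
    return session["country"] == "Spain"


# Compare segments side by side, then drop them: the stored data is unchanged.
comparison = {
    "All sessions": session_count(STORED_SESSIONS),
    "Mobile": session_count(STORED_SESSIONS, is_mobile),
    "Spain": session_count(STORED_SESSIONS, is_spain),
}
print(comparison)            # {'All sessions': 3, 'Mobile': 2, 'Spain': 2}
print(len(STORED_SESSIONS))  # 3
```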
