?
Solved

Filtering an exported Google Analytics report

Posted on 2014-04-29
9
Medium Priority
?
729 Views
Last Modified: 2016-02-24
Recently I recognized that my GA statistics were skewed by a few visitors whose visits accounted for a disproportionately large percentage of total sessions.  I've applied filters now, so that data going forward will exclude those visits.  I understand that filters can't be applied retroactively, but I've heard that data can be exported, and then filtered somehow.  If this is so, I'd appreciate step-by-step guidance on how to export data, and remove visits from certain cities, so that the bounce rate, duration of sessions and number of pages visited could be evaluated without the bias of those frequent visitors.
0
Comment
Question by:ddantes
  • 6
  • 3
9 Comments
 
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40030103
Hi David,
I'm working on a solution for you. Should be able to post it here in less than a half-hour. Regards, Joe
0
 
LVL 57

Accepted Solution

by:
Joe Winograd, EE MVE 2015&2016 earned 2000 total points
ID: 40030165
Hi David,

At GA, in the navigation on the left:

Audience
Geo
Location

Then in the links for the fields, click City:

ga cityAt the bottom, click the "Show rows" drop-down and select 5000 (unfortunately, there's no way to show all rows, so if there are more than 5,000, you'll have to piece them together):

ga show rowsThen at the top, click Export>Excel (XLSX):

ga exportYou'll now have an Excel spreadsheet with two tabs — Summary and Dataset1. Click Dataset1 and you'll have something like this:

ga spreadsheet citiesAt the bottom in the last row will be the totals:

ga spreadsheet bottomHowever, they are not formulas, so you'll need to build your own formulas, but that shouldn't be tough. Sort the spreadsheet by the City column and you'll be able to delete easily all the rows containing the cities that you want to exclude. Then delete the last row with the hard-coded totals and create formulas to give you the statistics that you want. Regards, Joe
0
 

Author Comment

by:ddantes
ID: 40030387
Hi Joe:  I had a premonition you might answer this question.  Thanks for your instructions!  Now that I've exported the data and removed rows from cities I wish to eliminate, I need to create formulas to replace the hard-coded data in the last row.  Sorry to admit, I'm clueless as to how to create those formulas.  I'm interested in bounce rate, time spent on site, and number of pages visited.  I increased the points for this question because the scope of it is more detailed than I had anticipated.
0
How to change the world, one degree at a time.

By embracing technology, we can solve even the biggest problems—including the gender gap.  By earning a degree from WGU, you have an opportunity to gain the knowledge, credentials, and experience it takes to thrive in today’s high-growth IT industry.

 
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40030558
Hi David,
The formulas below show the last row with a city as 73, meaning the Totals row is 74. This is arbitrary — just wanted a noticeable row number. Of course, you will change that number to whatever row in your spreadsheet has the last city (and plus one for the Totals row).

Bounce rate:

(1) At the end of Sessions (in cell B74), put in a Sum formula:

=SUM(B2:B73)

This will now be the total number of sessions with the "bad" cities removed.

(2) Add a new column (K) called Bounced Sessions. The formula for it is Sessions (column B) times Bounce Rate (column E). Put this in cell K2:

=B2*E2

(3) Do a Copy Down from K2 to K73. This will automatically adjust the formula, so K3 will be =B3*E3, K4 will be =B4*K4, etc. This means each cell in the K column will have the number of bounced sessions for each city.

(4) At the end of the Bounced Sessions column (in cell K74), put in a SUM formula:

=SUM(K2:K73)

This is the total number of bounced sessions with the "bad" cities removed.

(5) The overall Bounce Rate is the total number of Bounced Sessions (K74) divided by the total number of Sessions (B74):

=K74/B74

Put this in whatever cell you want, but, of course, after the last row or column so it doesn't interfere with calculations.

While you're working on Bounce Rate, I'll think about Pages/Session and Session Duration. Regards, Joe
0
 
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40030636
Pages/Session:

(1) Add a new column (L) called Pages. The formula for it is Sessions (column B) times Pages/Session (column F). Put this in cell L2:

=B2*F2

(3) Do a Copy Down from L2 to L73. This will automatically adjust the formula, so L3 will be =B3*F3, L4 will be =B4*L4, etc. This means each cell in the L column will have the number of pages for each city.

(4) At the end of the Pages column (in cell L74), put in a SUM formula:

=SUM(L2:L73)

This is the total number of pages with the "bad" cities removed.

(5) The overall pages per session is the total number of Pages (L74) divided by the total number of Sessions (B74):

=L74/B74

Put this in whatever cell you want, but, of course, after the last row or column so it doesn't interfere with calculations.

Will work on Avg. Session Duration soon, but need to address an issue right now on the home front. Regards, Joe
0
 

Author Comment

by:ddantes
ID: 40030654
Perfect results so far, Joe.
0
 
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40030784
> Perfect results so far, Joe.

Glad to hear it!

Avg. Session Duration:

(1) Add a new column (M) called Durations. The formula for it is Sessions (column B) times Avg. Session Duration (column G). Put this in cell M2:

=B2*G2

(3) Do a Copy Down from M2 to M73. This will automatically adjust the formula, so M3 will be =B3*G3, M4 will be =B4*G4, etc. This means each cell in the M column will have the duration of all sessions for each city.

(4) At the end of the Durations column (in cell M74), put in a SUM formula:

=SUM(M2:M73)

This is the total duration of all sessions with the "bad" cities removed.

(5) The overall average session duration is the total duration of all sessions divided by the total number of Sessions (B74):

=M74/B74

Put this in whatever cell you want, but, of course, after the last row or column so it doesn't interfere with calculations.

In the end, all three metrics could be handled with the same technique, but that wasn't obvious going in — I needed to look at exactly what data GA was capturing before I was able to reach that conclusion. Regards, Joe
0
 

Author Comment

by:ddantes
ID: 40030914
Thanks, Joe.  I've been relying on data which is skewed by sampling error, and drawing invalid inferences from it.  This will be a big help.
0
 
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40030947
David,
You're very welcome. As always, happy to help — or at least try to. Regards, Joe
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Read on to get a few ideas on how to promote your next corporate event.
In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
Use Wufoo, an online form creation tool, to make powerful forms. Learn how to choose which pages of your form are visible to your users based on their inputs. The page rules feature provides you with an opportunity to create if:then statements for y…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

750 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question