Solved

What is the proper method to compare histograms (or statistical data) between two data sets?

Posted on 2013-12-24
1
294 Views
Last Modified: 2014-01-22
Hi,

I have a set of data for Company A and Company B which tells me the number of times people visited each company in the month of March.

I'm seeing a stark difference between the data for each company, but I don't know how to define it or explain it mathematically (or statistically?).

In my first column of data, I have the number of times a person visited the store.
In the second column of data, I have the total number of people who visited the store that many times.

When grouping these data points into buckets, I notice quite a different drop off rate (I don't think that's the correct term) from one bucket to the next. What I'd like to know is how I can better explain this phenomenon, with the right terms to use, and also would like guidance on the correct type of analysis to do.

I've included my spreadsheet in the attachment, and catch on to concepts quickly - just need some education and guidance here please.

Thank you!
ExpExchange-Question.xlsx
0
Comment
Question by:lizziesmalls23
1 Comment
 
LVL 100

Accepted Solution

by:
mlmcc earned 500 total points
ID: 39739558
Let me make certain I understand the data.

For Visit Number n the visitiors is the number of distinct visitiors who viisited exactly n times.

So for store A you have 174727 different people who visited exactly once and 121058 different people who visited exactly twice.  The 121058 are not included in the 174727.
So store A had 1,345,997 different people visit

Store B had 913,080 differnet people visit.

The analysis you do is driven by the questions being asked and the information you need to provide.

One question to ask is why did one store (assuming my interpretation of the data is correct) have roughly 47% more visitors?

What questions are you trying to answer?
What issues need to be addressed?

Other issues to consider that will affect the numbers
Are these the same type of store?
  Different franchises of the same chain?
Are they  in the same city?
    If not similar cities?
Are they in similar neighborhoods?
Are they catering to the same socioeconomic class?
What is the mix of the competing stores in their neighborhood

Before commenting on the data itself these types of questions need to be answered

Example
Stores A and B are in the same city on opposite sides of the town, same store (MyMart), same size, same basic inventory of goods.  They are trying to provide inexspensive goods to the mass market.
Store A in is a lower to middle middle class neighborhood.  Heavily populated with blue collar workers.  Store B is in an upper middle class neighborhood with a majority of professional workers.

MyMart may not have the same appeal in neighborhood B.  Looking for a better merchandise, dress shirts/pants instead of khakis and jeans.
Neighborhood B may think of their time as more valuable so they try to get more out of each visit.  Go only when they really need something.  Plan better so they get all the groceries in 1 trip per week instead of making daily trips.  Have more $ or credit so they can buy for a full week or 2 at a time

mlmcc
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Excel VBA - Get files in a folder 5 78
How to add the word OR to the front of each Excel value 4 36
Hard coding time and date into Excel 2 30
Freeze Panes Solution 6 28
This article seeks to propel the full implementation of geothermal power plants in Mexico as a renewable energy source.
This article provides a brief introduction to tissue engineering, the process by which organs can be grown artificially. It covers the problems with organ transplants, the tissue engineering process, and the current successes and problems of the tec…
The viewer will learn how to simulate a series of sales calls dependent on a single skill level and learn how to simulate a series of sales calls dependent on two skill levels. Simulating Independent Sales Calls: Enter .75 into cell C2 – “skill leve…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…

914 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now