Solved

What is the proper method to compare histograms (or statistical data) between two data sets?

Posted on 2013-12-24
Last Modified: 2014-01-22
Hi,

I have a set of data for Company A and Company B which tells me the number of times people visited each company in the month of March.

I'm seeing a stark difference between the data for each company, but I don't know how to define it or explain it mathematically (or statistically?).

In my first column of data, I have the number of times a person visited the store.
In the second column of data, I have the total number of people who visited the store that many times.

When grouping these data points into buckets, I notice quite a different drop-off rate (I don't think that's the correct term) from one bucket to the next. I'd like to know how I can better explain this phenomenon and which terms to use, and I'd also like guidance on the correct type of analysis to do.

I've included my spreadsheet in the attachment. I catch on to concepts quickly and just need some education and guidance here, please.

Thank you!
ExpExchange-Question.xlsx
Question by:lizziesmalls23
Accepted Solution

by:
mlmcc earned 500 total points
ID: 39739558
Let me make certain I understand the data.

For visit number n, the visitor count is the number of distinct visitors who visited exactly n times.

So for store A you have 174,727 different people who visited exactly once and 121,058 different people who visited exactly twice. The 121,058 are not included in the 174,727.
So store A had 1,345,997 different people visit.

Store B had 913,080 different people visit.

The analysis you do is driven by the questions being asked and the information you need to provide.

One question to ask is why did one store (assuming my interpretation of the data is correct) have roughly 47% more visitors?
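As a rough check on that percentage, here is a minimal Python sketch. The two store totals are the figures quoted in this comment; the function names and the tiny `example` histogram are hypothetical and are there only to show that a total is the sum of the "number of people" column (each person falls in exactly one bucket).

```python
def total_visitors(histogram):
    """Sum the people column of a {visit_count: number_of_people} table.

    Each person appears in exactly one bucket, so the sum is the
    number of distinct visitors.
    """
    return sum(histogram.values())


def percent_more(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Hypothetical mini-histogram, NOT the data from the attached spreadsheet.
example = {1: 50, 2: 30, 3: 20}
example_total = total_visitors(example)

# Totals quoted in this comment.
store_a_total = 1_345_997
store_b_total = 913_080

diff = percent_more(store_a_total, store_b_total)
print(f"Store A had {diff:.1f}% more distinct visitors than Store B")
```

With the real spreadsheet, the same `total_visitors` call would be applied to each store's full column rather than to the placeholder values.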

What questions are you trying to answer?
What issues need to be addressed?

Other issues to consider that will affect the numbers:
Are these the same type of store?
  Different franchises of the same chain?
Are they in the same city?
    If not, are they in similar cities?
Are they in similar neighborhoods?
Are they catering to the same socioeconomic class?
What is the mix of competing stores in their neighborhoods?

Before commenting on the data itself, these types of questions need to be answered.

Example
Stores A and B are in the same city on opposite sides of town: same store (MyMart), same size, same basic inventory of goods. They are trying to provide inexpensive goods to the mass market.
Store A is in a lower-middle to middle-class neighborhood, heavily populated with blue-collar workers. Store B is in an upper-middle-class neighborhood with a majority of professional workers.

MyMart may not have the same appeal in neighborhood B. Shoppers there may be looking for better merchandise: dress shirts and pants instead of khakis and jeans.
Neighborhood B residents may think of their time as more valuable, so they try to get more out of each visit. They go only when they really need something, plan better so they get all the groceries in 1 trip per week instead of making daily trips, and have more money or credit so they can buy for a full week or two at a time.
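If the question turns out to be about visit frequency, as in the example above, one possible way to put numbers on it is to compare each store's share of visitors per bucket, the mean visits per visitor, and the bucket-to-bucket ratio (what the question calls the drop-off). The sketch below is only an illustration under assumptions: the function names are made up, and `store_a` and `store_b` hold invented values, not the spreadsheet data. Shares are used instead of raw counts because the two stores have different totals.

```python
def shares(histogram):
    """Convert {visit_count: people} into each bucket's share of all visitors."""
    total = sum(histogram.values())
    return {n: count / total for n, count in sorted(histogram.items())}


def mean_visits(histogram):
    """Average number of visits per distinct visitor."""
    total_people = sum(histogram.values())
    total_visits = sum(n * count for n, count in histogram.items())
    return total_visits / total_people


def step_ratios(histogram):
    """People in each bucket divided by people in the previous bucket.

    Assumes every bucket has a nonzero count. A lower ratio means a
    steeper drop-off from one visit count to the next.
    """
    keys = sorted(histogram)
    return {k2: histogram[k2] / histogram[k1] for k1, k2 in zip(keys, keys[1:])}


# Invented illustration data, NOT the real stores.
store_a = {1: 400, 2: 300, 3: 200, 4: 100}
store_b = {1: 700, 2: 200, 3: 80, 4: 20}

for name, hist in (("A", store_a), ("B", store_b)):
    print(name, "shares:", shares(hist))
    print(name, "mean visits per visitor:", mean_visits(hist))
    print(name, "step ratios:", step_ratios(hist))
```

Whether these are the right measures depends on the questions being asked; they describe the shape of each distribution but do not explain why the shapes differ.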

mlmcc