Solved

Calculate Confidence Level and Number of required samples based on supplied data

Posted on 2013-01-25
11
899 Views
Last Modified: 2014-01-18
Hi,

If I supply say 50 values in col A, how would I calculate the confidence interval and number of samples required to achieve a defined confidence level of say 95%?

So if my samples only returned a confidence level of say 46% it would say I need an additional 100 samples to achieve 95%?

Cheers
0
Comment
Question by:MirageSF
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
11 Comments
 
LVL 92

Accepted Solution

by:
Patrick Matthews earned 500 total points
ID: 38822401
MirageSF,

Please state more clearly what you are trying to do, because I think you may be confused as to what a confidence interval for a sample mean is :)

Suppose you drew a random, unbiased sample of 50 items from a normally distributed population.  Further suppose the following:
You know neither the true population mean nor the true population variance
Your sample mean is 100
Your sample standard deviation is 15
Your desired confidence level is 95%

The confidence interval for your sample mean is a function of the sample mean, the sample standard deviation, and the confidence level.

Based on the above, first you would compute the standard error:

=sample_std_dev / sqrt(sample_size)
=15 / sqrt(50)
~ 2.12

Next, find the t value associated with 95% confidence and 50 degrees of freedom.  This is about 2.01.

Multiply the two to get your margin of error, which in this example is ~ 4.26.

Now, your confidence interval is (sample mean) +/- (margin of error), or 95.74 - 104.26.

What this means is that we expect that there is a 95% chance that the true population mean is a value within the range 95.74 - 104.26.

Taking a bigger sample is not going to increase your confidence level.  Rather, having a larger sample, ceteris paribus, makes the width of the confidence interval narrower.

Please see the attached file for an example of how I would do this.

Q-28009322.xlsx

Patrick
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38822403
This web page does a good job of explaining it: http://stattrek.com/estimation/confidence-interval-mean.aspx
0
 
LVL 27

Expert Comment

by:aburr
ID: 38822456
Getting more experts to look at you question will not help much.
the key part of the post by matthewspatrick is

"Taking a bigger sample is not going to increase your confidence level.  Rather, having a larger sample makes the width of the confidence interval narrower."

He has given you a good introduction to sampling and a link to more infromation which should help you understand and answer the problem.
0
Salesforce Made Easy to Use

On-screen guidance at the moment of need enables you & your employees to focus on the core, you can now boost your adoption rates swiftly and simply with one easy tool.

 

Author Comment

by:MirageSF
ID: 38822483
Hi,

Thanks for that I have added to my sheet, but not making much sense, using the deviation difference between the mean and 2points before and after the mean I get around 95%+ of values in that range, and yet the proper confidence calculation to the right of the sheet only shows about 6-7% of values fall between the mean and confidence range?

Sample attached
SamplePX.xlsm
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38822587
You wrote:

using the deviation difference between the mean and 2points before and after the mean I get around 95%+ of values in that range, and yet the proper confidence calculation to the right of the sheet only shows about 6-7% of values fall between the mean and confidence range

That is not how you use a confidence interval.  As I indicated above, the common usage for a confidence interval is to show how the precision of a sample mean when it is used to estimate a population mean.

You cannot use a confidence interval in the way you tried above.  That is not what it is designed to do.

In a normally distributed population, it is true that approximately 95% of the members will be within two standard deviations of the mean.  However:
That is not the same thing as saying "95% will be within the confidence interval for the sample mean"
In looking at how your data are being generated, your source data are not themselves normally distributed
0
 

Author Comment

by:MirageSF
ID: 38822598
Is there a way to fix that so it produces similar stuff as to what is in the example, but normal dist?
0
 

Author Comment

by:MirageSF
ID: 38822690
Sweeping the net I found this interesting command NORMINV(RAND(),$B$1,$B$2), this simple command does away with my SHEET2, which created 5 random numbers and then the average of these was used to create the Central Limit Theroum, this command also allows me to define the Average base and Standard deviation base.
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38822860
Yes, if you wanted to generate some fake data assuming a normal distribution, you would use that function.  (If you are running Excel 2007 or later, I recommend using NORM.INV() instead; that version of the function performs better both in terms of speed and fidelity.)

Indeed, when I built my sample file posted in http:#a38822401, I used =ROUND(NORM.INV(RAND(),100,15),0) to generate my data points.

But now we're getting kind of far afield.  I think your original question has been answered :)
0
 

Author Closing Comment

by:MirageSF
ID: 38822909
Thanx for that, whilst were on NORM.INV is it possible to get it also to work within a range, say 5-18?
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38823171
Yes, you could, but be very, very careful about doing that: you could end up with data that are not in fact normally distributed.

To generate random data like that:

=MEDIAN(5,18,NORM.INV(RAND(),mean,st_dev))

The problem comes where you use something like 11.5 for the mean and 7.5 for the standard deviation.  If you force all the values to be between 5 and 18, your sample will not be normally distributed, because you are truncating the tails.
0
 
LVL 24

Expert Comment

by:Steve
ID: 38825893
OK, I thought I would add my 2 cents on this one:

The calculation of Sample size to give an accuracy (resolution) with a 95% CI is, I belive, what the essence of the question is about.

To determine sample size you need to know:
Desired Resolution r
Process Deviation s

95% CI  =  x +/- r   =  x+/- 1.96s/ sqrrt(n)
95% CI  ~  x+/- 2s/sqrrt(n)
Resolution r = 2s/sqrrt(n)

n =  (2s/r)^2

So for example if we wish to estimate cycle time within +/- one minute (r=1)
And we estimate the standard deviation to be five minutes (s = 5)

n = ((2*5)/1)^2 = 100

So if you know the reolution you are trying to acheive you can calculate the nescessary sample size to acheive a 95% confidence.

I hope this makes sense, I had to get my old SixSigma Black Belt notes out for this one :)
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How to get Spreadsheet Compare 2016 working with the 64 bit version of Office 2016
This article describes how to use a set of graphical playing cards to create a Draw Poker game in Excel or VB6.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question