Solved

Does who answered the questions affect what formula to use?

Posted on 2012-08-22
Last Modified: 2014-05-19
Hi Experts!

This has been bugging me for years so I thought I'd see if I could get a definitive answer here. I have three scenarios. I ask questions about whether respondents like two products using a 10 point scale. There are three samples:

1) 200 people are asked both questions ("Do you like product A" & "Do you like product B")
2) 75 people are asked question one ONLY, 75 people are asked question two ONLY and 50 people are asked both questions
3) 100 people are asked question one ONLY, 100 people are asked question two ONLY

If I want to know if there is a significant difference between the mean of the answer to question one and the mean of the answer to question two, do I use the same formula to stat test these three samples?

Thanks!
Question by:Ed Matsuoka
45 Comments
 
LVL 17
ID: 38323065
If the groups of 75, 100 and 50 are representative samples of the 200, just as the 200 represent the whole population, then there should be no skew to the results. The smaller samples may be a bit more sensitive to exhibiting change, which is why you picked 200 to begin with.
0
 
LVL 32

Expert Comment

by:phoffric
ID: 38323222
If asked both questions, then there may be a bias in that the person may now be comparing the first product with the second. If the second is not as good as the first, they may say no; whereas, if only asked about the second product, they may answer yes because their liking crosses their "like it" threshold.
0
 
LVL 27

Expert Comment

by:aburr
ID: 38323704
All statistics assume randomness. If the samples are really random, the means should have the same meaning, just a different standard deviation.
But, as phoffric pointed out, the procedures are not exactly the same and the two question test might introduce a bias.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 38325171
It might be possible to move the average value of item A up or down significantly by changing item B.  This would be  a type of Push Poll.

     http://en.wikipedia.org/wiki/Push_poll
0
 

Author Comment

by:Ed Matsuoka
ID: 38329629
Thanks for the replies, guys. I guess I should have stated the question better. I tried to be consistent with the base but realized I should have stated that the final subgroups have the same base, since what I am really trying to understand is whether the fact that the samples are mutually exclusive, overlapping or mutually inclusive affects the stats test to use. When we do studies we always rotate the order the questions are asked to minimize order bias, so I am assuming (a big assumption, I know) that external factors are the same.
0
 
LVL 27

Expert Comment

by:aburr
ID: 38330755
If the sample of people who are asked the questions is taken randomly from a large population, each of the three methods should give answers close enough to make no practical difference.
But you already have the data. Are the answers taken by the three methods close to each other? If yes, do not worry, if no something non-random is going on.
0
 

Author Comment

by:Ed Matsuoka
ID: 38330807
"Are the answers taken by the three methods close to each other? If yes, do not worry, if no something non-random is going on."

My question is, do I use the same test (ANOVA, t-test) on all three data sets to determine if something non-random is going on? After all, few clients would be willing to pay for three studies, using three different methods of sampling, just to see if the three results are the same or different. What if they were? How would you determine which of the three is "most correct"?
0
 
LVL 27

Expert Comment

by:aburr
ID: 38331382
"What if they were? How would you determine which of the three is "most correct"? "
If they were, I would say there is something wrong with the test.
Hence use the same test on each method.
(I still worry a bit about phoffric's comment about two questions vs one even though you said you rotate the questions. Nevertheless that is a question about poll design rather than data treatment.)
0
 
LVL 32

Expert Comment

by:phoffric
ID: 38331537
When asking two questions, I do not see how rotating them will neutralize the bias. Suppose, using an extreme case, both products are pretty good, and when asked just one question, all the respondents answer that they like it (for either product). So, that is 100% Yes for both products. But when asking two questions, a comparison is being made. And there may now be a number of No responses, whereas there were only Yes responses for the one question survey. How do you plan on removing the comparison bias for this apparent discrepancy?
0
 

Author Comment

by:Ed Matsuoka
ID: 38336839
1) aburr, I think my syntax wasn't right in that you said if they were (the same OR different) there is something wrong with the test. This would mean there HAS to be something wrong with the test. I think the problem with a layman asking questions of experts is that there are connotations to words that only the expert is aware of. What I meant was, if I asked the questions of these three samples and got three different results (and of course those differences could be .1 off since that could tip the balance from significant to not), which result should I use? The main reason why I suspect a different test should be used is that if the two questions are asked of the same people they are obviously from the same sample, and knowing this could change/simplify the formula used to find differences.

2) phoffric, you are right that rotating probably doesn't neutralize the bias but it does make it less "bias'y". Just like a trial lawyer asks a question knowing that even if the judge tells the jury to forget it they won't, unless you have surveys of only one question, order bias will exist and all you can do is try to blunt its effects.
0
 
LVL 17
ID: 38337405
When you say 75 people are asked question 1 only, I understood this to mean that question 2 was omitted. Usually many other questions are asked as well to mask the interest in that particular product, or just to gain answers to unrelated surveys.
For instance:
How many people in your household?
What car do you drive?
Would you use a professional gardener?
How would you rate product A?
Have you visited the dentist in the last six months?
Do you listen to the radio?
0
 
LVL 17
ID: 38337414
Sorry it posted before I finished.
My point was that if product A is buried in many other questions then the inclusion of product B shouldn't affect the results.
I am assuming that your choice of market sample was made using the same criteria for all groups.
0
 

Author Comment

by:Ed Matsuoka
ID: 38337515
Hi RobinD! No, all the respondents in the three surveys are asked both questions. I posted the question to see if I could get an expert opinion on whether what the sample is for answering the two questions (mutually inclusive, mixed and mutually exclusive) affects what formula I use to find significant differences between them, all other things being equal.
0
 
LVL 17
ID: 38337553
That's what you have in your option 1. Your option three states that 100 people are asked question 1 only and 100 are asked question 2 only. Option two is a mix of 75, 75 unique and only 50 being asked both.
0
 

Author Comment

by:Ed Matsuoka
ID: 38337602
You're right, my mistake. I have two different problems in my (increasingly old!) head and mixed them up. Yes, I was trying to get at whether comparing the answers to two questions should use a different formula for significance between the questions if (a) all the respondents answered both questions, (b) 1/4 answered question 1 only, 1/4 answered question 2 only and 1/2 answered both, or (c) half the respondents answered question 1 only and half answered question 2 only.
0
 
LVL 17
ID: 38338202
I think your samples could be considered mutually exclusive unless the questions were combined, as in 'Which product do you prefer, A or B?'

As pointed out by others,  this bias can be introduced by asking the two rating questions in the same survey, but this didn't seem to be the object of your question here.
0
 

Author Comment

by:Ed Matsuoka
ID: 38338265
Yes, RobinD, I am not looking for info on order bias or normal sample distribution but only on whether, assuming equal bias/equal sample sizes/normal samples, a different formula should be used to determine significance if you are comparing the answers to two questions, based on whether:

1)  Two (or more) groups answered both questions
2)  Two (or more) groups had a mix of those who answered both questions & those who answered only one question
3) Half the groups answered only one question & the other half only answered the other question
0
 

Author Comment

by:Ed Matsuoka
ID: 40047444
I just noticed that this question has remained unanswered since 2012. So I will post one more example that might be clearer:

1) I ask 200 people if they like the President BEFORE he gives a speech. I ask the same 200 people if they like the President AFTER he gives a speech. Is there a significant difference between the Yeses? (all answered both questions)

2) I randomly ask 150 people if they like the President BEFORE he gives a speech. I randomly ask 150 people if they like the President AFTER he gives a speech. Is there a significant difference between the Yeses? (some answered both questions, some answered only one question)

3) I ask 100 people if they like the President BEFORE he gives a speech. I ask the other 100 people if they like the president AFTER he gives a speech.  Is there a significant difference between the Yeses? (Half answered the BEFORE question and half answered the AFTER question)

In determining the significance, do I use the same formula or different formulas?

Thanks!
0
 
LVL 85

Assisted Solution

by:ozo
ozo earned 150 total points
ID: 40048181
If the people in the sample are selected randomly, the expected variance in the sample would be proportional to
(number of people in sample)*(probability of yes)*(probability of no)
Whether the asking of one question might affect the answer to a different question asked of the same person is more a question of psychology than statistics, although statistics can help you to test any hypothesis you may have regarding how the asking of one question might affect the answer to another question.
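A minimal sketch of the variance expression above, assuming each respondent answers yes independently with the same probability. The function names and the figures (200 respondents, 60% yes) are hypothetical and only illustrate the arithmetic.

```python
import math

def yes_count_variance(n, p):
    """Variance of the number of yes answers among n independent respondents."""
    return n * p * (1 - p)

def proportion_standard_error(n, p):
    """Standard error of the observed yes proportion (count divided by n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical figures: 200 respondents, each saying yes with probability 0.6.
n, p = 200, 0.6
count_var = yes_count_variance(n, p)
prop_se = proportion_standard_error(n, p)
print(count_var, prop_se)
```

The standard error shrinks as n grows, which is why the same gap between two proportions can be significant in a large sample and not in a small one.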
0
 

Author Comment

by:Ed Matsuoka
ID: 40059214
So I would use the same formulas for means and proportions in all three cases?
0
 
LVL 17

Assisted Solution

by:Thibault St john Cholmondeley-ffeatherstonehaugh the 2nd
Thibault St john Cholmondeley-ffeatherstonehaugh the 2nd earned 150 total points
ID: 40060455
The before and after are two different questions, I think you could consider them unrelated. If the president has given many speeches before and this one is no different then people will probably give the same answer to both.
If this speech is very unlike the others then you could expect a proportion of people to change their answer.
0
 

Author Comment

by:Ed Matsuoka
ID: 40062083
Sigh. I must not be stating my case clearly. If I have data in the three separate cases above, assuming I asked a 5 point question where 5 is "I like the president very much" and 1 is "I don't like the president at all", would I use the same formula (t-test, ANOVA, etc.) to test the means between the three sets of befores and afters, and to test the proportions of "I like the President" and "I don't like the President" against each other? And if the formulas are different, what are they? Yes, the ideal situation might be to ask the same people the BEFORE and AFTER questions to see if the speech changed their opinion of him, but if there is statistical validity to asking the other two sets of people these questions, that must translate into a formula to test them also.

I hope this is a little clearer. Have a great day!
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40062197
It is very difficult to give specific answers to a fuzzy question.

With the two products poll, what question are you trying to answer?
  Is product A more popular than B?
  Is asking a group of people about two products different from asking two groups about one product each?  [This could be two T-tests.]

Similarly with the Presidential poll:
   Are you trying to measure the president's popularity?
   Are you trying to measure the mood of the public?
   Are you trying to measure the effectiveness of a particular speech?  [This could be three T-tests.]

This may help:
     http://www.sfu.ca/~ber1/iat802/pdfs/When%20to%20use%20what%20test.pdf

To break the Presidential poll into three T-tests you need to define two groups:
     Means and SDs before and after the speech for all respondents.
     Mean and SD of the Before-Only group vs the mean and SD of the Before-After group on the Before poll.
     Mean and SD of the After-Only group vs the mean and SD of the Before-After group on the After poll.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40062224
I guess my point is that once you have the data, you can (and need to) answer the question yourself.

For the two products poll:
  1.   Compare the data for A vs B data for all respondents.
  2.   Compare  the data for A-only vs A-both.
  3.   Compare  the data for B-only vs B-both.

If 2 or 3 show a difference, then the results of 1 are suspect.
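The three comparisons can be sketched as follows. This is only an illustration: the ratings are invented, `welch_t` is a hypothetical helper that computes the unequal-variance t statistic, and comparison 1 is treated as if the A and B ratings came from unrelated groups, which is not strictly true for the people who rated both.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic for two groups treated as independent."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical 10-point ratings, invented for illustration.
a_only = [7, 8, 6, 7, 9, 8]   # product A, respondents asked about A only
a_both = [6, 7, 5, 6, 8, 7]   # product A, respondents asked about both
b_only = [5, 6, 4, 5, 7, 6]   # product B, respondents asked about B only
b_both = [5, 5, 4, 6, 6, 5]   # product B, respondents asked about both

t1 = welch_t(a_only + a_both, b_only + b_both)  # 1. A vs B, all respondents
t2 = welch_t(a_only, a_both)                    # 2. A-only vs A-both
t3 = welch_t(b_only, b_both)                    # 3. B-only vs B-both
print(round(t1, 2), round(t2, 2), round(t3, 2))
```

Each statistic still has to be compared against a critical value (or converted to a p-value) before anything is called significant.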
0
 

Author Comment

by:Ed Matsuoka
ID: 40062312
Thanks for getting back to me so quickly. I guess what I am asking is, for the three situations:

1) 200 people have mean of 3.6 before speech and 4.1 after speech. Is this significant?
2) 150 people have mean of 3.7 before speech and 4.0 after speech. Is this significant?
3) 100 people have mean of 3.8 before speech and 4.2 after speech. Is this significant?

After all, the formula doesn't know where it came from or what its user is trying to discover. The formula doesn't change the numbers I have to test, does it? I guess, to use a perhaps silly analogy, does it matter to the gun if I am aiming for the bottle or the pumpkin?

Or are you saying I need one formula (t-test)

1) one formula: Where BEFORE and AFTER saw the speech
2) Two formulas: Where BEFORE and AFTER saw the speech AND Where BEFORE and AFTER didn't see the speech
3) Where BEFORE and AFTER didn't see the speech

And in situation 2, if both formulas say the mean is significant, it is. If only one formula or no formula says it is significant, then it isn't?

BTW, thanks for the PDF! I will hand a copy of it out to some of our people.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40062577
Here is a better write-up for the T-test.

In the cases you've given so far, I think you can do the necessary analysis by splitting things up into two groups and using the T-test (multiple times if necessary).  But you need to know more than the means of the data sets to do a comparison.  You also need to know the spreads or variances.

The T-test formula doesn't know or care what the numbers are or where they came from, but it can only answer questions that are posed correctly.
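To illustrate why the means alone are not enough, here is a small sketch that computes the unequal-variance (Welch) t statistic from summary figures. The means 3.6 and 4.1 come from the example earlier in the thread; the group size of 30 and the two standard deviations are assumptions made up for the illustration, and the groups are treated as independent.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic from means, standard deviations and sample sizes."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / se

# Same pair of means, same (assumed) group size, two assumed spreads.
t_tight = welch_t_from_summary(3.6, 0.5, 30, 4.1, 0.5, 30)  # answers cluster
t_wide = welch_t_from_summary(3.6, 1.5, 30, 4.1, 1.5, 30)   # answers scatter
print(round(t_tight, 2), round(t_wide, 2))
```

With groups of this size the two-sided 5% critical value is roughly 2, so the tight-spread case clears it and the wide-spread case does not, even though the means are identical in both.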

==============================================
Looking at your presidential poll question:
Are you intentionally polling before and after a planned speech?
Or are you conducting a general poll over the course of a week and major speech happens in the middle?
Do the people you talk to before and after the speech know that you will be in touch again after the first call?
Can you ask the after-people if they've seen the speech, since you can't ask the before-people that question?

It is really hard to answer these sort of questions in the abstract.
0
 

Author Comment

by:Ed Matsuoka
ID: 40062988
I used the president example because no one seemed to understand the first one I gave but in trying to be more specific I think I am getting you lost in the specifics. I am mainly trying to understand if when I run a means or proportion test in SPSS, Excel or some other program, it matters that the 1) same respondents answered both questions 2) Some respondents answered both and some answered only one of them and 3) different respondents answered the two questions. My thought was that it didn't matter WHAT the questions were about but it matters if the two questions were answered by the same/mixed/different people. Since I have all the respondents I can get the range, standard deviation, variance, kurtosis, etc. but want to know if the same formula should be used to test these three for significance. I upped the points because I see the answer is not as clear cut as I thought it would be. Thanks for your patience!
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40063127
You started out with two products, so let's work with that.
I will make all the assumptions I think I need to make.  You can let me know if I am mistaken.

You want to gauge the public perception of two products A and B.
You do your polling and come up with two sets of data.  
You do a T-test to see if the public perception of these products is the same or different.  Call this T1.  Assume they are different.

Then you notice that some respondents were asked about both products.  And some were only asked about one.
Now you don't care about the products anymore.  Now you are worried about the polling method.
Here the question is:  Does it make a difference if you ask a person about more than one product?
You do one T-test to compare the A-only data with the A+B data.  Call this T2.
You do a second T-test to compare the B-only data with the A+B data.  Call this T3.

If T2 and T3 show no significant difference, then maybe you are done.  
You can say, with some justification, that it doesn't matter if you ask a person about one or two products in the same poll.

But suppose T2 and/or T3 do show a significant difference.  Is there a problem with your protocol?
You could do some more tests with the data you have, but what is the question you want to answer?
Try this one:  If you ask one person about two products, does it make a difference which one is mentioned first?
Now you will only look at the data from the people who graded both products.  
Hopefully you have randomized and kept track of this level of detail.
So you do a T-test on rating of Product A in the A+B group versus the B+A group.  Call this T4.
Finally another T-test on Product B in the A+B group versus the B+A group.  Call this T5.
====================================================================================

It is probably much better to design the polling protocol rigorously in advance than to worry about how to fix things afterwards.
If you asked people to answer both questions, and they only answered one, maybe those results should be thrown out.
If you asked them to pick one and answer it, and they answer both anyway, consider throwing out those.
If you asked some people about one product or the other, and asked others about both, then maybe you should be thrown out.

If you know what you are trying to find out with a poll, and design a good protocol, you will know in advance what sort of tests you need to run on the data.

If you have a bunch of data dumped on you, and you're trying to mine it, then you have a different and harder task.
0
 

Author Comment

by:Ed Matsuoka
ID: 40063185
I keep trying to find a way to explain my problem. I hate to waste your time so let me try one last thing.

1) You ask 200 people if they like the president BEFORE his speech. You ask the same 200 people if they like the president AFTER a speech. What formula would give you a way of finding out if there was a significant difference in approval after he gives his speech? (SAME PEOPLE)

2) You ask 200 people if they like the president. Now you use the answer to a previous question to see that there are 150 coffee drinkers and 150 tea drinkers.  Obviously there will be people who drink both. You want to know if there is a significant difference between coffee and tea drinkers as far as liking the president. (SOME SAME/SOME DIFFERENT). What formula would you use?

3) You ask 150 men and 150 women if they like the president. What formula would test if there is a significant difference between the means/proportions of men vs. women? (DIFFERENT PEOPLE)

Note that the thing I'd like would be the formula/formulas to use. If it is the standard Student's t-test, I can get that from the Web, but I suspect it is not a Swiss Army knife that can be used for ALL tests of means and proportions. Ideally, I'd like to know if the formula(s) would be different if we assume or don't assume a standard population or a small/large population, but that might be pushing it! I also know that if there are more than two objects being compared you have to bring in ANOVA and/or the Marascuilo procedure, so I am limiting my question to two things.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40063562
Q1 and Q3 are straightforward, well posed questions.  The T-test is appropriate for the analysis.
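The thread does not say which form of the T-test is meant for each case. The t-test comes in a paired form (the same people measured twice, as in Q1) and an independent-samples form (different people in each group, as in Q3), and the two use different formulas. The sketch below assumes that reading; the ratings and function names are hypothetical.

```python
import math
from statistics import mean, variance

def paired_t(before, after):
    """Paired t statistic: uses each person's own before/after difference."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))

def welch_t(x, y):
    """Welch's t statistic: treats the two lists as unrelated groups."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se

# Hypothetical 5-point ratings, invented for illustration.
before = [3, 4, 2, 5, 3, 4]
after = [4, 4, 3, 5, 4, 5]

t_paired = paired_t(before, after)  # Q1: the same people answer twice
t_welch = welch_t(before, after)    # Q3: as if two different groups answered
print(round(t_paired, 2), round(t_welch, 2))
```

On these numbers the paired statistic is larger, because person-to-person variation cancels out of the differences. The degrees of freedom also differ between the two forms, and neither function directly covers the partially overlapping case in Q2.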

Q2 is trickier.  You have to decide exactly what you want to know.
You can use the T-test to compare:
  1.  Coffee drinkers vs non-coffee drinkers
  2.  Coffee only drinkers vs tea only drinkers
  3.  Caffeine drinkers vs caffeine abstainers

You could also use the T-test to do three pair wise comparisons:
    Coffee only  vs  tea only
    Coffee only  vs  coffee+tea
    Tea only vs coffee+tea
0
 

Author Comment

by:Ed Matsuoka
ID: 40064672
So if all I had were the means, standard deviations and variances for coffee drinkers and tea drinkers, you are saying I couldn't use them to come up with a significant difference? THAT is the argument I've had with others since I always felt you could; after all if 3.7 is significantly different from 4.2 (plugging these and the standard deviation/variance into one of a hundred programs on the web) it HAS to be significantly different if some of those people have the same answer, since the same answer CAN'T be significantly different from itself and should drag down any possible significance.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40065205
>>  So if all I had were the means, standard deviations and variances for coffee drinkers and tea drinkers,
       you are saying I couldn't use them to come up with a significant difference?

I am not saying that at all.  If you have the raw data [200 questionnaires with Coffee Y/N, Tea Y/N, President 0-10], then you can use the T-test to check any of the hypotheses I mentioned earlier or any other one you can come up with.  Whether a particular test shows significance or not depends on the actual data.
0
 

Author Comment

by:Ed Matsuoka
ID: 40067891
But you said if I had overlapping data I would need to run 3 t-tests (Coffee only  vs  tea only,   Coffee only  vs  coffee+tea,   Tea only vs coffee+tea) so I thought that meant I couldn't run just one.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40069033
In that same post I also listed three single tests you could run:

      1.  Coffee drinkers vs non-coffee drinkers
      2.  Coffee only drinkers vs tea only drinkers
      3.  Caffeine drinkers vs caffeine abstainers

You have to start with the question you want to answer, then see if you have the relevant data and a method of analyzing it.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40069036
You can also see what data you have and what questions you can answer with it.  This is less likely to be useful.
0
 
LVL 85

Expert Comment

by:ozo
ID: 40069165
Any significance test basically asks what is the probability that the observation could be the result of random variance in your sample, even in absence of any pattern in the population.
There may be different ways of defining the set of observations one is interested in, but testing more hypotheses increases the chance that one of them will accidentally appear significant.
http://xkcd.com/882/
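A small sketch of that last point, assuming the tests are independent and each uses a 5% significance level. The Bonferroni threshold shown is one common correction, not something proposed in the thread, and the function names are hypothetical.

```python
def familywise_error(alpha, k):
    """Chance that at least one of k independent tests gives a false positive."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha, k):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / k

alpha = 0.05
for k in (1, 3, 5):
    print(k, round(familywise_error(alpha, k), 4),
          round(bonferroni_threshold(alpha, k), 4))
```

With three tests the chance of at least one accidental "significant" result is already about 14%, which is worth keeping in mind before reading too much into any single test out of several.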
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40070428
Looking back at the original question (is this really two years old?) and all the comments.

I believe aburr gave the correct answer here first:   http:#a38330755

In summary, you realize there may be issues with the way the data was collected, but you assume it is safe to ignore them, at least initially, and use the T-test to analyze the data.

I reiterated his answer with more detail here:    http:#40063127
After you do the first analysis, you can use the T-test again to check your assumptions.
0
 

Author Comment

by:Ed Matsuoka
ID: 40075293
I like the answers you both gave but my problem is, as you phrased it, "If you have a bunch of data dumped on you, and you're trying to mine it, then you have a different and harder task." Is this harder task a different formula? Or are you both saying I HAVE to use three tests and then, as if this were a democratic election, say it is significant if two of the three results are significant? (Don't worry, I won't keep this merry go round going much longer!)
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 40075368
I think we are both saying that you start off by assuming that you have good data, and run whatever tests are appropriate.

But if you are aware of issues in the way the data was collected, then you may want to address them.  And one way to address them may be by running additional tests.

There is no special formula or prescription for dealing with bad data, because there are so many ways bias can creep into your data in real world practice.  Statistical analysis is as much art as science.  You need to figure out what you want to know, what data to collect, how to analyze it, and what can go wrong.

In your Product A vs Product B case:
     You notice that some people rated both products and some rated only one.  
     You are concerned that this may introduce a bias.
     So do the appropriate initial analysis (the T-test) and then look a little deeper
     if you can.
0
 

Author Comment

by:Ed Matsuoka
ID: 40075388
Hmm, I guess we really have either a philosophical or semantic discussion. If I want to know if there is a significant difference between coffee and tea drinkers in my example above, you are implying my data is "bad". So to get around this "bad" data, you are saying run three t-tests using the exact same formula, and if two of the three say there is a significant difference then there is, else there is not. I will do that for instances where I can't collect new data, as long as that's the statistically valid way to do it. If I remember my basic math (which at my age, I mostly don't) then the only difference between running three t-tests and running one is that the three would be using smaller sample sizes than the one. Right?
0
 
LVL 27

Assisted Solution

by:d-glitch
d-glitch earned 900 total points
ID: 40075814
I don't think there is a philosophical or semantic difference.  
I think the problems come from trying to answer poorly defined questions with hypothetical data which may have hypothetical flaws.

So what is the question?  And what is the problem?  
Potential questions and problems related to the Presidential survey:

Q:  What is the President's popularity?
P:  He gave a major speech while the poll was being conducted.

Q:  How effective was the President's speech?
P:  Some respondents answered only Before or After.  Some answered both.

Q:  How effective was the President's speech?
P:  Some of the respondents drink coffee and some drink tea.

Q:  Does drinking Coffee or Tea affect how people feel about the President?
P:  Some people drink both, and there was a major speech while the poll was going on.

If you start with a well formed question, and design the polling protocol carefully, and decide beforehand what tests you will need to run, then you won't have problems justifying your results.  If you don't do this, you are likely to have a harder time.

For example, this experiment might be well designed:
    1) You ask 200 people how they feel about the president BEFORE his speech [0 to 10].
        You ask the same 200 people how they feel about the president AFTER his speech [0 to 10].
        You decide beforehand to throw out anyone who didn't see the speech.
        You decide beforehand to analyze the data using the T-test.

Note that I am not saying that any particular data is "bad."  
In fact, I can't tell if the data is good or bad, or how good or how bad it is.
The point is that there is a way to analyze the data, even after it is taken, to see if there is any bias.
==========================================================

Your initial post for this question listed three ways to run a survey, and you realized, suggested, or worried that there were potential problems with the way they were conducted.

S1:   200 people asked about A and B
P1:   Is there a difference between A-B vs B-A.

S2:  75 people asked about A.  75 about B.  50 about A and B.
P2:  Is there a difference between A-B vs B-A.
       Is there a difference between asking about two products vs one.

S3:  100 people asked about A.  100 about B.  
P3:  The sample sizes may be smaller than you like.
==========================================================

S3 is the cleanest, if the sample size is large enough.

S1 is probably okay, as long as you make sure to balance the A-B vs B-A order.  
And you can do additional tests to see if the order biases the results.

S2 has the most complications, but you still do the main A vs B analysis, and look at the problems as well.
==========================================================

One way to look at this issue, even if you get the data dumped on you after the survey is done, is to ask how you wish it had been run to eliminate all the potential biases you can think of.  Then you can try to think of tests you can run on the data you have to see if any of those biases have actually crept in.
0
 

Author Comment

by:Ed Matsuoka
ID: 40076011
Appreciate all the thought you have put into this so I won't take up much more of your time. But I do not understand what you mean by bias. After all, isn't bias a non-statistical way of saying significance? If the President's speech is good, doesn't that bias the AFTER results? And if I use a really good brewing method for the coffee and a really bad brewing method for the tea, am I not biasing the results? My question is, understanding there is bias, can't I use a test (or tests) to show that the bias is significant? That if the rating for liking the coffee is 3.6 and the rating for tea is 4.5, can't I say MY SAMPLE likes tea significantly more than they like coffee or are you saying because there is what you call bias I can't make any statistical statements about the data? That is the only place I think we are at an impasse. If I have a mean of one million coffee drinkers and one million tea drinkers, my common sense says that even if two hundred thousand of them drink both, if the rating for liking the coffee is 3.6 and the rating for tea is 4.5 THERE IS A SIGNIFICANT DIFFERENCE between them.  It seems at some point the overlap doesn't matter.
0
 
LVL 27

Accepted Solution

by:d-glitch
d-glitch earned 900 total points
ID: 40076155
>> Isn't bias a non-statistical way of saying significance?
Not exactly, but close enough for the moment.

>> If the President's speech is good, doesn't that bias the AFTER results?
What is the question?  Are you trying to determine presidential popularity or speech effectiveness?
A good speech may or may not affect the president's popularity.  You actually have to run the test to find out.

>> And if I use a really good brewing method for the coffee and a really bad brewing method for the tea, am I not biasing the results?
This would be a horrible way to run a taste test, but I thought you were conducting a poll.
You have to describe one survey/experiment completely, then dig into the details.

>>  My question is, understanding there is bias, can't I use a test (or tests) to show that the bias is significant?
You may think or worry that there is a bias, but you can't tell for sure until you run the test.

>> If the rating for liking the coffee is 3.6 and the rating for tea is 4.5, can't I say MY SAMPLE likes tea significantly more than they like coffee?
Even if all you are asking your sample about is coffee vs tea, you actually have to do the test before you say anything about significance.

In an earlier post I said "Here is a better write-up for the T-test" and forgot to put in the link:
     http://www.socialresearchmethods.net/kb/stat_t.php

Even with very large samples, you can have a large difference in means that is not significant.  You can't look at just the two means and say anything about significance.  You have to actually do the calculations.
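As a rough illustration of why the two means alone are not enough, here is a minimal sketch that computes Welch's t statistic from summary statistics. The standard deviation of 3.0 and the sample sizes are made-up assumptions, not values from the survey, and `welch_t` is a hypothetical helper name. Only the means 3.6 and 4.5 come from the question.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for two independent samples, from summary statistics."""
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_b - mean_a) / standard_error

# Same means in both cases (coffee 3.6, tea 4.5).
# Assumed standard deviation of 3.0 in each group (hypothetical).
t_small = welch_t(3.6, 3.0, 20, 4.5, 3.0, 20)      # 20 respondents per group
t_large = welch_t(3.6, 3.0, 1000, 4.5, 3.0, 1000)  # 1000 respondents per group

print(round(t_small, 2), round(t_large, 2))
```

Under these assumptions the small-sample t is below the usual cutoff of roughly 2 and the large-sample t is well above it, even though the means are identical in both runs. The spread and the sample sizes have to go into the calculation.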
==================================================================================

>>  If I have a mean of one million coffee drinkers and one million tea drinkers, my common sense says that even if two hundred thousand of them drink both, if the rating for liking the coffee is 3.6 and the rating for tea is 4.5 THERE IS A SIGNIFICANT DIFFERENCE between them.

This is really where we disagree:
If you have the data, I think it would be much better to do the T-test calculations than to rely on common sense.

Here again, you have collected and dumped the data without specifying the problem carefully.

If you have that much data, you can include all the people that drink both coffee and tea.
Or you can throw out all the people that drink both.
Or you can look at only the people that drink both.
And you should probably do all three of these tests and more.
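The three splits above could be sketched as follows. This is only an illustration: the ratings are invented, and the choice of test per split is my assumption (Welch's t for the non-overlapping groups, a paired t for the people who rated both). The pooled comparison is shown for completeness, but it treats the overlapping respondents as if they were independent, which they are not.

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se

def paired_t(a, b):
    """Paired t statistic: a[i] and b[i] come from the same respondent."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical ratings on a 10 point scale.
coffee_only = [3, 4, 2, 5, 4, 3]   # rated coffee only
tea_only = [5, 4, 6, 5, 3, 5]      # rated tea only
both_coffee = [3, 5, 4, 2, 4]      # coffee ratings from people who rated both
both_tea = [4, 6, 4, 4, 5]         # tea ratings from the same people, same order

# 1) Throw out everyone who rated both: two independent samples.
t_exclusive = welch_t(coffee_only, tea_only)

# 2) Look only at the people who rated both: paired comparison.
t_overlap = paired_t(both_coffee, both_tea)

# 3) Include everyone: ignores the dependence in the overlap, so read with caution.
t_pooled = welch_t(coffee_only + both_coffee, tea_only + both_tea)

print(t_exclusive, t_overlap, t_pooled)
```

Each statistic still has to be compared against the t distribution with the appropriate degrees of freedom before anything is said about significance.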

Author Comment

by:Ed Matsuoka
ID: 40076316
Okay. So I should probably do all three and use the 2 of 3 method of stating significance. That is one of the most frustrating parts of statistics for a layman like me, that you can run stats that shouldn't be run. I have run many a means test in SPSS where there was such an overlap and I am pretty sure it didn't run three tests but only one. And there are probably many other stats in SPSS and Excel where the software could know that a test shouldn't be run but runs it anyway. Oh well, thanks for all your hard work!

Author Closing Comment

by:Ed Matsuoka
ID: 40076344
I am still a little puzzled because if I use a "democratic" method of assigning significance, since those who answered both CAN'T be significant, I am left with the possibility of a tie. Then do I say there is a significant difference if the one case is 99% significant and the other is 94% not, or is it only significant if both cases are significant? Still, the expert did try to pound into my head the importance of context.