
Should I do a chi-square analysis with cross-tabulation?

Two questions in my survey:
One question asks companies whether they will join a portal.
Another question asks companies what size they are in terms of staff strength.

From a cross-tabulation, I find that both small and accelerating-growth companies are willing to join.
Initially, I thought only small companies would join.

Question: should I continue with a chi-square test?
Or is the cross-tab enough to prove my hypothesis?

How can I do a chi-square test when I'm not sure what the expected values are?

ASKER CERTIFIED SOLUTION

d-glitch


spiral

ASKER

OK, but what if I ask them

whether they are startups, moderate growth, declining... (which is several groups, NOT two)?

It should be OK to do a chi-square test then, right?

I am trying to standardise the method for analysing my survey questions. I have 60+ data points, so plotting a graph for every question would be very tedious and take too long for analysing just one question.

JJayJay

There are two reasons to perform a chi-squared-test.

1. To show, at a chosen level of uncertainty, that your data isn't just random noise, i.e. that there is an association between company size and willingness to join (test for independence).

2. To test, at a chosen level of uncertainty, a hypothesised formula for how company size and willingness to join vary together.

I can only tell you about the first. It's fairly simple.

Here the expected value for any given cell in your crosstab is equal to the product of that cell's row and column marginal counts divided by the total count.
You'll have to put business size into as many groupings as you deem appropriate.

Once you have the expected values, you can calculate the chi-squared value for each cell as (observed - expected)^2 / expected, and their sum is the chi-squared value of the crosstab.
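The two steps above (expected values from the marginals, then a per-cell chi-squared value) can be sketched in a few lines of Python. This is only an illustration: the function names and the toy counts are made up for the example and do not come from the survey.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Made-up counts, for illustration only: rows = size group, columns = join yes/no.
toy_table = [
    [20, 10],
    [10, 20],
]

print(expected_counts(toy_table))
print(chi_squared(toy_table))
```

Note that a row or column whose total is 0 gives an expected value of 0 and a division by zero, so such rows have to be dropped or merged first.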

Basically, if you don't do a chi-squared test for independence, you can't show that your apparently convincing data isn't just the result of an unlucky (and/or too small) draw.

To be even more statistically correct, you should start with a null hypothesis saying that there's no association. If you can show that data at least as extreme as yours would turn up with a probability of less than 5% (or 2.5% or 1% or whatever) when that hypothesis is true, you can reject it in favour of the alternative hypothesis that willingness to join depends on the type of company (in your case, that small and accelerating-growth companies are the ones willing to join).

It would be statistically incorrect, AFAIK, to take the correlation coefficient between a nominal and a scale variable.

So, to sum it all up, yes, you should do a Chi-squared test for independence at the very least.

spiral

ASKER

Some of the cells in my cross-tab are 0 and three are 1. Can I still do a chi-square test? I got 63 responses.

Can someone point me to a good resource that explains regression analysis and linear regression? I'm trying to decide which is the best approach here.

JJayJay

How many companies did you ask?

IIRC yes, you can do a chi-square test with those numbers, but you'll have to set the significance level pretty high in order to accept your alternative hypothesis.

One might argue that _because_ your numbers are so small, it would be necessary to prove that your results aren't just random.

I don't know much about regression, maybe you should try this site:
http://hyperphysics.phy-astr.gsu.edu/hbase/hframe.html

JJayJay

You could also just create larger groups with regard to business size; that would put more companies in each cell.

spiral

ASKER

I got 63 responses, but few answered that their company is declining (not making business), which leaves some cells at 0. What do you mean by larger groups with regard to business size? Can you give an example?

JJayJay

Say you have a crosstab of "size" by "willingness to join".

You create a crosstab with 2 columns (will join: yes/no)
and 8 rows, for 8 different size categories for the companies.

If you were to group the 8 size categories into 4, you would have more responses in each cell.
You should group them where responses are thin. For example, if there are few responses in size groups 1-3 and 7-8, you should merge those, which would give you 5 groups:
"1-3"
"4"
"5"
"6"
"7-8"

If this doesn't make much sense, could you please explain your data, goal and problems in more detail.
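The regrouping of size categories 1-3 and 7-8 described above might look like this in Python. The helper name `merge_rows` and all the counts are hypothetical, chosen only to show thin groups being pooled.

```python
def merge_rows(table, groups):
    """Merge rows of a crosstab.

    table  : dict mapping a row label to its list of counts
    groups : dict mapping a new label to the list of old labels it absorbs
    """
    merged = {}
    for new_label, old_labels in groups.items():
        rows = [table[label] for label in old_labels]
        # Add the counts column by column.
        merged[new_label] = [sum(col) for col in zip(*rows)]
    return merged


# Hypothetical counts (yes, no) for 8 size groups; not from any real survey.
counts = {
    "1": [1, 0], "2": [2, 1], "3": [1, 2], "4": [9, 6],
    "5": [8, 7], "6": [10, 5], "7": [2, 1], "8": [0, 2],
}

grouping = {
    "1-3": ["1", "2", "3"],
    "4": ["4"],
    "5": ["5"],
    "6": ["6"],
    "7-8": ["7", "8"],
}

for label, row in merge_rows(counts, grouping).items():
    print(label, row)
```

Merging only adds counts together, so the column totals and the grand total stay the same; what changes is the number of rows and therefore the degrees of freedom.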

spiral

ASKER

OK, my companies are grouped according to stage:
Startup,
Accelerating Growth,
Moderate Growth,
Mature,
Declining.

Whether they like to join the website:
Yes,
No,
Dunno

So I got one cross-tab:

Growth stage          Yes   No   Dunno
Startup                10    6       4
Accelerating Growth     4    3       4
Moderate Growth        13    6       1
Mature                  7    3       1
Declining               0    0       1
--------------------------------------
Total                  34   18      11

spiral

ASKER

I'd like to prove that start-ups are the most likely to join, but from the cross-tab that seems only half-correct.

JJayJay

Right, so that gives us:

                      Yes   No   Marginal
Startup                10    6         16
Accelerating growth     4    3          7
Moderate growth        13    6         19
Mature                  7    3         10
Declining               0    0          0
Marginal               34   18         52

The chi-value is 0,047235736

Which means that there is a dependence between growth rate and willingness to join.

I'll get back to you later

JJayJay

I miscalculated before, here's the result:

Count
                      Yes      No   Total
Startup                10       6      16
Accelerating growth     4       3       7
Moderate growth        13       6      19
Mature                  7       3      10
Declining               0       0       0
Total                  34      18      52

Expected
                      Yes      No   Total
Startup             10,46    5,54      16
Accelerating growth  4,58    2,42       7
Moderate growth     12,42    6,58      19
Mature               6,54    3,46      10
Declining               0       0       0
Total                  34      18      52

Chi-squared
                      Yes      No
Startup              0,02    0,04
Accelerating growth  0,07    0,14
Moderate growth      0,03    0,05
Mature               0,03    0,06
Declining

Sum: 0,44 <------ This is chi-squared
For a 5% significance level the critical value with 3 degrees of freedom (I excluded the declining growth) is 7,81
Thus your data is significant and not just random.

As you said, the data doesn't meet up to your expectations.

Theory: Maybe accelerating-growth companies already have many offers like yours and are thus not so ready to join, while startup companies and moderate-growth companies will take anything they can get?

spiral

ASKER

Question: how do you get the critical value of 7.81? Is 0.44 rounded up to the 5% significance level?

According to the guide, chi-square has some requirements, and my data has zeros and ones, so it doesn't fulfil them. But is it OK if I leave out that data and do a chi-square test (like you did)?

The sample must be randomly drawn from the population.
Data must be reported in raw frequencies (not percentages);
Measured variables must be independent;
Values/categories on independent and dependent variables must be mutually exclusive and exhaustive;
Observed frequencies cannot be too small.
1) As with any test of statistical significance, your data must be from a random sample of the population to which you wish to generalize your claims.

2) You should only use chi square when your data are in the form of raw frequency counts of things in two or more mutually exclusive and exhaustive categories. As discussed above, converting raw frequencies into percentages standardizes cell frequencies as if there were 100 subjects/observations in each category of the independent variable for comparability. Part of the chi square mathematical procedure accomplishes this standardizing, so computing the chi square of percentages would amount to standardizing an already standardized measurement.

3) Any observation must fall into only one category or value on each variable. In our footwear example, our data are counts of male versus female undergraduates expressing a preference for five different categories of footwear. Each observation/subject is counted only once, as either male or female (an exhaustive typology of biological sex) and as preferring sandals, sneakers, leather shoes, boots, or other kinds of footwear. For some variables, no 'other' category may be needed, but often 'other' ensures that the variable has been exhaustively categorized. (For some kinds of analysis, you may need to include an "uncodable" category.) In any case, you must include the results for the whole sample.

4) Furthermore, you should use chi square only when observations are independent: i.e., no category or response is dependent upon or influenced by another. (In linguistics, often this rule is fudged a bit. For example, if we have one dependent variable/column for linguistic feature X and another column for number of words spoken or written (where the rows correspond to individual speakers/texts or groups of speakers/texts which are being compared), there is clearly some relation between the frequency of feature X in a text and the number of words in a text, but it is a distant, not immediate dependency.)

5) Chi-square is an approximate test of the probability of getting the frequencies you've actually observed if the null hypothesis were true. It's based on the expectation that within any category, sample frequencies are normally distributed about the expected population value. Since (logically) frequencies cannot be negative, the distribution cannot be normal when expected population values are close to zero--since the sample frequencies cannot be much below the expected frequency while they can be much above it (an asymmetric/non-normal distribution). So, when expected frequencies are large, there is no problem with the assumption of normal distribution, but the smaller the expected frequencies, the less valid are the results of the chi-square test. We'll discuss expected frequencies in greater detail later, but for now remember that expected frequencies are derived from observed frequencies. Therefore, if you have cells in your bivariate table which show very low raw observed frequencies (5 or below), your expected frequencies may also be too low for chi square to be appropriately used. In addition, because some of the mathematical formulas used in chi square use division, no cell in your table can have an observed raw frequency of 0.
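To see how the full cross-tab from this thread fares against that last requirement, the expected count of every cell can be computed and compared with a cut-off. The sketch below assumes a cut-off of 5 on the expected counts, a common rule of thumb; the guide itself only speaks of "too small", so treat `THRESHOLD` as an assumption.

```python
# Full cross-tab from the thread, including the "Dunno" column and "Declining" row.
stages = ["Startup", "Accelerating Growth", "Moderate Growth", "Mature", "Declining"]
answers = ["Yes", "No", "Dunno"]
observed = [
    [10, 6, 4],
    [4, 3, 4],
    [13, 6, 1],
    [7, 3, 1],
    [0, 0, 1],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

THRESHOLD = 5  # assumed rule-of-thumb cut-off; other sources use other values

low_cells = []
for stage, r in zip(stages, row_totals):
    for answer, c in zip(answers, col_totals):
        expected = r * c / n  # row total * column total / grand total
        if expected < THRESHOLD:
            low_cells.append((stage, answer, round(expected, 2)))

for cell in low_cells:
    print(cell)
print(len(low_cells), "of", len(stages) * len(answers),
      "cells have an expected count below", THRESHOLD)
```

A large share of low cells is usually a sign that categories should be merged or dropped before running the test.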

spiral

ASKER

d-glitch: are we wrong here? please discuss this issue.

spiral

ASKER

please discuss this issue so i can award the points to whoever can enlighten me. Thanks. It is urgent.

JJayJay

Basically the only one that's potentially problematic is the last one, as the expected frequencies are as low as 2,42, 3,46 and 4,58.
If you feel this is a serious problem, you should not do the test.

Actually, I think I goofed up again :(

You see, what we are testing is how likely data like ours would be if our hypothesis (that the variables are independent) were true. The 5% significance level is a decision rule: if that probability is 5% or more, we keep the hypothesis; otherwise we reject it.
That is, if our number is larger than the critical value, we conclude that the two variables are dependent.
If our number is smaller than the critical value, we cannot reject the hypothesis that they're independent.

With a chi-square value of only 0,44 we cannot reject the hypothesis that the two variables are independent.
Contrary to the example in the guide our value is less than the critical:

>Table 1's chi square value of 14.026, with 4 degrees of freedom, handily clears the related
>critical value of 9.49, so we can reject the null hypothesis and affirm the claim that male
>and female undergraduates at University of X differ in their (self-reported) footwear
>preferences.

This unfortunately means that you can't conclude anything much from your data.
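For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the chi-squared value from the Yes/No counts in this thread and compares it with the critical value of 7,81 quoted earlier. The p-value helper uses a closed form that only holds for exactly 3 degrees of freedom; the variable and function names are made up for the example.

```python
import math

# Yes/No counts from the thread; the "Declining" row is left out (no Yes/No answers).
observed = [
    [10, 6],   # Startup
    [4, 3],    # Accelerating growth
    [13, 6],   # Moderate growth
    [7, 3],    # Mature
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / n  # expected count for this cell
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)

CRITICAL_5_PERCENT_DF3 = 7.81  # value quoted in the thread for 3 degrees of freedom


def p_value_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


reject_independence = chi2 > CRITICAL_5_PERCENT_DF3
print(f"chi2 = {chi2:.2f}, df = {df}")
print("reject independence at 5%:", reject_independence)
print(f"p-value = {p_value_df3(chi2):.2f}")
```

If SciPy is installed, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts in one call and works for any table size.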

Correct me if I'm wrong d_glitch

spiral

ASKER

so i should or should not stick to chi-test?

JJayJay

Well... it depends.

The chi-square test tells you that you can't draw conclusions from your survey, because the results might as well be random.
Of course, there's no real way to tell whether this is due to the actual results or the low numbers.

I would say that you should do a chi-square test, and make sure you have a larger sample next time.
Also, if you include the "dunno" category in the chi-square test, the result might be different.
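A quick way to try that is to rerun the calculation with the "dunno" column kept in. The sketch below uses the counts posted in this thread and drops only the "Declining" row; the critical value of 12.59 is the standard table value for 6 degrees of freedom at the 5% level. Several expected counts in this table are small, so the result should be read with caution.

```python
# Counts from the thread with the "dunno" column kept and the "Declining" row dropped.
observed = {
    "Startup": [10, 6, 4],
    "Accelerating growth": [4, 3, 4],
    "Moderate growth": [13, 6, 1],
    "Mature": [7, 3, 1],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Each cell contributes (observed - expected)^2 / expected, which can never be negative.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(rows, row_totals)
]
chi2 = sum(sum(line) for line in contributions)
df = (len(rows) - 1) * (len(col_totals) - 1)

CRITICAL_5_PERCENT_DF6 = 12.59  # standard table value for 6 degrees of freedom

print(f"chi2 = {chi2:.2f}, df = {df}")
print("exceeds the 5% critical value:", chi2 > CRITICAL_5_PERCENT_DF6)
```

Because every contribution is a squared difference divided by a positive expected count, a negative cell value or a negative sum points to an arithmetic slip rather than to the data.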

But yes, it is proper research practice to test for the significance of your findings.

Do you know how to do a confidence interval?

spiral

ASKER

Hmm... no, I'm not very sure about the confidence interval.
I did some calculations, but if I include the dunno column I get negative results.
The rule says I cannot include cells with 1... so what do I do?

Also, I'm not very sure about the linear regression that d_glitch talked about. I'd like to finish the analysis of my survey even though my data is not 100% perfect. So can more people join in this discussion?

spiral

ASKER

JayJay : are u there?

JJayJay

Yes I'm still here :)

I still don't know how to do a linear regression.

A confidence interval tells you the range within which the mean is likely to lie, but I'm betting the interval will span from mature to startup.
Show me your calculations and I'll take a look; there are a lot of ways you could do a confidence interval with that table.
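As one of those many ways, and purely as an assumption about what might be useful here, an interval can be put around the share of "yes" answers within each growth stage rather than around a mean. The sketch below uses the Wilson score interval, picked because it tends to behave better than the plain normal approximation for small samples; the function name and the choice of z = 1.96 are illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# (yes answers, all answers) per stage, taken from the cross-tab in the thread.
yes_counts = {
    "Startup": (10, 20),
    "Accelerating Growth": (4, 11),
    "Moderate Growth": (13, 20),
    "Mature": (7, 11),
}

for stage, (yes, total) in yes_counts.items():
    low, high = wilson_interval(yes, total)
    print(f"{stage}: {yes}/{total} yes, interval {low:.2f} to {high:.2f}")
```

With 11 to 20 companies per stage the intervals come out wide, which fits the expectation that they will overlap heavily.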

Most likely, given the result of the chi-square analysis, any statistical testing you do will give you bad results (again, I don't know about the linear regression though).

I suggest sticking to describing the data instead of doing all sorts of statistics that would discredit your results.

spiral

ASKER

                        Use Dynamic Portal?
Stage of Development    Yes    No    Don't know     N
Startup                  10     6             4    20
Accelerating Growth       4     3             4    11
Moderate Growth          13     6             1    20
Mature Growth             7     3             1    11
Declining                 0     0             1     1
Total                    34    18            11    63

Hmm, what's the use of finding the mean?

SOLUTION

JJayJay


spiral

ASKER

sighed....so the confidence level isn't of much help?

JJayJay

I doubt any statistics will do anything but deface your results.