bs329 (United States of America) asked:

Determining the most influential independent variables on the dependent variable

I have about twenty independent variables and one dependent variable. How do I determine which independent variables are most "influential" on the dependent variable? Eventually I'd like to run a regression analysis of the dependent variable on the most influential independent variables to come up with an equation relating them, but first I need to narrow the list down.

Any websites, tools, etc. would be great. I have all my data in Excel, too.
richdiesal

You actually narrow it down with regression itself, specifically multiple regression. You can compare beta weights (standardized regression coefficients) to determine the most important predictors in your model by entering all of your IVs simultaneously (you might call this a fully saturated model). Whichever beta is largest in absolute value belongs to the best predictor in your dataset. But be aware that if you have substantial intercorrelations among your predictors, this may be misleading, as one predictor may drown out the beta of another it is correlated with. This approach also capitalizes heavily on chance, but without any underlying theory to guide which predictors you choose, it's basically the only way to proceed.
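To make the simultaneous-entry idea concrete, here is a minimal sketch in Python with NumPy (neither is mentioned in the thread; SPSS labels the same quantity "Standardized Coefficients" in its output). It z-scores every variable, fits all predictors at once by least squares, and ranks them by the absolute size of their standardized betas. The data are simulated and the names x1 to x3 are hypothetical.

```python
import numpy as np

def standardized_betas(X, y):
    """Enter every column of X at once and return standardized (beta) weights,
    i.e. least-squares slopes after z-scoring X and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # Centered data need no intercept column: the fitted intercept would be 0.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas

# Simulated example: x1 matters most, x2 somewhat, x3 is pure noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
names = ["x1", "x2", "x3"]

b = standardized_betas(X, y)
order = np.argsort(-np.abs(b))          # largest |beta| first
for i in order:
    print(f"{names[i]}: beta = {b[i]:+.3f}")
```

Here the predictors are independent by construction; with intercorrelated predictors the ranking can shift, which is exactly the caveat above.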

This SPSS-based regression guide might help: http://www.ats.ucla.edu/stat/SPSS/webbooks/reg/chapter1/spssreg1.htm
It also occurred to me you might be looking for free stats software.  OpenStat is pretty good (it mimics SPSS to some degree) and is reasonably user-friendly: http://www.statpages.org/miller/openstat/

R is much more powerful and also free, but a statistics non-expert is very unlikely to understand how it works: http://www.r-project.org/
ASKER CERTIFIED SOLUTION
Cory Vandenberg

Rich,

Good comments, but as with any analysis, business knowledge will weigh into any decision, and it is usually more important. The question here seems to be asking for a group of the most predictive independent variables. Obviously, if business knowledge says certain variables "need" to be in the model, then the analyst should hand-pick which variables to add. Since the author doesn't seem sure which variables should be used, I suggested stepwise regression as a valid way of determining a model. Your points are valid, but I don't think it's necessarily a misleading answer, and it isn't just a way to find the single most influential predictor: if the top predictors aren't highly correlated with each other, they will all still come into the model. Your example, while valid, isn't necessarily what will happen with the top predictors. Also, if there are a number of highly correlated variables, the model becomes overly complicated with redundant information. Simplicity and predictive power usually trade off against each other, and this is where the "art" of building models definitely comes into play.

Anyway, I was just suggesting an option for the author. If they really just want to see the predictive power of each independent variable on the dependent variable, regardless of how the independent variables interact, then yes, a fully saturated regression model will give p-values for all the variables, and the analyst can then "cherry-pick" variables based on that information and their business knowledge.
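As a rough sketch of the fully saturated model described above, the following Python/NumPy code fits every predictor at once and returns a p-value for each coefficient. One assumption to flag: to stay dependency-free it approximates the t distribution with the standard normal, which is only reasonable with large samples; for small samples, a proper t-based p-value (for example from scipy.stats.t) is the safer choice. The data and names are simulated and hypothetical.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Fit y = b0 + X b with all predictors entered at once.
    Returns (coefficients, two-sided p-values), intercept first.
    ASSUMPTION: p-values use a normal approximation to the t distribution,
    so they are only trustworthy for large n."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])        # intercept column + predictors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    df = n - A.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / df                  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t = coef / se
    # Two-sided p-value under N(0, 1): P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return coef, p

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)   # column 3 is noise
coef, p = ols_pvalues(X, y)
for name, c, pv in zip(["const", "x1", "x2", "x3"], coef, p):
    print(f"{name}: coef = {c:+.3f}, p ~ {pv:.3g}")
```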

WC
bs329

ASKER

Hello Rich and WarCrimes,

Thank you for the depth of information. I am looking for a group of predictors, and given my limited knowledge of regression analysis, I have been taking the R-squared value of each individual predictor against the response to find which predictors were "substantial". I see there are much better and more accurate ways to determine influential predictors and collinearity. Thanks, I'm going to sift through the info.
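The one-R-squared-per-predictor screening described here can be sketched in Python/NumPy as below. For a single predictor with an intercept, R-squared equals the squared Pearson correlation. The simulated data deliberately include x2, a near-copy of x1 (a hypothetical setup), to show the weakness the replies point out: both score high individually, even though x2 adds almost nothing once x1 is in the model.

```python
import numpy as np

def single_predictor_r2(X, y):
    """R^2 of y regressed on each column of X separately (with an intercept).
    For a one-predictor model this equals the squared Pearson correlation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(X.shape[1])])

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)     # nearly a copy of x1
x3 = rng.normal(size=n)                # unrelated to y
y = 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

r2 = single_predictor_r2(X, y)
for name, value in zip(["x1", "x2", "x3"], r2):
    print(f"{name}: R^2 = {value:.3f}")
```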
You're certainly correct - I just took a different interpretation on the asker wanting to "know a group of most predictive independent variables."  The key is if he wants to know the variables most predictive in his sample or in some unspecified population that sample represents.  If the former, stepwise regression alone with little attention to interpretation would be sufficient.  If the latter, some finessing may be required, and this is where the risk of interpretive error increases.  Frankly, we don't even know if the poster is trying to specify a model at all.  In my experience with amateur statisticians, if you hand them an analysis and say "use this," they will rarely pay much attention to the details.

Also, for the record, I didn't say stepwise was only a way to find the single most influential predictor, just that the highest beta weight from a simultaneously entered set of predictors and the first variable a forward-stepping stepwise regression pulled out would be the same variable when the predictors are uncorrelated, based on the way each is calculated.
bs329 - That sounds like a good plan. :) Looking at the R^2 values does give you a good indication of how strongly each predictor is related to your criterion individually, and if there is no intercorrelation among the predictors, that's a fine approach. But if your predictors intercorrelate, that's when the contingencies we both described begin to apply.
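One standard way to check for the predictor intercorrelation mentioned here is the variance inflation factor (VIF); it is not named in the thread, so treat it as an added suggestion. The minimal NumPy sketch below regresses each predictor on all the others and computes VIF = 1 / (1 - R^2). A common rule of thumb treats values above roughly 5 to 10 as a warning sign, but that threshold is a convention, not a hard rule. The data are simulated.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2_j), where R^2_j comes from
    regressing column j on all the other columns plus an intercept.
    Values near 1 mean little intercorrelation; large values flag redundancy."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)     # strongly correlated with x1
x3 = rng.normal(size=n)                # independent of both
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vif)
```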

Let us know what you find!
richdiesal said:
In my experience with amateur statisticians, if you hand them an analysis and say "use this," they will rarely pay much attention to the details.

--------

Isn't that the truth.  ;)  

bs329

ASKER

Hello guys, I read through the material and some other references I found. Every week I work with a large set of data in Excel (thousands of rows, tens of regressors, and one response variable). Is there a tool out there that can take all this data and run it through stepwise regression? I'd like to find an add-in that automates all of this given some alpha cutoff. Then, using the model that results from the stepwise regression, I want to compare each observed response value to the model's predicted value (obtained by plugging that row's predictors into the model).
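For readers who want to see what such an automated run does, here is a minimal sketch of p-value-based stepwise selection in Python/NumPy, assuming the Excel sheet has been exported to a numeric array X (one column per regressor) and a vector y. The parameters alpha_in and alpha_out play the role of the entry and removal cutoffs; the function names are hypothetical, and the p-values use a large-sample normal approximation. This is not SPSS's implementation, and its choices may differ from SPSS's in borderline cases.

```python
import math
import numpy as np

def _pvalues(A, y):
    """Two-sided p-values for the OLS coefficients of y on design matrix A.
    ASSUMPTION: normal approximation to the t distribution (large n only)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return [math.erfc(abs(c / s) / math.sqrt(2)) for c, s in zip(coef, se)]

def stepwise_select(X, y, alpha_in=0.05, alpha_out=0.10, max_steps=100):
    """Bidirectional stepwise selection by p-value.
    Each pass adds the candidate with the smallest p-value if it is below
    alpha_in, then drops the selected predictor with the largest p-value if
    it exceeds alpha_out. Returns the selected column indices in entry order."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Entry step: test each unselected column alongside the current set.
        best_p, best_j = 1.0, None
        for j in range(k):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            p_new = _pvalues(A, y)[-1]          # p-value of the candidate
            if p_new < best_p:
                best_p, best_j = p_new, j
        if best_j is not None and best_p < alpha_in:
            selected.append(best_j)
            changed = True
        # Removal step: drop the weakest selected predictor if it no longer qualifies.
        if selected:
            A = np.column_stack([np.ones(n), X[:, selected]])
            p_sel = _pvalues(A, y)[1:]           # skip the intercept
            worst = int(np.argmax(p_sel))
            if p_sel[worst] > alpha_out:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected

# Simulated data: only columns 0, 3 and 5 actually drive y.
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 5] + rng.normal(size=n)
chosen = stepwise_select(X, y, alpha_in=0.05, alpha_out=0.10)
print("selected columns:", chosen)
```

Setting alpha_out above alpha_in keeps a variable from being added and removed in an endless loop; max_steps is an extra safety cap. With noise-only columns, an alpha of 0.05 still lets an irrelevant predictor in now and then, which is the "capitalizing on chance" risk raised earlier in the thread.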

thanks,
I'm not aware of any fully automated tool that does this. You could potentially program your own scripts to do what you want (SPSS, for example, can run Python scripts), but that might take longer than just running the analysis yourself.

The middle ground is scripting within the stats software itself. If you run your stepwise regression through the SPSS dialogs, for example, you can hit the "Paste" button to automatically generate code (called syntax, in SPSS-speak) that recreates your analysis. You can then rerun that syntax later without going through the interface, which I believe meets your "add-in where I can automate all this given some alpha" criterion.

I'm not 100% sure what you mean by "compare the response variable to the model value", but if you're trying to compute residuals to measure how accurately your final regression predicts each case, SPSS can do that automatically as well (including from syntax).
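Comparing each observed response to the model's prediction amounts to computing residuals (observed minus predicted). A minimal Python/NumPy sketch, assuming the chosen column indices come from an earlier selection step; the list and data below are hypothetical:

```python
import numpy as np

def fit_predict_residuals(X, y, selected):
    """Fit OLS of y on the chosen columns of X (plus an intercept) and return
    (coefficients, predicted values, residuals = observed - predicted)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    residuals = y - predicted
    return coef, predicted, residuals

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2]   # exact relationship, no noise
selected = [0, 2]                          # hypothetical output of a selection step
coef, predicted, residuals = fit_predict_residuals(X, y, selected)
print("coefficients (intercept first):", np.round(coef, 6))
```

With an intercept in the model, the residuals always sum to (numerically) zero, so it is their spread and any individual large values that say how well the model fits each case.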
bs329

ASKER

Sorry for the late grade. Thanks for the help. I'm going to implement stepwise regression and use SPSS to compute residuals.