I have a SAS table and need to run a logistic regression. Two variables divide my population into 20 categories. Two of these categories are far larger than the others and together hold more than 80% of the observations. I am thinking of sampling those two categories down to 10% of their original size. Assuming I can draw a good sample, how do I account for this sampling (for example, with weights) in PROC LOGISTIC? I want to model the likelihood of an observation being A, B, C or D, as defined by the output Variable B. The share of observations in each category is:
                 Variable B
Variable A     A      B      C      D
0             70%    12%    1%     1%
1              1%     1%    1%     1%
2              1%     1%    1%     1%
3              1%     1%    1%     1%
4              1%     1%    1%     1%
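
To make the idea concrete, here is a minimal sketch of what I have in mind, not a tested solution. The data set name work.mydata and the variable names varA, varB and sampwt are placeholders I made up for illustration. It keeps roughly 10% of the two large cells (Variable A = 0 with Variable B = A or B), gives the kept rows a weight of 10 (that is, 1 / 0.10), leaves every other row at weight 1, and passes that weight to PROC LOGISTIC with a generalized logit link because the outcome has four unordered levels.

```sas
/* Sketch only: work.mydata, varA, varB and sampwt are hypothetical names. */
data work.sample;
  set work.mydata;
  if varA = 0 and varB in ('A', 'B') then do;
    /* keep about 10% of the two large cells, chosen at random */
    if ranuni(12345) < 0.10 then do;
      sampwt = 10;   /* inverse of the 10% sampling rate */
      output;
    end;
  end;
  else do;
    sampwt = 1;      /* all other cells are kept in full */
    output;
  end;
run;

proc logistic data=work.sample;
  class varA / param=ref;
  /* varB has four nominal levels, so use the generalized logit link */
  model varB = varA / link=glogit;
  weight sampwt;     /* restores the original cell proportions */
run;
```

I am not sure the standard errors from a weighted PROC LOGISTIC are right for a sampled design, so PROC SURVEYLOGISTIC with the same WEIGHT statement may be the safer choice; I would appreciate confirmation either way.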