Cust Nanme Product A Product B Product C Product D
Jason 1 0 1 0
Michael 0 1 0 1
Mary 1 0 0 0
Its a table with customer as a column and in remaining columns are 1/0 columns which indicate whether a customer has bought product A, B, C, D etc. The table above only has four products - the actual table I am dealing with has 2000 columns or products, plus millions of customers. According to above table, Jason buys Product A and D. The question I need to answer is how many customer who buy product A also buy product B, how many customers who buy product A also buy product C, how many customers who buy product A also buy product D. After doing all the A's I would need to do the B's ie How many people who buy B also buy C, how many people who buy B also buy D and so on until all possible product combination pairs are covered. What is the most efficent way to do this in SAS. I was thinking if I could do something similar to Sumif in excel I could get my question answered. However any help or comments much appreciated.
Essentially in your query I would create a new column, say A_B
if A=1 and B = 1 then A_B= 1; else A_B = 0;
However to do this for each product is a staggering number of columns to keep track of.
Instead you may want to do something like, if a person bought product A, how many other products did they buy? Not sure what your exact requirement is.
In SAS, you can run a Proc Summary on the data using the Class variable. Then using the _TYPE_ field you can access the record level you desire. However, given you have almost 2000 variables on millions of records, this will produce a very large dataset. Quite frankly, this will produce results that will most likely be meaningless, because with that many variables you are likely to have limitless combinations of those variables.
VeeamÂ® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.
Hi,
no-one seemed to answer your question as I would have thought.
So try this for size. Similar could be done with Macros but I think that it would take longer to process.
data sumif;
set rawdata end=eof;
length RowType $12.;
array Product{4} ProductA--ProductD;
array Combo[&ProdsL1,&ProdsL1];
do i = 1 to &ProdsL1;
do j = i+1 to &Prods;
if Product[i] and Product[j] then Combo[i,j-1] + 1;
end;
end;
if eof then do;
RowType = "Results"; output;
do i = 1 to &ProdsL1;
do j = i+1 to &Prods;
Combo[i,j-1] = i;
end;
end;
RowType = "Product From"; output;
do i = 1 to &ProdsL1;
do j = i+1 to &Prods;
Combo[i,j-1] = j;
end;
end;
RowType = "Product To"; output;
end;
drop CustName ProductA--ProductD i j;
run;
0
hantran99Author Commented:
Thank you. I'm still very new with SAS, so thanks for your help.
http://core.ecu.edu/psyc/wuenschk/SAS/Help/Sum-SAS.htm
Take a look at the missing data points part.
Essentially in your query I would create a new column, say A_B
if A=1 and B = 1 then A_B= 1; else A_B = 0;
However to do this for each product is a staggering number of columns to keep track of.
Instead you may want to do something like, if a person bought product A, how many other products did they buy? Not sure what your exact requirement is.