How do I do a Sumif on a number of columns in SAS

Posted on 2010-01-06
Last Modified: 2013-11-16
consider the following table

Cust Nanme  Product A  Product B  Product C   Product D      
Jason                1              0              1                 0
Michael              0              1              0                 1
Mary                  1              0              0                 0

Its a table with customer as a column and in remaining columns are 1/0 columns which indicate whether a customer has bought product A, B, C, D etc. The table  above only has four products - the actual table I am dealing with has 2000 columns or products, plus millions of customers. According to above table, Jason buys Product A and D. The question I  need to answer is how many customer who buy product A also buy product B, how many customers who buy product A also buy product C, how many customers who buy product A also buy product D. After doing all the A's I would need to do the B's ie How many people who buy B also buy C, how many people who buy B also buy D and so on until all possible product combination pairs are covered. What is the most efficent way to do this in SAS. I was thinking if I could do something similar to Sumif in excel I could get my question answered. However any help or comments much appreciated.
Question by:hantran99
    LVL 39

    Expert Comment

    by:Kyle Abrahams
    Not an expert on SAS but If you wanted to go that route:

    Take a look at the missing data points part.

    Essentially in your query I would create a new column, say A_B
    if A=1 and B = 1 then A_B= 1; else A_B = 0;

    However to do this for each product is a staggering number of columns to keep track of.  

    Instead you may want to do something like, if a person bought product A, how many other products did they buy?  Not sure what your exact requirement is.

    LVL 5

    Expert Comment

    Hi hantran99,

    Find the attached file...this might do the job for you.
    LVL 4

    Expert Comment

    In SAS, you can run a Proc Summary on the data using the Class variable.  Then using the _TYPE_ field you can access the record level you desire.  However, given you have almost 2000 variables on millions of records, this will produce a very large dataset.  Quite frankly, this will produce results that will most likely be meaningless, because with that many variables you are likely to have limitless combinations of those variables.

    This is too much data for excel to process.

    Reference this link for example:

    Author Comment

    All the above is very useful advice.  Let me give them a try and let you know how I go.
    LVL 3

    Accepted Solution

    no-one seemed to answer your question as I would have thought.
    So try this for size. Similar could be done with Macros but I think that it would take longer to process.

    data rawdata;
          CustName = 'Jason';   ProductA=1; ProductB=0; ProductC=1; ProductD=0; output;
          CustName = 'Michael'; ProductA=0; ProductB=1; ProductC=0; ProductD=1; output;
          CustName = 'Mary';    ProductA=1; ProductB=0; ProductC=0; ProductD=0; output;

    %let Prods = 4;
    %let ProdsL1 = %sysevalf(&Prods-1);

    data sumif;
          set rawdata end=eof;
          length RowType $12.;
          array Product{4} ProductA--ProductD;
          array Combo[&ProdsL1,&ProdsL1];
          do i = 1 to &ProdsL1;
            do j = i+1 to &Prods;
                if Product[i] and Product[j] then Combo[i,j-1] + 1;
          if eof then do;
                RowType = "Results"; output;
                do i = 1 to &ProdsL1;
                  do j = i+1 to &Prods;
                      Combo[i,j-1] = i;
                RowType = "Product From"; output;
                do i = 1 to &ProdsL1;
                  do j = i+1 to &Prods;
                      Combo[i,j-1] = j;
                RowType = "Product To"; output;
          drop CustName ProductA--ProductD i j;

    Author Closing Comment

    Thank you. I'm still very new with SAS, so thanks for your help.
    LVL 3

    Expert Comment

    No worries, let me know if you have any questions about the code above and how it works.
    I'm happy to explain.

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    IT, Stop Being Called Into Every Meeting

    Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

    Suggested Solutions

    This article describes some very basic things about SQL Server filegroups.
    Read about achieving the basic levels of HRIS security in the workplace.
    The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
    This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.

    759 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    11 Experts available now in Live!

    Get 1:1 Help Now