Go Premium for a chance to win a PS4. Enter to Win

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 385
  • Last Modified:

Appending data on multiple lines

I don't know if this is possible in SAS, but this is what I would like to do.
I have a member   00001  Doe, John
John Doe has information coming from multiple tables, ie. claims;drugs;events
The number of rows coming from each table can vary.
Is there a way to format a report so the end result look like this: Multiple steps are fine
  ID       LN     FN        Claims         Drugs        Events
00001  Doe  John     claim 1        drug 1        event 1
                                 claim 2        drug 2        event 2
                                                    drug 3        event  3
                                                    drug 4
                                                    drug 5
00002  Smith Mary    claim 1                 drug1           event1

I dont' want data repeated if possible
  • 2
1 Solution
Reporting is the easy part.  I'm not very much of a proc report user so I'd do it with a PUT statement.

The trick is to get the reporting data set together properly.  

--Does each of the source tables contain the variable ID so we can use that as the merge key?
--How are the claims, drugs and event tables joined if each has multiple records for an ID?  Is there a claim_ID also?  How do I know that drug 3 goes with claim 2?  How do I line up an event with a drug?
--What variables does each table hold?  For example, from which table does drug 1 come from-- claims or drugs?  
moriniaAdvanced Analytics AnalystAuthor Commented:
Each table can be joined on member_id. The tables are not joined as yet, that is the question.  If I do a regular join, I will repetitive data, that is what I am trying to avoid.

There are three different tables.  One table has claims information, another table has drug information and another table has event information all for the respective member.

I am trying to concatenate the information from each table to the respective member.  In the example above member 00001 has two records on the claims table, five records on the drugs table and three records on the event table.

Ideally I want only five lines for member 00001.  The column claims will have two lines populated and three empty lines.  The column drugs will have five lines populated. The column events will have three lines populated and two lines blank.

All three tables have a common variable member_id to join them.  

SAS DATA Step with a BY statement will probably do this well...  You would need to have all three tables sorted by member_id before the datasetp (using PROC SORT or similar)

DATA All_merged;
  SET Claims Drug Event;
BY member_ID;

The end result should look like:

ID       LN     FN        Claims         Drugs        Events
00001  Doe  John     claim 1        drug 1        event 1
00001  Doe  John     claim 2       drug 2        event 2
00001  Doe  John     claim 2       drug 3        event  3
00001  Doe  John     claim 2       drug 4        event  3
00001  Doe  John     claim 2       drug 5        event  3
00002  Smith Mary    claim 1        drug1           event1

As you can see - SAS repeats the last value of a column if it is missing.

To do the report - I've used the following datastep to output to the LOG and an HTML doc.

As you can see, SAS just repeats the last variable when it is missing.

There are a couple options for reporting.  Using the PUT statement is probably easist.  Note, I have created 'New_' variables and used a LAG fuinction so avoid repeating values.


ODS HTML FILE="&work.\_____.htm" 
  headtext="<style>.text {mso-number-format:\@}</style><style>.comma {mso-number-format:\#\,\#\#0\.00}</style><style>.commaz {mso-number-format:\#\,\#\#0}</style><style> td {border:0.5pt}</style>" 

DATA _null_;
 set all_merged;
 by member_id;

 format new_ID $6.      New_LN     New_FN        New_Claims         New_Drugs        New_Events $20.;

 if member_id = lag(member_id) then new_id = ''; else new_ID = member_id;
 if LN = lag(LN) then new_LN = ''; else new_LN = LN;
 if FN = lag(FN) then new_FN = '';else new_FN = FN;
 if Claims = lag(Claims) then new_Claims = ''; else new_Claims = Claims;
 if Drugs = lag(Drugs) then new_Drugs = '';else new_Drugs = Drugs;
 if Events = lag(Events) then new_Events = ''; else new_Events = Events;

/* Output to LOG */

 IF _N_ = 1 THEN PUT  'ID' +4 'LN' +19 'FN' +19 'Claims' +15 'Drugs' +16 'Events' overprint @1 110*'-';

 PUT  new_id 5. +1  new_LN $20.  +1 new_FN $20. +1 new_Claims $20. +1 new_Drugs $20. +1 new_Events $20.;

/* Output to HTML */
 file print ods=(variables=(new_ID New_LN New_FN New_Claims New_Drugs New_Events));
title 'Listing of customers';
label new_ID = 'ID' new_LN = 'Last Name' new_FN = 'First Name' new_Claims = 'Claims' New_drugs = 'Drugs' new_events = 'Events';
put _ods_;


Open in new window

moriniaAdvanced Analytics AnalystAuthor Commented:
WOW.  Thanks a lot.  I  am working on putting this in my code.  

Am I correct that you gave two types of output?  One is a report and the other HTML?

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now