SolvedPrivate

SAS: Create a new variable from a character variable's value

Posted on 2015-02-23
5
28 Views
Last Modified: 2016-02-13
I am using SAS (statistical analysis software).

I have a SAS dataset with 50 variables called ID1-ID50 and 50 variables called Score1-Score50. This dataset has test item scores from a 50 item test. There is one record per examinee. However, the dataset contains data from dozens of different test forms. The items are from an item bank. Each item is identified in the item bank with a 6-digit master ID number. For the first examinee, ID1 might be item 143576; for another examinee, ID1 might be 437892, etc.

I want to create a dataset that scores the item scores (from the Score1-Score50 variables) in which the variable names are the master Item ID numbers. For example, the ID numbers might be Item_143576, item_437892, etc. There are 2,000 different ID numbers, so there will be about 2,000 item score variables in the new dataset.

The simplest way would be something like CALL SYMPUT for data step variables rather than macro variables.

I could use PROC FORMAT to create a huge lookup table of the ID numbers and their index in a 2.000 element array. Then I could use a PUT statement to grab the appropriate array index value for the stored ID value. I was hoping there is something more elegant so that I don't have to create thousands of lines of code to store my proc format and my array statement (with 2,000 variable names).
0
Comment
Question by:gordonwwaugh
  • 3
  • 2
5 Comments
 
LVL 8

Expert Comment

by:ShannonEE
ID: 40627240
I'm not exactly sure if you want to -
use the examinee dataset to update the item database where you are adding performance information into the item database
OR
use the item database to find out which was the correct answer for each item for each examinee and produce a total score for each examinee.

Assuming the first, =>
Two completely different ways to tackle this.

Method 1
Use proc SQL to join your test dataset (the one with ID1-ID50 and Score1-Score50) to the dataset for your item bank, to retrive / or set the data value for each answered item.

You also could use a data step using keyed access to do much the same thing. This requires a
     SET statement identifying the examinee dataset
and another
    SET statement identifying the item database using KEYed access determined by current question ID.
where the second SET statement is within a DO loop which loops over all 50 questions.  To be able to do this neatly use two ARRAY declarations one for ID1-ID50 and the other for SCORE1-SCORE50

(using SQL could get messy doing the update if there is more than one examinee answering any particular item.)


Method 2
Makes sure the test dataset has an identification number for each subject (ie each record)
Transpose the test dataset (BY SubjectID) to produce a new dataset with 4 variables and 50 times the number of records as before as there will be one record for each question for each subject
SubjectID, Question,  ID, Score
where -
    Question contains values 1 to 50
    ID contains the item number and score
    Score contains the associated score.

Next sort by item ID  (either proc SQL combined with next step or proc SORT)
Sum all scores for each item ID (either data step, proc MEANS, or proc SQL)
merge new dataset into database adding sum of scores for each item (data step using merge or keyed access; OR proc SQL)

====

If you could provide information on the columns in the database that you want to use (and how you want to modify them) then we could talk specifics.

Ian
0
 

Author Comment

by:gordonwwaugh
ID: 40627320
My current SAS datasat looks like this. There is one record per examinee. The ID values are test item ID numbers.

ExamineeNum  ID1      Score1       ID2          Score2       ID3      Score3    ...
1                          143         1             578               0             351         1
2                          297         1             325               0             426         1
...

I want it to look like this--still with one record per examinee.

ExamineeNum  s_143  s_297  s_325  s_351 s_426 s_578 ...
1                               1          .           .           1         .            0
2                               .          1           0          .          1           .
...

Obviously, the data above is simplified. My actual dataset has 100,000 examinees and 2,000 unique test item ID numbers.

I just finished doing this using a 2000-element proc format and a 2000-variable array. However, those 4,000 lines of code make the program a bit unwieldy to read--and I had to write a data step and proc freq to generate the 2000 item ID numbers.
0
 
LVL 8

Accepted Solution

by:
ShannonEE earned 500 total points
ID: 40629595
Hi there

Thanks for the update.

Provided that you have sequentially numbered questions with variables ID1 ID2 ... IDx and variables Score1 Score2 ... Scorex where x is the numeric limit (in your case 50) the following code should do the trick.

%let lastQ=4;

PROC TRANSPOSE DATA=work.test1	OUT=WORK.test2	PREFIX=newID	NAME=SourceID;
	BY ExamineeNum;
	VAR ID1-ID&lastQ;
RUN;

PROC TRANSPOSE DATA=work.test1	OUT=WORK.test3	PREFIX=newScore	NAME=SourceScore;
	BY ExamineeNum;
	VAR Score1-Score&lastQ ;
RUN;

data work.test4;
merge work.test2 work.test3;
newName = cats("Q_", newID1);
run;


PROC TRANSPOSE DATA=WORK.test4	OUT=WORK.test5(drop=_NAME_ LABEL="Twice transposed questions");
	BY ExamineeNum;
	ID newName;
	VAR newScore1 ;
RUN;

proc print; run;

Open in new window


Make sure you set the macro variable lastQ.  I have used 4 to give the following results

Examin	ID1	Score1	ID2	Score2	ID3	Score3	ID4	Score4
eeNum
10001	143	1	578	0	351	1	325	1
10002	297	1	325	0	426	1	143	1
10003	143	0	325	1	352	1	578	1
10004	351	1	143	1	297	0	578	0
10005	351	0	297	0	326	0	143	0

Open in new window


transferred into

Examin	Q_143	Q_578	Q_351	Q_325	Q_297	Q_426	Q_352	Q_326
eeNum
10001	1	0	1	1	.	.	.	.
10002	1	.	.	0	1	1	.	.
10003	0	1	.	1	.	.	1	.
10004	1	0	1	.	0	.	.	.
10005	0	.	0	.	0	.	.	0	

Open in new window



You have the flexability to have any prefix on the question names.

The first 2 transposes just get a number of records for each examinee First for the question numbers and then the corresponding scores.

The data step with the merge then puts them together.

Lastly the transpose uses the new name created in the merge as a variable name into which to put the scores. This occurs at the time of putting all the records for each examinee back into a single record.

Note by default where an examinee did not have a qarticular question the transpose puts missing   (shows up as a dot ).

With a small amout of trickery it is possible to get the questions in ascending order.

Sort test2 by newID1,
make a variable ExamineeNum set to missing.
 get only one of each question (in variable newID1,
make up newScore1 for each newID1 and set to missing.
Transpose this data set  - because it is in sorted order the transposed questions will be in order.

Then append your final dataset to this artificial data set (this will pick up variables from the artificial single record data set first and keep in the correct order.  When combining the datasets just delete the first record (which was our dummy record.

You could probably get the questions sorted using proc datasets or onother utility procedure.
Is this what you want?

Ian
0
 

Author Comment

by:gordonwwaugh
ID: 40630812
Perfect! That's exactly what I wanted!
0
 

Author Comment

by:gordonwwaugh
ID: 40630937
Here is the full solution. I added the code for putting the variable names in ascending order in the dataset. The steps involving datasets dummy1, dummy2, etc., sort the variable names.

data test1;
infile cards;
input
      examineeNum
      ID1    Score1         
      ID2    Score2          
      ID3    Score3          
      ID4    Score4          
;
cards;
10001      143      1      578      0      351      1      325      1
10002      297      1      325      0      426      1      143      1
10003      143      0      325      1      352      1      578      1
10004      351      1      143      1      297      0      578      0
10005      351      0      297      0      326      0      143      0
;
run;


%let lastQ=4;

PROC TRANSPOSE DATA=work.test1      OUT=WORK.test2      PREFIX=newID      NAME=SourceID;
      BY ExamineeNum;
      VAR ID1-ID&lastQ;
RUN;


proc sort data=work.test2 out=work.dummy1;
      by newID1;
run;

data work.dummy2;
set work.dummy1(DROP=SourceID);
      by newID1;
if first.newID1;
examineeNum = .;
NewScore1 = .;
NewName = cats('Q_',newID1);
run;

PROC TRANSPOSE DATA=work.dummy2 out=work.dummy3;
ID NewName;
run;

data work.dummy4(DROP=_NAME_);
set work.dummy3;
if _N_=1;
run;

PROC TRANSPOSE DATA=work.test1      OUT=WORK.test3      PREFIX=newScore      NAME=SourceScore;
      BY ExamineeNum;
      VAR Score1-Score&lastQ ;
RUN;

data work.test4;
merge work.test2 work.test3;
newName = cats('Q_', newID1);
run;


PROC TRANSPOSE DATA=WORK.test4      OUT=WORK.test5(drop=_NAME_ LABEL="Twice transposed questions");
      BY ExamineeNum;
      ID newName;
      VAR newScore1 ;
RUN;


data work.test6;
set work.dummy4 work.test5;
if _N_ = 1 then delete;
run;


proc print; run;
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

If your app took Google’s lash recently, here are the 5 most likely reasons.
The article will include the best Data Recovery Tools along with their Features, Capabilities, and their Download Links. Hope you’ll enjoy it and will choose the one as required by you.
The viewer will learn how to set up a document for the web and print and the recommended PPI for printing.
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now