SAS Data Sets Manipulations

Hello, I am having problem with step #2 of my program. I am not sure if I can just use "rename" statement when I am storing records from "Var4" to "Var6" or I have to use something else? The thing is that I still have to unduplicate "TpData" SAS dataset by "Var4" in the step #4.

1. Create a temporary SAS dataset ("TpData") from already existed SAS dataset ("lib.Data1"), then
2. Get total of records in each "Var4"variable and store in "Var6" variable
3. Create "Var1Var2" variable from already existed variables ("Var1" and "Var2") in "lib.Data1" dataset with the following options:
  if "Var1" = 12 and "Var2" = 211 then set "Var1Var2" = 12211
  else set "Var1Var2" = concatenation of "Var1" and "Var2"
4. Unduplicate "TpData" dataset by "Var3" and "Var4"
5. Keep only "Var1", Var2", "Var3", "Var6", and "Var1Var2" variables in the "TpData" dataset

data TpData (keep=Var1 Var2  Var3 Var6 Var1Var2           /*** 1 and 5 ***/
                     rename=(Var4=Var6));                                              /*** 2 ***/
set lib.Data1;
   length Var1Var2  $6.;
 if Var1=12 and Var2=211 then                             /*** 3 ***/            
  else Var1Var2 = trim(left(PUT(Var1, $3.))) || trim(left(PUT(Var2, $3.)));  

PROC SORT data=TpData NODUPKEY;                 /*** 4 ***/
  by Var3 Var4;                                            
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

First and foremost, a RENAME will rename VAR4 -> VAR6 and VAR4 will no longer exist in your data set.  Secondly, in your KEEP statement, you do not KEEP VAR4, so it will not be available once TpData is created.

I am not sure what you mean by this statement:

Get total of records in each "Var4"variable and store in "Var6" variable

If you can clarify, that would be helpful.
labradorchikAuthor Commented:
Hi tobey1,
Thank you very much for your comments!
Step #2 is getting total of individual records (unduplicated) from each record in "Var4" and storing all that information/records in "Var6" for future use, but still would like to keep "Var4" for future use as well. I guess I can just add "Var4" to my "keep" statement, but how can I store a total of individual records in "Var6"?
I hope this clarifies a little bit more. Any comments will be helpful!
Thank you!

So if I am reading this correctly, you want VAR6 to be a constant equal to the total records in the entire dataset?


If VAR4 = "A" you want the total count of A and if it is B then you want the total count of B?
Your Guide to Achieving IT Business Success

The IT Service Excellence Tool Kit has best practices to keep your clients happy and business booming. Inside, you’ll find everything you need to increase client satisfaction and retention, become more competitive, and increase your overall success.

labradorchikAuthor Commented:
I am sorry for this confusion. I was so wrong here!
I will be actually using two variables ("Var4" and "Var5") in order to get total of records in "Var6".

Here is table example with 10 records of how Var6 should look like:

Var4     Var5       Var6
456          1
456          1             3
456          1
456          2             1
457          1
457          1             2
458          1
458          1             2
458          2
458          2             2

So, for Var4 there are Var6=3 total records of 456 with Var5=1
                                     Var6=1 total record of 456 with Var5=2
                                     Var6=2 total records of 457 with Var5=1
                                     Var6=2 total records of 458 with Var5=1
                             and  Var6=2 total records of 458 with Var5=2  

I do not think it would be possible to do this in the same data step where I created "TpData" SAS data set, so I think this can be created in a separate temporary dataset even before I start my step #1.
Please let me know if you know how may I store this total value of records in "Var6".
Thank you!
Now with a little editing, you can get to your final result, but this will get you your counts of VAR4 and Var5 loaded into Var6.

data Data1;
   input var1 var2 var3 var4 var5;
12 211 3 456 1
12 222 3 456 1
13 211 3 456 1
13 211 3 456 2
14 221 3 457 1
14 222 3 457 1
12 222 3 458 1
12 222 3 458 1
11 211 3 458 2
11 211 3 458 2

proc sql;
   create table TpData AS
      select A.var1, A.var2, A.var3, B.var4, B.var5, B.var6
      from Data1 A left join (
      select var4, var5, count(*) as var6
      from Data1
      group by var4, var5) B ON
      A.var4=B.var4 and A.var5 = B.var5

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
labradorchikAuthor Commented:
This is working! Thank you very much!!
Happy New Year!! :)
No problem.  Happy New Year to you also.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.