Maria Torres


SAS: COMPRESS is not properly working -- flat file is produced with trailing blanks for each record

A flat file is produced by one of our SAS programs.  When the flat file is opened, each record has trailing blanks (approximately 35,000 blank spaces).

Within the program, the option COMPRESS is set to YES.  From my understanding, this should remove the blank spaces.
Can someone point me in the right direction as to how to remove the trailing spaces?

Thank you.
Ian

Hi there CarmenMTorres,

I think that you have misunderstood a few SAS concepts.

First, there is compression of SAS data sets.  When SAS writes a data set, you can specify that the records be compressed, so that the stored data set is hopefully smaller than it would otherwise be.  However, that is not the flat file you are asking about.  This compression is specified on a LIBNAME statement:
COMPRESS=NO | YES | CHAR | BINARY
controls the compression of observations in output SAS data sets for a SAS library.
NO
specifies that the observations in a newly created SAS data set be uncompressed (fixed-length records).
YES | CHAR
specifies that the observations in a newly created SAS data set be compressed (variable-length records) by SAS using RLE (Run Length Encoding). RLE compresses observations by reducing repeated consecutive characters (including blanks) to two-byte or three-byte representations.
Tip
Use this compression algorithm for character data.
BINARY
specifies that the observations in a newly created SAS data set be compressed (variable-length records) by SAS using RDC (Ross Data Compression). RDC combines run-length encoding and sliding-window compression to compress the file.
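
As a minimal sketch of where this option goes (the library name MYLIB, the path, and the data set names are made up for illustration), COMPRESS= can be set on the LIBNAME statement or, as a data set option, on a single output data set. Either way it only changes how the SAS data set itself is stored; it has no effect on an external flat file.

```sas
/* Hypothetical library name and path; the directory must already exist */
libname mylib 'C:\sasdata' compress=yes;

/* Every data set created in MYLIB is now stored with RLE compression */
data mylib.customers;
    set sashelp.class;
run;

/* The same option applied to one data set instead of the whole library */
data work.customers_bin (compress=binary);
    set sashelp.class;
run;
```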

Next, there is the COMPRESS function, used within a SAS DATA step or in SQL code in PROC SQL (and in a few other places that produce content, e.g. compute blocks in PROC REPORT).  It removes every occurrence of the nominated characters from a string.

There are also the COMPBL, COMPRESS and TRIM functions, which you may have been using to remove blanks.

COMPBL Function
Removes multiple blanks from a character string.

COMPRESS Function
Returns a character string with specified characters removed from the original string.

Worth noting here is the following passage, which covers the case where the character to remove wasn't specified and defaults to a blank:
The argument has all blanks removed. If the argument is completely blank, then the result is a string with a length of zero. If you assign the result to a character variable with a fixed length, then the value of that variable will be padded with blanks to fill its defined length.

The important part is that the destination variable has its own length, so the output of COMPRESS is padded with blanks up to the length of the destination variable.
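
A small sketch of that padding, with made-up variable names: the COMPRESS result is three characters long, but the variable it is stored in still occupies its full declared length.

```sas
data _null_;
    length squeezed $ 20;              /* destination has a fixed length of 20 */
    raw = 'a  b   c';

    single   = compbl(raw);            /* runs of blanks collapse to one blank */
    squeezed = compress(raw);          /* all blanks removed, giving 'abc' */

    stored_len  = lengthc(squeezed);   /* 20: storage length, padding included */
    content_len = length(squeezed);    /* 3: position of the last non-blank character */

    put single= squeezed= stored_len= content_len=;
run;
```

LENGTHC reports the storage length while LENGTH ignores trailing blanks, which is why the two numbers differ.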

TRIM Function
Removes trailing blanks from a character string, and returns one blank if the string is missing.

Length of Returned Variable
In a DATA step, if the TRIM function returns a value to a variable that has not previously been assigned a length, then that variable is given the length of the argument.

The Basics
TRIM copies a character argument, removes trailing blanks, and returns the trimmed argument as a result. If the argument is blank, TRIM returns one blank. TRIM is useful for concatenating because concatenation does not remove trailing blanks.
Assigning the results of TRIM to a variable does not affect the length of the receiving variable. If the trimmed value is shorter than the length of the receiving variable, SAS pads the value with new blanks as it assigns it to the variable.
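
To illustrate both points, here is a sketch with made-up variable names: TRIM helps inside a concatenation, but assigning its result to a fixed-length variable puts the blanks straight back.

```sas
data _null_;
    length first last $ 10 joined_raw joined_trim $ 25 trimmed $ 10;
    first = 'ABC';
    last  = 'XYZ';

    joined_raw  = first || last;               /* ABC keeps its 7 trailing blanks before XYZ */
    joined_trim = trim(first) || ' ' || last;  /* 'ABC XYZ' */

    trimmed    = trim(first);                  /* padded back to 10 on assignment */
    stored_len = lengthc(trimmed);             /* still 10 */

    put joined_raw= / joined_trim= / stored_len=;
run;
```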


========

My guess is that when you write out the data for the flat file, you have ended up with either a data set variable or a temporary variable whose length is around 2^15 (the maximum for a SAS character variable is 32,767).  It contains all the information required but is padded out to its defined length.
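
To make that guess concrete, here is a sketch only, not the poster's actual program: the path, the 200-byte record variable, and the use of SASHELP.CLASS are all assumptions. Writing a long character variable with a fixed-width format carries its padding into the file, whereas the $VARYING format writes only as many characters as a length variable says.

```sas
/* Hypothetical output path; any writable location works */
filename out 'C:\temp\demo.dat';

data _null_;
    length rec $ 200;                  /* stands in for a very long record variable */
    set sashelp.class;
    file out;

    rec = catx('|', name, sex, age);   /* delimited content, far shorter than 200 */

    /* A fixed-width format would write all 200 characters, blanks included: */
    /* put rec $char200.; */

    /* $VARYING writes only the first LEN characters of REC */
    len = lengthn(rec);
    put rec $varying200. len;
run;
```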

If you have specified COMPRESS on a LIBNAME statement it will not do what you want.

Really, we need a bit more context about what you are doing to spot the real source of the trouble.

Ian
Maria Torres

ASKER

Thank you, Ian, for the prompt response.

Yes, I am new to SAS, which is why my question is vague.  Let me explain what we are trying to accomplish.

We have several SAS programs that were originally executed in UNIX.  These programs produced flat DAT files.
We have now transferred the SAS programs from UNIX to Windows SAS, with the appropriate modifications.  At first, the programs appeared to execute without any issues.  But once we reviewed the flat DAT files, we noticed that each file is 3 times larger than it was when produced in UNIX.

When we opened a DAT file for review, we noticed that each delimited record is padded with spaces.  Some records are padded with as many as 3,000 blank spaces.

I'm assuming that in Windows, the generated files' records are padded to a specific size; but in UNIX, the files are not.

We use the TRIM function when we populate the record before writing it to the file.  However, this function is not removing the trailing spaces.  We also use the COMPRESS= data set option to compress an individual file.

Now we are trying to figure out a way of generating the files without the records being padded.

Any suggestion is appreciated.  Thank you.
ASKER CERTIFIED SOLUTION
Ian

This solution is only available to members.
Thank you.  The LRECL worked like a charm.