Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 464
  • Last Modified:

data cleaning

how can i do this
data looks like

EE745697515AU (ARCHI
EE74569789 (AR

i want to clean them as

  • 2
  • 2
1 Solution
lcohanDatabase AnalystCommented:
Assuming you have a space in the data column and you want to get rid of all characters to the right of that space you could build a statement like below but run it in a test environment first!!

--UPDATE table_name SET column_name = LEFT(column_name, Len(column_name) - PatIndex('% %', column_name))

you can run following to see how UPDATEd data will look like:

SELECT LEFT(column_name, Len(column_name) - PatIndex('% %', column_name))
FROM table_name
d507201Database Marketing ConsultantCommented:
If you want to use a DATA step, use the SCAN function to find a specific word in a character string based on a specific delimiter.

In your example, the delimiter should probably be the blank/space but it could be the open parenthesis (.

     word1=scan(string,1,' ');     ** this pulls the first word, to the left of the blank.

     word2=scan(string,2,' ');     ** this pulls the second word, to the right of the blank.

Always specify the length of the new variable _before_ running the scan function.  Without a LENGTH statement the length will default to the length of the first word found and that will cause longer words to be truncated.

If your string is like this -- EE745697515AU (ARCHIE) 10/11/2011 -- and need the date then you could use
     date=scan(string,3,' ');

You can nest scan functions.  If you want the word ARCHIE for the above string,
     part1=scan(string,1,')');        ** this returns EE745697515AU (ARCHIE ;
     word=scan(part1,2,'(');         ** this returns ARCHIE from part1;

so combining the two statements gives us
     word=scan(scan(string,1,')'),2,'(');         ** this returns ARCHIE ;

sam2929Author Commented:
 proc sql;
17          create table myams as(
18            select LEFT(ITEM_ID, Len(ITEM_ID) - PatIndex('% %',ITEM_ID)),ITEM_ID
19         from stg1.aa
20            where item_id is not null
21                    );
ERROR: Function LEN could not be located.
ERROR: Function PATINDEX could not be located.
ERROR: Expression using subtraction (-) requires numeric types.
WARNING: Function LEFT requires at most 1 argument(s). The extra one(s) will be ignored.
lcohanDatabase AnalystCommented:
My sincere apologies....I provided SQL not SAS language/syntax.

You should look at SAS regular expressions consist of CALL RXCHANGE, CALL RXFREE, CALL
RXSUBSTR, RXMATCH, and RXPARSE or PRXCHANGE Routine but same idea - drop all characters after the first space in your input string(column)


d507201Database Marketing ConsultantCommented:
Regular expressions are overkill for something like this.  All SAS functions work just fine in PROC SQL when you are querying a SAS dataset.  

If you're querying a RDBMS table then the function will be passed to the database if the RDBMS has a function that does the same thing (doesn't have to be spelled the same) and it will still work just fine.  But if the database doesn't support the function (Oracle doesn't have anything like SCAN) then the data will be brought into SAS for processing instead of letting the database engine do the work.  If it's a large table or a complex query then performance will be bad.

proc sql;
  create table myams as
    (select item_id
              , scan(item_id,1,' ') as word1
    from stg1.aa
    where item_id ne ' ');

Featured Post


Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now