data cleaning

Posted on 2011-10-11
Last Modified: 2013-11-16
how can i do this
data looks like

EE745697515AU (ARCHI
EE74569789 (AR

i want to clean them as

Question by:sam2929
    LVL 39

    Expert Comment

    Assuming you have a space in the data column and you want to get rid of all characters to the right of that space you could build a statement like below but run it in a test environment first!!

    --UPDATE table_name SET column_name = LEFT(column_name, Len(column_name) - PatIndex('% %', column_name))

    you can run following to see how UPDATEd data will look like:

    SELECT LEFT(column_name, Len(column_name) - PatIndex('% %', column_name))
    FROM table_name
    LVL 7

    Expert Comment

    If you want to use a DATA step, use the SCAN function to find a specific word in a character string based on a specific delimiter.

    In your example, the delimiter should probably be the blank/space but it could be the open parenthesis (.

         word1=scan(string,1,' ');     ** this pulls the first word, to the left of the blank.

         word2=scan(string,2,' ');     ** this pulls the second word, to the right of the blank.

    Always specify the length of the new variable _before_ running the scan function.  Without a LENGTH statement the length will default to the length of the first word found and that will cause longer words to be truncated.

    If your string is like this -- EE745697515AU (ARCHIE) 10/11/2011 -- and need the date then you could use
         date=scan(string,3,' ');

    You can nest scan functions.  If you want the word ARCHIE for the above string,
         part1=scan(string,1,')');        ** this returns EE745697515AU (ARCHIE ;
         word=scan(part1,2,'(');         ** this returns ARCHIE from part1;

    so combining the two statements gives us
         word=scan(scan(string,1,')'),2,'(');         ** this returns ARCHIE ;


    Author Comment

     proc sql;
    17          create table myams as(
    18            select LEFT(ITEM_ID, Len(ITEM_ID) - PatIndex('% %',ITEM_ID)),ITEM_ID
    19         from stg1.aa
    20            where item_id is not null
    21                    );
    ERROR: Function LEN could not be located.
    ERROR: Function PATINDEX could not be located.
    ERROR: Expression using subtraction (-) requires numeric types.
    WARNING: Function LEFT requires at most 1 argument(s). The extra one(s) will be ignored.
    LVL 39

    Expert Comment

    My sincere apologies....I provided SQL not SAS language/syntax.

    You should look at SAS regular expressions consist of CALL RXCHANGE, CALL RXFREE, CALL
    RXSUBSTR, RXMATCH, and RXPARSE or PRXCHANGE Routine but same idea - drop all characters after the first space in your input string(column)

    LVL 7

    Accepted Solution

    Regular expressions are overkill for something like this.  All SAS functions work just fine in PROC SQL when you are querying a SAS dataset.  

    If you're querying a RDBMS table then the function will be passed to the database if the RDBMS has a function that does the same thing (doesn't have to be spelled the same) and it will still work just fine.  But if the database doesn't support the function (Oracle doesn't have anything like SCAN) then the data will be brought into SAS for processing instead of letting the database engine do the work.  If it's a large table or a complex query then performance will be bad.

    proc sql;
      create table myams as
        (select item_id
                  , scan(item_id,1,' ') as word1
        from stg1.aa
        where item_id ne ' ');

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Better Security Awareness With Threat Intelligence

    See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

    SQL Command Tool comes with APEX under SQL Workshop. It helps us to make changes on the database directly using a graphical user interface. This helps us writing any SQL/ PLSQL queries and execute it on the database and we can create any database ob…
    Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
    Video by: Steve
    Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
    Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

    794 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    16 Experts available now in Live!

    Get 1:1 Help Now