Remove carraige returns and non numeric character

Posted on 2009-02-10
Last Modified: 2012-05-06
I have a very large file that contains zip codes -- but it needs to be cleaned up.  for the most part, commas are included between the 5 place zip code, but sometimes there is just a space.  I I need the following done:
- remove any characters -- non numeric
- keep a comma if it exists
- if the zip is more than 5 numbers, remove it
- remove carraige returns

A VBA module will work well since the file I have is a CSV and has many many rows so Excel is not an option.  
Question by:AliciaVee
    LVL 14

    Expert Comment

    basically, you can use the attached VB code to convert the file, however, that sample code only holds an example how to remove non-numerics.  to give you the correct conversion code (where e.g. commas are retained), you need to be a little more specific about the conversion that you need -- it would be best if you posted some examples here.

    in the question title you write "remove carriage returns": do you really need that?  without carriage returns, many programs will not be able to read the file's content properly anymore.


      Dim fh1, fh2 As Integer
      Dim file_line1, file_line2 As String
      Dim i As Integer
      fh1 = FreeFile()
      Open "c:\path\origfile.csv" For Input As fh1
      fh2 = FreeFile()    
      Open "c:\path\newfile.txt" For Output As fh2
      While Not(EOF(fh1))
        ' read a line from orig file
        Line Input #fh1, file_line1
        ' process line
        ' example: remove non-numerics
        file_line2 = ""
        For i = 1 To Len(file_line1)
          If IsNumeric(Mid$(file_line1, i, 1)) Then
            file_line2 = file_line2 & Mid$(file_line1, i, 1)
          End If
        Next i
        ' write line to new file2
        Print #fh2, file_line
      Close fh1
      Close fh2

    Open in new window

    LVL 14

    Expert Comment

    oops, sorry: lines 25 and 26 in the above code must be

        ' write line to new file
        Print #fh2, file_line2

    (the last "2" must be in line 26, not 25)

    Author Comment


    Thanks...okay -- to answer your questions...and hopefully you'll be able to provide a solution, below are examples of how data is stored, and how it should be fixed.  There are 3 general examples.

    Also -- the system that needs this data corrected does not use carraige returns well.  Another item that needs addressing is that while I need non-numeric characters stripped, I do need the asterick to remain * that is considered a wildcard.

    So, attached are what zips might look like. There are two columns, one is the profile identifier, the other column will hold many zips, from as low as 5 zips to as many as 500. I have added a comments column to explain how the zips should appear and/or what is wrong.

    Thanks in advance!

    LVL 14

    Expert Comment

    ok... remark on the side: this seems more like a problem for perl... :-)

    the challenge, as it seems to me, is more to define how exactly the conversion must be done rather than how to technically do it.  so here's my try to define the conversion "algorithm":

    1) try to read 5 chars that are numeric or *:  if a non-numeric is encountered before 5 chars are complete, ignore what's read and goto 3)
    2) the 5 chars read above are valid: use them and add a comma and a space
    3) read (and ignore) chars until a space or a comma is encountered
    4) ignore any commas and spaces until something other is encountered
    5) start at 1) again

    can you try to confirm if this is what's needed?

    also, two more questions:

    a. in your excel, what's the meaning of column "ID" (A1234, B1234 ...)? how are these fields stored in the original file? divided by a space from the Zips?  and how do the IDs have to be processed? do they need to be stored in the target file as well?

    b. when you write you don't want any carriage returns, does that mean that you want all encountered zips "in a row" without any carriage returns?


    Author Comment

    bluelizzard, got the algorim correct as described in 1-5.

    Answers to your questions:
    a. the ID maps back to the user's profile.  so what is in column B (the zips) will be imported back into the system under the identifier which is column A
    b. some of the rows, which I didn't add to the examples, there are carraige returns in the cell.  So, column A has the ID and column B has all the zips, some of which need to be cleaned up this same column there are sometimes carraige returns which also need to be removed.

    LVL 14

    Accepted Solution

    ok... did i understand you right that the original file is a CSV file?  so, what you really have in the file is, for example, this here (with [CR] i mean carriage return):

    A1234, 60016 60018 60025 60067 60068 60069 60074 60076 60077 60091 [CR]
    B1234, 113**, 115**, 110**, 114**, 100**, 105**, 07030, 073**, 076**, 07093, 07047 [CR]
    C1234, 60618 Chicago [CR]


    is that correct? or is there a different divider (semicolon, tab char etc.) between the ID colum and the zip column?

    also: you write that the zip column might contain carriage returns: now, removing them is not a problem, but the question is: how can the code distinguish between a "wrong" and a "right" carriage return? for illustration, consider this example:

    A1234, A1234, 60016 60018 60025 60067 [CR]
    B1234, 07719, 0771, [CR]
    07713, 07713 [CR]

    say the middle [CR] is a "wrong" one that needs to be removed and the zips in the next line belong to ID B1234... but how can the code know that?  i mean, how can the code tell that the first thing in the next line, "07713", isn't just the ID of the next line, but another zip?  is there a specific method how the zip can be recognized?


    Author Comment

    bluelizzard -- sorry for the delay in my response. I got pulled onto another project then traveled for business.  I am now back.

    Okay -- yes your question about CR is correct -- there will be some in the file, all data should be in a column -- with no carrange returns...there is no right carraige return

    there is a column between the ID column and the zip column (no space, comman...etc)

    Hope this helps.

    LVL 14

    Expert Comment

    ok... there are two things that i still don't understand:

    when you write "there is a column between the ID column and the zip column", what do you mean?  what is between the last character of the ID and the first character of the first ZIP?  a space?  a comma? or something else?

    when the ZIPs contain carriage returns, the problem is how to distinguish this from a "real" new line (in a CSV file, there are no columns like in an excel file, so you can't tell if the CR means a new line or just a CR within the ZIPs).  so how can i tell that the start of the new line is a new ID or just a "continued" ZIP?

    i think it would help if you posted a part of an actual file that you want access to read.


    Author Closing Comment

    Wanted to close this question.  Your info/concepts was helpful and I appreciate your time.

    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    It took me quite some time to sort out all the different properties of combo and list boxes available from Visual Basic at run-time. Not that the documentation is lacking: the help pages are quite thorough and well written. The problem was rather wh…
    Introduction The Visual Basic for Applications (VBA) language is at the heart of every application that you write. It is your key to taking Access beyond the world of wizards into a world where anything is possible. This article introduces you to…
    Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
    In Microsoft Access, learn different ways of passing a string value within a string argument. Also learn what a “Type Mis-match” error is about.

    755 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    19 Experts available now in Live!

    Get 1:1 Help Now