[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now


Remove carraige returns and non numeric character

Posted on 2009-02-10
Medium Priority
Last Modified: 2012-05-06
I have a very large file that contains zip codes -- but it needs to be cleaned up.  for the most part, commas are included between the 5 place zip code, but sometimes there is just a space.  I I need the following done:
- remove any characters -- non numeric
- keep a comma if it exists
- if the zip is more than 5 numbers, remove it
- remove carraige returns

A VBA module will work well since the file I have is a CSV and has many many rows so Excel is not an option.  
Question by:AliciaVee
  • 5
  • 4
LVL 14

Expert Comment

ID: 23608850
basically, you can use the attached VB code to convert the file, however, that sample code only holds an example how to remove non-numerics.  to give you the correct conversion code (where e.g. commas are retained), you need to be a little more specific about the conversion that you need -- it would be best if you posted some examples here.

in the question title you write "remove carriage returns": do you really need that?  without carriage returns, many programs will not be able to read the file's content properly anymore.


  Dim fh1, fh2 As Integer
  Dim file_line1, file_line2 As String
  Dim i As Integer
  fh1 = FreeFile()
  Open "c:\path\origfile.csv" For Input As fh1
  fh2 = FreeFile()    
  Open "c:\path\newfile.txt" For Output As fh2
  While Not(EOF(fh1))
    ' read a line from orig file
    Line Input #fh1, file_line1
    ' process line
    ' example: remove non-numerics
    file_line2 = ""
    For i = 1 To Len(file_line1)
      If IsNumeric(Mid$(file_line1, i, 1)) Then
        file_line2 = file_line2 & Mid$(file_line1, i, 1)
      End If
    Next i
    ' write line to new file2
    Print #fh2, file_line
  Close fh1
  Close fh2

Open in new window

LVL 14

Expert Comment

ID: 23608858
oops, sorry: lines 25 and 26 in the above code must be

    ' write line to new file
    Print #fh2, file_line2

(the last "2" must be in line 26, not 25)

Author Comment

ID: 23614408

Thanks...okay -- to answer your questions...and hopefully you'll be able to provide a solution, below are examples of how data is stored, and how it should be fixed.  There are 3 general examples.

Also -- the system that needs this data corrected does not use carraige returns well.  Another item that needs addressing is that while I need non-numeric characters stripped, I do need the asterick to remain * that is considered a wildcard.

So, attached are what zips might look like. There are two columns, one is the profile identifier, the other column will hold many zips, from as low as 5 zips to as many as 500. I have added a comments column to explain how the zips should appear and/or what is wrong.

Thanks in advance!

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

LVL 14

Expert Comment

ID: 23619837
ok... remark on the side: this seems more like a problem for perl... :-)

the challenge, as it seems to me, is more to define how exactly the conversion must be done rather than how to technically do it.  so here's my try to define the conversion "algorithm":

1) try to read 5 chars that are numeric or *:  if a non-numeric is encountered before 5 chars are complete, ignore what's read and goto 3)
2) the 5 chars read above are valid: use them and add a comma and a space
3) read (and ignore) chars until a space or a comma is encountered
4) ignore any commas and spaces until something other is encountered
5) start at 1) again

can you try to confirm if this is what's needed?

also, two more questions:

a. in your excel, what's the meaning of column "ID" (A1234, B1234 ...)? how are these fields stored in the original file? divided by a space from the Zips?  and how do the IDs have to be processed? do they need to be stored in the target file as well?

b. when you write you don't want any carriage returns, does that mean that you want all encountered zips "in a row" without any carriage returns?


Author Comment

ID: 23621675

Yes...you got the algorim correct as described in 1-5.

Answers to your questions:
a. the ID maps back to the user's profile.  so what is in column B (the zips) will be imported back into the system under the identifier which is column A
b. yes...in some of the rows, which I didn't add to the examples, there are carraige returns in the cell.  So, column A has the ID and column B has all the zips, some of which need to be cleaned up ..in this same column there are sometimes carraige returns which also need to be removed.

LVL 14

Accepted Solution

bluelizard earned 1500 total points
ID: 23626097
ok... did i understand you right that the original file is a CSV file?  so, what you really have in the file is, for example, this here (with [CR] i mean carriage return):

A1234, 60016 60018 60025 60067 60068 60069 60074 60076 60077 60091 [CR]
B1234, 113**, 115**, 110**, 114**, 100**, 105**, 07030, 073**, 076**, 07093, 07047 [CR]
C1234, 60618 Chicago [CR]


is that correct? or is there a different divider (semicolon, tab char etc.) between the ID colum and the zip column?

also: you write that the zip column might contain carriage returns: now, removing them is not a problem, but the question is: how can the code distinguish between a "wrong" and a "right" carriage return? for illustration, consider this example:

A1234, A1234, 60016 60018 60025 60067 [CR]
B1234, 07719, 0771, [CR]
07713, 07713 [CR]

say the middle [CR] is a "wrong" one that needs to be removed and the zips in the next line belong to ID B1234... but how can the code know that?  i mean, how can the code tell that the first thing in the next line, "07713", isn't just the ID of the next line, but another zip?  is there a specific method how the zip can be recognized?


Author Comment

ID: 23774474
bluelizzard -- sorry for the delay in my response. I got pulled onto another project then traveled for business.  I am now back.

Okay -- yes your question about CR is correct -- there will be some in the file, all data should be in a column -- with no carrange returns...there is no right carraige return

there is a column between the ID column and the zip column (no space, comman...etc)

Hope this helps.

LVL 14

Expert Comment

ID: 23781773
ok... there are two things that i still don't understand:

when you write "there is a column between the ID column and the zip column", what do you mean?  what is between the last character of the ID and the first character of the first ZIP?  a space?  a comma? or something else?

when the ZIPs contain carriage returns, the problem is how to distinguish this from a "real" new line (in a CSV file, there are no columns like in an excel file, so you can't tell if the CR means a new line or just a CR within the ZIPs).  so how can i tell that the start of the new line is a new ID or just a "continued" ZIP?

i think it would help if you posted a part of an actual file that you want access to read.


Author Closing Comment

ID: 31545068
Wanted to close this question.  Your info/concepts was helpful and I appreciate your time.

Featured Post

[Webinar] Cloud and Mobile-First Strategy

Maybe you’ve fully adopted the cloud since the beginning. Or maybe you started with on-prem resources but are pursuing a “cloud and mobile first” strategy. Getting to that end state has its challenges. Discover how to build out a 100% cloud and mobile IT strategy in this webinar.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

We live in a world of interfaces like the one in the title picture. VBA also allows to use interfaces which offers a lot of possibilities. This article describes how to use interfaces in VBA and how to work around their bugs.
If you’re using QODBC to update QuickBooks data from Microsoft® Access but Access is not showing the updated data, you could have set up QODBC incorrectly.
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
With just a little bit of  SQL and VBA, many doors open to cool things like synchronize a list box to display data relevant to other information on a form.  If you have never written code or looked at an SQL statement before, no problem! ...  give i…
Suggested Courses

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question