Link to home
Start Free TrialLog in
Avatar of AliciaVee
AliciaVee

asked on

Remove carraige returns and non numeric character

I have a very large file that contains zip codes -- but it needs to be cleaned up.  for the most part, commas are included between the 5 place zip code, but sometimes there is just a space.  I I need the following done:
- remove any characters -- non numeric
- keep a comma if it exists
- if the zip is more than 5 numbers, remove it
- remove carraige returns

A VBA module will work well since the file I have is a CSV and has many many rows so Excel is not an option.  
Avatar of bluelizard
bluelizard
Flag of Switzerland image

basically, you can use the attached VB code to convert the file, however, that sample code only holds an example how to remove non-numerics.  to give you the correct conversion code (where e.g. commas are retained), you need to be a little more specific about the conversion that you need -- it would be best if you posted some examples here.

in the question title you write "remove carriage returns": do you really need that?  without carriage returns, many programs will not be able to read the file's content properly anymore.


--bluelizard

  Dim fh1, fh2 As Integer
  Dim file_line1, file_line2 As String
  Dim i As Integer
 
  fh1 = FreeFile()
  Open "c:\path\origfile.csv" For Input As fh1
 
  fh2 = FreeFile()    
  Open "c:\path\newfile.txt" For Output As fh2
 
  While Not(EOF(fh1))
    ' read a line from orig file
    Line Input #fh1, file_line1
 
    ' process line
 
    ' example: remove non-numerics
    file_line2 = ""
    For i = 1 To Len(file_line1)
      If IsNumeric(Mid$(file_line1, i, 1)) Then
        file_line2 = file_line2 & Mid$(file_line1, i, 1)
      End If
    Next i
 
    ' write line to new file2
    Print #fh2, file_line
  Wend
 
  Close fh1
  Close fh2

Open in new window

oops, sorry: lines 25 and 26 in the above code must be

    ' write line to new file
    Print #fh2, file_line2

(the last "2" must be in line 26, not 25)
Avatar of AliciaVee
AliciaVee

ASKER

bluelizard,

Thanks...okay -- to answer your questions...and hopefully you'll be able to provide a solution, below are examples of how data is stored, and how it should be fixed.  There are 3 general examples.

Also -- the system that needs this data corrected does not use carraige returns well.  Another item that needs addressing is that while I need non-numeric characters stripped, I do need the asterick to remain * that is considered a wildcard.

So, attached are what zips might look like. There are two columns, one is the profile identifier, the other column will hold many zips, from as low as 5 zips to as many as 500. I have added a comments column to explain how the zips should appear and/or what is wrong.

Thanks in advance!

ZipCodeExamples.xls
ok... remark on the side: this seems more like a problem for perl... :-)

the challenge, as it seems to me, is more to define how exactly the conversion must be done rather than how to technically do it.  so here's my try to define the conversion "algorithm":

1) try to read 5 chars that are numeric or *:  if a non-numeric is encountered before 5 chars are complete, ignore what's read and goto 3)
2) the 5 chars read above are valid: use them and add a comma and a space
3) read (and ignore) chars until a space or a comma is encountered
4) ignore any commas and spaces until something other is encountered
5) start at 1) again

can you try to confirm if this is what's needed?

also, two more questions:

a. in your excel, what's the meaning of column "ID" (A1234, B1234 ...)? how are these fields stored in the original file? divided by a space from the Zips?  and how do the IDs have to be processed? do they need to be stored in the target file as well?

b. when you write you don't want any carriage returns, does that mean that you want all encountered zips "in a row" without any carriage returns?


--bluelizard
bluelizzard,

Yes...you got the algorim correct as described in 1-5.

Answers to your questions:
a. the ID maps back to the user's profile.  so what is in column B (the zips) will be imported back into the system under the identifier which is column A
b. yes...in some of the rows, which I didn't add to the examples, there are carraige returns in the cell.  So, column A has the ID and column B has all the zips, some of which need to be cleaned up ..in this same column there are sometimes carraige returns which also need to be removed.

Thanks.
ASKER CERTIFIED SOLUTION
Avatar of bluelizard
bluelizard
Flag of Switzerland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
bluelizzard -- sorry for the delay in my response. I got pulled onto another project then traveled for business.  I am now back.

Okay -- yes your question about CR is correct -- there will be some in the file, all data should be in a column -- with no carrange returns...there is no right carraige return

there is a column between the ID column and the zip column (no space, comman...etc)

Hope this helps.

ok... there are two things that i still don't understand:

1)
when you write "there is a column between the ID column and the zip column", what do you mean?  what is between the last character of the ID and the first character of the first ZIP?  a space?  a comma? or something else?

2)
when the ZIPs contain carriage returns, the problem is how to distinguish this from a "real" new line (in a CSV file, there are no columns like in an excel file, so you can't tell if the CR means a new line or just a CR within the ZIPs).  so how can i tell that the start of the new line is a new ID or just a "continued" ZIP?

i think it would help if you posted a part of an actual file that you want access to read.


--bluelizard
Wanted to close this question.  Your info/concepts was helpful and I appreciate your time.