Solved

Line Input with invalid characters

Posted on 2005-03-05
Medium Priority
Last Modified: 2010-05-02
Well, well, well...
I'm working on an application that needs to read some data from .txt files.
But those files are crap: a lot of invalid characters appear in them.
So I was trying to adapt the code to show on which line an invalid character appears.
I noticed that some characters (/ # $ =) can easily be changed to a space " ". That lets me keep reading all the fields without any problem.
But some lines contain characters like hearts, diamonds, spades, etc. In those cases I noticed that each of these characters seems to delete one character. (Sorry about my bad English.) For example, when a line has a heart in it, Len(string) gives me a wrong result, like 9, but if I open the file in EDIT (MS-DOS) and count column by column, I count 10. So in that case I thought I could replace the character with 2 spaces "  " (one for the invalid character, and another for the character that disappeared).

But my problem is getting bigger. I don't know what is happening, but on some lines, if I open the file in EDIT (MS-DOS), I can't see any invalid characters, yet if I count all the characters on the line I get 96. In VB, Len(string) returns just 95 characters. How the hell should I fix that?? I can't see the invalid character, but it looks like it exists. If I can't have a fixed length for all lines, I can't read fields like "name" and "phone" using Mid().
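Why one missing character breaks Mid() can be shown in a few lines. This sketch is Python rather than VB6, and the column layout (date in columns 1-8, name in 10-29, phone in 30-36) is an assumed layout loosely modelled on the sample records later in the thread, not the asker's real format; `mid` and `parse_record` are hypothetical helpers that mimic VB's 1-based Mid().

```python
# Hypothetical fixed-width layout: field -> (1-based start column, length).
# These widths are an assumption for illustration, not the asker's real format.
LAYOUT = {"date": (1, 8), "name": (10, 20), "phone": (30, 7)}


def mid(s: str, start: int, length: int) -> str:
    """Mimic VB's Mid(s, start, length): 1-based start, clipped at the end."""
    return s[start - 1:start - 1 + length]


def parse_record(line: str) -> dict:
    """Cut every field out of a fixed-width line and trim its padding."""
    return {field: mid(line, start, length).strip()
            for field, (start, length) in LAYOUT.items()}


# A well-formed 36-character record.
good = "01012005 " + "JOHN MILTON".ljust(20) + "3333896"
# The same record with one character lost, so it is only 35 characters long
# and everything after the missing character sits one column to the left.
bad = "01012005 " + "JOHN MILTN".ljust(19) + "3333896"

print(parse_record(good))
print(parse_record(bad))
```

With the complete line every field comes out clean. With the line that is one character short, the phone slice starts one column too late and loses its first digit ("333896"), while the name slice picks that digit up instead.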

Please, someone help me with this. Maybe you guys already have code that tracks invalid characters and fixes the problem. If those invalid characters didn't change the length of the line, I could fix it easily.
But with these strange things happening, I really need some help.

Thanks...
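For the first part of the question (showing on which line an invalid character appears, and finding lines whose length is off by one), a byte-level scan avoids guessing from what EDIT displays. This is a Python sketch under a few assumptions: lines end with CR+LF (the DOS convention), "valid" means printable ASCII 0x20-0x7E plus tab (so, unlike the pattern used later in the thread, it also flags 0x7F, DEL), and the helper names are hypothetical. Splitting only on CR+LF means a stray lone CR or LF is reported as an invalid byte (13 or 10) instead of silently starting a new line.

```python
from typing import List, Tuple


def split_dos_lines(data: bytes) -> List[bytes]:
    """Split on CR+LF only, so a stray lone CR or LF stays inside its line."""
    lines = data.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the file ended with CR+LF; drop the empty tail
    return lines


def find_invalid_bytes(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line number, column, byte value) for every byte that is not
    printable ASCII (0x20-0x7E) and not a tab. Line and column are 1-based."""
    report = []
    for line_no, line in enumerate(split_dos_lines(data), start=1):
        for col, value in enumerate(line, start=1):  # iterating bytes gives ints
            if value != 0x09 and not 0x20 <= value <= 0x7E:
                report.append((line_no, col, value))
    return report


def lines_with_wrong_length(data: bytes, expected: int) -> List[Tuple[int, int]]:
    """Return (line number, actual length) for every line whose byte length
    differs from the expected fixed record length."""
    return [(n, len(line))
            for n, line in enumerate(split_dos_lines(data), start=1)
            if len(line) != expected]
```

To run it on a real file, pass the raw bytes, for example `find_invalid_bytes(open("data.txt", "rb").read())`, where the file name is a placeholder.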
Question by:Shidartha
7 Comments
 
LVL 19

Expert Comment

by:Shauli
ID: 13466085
'this code will replace all weird characters with space

'set a reference to Microsoft VBScript Regular Expressions 5.5

Option Explicit

Private Sub Command1_Click()
Dim fileTxt As String, ff As Integer, objRE As New RegExp, strExample As String

'read entire file to memory
ff = FreeFile
Open "your path & FileName" For Binary As #ff
fileTxt = Space(LOF(ff))
Get #ff, , fileTxt
Close #ff

'Example of how to get rid of / # $ = (as these are "legitimate" text characters)
'you can add here any other "legitimate" char
fileTxt = Replace(fileTxt, "/", " ")
fileTxt = Replace(fileTxt, "#", " ")
fileTxt = Replace(fileTxt, "$", " ")
fileTxt = Replace(fileTxt, "=", " ")

'example of how to replace weird characters with a space using a regular expression
objRE.Pattern = "[^\u0020-\u007f\t]"
objRE.Global = True
strExample = fileTxt
' Replace occurrences of weird characters with a space
strExample = objRE.Replace(strExample, " ")

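'save results in file
'For Output truncates an existing file; For Binary would leave any old bytes past the end of the new text
Open "your NEW path & FileName" For Output As #ff
Print #ff, strExample;   'trailing semicolon: no extra line break at the end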
Close #ff
Set objRE = Nothing

MsgBox "Done :)"
End Sub

S

Author Comment

by:Shidartha
ID: 13466232
Geez Shauli, this code is working great, I guess.


But I have a problem: the new file completely lost the layout. Maybe you need to put a line break after each line is read; I really don't know how to do it because I never used Binary mode before. I have more than one record per line.
I can't lose the layout, because I'm using Mid() to get the fields.

Please try to fix that.

And I have a question:
objRE.Pattern = "[^\u0020-\u007f\t]"    Will I always use that pattern, or do I need to change it? I really don't know what it is.

Author Comment

by:Shidartha
ID: 13466280
I think that in the file you create, all the fields end up on just one line. Please try to fix that; I need the layout to stay as it was before.

Example

Original file:

01012005 JOHN MILTON        3333896
01012005 AL PACINO            3325358
01022005 TONY MONTANA    3354586


File after your code:

01012005 JOHN MILTON        3333896 01012005 AL PACINO            3325358 01022005 TONY MONTANA    3354586



Thanks :P
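The joined lines come from the pattern itself: a line break in a DOS/Windows text file is CR+LF (ASCII 13 and 10), and both are below \u0020, so the regex replaces them with spaces like any other "weird" character. This Python sketch uses Python's `re` rather than VBScript's RegExp, but the character class means the same thing here; the two sample records are assumed for illustration.

```python
import re

# The character class from the first reply, written for Python's re module.
# \t is allowed, but CR (\r, 13) and LF (\n, 10) are below \u0020, so they match.
WEIRD = re.compile(r"[^\u0020-\u007f\t]")

# Two assumed sample records with DOS (CR+LF) line endings.
original = "01012005 JOHN MILTON 3333896\r\n01012005 AL PACINO 3325358\r\n"
cleaned = WEIRD.sub(" ", original)

print(repr(cleaned))
```

Each line break becomes two spaces (one for CR, one for LF), so every record ends up on a single line even though the total length is unchanged.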
LVL 19

Accepted Solution

by:
Shauli earned 2000 total points
ID: 13466329
'Check this one and let me know:

Option Explicit

Private Sub Command1_Click()
Dim fileTxt As String, ff As Integer, objRE As New RegExp, strExample As String

'read entire file to memory
ff = FreeFile
Open "C:\Documents and Settings\Shauli.MOBILE\My Documents\test.txt" For Binary As #ff
fileTxt = Space(LOF(ff))
Get #ff, , fileTxt
Close #ff

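'keep the newline chars: vbNewLine (CR+LF) is outside the range the pattern below keeps,
'so swap it for a placeholder now and put it back after the regex has run
fileTxt = Replace(fileTxt, vbNewLine, "{nw}")
'Example of how to get rid of / # $ = (as these are "legitimate" text characters)
'you can add here any other "legitimate" char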
fileTxt = Replace(fileTxt, "/", " ")
fileTxt = Replace(fileTxt, "#", " ")
fileTxt = Replace(fileTxt, "$", " ")
fileTxt = Replace(fileTxt, "=", " ")

'example of how to replace weird characters with a space using a regular expression
objRE.Pattern = "[^\u0020-\u007f\t]"
objRE.Global = True
strExample = fileTxt
' Replace occurrences of weird characters with a space
strExample = objRE.Replace(strExample, " ")
'put back newline
strExample = Replace(strExample, "{nw}", vbNewLine)
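'save results in file
'For Output truncates an existing file; For Binary would leave any old bytes past the end of the new text
ff = FreeFile
Open "C:\Documents and Settings\Shauli.MOBILE\My Documents\testnew.txt" For Output As #ff
Print #ff, strExample;   'trailing semicolon: no extra line break at the end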
Close #ff
Set objRE = Nothing

MsgBox "Done :)"
End Sub

S
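An alternative to the {nw} placeholder is to add CR and LF to the allowed set, i.e. `[^\u0020-\u007f\t\r\n]`; VBScript's RegExp should accept the same escapes, though that is untested here. Line breaks then never match, and because every unwanted character is replaced by exactly one space, each field stays in the same column, which is what Mid() depends on. This is a Python sketch, not the thread's VB6 code, and `clean_text` is a hypothetical helper name.

```python
import re

# Assumed variant of the thread's pattern: CR and LF are added to the allowed
# set, so line breaks are never replaced and no placeholder is needed.
INVALID = re.compile(r"[^\u0020-\u007f\t\r\n]")

# The extra characters the asker wanted blanked out, mapped to one space each.
EXTRA = str.maketrans({ch: " " for ch in "/#$="})


def clean_text(text: str) -> str:
    """Replace every unwanted character with exactly one space.

    Because the replacement is one-for-one and line breaks are kept,
    every field stays in the same column of the same line."""
    return INVALID.sub(" ", text.translate(EXTRA))


src = "01012005 JOHN\x03MILTON 3333896\r\n01022005 TONY/MONTANA 3354586\r\n"
print(repr(clean_text(src)))
```

Compared with the placeholder, this also keeps lone LF or CR line breaks (the placeholder only protects the CR+LF pair in vbNewLine) and cannot be confused by a file that happens to contain the text {nw}. For real files in Python, opening them with `encoding="latin-1", newline=""` keeps the columns intact: latin-1 maps every byte to exactly one character, and `newline=""` stops Python from translating the CR+LF pairs.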

Author Comment

by:Shidartha
ID: 13466391
Shauli, you rock, brother!!!!!


God damn, I could never even have imagined this solution, lol :P


Really, really, thanks!!!!

As we say here in Brazil:  Cara você é foda!!!

Translation: You are the man!!

:P

Add me on MSN if you want: ShidarthaFR (at) hotmail (dot) com
LVL 19

Expert Comment

by:Shauli
ID: 13466392
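P.S.
<<<And I have a question:
objRE.Pattern = "[^\u0020-\u007f\t]"    Will I always use that pattern, or do I need to change it? I really don't know what it is.>>>
No, you don't have to change it. The brackets define a character class and the leading ^ negates it, so the pattern matches any character that is NOT in the range \u0020-\u007f (ASCII 32 to 127) and is NOT a tab (\t). With Global = True, Replace swaps every such character, which is basically what you call "invalid characters", for a space. Carriage return and line feed (ASCII 13 and 10) are outside that range too, which is why the line breaks have to be protected with the {nw} placeholder.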

S
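To see exactly what the pattern catches, it helps to list the matches with their character codes. This Python sketch uses the same character class; `weird_chars` is a hypothetical helper. The class is purely numeric, so besides control characters such as the heart (ASCII 3) it also matches accented letters like Ã or ç, which a Portuguese-language file may well contain, as well as CR and LF.

```python
import re

# Same character class as in the thread: anything outside U+0020..U+007F
# that is not a tab counts as a "weird" character.
WEIRD = re.compile(r"[^\u0020-\u007f\t]")


def weird_chars(text: str) -> list:
    """Return (1-based position, character code) for every character the
    pattern would replace with a space."""
    return [(m.start() + 1, ord(m.group())) for m in WEIRD.finditer(text)]


# "JOÃO", a heart (ASCII 3), a space, then CR+LF.
print(weird_chars("JO\u00c3O\x03 \r\n"))
```

If accented names must survive, the range in the class would need widening; that is a design choice the thread does not cover.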
LVL 19

Expert Comment

by:Shauli
ID: 13466406
Thanks, I'm glad it works for you :)

S