VB.NET: problem removing weird characters in text

I have a letter that I have read into memory. I then split the contents into an array on vbCrLf.

The problem is that there are some weird characters that are not line feeds, and I don't know how to get rid of them.

The original text in a Word document looks like this:


RE: Rosa Gutierrez Amezquita
DOB: March 30, 1940
MR#: 55555


REASON FOR CONSULTATION: I was asked to see this patient by Dr. Maribel Flores for evaluation and management of endstage renal disease. .....

Using a converter component, I convert it to a plain text file.

When splitting, I get the result below. How can I determine what kind of characters these are and remove them or convert them to vbCrLf? I opened the file in a hex editor, and the characters showed up as two dots with the values 0D 0A.


 split text
Asked by: rutledgj
 
CodeCruiser commented:
Try reading the ASCII value of this character using the Asc function.
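
A minimal sketch of that idea, in case it helps: loop over the text and report every character that is not printable ASCII, with its position and code. The module name, function name, and sample string are made up for illustration. It uses AscW rather than Asc so the result is the Unicode code point regardless of the system code page; for characters below 128 the two return the same value.

```vbnet
Imports System
Imports System.Text

Module CharDump
    ' Lists every character outside the printable ASCII range (32 to 126),
    ' one per line, with its index and its code in decimal and hex.
    Function DescribeOddChars(text As String) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To text.Length - 1
            Dim code As Integer = AscW(text(i))
            If code < 32 OrElse code > 126 Then
                sb.AppendLine(String.Format("index {0}: {1} (0x{1:X2})", i, code))
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Hypothetical sample: a vertical tab (code 11) between two fields.
        Dim sample As String = "MR#: 55555" & vbVerticalTab & "REASON"
        Console.Write(DescribeOddChars(sample))   ' prints: index 10: 11 (0x0B)
    End Sub
End Module
```

Carriage returns and line feeds will show up in this report too (13 and 10), which makes it easy to see which separators the file actually contains.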
 
CodeCruiser commented:
You may want to test vbCrLf as well as vbCr and vbLf.
 
rutledgj (author) commented:
Yes. I tried vbCrLf, vbCr, vbLf, and vbNewLine with no luck.
 
yatin_81 commented:
Filter out characters whose ASCII values fall outside the range you need (A-Z, a-z, 0-9, and any special characters you want to keep). You can write a function to remove these characters.
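
One possible shape for such a function, as a sketch only: it keeps printable ASCII (codes 32 to 126) plus carriage return and line feed, and drops everything else. The allowed range is an assumption and would need widening if the letters contain accented names or other non-ASCII text. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module CharFilter
    ' Returns a copy of the text containing only printable ASCII,
    ' carriage returns and line feeds. Every other character is dropped.
    Function KeepAllowedChars(text As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            Dim code As Integer = AscW(c)
            Dim printable As Boolean = (code >= 32 AndAlso code <= 126)
            If printable OrElse c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Hypothetical sample with a vertical tab between two fields.
        Dim sample As String = "DOB: March 30, 1940" & vbVerticalTab & "MR#: 55555"
        Console.WriteLine(KeepAllowedChars(sample))   ' prints: DOB: March 30, 1940MR#: 55555
    End Sub
End Module
```

Note that deleting the unwanted character joins the two fields into one line. If the character is acting as a line separator, replacing it with a line break is probably a better fit than removing it.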
 
rutledgj (author) commented:
This helped. It was a vertical tab.
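
For anyone hitting the same thing, a sketch of how the vertical tab could be handled before splitting: replace each vertical tab (character 11, vbVerticalTab) with vbCrLf, then split on vbCrLf as before. This assumes the vertical tabs stand in for line breaks in the converted text. The module name, function name, and sample are made up for illustration.

```vbnet
Imports System

Module SplitLetter
    ' Turns vertical tabs into CRLF pairs, then splits the text into lines.
    Function SplitIntoLines(text As String) As String()
        Dim normalized As String = text.Replace(vbVerticalTab, vbCrLf)
        Return normalized.Split(New String() {vbCrLf}, StringSplitOptions.None)
    End Function

    Sub Main()
        ' Hypothetical sample: one vertical tab and one real CRLF.
        Dim sample As String = "RE: Rosa Gutierrez Amezquita" & vbVerticalTab &
                               "DOB: March 30, 1940" & vbCrLf & "MR#: 55555"
        For Each part As String In SplitIntoLines(sample)
            Console.WriteLine(part)
        Next
        ' Prints three lines: the RE, DOB and MR# fields, one per line.
    End Sub
End Module
```

StringSplitOptions.None keeps empty entries, so blank lines in the letter are preserved; StringSplitOptions.RemoveEmptyEntries would drop them instead.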
 
Jacques Bourgeois (James Burger), President, commented:
Word does not save its data in a standard text format. Along with a CRLF, it records information about the format of the paragraph. You might have problems removing that.

Instead of opening the document in memory, why don't you connect to Word and retrieve the information through the Word API, which is the standard practice? This takes care of all the extras. When you call the Text property of a Range, it sends the text back to you the way you expect it.

If you do not do this because Word is not installed on your users' computers, you might be able to do it by working with the word processor that they have. Many word processors can read a Word document. There are also a few tools on the market, probably some of them available under an open source license, that enable you to read Word documents.
 
rutledgj (author) commented:
Thanks for your info. We do not install Word on our servers, which is where this process will run. The only solution I have found is to use Aspose.Word.Net to save the document as a text file and work with that. I've had no luck finding free tools that can do this and preserve the auto-generated numbers/bullets in the Word document.
 
Jacques Bourgeois (James Burger), President, commented:
So your strange characters come from the Aspose conversion to text, not from Word. If so, I would search the Aspose documentation for the cause of these.

The characters you mentioned in your original question ("the characters showed up as 2 dots with the values 0d 0a") are hexadecimal 0D 0A, characters 13 and 10 in ASCII: a carriage return followed by a line feed, the standard way of terminating a line on Windows. They should give you the line break you are after.

Something should convert them between the file and memory. How do you read the file into memory?
Question has a verified solution.
