vb.net problem removing weird characters in text

Posted on 2011-10-10
Last Modified: 2012-05-12
I have a letter that I have read into memory. I then split the contents into an array on vbCrLf.

The problem is that there are some weird characters that are not line feeds, and I don't know how to get rid of them.

The original text in a Word document looks like this:

RE: Rosa Gutierrez Amezquita
DOB: March 30, 1940
MR#: 55555

REASON FOR CONSULTATION: I was asked to see this patient by Dr. Maribel Flores for evaluation and management of endstage renal disease. .....

Using a converter component, I convert it to a plain text file.

When splitting, I get the result below. How can I determine what kind of characters these are and remove them or convert them to vbCrLf? I opened the file in a hex editor, and the characters showed up as 2 dots with the values 0D 0A.

[Image: split text]
Question by: rutledgj

Expert Comment

ID: 36944476
You may want to test for vbCrLf as well as vbCr and vbLf.

Author Comment

ID: 36944490
Yes. I tried vbCrLf, vbCr, vbLf, and vbNewLine with no luck.

Accepted Solution

CodeCruiser earned 1000 total points
ID: 36944535
Try reading the ASCII value of this character using the Asc function.
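
As a rough illustration of that suggestion, the sketch below lists the position and hex code of every control character in a string. It uses AscW (the Unicode counterpart of Asc) and Char.IsControl; the module name, the function name DescribeControlChars, and the sample text with an embedded vertical tab are made up for the example and are not taken from the original letter.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharInspector
    ' Hypothetical helper: reports each control character as "index N: 0xHH".
    Function DescribeControlChars(ByVal text As String) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To text.Length - 1
            Dim c As Char = text(i)
            If Char.IsControl(c) Then
                If sb.Length > 0 Then sb.Append(", ")
                ' AscW returns the numeric code of the character.
                sb.Append("index " & i.ToString() & ": 0x" & AscW(c).ToString("X2"))
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Assumed sample data, only for demonstration.
        Dim sample As String = "MR#: 55555" & vbVerticalTab & "REASON FOR CONSULTATION"
        Console.WriteLine(DescribeControlChars(sample))
    End Sub
End Module
```

Comparing the reported codes against an ASCII table should show whether the unknown character is a carriage return (0x0D), a line feed (0x0A), or something else.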

Expert Comment

ID: 36944593
Filter out ASCII values that do not fall within the range of characters you need (A-Z, a-z, 0-9, and the special characters you need). You can write a function to remove these characters.
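
A minimal sketch of such a filter function, assuming an allow-list approach. The module name, the function name KeepAllowedChars, and the punctuation set in AllowedPunctuation are hypothetical and would need adjusting to the characters the letters really contain. Carriage returns and line feeds are kept so the text can still be split into lines afterwards.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharFilter
    ' Hypothetical allow-list of punctuation; adjust as needed.
    Private Const AllowedPunctuation As String = " .,:;#/-()'"

    ' Returns a copy of text containing only A-Z, a-z, 0-9,
    ' the allowed punctuation, and CR/LF line terminators.
    Function KeepAllowedChars(ByVal text As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            Dim isLetterOrDigit As Boolean = _
                (c >= "A"c AndAlso c <= "Z"c) OrElse _
                (c >= "a"c AndAlso c <= "z"c) OrElse _
                (c >= "0"c AndAlso c <= "9"c)
            Dim isLineBreak As Boolean = _
                (c = ControlChars.Cr) OrElse (c = ControlChars.Lf)
            If isLetterOrDigit OrElse isLineBreak OrElse _
               AllowedPunctuation.IndexOf(c) >= 0 Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Assumed sample data, only for demonstration.
        Dim sample As String = "MR#: 55555" & vbVerticalTab & "DOB: March 30, 1940"
        Console.WriteLine(KeepAllowedChars(sample))
    End Sub
End Module
```

Note that dropping a character this way also drops any line break it stood for, so the text on either side of it ends up on one line.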

Author Closing Comment

ID: 36944644
This helped. It was a vertical tab.
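
Given that finding, one possible way to handle it is to replace each vertical tab (character 11, vbVerticalTab) with vbCrLf before splitting, which matches the original goal of converting the unknown characters to line breaks. This is only a sketch; the module name, the function name SplitLines, and the sample string are illustrative assumptions.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module VerticalTabSplit
    ' Hypothetical helper: treats vertical tabs as line breaks, then splits on CRLF.
    Function SplitLines(ByVal text As String) As String()
        Dim normalized As String = text.Replace(vbVerticalTab, vbCrLf)
        Return normalized.Split(New String() {vbCrLf}, StringSplitOptions.None)
    End Function

    Sub Main()
        ' Assumed sample data: one vertical tab and one ordinary CRLF.
        Dim sample As String = "DOB: March 30, 1940" & vbVerticalTab & _
                               "MR#: 55555" & vbCrLf & "REASON FOR CONSULTATION"
        For Each part As String In SplitLines(sample)
            Console.WriteLine(part)
        Next
    End Sub
End Module
```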
ID: 36944878
Word does not save its data in a standard text format. Along with a CRLF, it records information about the format of the paragraph. You might have problems removing that.

Instead of opening the document in memory, why don't you connect to Word and retrieve the information through the standard Word API, which is the standard practice? This takes care of all the extras. When you call the Text property of a Range, it sends the text back to you the way you expect it.

If you do not do this because Word is not installed on your users' computers, you might be able to do it by working with the word processor that they have. Many word processors can read a Word document. There are also a few tools on the market, probably some of them available under an open source license, that enable you to read Word documents.

Author Comment

ID: 36944896
Thanks for your info. We do not install Word on our servers, which is where this process will run. The only solution I have found is to use Aspose.Word.Net, save the document as a text file, and work with that. I've had no luck finding free tools that can do this and preserve the auto-generated numbers/bullets in the Word document.
ID: 36945100
So your strange characters come from the Aspose conversion to text, not from Word. If so, I would search the Aspose documentation for the cause of these.

The characters that "showed up as 2 dots with the values 0d 0a", as you mentioned in your original question, are hexadecimal 0D 0A, characters 13 and 10 in ASCII: a carriage return followed by a line feed, the standard way of terminating a line. They should give you the line break you are after.

Something must be converting them between the file and memory. How do you read the file into memory?
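
To make the point about 0D 0A concrete, here is a small sketch that writes those two bytes to a temporary file, reads the file back with File.ReadAllText, and splits on any of the common line terminators. The module name, the function name SplitOnAnyLineEnding, and the temporary file are assumptions for the demonstration; the real process would read the converted letter instead, possibly with an explicit encoding.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadAndSplit
    ' Hypothetical helper: splits on CRLF first, then on a lone CR or LF.
    Function SplitOnAnyLineEnding(ByVal text As String) As String()
        Return text.Split(New String() {vbCrLf, vbCr, vbLf}, StringSplitOptions.None)
    End Function

    Sub Main()
        ' Write "A", 0D 0A, "B" as raw bytes to a temporary file.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllBytes(tempFile, New Byte() {&H41, &HD, &HA, &H42})

        ' Read it back as text; the 0D 0A pair arrives as Chr(13) & Chr(10).
        Dim contents As String = File.ReadAllText(tempFile)
        Console.WriteLine(contents.Contains(vbCrLf))

        For Each part As String In SplitOnAnyLineEnding(contents)
            Console.WriteLine(part)
        Next

        File.Delete(tempFile)
    End Sub
End Module
```

Listing vbCrLf before vbCr and vbLf in the separator array is intended to keep a CRLF pair from being treated as two separate breaks.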
