Help debugging malformed data file using C#

Posted on 2005-04-13
Last Modified: 2010-04-16
I have been tasked with reading in a data file from a TCP/IP stream of data.

The data coming in follows a schema based on WITS (Wellsite Information Transfer Standard).

The data coming in follows this general format:

XX - the record type
XX - the item type
XXXX - the data

XX - the record type
XX - the item type
XXXX - the data


Each record starts with "&&" to delimit the beginning of the record and ends with "!!" to delimit the end of the record.
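To make the layout concrete, here is a minimal sketch of a reader for this format. It assumes each item sits on its own line, that "&&" and "!!" appear alone on a line, and that the first two characters are the record type and the next two the item type. The names `WitsItem`, `WitsParser` and `ReadRecords` are made up for illustration and are not part of any WITS library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// One data item inside a record: 2-char record type, 2-char item type, then the data.
public sealed class WitsItem
{
    public string RecordType { get; }
    public string ItemType { get; }
    public string Data { get; }

    public WitsItem(string recordType, string itemType, string data)
    {
        RecordType = recordType;
        ItemType = itemType;
        Data = data;
    }
}

public static class WitsParser
{
    // Yields one list of items for every "&&" ... "!!" block found in the reader.
    public static IEnumerable<List<WitsItem>> ReadRecords(TextReader reader)
    {
        List<WitsItem> current = null;   // null means "not inside a record"
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line == "&&")
            {
                current = new List<WitsItem>();   // start (or restart) a record
            }
            else if (line == "!!")
            {
                if (current != null) yield return current;
                current = null;
            }
            else if (current != null && line.Length >= 4)
            {
                current.Add(new WitsItem(
                    line.Substring(0, 2), line.Substring(2, 2), line.Substring(4)));
            }
            // Anything else (text outside a record, lines that are too short) is skipped.
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data, not taken from the real file.
        string sample = "&&\n01081234.5\n011045.2\n!!\n";
        foreach (List<WitsItem> record in WitsParser.ReadRecords(new StringReader(sample)))
            foreach (WitsItem item in record)
                Console.WriteLine($"{item.RecordType} {item.ItemType} {item.Data}");
    }
}
```

Note that `Trim()` only removes whitespace, so a stray control character glued to the "!!" line would stop that line from matching the end delimiter in this sketch.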

For the most part, the C# code I have written handles this format just fine, and nine times out of ten a record reads in without trouble.

During my testing......I have run into a problem though.

For some reason my READ-IN of the data is being STOPPED.  I am speculating that it is due to a weird "special character" that I can see in the file (when I open the file in Notepad).  The character looks like a BOX, which as I understand it is what a font displays when it has no glyph for the character.

I have posted an excerpt of the file online, hoping that the special character will SHOW UP in Notepad when whoever helps me with this problem opens the file.

I am wanting to know.....what is this character?

My other related question, and probably the more important one: how can I clear the file of ALL NON-alphanumeric characters?  The only characters I want to keep are Aa thru Zz, 0 thru 9, . (period), & (ampersand) and ! (exclamation mark).  All other characters can be cleaned away.

//File excerpt.....


!!   <-----------the special "box" character is before the two !! characters

Here is the file itself.....the actual file is BIG...2 MB.   This is just a small bit of that file...enough to show the special character (I hope):
Question by:knowlton

    Accepted Solution

    hello again,
    there can be more than one solution, but since you prefer using regular expressions, here it is:

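    using System.Text.RegularExpressions;

    // Remove every character except letters, digits, '.', '!' and '&'.
    Regex myRegex = new Regex( "[^A-Za-z0-9.!&]" );
    string input = GetYourInput();   // placeholder for however you read the data
    string cleaned = myRegex.Replace( input, "" );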

    you can use \d and \w, but since different "flavours" of regex engines can interpret \w differently, i'd stick to the straight and simple way given above.
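As a rough illustration of why the explicit character class is the safer choice, the sketch below compares it with a `\w`-based pattern. In .NET, `\w` by default also matches the underscore and non-ASCII letters, so those would survive the cleanup. The class and method names are made up for this example, and the input string is invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanDemo
{
    // Explicit whitelist: only A-Z, a-z, 0-9, '.', '!' and '&' survive.
    private static readonly Regex Explicit = new Regex("[^A-Za-z0-9.!&]");

    // Same idea with \w; in .NET this also keeps '_' and non-ASCII letters.
    private static readonly Regex WordClass = new Regex(@"[^\w.!&]");

    public static string CleanExplicit(string s) => Explicit.Replace(s, "");

    public static string CleanWordClass(string s) => WordClass.Replace(s, "");

    public static void Main()
    {
        // Invented input: an underscore, an accented letter and a control character.
        string input = "&&0108_\u00E912.5\u0001!!";
        Console.WriteLine(CleanExplicit(input));    // underscore and accented letter removed
        Console.WriteLine(CleanWordClass(input));   // underscore and accented letter kept
    }
}
```

Both patterns also strip carriage returns and line feeds. If the items in a record are separated by line breaks, it may be worth adding `\r\n` to the character class so the lines are not run together.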


    p.s. ☺ is most likely either a carriage return or a new line...
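Rather than guessing, one way to find out what the character really is would be to print the numeric code of every character outside the wanted set. This is only a sketch: `CharInspector` and `FindUnexpected` are hypothetical names, and the sample string is invented. Carriage returns and line feeds are reported too, as U+000D and U+000A, so they are easy to tell apart from anything else.

```csharp
using System;
using System.Collections.Generic;

public static class CharInspector
{
    // Returns "index: U+XXXX" for every character outside A-Z, a-z, 0-9, '.', '&', '!'.
    public static List<string> FindUnexpected(string text)
    {
        var hits = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool allowed = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                        || (c >= '0' && c <= '9')
                        || c == '.' || c == '&' || c == '!';
            if (!allowed)
                hits.Add($"{i}: U+{(int)c:X4}");
        }
        return hits;
    }

    public static void Main()
    {
        // Invented sample; replace it with the text read from the stream or file.
        string sample = "&&\r\n01081234.5\u0001\r\n!!";
        foreach (string hit in FindUnexpected(sample))
            Console.WriteLine(hit);
    }
}
```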

    Author Comment

    Thanks for the RegEx!!!


    The problem actually turned out to be further along in the data stream than I originally thought.
