• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 592
  • Last Modified:

REPLACE

Hi Experts,

I have data coming in ... but it has non-ascii characters in some places. How do I get rid of those or replace with ASCII equivalent ones if
it is possibly ?

Thank you.
0
fpoyavo
Asked:
fpoyavo
  • 2
  • 2
  • 2
  • +2
1 Solution
 
CJCraftCommented:
The following code snippet should give you the idea.

The idea is go through the string and see if it ascii character value in in range of values you consider valid.
You may need to post more details before you get the answer you are after.

byte[] bScrubbed = new byte[(int)dwBytesRead];
int nPos = 0;

// Scrub the non-ascii characters
for (int i = 0; i < (int)dwBytesRead; i ++)
{
   if (((int)bData[i] > 19) && ((int)bData[i] < 125))
   {
      bScrubbed[nPos] = bData[i];
      nPos++;
   }
}
0
 
gregoryyoungCommented:
you can do this using the ASCII encoding class ... I can give you specific code to handle the case if you can tell me what encoding you are coming from (i.e. UTF 16) ...
0
 
fpoyavoAuthor Commented:
Gregory,

Yep. Its UTF 16.
0
Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

 
tzxie2000Commented:
suggest the data you receive is correct data but in UTF16 and it saved in s[]

char [] s;

//some code to receive data in s

Encoder UTFEncoder = Encoding.UTF16.GetEncoder();

int byteCount = UTFEncoder.GetByteCount(s, 0, s.Length, true);
Byte[] bytes = new Byte[byteCount];
int bytesEncodedCount = UTFEncoder.GetBytes(s, 0,s.Length, bytes, 0, true);
//ok it change to UTF16
// bytesEncodedCount is the real changed bytes number
 Console.WriteLine("{0} bytes used to encode characters.", bytesEncodedCount );
//show the encoded bytes
Console.Write("Encoded bytes: ");
  foreach (Byte b in s) {
        Console.Write("[{0}]", b);
   }





UTFEncoder


0
 
gregoryyoungCommented:
once you have them in a byte array as displayed by tzxie2000 you can then get them into an ASCII string by using the ASCII encoder object .GetString() method.
note that null characters cause issues with strings in .net so they should be removed prior to this process (C style strings)
0
 
ptmcompCommented:
string result = Regex.Replace(input, @"[^\x20-\x7e]", "");  // removes all chars that are not in the range 0x20 - 0x7e
0
 
fpoyavoAuthor Commented:
ptmcomp,

You are good. How about to replace using regex any special character with \\special character ?
Example :  $    to   \\$

Thank you.
0
 
ptmcompCommented:
I'm not sure if I understand your question.
If you want to replace "$" by "\\$" and let's say "@" by "\\@" then you could use this:  
string result = Regex.Replace(input, @"[$@]", @"\\$0");
Note: "$0" stands for the string matched by the expression
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 2
  • 2
  • 2
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now