fpoyavo
asked on
REPLACE
Hi Experts,
I have data coming in ... but it has non-ASCII characters in some places. How do I get rid of those, or replace them with ASCII equivalents if
possible?
Thank you.
You can do this using the ASCIIEncoding class. I can give you specific code to handle the case if you can tell me what encoding you are coming from (e.g. UTF-16).
ASKER
Gregory,
Yep. It's UTF-16.
Suppose the data you receive is valid UTF-16 text and is stored in s[]:
char[] s;
// some code to receive data in s
// .NET exposes UTF-16 (little-endian) as Encoding.Unicode; there is no Encoding.UTF16
Encoder UTFEncoder = Encoding.Unicode.GetEncoder();
int byteCount = UTFEncoder.GetByteCount(s, 0, s.Length, true);
byte[] bytes = new byte[byteCount];
int bytesEncodedCount = UTFEncoder.GetBytes(s, 0, s.Length, bytes, 0, true);
// the characters are now encoded as UTF-16 bytes
// bytesEncodedCount is the number of bytes actually written
Console.WriteLine("{0} bytes used to encode characters.", bytesEncodedCount);
// show the encoded bytes (iterate over bytes, not over the char array s)
Console.Write("Encoded bytes: ");
foreach (byte b in bytes) {
    Console.Write("[{0}]", b);
}
Once you have them in a byte array as shown by tzxie2000, you can get them into an ASCII string by using the ASCII encoding object's GetString() method.
Note that null characters can cause issues with strings in .NET (for example where they are treated as C-style strings), so they should be removed prior to this process.
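For illustration, here is a minimal sketch of the two outcomes the question asks about: dropping non-ASCII characters, and replacing accented letters with a plain ASCII equivalent. The class and method names (AsciiCleaner, StripNonAscii, ToAsciiEquivalent) are hypothetical, the input is assumed to be little-endian UTF-16, and the '?' placeholder for characters without a decomposition is just one possible choice.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AsciiCleaner
{
    // Decode UTF-16 (little-endian) bytes, then drop null characters
    // and every character above code point 127.
    public static string StripNonAscii(byte[] utf16Bytes)
    {
        string text = Encoding.Unicode.GetString(utf16Bytes);
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c != '\0' && c <= 127)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Replace accented letters with their base letter where Unicode
    // decomposition allows it; anything still non-ASCII becomes '?'.
    public static string ToAsciiEquivalent(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Skip combining marks, e.g. the accent split off from an accented 'e'.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                continue;
            }
            sb.Append(c <= 127 ? c : '?');
        }
        return sb.ToString();
    }
}
```

With this sketch, AsciiCleaner.ToAsciiEquivalent("Crème brûlée") yields "Creme brulee". Characters that have no decomposition into an ASCII base letter (ß, for example) end up as '?', so a lookup table would be needed if those matter to you.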
ASKER
ptmcomp,
You are good. How about using a regex to replace any special character with \\ followed by that character?
Example: $ to \\$
Thank you.
I'm not sure if I understand your question.
If you want to replace "$" by "\\$" and let's say "@" by "\\@" then you could use this:
string result = Regex.Replace(input, @"[$@]", @"\\$0");
Note: "$0" stands for the string matched by the expression
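If "any special character" is meant more broadly than "$" and "@", the same replacement can be wrapped in a helper with a wider character class. This is only a sketch: the class name SpecialCharEscaper is hypothetical, and treating everything other than ASCII letters, digits and whitespace as "special" is an assumption you would adjust to your data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharEscaper
{
    // Assumption: "special" means anything that is not an ASCII letter,
    // an ASCII digit, or whitespace. Edit the class to suit your input.
    private static readonly Regex Specials = new Regex(@"[^A-Za-z0-9\s]");

    // Prefix each special character with two literal backslashes.
    // In a .NET replacement pattern only '$' is special, so the verbatim
    // string \\$0 means: backslash, backslash, then the matched text.
    public static string Escape(string input)
    {
        return Specials.Replace(input, @"\\$0");
    }
}
```

For example, SpecialCharEscaper.Escape("a$b @c") returns the text a\\$b \\@c. Note that this class also matches the backslash itself and ordinary punctuation, which may or may not be what you want.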
The idea is to go through the data and check whether each character's ASCII value is in the range of values you consider valid.
You may need to post more details before you get the answer you are after.
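// bData holds the raw input and dwBytesRead the number of bytes read (both set elsewhere)
byte[] bScrubbed = new byte[(int)dwBytesRead];
int nPos = 0;
// Scrub the non-ASCII characters.
// The bounds keep values 20..124; printable ASCII is 32..126, so adjust them to the range you consider valid.
for (int i = 0; i < (int)dwBytesRead; i++)
{
    if (((int)bData[i] > 19) && ((int)bData[i] < 125))
    {
        bScrubbed[nPos] = bData[i];
        nPos++;
    }
}
// nPos is now the number of valid bytes in bScrubbed; the rest of the buffer is still zero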
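The loop above can be packaged as a reusable method. The following is a sketch under stated assumptions: the name ByteScrubber.Scrub and its parameters are hypothetical, and the default bounds are the printable ASCII range (32 to 126) rather than the 20 to 124 range used in the snippet. It also trims the result, so no trailing zero bytes (which would show up as null characters once converted to a string) are left behind.

```csharp
using System;

public static class ByteScrubber
{
    // Keep only bytes in [min, max] from the first 'count' bytes of 'data'.
    // Defaults cover printable ASCII, from space (32) to '~' (126).
    public static byte[] Scrub(byte[] data, int count, byte min = 32, byte max = 126)
    {
        byte[] kept = new byte[count];
        int pos = 0;
        for (int i = 0; i < count; i++)
        {
            if (data[i] >= min && data[i] <= max)
            {
                kept[pos] = data[i];
                pos++;
            }
        }
        // Shrink the buffer to the bytes actually kept, so callers
        // never see the unused zero-filled tail.
        Array.Resize(ref kept, pos);
        return kept;
    }
}
```

A caller could then write System.Text.Encoding.ASCII.GetString(ByteScrubber.Scrub(bData, (int)dwBytesRead)) to get the cleaned text as a string.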