# Remove backwards question mark inside box character from ASCII file

I have a file that is picking up a character that, when viewed in Excel, is a backwards question mark inside a box. In Unix, the character appears as a backwards question mark. I want to remove it from the ASCII file. Is there a sed command to do this? I cannot get it to display in this example; it appears between the code markers below. It is NOT an upside-down question mark: the question mark actually faces the opposite direction.

``````
¿
``````

Hmm, hex A0 is octal 240. This removes the special character just fine:
``````
tr -d '\240' < test.txt
``````
output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000

@duncan_roe - sed does understand hex; this works as well:
``````
sed 's/\xa0//g' test.txt
``````
output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000
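
If the file is fixed-width, as the column positions mentioned in this thread suggest, deleting the byte shifts everything after it one column to the left. A minimal sketch that swaps the byte for a plain space instead, so the line length stays the same. The sample line and temp files are made up for illustration, and `LC_ALL=C` is an assumption to force byte-wise handling:

```shell
# Hypothetical sample line with a 0xA0 byte (octal 240) in column 24.
in=$(mktemp)
out=$(mktemp)
printf 'DET0005   UNI61130022DD\240EA\n' > "$in"

# Translate the byte to a space rather than deleting it, so later columns stay put.
LC_ALL=C tr '\240' ' ' < "$in" > "$out"

result=$(cat "$out")
printf '%s\n' "$result"

rm -f "$in" "$out"
```

Whether to delete or replace depends on what the downstream consumer of the file expects.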

Commented:
I had the same issue recently, creating a .csv file in Unix to be imported into Excel; I can't remember what character combo generated it (probably a "\0nn" being interpreted as a special character), but it should be obvious if you `vi` the source file and display special characters using ":set list"

Once you know the character combo that is generating the odd displayed character, you can remove it using sed.

Hope that helps!
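
If `vi` is not handy, sed's `l` command gives a similar view from the command line: it prints each line unambiguously, writing non-printable bytes as octal escapes and marking the end of the line with `$`. A small sketch on a made-up line (`LC_ALL=C` is an assumption, so that the byte is not treated as part of a multibyte character):

```shell
# Made-up sample: 'A', a 0xA0 byte (octal 240), then 'B'.
visible=$(printf 'A\240B\n' | LC_ALL=C sed -n l)

# The odd byte shows up as a backslash followed by its octal value.
printf '%s\n' "$visible"
```

The octal value shown is in the same form that `tr -d` accepts.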

Author Commented:
It is a backward question mark.  No other characters are present.  If I cut and paste it back into vi, the question mark changes back to normal.

Commented:
Can you upload the file (with anything sensitive removed)?

President Commented:
Run od or hd to get a hex dump and see what the actual byte value is. Don't arbitrarily cut off the last character of a file: depending on the file type it may be an end-of-file indicator, and trimming it will munge the file and break things.
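
As a rough sketch of that hex-dump step, the following uses `od` on a made-up sample file and then lists only the byte values above 0x7f, which are the ones outside 7-bit ASCII. The file contents and the variable names are assumptions for illustration:

```shell
# Made-up sample file containing one 0xA0 byte.
sample=$(mktemp)
printf 'DET0005   UNI61130022DD\240EA\n' > "$sample"

# Full dump: one two-digit hex value per byte, no address column.
od -An -tx1 -v "$sample"

# List each distinct byte value above 0x7f (i.e. outside 7-bit ASCII).
non_ascii=$(od -An -tx1 -v "$sample" | tr -s ' ' '\n' | grep -E '^[89a-f][0-9a-f]$' | sort -u)
printf 'non-ASCII bytes: %s\n' "$non_ascii"

rm -f "$sample"
```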

Commented:
First, I would double-check that there are not actually three bytes. This character (BF) is the last byte of the UTF-8 byte order mark, which some software inserts when writing a UTF-8 file. The BOM bytes are EF BB BF, and they appear at the beginning of the file.

If it is a BOM, it can safely be discarded using a number of techniques:

Using awk/sed to detect/remove the byte order mark (BOM)
http://muzso.hu/2011/11/08/using-awk-sed-to-detect-remove-the-byte-order-mark-bom
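
One way to check for the BOM and drop it, sketched on a made-up file. This is not taken from the linked article; it assumes `head -c` is available (it is common but not required by POSIX), and the `.nobom` suffix is just an illustrative name:

```shell
# Made-up sample: UTF-8 BOM (EF BB BF, octal 357 273 277) followed by text.
f=$(mktemp)
printf '\357\273\277hello\n' > "$f"

# Read the first three bytes as hex.
first3=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')

if [ "$first3" = "efbbbf" ]; then
    # tail -c +4 starts output at byte 4, skipping the three BOM bytes.
    tail -c +4 "$f" > "$f.nobom"
else
    # No BOM found: keep the file unchanged.
    cp "$f" "$f.nobom"
fi

stripped=$(cat "$f.nobom")
printf '%s\n' "$stripped"

rm -f "$f" "$f.nobom"
```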

The upside-down question mark is 168 (a8 hex) in the DOS code page 437 extended character set (plain ASCII stops at 127), so removing it with sed would look like:
``````
sed -i 's/\xa8//g' <file_name>
``````
This would remove all upside-down question marks from your file file_name.

To try before changing your file, leave out the -i

A tr alternative:
``````
tr -d '\250' < file_name > new_file_name
``````

Author Commented:
In UltraEdit, it looks like a space. In vi, it is a backwards question mark. For example, see the character following UNI61130022DD in column 24. In hex it is a0.
test.txt

Author Commented:
The hex dump reports it in hex as a0.
Tried this:  sed 's/\\xa0/\\x20/g' test2.txt > testout.txt
I am using ksh. The command writes testout.txt but leaves the backwards question mark (a0) in the file.

Software Developer Commented:
The character in your original post is octal 277. You can always tell what a character is if the file displays it:
1. Select (highlight) the character.
2. In a bash shell window, type Control-v, then paste the character (middle button).
3. bash will echo the interpretation of the character.

sed does not itself understand octal or hex escapes, but you can get bash to do it:
``````
sed $'s/\xa0/\x20/g' test2.txt > testout.txt
``````
The trick is to use $' ... '
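
If your shell does not support `$'...'`, the same idea of letting the shell produce the raw byte can be sketched with `printf`. The variable name `nbsp` and the sample line are illustrative assumptions, as is `LC_ALL=C` to keep sed working byte by byte:

```shell
# Build the raw 0xA0 byte (octal 240) with printf and capture it in a variable.
nbsp=$(printf '\240')

in=$(mktemp)
out=$(mktemp)
printf 'UNI61130022DD\240EA\n' > "$in"

# Double quotes let the shell splice the byte into the sed script.
LC_ALL=C sed "s/$nbsp/ /g" "$in" > "$out"

result=$(cat "$out")
printf '%s\n' "$result"

rm -f "$in" "$out"
```

Because sed receives the literal byte, this does not depend on sed understanding any escape syntax.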

Software Developer Commented:
Yes, sed does understand hex escapes. I tested with octal, which it doesn't seem to understand :(