Solved

Remove backwards question mark inside box character from ASCII file

Posted on 2013-10-29
Last Modified: 2013-11-06
I have a file that is picking up a character that, when viewed in Excel, is a backwards question mark inside a box. In Unix, the character appears as a backwards question mark. I want to remove it from the ASCII file. Is there a sed command to do this? I cannot get it to display in this example; it appears inside the code markers below. It is NOT an upside-down question mark: the question mark is actually facing the opposite direction.

¿

Question by:eshapley
13 Comments
 
LVL 20

Expert Comment

by:tfewster
ID: 39610305
I had the same issue recently, creating a .csv file in Unix to be imported into Excel; I can't remember what character combo generated it (probably a "\0nn" being interpreted as a special character), but it should be obvious if you `vi` the source file and display special characters using ":set list"

Once you know the character combo that is generating the odd displayed character, you can remove it using sed.
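A command-line way to get the same view, as a rough sketch: the `l` command of POSIX sed prints non-printable bytes as octal escapes. The file name `sample.txt` and its contents are made up for illustration, and 0xA0 is just an arbitrary non-ASCII byte.

```shell
# Hypothetical sample: one line containing the byte 0xA0 (octal 240).
printf 'UNI61130022DD\240EA\n' > sample.txt

# 'l' writes each line unambiguously: non-printable bytes appear as \ooo
# and the end of each line is marked with $.
# LC_ALL=C makes sed treat the input as plain bytes.
LC_ALL=C sed -n 'l' sample.txt
```

The octal value shown in the output can then be used directly with `tr -d`.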

Hope that helps!
 

Author Comment

by:eshapley
ID: 39610328
It is a backward question mark.  No other characters are present.  If I cut and paste it back into vi, the question mark changes back to normal.
 
LVL 15

Expert Comment

by:ericpete
ID: 39622645
Can you upload the file (with anything sensitive removed)?
 
LVL 47

Expert Comment

by:dlethe
ID: 39622685
Run od or hd to get a hex dump and see what the actual byte value is. Don't arbitrarily cut off the last character of a file: depending on the file type it may be an end-of-file indicator, and trimming it would corrupt the file and break things.
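A minimal sketch of such a hex dump, assuming a made-up sample file (`sample.txt`) that contains one stray non-ASCII byte:

```shell
# Hypothetical sample line with the byte 0xA0 (octal 240) in the middle.
printf 'UNI61130022DD\240EA\n' > sample.txt

# Dump every byte in hex, without the offset column.
od -An -tx1 sample.txt

# Strip all 7-bit ASCII bytes first, so only the non-ASCII bytes are dumped.
LC_ALL=C tr -d '\000-\177' < sample.txt | od -An -tx1
```

The second command is handy on large files, where scanning a full dump by eye is impractical.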
 
LVL 38

Expert Comment

by:PaulHews
ID: 39622709
First I would double check that there are not actually three bytes.  This character is part of the UTF-8 byte order mark, which is inserted by some software when writing a UTF-8 file.  These characters are EF BB BF which appear at the beginning of the file.

If it is a BOM, it can safely be discarded using a number of techniques:

Using awk/sed to detect/remove the byte order mark (BOM)
http://muzso.hu/2011/11/08/using-awk-sed-to-detect-remove-the-byte-order-mark-bom
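As a rough sketch of that check and removal (not taken from the linked article): the file names are hypothetical, and the `\x` escapes in the sed script are a GNU sed extension.

```shell
# Hypothetical sample file that starts with a UTF-8 BOM (EF BB BF).
printf '\357\273\277col1,col2\n' > bom.csv

# Detect: show the first three bytes of the file in hex.
head -c 3 bom.csv | od -An -tx1

# Remove: strip the BOM from the start of line 1 only.
# LC_ALL=C makes sed match raw bytes rather than multibyte characters.
LC_ALL=C sed '1s/^\xEF\xBB\xBF//' bom.csv > nobom.csv
```

If the first three bytes are not `ef bb bf`, the sed command leaves the file unchanged.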
 
LVL 37

Expert Comment

by:Gerwin Jansen
ID: 39622811
In extended ASCII (code page 437), the upside-down question mark is 168, or a8 hex, so removing it with sed would look like:
sed -i 's/\xa8//g' file_name


This removes every upside-down question mark from the file file_name.

To try before changing your file, leave out the -i

A tr alternative:
tr -d '\250' < file_name > new_file_name

 

Author Comment

by:eshapley
ID: 39622947
In UltraEdit, it looks like a space. In vi, it is a backwards question mark. For example, see the character following UNI61130022DD in column 24. In hex it is a0.
test.txt
 

Author Comment

by:eshapley
ID: 39623028
The hex dump reports it in hex as a0.
Tried this:  sed 's/\\xa0/\\x20/g' test2.txt > testout.txt
I am using ksh. The command writes testout.txt but leaves the backwards question mark (a0) in the file.
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 39623157
The character in your original post is octal 277. You can always tell what the character is if the file is displaying:
Select (highlight) the character
In a bash shell window, type Control-v, then paste the character (middle button)
bash will echo the interpretation of the character
sed does not itself understand octal or hex escapes, but you can get bash to do it:
sed $'s/\xa0/\x20/g' test2.txt > testout.text


The trick is to use $' ... '
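For a shell that lacks `$'...'` quoting (an assumption that may apply to older ksh versions), a similar effect can be sketched by letting printf build the byte and interpolating it into the sed script. The sample file contents and the variable name `nbsp` are made up for illustration.

```shell
# Hypothetical sample line with the byte 0xA0 after UNI61130022DD.
printf 'UNI61130022DD\240EA\n' > test2.txt

# Let printf produce the raw byte; POSIX printf understands octal escapes.
nbsp=$(printf '\240')

# Double quotes let the shell expand the variable into the sed script,
# so sed sees the literal byte and needs no escape support of its own.
LC_ALL=C sed "s/${nbsp}/ /g" test2.txt > testout.txt

# Check the result: the byte should now show up as a plain space.
od -An -c testout.txt
```

This replaces the byte with a space rather than deleting it, matching the substitution attempted earlier in the thread.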
 
LVL 37

Accepted Solution

by:
Gerwin Jansen earned 500 total points
ID: 39624361
Hmm, hex A0 is octal 240. This removes the special character just fine:
tr -d '\240' < test.txt


output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000


@duncan_roe - sed does understand hex escapes; this works as well:
sed 's/\xa0//g' test.txt


output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 39628795
Yes, sed does understand hex escapes. I tested with octal, which it doesn't seem to understand :(
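One caveat worth sketching, assuming the file is fixed-width and column positions matter downstream: deleting the byte shortens each affected line by one character, while translating it to a space keeps the layout. The sample line and output file names below are hypothetical.

```shell
# Hypothetical fixed-width sample line with the byte 0xA0 in it.
printf 'DET0005   UNI61130022DD\240EA\n' > test.txt

# Translate 0xA0 to a space: the line keeps its original length.
tr '\240' ' ' < test.txt > padded.txt

# Delete 0xA0: every column after the byte shifts left by one.
tr -d '\240' < test.txt > deleted.txt

# Compare byte counts to see the difference.
wc -c test.txt padded.txt deleted.txt
```

Which variant is right depends on whether the consumer of the file reads fields by position or by delimiter.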