Solved

remove backwards question mark inside box character from ASCI file

Posted on 2013-10-29
13
1,067 Views
Last Modified: 2013-11-06
I have a file that is picking up a character that, when viewed in excel is a backwards question mark inside a box.  In Unix, the character appears like a backwards question mark.  I want to remove the backwards question mark from the ASCII file.  Is there a sed command to do this?  I cannot get it do display in this example.  It appears below the code markers below.  NOT an upside down question mark.  The question mark is actually facing the opposite direction.

[
¿

Open in new window

0
Comment
Question by:eshapley
  • 3
  • 2
  • 2
  • +4
13 Comments
 
LVL 20

Expert Comment

by:tfewster
ID: 39610305
I had the same issue recently, creating a .csv file in Unix to be imported into Excel; I can't remember what character combo generated it (probably a "\0nn" being interpreted as a special character), but it should be obvious if you `vi` the source file and display special characters using ":set list"

Once you know the character combo that is generating the odd displayed character, you can remove it used sed

Hope that helps!
0
 

Author Comment

by:eshapley
ID: 39610328
It is a backward question mark.  No other characters are present.  If I cut and paste it back into vi, the question mark changes back to normal.
0
 
LVL 15

Expert Comment

by:ericpete
ID: 39622645
Can you upload the file (with anything sensitive removed)?
0
Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

 
LVL 47

Expert Comment

by:dlethe
ID: 39622685
do an od or hd command to get a hex dump to see what the actual value is.   You just don't want to arbitrarily cut off the last char of a file, because depending on the file type this may be an end-of-file indicator.  (trim it off and your file is munged and it will break things)
0
 
LVL 38

Expert Comment

by:PaulHews
ID: 39622709
First I would double check that there are not actually three bytes.  This character is part of the UTF-8 byte order mark, which is inserted by some software when writing a UTF-8 file.  These characters are EF BB BF which appear at the beginning of the file.

If it is a BOM, it can safely be discarded using a number of techniques:

Using awk/sed to detect/remove the byte order mark (BOM)
http://muzso.hu/2011/11/08/using-awk-sed-to-detect-remove-the-byte-order-mark-bom
0
 
LVL 37

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 39622811
ASCII for upside down question mark is 168 or a8 hex, so removing with sed would be like:
sed -i 's/\xa8//g' <file_name>

Open in new window

This would remove all upside down characters from your file file_name

To try before changing your file, leave out the -i

A tr alternative:
tr -d '\250' < file_name > new_file_name

Open in new window

0
 

Author Comment

by:eshapley
ID: 39622947
In ultraedit, it looks like a space.  in vi it is a backwards question mark.  For example, the character following UNI61130022DD in column 24.  In hex it is a0.
test.txt
0
 

Author Comment

by:eshapley
ID: 39623028
The hex dump reports it in hex as a0.
Tried this:  sed 's/\\xa0/\\x20/g' test2.txt > testout.txt
Using KSH.  Writes the testout.txt, but leaves the backwards question mark a0 in the file.
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 39623157
The character in your original post is octal 277. You can always tell what the character is if the file is displaying:
Select (highlight) the character
In a bash shell window, type Control-v, then paste the character (middle button)
bash will echo the interpretation of the character
sed does not itself understand octal or hex escapes. But you can get bash to do it
sed $'s/\xa0/\x20/g' test2.txt > testout.text

Open in new window

The trick is to use $' ... '
0
 
LVL 37

Accepted Solution

by:
Gerwin Jansen, EE MVE earned 500 total points
ID: 39624361
Hmm, A0 is octal 240 - this is removing the special character just fine:
tr -d '\240' < test.txt

Open in new window

output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000


@duncan_roe - sed is understanding hex, this works as well:
sed 's/\xa0//g' test.txt

Open in new window

output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 39628795
Yes sed does understand hex escapes. I tested with octal, which it doesn't seem to understand :(
0

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

SSH (Secure Shell) - Tips and Tricks As you all know SSH(Secure Shell) is a network protocol, which we use to access/transfer files securely between two networked devices. SSH was actually designed as a replacement for insecure protocols that sen…
Google Drive is extremely cheap offsite storage, and it's even possible to get extra storage for free for two years.  You can use the free account 15GB, and if you have an Android device..when you install Google Drive for the first time it will give…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Learn how to get help with Linux/Unix bash shell commands. Use help to read help documents for built in bash shell commands.: Use man to interface with the online reference manuals for shell commands.: Use man to search man pages for unknown command…

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question