Solved

Remove backwards question mark inside box character from ASCII file

Posted on 2013-10-29
Last Modified: 2013-11-06
I have a file that is picking up a character that, when viewed in Excel, is a backwards question mark inside a box. In Unix, the character appears as a backwards question mark. I want to remove it from the ASCII file. Is there a sed command to do this? I cannot get it to display in this example; it appears below. It is NOT an upside-down question mark: the question mark is actually facing the opposite direction.

¿

Question by:eshapley
13 Comments
 
LVL 21

Expert Comment

by:tfewster
ID: 39610305
I had the same issue recently, creating a .csv file in Unix to be imported into Excel; I can't remember what character combo generated it (probably a "\0nn" being interpreted as a special character), but it should be obvious if you `vi` the source file and display special characters using ":set list"

Once you know the character combo that is generating the odd displayed character, you can remove it using sed.

Hope that helps!
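
If opening the file in vi is inconvenient, a rough command-line equivalent of ":set list" is sed's `l` command, which prints non-printable bytes as octal escapes and marks each line end with `$`. This is only a sketch: the sample line is made up to mimic the data discussed in this thread, and `vis_file` is a hypothetical temporary file, not the asker's real file.

```shell
# Hypothetical sample: one line with a single 0xA0 byte (octal 240) in it.
vis_file=$(mktemp)
printf 'UNI61130022DD\240EA\n' > "$vis_file"

# sed -n 'l' prints each line unambiguously: non-printable bytes appear
# as \ooo octal escapes and the end of the line is marked with $.
# LC_ALL=C makes sed treat the input as plain single bytes.
visible=$(LC_ALL=C sed -n 'l' "$vis_file")
printf '%s\n' "$visible"
```

Any `\ooo` sequence in the output is a byte worth looking up before deciding what to strip.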
 

Author Comment

by:eshapley
ID: 39610328
It is a backward question mark.  No other characters are present.  If I cut and paste it back into vi, the question mark changes back to normal.
 
LVL 15

Expert Comment

by:Eric AKA Netminder
ID: 39622645
Can you upload the file (with anything sensitive removed)?

 
LVL 47

Expert Comment

by:David
ID: 39622685
Run od or hd to get a hex dump and see what the actual value is. You don't want to arbitrarily cut off the last character of a file, because depending on the file type it may be an end-of-file indicator (trim it off and your file is munged and things will break).
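
As a minimal sketch of the hex dump suggestion, the commands below run od on a made-up sample line. The file contents and the `sample` variable are assumptions for illustration; substitute the real file name.

```shell
# Hypothetical sample: a short line containing one 0xA0 byte (octal 240).
sample=$(mktemp)
printf 'UNI61130022DD\240EA\n' > "$sample"

# od -c shows printable characters as themselves and other bytes as octal.
od -c "$sample"

# od -An -tx1 shows every byte in hex, without the offset column.
# Squeezing out spaces and newlines gives one continuous hex string.
hex=$(od -An -tx1 "$sample" | tr -d ' \n')
printf '%s\n' "$hex"
```

The odd byte stands out in either view: `\240` in the `od -c` output, `a0` in the hex output.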
 
LVL 38

Expert Comment

by:PaulHews
ID: 39622709
First, I would double-check whether there are actually three bytes. This character is part of the UTF-8 byte order mark, which some software inserts when writing a UTF-8 file. The three bytes are EF BB BF, and they appear at the beginning of the file.

If it is a BOM, it can safely be discarded using a number of techniques:

Using awk/sed to detect/remove the byte order mark (BOM)
http://muzso.hu/2011/11/08/using-awk-sed-to-detect-remove-the-byte-order-mark-bom
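
For reference, one possible way to drop a leading BOM with sed is sketched below. It assumes GNU sed (for the `\xHH` escapes) and uses a made-up sample file; `bom_file` and its contents are hypothetical, and this is not necessarily the method the linked article uses.

```shell
# Hypothetical sample: a file that starts with the UTF-8 BOM (EF BB BF).
bom_file=$(mktemp)
printf '\357\273\277first line\nsecond line\n' > "$bom_file"

# Remove EF BB BF only when it sits at the very start of line 1.
# LC_ALL=C makes sed work on single bytes; \xHH escapes assume GNU sed.
no_bom=$(LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$bom_file")

# Dump the first bytes of the result to confirm the BOM is gone.
printf '%s\n' "$no_bom" | od -c | head -n 2
```

Anchoring the match to line 1 and to `^` keeps the command from touching the same byte sequence anywhere else in the file.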
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 39622811
In code page 437 (often loosely called extended ASCII), the upside-down question mark is 168 decimal, or a8 hex, so removing it with sed would look like:
sed -i 's/\xa8//g' <file_name>


This would remove all upside-down question marks from your file file_name.

To try it before changing your file, leave out the -i.

A tr alternative:
tr -d '\250' < file_name > new_file_name

 

Author Comment

by:eshapley
ID: 39622947
In UltraEdit, it looks like a space; in vi, it is a backwards question mark. For example, see the character following UNI61130022DD in column 24. In hex it is a0.
test.txt
 

Author Comment

by:eshapley
ID: 39623028
The hex dump reports it in hex as a0.
Tried this:  sed 's/\\xa0/\\x20/g' test2.txt > testout.txt
I am using KSH. It writes testout.txt, but leaves the backwards question mark (a0) in the file.
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 39623157
The character in your original post is octal 277. You can always tell what a character is if the file displays it:
Select (highlight) the character.
In a bash shell window, type Control-v, then paste the character (middle button).
bash will echo the interpretation of the character.
sed does not itself understand octal or hex escapes, but you can get bash to do it:
sed $'s/\xa0/\x20/g' test2.txt > testout.txt


The trick is to use $' ... '
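
Since the asker is on KSH, where `$' ... '` may not be available in older versions, here is a sketch of the same idea using printf to produce the byte instead. The file names and sample contents are hypothetical stand-ins for test2.txt and testout.txt.

```shell
# Hypothetical stand-ins for the input and output files.
in_file=$(mktemp)
out_file=$(mktemp)
printf 'UNI61130022DD\240EA\n' > "$in_file"

# printf interprets the octal escape \240 and emits the single byte 0xA0.
nbsp=$(printf '\240')

# The shell expands ${nbsp} before sed starts, so sed receives the raw
# byte and needs no escape support of its own. LC_ALL=C makes sed treat
# the input as single bytes. The byte is replaced with a plain space.
LC_ALL=C sed "s/${nbsp}/ /g" "$in_file" > "$out_file"

result=$(cat "$out_file")
printf '%s\n' "$result"
```

As with `$' ... '`, the shell does the escape handling here, so it does not matter whether the local sed understands `\x` or octal escapes.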
 
LVL 38

Accepted Solution

by:Gerwin Jansen, EE MVE (earned 2000 total points)
ID: 39624361
Hmm, A0 is octal 240; this removes the special character just fine:
tr -d '\240' < test.txt


output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000


@duncan_roe - sed does understand hex; this works as well:
sed 's/\xa0//g' test.txt


output:

DET0005   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         50000
DET0008   UNI61130022DD                                                                                                                            EA 0000000000000020000000000024.5                                                                                      4144874                         80000
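
One caveat worth checking: the sample looks like a fixed-width record, and deleting a byte shifts every later column left by one. If column positions matter, replacing a0 with a space (as the asker originally attempted) may be the safer choice. A small sketch with a made-up, shortened record; `fixed` and its contents are hypothetical:

```shell
# Hypothetical shortened record with one 0xA0 byte after the item code.
fixed=$(mktemp)
printf 'DET0005   UNI61130022DD\240  EA\n' > "$fixed"

# tr -d deletes the byte, so the line becomes one character shorter.
deleted=$(tr -d '\240' < "$fixed")

# tr with two arguments maps the byte to a space, keeping the line length.
replaced=$(tr '\240' ' ' < "$fixed")

printf 'deleted:  %s characters\n' "${#deleted}"
printf 'replaced: %s characters\n' "${#replaced}"
```

The replaced line keeps its original length, so anything downstream that reads by column position should still line up.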
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 39628795
Yes, sed does understand hex escapes. I tested with octal, which it doesn't seem to understand :(
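
For what it's worth, GNU sed does accept octal, but with its own `\oNNN` spelling rather than a bare `\NNN`. A quick sketch, assuming GNU sed and a made-up sample file (`oct_file` is hypothetical):

```shell
# Hypothetical sample: a line containing one 0xA0 byte (octal 240).
oct_file=$(mktemp)
printf 'UNI61130022DD\240EA\n' > "$oct_file"

# GNU sed spells octal escapes as \oNNN, so \o240 is the same byte as \xa0.
# LC_ALL=C makes sed treat the input as single bytes.
oct_result=$(LC_ALL=C sed 's/\o240//g' "$oct_file")
printf '%s\n' "$oct_result"
```

Other sed implementations may not support either escape form, in which case tr or the shell-side approaches above are the fallback.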