Solved

Replacing ascii with latin-1 characters with a dictionary and regex?

Posted on 2006-11-14
2
454 Views
Last Modified: 2012-06-21
In Perl I used this for replacing ascii text with latin-1 characters:

my %latin=(nbsp=>' ',iexcl=>'¡',cent=>'¢',pound=>'£',curren=>'¤',yen=>'¥',brvbar=>'¦',sect=>'§',uml=>'¨',copy=>'©',ordf=>'ª',laquo=>'«',
not=>'¬',shy=>'­',reg=>'®',macr=>'¯',deg=>'°',plusmn=>'±',sup2=>'²',sup3=>'³',acute=>'´',micro=>'µ',
para=>'¶',middot=>'·',cedil=>'¸',sup1=>'¹',ordm=>'º',raquo=>'»',frac14=>'¼',frac12=>'½',frac34=>'¾',
iquest=>'¿',Agrave=>'À',Aacute=>'Á',Acirc=>'Â',Atilde=>'Ã',Auml=>'Ä',Aring=>'Å',AElig=>'Æ',Ccedil=>'Ç',
Egrave=>'È',Eacute=>'É',Ecirc=>'Ê',Euml=>'Ë',Igrave=>'Ì',Iacute=>'Í',Icirc=>'Î',Iuml=>'Ï',ETH=>'Ð',Ntilde=>'Ñ',
Ograve=>'Ò',Oacute=>'Ó',Ocirc=>'Ô',Otilde=>'Õ',Ouml=>'Ö',times=>'×',Oslash=>'Ø',Ugrave=>'Ù',Uacute=>'Ú',
Ucirc=>'Û',Uuml=>'Ü',Yacute=>'Ý',THORN=>'Þ',szlig=>'ß',agrave=>'à',aacute=>'á',acirc=>'â',atilde=>'ã',auml=>'ä',
aring=>'å',aelig=>'æ',ccedil=>'ç',egrave=>'è',eacute=>'é',ecirc=>'ê',euml=>'ë',igrave=>'ì',iacute=>'í',icirc=>'î',i
uml=>'ï',eth=>'ð',ntilde=>'ñ',ograve=>'ò',oacute=>'ó',ocirc=>'ô',otilde=>'õ',ouml=>'ö',divide=>'÷',oslash=>'ø',
ugrave=>'ù',uacute=>'ú',ucirc=>'û',uuml=>'ü',yacute=>'ý',thorn=>'þ',yuml=>'ÿ');

$line =~ s/&(nbsp|iexcl|cent|pound|curren|yen|brvbar|sect|uml|copy|ordf|laquo|not|shy|reg|macr|deg|plusmn|
sup2|sup3|acute|micro|µpara|middot|cedil|sup1|ordm|raquo|frac14|frac12|frac34|iquest|Agrave|Aacute|Acirc|
Atilde|Auml|Aring|AElig|Ccedil|Egrave|Eacute|Ecirc|Euml|Igrave|Iacute|Icirc|Iuml|ETH|Ntilde|Ograve|Oacute|Ocirc
|Otilde|Ouml|times|Oslash|Ugrave|Uacute|Ucirc|Uuml|Yacute|THORN|szlig|agrave|aacute|acirc|atilde|auml|aring|
aelig|ccedil|egrave|eacute|ecirc|euml|igrave|acute|icirc|iuml|eth|ntilde|ograve|oacute|ocirc|otilde|ouml|divide|
oslash|ugrave|uacute|ucirc|uuml|yacute|thorn|yuml)\;/$latin{$1}/g;

 The converted dictionary from above looks like this:
latin1={"nbsp":" ", "iexcl":"¡", "cent":"¢", "pound":"£", "curren":"¤", "yen":"¥", "brvbar":"¦", "sect":"§", "uml":"¨", "copy":"©", "ordf":"ª", "laquo":"«", "not":"¬", "shy":"­", "reg":"®", "macr":"¯", "deg":"°", "plusmn":"±", "sup2":"²", "sup3":"³", "acute":"´", "micro":"µ", "para":"¶", "middot":"·", "cedil":"¸", "sup1":"¹", "ordm":"º", "raquo":"»", "frac14":"¼", "frac12":"½", "frac34":"¾", "iquest":"¿", "Agrave":"À", "Aacute":"Á", "Acirc":"Â", "Atilde":"Ã", "Auml":"Ä", "Aring":"Å", "AElig":"Æ", "Ccedil":"Ç", "Egrave":"È", "Eacute":"É", "Ecirc":"Ê", "Euml":"Ë", "Igrave":"Ì", "Iacute":"Í", "Icirc":"Î", "Iuml":"Ï", "ETH":"Ð", "Ntilde":"Ñ", "Ograve":"Ò", "Oacute":"Ó", "Ocirc":"Ô", "Otilde":"Õ", "Ouml":"Ö", "times":"×", "Oslash":"Ø", "Ugrave":"Ù", "Uacute":"Ú", "Ucirc":"Û", "Uuml":"Ü", "Yacute":"Ý", "THORN":"Þ", "szlig":"ß", "agrave":"à", "aacute":"á", "acirc":"â", "atilde":"ã", "auml":"ä", "aring":"å", "aelig":"æ", "ccedil":"ç", "egrave":"è", "eacute":"é", "ecirc":"ê", "euml":"ë", "igrave":"ì", "iacute":"í", "icirc":"î", "iuml":"ï", "eth":"ð", "ntilde":"ñ", "ograve":"ò", "oacute":"ó", "ocirc":"ô", "otilde":"õ", "ouml":"ö", "divide":"÷", "oslash":"ø", "ugrave":"ù", "uacute":"ú", "ucirc":"û", "uuml":"ü", "yacute":"ý", "thorn":"þ", "yuml":"ÿ"}

The regular expression, when finding nbsp, iexcl, cent, etc in $line, would replace it for it's value from the associative array %latin, and display the character correctly. If possible, I would like to do the same thing in Python, however I'm not sure there's that $1 variable in regular expressions.. is there a way I can replicate this behaviour in Python?
Thanks!
0
Comment
Question by:Tabris42
2 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 17944147
0
 

Author Comment

by:Tabris42
ID: 17946161
Sorry.. I didn't realize I posted this to the general Languages board.. meant to stick it in Python
0

Featured Post

Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
scoresSpecial  challenge 13 51
changeXy challenge 13 81
edit mobile phone number in office 365 15 64
Delphi: barcode reading on android platform 1 28
If you haven’t already, I encourage you to read the first article (http://www.experts-exchange.com/articles/18680/An-Introduction-to-R-Programming-and-R-Studio.html) in my series to gain a basic foundation of R and R Studio.  You will also find the …
The purpose of this article is to demonstrate how we can use conditional statements using Python.
This video teaches viewers about errors in exception handling.
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question