Solved

Replacing ascii with latin-1 characters with a dictionary and regex?

Posted on 2006-11-14
Last Modified: 2012-06-21
In Perl I used this for replacing ascii text with latin-1 characters:

my %latin=(nbsp=>' ',iexcl=>'¡',cent=>'¢',pound=>'£',curren=>'¤',yen=>'¥',brvbar=>'¦',sect=>'§',uml=>'¨',copy=>'©',ordf=>'ª',laquo=>'«',
not=>'¬',shy=>'­',reg=>'®',macr=>'¯',deg=>'°',plusmn=>'±',sup2=>'²',sup3=>'³',acute=>'´',micro=>'µ',
para=>'¶',middot=>'·',cedil=>'¸',sup1=>'¹',ordm=>'º',raquo=>'»',frac14=>'¼',frac12=>'½',frac34=>'¾',
iquest=>'¿',Agrave=>'À',Aacute=>'Á',Acirc=>'Â',Atilde=>'Ã',Auml=>'Ä',Aring=>'Å',AElig=>'Æ',Ccedil=>'Ç',
Egrave=>'È',Eacute=>'É',Ecirc=>'Ê',Euml=>'Ë',Igrave=>'Ì',Iacute=>'Í',Icirc=>'Î',Iuml=>'Ï',ETH=>'Ð',Ntilde=>'Ñ',
Ograve=>'Ò',Oacute=>'Ó',Ocirc=>'Ô',Otilde=>'Õ',Ouml=>'Ö',times=>'×',Oslash=>'Ø',Ugrave=>'Ù',Uacute=>'Ú',
Ucirc=>'Û',Uuml=>'Ü',Yacute=>'Ý',THORN=>'Þ',szlig=>'ß',agrave=>'à',aacute=>'á',acirc=>'â',atilde=>'ã',auml=>'ä',
aring=>'å',aelig=>'æ',ccedil=>'ç',egrave=>'è',eacute=>'é',ecirc=>'ê',euml=>'ë',igrave=>'ì',iacute=>'í',icirc=>'î',
iuml=>'ï',eth=>'ð',ntilde=>'ñ',ograve=>'ò',oacute=>'ó',ocirc=>'ô',otilde=>'õ',ouml=>'ö',divide=>'÷',oslash=>'ø',
ugrave=>'ù',uacute=>'ú',ucirc=>'û',uuml=>'ü',yacute=>'ý',thorn=>'þ',yuml=>'ÿ');

# The /x flag lets the pattern wrap across lines (whitespace inside it is ignored).
$line =~ s/&(nbsp|iexcl|cent|pound|curren|yen|brvbar|sect|uml|copy|ordf|laquo|not|shy|reg|macr|deg|plusmn|
sup2|sup3|acute|micro|para|middot|cedil|sup1|ordm|raquo|frac14|frac12|frac34|iquest|Agrave|Aacute|Acirc|
Atilde|Auml|Aring|AElig|Ccedil|Egrave|Eacute|Ecirc|Euml|Igrave|Iacute|Icirc|Iuml|ETH|Ntilde|Ograve|Oacute|Ocirc
|Otilde|Ouml|times|Oslash|Ugrave|Uacute|Ucirc|Uuml|Yacute|THORN|szlig|agrave|aacute|acirc|atilde|auml|aring|
aelig|ccedil|egrave|eacute|ecirc|euml|igrave|iacute|icirc|iuml|eth|ntilde|ograve|oacute|ocirc|otilde|ouml|divide|
oslash|ugrave|uacute|ucirc|uuml|yacute|thorn|yuml);/$latin{$1}/gx;

In Python, the dictionary converted from the hash above looks like this:
latin1={"nbsp":" ", "iexcl":"¡", "cent":"¢", "pound":"£", "curren":"¤", "yen":"¥", "brvbar":"¦", "sect":"§", "uml":"¨", "copy":"©", "ordf":"ª", "laquo":"«", "not":"¬", "shy":"­", "reg":"®", "macr":"¯", "deg":"°", "plusmn":"±", "sup2":"²", "sup3":"³", "acute":"´", "micro":"µ", "para":"¶", "middot":"·", "cedil":"¸", "sup1":"¹", "ordm":"º", "raquo":"»", "frac14":"¼", "frac12":"½", "frac34":"¾", "iquest":"¿", "Agrave":"À", "Aacute":"Á", "Acirc":"Â", "Atilde":"Ã", "Auml":"Ä", "Aring":"Å", "AElig":"Æ", "Ccedil":"Ç", "Egrave":"È", "Eacute":"É", "Ecirc":"Ê", "Euml":"Ë", "Igrave":"Ì", "Iacute":"Í", "Icirc":"Î", "Iuml":"Ï", "ETH":"Ð", "Ntilde":"Ñ", "Ograve":"Ò", "Oacute":"Ó", "Ocirc":"Ô", "Otilde":"Õ", "Ouml":"Ö", "times":"×", "Oslash":"Ø", "Ugrave":"Ù", "Uacute":"Ú", "Ucirc":"Û", "Uuml":"Ü", "Yacute":"Ý", "THORN":"Þ", "szlig":"ß", "agrave":"à", "aacute":"á", "acirc":"â", "atilde":"ã", "auml":"ä", "aring":"å", "aelig":"æ", "ccedil":"ç", "egrave":"è", "eacute":"é", "ecirc":"ê", "euml":"ë", "igrave":"ì", "iacute":"í", "icirc":"î", "iuml":"ï", "eth":"ð", "ntilde":"ñ", "ograve":"ò", "oacute":"ó", "ocirc":"ô", "otilde":"õ", "ouml":"ö", "divide":"÷", "oslash":"ø", "ugrave":"ù", "uacute":"ú", "ucirc":"û", "uuml":"ü", "yacute":"ý", "thorn":"þ", "yuml":"ÿ"}
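
As an aside, a table like this probably does not need to be typed by hand in current Python: the standard library's `html.entities.name2codepoint` maps entity names to code points. The sketch below assumes Python 3 (in Python 2 the module was called `htmlentitydefs`) and keeps only the Latin-1 range; the name `latin1` simply mirrors the dictionary above.

```python
from html.entities import name2codepoint

# Keep only the entities whose code point lies in the Latin-1 supplement
# (U+00A0 to U+00FF), which is the range the hand-written table covers.
latin1 = {
    name: chr(codepoint)
    for name, codepoint in name2codepoint.items()
    if 0xA0 <= codepoint <= 0xFF
}
```

Note that `nbsp` maps to a real non-breaking space (U+00A0) here, not an ordinary space.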

When the regular expression finds &nbsp;, &iexcl;, &cent;, etc. in $line, it replaces the entity with its value from the associative array %latin, so the character displays correctly. If possible, I would like to do the same thing in Python, but I'm not sure Python has an equivalent of the $1 variable in regular expressions. Is there a way I can replicate this behaviour in Python?
Thanks!
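
A minimal sketch of one way the Perl substitution could be replicated, assuming Python 3: `re.sub` accepts a function as the replacement, and the match object's `group(1)` plays the role of Perl's `$1`. The helper name `replace_entities` and the shortened table are illustrative only, not part of the original code.

```python
import re

# Shortened table for illustration; the full dictionary works the same way.
latin1 = {"nbsp": "\u00a0", "copy": "©", "eacute": "é", "uuml": "ü"}

# Capture any entity name between '&' and ';'. The name lands in group 1,
# which is the counterpart of $1 in the Perl version.
entity_re = re.compile(r"&([A-Za-z0-9]+);")

def replace_entities(line):
    """Replace each known &name; with its character from latin1."""
    # The lambda is called once per match. Names missing from the table
    # are returned unchanged via m.group(0), the whole matched text.
    return entity_re.sub(lambda m: latin1.get(m.group(1), m.group(0)), line)

print(replace_entities("caf&eacute; &copy; 2006"))  # café © 2006
```

Matching a generic name and looking it up avoids spelling out every entity in the pattern, so the pattern and the table cannot drift apart.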
Question by:Tabris42
2 Comments
 
Accepted Solution

by:ozo (earned 500 total points)
ID: 17944147

Author Comment

by:Tabris42
ID: 17946161
Sorry.. I didn't realize I posted this to the general Languages board.. meant to stick it in Python