Solved

Replacing ascii with latin-1 characters with a dictionary and regex?

Posted on 2006-11-14
Last Modified: 2012-06-21
In Perl I used this to replace ASCII entity names (written as &name; in the text) with their Latin-1 characters:

my %latin=(nbsp=>' ',iexcl=>'¡',cent=>'¢',pound=>'£',curren=>'¤',yen=>'¥',brvbar=>'¦',sect=>'§',uml=>'¨',copy=>'©',ordf=>'ª',laquo=>'«',
not=>'¬',shy=>'­',reg=>'®',macr=>'¯',deg=>'°',plusmn=>'±',sup2=>'²',sup3=>'³',acute=>'´',micro=>'µ',
para=>'¶',middot=>'·',cedil=>'¸',sup1=>'¹',ordm=>'º',raquo=>'»',frac14=>'¼',frac12=>'½',frac34=>'¾',
iquest=>'¿',Agrave=>'À',Aacute=>'Á',Acirc=>'Â',Atilde=>'Ã',Auml=>'Ä',Aring=>'Å',AElig=>'Æ',Ccedil=>'Ç',
Egrave=>'È',Eacute=>'É',Ecirc=>'Ê',Euml=>'Ë',Igrave=>'Ì',Iacute=>'Í',Icirc=>'Î',Iuml=>'Ï',ETH=>'Ð',Ntilde=>'Ñ',
Ograve=>'Ò',Oacute=>'Ó',Ocirc=>'Ô',Otilde=>'Õ',Ouml=>'Ö',times=>'×',Oslash=>'Ø',Ugrave=>'Ù',Uacute=>'Ú',
Ucirc=>'Û',Uuml=>'Ü',Yacute=>'Ý',THORN=>'Þ',szlig=>'ß',agrave=>'à',aacute=>'á',acirc=>'â',atilde=>'ã',auml=>'ä',
aring=>'å',aelig=>'æ',ccedil=>'ç',egrave=>'è',eacute=>'é',ecirc=>'ê',euml=>'ë',igrave=>'ì',iacute=>'í',icirc=>'î',
iuml=>'ï',eth=>'ð',ntilde=>'ñ',ograve=>'ò',oacute=>'ó',ocirc=>'ô',otilde=>'õ',ouml=>'ö',divide=>'÷',oslash=>'ø',
ugrave=>'ù',uacute=>'ú',ucirc=>'û',uuml=>'ü',yacute=>'ý',thorn=>'þ',yuml=>'ÿ');

# The /x flag lets the alternation wrap across lines; $1 holds the captured entity name.
$line =~ s/&(nbsp|iexcl|cent|pound|curren|yen|brvbar|sect|uml|copy|ordf|laquo|not|shy|reg|macr|deg|plusmn|
sup2|sup3|acute|micro|para|middot|cedil|sup1|ordm|raquo|frac14|frac12|frac34|iquest|Agrave|Aacute|Acirc|
Atilde|Auml|Aring|AElig|Ccedil|Egrave|Eacute|Ecirc|Euml|Igrave|Iacute|Icirc|Iuml|ETH|Ntilde|Ograve|Oacute|Ocirc
|Otilde|Ouml|times|Oslash|Ugrave|Uacute|Ucirc|Uuml|Yacute|THORN|szlig|agrave|aacute|acirc|atilde|auml|aring|
aelig|ccedil|egrave|eacute|ecirc|euml|igrave|iacute|icirc|iuml|eth|ntilde|ograve|oacute|ocirc|otilde|ouml|divide|
oslash|ugrave|uacute|ucirc|uuml|yacute|thorn|yuml);/$latin{$1}/gx;

The same dictionary, converted to Python, looks like this:
latin1={"nbsp":" ", "iexcl":"¡", "cent":"¢", "pound":"£", "curren":"¤", "yen":"¥", "brvbar":"¦", "sect":"§", "uml":"¨", "copy":"©", "ordf":"ª", "laquo":"«", "not":"¬", "shy":"­", "reg":"®", "macr":"¯", "deg":"°", "plusmn":"±", "sup2":"²", "sup3":"³", "acute":"´", "micro":"µ", "para":"¶", "middot":"·", "cedil":"¸", "sup1":"¹", "ordm":"º", "raquo":"»", "frac14":"¼", "frac12":"½", "frac34":"¾", "iquest":"¿", "Agrave":"À", "Aacute":"Á", "Acirc":"Â", "Atilde":"Ã", "Auml":"Ä", "Aring":"Å", "AElig":"Æ", "Ccedil":"Ç", "Egrave":"È", "Eacute":"É", "Ecirc":"Ê", "Euml":"Ë", "Igrave":"Ì", "Iacute":"Í", "Icirc":"Î", "Iuml":"Ï", "ETH":"Ð", "Ntilde":"Ñ", "Ograve":"Ò", "Oacute":"Ó", "Ocirc":"Ô", "Otilde":"Õ", "Ouml":"Ö", "times":"×", "Oslash":"Ø", "Ugrave":"Ù", "Uacute":"Ú", "Ucirc":"Û", "Uuml":"Ü", "Yacute":"Ý", "THORN":"Þ", "szlig":"ß", "agrave":"à", "aacute":"á", "acirc":"â", "atilde":"ã", "auml":"ä", "aring":"å", "aelig":"æ", "ccedil":"ç", "egrave":"è", "eacute":"é", "ecirc":"ê", "euml":"ë", "igrave":"ì", "iacute":"í", "icirc":"î", "iuml":"ï", "eth":"ð", "ntilde":"ñ", "ograve":"ò", "oacute":"ó", "ocirc":"ô", "otilde":"õ", "ouml":"ö", "divide":"÷", "oslash":"ø", "ugrave":"ù", "uacute":"ú", "ucirc":"û", "uuml":"ü", "yacute":"ý", "thorn":"þ", "yuml":"ÿ"}

When the regular expression finds nbsp, iexcl, cent, etc. between & and ; in $line, it replaces the match with its value from the associative array %latin, so the character displays correctly. I would like to do the same thing in Python, but I'm not sure it has an equivalent of the $1 variable in regular expressions. Is there a way to replicate this behaviour in Python?
Thanks!
Question by:Tabris42
2 Comments
 

Accepted Solution

by:ozo (earned 500 total points)
ID: 17944147

Author Comment

by:Tabris42
ID: 17946161
Sorry, I didn't realize I posted this to the general Languages board. I meant to put it in Python.
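
For anyone landing here: Python's re.sub accepts a function as the replacement, and the match object passed to that function exposes the captured group, which covers what $1 does in the Perl version. Below is a minimal sketch, not the accepted answer (whose text is not shown above). It assumes Python 3 and an abbreviated latin1 table (the full dictionary from the question would be dropped in instead), and the helper name replace_entities is made up for illustration:

```python
import re

# Abbreviated table for illustration only; substitute the full
# latin1 dictionary from the question in real use.
latin1 = {
    "nbsp": "\u00a0",
    "copy": "©",
    "eacute": "é",
    "uml": "¨",
    "uuml": "ü",
}

# Build the alternation from the dictionary keys so the pattern and
# the table cannot drift apart (the cause of typos in a hand-written list).
ENTITY_RE = re.compile(r"&(" + "|".join(map(re.escape, latin1)) + r");")


def replace_entities(line):
    """Replace each &name; whose name is a key of latin1 (hypothetical helper)."""
    # m.group(1) is the captured name, i.e. the counterpart of Perl's $1.
    return ENTITY_RE.sub(lambda m: latin1[m.group(1)], line)


print(replace_entities("caf&eacute; &copy; 2006"))  # prints: café © 2006
```

Names that are not in the table (for example &amp;) never match the pattern, so they are left unchanged. A looser variant along the same lines would be the pattern r"&(\w+);" with latin1.get(m.group(1), m.group(0)) as the lookup, which avoids listing every key in the regex at all.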