Solved

Replacing ascii with latin-1 characters with a dictionary and regex?

Posted on 2006-11-14
Last Modified: 2012-06-21
In Perl I used this to replace ASCII entity names (such as &eacute;) with Latin-1 characters:

my %latin=(nbsp=>' ',iexcl=>'¡',cent=>'¢',pound=>'£',curren=>'¤',yen=>'¥',brvbar=>'¦',sect=>'§',uml=>'¨',copy=>'©',ordf=>'ª',laquo=>'«',
not=>'¬',shy=>'­',reg=>'®',macr=>'¯',deg=>'°',plusmn=>'±',sup2=>'²',sup3=>'³',acute=>'´',micro=>'µ',
para=>'¶',middot=>'·',cedil=>'¸',sup1=>'¹',ordm=>'º',raquo=>'»',frac14=>'¼',frac12=>'½',frac34=>'¾',
iquest=>'¿',Agrave=>'À',Aacute=>'Á',Acirc=>'Â',Atilde=>'Ã',Auml=>'Ä',Aring=>'Å',AElig=>'Æ',Ccedil=>'Ç',
Egrave=>'È',Eacute=>'É',Ecirc=>'Ê',Euml=>'Ë',Igrave=>'Ì',Iacute=>'Í',Icirc=>'Î',Iuml=>'Ï',ETH=>'Ð',Ntilde=>'Ñ',
Ograve=>'Ò',Oacute=>'Ó',Ocirc=>'Ô',Otilde=>'Õ',Ouml=>'Ö',times=>'×',Oslash=>'Ø',Ugrave=>'Ù',Uacute=>'Ú',
Ucirc=>'Û',Uuml=>'Ü',Yacute=>'Ý',THORN=>'Þ',szlig=>'ß',agrave=>'à',aacute=>'á',acirc=>'â',atilde=>'ã',auml=>'ä',
aring=>'å',aelig=>'æ',ccedil=>'ç',egrave=>'è',eacute=>'é',ecirc=>'ê',euml=>'ë',igrave=>'ì',iacute=>'í',icirc=>'î',
iuml=>'ï',eth=>'ð',ntilde=>'ñ',ograve=>'ò',oacute=>'ó',ocirc=>'ô',otilde=>'õ',ouml=>'ö',divide=>'÷',oslash=>'ø',
ugrave=>'ù',uacute=>'ú',ucirc=>'û',uuml=>'ü',yacute=>'ý',thorn=>'þ',yuml=>'ÿ');

# The /x flag lets the pattern span several lines (whitespace in it is ignored).
$line =~ s/&(nbsp|iexcl|cent|pound|curren|yen|brvbar|sect|uml|copy|ordf|laquo|not|shy|reg|macr|deg|plusmn|
sup2|sup3|acute|micro|para|middot|cedil|sup1|ordm|raquo|frac14|frac12|frac34|iquest|Agrave|Aacute|Acirc|
Atilde|Auml|Aring|AElig|Ccedil|Egrave|Eacute|Ecirc|Euml|Igrave|Iacute|Icirc|Iuml|ETH|Ntilde|Ograve|Oacute|Ocirc
|Otilde|Ouml|times|Oslash|Ugrave|Uacute|Ucirc|Uuml|Yacute|THORN|szlig|agrave|aacute|acirc|atilde|auml|aring|
aelig|ccedil|egrave|eacute|ecirc|euml|igrave|iacute|icirc|iuml|eth|ntilde|ograve|oacute|ocirc|otilde|ouml|divide|
oslash|ugrave|uacute|ucirc|uuml|yacute|thorn|yuml);/$latin{$1}/gx;

Converted to a Python dictionary, the hash above looks like this:
latin1={"nbsp":" ", "iexcl":"¡", "cent":"¢", "pound":"£", "curren":"¤", "yen":"¥", "brvbar":"¦", "sect":"§", "uml":"¨", "copy":"©", "ordf":"ª", "laquo":"«", "not":"¬", "shy":"­", "reg":"®", "macr":"¯", "deg":"°", "plusmn":"±", "sup2":"²", "sup3":"³", "acute":"´", "micro":"µ", "para":"¶", "middot":"·", "cedil":"¸", "sup1":"¹", "ordm":"º", "raquo":"»", "frac14":"¼", "frac12":"½", "frac34":"¾", "iquest":"¿", "Agrave":"À", "Aacute":"Á", "Acirc":"Â", "Atilde":"Ã", "Auml":"Ä", "Aring":"Å", "AElig":"Æ", "Ccedil":"Ç", "Egrave":"È", "Eacute":"É", "Ecirc":"Ê", "Euml":"Ë", "Igrave":"Ì", "Iacute":"Í", "Icirc":"Î", "Iuml":"Ï", "ETH":"Ð", "Ntilde":"Ñ", "Ograve":"Ò", "Oacute":"Ó", "Ocirc":"Ô", "Otilde":"Õ", "Ouml":"Ö", "times":"×", "Oslash":"Ø", "Ugrave":"Ù", "Uacute":"Ú", "Ucirc":"Û", "Uuml":"Ü", "Yacute":"Ý", "THORN":"Þ", "szlig":"ß", "agrave":"à", "aacute":"á", "acirc":"â", "atilde":"ã", "auml":"ä", "aring":"å", "aelig":"æ", "ccedil":"ç", "egrave":"è", "eacute":"é", "ecirc":"ê", "euml":"ë", "igrave":"ì", "iacute":"í", "icirc":"î", "iuml":"ï", "eth":"ð", "ntilde":"ñ", "ograve":"ò", "oacute":"ó", "ocirc":"ô", "otilde":"õ", "ouml":"ö", "divide":"÷", "oslash":"ø", "ugrave":"ù", "uacute":"ú", "ucirc":"û", "uuml":"ü", "yacute":"ý", "thorn":"þ", "yuml":"ÿ"}

When the regular expression finds &nbsp;, &iexcl;, &cent;, etc. in $line, it replaces each one with its value from the associative array %latin, so the character displays correctly. If possible, I would like to do the same thing in Python, but I'm not sure Python's regular expressions have an equivalent of the $1 variable. Is there a way I can replicate this behaviour in Python?
Thanks!
Question by:Tabris42
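
For readers landing here: the text of the accepted answer is not shown on this page, so the following is only a minimal sketch of one way the Perl behaviour might be replicated, not the accepted solution itself. It assumes Python 3 and uses re.sub with a function as the replacement; inside that function, match.group(1) plays the role of Perl's $1. The dictionary is cut down to four entries for brevity, and the names replace_entity and decode_entities are made up for this illustration.

```python
import re

# Small subset of the latin1 dictionary from the question (illustration only).
latin1 = {"pound": "£", "copy": "©", "eacute": "é", "uuml": "ü"}

# Build the alternation from the dictionary keys rather than typing it by hand,
# so the pattern and the dictionary cannot drift apart.
pattern = re.compile(r"&(" + "|".join(map(re.escape, latin1)) + r");")


def replace_entity(match):
    """Return the character for the matched entity name."""
    # match.group(1) is the text captured by the first parenthesised group,
    # i.e. the equivalent of $1 in the Perl substitution.
    return latin1[match.group(1)]


def decode_entities(line):
    """Replace every known &name; in line; unknown names are left untouched."""
    return pattern.sub(replace_entity, line)


print(decode_entities("caf&eacute; &copy; 2006, &pound;5, &unknown; stays"))
```

With the full dictionary from the question in place of the four-entry subset, the same pattern covers every name in the Perl alternation. On Python 3.4 or later, the standard library's html.unescape also decodes these named entities, which may remove the need for a hand-written table.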
2 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 17944147

Author Comment

by:Tabris42
ID: 17946161
Sorry.. I didn't realize I posted this to the general Languages board.. meant to stick it in Python