sayhi

converting to html entities

hi,
here is what i want to do: convert words/characters to html entities, i.e. email = &#101;&#109;&#97;&#105;&#108;, and also do the same for a webpage or file (which is encoded in HTML, not text). is there a program out there that does this, so u don't have to do it manually?
what i want to do is type in a word or point to a URL or file and have the program automatically convert for me.

if no program is available, maybe i can make a perl script (though i'm still learning perl/cgi), submit the input via a form and have the script do it for me...

well, any help appreciated. thanks.
chewymon

sayhi

ASKER

nah, i seen that already, it's unencoding and requires javascript.
ASKER CERTIFIED SOLUTION
dmaryakh

sayhi

ASKER

@ dmaryakh

hey, i think this may be what i'm looking for! BUT i don't know how to use it or make it work. i downloaded omnimark and the unimap.zip from the site... i just dunno how to start converting...
unimap.xin is the library that has all the functions needed for the conversion. I haven't worked with it yet, but from a brief look I would say you need to use the get-unicode-values function. Let me play with the library a little, and I will post code that you can use for the conversion later today.
sayhi

ASKER

Adjusted points to 150
sayhi

ASKER

increased points to 160
sayhi

ASKER

Adjusted points to 160
OK, sorry it took me longer than I anticipated, but here it is. The script below converts a file that contains named character entities into numeric HTML entities. For example, "some &hellip; &tilde; data &amp; more &hellip;" becomes:
some &#8230; &#732; data &#38; more &#8230;




Place the code below into a file named converter.xom:
-------------------------------

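cross-translate
include "unimap.xin"


        ; A named entity: "&", the entity name, then ";".
        find "&" ((LOOKAHEAD NOT WORD-END)ANY)+  => temp ";"
            local counter tester
            local counter unicode-values variable initial-size 0
            ; Look up the Unicode value for the entity name.
            set tester to get-unicode-values for "%x(temp)" into unicode-values
            ; Write it out as a decimal numeric entity.
            output "&#%d(unicode-values);"

        ; Any other character is copied through unchanged.
        find any=>tmp
            output "%x(tmp)"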
   

To execute, from the command line type:
omnimark -s converter.xom input.file -of output.file -l log.file

where:
  input.file - your original file
  output.file - converted file
  log.file  - log of any omnimark messages (shouldn't be any)
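
If you don't have OmniMark handy, the same named-to-numeric conversion can be sketched in Python. This is only an illustration, not a port of the script: it assumes Python 3 and uses the standard library's html.entities.name2codepoint table (HTML 4 names only) instead of unimap.xin, so the two tables may not agree on every entity. The function name named_to_numeric is made up for this example.

```python
import re
from html.entities import name2codepoint

# A named entity: "&", a name made of letters/digits, then ";".
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities with decimal numeric entities."""
    def replace(match):
        code = name2codepoint.get(match.group(1))
        # Names missing from the table are left as they are, not guessed.
        return match.group(0) if code is None else "&#%d;" % code
    return NAMED_ENTITY.sub(replace, text)


print(named_to_numeric("some &hellip; &tilde; data &amp; more &hellip;"))
```

Everything that is not a recognised named entity passes through unchanged, which matches what the second rule in the script does.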


----------------------------------

P.S. I understand that this is not 100% what you were asking for, but the script could be modified to do what you want as follows:
1) Locate the following declaration near the beginning of unimap.xin:
global counter unicode-entities variable initial {
    "57928" with key "angzarr",
     ...

and add the Unicode values of the ASCII characters in the same manner:

"101" with key "e",
....


2) Locate the following rule in converter.xom:
        find any=>tmp
            output "%x(tmp)"

and replace it with:
        find any=>tmp
            local counter tester
            local counter unicode-values variable initial-size 0
            set tester to get-unicode-values for "%x(tmp)" into unicode-values
            output "&#%d(unicode-values);"
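
The idea behind this change (write every character as a decimal numeric entity, not just the named ones) can also be sketched in a few lines of Python. Again this is only an illustration assuming Python 3, not the OmniMark code itself; the function name to_numeric_entities is made up, and it uses the built-in ord() rather than the table in unimap.xin.

```python
def to_numeric_entities(text):
    """Write every character of text as a decimal numeric entity."""
    # ord() gives the Unicode code point, e.g. "e" -> 101.
    return "".join("&#%d;" % ord(ch) for ch in text)


print(to_numeric_entities("email"))
```

Note that this sketch encodes every character it is given, including "<", ">" and line breaks, so if you run it over a whole HTML page the tags get encoded too.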

       
============================

I forgot to mention that, the way the scripts are set up right now, all the files (input.file, converter.xom, unimap.xin) need to be in the same directory.
sayhi

ASKER

Thanks a million! I greatly appreciate all your help! Thanks again.