converting to html entities

hi,
here is what i want to do: convert words/characters to html entities i.e. email = emal and also a webpage or file (which is encoded in HTML not text)..... is there a program out there that does this, so you don't have to do it manually?
what i want to do is type in a word or point to a URL or file and have the program automatically convert for me.

if no program is available, maybe i can make a perl script (though i'm still learning perl/cgi) and submit it via a form and have the script do it for me...

well, any help appreciated. thanks.
sayhiAsked:
chewymonCommented:
sayhiAuthor Commented:
nah, i seen that already, it's unencoding and requires javascript.
dmaryakhCommented:
Check out http://www.omnimark.com 
You can definitely do this with the OmniMark language. It is free, easy to learn, and (in most cases) more useful than Perl, especially for HTML/SGML/XML processing.



Here is a complete UNICODE<=>Character entity OmniMark conversion utility that you can use
http://www.xmeta.com/omlette/index.html


http://www.w3.org/TR/WD-html40-970708/sgml/entities.html#h-10.5.1
sayhiAuthor Commented:
@ dmaryakh

hey, i think this may be what i'm looking for! BUT i don't know how to use it or make it work. i downloaded omnimark and the unimap.zip from the site... i just dunno how to start converting...
dmaryakhCommented:
unimap.xin is the library that has all the functions needed for the conversion. I haven't worked with it yet, but from a brief look I would say you need to use the get-unicode-values function. Let me play with this library a little, and I will post code that you can use for the conversion later today.
sayhiAuthor Commented:
Adjusted points to 150
sayhiAuthor Commented:
increased points to 160
dmaryakhCommented:
OK, sorry it took me longer than I anticipated, but here it is. The script below converts a file that contains named character entities into numeric HTML entities. For example, it turns "some &hellip; &tilde; data &amp; more &hellip;" into:
some &#8230; &#732; data &#38; more &#8230;




Place the code below into a file named converter.xom:
-------------------------------

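cross-translate
include "unimap.xin"

        ; match a named entity such as &hellip; and capture its name in temp
        find "&" ((LOOKAHEAD NOT WORD-END)ANY)+  => temp ";"
            local counter tester
            local counter unicode-values variable initial-size 0
            ; look up the Unicode value(s) for the entity name
            set tester to get-unicode-values for "%x(temp)" into unicode-values
            ; write the numeric form, e.g. &#8230;
            output "&#%d(unicode-values);"

        ; copy every other character through unchanged
        find any=>tmp
            output "%x(tmp)"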
   

To execute, from the command line type:
omnimark -s converter.xom input.file -of output.file -l log.file

where:
  input.file - your original file
  output.file - converted file
  log.file  - log of any omnimark messages (shouldn't be any)


----------------------------------
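For comparison, here is a minimal Python sketch of the same named-to-numeric conversion. It is not OmniMark and not part of unimap.xin; it assumes Python 3 and uses the standard library table `html.entities.name2codepoint` in place of the library's lookup. The function name `named_to_numeric` is made up for this illustration, and names missing from the table (angzarr may be one of them) are left untouched.

```python
import re
from html.entities import name2codepoint

# Matches a named entity such as &hellip; and captures the name.
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities with decimal numeric references."""
    def replace(match):
        code_point = name2codepoint.get(match.group(1))
        if code_point is None:
            # Unknown name: leave the original text as it was.
            return match.group(0)
        return f"&#{code_point};"

    return _ENTITY.sub(replace, text)


print(named_to_numeric("some &hellip; &tilde; data &amp; more &hellip;"))
# prints: some &#8230; &#732; data &#38; more &#8230;
```

Everything that is not a recognised named entity passes through unchanged, which mirrors the second find rule in the script above.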

P.S. I understand that this is not 100% what you were asking for, but the script can be modified to do what you want as follows:
1) Locate the following declaration at the beginning of unimap.xin:
global counter unicode-entities variable initial {
    "57928" with key "angzarr",
     ...

Add entries for the ASCII characters and their Unicode values in the same manner:

"101" with key "e",
....


2) Locate the following rule in converter.xom:
        find any=>tmp
            output "%x(tmp)"

You would need to replace it with:
        find any=>tmp
            local counter tester
            local counter unicode-values variable initial-size 0
            set tester to get-unicode-values for "%x(tmp)" into unicode-values
            output "&#%d(unicode-values);"

       
============================
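The modification described above (emit a numeric entity for every character, not just for named entities) can also be sketched in Python. This is only an illustration under the assumption that Python 3 is available; it does not use OmniMark or unimap.xin, and the function name `to_numeric_entities` is hypothetical.

```python
def to_numeric_entities(text):
    """Return text with every character written as a decimal reference."""
    # ord() gives the Unicode code point, e.g. ord("e") == 101.
    return "".join(f"&#{ord(ch)};" for ch in text)


print(to_numeric_entities("email"))
# prints: &#101;&#109;&#97;&#105;&#108;
```

Note that this sketch encodes every character it is given, including "<" and ">", so running it over a whole HTML page would also encode the tags. To keep the markup working, only the text between tags should be passed in.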

dmaryakhCommented:
I forgot to mention that, the way the scripts are set up right now, all the files need to be in the same directory (input.file, converter.xom, unimap.xin).
sayhiAuthor Commented:
Thanks a million! I greatly appreciate all your help! Thanks again.