
Dealing with HTML tags in a delimited text file

Posted on 2006-07-04
Last Modified: 2010-04-05
Hello all,
I'm trying to parse HTML text into a delimited TStringList, which can be imported into a database or other applications. I'm wondering what the best way to handle it is. I cannot use quotes, since I want to keep the HTML tags intact. But I have been looking into escaping/unescaping the HTML tags,

example of "escaping":  turn "<td  class="no">" into "&lt;td  class=&quot;no&quot;&gt;"

Or is it better to just leave the HTML intact and use different chars for the TStringList Delimiter & QuoteChar?
Will I run into trouble with databases (MySQL and/or MSSQL)?
Once the data is in a database and is put into an HTML page later down the road, will there be problems displaying the escaped HTML?
Or will it have to be unescaped before it's put back into an HTML page?
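
To make the escaping question concrete, here is a minimal sketch of the round trip. It uses Python's standard `html` module rather than Delphi, purely for illustration; the helper names `escape_fragment` and `unescape_fragment` are made up for this example. It suggests the answer to the last question is yes: escaped markup is shown by a browser as literal text, so it would need unescaping before being used as markup again.

```python
import html


def escape_fragment(fragment: str) -> str:
    """Turn markup characters into entities so the text is safe to quote or delimit."""
    # quote=True also converts double and single quotes, not just & < >
    return html.escape(fragment, quote=True)


def unescape_fragment(escaped: str) -> str:
    """Reverse the escaping to get the original markup back."""
    return html.unescape(escaped)


if __name__ == "__main__":
    original = '<td  class="no">'
    escaped = escape_fragment(original)
    print(escaped)
    # The round trip should give back exactly what went in.
    print(unescape_fragment(escaped) == original)
```

Note that text which already contains entities (for example `&amp;`) gets escaped a second time, so escape and unescape need to be applied exactly once each.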

Sorry, this problem doesn't have an exact answer.
I will up the points if needed, especially if I get a good explanation of the whys and what-fors :)
Question by:LMuadDIb
14 Comments
 
LVL 28

Expert Comment

by:2266180
ID: 17040465
Well... the way I do it is to Base64-encode the page when saving it to the DB and Base64-decode it when reading it out (but I don't use TStringList for such operations). I would stick to this, since I consider it best practice when it comes to special characters.

Regarding loading the HTML text into a TStringList... do you really need to have delimited and quoted text in it? If so, you can try using characters that will not appear in the HTML, like #1 and #2 or whatever non-printable char ;) (you can define them as constants so you will not hardcode them throughout your code).

Give us more details on why you need the quotes and delimiters in the TStringList; maybe there are better alternatives.
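
A rough sketch of the non-printable delimiter idea, in Python rather than Delphi so it can be run anywhere. The constants mirror Delphi's `#1` and `#2`; using one as a field separator and the other as a record separator is an assumption of this example, as are the helper names `pack` and `unpack`.

```python
FIELD_SEP = "\x01"   # same character as Delphi's #1, separates cells
RECORD_SEP = "\x02"  # same character as Delphi's #2, separates rows


def pack(rows):
    """Join rows of HTML fragments into one string using control characters."""
    for row in rows:
        for cell in row:
            # The scheme only works if the separators never occur in the data.
            if FIELD_SEP in cell or RECORD_SEP in cell:
                raise ValueError("cell contains a separator character")
    return RECORD_SEP.join(FIELD_SEP.join(row) for row in rows)


def unpack(text):
    """Split a packed string back into rows of cells."""
    return [record.split(FIELD_SEP) for record in text.split(RECORD_SEP)]


if __name__ == "__main__":
    rows = [['<td class="no">1</td>', "<td>a, b</td>"],
            ["<td>line\nbreak</td>", "<td>'quoted'</td>"]]
    print(unpack(pack(rows)) == rows)
```

Because the separators are not quotes, commas, or line breaks, the HTML needs no escaping at all; the cost is that the file is no longer comfortable to open in a text editor or to import with tools that expect printable delimiters.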
 
LVL 4

Author Comment

by:LMuadDIb
ID: 17042969
Well, I was thinking about a delimited text file for output...
Then I can easily import it into different databases or an XML database from the delimited text file.

Think of an HTML table: each table row (tr) would be a line in the delimited text file, and each table cell (td) would be delimited & quoted on that line.
If I want table rows 2-4 and table cells 3, 7, 8, I can easily pick them out by looping over the text file lines and splitting the delimited text. Is there a better way to go about this?
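
The row and cell selection described above can be sketched as follows. This uses Python's standard `csv` module instead of a Delphi TStringList, and `write_rows` and `pick` are hypothetical helper names. It also hints that ordinary quoting can keep HTML attributes intact, because the CSV convention doubles any quote character that appears inside a quoted field and undoes that on reading.

```python
import csv
import io


def write_rows(rows):
    """Serialise rows of HTML cells as quoted, comma-delimited lines."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()


def pick(text, row_numbers, cell_numbers):
    """Return the requested cells (1-based) from the requested rows (1-based)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    picked = []
    for row_no, row in enumerate(reader, start=1):
        if row_no in row_numbers:
            picked.append([row[c - 1] for c in cell_numbers])
    return picked


if __name__ == "__main__":
    # A made-up 5 x 8 table whose cells contain quotes and commas.
    table = [[f'<td class="r{r}c{c}">{r},{c}</td>' for c in range(1, 9)]
             for r in range(1, 6)]
    text = write_rows(table)
    for cells in pick(text, {2, 3, 4}, [3, 7, 8]):
        print(cells)
```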

My text file will not just hold tables though, but practically any HTML tag: DIV tags, list tags, form tags, etc.
I'm going to check out Base64 encoding/decoding.
 
LVL 28

Accepted Solution

by:
2266180 earned 1000 total points
ID: 17045303
I gave the matter some thought and this is what I came to. You can have one of the following 2 cases:
1) Have the HTML text saved with no delimiters and such (Base64 encoded, URL encoded, etc.).
In this case you will always have to delimit the HTML text before loading it into a TStringList.
2) Have the HTML text saved with delimiters and such (again Base64 encoded, URL encoded, etc.).
In this case you will always have to remove the delimiters and such before loading the HTML into something that displays HTML (like a TWebBrowser, or serving it from the server side to a client, etc.).

You will have to decide which operation is done more often: loading into a TStringList and needing delimiters, or sending to some HTML display. After this, you pick the option that is more suitable for your case :)
 
LVL 4

Author Comment

by:LMuadDIb
ID: 17048069
Got a question about Base64 encoding:

Is this a standard encoding/decoding across platforms?
If I encode the strings, will someone using .NET or a Linux system be able to decode them without knowing how I encoded them?
This might be a stupid question on my part, so try not to laugh at me lol =)
 
LVL 28

Expert Comment

by:2266180
ID: 17048785
I am not laughing. It is perfectly normal to ask questions about something you don't know ;)

Base64 encoding/decoding is a standard: RFC 2045 defines it for MIME, and RFC 3548 (since superseded by RFC 4648) describes it on its own. You can read more (general) info here: http://en.wikipedia.org/wiki/Base64
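
As an illustration of the cross-platform point, here is a small Python sketch of the round trip; any other platform's standard Base64 decoder should accept the same output. The one thing both sides still have to agree on is how the text was turned into bytes before encoding. UTF-8 is assumed here, and `encode_html` / `decode_html` are names invented for this example.

```python
import base64


def encode_html(html_text: str) -> str:
    """Encode HTML text as Base64, using UTF-8 for the text-to-bytes step."""
    return base64.b64encode(html_text.encode("utf-8")).decode("ascii")


def decode_html(encoded: str) -> str:
    """Decode Base64 back to the original HTML text (UTF-8 assumed)."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    original = '<td  class="no">caf\u00e9 & "quotes"</td>'
    encoded = encode_html(original)
    # The encoded form contains only letters, digits, '+', '/' and '=',
    # so it cannot collide with a delimiter or quote character.
    print(encoded)
    print(decode_html(encoded) == original)
```

The trade-off is that the encoded text is roughly a third larger than the input and is no longer readable or searchable in the database.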
 
LVL 26

Expert Comment

by:EddieShipman
ID: 17051135
Why do you have to encode the HTML text at all? I see no reason to do this as it
won't make a difference to the database, or the TStringList, what the HTML Text is.

If I were you, I'd use the freeware FastHTML Parser at:
http://www.jazarsoft.com/main.php

This way, you can catch each tag in the OnFoundTag event
and place it in the Stringlist. Be aware, though, that this control
removes the < and > from the tags.
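
For readers without Delphi at hand, the tag-per-event idea can be sketched with Python's standard `html.parser` module. This is a stand-in for illustration, not the FastHTML Parser component, and the class name `TagCollector` is made up. Unlike the behaviour described above, this sketch keeps the `<` and `>` on each tag.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect every start and end tag into a list, one entry per tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared in the input.
        self.tags.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


if __name__ == "__main__":
    collector = TagCollector()
    collector.feed('<table><tr><td class="no">x</td></tr></table>')
    collector.close()
    for entry in collector.tags:
        print(entry)
```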



 
LVL 26

Expert Comment

by:EddieShipman
ID: 17051156
Since his site is having problems, I suggest downloading it from Torry's:
http://www.torry.net/vcl/internet/html/jshtmpsr.zip
 
LVL 4

Author Comment

by:LMuadDIb
ID: 17051541
The encoding is needed because I will use the HTML text in an XML file as well as in HTML web pages.
The data will be stored in a database, but at times in an XML file directly.

I built my own HTML parser component, but it's XML based.
It allows me a lot more control in parsing the HTML tags than a standard HTML parser, so instead of dealing with strings I parse by nodes.
I know it will not be the fastest, but the ease of use makes up for it.

And I'm working on a subcomponent that will provide basic output for it (delimited text file).
 
LVL 26

Assisted Solution

by:EddieShipman
EddieShipman earned 400 total points
ID: 17052477
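Oh, well, in order to store formatted data like HTML or XML inside an XML node,
the markup characters must not be read as part of the surrounding document. The usual options are to escape them (&lt;, &gt;, &amp;), to enclose the text in a CDATA section (as long as it does not contain "]]>"), or to Base64 encode it.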
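
One way to place HTML text inside an XML node is to let the XML library do the escaping when it writes the file and undo it when it reads the file. A minimal Python sketch using the standard `xml.etree.ElementTree` module; the element name `fragment` and the helper names are assumptions made for this example:

```python
import xml.etree.ElementTree as ET


def wrap_html(html_text: str) -> str:
    """Store an HTML fragment as the text of an XML node."""
    node = ET.Element("fragment")
    node.text = html_text  # the serializer escapes &, < and > in node text
    return ET.tostring(node, encoding="unicode")


def unwrap_html(xml_text: str) -> str:
    """Read the HTML fragment back out of the XML node."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    original = '<td  class="no">a & b</td>'
    xml_text = wrap_html(original)
    print(xml_text)
    print(unwrap_html(xml_text) == original)
```

With this approach the HTML stays readable in the XML file. Base64 inside the node remains an option when the payload may contain characters that XML does not allow at all, such as most control characters.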

 
LVL 28

Expert Comment

by:2266180
ID: 17052604
I don't think he meant including formatted data in XML, but using the HTML as XML :) At least that is what I understood.
 
LVL 26

Expert Comment

by:EddieShipman
ID: 17053429
[quote]I will use the html text in a xml file as well as html web pages[/quote]

I read differently...
 
LVL 28

Expert Comment

by:2266180
ID: 17053665
my bad :)
 
LVL 4

Author Comment

by:LMuadDIb
ID: 17053760
Actually both :)
I will use the HTML as XML, but primarily the HTML will be inserted into an XML node.

Thanks for your time.
 
LVL 4

Author Comment

by:LMuadDIb
ID: 17053766
.