Format html code

I want a function that formats HTML table code the way FrontPage does, like this:

<TABLE>
  <TBODY>
     <TR>
       <TD>
           SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK
           DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS
           LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF
       </TD>
     </TR>
  </TBODY>
</TABLE>

It has to be very fast and use CString. The code I want to pass to the function has no leading spaces. The code should also never break a line in the middle of a word or attribute, like this:
<font co
lor="red">

this is right:
<font
color="red">

You understand right???
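
The "never break inside a word" rule can be sketched as a wrap that only breaks at spaces. This is a minimal sketch, not the asker's code: `wrapAtSpaces` and its `width` parameter are hypothetical names, and it uses `std::string` instead of CString so that it compiles without MFC.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` into lines of at most `width` characters,
// breaking only at whitespace. A word longer than `width` gets a line of its
// own instead of being cut in half.
std::vector<std::string> wrapAtSpaces(const std::string& text, std::size_t width)
{
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        // Start a new line if adding this word (plus a space) would overflow.
        if (!line.empty() && line.size() + 1 + word.size() > width) {
            lines.push_back(line);
            line.clear();
        }
        if (!line.empty()) line += ' ';
        line += word;
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

Note that this treats every space as a possible break point, so an attribute value that contains spaces inside its quotes could still be split. Handling that would need a scanner that tracks whether it is inside quotes.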


neo23Asked:
 
smitty1276Commented:

You are going to need a linked list of tags and everything contained within them.  Well... actually it would be a tree.

Each node has a flag to indicate whether it contains information about an HTML tag or text information within other tags.  If it contains a tag, it contains the text of the tag ("<TABLE>").  Otherwise it will simply contain text.  Each node will also contain an ordered array of pointers to sub-nodes (or NULL if none exist).

Example... HTML code:
<p>This is a <b>test</b>.</p>

Data structure would be a node containing the "<p>" tag.
-The first pointer in the pointer array would contain a pointer to a sub-node which contained the text "This is a ".  
-The second pointer in the array would point to a sub-node containing the tag "<b>", which itself would have a pointer to a sub-node containing the text "test".
-The third pointer in the array would point to a sub-node containing the remaining text ".".  When you run out of sub-nodes, you know to go ahead and print the "</p>" tag.

You could store entire HTML documents in this way.

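struct node
{
  node *parent;        // NULL for the top-level node
  int   indent_spaces; // see note below

  int   type;  // 0 = top level, 1 = tag info, 2 = text
  char *text;  // text or tag (NULL for the top-level node)

  int    sub_count; // number of entries in subs
  node **subs;      // ordered array of pointers to sub-nodes (NULL if none)
};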

Check the type of node... if it is a tag info node, set the indent_spaces to parent->indent_spaces + 2.  If it is a text node, set the indent_spaces equal to the parent->indent_spaces.

The top-level node would contain the entire document... init the node to type=0, text=NULL, and set the sub-node pointers to the tags contained (possibly only <HTML>, which would contain maybe 2 pointers, to <HEAD> and <BODY>).
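
A rough sketch of the tree described above, using the `<p>This is a <b>test</b>.</p>` example. Several things here are assumptions rather than part of the original suggestion: `std::vector` and `std::unique_ptr` replace the raw pointer array, the indent is computed while printing instead of being stored in each node, tag nodes store the bare name ("p") instead of "<p>", and `addChild` and `render` are hypothetical helper names.

```cpp
#include <memory>
#include <string>
#include <vector>

enum class NodeType { Root, Tag, Text };

struct Node {
    NodeType type = NodeType::Root;
    std::string text;                         // tag name ("p") or raw text
    std::vector<std::unique_ptr<Node>> subs;  // ordered sub-nodes
};

// Append a new sub-node to `parent` and return a pointer to it.
Node* addChild(Node& parent, NodeType type, const std::string& text)
{
    parent.subs.push_back(std::make_unique<Node>());
    Node* child = parent.subs.back().get();
    child->type = type;
    child->text = text;
    return child;
}

// Append the subtree to `out`, one node per line, indenting the contents
// of each tag by two spaces more than the tag itself.
void render(const Node& n, std::size_t indent, std::string& out)
{
    const std::string pad(indent, ' ');
    if (n.type == NodeType::Text) {
        out += pad + n.text + "\n";
        return;
    }
    if (n.type == NodeType::Tag) out += pad + "<" + n.text + ">\n";
    const std::size_t childIndent = (n.type == NodeType::Tag) ? indent + 2 : indent;
    for (const auto& sub : n.subs) render(*sub, childIndent, out);
    // Out of sub-nodes: now the closing tag can be printed.
    if (n.type == NodeType::Tag) out += pad + "</" + n.text + ">\n";
}
```

Putting every node on its own line changes the whitespace around inline tags such as `<b>`, so a real formatter would probably apply this only to block-level tags like the table elements.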
 
smitty1276Commented:
You already have the code to parse out the tags... right?
 
AxterCommented:
Please give an example of the input.
Like what is the raw data?
You gave us a good example of what you want the results to look like, but we need an equally good example of what you're starting with.
 
AxterCommented:
I should have suggested to you in your previous question that you might get better results if you post your MFC questions on the MFC topic area.

Since you already posted this question here, you  can post a question in the MFC topic area with zero points, and put a link to this question in your zero point question.
 
Andrey_KulikCommented:
Hi

it's very simple app... is this homework or not?

i have some questions about it:

1. Are <![CDATA[ ]]> sections possible?
2. Is the HTML correct?
3. Is the HTML well-formed? (Is <tag attr=value> possible, or do all attribute values have quotes, as in <tag attr="value"> and <tag attr='value'>?)

Best regards
Andrey

 
neo23Author Commented:
All tags like <TABLE> are in uppercase, and all the HTML code comes without line breaks. The code comes directly from mshtml.dll. You just have to help me format the tables, nothing else.
 
AxterCommented:
Is this homework?
 
neo23Author Commented:
Nope..I have not been in school since 1995
 
neo23Author Commented:
Just a VB dude switchin to C, thats all..
 
AxterCommented:
I should have figured out this wasn't homework, because you're using CString.  But it doesn't hurt to ask.
 
AxterCommented:
>>help me format the tables..nothing else

So exactly what does the raw data look like:
Example:
<TABLE><TBODY><TR><TD>SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF</TD></TR></TBODY></TABLE>
 
neo23Author Commented:
Yes, just like that
 
neo23Author Commented:
But the important thing to remember is that there can be a table within a table. If that is the case, the inner table should be indented one step further to the right than the parent table. You get me, right?

<TABLE>
 <TBODY>
    <TR>
      <TD>
          SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK
          DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS
          LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF
      </TD>
      <TD>
          <TABLE>
             <TBODY>
                <TR>
                   <TD>
                    SDAKFOLASDKFLOASDFKLOASDFKLASDFK
                    LASDKFLOKASDFOLKASDL
                   </TD>
                </TR>
             </TBODY>
           </TABLE>
      </TD>
    </TR>
 </TBODY>
</TABLE>
 
neo23Author Commented:
I've asked a whole lot of programmers this, but they say it's too hard, maybe it can't be done, or you need to ask some hacker or something..
 
AxterCommented:
Just out of curiosity, why do you want to do this?
It's going to look the same in an HTML browser.
 
AxterCommented:
>>maybe it cant be done

It can be done, but it takes some good logic.  It needs a good algorithm.
 
neo23Author Commented:
Can you do it? I know it's going to look the same; it's for a text editor I'm working on.
 
AxterCommented:
My schedule is tight this week, but if I have time, I'll give it a shot.
 
PacmanCommented:
listening and waiting for Axter's code ...
(hope it looks good).
 
PacmanCommented:
> Ive asked a whole lot of programmers this, but they say its to hard,

professionals or hobby programmers?  ;-)
 
PacmanCommented:
I'm just asking because the statement has different meanings:

a) said by hobby programmer
=> means: I don't know how to do this.

b) said by professional programmer
=> means: I don't have the time to do this.

;-)
 
neo23Author Commented:
both I guess :)
 
DanRollinsCommented:
Read it into a browser and access the HTML DOM.  Then you can easily walk the table objects.  An interesting challenge, nay, an actual intellectual exercise, but not worth doing.
 
neo23Author Commented:
Yes it is, because the DOM is sloooow when you walk large docs.. I mean.. it's not that big a deal. I could do it in five minutes in vb.. Remember, all the code is on one line. Just find the first set of <TABLE><TBODY><TR><TD> and indent it, and keep track of it: if there is no </TBODY></TABLE> yet, indent the next <TABLE><TBODY><TR><TD> one step further to the right. Why is everything so hard to do in C++?? I could do it myself using the CString library, but it tends to slow down..
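
The "keep track of the depth and indent" idea above can be sketched as a single pass over the one-line input. This is only a sketch under stated assumptions: every opening tag has a matching closing tag (true for the TABLE/TBODY/TR/TD input shown in this thread), no '<' or '>' appears inside text, and `indentTags` and `step` are hypothetical names. It uses `std::string` rather than CString so it compiles without MFC.

```cpp
#include <string>

// Hypothetical one-pass formatter: put each tag and each run of text on its
// own line, indented by `step` spaces per nesting level.
std::string indentTags(const std::string& html, std::size_t step = 2)
{
    std::string out;
    out.reserve(html.size() * 2);  // rough guess, avoids repeated reallocation
    std::size_t depth = 0, pos = 0;

    auto emit = [&](const std::string& piece) {
        out.append(depth * step, ' ');
        out += piece;
        out += '\n';
    };

    while (pos < html.size()) {
        if (html[pos] == '<') {
            std::size_t end = html.find('>', pos);
            if (end == std::string::npos) end = html.size() - 1;  // unterminated tag: take the rest
            const std::string tag = html.substr(pos, end - pos + 1);
            const bool closing = tag.size() > 1 && tag[1] == '/';
            if (closing && depth > 0) --depth;  // closing tag lines up with its opener
            emit(tag);
            if (!closing) ++depth;              // contents go one level deeper
            pos = end + 1;
        } else {
            std::size_t end = html.find('<', pos);
            if (end == std::string::npos) end = html.size();
            emit(html.substr(pos, end - pos));
            pos = end;
        }
    }
    return out;
}
```

Each character of the input is visited once and the output is built by appending to a single string, so the cost should grow roughly linearly with the document size. Tags without a closing tag, such as <BR>, would throw the depth count off and need to be special-cased.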
 
PacmanCommented:
>> I could do it myself using the CString library, but it tends to slow down..

CString is too slow?
There must be something wrong with your algorithm.
If you need good performance then what about writing a parser?

This is a very simple parser-application.
"building compilers lesson 1".

Build a grammar and parse the text token by token.
Then create output with indented text.
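
The "token by token" step might look something like the following tokenizer. It is a sketch only: `Token`, `TokenKind`, and `tokenize` are hypothetical names, it assumes no '<' or '>' inside text or attribute values, and it ignores comments and CDATA sections.

```cpp
#include <string>
#include <vector>

enum class TokenKind { OpenTag, CloseTag, Text };

struct Token {
    TokenKind kind;
    std::string text;  // the full tag ("<TD>") or the raw text run
};

// Split a one-line HTML string into tag tokens and text tokens.
std::vector<Token> tokenize(const std::string& html)
{
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            std::size_t end = html.find('>', pos);
            if (end == std::string::npos) end = html.size() - 1;  // unterminated tag
            const std::string tag = html.substr(pos, end - pos + 1);
            const TokenKind kind = (tag.size() > 1 && tag[1] == '/')
                                       ? TokenKind::CloseTag
                                       : TokenKind::OpenTag;
            tokens.push_back({kind, tag});
            pos = end + 1;
        } else {
            std::size_t end = html.find('<', pos);
            if (end == std::string::npos) end = html.size();
            tokens.push_back({TokenKind::Text, html.substr(pos, end - pos)});
            pos = end;
        }
    }
    return tokens;
}
```

With the tokens in hand, the output stage only needs a depth counter: decrement on a CloseTag before printing, increment on an OpenTag after printing.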

But this is quite a lot of work.
Maybe there's someone out there doing this for you.
 
Andrey_KulikCommented:
You can use an XML parser, I think, and some serializers support this formatting.
 
Andrey_KulikCommented:
try MSXML
 
DanRollinsCommented:
>>...I could do it in five minutes in vb..

Write it in VB, and post the code here (it'll be less than 20 lines if you can write it in 5 minutes). I'll provide a translation in 3 minutes.

-- Dan
 
KangaRooCommented:
It is not that simple. HTML allows some elements to be written with or without a closing tag, i.e.

<div>
   <p>Sometext
</div>

and
<div>
   <p>
      Sometext
   </p>
</div>

are equivalent. Then there are the comments, basically a comment is one big tag....
 
smitty1276Commented:
BTW... certain, specific tags that do not use a closing tag should always be treated as text... "<BR>" as an example... since they don't contain any information and therefore will have no sub-nodes.  "&nbsp;" might be another example.

It would help to have a file containing all of the tags that may or MAY NOT have closing tags.  That way you can assume that any tag that isn't on the list WILL DEFINITELY have a closing tag.
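
The tag list could start out as a small in-memory set instead of a file. In this sketch `tagName` and `hasNoClosingTag` are hypothetical helpers, and the set is an illustrative subset of the HTML elements that never take a closing tag, not a complete list.

```cpp
#include <cctype>
#include <set>
#include <string>

// Extract the upper-cased element name from a tag such as "<br>", "</td>"
// or "<img src=...>".
std::string tagName(const std::string& tag)
{
    std::size_t i = 0;
    if (i < tag.size() && tag[i] == '<') ++i;
    if (i < tag.size() && tag[i] == '/') ++i;
    std::string name;
    while (i < tag.size() && std::isalnum(static_cast<unsigned char>(tag[i]))) {
        name += static_cast<char>(std::toupper(static_cast<unsigned char>(tag[i])));
        ++i;
    }
    return name;
}

// True if the tag is on the list of elements that never have a closing tag,
// so the formatter should not indent after it.
bool hasNoClosingTag(const std::string& tag)
{
    // Illustrative subset only; extend it for the tags you actually meet.
    static const std::set<std::string> kNoClose = {
        "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT", "LINK", "META", "PARAM"
    };
    return kNoClose.count(tagName(tag)) > 0;
}
```

Elements whose closing tag is merely optional, like the <p> in the earlier comment, are a separate problem and are not covered by this list.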
 
DanRollinsCommented:
Well, all you really need to search for is "<TABLE" and then process until you get to a matching "/TABLE>".  As you encounter <TR and <TD, you do some indenting.  Just ignore <DIV et al.

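CString sHtml = sHtmlOrig;  // working copy; keep sHtmlOrig for the output
sHtml.MakeUpper();          // upper-case so the searches ignore case

int nOffset = sHtml.Find( _T("<TABLE") );  // -1 if there is no table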

Now you can use sSubStr= sHtml.Mid( nOffset ) to get a string that starts at the start of the table, and you can scan forward with sSubStr.Find("whatever").  Remember to take the output from the original, since the working copy has been upper-cased for simplicity of comparison.
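
A sketch of that search, extended to cope with nested tables by counting "<TABLE" and "</TABLE>" occurrences. It uses `std::string` so it compiles without MFC; `find` and `substr` play the roles of `Find` and `Mid`. The function name `firstTable` is hypothetical, and the sketch assumes plain ASCII markup, so that upper-casing does not change any offsets.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return the first complete <TABLE ...>...</TABLE> span, including nested
// tables, copied from the ORIGINAL string. Returns an empty string if there
// is no table or the tags do not balance.
std::string firstTable(const std::string& original)
{
    // Upper-cased working copy, used only for searching.
    std::string upper = original;
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    const std::size_t start = upper.find("<TABLE");
    if (start == std::string::npos) return "";

    std::size_t depth = 0, pos = start;
    while (true) {
        const std::size_t open = upper.find("<TABLE", pos);
        const std::size_t close = upper.find("</TABLE>", pos);
        if (close == std::string::npos) return "";  // unbalanced input
        if (open != std::string::npos && open < close) {
            ++depth;         // entered a (possibly nested) table
            pos = open + 6;  // skip past "<TABLE"
        } else {
            --depth;         // left a table
            pos = close + 8; // skip past "</TABLE>"
            if (depth == 0) return original.substr(start, pos - start);
        }
    }
}
```

Because the offsets come from the upper-cased copy but the substring is taken from the original, the returned text keeps whatever case the source used.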

-- Dan

 
neo23Author Commented:
DanRollins, you are on the right track. I'm just new to C. Could you show me how? Just a code sample..
 
DanRollinsCommented:
>>could you show me how... just a code sample..

Well, I could write your program for you but that would take away all of your fun!  I've got a better idea...

I suggest that you try it yourself.  It will be a good chance for you to learn about the CString data type.  If you run into any snags, just post a question here.  I'll be glad to help.  As I said, I'll even coach you in how to convert a VB function.

-- Dan
 
DanRollinsCommented:
hi neo23,
Do you have any additional questions?  Do any comments need clarification?

-- Dan