Solved

Format html code

Posted on 2001-06-20
Last Modified: 2010-04-02
I want a function that formats HTML table code just like FrontPage does, like this:

<TABLE>
  <TBODY>
     <TR>
       <TD>
           SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK
           DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS
           LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF
       </TD>
     </TR>
  </TBODY>
</TABLE>

It has to be very fast and use CString. The code I want to pass to the function has no leading spaces. The code should also never break a line in the middle of a word, like:
<font co
lor="red">

this is right:
<font
color="red">

You understand right???
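
A minimal sketch of that line-break rule, assuming std::string stands in for CString. The name wrapAtSpaces and its width parameter are made up for illustration. It only ever breaks at a space, so a word such as color="red" stays in one piece. A quoted attribute value that itself contains spaces would still be split, which this sketch does not handle.

```cpp
#include <string>

// Wrap `text` so that lines stay within `width` characters where possible,
// breaking only at spaces. A word longer than `width` is left intact.
std::string wrapAtSpaces(const std::string& text, std::size_t width) {
    std::string out;
    std::size_t lineStart = 0;
    while (text.size() - lineStart > width) {
        // Last space that still keeps the current line within the limit.
        std::size_t brk = text.rfind(' ', lineStart + width);
        if (brk == std::string::npos || brk <= lineStart) {
            // No usable space in range: break at the first space after the limit.
            brk = text.find(' ', lineStart + width);
            if (brk == std::string::npos) break;  // one long word remains
        }
        out.append(text, lineStart, brk - lineStart);
        out += '\n';
        lineStart = brk + 1;  // skip the space that was replaced by the break
    }
    out.append(text, lineStart, std::string::npos);
    return out;
}
```

Called as wrapAtSpaces("<font color=\"red\">", 8), this breaks after <font and keeps color="red"> whole, matching the "right" example above.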


Question by:neo23
36 Comments
 
LVL 2

Expert Comment

by:smitty1276
You already have the code to parse out the tags... right?
0
 
LVL 30

Expert Comment

by:Axter
Please give an example of the input.
Like what is the raw data?
You gave us a good example of what you want the results to look like, but we need an equally good example of what you're starting with.
0
 
LVL 30

Expert Comment

by:Axter
I should have suggested to you in your previous question that you might get better results if you post your MFC questions on the MFC topic area.

Since you already posted this question here, you can post a question in the MFC topic area with zero points, and put a link to this question in your zero point question.
0
 
LVL 2

Expert Comment

by:Andrey_Kulik
Hi

It's a very simple app... is this homework or not?

I have some questions about it:

1. Are <![CDATA[ ]]> sections possible?
2. Is the HTML correct?
3. Is the HTML well-formed? (Is <tag attr=value> possible, or do all attribute values have quotes, as in <tag attr="value"> and <tag attr='value'>?)

Best regards
Andrey

0
 

Author Comment

by:neo23
All tag names like <TABLE> are in uppercase, and all the HTML code comes without line breaks. The code comes directly from mshtml.dll. You just have to help me format the tables, nothing else..
0
 
LVL 30

Expert Comment

by:Axter
Is this homework?
0
 

Author Comment

by:neo23
Nope..I have not been in school since 1995
0
 

Author Comment

by:neo23
Just a VB dude switching to C, that's all..
0
 
LVL 30

Expert Comment

by:Axter
I should have figured out this wasn't homework, because you're using CString.  But it doesn't hurt to ask.
0
 
LVL 30

Expert Comment

by:Axter
>>help me format the tables..nothing else

So exactly what does the raw data look like:
Example:
<TABLE><TBODY><TR><TD>SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF</TD></TR></TBODY></TABLE>
0
 

Author Comment

by:neo23
Yes, just like that
0
 

Author Comment

by:neo23
But the important thing to remember is that there can be a table within a table. If that is the case, the inner table should be indented one step further to the right than the parent table. You get me, right?

<TABLE>
 <TBODY>
    <TR>
      <TD>
          SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK
          DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS
          LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF
      </TD>
      <TD>
          <TABLE>
             <TBODY>
                <TR>
                   <TD>
                    SDAKFOLASDKFLOASDFKLOASDFKLASDFK
                    LASDKFLOKASDFOLKASDL
                   </TD>
                </TR>
             </TBODY>
           </TABLE>
      </TD>
    </TR>
 </TBODY>
</TABLE>
0
 

Author Comment

by:neo23
I've asked a whole lot of programmers this, but they say it's too hard, maybe it can't be done, or you need to ask some hacker or something..
0
 
LVL 30

Expert Comment

by:Axter
Just out of curiosity, why do you want to do this?
It's going to look the same in an HTML browser.
0
 
LVL 30

Expert Comment

by:Axter
>>maybe it can't be done

It can be done, but it takes some good logic. It needs a good algorithm.
0
 

Author Comment

by:neo23
Can you do it? I know it's going to look the same; it's for a text editor I'm working on.
0
 
LVL 30

Expert Comment

by:Axter
My schedule is tight this week, but if I have time, I'll give it a shot.
0
 
LVL 9

Expert Comment

by:Pacman
listening and waiting for Axter's code ...
(hope it looks good).
0
 
LVL 9

Expert Comment

by:Pacman
> I've asked a whole lot of programmers this, but they say it's too hard,

professionals or hobby programmers?  ;-)
0
 
LVL 9

Expert Comment

by:Pacman
I'm just asking because the statement has different meanings:

a) said by hobby programmer
=> means: I don't know how to do this.

b) said by professional programmer
=> means: I don't have the time to do this.

;-)
0
 

Author Comment

by:neo23
both I guess :)
0
 
LVL 49

Expert Comment

by:DanRollins
Read it into a browser and access the HTML DOM. Then you can easily walk the table objects. An interesting challenge, nay, an actual intellectual exercise, but not worth doing.
0
 

Author Comment

by:neo23
Yes it is, because the DOM is slow when you walk large documents. I mean, it's not that big a deal; I could do it in five minutes in VB. Remember that all the code is on one line. Just find the first set of <TABLE><TBODY><TR><TD> and indent it. Keep track of it: if no </TBODY></TABLE> has been seen yet, indent the next <TABLE><TBODY><TR><TD> one step further to the right. Why is everything so hard to do in C++? I could do it myself using the CString library, but it tends to slow down..
0
 
LVL 9

Expert Comment

by:Pacman
>> I could do it myself using the CString library, but it tends to slow down..

CString is too slow?
There must be something wrong with your algorithm.
If you need good performance then what about writing a parser?

This is a very simple parser-application.
"building compilers lesson 1".

Build a grammar and parse the text token for token.
Then create output with indented text.

But this is quite a lot of work.
Maybe there's someone out there doing this for you.
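
As a rough idea of what "parse the text token for token" could look like, here is a small tokenizer sketch. It uses std::string instead of CString, and the Token struct and tokenize function are hypothetical names, not anything from MFC. It assumes '<' always starts a tag and that no '>' appears inside an attribute value.

```cpp
#include <string>
#include <vector>

// One lexical unit of the input: either a complete tag or a run of text.
struct Token {
    bool isTag;        // true for "<...>", false for plain text
    std::string text;  // the tag including its angle brackets, or the text
};

// Split single-line HTML into alternating tag and text tokens.
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            std::size_t end = html.find('>', pos);
            if (end == std::string::npos) end = html.size() - 1;  // unterminated tag: take the rest
            tokens.push_back({true, html.substr(pos, end - pos + 1)});
            pos = end + 1;
        } else {
            std::size_t end = html.find('<', pos);
            if (end == std::string::npos) end = html.size();  // trailing text
            tokens.push_back({false, html.substr(pos, end - pos)});
            pos = end;
        }
    }
    return tokens;
}
```

The output step then only has to walk the token list once and decide, per tag, whether the indent level goes up or down.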
0
 
LVL 2

Expert Comment

by:Andrey_Kulik
You can use an XML parser, I think, and some serializers support this formatting.
0
 
LVL 2

Expert Comment

by:Andrey_Kulik
try MSXML
0
 
LVL 49

Expert Comment

by:DanRollins
>>...I could do it in five minutes in vb..

Write it in VB, and post the code here (it'll be less than 20 lines if you can write it in 5 minutes). I'll provide a translation in 3 minutes.

-- Dan
0
 
LVL 7

Expert Comment

by:KangaRoo
It is not that simple. HTML allows some elements to appear with or without a closing tag, i.e.

<div>
   <p>Sometext
</div>

and
<div>
   <p>
      Sometext
   </p>
</div>

are equivalent. Then there are the comments; basically a comment is one big tag....
0
 
LVL 2

Accepted Solution

by:smitty1276 (earned 170 total points)

You are going to have to have a linked list of tags and everything contained within them.  Well... actually it would be a tree.

Each node has a flag to indicate whether it contains information about an HTML tag or text information within other tags.  If it contains a tag, it contains the text of the tag ("<TABLE>").  Otherwise it will simply contain text.  Each node will also contain an ordered array of pointers to sub-nodes (or NULL if none exist).

Example... HTML code:
<p>This is a <b>test</b>.</p>

Data structure would be a node containing the "<p>" tag.
-The first pointer in the pointer array would contain a pointer to a sub-node which contained the text "This is a ".  
-The second pointer in the array would point to a sub-node containing the tag "<b>", which itself would have a pointer to a sub-node containing the text "test".
-The third pointer in the array would point to a sub-node containing the remaining text ".". When you run out of sub-nodes, you know to go ahead and print the "</p>" tag.

You could store entire HTML documents in this way.

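struct node
{
  node *parent;         // enclosing node, NULL for the top-level node
  int   indent_spaces;  // see note below

  int   type;           // 0 = top level, 1 = tag info, 2 = text
  char *text;           // tag or text, NULL for the top-level node

  int    sub_count;     // number of entries in subs
  node **subs;          // ordered array of pointers to sub-nodes
};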

Check the type of node... if it is a tag info node, set the indent_spaces to parent->indent_spaces + 2.  If it is a text node, set the indent_spaces equal to the parent->indent_spaces.

The top-level node would contain the entire document. Init the node to type=0, text=NULL, and set the sub-node pointers to the tags contained (possibly only <HTML>, which would contain maybe 2 pointers to <HEAD> and <BODY>).
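
For anyone who wants to try this, here is a minimal sketch of the same tree in more modern C++. It is an assumption-laden variant, not the struct above: std::vector and std::unique_ptr replace the raw pointer array, the indent is passed down while printing instead of being stored in each node, and the names Node, add, closingTag and render are hypothetical. Unlike the note above, it prints text two spaces deeper than its parent tag so the text lines up the way the question's example shows.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class NodeType { Root, Tag, Text };

struct Node {
    NodeType type = NodeType::Root;
    std::string text;                         // "<p>" for a tag, the raw text otherwise
    std::vector<std::unique_ptr<Node>> subs;  // ordered sub-nodes

    // Append a sub-node and return a pointer to it for further nesting.
    Node* add(NodeType t, const std::string& s) {
        std::unique_ptr<Node> child(new Node);
        child->type = t;
        child->text = s;
        subs.push_back(std::move(child));
        return subs.back().get();
    }
};

// Derive "</p>" from "<p>" or "<p align=left>".
std::string closingTag(const std::string& openTag) {
    std::size_t end = openTag.find_first_of(" >");
    return "</" + openTag.substr(1, end - 1) + ">";
}

// Print the tree depth-first; every nesting level adds two spaces.
void render(const Node& node, std::size_t indent, std::string& out) {
    const std::string pad(indent, ' ');
    switch (node.type) {
    case NodeType::Text:
        out += pad + node.text + "\n";
        break;
    case NodeType::Tag:
        out += pad + node.text + "\n";
        for (const auto& sub : node.subs) render(*sub, indent + 2, out);
        out += pad + closingTag(node.text) + "\n";  // out of sub-nodes: close the tag
        break;
    case NodeType::Root:
        for (const auto& sub : node.subs) render(*sub, indent, out);
        break;
    }
}
```

Building the <p>This is a <b>test</b>.</p> example by hand with add and then calling render(root, 0, out) puts <p> at column 0, the text and <b> two spaces in, and "test" four spaces in.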
0
 
LVL 2

Expert Comment

by:smitty1276
BTW... certain specific tags that do not use a closing tag should always be treated as text, "<BR>" as an example, since they don't contain any information and therefore will have no sub-nodes. Character entities such as "&nbsp;" can be treated the same way.

It would help to have a file containing all of the tags that may or MAY NOT have closing tags.  That way you can assume that any tag that isn't on the list WILL DEFINITELY have a closing tag.
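
A small sketch of such a list, hard-coded rather than read from a file. The set below holds the elements that HTML 4.01 declares as empty, so they never have a closing tag. Elements whose closing tag is merely optional (such as <p>) are not covered and would need a second list. The function names tagName and isVoidElement are made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Extract the upper-cased element name from "<br>", "</TD>" or "<IMG src=x>".
std::string tagName(const std::string& tag) {
    if (tag.size() < 2) return "";
    std::size_t start = (tag[1] == '/') ? 2 : 1;
    std::size_t end = tag.find_first_of(" \t/>", start);
    std::string name = tag.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return name;
}

// True if the tag can never have a closing tag and should be kept as text.
bool isVoidElement(const std::string& tag) {
    static const std::unordered_set<std::string> kVoid = {
        "AREA", "BASE", "BASEFONT", "BR", "COL", "FRAME", "HR",
        "IMG", "INPUT", "ISINDEX", "LINK", "META", "PARAM"};
    return kVoid.count(tagName(tag)) > 0;
}
```

A tree builder would call isVoidElement on every opening tag and, when it returns true, store the tag as a text node instead of descending into it.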
0
 
LVL 49

Expert Comment

by:DanRollins
Well, all you really need to search for is "<TABLE" and then process until you get to the matching "</TABLE>".  As you encounter <TR and <TD, you do some indenting.  Just ignore <DIV et al.

CString sHtml = sHtmlOrig;   // work on a copy so the original keeps its case
sHtml.MakeUpper();           // uppercase the copy so comparisons are case-insensitive

int nOffset = sHtml.Find( "<TABLE" );   // -1 if there is no table

Now you can use sSubStr = sHtml.Mid( nOffset ) to get a string that starts at the start of the table, and you can scan forward with sSubStr.Find("whatever").  Remember to take the output from the original, since the working copy has been upshifted for simplicity of comparison.
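
To make the scanning idea concrete, here is a minimal sketch of a single-pass formatter. It is written against std::string so it compiles without MFC; find and substr play the role of CString::Find and CString::Mid. The function names (formatTables, isStructural, emitLine), the two-space indent step and the choice of TABLE, TBODY, TR and TD as the only tags that get their own line are all assumptions for illustration. Instead of uppercasing a copy of the whole document, it uppercases only each tag name, and it keeps a depth counter so a nested table lands further to the right.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Tags that get a line of their own and change the indent level.
static bool isStructural(const std::string& name) {
    return name == "TABLE" || name == "TBODY" || name == "TR" || name == "TD";
}

// Append one trimmed line at the given depth; blank content is skipped.
static void emitLine(std::string& out, int depth, const std::string& content) {
    std::size_t first = content.find_first_not_of(' ');
    if (first == std::string::npos) return;
    std::size_t last = content.find_last_not_of(' ');
    out.append(static_cast<std::size_t>(depth) * 2, ' ');
    out.append(content, first, last - first + 1);
    out += '\n';
}

std::string formatTables(const std::string& html) {
    std::string out, pending;  // pending collects text and inline tags
    int depth = 0;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t lt = html.find('<', pos);
        std::size_t gt = (lt == std::string::npos) ? lt : html.find('>', lt);
        if (gt == std::string::npos) {  // no complete tag left: rest is text
            pending.append(html, pos, std::string::npos);
            break;
        }
        pending.append(html, pos, lt - pos);  // text in front of the tag
        const std::string tag = html.substr(lt, gt - lt + 1);
        const bool closing = tag.size() > 1 && tag[1] == '/';
        const std::size_t nameStart = closing ? 2 : 1;
        std::string name =
            tag.substr(nameStart, tag.find_first_of(" >", nameStart) - nameStart);
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        if (isStructural(name)) {
            emitLine(out, depth, pending);  // flush cell content first
            pending.clear();
            if (closing) {
                if (depth > 0) --depth;
                emitLine(out, depth, tag);
            } else {
                emitLine(out, depth, tag);
                ++depth;
            }
        } else {
            pending += tag;  // <B>, <FONT>, <BR> and friends stay inline
        }
        pos = gt + 1;
    }
    emitLine(out, depth, pending);
    return out;
}
```

Because the input is read once and the output is appended to a single string, the cost grows linearly with the document size, which is probably where a version built from many Mid calls loses time.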

-- Dan

0
 

Author Comment

by:neo23
DanRollins, you are on the right track. I'm just new to C. Could you show me how? Just a code sample..
0
 
LVL 49

Expert Comment

by:DanRollins
>>could you show me how... just a code sample..

Well, I could write your program for you but that would take away all of your fun!  I've got a better idea...

I suggest that you try it yourself.  It will be a good chance for you to learn about the CString data type.  If you run into any snags, just post a question here.  I'll be glad to help.  As I said, I'll even coach you in how to convert a VB function.

-- Dan
0
 
LVL 49

Expert Comment

by:DanRollins
hi neo23,
Do you have any additional questions?  Do any comments need clarification?

-- Dan
0
