neo23

Format html code

I want a function that formats HTML table code the way FrontPage does, like this:

<TABLE>
  <TBODY>
     <TR>
       <TD>
           SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK
           DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS
           LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF
       </TD>
     </TR>
  </TBODY>
</TABLE>

It has to be very fast and use CString. The code I want to pass to the function has no leading spaces. The code should also never break a line in the middle of a word, like:
<font co
lor="red">

This is right:
<font
color="red">

You understand, right?
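
The "never break inside a word" rule can be sketched like this. It is only an illustration under assumptions: std::string stands in for CString, WrapAtSpaces is a made-up helper name, and the width is whatever the editor wants. It breaks only at spaces, and if a token is longer than the width it lets the line run long rather than splitting the token.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` into lines of at most `width` characters,
// breaking only at spaces. A token longer than `width` is kept whole.
std::vector<std::string> WrapAtSpaces(const std::string& text, std::size_t width) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start < text.size()) {
        if (text.size() - start <= width) {      // the rest fits on one line
            lines.push_back(text.substr(start));
            break;
        }
        // Last space at or before the width limit.
        std::size_t cut = text.rfind(' ', start + width);
        if (cut == std::string::npos || cut <= start) {
            // No usable space: run long until the next space instead of splitting.
            cut = text.find(' ', start + width);
            if (cut == std::string::npos) {
                lines.push_back(text.substr(start));
                break;
            }
        }
        lines.push_back(text.substr(start, cut - start));
        start = cut + 1;                         // skip the space we broke at
    }
    return lines;
}
```

Calling WrapAtSpaces on the font tag from the example with a width of 8 yields the two lines shown as "right" above. Spaces inside quoted attribute values are not treated specially in this sketch.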


Avatar of smitty1276
smitty1276

You already have the code to parse out the tags... right?
Avatar of Axter
Please give an example of the input.
Like what is the raw data?
You gave us a good example of what you want the results to look like, but we need an equally good example of what you're starting with.
I should have suggested to you in your previous question that you might get better results if you post your MFC questions on the MFC topic area.

Since you already posted this question here, you  can post a question in the MFC topic area with zero points, and put a link to this question in your zero point question.
Hi

It's a very simple app... is this homework or not?

I have some questions about it:

1. Are <![CDATA[ ]]> sections possible?
2. Is the HTML correct?
3. Is the HTML well-formed? (Is <tag attr=value> possible, or do all attribute values have quotes: <tag attr="value"> and <tag attr='value'>?)

Best regards
Andrey

Avatar of neo23

ASKER

All tag names, like <TABLE>, are in uppercase, and all the HTML comes without line breaks. The code comes directly from mshtml.dll. You just have to help me format the tables, nothing else.
Is this homework?
Avatar of neo23

ASKER

Nope..I have not been in school since 1995
Avatar of neo23

ASKER

Just a VB dude switching to C, that's all.
I should have figured out this wasn't homework, because you're using CString.  But it doesn't hurt to ask.
>>help me format the tables..nothing else

So exactly what does the raw data look like:
Example:
<TABLE><TBODY><TR><TD>SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF</TD></TR></TBODY></TABLE>
Avatar of neo23

ASKER

Yes, just like that
Avatar of neo23

ASKER

But the important thing to remember is that there can be a table within a table. In that case, the inner table should be indented one step further to the right than its parent table. You get me, right?

<TABLE>
 <TBODY>
    <TR>
      <TD>
          SDAKFOLASDKFLOASDFKLOASDFKLOASDKFLOASDFK
          DSAFLOASDKFLASKDFOLAKSDFLOASDKFOAKSLDFKOAS
          LASDKFLOKASDFOLKASDLOFKASDOFKLAOSDFKLOASDKLF
      </TD>
      <TD>
          <TABLE>
             <TBODY>
                <TR>
                   <TD>
                    SDAKFOLASDKFLOASDFKLOASDFKLASDFK
                    LASDKFLOKASDFOLKASDL
                   </TD>
                </TR>
             </TBODY>
           </TABLE>
      </TD>
    </TR>
 </TBODY>
</TABLE>
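
The nested layout above could be produced along these lines. This is a sketch under stated assumptions, not anyone's posted solution: std::string stands in for CString, the indent is a fixed two spaces per level (the example above uses uneven steps), and FormatTables, IsTableTag and AppendLine are made-up names. Only uppercase TABLE, TBODY, TR and TD are treated as structure; every other tag stays inline with the text.

```cpp
#include <string>

namespace {

// Only these tags get their own line and change the indent level.
bool IsTableTag(const std::string& name) {
    return name == "TABLE" || name == "TBODY" || name == "TR" || name == "TD";
}

// Write one output line, indented two spaces per nesting level.
void AppendLine(std::string& out, int depth, const std::string& text) {
    out.append(static_cast<std::size_t>(depth) * 2, ' ');
    out += text;
    out += '\n';
}

}  // namespace

// Hypothetical formatter for single-line HTML as described in the thread.
std::string FormatTables(const std::string& html) {
    std::string out;
    std::string pending;   // text and non-table tags waiting to be written
    int depth = 0;
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] != '<') { pending += html[pos++]; continue; }
        const std::size_t end = html.find('>', pos);
        if (end == std::string::npos) {          // unterminated tag: keep as text
            pending += html.substr(pos);
            break;
        }
        const std::string tag = html.substr(pos, end - pos + 1);
        pos = end + 1;
        const bool closing = tag.size() > 2 && tag[1] == '/';
        const std::size_t nameStart = closing ? 2 : 1;
        const std::size_t nameEnd = tag.find_first_of(" \t>", nameStart);
        const std::string name = tag.substr(nameStart, nameEnd - nameStart);
        if (!IsTableTag(name)) { pending += tag; continue; }
        if (!pending.empty()) { AppendLine(out, depth, pending); pending.clear(); }
        if (closing) {
            if (depth > 0) --depth;              // closing tag lines up with its opener
            AppendLine(out, depth, tag);
        } else {
            AppendLine(out, depth, tag);
            ++depth;                             // children go one level deeper
        }
    }
    if (!pending.empty()) AppendLine(out, depth, pending);
    return out;
}
```

It makes a single pass over the input and appends to one output string, so the cost is linear in the length of the document. Comments and a ">" inside a quoted attribute value are not handled, and cell text is written as one line rather than wrapped.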
Avatar of neo23

ASKER

I've asked a whole lot of programmers this, but they say it's too hard, that maybe it can't be done, or that I need to ask some hacker or something.
Just out of curiosity, why do you want to do this?
It's going to look the same in an HTML browser.
>>maybe it cant be done

It can be done, but it takes some good logic. You need a good algorithm.
Avatar of neo23

ASKER

Can you do it? I know it's going to look the same; it's for a text editor I'm working on.
My schedule is tight this week, but if I have time, I'll give it a shot.
listening and waiting for Axter's code ...
(hope it looks good).
> Ive asked a whole lot of programmers this, but they say its to hard,

professionals or hobby programmers?  ;-)
I'm just asking because the statement has different meanings:

a) said by hobby programmer
=> means: I don't know how to do this.

b) said by professional programmer
=> means: I don't have the time to do this.

;-)
Avatar of neo23

ASKER

both I guess :)
Read it into a browser and access the HTML DOM. Then you can easily walk the table objects. An interesting challenge, nay, an actual intellectual exercise, but not worth doing.
Avatar of neo23

ASKER

Yes it is, because the DOM is slow when you walk large documents. I mean, it's not that big a deal. I could do it in five minutes in VB. Remember, all the code is on one line. Just find the first set of <TABLE><TBODY><TR><TD> and indent it; keep track of it, and if there has been no </TBODY></TABLE> yet, indent the next <TABLE><TBODY><TR><TD> one step further to the right. Why is everything so hard to do in C++? I could do it myself using the CString library, but it tends to slow down.
>> I could do it myself using the CString library, but it tends to slow down..

CString is too slow?
There must be something wrong with your algorithm.
If you need good performance then what about writing a parser?

This is a very simple parser-application.
"building compilers lesson 1".

Build a grammar and parse the text token by token.
Then create output with indented text.

But this is quite a lot of work.
Maybe there's someone out there who will do this for you.
You can use an XML parser, I think, and some serializers support this formatting.
Try MSXML.
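
For what "token by token" might mean here, a minimal sketch: split the input into tag tokens and text tokens, then let a second step decide the indentation. Token and Tokenize are made-up names, std::string stands in for CString, and there is no real grammar behind it, just a scan for "<" and ">".

```cpp
#include <string>
#include <vector>

// One piece of the input: either a whole tag like "<TD>" or a run of text.
struct Token {
    bool isTag;
    std::string text;
};

// Hypothetical tokenizer: everything from '<' to the next '>' is a tag,
// everything else is text.
std::vector<Token> Tokenize(const std::string& html) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            const std::size_t end = html.find('>', pos);
            if (end == std::string::npos) {
                // Unterminated tag: treat the remainder as text and stop.
                tokens.push_back({false, html.substr(pos)});
                break;
            }
            tokens.push_back({true, html.substr(pos, end - pos + 1)});
            pos = end + 1;
        } else {
            std::size_t next = html.find('<', pos);
            if (next == std::string::npos) next = html.size();
            tokens.push_back({false, html.substr(pos, next - pos)});
            pos = next;
        }
    }
    return tokens;
}
```

Separating the scan from the output step keeps the indentation logic small, at the price of building a vector of tokens first.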
>>...I could do it in five minutes in vb..

Write it in VB, and post the code here (it'll be less than 20 lines if you can write it in 5 minutes). I'll provide a translation in 3 minutes.

-- Dan
It is not that simple. HTML allows some elements to appear with or without a closing tag, i.e.

<div>
   <p>Sometext
</div>

and
<div>
   <p>
      Sometext
   </p>
</div>

are equivalent. Then there are comments; basically, a comment is one big tag.
ASKER CERTIFIED SOLUTION
Avatar of smitty1276
smitty1276

This solution is only available to members.
BTW... certain, specific tags that do not use a closing tag should always be treated as text... "<BR>" as an example... since they don't contain any information and therefore will have no sub-nodes.  "&nbsp;" might be another example.

It would help to have a file containing all of the tags that may or MAY NOT have closing tags.  That way you can assume that any tag that isn't on the list WILL DEFINITELY have a closing tag.
Well, all you really need to search for is "<TABLE" and then process until you get to a matching "</TABLE>".  As you encounter <TR and <TD, you do some indenting.  Just ignore <DIV et al.
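
A rough idea of what such a list could look like in code rather than in a file. The names NeverHasClosingTag and MayOmitClosingTag are made up, and the two sets are illustrative samples of HTML 4 elements, not complete lists. Tag names are assumed to be uppercase already, as in the asker's input.

```cpp
#include <set>
#include <string>

// Illustrative, not exhaustive: HTML 4 elements that never take a closing tag.
bool NeverHasClosingTag(const std::string& upperName) {
    static const std::set<std::string> kEmpty = {
        "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT", "LINK", "META", "PARAM"};
    return kEmpty.count(upperName) != 0;
}

// Illustrative, not exhaustive: HTML 4 elements whose closing tag is optional.
bool MayOmitClosingTag(const std::string& upperName) {
    static const std::set<std::string> kOptional = {
        "P", "LI", "DT", "DD", "TR", "TD", "TH", "THEAD", "TBODY", "TFOOT", "OPTION"};
    return kOptional.count(upperName) != 0;
}
```

Any tag that answers false to both checks can then be assumed to have a closing tag, which is the rule described above.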

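CString sHtml = sHtmlOrig;   // working copy; the original stays untouched
sHtml.MakeUpper();           // upshift so the comparisons are case-insensitive

int nOffset = sHtml.Find( "<TABLE" );   // -1 if there is no table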

Now you can use sSubStr = sHtml.Mid( nOffset ) to get a string that starts at the start of the table, and you can scan forward with sSubStr.Find("whatever").  Remember to take the output from the original string, since the working copy has been upshifted for simplicity of comparison.
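
One way the search-on-an-uppercase-copy idea might look when nested tables are counted. This is a sketch with assumptions: std::string and find stand in for CString and Find, and ToUpperCopy and ExtractFirstTable are made-up names. Offsets are found in the upshifted copy, but the returned text is cut from the original.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Uppercase copy for case-insensitive searching (ASCII input assumed,
// so offsets in the copy match offsets in the original).
std::string ToUpperCopy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Hypothetical helper: return the first complete <TABLE>...</TABLE> span,
// nested tables included, taken from `original`. Returns "" if there is
// no table or the tags do not balance.
std::string ExtractFirstTable(const std::string& original) {
    const std::string upper = ToUpperCopy(original);
    const std::size_t start = upper.find("<TABLE");
    if (start == std::string::npos) return "";
    int depth = 0;
    std::size_t pos = start;
    while (true) {
        const std::size_t open = upper.find("<TABLE", pos);
        const std::size_t close = upper.find("</TABLE>", pos);
        if (close == std::string::npos) return "";   // unbalanced input
        if (open != std::string::npos && open < close) {
            ++depth;                                 // another table opens first
            pos = open + 6;                          // 6 == length of "<TABLE"
        } else {
            --depth;
            pos = close + 8;                         // 8 == length of "</TABLE>"
            if (depth == 0) return original.substr(start, pos - start);
        }
    }
}
```

The plain substring search would also match a longer tag name that merely starts with TABLE, so a stricter check on the character after the name may be wanted.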

-- Dan

Avatar of neo23

ASKER

DanRollins, you are on the right track. I'm just new to C; could you show me how? Just a code sample.
>>could you show me how... just a code sample..

Well, I could write your program for you but that would take away all of your fun!  I've got a better idea...

I suggest that you try it yourself.  It will be a good chance for you to learn about the CString data type.  If you run into any snags, just post a question here.  I'll be glad to help.  As I said, I'll even coach you in how to convert a VB function.

-- Dan
hi neo23,
Do you have any additional questions?  Do any comments need clarification?

-- Dan