Solved

Http Question...

Posted on 2001-07-14
Last Modified: 2012-06-21
Dear Experts,

I want to get information from a remote website. I am using the MFC classes to fetch it, and that part works fine: I can retrieve the whole HTML file. What I want, however, is only the text content of the web page. I am using the following code.

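try
{
    // Session lives on the stack, so it is closed when the block exits.
    CInternetSession session("new");
    DWORD dwServiceType = AFX_INET_SERVICE_HTTP;
    CString pServer;
    CString pObject;
    INTERNET_PORT nport;
    CString strUrl = "http://www.google.com";
    AfxParseURL(strUrl, dwServiceType, pServer, pObject, nport);
    CHttpConnection* httpcon = session.GetHttpConnection(pServer, nport);
    // HTTP_VERB_GET is the named constant for the literal 1 used originally.
    CHttpFile* httpFile = httpcon->OpenRequest(CHttpConnection::HTTP_VERB_GET, pObject, NULL, 1, NULL, NULL, INTERNET_FLAG_EXISTING_CONNECT);
    BOOL result = httpFile->SendRequest(NULL, 0, NULL, 0);
    CStdioFile str("c:\\test.txt", CFile::modeCreate | CFile::modeReadWrite);

    CString text; // line buffer (was never declared in the original)
    while (httpFile->ReadString(text))
    {
        str.WriteString(text + "\n");
    }

    // Release the file and connection objects returned by MFC.
    str.Close();
    httpFile->Close();
    delete httpFile;
    httpcon->Close();
    delete httpcon;
}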
catch (CInternetException* thro)
{
    TCHAR strError[255];
    cout << "There was some error";
    thro->GetErrorMessage(strError, 255);
    cout << strError << endl;
    thro->Delete(); // MFC exceptions caught by pointer must be deleted
}

How can I do this?

Thanks.
Question by:jamesasp
6 Comments
 
LVL 32

Expert Comment

by:jhance
ID: 6282288
>>whole html file but what i want is only text content in the webpage

Can you be more specific?  The "text content" and the "html file" for a web page are the SAME THING.
 
LVL 7

Expert Comment

by:KangaRoo
ID: 6283154
You mean you want the markup tags removed (along with all comment, script and style elements)?
Then you'd parse the file and discard anything that's not between <body> and </body>.
Within the body element, remove the markup tags: basically anything between the angle brackets '<' and '>'.
This also takes care of comments and most script and style declarations, since most HTML designers wrap those in comments.
Finally, replace &xxx; codes (like &nbsp; and &lt;) with their proper characters.
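
The steps above could be sketched roughly as follows, in plain standard C++ with no MFC. This is only an illustration, not a real HTML parser: the helper names htmlToText and decodeEntities are made up for this example, the entity table covers just a handful of codes, &nbsp; is mapped to an ordinary space for simplicity, the <body> search is case-sensitive, and a '>' inside a comment or script ends the skipped region early. Script or style text that is not wrapped in a comment would also leak through.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: decode a few named entities; unknown ones are left untouched.
std::string decodeEntities(const std::string& in) {
    static const std::unordered_map<std::string, std::string> table = {
        {"nbsp", " "}, {"lt", "<"}, {"gt", ">"}, {"amp", "&"}, {"quot", "\""}};
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '&') {
            std::size_t semi = in.find(';', i + 1);
            if (semi != std::string::npos) {
                auto it = table.find(in.substr(i + 1, semi - i - 1));
                if (it != table.end()) {
                    out += it->second;
                    i = semi;  // skip past the ';'
                    continue;
                }
            }
        }
        out += in[i];
    }
    return out;
}

// Hypothetical helper: keep the <body> content, drop tags, decode entities,
// then collapse runs of whitespace into single spaces.
std::string htmlToText(const std::string& html) {
    // Step 1: keep only what lies between <body ...> and </body>, if both exist.
    std::size_t begin = html.find("<body");
    std::size_t end = html.rfind("</body>");
    std::string body = html;
    if (begin != std::string::npos && end != std::string::npos && begin < end) {
        body = html.substr(begin, end - begin);
    }

    // Step 2: drop everything between '<' and '>'; each tag becomes a space
    // so that text from adjacent elements does not run together.
    std::string stripped;
    bool inTag = false;
    for (char c : body) {
        if (c == '<') {
            inTag = true;
        } else if (c == '>' && inTag) {
            inTag = false;
            stripped += ' ';
        } else if (!inTag) {
            stripped += c;
        }
    }

    // Step 3: decode entities after stripping, so "&lt;b&gt;" stays literal text.
    std::string decoded = decodeEntities(stripped);

    // Step 4: collapse whitespace and trim both ends.
    std::string text;
    bool pendingSpace = false;
    for (char c : decoded) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            pendingSpace = !text.empty();
        } else {
            if (pendingSpace) text += ' ';
            pendingSpace = false;
            text += c;
        }
    }
    return text;
}
```

Note that because step 1 throws away everything outside the body element, any text placed in the head section is dropped as well. For real-world pages a proper HTML parser is still the safer choice.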
 

Author Comment

by:jamesasp
ID: 6283239
Hi,

Thanks for your comments. Yes, what KangaRoo described is right.
I want only the text, excluding the HTML markup.
For example, consider the following page.

<html>
<head>Welcome</head>
<body>
<table>
<tr>
<td>Hello welcome to this page</td>
</tr>
</table>
</body>
</html>

From this page I want to extract only

"Welcome" and "Hello welcome to this page".

How can I do this? Is there any command in MFC to get the inner text, the way innerText does when using the Internet Explorer application object?

Thanks
 
LVL 32

Expert Comment

by:jhance
ID: 6283338
No, there are no MFC classes or functions to parse the HTML.  For this you need an HTML parser.  See:

http://www.w3.org/MarkUp/implementations.html

for information and software to do this.

What you are trying to do is NON-TRIVIAL!
 

Accepted Solution

by:
tyronen earned 30 total points
ID: 6288496
There is a sample showing how to use Microsoft's MSHTML as an HTML parser. You must have IE 4.0 or later.

http://msdn.microsoft.com/downloads/samples/internet/default.asp?url=/downloads/samples/internet/browser/walkall/default.asp

- tyronen
 

Author Comment

by:jamesasp
ID: 6322031
Thanks to all.
