Http Question...

Posted on 2001-07-14
Last Modified: 2012-06-21
Dear Experts,

I want to get information from a remote website. I am using the MFC classes to retrieve it, and that works fine: I can get the whole HTML file. What I want, though, is only the text content of the web page. I am using the following code.

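#include <afxinet.h>   // CInternetSession, CHttpConnection, CHttpFile, AfxParseURL
#include <iostream>
using namespace std;

// A stack session is closed automatically when it goes out of scope.
CInternetSession session("new");
CHttpConnection* httpcon = NULL;
CHttpFile* httpFile = NULL;
try
{
    DWORD dwServiceType = AFX_INET_SERVICE_HTTP;
    CString pServer;
    CString pObject;
    INTERNET_PORT nport;
    CString strUrl = "http://www.google.com";
    // Split the URL into server name, object path and port.
    AfxParseURL(strUrl,dwServiceType,pServer,pObject,nport);
    httpcon = session.GetHttpConnection(pServer,nport);
    httpFile = httpcon->OpenRequest(CHttpConnection::HTTP_VERB_GET,pObject,NULL,1,NULL,NULL,INTERNET_FLAG_EXISTING_CONNECT);
    BOOL result = httpFile->SendRequest(NULL,0,NULL,0);
    CStdioFile str("c:\\test.txt",CFile::modeCreate | CFile::modeReadWrite);
    CString text;   // receives one line per ReadString call (was undeclared)

    // ReadString strips the line terminator, so write it back out.
    while(httpFile->ReadString(text))
    {
        str.WriteString(text+"\n");
    }
}
catch(CInternetException * thro)
{
    TCHAR strError[255];
    cout<<"There was some error: ";
    thro->GetErrorMessage(strError,255);
    cout<<strError<<endl;
    thro->Delete();   // MFC exception objects must be deleted by the handler
}
// Release the request and connection objects (the original leaked them).
if (httpFile) { httpFile->Close(); delete httpFile; }
if (httpcon) { httpcon->Close(); delete httpcon; }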

How can I do this?

Thanks.
Question by:jamesasp
6 Comments
 
LVL 32

Expert Comment

by:jhance
ID: 6282288
>>whole html file but what i want is only text content in the webpage

Can you be more specific?  The "text content" and the "html file" for a web page are the SAME THING.
 
LVL 7

Expert Comment

by:KangaRoo
ID: 6283154
You mean you want the markup tags removed (along with all comment, script, and style elements)?
Then you'd parse the file and discard anything that's not between <body> and </body>.
Within the body element, remove the markup tags: basically anything between the angle brackets '<' and '>'.
This also takes care of the comments and most script and style declarations, since most HTML designers place those within comments.
Finally, replace &xxx; entity codes (like &nbsp; and &lt;) with their proper characters.
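KangaRoo's three steps can be sketched in standard C++ roughly as follows. This is a minimal illustration under stated assumptions, not a real HTML parser: the function names (`extractBody`, `stripTags`, `decodeEntities`, `htmlToText`) are hypothetical, only five named entities are handled, `&nbsp;` is mapped to a plain space, and a `>` inside an attribute value or comment will confuse it. It also assumes the lines read with `CHttpFile::ReadString` have been joined into one string and converted to `std::string`; that glue code is not shown.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Step 1: keep only what lies between <body ...> and </body>.
// Tag names are matched case-insensitively; if there is no <body> tag,
// the whole input is returned unchanged.
std::string extractBody(const std::string& html)
{
    std::string lower(html);
    for (char& c : lower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    std::string::size_type open = lower.find("<body");
    if (open == std::string::npos)
        return html;
    std::string::size_type start = lower.find('>', open);
    if (start == std::string::npos)
        return html;
    ++start;                                   // first character after <body ...>
    std::string::size_type end = lower.find("</body", start);
    if (end == std::string::npos)
        end = html.size();                     // tolerate a missing </body>
    return html.substr(start, end - start);
}

// Step 2: drop everything from '<' up to and including the next '>'.
std::string stripTags(const std::string& s)
{
    std::string out;
    bool inTag = false;
    for (char c : s) {
        if (inTag) {
            if (c == '>')
                inTag = false;
        } else if (c == '<') {
            inTag = true;
        } else {
            out += c;
        }
    }
    return out;
}

// Step 3: replace a few common named entities. Unknown entities are left
// as-is, and &nbsp; becomes an ordinary space (a simplification).
std::string decodeEntities(const std::string& s)
{
    static const struct { const char* name; char ch; } table[] = {
        {"&nbsp;", ' '}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&amp;", '&'}, {"&quot;", '"'}
    };
    std::string out;
    std::string::size_type i = 0;
    while (i < s.size()) {
        bool matched = false;
        if (s[i] == '&') {
            for (const auto& e : table) {
                std::string::size_type len = std::strlen(e.name);
                if (s.compare(i, len, e.name) == 0) {
                    out += e.ch;
                    i += len;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched)
            out += s[i++];
    }
    return out;
}

// Entities are decoded last, so "&lt;b&gt;" stays text instead of becoming a tag.
std::string htmlToText(const std::string& html)
{
    return decodeEntities(stripTags(extractBody(html)));
}
```

The order of the steps matters: decoding entities before stripping tags would turn escaped text such as `&lt;b&gt;` into markup and then delete it. Note that on a page whose text sits in `<head>` (for example a `<title>`), the body-only filter in step 1 discards that text.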
 

Author Comment

by:jamesasp
ID: 6283239
Hi,

Thanks for your comments. Yes, what KangaRoo described is right.
I want only the text, excluding the HTML tags.
For example, consider the following page:

<html>
<head>Welcome</head>
<body>
<table>
<tr>
<td>Hello welcome to this page</td>
</tr>
</table>
</body>
</html>

From this page I want to extract only

"Welcome" and "Hello welcome to this page".

How can I do this? Is there any function in MFC to get the innerText, as we can when using the IE application?

Thanks
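In this example `Welcome` sits inside `<head>`, so a filter that keeps only the `<body>` content would lose it. A variant that scans the whole document and keeps each non-blank run of text between tags returns exactly the two strings asked for. This is a minimal sketch: `textRuns` is a hypothetical name, entities are not decoded, and comments and scripts get no special treatment.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns every run of text found between tags,
// with surrounding whitespace trimmed and blank runs dropped.
std::vector<std::string> textRuns(const std::string& html)
{
    std::vector<std::string> runs;
    std::string current;
    bool inTag = false;

    // Trim the run collected so far and store it if it is not blank.
    auto flush = [&]() {
        const char* ws = " \t\r\n";
        std::string::size_type b = current.find_first_not_of(ws);
        if (b != std::string::npos) {
            std::string::size_type e = current.find_last_not_of(ws);
            runs.push_back(current.substr(b, e - b + 1));
        }
        current.clear();
    };

    for (char c : html) {
        if (inTag) {
            if (c == '>')
                inTag = false;
        } else if (c == '<') {
            flush();          // a tag ends the current text run
            inTag = true;
        } else {
            current += c;
        }
    }
    flush();                  // text after the last tag, if any
    return runs;
}
```

Fed the page above, this yields the two runs `Welcome` and `Hello welcome to this page`, in document order.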
 
LVL 32

Expert Comment

by:jhance
ID: 6283338
No, there are no MFC classes or functions to parse the HTML.  For this you need an HTML parser.  See:

http://www.w3.org/MarkUp/implementations.html

for information and software to do this.

What you are trying to do is NON-TRIVIAL!
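One concrete reason it is non-trivial: a filter that simply drops everything between `<` and `>` misreads a comment such as `<!-- x > y -->` (it stops at the first `>` and leaks ` y --`), and it outputs the code inside `<script>` blocks, which may itself contain `<`. The sketch below is a slightly less naive stripper that skips comments and the contents of `<script>` and `<style>` elements. The function names are hypothetical, and it still ignores `>` inside quoted attribute values, CDATA sections, and malformed markup, which is the kind of case a real HTML parser handles.

```cpp
#include <cctype>
#include <string>

// Case-insensitive check that s contains the lowercase word at position pos.
static bool startsWithNoCase(const std::string& s, std::string::size_type pos,
                             const std::string& word)
{
    if (pos + word.size() > s.size())
        return false;
    for (std::string::size_type k = 0; k < word.size(); ++k)
        if (std::tolower(static_cast<unsigned char>(s[pos + k])) != word[k])
            return false;
    return true;
}

// Strip tags, but also drop <!-- ... --> comments and the contents of
// <script> and <style> elements.
std::string stripTagsSkippingCode(const std::string& html)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < html.size()) {
        if (html.compare(i, 4, "<!--") == 0) {
            // Skip the whole comment, even if it contains '>'.
            std::string::size_type end = html.find("-->", i + 4);
            i = (end == std::string::npos) ? html.size() : end + 3;
        } else if (startsWithNoCase(html, i, "<script") ||
                   startsWithNoCase(html, i, "<style")) {
            const std::string close =
                startsWithNoCase(html, i, "<script") ? "</script" : "</style";
            // Find the matching closing tag, ignoring case.
            std::string::size_type j = i + 1;
            while (j < html.size() && !startsWithNoCase(html, j, close))
                ++j;
            // Resume after the '>' that ends the closing tag.
            std::string::size_type gt = html.find('>', j);
            i = (gt == std::string::npos) ? html.size() : gt + 1;
        } else if (html[i] == '<') {
            // Ordinary tag: skip up to and including the next '>'.
            std::string::size_type gt = html.find('>', i);
            i = (gt == std::string::npos) ? html.size() : gt + 1;
        } else {
            out += html[i++];
        }
    }
    return out;
}
```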
 

Accepted Solution

by:tyronen (earned 30 total points)
ID: 6288496
There is a sample showing how to use Microsoft's MSHTML as an HTML parser. You must have IE 4.0 or later.

http://msdn.microsoft.com/downloads/samples/internet/default.asp?url=/downloads/samples/internet/browser/walkall/default.asp

- tyronen
 

Author Comment

by:jamesasp
ID: 6322031
Thanks, everyone.