Solved

Http Question...

Posted on 2001-07-14
Last Modified: 2012-06-21
Dear Experts,

I want to get information from a remote website. I am using the MFC classes to fetch it, and that part works fine. I am able to get the whole html file but what I want is only text content in the webpage. I am using the following code.

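try
{
     // Stack-allocated session; it is closed automatically when it goes out of scope.
     CInternetSession strSess("new");
     DWORD dwServiceType = AFX_INET_SERVICE_HTTP;
     CString pServer;
     CString pObject;
     INTERNET_PORT nport;
     CString strUrl = "http://www.google.com";
     AfxParseURL(strUrl, dwServiceType, pServer, pObject, nport);

     CHttpConnection* httpcon = strSess.GetHttpConnection(pServer, nport);
     // HTTP_VERB_GET replaces the magic number 1 used for the verb.
     CHttpFile* httpFile = httpcon->OpenRequest(CHttpConnection::HTTP_VERB_GET, pObject, NULL, 1, NULL, NULL, INTERNET_FLAG_EXISTING_CONNECT);
     httpFile->SendRequest(NULL, 0, NULL, 0);

     CStdioFile str("c:\\test.txt", CFile::modeCreate | CFile::modeReadWrite);
     CString text;     // line buffer; this declaration was missing
     while (httpFile->ReadString(text))
     {
          str.WriteString(text + "\n");
     }

     // Release the request and connection objects returned by MFC.
     httpFile->Close();
     delete httpFile;
     httpcon->Close();
     delete httpcon;
}
catch (CInternetException* thro)
{
     TCHAR strError[255];
     cout << "There was some error: ";
     thro->GetErrorMessage(strError, 255);
     cout << strError << endl;
     thro->Delete();     // MFC exceptions are released with Delete()
}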

How can I do this?

Thanks.
Question by:jamesasp
6 Comments
 
LVL 32

Expert Comment

by:jhance
ID: 6282288
>>whole html file but what i want is only text content in the webpage

Can you be more specific?  The "text content" and the "html file" for a web page are the SAME THING.
 
LVL 7

Expert Comment

by:KangaRoo
ID: 6283154
You mean you want the markup tags removed (along with all comment, script and style elements)?
Then you'd parse the file and discard anything that's not between <body> and </body>.
Within the body element, remove the markup tags, basically anything between the angle brackets '<' and '>'.
This also takes care of the comments and most script and style declarations, since most HTML designers wrap those in comments.
Finally, replace &xxx; codes (like &nbsp; and &lt;) with their proper characters.
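
The steps above can be sketched in portable C++ roughly as follows. This is only a minimal illustration, not a real HTML parser: the names extractBodyText, matchAt, findNoCase, skipElement, decodeEntities and collapseSpaces are hypothetical helpers made up for this sketch, only five entities are decoded, and it assumes script and style elements are closed with a matching end tag. It also skips script and style elements explicitly rather than relying on them being wrapped in comments, and it collapses runs of whitespace, which the description above does not ask for.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// True if `s` contains the lowercase string `lower` at `pos`, ignoring case.
static bool matchAt(const std::string& s, std::size_t pos, const std::string& lower) {
    if (pos + lower.size() > s.size()) return false;
    for (std::size_t j = 0; j < lower.size(); ++j)
        if (std::tolower(static_cast<unsigned char>(s[pos + j])) != lower[j]) return false;
    return true;
}

// Case-insensitive find of the lowercase string `lower`, starting at `pos`.
static std::size_t findNoCase(const std::string& s, const std::string& lower, std::size_t pos = 0) {
    for (std::size_t i = pos; i + lower.size() <= s.size(); ++i)
        if (matchAt(s, i, lower)) return i;
    return std::string::npos;
}

// Returns the index just past the end tag `closeTag`, or `limit` if it is missing.
static std::size_t skipElement(const std::string& s, std::size_t pos,
                               const std::string& closeTag, std::size_t limit) {
    std::size_t close = findNoCase(s, closeTag, pos);
    if (close == std::string::npos) return limit;
    std::size_t gt = s.find('>', close);
    return gt == std::string::npos ? limit : gt + 1;
}

// Replaces a handful of named entities; unknown ones are left untouched.
static std::string decodeEntities(const std::string& text) {
    static const std::map<std::string, char> table = {
        {"nbsp", ' '}, {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"quot", '"'}};
    std::string out;
    for (std::size_t i = 0; i < text.size();) {
        if (text[i] == '&') {
            std::size_t semi = text.find(';', i + 1);
            if (semi != std::string::npos) {
                auto it = table.find(text.substr(i + 1, semi - i - 1));
                if (it != table.end()) { out += it->second; i = semi + 1; continue; }
            }
        }
        out += text[i++];
    }
    return out;
}

// Collapses whitespace runs to one space and trims both ends.
static std::string collapseSpaces(const std::string& text) {
    std::string out;
    bool pendingSpace = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) { pendingSpace = !out.empty(); continue; }
        if (pendingSpace) out += ' ';
        pendingSpace = false;
        out += static_cast<char>(c);
    }
    return out;
}

// Keeps the text between <body> and </body>, minus tags, comments, script and style.
std::string extractBodyText(const std::string& html) {
    std::size_t begin = findNoCase(html, "<body");
    if (begin == std::string::npos) {
        begin = 0;                                   // no body tag: scan everything
    } else {
        std::size_t gt = html.find('>', begin);
        begin = (gt == std::string::npos) ? html.size() : gt + 1;
    }
    std::size_t end = findNoCase(html, "</body");
    if (end == std::string::npos || end < begin) end = html.size();
    assert(begin <= end);

    std::string raw;
    std::size_t i = begin;
    while (i < end) {
        if (html[i] != '<') { raw += html[i++]; continue; }
        if (html.compare(i, 4, "<!--") == 0) {       // comment
            std::size_t close = html.find("-->", i + 4);
            i = (close == std::string::npos) ? end : close + 3;
        } else if (matchAt(html, i, "<script")) {
            i = skipElement(html, i, "</script", end);
        } else if (matchAt(html, i, "<style")) {
            i = skipElement(html, i, "</style", end);
        } else {                                     // any other tag
            std::size_t gt = html.find('>', i);
            i = (gt == std::string::npos) ? end : gt + 1;
        }
        raw += ' ';                                  // keep words on either side apart
    }
    // Entities are decoded after stripping so that &lt;b&gt; is not mistaken for a tag.
    return collapseSpaces(decodeEntities(raw));
}
```

Calling extractBodyText on the string read from the server (instead of writing each line straight to the file) would give the stripped text. Malformed markup, a '>' inside an attribute value, or numeric entities such as &#160; are not handled here.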
 

Author Comment

by:jamesasp
ID: 6283239
Hi,

Thanks for your comments. Yes, what KangaRoo described is right.
I want only the text, excluding the HTML tags.
For example, consider the following markup.
<html>
<head>Welcome</head>
<body>
<table>
<tr>
<td>Hello welcome to this page</td>
</tr>
</table>
</body>
</html>

From this page I want to extract only

"Welcome" and "Hello welcome to this page".

How can I do this? Is there any command in MFC to get the inner text, as we can when using ieapplication?

Thanks
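
For the sample page above, the wanted result could be produced by a sketch like the one below. It is an assumption-laden illustration rather than an MFC facility: textRuns and trim are hypothetical names, the whole document is scanned (limiting the scan to the body would drop "Welcome", which sits in the head here), and comments, scripts, styles and entities are not treated specially.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: trims ASCII whitespace from both ends of `s`.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: returns every non-empty run of text found between tags.
std::vector<std::string> textRuns(const std::string& html) {
    std::vector<std::string> runs;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t gt = html.find('>', i);
            if (gt == std::string::npos) break;      // unterminated tag: stop scanning
            i = gt + 1;
            continue;
        }
        std::size_t lt = html.find('<', i);
        if (lt == std::string::npos) lt = html.size();
        assert(lt >= i);
        std::string run = trim(html.substr(i, lt - i));
        if (!run.empty()) runs.push_back(run);       // skip whitespace-only gaps
        i = lt;
    }
    return runs;
}
```

Passing the ten-line sample page to textRuns should yield two strings, "Welcome" and "Hello welcome to this page", which could then be written to the output file one per line.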
 
LVL 32

Expert Comment

by:jhance
ID: 6283338
No, there are no MFC classes or functions to parse the HTML.  For this you need an HTML parser.  See:

http://www.w3.org/MarkUp/implementations.html

for information and software to do this.

What you are trying to do is NON-TRIVIAL!
 

Accepted Solution

by:
tyronen earned 90 total points
ID: 6288496
There is a sample showing how to use Microsoft's MSHTML as an HTML parser. You must have IE 4.0 or later installed.

http://msdn.microsoft.com/downloads/samples/internet/default.asp?url=/downloads/samples/internet/browser/walkall/default.asp

- tyronen
 

Author Comment

by:jamesasp
ID: 6322031
Thanks to all.