Solved

How to load a CGI URL that only works when requested from a certain parent URL

Posted on 2004-04-18
Last Modified: 2010-08-05
I'm working in C++.

I'm using "CreateURLMoniker(m_LastMoniker, wszURL, &pMk)"
to download an HTML page to my computer.
The problem is that there is a site which contains a CGI script that works
only when it is requested from a certain URL.
I tried to code something like this (notice the m_LastMoniker):
CreateURLMoniker(NULL, parentURL, &m_LastMoniker);
hr = m_pMSHTML->QueryInterface(IID_IPersistMoniker, (LPVOID*)&pPMk);
hr = pPMk->Load(FALSE, m_LastMoniker, pBCtx, STGM_READ);
// code that waits until the page is loaded, then runs this line (with m_LastMoniker):
CreateURLMoniker(m_LastMoniker, cgiURL, &pMk);

But it doesn't work!

Any solution?
Question by:MagicianH
28 Comments
 
LVL 6

Expert Comment

by:Mafalda
ID: 10853632
Could you post the full code fragment including the cgi address ?
 

Author Comment

by:MagicianH
ID: 10857226
if you insist :)

 
// Use an asynchronous Moniker to load the specified resource
HRESULT CApp::LoadURLFromMoniker(const char* linkName)
{
      HRESULT hr;
      OLECHAR  wszURL[MAX_PATH*sizeof(OLECHAR)];
      if (0 == MultiByteToWideChar(CP_ACP, 0, linkName, -1, wszURL, MAX_PATH*sizeof(OLECHAR)))
      {
            return E_FAIL;
      }
 
      // Ask the system for a URL Moniker
      LPMONIKER pMk = NULL;
      LPBINDCTX pBCtx = NULL;
      LPPERSISTMONIKER pPMk = NULL;

      if (FAILED(hr = CreateURLMoniker(m_LastMoniker, wszURL, &pMk)))
            return hr;
      if (FAILED(hr = CreateBindCtx(0, &pBCtx)))
      {
            pMk->Release();   // don't leak the moniker created above
            return hr;
      }

//      HlinkSimpleNavigateToMoniker(pMk,NULL,NULL,m_LastMoniker,pBCtx,NULL,HLNF_INTERNALJUMP,0);

      if (SUCCEEDED(hr = m_pMSHTML->QueryInterface(IID_IPersistMoniker, (LPVOID*)&pPMk)))
      {
            // Call Load on the IPersistMoniker
            // This may return immediately, but the document isn't loaded until
            // MSHTML reports READYSTATE_COMPLETE. See the implementation of  
            // IPropertyNotifySink::OnChanged above and see how the app waits
            // for this state change before walking the document's object model.
            TCHAR szBuff[MAX_PATH];
            wsprintf(szBuff, "Loading %s...\n", linkName);
            ODS(szBuff);
            cout << szBuff;
            hr = pPMk->Load(FALSE, pMk, pBCtx, STGM_READ);
            pPMk->Release();  // balance the QueryInterface above
      }

      if (m_LastMoniker) m_LastMoniker->Release();
      m_LastMoniker = pMk;
      if (pBCtx) pBCtx->Release();
      return hr;
}
 
// Load the specified document and start pumping messages
HRESULT CApp::Run(const char* linkName, const char* fileName)
{
      HRESULT hr;
      MSG msg;
 
      hr = LoadURLFromMoniker(linkName);
      int count=0;
      if (SUCCEEDED(hr) || E_PENDING == hr)
      {
            while (GetMessage(&msg, NULL, 0, 0))
            {
                  count++;
                  if (WM_USER_STARTWALKING == msg.message && NULL == msg.hwnd)
                  {
                        hr = Save(fileName);
                  }
                  else
                  {
                        DispatchMessage(&msg);
                  }
            }
      }
      return hr;
}
 
// Save the object model.
HRESULT CApp::Save(const char* fileName)
{
      HRESULT hr;
      IPersistFile*      pFile = NULL;
      OLECHAR  wFileName[MAX_PATH*sizeof(OLECHAR)];
/*
      LPSTREAM      pIStream = NULL;
      IStorage*      pIStorage;
      LPPERSISTSTREAMINIT pPStr = NULL;
      char strName[]="stream";
      OLECHAR  wStrName[55];

      static int count=0;
*/
      assert(m_pMSHTML);
      if (!m_pMSHTML)
            return E_UNEXPECTED;
 
      if (READYSTATE_COMPLETE != m_lReadyState)
      {
            ODS("Shouldn't get here 'til MSHTML changes readyState to READYSTATE_COMPLETE\n");
            DebugBreak();
            return E_UNEXPECTED;
      }
 
//      count++;
//      wsprintf(fileName,"%s%d",fileName,count);
//      wsprintf(strName,"%s%d",strName,count);
      if (0 == MultiByteToWideChar(CP_ACP, 0, fileName, -1, wFileName, MAX_PATH*sizeof(OLECHAR)))
            return E_FAIL;
      if (SUCCEEDED(hr = m_pMSHTML->QueryInterface(IID_IPersistFile, (LPVOID*)&pFile)))
      {
            hr = pFile->Save(wFileName, FALSE);
      }
/*
      else if (SUCCEEDED(hr = m_pMSHTML->QueryInterface(IID_IPersistStreamInit, (LPVOID*)&pPStr)))
      {
            if (0 == MultiByteToWideChar(CP_ACP, 0, fileName, -1, wFileName, MAX_PATH*sizeof(OLECHAR)))
                  return E_FAIL;
            if (0 == MultiByteToWideChar(CP_ACP, 0, strName, -1, wStrName, MAX_PATH*sizeof(OLECHAR)))
                  return E_FAIL;
            hr = StgCreateDocfile(wFileName, STGM_SIMPLE | STGM_READWRITE | STGM_CREATE | STGM_SHARE_EXCLUSIVE, 0, &pIStorage);
            hr = pIStorage->CreateStream(wStrName,STGM_READWRITE | STGM_SHARE_EXCLUSIVE ,0,0,&pIStream);
            hr=pPStr->Save(pIStream, FALSE);
            pIStream->Release();
            pIStorage->Release();
      }
*/
      // We're done so post ourselves a quit to terminate the message pump.
      PostQuitMessage(0);
 
      return hr;
}
 
    g_pApp->Run("http://www.maariv.co.il/channels/1/ART/650/492.html");
    g_pApp->Run("http://www.maariv.co.il/cgi-bin/chanprint.pl");

What I wanted to do is simply download several pages to my hard disk, that's all...

I tried about 30 offline browsers to find what I need, but none of them does it.
I need something very simple:
1. Go to a news site.
2. Read the list of article links.
3. Download the *printing* version of each article:
    Downloading the printing version means loading the regular article page and
    then loading the next URL, which points to the printing version of the article.
    The problem with the offline browsers is that they allow "filters",
    but my first page needs one filter and the next page requires a different one.
    I haven't found any program that has that option yet...
 
LVL 6

Expert Comment

by:Mafalda
ID: 10857259
The http://www.maariv.co.il/cgi-bin/chanprint.pl address is wrong.
Try opening it in a browser and you will get only the logo!

Always check the pages in IE first to see that they are the ones you really need to read.
In my opinion you might need to read a different frame or a totally different address ;o)

To get the right address, right-click on the frame and copy the address from its properties.

Cheers
Mafalda
 

Author Comment

by:MagicianH
ID: 10857383
That's what I tried to tell you.
There is no link like that.
On the page http://www.maariv.co.il/channels/1/ART/650/492.html
there is a button which loads the CGI script (view the page source and find "/cgi-bin/chanprint.pl").
THAT's what I'm trying to do - you see the problem?
 
LVL 6

Expert Comment

by:Mafalda
ID: 10857617
The link is for a Perl script that prints the page (the printer icon).
You would need to track down the parameters passed to this script when the link is clicked.
Then you would know where it takes its data from, and you could work out how to extract that information automatically from the file (or from cookies/objects).
I tried to watch the addresses changing in the status bar, but it goes by too fast...
Do you have this problem in other places?
 

Author Comment

by:MagicianH
ID: 10857807
Other places have JavaScript that I resolved by myself, or just simple URLs.

So do you have an idea how to find the parameters?
 
LVL 6

Expert Comment

by:Mafalda
ID: 10857875
You could try to slow the internet connection down and watch the changes in the status bar... but you cannot always see the whole line there.
In addition, the information might be transferred through another channel that the status bar never shows (for example, inside the HTTP request itself).
 

Author Comment

by:MagicianH
ID: 10858263
that's it ? I'm stuck ?
 
LVL 6

Expert Comment

by:Mafalda
ID: 10858429
Do you really need the printing version?
Maybe you can continue for now and ignore this kind of Perl script?
What is the case on other sites?
So many technologies... you will always find something bizarre ;o)
In other words -> are you really stuck?
 
LVL 6

Expert Comment

by:Mafalda
ID: 10858433
OK, I read your comments again and you do need the printing version ... sorry ;o)
 
LVL 6

Expert Comment

by:Mafalda
ID: 10858581
Other things that come to mind at the moment:
Try monitoring the traffic to see what is requested when the Perl script runs.

I hope that some other experts will have better suggestions ...

Sorry,
Mafalda
 

Author Comment

by:MagicianH
ID: 10858620
How do I call other experts for help ?
 
LVL 6

Expert Comment

by:Mafalda
ID: 10859406
You wait for several days... usually you get feedback during the first day, but sometimes it takes longer ;o)
Do not forget that many experts live in different time zones than you do!
You can always close the question after a few days and repost it to bring it to the top of the list.

 
LVL 49

Expert Comment

by:DanRollins
ID: 10956921
I think that the server has tracked the page you were looking at, so it knows what to print.  I cannot understand the language of the text on that page, so I can only provide a general answer:

1) Use a WebBrowser control and open the parent page.

2) Use the DOM to search through the document.links collection to find the link that has an href of chanprint.pl (I see from the source that it does not have a name or id attribute, so you can't go straight to it).

3) Call the click() member function of that A tag.
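
The matching logic in step 2 can be sketched in portable C++. This is only an illustration: the helper name findLinkIndex is hypothetical, and a plain std::vector of href strings stands in for the document.links collection. In a real MSHTML host the hrefs would presumably come from IHTMLDocument2::get_links and IHTMLAnchorElement::get_href, and step 3 would call IHTMLElement::click on the element that matched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first href that contains `needle`,
// or -1 if no link matches.
// ASSUMPTION: `hrefs` is a stand-in for the document.links collection;
// this helper is hypothetical and not part of any Microsoft API.
int findLinkIndex(const std::vector<std::string>& hrefs,
                  const std::string& needle)
{
    assert(!needle.empty());  // an empty needle would match every link
    for (std::size_t i = 0; i < hrefs.size(); ++i)
    {
        // A substring match is used because the page may store the href
        // either as "/cgi-bin/chanprint.pl" or as a full absolute URL.
        if (hrefs[i].find(needle) != std::string::npos)
            return static_cast<int>(i);
    }
    return -1;
}
```

A substring test is used rather than an exact comparison because the DOM may report the href in fully qualified form even when the page source contains only a relative path.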

-- Dan
 

Author Comment

by:MagicianH
ID: 10971076
Dan - I already thought about the "click" solution, isn't there a nicer way to do it ???
 
LVL 49

Accepted Solution

by:
DanRollins earned 500 total points
ID: 10973530
I can't think of any generalized solution... the server has probably set a "cookie crumb" so that the next page will know what page invoked it.  There are other possibilities too, but as I say, I can't read that page and there are many variables.

Anyway, most solutions (to anything -- in life or programming) are not particularly "nice" or clean when you dig below the surface.  "Honey, I'm going to the store to get some soap" includes all kinds of ugly non-nice things, like the tricks involved in making fuel injection work in an internal combustion engine, the incredibly complex suite of mechanisms and optical sciences used in the UPC-code reader at the market, and the process by which the soap was manufactured and transported to the neighborhood.  So, if it sounds complicated, just put it in a function called GotoPage() and that will clean it right up :)

-- Dan
 

Author Comment

by:MagicianH
ID: 11047136
:)
10x for the philosophy lesson...
 

Author Comment

by:MagicianH
ID: 11222180
Dan, your solution with the "click" didn't work,
since I'm using IHTMLDocument to load the page, and in that case a new browser window is opened instead...
 
LVL 49

Expert Comment

by:DanRollins
ID: 11226947
Did you use a Webbrowser control, as I suggested?  If so, it should work exactly as if the user had clicked a certain button.
 

Author Comment

by:MagicianH
ID: 11242466
No, I'm not. I have a console application.

If there is no other way, I can use a non-console application,
but the problem is that I don't see how I can have enough control over it,
for example to download only text, not to run ActiveX, etc...
I suppose there *is* a way, but the documentation about it isn't good enough.
And of course without using the Internet Explorer menu.
 
LVL 6

Expert Comment

by:Mafalda
ID: 11242867
MagicianH,
I never understood what additional helpful info you saw in Dan's comments ... ;o)
Did you try to see what the program was sending when the icon was clicked?
Cheers,
Mafalda
 

Author Comment

by:MagicianH
ID: 11242891
How can I do it ? I'm looking at the browser but I can't see anything...
 
LVL 6

Expert Comment

by:Mafalda
ID: 11243331
Ok it is tricky, you need a sniffing program ...
 

Author Comment

by:MagicianH
ID: 11243516
What program, for example ?
 
LVL 6

Expert Comment

by:Mafalda
ID: 11247743
There are many.
See details at http://grc.com/oo/packetsniff.htm

Another option:
http://lists.gpick.com/pages/Packet_Sniffing.htm
 

Author Comment

by:MagicianH
ID: 11257135
I used the sniffer, and no data was sent to the host except the request for the script.
But I noticed that the HTTP request had a field called
"Referer", containing the address of the original page. I think the script is using it.
How can I simulate it?
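
To check a captured request for that field in code rather than by eye, something like the following could be used. It is a rough sketch: headerValue is a hypothetical helper, the request text is assumed to use CRLF line endings as HTTP/1.1 does on the wire, and header names are matched case-sensitively for brevity even though HTTP treats them as case-insensitive.

```cpp
#include <cassert>
#include <string>

// Returns the value of header `name` from the raw text of an HTTP request,
// or an empty string if the header is not present.
// ASSUMPTION: `rawRequest` is the request exactly as captured by a sniffer,
// with "\r\n" line endings; this helper is hypothetical.
std::string headerValue(const std::string& rawRequest, const std::string& name)
{
    assert(!name.empty());
    // Headers start on a new line, so look for CRLF + name + ':'.
    const std::string key = "\r\n" + name + ":";
    std::string::size_type pos = rawRequest.find(key);
    if (pos == std::string::npos)
        return "";
    pos += key.size();
    // Skip optional whitespace between the colon and the value.
    while (pos < rawRequest.size() &&
           (rawRequest[pos] == ' ' || rawRequest[pos] == '\t'))
        ++pos;
    // The value runs to the end of the line.
    std::string::size_type end = rawRequest.find("\r\n", pos);
    if (end == std::string::npos)
        end = rawRequest.size();
    return rawRequest.substr(pos, end - pos);
}
```

Comparing the extracted value for a request that works in the browser against one sent by the program should show quickly whether the Referer field is the only difference.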
 
LVL 49

Expert Comment

by:DanRollins
ID: 11264860
You can make the Referer header correct by using the technique that I have described.  That is, use an *actual* browser, but with external control, such as forcing an OnClick event.

-- Dan
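
If driving a real browser is not an option (for example in a console application), another route that may be worth trying is to send the Referer header explicitly. The sketch below only builds the request text; buildGetWithReferer is a hypothetical helper, and it is an assumption that the script checks nothing besides this header. On Windows, WinINet's HttpOpenRequest takes a referrer argument that serves the same purpose, so the text would not have to be assembled by hand there.

```cpp
#include <cassert>
#include <string>

// Builds a minimal HTTP/1.1 GET request that names `referer` as the page
// the request was made from.
// ASSUMPTION: the server only needs Host and Referer; a real browser sends
// more headers (User-Agent, Accept, cookies) that the script might also check.
std::string buildGetWithReferer(const std::string& host,
                                const std::string& path,
                                const std::string& referer)
{
    assert(!host.empty());
    assert(!path.empty() && path[0] == '/');  // path must be host-relative

    std::string request;
    request += "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Referer: " + referer + "\r\n";   // header name is spelled "Referer"
    request += "Connection: close\r\n";
    request += "\r\n";                            // blank line ends the header block
    return request;
}
```

For the pages in this thread, the call would be buildGetWithReferer("www.maariv.co.il", "/cgi-bin/chanprint.pl", "http://www.maariv.co.il/channels/1/ART/650/492.html"), and the result would be written to a socket connected to port 80 of that host.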