  • Status: Solved
  • Priority: Medium
  • Security: Public

How to load a CGI URL that only has meaning when requested from a certain parent URL

I'm working in C++.

I'm using CreateURLMoniker(m_LastMoniker, wszURL, &pMk)
to download an HTML page to my computer.
The problem is that one site contains a CGI script that only works
when it is requested from a certain URL.
I tried to code something like this (notice the m_LastMoniker):
CreateURLMoniker(NULL, parentURL, &m_LastMoniker);
hr = m_pMSHTML->QueryInterface(IID_IPersistMoniker, (LPVOID*)&pPMk);
hr = pPMk->Load(FALSE, m_LastMoniker, pBCtx, STGM_READ);
// wait until the page is loaded, then run this line (with m_LastMoniker):
CreateURLMoniker(m_LastMoniker, cgiURL, &pMk);

but it doesn't work!

Any solution?
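
A note on what the first argument does: as far as the documentation goes, the context moniker passed to CreateURLMoniker only serves as the base for resolving a partial (relative) URL. The portable sketch below is not the urlmon implementation, only a simplified illustration of that kind of resolution. resolveUrl is a hypothetical helper, and it assumes plain http URLs with no "../" segments, queries or fragments.

```cpp
#include <cassert>
#include <string>

// Resolve `ref` against the absolute URL `base`.
// Simplified on purpose: handles absolute references, root-relative
// paths ("/x/y") and document-relative paths ("y.html") only.
std::string resolveUrl(const std::string& base, const std::string& ref)
{
    // An absolute reference ignores the base entirely.
    if (ref.find("://") != std::string::npos)
        return ref;

    const std::string::size_type schemeEnd = base.find("://");
    assert(schemeEnd != std::string::npos && "base must be an absolute URL");

    // The first '/' after "scheme://host" marks the start of the path.
    const std::string::size_type pathStart = base.find('/', schemeEnd + 3);
    const std::string origin =
        (pathStart == std::string::npos) ? base : base.substr(0, pathStart);

    if (!ref.empty() && ref[0] == '/')
        return origin + ref;              // root-relative reference

    if (pathStart == std::string::npos)
        return origin + "/" + ref;        // base has no path component

    // Document-relative: replace everything after the last '/' of the base.
    const std::string::size_type lastSlash = base.rfind('/');
    return base.substr(0, lastSlash + 1) + ref;
}
```

Resolving "/cgi-bin/chanprint.pl" against the article URL gives the same absolute address as typing it directly, so a base context by itself would not be expected to make the request look as if it came from the parent page.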
Asked by: MagicianH
 
Mafalda commented:
Could you post the full code fragment, including the CGI address?
 
MagicianH (Author) commented:
If you insist :)

 
// Use an asynchronous Moniker to load the specified resource
HRESULT CApp::LoadURLFromMoniker(const char* linkName)
{
      HRESULT hr;
      OLECHAR  wszURL[MAX_PATH*sizeof(OLECHAR)];
      if (0 == MultiByteToWideChar(CP_ACP, 0, linkName, -1, wszURL, MAX_PATH*sizeof(OLECHAR)))
      {
            return E_FAIL;
      }
 
      // Ask the system for a URL Moniker
      LPMONIKER pMk = NULL;
      LPBINDCTX pBCtx = NULL;
      LPPERSISTMONIKER pPMk = NULL;

      if (FAILED(hr = CreateURLMoniker(m_LastMoniker, wszURL, &pMk)))
            return hr;
      if (FAILED(hr = CreateBindCtx(0, &pBCtx)))
      {
            pMk->Release();   // don't leak the new moniker on the error path
            return hr;
      }

//      HlinkSimpleNavigateToMoniker(pMk,NULL,NULL,m_LastMoniker,pBCtx,NULL,HLNF_INTERNALJUMP,0);

      if (SUCCEEDED(hr = m_pMSHTML->QueryInterface(IID_IPersistMoniker, (LPVOID*)&pPMk)))
      {
            // Call Load on the IPersistMoniker
            // This may return immediately, but the document isn't loaded until
            // MSHTML reports READYSTATE_COMPLETE. See the implementation of  
            // IPropertyNotifySink::OnChanged above and see how the app waits
            // for this state change before walking the document's object model.
            TCHAR szBuff[MAX_PATH];
            wsprintf(szBuff, "Loading %s...\n", linkName);
            ODS(szBuff);
            cout << szBuff;
            hr = pPMk->Load(FALSE, pMk, pBCtx, STGM_READ);
            pPMk->Release();   // release the interface obtained from QueryInterface
      }

      if (m_LastMoniker) m_LastMoniker->Release();
      m_LastMoniker = pMk;
      if (pBCtx) pBCtx->Release();
      return hr;
}
 
// Load the specified document and start pumping messages
HRESULT CApp::Run(const char* linkName, const char* fileName)
{
      HRESULT hr;
      MSG msg;
 
      hr = LoadURLFromMoniker(linkName);
      int count=0;
      if (SUCCEEDED(hr) || E_PENDING == hr)
      {
            while (GetMessage(&msg, NULL, 0, 0))
            {
                  count++;
                  if (WM_USER_STARTWALKING == msg.message && NULL == msg.hwnd)
                  {
                        hr = Save(fileName);
                  }
                  else
                  {
                        DispatchMessage(&msg);
                  }
            }
      }
      return hr;
}
 
// Save the object model.
HRESULT CApp::Save(const char* fileName)
{
      HRESULT hr;
      IPersistFile*      pFile = NULL;
      OLECHAR  wFileName[MAX_PATH*sizeof(OLECHAR)];
/*
      LPSTREAM      pIStream = NULL;
      IStorage*      pIStorage;
      LPPERSISTSTREAMINIT pPStr = NULL;
      char strName[]="stream";
      OLECHAR  wStrName[55];

      static int count=0;
*/
      assert(m_pMSHTML);
      if (!m_pMSHTML)
            return E_UNEXPECTED;
 
      if (READYSTATE_COMPLETE != m_lReadyState)
      {
            ODS("Shouldn't get here 'til MSHTML changes readyState to READYSTATE_COMPLETE\n");
            DebugBreak();
            return E_UNEXPECTED;
      }
 
//      count++;
//      wsprintf(fileName,"%s%d",fileName,count);
//      wsprintf(strName,"%s%d",strName,count);
      if (0 == MultiByteToWideChar(CP_ACP, 0, fileName, -1, wFileName, MAX_PATH*sizeof(OLECHAR)))
            return E_FAIL;
      if (SUCCEEDED(hr = m_pMSHTML->QueryInterface(IID_IPersistFile, (LPVOID*)&pFile)))
      {
            hr = pFile->Save(wFileName, FALSE);
      }
/*
      else if (SUCCEEDED(hr = m_pMSHTML->QueryInterface(IID_IPersistStreamInit, (LPVOID*)&pPStr)))
      {
            if (0 == MultiByteToWideChar(CP_ACP, 0, fileName, -1, wFileName, MAX_PATH*sizeof(OLECHAR)))
                  return E_FAIL;
            if (0 == MultiByteToWideChar(CP_ACP, 0, strName, -1, wStrName, MAX_PATH*sizeof(OLECHAR)))
                  return E_FAIL;
            hr = StgCreateDocfile(wFileName, STGM_SIMPLE | STGM_READWRITE | STGM_CREATE | STGM_SHARE_EXCLUSIVE, 0, &pIStorage);
            hr = pIStorage->CreateStream(wStrName,STGM_READWRITE | STGM_SHARE_EXCLUSIVE ,0,0,&pIStream);
            hr=pPStr->Save(pIStream, FALSE);
            pIStream->Release();
            pIStorage->Release();
      }
*/
      // We're done so post ourselves a quit to terminate the message pump.
      PostQuitMessage(0);
 
      return hr;
}
 
    g_pApp->Run("http://www.maariv.co.il/channels/1/ART/650/492.html");
    g_pApp->Run("http://www.maariv.co.il/cgi-bin/chanprint.pl");

What I wanted to do is simply download several pages to my hard disk, that's all...

I tried about 30 offline browsers but didn't find one that does what I need.
I need something very, very simple:
1. Go to a news site.
2. Read the list of article links.
3. Download the *printing* version of each one:
    downloading the printing version means loading the regular article page
    and then loading the next URL, which points to the printing version of the article.
    The problem with the offline browsers is that they allow "filters",
    but my first page needs one filter and the next page needs a different one.
    I haven't found any program that has that option yet...
 
Mafalda commented:
The http://www.maariv.co.il/cgi-bin/chanprint.pl address is wrong.
Try opening it in a browser and you will get only the logo!

Always check the pages in IE first to see that they are the ones you really need to read.
In my opinion you might need to read a different frame or a totally different address ;o)

To get the right address, right-click on the frame and copy the address from its properties.

Cheers
Mafalda
 
MagicianH (Author) commented:
That's what I tried to tell you.
There is no link like that.
On the page http://www.maariv.co.il/channels/1/ART/650/492.html
there is a button which loads the CGI (view the page source and find "/cgi-bin/chanprint.pl").
THAT's what I'm trying to do - you see the problem?
 
Mafalda commented:
The link is to a Perl script that prints the page (the printer icon).
You would need to track down the parameters passed to this script when the link is clicked.
Then you would know where it takes the data from, and you could work out how to extract this info automatically from the file (or cookies/objects).
I tried to watch the addresses changing in the status bar, but it is too fast...
Do you have this problem in other places?
 
MagicianH (Author) commented:
Other places have JavaScript that I resolved by myself, or just simple URLs.

So do you have an idea how to find the parameters?
 
Mafalda commented:
You could try slowing the connection down and watching the changes in the status bar... but you can't always see the whole line there.
In addition, the information might be transferred through another channel (like TCP/IP).
 
MagicianH (Author) commented:
That's it? I'm stuck?
 
Mafalda commented:
Do you really need the printing version?
Maybe you can continue for now and ignore this kind of Perl script?
What is the case on other sites?
So many technologies... you will always find something bizarre ;o)
In other words -> are you really stuck?
 
Mafalda commented:
OK, I read your comments again and you do need the printing version... sorry ;o)
 
Mafalda commented:
Other things that come to mind at the moment:
Try monitoring the traffic to see what is sent when the Perl script runs.

I hope that some other experts will have better suggestions...

Sorry,
Mafalda
 
MagicianH (Author) commented:
How do I call other experts for help?
 
Mafalda commented:
You wait for several days... usually you get feedback during the first day, but sometimes it takes longer ;o)
Do not forget that many experts live in different time zones from yours!
You can always close the question after a few days and repost it to bring it to the top of the list.
 
DanRollins commented:
I think that the server has tracked the page you were looking at, so it knows what to print.  I cannot understand the language of the text on that page, so I can only provide a general answer:

1) Use a WebBrowser control and open the parent page.

2) Use the DOM to search through the document.links collection to find the link whose href is chanprint.pl (I see from the source that it has no name or id attribute, so you can't go straight to it).

3) Call the click() member function of that A tag.

-- Dan
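
Step 2 above can be illustrated without the DOM. The sketch below is a plain text scan, not the MSHTML document.links API: findLinkContaining is a hypothetical helper that returns the first quoted href value containing a given fragment. It assumes quoted attribute values and does not check which tag the href belongs to.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lower-case copy of `s` (same length, so indexes still match the original).
std::string toLower(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Return the first quoted href value in `html` that contains `needle`,
// or an empty string when nothing matches.
std::string findLinkContaining(const std::string& html, const std::string& needle)
{
    assert(!needle.empty());              // an empty needle would match every link

    const std::string lower = toLower(html);
    std::string::size_type pos = 0;
    while ((pos = lower.find("href=", pos)) != std::string::npos) {
        pos += 5;                         // step past "href="
        if (pos >= html.size())
            break;
        const char quote = html[pos];
        if (quote != '"' && quote != '\'')
            continue;                     // unquoted values are skipped in this sketch
        const std::string::size_type end = html.find(quote, pos + 1);
        if (end == std::string::npos)
            break;                        // unterminated attribute value
        const std::string href = html.substr(pos + 1, end - pos - 1);
        if (href.find(needle) != std::string::npos)
            return href;
        pos = end + 1;
    }
    return std::string();
}
```

With a WebBrowser control the same search would be done on the live document, and the matching element's click() would then be invoked as described in step 3.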
 
MagicianH (Author) commented:
Dan - I already thought about the "click" solution; isn't there a nicer way to do it?
 
DanRollins commented:
I can't think of any generalized solution... the server has probably set a "cookie crumb" so that the next page will know what page invoked it.  There are other possibilities too, but as I say, I can't read that page and there are many variables.

Anyway, most solutions (to anything -- in life or programming) are not particularly "nice" or clean when you dig below the surface.  "Honey, I'm going to the store to get some soap" includes all kinds of ugly non-nice things, like the tricks involved in making fuel injection work in an internal combustion engine, the incredibly complex suite of mechanisms and optical sciences used in the UPC-code reader at the market, and the process by which the soap was manufactured and transported to the neighborhood.  So, if it sounds complicated, just put it in a function called GotoPage() and that will clean it right up :)

-- Dan
 
MagicianH (Author) commented:
:)
10x for the philosophy lesson...
 
MagicianH (Author) commented:
Dan, your solution with the "click" didn't work,
since I'm using IHTMLDocument to load the page, and in that case a new browser window is opened instead...
 
DanRollins commented:
Did you use a WebBrowser control, as I suggested?  If so, it should work exactly as if the user had clicked a certain button.
 
MagicianH (Author) commented:
No, I didn't; I have a console application.

If there is no other way, I can use a non-console application,
but the problem is that I don't see how I can have enough control over it,
for example to download only text, not to run ActiveX, etc...
I suppose there *is* a way, but the documentation about it isn't sufficient.
Of course, it has to work without using the Internet Explorer menu.
 
Mafalda commented:
MagicianH,
I never understood what additional helpful info you saw in Dan's comments... ;o)
Did you try to see what the program was sending when the icon was clicked?
Cheers,
Mafalda
 
MagicianH (Author) commented:
How can I do it? I'm looking at the browser but I can't see anything...
 
Mafalda commented:
OK, it is tricky; you need a sniffing program...
 
MagicianH (Author) commented:
What program, for example?
 
Mafalda commented:
There are many.
See details at http://grc.com/oo/packetsniff.htm

Another option:
http://lists.gpick.com/pages/Packet_Sniffing.htm
 
MagicianH (Author) commented:
I used the sniffer, and no data was sent to the host except the request for the script.
But I noticed that the HTTP request had a header field called
"Referer", containing the URL of the original page. I think the script uses it.
How can I simulate it?
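
One way to simulate it, sketched under assumptions: send the Referer header yourself. The helper below (buildGetWithReferer, a hypothetical name) only assembles the text of a minimal HTTP/1.0 GET request; opening a socket and sending it is left out. Whether chanprint.pl checks anything besides Referer is a guess based on the sniffer trace, not something confirmed.

```cpp
#include <cassert>
#include <string>

// Build the text of a minimal HTTP/1.0 GET request that carries a
// Referer header naming the page the request claims to come from.
std::string buildGetWithReferer(const std::string& host,
                                const std::string& path,
                                const std::string& referer)
{
    // The request line needs an absolute path such as "/cgi-bin/x.pl".
    assert(!host.empty() && !path.empty() && path[0] == '/');

    std::string req;
    req += "GET " + path + " HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";
    req += "Referer: " + referer + "\r\n";   // HTTP spells the header "Referer"
    req += "\r\n";                           // blank line ends the header block
    return req;
}
```

For the pages in this thread the call would be buildGetWithReferer("www.maariv.co.il", "/cgi-bin/chanprint.pl", "http://www.maariv.co.il/channels/1/ART/650/492.html"). When staying with URL monikers, extra request headers are normally supplied through IHttpNegotiate::BeginningTransaction on the bind status callback, so a "Referer: ..." line like the one built here could be returned from there instead.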
 
DanRollins commented:
You can make the Referer correct by using the technique that I have described.  That is, use an *actual* browser, but drive it externally, for example by forcing an OnClick event.

-- Dan
