Welcome to Experts Exchange

Is a request more than headers and GET/POST data?

Posted on 2003-11-29
Last Modified: 2008-02-01
There is a specific website with a lot of information. I want that information in my database.
The information on that site is organized as follows:

You start at the first Main page. From each Main page, you can click through to the next Main page or to one of the ten Sub pages belonging to that Main page.

From each Sub page, you can click to one of its ten Leaf pages. The leaf pages contain the information I want.

When I say "click", I actually mean "submit a form". Wherever you click, you are submitting the same form to the same location (an .aspx file; I don't know what that is), and only the submitted data differs. This means you can get any Main, Sub or Leaf page by submitting the same form with different hidden data.
(I think it's no use trying to understand the data, because one important hidden value is roughly 12,000 alphanumeric characters and plus signs.)
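Since every page is reached by re-posting the same form, the page state travels in the form's hidden inputs. An .aspx file is an ASP.NET Web Forms page, and the long value of letters and plus signs is plausibly base64-encoded view state (commonly a field named __VIEWSTATE; that name is an assumption, the post does not confirm it). Rather than decoding it, a scraper can copy every hidden input from the page it just received. A minimal sketch using only Python's standard-library html.parser:

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values.
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing or valueless "value" attribute becomes an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""

    # The default handle_startendtag calls handle_starttag, so <input ... /> works too.


def hidden_fields(html):
    """Return a dict mapping each hidden input's name to its value."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, `hidden_fields('<input type="hidden" name="__VIEWSTATE" value="dDwx+Mjk=">')` returns `{'__VIEWSTATE': 'dDwx+Mjk='}`; visible fields such as text boxes are ignored.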

My first attempt (I think it's a dead end):
Open the first Main page in a frame, change its content and submit the form using JavaScript. Problem: browser security (the same-origin policy) prevents this. Maybe it's possible to read the content from the screen using Windows functions (in Delphi or C); please let me know if you know how.

Second attempt:
Save one of the pages to my hard disk and change its form.action so that it still works correctly.
(This checks that I can still reach any of the pages manually.)
Now change its action so that it posts to my PHP file. In the PHP, display $_REQUEST and $_SERVER. Then use the $_REQUEST data to post with a library called Snoopy, sending the headers from $_SERVER through Snoopy's rawheader variable (I remove the HOST and CONNECTION headers).
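The replay step of this second attempt can be sketched with Python's standard library in place of Snoopy (a swapped-in tool, not the author's PHP code). It drops Host and Connection as described, and also Content-Length, because a copied length would not match a re-encoded body; urllib fills in all three itself. One detail worth checking in any replay: `urlencode` percent-encodes `+` as `%2B`. If a plus sign in the long hidden value were sent raw, the server would decode it as a space and the state would no longer match. The function only builds the request; the URL and field names in the test are placeholders.

```python
import urllib.parse
import urllib.request

# Headers tied to the original connection or body rather than to the request
# itself; urllib adds fresh Host, Content-Length and Connection values on send.
DROP_HEADERS = {"host", "connection", "content-length"}


def build_replay_request(url, fields, captured_headers):
    """Build (but do not send) a POST that replays captured form data and headers."""
    # urlencode escapes '+', '=' and '/' so base64-like values survive intact.
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {
        name: value
        for name, value in captured_headers.items()
        if name.lower() not in DROP_HEADERS
    }
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending it is then `urllib.request.urlopen(req)`. Note that an exact replay like this still carries whatever session cookie was captured, which matters if the server ties the hidden state to a session.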

There are basically 3 things that can happen when you post the form:
1. You get the page you wanted
2. You get an error indicating that the submitted data, headers or something else is malformed (it's always exactly the same error). It's a runtime error of the content management system application.
3. You get the first Main page, which suggests the data was not badly malformed.

Posting with Snoopy, I can get the Main and Sub pages, but when I try to access a Leaf page I get outcome 3.

This means there is a difference between posting the above data with Snoopy and posting the form from my local copy by clicking it with the mouse, even though, as described, I captured the headers and posted data from the mouse click and resent them almost exactly. Does this mean a request contains more information than headers and posted data?

I don't think the problem is in Snoopy, which can be downloaded from http://sourceforge.net using the site's search form.
Question by: alberthendriks

Accepted Solution

petoskey-001 earned 255 total points
ID: 9877304
It's probably set up like this to prevent the very thing you're trying to achieve: downloading their database. For instance, Kelley Blue Book does something like this when you search for the value of a car. It would be hard to automate, and they could also have something on their site that limits each user to a few hundred uses per day.

They could use JavaScript to create hidden input fields or to modify the cookies. This would let them check that a real client with JavaScript enabled was making the request. Can you give us the URL of the site you're looking at?

Expert Comment

ID: 10429801
Do you still need help with this?

Author Comment

ID: 10430021
It was difficult because the site uses sessions. We worked around it the dirty way, by creating a Windows macro that clicks through all the data and copy-pastes it into a database.
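For readers who would rather avoid a screen macro: a session-based site like this can often be walked by one client that keeps the session cookie and posts each form with the hidden values taken from the response it just received, never from a saved copy. A minimal sketch, assuming every click posts to the same URL (as the question describes); the `extract_fields` function (for instance a hidden-input parser) and the per-step override dicts are hypothetical placeholders, and `fetch` is passed in so the chaining logic can be tested without a network.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_fetch():
    """Return fetch(url, fields) that shares one cookie jar across requests.

    The jar stores any Set-Cookie the server sends (e.g. a session id)
    and sends it back on every later request, as a browser would.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    def fetch(url, fields):
        # fields=None means a plain GET; otherwise a form-encoded POST.
        data = None
        if fields is not None:
            data = urllib.parse.urlencode(fields).encode("ascii")
        with opener.open(url, data=data) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")

    return fetch


def walk(fetch, extract_fields, url, steps):
    """Follow a chain of form posts, carrying state from each response.

    steps: one dict per "click" holding the (hypothetical) fields that select
    the next page; all other fields are copied from the previous response.
    Returns the HTML of every page visited, starting page first.
    """
    html = fetch(url, None)  # the initial GET starts the server-side session
    pages = [html]
    for overrides in steps:
        fields = extract_fields(html)  # fresh hidden state from the current page
        fields.update(overrides)       # the part that differs per click
        html = fetch(url, fields)
        pages.append(html)
    return pages
```

The design choice is that state always flows forward: if the server rejects hidden values that belong to another session or an earlier page, reusing a saved page fails exactly where the Leaf pages did, while this chain does not depend on old values.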
