Extracting HTML tag contents to use as a parameter


I'm no expert at Perl, although I can hack together basic scripts. Our Perl developer was recently released for professional negligence, so I am having to pick up the pieces of one of his projects.

What I need to do is extract the contents of the HTML <title> tag from the page that called the script, so I can pass it as a parameter in a query string to a pricing search tool.

We currently have it calling a fixed field through:
use CGI;
my $q = CGI->new;
my $part = $q->param('Part');

but this means altering the layout of several thousand pages to include a page-specific part description. A better option would be to just pass the contents of the <title> tag.
This is a UNIX script.

Anyone have any ideas?


Here's a module that will help extract the info.

Are you saying that your "database" of parts descriptions is kept within the <title> parts descriptions </title> tags and is spread across several thousand pages of static HTML? Can you show us an example of the HTML and Perl script you're using and/or provide a link to one of the pages?
asparak (Author) commented:
The title tag is a standard HTML tag: <title>My Server Model here</title>

Our current test version, which is only available on our internal network while I try to fix this, contains the following:

<a href="/cgi-bin/pTester3.cgi?Part=My%20Server%20Model%20here" target="_blank">Price It <span class="rightarrow">&raquo;</span></a>

What I need to do is to change this to:
<a href="/cgi-bin/pTester3.cgi" target="_blank">Price It <span class="rightarrow">&raquo;</span></a> to remove that hard coding issue.

The cgi script then needs to parse the HTML file it was called from and extract the description from the Title tag, to complete the following piece of code within pTester3.cgi
my $q=new CGI;
#my $part = $q->param('Part');
#Need to fix the following line:
my $part=$q->(the title of the page I was called from);
print $q->redirect(-URL=>"https://$portal/webquery/Query?Query.findButton=Find&Query.prodDescOutput=$part&Query.selectedOuputColumns=partNum%2CunitPrice");

This issues a command to our secure server to display up to the minute pricing information back to an authorised user for that product group. $portal is a parameter extracted from the session cookie to point the command to the right pricing information portal.

I have anonymised things a little.

It is not possible to do it in the manner you're wanting, because the HTML page won't pass anything to the CGI script. If you don't want to pass the info in the link, you'll probably need the Perl script to read and parse the tags in the calling HTML file, which would not be very efficient.
asparak (Author) commented:

I'll need to try to get the admins to install HTML::HeadParser on the server, then figure out how to write the code to read HTTP_REFERER and parse the page it points to.

Not sure I'm up to this, but I'll give it a go, unless you can think of a better way to approach this. All I have to go on are the few sparse notes of the developer before he was escorted from the building.

One thought I had was to embed some code in the head to populate the part dynamically. The head is templated, so one change could be made to all the pages. It's just the body portion of the page we want to avoid having to customise.

> The cgi script then needs to parse the HTML file it was called from and extract the description from the Title tag
In general, impossible. Full stop.
You either need active scripting on the client side to do that, or you need to tell your CGI to request the page itself (which is totally unreliable).

You have the following choices:
  1. leave it as is, meaning the generated page contains links whose GET or POST requests carry "your title"
  2. build sessions server-side where you store the title, and then get it back when your CGI is called again with the same session ID
asparak (Author) commented:
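For the "CGI requests the page itself" route, the script only has the Referer header to go on, and the browser may omit it or send something unexpected. A minimal sketch of a cautious lookup, using core Perl only, might look like the following. The helper name referer_to_path, the host www.example.com and the document root /var/www/html are all made up for illustration; none of them come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: map a Referer URL onto a file under the document root.
# Returns undef when the header is missing, points at another host, or the
# path is anything other than a plain path to an .html/.htm file.
sub referer_to_path {
    my ($referer, $expected_host, $docroot) = @_;
    return undef unless defined $referer && length $referer;

    # Accept only http(s)://host[:port]/path; query string and fragment are ignored.
    my ($host, $path) = $referer =~ m{^https?://([^/:?#]+)(?::\d+)?(/[^?#]*)?}i
        or return undef;
    return undef unless lc($host) eq lc($expected_host);

    $path = '/' unless defined $path;
    $path .= 'index.html' if $path =~ m{/\z};

    # Reject directory traversal and any unexpected characters.
    return undef if $path =~ m{\.\.};
    return undef unless $path =~ m{\A[\w./-]+\.html?\z};

    return $docroot . $path;
}

# Example call; in a CGI script the first argument would be $ENV{HTTP_REFERER}.
my $file = referer_to_path(
    'http://www.example.com/servers/model-x.html?ref=1',
    'www.example.com',
    '/var/www/html',
);
print defined $file ? "$file\n" : "no usable referer\n";
```

When the helper returns undef, the script would need a fallback such as the existing Part parameter, which is why this approach is considered fragile.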
It's been a nasty hack, but with the help of a friend who's far better at Perl than I am, we have managed to test HTML::HeadParser successfully on dev. We just need to push it out to live now.

Probably explains why it took our developer 6 months to do.
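
For anyone landing here later, a rough sketch of the title-extraction step with HTML::HeadParser could look like this. It is not the code that was deployed: the helper name title_from_html is made up, the HTML is passed in as a string rather than read from the referring page, and URI::Escape is assumed to be installed so the title can be placed safely in the query string.

```perl
use strict;
use warnings;
use HTML::HeadParser;
use URI::Escape qw(uri_escape);

# Hypothetical helper: return the <title> text of an HTML document,
# or undef if the document has no usable title.
sub title_from_html {
    my ($html) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);                    # stops once the <body> is reached
    my $title = $parser->header('Title');     # <title> is exposed as a header
    return undef unless defined $title;
    $title =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
    return length($title) ? $title : undef;
}

# Example with an inline document; in the CGI script the HTML would come
# from the page that linked to it.
my $html = '<html><head><title>My Server Model here</title></head>'
         . '<body><p>Details</p></body></html>';
my $part = title_from_html($html);

# Escape the title before interpolating it into the redirect URL.
my $query = 'Query.prodDescOutput=' . uri_escape($part);
print "$query\n";
```

Escaping matters here because titles usually contain spaces and may contain characters such as & that would otherwise break the query string.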
Question has a verified solution.
