Solved

Extracting HTML tag contents to use as a parameter

Posted on 2004-10-15
Last Modified: 2013-12-25
Hi,

I'm no expert at Perl, although I can hack together basic scripts. Our Perl developer was recently released for professional negligence, so I am having to pick up the pieces of one of his projects.

What I need to do is extract the contents of the HTML <title> tag from the page that called the script, so I can pass it as a parameter in a query string to a pricing search tool.

We currently have it calling a fixed field through:
my $q=new CGI;
my $part = $q->param('Part');

but this means altering the layout of several thousand pages to include a page-specific part description. A better option would be to just pass the contents of the Title tag.
This is a UNIX script.

Anyone have any ideas?

Thanks

Rik
Question by:asparak
 
LVL 28

Expert Comment

by:FishMonger
ID: 12319096
Are you saying that your "database" of part descriptions is kept within the <title> part description </title> tags and is spread across several thousand pages of static HTML?  Can you show us an example of the HTML and Perl script you're using and/or provide a link to one of the pages?
 
LVL 1

Author Comment

by:asparak
ID: 12319455
The Title tag looks like a standard HTML tag: <title>My Server Model here</title>

Our current test version, which is only available on our internal network while I try to fix this, contains the following:

<a href="/cgi-bin/pTester3.cgi?Part=My%20Server%20Model%20here" target="_blank">Price It <span class="rightarrow">&raquo;</span></a>

What I need to do is to change this to:
<a href="/cgi-bin/pTester3.cgi" target="_blank">Price It <span class="rightarrow">&raquo;</span></a> to remove that hard coding issue.

The CGI script then needs to parse the HTML file it was called from and extract the description from the Title tag, to complete the following piece of code within pTester3.cgi:
my $q=new CGI;
#my $part = $q->param('Part');
#Need to fix the following line:
my $part=$q->(the title of the page I was called from);
print $q->redirect(-URL=>"https://$portal/webquery/Query?Query.findButton=Find&Query.prodDescOutput=$part&Query.selectedOuputColumns=partNum%2CunitPrice");

This issues a command to our secure server to display up to the minute pricing information back to an authorised user for that product group. $portal is a parameter extracted from the session cookie to point the command to the right pricing information portal.

I have anonymised things a little.
 
LVL 28

Expert Comment

by:FishMonger
ID: 12320522
It is not possible to do it in the manner you want, because the HTML page won't pass anything to the CGI script.  If you don't want to pass the info in the link, you'll probably need the Perl script to read and parse the tags in the calling HTML file, which would not be very efficient.
 
LVL 28

Accepted Solution

by:
FishMonger earned 125 total points
ID: 12320640
Here's a module that will help extract the info.

http://search.cpan.org/~gaas/HTML-Parser-3.36/lib/HTML/HeadParser.pm
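
For readers following along, here is a minimal sketch of how HTML::HeadParser can pull the title out of a page that is already held in a string. The sub name extract_title and the sample page are made up for illustration; only new, parse and header('Title') come from the module itself.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Return the text of the <title> element, or undef if the page has none.
# (extract_title is a hypothetical helper name, not part of the module.)
sub extract_title {
    my ($html) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);                  # parsing stops once the body begins
    my $title = $parser->header('Title');   # <title> is exposed as a pseudo-header
    return unless defined $title;
    $title =~ s/^\s+|\s+$//g;               # trim surrounding whitespace
    return $title;
}

# Made-up sample page, standing in for one of the static product pages.
my $page = <<'HTML';
<html><head><title>My Server Model here</title></head>
<body><p>Product page</p></body></html>
HTML

my $title = extract_title($page);
print defined $title ? "$title\n" : "(no title found)\n";
```

This only covers the parsing step; getting hold of the calling page's HTML is a separate problem.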
 
LVL 28

Expert Comment

by:FishMonger
ID: 12320678
 
LVL 1

Author Comment

by:asparak
ID: 12320864
Right,

I'll need to try to get the admins to install HeadParser on the server, then figure out how to write the code to read HTTP_REFERER, fetch the page it points to, and parse it.
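
A rough sketch of that approach, under several assumptions: the browser actually sends a Referer header (it may not, and it can be forged), the calling pages live on a host named in a hypothetical allow-list, and the script can reach that host over HTTP. The names referer_is_trusted, title_of_page, %ALLOWED_HOST, and the example.com hosts are illustrative only. In the real script $portal comes from the session cookie; here it is a placeholder.

```perl
use strict;
use warnings;
use CGI;
use URI;
use URI::Escape qw(uri_escape_utf8);
use LWP::UserAgent;
use HTML::HeadParser;

# Hypothetical allow-list so the script only fetches pages from our own site.
my %ALLOWED_HOST = ('intranet.example.com' => 1);

# True only for http/https referers that point at an allowed host.
sub referer_is_trusted {
    my ($referer) = @_;
    return 0 unless defined $referer && length $referer;
    my $uri    = URI->new($referer);
    my $scheme = $uri->scheme // '';
    return 0 unless $scheme eq 'http' || $scheme eq 'https';
    my $host = lc($uri->host // '');
    return $ALLOWED_HOST{$host} ? 1 : 0;
}

# Fetch the page and return its <title>, or undef on any failure.
sub title_of_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->get($url);
    return unless $res->is_success;
    my $parser = HTML::HeadParser->new;
    $parser->parse($res->decoded_content // '');
    return $parser->header('Title');
}

unless (caller) {
    my $q       = CGI->new;
    my $referer = $q->referer;

    # Prefer the referring page's title; fall back to the old Part parameter.
    my $part = referer_is_trusted($referer) ? title_of_page($referer) : undef;
    $part = $q->param('Part') unless defined $part;
    $part = '' unless defined $part;

    my $portal = 'portal.example.com';   # placeholder; really read from the session cookie
    print $q->redirect(-URL => "https://$portal/webquery/Query?Query.findButton=Find&Query.prodDescOutput="
        . uri_escape_utf8($part)
        . "&Query.selectedOuputColumns=partNum%2CunitPrice");
}
```

The extra HTTP request on every click is the inefficiency mentioned earlier in the thread, and escaping $part keeps spaces and ampersands in a title from breaking the query string.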

Not sure I'm up to this, but I'll give it a go, unless you can think of a better way to approach this. All I have to go on are the few sparse notes of the developer before he was escorted from the building.

One thought I had was to embed some code in the head to populate the part dynamically. The head is templated, so one change could be rolled out to all the pages. It's just the body portion of the page that we want to avoid having to customise.

Thanks
 
LVL 51

Expert Comment

by:ahoffmann
ID: 12331944
> The cgi script then needs to parse the HTML file it was called from and extract the description from the Title tag
In general, impossible. Full stop.
You either need active scripting on the client side to do that, or you need to tell your CGI to request the page itself (which is totally unreliable).

You have the following choices:
  1. leave it as is, meaning the generated page contains links with a GET or POST request that carries "your title"
  2. build server-side sessions where you store the title, and then get it back when your CGI is called again with the same session ID
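
If the first choice is kept and the pages are generated or templated, the link can be built with the title properly URL-escaped. A small sketch, where price_link is a hypothetical helper and the script path is the one quoted earlier in the thread:

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

# Build the "Price It" link target for a given page title.
# (price_link is an illustrative name, not existing code.)
sub price_link {
    my ($title) = @_;
    return '/cgi-bin/pTester3.cgi?Part=' . uri_escape_utf8($title);
}

print price_link('My Server Model here'), "\n";
```

Escaping matters because spaces, ampersands, or quotes in a title would otherwise break the query string or the surrounding href attribute.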
 
LVL 1

Author Comment

by:asparak
ID: 12336838
It's been a nasty hack, but with the help of a friend who's far better at Perl than I am, we have managed to test HTML::HeadParser successfully on dev. We just need to push it out to live now.

Probably explains why it took our developer 6 months to do.
