Parsing Help Needed

Posted on 2005-05-13
Last Modified: 2008-02-01

I am writing an interface to NetBackup's bpdbjobs command (by reading from a pipe). I want to tokenize each line into its fields; however, some fields in the middle of a line can be empty. Here is a sample input:

388347         Backup   Done  73          hostx-a_fs_u01              incr                              000:15:20     hostx    05/09/05   19:40:37   000:56:02
388349         Backup   Done   0           dbs050_hosty_hot.full full   9786112  5942 000:37:40     hosty    05/09/05   19:25:17   000:37:43

As you can see, the integer fields in the middle can be blank; however, I still want to tokenize the line with those fields as empty strings. If I split using split(/ +/, $line), I do not get the behavior I want.

my( $jid, $jtype, $jstat, $jcode, $jpol, $jsched, $jprog, $jperf, $jstart, $jclnt, $jsdate, $jstime, $jdur) = split(/ +/, $_);

The statement above assigns the job start time (000:15:20 for line 1 of the sample above) to the $jprog variable on lines that contain the blank fields. What I need is for it to assign an empty string in that case.
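The shift described above can be reproduced with a short sketch on the first sample line. The `split_job` helper and the hash keys are hypothetical names chosen for illustration; the field order is the one from the statement above.

```perl
use strict;
use warnings;

# Field names in the same order as the variables in the statement above
my @names = qw(jid jtype jstat jcode jpol jsched jprog jperf
               jstart jclnt jsdate jstime jdur);

# Hypothetical helper: split one line on runs of spaces, as the
# original statement does, and store the pieces by name.
sub split_job {
    my ($line) = @_;
    my %job;
    @job{@names} = split / +/, $line;
    return \%job;
}

my $line = '388347         Backup   Done  73          hostx-a_fs_u01              incr                              000:15:20     hostx    05/09/05   19:40:37   000:56:02';
my $job  = split_job($line);

# The blank columns vanish, so every later field shifts two places left
for my $name (@names) {
    printf "%-7s = %s\n", $name,
        defined $job->{$name} ? $job->{$name} : '(undef)';
}
```

Because split returns only 11 pieces for this line, jprog receives 000:15:20 and the last two fields (jstime, jdur) are left undefined.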

What approach can I use to achieve this?

Question by: rhugga
    LVL 12

    Accepted Solution

    Hi rhugga,
    If your data isn't separated by tabs (it doesn't look that way), you can use optional digit fields:

    if($line =~ m/(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d*)\s+(\d*)\s+ ... /){
        my ($num1,$op,$status,$num2,$host,$back_type,$op_num1,$op_num2) = ( $1,$2,$3,$4,$5,$6,$7,$8,...);

    $op_num1 and $op_num2 can be empty when there's no data, because \d* also matches zero digits.

    (The \w character class is insufficient for the host/file name, but it's here just for illustration)
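One way the elided pattern above might be completed for the two sample lines in the question is sketched below. This is an illustration under assumptions, not the answerer's full pattern: the `parse_job` name, the column comments, and the use of \S+ for the policy and client columns are all guesses based on the sample data.

```perl
use strict;
use warnings;

# Sketch only. (\d*) lets the two integer columns come back as empty
# strings when they are blank. \S+ for policy and client is an assumption.
my $JOB_RE = qr{
    ^\s*
    (\d+)    \s+   # job id
    (\w+)    \s+   # job type
    (\w+)    \s+   # status
    (\d+)    \s+   # status code
    (\S+)    \s+   # policy
    (\w+)    \s+   # schedule
    (\d*)    \s+   # first integer column, may be empty
    (\d*)    \s+   # second integer column, may be empty
    ([\d:]+) \s+   # first hh:mm:ss column
    (\S+)    \s+   # client
    ([\d/]+) \s+   # date
    ([\d:]+) \s+   # time of day
    ([\d:]+)       # last hh:mm:ss column
}x;

# Hypothetical helper: returns the 13 fields, or an empty list on no match
sub parse_job {
    my ($line) = @_;
    my @fields = $line =~ $JOB_RE or return;
    return @fields;
}

my @samples = (
    '388347         Backup   Done  73          hostx-a_fs_u01              incr                              000:15:20     hostx    05/09/05   19:40:37   000:56:02',
    '388349         Backup   Done   0           dbs050_hosty_hot.full full   9786112  5942 000:37:40     hosty    05/09/05   19:25:17   000:37:43',
);

for my $line (@samples) {
    my @fields = parse_job($line) or next;
    print join('|', @fields), "\n";
}
```

Note that this relies on backtracking to divide the blank gap among the three \s+ separators, so it needs at least three whitespace characters where both integer columns are empty. That holds for the sample lines, but it is an assumption about the real output.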


    LVL 1

    Author Comment

    Sweet, that will do what I need. Thanks,
    LVL 1

    Author Comment


    I can't get this to work. Here is a sample data line and my current piece of code. Do you see anything wrong?

    32698 Archive       Done   0                        L5500_Log_966222arch2.full    full    3145760 11302 000:05      adcbkp10 06/01/05 10:45:01

    if ($_ =~ m/(\d+)\s+(\w)\s+(\w)\s+(\d+)\s+([A-Za-z0-9\.]+)\s+(\w)\s+(\d+)\s+(\d+)\s+(\d*)\s+(\w)\s+(\d*)\s+(\d*)/ )
    my ($jobid, $jobtype, $jobstatus, $jobpolicy, $jobsched, $job_progress, $job_perf, $job_start, $job_client, $job_sdate, $job_stime) = ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);

    When I try to print $jobid and $jobtype, I get empty strings.

    LVL 84

    Expert Comment

    The match will fail.
    (\w) matches a single \w character, but "Archive", "Done", "full" and "adcbkp10" each contain multiple \w characters, so those groups need (\w+).
    Also, [A-Za-z0-9\.] does not match the "_" in L5500_Log_966222arch2.full.
    (\d*) does not match the ":" in 000:05, nor the "/" in 06/01/05.
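Each of these points can be checked in isolation with a small sketch. The ^ and $ anchors are an addition for the test: they stand in for the \s+ separators that bound each token in the full pattern.

```perl
use strict;
use warnings;

# Each row: a token from the sample line, the fragment from the failing
# pattern, and a corrected fragment. Anchors are added for isolation.
my @checks = (
    [ 'Archive',                    qr/^(\w)$/,              qr/^(\w+)$/            ],
    [ 'L5500_Log_966222arch2.full', qr/^([A-Za-z0-9\.]+)$/,  qr/^([A-Za-z0-9_.]+)$/ ],
    [ '000:05',                     qr/^(\d*)$/,             qr/^([\d:]*)$/         ],
    [ '06/01/05',                   qr/^(\d*)$/,             qr{^([\d/]*)$}         ],
);

for my $check (@checks) {
    my ($text, $old, $new) = @$check;
    printf "%-27s old: %-8s new: %s\n", $text,
        ($text =~ $old ? 'match' : 'no match'),
        ($text =~ $new ? 'match' : 'no match');
}
```

Every token fails against its original fragment and matches the corrected one.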

    # 12 capture groups, 12 variables; $jobcode (a hypothetical name) holds the status code
    if( my ($jobid, $jobtype, $jobstatus, $jobcode, $jobpolicy, $jobsched, $job_progress, $job_perf, $job_start, $job_client, $job_sdate, $job_stime) = m/(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+([A-Za-z0-9_.]+)\s+(\w+)\s+(\d+)\s+(\d+)\s+([\d:]*)\s+(\w+)\s+([\d\/]*)\s+([\d:]*)/ ){
        print "$jobid, $jobtype, $jobstatus, $jobcode, $jobpolicy, $jobsched, $job_progress, $job_perf, $job_start, $job_client, $job_sdate, $job_stime\n";
    }else{
        print "no match\n";
    }
