• Status: Solved

Parsing Help Needed

I am writing an interface to NetBackup's bpdbjobs command (reading its output from a pipe). I want to tokenize each line into its fields; however, some fields in the middle of a line may be empty on any given line. Here is a sample input:

388347         Backup   Done  73          hostx-a_fs_u01              incr                              000:15:20     hostx    05/09/05   19:40:37   000:56:02
388349         Backup   Done   0           dbs050_hosty_hot.full full   9786112  5942 000:37:40     hosty    05/09/05   19:25:17   000:37:43

As you can see, the integer fields in the middle can be blank. I still want to tokenize the line, with those fields coming back as empty strings. Splitting with split(/ +/, $line) does not give me that behavior:

my( $jid, $jtype, $jstat, $jcode, $jpol, $jsched, $jprog, $jperf, $jstart, $jclnt, $jsdate, $jstime, $jdur) = split(/ +/, $_);

The statement above assigns the job start time (000:15:20 for line 1 of the sample) to $jprog on lines where those fields are blank. What I need is for $jprog to receive an empty string in that case.
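The shift described above can be reproduced with a minimal sketch. It uses the first sample line from the question; the variable names $line and @fields are only for illustration.

```perl
use strict;
use warnings;

# First sample line from the question: the two numeric columns are blank.
my $line = '388347         Backup   Done  73          hostx-a_fs_u01              incr                              000:15:20     hostx    05/09/05   19:40:37   000:56:02';

# Splitting on runs of spaces: a blank column produces no field at all,
# so everything after it moves left.
my @fields = split / +/, $line;

printf "%d fields\n", scalar @fields;
print "seventh field: $fields[6]\n";   # the slot that \$jprog would receive
```

Because a run of spaces is treated as a single separator, split cannot tell "one wide gap" from "a gap containing empty columns", which is why the start time lands in the seventh slot.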

What approach can I use to achieve this?

1 Solution
Hi rhugga,
If your data isn't tab-separated (it doesn't look that way), you can make the digit fields optional:

if($line =~ m/(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d*)\s+(\d*)\s+ ... /){
    my ($num1,$op,$status,$num2,$host,$back_type,$op_num1,$op_num2) = ( $1,$2,$3,$4,$5,$6,$7,$8,...);

$op_num1 and $op_num2 come back as empty strings when there is no data, because \d* can match zero digits.

(The \w character class is insufficient for the host/file name, but it's here just for illustration)
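For reference, here is one way the optional-digit idea might be fleshed out against the two sample lines from the question. This is only a sketch: parse_job_line is a hypothetical helper, the hash keys borrow the variable names from the question, and \S+ is assumed to be acceptable for the policy and client names (it also allows the hyphen in hostx-a_fs_u01, which \w would reject). The trailing duration column is left unparsed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of fields, or undef when the
# line does not match. Key names follow the variables in the question.
sub parse_job_line {
    my ($line) = @_;
    my @names = qw(jid jtype jstat jcode jpol jsched
                   jprog jperf jstart jclnt jsdate jstime);
    # (\d*) lets the two numeric columns match as empty strings.
    my @vals = $line =~ m/^\s*(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(\S+)\s+(\w+)\s+(\d*)\s+(\d*)\s+([\d:]+)\s+(\S+)\s+([\d\/]+)\s+([\d:]+)/
        or return;
    my %job;
    @job{@names} = @vals;
    return \%job;
}

my $blank_line = '388347         Backup   Done  73          hostx-a_fs_u01              incr                              000:15:20     hostx    05/09/05   19:40:37   000:56:02';
my $full_line  = '388349         Backup   Done   0           dbs050_hosty_hot.full full   9786112  5942 000:37:40     hosty    05/09/05   19:25:17   000:37:43';

for my $line ($blank_line, $full_line) {
    my $job = parse_job_line($line) or next;
    print "jid=$job->{jid} jprog='$job->{jprog}' jperf='$job->{jperf}' jstart=$job->{jstart}\n";
}
```

One caveat with this approach: if only one of the two numeric columns is blank, the pattern cannot tell which one, so the single number may be assigned to either field depending on the spacing.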


rhugga (Author) commented:
Sweet, that will do what I need. Thanks,
rhugga (Author) commented:

I can't get this to work. Here is a sample data line and my current piece of code. Do you see anything wrong?

32698 Archive       Done   0                        L5500_Log_966222arch2.full    full    3145760 11302 000:05      adcbkp10 06/01/05 10:45:01

if ($_ =~ m/(\d+)\s+(\w)\s+(\w)\s+(\d+)\s+([A-Za-z0-9\.]+)\s+(\w)\s+(\d+)\s+(\d+)\s+(\d*)\s+(\w)\s+(\d*)\s+(\d*)/ )
my ($jobid, $jobtype, $jobstatus, $jobpolicy, $jobsched, $job_progress, $job_perf, $job_start, $job_client, $job_sdate, $job_stime) = ($1
, $2, $3, $4, $5, $6,$7, $8, $9, $10, $11);

When I try to print $jobid and $jobtype, I get empty strings.

The match will fail, for several reasons:
(\w) matches a single \w character, but "Archive", "Done", "full" and "adcbkp10" each contain several \w characters.
Also, [A-Za-z0-9\.] does not match the "_" in L5500_Log_966222arch2.full.
(\d*) matches neither the ":" in 000:05 nor the "/" in 06/01/05.
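Each of these points can be checked in isolation. The sketch below anchors every test with ^ and $ so that a single token is examined at a time; the variable names are made up for the illustration and the tokens come from the sample line above.

```perl
use strict;
use warnings;

# (\w) followed by whitespace cannot consume a multi-letter word; (\w+) can.
my $single_w = 'Archive Done' =~ /^(\w)\s+/  ? 1 : 0;
my $multi_w  = 'Archive Done' =~ /^(\w+)\s+/ ? 1 : 0;

# The policy name contains underscores, which [A-Za-z0-9.] excludes.
my $policy      = 'L5500_Log_966222arch2.full';
my $no_under    = $policy =~ /^[A-Za-z0-9.]+$/  ? 1 : 0;
my $with_under  = $policy =~ /^[A-Za-z0-9_.]+$/ ? 1 : 0;

# Digits alone do not cover the ":" in a time or the "/" in a date.
my $time_digits = '000:05'   =~ /^\d*$/      ? 1 : 0;
my $time_colon  = '000:05'   =~ /^[\d:]*$/   ? 1 : 0;
my $date_slash  = '06/01/05' =~ m{^[\d/]*$}  ? 1 : 0;

print "single \\w: $single_w, \\w+: $multi_w\n";
print "without underscore: $no_under, with underscore: $with_under\n";
print "time as \\d*: $time_digits, time as [\\d:]*: $time_colon, date as [\\d/]*: $date_slash\n";
```

Testing one token at a time like this is often a quicker way to find which part of a long pattern is responsible for a failed match.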

# $jobcode is added here because the pattern has 12 capture groups; without it every later variable would be shifted by one field.
if (my ($jobid, $jobtype, $jobstatus, $jobcode, $jobpolicy, $jobsched, $job_progress, $job_perf, $job_start, $job_client, $job_sdate, $job_stime) = m/(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+([A-Za-z0-9_.]+)\s+(\w+)\s+(\d+)\s+(\d+)\s+([\d:]*)\s+(\w+)\s+([\d\/]*)\s+([\d:]*)/ ) {
    print "$jobid, $jobtype, $jobstatus, $jobcode, $jobpolicy, $jobsched, $job_progress, $job_perf, $job_start, $job_client, $job_sdate, $job_stime\n";
} else {
    print "no match\n";
}
