Extract data from text file

Hi,

I have a large text file from which I need to extract lines as below:
time-range 02293404
 absolute end 23:59 19 September 2009
time-range 10May2010_SR405807
 absolute end 23:59 10 May 2010
time-range 11july2010
 absolute end 23:59 11 July 2010


The final output would be:
scheduler <name> start-date YYYY-MM-DD.HH:MM stop-date YYYY-MM-DD.HH:MM

For start-date we can use a static date: 2009-01-01.00:00
The stop-date would be the end date above.

E.g.,
scheduler A start-date 2009-01-01.00:00 stop-date 2009-09-10.23:59
scheduler B start-date 2009-01-01.00:00 stop-date 2010-10-19.23:59
scheduler C start-date 2009-01-01.00:00 stop-date 2010-11-11.23:59

Thanks for all the help in advance.
Asked by: dpk_wal
 
sjklein42 Commented:
use strict;
use warnings;

# Map three-letter month abbreviations to two-digit month numbers.
my %mm = ('Jan','01', 'Feb','02', 'Mar','03', 'Apr','04', 'May','05', 'Jun','06', 'Jul','07', 'Aug','08', 'Sep','09', 'Oct','10', 'Nov','11', 'Dec','12');

while ( <> )
{
	s/[\r\n]//g;
	next if $_ eq '';	# skip blank lines

	# time-range 10May2010_SR405807

	if ( ! ( /^time-range (\S+)/ ) ) { die "*** expected time-range not found\n"; }
	my $schedName = $1;

	#  absolute end 23:59 10 May 2010

	$_ = <>;
	if ( ! ( defined($_) && /^\s*absolute end ([0-9]+):([0-9]+) ([0-9]+) ([a-z]+) ([0-9]+)/i ) )
		{ die "*** expected absolute end not found\n"; }
	my ($hh, $min, $dd, $mmm, $yyyy) = ($1, $2, $3, $4, $5);

	# Normalize the month ("july", "JULY", "July") to "Jul" before the lookup.

	my $mm = $mm{ucfirst(lc(substr($mmm,0,3)))};
	if ( ! defined($mm) ) { die "*** unknown month: $mmm\n"; }

	# scheduler A start-date 2009-01-01.00:00 stop-date 2009-09-10.23:59
	# printf zero-pads single-digit days, hours and minutes.

	printf "scheduler %s start-date 2009-01-01.00:00 stop-date %04d-%s-%02d.%02d:%02d\n",
		$schedName, $yyyy, $mm, $dd, $hh, $min;
}

C:\temp>perl foo.pl foo.txt
scheduler 02293404 start-date 2009-01-01.00:00 stop-date 2009-09-19.23:59
scheduler 10May2010_SR405807 start-date 2009-01-01.00:00 stop-date 2010-05-10.23:59
scheduler 11july2010 start-date 2009-01-01.00:00 stop-date 2010-07-11.23:59
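
The date reformatting in the script above can also be pulled out into a small subroutine, which makes it easier to check edge cases such as single-digit days or month names written in lowercase. This is only a minimal sketch: the name format_stop_date and the %MONTH table are hypothetical and not part of the original script, and it assumes the "absolute end HH:MM D Month YYYY" layout shown in the question.

```perl
use strict;
use warnings;

# Lowercase month abbreviation => two-digit month number.
my %MONTH = (
    jan => '01', feb => '02', mar => '03', apr => '04',
    may => '05', jun => '06', jul => '07', aug => '08',
    sep => '09', oct => '10', nov => '11', dec => '12',
);

# format_stop_date is a hypothetical helper (not from the original script).
# It takes an "absolute end HH:MM D Month YYYY" line and returns
# "YYYY-MM-DD.HH:MM", or undef if the line does not match.
sub format_stop_date {
    my ($line) = @_;
    my ($hh, $min, $dd, $mon, $yyyy) =
        $line =~ /^\s*absolute end (\d+):(\d+) (\d+) ([A-Za-z]+) (\d+)/
        or return;
    # Only the first three letters are used, compared case-insensitively.
    my $mm = $MONTH{ lc(substr($mon, 0, 3)) } or return;
    return sprintf('%04d-%s-%02d.%02d:%02d', $yyyy, $mm, $dd, $hh, $min);
}

print format_stop_date(' absolute end 23:59 19 September 2009'), "\n";  # 2009-09-19.23:59
print format_stop_date(' absolute end 8:05 9 may 2010'), "\n";          # 2010-05-09.08:05
```

Returning undef on a non-matching line leaves it to the caller to decide whether to die, as the script above does, or to skip the entry.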

wilcoxon Commented:
Where do you get scheduler names from?  Your sample text file lines don't appear to include the info.
 
dpk_wal (Author) Commented:
Sorry for the typo:
>> time-range 02293404
The word after time-range is the scheduler name; in the original post the names are 02293404, 10May2010_SR405807, and 11july2010.

Regards.
 
dpk_wal (Author) Commented:
Thank you!
Question has a verified solution.
