Use Perl to parse a text file

Posted on 2008-10-06
Last Modified: 2012-05-05
Hi,

Given the input listed below (two entries from a text file called "input.txt" that contains 130 such entries), I would like to use Perl to create an output file called "output.txt" with one line per entry, formatted as follows:

L802; $413,400.00; 4335 Hastings Drive, Cumming, GA 30041
L803; $150,000.00; 1225 Lanier Place, Cumming, GA 30041

Thanks,
Rick




ADNUM: L802
NOTICE OF SALE UNDER POWER
GEORGIA, FORSYTH COUNTY
By virtue of a Power of Sale contained in that certain Security Deed from Soo Han Cha and Yong Deok Choi to Mortgage Electronic Registration Systems, Inc., acting solely as nominee for Ryland Mortgage Company, an Ohio Corporation, dated November 21, 2006, recorded December 18, 2006, in Deed Book 4564, Page 60-83, Forsyth County, Georgia Records, said Security Deed having been given to secure a Note of even date in the original principal amount of Four Hundred Thirteen Thousand Four Hundred and 00/100 dollars ($413,400.00), with interest thereon as provided for therein, said Security Deed having been last sold, assigned and transferred to Mortgage Electronic Registration Systems, Inc., there will be sold at public outcry to the highest bidder for cash before the courthouse door of Forsyth County, Georgia, within the legal hours of sale on the first Tuesday in October, 2008, all property described in said Security Deed including but not limited to the following described property:
ALL THAT TRACT OR PARCEL OF LAND LYING AND BEING IN LAND LOT 785 OF THE 2ND DISTRICT, 1ST SECTION, FORSYTH COUNTY, GEORGIA, BEING LOT 19, JAMES CREEK SUBDIVISION, POD B-1, AS PER PLAT RECORDED IN PLAT BOOK 89, PAGES 56-62, FORSYTH COUNTY, GEORGIA RECORDS, SAID PLAT BEING INCORPORATED HEREIN AND MADE A PART HEREOF BY REFERENCE.
Said property is commonly known as 4335 Hastings Drive, Cumming, GA 30041.
The indebtedness secured by said Security Deed has been and is hereby declared due because of default under the terms of said Security Deed and Note, including but not limited to the nonpayment of the indebtedness as and when due. The indebtedness remaining in default, this sale will be made for the purpose of paying the same, all expenses of the sale, including attorneys' fees and all other payments provided for under the terms of the Security Deed and Note.
Said property will be sold subject to the following items which may affect the title to said property: all zoning ordinances; matters which would be disclosed by an accurate survey or by an inspection of the property; any outstanding taxes, including but not limited to ad valorem taxes, which constitute liens upon said property; special assessments; all outstanding bills for public utilities which constitute liens upon said property; all restrictive covenants, easements, rights-of-way and any other matters of record superior to said Security Deed. To the best of the knowledge and belief of the undersigned, the party in possession of the property is Soo Han Cha and Yong Deok Choi or tenant(s).
The sale will be conducted subject (1) to confirmation that the sale is not prohibited under the U.S. Bankruptcy Code and (2) to final confirmation and audit of the status of the loan with the holder of the Security Deed. THE ABOVE LAW FIRM IS ACTING AS A DEBT COLLECTOR. ANY INFORMATION OBTAINED WILL BE USED FOR THAT PURPOSE.
MORTGAGE ELECTRONIC REGISTRATION SYSTEMS, INC. as Attorney in Fact for SOO HAN CHA AND YONG DEOK CHOI
Lender Contact: COUNTRYWIDE, Loss Mitigation Dept., 7105 Corporate Drive, PTX-A-274, Plano, TX 75024
TELEPHONE NUMBER: 800-669-6087
Attorney Contact: Adorno & Yoss LLC, Two Midtown Plaza, 1349 West Peachtree Suite 1500, Atlanta, GA 30309
TELEPHONE NUMBER: (888) 890-5309 ADORNO FILE NO. 215400.4043
WWW.ADORNO.COM/ATLDOCS/SALES.HTML
L802 9/10, 17, 24, 10/1

ADNUM: L803
NOTICE OF SALE UNDER POWER
GEORGIA, FORSYTH COUNTY
By virtue of a Power of Sale contained in that certain Security Deed from Melissa E. Fyfe to Home Capital Inc., dated July 3, 2003, recorded August 18, 2003, in Deed Book 2964, Page 439-452, Forsyth County, Georgia Records, said Security Deed having been given to secure a Note of even date in the original principal amount of One Hundred Fifty Thousand and 00/100 dollars ($150,000.00), with interest thereon as provided for therein, said Security Deed having been last sold, assigned and transferred to Countrywide Home Loans, Inc, there will be sold at public outcry to the highest bidder for cash before the courthouse door of Forsyth County, Georgia, within the legal hours of sale on the first Tuesday in October, 2008, all property described in said Security Deed including but not limited to the following described property:
ALL THAT TRACT OR PARCEL OF LAND LYING AND BEING IN LAND LOTS 76 AND 77, 2ND DISTRICT, 1ST SECTION, FORSYTH COUNTY, GEORGIA AND BEING A 1.2923 ACRE TRACT AS SHOWN ON SURVEY DATED MARCH 23, 1994, PREPARED BY MARK A. BUCKNER, GEORGIA REGISTERED LAND SURVEYOR #2422, AND BEING MORE PARTICULARLY, DESCRIBED AS FOLLOWS:
TO FIND THE POINT OF BEGINNING: BEGIN AT THE COMMON CORNERS OF LAND LOTS 76, 77, 140 AND 141; RUNNING THENCE NORTH 01 DEGREES 00 MINUTES 00 SECONDS EAST A DISTANCE OF 150 FEET TO AN IRON PIN; RUNNING THENCE SOUTH 89 DEGREES 00 MINUTES 00 SECONDS EAST AT A DISTANCE OF 150 FEET TO AN IRON PIN; RUNNING THENCE NORTH 01 DEGREES 00 MINUTES 00 SECONDS EAST A DISTANCE OF 50.32 FEET TO AN IRON PIN; RUNNING THENCE NORTH 00 DEGREES 59 MINUTES 23 SECONDS EAST A DISTANCE OF 50 FEET TO AN IRON IN AND THE TRUE POINT OF BEGINNING; RUNNING THENCE NORTH 89 DEGREES 02 MINUTES 19 SECONDS WEST A DISTANCE OF 192.51 FEET TO AN IRON PIN LOCATED ON THE EAST RIGHT OF WAY OF LANIER DRIVE (40 FOOT RIGHT OF WAY); RUNNING THENCE NORTH 00 DEGREES 22 MINUTES 28 SECONDS EAST ALONG THE EAST RIGHT OF WAY OF LANIER DRIVE A DISTANCE OF 186.08 FEET TO AN IRON PIN; RUNNING THENCE SOUTH 88 DEGREES 58 MINUTES 00 SECONDS EAST A DISTANCE OF 342.95 FEET TO AN IRON PIN; RUNNING THENCE SOUTH 00 DEGREES 23 MINUTES 06 SECONDS WEST A DISTANCE OF 136.18 FEET TO AN IRON PIN; RUNNING THENCE NORTH 88 DEGREES 57 MINUTES 18 SECONDS WEST A DISTANCE OF 149.88 FEET TO AN IRON PIN; RUNNING THENCE SOUTH 00 DEGREES 59 MINUTES 23 SECONDS WEST A DISTANCE OF 49.68 FEET TO AN IRON PIN AND THE POINT OF BEGINNING.
Said property is commonly known as 1225 Lanier Place, Cumming, GA 30041.
The indebtedness secured by said Security Deed has been and is hereby declared due because of default under the terms of said Security Deed and Note, including but not limited to the nonpayment of the indebtedness as and when due. The indebtedness remaining in default, this sale will be made for the purpose of paying the same, all expenses of the sale, including attorneys' fees and all other payments provided for under the terms of the Security Deed and Note.
Said property will be sold subject to the following items which may affect the title to said property: all zoning ordinances; matters which would be disclosed by an accurate survey or by an inspection of the property; any outstanding taxes, including but not limited to ad valorem taxes, which constitute liens upon said property; special assessments; all outstanding bills for public utilities which constitute liens upon said property; all restrictive covenants, easements, rights-of-way and any other matters of record superior to said Security Deed. To the best of the knowledge and belief of the undersigned, the party in possession of the property is Melissa E. Fyfe or tenant(s).
The sale will be conducted subject (1) to confirmation that the sale is not prohibited under the U.S. Bankruptcy Code and (2) to final confirmation and audit of the status of the loan with the holder of the Security Deed. THE ABOVE LAW FIRM IS ACTING AS A DEBT COLLECTOR. ANY INFORMATION OBTAINED WILL BE USED FOR THAT PURPOSE.
COUNTRYWIDE HOME LOANS, INC as Attorney in Fact for MELISSA E. FYFE
Lender Contact: COUNTRYWIDE, Loss Mitigation Dept., 7105 Corporate Drive, PTX-A-274, Plano, TX 75024
TELEPHONE NUMBER: 800-669-6087
Attorney Contact: Adorno & Yoss LLC, Two Midtown Plaza, 1349 West Peachtree Suite 1500, Atlanta, GA 30309
TELEPHONE NUMBER: (888) 890-5309 ADORNO FILE NO. 214500.3127
WWW.ADORNO.COM/ATLDOCS/SALES.HTML
L803 9/10, 17, 24, 10/1
Question by:rickmatt
LVL 84

Expert Comment

by:ozo
ID: 22657011
perl -00ne '/ADNUM:\s*(\w+)[\s\S]*?\ property is commonly known as (.+)/ and print "$1: $2\n"' input.txt
 
LVL 84

Expert Comment

by:ozo
ID: 22657016
perl -00ne '/ADNUM:\s*(\w+)[\s\S]*?\((\$[\d,.]+)\)[\s\S]*?Said property is commonly known as (.+)/ and print "$1: $2 $3\n"' input.txt
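For anyone who finds the one-liner hard to take apart, here is a sketch of the same logic as a standalone script. The regex is ozo's, spread over several lines with the /x modifier so each piece can carry a comment; because /x ignores bare whitespace, the literal spaces in "Said property is commonly known as" are escaped as "\ ". The helper name parse_notice and the script name are hypothetical, and like the one-liner this version assumes records are separated by truly empty lines.

```perl
use strict;
use warnings;

# Hypothetical helper: apply ozo's regex to one paragraph-mode record and
# return (ad number, amount, address), or an empty list if it does not match.
sub parse_notice {
    my ($record) = @_;
    return unless $record =~ m{
        ADNUM:\s*(\w+)      # capture 1: the ad number, e.g. L802
        [\s\S]*?            # skip any text, newlines included, as little as possible
        \((\$[\d,.]+)\)     # capture 2: a dollar amount inside parentheses
        [\s\S]*?            # skip again up to the address sentence
        Said\ property\ is\ commonly\ known\ as\ (.+)
                            # capture 3: the rest of that line; . stops at a newline
    }x;
    return ($1, $2, $3);
}

# Read the file named on the command line in paragraph mode, as -00 does,
# and print one line per matching record, as -n plus print does.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    local $/ = '';    # paragraph mode: a record ends at one or more empty lines
    while (my $record = <$in>) {
        my @fields = parse_notice($record) or next;
        print "$fields[0]: $fields[1] $fields[2]\n";
    }
    close $in;
}
```

Run it as "perl parse_notices.pl input.txt". Splitting the parsing into its own function makes it easy to try the regex on a single pasted record before running it over the whole file.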
 

Author Comment

by:rickmatt
ID: 22657865
ozo,

Thanks, that's a great regex.  I get a good result on the first record in the file, but then it stops.  Can you help me make it loop?

Thanks,
Rick
 
LVL 39

Expert Comment

by:Adam314
ID: 22671252
When I run ozo's second post on the file, it looks like it returns both records.  Did you want something different?
 
LVL 84

Expert Comment

by:ozo
ID: 22672133
In your example, it looks like records are separated by a blank line.
Is that true of the file you ran it on?
 

Author Comment

by:rickmatt
ID: 22673514
I'm running it on an Ubuntu Linux box.
The input file has about 150 records in it, each record is separated by one blank line.
The sample input that I provided was copied into the question text field.

When I run the one liner, I get the following output:

L802: $413,400.00 4335 Hastings Drive, Cumming, GA 30041.

If I change the argument from -00ne to -00pe, then I get the same output, one line, followed by the entire contents of the input file, including the processed line (if that helps at all).

I was going to try to convert the one-liner, but regexes are such deep magic to me that I wasn't even sure how to chop it up.

Thanks,
Rick
 
LVL 84

Expert Comment

by:ozo
ID: 22673780
As an experiment, what do you see with
perl -00pe 'BEGIN{$\="-"'x40}' input.txt > cat -ve
 
LVL 84

Expert Comment

by:ozo
ID: 22673945
perl -00pe 'BEGIN{$\="-"'x40}' input.txt | cat -ve
 

Author Comment

by:rickmatt
ID: 22673962
I'm attaching my input file, but here's what I get when I run the new script:

$ perl -00pe 'BEGIN{$\="-"'x40}' input.txt > cat -ve
Bareword found where operator expected at -e line 2, near "00pe"
        (Missing operator before pe?)
Bareword found where operator expected at -e line 2, near "pe BEGIN"
        (Do you need to predeclare pe?)
syntax error at -e line 2, near "linux:"
syntax error at -e line 2, near ";}"
Execution of -e aborted due to compilation errors.

input.txt
 
LVL 84

Expert Comment

by:ozo
ID: 22674003

perl -00pe 'BEGIN{$\="-"x40}' input.txt | cat -ve
 
LVL 84

Expert Comment

by:ozo
ID: 22674024
It's not an empty line; it contains "\r\n".
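This explains why only the first notice was printed. The -00 switch puts Perl in paragraph mode, where a record ends at an empty line, but with DOS line endings the separator line is "\r\n" (shown by cat -ve as ^M$), so Perl never sees an empty line and reads the whole file as a single record; the regex then matches only once. The sketch below counts paragraph-mode records in the same text with and without the carriage returns. The helper count_paragraphs and the two-record sample are made up for illustration.

```perl
use strict;
use warnings;

# Count the records Perl's paragraph mode ($/ = '', the -00 switch) finds in a string.
sub count_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!\n";
    local $/ = '';    # paragraph mode
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Two records separated by a "blank" line, written with DOS line endings.
my $dos = "ADNUM: L802\r\nfirst notice\r\n\r\nADNUM: L803\r\nsecond notice\r\n";
# The same text after converting each CR LF pair to a bare LF.
(my $unix = $dos) =~ s/\r\n/\n/g;

# The "blank" line in $dos still holds a \r, so there is no empty line to split on.
printf "CRLF text: %d record(s)\n", count_paragraphs($dos);    # 1 record
printf "LF text:   %d record(s)\n", count_paragraphs($unix);   # 2 records
```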
 

Author Comment

by:rickmatt
ID: 22674026
It looks like the whole file prints back with ^M$ at the end of each line, like below:

FORECLOSURES^M$
^M$
ADNUM: L802^M$
NOTICE OF SALE UNDER POWER^M$
GEORGIA, FORSYTH COUNTY^M$

By the way, I'm running perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
 
LVL 84

Accepted Solution

by:
ozo earned 2000 total points
ID: 22674170
PERLIO=crlf perl  -00ne '/ADNUM:\s*(\w+)[\s\S]*?\((\$[\d,.]+)\)[\s\S]*?Said property is commonly known as (.+)/ and print "$1: $2 $3\n"' input.txt
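Setting PERLIO=crlf asks Perl to use the :crlf I/O layer, which translates each CR LF pair into a plain newline as the file is read, so the separator lines become truly empty again and -00 can split on them. Inside a script, a comparable approach is to request the :crlf layer in the open call, which applies the translation to that input file only. The helper read_records_crlf and the temporary test file below are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file in paragraph mode through the :crlf layer, which turns each
# CR LF pair into a bare LF while reading.
sub read_records_crlf {
    my ($path) = @_;
    open my $in, '<:crlf', $path or die "Cannot open $path: $!\n";
    local $/ = '';    # paragraph mode, same as -00
    my @records = <$in>;
    close $in;
    return @records;
}

# Build a small DOS-format test file; the content is made up for illustration.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;    # write the \r\n bytes exactly as given
print {$tmp} "ADNUM: L802\r\nfirst notice\r\n\r\nADNUM: L803\r\nsecond notice\r\n";
close $tmp;

my @records = read_records_crlf($path);
printf "%d record(s); any CR left: %s\n",
    scalar @records, (grep { /\r/ } @records) ? 'yes' : 'no';
# prints "2 record(s); any CR left: no"
```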
 

Author Comment

by:rickmatt
ID: 22674339
That works!  The statement as provided gave me about 8 of 150 results.  I looked at the records again and found that the data wasn't as consistent as I thought (it's a text grab from an online legal notice).

When I changed "Said property is commonly known as" to "known as" that number jumped to 84 of 150.

Scanning further, I see that many records do not have the dollar amount in parentheses.  Eliminating the "\(" and "\)" yielded 110 results.

I could go on, but you probably get the picture.

Anyway, thanks, thanks, thanks!  I really respect your skill with regex.

Rick
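Rick's findings suggest the real notices are less regular than the sample: some say only "known as", and some amounts are not in parentheses. One way to loosen the match, sketched below, is to look for each field separately instead of in one fixed sequence, and to print the semicolon-separated format requested in the question. The rules in the hypothetical parse_loose helper are assumptions that have not been checked against the full file: the first dollar figure in a notice is taken as the loan amount, and the address is whatever follows the first "known as" on that line, minus a trailing period.

```perl
use strict;
use warnings;

# Hypothetical, more tolerant parser for one paragraph-mode record.
# Assumptions, not checked against the full file: the first dollar figure
# is the loan amount, and the address is the rest of the line after the
# first "known as". Returns "AD; AMOUNT; ADDRESS", or undef if a field is missing.
sub parse_loose {
    my ($record) = @_;
    my ($ad) = $record =~ /ADNUM:\s*(\w+)/ or return;
    # Either comma-grouped digits (413,400) or plain digits, with optional cents.
    my ($amount) = $record =~ /(\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?)/ or return;
    my ($address) = $record =~ /known as\s+(.+)/ or return;
    $address =~ s/\.?\s*$//;    # drop a trailing period plus any CR or spaces
    return "$ad; $amount; $address";
}

# Read the file named on the command line in paragraph mode, translating
# CR LF to LF on input, and print one line per notice that parses.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<:crlf', $file or die "Cannot open $file: $!\n";
    local $/ = '';    # paragraph mode, same as -00
    while (my $record = <$in>) {
        my $line = parse_loose($record);
        print "$line\n" if defined $line;
    }
    close $in;
}
```

Because each field is matched on its own, a notice that lacks one field is skipped rather than silently shifting the other captures; redirecting the output ("perl parse_loose.pl input.txt > output.txt") gives the requested file.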
 

Author Closing Comment

by:rickmatt
ID: 31503527
Solution worked perfectly when the data was consistent.  I saw how many answers ozo had provided and I kind of expected that he would provide the solution for this question.