Hi, I posted a question yesterday and assigned points before realizing that there was an issue with the results. The question is here:
http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_23198749.html
To restate what I am trying to do, here are 2 sample lines from the text file. The file has about 60000 lines, each of which will be parsed, and each line represents a "record":
Line 1) AN 0000001--DT Jnl Article--MT Print^PDF--AU Smith, T.E.--PA JAW--TI The Life and Times of Dr. Water--DE Water Quality^Training^Coliforms^Water Industry--AB Overall it blah blah blah
Line 2) AN 0000002--DT Jnl Article--MT Print--AU Smith, T.E.--PA STA--TI Water Conservation in Africa--DE Water Quality^Conservation^Water Industry^Africa--AB There is an abstract here
Line 3) .... etc.
What I need to do is
1) find all records that have PA JAW or PA ST(A|B|C|D|E|F|G)
2) Create an alphabetical list of the terms used in the DE field, e.g. "DE Water Quality^Training^Coliforms^Water Industry"
It should cycle through each line and never repeat a term that is already in the list.
So for the records above, after reading line 1 the list would be
Coliforms
Training
Water Industry
Water Quality
After the second line it would be
Africa
Coliforms
Conservation
Training
Water Industry
Water Quality
3) The -- are actually \x1e but EE wouldn't display it when I copied and pasted it
*** 4) This is the catch that I didn't know about yesterday: some of the descriptors may have a \x1e in them. Here is one example:
"DE Water Quality^Distribution Systems^Metering^Zurich, \x1eSwitzerland^Associations^Memberships^Association \x1e Management^Strategic Planning\x1eAB Abstract here"
So in the output it produced these as separate entries:
Association
Management
Switzerland
Zurich,
That is basically what I need to fix in the output. Also, there may be extra whitespace surrounding the \x1e, so the \x1e and any whitespace around it need to be treated as a single space (giving "Zurich, Switzerland" and "Association Management").
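One possible way to handle the embedded \x1e is sketched below. It is only a sketch under an assumption not confirmed above: that every real field starts with \x1e, two capital letters and whitespace (as in "\x1eAB "), so any other \x1e can be treated as part of the descriptor text. The helper name descriptor_terms and the sample record (including its AN and PA values) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the accepted answer): returns the DE terms
# of one record, or an empty list if the record should be skipped.
sub descriptor_terms {
    my ($record) = @_;

    # Step 1: only records with PA JAW or PA STA..STG.
    return unless $record =~ /\x1ePA (?:JAW|ST[A-G])\x1e/;

    # Assumption: a real field starts with \x1e, two capital letters and
    # whitespace (like "\x1eAB "), so a stray \x1e inside DE does not end
    # the capture.
    return unless $record =~ /\x1eDE\s+(.*?)(?=\x1e[A-Z]{2}\s|$)/s;
    my $de = $1;

    # A stray \x1e plus any whitespace around it becomes one space.
    $de =~ s/\s*\x1e\s*/ /g;

    my @terms;
    for my $term ( split /\^/, $de ) {
        $term =~ s/^\s+|\s+$//g;              # trim each term
        push @terms, $term if length $term;   # skip empty terms
    }
    return @terms;
}

# Made-up record for illustration; the AN and PA values are invented.
my @records = (
    "AN 0000003\x1ePA JAW\x1eDE Water Quality^Zurich, \x1eSwitzerland"
      . "^Association \x1e Management^Strategic Planning\x1eAB Abstract here",
);

my %key;
@key{ descriptor_terms($_) } = () for @records;   # hash keys remove duplicates
print "$_\n" for sort keys %key;
```

For the made-up record this should list Association Management, Strategic Planning, Water Quality and Zurich, Switzerland, one per line. To run it over the real file, the @records loop could be replaced with a while ( <> ) loop that chomps each line. If a descriptor can itself contain \x1e followed by two capitals and a space, the assumption breaks and the end-of-field test would need to be stricter (for example, matching \x1eAB only).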
The solution I originally accepted was:
while( <> ){
@key{split/\^/,$1}=() if /\x1e(PA JAW|PA ST[A-G])\x1e/ && /\x1eDE\s*([^\x1e]*)/;
}
$\=$/;
print for sort keys %key;
Unfortunately I don't understand it well enough to figure out how to modify it.
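A longer, commented form of the same logic may make it easier to see where a change would go. This is a sketch that should behave like the body of the while loop above; add_terms is a hypothetical name, and the two regexes are copied unchanged from the accepted solution.

```perl
use strict;
use warnings;

# Hypothetical helper: same logic as the body of the original while loop,
# written out step by step. $key is a reference to the %key hash.
sub add_terms {
    my ( $key, $line ) = @_;

    # Condition 1: the record has a PA field that is JAW or STA..STG.
    return unless $line =~ /\x1e(PA JAW|PA ST[A-G])\x1e/;

    # Condition 2: capture the text after "DE". [^\x1e]* stops at the
    # first \x1e, so an embedded \x1e cuts the descriptor list short.
    return unless $line =~ /\x1eDE\s*([^\x1e]*)/;
    my $descriptors = $1;

    # Split on the literal ^ and use a hash slice: every term becomes a
    # key of the hash, so repeated terms collapse into one entry.
    my @terms = split /\^/, $descriptors;
    @{$key}{@terms} = ();
    return;
}

my %key;
add_terms( \%key, "AN 1\x1ePA JAW\x1eDE Water Quality^Training\x1eAB text" );
print "$_\n" for sort keys %key;   # same as: $\=$/; print for sort keys %key;
```

In the original, `$\=$/` sets the output record separator to a newline so that each `print` ends its line, and `print for sort keys %key` prints the hash keys in sorted order.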