
Perl: extract "subfields" from a text string

Hi,

I have a file with contents as follows:

000000001 ZZZ  $$aTextinA$$bTextinB$$cTextinC
     
There will be several of these lines, but I have included just one for clarity.

For each line, I want to split the data into variables/fields using a Perl script. Each field is delimited by "$$". So, for example, from the above I want to end up with:

$subfield_a = "TextinA"
$subfield_b = "TextinB"
$subfield_c = "TextinC"
           

Equally, if, say, the line was as follows:

000000001 ZZZ  $$aTextinA$$cTextinC
           
I would want to end up with:

$subfield_a = "TextinA"
$subfield_b = ""
$subfield_c = "TextinC"
           
i.e. if a field is not present (in this case $$b), its variable should be set to an empty string.
            
I'm currently doing this as follows (while looping over the file):

@fields = split ('\$\$', $line);

@suba = grep {/^a/} @fields;
if ( $suba[0] =~ /^a(.*)/ ) {
  $subfield_a = $1;
} else {
  $subfield_a = "";
}
@subb = grep {/^b/} @fields;
if ( $subb[0] =~ /^b(.*)/ ) {
  $subfield_b = $1;
} else {
  $subfield_b = "";
}
@subc = grep {/^c/} @fields;
if ( $subc[0] =~ /^c(.*)/ ) {
  $subfield_c = $1;
} else {
  $subfield_c = "";
}


           
But I'm guessing there is a far more efficient way to achieve this?
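If you would rather keep the grep-based approach from the question, the three near-identical blocks can at least be folded into one loop over the subfield codes. This is a minimal sketch, assuming the codes of interest are exactly a, b and c; the hard-coded sample line stands in for one line read from the file, and the results go into a %subfield hash rather than three separate scalars.

```perl
use strict;
use warnings;

# Sample record from the question (subfield b is missing).
my $line = '000000001 ZZZ  $$aTextinA$$cTextinC';

# Split on the literal "$$" delimiter; the first element is the record header.
my @fields = split /\$\$/, $line;

my %subfield;
for my $code (qw(a b c)) {
    # First field that starts with this subfield code, or undef if none does.
    my ($match) = grep { /^$code/ } @fields;
    # Strip the code letter; an absent subfield becomes the empty string.
    $subfield{$code} = (defined $match && $match =~ /^$code(.*)/) ? $1 : '';
}

print "$_ = \"$subfield{$_}\"\n" for qw(a b c);
```

Checking defined $match first also avoids the "uninitialized value" warning the original code triggers under use warnings when a subfield is missing.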
Asked by: yelbow
 
wilcoxon commented:
Try this...
my @tfields = split '\$\$', $line;
shift @tfields; # get rid of "header"
# Only use $1/$2 when the match succeeds; otherwise they keep stale values
my %fields = map { /^(\w)(.*)/ ? ($1 => $2) : () } @tfields;


The %fields hash now looks like this for your two examples:
a => TextinA
b => TextinB
c => TextinC

and
a => TextinA
c => TextinC

Let me know if you have any questions...
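A minimal sketch of the same split-and-map approach applied to a whole file, with missing subfields defaulting to the empty string as the question asks. Assumptions: the file contents are inlined via an in-memory filehandle so the example is self-contained (open the real file instead), and the // (defined-or) operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical file contents; in practice, open the real file instead.
my $data = <<'END';
000000001 ZZZ  $$aTextinA$$bTextinB$$cTextinC
000000002 ZZZ  $$aTextinA$$cTextinC
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    my @tfields = split /\$\$/, $line;
    shift @tfields;    # discard the record "header", e.g. "000000001 ZZZ  "
    # Skip any field that does not start with a word character.
    my %fields = map { /^(\w)(.*)/ ? ($1 => $2) : () } @tfields;

    # Missing subfields default to the empty string (// needs Perl 5.10+).
    my ($subfield_a, $subfield_b, $subfield_c) = map { $fields{$_} // '' } qw(a b c);
    push @records, [$subfield_a, $subfield_b, $subfield_c];
    print "a=\"$subfield_a\" b=\"$subfield_b\" c=\"$subfield_c\"\n";
}
close $fh;
```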
 
yelbow (Author) commented:
Perfect, an awful lot cleaner - thanks so much
 
ozo commented:
my %fields = $line=~/\$\$(\w)([^\$\n]*)/g;
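The one-liner works because a match with the /g modifier in list context returns every capture group from every match, in order, and assigning that flat list to a hash pairs each code with its text. No header handling is needed, since the regex only matches text that follows "$$". A small sketch with the question's second sample line hard-coded:

```perl
use strict;
use warnings;

my $line = '000000001 ZZZ  $$aTextinA$$cTextinC';

# List-context m//g yields ('a', 'TextinA', 'c', 'TextinC').
my @pairs  = $line =~ /\$\$(\w)([^\$\n]*)/g;
my %fields = @pairs;    # consecutive elements become key => value

print scalar(@pairs), " captures\n";
print "$_ => $fields{$_}\n" for sort keys %fields;
```

One consequence of [^\$\n]* is that a subfield's text ends at the first "$" character, so this assumes subfield values never contain a literal "$".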



Or, if you really want the values in $subfield_a, $subfield_b and $subfield_c:
  ${"subfield_$1"}=$2 while $line=~/\$\$(.)([^\$\n]*)/g;
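But I would not recommend that method: it relies on symbolic references, which use strict rejects, and it only works with package variables, not lexicals declared with my.
Better might be: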
 ${${{a=>\$subfield_a,b=>\$subfield_b,c=>\$subfield_c}}{$1}}=$2 while $line=~/\$\$(.)([^\$\n]*)/g;
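The dense one-liner above builds a table of references to the three variables and assigns through it. A more readable sketch of the same idea, assuming the variables should start as empty strings and that unknown subfield codes are simply ignored (the one-liner would instead die under use strict when dereferencing an undefined reference):

```perl
use strict;
use warnings;

my $line = '000000001 ZZZ  $$aTextinA$$cTextinC';

# Pre-set defaults so absent subfields end up as "" rather than undef.
my ($subfield_a, $subfield_b, $subfield_c) = ('', '', '');

# Lookup table from subfield code to a reference to the target variable.
my %target = (a => \$subfield_a, b => \$subfield_b, c => \$subfield_c);

while ($line =~ /\$\$(.)([^\$\n]*)/g) {
    # Assign through the reference; skip codes with no matching variable.
    ${ $target{$1} } = $2 if exists $target{$1};
}

print "a=\"$subfield_a\" b=\"$subfield_b\" c=\"$subfield_c\"\n";
```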
Better still would be to use %fields directly instead of $subfield_a, $subfield_b and $subfield_c.
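When working with %fields directly, the question's "empty string if absent" rule can be folded into the hash assignment itself: list the defaults first and the captured pairs afterwards, because a later duplicate key in a hash assignment overwrites an earlier one. A minimal sketch, assuming a, b and c are the only codes that need defaults:

```perl
use strict;
use warnings;

my $line = '000000001 ZZZ  $$aTextinA$$cTextinC';

# Defaults come first; pairs captured from the line come later in the list
# and therefore overwrite them, so only missing subfields keep "".
my %fields = (a => '', b => '', c => '', $line =~ /\$\$(\w)([^\$\n]*)/g);

printf "%s = \"%s\"\n", $_, $fields{$_} for qw(a b c);
```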