Perl: extract "subfields" from a text string


I have a file with contents as follows:

000000001 ZZZ  $$aTextinA$$bTextinB$$cTextinC
There will be several of these lines, but I have included just one for clarity.

For each line, I want to split the data into variables/fields using a Perl script.  Each field is delimited by "$$".  So, for example, from the above I want to end up with:

$subfield_a = "TextinA"
$subfield_b = "TextinB"
$subfield_c = "TextinC"

Equally, if, say, the line was as follows:

000000001 ZZZ  $$aTextinA$$cTextinC
I would want to end up with:

$subfield_a = "TextinA"
$subfield_b = ""
$subfield_c = "TextinC"
i.e. if a field is not present (in this case $$b), its variable should be set to an empty string.
I'm currently doing that as follows (while looping round the file):

@fields = split ('\$\$',$line);
@suba = grep {/^a/} @fields;
if ( $suba[0] =~ /^a(.*)/ ) {
  $subfield_a = $1;
} else {
  $subfield_a = "";
}
@subb = grep {/^b/} @fields;
if ( $subb[0] =~ /^b(.*)/ ) {
  $subfield_b = $1;
} else {
  $subfield_b = "";
}
@subc = grep {/^c/} @fields;
if ( $subc[0] =~ /^c(.*)/ ) {
  $subfield_c = $1;
} else {
  $subfield_c = "";
}

But I'm guessing there is a far more efficient way to achieve this?
wilcoxon commented:
Try this...
my @tfields = split '\$\$', $line;
shift @tfields; # get rid of "header"
my %fields = (map { m{(\w)(.*)}; $1 => $2 } @tfields);

The %fields hash now looks like this for your two examples:
a => TextinA
b => TextinB
c => TextinC

a => TextinA
c => TextinC
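
For reference, here is a minimal sketch of how the split/map approach above might sit inside a loop over the lines. The parse_subfields helper, the sample @lines array and the fallback to an empty string are assumptions made for illustration, not part of the original snippet. The match is also guarded so that a chunk without a leading word character is skipped rather than reusing stale $1/$2 values.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): parse one record line
# into a list of subfield code => text pairs.
sub parse_subfields {
    my ($line) = @_;
    chomp $line;
    my @tfields = split /\$\$/, $line;
    shift @tfields;    # drop the "000000001 ZZZ  " header part
    # Keep only chunks that start with a word character (the subfield code).
    return map { /^(\w)(.*)/s ? ($1 => $2) : () } @tfields;
}

# Sample data copied from the question; a real script would read a file.
my @lines = (
    '000000001 ZZZ  $$aTextinA$$bTextinB$$cTextinC',
    '000000001 ZZZ  $$aTextinA$$cTextinC',
);

for my $line (@lines) {
    my %fields = parse_subfields($line);
    # A missing subfield has no key, so fall back to "" when reading it.
    for my $code (qw(a b c)) {
        my $text = exists $fields{$code} ? $fields{$code} : '';
        print "$code => '$text'\n";
    }
    print "\n";
}
```

With the second sample line, b is reported as an empty string, which is the behaviour the question asked for.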

Let me know if you have any questions...
yelbow (Author) commented:
Perfect, an awful lot cleaner - thanks so much
my %fields = $line=~/\$\$(\w)([^\$\n]*)/g;

Or, if you really want the values in $subfield_a, $subfield_b and $subfield_c:
  ${"subfield_$1"}=$2 while $line=~/\$\$(.)([^\$\n]*)/g;
but I would not recommend that method: it relies on symbolic references, which only reach package variables and are rejected under "use strict".
Better might be:
  ${${{a=>\$subfield_a,b=>\$subfield_b,c=>\$subfield_c}}{$1}}=$2 while $line=~/\$\$(.)([^\$\n]*)/g;
Better still would be to use %fields instead of $subfield_a, $subfield_b and $subfield_c.
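
One possible way to get the empty-string default from the question while still using a hash is to list the defaults first and let the matched pairs override them. This is only a sketch: the sample $line and $tricky strings and the %loose hash are assumptions for illustration. It also shows a caveat of the [^\$\n]* pattern, which stops at any single "$" in the text, along with a lookahead variant that only stops at the full "$$" delimiter.

```perl
use strict;
use warnings;

my $line = '000000001 ZZZ  $$aTextinA$$cTextinC';

# Defaults come first; pairs returned by the match override them.
my %fields = (
    a => '', b => '', c => '',
    $line =~ /\$\$(\w)([^\$\n]*)/g,
);
print "$_ => '$fields{$_}'\n" for sort keys %fields;

# Caveat: [^\$\n]* stops at any single "$", so text such as "Price $5"
# would be cut short. A lookahead for the full "$$" delimiter avoids that.
my $tricky = '000000002 ZZZ  $$aPrice $5$$bX';    # made-up sample line
my %loose  = $tricky =~ /\$\$(\w)(.*?)(?=\$\$|\n|\z)/g;
print "$_ => '$loose{$_}'\n" for sort keys %loose;
```

Whether the lookahead variant is needed depends on whether a lone "$" can ever appear inside a subfield in the real data.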