Solved

Deleting n number of instances of a pattern after x number of them

Posted on 2013-11-06
Last Modified: 2013-11-08
I have a file that contains data in this form (this is a shortened version):

<product>
<isbn>0000000001</isbn>
<contributor>
<role>A01</role>
<author>Lee, Stan</author>
</contributor>
<contributor>
<role>A01</role>
<author>Steranko, Jim</author>
</contributor>
<contributor>
<role>A01</role>
<author>Adams, Neal</author>
</contributor>
<contributor>
<role>A01</role>
<author>Smith, Barry</author>
</contributor>
</product>

The problem is that sometimes the number of author composites can be as high as 50, if not more, and I only need the first 5. Is there a regular expression (since I'm using Perl) to keep a set number of <contributor> ... </contributor> blocks (say 5) and delete the rest?
Question by:hadrons
8 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 2000 total points
ID: 39628853
$_='
<product>
<isbn>0000000001</isbn>
<contributor>
<role>A01</role>
<author>Lee, Stan</author>
</contributor>
<contributor>
<role>A01</role>
<author>Steranko, Jim</author>
</contributor>
<contributor>
<role>A01</role>
<author>Adams, Neal</author>
</contributor>
<contributor>
<role>A01</role>
<author>Smith, Barry</author>
</contributor>
</product>';
s{(<product>((?!</product>).)*?(<contributor>.*?</contributor>\s*){5})((?!</product>).)*}{$1}sg;
print;
 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39629120
Might I suggest a small modification to the regex above?

s{<product>.*?((<contributor>(.+?)</contributor>){1,5}).*?(?<!</product>)}{$1}sig;

 

Author Comment

by:hadrons
ID: 39630704
Hi, the regular expression worked great when I used it in the form Ozo provided (I did include the modification provided by bigdogman), but when I integrated this expression into a larger script I have (see below), the substitutions weren't made.

#!/usr/bin/perl

use strict;
use Encode qw(encode decode);
use File::Copy;


## start process 5
system ("echo processing part 5 of 10: Editing of parsed files \.\.\.");
my @files = glob("Edit19");
foreach my $file (@files) {
    system ("echo currently processing file: $file");
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>:encoding(UTF-8)', ($file . "edited.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {

        s{(<product>((?!</product>).)*?(<contributor>.*?</contributor>\s*){1,5})((?!</product>).)*}{$1}sg;

        print PARSED;
    }
}
close FILE;
close PARSED;
 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39630810
My apologies, I gave you the wrong group #. Try this one:

s{<product>(?!<\/product>).*?((<contributor>.*?<\/contributor>\s*){1,5})(?!<\/product>).*}{$1}sig;

 

Author Closing Comment

by:hadrons
ID: 39631988
I worked with some files and the pattern match made the substitutions perfectly; there were some problems with the newlines, but that could be due to the engine I'm using.
 
LVL 84

Expert Comment

by:ozo
ID: 39632810
$/ = "";    # paragraph mode: each read returns a whole blank-line-separated block, not a single line
while (<FILE>) {
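
Editor's note: by default, while (<FILE>) reads one line at a time, so no single $_ ever holds a whole <product> block and the multi-line pattern cannot match. Changing $/ makes each read return a larger record. The sketch below is one possible variant, not the method used in the thread: it assumes every record ends with </product> and uses that tag as the record separator, so it does not depend on blank lines between records. The sub names trim_contributors and process_handle are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the accepted regex to one complete <product> record.
sub trim_contributors {
    my ($record) = @_;
    $record =~ s{(<product>((?!</product>).)*?(<contributor>.*?</contributor>\s*){5})((?!</product>).)*}{$1}sg;
    return $record;
}

# Hypothetical driver: read record by record from an already opened handle
# and return the edited text.
sub process_handle {
    my ($in) = @_;
    local $/ = "</product>";    # assumption: every record ends with this tag
    my $out = '';
    while (my $record = <$in>) {
        $out .= trim_contributors($record);
    }
    return $out;
}
```

In the original script this would roughly correspond to setting $/ before the while loop and leaving the substitution and print PARSED unchanged. Records with fewer than five contributors pass through untouched, because each record contains only one product.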
 

Author Comment

by:hadrons
ID: 39634516
Excellent
 

Author Comment

by:hadrons
ID: 39634919
It wasn't shown in the example I gave, but the regular expression is also deleting any additional data between the first 5 contributors and </product>, so for example:

<product>
<isbn>0000000001</isbn>
<contributor>
<role>A01</role>
<author>Lee, Stan</author>
</contributor>
<contributor>
<role>A01</role>
<author>Steranko, Jim</author>
</contributor>
<contributor>
<role>A01</role>
<author>Adams, Neal</author>
</contributor>
<contributor>
<role>A01</role>
<author>Smith, Barry</author>
</contributor>
<additional>
<additional data>1</additional data>
</additional>
</product>

Would leave just this:

<product>
<isbn>0000000001</isbn>
<contributor>
<role>A01</role>
<author>Lee, Stan</author>
</contributor>
<contributor>
<role>A01</role>
<author>Steranko, Jim</author>
</contributor>
<contributor>
<role>A01</role>
<author>Adams, Neal</author>
</contributor>
<contributor>
<role>A01</role>
<author>Smith, Barry</author>
</contributor>
</product>

Whatever was under the contributors is deleted as well; is there a way to correct this?
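
Editor's note: the thread ends without an answer to this follow-up. The data disappears because the final group of the accepted pattern, ((?!</product>).)*, consumes everything between the last kept contributor and </product>, and the replacement puts back only $1. One possible alternative, sketched below and not taken from the thread, is to count <contributor> blocks inside each product and delete only the surplus ones, leaving every other element alone. The sub names trim_product and keep_first_contributors are hypothetical, and the sketch assumes <product> and <contributor> elements are never nested inside themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: within one <product> block, keep the first $keep
# <contributor> elements and delete any later ones. Everything that is not
# a <contributor> block is left untouched.
sub trim_product {
    my ($product, $keep) = @_;
    my $seen = 0;
    $product =~ s{(<contributor>.*?</contributor>\s*)}{ ++$seen <= $keep ? $1 : '' }sge;
    return $product;
}

# Hypothetical driver: apply trim_product to every <product> block in $text.
# The contributor count restarts for each product.
sub keep_first_contributors {
    my ($text, $keep) = @_;
    $text =~ s{(<product>.*?</product>)}{ trim_product($1, $keep) }sge;
    return $text;
}
```

For example, keep_first_contributors($xml, 5) applied to the sample above should return it unchanged, since it has only four contributors and the <additional> block is never touched. This sketch expects the whole text (or at least whole products) in one string, so the input would need to be read in records rather than line by line. Regular expressions on XML remain fragile; an XML parser module may be a more robust choice for larger jobs.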