Perl's split function & regexp

Posted on 2003-03-06
Last Modified: 2013-12-25
I'm parsing data and need to split a large block of text (example below) at #.#, #.##, ##.##, (#), or (##). The marker may be set off by tabs or spaces, or it may sit right next to a word. The same number can also be referred to later in the text, and I don't want to split there. So my question is: how do I use a regular expression in the split function to split at 4.3, for example, grab the text up to 4.4, and make note of the footnote (2)? I've got it grabbing the text up to 4.4, but the 4.3 itself is missing: because split is splitting on it, it isn't retained in the line. Here is how I'm splitting it (I'm a regexp newbie, so don't be too harsh):

********************
Example text:

4.3 Dated as of June 22, 2000, by and among the company, as Issuer, and Group, as Guarantor. (2) 4.4 Amended and Restated Rights Agreement, dated as of February 11, 1999, between the company. (5) 4.5 Specimen of Class A common stock certificate. (2) 10.1 Agreement, dated as of May 31, 1990, between the company, and Amendment thereto. (6)*
***************

# I'm splitting the docs into paragraphs, and if the paragraph contains a #.# (etc) I'll grab it then...
if ($paragraph =~ m/([\s\s|\t][\d+][.|-][\d+|A-Za-z][\s\s|\t])/) {

# then I'll push the split lines into an array for further breakdown
my (@newlines) = split(/\s\s\d+[.|-][\d+|A-Za-z]\s\s/, $paragraph);

****************************
From here however I'd receive:

Dated as of June 22, 2000, by and among the company, as Issuer, and Group, as Guarantor. (2)

...as output.

Any help is truly appreciated!
Question by:buzzbuzz
Accepted Solution

by:
ozo earned 200 total points
ID: 8084375
$paragraph='4.3 Dated as of June 22, 2000, by and among the company, as Issuer, and Group, as
Guarantor. (2) 4.4 Amended and Restated Rights Agreement, dated as of February 11,
1999, between the company. (5) 4.5 Specimen of Class A common stock certificate. (2)
10.1 Agreement, dated as of May 31, 1990, between the company, and Amendment
thereto. (6)*';
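# Instead of split, match each entry lazily up to (but not including) the next
# item number or the end of the string. The lookahead (?=...) does not consume
# the number, so it stays at the start of the following entry.
# Note: inside a character class '|' is a literal pipe, so [.-] is enough here.
# Entries after the first keep their leading whitespace.
my (@newlines) = $paragraph =~ /(.+?)(?=$|\s+\d+[.-][\dA-Za-z]+\s+)/gs;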
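
If you would rather keep using split, the same lookahead idea works there as well. split throws away whatever the pattern matches, so the trick is to match only the whitespace and merely look ahead at the number. The following is a rough sketch, not a tested solution for your real documents: the names @items and @records and the number/text/footnote layout are my own assumptions, and a cross-reference such as "see 4.4 above" in the middle of an entry would still trigger a split.

```perl
use strict;
use warnings;

# Sample text from the question, on one line.
my $paragraph = '4.3 Dated as of June 22, 2000, by and among the company, as Issuer, and Group, as Guarantor. (2) 4.4 Amended and Restated Rights Agreement, dated as of February 11, 1999, between the company. (5) 4.5 Specimen of Class A common stock certificate. (2) 10.1 Agreement, dated as of May 31, 1990, between the company, and Amendment thereto. (6)*';

# Split on whitespace only where an item number such as 4.3 or 10.1 follows.
# The lookahead checks for the number without consuming it, so the number
# stays at the start of each piece instead of being discarded by split.
my @items = split /\s+(?=\d+[.-][\dA-Za-z]+\s)/, $paragraph;

my @records;
for my $item (@items) {
    # Assumed layout: item number, text, optional footnote "(N)" with an
    # optional trailing "*". Pieces that do not fit are skipped.
    if ($item =~ /^(\d+[.-][\dA-Za-z]+)\s+(.*?)\s*(?:\((\d+)\)\*?)?$/s) {
        push @records, { number => $1, text => $2, footnote => $3 };
    }
}

for my $r (@records) {
    my $note = defined $r->{footnote} ? $r->{footnote} : 'none';
    print "$r->{number} | $r->{text} | footnote $note\n";
}
```

On the sample text this gives four pieces, the first starting with "4.3 Dated", and each record carries its own footnote number. As for the original patterns: [\d+] is a character class matching a single character that is a digit or a literal "+", which is why they did not behave like \d+.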