More RegEx help - splitting a large block of text at a certain point

Tom Knowlton
Consider the following snippet

		FL_FILL_NO_DATA=NO
		FL_RIGHT_JUSTIFY=NO
		FL_BOUNDARY=NO
		FL_UPDATE=S
		FL_VERIFY=K
		FL_MUST_ENTER=NO
		FL_MUST_COMPLETE=NO
		FL_INDEX_FIELD=NO
		FL_REGISTER=NO
		FL_DESKEW=NO
		FL_RESERVE_PUNC=NO
		FL_FIELD_LIST=NO
		FL_BB_NUMBER=0
		FL_CD_NUMBER=0
		FL_CD_MULTI=NO
		FL_VT_NUMBER=0
		FL_VT_INVALID=NO
		FL_NONDISP=NO
		FL_TAB_STOP=NO
		FL_RASCY=NO
		FL_FASCY=NO
		FL_COND_LINK=NO
		FL_ORIGINX=-1
		FL_ORIGINY=-1
		FL_TEXT_BUCKET=
{
FW_BM_URL_HEIGHT=0

FW_BM_URL_WIDTH=0

FW_DISABLE_SG=0

FW_FORMWARE_SG=1

FW_SUBMIT_TYPE=0

FW_GEN_FORM_END=0

FW_URL_WIN=0

FW_ST_DEFAULT=0

FW_MS_MINIMUM=0

FW_MS_MAXIMUM=1


}
		FL_EDIT_TYPE=0
		FL_ENGINE_USE=18
		FL_OCR_OFFSET=0
		FL_OCR_LENGTH=0
		FL_BLACK=0
		FL_WHITE=0
		FL_RVERIFY=YES
		FL_FEEDIT=NO
		FL_FSEDIT=NO
		FL_EXPORT_POS=0
		FL_PASSWORD=NO
		FL_NONKEY=NO
		FL_TEXT_FIELD=NO
		FL_CREATE_FULL_PAGE_FILE=NO
	}
	FIELD=
	{
		FL_NAME=po_number_line_1
		FL_TYPE=ANY
		FL_LENGTH=50
		FL_ROW=21
		FL_COL=10
		FL_ZONETYPE=KEY
		FL_ZONEX=-1
		FL_ZONEY=-1
		FL_ZONEH=0
		FL_ZONEW=0
		FL_ZOOM=100
		FL_BKCL=16777215
		FL_TEXTCL=0
		FL_OCR_FONT=REG
		FL_OCR_READ_LENGTH=FIXED
		FL_OCR_READ_FORMAT=WORD
		FL_OCR_CONFIDENCE=50
		FL_OCR_ERRORS_ALLOWED=99
		FL_OCR_CHARS_INCH=0
		FL_OCR_DOTS_INCH=0
		FL_LOAD_NEXT_IMAGE=NO

The snippet above is just a small part of the file.  There is a lot more text on either side.

But this part:


FIELD=
      {
            FL_NAME=po_number_line_1


has some special features that I want to take advantage of in order to SPLIT the file.


This is the first location where FL_NAME begins using the "_<<integer>>" pattern.  In this case it is "_1", because it is the first FL_NAME entry to use this formatting.


I want to split the file at the instance of "FIELD=" that immediately precedes the FIRST occurrence of the "_1" naming convention.

So, the first part of the file would end right before "FIELD="

The second part of the file would begin at the 'F' in "FIELD="

Does that make sense?


Caution is warranted because there are instances of both FIELD= and FL_NAME that precede this particular point in the file, but none of those instances have FL_NAME entries that end with "underscore integer" naming.  I just want to find the first area where this happens and split the file as I described.


Many thanks for your speed.

(Please provide working code in C#.  If you can split my little snippet into 2 parts, I think it will work on the larger file as well, so you can use that for your test.)


Tom
Top Expert 2011
Commented:
Please try:

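using System.IO;
using System.Text.RegularExpressions;

// Zero-width lookahead: matches the position just before a "FIELD=" whose
// FL_NAME value contains an underscore followed by digits.
var re = new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");

var test = File.ReadAllText(@"C:\Temp\test.txt");

// Split at every such position; "FIELD=" stays at the start of each later piece.
string[] filesplit = re.Split(test);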
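
To see why the lookahead keeps "FIELD=" in the second piece, here is a minimal sketch that runs the same pattern against a short inline string instead of C:\Temp\test.txt. The sample text (including the `vendor` field name) is made up for illustration, the brace is escaped as `\{` purely for readability, and the code assumes C# 9 or later top-level statements; in an older project the same lines would go inside `Main`.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample: one FIELD block without a numbered name, then two with one.
string sample =
    "FIELD=\n{\n\tFL_NAME=vendor\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_1\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_2\n}\n";

// (?=...) matches a position rather than characters, so Split removes nothing:
// each piece after the first begins with the "FIELD=" that triggered the split.
var re = new Regex(@"(?=FIELD=\s+\{\s+FL_NAME=\w+_\d+)");

string[] parts = re.Split(sample);

Console.WriteLine(parts.Length);
foreach (string part in parts)
{
    Console.WriteLine("---- piece ----");
    Console.WriteLine(part);
}
```

Because the pattern matches before every numbered FIELD block, not just the first, the array grows by one element per such block.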


Tom Knowlton, Web developer

Author

Commented:
This works, thanks!


Question:

filesplit[0] has the first part of the file.

Is there a way to put all of the rest of the file into

filesplit[1]

???

Right now filesplit has 101 strings.  I just want it to have 2 elements; the first part and the remainder of the file.
Tom Knowlton, Web developer

Author

Commented:
Great job!
Top Expert 2011

Commented:
Yes, just pass the maximum number of elements as the second argument to the Split call, as follows:

var re = new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");

var test = File.ReadAllText(@"C:\Temp\test.txt");

// The second argument caps the result at 2 elements; the last one holds the unsplit remainder.
string[] filesplit = re.Split(test, 2);
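
A small sketch of the difference the count argument makes, again using a made-up inline sample rather than the real file. It assumes C# 9 or later top-level statements, and the variable names `all` and `two` are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample: one plain FIELD block followed by three numbered ones.
string sample =
    "FIELD=\n{\n\tFL_NAME=vendor\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_1\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_2\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_3\n}\n";

var re = new Regex(@"(?=FIELD=\s+\{\s+FL_NAME=\w+_\d+)");

// Without a count, Split cuts at every numbered FIELD block.
string[] all = re.Split(sample);

// With count = 2, Split stops after the first match and leaves
// the rest of the input untouched in the last element.
string[] two = re.Split(sample, 2);

Console.WriteLine(all.Length);
Console.WriteLine(two.Length);
Console.WriteLine(two[0] + two[1] == sample);
```

Since the lookahead consumes no characters, joining the two elements back together reproduces the original text.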


Tom Knowlton, Web developer

Author

Commented:
That did it.

Thanks!
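
When only two parts are ever needed, the same cut can also be made without Split, by locating the first match and slicing at its index. The sketch below is an alternative, not what was posted in this thread: the helper name `SplitAtFirstNumberedField` is hypothetical, the sample text is made up, and the trailing `\b` is an added assumption that tightens the pattern so that a name like `line_1a` (digits not at the end) does not count. It assumes C# 9 or later top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text before and from the first FIELD block
// whose FL_NAME ends in an underscore followed by digits.
static (string Head, string Tail) SplitAtFirstNumberedField(string text)
{
    // No lookahead needed here: we only use the position where the match starts.
    Match m = Regex.Match(text, @"FIELD=\s+\{\s+FL_NAME=\w+_\d+\b");
    if (!m.Success)
    {
        // No numbered FL_NAME found: everything stays in the first part.
        return (text, string.Empty);
    }
    return (text.Substring(0, m.Index), text.Substring(m.Index));
}

// Made-up sample: "line_1a" must not trigger the split, "po_number_line_1" must.
string sample =
    "FIELD=\n{\n\tFL_NAME=line_1a\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_1\n}\n" +
    "FIELD=\n{\n\tFL_NAME=po_number_line_2\n}\n";

var (head, tail) = SplitAtFirstNumberedField(sample);

Console.WriteLine(head);
Console.WriteLine("---- split point ----");
Console.WriteLine(tail);
```

This avoids building an array at all and makes the "no match" case explicit, at the cost of a few more lines than the one-call Split version.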
