
More RegEx help - splitting a large block of text at a certain point

Consider the following snippet:

		FL_FILL_NO_DATA=NO
		FL_RIGHT_JUSTIFY=NO
		FL_BOUNDARY=NO
		FL_UPDATE=S
		FL_VERIFY=K
		FL_MUST_ENTER=NO
		FL_MUST_COMPLETE=NO
		FL_INDEX_FIELD=NO
		FL_REGISTER=NO
		FL_DESKEW=NO
		FL_RESERVE_PUNC=NO
		FL_FIELD_LIST=NO
		FL_BB_NUMBER=0
		FL_CD_NUMBER=0
		FL_CD_MULTI=NO
		FL_VT_NUMBER=0
		FL_VT_INVALID=NO
		FL_NONDISP=NO
		FL_TAB_STOP=NO
		FL_RASCY=NO
		FL_FASCY=NO
		FL_COND_LINK=NO
		FL_ORIGINX=-1
		FL_ORIGINY=-1
		FL_TEXT_BUCKET=
{
FW_BM_URL_HEIGHT=0

FW_BM_URL_WIDTH=0

FW_DISABLE_SG=0

FW_FORMWARE_SG=1

FW_SUBMIT_TYPE=0

FW_GEN_FORM_END=0

FW_URL_WIN=0

FW_ST_DEFAULT=0

FW_MS_MINIMUM=0

FW_MS_MAXIMUM=1


}
		FL_EDIT_TYPE=0
		FL_ENGINE_USE=18
		FL_OCR_OFFSET=0
		FL_OCR_LENGTH=0
		FL_BLACK=0
		FL_WHITE=0
		FL_RVERIFY=YES
		FL_FEEDIT=NO
		FL_FSEDIT=NO
		FL_EXPORT_POS=0
		FL_PASSWORD=NO
		FL_NONKEY=NO
		FL_TEXT_FIELD=NO
		FL_CREATE_FULL_PAGE_FILE=NO
	}
	FIELD=
	{
		FL_NAME=po_number_line_1
		FL_TYPE=ANY
		FL_LENGTH=50
		FL_ROW=21
		FL_COL=10
		FL_ZONETYPE=KEY
		FL_ZONEX=-1
		FL_ZONEY=-1
		FL_ZONEH=0
		FL_ZONEW=0
		FL_ZOOM=100
		FL_BKCL=16777215
		FL_TEXTCL=0
		FL_OCR_FONT=REG
		FL_OCR_READ_LENGTH=FIXED
		FL_OCR_READ_FORMAT=WORD
		FL_OCR_CONFIDENCE=50
		FL_OCR_ERRORS_ALLOWED=99
		FL_OCR_CHARS_INCH=0
		FL_OCR_DOTS_INCH=0
		FL_LOAD_NEXT_IMAGE=NO


The snippet above is just a small part of the file; there is a lot of text on either side.

But this part:

FIELD=
{
FL_NAME=po_number_line_1

has some special features that I want to take advantage of in order to SPLIT the file.

This is the first place where FL_NAME uses the "_<<integer>>" pattern. Here it is "_1", because it is the first FL_NAME entry to use this format.

I want to split the file at the instance of "FIELD=" that immediately precedes the FIRST occurrence of the "_1" naming convention.

So, the first part of the file would end right before "FIELD=".

The second part of the file would begin at the 'F' in "FIELD=".

Does that make sense?

Caution is warranted because there are instances of both FIELD= and FL_NAME that precede this particular point in the file, but none of those instances have FL_NAME entries that end with "underscore integer" naming. I just want to find the first area where this happens and split the file as I described.
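The rule above (split before the first "FIELD=" whose FL_NAME ends in "_<digits>", skipping earlier unnumbered fields) can be sketched in C# with a single Regex.Match plus Substring, as an alternative to Regex.Split. This is a minimal sketch, not the accepted answer's code: it assumes C# 9 or later (top-level statements), and the helper name SplitAtFirstNumberedField and the trimmed sample input are hypothetical. The trailing \b is an added assumption that makes the name stop at the digits, so a name like "x_1y" is not treated as numbered.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, trimmed-down input: an earlier FIELD= whose FL_NAME does not end
// in "_<digits>", followed by the first numbered field (po_number_line_1).
string text =
    "\tFIELD=\n\t{\n\t\tFL_NAME=vendor_name\n\t\tFL_TYPE=ANY\n\t}\n" +
    "\tFIELD=\n\t{\n\t\tFL_NAME=po_number_line_1\n\t\tFL_TYPE=ANY\n";

var (first, rest) = SplitAtFirstNumberedField(text);
Console.WriteLine(first);
Console.WriteLine("----- split here -----");
Console.WriteLine(rest);

// Hypothetical helper: returns the text before the first "FIELD=" whose block opens
// with an FL_NAME ending in "_<digits>", and the text from that 'F' onward.
// If no such field exists, the whole input is returned as First and Rest is empty.
static (string First, string Rest) SplitAtFirstNumberedField(string input)
{
    // \s+ spans the newlines/tabs between "FIELD=", "{" and "FL_NAME=".
    // \b after \d+ requires the name to stop at the digits (rejects e.g. "x_1y").
    var pattern = new Regex(@"FIELD=\s+\{\s+FL_NAME=\w+_\d+\b");
    Match m = pattern.Match(input); // Match returns only the first occurrence
    if (!m.Success)
        return (input, string.Empty);
    return (input.Substring(0, m.Index), input.Substring(m.Index));
}
```

Because m.Index is the position of the 'F' in "FIELD=", the first part ends right before "FIELD=" and the second part begins with it, and concatenating the two parts reproduces the original text.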

Many thanks in advance for a quick reply.

(Please provide working code in C#. If you can split my little snippet into two parts, I think it will work on the larger file as well, so you can use that for your test.)

Tom

ASKER CERTIFIED SOLUTION

wdosanjos


Tom Knowlton

ASKER

This works, thanks!

Question:

filesplit[0] has the first part of the file.

Is there a way to put all of the rest of the file into filesplit[1]?

Right now filesplit has 101 strings. I just want it to have two elements: the first part and the remainder of the file.

Tom Knowlton

ASKER

Great job!

wdosanjos

Yes, just pass the maximum number of elements (the count argument) to the Split call, as follows:

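using System.IO;
using System.Text.RegularExpressions;

// Zero-width lookahead: split just before each "FIELD=" whose FL_NAME has "_<digits>" (e.g. po_number_line_1)
var re = new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");
var test = File.ReadAllText(@"C:\Temp\test.txt");
// count = 2: stop after the first split, so filesplit[1] holds the whole remainder
string[] filesplit = re.Split(test, 2);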

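A small, self-contained sketch of why the call without a count returned so many strings (101 in the asker's file): the lookahead matches before every numbered "FIELD=", so N matches give N + 1 pieces, while Split(input, 2) stops after the first match. It also shows that when nothing matches, Split returns a one-element array, so checking Length before reading filesplit[1] avoids an IndexOutOfRangeException. The sample input is hypothetical and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: one unnumbered field, then three numbered fields (_1, _2, _3).
string text =
    "HEADER\nFIELD=\n{\nFL_NAME=vendor_name\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_1\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_2\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_3\n}\n";

// Same zero-width lookahead as the answer: it consumes no characters, so
// "FIELD=" stays at the start of the piece that follows each split point.
var re = new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");

// No count: the input is split at every match (3 matches -> 4 pieces).
string[] all = re.Split(text);

// count = 2: at most 2 pieces; everything after the first match stays in [1].
string[] two = re.Split(text, 2);

Console.WriteLine(all.Length);
Console.WriteLine(two.Length);

// With no numbered field, Split returns the whole input as a single element,
// so check Length before reading [1].
string[] none = re.Split("FIELD=\n{\nFL_NAME=vendor_name\n}\n", 2);
Console.WriteLine(none.Length);
```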

Tom Knowlton

ASKER

That did it.

Thanks!