Asked by Tom Knowlton

More RegEx help - splitting a large block of text at a certain point

Consider the following snippet

		FL_FILL_NO_DATA=NO
		FL_RIGHT_JUSTIFY=NO
		FL_BOUNDARY=NO
		FL_UPDATE=S
		FL_VERIFY=K
		FL_MUST_ENTER=NO
		FL_MUST_COMPLETE=NO
		FL_INDEX_FIELD=NO
		FL_REGISTER=NO
		FL_DESKEW=NO
		FL_RESERVE_PUNC=NO
		FL_FIELD_LIST=NO
		FL_BB_NUMBER=0
		FL_CD_NUMBER=0
		FL_CD_MULTI=NO
		FL_VT_NUMBER=0
		FL_VT_INVALID=NO
		FL_NONDISP=NO
		FL_TAB_STOP=NO
		FL_RASCY=NO
		FL_FASCY=NO
		FL_COND_LINK=NO
		FL_ORIGINX=-1
		FL_ORIGINY=-1
		FL_TEXT_BUCKET=
{
FW_BM_URL_HEIGHT=0

FW_BM_URL_WIDTH=0

FW_DISABLE_SG=0

FW_FORMWARE_SG=1

FW_SUBMIT_TYPE=0

FW_GEN_FORM_END=0

FW_URL_WIN=0

FW_ST_DEFAULT=0

FW_MS_MINIMUM=0

FW_MS_MAXIMUM=1


}
		FL_EDIT_TYPE=0
		FL_ENGINE_USE=18
		FL_OCR_OFFSET=0
		FL_OCR_LENGTH=0
		FL_BLACK=0
		FL_WHITE=0
		FL_RVERIFY=YES
		FL_FEEDIT=NO
		FL_FSEDIT=NO
		FL_EXPORT_POS=0
		FL_PASSWORD=NO
		FL_NONKEY=NO
		FL_TEXT_FIELD=NO
		FL_CREATE_FULL_PAGE_FILE=NO
	}
	FIELD=
	{
		FL_NAME=po_number_line_1
		FL_TYPE=ANY
		FL_LENGTH=50
		FL_ROW=21
		FL_COL=10
		FL_ZONETYPE=KEY
		FL_ZONEX=-1
		FL_ZONEY=-1
		FL_ZONEH=0
		FL_ZONEW=0
		FL_ZOOM=100
		FL_BKCL=16777215
		FL_TEXTCL=0
		FL_OCR_FONT=REG
		FL_OCR_READ_LENGTH=FIXED
		FL_OCR_READ_FORMAT=WORD
		FL_OCR_CONFIDENCE=50
		FL_OCR_ERRORS_ALLOWED=99
		FL_OCR_CHARS_INCH=0
		FL_OCR_DOTS_INCH=0
		FL_LOAD_NEXT_IMAGE=NO

The snippet above is just a small part of the file.  There is a lot of text on either side.

But this part:


FIELD=
      {
            FL_NAME=po_number_line_1


has some special features that I want to take advantage of in order to SPLIT the file.


This is the first location where FL_NAME begins using the "_<<integer>>" pattern, in this case it is "_1" because it is the first FL_NAME entry to use this formatting.


Where I want to split the file is at the instance of  "FIELD=" that immediately precedes
the FIRST occurrence of the "_1" naming convention.

So, the first part of the file would end right before "FIELD="

The second part of the file would begin at the 'F' in "FIELD="

Does that make sense?


Caution is warranted because there are instances of both FIELD= and FL_NAME that precede this particular point in the file, but none of those instances have FL_NAME entries that end with "underscore integer" naming.  I just want to find the first area where this happens and split the file as I described.


Many thanks for your speed.

(Please provide working code in C#.  If you can split my little snippet into 2 parts, I think it will work on the larger file as well, so you can use that for your test.)


Tom
Accepted solution by wdosanjos (available to Experts Exchange members only)

Tom Knowlton (asker):

This works, thanks!


Question:

filesplit[0] has the first part of the file.

Is there a way to put all of the rest of the file into

filesplit[1]

???

Right now filesplit has 101 strings.  I just want it to have 2 elements; the first part and the remainder of the file.
Great job!

Yes, just pass the maximum number of elements as the second argument to the Split call, as follows:

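using System.IO;
using System.Text.RegularExpressions;

// Zero-width lookahead: matches the position just before a FIELD= block
// whose FL_NAME contains an underscore followed by digits (e.g. "_1").
var re = new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");

var test = File.ReadAllText(@"C:\Temp\test.txt");

// The second argument caps the result at 2 strings; filesplit[1] holds the rest of the file.
string[] filesplit = re.Split(test, 2);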
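
For anyone who wants to try this without the original file, here is a minimal self-contained sketch that applies the same lookahead regex to a small in-memory sample. The class name `FieldSplitter`, the helper `SplitAtFirstNumberedField`, and the sample text (including the `vendor_name` field) are made up for illustration and are not from the thread; only the pattern and the `Split(input, 2)` call come from the reply above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldSplitter
{
    // Same pattern as in the reply above. The lookahead consumes no characters,
    // so "FIELD=" stays at the start of the second piece.
    private static readonly Regex SplitPoint =
        new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");

    // Hypothetical helper: returns at most two strings, the text before the
    // first matching FIELD= block and everything from that "FIELD=" onward.
    // If nothing matches, the array holds the whole input as its only element.
    public static string[] SplitAtFirstNumberedField(string text)
    {
        return SplitPoint.Split(text, 2);
    }

    public static void Main()
    {
        // Made-up sample shaped like the data in the question: one FIELD= block
        // without a numeric suffix, then two blocks that have one.
        string sample =
            "\tFIELD=\n\t{\n\t\tFL_NAME=vendor_name\n\t}\n" +
            "\tFIELD=\n\t{\n\t\tFL_NAME=po_number_line_1\n\t}\n" +
            "\tFIELD=\n\t{\n\t\tFL_NAME=po_number_line_2\n\t}\n";

        string[] parts = SplitAtFirstNumberedField(sample);

        Console.WriteLine(parts.Length);                                        // 2
        Console.WriteLine(parts[1].StartsWith("FIELD=", StringComparison.Ordinal)); // True
        Console.WriteLine(parts[0].Contains("vendor_name"));                    // True
    }
}
```

Note that `\w+_\d+` only requires an underscore followed by digits somewhere in the name, so a name such as `abc_2def` would also match. If that can occur in the real file, appending `\b` after `\d+` is one way to require that the digits end the name.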


That did it.

Thanks!