Tom Knowlton
asked on
More RegEx help - splitting a large block of text at a certain point
Consider the following snippet:
FL_FILL_NO_DATA=NO
FL_RIGHT_JUSTIFY=NO
FL_BOUNDARY=NO
FL_UPDATE=S
FL_VERIFY=K
FL_MUST_ENTER=NO
FL_MUST_COMPLETE=NO
FL_INDEX_FIELD=NO
FL_REGISTER=NO
FL_DESKEW=NO
FL_RESERVE_PUNC=NO
FL_FIELD_LIST=NO
FL_BB_NUMBER=0
FL_CD_NUMBER=0
FL_CD_MULTI=NO
FL_VT_NUMBER=0
FL_VT_INVALID=NO
FL_NONDISP=NO
FL_TAB_STOP=NO
FL_RASCY=NO
FL_FASCY=NO
FL_COND_LINK=NO
FL_ORIGINX=-1
FL_ORIGINY=-1
FL_TEXT_BUCKET=
{
FW_BM_URL_HEIGHT=0
FW_BM_URL_WIDTH=0
FW_DISABLE_SG=0
FW_FORMWARE_SG=1
FW_SUBMIT_TYPE=0
FW_GEN_FORM_END=0
FW_URL_WIN=0
FW_ST_DEFAULT=0
FW_MS_MINIMUM=0
FW_MS_MAXIMUM=1
}
FL_EDIT_TYPE=0
FL_ENGINE_USE=18
FL_OCR_OFFSET=0
FL_OCR_LENGTH=0
FL_BLACK=0
FL_WHITE=0
FL_RVERIFY=YES
FL_FEEDIT=NO
FL_FSEDIT=NO
FL_EXPORT_POS=0
FL_PASSWORD=NO
FL_NONKEY=NO
FL_TEXT_FIELD=NO
FL_CREATE_FULL_PAGE_FILE=NO
}
FIELD=
{
FL_NAME=po_number_line_1
FL_TYPE=ANY
FL_LENGTH=50
FL_ROW=21
FL_COL=10
FL_ZONETYPE=KEY
FL_ZONEX=-1
FL_ZONEY=-1
FL_ZONEH=0
FL_ZONEW=0
FL_ZOOM=100
FL_BKCL=16777215
FL_TEXTCL=0
FL_OCR_FONT=REG
FL_OCR_READ_LENGTH=FIXED
FL_OCR_READ_FORMAT=WORD
FL_OCR_CONFIDENCE=50
FL_OCR_ERRORS_ALLOWED=99
FL_OCR_CHARS_INCH=0
FL_OCR_DOTS_INCH=0
FL_LOAD_NEXT_IMAGE=NO
The snippet above is just a small part of the file. There's a ton of text on either side.
But this part:
FIELD=
{
FL_NAME=po_number_line_1
has some special features that I want to take advantage of in order to SPLIT the file.
This is the first location where FL_NAME begins using the "_<<integer>>" pattern. In this case it is "_1" because it is the first FL_NAME entry to use this formatting.
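To make the "_<<integer>>" rule concrete, here is a minimal C# sketch of a test on a single FL_NAME value. `HasIntegerSuffix` is a hypothetical helper, not from the thread, and it assumes names contain only letters, digits and underscores (true for `po_number_line_1`). It is written as C# top-level statements (.NET 6 or later assumed).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only when the whole name ends in "_<digits>".
// The anchors ^ and $ reject names such as "line_1a" or "po_number".
static bool HasIntegerSuffix(string flName) =>
    Regex.IsMatch(flName, @"^\w+_\d+$");

Console.WriteLine(HasIntegerSuffix("po_number_line_1")); // True
Console.WriteLine(HasIntegerSuffix("po_number"));        // False
```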
I want to split the file at the "FIELD=" that immediately precedes the FIRST occurrence of the "_1" naming convention.
So, the first part of the file would end right before "FIELD=".
The second part of the file would begin at the 'F' in "FIELD=".
Does that make sense?
Caution is warranted because there are instances of both FIELD= and FL_NAME that precede this particular point in the file, but none of those instances have FL_NAME entries that end with "underscore integer" naming. I just want to find the first area where this happens and split the file as I described.
Many thanks for your speed.
(Please provide working code in C#. If you can split my little snippet into 2 parts, I think it will work on the larger file as well, so you can use that for your test.)
Tom
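A minimal sketch of the split described above, assuming the whole file fits in memory as one string. Instead of `Regex.Split`, it uses `Regex.Match` to find the first qualifying "FIELD=" and `Substring` to cut there, which always yields exactly two parts. `SplitAtFirstNumberedField` and the sample text are hypothetical, and `\s*` (rather than `\s+`) is a slightly looser choice of ours.

```csharp
using System;
using System.Text.RegularExpressions;

// "FIELD=" opening a block whose FL_NAME ends in "_<digits>".
// Requiring "{" keeps keys such as FL_TEXT_FIELD=NO from matching;
// the trailing \b stops "_1" from matching inside a token like "_1a".
var splitPoint = new Regex(@"FIELD=\s*\{\s*FL_NAME=\w+_\d+\b");

// "Before" ends right before "FIELD="; "Rest" starts at its 'F'.
// If no such block exists, the whole text is returned as "Before".
(string Before, string Rest) SplitAtFirstNumberedField(string text)
{
    Match m = splitPoint.Match(text); // leftmost (first) occurrence only
    return m.Success
        ? (text.Substring(0, m.Index), text.Substring(m.Index))
        : (text, string.Empty);
}

// Hypothetical sample: one unnumbered block precedes the split point.
string sample =
    "FIELD=\n{\nFL_NAME=po_number\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_1\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_2\n}\n";

var (before, rest) = SplitAtFirstNumberedField(sample);
Console.WriteLine(before);
Console.WriteLine(rest);
```

For the real file, pass the result of `File.ReadAllText` instead of `sample`. Because `\s` also matches `\r`, Windows line endings need no special handling.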
ASKER CERTIFIED SOLUTION
ASKER
Great job!
ASKER
Question:
filesplit[0] has the first part of the file.
Is there a way to put all of the rest of the file into filesplit[1]?
Right now filesplit has 101 strings. I just want it to have 2 elements: the first part and the remainder of the file.
Yes, just add the number of elements to the Split call as follows:
var re = new Regex(@"(?=FIELD=\s+{\s+FL_NAME=\w+_\d+)");
var test = File.ReadAllText(@"C:\Temp\test.txt");
string[] filesplit = re.Split(test, 2);
ASKER
That did it.
Thanks!
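Why the count argument fixes the 101-element result: without it, `Regex.Split` cuts before every block whose FL_NAME ends in an integer, not just the first one. With a count of 2, the split happens at most once, so the remainder stays in the second element. The `(?=...)` lookahead is zero-width, so "FIELD=" is not consumed and stays at the start of the second piece. The sketch below uses the thread's pattern with `{` escaped and a trailing `\b` added (our tweak); the sample text is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Zero-width lookahead: marks a position without consuming "FIELD=",
// so the matched text stays at the start of the following piece.
var re = new Regex(@"(?=FIELD=\s+\{\s+FL_NAME=\w+_\d+\b)");

// Hypothetical input: one unnumbered block, then two numbered ones.
string text =
    "FIELD=\n{\nFL_NAME=po_number\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_1\n}\n" +
    "FIELD=\n{\nFL_NAME=po_number_line_2\n}\n";

string[] all = re.Split(text);    // splits before every numbered block
string[] two = re.Split(text, 2); // at most one split, so at most two pieces

Console.WriteLine(all.Length); // 3
Console.WriteLine(two.Length); // 2
```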