madacebo
Regex problem: period not matching newline in s///s?
Here's my Perl script:
#!C:/perl/bin/perl.exe -Wpi.orig
s#<RD:Documento>([0-9]{1,4})( D\.?P\.?R\.? )([0-9]{1,4})(.*Juez.*</TA>)#<RD:Documento>$1$2$3$4\n<CM>HTMLINSERT:<a href="/Download/DPR4/Sumarios_Escolios/Tomo$1/$1DPR$3-Escolio.pdf" target=centerframe><center><font face=Arial size=1>Ver Escolio</A></CM>\n#s;
And here's my text file, between but not including "---"s:
---
<RD>Así lo pronunció y manda el Tribunal y lo certifica el señor Secretario General. El Juez Presidente señor Andréu García y el Juez Asociado señor Negrón García inhibidos. El Juez Asociado señor Rebollo López no intervino. <CM>LineFeed</CM>
<RD><CM>LineFeed</CM>
<RD>Francisco R. Agrait Lladó, Secretario General <CM>LineFeed</CM>
<RD><CM>LineFeed</CM>
<RD><PB>
<RD>
<RD:Documento>135 DPR 259 -- In Re: Colton, Villanueva, Miró, Figueroa, Brunet
<RD>
<RD>
<TA:1,1; JU:LF; BR:AL:0.00972222,0.0298611,HZ:0.00972222,0.0298611,VT:0.00972222,0.0298611>
<RO><CE: HI; VA:CN><JU:CN><CM>HTMLINSERT:<a href="/Download/DPR4/135/135_259.DOC" target=centerframe><center><font face=Arial size=1>Grabar la Decisión</A></CM>
</TA>
<RD>
<TA:2,1.5,3.5; JU:CN; BR:AL:0.00972222,0.0298611,HZ:0.00972222,0.0298611,VT:0.00972222,0.0298611; SD:255,255,0>
<RO><CE: MR:1; SD:0,0,128><JU:CN><CM>C Ta info</CM>
<BD+><FC:255,255,255><BC:DC>Información del Documento<IT+><FC><BC></CE>
<RO><CE><FT:Arial,SR><PT:8>Partes:<BD></CE><CE><FD:Nombre>In re: Pedro Colton Fontán, Osvaldo Villanueva Díaz, Aurelio Miró Carrión, Angel Figueroa Vivas, Juan E. Brunet Justiniano</FD:Nombre></CE>
<RO><CE><BD+>Fecha:<BD></CE><CE><FD:Fecha><HD+>3/4/1994</FD:Fecha><HD>4 de marzo de 1994<BD+></CE>
<RO><CE>Cita:<BD></CE><CE><FD:DocID><HD+> 135DPR259</FD:DocID><HD-> 135 DPR 259</CE>
<RO><CE><BD+><HD>Juez:<BD><HD-></CE><CE> <FD:Autor><BD-> Resolución</FD:Autor><BD+></TA>
<RD>
<RD:Seccion>Opinión
<RD>
<RD:Pagina>Página: 259
<RD>
<RD>CE-86-666<CM>MJSEP</CM>
<HR>
<RD><CM>MJSEPEND</CM>
Conducta Profesional<CM>MJSEP</CM>
<HR>
---
According to The Regex Coach, an app I downloaded, the regex matches a good chunk of the file: from the beginning of the line starting with <RD:Documento> to the end of the line that contains "Juez". The script, however, does nothing. When I trim the regex down to pinpoint the problem, it matches as I expect up to the end of the third capture group, but as soon as I add the fourth, (.*Juez.*</TA>), it stops working. Does it have something to do with the newlines that should be matched by that expression?
Thanks!
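One way to check whether the newlines are really the issue is to compare what the pattern sees one line at a time with what it sees when the file is a single string. The sketch below is a test under stated assumptions, not a confirmed diagnosis: @lines is a shortened, hypothetical stand-in for the sample file, and $re is only the match half of the s###s above. It relies on two documented behaviors: -p puts one input record (by default one line) in $_ per pass, and /s only lets "." match a newline that is already in the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, hypothetical stand-in for the sample file (not the full data).
my @lines = (
    "<RD:Documento>135 DPR 259 -- In Re: Colton\n",
    "<RD>\n",
    "<RO><CE><BD+><HD>Juez:<BD><HD-></CE> </TA>\n",
    "<RD>\n",
);

# Match half of the substitution, with the same /s modifier.
my $re = qr{<RD:Documento>([0-9]{1,4})( D\.?P\.?R\.? )([0-9]{1,4})(.*Juez.*</TA>)}s;

# Case 1: what -p does by default. Each pass sees a single line, so
# "<RD:Documento>" and "Juez ... </TA>" are never in the string together.
my $line_hits = grep { $_ =~ $re } @lines;

# Case 2: the whole input as one string, newlines included.
my $whole = join '', @lines;
my $slurp_hit = ( $whole =~ $re ) ? 1 : 0;

print "line-by-line matches: $line_hits\n";
print "whole-file matches:   $slurp_hit\n";
```

If the first count is 0 and the second is 1, then /s is doing its job and the input is simply arriving one line at a time. In that case, adding -0777 (slurp mode) to the switches would be one thing to try, since it makes -p read the whole file as a single record.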