Solved

Gawk - Find all occurrences of a string within XML - Also include 100 bytes before and after the match

Posted on 2015-02-09
Last Modified: 2015-02-10
On Windows, I am currently using gawk to find the first occurrence of a string, plus the 100 bytes that follow it, in every XML file within a directory:

gawk "/[some string]/ { match($0, /[some string]/); print substr($0, RSTART, RLENGTH + 100) FILENAME; }" C:\XML*.xml > C:\Results.txt

What I would like to do now is output all the matches (not just the first) to C:\Results.txt for each XML and also include 100 characters before the match + 100 characters after the match.

Is it possible to easily change this to get the desired results?

I understand that gawk might not be the best tool for the job, but this is just a one-time task, and if it is slow I can let it run overnight.
Question by:Mr P

Accepted Solution

by:
ozo earned 500 total points
If the 100 characters are on the same line as the match, you can use
match ( $0, /some string/){print substr($0,RSTART-100,RLENGTH + 200)FILENAME; }

If there can be more than one match on a line, and the matches are at least 100 characters apart, you might use
/some string/{while(match ( $0, /some string/)){ print substr($0,RSTART-100,RLENGTH + 200) FILENAME; $0=substr($0,RSTART+1)} }
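
If matches can sit closer than 100 characters to each other or to the start of the line, one possible variant keeps the original line intact and tracks a search position instead of truncating $0, and clamps the start of the context to position 1 explicitly. This is only a sketch, not part of the answer above: the function name show_context, the variables pat and ctx, and the tab printed before FILENAME are assumptions made for illustration. It is written as a POSIX shell function; on Windows the awk program between the single quotes could instead be saved to a file and run with gawk -f.

```shell
# Hypothetical helper (not from the answer above): print every match of a
# regex with up to CTX characters of same-line context on each side.
# Usage: show_context REGEX CTX FILE...
show_context() {
  pat=$1
  ctx=$2
  shift 2
  # Prefer gawk, as in the question; fall back to any POSIX awk.
  awk_bin=$(command -v gawk || command -v awk)
  "$awk_bin" -v pat="$pat" -v ctx="$ctx" '
  {
    line = $0   # keep the whole line so earlier context stays available
    pos = 1     # position in line where the next search starts
    while (pos <= length(line) && match(substr(line, pos), pat)) {
      start = pos + RSTART - 1          # match position within the full line
      from = start - ctx
      if (from < 1) from = 1            # clamp at the start of the line
      to = start + RLENGTH - 1 + ctx    # substr stops at the end of the line
      print substr(line, from, to - from + 1) "\t" FILENAME
      pos = start + 1                   # resume one character after the match start
    }
  }' "$@"
}
```

For example, show_context 'some string' 100 *.xml > Results.txt would write one line per match. Like the one-liners above, it only sees context on the same line as the match, and because the pattern is passed as a string, any backslashes in it may need doubling.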

Author Closing Comment

by:Mr P
This worked great.  Thank you, Ozo.