
Gawk - Find all occurrences of a string within XML - Also include 100 bytes before and after the match

On Windows, I am currently using gawk to find the first occurrence of a string, plus the 100 bytes that follow it, in every XML file within a directory:

gawk "/some string/ { match($0, /some string/); print substr($0, RSTART, RLENGTH + 100) FILENAME }" C:\XML*.xml > C:\Results.txt


What I would like to do now is write all matches (not just the first) for each XML file to C:\Results.txt, including 100 characters before and 100 characters after each match.

Is it possible to easily change this to get the desired results?

I understand that gawk might not be the best tool for the job, but this is a one-time task, so if it is slow I can let it run overnight.
Mr P
1 Solution
If the 100 characters of context are on the same line as the match, you can use:
gawk "match($0, /some string/) { print substr($0, RSTART - 100, RLENGTH + 200) FILENAME }" C:\XML*.xml > C:\Results.txt
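The command above only takes context from the same line as the match. XML files are often split across many lines, so if the 100 characters of context may cross a line break, one option is to treat each whole file as a single string. gawk can read a whole file as one record by setting RS to "^$"; the minimal sketch below instead re-joins the lines by hand so it also runs under other awk implementations. It assumes POSIX awk, reports only the first match in each file, prints line breaks inside the context as spaces, and adds a ": " separator after the file name. first_match_in_file, pat and ctx are hypothetical names used for illustration.

```sh
# Sketch only (POSIX awk assumed): treat each file as one string so the
# context window may span line breaks, and print the first match per file.
# first_match_in_file, pat and ctx are hypothetical names.
first_match_in_file() {
  pat=$1; ctx=$2; shift 2
  awk -v pat="$pat" -v ctx="$ctx" '
  # Print the first match in text with up to ctx characters on each side.
  function report(name, text,    start, snippet) {
    if (!match(text, pat)) return
    start = RSTART - ctx
    if (start < 1) start = 1                  # clamp at the start of the file
    snippet = substr(text, start, RSTART + RLENGTH + ctx - start)
    gsub(/[\r\n]/, " ", snippet)              # keep each result on one output line
    print name ": " snippet
  }
  FNR == 1 && NR > 1 { report(prev, buf) }    # a new file began: flush the previous one
  FNR == 1 { prev = FILENAME; buf = $0; next }
  { buf = buf "\n" $0 }                       # re-join the lines of the current file
  END { if (NR > 0) report(prev, buf) }
  ' "$@"
}
```

For example, first_match_in_file 'some string' 100 *.xml prints at most one line per file. Each file is held in memory as one string, which may be slow for very large files but is probably acceptable for a one-time run.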

If there can be more than one match on a line, and the matches are at least 100 characters apart, you might use:
gawk "/some string/ { while (match($0, /some string/)) { print substr($0, RSTART - 100, RLENGTH + 200) FILENAME; $0 = substr($0, RSTART + 1) } }" C:\XML*.xml > C:\Results.txt
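Chopping $0 at RSTART + 1 means the leading context of the next match can only come from text after the previous match started, which is why the matches need to be at least 100 characters apart. A minimal sketch that lifts that restriction, assuming POSIX awk: it keeps the full line untouched, tracks how many characters have already been searched, and cuts each window from the original line, clamping the start at position 1 rather than relying on how substr() treats a start below 1. context_matches, pat and ctx are hypothetical names, and a ": " separator is added between the file name and the context.

```sh
# Sketch only (POSIX awk assumed): print every non-overlapping match on each
# line with up to ctx characters before and after, even for nearby matches.
# context_matches, pat and ctx are hypothetical names.
context_matches() {
  pat=$1; ctx=$2; shift 2
  awk -v pat="$pat" -v ctx="$ctx" '
  {
    line = $0          # full line, used to cut the context windows
    rest = line        # part of the line not searched yet
    off = 0            # number of characters of line that precede rest
    while (rest != "" && match(rest, pat)) {
      pos = off + RSTART                      # match position within line
      start = pos - ctx
      if (start < 1) start = 1                # clamp at the start of the line
      print FILENAME ": " substr(line, start, pos + RLENGTH + ctx - start)
      step = RSTART + (RLENGTH > 0 ? RLENGTH : 1) - 1   # move past the match
      off += step
      rest = substr(rest, step + 1)
    }
  }' "$@"
}
```

Because it advances past the end of each match, it reports non-overlapping matches only; the rest != "" guard and the one-character minimum step stop it from looping forever if the pattern can match an empty string. On Windows, the awk program between the single quotes could be saved to a file (for example a hypothetical context.awk) and run with gawk -v pat="some string" -v ctx=100 -f context.awk C:\XML*.xml > C:\Results.txt.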
Mr P (Author) commented:
This worked great.  Thank you, Ozo.
