Solved

Gawk - Find all occurrences of a string within XML - Also include 100 bytes before and after the match

Posted on 2015-02-09
Last Modified: 2015-02-10
On Windows, I am currently using gawk to find the first occurrence of a string + 100 bytes for all XMLs within a directory:

gawk "/[some string]/ { match($0, /[some string]/); print substr($0, RSTART, RLENGTH + 100) FILENAME; }" C:\XML*.xml > C:\Results.txt

What I would like to do now is output all the matches (not just the first) to C:\Results.txt for each XML and also include 100 characters before the match + 100 characters after the match.

Is it possible to easily change this to get the desired results?

I understand that gawk might not be the best tool for the job, but this is just a one time task and if this is slow I can let this run overnight.
Question by:Mr P
2 Comments
 

Accepted Solution

by:
ozo earned 2000 total points
ID: 40598318
If the 100 characters are on the same line as the match, you can use
match ( $0, /some string/){print substr($0,RSTART-100,RLENGTH + 200)FILENAME; }

If there can be more than one match on a line, and the matches are at least 100 characters apart, you might use
/some string/{while(match ( $0, /some string/)){ print substr($0,RSTART-100,RLENGTH + 200) FILENAME; $0=substr($0,RSTART+1)} }
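
If matches can be closer than 100 characters apart, one possible variation is to leave $0 untouched and track an offset instead, so the leading context is always taken from the full line. The following is only a sketch under some assumptions: the function name context_matches and the variables re, ctx and off are made up for illustration, it uses POSIX shell quoting (on Windows it is probably easier to save the awk program to a file and run it with gawk -f), and it prints the file name as a prefix rather than appending it after the text.

```shell
# Hypothetical helper: context_matches REGEX CTX FILE...
# Prints every non-overlapping match of REGEX together with up to CTX
# characters on each side, prefixed by the file name.
context_matches() {
    re=$1; ctx=$2; shift 2
    awk -v re="$re" -v ctx="$ctx" '
    {
        off = 0                                   # characters of $0 already searched
        while (off < length($0) && match(substr($0, off + 1), re)) {
            start = off + RSTART                  # match position in the full line
            from = start - ctx
            if (from < 1) from = 1                # clamp at the start of the line
            print FILENAME ": " substr($0, from, (start - from) + RLENGTH + ctx)
            # continue after this match; step at least 1 so empty matches cannot loop forever
            off = start - 1 + (RLENGTH > 0 ? RLENGTH : 1)
        }
    }' "$@"
}
```

For the original task this would be called with a context of 100, for example: context_matches 'some string' 100 *.xml > Results.txt. Note that because the regex is passed with -v, backslashes in it are subject to awk escape processing, and a pattern anchored with ^ would be tested against each remaining substring rather than the line start.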

Author Closing Comment

by:Mr P
ID: 40602228
This worked great.  Thank you, Ozo.