AXISHK


bash script

How can I write a bash script that scans through all the files in a folder and its sub-folders and writes the full file path to a log file?

Any file that references an "http://xxxxx" string which does not include a parameter "yyy" passed to the script should be listed.

Tks
AXISHK

ASKER

For example, given these files and their contents:

Level1/Level2/file1: http://www.mydomain.com/about.html

Level1/file2: http://www.otherdomain.com/about.html

Level1/file3: http://www.otherdomain.com/about.html

"findfile mydomain" should log:
1. Level1/file2: http://www.otherdomain.com/about.html
2. Level1/file3: http://www.otherdomain.com/about.html
cd to the directory just above "Level1", then run

D="otherdomain"
find Level1 -type f | xargs grep -oH "http://.*$D[^ ]*"

Inside the script "findfile" use

D="$1"  
instead of
D="otherdomain"

The above will display the matching part of each line, starting with "http://" and running up to the first space or end-of-line after the domain.
If you want the match to stop at a different character, please let me know.

For example, to hide everything from a colon, an ampersand or a space onwards (including that character itself) use:

find Level1 -type f | xargs grep -oH "http://.*$D[^:& ]*"
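To see how the two bracket expressions differ, here is a minimal sketch that runs both patterns against one sample line instead of real files. The sample line and the function names url_to_space and url_to_delim are made up for illustration; only the two grep patterns come from the commands above.

```shell
# Hypothetical sample input; not taken from the asker's files.
line='see http://www.otherdomain.com/about.html?x=1&y=2 for details'
D="otherdomain"

# Match runs up to the first space after the domain.
url_to_space() {
    printf '%s\n' "$line" | grep -o "http://.*$D[^ ]*"
}

# Match stops at the first colon, ampersand or space after the domain.
url_to_delim() {
    printf '%s\n' "$line" | grep -o "http://.*$D[^:& ]*"
}
```

Calling url_to_space prints the URL including "?x=1&y=2", while url_to_delim cuts it off just before the "&".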
Sorry, after re-reading your question, and particularly your subsequent comment, I see that my answer should have looked like this:

cd to the directory just above "Level1", then run

D="mydomain"
find Level1 -type f | xargs grep -oH "http://[^ ]*" | grep -v "$D"

Inside the script "findfile" use

D="$1"  
instead of
D="mydomain"

The above will display the matching part of each line, starting with "http://" and running up to the first space or end-of-line.
It will exclude every result containing "mydomain" (or whatever is passed as the first parameter to the script "findfile") from the output.

If you want the match to stop at a different character (or characters), please let me know.
For example, to hide everything from a colon, an ampersand or a space onwards (including that character itself) use:

find Level1 -type f | xargs grep -oH "http://[^:& ]*" | grep -v "$D"
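Putting the pieces together, a "findfile" script could look roughly like the sketch below. It is only one possible layout, not the exact script from this thread: the optional directory and log file arguments, the default log path /tmp/findfile.log, and the use of "find -exec ... {} +" in place of xargs (so that file names containing spaces survive) are all assumptions added here.

```shell
# findfile: sketch of a wrapper around the commands above.
# Arguments (the second and third are assumptions, not from the thread):
#   $1  domain to exclude, e.g. mydomain
#   $2  directory to scan (default: Level1)
#   $3  log file to append to (default: /tmp/findfile.log, an assumed path)
findfile() {
    local domain="$1"
    local dir="${2:-Level1}"
    local log="${3:-/tmp/findfile.log}"

    # grep -o prints only the matched URL, -H prefixes it with the file name.
    # The second grep drops every result line containing the domain;
    # -F makes it a literal string, so dots in the domain are not wildcards.
    # Note that the filter sees the file path too, not just the URL.
    # tee -a shows the results and appends them to the log file.
    find "$dir" -type f -exec grep -oH 'http://[^ ]*' {} + \
        | grep -vF -- "$domain" \
        | tee -a "$log"
}
```

With this in place, "findfile mydomain" run from the directory just above "Level1" prints lines such as "Level1/file2:http://www.otherdomain.com/about.html" and appends the same lines to the log. Keep the log file outside the scanned directory, otherwise later runs will pick up the URLs stored in the log itself.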
try this:

find "$1" -type f | xargs grep -v "$2" >> /tmp/fileslist

This can be run as:

findfile /path/to/dir mydomain

If you don't want to have the dir name passed with the command then:

find . -type f | xargs grep -v "$1" >> /tmp/fileslist

This can be run as:

cd /path/to/dir

findfile  mydomain

ASKER

xargs grep -oH "http://.*$1[^ ]*"


Do the . and the two * characters have a special meaning in the above expression?

One more question: is it possible to add another condition?

Can I list all files that either don't include http://mydomain.com or contain the word "iframe"?
The -e option lets you specify multiple patterns to search for.

The -o option prints only the part of each line that matches the pattern.

The -v option inverts the match, selecting lines that do not match the pattern.

Please see the man page:

http://linux.die.net/man/1/grep
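On the pattern itself: in "http://.*$1[^ ]*" the dot matches any single character and each * means "zero or more of the preceding item", so ".*" is any run of characters and "[^ ]*" is any run of non-space characters.

For the extra condition, one possible approach is sketched below. It uses two further grep options not mentioned above, -L (list files with no matching line) and -l (list files with at least one matching line), and merges the two lists. The function name list_suspect_files and its arguments are hypothetical.

```shell
# list_suspect_files: hypothetical helper, not part of the original thread.
#   $1  directory to scan
#   $2  URL that every file is expected to contain
#       (default: http://mydomain.com)
list_suspect_files() {
    local dir="$1"
    local url="${2:-http://mydomain.com}"

    {
        # Files that never mention the URL (-L), matched literally (-F).
        find "$dir" -type f -exec grep -LF -- "$url" {} +
        # Files that contain the word "iframe" somewhere (-l).
        find "$dir" -type f -exec grep -lF -- 'iframe' {} +
    } | sort -u   # a file meeting both conditions is listed only once
}
```

For example, "list_suspect_files Level1 >> /tmp/fileslist" would append the resulting file names to the log file used earlier in this thread.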

ASKER

Tks