DevSupport

bash script to find values from xml

Hi Experts,

We have been working on a script that scans multiple XML files and pulls out information such as the database name, database server name, etc.

I am attaching sample XML files that the script scans.

The script is as follows:
nohup timeout 15 cat context.xml
        sleep 10
        cat nohup.out|grep -i "driverClassName="|grep -i oracle >/dev/null 2>&1
 if [ $? -eq 0 ]
  then
        OUr=`cat nohup.out|grep url=|awk -F= '{print $2}'`
        DS=`echo $OUr|awk -F: '{print $4}'|sed s/@//g`
        DB=`echo $OUr|awk -F: '{print $NF}'|sed s/\".*//g`
        DB1=`echo $DB|sed s/\".*//g`

else
        DS=`cat nohup.out |grep url=|grep jdbc|head -1|awk -F/ '{print $3}'|awk -F";" '{print $1}'|awk -F":" '{print $1}'`
        DB=`cat nohup.out |grep url=|grep jdbc|head -1|awk -F";" '{print $2}'|awk -F= '{print $2}'|sed s/\".*//g`
        Ru=`cat nohup.out|grep "Environment name"|grep PegaRULES|awk '{print $3}'|awk -F= '{print $2}'|sed s/\"//g`
   if [ -z "$Ru" ]
    then
        Ru=`cat nohup.out|grep username=|awk -F= '{print $2}'|sed s/\"//g`
   fi
 fi

This works for 90 percent of cases, which have only one Resource name (examples: context_works.xml and context_works2.xml).

But in the 10 percent of XML files that contain multiple resources (like context_doesnotwork.xml), it doesn't pick the right one because of the Oracle grep check: it picks the Oracle resource instead of the resource with name=jdbc/PegaRULES. I would like to eliminate all resources other than the one with name=jdbc/PegaRULES before the grep for Oracle runs on the file.
Is there a way to do that without losing much of the current logic?
context_works.xml
context_works2.xml
context_doesnotwork.xml
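
One possible way to do this while keeping the rest of the pipeline is to filter the file down to the jdbc/PegaRULES resource first. The sketch below is only an illustration: the function name pick_resource is hypothetical, and it assumes each <Resource> element starts on its own line, ends with "/>" or "</Resource>", and is not nested. The attached XML files may be laid out differently.

```shell
#!/bin/bash
# pick_resource FILE
# Print only the <Resource ...> element whose name is jdbc/PegaRULES.
# Assumption: one <Resource> element starts per line and elements are not nested.
pick_resource () {
	awk -v want='name="jdbc/PegaRULES"' '
		# A new <Resource> element begins: start collecting its lines.
		/<Resource([ \t>]|$)/ { buf = ""; inres = 1 }
		inres { buf = buf $0 "\n" }
		# The element ends: print it only if it is the wanted resource.
		inres && (/\/>/ || /<\/Resource>/) {
			if (index(buf, want) != 0) printf "%s", buf
			inres = 0
		}
	' "$1"
}
```

The filtered output could then be written to a file (for example `pick_resource context.xml > nohup.out`) so that the existing grep/awk lines run unchanged against just that one resource.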
gr8gonzo

Any reason you don't just use a scripting language that can parse XML like PHP? It would be more reliable than grep...
DevSupport

ASKER

Sorry, I don't know PHP. If you could show me a PHP way to change this code, I am more than willing to use it if it all works well.
Thanks
Can you test the following code in your environment:
#!/bin/bash
unset -f UseFullPath
UseFullPath ()
{
	if test ! -f /bin/sleep
	then
		echo "/bin/sleep No such file"
		Ret=1
		return $Ret
	fi
	SLEEP="/bin/sleep"
	if test ! -f /bin/cat
	then
		echo "/bin/cat No such file"
		Ret=2
		return $Ret
	fi
	CAT="/bin/cat"
	if test ! -f /bin/rm
	then
		echo "/bin/rm No such file"
		Ret=3
		return $Ret
	fi
	RM="/bin/rm"
	if test ! -f /usr/bin/nohup
	then
		echo "/usr/bin/nohup No such file"
		Ret=4
		return $Ret
	fi
	NOHUP="/usr/bin/nohup"
	if test ! -f /bin/egrep
	then
		if test ! -f /bin/grep
		then
			echo "/bin/egrep No such file"
			Ret=5
			return $Ret
		fi
		EGREP="/bin/grep -E "
	else
		EGREP="/bin/egrep"
	fi
	if test ! -f /bin/awk
	then
		if test ! -f /bin/gawk
		then
			echo "/bin/awk or /bin/gawk No such file"
			Ret=6
			return $Ret
		fi
		AWK="/bin/gawk"
	else
		AWK="/bin/awk"
	fi
	if test ! -f /bin/sed
	then
		echo "/bin/sed No such file"
		Ret=7
		return $Ret
	fi
	SED="/bin/sed"
	if test ! -f /usr/bin/head
	then
		echo "/usr/bin/head No such file"
		Ret=8
		return $Ret
	fi
	HEAD="/usr/bin/head"
	Ret=$?
	return $Ret
}
UseFullPath $@
Ret=$?
if test 0 -eq $Ret
then
	if test -f ./nohup.out
	then
		echo "Cleaning old file: ./nohup.out"
		echo $RM -f ./nohup.out
		$RM -f ./nohup.out
	fi
	$NOHUP timeout 15 $CAT context.xml
	$SLEEP 10
	$EGREP -i "driverClassName=.*oracle" ./nohup.out | $EGREP -v "^$" >/dev/null 2>&1
	Ret=$?
	if test 0 -eq $Ret
	then
		OUr=''`$AWK -F"=" '{
			if ( 0 != index( $0, "url="))
			{
				if ( 0 != index( $0, "oracle"))
				{
					printf( "%s\n", substr($2, 2, length($2)-3));
				}
			}
		}' ./nohup.out`''
		DS=`echo $OUr | $AWK -F: '{print $4}' | $SED s/@//g`
		DB=`echo $OUr | $AWK -F: '{print $NF}' | $SED s/\".*//g`
		DB1=`echo $DB | $SED s/\".*//g`
		echo "OUr $OUr"
		echo "DS  $DS"
		echo "DB  $DB"
		echo "DB1 $DB1"
	else
		DS=`$CAT ./nohup.out | $EGREP url= | $EGREP jdbc | $HEAD -1 | $AWK -F/ '{print $3}' | $AWK -F";" '{print $1}' | $AWK -F":" '{print $1}'`
		DB=`$CAT ./nohup.out | $EGREP url= | $EGREP jdbc | $HEAD -1 | $AWK -F";" '{print $2}' | $AWK -F= '{print $2}' | $SED s/\".*//g`
		Ru=`$CAT ./nohup.out | $EGREP "Environment name" | $EGREP PegaRULES | $AWK '{print $3}' | $AWK -F= '{print $2}' | $SED s/\"//g`
		if test -z "$Ru"
		then
			Ru=`$CAT ./nohup.out | $EGREP username= | $AWK -F= '{print $2}' | $SED s/\"//g`
		fi
		echo DS $DS
		echo DB $DB
		echo Ru $Ru
	fi
else
	/bin/ls UseFullPathError >/dev/null 2>&1
fi

Sample output:
test1
$ /bin/cp -ip context_works.xml context.xml
/bin/cp: overwrite 'context.xml'? y
$ ./29066436.sh
Cleaning old file: ./nohup.out
/bin/rm -f ./nohup.out
/usr/bin/nohup: ignoring input and appending output to 'nohup.out'
DS server1
DB db1
Ru rules1

test2
$ /bin/cp -ip context_works2.xml context.xml
/bin/cp: overwrite 'context.xml'? y
$ ./29066436.sh
Cleaning old file: ./nohup.out
/bin/rm -f ./nohup.out
/usr/bin/nohup: ignoring input and appending output to 'nohup.out'
OUr jdbc:oracle:thin:@myorgserver:1552:myorgdb
DS  myorgserver
DB  myorgdb
DB1 myorgdb

test3
$ /bin/cp -ip context_doesnotwork.xml context.xml
/bin/cp: overwrite 'context.xml'? y
$ ./29066436.sh
Cleaning old file: ./nohup.out
/bin/rm -f ./nohup.out
/usr/bin/nohup: ignoring input and appending output to 'nohup.out'
OUr jdbc:oracle:thin:@server4:1234:db4
DS  server4
DB  db4
DB1 db4

After performance-testing the code changes, I will provide updated code.
Updated code for testing:
#!/bin/bash
unset -f SetPathEnv
SetPathEnv ()
{
	echo $PATH | /bin/grep -E "^\/bin:\/usr\/bin:\/usr\/local\/bin:\/sbin:\/usr\/sbin:" >/dev/null 2>&1
	Ret=$?
	if test 0 -ne $Ret
	then
		export PATH="/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:$PATH"
	fi
	Ret=$?
	return $Ret
}
unset -f UseFullPath
UseFullPath ()
{
	SetPathEnv $@
	for RequiredFiles in /bin/sleep /bin/cat /bin/rm /usr/bin/nohup /bin/grep /bin/sed /usr/bin/head
	do
		if test ! -f "$RequiredFiles"
		then
			echo "$RequiredFiles No such file"
			Ret=1
			return $Ret
		fi
	done
	if test ! -f /bin/grep
	then
		if test ! -f /bin/egrep
		then
			echo "/bin/egrep No such file"
			Ret=2
			return $Ret
		fi
		EGREP="/bin/egrep"
	else
		EGREP="/bin/grep -E"
	fi
	SLEEP="/bin/sleep"
	CAT="/bin/cat"
	RM="/bin/rm"
	NOHUP="/usr/bin/nohup"
	if test ! -f /bin/awk
	then
		if test ! -f /bin/gawk
		then
			echo "/bin/awk or /bin/gawk No such file"
			Ret=6
			return $Ret
		fi
		AWK="/bin/gawk"
	else
		AWK="/bin/awk"
	fi
	SED="/bin/sed"
	HEAD="/usr/bin/head"
	Ret=$?
	return $Ret
}
UseFullPath $@
Ret=$?
if test 0 -eq $Ret
then
	if test -f ./nohup.out
	then
		echo "Cleaning old file: ./nohup.out"
		echo $RM -f ./nohup.out
		$RM -f ./nohup.out
	fi
	$NOHUP timeout 15 $CAT context.xml
	$SLEEP 10
	$EGREP -i "driverClassName=.*oracle" ./nohup.out | $EGREP -v "^$" >/dev/null 2>&1
	Ret=$?
	if test 0 -eq $Ret
	then
		OUr=''`$AWK -F"=" '{ if ( 0 != index( $0, "url=")) if ( 0 != index( $0, "oracle")) printf( "%s\n", substr($2, 2, length($2)-3)); }' ./nohup.out`''
		DS=`echo $OUr | $AWK -F: '{print $4}' | $SED s/@//g`
		DB=`echo $OUr | $AWK -F: '{print $NF}' | $SED s/\".*//g`
		DB1=`echo $DB | $SED s/\".*//g`
		echo "OUr $OUr"
		echo "DS  $DS"
		echo "DB  $DB"
		echo "DB1 $DB1"
	else
		DS=''`$EGREP "url=.*jdbc" ./nohup.out  | $HEAD -1 | $SED "s/.*\///;s/\;.*//;"`''
		DB=''`$EGREP "url=.*jdbc" ./nohup.out | $HEAD -1 | $SED "s/\(.*\)\;\(.*=\)\(.*\)\;\(.*\)\;\(.*\)/\3/;"`''
		Ru=''`$EGREP "Environment name.*PegaRULES" ./nohup.out  | $SED "s/.*value=\"//;s/\".*//;"`''
		if test -z "$Ru"
		then
			Ru=`$EGREP username= ./nohup.out  | $SED "s/.*=\"//;s/\"//;"`
		fi
		echo DS $DS
		echo DB $DB
		echo Ru $Ru
	fi
else
	/bin/ls UseFullPathError >/dev/null 2>&1
fi

test1
$ /bin/cp -ip context_works.xml ./context.xml
/bin/cp: overwrite './context.xml'? y
$ ./29066436.sh
Cleaning old file: ./nohup.out
/bin/rm -f ./nohup.out
/usr/bin/nohup: ignoring input and appending output to 'nohup.out'
DS server1
DB cursor
Ru rules1

test2
$ /bin/cp -ip context_works2.xml ./context.xml
/bin/cp: overwrite './context.xml'? y
$ ./29066436.sh
Cleaning old file: ./nohup.out
/bin/rm -f ./nohup.out
/usr/bin/nohup: ignoring input and appending output to 'nohup.out'
OUr jdbc:oracle:thin:@myorgserver:1552:myorgdb
DS  myorgserver
DB  myorgdb
DB1 myorgdb

test3
$ /bin/cp -ip context_doesnotwork.xml context.xml
/bin/cp: overwrite 'context.xml'? y
$ ./29066436.sh
Cleaning old file: ./nohup.out
/bin/rm -f ./nohup.out
/usr/bin/nohup: ignoring input and appending output to 'nohup.out'
OUr jdbc:oracle:thin:@server4:1234:db4
DS  server4
DB  db4
DB1 db4

Let us know if any change is required.
Hi Murugesan,

Your script looks pretty comprehensive.  I haven't checked it all, but here are some comments and possible improvements:

SetPathEnv ()
{
      echo $PATH | /bin/grep -E "^\/bin:\/usr\/bin:\/usr\/local\/bin:\/sbin:\/usr\/sbin:" >/dev/null 2>&1
      Ret=$?
      if test 0 -ne $Ret
      then
            export PATH="/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:$PATH"
      fi
      Ret=$?
      return $Ret
}


What's the point of that 2nd "Ret=$?"?  Are you trying to store the return code of the export command, if the previous test succeeded?  If so, why?
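
For reference, what a "Ret=$?" placed right after "fi" captures is the status of the if statement itself: the status of the last command run inside it, or zero if the condition was false and there was no else branch. A small sketch of that behaviour (the function name status_after_if and the variable DEMO_VAR are made up for illustration):

```shell
#!/bin/bash
# status_after_if: print the exit status seen immediately after "fi".
status_after_if () {
	false                       # stand-in for a grep that found no match
	Ret=$?                      # Ret is 1 here
	if test "$Ret" -ne 0
	then
		export DEMO_VAR="x"     # exporting a plain assignment succeeds
	fi
	echo "$?"                   # status of the whole if statement
}
```

Calling status_after_if prints 0, even though the command that set Ret failed, so the earlier non-zero status is overwritten if "Ret=$?" is repeated after the "fi".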

Also, you're calling /bin/grep without checking that /bin/grep exists.
And you're calling "echo" and "test" without full paths or checks that they exist.
That's not consistent with the rest of your code's checks for commands.

And usually we'd write this:
    if test 0 -ne $Ret
like this:
    if test $Ret -ne 0            # Note the variable on the left and value on the right
or this:
    if [ $Ret -ne 0 ]

And this kind of thing:
   Ret=6
    return $Ret

could be written more simply/concisely like this:
   return 6

And this:
     Ret=$?
      return $Ret

could be written:
       return $?

And this kind of thing:
     if test ! -f /bin/cat
      then
            echo "/bin/cat No such file"
            Ret=2
            return $Ret
      fi
      CAT="/bin/cat"

could be made more concise like this:
   [ -f /bin/cat ] && CAT="/bin/cat" || { echo "/bin/cat No such file"; exit 2; }

Or you could just let the script crash with its own message about not finding /bin/cat and all those other commands, by putting this before them all:
    set -e
Which should make writing, reading and maintaining your code simpler.
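
As an illustration of the points above, the repeated per-command checks could be collapsed into one helper. This is only a sketch: the function name require_files is hypothetical, and it uses an if block rather than a long "A && B || C || D" chain, because in such a chain D only runs when C fails.

```shell
#!/bin/bash
# require_files PATH...
# Return 0 if every path exists as a regular file; otherwise report the
# first missing path on stderr and return 1.
require_files () {
	for f in "$@"
	do
		if [ ! -f "$f" ]
		then
			echo "$f No such file" >&2
			return 1
		fi
	done
	return 0
}
```

A caller could then write something like `require_files /bin/cat /bin/sed || exit 1` before assigning CAT, SED and so on.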
Hi Pinehaven/DevSupport,

Thank you for all the comments.

0. comprehensive
The script (not only given script) needs to be divided into multiple scripts so that readability / maintenance can be handled.

1. What's the point of that 2nd "Ret=$?"?  Are you trying to store the return code of the export command, if the previous test succeeded?  If so, why?
The script can be modified by other person (used to handling return values if it happens by other mistakes Example: including other command before Ret=$?)

2. Also, you're calling /bin/grep without checking that /bin/grep exists
Agreed, but most of the times it used to be there in /bin/grep location.
Script always require further updates after completing all testing.

3. And you're calling "echo" and "test" without full paths or checks that they exist.
echo and test are also provided by the shell as builtins (not only as /bin/echo ...)
$ echo $0
bash
$ type test
test is a shell builtin
$ type echo
echo is a shell builtin
$ /usr/bin/which test
/bin/test
$ /usr/bin/which echo
/bin/echo

4. And usually we'd write this:
Good programmer need to write value on left and variable on right while comparing.

5. could be written: return $?
Answered at: 1.

6. [ -f /bin/cat ] && CAT="/bin/cat" || { echo "/bin/cat No such file"; exit 2; }
Good comment. Agreed.
written sample codes (/bin/cat , ... ), so that the programmer/tester can update the same using for loop. Given hint using sample /bin/cat /bin/awk or /bin/gawk ...

7. set -e
Agreed.
Written the script to be handled at all platforms ( AIX/HP-UX/Linux/SunOS/UNIX/Windows_CYGWIN_NT_OR_mingw). Once the script finalized, no need to modify the script (Example set -e or set -x or ...) Of course that can be handled using getopts using -d 7 or -e ValidateBinaries ...
$ # OS names (/bin/uname -s | /bin/sed "s/-.*//;") not sorted for business.
$ echo "AIX HP-UX Linux SunOS UNIX Windows_CYGWIN_NT_or_mingw" | /usr/bin/tr " " "\n" |  /bin/sort -u
AIX
HP-UX
Linux
SunOS
UNIX
Windows_CYGWIN_NT_or_mingw

Hi Murugesan,

"Pinehaven" is not my name.  It is my approximate location in New Zealand.

>> 1. What's the point of that 2nd "Ret=$?"?  Are you trying to store the return code of the export command, if the previous test succeeded?  If so, why?
> The script can be modified by other person (used to handling return values if it happens by other mistakes Example: including other command before Ret=$?)

That doesn't answer my question.  You've done "Ret=$?" twice in that section.  What command are you trying to return the return-code of, at the end of that section?

>> 2. Also, you're calling /bin/grep without checking that /bin/grep exists
> Agreed, but most of the times it used to be there in /bin/grep location.

True, but the same applies for all the other /bin/commands you tested the presence of before running them, right?  They are in those locations most of the time.

> Script always require further updates after completing all testing.
What has this got to do with my point?

Any comments on this point of mine?:
> And this kind of thing:
>   Ret=6
>    return $Ret
>could be written more simply/concisely like this:
>   return 6


>> 4. And usually we'd write this:
> Good programmer need to write value on left and variable on right.

Why is that better?
Where did you learn that good programmers do that?
Got any URL which supports that?
(The basis of my claim that people usually put the variable on the left, is from reading code for a few decades.)

> written sample codes (/bin/cat , ... ), so that the programmer/tester can update the same using for loop. Given hint using sample /bin/cat /bin/awk or /bin/gawk ...
What are you talking about?  Update same what?  What has this got to do with loops?

> Once the script finalized, no need to modify the script (Example set -e or set -x or ...) Of course that can be handled using getopts using -d 7 or -e ValidateBinaries ...
Based on the "UnixOS" topic area chosen, it looks as if this is for some kind of Unix, so "set -e" should work (i.e. it should force the script to crash if there's an error).
Are you trying to make it so it can be ported to Cygwin on Windows?  If so, does Cygwin support "set -e"?
Hi tel2,

>> Pinehaven
got it :)
Writing updated code for your comment.
Hi tel2,

Thank you for all your comments/updates.

Written good programmer referring all languages (not related to 29066436/script alone) .

One of the question at experts-exchanged informed awk not found. Hence handled gawk.
>> it should force the script to crash if there's an error
yes, hence handling that error before starting the script.

Using "set -e" applicable on Windows also.
However (not related to 29066436) while handling more changes, difficult to find error while using set -e

@DevSupport
Here goes updated code based on comment from tel2 :)
29066436.sh
#!/bin/bash
if test "bash" = "$0"
then
	echo Cannot execute this script in current shell
else
	if test ! -f ./UseFullPath.sh
	then
		echo "./UseFullPath.sh No such file"
	else
		. ./UseFullPath.sh
		UseFullPath
		Ret=$?
		if test 0 -eq $Ret
		then
			if test -f ./nohup.out
			then
				echo "Cleaning old file: ./nohup.out"
				echo $RM -f ./nohup.out
				$RM -f ./nohup.out
			fi
			$NOHUP timeout 15 $CAT context.xml
			$SLEEP 10
			$EGREP -i "driverClassName=.*oracle" ./nohup.out | $EGREP -v "^$" >/dev/null 2>&1
			Ret=$?
			if test 0 -eq $Ret
			then
				OUr=''`$AWK -F"=" '{ if ( 0 != index( $0, "url=")) if ( 0 != index( $0, "oracle")) printf( "%s\n", substr($2, 2, length($2)-3)); }' ./nohup.out`''
				DS=`echo $OUr | $AWK -F: '{print $4}' | $SED s/@//g`
				DB=`echo $OUr | $AWK -F: '{print $NF}' | $SED s/\".*//g`
				DB1=`echo $DB | $SED s/\".*//g`
				echo "OUr $OUr"
				echo "DS  $DS"
				echo "DB  $DB"
				echo "DB1 $DB1"
			else
				DS=''`$EGREP "url=.*jdbc" ./nohup.out  | $HEAD -1 | $SED "s/.*\///;s/\;.*//;"`''
				DB=''`$EGREP "url=.*jdbc" ./nohup.out | $HEAD -1 | $SED "s/\(.*\)\;\(.*=\)\(.*\)\;\(.*\)\;\(.*\)/\3/;"`''
				Ru=''`$EGREP "Environment name.*PegaRULES" ./nohup.out  | $SED "s/.*value=\"//;s/\".*//;"`''
				if test -z "$Ru"
				then
					Ru=`$EGREP username= ./nohup.out  | $SED "s/.*=\"//;s/\"//;"`
				fi
				echo DS $DS
				echo DB $DB
				echo Ru $Ru
			fi
		else
			exit $Ret
		fi
	fi
fi

SetEnv.sh
#!/bin/bash
unset -f SetEnv
SetEnv ()
{
	for RequiredDir in /bin /usr/bin /sbin /usr/sbin
	do
		if test ! -d "$RequiredDir"
		then
			echo "$RequiredDir No such directory"
			return 2
		fi
	done
	if test ! -f /bin/grep
	then
		if test ! -f /bin/egrep
		then
			echo "Use related path for grep or egrep in $0"
			return 3
		else
			EGREP="/bin/egrep "
		fi
	else
		EGREP="/bin/grep -E "
	fi
	echo $PATH | $EGREP "^\/bin:\/usr\/bin:\/usr\/local\/bin:\/sbin:\/usr\/sbin:" >/dev/null 2>&1
	Ret=$?
	if test 0 -ne $Ret
	then
		export PATH="/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:$PATH"
		Ret=$?
	fi
	return $Ret
}

UseFullPath.sh
#!/bin/bash
unset -f UseFullPath
UseFullPath ()
{
	if test ! -f ./SetEnv.sh
	then
		echo "./SetEnv.sh No such file"
		return 1
	fi
	. ./SetEnv.sh
	SetEnv
	Ret=$?
	if test 0 -ne $Ret
	then
		return $Ret
	fi
	for RequiredFiles in /bin/sleep /bin/cat /bin/rm /usr/bin/nohup /bin/sed /usr/bin/head
	do
		if test ! -f "$RequiredFiles"
		then
			echo "$RequiredFiles No such file"
			echo "Update $0 having related path"
			return 4
		fi
	done
	SLEEP="/bin/sleep"
	CAT="/bin/cat"
	RM="/bin/rm"
	NOHUP="/usr/bin/nohup"
	if test ! -f /bin/awk
	then
		if test ! -f /bin/gawk
		then
			echo "/bin/awk or /bin/gawk No such file"
			return 6
		fi
		AWK="/bin/gawk"
	else
		AWK="/bin/awk"
	fi
	SED="/bin/sed"
	HEAD="/usr/bin/head"
	return $Ret
}

Hi Murugesan,

> Written good programmer referring all languages (not related to 29066436/script alone) .
I never tried to imply that it was related to 29066436 alone.  Please answer my 3 questions.  Here they are again:
    >> 4. And usually we'd write this:
    > Good programmer need to write value on left and variable on right.
    a) Why is that better?
    b) Where did you learn that good programmers do that?
    c) Got any URLs which supports that?
    (The basis of my claim that people usually put the variable on the left, is from reading code for a few decades.)

Answer the 3 questions (a, b & c) above, please.

> One of the question at experts-exchanged informed awk not found. Hence handled gawk.
I never complained about your handling of awk/gawk.

> Using "set -e" applicable on Windows also.
> However (not related to 29066436) while handling more changes, difficult to find error while using set -e

Why is it difficult to find errors while using set -e?  See example below:
    #!/bin/bash
    set -e
    echo "\$0=$0"
    /sbin/cat myfile
    echo "End"
Now I'll run the script and show the output:
./test1.sh
    $0=./test1.sh
    ./test1.sh: line 4: /sbin/cat: No such file or directory
So, the error message tells me the name of the script which failed, the line it failed in, the command that failed, and the reason it failed.  So, why do you claim it is difficult to find errors while using set -e?

> if test "bash" = "$0"
How is the above condition ever going to be true?  $0 returns the path used when the script was called, not the name of the shell (see example immediately above).

Have you actually tested your code, Murugesan?  That includes forcing things to fail, to make sure your conditions are working, especially things you seem to be experimenting with, like the "bash" test above.
My last comment at
https://www.experts-exchange.com/questions/29066436/bash-script-to-find-values-from-xml.html?anchor=a42360114&notificationFollowed=199975789&anchorAnswerId=42360114#a42360114

Have you actually tested your code???
Thank you for the comment.
For fun:
I have written the code like:
/usr/bin/strings -a /bin/ls > script.sh
Need comment or query from DevSupport.
I wrote that check so the script does not allow the following type of execution:
. ./29066436.sh
If that case is not handled, the script will make the current shell exit when an error happens.
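
A comparison against the literal string "bash" only matches when $0 happens to be exactly that; in a login shell, for instance, $0 is typically "-bash". A sketch of a more general test follows, assuming bash, since it uses the bash-specific BASH_SOURCE array. The function name is_sourced is made up for illustration.

```shell
#!/bin/bash
# is_sourced: return 0 when the file that defines this function was
# sourced (". ./script.sh"), and 1 when that file was executed directly
# ("./script.sh"). Relies on BASH_SOURCE, so it is not portable to plain sh.
is_sourced () {
	[ "${BASH_SOURCE[0]}" != "$0" ]
}
```

Placed near the top of 29066436.sh, a line such as `is_sourced && { echo "Cannot execute this script in current shell"; return 1; }` would stop a sourced run without closing the caller's shell.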
Hi Murugesan,

It's really hard to understand what you're talking about.  If you could please just speak simple English, instead of cryptic riddles with irrelevant extras, that would be a lot more efficient for all of us.  Let's try again to see if you can make yourself clear.

> My last comment at
https://www.experts-exchange.com/questions/29066436/bash-script-to-find-values-from-xml.html?anchor=a42360114&notificationFollowed=199975789&anchorAnswerId=42360114#a42360114

What last comment of yours are you talking about?  Just try quoting it, please.  That was my post.

> Have you actually tested your code???
> Thank you for the comment.

Are you asking me now?  I haven't provided a solution, but yes, I think I tested the little bits of code that I posted here.  Now, what's the answer to my question?  Have you tested your code?

> For fun:
> I have written the code like:
> /usr/bin/strings -a /bin/ls > script.sh

What has that got to do with it?  I'm not in the mood for fun with irrelevant things right now.  I'm trying to understand you.

And I see that again you have failed to answer my clear & simple questions about this claim of yours:
> Good programmer need to write value on left and variable on right.
So I'll give you the first few examples I found when I searched the web for Linux/UNIX conditions:
Example 1:
     if [ $# -lt 1 ]; then
      Source: https://www.ibm.com/developerworks/library/l-bash-test/index.html
Example 2:
     [ $# -eq 0 ] && directorys=`pwd` || directorys=$@
      Source: http://tldp.org/LDP/abs/html/fto.html
Example 3:
     if [ "$a" -gt 0 ] && [ "$a" -lt 5 ]
      Source: http://tldp.org/LDP/abs/html/nestedifthen.html
Note how all 3 examples have the variables on the left, and the values (literals) on the right.  So, does that mean all the programmers who wrote those conditions are not good programmers?  If so, I'm amazed how few good programmers there are on the planet, because it's very rare that I see the variable on the right and the value on the left.
If I were to rewrite example 3 in the style you seem to prefer, I think it would look like this:
    if [ 0 -lt "$a" ] && [ 5 -gt "$a" ]
And you find that clearer?  I doubt most programmers would find it clearer, and it's good practice to make it so others can easily read your code.

tel2
>> Are you asking me now
yes
I did not write that with scripting in mind;
I wrote it for C and C++ programs,
and so I follow the same practice in scripts too :)

I would not post a script here without testing it.

Hence agreed on your comment return number instead of return $Ret
Used return $Ret to save $? while using any commands. Since during enhancement/error handling, preventing future errors/exceptions.

>> it so others can easily read your code
I agree.
Also it includes users/script/related OS/future modification including svn (history)
Thanks for those answers, Murugesan.

>>>    if [ 0 -lt "$a" ] && [ 5 -gt "$a" ]
>>> And you find that clearer?  I doubt most programmers would find it clearer, and it's good practice to make it so others can easily read your code.
>> it so others can easily read your code
> I agree.
> Also it includes users/script/related OS/future modification including svn (history)


If you agree, then why did you make this baseless claim in the first place?:
  > Good programmer need to write value on left and variable on right.
>> if [ 0 -lt "$a" ] && [ 5 -gt "$a" ]
Related to this found few errors during 2009, I cannot remember that error and svn history not at my system right :)
Hence those kind of error (
I guess old script used [[ format using if. => Cannot remember the format which was used by old script. It was something like [[
Usage of IDOC files instead of EDI files reported few error while using IDOC files while using script to submit them to server
)
was resolved using related given format (if test).

echo -n Thank you and welcome | /usr/bin/wc
0       4      21
My question has nothing to do with the [ ] or [[ ]] convention.  That's just you going off on an irrelevant side-track again.  It's about the order of variables & values.
Answer the question, please!  Here it is again:

>>>    if [ 0 -lt "$a" ] && [ 5 -gt "$a" ]
>>> And you find that clearer?  I doubt most programmers would find it clearer, and it's good practice to make it so others can easily read your code.
>> it so others can easily read your code
I agree.
> Also it includes users/script/related OS/future modification including svn (history)

If you agree, then why did you make this baseless claim in the first place?:
  > Good programmer need to write value on left and variable on right.
>> If you agree, then why did you make this baseless claim in the first place?:
Based on this here goes updated code:
29066436.sh
#!/bin/bash
if [ "bash" = "$0" ]
then
        echo Cannot execute this script in current shell
else
        if [ ! -f ./UseFullPath.sh ]
        then
                echo "./UseFullPath.sh No such file"
        else
                . ./UseFullPath.sh
                UseFullPath
                Ret=$?
                if [ 0 -eq $Ret ]
                then
                        if [ -f ./nohup.out ]
                        then
                                echo "Cleaning old file: ./nohup.out"
                                echo $RM -f ./nohup.out
                                $RM -f ./nohup.out
                        fi
                        $NOHUP timeout 15 $CAT context.xml
                        $SLEEP 10
                        $EGREP -i "driverClassName=.*oracle" ./nohup.out | $EGREP -v "^$" >/dev/null 2>&1
                        Ret=$?
                        if [ 0 -eq $Ret ]
                        then
                                OUr=''`$AWK -F"=" '{ if ( 0 != index( $0, "url=")) if ( 0 != index( $0, "oracle")) printf( "%s\n", substr($2, 2, length($2)-3)); }' ./nohup.out`''
                                DS=`echo $OUr | $AWK -F: '{print $4}' | $SED s/@//g`
                                DB=`echo $OUr | $AWK -F: '{print $NF}' | $SED s/\".*//g`
                                DB1=`echo $DB | $SED s/\".*//g`
                                echo "OUr $OUr"
                                echo "DS  $DS"
                                echo "DB  $DB"
                                echo "DB1 $DB1"
                        else
                                DS=''`$EGREP "url=.*jdbc" ./nohup.out  | $HEAD -1 | $SED "s/.*\///;s/\;.*//;"`''
                                DB=''`$EGREP "url=.*jdbc" ./nohup.out | $HEAD -1 | $SED "s/\(.*\)\;\(.*=\)\(.*\)\;\(.*\)\;\(.*\)/\3/;"`''
                                Ru=''`$EGREP "Environment name.*PegaRULES" ./nohup.out  | $SED "s/.*value=\"//;s/\".*//;"`''
                                if [ -z "$Ru" ]
                                then
                                        Ru=`$EGREP username= ./nohup.out  | $SED "s/.*=\"//;s/\"//;"`
                                fi
                                echo DS $DS
                                echo DB $DB
                                echo Ru $Ru
                        fi
                else
                        exit $Ret
                fi
        fi
fi

UseFullPath.sh
#!/bin/bash
unset -f UseFullPath
UseFullPath ()
{
        if [ ! -f ./SetEnv.sh ]
        then
                echo "./SetEnv.sh No such file"
                return 1
        fi
        . ./SetEnv.sh
        SetEnv
        Ret=$?
        if [ 0 -ne $Ret ]
        then
                return $Ret
        fi
        for RequiredFiles in /bin/sleep /bin/cat /bin/rm /usr/bin/nohup /bin/sed /usr/bin/head
        do
                if [ ! -f "$RequiredFiles" ]
                then
                        echo "$RequiredFiles No such file"
                        echo "Update $0 having related path"
                        return 4
                fi
        done
        SLEEP="/bin/sleep"
        CAT="/bin/cat"
        RM="/bin/rm"
        NOHUP="/usr/bin/nohup"
        if [ ! -f /bin/awk ]
        then
                if [ ! -f /bin/gawk ]
                then
                        echo "/bin/awk or /bin/gawk No such file"
                        return 6
                fi
                AWK="/bin/gawk"
        else
                AWK="/bin/awk"
        fi
        SED="/bin/sed"
        HEAD="/usr/bin/head"
        return $Ret
}

SetEnv.sh
#!/bin/bash
unset -f SetEnv
SetEnv ()
{
        for RequiredDir in /bin /usr/bin /sbin /usr/sbin
        do
                if [ ! -d "$RequiredDir" ]
                then
                        echo "$RequiredDir No such directory"
                        return 2
                fi
        done
        if [ ! -f /bin/grep ]
        then
                if [ ! -f /bin/egrep ]
                then
                        echo "Use related path for grep or egrep in $0"
                        return 3
                else
                        EGREP="/bin/egrep "
                fi
        else
                EGREP="/bin/grep -E "
        fi
        echo $PATH | $EGREP "^\/bin:\/usr\/bin:\/usr\/local\/bin:\/sbin:\/usr\/sbin:" >/dev/null 2>&1
        Ret=$?
        if [ 0 -ne $Ret ]
        then
                export PATH="/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:$PATH"
                Ret=$?
        fi
        return $Ret
}

Thank you again for the comments.
Hi Murugesan,

I see you've changed "test" to "[ ]".  Although my personal preference is the "[ ]" style, it is only a matter of style as far as I know, and I never complained about it.  My complaint was about you saying:
> Good programmer need to write value on left and variable on right.
And I see you are still doing that in your awk code above, e.g.:
>  if ( 0 != index( $0, "url=")) if ( 0 != index( $0, "oracle"))
Also, why do you have 2 "if"s instead of 1 "if" with some kind of "and" between the conditions?  Something like this:
      if (0 != index($0, "url=") && 0 != index($0, "oracle"))
or better:
      if (index($0, "url=") != 0 && index($0, "oracle") != 0)
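
To show the single-condition form in context, here is a sketch of how the Oracle url value might be pulled out with one awk call. The function name oracle_url is hypothetical, and the sketch assumes the url attribute sits on one line and is wrapped in double quotes, which may not hold for every context.xml.

```shell
#!/bin/bash
# oracle_url FILE
# Print the value of the first url="..." attribute found on a line that
# also mentions "oracle". Fields are split on double quotes, so the value
# is the field that follows the one ending in url=.
oracle_url () {
	awk -F'"' 'index($0, "url=") != 0 && index($0, "oracle") != 0 {
		for (i = 1; i < NF; i++)
			if ($i ~ /url=$/) { print $(i + 1); exit }
	}' "$1"
}
```

Splitting on the quote character avoids the substr() arithmetic used to trim the surrounding quotes.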
Hi Tel2/DevSupport

In C and C++ code during 2003, I saw the following kind of code.
The following is only a sample (since all changes/enhancements were handled over 10 years):
alignment.c
#include <stdio.h>
int main()
{
        int alignment = 7;
        printf( "Before if [ %d ]\n", alignment);
        if ( alignment = 1)
        {
                printf( "Inside if [ %d ]\n", alignment);
        }
        else
        {
                printf( "else   if [ %d ]\n", alignment);
        }
        return 0;
}
$ /usr/bin/gcc -Wall  alignment.c  -o ./alignment
alignment.c: In function 'main':
alignment.c:6:2: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
  if ( alignment = 1)
$ echo $?
0
$

In 2003, -Wall was not present in the Makefile,
so the behavior shown in the following output caused errors in the server as well as the client (and depot) applications:
$ ./alignment
Before if [ 7 ]
Inside if [ 1 ]
Hence I follow the same practice (value on the left, variable on the right) everywhere, including in scripts.
Written script for:
1. automation
2. Setup.sh => Same like at Windows we have Setup.exe for Windows OS right :)
3. Makefile (handled $$ for variables)
Script was written for following(all) platforms:
1. AIX 5.1 5.2 and 5.3
2. SunOS 2.6 2.7 and 2.8
3. Linux 2.6.16 2.6.17 and 2.6.18, CentOS, Fedora, OpenSUSE x86 and x86_64. Cannot remember versions used other than RHEL.
4. CYGWIN_NT 6.1
5. HP-UX 11.00 11.11 11.22 11.23 PA-RISC IPF
Hence, while handling the scripts together with the C and C++ code, the team used the "if test" form instead of the "if [" form, and this was carried through all verification, all types of testing, and especially the knowledge transfer and documentation. The documentation was confidential and was not released outside (including to me :)
Of course, if ( alignment = 1) is not applicable to Java.
Anyhow, some other errors happened with Java (webMethods 6 and 7) too :)
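
For readers following the "constant on the left" argument in a shell context: the same slip is possible in bash arithmetic, where (( alignment = 1 )) assigns rather than compares. The sketch below is only an illustration. It assumes bash rather than plain sh, and the function names are made up for this example. It shows that the assignment form succeeds silently, while putting the constant on the left makes bash reject the expression.

```bash
#!/usr/bin/env bash
# Illustration only: the function names are hypothetical, not from the thread.

# Typo for "==": inside (( )), a single "=" assigns 1, so the test is true.
assign_by_mistake() {
    local alignment=$1
    if (( alignment = 1 )); then
        echo "Inside if [ $alignment ]"
    else
        echo "else   if [ $alignment ]"
    fi
}

# Same typo with the constant on the left: bash cannot assign to "1",
# reports an error on stderr, and (( )) returns a non-zero status.
constant_on_left() {
    local alignment=$1
    if (( 1 = alignment )); then
        echo "Inside if [ $alignment ]"
    else
        echo "rejected, alignment is still $alignment"
    fi
}

assign_by_mistake 7
constant_on_left 7
```

Inside [ ] or test, a single = is already a string comparison, so the question only arises in arithmetic contexts such as (( )).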
Sorry, I had a work issue that took me away from this for a while. Here's some simple XML parsing code in PHP:

<?php
// Error checking
if(!isset($argv[1])) { echo "Syntax: " . basename(__FILE__) . " file.xml\n"; die(); }
$file = $argv[1];
if(!file_exists($argv[1])) { echo "Error: File does not exist!\n"; die(); }

// Load up the XML file
$dom = simplexml_load_file($file);

// Loop through all the <Resource> nodes
foreach($dom->Resource as $resource)
{
  // Look for the <Resource> tag with the name of "jdbc/pegarules" (case-insensitive)
  if(stripos($resource["name"], "jdbc/pegarules") !== false)
  {
     // Create $url_parts array
     parse_str(str_replace(";","&",$resource["url"]),$url_parts);

     // Dump the parts of the Resource's "url" attribute
     print_r($url_parts);
  }

  // Check if the class name contains "oracle" (case-insensitive)
  if(stripos($resource["driverClassName"], "oracle") !== false)
  {
    // Do something here...
  }
}

It wasn't clear what you wanted the output to be in each scenario, so I tried to provide a generic example that illustrated how the XML parsing could be done in one line and then how you could loop through and access all the nodes/attributes.
Thank You @Murugesan Nagarajan and @tel2 for all your expert comments and scripts.

I tried to execute 42362524 and it's not the result I am expecting.

For context_doesnotwork.xml:

The result says Our: jdbc:oracle:thin:@server4:1234:db4, DS server4, DB db4

I am expecting server1, db1 and Our = jdbc:sqlserver://server1;databaseName=db1 because that is the resource with name="jdbc/PegaRULES". That is the same problem as in the script which I have given in my question.

Also, for the first xml (context_works.xml) it says DB = cursor instead of DB=db1

Ru should be user instead of rules1

for context_works2:

there is no Ru=myorguser

Hope you could resolve this.

Thanks
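
One way to get the behaviour described above while keeping the existing grep/awk/sed logic is to filter the file down to the jdbc/PegaRULES resource before any other check runs. The following is a minimal sketch under assumptions, not a full XML parser: it assumes each <Resource .../> is a self-closing tag, that no attribute value contains a literal ">", and that the name is spelled exactly jdbc/PegaRULES (the match is case-sensitive). The function name pega_resource is invented for this example.

```bash
#!/usr/bin/env bash
# Sketch only: assumes self-closing <Resource .../> tags and no ">" inside
# attribute values. This is text filtering, not real XML parsing.

# Print only the <Resource> element whose name attribute is jdbc/PegaRULES.
# With RS set to ">", awk treats each tag as one record, even when the tag
# spans several lines.
pega_resource() {
    awk 'BEGIN { RS = ">" }
         /<Resource[ \t\n]/ && /[ \t\n]name="jdbc\/PegaRULES"/ { print $0 ">" }' "$1"
}

# Hypothetical usage in place of "cat context.xml" in the original script:
#   pega_resource context.xml > nohup.out
```

The rest of the script can then run its Oracle check against the filtered text only, so a second Oracle resource in the same file no longer influences the result.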
@gr8gonzo

How do I add it into my bigger.sh script? Should I just append it to the bash script?

To your question of what output I wanted: I am setting a few variables like servername=server1, dbname=db1, Our=long url etc., based on which resource is jdbc/PegaRULES (SQL Server connection or Oracle connection). Please refer to the XMLs attached to the question.
@DevSupport, you would need the PHP engine to execute it. There's a decent chance it's already on your system somewhere, so you'd save it as a file like "processxml.php" and then run it with php:

php processxml.php the_xml_file_you_want_to_process.xml
Incidentally, PHP is by no means the only scripting language for this; there are plenty of other options (Python would be another good one). Trying to parse XML via string functions is usually a bad idea, though.
@gr8gonzo:

If I set the variables in your php script, how does my shell script (parent script) know what the values are?
I wasn't sure how you wanted the output, so I left that part open. For example, if you want to control as much as possible from bash, you could have the PHP file generate and write a temporary bash script and echo the filename so that the parent bash script gets the output and then turns around and executes that temporary file and cleans it up afterwards.
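
A simpler variant of this idea, sketched here under assumptions: instead of generating a temporary script, the PHP script prints one KEY=value pair per line and the parent bash script reads those lines into its own variables. In the sketch, emit_values is a hypothetical stand-in for running "php processxml.php context.xml", and the SERVER and DB key names are invented for illustration; DS and DB are the variable names from the original script.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for:  php processxml.php context.xml
# It assumes the PHP script prints one KEY=value pair per line.
emit_values() {
    printf 'SERVER=%s\n' "server1"
    printf 'DB=%s\n' "db1"
}

# Run the given command and copy the keys we recognise into shell variables.
# Unknown keys are ignored, and nothing printed by the child is executed.
read_values() {
    local key value
    DS="" DB=""
    while IFS='=' read -r key value; do
        case "$key" in
            SERVER) DS=$value ;;
            DB)     DB=$value ;;
        esac
    done < <("$@")
}

read_values emit_values
echo "DS=$DS DB=$DB"    # prints: DS=server1 DB=db1
```

Reading key/value lines avoids executing generated code and leaves no temporary file to clean up; in the real script the call would presumably become something like read_values php processxml.php context.xml.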
@gr8gonzo:

For the first xml context_works.xml: I am trying parse_str(str_replace(";","&",$resource["databasename"]),$databasename);
but I am not getting any result.

I am trying to obtain the server name. Is it possible to get just server1 instead of [jdbc:sqlserver://server1]?

Also
if(stripos($resource["driverClassName"], "oracle") !== false)
  {
    Here I want:
parse_str(str_replace(";","&",$resource["url"]),$url_parts);
 myorgserver
myorgdb
  }

Is it possible please?
So I had used parse_str and str_replace to try and make the "url" value look like a normal URL query string, so that parse_str would just return an easy array. If you have a variety of values in that url field, then it's probably better to figure out which approach to use for parsing the URL and then take the appropriate approach. For example:

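// Note: this outer check only lets resources with an Oracle driver through, so the
// SQL Server branch below only runs if this condition is relaxed or replaced.
if(stripos($resource["driverClassName"], "oracle") !== false)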
{
  if(strpos($resource["url"],"jdbc:oracle") === 0) // The " === 0" checks to see if "jdbc:oracle" is at the very beginning of the string
  {
    // Parse "url" for Oracle JDBC driver
    // Example: jdbc:oracle:thin:@myorgserver:1552:myorgdb
    
    $pieces = explode(":",$resource["url"]); // Split the URL by the : colon character.
    $server = ltrim($pieces[3], "@");        // Array starts at zero, so index 3 is the 4th piece ("@myorgserver"); strip the leading @
    $db = $pieces[5];                        // The DB name is index 5, which is the 6th piece from the left
  }
  elseif(strpos($resource["url"],"jdbc:sqlserver") === 0) 
  {
    // Parse "url" for SQL Server JDBC driver
    // Example: jdbc:sqlserver://server1;databaseName=db1;SelectMethod=cursor;SendStringParametersAsUnicode=false;MultiSubnetFailover=True
    
    $pieces = explode(";",$resource["url"]);   // Split the URL by the ; semi-colon character.

    // Loop through the resulting pieces
    foreach($pieces as $piece)
    {
      if(stripos($piece,"databasename") === 0) // Look for a string piece like "databaseName=abcdef"
      {
        // If so, split it by the = sign and create a variable called $db from the value
        list($varname,$db) = explode("=",$piece);
      }
      elseif(strpos($piece,"://") !== false) // Look for the "://" text that signifies the separation of protocol and server name
      {
        // If so, split it by the :// and create a variable called $server from the value
        list($varname,$server) = explode("://",$piece);
      }
    }
  }
}

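// The parentheses matter: "." binds tighter than "?:", so without them the
// whole concatenation becomes the condition and an unset variable gets echoed.
echo "DB = " . (isset($db) ? $db : "(Not found)") . "\n";
echo "SERVER = " . (isset($server) ? $server : "(Not found)") . "\n";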
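
If the URL parsing has to stay in bash, the same two-branch idea can be sketched with parameter expansion. This is only a sketch under assumptions: the function name parse_jdbc_url is invented for this example, it handles only the two URL shapes shown in the comments above (Oracle thin host:port:SID and SQL Server with a databaseName= property, matched case-sensitively), and other forms such as the Oracle //host:port/service syntax are not covered.

```bash
#!/usr/bin/env bash
# Sketch only: covers the two URL shapes quoted in the thread.
# Sets the global variables "server" and "db"; returns 1 for other URLs.
parse_jdbc_url() {
    local url=$1 rest
    server="" db=""
    case "$url" in
        jdbc:oracle:*@*)
            rest=${url#*@}              # myorgserver:1552:myorgdb
            server=${rest%%:*}          # text before the first colon
            db=${rest##*:}              # text after the last colon
            ;;
        jdbc:sqlserver://*)
            rest=${url#jdbc:sqlserver://}   # server1;databaseName=db1;...
            server=${rest%%[;:]*}           # stop at the first ";" or ":port"
            case "$rest" in
                *\;databaseName=*)
                    db=${rest#*";databaseName="}
                    db=${db%%;*}            # drop any following properties
                    ;;
            esac
            ;;
        *)
            return 1
            ;;
    esac
    return 0
}

parse_jdbc_url 'jdbc:oracle:thin:@myorgserver:1552:myorgdb'
echo "SERVER = $server, DB = $db"
parse_jdbc_url 'jdbc:sqlserver://server1;databaseName=db1;SelectMethod=cursor'
echo "SERVER = $server, DB = $db"
```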

@gr8gonzo:
Does this look good to you? I am getting an undefined variable error for the last two lines, and if I remove them I don't get any output.

<?php
// Error checking
if(!isset($argv[1])) { echo "Syntax: " . basename(__FILE__) . " file.xml\n"; die(); }
$file = $argv[1];
if(!file_exists($argv[1])) { echo "Error: File does not exist!\n"; die(); }

// Load up the XML file
$dom = simplexml_load_file($file);

// Loop through all the <Resource> nodes
foreach($dom->Resource as $resource)
{
  // Look for the <Resource> tag with the name of "jdbc/pegarules" (case-insensitive)
        if(stripos($resource["driverClassName"], "oracle") !== false)
{
  if(strpos($resource["url"],"jdbc:oracle") === 0) // The " === 0" checks to see if "jdbc:oracle" is at the very beginning of the string
  {
    // Parse "url" for Oracle JDBC driver
    // Example: jdbc:oracle:thin:@myorgserver:1552:myorgdb

    $pieces = explode(":",$resource["url"]); // Split the URL by the : colon character.
    $server = $pieces[3];                    // Array starts at zero, so the server name is index 3, which is the 4th piece from the left
    $db = $pieces[5];                        // The DB name is index 5, which is the 6th piece from the left
  }
  elseif(strpos($resource["url"],"jdbc:sqlserver") === 0)
  {
    // Parse "url" for SQL Server JDBC driver
    // Example: jdbc:sqlserver://server1;databaseName=db1;SelectMethod=cursor;SendStringParametersAsUnicode=false;MultiSubnetFailover=True

    $pieces = explode(";",$resource["url"]);   // Split the URL by the ; semi-colon character.

    // Loop through the resulting pieces
    foreach($pieces as $piece)
    {
      if(stripos($piece,"databasename") === 0) // Look for a string piece like "databaseName=abcdef"
      {
        // If so, split it by the = sign and create a variable called $db from the value
        list($varname,$db) = explode("=",$piece);
      }
      elseif(stripos($piece,"://")) // Look for the "://" text that signifies the separate of protocol and server name
      {
        // If so, split it by the :// and create a variable called $db from the value
        list($varname,$server) = explode("://",$piece);
      }
    }
  }
}

echo "DB = " . isset($db) ? $db : "(Not found)\n";
echo "SERVER = " . isset($server) ? $server : "(Not found)\n";

}

Which file are you running that code against?
@gr8gonzo:
I would be running it across all three files (actually a bunch of files which can have a format of any of the above type of file)

I am trying to find these values for context.xml files which are in multiple servers (to give you some background). The file can be of any of the three formats.

Error:

PHP Notice:  Undefined variable: db in /Scripts/phpcode1.php on line 49
PHP Notice:  Undefined variable: server in /Scripts/phpcode1.php on line 50
@gr8gonzo:

Why does this not give any output for the context files? If I uncomment the last two lines it gives errors

<?php
// Error checking
if(!isset($argv[1])) { echo "Syntax: " . basename(__FILE__) . " file.xml\n"; die(); }
$file = $argv[1];
if(!file_exists($argv[1])) { echo "Error: File does not exist!\n"; die(); }

// Load up the XML file
$dom = simplexml_load_file($file);

// Loop through all the <Resource> nodes
foreach($dom->Resource as $resource)
{
  // Look for the <Resource> tag with the name of "jdbc/pegarules" (case-insensitive)
        if(stripos($resource["name"], "jdbc/pegarules") !== false)
{
  if(strpos($resource["url"],"jdbc:oracle") === 0) // The " === 0" checks to see if "jdbc:oracle" is at the very beginning of the string
  {
    // Parse "url" for Oracle JDBC driver
    // Example: jdbc:oracle:thin:@myorgserver:1552:myorgdb

    $pieces = explode(":",$resource["url"]); // Split the URL by the : colon character.
    $server = $pieces[3];                    // Array starts at zero, so the server name is index 3, which is the 4th piece from the left
    $db = $pieces[5];                        // The DB name is index 5, which is the 6th piece from the left
  }
  elseif(strpos($resource["url"],"jdbc:sqlserver") === 0)
  {
    // Parse "url" for SQL Server JDBC driver
    // Example: jdbc:sqlserver://server1;databaseName=db1;SelectMethod=cursor;SendStringParametersAsUnicode=false;MultiSubnetFailover=True

    $pieces = explode(";",$resource["url"]);   // Split the URL by the ; semi-colon character.

    // Loop through the resulting pieces
    foreach($pieces as $piece)
    {
      if(stripos($piece,"databasename") === 0) // Look for a string piece like "databaseName=abcdef"
      {
        // If so, split it by the = sign and create a variable called $db from the value
        list($varname,$db) = explode("=",$piece);
      }
      elseif(stripos($piece,"://")) // Look for the "://" text that signifies the separate of protocol and server name
      {
        // If so, split it by the :// and create a variable called $db from the value
        list($varname,$server) = explode("://",$piece);
      }
    }
  }
}

//echo "DB = " . isset($db) ? $db : "(Not found)\n";
//echo "SERVER = " . isset($server) ? $server : "(Not found)\n";

}

ASKER CERTIFIED SOLUTION
gr8gonzo

This solution is only available to members.
@gr8gonzo: Thank you for your reply! There is a check that needs to occur first: whether the resource name in the XML is jdbc/PegaRULES.
I am looking to get the DB and server name of only that resource, not of every resource.
Now I get only the following values:
DB = (Not found)
SERVER = (Not found) for the below code

<?php
// Error checking
if(!isset($argv[1])) { echo "Syntax: " . basename(__FILE__) . " file.xml\n"; die(); }
$file = $argv[1];
if(!file_exists($argv[1])) { echo "Error: File does not exist!\n"; die(); }

// Load up the XML file
$dom = simplexml_load_file($file);

// Loop through all the <Resource> nodes
foreach($dom->Resource as $resource)
{
  // Look for the <Resource> tag with the name of "jdbc/pegarules" (case-insensitive)
        if(stripos($resource["driverClassName"], "oracle") !== false)
{
  if(strpos($resource["url"],"jdbc:oracle") === 0) // The " === 0" checks to see if "jdbc:oracle" is at the very beginning of the string
  {
    // Parse "url" for Oracle JDBC driver
    // Example: jdbc:oracle:thin:@myorgserver:1552:myorgdb

    $pieces = explode(":",$resource["url"]); // Split the URL by the : colon character.
    $server = $pieces[3];                    // Array starts at zero, so the server name is index 3, which is the 4th piece from the left
    $db = $pieces[5];                        // The DB name is index 5, which is the 6th piece from the left
  }
  elseif(strpos($resource["url"],"jdbc:sqlserver") === 0)
  {
    // Parse "url" for SQL Server JDBC driver
    // Example: jdbc:sqlserver://server1;databaseName=db1;SelectMethod=cursor;SendStringParametersAsUnicode=false;MultiSubnetFailover=True

    $pieces = explode(";",$resource["url"]);   // Split the URL by the ; semi-colon character.

    // Loop through the resulting pieces
    foreach($pieces as $piece)
    {
      if(stripos($piece,"databasename") === 0) // Look for a string piece like "databaseName=abcdef"
      {
        // If so, split it by the = sign and create a variable called $db from the value
        list($varname,$db) = explode("=",$piece);
      }
      elseif(stripos($piece,"://")) // Look for the "://" text that signifies the separate of protocol and server name
      {
        // If so, split it by the :// and create a variable called $db from the value
        list($varname,$server) = explode("://",$piece);
      }
    }
  }
}

echo "DB = " . (isset($db) ? $db : "(Not found)") . "\n";
echo "SERVER = " . (isset($server) ? $server : "(Not found)") . "\n";
}

I could not get a proper answer for this question. I am closing it and going to follow another approach. Thank You all for your help!