Yves_

Little RSS Bash Script

Hello,

I am trying to write a bash script that I can run with an argument. It should search for that argument in an rss.xml file, which contains more than one item, and return two values from the matching item: <title> and <description>.

rss.xml looks like this:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>RSS Feed</title>
    <link>feed.rss.com</link>
    <description>RSS Feed</description>
    <copyright>asdfasdf</copyright>
    <ttl>5</ttl>
    <lastBuildDate>Tue, 15 Nov 2011 17:22:08 GMT</lastBuildDate>
    <item>
      <title>This is the Title 1</title>
      <link>http://compare.this.link.com/link1</link>
      <description>Some test description</description>
      <pubDate>Mon, 14 Nov 2011 22:18:18 GMT</pubDate>
    </item>
    <item>
      <title>This is the Title 2</title>
      <link>http://compare.this.link.com/link2</link>
      <description>Some test description 2</description>
      <pubDate>Mon, 14 Nov 2011 21:47:06 GMT</pubDate>
    </item>
  </channel>
</rss>

Now, if the argument $1 is "link2", I want the script to give me back the title "This is the Title 2" and the description "Some test description 2".

So far I have not gotten very far :)

#!/bin/sh

wget http://feed.rss.com/rss.xml
cat rss.xml | grep '<title>' | awk -F\> '{ print $2 }' | awk -F\< '{ print $1 }'
cat rss.xml | grep '<description>' | awk -F\> '{ print $2 }' | awk -F\< '{ print $1 }'

Like this I get all the titles and all the descriptions...

Thanks a lot for your help
woolmilkporc

egrep "<title>|<description>" rss.xml |awk -F"<|>" '{printf "%16s:\t%s\n", $2, $3}'


Sorry, misunderstood the question! Let's see...
ASKER CERTIFIED SOLUTION
woolmilkporc
Yves_

ASKER

Wow, that works like a charm... But now I am stuck on the variables: I want to store the two results in two variables, but because of the " and ' quoting it's not really working:

#!/bin/sh
title=egrep "<title>|<link>|<description>" rss.xml |awk -F"<|>" '{print $3}' | grep -B1 "$1" | grep -v "$1"
description=egrep "<title>|<link>|<description>" rss.xml |awk -F"<|>" '{print $3}' | grep -A1 "$1" | grep -v "$1"
echo "Title is: $title"
echo "Description is: $description"

It would be cool if you could help me with that too. I am also trying to understand the code... but if you are new to bash scripts, it's kind of heavy.
title=$(egrep "<title>|<link>|<description>" rss.xml |awk -F"<|>" '{print $3}' | grep -B1 "$1" | grep -v "$1")
description=$(egrep "<title>|<link>|<description>" rss.xml |awk -F"<|>" '{print $3}' | grep -A1 "$1" | grep -v "$1")
echo "Title is: $title"
echo "Description is: $description"

See the $( ) construct (command substitution). That's how to assign the result of a command to a variable.
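
For anyone trying to follow what the pipeline does, here is a minimal, self-contained sketch of the same approach with each stage commented. It is only an illustration under a few assumptions: the variable names `feed`, `key` and `fields` are made up for this example (in the real script the key is $1 and the file is rss.xml), `grep -E` is used in place of `egrep`, and the grep in use must support -A and -B.

```shell
#!/bin/sh
# Sketch only: 'feed', 'key' and 'fields' are illustrative names.
feed=$(mktemp)
cat > "$feed" <<'EOF'
<rss version="2.0">
  <channel>
    <title>RSS Feed</title>
    <link>feed.rss.com</link>
    <description>RSS Feed</description>
    <item>
      <title>This is the Title 1</title>
      <link>http://compare.this.link.com/link1</link>
      <description>Some test description</description>
    </item>
    <item>
      <title>This is the Title 2</title>
      <link>http://compare.this.link.com/link2</link>
      <description>Some test description 2</description>
    </item>
  </channel>
</rss>
EOF

key="link2"   # stands in for $1

# Stage 1: keep only the <title>, <link> and <description> lines.
# Stage 2: split each line on '<' or '>' and print field 3,
#          which is the text between the opening and closing tag.
fields=$(grep -E "<title>|<link>|<description>" "$feed" | awk -F"<|>" '{print $3}')

# Stage 3: the title is the line before the matching link (-B1),
#          the description is the line after it (-A1);
#          grep -v then drops the link line itself.
title=$(printf '%s\n' "$fields" | grep -B1 "$key" | grep -v "$key")
description=$(printf '%s\n' "$fields" | grep -A1 "$key" | grep -v "$key")

echo "Title is: $title"
echo "Description is: $description"
rm -f "$feed"
```

Note that this relies on every item listing title, link and description in exactly that order, and on the key matching only one link line.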
Yves_

ASKER

This works perfectly. Thank you.

The script is almost done. I am now struggling with the last part, which is about file searching and manipulation.

I want the script to search for .tib, .gho and .ghs files in the directory $2 and check whether there is also a .md5 file with the same base name as each file it finds. If not, it should create one with the same base name, containing the value $3.

Example:
$2 is /share/Backup and contains a win2k8-srv.tib file but no win2k8-srv.md5 file, so the script creates win2k8-srv.md5 containing the value $3.
Here you go:


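# The parentheses make -type f apply to all three -name tests, not just the first.
find "$2" -type f \( -name "*.tib" -o -name "*.gho" -o -name "*.ghs" \) | while IFS= read -r file
 do
  # Strip the extension; create <name>.md5 containing $3 if it does not exist yet.
  [ ! -e "${file%.*}.md5" ] && echo "$3" > "${file%.*}.md5"
 done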

Yves_

ASKER

Something seems wrong:

[~] # find /root/ -type f -name "*.tib" -o -name "*.gho" -o -name "*.ghs"
BusyBox v1.01 (2011.08.03-17:51+0000) multi-call binary

Usage: find [PATH...] [EXPRESSION]

Search for files in a directory hierarchy.  The default PATH is
the current directory; default EXPRESSION is '-print'

EXPRESSION may consist of:
        -follow         Dereference symbolic links.
        -name PATTERN   File name (leading directories removed) matches PATTERN.
        -print          Print (default and assumed).

        -type X         Filetype matches X (where X is one of: f,d,l,b,c,...)
        -perm PERMS     Permissions match any of (+NNN); all of (-NNN);
                        or exactly (NNN)
        -mtime TIME     Modified time is greater than (+N); less than (-N);
                        or exactly (N) days
Yves_

ASKER

I found it: it's the -o expression that this find does not understand...
OK,

BusyBox does not have all the options the original GNU commands have.

So you must split it into three:

# Quoting "$2", "$3" and the .md5 path keeps names with spaces intact.
find "$2" -type f -name "*.tib" | while IFS= read -r file
 do
  [ ! -e "${file%.*}.md5" ] && echo "$3" > "${file%.*}.md5"
 done
find "$2" -type f -name "*.gho" | while IFS= read -r file
 do
  [ ! -e "${file%.*}.md5" ] && echo "$3" > "${file%.*}.md5"
 done
find "$2" -type f -name "*.ghs" | while IFS= read -r file
 do
  [ ! -e "${file%.*}.md5" ] && echo "$3" > "${file%.*}.md5"
 done
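
The three near-identical loops could also be folded into one loop over the extensions. The following is only a sketch and has not been tried on BusyBox 1.01: it assumes nothing from find beyond -type and -name (both listed in the usage output above), uses POSIX [ ] rather than [[ ]], and wraps the logic in a hypothetical helper called `ensure_md5`. The demo directory, file names and the value "d41d8cd9" are placeholders for illustration; in the real script the directory is $2 and the value is $3. File names containing newlines are not handled.

```shell
#!/bin/sh
# ensure_md5 DIR VALUE
# For every .tib/.gho/.ghs file under DIR, create a sibling .md5 file
# containing VALUE unless one already exists.
# (ensure_md5 is a hypothetical helper name, not part of the original script.)
ensure_md5() {
  dir=$1
  value=$2
  for ext in tib gho ghs; do
    # One find call per extension, so the -o operator is not needed.
    find "$dir" -type f -name "*.$ext" | while IFS= read -r file; do
      md5="${file%.*}.md5"   # strip the last extension, append .md5
      [ -e "$md5" ] || printf '%s\n' "$value" > "$md5"
    done
  done
}

# Example run against a throwaway directory (placeholder names and value).
demo=$(mktemp -d)
touch "$demo/win2k8-srv.tib" "$demo/other.gho"
echo "keep-me" > "$demo/other.md5"   # already present, must stay untouched
ensure_md5 "$demo" "d41d8cd9"
cat "$demo/win2k8-srv.md5"
cat "$demo/other.md5"
```

In the actual script the call would be ensure_md5 "$2" "$3". Keeping the extensions in a single list means a new backup format only needs one extra word in the for line.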
SOLUTION
xterm

Your second part is technically another question.  I think you should award woolmilkporc his points for the great answers.
Yves_

ASKER

@xterm: I would have done it sooner, but I was abroad with no internet connection at all...

@woolmilkporc: Thanks again for this amazing help. I could not have done it without these tips. You are a genius!