Get URLs from a sitemap (sed or grep)

How can I use sed or grep to create a list with only URLs from my sitemap file?

The sitemap file looks like this:
	<url>
		<loc>http://www.domain.com/bla</loc>
		<lastmod>2011-10-16</lastmod>
		<changefreq>monthly</changefreq>
		<priority>0.8</priority>
	</url>
	<url>
		<loc>http://www.domain.com/bla2</loc>
		<lastmod>2011-10-16</lastmod>
		<changefreq>monthly</changefreq>
		<priority>0.8</priority>
	</url>


Asked by: Dennie

Maciej S (sysadmin) commented:
sed version:
sed '/loc/!d;s/.*>\([^<]*\)<.*/\1/' sitemap
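As a rough sketch of how that one-liner works, the two sed commands can be split apart and commented. The file name sitemap_sample.xml and the function name extract_locs are placeholders chosen for illustration, not names from the thread.

```shell
#!/bin/sh
# Hypothetical sample file; name and contents are illustrative only.
cat > sitemap_sample.xml <<'EOF'
    <url>
        <loc>http://www.domain.com/bla</loc>
        <lastmod>2011-10-16</lastmod>
    </url>
    <url>
        <loc>http://www.domain.com/bla2</loc>
        <lastmod>2011-10-16</lastmod>
    </url>
EOF

# /loc/!d                 delete every line that does not contain "loc"
# s/.*>\([^<]*\)<.*/\1/   replace the line with the text captured between
#                         a ">" and the next "<" (the URL inside <loc>)
extract_locs() {
    sed -e '/loc/!d' -e 's/.*>\([^<]*\)<.*/\1/' "$1"
}

extract_locs sitemap_sample.xml
```

Note that /loc/ matches any line containing the string "loc", so a stricter pattern such as /<loc>/ may be safer on real sitemaps.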

Papertrip commented:
Here is a simple awk one-liner that will do the trick.

[root@broken ee]# cat sitemap
 <url>
                <loc>http://www.domain.com/bla</loc>
                <lastmod>2011-10-16</lastmod>
                <changefreq>monthly</changefreq>
                <priority>0.8</priority>
        </url>
        <url>
                <loc>http://www.domain.com/bla2</loc>
                <lastmod>2011-10-16</lastmod>
                <changefreq>monthly</changefreq>
                <priority>0.8</priority>
        </url>
[root@broken ee]# awk -F'[<>]' '/loc/{print $3}' sitemap
http://www.domain.com/bla
http://www.domain.com/bla2
[root@broken ee]#
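To see why $3 holds the URL, it can help to print every field awk produces for a single line. This is a small sketch that splits on either angle bracket; the variable line and the function show_fields are illustrative names, not part of the answer above.

```shell
#!/bin/sh
# One line from the sample sitemap, leading whitespace included.
line='        <loc>http://www.domain.com/bla</loc>'

# Split the line on "<" or ">" and print each field with its index.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields "$line"
```

Field 1 is the leading whitespace, field 2 is the tag name loc, and field 3 is the URL, which is why the one-liner prints $3.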

Gerwin Jansen, EE MVE (Topic Advisor) commented:
grep on its own only selects matching lines (stripping the tags would need an extension such as -o), but grep and sed combined do the job:

grep "<[/]*loc>" sitemap | sed 's/[<][/]*loc[>]//g;s/^[[:space:]]*//'


grep selects the lines containing the URLs, like this:

            <loc>http://www.domain.com/bla</loc>
            <loc>http://www.domain.com/bla2</loc>

The first sed command removes the loc start and end tags, like this:

            http://www.domain.com/bla
            http://www.domain.com/bla2

Adding the second sed command (after the ;) removes the whitespace at the beginning of the lines, like this:

http://www.domain.com/bla
http://www.domain.com/bla2
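For comparison, a grep-only variant may be possible where grep supports the -o option (GNU and BSD grep do, but it is not part of POSIX). This is only a sketch: it assumes every URL starts with "http" and contains no "<", and the file name sitemap_grep_sample.xml and function extract_urls_grep are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample file; the xmlns line shows why the first grep is needed.
cat > sitemap_grep_sample.xml <<'EOF'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.domain.com/bla</loc>
        <lastmod>2011-10-16</lastmod>
    </url>
    <url>
        <loc>http://www.domain.com/bla2</loc>
        <lastmod>2011-10-16</lastmod>
    </url>
</urlset>
EOF

# First grep: keep only lines with a <loc> tag.
# Second grep: -o prints just the matched part, from "http" up to the next "<".
extract_urls_grep() {
    grep '<loc>' "$1" | grep -o 'http[^<]*'
}

extract_urls_grep sitemap_grep_sample.xml
```

The first grep keeps the namespace URL on the urlset line out of the result; without it, the second pattern would match that line as well.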
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.