Linux single field extract from HTML

Hi,

Can someone please help with a Linux command (I've been trying sed) to extract the src string from the HTML below:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
... a whole bunch of html including other img tags...
</script>
<img src="/path/to/file/12345667.jpg" alt="" /></body>
</html>

I would like to pipe this HTML file into the command and have the command output just the src of the very last IMG tag in the file (/path/to/file/12345667.jpg in this case), which I will assign to a bash variable for subsequent use.

Very grateful
BT
Asked by: brothertom
 
farzanj Commented:
Try this:
sed -ne '/src/'p filename  | sed 's/.*src=[^\/]*\([^ "]*\).*/\1/'
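
For anyone reading along, here is a rough sketch of how the two sed stages behave on the tail of the HTML from the question. The file name sample.html is made up for illustration, and the comments describe the effect on this particular input only.

```shell
# Hypothetical sample file reproducing the last lines of the HTML in the question.
cat > sample.html <<'EOF'
</script>
<img src="/path/to/file/12345667.jpg" alt="" /></body>
</html>
EOF

# Stage 1: -n suppresses sed's default output and /src/p prints only
#          the lines that contain "src".
# Stage 2: .*src= is greedy, so it consumes everything up to the last
#          src= on the line; [^\/]* then skips the characters before the
#          first slash (here, the opening quote); \([^ "]*\) captures the
#          path up to the next space or double quote.
sed -ne '/src/'p sample.html | sed 's/.*src=[^\/]*\([^ "]*\).*/\1/'
```

Note that the second stage relies on the path starting with a slash, as it does in the question.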

 
brothertom (Author) Commented:
Thanks farzanj.

Tiny refinement for the actual file did the trick...

sed -ne '/img src/'p filename  | sed 's/.*src=[^\/]*\([^ "]*\).*/\1/' | tail -1

BT
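
Since the original goal was to pipe the HTML in and keep the result in a bash variable, a possible variation is sketched below. It is not the accepted solution, just one way to do it under some assumptions: the function name last_img_src and the sample markup are invented for illustration, each img tag sits on a single line, the src value is double-quoted, and grep supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper: read HTML on stdin, print the src of the last <img> tag.
# Step 1: grep -o emits each <img ...> tag on its own output line.
# Step 2: tail -n 1 keeps only the last tag.
# Step 3: sed prints the value of the double-quoted src attribute.
last_img_src() {
    grep -o '<img[^>]*>' |
        tail -n 1 |
        sed -n 's/.*src="\([^"]*\)".*/\1/p'
}

# Made-up input with two img tags on one line.
html='<p><img src="/a/first.png" alt="" /><img src="/path/to/file/12345667.jpg" alt="" /></p>'

# Pipe the HTML in and capture the result in a variable.
img_src=$(printf '%s\n' "$html" | last_img_src)
echo "$img_src"
```

Isolating the img tags before extracting src means a script or iframe src later on the same line is not picked up by mistake. For anything less regular than this (tags split across lines, single-quoted attributes), a real HTML parser is probably the safer choice.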
Question has a verified solution.