I'm parsing an HTML file with awk, and I want my field separator to be any sequence of <> tags, so that the fields are the text outside the tags.
Currently I have:
FS = "[ \t]*<[^>]*>[ \t]*"
This works well for a single tag (and any surrounding whitespace), but it produces lots of empty fields when several tags, such as <b><i>, are adjacent.
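Here is a minimal reproduction. The sample line `<p><b><i>Hi</i></b> there</p>` is a made-up input for illustration. The script prints NF followed by each field in brackets. With the current FS every tag is its own separator, so the leading tag and each pair of adjacent tags produce an empty field:

```sh
#!/bin/sh
# Reproduce the empty-field problem with the single-tag field separator.
out=$(printf '%s\n' '<p><b><i>Hi</i></b> there</p>' |
  awk 'BEGIN { FS = "[ \t]*<[^>]*>[ \t]*" }
       {
         # Print the field count, then every field wrapped in [ ].
         printf "NF=%d", NF
         for (i = 1; i <= NF; i++) printf " [%s]", $i
         print ""
       }')
echo "$out"
```

This prints `NF=7 [] [] [] [Hi] [] [there] []`. Only `Hi` and `there` are real text. The other five fields are empty, and `$1` is empty because the line starts with a tag.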
I'd like something like
FS = "[[ \t]*<[^>]*>[ \t]*]+"
but that doesn't work. I've tried various combinations of brackets, parentheses and plus signs.
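Running the bracketed attempt through the same kind of check seems to show why it fails. As far as I can tell, the outer `[[ \t]` is read as a bracket expression (a character class matching `[`, space or tab), and the trailing `]+` as one or more literal `]` characters. That would mean the separator only matches when a `]` follows a tag. The second input line `x<b>]y` is a contrived case added just to test that reading:

```sh
#!/bin/sh
# Same field dump as before, but using the bracketed FS from the attempt.
out=$(printf '%s\n' '<p><b><i>Hi</i></b> there</p>' 'x<b>]y' |
  awk 'BEGIN { FS = "[[ \t]*<[^>]*>[ \t]*]+" }
       {
         # Print the field count, then every field wrapped in [ ].
         printf "NF=%d", NF
         for (i = 1; i <= NF; i++) printf " [%s]", $i
         print ""
       }')
echo "$out"
```

The first line is not split at all (`NF=1`, with the whole line in `$1`). Only the contrived line, where a `]` follows the tag, splits into `x` and `y`.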
What's the proper way to do this?