  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 294

Need RegEx: shrink string without breaking HTML rules.

I am looking for a regex to shrink a string. The problem is that the string might contain HTML tags. I don't want to strip all HTML from the string; I just need to shorten it so that the result is still valid HTML.

In addition, I don't want to cut in the middle of a word, so the cut should move back to the previous space or line break in the string.

Can someone help me with this or point to a page with matching regex?
Thanks
Asked:
Smoerble
 
Fernando Soto commented:
To your question, "I am looking for a regex to shrink a string. The problem is, the string might contain HTML tags. I don't want to kill all HTML from the string, I just need to shrink the string so, that the result is valid HTML."

Can you give an example of a string and how it should be shrunk? Where in the string do you want to delete characters?
 
HackneyCab commented:
I thought about this problem a few weeks ago, and decided that it was very messy territory.

For a start, it's hard to work out the length of a string when that string contains HTML tags. You have to count the length of the string, then subtract the length of the tags to get the length that will display on the page. That's a programmatic process that I don't think can be done in regular expressions alone.

In the end I just stripped out HTML to avoid the rather big task of working around possible corruption of HTML elements and HTML entities. (If you break an entity in half, you make XML invalid, and risk seeing HTML render incorrectly.)
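The strip-then-shorten approach described above could look roughly like this. It is only a sketch in Python (my choice of language, not something from this thread), and `plain_excerpt` is a made-up name. The tag-stripping regex is deliberately naive and assumes there is no `>` inside attribute values, comments, or scripts. Entities are decoded before cutting, so an entity can never be split in half, and the result is re-escaped at the end.

```python
import html
import re


def plain_excerpt(source, limit):
    """Strip tags, then cut to at most `limit` characters on a word boundary."""
    # Naive tag removal: assumes no '>' inside attribute values or comments.
    text = re.sub(r"<[^>]*>", "", source)
    # Decode entities first so that '&amp;' counts as one character.
    text = html.unescape(text)
    # Collapse runs of whitespace (spaces, returns) into single spaces.
    text = " ".join(text.split())
    if len(text) > limit:
        cut = text[:limit + 1]
        if " " in cut:
            # Drop the (possibly partial) last word.
            text = cut.rsplit(" ", 1)[0]
        else:
            # A single word longer than the limit: hard cut as a fallback.
            text = text[:limit]
    # Re-escape so the excerpt is safe to put back into an HTML page.
    return html.escape(text.rstrip(), quote=False)


print(plain_excerpt("<p>Fish &amp; chips are <b>great</b></p>", 14))
```

All markup is lost this way, which is the trade-off mentioned above.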
 
Smoerble (author) commented:
Hackney: good points, thanks.

After discussing your input, we found a logically and visually correct approach that we want to implement. For this we need several regexes:

1) count all characters inside HTML tags (including the < and >)
2) count all characters OUTSIDE HTML tags
3) find the full string from the <table> tag to the </table> tag and replace it with a complete new string
4) From a starting point, find the NEXT sentence end (colon, semicolon, dot etc).

In addition, we need something more complex, for which I will open a separate question. I hope someone can help me with these 4 tasks.
 
Smoerble (author) commented:
No help on these tasks?
 
HackneyCab commented:
The problem is, you can't count characters with regular expressions. You'd have to be using a scripting language, such as Perl or PHP for that.
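To illustrate what the scripting-language route might look like for the original question (shorten the text, keep the HTML valid, don't cut inside a word), here is a minimal sketch in Python using the standard `html.parser` module. The names `truncate_html`, `_Truncator`, and `VOID_TAGS` are made up for this example. It assumes the input contains no `<script>` or `<style>` blocks, and it only looks for a word boundary inside the text node where the limit is reached.

```python
import re
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed list, not exhaustive).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit      # maximum number of visible characters
        self.seen = 0           # visible characters emitted so far
        self.out = []           # output fragments
        self.open_tags = []     # stack of currently open elements
        self.done = False       # True once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close everything up to and including the matching element.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        room = self.limit - self.seen
        if len(data) <= room:
            self.seen += len(data)
            self.out.append(escape(data, quote=False))
            return
        kept = data[:room]
        if not data[room].isspace():
            # The cut falls inside a word: back up to the last whitespace.
            m = re.search(r"\s\S*$", kept)
            kept = kept[:m.start()] if m else ""
        self.out.append(escape(kept.rstrip(), quote=False))
        self.done = True


def truncate_html(source, limit):
    parser = _Truncator(limit)
    parser.feed(source)
    parser.close()
    # Close whatever was still open at the point of the cut.
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)


print(truncate_html("<p>Hello <b>wonderful world</b> again</p>", 18))
```

Tags do not count toward the limit, and entities such as `&amp;` count as one character because the parser decodes them before the text is measured.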
 
ozo commented:
If you can strip out the HTML, then counting characters before and after will give you 1 and 2.
you can count characters outside of <> with something like
$count=()=/(?:\G|>|^)([^<>])/g
but you may need something more complicated for things like
 <IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
 <script>if (a<b && a>c)</script>
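For inputs like these three, letting an HTML parser find the tags is probably more robust than a single regex. A rough Python sketch (Python is my choice here, and `count_visible` is a made-up name) that counts only the characters a browser would show, skipping comments and the contents of `<script>` and `<style>`:

```python
from html.parser import HTMLParser


class _VisibleCounter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.visible = 0   # characters outside of tags
        self._skip = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Comments never arrive here; script/style text is skipped.
        if not self._skip:
            self.visible += len(data)


def count_visible(source):
    parser = _VisibleCounter()
    parser.feed(source)
    parser.close()
    return parser.visible


tricky = ('<IMG SRC = "foo.gif" ALT = "A > B">'
          '<!-- <A comment> -->'
          '<script>if (a<b && a>c)</script>'
          'Hi <b>you</b>')
print(count_visible(tricky))
```

Entities are decoded before counting, so `&amp;` counts as one visible character rather than five source characters.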

s#<table>.*?</table>#complete new string#s
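The same substitution as a Python sketch, loosened a little to allow attributes on the opening tag and any letter case. Those two extensions are my assumptions, and `replace_tables` is a made-up name. As with the one-liner, the non-greedy match stops at the first closing tag, so nested tables are not handled.

```python
import re

# DOTALL lets '.' cross line breaks; IGNORECASE accepts <TABLE> as well.
TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table\s*>",
                      re.IGNORECASE | re.DOTALL)


def replace_tables(source, replacement):
    # A function as the replacement avoids backslash/group processing
    # of the replacement string.
    return TABLE_RE.sub(lambda match: replacement, source)


page = 'a<TABLE border="1">\n<tr><td>x</td></tr>\n</table>b'
print(replace_tables(page, "[table]"))
```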

/\G.*?[:;.]/
You may want some context around that so you don't match . in numbers or abbreviations
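One possible form of that context, sketched in Python: only accept the punctuation mark when it is followed by whitespace or the end of the string. This skips the dot in a number like 3.14 but still stops after abbreviations such as "e.g.", which would need a list of exceptions. The name `next_sentence_end` and the extra `!` and `?` in the character class are my additions.

```python
import re

# Punctuation counts as a sentence end only before whitespace or end of text.
SENTENCE_END = re.compile(r"[.:;!?](?=\s|$)")


def next_sentence_end(text, start=0):
    """Return the index just past the next sentence end, or -1 if none."""
    match = SENTENCE_END.search(text, start)
    return match.end() if match else -1


text = "Pi is 3.14 approx. Next sentence."
end = next_sentence_end(text)
print(text[:end])
```

Passing the previous result as `start` plays the role of `\G` in the pattern above: the search continues from where the last one stopped.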