Smoerble

Need RegEx: shrink string without breaking HTML rules.

I am looking for a regex to shorten a string. The problem is that the string might contain HTML tags. I don't want to strip all HTML from the string; I just need to shorten it so that the result is still valid HTML.

Additionally, I don't want to cut in the middle of a word, so the string needs to be shrunk "back" to the nearest space or line break.

Can someone help me with this or point me to a page with a suitable regex?
Thanks
Fernando Soto

To your question, "I am looking for a regex to shrink a string. The problem is, the string might contain HTML tags. I don't want to kill all HTML from the string, I just need to shrink the string so, that the result is valid HTML."

Can you give an example of a string and how it should be shortened? Where in the string do you want to delete characters?
I thought about this problem a few weeks ago, and decided that it was very messy territory.

For a start, it's hard to work out the length of a string when that string contains HTML tags. You have to count the length of the string, then subtract the length of the tags to get the length that will display on the page. That's a programmatic process that I don't think can be done in regular expressions alone.
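To illustrate the "count, then subtract the tags" idea in a scripting language, here is a minimal Python sketch. The name `visible_length` and the tag pattern are my own assumptions, not something from this thread, and the pattern is deliberately simple: it will miscount if a `>` appears inside an attribute value or a comment.

```python
import re
from html import unescape

# Simplified tag pattern (assumption): no ">" inside attribute values or comments.
TAG = re.compile(r"<[^>]*>")

def visible_length(html: str) -> int:
    """Approximate number of characters a browser would display."""
    text = TAG.sub("", html)      # drop everything between < and >
    return len(unescape(text))    # count an entity such as &amp; as one character

sample = "<p>Hello <b>world</b> &amp; all</p>"
print(len(sample), visible_length(sample))
```

The regex only finds the tags; the counting itself is done by the surrounding code.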

In the end I just stripped out HTML to avoid the rather big task of working around possible corruption of HTML elements and HTML entities. (If you break an entity in half, you make XML invalid, and risk seeing HTML render incorrectly.)
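For anyone who does want to keep the markup, one possible approach is to walk the HTML with a parser instead of a regex, copy tags through unchanged, count only the visible text, and close whatever is still open once the limit is reached. The following is a rough Python sketch using the standard `html.parser` module. `Truncator` and `truncate_html` are hypothetical names; comments, doctypes and `<script>`/`<style>` content are not treated specially, and a word that is split across tags (such as `wo<b>rld</b>`) may still be cut.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never get a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}

class Truncator(HTMLParser):
    def __init__(self, limit: int):
        super().__init__(convert_charrefs=True)
        self.limit = limit        # visible characters still allowed
        self.out = []             # pieces of the resulting HTML
        self.open_tags = []       # stack of elements not yet closed
        self.done = False         # True once the limit has been hit

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # copy the tag verbatim
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return                # ignore stray closing tags
        while self.open_tags:     # close anything left open inside this element
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) <= self.limit:
            self.limit -= len(data)
            self.out.append(escape(data, quote=False))
            return
        head = data[:self.limit]
        if not data[self.limit].isspace():
            # Back up to the last whitespace so no word is cut in half.
            idx = max(head.rfind(" "), head.rfind("\n"))
            head = head[:idx] if idx != -1 else ""
        self.out.append(escape(head, quote=False))  # re-encode &, < and >
        self.done = True

def truncate_html(html: str, limit: int) -> str:
    parser = Truncator(limit)
    parser.feed(html)
    parser.close()
    # Close every element that was still open when we stopped.
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)

print(truncate_html("<p>Hello <b>big wide</b> world</p>", 11))
```

Because entities are decoded before counting and re-encoded on output, an entity is never broken in half.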
Smoerble

ASKER

Hackney: good points, thanks.

After discussing your input, we found a logically and visually correct approach that we want to implement. For this we need several regexes:

1) count all characters inside HTML tags (including the < and >)
2) count all characters OUTSIDE HTML tags
3) find the full string from the <table> tag to the </table> tag and replace it with a completely new string
4) From a starting point, find the NEXT sentence end (colon, semicolon, period, etc.).
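The four tasks above could be sketched in Python roughly as follows. All function names are made up for illustration, the tag pattern assumes no `>` inside attribute values, the table pattern does not handle nested tables, and the set of sentence-ending characters is a guess at what "etc." should include.

```python
import re

TAG = re.compile(r"<[^>]*>")                                  # simplified tag pattern
TABLE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)
SENTENCE_END = re.compile(r"[.:;!?]")                         # assumed punctuation set

def chars_inside_tags(html: str) -> int:
    """Task 1: characters inside tags, including the < and >."""
    return sum(len(m.group()) for m in TAG.finditer(html))

def chars_outside_tags(html: str) -> int:
    """Task 2: everything that is not part of a tag."""
    return len(html) - chars_inside_tags(html)

def replace_tables(html: str, replacement: str) -> str:
    """Task 3: swap each <table>...</table> block for a new string."""
    # A function is used so backslashes in `replacement` are taken literally.
    return TABLE.sub(lambda m: replacement, html)

def next_sentence_end(text: str, start: int) -> int:
    """Task 4: index just past the next sentence end at or after `start`, or -1."""
    m = SENTENCE_END.search(text, start)
    return m.end() if m else -1

html = "<p>Hi. <b>Bye</b></p>"
print(chars_inside_tags(html), chars_outside_tags(html))
print(replace_tables("a<table><tr><td>x</td></tr></table>b", "[table]"))
print(next_sentence_end("One. Two; three", 4))
```

In each case the regex only locates the pieces; the counting and index arithmetic happen in the code around it.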

In addition, we need something more complex, for which I will open a separate question. I hope someone can help me with these four tasks.
No help on these tasks?
The problem is, you can't count characters with regular expressions. You'd have to be using a scripting language, such as Perl or PHP for that.
ASKER CERTIFIED SOLUTION
ozo
