azaram


Regular expression to hyphenate long words

I have a website where user-driven content is being displayed in columns. Words longer than, say, 30 characters are pushing out the column widths. This includes URLs that users enter, which is not uncommon.
I need a regex that will find these long words and insert a hyphen after the nth character. This will render URLs useless, but that's OK, because the content in the columns is just summary content that links to a full version that won't need to be hyphenated.

Thanks!
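A minimal JavaScript sketch of the kind of regex being asked for (JavaScript because that is what the rest of this thread ends up using). `hyphenateLongWords` and its 30-character default are hypothetical choices, not something from the thread. The pattern matches `n` non-whitespace characters and uses a lookahead so that a hyphen is inserted only when the word continues; words of `n` characters or fewer are left alone.

```javascript
// Hypothetical helper: insert "-" after every n consecutive non-whitespace
// characters, but only when the run continues past that point.
function hyphenateLongWords(text, n = 30) {
  // (\S{n}) captures n non-whitespace characters; (?=\S) checks that another
  // one follows without consuming it, so it can start the next chunk.
  const longRun = new RegExp("(\\S{" + n + "})(?=\\S)", "g");
  return text.replace(longRun, "$1-");
}

console.log(hyphenateLongWords("see http://example.com/a/very/long/path", 10));
// see http://exa-mple.com/a-/very/long-/path
```

Because the lookahead does not consume the following character, the breaks fall after every `n` characters instead of drifting by one character per chunk.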
SOLUTION
ozo
azaram

ASKER

Hi ozo,
I'm not really sure what you sent. It appears to be a word-manipulation library for Perl?
I'm coding in ASP and would prefer a regex solution if anyone has one.
SOLUTION
It is a regular expression library for finding the hyphenation points in English words.
There's another library that interprets TeX patterns to determine hyphenation points:
http://www.ccl.net/cca/text-processing/tex/latex/polish/hyphen.english
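For context, TeX-style hyphenation patterns are applied roughly as follows (Liang's algorithm). Each pattern is a letter string with digits between the letters. Every pattern that matches somewhere in the word, padded with `.` at both ends, contributes its digits at those positions, the maximum digit at each position is kept, and an odd maximum marks an allowed break. Below is a minimal JavaScript sketch, assuming a tiny hand-picked pattern list rather than a real English pattern file. The function names and the 2/3 minimum fragment lengths are illustrative choices, not taken from the linked library.

```javascript
// Parse a TeX-style pattern such as "hy3ph" into its letters ("hyph") and the
// digit in front of each letter position ([0, 0, 3, 0, 0]); a missing digit is 0.
function parsePattern(pattern) {
  let letters = "";
  const values = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      values[values.length - 1] = Number(ch);
    } else {
      letters += ch;
      values.push(0);
    }
  }
  return { letters, values };
}

// Index the patterns by their letter string for quick substring lookups.
function buildPatternTable(patterns) {
  const table = new Map();
  for (const p of patterns) {
    const { letters, values } = parsePattern(p);
    table.set(letters, values);
  }
  return table;
}

// Apply every pattern that matches inside ".word.", keep the maximum digit at
// each position, and allow a break wherever that maximum is odd.
// leftMin/rightMin prevent very short fragments at either end of the word.
function hyphenate(word, table, leftMin = 2, rightMin = 3) {
  const padded = "." + word.toLowerCase() + ".";
  // points[k] holds the value in front of padded[k].
  const points = new Array(padded.length + 1).fill(0);
  for (let i = 0; i < padded.length; i++) {
    for (let j = i + 1; j <= padded.length; j++) {
      const values = table.get(padded.slice(i, j));
      if (values) {
        values.forEach((v, k) => {
          points[i + k] = Math.max(points[i + k], v);
        });
      }
    }
  }
  // A break between word[m - 1] and word[m] is governed by points[m + 1],
  // because padded[m + 1] is word[m].
  const parts = [];
  let start = 0;
  for (let m = leftMin; m <= word.length - rightMin; m++) {
    if (points[m + 1] % 2 === 1) {
      parts.push(word.slice(start, m));
      start = m;
    }
  }
  parts.push(word.slice(start));
  return parts;
}

// A tiny illustrative pattern set, just enough for the word "hyphenation".
const PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"];
const table = buildPatternTable(PATTERNS);
console.log(hyphenate("hyphenation", table).join("-")); // hy-phen-ation
```

With a real pattern file the table simply gets much larger; the lookup loop stays the same.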
ASKER CERTIFIED SOLUTION
Would it be an idea to quote the relevant code snippet along with the URL? That would preserve the snippet in the PAQ if the external website is changed or removed. It might be easier on azaram as well.
@Roonan:

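var str = "abcdefghijklmnopqrstuvwxyz";
// (?=\S) checks the next character without consuming it, so a break lands
// after every 10 characters instead of every 11 after the first one.
document.write(str.replace(/(\S{10})(?=\S)/g, "$1<wbr>"));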
Change <wbr> to <br>
That's not necessary.
gops: your own suggestion uses <wbr>


So why not?
var strSoftHyphen = (navigator.userAgent.toLowerCase().indexOf("applewebkit") > -1 || document.all) ? "&shy;" : "<wbr/>"; // use a soft hyphen for WebKit and for browsers exposing document.all (IE, Opera), which implement it correctly; <wbr/> otherwise
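Whichever marker is chosen, `<wbr>` and `&shy;` are HTML, so they have to be combined with HTML-escaping of the user content. If the text is escaped first and a regex is run afterwards, the marker can land inside an entity (for example `&am<wbr>p;`). A minimal sketch, assuming the content arrives as raw, unescaped text: split long runs into chunks first, escape each chunk, then join the chunks with the marker. `softWrapHtml`, `escapeHtml` and the `maxLen` parameter are hypothetical names, not the `SoftWrap` function from the accepted answer.

```javascript
// Escape the characters that are significant in HTML text content.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: cut every non-whitespace run longer than maxLen into
// maxLen-sized pieces, escape each piece, and join them with `marker`
// (e.g. "<wbr>" or "&shy;"). Escaping per piece means an entity such as
// "&amp;" is never split by the marker.
function softWrapHtml(text, maxLen, marker) {
  const pieces = text.split(/(\s+)/); // the capture group keeps the whitespace
  return pieces
    .map((piece) => {
      if (/^\s*$/.test(piece) || piece.length <= maxLen) {
        return escapeHtml(piece);
      }
      const chunks = [];
      for (let i = 0; i < piece.length; i += maxLen) {
        chunks.push(escapeHtml(piece.slice(i, i + maxLen)));
      }
      return chunks.join(marker);
    })
    .join("");
}

console.log(softWrapHtml("R&D:aaaaaa b", 4, "<wbr>"));
// R&amp;D:<wbr>aaaa<wbr>aa b
```

In a page this might be used as `document.write(softWrapHtml(userText, 20, strSoftHyphen))`, where `userText` is a hypothetical variable holding the raw content and `strSoftHyphen` comes from the browser check above.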


<wbr> is not supported by all browsers. I can see it is not working in my browser (IE6).

ASKER

Thanks, I have a good solution now.

gops1 got most of the points because that solution did exactly what I wanted. A simple regex would have been better, but I think this one was more comprehensive. I did leave the question open for JS solutions. For now I've hacked it by inserting <script lang....>document.write(SoftWrap('kdflksdjflksdjflksdf'),20)</script> wherever it's needed. That works fine, but it's a bit clumsy and verbose. As a next step I'll probably wrap each bit of user-driven content in a div such as <div id="usercontent1023"> and have a script run through all the divs whose id matches usercontent... and transform them.
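The "run through all the usercontent divs" step could look roughly like the sketch below. Instead of building HTML strings, it edits the text nodes directly and inserts U+00AD, the character that `&shy;` stands for, so no escaping is needed and links and other markup are left intact. The `usercontent` id prefix follows the example above; `insertSoftHyphens`, `softWrapUserContent` and the 20-character limit are hypothetical, and, as discussed in this thread, soft-hyphen support varies between browsers. `NodeList.forEach` and `createTreeWalker` also assume a reasonably modern browser.

```javascript
const SOFT_HYPHEN = "\u00AD"; // the character the &shy; entity represents

// Pure string helper: add a soft hyphen after every maxLen consecutive
// non-whitespace characters when the run continues past that point.
function insertSoftHyphens(text, maxLen) {
  const longRun = new RegExp("(\\S{" + maxLen + "})(?=\\S)", "g");
  return text.replace(longRun, "$1" + SOFT_HYPHEN);
}

// Browser-only: visit every text node inside divs whose id starts with
// "usercontent" and rewrite its data. Changing node.data leaves the element
// structure (and any href attributes) untouched.
function softWrapUserContent(maxLen) {
  const containers = document.querySelectorAll('div[id^="usercontent"]');
  containers.forEach((container) => {
    const walker = document.createTreeWalker(container, NodeFilter.SHOW_TEXT);
    for (let node = walker.nextNode(); node; node = walker.nextNode()) {
      node.data = insertSoftHyphens(node.data, maxLen);
    }
  });
}

// Only hook into the page when a DOM is available (i.e. not under Node).
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => softWrapUserContent(20));
}
```

Running it once per page load matters: the soft hyphens count as non-whitespace, so a second pass would add extra breaks.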

Roonaan, I gave you partial points for the regex, which is specifically what I was asking for, but in practice I couldn't get it working properly in classic ASP with the regex object and its Replace method. I put that down more to my lack of regex knowledge, and I didn't play with it much because the SoftWrap JS worked fine.

Gops: your suggestion uses <wbr> or &shy; depending on the browser...
But on which platform does <wbr> fail?

This works on IE6 on Windows:

<span style="width:100px">test<wbr>testtest<wbr>testtest<wbr>testtest<wbr>testtest<wbr>testtest<wbr>test</span>