Bodestone asked:

Regex remove CRLF and leading/trailing spaces

I am looking for a regular expression to remove the carriage return and any following whitespace on the next line of a string.

I should be able to work it out but have a real blind spot when it comes to regex.

For background, this is for a JavaScript replace that will be used on the innerHTML of an element to make further parsing and modification easier.

I know about appendChild and insertBefore and so on. It's just that there are some cases where direct string manipulation may make things a lot easier, and flattening it out would make it more so.
//therefore
<div>
    <span>tabs or spaces</span>
</div>

//becomes
<div><span>tabs or spaces</span></div>


mayne171

Well, let me see. It would be something like \s+ (whitespace, one or more times). \s already matches tabs as well as spaces, so a separate \t+ should not be needed. \r\n is CRLF (Windows) and \n on its own is the Linux line ending. I may be wrong, but that is off the top of my head. If you need me to produce code, just let me know.
Pseudo-code: (string).replace(/[ \t]*[\r\n]+/g, "");

This says to look for any spaces or tabs followed by one or more CR/LF characters, everywhere in the string. Note that the regex is written between slashes, not inside quotes. (I haven't tested this, but that is how I would start.)
Replace the following regex with "":

[\r\n]+\s*
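
As a rough illustration of how that pattern might be applied in JavaScript, here is a minimal sketch. The input string and variable names are assumptions made up for the example (the asker's snippet with CRLF line endings and four-space indentation), not something taken from the thread.

```javascript
// Hypothetical input: the asker's example with CRLF line endings
// and four spaces of indentation on the second line.
const html = "<div>\r\n    <span>tabs or spaces</span>\r\n</div>";

// [\r\n]+ matches one or more CR/LF characters; \s* then consumes any
// whitespace (spaces, tabs, further line breaks) that follows.
// The g flag makes replace() act on every match, not just the first.
const flattened = html.replace(/[\r\n]+\s*/g, "");

console.log(flattened); // <div><span>tabs or spaces</span></div>
```

The single spaces inside "tabs or spaces" are untouched because they do not follow a line break.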

kaufmed
I would suggest adding checks for whitespace before and after your check for newline:
\s*[\r\n]+\s*
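
To see what the extra leading \s* buys, here is a small sketch comparing the two patterns on input that has whitespace before the line break. The sample string is an assumption invented for the comparison.

```javascript
// Hypothetical input: trailing spaces after <div> and a tab before the
// second line break, plus a tab of indentation on the second line.
const messy = "<div>   \r\n\t<span>tabs or spaces</span>\t\r\n</div>";

// Without the leading \s*, whitespace that sits before a line break survives.
const withoutLeadingCheck = messy.replace(/[\r\n]+\s*/g, "");

// With the leading \s*, whitespace on both sides of the line break is removed.
const withLeadingCheck = messy.replace(/\s*[\r\n]+\s*/g, "");

console.log(JSON.stringify(withoutLeadingCheck)); // "<div>   <span>tabs or spaces</span>\t</div>"
console.log(JSON.stringify(withLeadingCheck));    // "<div><span>tabs or spaces</span></div>"
```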


\s should be whitespace, including tabs, as far as I remember

But do we not need to do this in TWO goes? Or are we line oriented?

String.prototype.trim=function() {
  return this.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

Anyway, I think you just want to remove the non parsable whitespace that FF dislikes:

(code from http://www.javascriptkit.com/dhtmltutors/getxml3.shtml - I think there is a nicer one out there)



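//REMOVE white spaces in XML file. Intended mainly for NS6/Mozilla
// msgobj is the parent node whose children are being cleaned (as in the linked tutorial)
var notWhitespace = /\S/; // matches any non-whitespace character
for (var i = 0; i < msgobj.childNodes.length; i++) {
  if ((msgobj.childNodes[i].nodeType == 3) && (!notWhitespace.test(msgobj.childNodes[i].nodeValue))) { // that is, if it's a whitespace text node
    msgobj.removeChild(msgobj.childNodes[i]);
    i--; // step back so the node that shifted into this index is not skipped
  }
}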


Try this one also:

\r|\n| {2,}

I designed it for this kind of string:

<div>
    <span><tab>tabs or spaces<tab></span>
</div>

where <tab> represents a tab character.

after passing the string through this expression it will become:
<div><span>tabs or spaces</span></div>

but if you write like this:
<div>
    <span> tabs or spaces </span>
</div>
It will become:
<div><span> tabs or spaces </span></div>
That is one of its limitations. :(
@Shahan_Developer

>>  \r|\n| {2,}

Does not account for tabs.
Nevermind...  I just woke up and read your comment incorrectly  :)
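
On the question of tabs, one way to check is to run the pattern against a string containing literal tab characters. The sketch below does that; the sample string and the variant with an added \t alternative are assumptions for illustration, not part of the original suggestion. In this sketch the literal tabs are left in place by \r|\n| {2,}, while the variant removes them.

```javascript
// Hypothetical input: four spaces of indentation, with literal tab
// characters just inside the <span>.
const sample = "<div>\r\n    <span>\ttabs or spaces\t</span>\r\n</div>";

// Original suggestion: removes CR, LF, and runs of two or more spaces.
const original = sample.replace(/\r|\n| {2,}/g, "");

// Assumed variant: the same pattern with a \t alternative added.
const withTabs = sample.replace(/\r|\n|\t| {2,}/g, "");

console.log(JSON.stringify(original)); // "<div><span>\ttabs or spaces\t</span></div>"
console.log(JSON.stringify(withTabs)); // "<div><span>tabs or spaces</span></div>"
```

Both versions keep single spaces, which is the limitation already described above for " tabs or spaces ".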
ASKER CERTIFIED SOLUTION
Pui_Yun
This solution is only available to members.
Bodestone (ASKER)

Cheers, and sorry for the delay.
I initially had a look at these and found that either only the first instance was removed or I could see no change. I was then confused about whether I needed to enclose the regex in quotes. I assumed not, or it would be taken as a string literal, but when I didn't with some of the above it seemed to cause the whole script to fail. I'm running it in Greasemonkey and haven't got round to getting the failures piped to the error console yet.
I've also been somewhat busy with an annoying ETL process but will have another look at all the suggestions in a bit.
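
The symptoms described here (no change, or only the first instance removed) are consistent with how String.prototype.replace treats its first argument. The following is a minimal sketch with a made-up input string; it is not a reconstruction of the asker's actual script.

```javascript
// Hypothetical three-line input with CRLF line endings and indentation.
const source = "a\r\n  b\r\n  c";

// Quoted pattern: replace() treats a string argument as literal text to
// search for, so nothing matches and the input comes back unchanged.
const quoted = source.replace("/[\\r\\n]+\\s*/g", "");

// Regex literal without the g flag: only the first match is replaced.
const firstOnly = source.replace(/[\r\n]+\s*/, "");

// Regex literal with the g flag: every match is replaced.
const all = source.replace(/[\r\n]+\s*/g, "");

console.log(JSON.stringify(quoted));    // "a\r\n  b\r\n  c"
console.log(JSON.stringify(firstOnly)); // "ab\r\n  c"
console.log(JSON.stringify(all));       // "abc"
```

A bare pattern pasted without the surrounding slashes is not valid JavaScript syntax, which may be why removing the quotes appeared to make the whole script fail.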
SOLUTION
This solution is only available to members.
I looked at all the preceding ones and, while they gave me clues, my almost non-existent (though now slightly increased) grasp of regex did not help me complete the picture. Each one was missing one part of the puzzle, such as carriage returns not preceded or followed by whitespace.

Additional points given for spelling out the exact rules for defining the search parameter.