Regex remove CRLF and leading/trailing spaces

I am looking for a regex to remove the carriage return and any following whitespace on the next line of a string.

I should be able to work it out but have a real blind spot when it comes to regex.

For background this is for a javascript replace that will be used on the innerHTML of an element to make further parsing and modification easier.

I know about appendChild and insertBefore and so on. It's just that there are some cases where direct string manipulation may make things a lot easier, and flattening the markup out first would make it more so.
//therefore
<div>
    <span>tabs or spaces</span>
</div>

//becomes
<div><span>tabs or spaces</span></div>

Bodestone Asked:
 
Pui_Yun Commented:
Hi bodestone,
Not sure if this will help, but I've included an HTML file that uses JavaScript to do the regex. See attached code.

Hope it helps.
P.
<html>
<head>
    <title>test</title>
    <script language="javascript" type="text/javascript">
    		function test ()
    		{
    			var strTest = document.getElementById("divTest");
    			alert(strTest.innerHTML);
    			alert(strTest.innerHTML.replace(/\r?\n\s*/g,''));    			
    		}
		</script>
</head>
<body>
<div id="divTest">

<div>
    <span>tabs or spaces</span>
</div>




<div></div>
</div>
<input type="button" value="test" onclick="test()" />
</body>
</html>
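
For anyone who wants to try the same pattern outside a browser, here is a minimal sketch that applies it to a plain string instead of innerHTML. The function name flatten and the sample string are made up for illustration; only the regex comes from the page above.

```javascript
// Hypothetical helper: removes each line break (CRLF or bare LF) together
// with any whitespace that follows it.
function flatten(html) {
  return html.replace(/\r?\n\s*/g, "");
}

var sample = "<div>\r\n    <span>tabs or spaces</span>\r\n</div>";
console.log(flatten(sample));
// <div><span>tabs or spaces</span></div>
```

Because \s also matches \r and \n, a run of blank lines is swallowed in one match. Whitespace sitting before a line break is left alone by this pattern.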

 
mayne171 Commented:
Well, let me see here. It would be something like \s+ (whitespace one or more times). \s already covers tabs as well as spaces, so it should work; \t+ on its own is a tab one or more times. \r\n is CRLF (Windows) and \n alone is the Linux line ending. I may be wrong, but that is off the top of my head. If you need me to produce code, just let me know.
 
mayne171 Commented:
Pseudo-code: (string).replace(/[ \t]+\r?\n/g, "");

This says to look for spaces or tabs one or more times, followed by a CRLF (or a bare LF) at the end of the line. Note that the pattern is a regex literal between slashes, not a quoted string. (Haven't tested, but that is how I would start.)
 
Superdave Commented:
Replace the following regex with "":

[\r\n]+\s*

 
käµfm³d 👽 Commented:
I would suggest adding checks for whitespace before and after your check for newline:
\s*[\r\n]+\s*
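
To see what the extra leading \s* buys, here is a small sketch comparing the two patterns on a string that has whitespace before the line breaks as well as after them. The sample input is invented for the comparison, and the g flag is assumed so that every line break is handled.

```javascript
// Made-up input: two spaces before the first break, space+tab before the second.
var input = "<div>  \r\n\t<span>x</span> \t\n</div>";

// Line breaks plus following whitespace only: blanks before the break survive.
var afterOnly = input.replace(/[\r\n]+\s*/g, "");

// Whitespace on both sides of the break is removed.
var bothSides = input.replace(/\s*[\r\n]+\s*/g, "");

console.log(JSON.stringify(afterOnly)); // "<div>  <span>x</span> \t</div>"
console.log(JSON.stringify(bothSides)); // "<div><span>x</span></div>"
```

Neither pattern touches whitespace that is not adjacent to a line break, so spaces inside the text content are kept.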

 
Michel Plungjan (IT Expert) Commented:
\s should be whitespace, including tabs, as far as I remember.

But do we not need to do this in TWO goes? Or are we line oriented?

String.prototype.trim=function() {
  return this.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}
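
On the "two goes" question: the trim above only strips the two ends of the whole string, not the ends of each line. A line-oriented version could look like the sketch below. The name flattenLines is hypothetical, and it relies on the built-in String.prototype.trim (ES5 and later) rather than the prototype patch above.

```javascript
// Hypothetical helper: split on any line break, trim each line,
// then join the pieces back together with no separator.
function flattenLines(text) {
  return text
    .split(/\r\n|\r|\n/)
    .map(function (line) { return line.trim(); })
    .join("");
}

console.log(flattenLines("<div>  \r\n    <span> tabs or spaces </span>\t\r\n</div>"));
// <div><span> tabs or spaces </span></div>
```

This is more verbose than a single replace, but each step is easy to read and the spaces inside the span are left alone.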

Anyway, I think you just want to remove the non parsable whitespace that FF dislikes:

(code from http://www.javascriptkit.com/dhtmltutors/getxml3.shtml - I think there is a nicer one out there)



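//REMOVE white spaces in XML file. Intended mainly for NS6/Mozilla
// Assumed definition (not shown in the snippet as posted): matches any non-whitespace character
var notWhitespace = /\S/;
for (var i = 0; i < msgobj.childNodes.length; i++) {
  var node = msgobj.childNodes[i];
  if (node.nodeType == 3 && !notWhitespace.test(node.nodeValue)) { // that is, if it's a whitespace-only text node
    msgobj.removeChild(node);
    i--; // the child list just shrank, so re-check the same index
  }
}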

 
Shahan Ayyub (Senior Software Engineer - iOS) Commented:
Try this one also:

\r|\n| {2,}

I designed it for this kind of string:

<div>
    <span><tab>tabs or spaces<tab></span>
</div>

where <tab> is representing a tab key.

after passing the string through this expression it will become:
<div><span>tabs or spaces</span></div>

but if you write like this:
<div>
    <span> tabs or spaces </span>
</div>
It will become:
<div><span> tabs or spaces </span></div>
That is one of its limitations. :(
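
A quick way to check what this alternation does and does not remove is sketched below. The two sample strings are made up: the first is indented with spaces, and the second assumes the indentation is a real tab character rather than the literal text <tab>.

```javascript
var pattern = /\r|\n| {2,}/g;

// Indented with four spaces, single spaces inside the span.
var spaces = "<div>\r\n    <span> tabs or spaces </span>\r\n</div>";

// Indented with an actual tab character (assumption for this test).
var tabs = "<div>\n\t<span>x</span>\n</div>";

console.log(JSON.stringify(spaces.replace(pattern, "")));
// "<div><span> tabs or spaces </span></div>"
console.log(JSON.stringify(tabs.replace(pattern, "")));
// "<div>\t<span>x</span></div>"
```

The pattern only matches a CR, an LF, or a run of two or more spaces, so single spaces and tab characters are left in place. It also removes runs of two or more spaces anywhere in the string, including inside text content.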
 
käµfm³d 👽 Commented:
@Shahan_Developer

>>  \r|\n| {2,}

Does not account for tabs.
 
käµfm³d 👽 Commented:
Nevermind...  I just woke up and read your comment incorrectly  :)
 
Bodestone (Author) Commented:
Cheers and sorry for the delay.
I initially had a look at these and found that either only the first instance was removed or I could see no change. I was then confused about whether I needed to enclose the regex in quotes. I assumed not, or it would be taken as a string literal, but when I didn't with some of the above it seemed to cause the whole script to fail. I'm running it in Greasemonkey and haven't got round to getting the failures piped to the error console yet.
I've also been somewhat busy with an annoying ETL process but will have another look at all the suggestions in a bit.
 
käµfm³d 👽 Commented:
For JS, if you are passing the pattern directly to a string.replace() call, then the syntax would be

    var string = "testing the regex";
    var result = string.replace(/\s*[\r\n]+\s*/g, "");

with forward slashes ( / ) delimiting the pattern and no quotes. The trailing g makes the replace apply to every match rather than only the first. If you were to use the RegExp object, then you would use quotes, but you would have to double up the backslashes within the pattern and pass the flag as a second argument:

    var string = "testing the regex";
    var reg = new RegExp("\\s*[\\r\\n]+\\s*", "g");
    var result = string.replace(reg, "");
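
One detail worth checking against the "only the first instance was removed" symptom mentioned earlier: without the g (global) flag, replace() stops after the first match. The sketch below, using a made-up three-line string, shows the difference and that the literal and RegExp-object forms behave the same once the flag is supplied.

```javascript
var text = "one\r\n   two\r\n   three";

// No g flag: only the first line break (and its surrounding whitespace) goes.
var firstOnly = text.replace(/\s*[\r\n]+\s*/, "");

// g flag on a regex literal: every match is replaced.
var all = text.replace(/\s*[\r\n]+\s*/g, "");

// Same pattern via the RegExp constructor: doubled backslashes, flag as 2nd argument.
var viaObject = text.replace(new RegExp("\\s*[\\r\\n]+\\s*", "g"), "");

console.log(JSON.stringify(firstOnly)); // "onetwo\r\n   three"
console.log(JSON.stringify(all));       // "onetwothree"
console.log(all === viaObject);         // true
```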
 
Bodestone (Author) Commented:
I looked at all the preceding ones, and while they gave me clues, my almost non-existent (though now slightly increased) grasp of regex did not help me complete the picture. Each one was missing one part of the puzzle, such as carriage returns not preceded or followed by whitespace.

Additional points given for spelling out the exact rules for defining the search parameter.
Question has a verified solution.
