Regex remove CRLF and leading/trailing spaces

I am looking for a regular expression to remove the carriage return and any following whitespace on the next line of a string.

I should be able to work it out but have a real blind spot when it comes to regex.

For background, this is for a JavaScript replace that will be used on the innerHTML of an element to make further parsing and modification easier.

I know about appendChild, insertBefore, and so on. There are just some cases where direct string manipulation may make things a lot easier, and flattening the markup first would make it easier still.
//therefore
<div>
    <span>tabs or spaces</span>
</div>

//becomes
<div><span>tabs or spaces</span></div>


BodestoneAsked:

mayne171Commented:
Well, let me see here. It would be something like \s+ (whitespace one or more times). \s also matches tabs, so it should work; if not, \t+ is a tab one or more times. \r\n is CRLF (Windows, I think) and \n is the Linux line ending. I may be wrong, but that is off the top of my head. If you need me to produce code, just let me know.
mayne171Commented:
Pseudo-code: (string).replace(/[\s\t]+[\r\n]$/g, "");

This says to look for spaces or tabs one or more times, followed by a CR or LF at the end of the line. (I haven't tested it, but that is how I would start.)
SuperdaveCommented:
Replace the following regex with "":

[\r\n]+\s*
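
A minimal sketch of how this pattern could be applied in JavaScript, assuming the markup is already held in a string. The `html` variable is illustrative and the sample markup is taken from the question. The `g` flag is there so that every line break is replaced rather than only the first.

```javascript
// Sample markup from the question; in practice this might come from element.innerHTML.
var html = "<div>\r\n    <span>tabs or spaces</span>\r\n</div>";

// [\r\n]+ matches one or more CR/LF characters, \s* matches the indentation that follows.
var flattened = html.replace(/[\r\n]+\s*/g, "");

console.log(flattened); // <div><span>tabs or spaces</span></div>
```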


käµfm³d 👽Commented:
I would suggest adding checks for whitespace before and after your check for newline:
\s*[\r\n]+\s*
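
To illustrate why the leading \s* can matter, here is a small comparison sketch. The sample string, with whitespace left dangling before each line break, is made up for the purpose, and the variable names are hypothetical.

```javascript
// Hypothetical input: two spaces after <div> and a tab after </span>, each before a line break.
var sample = "<div>  \n    <span>tabs or spaces</span>\t\n</div>";

// Without the leading \s*, whitespace that precedes the line break is left behind.
var withoutLeading = sample.replace(/[\r\n]+\s*/g, "");

// With it, whitespace on both sides of the line break is removed.
var withLeading = sample.replace(/\s*[\r\n]+\s*/g, "");

console.log(JSON.stringify(withoutLeading)); // "<div>  <span>tabs or spaces</span>\t</div>"
console.log(JSON.stringify(withLeading));    // "<div><span>tabs or spaces</span></div>"
```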

Michel PlungjanIT ExpertCommented:
\s should be whitespace, including tabs, as far as I remember

But do we not need to do this in TWO goes? Or are we line oriented?

String.prototype.trim=function() {
  return this.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}
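
If a line-oriented approach is preferred, one possible sketch is to split on line breaks, trim each line, and join the pieces back together. The `flattenLines` name is hypothetical, and this version relies on the built-in `String.prototype.trim` (ES5 and later) rather than the override above.

```javascript
// Hypothetical helper: trims every line, then joins the lines without separators.
function flattenLines(text) {
  return text
    .split(/\r\n|\r|\n/)      // break into lines on CRLF, CR, or LF
    .map(function (line) {
      return line.trim();     // drop leading and trailing whitespace on each line
    })
    .join("");
}

console.log(flattenLines("<div>\r\n    <span>tabs or spaces</span>\r\n</div>"));
// <div><span>tabs or spaces</span></div>
```

Whitespace inside a line, such as the spaces in "tabs or spaces", is left untouched; only the ends of each line are trimmed.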

Anyway, I think you just want to remove the non parsable whitespace that FF dislikes:

(code from http://www.javascriptkit.com/dhtmltutors/getxml3.shtml - I think there is a nicer one out there)



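//REMOVE white spaces in XML file. Intended mainly for NS6/Mozilla
var notWhitespace = /\S/; // assumed definition (not shown in the snippet): matches any non-whitespace character
for (var i = 0; i < msgobj.childNodes.length; i++) {
  // nodeType 3 is a text node; remove it if it contains only whitespace
  if ((msgobj.childNodes[i].nodeType == 3) && (!notWhitespace.test(msgobj.childNodes[i].nodeValue))) {
    msgobj.removeChild(msgobj.childNodes[i]);
    i--; // step back so the node that shifted into this index is not skipped
  }
}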
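
A variation on the loop above, offered as a sketch rather than as the tutorial's code: it walks the child list backwards so that removals need no index adjustment, and it recurses into element children so nested whitespace nodes are removed as well. The `stripWhitespaceNodes` name is hypothetical.

```javascript
// Hypothetical helper: removes whitespace-only text nodes anywhere beneath `node`.
function stripWhitespaceNodes(node) {
  var ELEMENT_NODE = 1;
  var TEXT_NODE = 3;
  // Iterate backwards so removing a child does not shift the indexes still to be visited.
  for (var i = node.childNodes.length - 1; i >= 0; i--) {
    var child = node.childNodes[i];
    if (child.nodeType === TEXT_NODE && !/\S/.test(child.nodeValue)) {
      node.removeChild(child);       // text node containing only whitespace
    } else if (child.nodeType === ELEMENT_NODE) {
      stripWhitespaceNodes(child);   // descend into nested elements
    }
  }
  return node;
}
```

In a page it might be called as `stripWhitespaceNodes(document.getElementById("divTest"))`, assuming an element with that id exists.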

Shahan AyyubSenior Software EngineerCommented:
Try this one also:

\r|\n| {2,}

I designed it for this kind of strings:

<div>
    <span><tab>tabs or spaces<tab></span>
</div>

where <tab> is representing a tab key.

after passing the string through this expression it will become:
<div><span>tabs or spaces</span></div>

but if you write like this:
<div>
    <span> tabs or spaces </span>
</div>
It will become:
<div><span> tabs or spaces </span></div>
That is one of its limitations. :(
käµfm³d 👽Commented:
@Shahan_Developer

>>  \r|\n| {2,}

Does not account for tabs.
käµfm³d 👽Commented:
Nevermind...  I just woke up and read your comment incorrectly  :)
Pui_YunCommented:
Hi bodestone,
Not sure if this will help, but I've included an HTML file that uses JavaScript to do the regex. See the attached code.

Hope it helps.
P.
<html>
<head>
    <title>test</title>
    <script language="javascript" type="text/javascript">
    		function test ()
    		{
    			var strTest = document.getElementById("divTest");
    			alert(strTest.innerHTML);
    			alert(strTest.innerHTML.replace(/\r?\n\s*/g,''));    			
    		}
		</script>
</head>
<body>
<div id="divTest">

<div>
    <span>tabs or spaces</span>
</div>




<div></div>
</div>
<input type="button" value="test" onclick="test()" />
</body>
</html>

BodestoneAuthor Commented:
Cheers, and sorry for the delay.
I initially had a look at these and found that either only the first instance was removed or I could see no change. I was then confused about whether I needed to enclose the regex in quotes. I assumed not, or it would be taken as a string literal, but when I didn't with some of the above it seemed to cause the whole script to fail. I'm running it in Greasemonkey and haven't got round to getting the failures piped to the error console yet.
I've also been somewhat busy with an annoying ETL process, but I will have another look at all the suggestions in a bit.
käµfm³d 👽Commented:
For JS, if you are passing the pattern directly to a string.replace() call, then the syntax would be

    var string = "testing the regex";
    var result = string.replace(/\s*[\r\n]+\s*/, "");

with forward-slashes ( / ) demarking the pattern--no quotes. If you were to use the RegExp object, then you would use quotes, but you would have to double-up the backslashes within the pattern:

    var string = "testing the regex";
    var reg = new RegExp("\\s*[\\r\\n]+\\s*");
    var result = string.replace(reg, "");
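
One detail that may explain seeing only the first occurrence removed: without the `g` (global) flag, `replace()` stops after the first match. A short sketch with a made-up sample string follows; the variable names are illustrative.

```javascript
var text = "line one\r\n   line two\r\n   line three";

// No g flag: only the first line break (and its surrounding whitespace) is replaced.
var firstOnly = text.replace(/\s*[\r\n]+\s*/, "");

// Literal syntax with the g flag: every match is replaced.
var allLiteral = text.replace(/\s*[\r\n]+\s*/g, "");

// RegExp constructor: flags are passed as the second argument.
var allObject = text.replace(new RegExp("\\s*[\\r\\n]+\\s*", "g"), "");

console.log(JSON.stringify(firstOnly));  // "line oneline two\r\n   line three"
console.log(JSON.stringify(allLiteral)); // "line oneline twoline three"
console.log(JSON.stringify(allObject));  // "line oneline twoline three"
```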
BodestoneAuthor Commented:
I looked at all the preceding ones and, while they gave me clues, my almost non-existent (though now slightly increased) grasp of regex did not help me complete the picture. Each one was missing one part of the puzzle, such as carriage returns not preceded or followed by whitespace.

Additional points given for spelling out the exact rules for defining the search parameter.