Solved

How to search for HTML data between specified tags with regex in JavaScript?

Posted on 2012-04-04
Last Modified: 2012-04-09
I tried some regular expressions for getting <tag> data with match, so that I get an array of every occurrence in the document. This was one regular expression:

<(?<xhtml>.*).*>(?<text>.*)</\k<xhtml>>

It works great when tested here:

http://www.regexlib.com/%28A%28eDPN3vQkkqa9ESh2c8Iz7vUzZ8_UOnYRuZ3cjrpHIWOCnJGeCiwnmuoC7tqjginBeHl3eDLJziSZCjR8UvYAIHVFlTpd7wkvjyL2HSnYT6jKqLct0tArFfr_lOAidcV6ZwVbcYLcMCoDVcyJ4Fy-3DwWWckcsFYvjRQNHYpXsohwXaZ_kYUcwYtrKk1rK-Zd0%29%29/RETester.aspx

But it generates the error "Invalid quantifier", with an arrow pointing at /(here< at the beginning of the expression, when tested in JavaScript with:

var xhtmlArray = responseText.match(/<(?<xhtml>.*).*>(?<text>.*)</\k<xhtml>>/);


This is the source:

<changedata><boxobjectid>box_object_11</boxobjectid><prevsibling>none</prevsibling><nextsibling>box_object_2</nextsibling><xhtml>

		<p class="title_object" id="title_object_1">Titelrad</p>

		<p class="text_object" id="text_object_1">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed a sem non lectus bibendum tempus. Aliquam tristique ultrices condimentum. Sed sem nisi, suscipit quis elementum vel, pellentesque eget ligula. Cras tristique elit vel nisl fermentum ut consectetur neque consectetur. Nulla vitae nibh non lacus tincidunt aliquam pulvinar quis velit. Vestibulum rhoncus rhoncus nisl ut molestie. Nullam quis sapien laoreet mauris vestibulum mattis at quis lacus. Suspendisse vel lorem non tortor mollis ultricies sit amet vel ante. Aenean massa ligula, aliquet id dapibus at, pellentesque ac nulla. Maecenas at purus nec velit luctus tempor et ut quam. Nam sed justo eros, quis congue ipsum. Mauris volutpat congue dolor non pharetra. Nulla facilisi.</p>

	</xhtml></changedata>



I ONLY want to get the code inside the <xhtml> tag, and to get it stored in an Array.

What am I doing wrong? I even tried adding /gi after the regex to account for many objects. I feel I still suck at regex. But it's so good!

// Best regards!
Question by:walkman69
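
A likely explanation for the "Invalid quantifier" error, which the thread itself does not spell out: JavaScript engines in 2012 did not support named capture groups, (?<name>...), or \k<name> backreferences, so the parser reads the ? right after ( as a quantifier with nothing to repeat. The unescaped / in </ would also end the regex literal early. Named groups only arrived later, in ES2018. Below is a minimal sketch of the same idea for a modern engine, assuming simple, non-nested tags. The group names tag and text and the sample string are illustrative, not taken from the original post.

```javascript
// Sketch only: requires an ES2018+ engine (named groups and \k<name>).
// "tag" and "text" are illustrative group names, not from the original post.
const namedPattern = /<(?<tag>\w+)[^>]*>(?<text>[\s\S]*?)<\/\k<tag>>/;

// Hypothetical sample input.
const namedMatch = '<p class="title_object">Titelrad</p>'.match(namedPattern);

if (namedMatch !== null) {
  console.log(namedMatch.groups.tag);  // name of the matched element
  console.log(namedMatch.groups.text); // text between the opening and closing tag
}
```

Note that the forward slash in the closing tag is escaped (\/) so that it does not terminate the regex literal.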
8 Comments
 
LVL 18

Expert Comment

by:zc2
ID: 37808812
The following expression should extract the content between the xhtml tags.
Note that it returns an array when there is a match: the first item is the whole matched string (including the xhtml tags) and the second is the content.

responseText.match( /<xhtml>(.*)<\/xhtml>/m );
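
One caveat, assuming the response contains line breaks as the source in the question does: in JavaScript the dot does not match newline characters, and the m flag only changes how ^ and $ behave, so this pattern can come back as null on multi-line input. A minimal sketch of a variant that uses [\s\S] to match any character, newlines included (the sample string is a shortened, hypothetical stand-in for responseText):

```javascript
// Shortened, hypothetical stand-in for the multi-line response.
const sampleResponse =
  "<changedata><xhtml>\n\t<p>Titelrad</p>\n</xhtml></changedata>";

// "." stops at line breaks, so nothing can bridge <xhtml> and </xhtml>.
const dotResult = sampleResponse.match(/<xhtml>(.*)<\/xhtml>/m);

// [\s\S] matches any character including "\n"; *? keeps the match short.
const anyCharResult = sampleResponse.match(/<xhtml>([\s\S]*?)<\/xhtml>/);

console.log(dotResult);        // null for this input
console.log(anyCharResult[1]); // content between the tags
```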
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37809029
My suggestion would be:

var xhtmlArray = responseText.match(/<xhtml[^>]*>((?:[^<]|<(?!\/xhtml>))*)<\/xhtml>/g);

 
LVL 2

Author Comment

by:walkman69
ID: 37813591
kaufmed: Long time no see. That's quite an expression; what does it do? Obviously it works, but I don't really get the syntax. zc2, sorry, but your solution generated: xhtmlArray is null.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37813692
Greetings  = D

Here's a breakdown:

<xhtml       -  literal text
[^>]*        -  zero or more (*) of any character NOT ([^...]) a closing bracket (>)
>            -  literal text
(            -  start of capture group
(?:          -  start of non-capturing group
[^<]         -  any character NOT ([^...]) an opening bracket (<)
|            -  OR
<            -  literal text
(?!\/xhtml>) -  NOT followed by [(?! ... )] the literal text "/xhtml>"; forward slash is escaped (\/)
)            -  end of non-capturing group
*            -  zero or more of the thing to the left; in this case, the entire non-capturing group
)            -  end of capture group
<\/xhtml>    -  literal text; forward slash is escaped (\/)
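
To see what the alternation with the negative lookahead buys, here is a small sketch comparing it with a plain greedy match. The input string is made up for illustration and contains two <xhtml> blocks.

```javascript
// Hypothetical input with two <xhtml> blocks (not from the thread).
const twoBlocks = "<xhtml><p>one</p></xhtml><xhtml><p>two</p></xhtml>";

// Greedy "anything": runs on to the LAST </xhtml>, swallowing both blocks.
const greedy = /<xhtml[^>]*>([\s\S]*)<\/xhtml>/.exec(twoBlocks);

// Pattern from the breakdown: a "<" is only consumed when it does not
// start "</xhtml>", so each match stops at its own closing tag.
const tempered = /<xhtml[^>]*>((?:[^<]|<(?!\/xhtml>))*)<\/xhtml>/g;
const blocks = twoBlocks.match(tempered);

console.log(greedy[1]);     // one capture spanning both blocks
console.log(blocks.length); // number of separate <xhtml> blocks found
```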

 
LVL 2

Author Comment

by:walkman69
ID: 37815349
But doesn't this contradict itself?

[^>]*>  Zero or more characters that are not a closing bracket, and it must be followed by a closing bracket? Obviously it doesn't, but why?
 
LVL 2

Author Comment

by:walkman69
ID: 37815362
OK, I think I get it. Zero or more: it's always that, and then a closing bracket.
Otherwise it wouldn't account for <xhtml attributes>.
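
That reading can be checked quickly. A small sketch, with made-up opening tags, showing that [^>]* is allowed to match nothing at all:

```javascript
const openingTag = /<xhtml[^>]*>/;

// No attributes: [^>]* matches zero characters, then ">" matches.
const bare = openingTag.exec("<xhtml>");

// With attributes (hypothetical): [^>]* consumes everything up to ">".
const withAttrs = openingTag.exec('<xhtml lang="sv" id="x1">');

console.log(bare[0]);
console.log(withAttrs[0]);
```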
 
LVL 2

Author Comment

by:walkman69
ID: 37815367
But actually I'd rather want only the text inside the <xhtml>; the tags themselves could be stripped away.
Is there an easy way to do that with regex, or do I have to include some string-handling functions?
 
LVL 75

Accepted Solution

by:käµfm³d 👽 (earned 500 total points)
ID: 37815610
"I'd rather want only the text inside the <xhtml>"
Sure. This is the reason I used the capture group. You could do something like this:

var pattern = /<xhtml[^>]*>((?:[^<]|<(?!\/xhtml>))*)<\/xhtml>/g;
var xhtmlArray = pattern.exec(responseText);

while (xhtmlArray != null)
{
    alert(xhtmlArray[1]);
    xhtmlArray = pattern.exec(responseText);
}
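
Since the original question asked for the results in an Array, the same loop could push each xhtmlArray[1] onto an array instead of alerting it. Note that match() with the g flag returns whole matches, tags included, which is why exec is needed to reach the capture group. On newer engines (ES2020 and later), String.prototype.matchAll does this more compactly. A sketch, assuming such an engine; the responseText value here is a shortened stand-in, and xhtmlContents is an illustrative name:

```javascript
// Shortened, hypothetical stand-in for the real responseText.
const responseText =
  "<changedata><xhtml>\n\t<p>Titelrad</p>\n</xhtml></changedata>";

const pattern = /<xhtml[^>]*>((?:[^<]|<(?!\/xhtml>))*)<\/xhtml>/g;

// matchAll requires the g flag and yields one match object per occurrence;
// index 1 is the capture group, i.e. the content without the <xhtml> tags.
const xhtmlContents = Array.from(responseText.matchAll(pattern), (m) => m[1]);

console.log(xhtmlContents);
```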

