Want to use Regex to dynamically encode ampersand in urls

Hi!
I've been scouring the web for an answer, and I think my limitation is that I'm not that familiar with regular expressions and how they work, especially how to add the code to my loop or template file.

I found this code:

text = Regex.Replace(text, @"
    # Match & that is not part of an HTML entity.
    &                  # Match literal &.
    (?!                # But only if it is NOT...
      \w+;             # an alphanumeric entity,
    | \#[0-9]+;        # or a decimal entity,
    | \#x[0-9A-F]+;    # or a hexadecimal entity.
    )                  # End negative lookahead.",
    "&amp;",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
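
Note that the snippet above is C# (.NET's `Regex.Replace`), so it cannot be pasted into a PHP template as written. A rough PHP port might look like the sketch below. It assumes `preg_replace()` with the `i` and `x` modifiers standing in for `RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace`, and the helper name `encode_bare_ampersands()` is hypothetical, not something from WordPress or the original snippet.

```php
<?php
// Hypothetical helper: a PHP port of the C# pattern quoted above.
// It replaces every "&" that does not already start an HTML entity.
function encode_bare_ampersands(string $text): string
{
    $pattern = '/
        &                  # Match literal &.
        (?!                # But only if it is NOT followed by...
          \w+;             # an alphanumeric entity,
        | \#[0-9]+;        # or a decimal entity,
        | \#x[0-9A-F]+;    # or a hexadecimal entity.
        )                  # End negative lookahead.
    /ix';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '&amp;', $text) ?? $text;
}

echo encode_bare_ampersands('book?searchfor=bruce+hale&x=0&amp;y=0'), PHP_EOL;
// prints: book?searchfor=bruce+hale&amp;x=0&amp;y=0
```

The lookahead is what keeps an existing `&amp;` from becoming `&amp;amp;`.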

But I don't know how to add it to my file. I have a custom field for an Indiebound link. Most of the links contain an ampersand, so the code isn't validating. Here is the section of code:

if ( ! is_active_sidebar( 'sidebar-books' ) ) {
	return;
}
?>

<div id="buy-it" class="widget-area" role="complementary">
	<?php if (is_single() && is_post_type('book')) : ?>
	<aside class="beige buy">
		<h2>Buy It!</h2>
		<ul>
			<?php if(get_field('indie_bookstores_link') !=false) { ?> 
			<li><a href="<?php the_field('indie_bookstores_link'); ?>" target="_blank"><img src="<?php bloginfo('url'); ?>/wp-content/uploads/2015/08/indiebound.png"></a></li><?php } ?>
			<?php if(get_field('amazon_link') !=false) { ?>
			<li><a href="<?php the_field('amazon_link'); ?>" target="_blank"><img src="<?php bloginfo('url'); ?>/wp-content/uploads/2015/08/amazon.png"></a></li><?php } ?>
			<?php if(get_field('barnes_n_noble_link') !=false) { ?>
			<li><a href="<?php the_field('barnes_n_noble_link'); ?>" target="_blank"><img src="<?php bloginfo('url'); ?>/wp-content/uploads/2015/08/barnes-noble.png"></a></li><?php } ?>
		</ul>
	</aside>
	<?php endif; ?>
</div><!-- #buy-it -->

Alicia St Rose, Owner & Principle Developer/Designer, Asked:
frankhelk Commented:
I'll meditate a bit about that, but let me give a little suggestion first ;-)

I'm not a RegEx guru, but for designing and testing RegEx, I use the free tool Expresso. Given some basic understanding of RegEx, it's nice for learning and experimenting, too.
Ray Paseur Commented:
Maybe the best way to ask this question would be to show us exactly what you have for input and exactly what you want for output.  Ampersands are an "overloaded" character -- they have different meanings in different contexts, and there may be PHP functions that already address the context you are using.  But to know that, we would have to see the inputs and outputs.
Alicia St Rose, Owner & Principle Developer/Designer, Author Commented:
"I'm not a RegEx guru, but for designing and testing RegEx, I use the free tool Expresso."

frankhelk, thank you for the suggestion. I'll look into it.

"Maybe the best way to ask this question would be to show us exactly what you have for input and exactly what you want for output."

Ray Paseur, here are a couple of examples of the links that have been added to the custom fields:

http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0

http://www.amazon.com/Chameleon-Wore-Chartreuse-Gecko-Mystery/dp/0152024859/ref=sr_1_4?s=books&ie=UTF8&qid=1440576439&sr=1-4&refinements=p_82%3AB000APLXEC

They aren't validating in the W3C validator.
Ray Paseur Commented:
This one does not validate, but there is no instance of "amp" in the validator output.
https://validator.w3.org/check?uri=http%3A%2F%2Fwww.indiebound.org%2Fsearch%2Fbook%3Fsearchfor%3Dbruce%2Bhale%2Bchameleon%2Bwore%2Bchartreuse%26x%3D0%26y%3D0&charset=%28detect+automatically%29&doctype=Inline&group=0

The Amazon.com page does not validate either, but that may be more a matter of sensitivity of the W3 validator than an indication that anything is wrong.  It's common to see URLs with ampersands in them.  What kind of failure is this causing?
Alicia St Rose, Owner & Principle Developer/Designer, Author Commented:
Hi Ray,
It's not the page on the Indiebound site I'm trying to validate. It's this page on a site I'm building:

https://validator.w3.org/nu/?doc=http%3A%2F%2Fsandbox.intrepidrealist.com%2Fbruce-hale%2Fbooks%2Fchet-gecko-series%2Fthe-chameleon-wore-chartreuse-chet-gecko-mystery-no-1%2F

The links to indiebound and Amazon are causing errors because of the ampersand. I need to encode them apparently. But I want this to happen dynamically, because my client isn't going to remember to do it. And I've already got loads of these links all over the site for his books:

http://sandbox.intrepidrealist.com/bruce-hale
Ray Paseur Commented:
Have you tried translating the URLs with this?
http://php.net/manual/en/function.htmlspecialchars.php

I am not suggesting that is needed, just that it already exists and is what we usually use to "entitize" the special characters.  I'm not sure that it's always needed, but it may be enough to satisfy your requirements.  Just a thought.
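
For what it's worth, here is a minimal sketch of what that might look like for a link like the Indiebound one. The helper name `encode_url_for_attribute()` is made up for illustration. The only real call is PHP's `htmlspecialchars()`, with its fourth argument (`double_encode`) set to `false` so that a link that already contains `&amp;` should not be encoded a second time.

```php
<?php
// Hypothetical helper: entity-encode a URL before printing it into an
// HTML attribute such as href="...".
function encode_url_for_attribute(string $url): string
{
    // ENT_QUOTES also encodes single and double quotes.
    // The final `false` (double_encode) leaves existing entities alone.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
}

$raw = 'http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0';
echo encode_url_for_attribute($raw), PHP_EOL;
// prints: http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&amp;x=0&amp;y=0
```

In the template, this would presumably mean echoing `encode_url_for_attribute( get_field( 'indie_bookstores_link' ) )` inside the `href` instead of calling `the_field()`, assuming `get_field()` returns the raw URL string. WordPress also ships `esc_url()`, which is the more usual choice for URLs printed into attributes.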
Alicia St Rose, Owner & Principle Developer/Designer, Author Commented:
Hi Ray,
I found this code in the comments section of the page you linked to. It looks like what I need, though it's been voted down!

<?php 
function formspecialchars($var) 
    { 
        $pattern = '/&(#)?[a-zA-Z0-9]{0,};/'; 
        
        if (is_array($var)) {    // If variable is an array 
            $out = array();      // Set output as an array 
            foreach ($var as $key => $v) {      
                $out[$key] = formspecialchars($v);         // Run formspecialchars on every element of the array and return the result. Also maintains the keys. 
            } 
        } else { 
            $out = $var; 
            while (preg_match($pattern,$out) > 0) { 
                $out = htmlspecialchars_decode($out,ENT_QUOTES);       
            }                             
            $out = htmlspecialchars(stripslashes(trim($out)), ENT_QUOTES,'UTF-8',true);     // Trim the variable, strip all slashes, and encode it 
            
        } 
        
        return $out; 
    } 
?>

My issue is not knowing where to put the code!
I'm still green on some things, I guess! ;)
Ray Paseur Commented:
Hmm... I am still not seeing a failure when I click links to Indiebound or Amazon.  Can you please post a link to a page that illustrates the issue?  Thanks.
Alicia St Rose, Owner & Principle Developer/Designer, Author Commented:
Hi Ray,
It has nothing to do with clicking the links. The links on this page do not validate in the W3C validator because they contain ampersands. Can you please tell me how to dynamically replace those ampersands with HTML entities? It looks like I have to do it with regular expressions.

Are you not able to see the W3C errors at the link below? Numbers 7 through 12 are errors.

https://validator.w3.org/nu/?doc=http%3A%2F%2Fsandbox.intrepidrealist.com%2Fbruce-hale%2Fbooks%2Fchet-gecko-series%2Fthe-chameleon-wore-chartreuse-chet-gecko-mystery-no-1%2F
Ray Paseur Commented:
The W3C validator is too strict - to the point of incompetence at times like this.  Modern browser developers do not - and cannot - follow their guidelines 100%, and for good reason.  If the developers did, they would produce browsers that did not work for the majority of the people who go online and want to do economically valuable stuff like learn, share, sell, and buy.  Your site works correctly. The only people who are "offended" are the W3 validator people, and they are not a representative slice of online activity.

Here is a redacted version of your script (it's in the code snippet below)
http://iconoun.com/demo/temp_laughhearty.php

Here's the validator link
https://validator.w3.org/nu/?doc=http%3A%2F%2Ficonoun.com%2Fdemo%2Ftemp_laughhearty.php

The validator barks about the ampersands on lines 27 and 29.  Curiously, it does not complain about line 38, where the ampersand should probably be "entitized."

Executive summary: stop worrying!  You're already doing it right.  

<!DOCTYPE html>

<!-- http://www.experts-exchange.com/questions/28748819/Want-to-use-Regex-to-dynamically-encode-ampersand-in-urls.html#a41081070 -->

<html dir="ltr" lang="en-US">
<head>
<meta charset="utf-8" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<style type="text/css">
/* STYLE SHEET HERE */
</style>

<title>EE_Q_28748819</title>
</head>
<body>

<!-- MODIFIED CODE -->
<p>
<a href="http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&amp;x=0&amp;y=0" target="_blank">
<img src="http://sandbox.intrepidrealist.com/bruce-hale/wp-content/uploads/2015/08/indiebound.png" alt="Indiebound logo"></a>
</p>
<!-- /MODIFIED CODE -->

<!-- ORIGINAL CODE -->
<br><a href="http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0" target="_blank">
<img src="http://sandbox.intrepidrealist.com/bruce-hale/wp-content/uploads/2015/08/indiebound.png" alt="Indiebound logo"></a>
<br><a href="http://www.amazon.com/Chameleon-Wore-Chartreuse-Gecko-Mystery/dp/0152024859/ref=sr_1_4?s=books&ie=UTF8&qid=1440576439&sr=1-4&refinements=p_82%3AB000APLXEC" target="_blank">
<img src="http://sandbox.intrepidrealist.com/bruce-hale/wp-content/uploads/2015/08/amazon.png"  alt="Amazon logo"></a>
<br><a href="http://www.barnesandnoble.com/w/chameleon-wore-chartreuse-bruce-hale/1100151572?ean=9780152024857" target="_blank">
<img src="http://sandbox.intrepidrealist.com/bruce-hale/wp-content/uploads/2015/08/barnes-noble.png"  alt="Barnes & Noble logo"></a>

<footer id="colophon" class="site-footer" role="contentinfo">
  <div class="site-info">
  &copy; Bruce Hale 2005 - 2015. All Rights Reserved.
  <span class="sep"> | </span>
  WordPress design & development by <a href="http://intrepidrealist.com" target="_blank">Intrepid Realist Design</a>
  </div><!-- .site-info -->
</footer><!-- #colophon -->
<!-- /ORIGINAL CODE -->

</body>
</html>

Also, all of those jokes about regular expressions are true - they are much more trouble than they are worth in most cases, including this case, and I say that confidently as an author of hundreds of regular expressions.  Regular expressions are not powerful enough to be used with self-aware documents like XML and HTML.  So please, just don't do that.  It's right to use the W3 validator for advice; it's wrong to treat it as a higher authority that you must obey.

Alicia St Rose, Owner & Principle Developer/Designer, Author Commented:
I'm accepting your solution because you are giving me permission to cry uncle on this one!
Thanks!
Ray Paseur Commented:
:-)

I think you're on firm ground.  Best of luck with the project! ~Ray