Alicia St Rose


Want to use Regex to dynamically encode ampersands in URLs

Hi!
I've been scouring the web for an answer, and I think my limitation is that I'm not that familiar with regular expressions and how they work, especially how to add the code to my loop or template file.

I found this code:

// C# (.NET); needs: using System.Text.RegularExpressions;
text = Regex.Replace(text, @"
    # Match & that is not part of an HTML entity.
    &                  # Match literal &.
    (?!                # But only if it is NOT...
      \w+;             # an alphanumeric entity,
    | \#[0-9]+;        # or a decimal entity,
    | \#x[0-9A-F]+;    # or a hexadecimal entity.
    )                  # End negative lookahead.",
    "&amp;",           // Replace each bare & with the &amp; entity.
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
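The site in this thread runs on PHP (WordPress), so for comparison here is a rough PHP port of the C# snippet above. It is a sketch, not the thread's accepted answer, and it assumes PHP 7 or later; `encode_bare_ampersands()` is a hypothetical name. PHP's `x` pattern modifier plays the role of `IgnorePatternWhitespace` and `i` the role of `IgnoreCase`. The `#` characters meant literally are escaped as `\#`, because in `x` mode an unescaped `#` starts a comment.

```php
<?php
// Hypothetical helper: a PHP port of the C# Regex.Replace() call above.
// Every "&" that does not already begin an HTML entity becomes "&amp;".
function encode_bare_ampersands(string $text): string
{
    // x = allow whitespace and # comments inside the pattern (like IgnorePatternWhitespace)
    // i = case-insensitive, so hex digits a-f and A-F both match (like IgnoreCase)
    $pattern = '/
        &                 # Match a literal &.
        (?!               # But only if it is NOT followed by...
            \w+;          # a named entity such as amp; or nbsp;
          | \#[0-9]+;     # or a decimal entity such as #38;
          | \#x[0-9a-f]+; # or a hexadecimal entity such as #x26;
        )                 # End of the negative lookahead.
    /xi';

    return preg_replace($pattern, '&amp;', $text);
}

echo encode_bare_ampersands(
    'http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0'
), "\n";
// prints: http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&amp;x=0&amp;y=0
```

Because `&amp;` in an attribute is read back by the browser as `&`, the encoded link still points to the same address. Ray's `htmlspecialchars()` suggestion further down the thread covers the same ground without a hand-written pattern.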



But I don't know how to add it to my file. I have a custom field for an Indiebound link. Most of the links contain ampersands, so the code isn't validating. Here is the section of code:

<?php
if ( ! is_active_sidebar( 'sidebar-books' ) ) {
	return;
}
?>

<div id="buy-it" class="widget-area" role="complementary">
	<?php if (is_single() && is_post_type('book')) : ?>
	<aside class="beige buy">
		<h2>Buy It!</h2>
		<ul>
			<?php if(get_field('indie_bookstores_link') !=false) { ?> 
			<li><a href="<?php the_field('indie_bookstores_link'); ?>" target="_blank"><img src="<?php bloginfo('url'); ?>/wp-content/uploads/2015/08/indiebound.png"></a></li><?php } ?>
			<?php if(get_field('amazon_link') !=false) { ?>
			<li><a href="<?php the_field('amazon_link'); ?>" target="_blank"><img src="<?php bloginfo('url'); ?>/wp-content/uploads/2015/08/amazon.png"></a></li><?php } ?>
			<?php if(get_field('barnes_n_noble_link') !=false) { ?>
			<li><a href="<?php the_field('barnes_n_noble_link'); ?>" target="_blank"><img src="<?php bloginfo('url'); ?>/wp-content/uploads/2015/08/barnes-noble.png"></a></li><?php } ?>
		</ul>
	</aside>
	<?php endif; ?>
</div><!-- #buy-it -->


Frank Helk

I'll meditate a bit about that, but let me give a little suggestion first ;-)

I'm not a RegEx guru, but for designing and testing RegEx, I use the free tool Expresso. Given some basic understanding of RegEx, it's nice for learning and experimenting, too.
Ray Paseur

Maybe the best way to ask this question would be to show us exactly what you have for input and exactly what you want for output. Ampersands are an "overloaded" character: they have different meanings in different contexts, and there may be PHP functions that already address the context you are using. But to know that, we would have to see the inputs and outputs.
Alicia St Rose

ASKER

I'm not a RegEx guru, but for designing and testing RegEx, I use the free tool Expresso.

frankhelk, thank you for the suggestion. I'll look into it.

Maybe the best way to ask this question would be to show us exactly what you have for input and exactly what you want for output.

Ray Paseur, here are a couple of examples of the links that have been added to the custom fields:

http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0

http://www.amazon.com/Chameleon-Wore-Chartreuse-Gecko-Mystery/dp/0152024859/ref=sr_1_4?s=books&ie=UTF8&qid=1440576439&sr=1-4&refinements=p_82%3AB000APLXEC

They aren't validating in the W3C validator.
Ray Paseur

This one does not validate, but there is no instance of "amp" in the validator output:
https://validator.w3.org/check?uri=http%3A%2F%2Fwww.indiebound.org%2Fsearch%2Fbook%3Fsearchfor%3Dbruce%2Bhale%2Bchameleon%2Bwore%2Bchartreuse%26x%3D0%26y%3D0&charset=%28detect+automatically%29&doctype=Inline&group=0

The Amazon.com page does not validate either, but that may reflect the strictness of the W3C validator more than an actual problem. It's common to see URLs with ampersands in them. What kind of failure is this causing?

Alicia St Rose

Hi Ray,
It's not the page on the Indiebound site I'm trying to validate. It's this page on a site I'm building:

https://validator.w3.org/nu/?doc=http%3A%2F%2Fsandbox.intrepidrealist.com%2Fbruce-hale%2Fbooks%2Fchet-gecko-series%2Fthe-chameleon-wore-chartreuse-chet-gecko-mystery-no-1%2F

The links to Indiebound and Amazon are causing errors because of the ampersands. Apparently I need to encode them, but I want this to happen dynamically because my client isn't going to remember to do it. I've already got loads of these links all over the site for his books:

http://sandbox.intrepidrealist.com/bruce-hale
Ray Paseur

Have you tried translating the URLs with this?
http://php.net/manual/en/function.htmlspecialchars.php

I am not suggesting that is needed, just that it already exists and is what we usually use to "entitize" the special characters.  I'm not sure that it's always needed, but it may be enough to satisfy your requirements.  Just a thought.
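To see what the `htmlspecialchars()` suggestion does with the two sample links posted earlier, here is a small sketch in plain PHP (no WordPress needed). It converts `&` to `&amp;`, and with `ENT_QUOTES` it also encodes `"` and `'`. Passing `false` as the fourth argument (`$double_encode`) leaves entities that are already present alone, so a link the client pasted in pre-encoded does not turn into `&amp;amp;`. The `$links` array simply holds the sample URLs from this thread.

```php
<?php
// The two sample links posted in this thread, as stored in the custom fields.
$links = [
    'http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0',
    'http://www.amazon.com/Chameleon-Wore-Chartreuse-Gecko-Mystery/dp/0152024859/ref=sr_1_4?s=books&ie=UTF8&qid=1440576439&sr=1-4&refinements=p_82%3AB000APLXEC',
];

foreach ($links as $url) {
    // ENT_QUOTES: also encode " and ' so the value is safe inside a quoted href.
    // false ($double_encode): leave existing entities such as &amp; untouched.
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
    echo '<a href="', $href, '">link</a>', "\n";
}
```

In the WordPress template, the same call could wrap `get_field('indie_bookstores_link')` (and the other two fields) in place of each `the_field(...)` call, so the encoding happens at output time and the stored data never needs editing.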
Alicia St Rose

Hi Ray,
I found this code in the comments section of the page you linked to. It looks like what I need, though it's been voted down!

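<?php
// Decode any existing entities, then encode everything exactly once,
// so "&" becomes "&amp;" but "&amp;" does not become "&amp;amp;".
function formspecialchars($var)
{
    // Matches anything that looks like an entity: &name; or &#123;
    $pattern = '/&#?[a-zA-Z0-9]*;/';

    if (is_array($var)) {
        // Run formspecialchars() on every element, keeping the keys.
        $out = array();
        foreach ($var as $key => $v) {
            $out[$key] = formspecialchars($v);
        }
        return $out;
    }

    $out = $var;
    // Decode repeatedly (handles double-encoded input such as &amp;amp;), but stop
    // once decoding no longer changes anything; otherwise an entity that
    // htmlspecialchars_decode() does not handle, such as &nbsp;, loops forever.
    while (preg_match($pattern, $out) && ($decoded = htmlspecialchars_decode($out, ENT_QUOTES)) !== $out) {
        $out = $decoded;
    }

    // Trim, strip backslashes, and encode once.
    return htmlspecialchars(stripslashes(trim($out)), ENT_QUOTES, 'UTF-8', true);
}
?>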



My issue is not knowing where to put the code!
I'm still green on some things, I guess! ;)
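On the "where does it go" question: in a WordPress theme, reusable helpers usually live in the theme's functions.php, which WordPress loads automatically, and the template then calls them. Here is a minimal sketch under those assumptions. `render_buy_links()` and its `[url, image, alt]` argument format are hypothetical, and in the real sidebar template each URL would come from `get_field()` as in the original markup rather than from a hard-coded array. It encodes with `htmlspecialchars()`; WordPress's own `esc_url()` is the more usual choice there, and in its default display context it outputs `&` as `&#038;`. The sketch also adds an `alt` attribute, since the validator reports `<img>` elements that lack one.

```php
<?php
// Hypothetical helper, e.g. placed in the theme's functions.php.
// $links: list of [raw URL from the custom field, image URL, alt text].
// Returns the <ul> markup for the "Buy It!" box with every value encoded.
function render_buy_links(array $links): string
{
    $html = "<ul>\n";
    foreach ($links as [$url, $image, $alt]) {
        if (empty($url)) {
            continue; // Same idea as the template's get_field(...) != false checks.
        }
        $html .= sprintf(
            "\t<li><a href=\"%s\" target=\"_blank\"><img src=\"%s\" alt=\"%s\"></a></li>\n",
            htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false),   // & becomes &amp;
            htmlspecialchars($image, ENT_QUOTES, 'UTF-8', false),
            htmlspecialchars($alt, ENT_QUOTES, 'UTF-8', false)
        );
    }
    return $html . "</ul>\n";
}

// Stand-in data; in the sidebar template these would be get_field() values.
echo render_buy_links([
    ['http://www.indiebound.org/search/book?searchfor=bruce+hale+chameleon+wore+chartreuse&x=0&y=0',
     '/wp-content/uploads/2015/08/indiebound.png', 'IndieBound'],
    ['', '/wp-content/uploads/2015/08/amazon.png', 'Amazon'], // empty field: skipped
]);
```

With a helper like this, the three `<li>` lines in the template could collapse into a single `echo render_buy_links(...)` call inside the existing `<aside>`, passing the values of `indie_bookstores_link`, `amazon_link` and `barnes_n_noble_link` with their images.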
Ray Paseur

Hmm... I am still not seeing a failure when I click links to Indiebound or Amazon. Can you please post a link to a page that illustrates the issue? Thanks.
Alicia St Rose

Hi Ray,
It has nothing to do with clicking the links. The links on this page do not validate in W3C because they have ampersands. Can you please tell me how to dynamically remove those ampersands and replace them with HTML entities? It looks like I have to do it with regular expressions.

Are you not able to see the W3C errors at the link below? Numbers 7 through 12 are the errors.

https://validator.w3.org/nu/?doc=http%3A%2F%2Fsandbox.intrepidrealist.com%2Fbruce-hale%2Fbooks%2Fchet-gecko-series%2Fthe-chameleon-wore-chartreuse-chet-gecko-mystery-no-1%2F
ASKER CERTIFIED SOLUTION
Ray Paseur

Alicia St Rose

I'm accepting your solution because you are giving me permission to cry uncle on this one!
Thanks!
:-)

I think you're on firm ground.  Best of luck with the project! ~Ray