Regular expression to find HTML links with rel="nofollow" attributes in PHP

Posted on 2012-09-10
Last Modified: 2012-09-11
I am trying to come up with a regular expression that finds the HTML links on a page, checks whether each link has a rel="nofollow" attribute, and stores the results in variables. Off the top of my head I tried getting the links with PHP's strip_tags(), but it returns all the other text on the page, not just the links. I'm not sure whether PHP already has a function that can do this, or whether I need a regex for it along with the nofollow check.

Ultimately I want to scan a webpage and return 2 things for each link:

The link itself, and whether it has a nofollow attribute on the page

I don't need to pull the nofollow text itself, obviously; I just need to know whether the link has it. I am assuming preg_match will be used for that, along with a regex. Can anyone help me out?
Question by:cbielich
    LVL 34

    Accepted Solution

    I'd do it in 2 steps:
    1. Get the links with preg_match_all.
    2. Then, for each match, check whether it has the nofollow attribute:

    // Step 1: capture every opening <a> tag ($matches[0]) and its href value ($matches[1]).
    preg_match_all("#<a[^>]*href\s*=\s*['\"]([^'\">]*)['\"][^>]*>#i", $myhtml, $matches);
    // Step 2: test each matched tag for rel="nofollow".
    foreach ($matches[0] as $matchnum=>$match) {
      if (preg_match("#rel\s*=\s*['\"]nofollow['\"]#",$matches[0][$matchnum])) {
        print "Link (nofollow): {$matches[1][$matchnum]}\n";
      } else {
        print "Link: {$matches[1][$matchnum]}\n";
      }
    }

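Since the question asks for the results to be stored in variables, the two steps above can be wrapped in a function that returns an array instead of printing. This is only a sketch: `find_links_regex` and the `href`/`nofollow` array keys are names made up for illustration, not part of the accepted answer. It also assumes you want to count a link as nofollow when `rel` holds several values (for example `rel="nofollow noopener"`), which the exact-match pattern above would miss.

```php
<?php
// Hypothetical helper (not from the original answer): returns one entry per
// <a> tag with a quoted href, plus a flag for whether rel contains "nofollow".
function find_links_regex(string $html): array
{
    $links = [];
    // Step 1: capture each opening <a ...> tag and its href value.
    preg_match_all(
        "#<a\s[^>]*href\s*=\s*['\"]([^'\">]*)['\"][^>]*>#i",
        $html,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $m) {
        $nofollow = false;
        // Step 2: pull out the quoted rel value, then look for the nofollow token.
        if (preg_match("#\srel\s*=\s*(['\"])(.*?)\\1#is", $m[0], $rel)) {
            $tokens = preg_split('/\s+/', strtolower($rel[2]), -1, PREG_SPLIT_NO_EMPTY);
            $nofollow = in_array('nofollow', $tokens, true);
        }
        $links[] = ['href' => $m[1], 'nofollow' => $nofollow];
    }
    return $links;
}
```

A call might look like this, with `$myhtml` holding the page source as in the answer above:

```php
foreach (find_links_regex($myhtml) as $link) {
    echo $link['href'], $link['nofollow'] ? " (nofollow)\n" : "\n";
}
```

Like any regex approach, this only sees quoted attribute values and can be fooled by unusual markup, so treat it as a starting point.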

    LVL 1

    Author Comment

    I just came up with this, what do you think?

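    $yourHTML = file_get_contents('');
    //$yourHTML = strip_tags($yourHTML);

    $dom = new DOMDocument;
    // Parse the page; without this call the document is empty and no links are found.
    // The @ hides the warnings libxml raises on malformed markup.
    @$dom->loadHTML($yourHTML);

    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        if ($link->hasAttribute('rel')) {
                if ($link->getAttribute('rel') == 'nofollow') {
                      echo $link->getAttribute('href');
                }
        }
    }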
    LVL 1

    Author Comment

    I like yours better :)
    LVL 34

    Expert Comment

    by:Terry Woods
    In an ideal world, the DOMDocument way would be the best. However, I've had others complain that it doesn't handle invalid HTML well; I'm not sure in what way it fails.

    Note also that you might like to add an "i" pattern modifier to the preg_match call so it ignores case:
      if (preg_match("#rel\s*=\s*['\"]nofollow['\"]#i",$matches[0][$matchnum])) {
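
On the DOMDocument concern: one common complaint is that `loadHTML()` emits a warning for each piece of malformed markup, though the thread does not say whether that is the failure the others saw. The sketch below assumes that is the issue and uses `libxml_use_internal_errors(true)` so the warnings are collected quietly rather than printed. The function name `find_links_dom` and the returned array shape are invented for illustration. It reports every link with an href, not only the nofollow ones, and compares `rel` case-insensitively, in the same spirit as the "i" modifier above.

```php
<?php
// Hypothetical helper (not from the thread): collects every <a href> in a page
// along with a flag saying whether its rel attribute contains "nofollow".
function find_links_dom(string $html): array
{
    $links = [];
    if (trim($html) === '') {
        return $links; // loadHTML() rejects an empty string
    }
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // named anchors have no link target
        }
        // rel may hold several space-separated values, e.g. "nofollow noopener".
        $tokens = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        $links[] = [
            'href'     => $a->getAttribute('href'),
            'nofollow' => in_array('nofollow', $tokens, true),
        ];
    }
    return $links;
}
```

The parser still has to guess at how to repair broken markup, so it is worth spot-checking the results on the kind of pages you actually scan.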
