Improved Form Tokens to Guard Against CSRF and Screen Scrapers

Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to use "screen scrapers" to attack web forms.
Introduction

"...nothing provided in an HTTP request can be trusted." -- Chris Shiflett, circa 2005
That statement got me thinking: How do we explain and use form tokens?  What do they do for us, and what do they not do?  Although they have been counted among security "best practices" for many years, it's important to understand both their benefits and their limitations.  This article shows the design pattern for form tokens, explains its uses and limitations, and teaches a stronger approach to tokenization -- one that not only addresses cross-site request forgeries, but also mitigates the risk of "screen scrapers" that simulate a client browser in order to steal online information.

Understanding HTTP and HTML Forms
HTML forms are at the center of most online interactivity.  They allow us to collect and store client information, powering online forums and shopping carts.  As such, they are a vitally important component of the online experience.  When we create a form to collect client information, we would like some assurance that the information we collect is "real," and that it came from our intended audience, and not from an automated 'bot.  Unfortunately, the nature of the HTTP protocol makes it difficult to authenticate the contents of a form.  To understand why, let's consider the typical timeline of events.
 
  1. A human client visits our web site and goes to the form page
  2. Our human fills in some information or makes some selections
  3. Our human clicks a "submit" button, sending the information
  4. Our server receives the information, processes, and stores it.
These activities involve HTTP requests.  In order to read and fill in the form (steps 1 & 2) our client's browser uses GET-method requests.  GET requests are idempotent, read-only requests, typically created from the browser address bar, or from a hyperlink in another HTML page.  They are used to request the form from our server, but they do not change anything on our server.  The server's only response to the GET request is to create a form page and send it to the client browser.  This response will be the same no matter how many times the client requests the form.  We don't really care about the origin or authenticity of GET requests - we just send a response.

Once the client has filled in the form, a new request is sent to the server when the "submit" button is clicked (steps 3 & 4).  This time the browser will use a POST-method request.  POST requests are not idempotent and can change information on our server.  As such, we care about the authenticity of POST requests.  We would like to know that each POST-method request came from our HTML form.  Usually that is the case, but there is no guarantee, because HTTP is a stateless protocol.  All that our server sees is the information our client put into the HTML form (along with some headers that the browser includes).  As a result, we can trust neither the contents of the POST request nor its origin if we rely solely on the HTTP protocol.  We need to augment the stateless protocol with some stateful information to assure us that the POST request was made by using our HTML form.

Stateful Information in a Stateless Protocol
One way of achieving a somewhat stateful environment is through PHP client authentication, but you may not want to force everyone to register and log in just so they can fill in your forms!  You might also include a CAPTCHA test in the forms, but this may be off-putting to your clients, too.  A less intrusive, or even invisible, method is needed.  In the early days of this millennium PHP security experts described the "form token" as a stateful method of securing our forms.

The form token is created before the form page is presented to the client (step 1).  The token is stored in the PHP session and placed in a hidden input control inside the HTML form.  When our server receives the submitted form (step 4) we can compare the submitted form token to the token stored in the session.  If either one is missing, or if they do not match, we can be fairly sure that the POST request did not originate from our form, but instead came from an unauthorized source.

The form token is invisible to the (human) client, so it provides stateful information in an unobtrusive way.  The token needs to be difficult to predict, and it needs to change every time the form is presented to a client.

Cross-Site Request Forgeries
A cross-site request forgery (abbreviated CSRF or XSRF, pronounced "sea surf") is a type of attack that sends a request without the (human) client being aware of the request.  Here is one such scenario.  Let's say you log into a banking site and navigate away without logging out.  Your logged-in status is identified because your browser returns an HTTP cookie each time you visit the banking site (a site visit is an HTTP request).  Now let's say you also visit an online forum to read the comments others have left.  A malicious user of the forum can insert invisible code that will get executed when you read his comment.  This malicious code can trigger an HTTP request to the banking site.  Because the banking site will see the cookie indicating that you are logged in, it may take banking actions on your behalf, such as sending money to the malicious user.

How Form Tokens Mitigate Cross-Site Request Forgeries
But what if the banking site is looking for a form token in every request?  The malicious code that targets the banking site cannot return a form token because it did not make its request through an online form -- it just sent an unsolicited request.  Thus the banking site, and you, are both protected from this type of attack.
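
The comparison described above can be sketched in a few lines.  This is only an illustration under stated assumptions: the function name consume_form_token() and the key name form_token are hypothetical, and plain arrays stand in for $_SESSION and $_POST so the logic can be exercised on its own.

```php
<?php
/**
 * Compare a submitted token to the stored one, then discard the stored one.
 * $session stands in for $_SESSION and $post for $_POST (assumed names).
 */
function consume_form_token(array &$session, array $post, $key = 'form_token')
{
    $expected  = isset($session[$key]) ? $session[$key] : null;
    $submitted = isset($post[$key])    ? $post[$key]    : null;

    // Single use: the stored token is removed whether or not the check passes
    unset($session[$key]);

    // A missing or non-string value on either side is treated as a failure
    if (!is_string($expected) || !is_string($submitted) || $expected === '') {
        return false;
    }

    // hash_equals() compares in constant time (PHP 5.6+)
    return hash_equals($expected, $submitted);
}

// A forged request carries no token; a genuine one echoes the stored value
$session = array('form_token' => 'abc123');
var_dump(consume_form_token($session, array()));                         // forged
$session = array('form_token' => 'abc123');
var_dump(consume_form_token($session, array('form_token' => 'abc123'))); // genuine
```

The forged request fails because it has nothing to put in the token field, which is the whole of the protection: the attacker's page can cause the request to be sent, but it cannot read the token out of the victim's session.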

A Form Token Demonstration Script
The script below is an all-in-one demonstration of the form token design pattern.  It contains both parts - the creation and presentation of the tokenized form, and the verification of the tokenized request.  It is intentionally stripped down to a short, self-contained, correct example (SSCCE) that demonstrates only the form token; there are no other request variables in play.  Here are some descriptions of the moving parts.
  • Line 8: We need to be able to store our form token in a stateful variable that will persist from the time we create the form until we receive the request that submits the form.  PHP session_start() gives us access to the $_SESSION array.  As a practical matter, any PHP script you write should start the session, unconditionally, right at the top.
  • Line 11-19: This is a function that checks the form token, comparing the values in $_POST with $_SESSION, looking for a match.  It returns True or False.  Note line 16, where it nullifies the form token after the comparison.  This ensures that the form token can only be used once.
  • Line 32-34: This code will run before the form is created.  It will generate a random form token and assign it to the $token variable.  The token value is also stored in the PHP session.
  • Line 37-55: This code uses HEREDOC notation to create and display the HTML form.  Note that the $token is injected into a hidden HTML input control on line 48.  Note also that there is no action= attribute in the <form> tag.  This means that the form submit request will be directed back to the same URL.  Since the method= attribute is post, the request will come back to this script via a POST-method request.
  • Line 22-29: is our action script.  This will only be run when a POST-method request has caused PHP to fill in the $_POST array with the request variables from the form.  And in our simplified example here, there is only one variable we care about - the form token itself.  Our action script calls the check_form_token() function and displays the results.
<?php // lame_form_token_client.php
                      /**
                       * A client side script that creates a form token and saves the token in the PHP session
                       * The script also injects the form token into a hidden POST request variable
                       * When the form is submitted, the script tests the token to see if POST matches SESSION
                       */
                      error_reporting(E_ALL);
                      session_start();
                      
                      
                      // FUNCTION TO EVALUATE THE IDENTITY IN THE FORM
                      function check_form_token()
                      {
                          $sess_token = !empty($_SESSION['form_token']) ? $_SESSION['form_token'] : 'X';
                          $post_token = !empty($_POST['form_token'])    ? $_POST['form_token']    : 'Y';
                          $_SESSION['form_token'] = NULL;
                          if ($sess_token === $post_token) return TRUE; // STRICT COMPARISON AVOIDS TYPE JUGGLING
                          return FALSE;
                      }
                      
                      
                      // IF THERE IS A POST-REQUEST
                      if (!empty($_POST))
                      {
                          $status = check_form_token();
                          if (!$status) echo "Attack!  Run like hell!";
                          if ( $status) echo "Success! Trust this client.";
                          exit;
                      }
                      
                      
                      // CREATE RANDOM FORM TOKEN, SAVED IN THE SESSION, INJECTED INTO THE HTML
                      $token = md5( rand() );
                      $_SESSION['form_token'] = $token;
                      
                      
                      $html = <<<EOF
                      <!DOCTYPE html>
                      <html dir="ltr" lang="en-US">
                      <head>
                      <meta charset="utf-8" />
                      <title>A Lame Form Token Example</title>
                      </head>
                      <body>
                      
                      <form name="my_form" method="post">
                      <input type="submit" value="Verify Token" />
                      <input type="hidden" name="form_token" value="$token" />
                      </form>
                      
                      </body>
                      </html>
                      EOF;
                      
                      echo $html;


This demonstration is entirely self-contained.  You can copy this script and install it on your own server, and run it to see how it behaves.  After trying it once and clicking the Verify Token button, try refreshing the browser, effectively resubmitting the form but without re-requesting the form.  The form token in the HTML will not match the value in the PHP session, and the script will recognize an attack.

Why is this a Lame Form Token Example?
I marked this script "lame" for a couple of reasons, even though it is a technically competent demonstration of the time-honored form token technology. 

The first reason goes to the creation of the token on line 33.  The rand() function returns numbers that look statistically random, but it is not cryptographically strong: its output comes from a seeded generator and can, in principle, be predicted.  And the md5() function is deterministic, always returning the same encoded string for a given input value, so hashing a weak random number adds no unpredictability.  Taken together, these facts increase the likelihood that an attacker could guess a valid form token.  There are better ways of generating a token.  Salting with uniqid() or switching to mt_srand() and mt_rand() changes the output, but neither is suitable for security purposes (uniqid() is derived from the current time, and since PHP 7.1 rand() is simply an alias of mt_rand()).  For cryptographically strong values, use random_bytes() on PHP 7 and later, or openssl_random_pseudo_bytes() on older versions; mcrypt_create_iv() served the same purpose, but the mcrypt extension was deprecated in PHP 7.1 and moved out of PHP core in PHP 7.2.  And in the future, you might need to change the algorithm that creates the token, because what is considered strong today may become less strong tomorrow.
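
A stronger generator might look like the sketch below.  The helper name make_form_token() and the 32-byte default are assumptions chosen for illustration; the random_bytes() and openssl_random_pseudo_bytes() calls are the standard PHP functions named above.

```php
<?php
/**
 * Return a hex token built from $bytes random bytes.
 * make_form_token() is a hypothetical helper name used for illustration.
 */
function make_form_token($bytes = 32)
{
    if (function_exists('random_bytes')) {
        // PHP 7+: cryptographically secure source
        return bin2hex(random_bytes($bytes));
    }
    if (function_exists('openssl_random_pseudo_bytes')) {
        // Older PHP with the OpenSSL extension
        $strong = false;
        $raw = openssl_random_pseudo_bytes($bytes, $strong);
        if ($raw !== false && $strong) {
            return bin2hex($raw);
        }
    }
    // Refuse to fall back to rand() or uniqid(): a weak token is worse than none
    throw new RuntimeException('No cryptographically strong random source available');
}

$token = make_form_token();
echo strlen($token), PHP_EOL; // 64 hex characters for 32 bytes
```

Failing loudly when no strong source exists is a design choice: a silent fallback to a weak generator would reintroduce the very problem this section describes.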

The second reason goes to the clear-text display of the token on line 48.  Because we sent the token with the HTML form, anyone reading the form can see the token.  The browser's View Source will reveal it.  That's not a very good way to keep a secret.

Because our form token is present in the HTML form, it invites an attack by screen scraping - using a non-browser script to present requests to the server, and simulating the behavior of a human client.

A Screen Scraper that Defeats a Form Token
Some developers may have believed that a form token is useful to ensure that a human being, and not a 'bot, is using your web site.  That's not true.  Here is a demonstration script that shows how we can make an automated attack on the lame form token script.  The script uses cURL to partially simulate a browser, accepting and returning cookies, and sending back the hidden form token.  Here are some of the moving parts.

  • Line 10-11: We start the automated attack by pointing at our lame form token script.
  • Line 13-44: We set up a cURL worker and configure it to look like a Firefox browser that was referred by Google.
  • Line 47: When we run curl_exec() we get back the HTML document containing the form from the lame form token script.   If you print this document, you will find that it is exactly the same as the view source output from the lame form token script posted above, but of course it has its own unique form token.
  • Line 49-75: This shows a simplified example of code that extracts the input controls and values, and prepares a set of POST-request variables that can be sent back to the action script.
  • Line 77-80: We turn the request type to POST and send the variables back to the server.
  • Line 82-92: We make another curl_exec() call and retrieve the response from the server.
  • Line 94-98: We display what came back from the spoofed POST-request.
<?php // lame_form_token_scraper.php
                      /**
                       * This script scrapes a form token and injects it into the request variables.
                       * It can successfully attack any web page that has clear text input controls,
                       * including hidden controls.  It reads the HTML form and creates the HTTP request
                       * as if it were a human being using a web browser.
                       */
                      error_reporting(E_ALL);
                      
                      // START WITH A GET-METHOD REQUEST TO THE VICTIM URL
                      $url = 'https://iconoun.com/demo/lame_form_token_client.php';
                      
                      // SET UP A CURL WORKER
                      $curl = curl_init();
                      
                      // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
                      $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
                      $header[] = "Cache-Control: max-age=0";
                      $header[] = "Connection: keep-alive";
                      $header[] = "Keep-Alive: 300";
                      $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
                      $header[] = "Accept-Language: en-us,en;q=0.5";
                      $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK
                      
                      // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
                      curl_setopt( $curl, CURLOPT_URL,            $url  );
                      curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:44.0) Gecko/20100101 Firefox/44.0'  );
                      curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
                      curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
                      curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
                      curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
                      curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
                      curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
                      curl_setopt( $curl, CURLOPT_TIMEOUT,        3  );
                      curl_setopt( $curl, CURLOPT_VERBOSE,        TRUE   );
                      curl_setopt( $curl, CURLOPT_FAILONERROR,    TRUE   );
                      
                      // SET THE LOCATION OF THE COOKIE JAR (THIS FILE WILL BE OVERWRITTEN)
                      curl_setopt( $curl, CURLOPT_COOKIEFILE,     'lame_cookie.txt' );
                      curl_setopt( $curl, CURLOPT_COOKIEJAR,      'lame_cookie.txt' );
                      
                      // IF USING SSL, THIS MAY BE IMPORTANT
                      curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, FALSE  );
                      curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE  );
                      
                      // RUN THE CURL REQUEST AND GET THE RESULTS
                      $document  = curl_exec($curl);
                      
                      // EXTRACT THE NAMED INPUT CONTROLS
                      $tags = strip_tags($document, '<input>');
                      $tags = explode(PHP_EOL, $tags);
                      foreach ($tags as $key => $tag)
                      {
                          if (stripos($tag, 'name=') === FALSE) unset($tags[$key]);
                      }
                      
                      // SIMPLIFIED EXAMPLE: EXTRACT THE NAME AND VALUE PAIRS
                      $name = $valu = NULL;
                      foreach ($tags as $tag)
                      {
                          $name = explode('name="', $tag);
                          $name = $name[1];
                          $name = explode('"', $name);
                          $name = $name[0];
                      }
                      foreach ($tags as $tag)
                      {
                          $valu = explode('value="', $tag);
                          $valu = $valu[1];
                          $valu = explode('"', $valu);
                          $valu = $valu[0];
                      }
                      
                      // CREATE A REQUEST STRING
                      $vars = urlencode($name) . '=' . urlencode(html_entity_decode($valu)); // URL-ENCODE FOR THE REQUEST BODY
                      
                      // TURN THE REQUEST AROUND TO POST THE FORM DATA
                      curl_setopt( $curl, CURLOPT_REFERER,        $url  );
                      curl_setopt( $curl, CURLOPT_POST,           TRUE  );
                      curl_setopt( $curl, CURLOPT_POSTFIELDS,     $vars  );
                      
                      // CALL THE WEB PAGE
                      $xyz = curl_exec($curl);
                      $err = curl_errno($curl);
                      $inf = curl_getinfo($curl);
                      
                      // IF ERRORS - SEE http://curl.haxx.se/libcurl/c/libcurl-errors.html
                      if ($xyz === FALSE)
                      {
                          echo PHP_EOL . "CURL POST FAIL: $url CURL_ERRNO=$err ";
                          var_dump($inf);
                      }
                      
                      // SHOW WHAT CAME BACK FROM THE POST
                      echo PHP_EOL . htmlentities($xyz);
                      
                      // SHOW THE FORM TOKEN WE GOT FROM THE FORM
                      echo PHP_EOL . $vars;


Again, this script is entirely self-contained.  You can copy this script and use it to attack your copy of the lame form token script.  In practice, there would be more involved code needed to mount a real attack, but all of the basic principles are shown here, and it clearly illustrates the exposure created by a clear-text form value, even if the input type is "hidden."
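
The string-splitting extraction in the scraper is deliberately simplified.  As a rough sketch of the "more involved code" mentioned above, the input controls can be read with PHP's DOM extension instead.  The helper name extract_inputs() and the sample HTML are assumptions for illustration; DOMDocument, libxml_use_internal_errors() and http_build_query() are standard PHP.

```php
<?php
/**
 * Collect name => value pairs for every named <input> in an HTML document.
 * extract_inputs() is a hypothetical helper, not part of the original scraper.
 */
function extract_inputs($html)
{
    $fields = array();
    if (!is_string($html) || $html === '') {
        return $fields;
    }

    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

$html = '<form method="post">'
      . '<input type="submit" value="Verify Token" />'
      . '<input type="hidden" name="form_token" value="0123abcd" />'
      . '</form>';

// http_build_query() URL-encodes the pairs for CURLOPT_POSTFIELDS
echo http_build_query(extract_inputs($html)), PHP_EOL;
```

The point is not the parser but the exposure: any value that arrives in the HTML document, hidden or not, is available to whatever program reads that document.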

Toward a More Robust Form Token
With any security measure, our main goal is to make it difficult to defeat, at least difficult enough that the bad actors are unwilling to expend their time and energy attacking us.  If we give them the form token, as we have seen above, it's not really very difficult to cobble together a script that uses the token.  But what if we put up a new hurdle?  Not only would their scraping script need to appear to be a well-behaved client browser, accepting cookies, providing referrers, following redirects, but what if it also had to run JavaScript and handle AJAX?  That might make things more difficult for potential attackers.  The rest of this article shows how we can use a form token without sending it in clear text in the HTML document.  View Source will not reveal the token, and a scraper script that only reads the HTML document will not find the token among the form's input controls.

A Helper Class for Form Tokens
To simplify our programming, we will start by moving all of our common form token workers into a class, and we will put this in a separate PHP script that can be included in the scripts that create and consume our forms.  We organize our form token information into a three-part form token object that contains the creation time, the input control name and the input control value (the token).
  • Line 15-16: These constants allow us to make wholesale adjustments to the name, token, and token life.
  • Line 18-33: The ::get() method returns a form token object.  The token is unpredictable, and so is the name.
  • Line 35-49: The ::tidy() method removes any form tokens that have expired.  In this example, the tokens expire after five minutes.
  • Line 51-74: The ::check() method tries to verify that a token returned in a POST request matches a token currently alive in the PHP session.
  • Line 55-56: A rudimentary same-origin check.  This could be defeated with bogus request headers, but it can't hurt to include it.
  • Line 59: An iterator allows us to have more than one token alive at the same time.
  • Line 63: The internal representation of the token, kept in our session, is a JSON string that represents the form token object.
  • Line 65-69: If the POST request contains a form token name with a value that matches the session value, we unset the session element (making this into a single-use token) and return True, indicating that the form token is valid.
<?php // form_token_class.php
                      /**
                       * A helper class for form token processing
                       *
                       * Method get() returns a form token object
                       * Method tidy() removes expired tokens
                       * Method check() verifies that a token is valid
                       */
                      error_reporting(E_ALL);
                      
                      
                      // A CLASS TO DEFINE OUR FORM TOKEN
                      Class FormToken
                      {
                          const FORM_TOKEN_PREFIX = 'form_token_';
                          const FORM_TOKEN_EXPIRY = 300;
                      
                          public static function get()
                          {
                              $obj = new StdClass;
                              $obj->time  = time();
                              if (function_exists('random_bytes')) // CRYPTO-SECURE
                              {
                                  $obj->name  = static::FORM_TOKEN_PREFIX . bin2hex( random_bytes(32) );
                                  $obj->token = static::FORM_TOKEN_PREFIX . bin2hex( random_bytes(32) );
                              }
                              else // FALL-BACK FOR PHP < 7
                              {
                                  $obj->name  = static::FORM_TOKEN_PREFIX . md5( uniqid() . rand() );
                                  $obj->token = static::FORM_TOKEN_PREFIX . md5( uniqid() . rand() );
                              }
                              return $obj;
                          }
                      
                          public static function tidy()
                          {
                              $timex = time() - static::FORM_TOKEN_EXPIRY;
                              $prefix_length = strlen(static::FORM_TOKEN_PREFIX);
                              foreach ($_SESSION as $key => $value)
                              {
                                  if (substr($key,0,$prefix_length) == static::FORM_TOKEN_PREFIX)
                                  {
                                      if ($token = json_decode($value))
                                      {
                                          if (!empty($token->time) && ($token->time < $timex)) unset($_SESSION[$key]);
                                      }
                                  }
                              }
                          }
                      
                          public static function check()
                          {
                              static::tidy(); // REMOVES EXPIRED TOKENS
                      
                              $regex = '#' . preg_quote($_SERVER['HTTP_HOST'], '#') . '#i';
                              if (empty($_SERVER['HTTP_REFERER']) || !preg_match($regex, $_SERVER['HTTP_REFERER'])) return FALSE; // RUDIMENTARY SAME-ORIGIN CHECK
                      
                              $prefix_length = strlen(static::FORM_TOKEN_PREFIX);
                              foreach ($_SESSION as $key => $value)
                              {
                                  if (substr($key,0,$prefix_length) == static::FORM_TOKEN_PREFIX)
                                  {
                                      if ($session_token_obj = json_decode($value))
                                      {
                                          if (!empty($_POST[$session_token_obj->name]) && ($_POST[$session_token_obj->name] === $session_token_obj->token))
                                          {
                                              unset($_SESSION[$key]); // MAKES EACH TOKEN INTO A SINGLE-USE TOKEN
                                              return TRUE;
                                          }
                                      }
                                  }
                              }
                              return FALSE;
                          }
                      }
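
The same-origin check in the class matches the host name anywhere inside the Referer header, so a URL that merely mentions our host in its query string would pass.  A slightly stricter variant is sketched below.  The function name referer_matches_host() is hypothetical, and, like the original, it still relies on a header the client controls, so it is a hurdle rather than a guarantee.

```php
<?php
/**
 * Return true only when the Referer header names exactly the expected host.
 * referer_matches_host() is a hypothetical helper used for illustration.
 */
function referer_matches_host($referer, $host)
{
    if (!is_string($referer) || $referer === '' || !is_string($host) || $host === '') {
        return false;
    }
    $referer_host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($referer_host)) {
        return false; // Header was not a parseable absolute URL
    }
    // HTTP_HOST may carry a port (example.com:8080); compare host names only
    $expected = strtolower(preg_replace('/:\d+$/', '', $host));
    return strtolower($referer_host) === $expected;
}

// Same host passes; a foreign host that only mentions ours in the query fails
var_dump(referer_matches_host('https://example.com/form.php', 'example.com'));
var_dump(referer_matches_host('https://evil.test/?x=example.com', 'example.com'));
```

In the class, a call such as referer_matches_host($_SERVER['HTTP_REFERER'], $_SERVER['HTTP_HOST']) could take the place of the regular expression, after confirming that both server variables are set.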



A Server-Side AJAX Script that Prepares and Returns a Form Token
This script will be called by the client-side script that prepares our HTML forms.  It will create the form token and return it to the client-side script.
  • Line 8-9: We load the helper class and start the PHP session.
  • Line 13: We acquire a form token object.
  • Line 14: We inject a JSON-encoded representation of the object into the PHP session, giving it the pseudo-random name that was returned in the object.
  • Line 15: We use session_write_close() to ensure that the session data has been "checkpointed" and will be available to our client-side scripts.
  • Line 16: We echo the JSON-encoded representation back to the browser.
<?php // form_token_server.php
                      /**
                       * A server side script that responds to an AJAX request
                       * This script gets a form token object and encodes it into a JSON string
                       * It stores the JSON string in the PHP session and echos it to the client
                       */
                      error_reporting(E_ALL);
                      require_once('form_token_class.php');
                      session_start();
                      
                      
                      // GET, SAVE, AND RETURN A NEW FORM TOKEN OBJECT
                      $token = FormToken::get();
                      $_SESSION[$token->name] = json_encode($token);
                      session_write_close();
                      echo $_SESSION[$token->name];
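
Each record written to the session carries its own creation time, which is what lets the helper class expire it later.  The sketch below replays that idea on a plain array so it can be run outside a web request.  The function name drop_expired_tokens(), the sample keys, and the fixed clock value are assumptions for illustration; the prefix and the 300-second lifetime mirror the class constants.

```php
<?php
/**
 * Remove expired token records from a session-like array.
 * $store stands in for $_SESSION; the prefix and lifetime mirror the
 * class constants but are passed in as plain arguments (an assumption).
 */
function drop_expired_tokens(array $store, $now, $lifetime = 300, $prefix = 'form_token_')
{
    foreach ($store as $key => $value) {
        if (strpos((string) $key, $prefix) !== 0 || !is_string($value)) {
            continue; // Not one of our token records
        }
        $record = json_decode($value);
        if (is_object($record) && isset($record->time) && $record->time < $now - $lifetime) {
            unset($store[$key]);
        }
    }
    return $store;
}

$now   = 1000000; // A fixed "current time" keeps the example repeatable
$store = array(
    'form_token_old' => json_encode(array('time' => $now - 301, 'name' => 'form_token_old', 'token' => 'form_token_a')),
    'form_token_new' => json_encode(array('time' => $now - 10,  'name' => 'form_token_new', 'token' => 'form_token_b')),
    'user_id'        => 42,
);

// Only the record older than the lifetime is dropped; unrelated keys survive
print_r(array_keys(drop_expired_tokens($store, $now)));
```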



A Client-Side Script that Injects the Form Token into the Request
This script shows how to take the JSON-encoded data from the server-side AJAX script, and inject it into a dynamically created hidden input control.  The control is truly hidden.  You can visualize it with browser dev tools, but it will not appear in the browser's view source, and cannot be found by reading the HTML document with cURL.

  • Line 7-8: We load the helper class and start the PHP session.
  • Line 21-54: We prepare and display the HTML form document.
  • Line 46-48: The only HTML form elements appear here.
  • Line 31: The AJAX call to the server-side script sends back a JSON string response.
  • Line 34-38: We create a hidden input control, assign a name and a value, and append it to the form.  It will become part of the POST request variables, even though it cannot be seen with the browser's view source.
  • Line 46: Because there is no action= attribute specified in the form tag, this script will post to its own URL.
  • Line 11-18: When this script is started with a POST request, this code will run.  It will call the FormToken::check() method and evaluate the response.
<?php // form_token_client.php
                      /**
                       * A client side script that creates an AJAX request for a form token
                       * This script injects the form token into the request variables
                       */
                      error_reporting(E_ALL);
                      require_once('form_token_class.php');
                      session_start();
                      
                      
                      // IF THERE IS A POST-REQUEST
                      if (!empty($_POST))
                      {
                          $status = FormToken::check();
                          if (!$status) echo "Attack!  Run like hell!";
                          if ( $status) echo "Success! Trust this client.";
                          exit;
                      }
                      
                      
                      $html = <<<EOF
                      <!DOCTYPE html>
                      <html dir="ltr" lang="en-US">
                      <head>
                      <meta charset="utf-8" />
                      <title>A Variable Form Token Example</title>
                      <script type="text/javascript" src="https://code.jquery.com/jquery-latest.min.js"></script>
                      
                      <script>
                      $(document).ready(function(){
                          $.get("form_token_server.php", function(response){
                              var json    = JSON.parse(response);
                              var myForm  = document.forms['my_form'];
                              var input   = document.createElement('input');
                              input.type  = 'hidden';
                              input.name  = json.name;
                              input.value = json.token;
                              myForm.appendChild(input);
                          });
                      });
                      </script>
                      
                      </head>
                      <body>
                      
                      <form name="my_form" method="post">
                      <input type="submit" value="Verify Token" />
                      </form>
                      
                      </body>
                      </html>
                      EOF;
                      
                      echo $html;



Summary
This article has shown the general design pattern of a form token and has shown some capabilities and limitations of the design.  PHP security has come a long way since the early days of PHP 5.2, and it remains an evolving field.  Parenthetically, security is a full-time, four-year college major at the University of Maryland.  In an environment of ever-evolving threats, it pays to keep our code and designs up to date.  Here are some of the resources that can help.
http://php.net/manual/en/security.php
https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)_Prevention_Cheat_Sheet
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)#Prevention_measures_that_do_NOT_work

Bonus!
You can read Chris Shiflett's original chapter on HTML form security here!  It's worth the read, if only for the concept of filtered data and tainted data.


Comments (1)

BRDigital Product Management Director

Commented:
it's a great article. I wish I had read it earlier.
Thank you very much Ray Paseur
