PHP Form Data Validation Tips

February 7th, 2011
Data Validation

A common task in web application development is validating data entered by a user. This is usually done to ensure that the data entered matches the types expected by an underlying database. It is also good security practice to limit the data that a web-based form will accept. While it is common in Web 2.0 applications to use client-side code such as JavaScript to validate form fields, this should not be relied on, for two reasons. Some users disable JavaScript, and you want to provide a working interface for them. From a security perspective, keep in mind that it is possible to post data to a web form without ever loading the form and its client-side code.

If you develop using a PHP framework such as CakePHP, there is a built-in validation system that makes validating form data trivial. There are also a number of third-party scripts and libraries that provide handy classes for validating form data. These libraries can be great time savers, but for various reasons you may need to write your own form validation routines.

Tell Users What Went Wrong

When validating form data, it is likely that a user will submit something that does not pass the validation routine. Often the error is simply caused by using the wrong format to enter data. To make your web applications as user-friendly as possible, you should always give your users a hint when you expect a particular format, such as dates in mm/dd/yyyy format.

While this type of defensive coding will help many users breeze through your forms, there will still be times when a user enters incorrect data or an incorrect format. The best way to handle this is to provide a useful error message. Some forms display all the validation errors at the top of the form. While this helps users see what errors occurred, it is far more helpful to display the error message next to the field that caused the error. This allows a user to scan down a form and fix all their errors in a single pass.

To accomplish this, I use an array called $form_errors. I key the array using the name of the form field that caused the error. I also order my validation rules in order of priority. This is because I generally only want to display one error per field, and because failing one validation may make other rules irrelevant. For example, if a required field is left blank, you want to display that the field is required. It is not necessary to show errors regarding format or data type at this stage. I display errors in the label of the form field between <strong> tags. I can then use CSS to style my errors, most often by making them red. Here’s a snippet of the code to display an error:

<label for="id_first_name">First Name:
<br><strong>
<?php
if (isset($form_errors['first_name'])) { 
  echo $form_errors['first_name']; } 
?>
</strong></label>
<input type="text" name="first_name" id="id_first_name" maxlength="40" />

If you do need to display multiple errors, you can make a multidimensional array. Then simply iterate over the array for each form field into an unordered list.
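If several messages per field are needed, the multidimensional array described above might be rendered along these lines. This is only a sketch: the helper name render_field_errors, the errors class and the sample messages are hypothetical, and each message is escaped before it is written into the page.

```php
<?php
// Hypothetical helper: build an unordered list of every error recorded for one field.
function render_field_errors($field, $form_errors) {
  if (empty($form_errors[$field])) {
    return '';
  }
  $html = '<ul class="errors">';
  // Casting to array lets a single string message work as well as a list of messages.
  foreach ((array) $form_errors[$field] as $message) {
    $html .= '<li>' . htmlspecialchars($message, ENT_QUOTES, 'UTF-8') . '</li>';
  }
  return $html . '</ul>';
}

// Each field name maps to an array of messages (assumed sample data).
$form_errors = array(
  'phone' => array('Phone number is required.', 'Use the format 555-555-5555.'),
);
echo render_field_errors('phone', $form_errors);
```

A field with no recorded errors produces an empty string, so the helper can be called for every field without extra checks.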

Return the Value Originally Entered

It is rarely useful to return an error message without also showing the user what they originally entered. By returning the values that the user originally entered you accomplish two tasks. You make it easy for them to compare what they entered to the error message, which makes it easier to determine what they did wrong. You also save them from having to re-enter every field to correct their error and complete your form. To accomplish this in my own code, I copy the values from the $_POST array to another array. This array is then used to perform validation and, in the case of an error, to prefill the form with the user’s original input. Why don’t I just use the $_POST array, you may ask? I often use the same form for editing and adding items to a database. By using a generic array like $values, I can load the form with data from my database by assigning the record to the $values array. Here’s a snippet of HTML to prefill a form field using this technique:

<input type="text" name="first_name" id="id_first_name" 
<?php 
if (isset($values['first_name'])) { 
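  // Escape the value so quotes or markup in the input cannot break out of the attribute
  echo 'value="' . htmlspecialchars($values['first_name'], ENT_QUOTES) . '"'; }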
?> 
/>
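The copy from $_POST into $values could be sketched as follows. The helper name collect_values and the field list are hypothetical; the idea is simply to copy only the fields the form defines, so unexpected keys in the request are ignored.

```php
<?php
// Hypothetical helper: copy only the expected string fields from the submitted data.
function collect_values($fields, $source) {
  $values = array();
  foreach ($fields as $field) {
    if (isset($source[$field]) && is_string($source[$field])) {
      $values[$field] = $source[$field];
    }
  }
  return $values;
}

// When adding a record:  $values = collect_values($fields, $_POST);
// When editing a record: $values = $record;  (a row loaded from the database)
$fields = array('first_name', 'last_name');
$values = collect_values($fields, array('first_name' => 'Ada', 'admin' => '1'));
```

Here the unexpected admin key is dropped and last_name is simply absent, so the isset check in the form template leaves that field empty.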

Checking Required Fields

Making sure that a user has entered a value for a required field is pretty trivial. We just need to see if the field was included when the form was submitted and whether it contains a value. Here’s a basic validation function for a required field:

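function is_required($field, $values) {
  // empty() also treats the string "0" as empty, so compare against '' instead
  return isset($values[$field]) && trim((string) $values[$field]) !== '';
}

This function simply makes sure that a value was set for the field and that it is not blank. The second check is important because an HTML form submits an element for a text input even if the text field is left empty. The value is compared against an empty string rather than passed to PHP’s empty function because empty also treats the string "0" as empty, which would reject a legitimate answer of 0. The trim call means that a value made up only of spaces counts as blank. You can use this function to test whether a required field is present and then set an appropriate error if the function returns false.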
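Combined with the $form_errors array from earlier, a required-field check might be used like this. The sketch repeats a required-field check so that it runs on its own; the validate_first_name helper, the 40 character limit and the messages are hypothetical.

```php
<?php
// A required-field check like the one above, repeated so the sketch is self-contained.
function is_required($field, $values) {
  return isset($values[$field]) && trim((string) $values[$field]) !== '';
}

// Hypothetical helper: run checks in priority order and keep only the first failure.
function validate_first_name($values) {
  $form_errors = array();
  if (!is_required('first_name', $values)) {
    $form_errors['first_name'] = 'First name is required.';
  } elseif (strlen($values['first_name']) > 40) {
    // Only reached when the field is present, so the two messages never compete.
    $form_errors['first_name'] = 'First name must be 40 characters or fewer.';
  }
  return $form_errors;
}
```

Because the checks are chained with elseif, a blank field reports only that it is required, which matches the one-error-per-field approach.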

Checking Data Types

Often you will want a field to contain a particular data type such as a number or a date. PHP has a number of built-in functions for testing these types of data. For example, PHP’s is_numeric function returns true if the value passed is a number or a numeric string. A numeric string may contain digits, an optional leading sign, an optional decimal point and an optional exponent, so both 12.5 and 1e5 pass. It also means that care must be taken to inform users not to use special formatting. This function will return false if the user includes other characters such as a comma (e.g. 60,000). I won’t go into great detail here because these functions are already well-documented in the PHP documentation.
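As a brief illustration, here is how a few of PHP’s built-in checks treat typical form input. The variable names are only for this sketch; ctype_digit and filter_var are alternatives to is_numeric that are stricter about what they accept.

```php
<?php
// is_numeric() accepts any numeric string, including decimals and exponents.
$checks = array(
  'plain'    => is_numeric('60000'),   // true
  'decimal'  => is_numeric('12.5'),    // true
  'exponent' => is_numeric('1e5'),     // true
  'comma'    => is_numeric('60,000'),  // false
);

// ctype_digit() is stricter: the string must consist of digits only.
$digits_only = ctype_digit('60000');   // true
$has_point   = ctype_digit('12.5');    // false

// filter_var() with FILTER_VALIDATE_INT returns the integer, or false on failure.
$quantity = filter_var('42', FILTER_VALIDATE_INT);   // 42
$rejected = filter_var('4.2', FILTER_VALIDATE_INT);  // false
```

Which check fits depends on the field: a quantity or ID usually wants whole numbers only, while a price needs to allow a decimal point.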

Validating Phone Numbers

Often, web forms that require a phone number will specify a particular format for the phone number. Some forms, however, manage to validate phone numbers in a variety of formats. US phone numbers can be written in a number of ways. It’s always useful to be able to accept any of the most common formats. This is actually easier than it sounds. We know that a US phone number will contain 10 digits—no more and no less. As a result, we really just need to validate that we have 10 digits. Here’s a function to handle a variety of formats.

function validate_telephone_number($number) {
  // Accepted layouts, with # standing in for each digit
  $formats = array(
    '###-###-####',
    '(###)###-####',
    '##########',
    '###.###.####',
    '(###) ###-####'
  );
  // Replace every digit with # so the input can be compared to the layouts
  $format = trim(preg_replace("/[0-9]/", "#", $number));
  return in_array($format, $formats);
}

This function defines the general formats we’ll accept and places them in an array. Each format uses # where a number should be. We then use the preg_replace function, which uses a regular expression to make replacements. In our function we’re telling it to replace any digit, as indicated by [0-9], with a # character. The trim function removes any leading or trailing whitespace. The result is that the user’s input will look a lot like one of our formats, with #’s in place of numbers. The final step is to see if the modified phone number is contained in our $formats array. If it is, the user entered 10 digits in an appropriate format. Otherwise, the phone number fails validation.

This technique can be adapted to other data fields where formatting is variable but we know for certain that only digits will be used. It works best when the number of digits is fixed. As an example, social security numbers and the EINs used by corporations contain the same number of digits and differ only in where the dashes are placed. You could use a $formats array like:

$formats = array('#########','###-##-####','##-#######');

to ensure that a social security number or EIN was entered.
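The same digit-masking idea can be pulled into a reusable function. This is a sketch rather than part of the original code: the name matches_digit_format is hypothetical, and the list assumes the usual written layouts of ###-##-#### for social security numbers and ##-####### for EINs.

```php
<?php
// Hypothetical generalization of the phone number technique:
// replace every digit with # and compare the result to a list of accepted layouts.
function matches_digit_format($value, $formats) {
  $format = preg_replace('/[0-9]/', '#', trim($value));
  return in_array($format, $formats, true);
}

// Assumed layouts: nine bare digits, an SSN with dashes, an EIN with a dash.
$ssn_or_ein = array('#########', '###-##-####', '##-#######');
```

Passing the list of layouts as an argument means one function can serve phone numbers, ID numbers and any other fixed-length digit field.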

Validating Dates

There are a number of techniques for validating dates. The most common is to split the user’s input into the component month, day, year and then use PHP’s checkdate function to see if it is a valid date. This works better than a simple regular expression since it will catch errors that a regular expression might miss such as a date like 11/31/2010. Here’s a sample function:

function validate_date($value) {
  if (preg_match("/^([0-9]{2})\/([0-9]{2})\/([0-9]{4})$/",$value,$parts)) {
    if (checkdate($parts[1],$parts[2],$parts[3])){
      return true;
    }
  } else {
    if (preg_match("/^([0-9]{2})-([0-9]{2})-([0-9]{4})$/",$value,$parts)) {
      if (checkdate($parts[1],$parts[2],$parts[3])) {
        return true;
      }
    }
  }
  return false; 
}

In this function, we’re using a regular expression to split the user input into its component parts. Let’s take a minute to look at this expression. The ^ symbol anchors the match to the beginning of the string. [0-9]{2} means we’re looking for 2 digits. The \/ escapes the forward slash, which would otherwise be read as the delimiter that ends the pattern. Then [0-9]{2} means 2 more digits, followed by another escaped forward slash \/ and finally 4 digits: [0-9]{4}. The $ anchors the match to the end of the string. The matched elements are stored in an array called $parts, and the month, day and year ($parts[1], $parts[2] and $parts[3]) are passed to checkdate to see if they form a valid date. Notice that if this regular expression fails we try again with the same regular expression but using “-” as a separator. This allows our users to enter dates as 09/21/2010 or 09-21-2010. Because the pattern requires exactly 2 digits for the month and the day, 9/21/2010 would be rejected; changing {2} to {1,2} would accept it. We could also test for a two-digit year to make our system more flexible as to the input it will accept.

While this will handle most date fields, sometimes we’re looking for the user to input a date but we may not be storing it as a date or may not be able to validate it as a full date. For instance, I recently coded an employment application that required a month/year date for the start and end dates of a candidate’s employment history. There are two approaches that you can use when validating these types of dates. The first is to use a regular expression that constrains what can be accepted. Instead of the pattern [0-9]{2} for the month you would use something like [0-1][0-9]. This pattern only allows a 0 or 1 for the first digit and any digit for the second. While this will be “good enough” for many purposes, it does allow invalid input such as 00 or 19 for the month. A second method is to split the month and year as we did in the prior example and set the day part to “01”. Then you can use the checkdate function to make sure that it is a valid date.
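The second approach might look like the following sketch. The function name validate_month_year and the mm/yyyy input format are assumptions made for illustration.

```php
<?php
// Hypothetical helper: validate a month/year value such as 09/2010
// by fixing the day to 1 and letting checkdate() judge the result.
function validate_month_year($value) {
  if (preg_match('/^([0-9]{2})\/([0-9]{4})$/', $value, $parts)) {
    // checkdate() expects integers in the order month, day, year.
    return checkdate((int) $parts[1], 1, (int) $parts[2]);
  }
  return false;
}
```

Since every month has a first day, checkdate only fails here when the month falls outside 1 to 12, which is exactly the case the regular expression alone lets through.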

Validating Time

I mentioned that I recently coded a job application for a client. On this project, they had a section that an applicant used to indicate their availability to work. They wanted a simple time format such as 07:00a or 12:45p. This validation was easily solved with a regular expression:

function validate_time($value) {
  // The trailing $ anchors the match so nothing may follow the a or p
  return preg_match("/^[0-1][0-9]:[0-9]{2}[ap]$/", $value) === 1;
}

Again the ^ represents the start of the string. [0-1] indicates that the first digit must be a 0 or 1. [0-9] indicates that the second digit can be any digit. After the : we’re looking for 2 digits, and finally [ap]$ means the final character must be an “a” or a “p” with nothing after it. While this function met the needs of my client, it should be noted that it suffers from the same issue as the earlier month/year validation by regular expression. It’s still possible for someone to enter invalid data such as 17:00a or 07:99a. For more accurate validation, PHP’s strtotime function can be used. You’ll need to adjust your regular expression to ensure that your user enters a time in one of the valid formats used by strtotime. Check the PHP documentation for more information on strtotime and valid formats.
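A tighter pattern can also rule out values such as 17:00a without reaching for strtotime. The sketch below is one possible variant, not the code used on the project; the name validate_time_strict is hypothetical.

```php
<?php
// Hypothetical stricter variant: hours 01-12, minutes 00-59, then a or p.
// The D modifier makes $ match only at the very end, not before a trailing newline.
function validate_time_strict($value) {
  return preg_match('/^(0[1-9]|1[0-2]):[0-5][0-9][ap]$/D', $value) === 1;
}
```

The alternation (0[1-9]|1[0-2]) spells out the twelve valid hours, and [0-5][0-9] limits the minutes to 00 through 59.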

Validating Email Addresses

A common regular expression for validating an email address is the following:

return preg_match("/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/", $email);

This will allow only lowercase letters, numbers, dashes, underscores and periods in the first part of the email. While this is the most popular form of validation for an email address, it will reject valid email addresses: for example, addresses containing uppercase letters or a plus sign, and addresses whose top-level domain is longer than three letters, such as .info. Linux Journal ran an excellent article on this issue. In the article they discuss RFC 3696, which deals with valid emails being rejected by popular validation methods.
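One alternative to a hand-written pattern is PHP’s built-in filter extension (available since PHP 5.2), which accepts a wider range of addresses. The wrapper name is_valid_email below is hypothetical; filter_var and FILTER_VALIDATE_EMAIL are part of PHP itself.

```php
<?php
// Hypothetical wrapper around PHP's built-in email filter.
// filter_var() returns the address on success and false on failure.
function is_valid_email($email) {
  return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```

Addresses such as user+tag@example.info and User.Name@Example.com pass this check, while both would be rejected by the regular expression above.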

Don’t Forget the Obvious

If the data being entered into your form will be stored in a database, you may have length limits to consider. A common practice is to use the maxlength attribute of your HTML form elements to limit input to the maximum length of a database field such as a varchar. As an example, if you’ve defined a first name field in your MySQL table as varchar(40), you might use maxlength="40" in your HTML input tag to limit the input to 40 characters. The problem with this technique is that someone who crafts a POST request to your code may send 50 characters since they’re not using your form. Depending on how the database is configured, the oversized value will either generate an error or be silently truncated. This may not matter much, since you really want users to use your form. However, in cases like AJAX, or if you’re writing an API that may or may not use a form, you’ll need to validate things like the length of your fields. It is also a good defensive practice to test the length of fields.
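A server-side length check can be as small as the sketch below. The function name validate_max_length is hypothetical, and the code assumes the input is UTF-8 text when the mbstring extension is available.

```php
<?php
// Hypothetical helper: true when $value fits in a column of $max characters.
// mb_strlen() counts characters; strlen() counts bytes and is only a fallback.
function validate_max_length($value, $max) {
  $length = function_exists('mb_strlen') ? mb_strlen($value, 'UTF-8') : strlen($value);
  return $length <= $max;
}
```

For the varchar(40) example, validate_max_length($values['first_name'], 40) would reject the crafted 50 character value before it ever reaches the database.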

Validating is not Sanitizing

Data validation increases the security of your application by limiting input to the values you were expecting. However, validation is still different from sanitizing data, which involves escaping special characters. If you are interacting with a database, it is important that after you complete validation but prior to attempting to save the data, you sanitize it. If you are using MySQL, the PHP function mysql_real_escape_string will help you sanitize your data by escaping special characters for MySQL. Note that this function belongs to the original mysql extension, which was deprecated in PHP 5.5 and removed in PHP 7.0; on current versions of PHP, use mysqli_real_escape_string or, better, prepared statements with bound parameters through mysqli or PDO.
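Prepared statements are another way to keep special characters from altering a query, and they are a different technique from the escaping function named above. The sketch below uses PDO with an in-memory SQLite database purely so that it runs on its own; the table and column names are hypothetical, and with MySQL only the connection string would change.

```php
<?php
// Assumption: an in-memory SQLite database stands in for MySQL so the sketch is self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE people (first_name VARCHAR(40))');

// Validated input that still contains a character special to SQL.
$values = array('first_name' => "O'Brien");

// The placeholder keeps the value separate from the SQL text, so no manual escaping is needed.
$statement = $db->prepare('INSERT INTO people (first_name) VALUES (:first_name)');
$statement->execute(array(':first_name' => $values['first_name']));

// Read the row back to confirm the apostrophe was stored unchanged.
$saved = $db->query('SELECT first_name FROM people')->fetchColumn();
```

The apostrophe in the name is stored as entered, because the value is sent to the database as data rather than spliced into the query string.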

While there are many good quality validation libraries available for PHP, developing your own library of validation routines allows you to customize validation to fit your unique needs. This often leads to better user experiences by accepting more formats and styles. Validation is an important step in guiding users toward the kind of data your web application is expecting. It is also a key step in securing your application by narrowing the types of data a user can submit to your application. While validation can increase security, sanitizing data being sent to a database is still a crucial step.

2 comments

  1. Wayne says:

    Good for a beginner, but do you really think you should teach noobs about ereg? It has been deprecated and might be removed in the next major version of PHP.

    • Michael Dorf says:

      Wayne, thank you, this is a fair point. I’ve updated the tutorial with references to preg_replace and preg_match instead of the deprecated ereg_replace and eregi.

      Michael
