Input Validation: Using filter_var() Over Regular Expressions

Just about the biggest time-sink on any project is the amount of input validation that needs to be done. You _have_ to assume your visitor is a maniac serial killer, out to destroy your application. And you have to prevent it.

Thus starts our never-ending battle for user input validation. We can't accept everything as-is (think XSS or SQL injection), so we check every value presented to us: correct e-mail formats, IPs, integers, HTML code, ...

For a long time, a generic regular expression for e-mail validation looked like this.

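// eregi() was removed in PHP 7; preg_match() with the /i modifier is the equivalent.
// The dots are escaped so they match a literal "." instead of any character.
$filter = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
if (!preg_match($filter, $user_email)) {
	echo "Invalid e-mail address.";
}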

But using PHP's filter_var() function, this becomes a lot easier!

if (!filter_var($user_email, FILTER_VALIDATE_EMAIL)) {
	echo "Invalid e-mail";
}

By passing the right filter constant as the second argument of filter_var($input, $type), we can quickly determine whether the supplied input matches the format we require. The function returns the filtered value on success and false on failure.
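
A minimal sketch of what filter_var() hands back, using made-up sample values. It also shows why a strict comparison with false is the safer check: a perfectly valid value such as 0 is itself falsy.

```php
<?php
// Hypothetical sample inputs, for illustration only.
$good = filter_var('jane.doe@example.com', FILTER_VALIDATE_EMAIL);
$bad  = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);

var_dump($good); // the original string, unchanged
var_dump($bad);  // bool(false)

// A valid value can itself be falsy, so compare with === false
// instead of relying on a loose truthiness check.
$zero = filter_var('0', FILTER_VALIDATE_INT);

var_dump($zero === false); // bool(false): '0' is a valid integer
var_dump(!$zero);          // bool(true): a loose check would wrongly reject it
```

For e-mail addresses the loose `!filter_var(...)` check is fine in practice, since a valid address is never an empty or "0" string. For integers, floats and booleans the strict form avoids surprises.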

Some validation types also accept an extra "flag", which acts like a setting that tweaks the validation. It's often easier to explain this in code, so here's an example.

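$user_url = "http://google.be";   // Valid URL, but without a path
// Scheme and host are always required by FILTER_VALIDATE_URL;
// FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED were removed in PHP 8.0.
if (!filter_var($user_url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)) {
	echo "Invalid URL";
}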
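
Flags are bitmasks, so more than one can be combined with the bitwise OR operator. The sketch below passes them through the 'flags' key of an options array, which filter_var() accepts as its third argument. The URLs are made-up examples.

```php
<?php
// Require both a path and a query string. The URLs are hypothetical.
$flags = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;

$urls = [
    'http://example.com',               // no path, no query
    'http://example.com/search',        // path, but no query
    'http://example.com/search?q=php',  // path and query
];

$results = [];
foreach ($urls as $url) {
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL, ['flags' => $flags]) !== false;
    echo $url, ' => ', $results[$url] ? 'valid' : 'invalid', PHP_EOL;
}
```

Only the last URL satisfies both flags. Passing the combined integer directly as the third argument should behave the same way; the array form becomes necessary once a filter also needs options.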

And there's more. Besides the FILTER_VALIDATE_* filters, there are also FILTER_SANITIZE_* filters. These are meant not to accept or reject the input, but to alter it so that it complies with the given filter.

// $user_int: the tainted input string, which needs cleansing
// $sanitized_int: the input string, stripped of anything but digits, plus and minus signs
$user_int = "1+7-3=5 and then do - 5 + 4 which equals: 4";
$sanitized_int = filter_var($user_int, FILTER_SANITIZE_NUMBER_INT);
 
// Results in: 1+7-35-5+44

Because PHP's manual page for filter_var() doesn't include a detailed list of the available validation and sanitization constants, here they are listed -- with help from W3Schools.com.

Sanitizing input

  • FILTER_SANITIZE_STRING: This filter removes data that is potentially harmful for your application. It is used to strip tags and remove or encode unwanted characters. Deprecated as of PHP 8.1; the PHP manual suggests htmlspecialchars() instead.
    Optional flags available:

    • FILTER_FLAG_NO_ENCODE_QUOTES -- This flag does not encode quotes
    • FILTER_FLAG_STRIP_LOW -- Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH -- Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_LOW -- Encode characters with ASCII value below 32
    • FILTER_FLAG_ENCODE_HIGH -- Encode characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_AMP -- Encode the & character as an HTML entity
  • FILTER_SANITIZE_STRIPPED: Alias to the FILTER_SANITIZE_STRING, shown above.
  • FILTER_SANITIZE_ENCODED: Filter strips or URL-encodes unwanted characters. Similar to urlencode().
    Optional flags available:

    • FILTER_FLAG_STRIP_LOW -- Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH -- Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_LOW -- Encode characters with ASCII value below 32
    • FILTER_FLAG_ENCODE_HIGH -- Encode characters with ASCII value above 127
  • FILTER_SANITIZE_SPECIAL_CHARS: Filter HTML-escapes special characters.
    Optional flags available:

    • FILTER_FLAG_STRIP_LOW -- Strip characters with ASCII value below 32
    • FILTER_FLAG_STRIP_HIGH -- Strip characters with ASCII value above 127
    • FILTER_FLAG_ENCODE_HIGH -- Encode characters with ASCII value above 127
  • FILTER_SANITIZE_EMAIL: Filter removes all illegal e-mail characters from a string.
  • FILTER_SANITIZE_URL: Filter removes all illegal URL characters from a string.
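  • FILTER_SANITIZE_NUMBER_INT: Filter removes all characters except digits and the plus and minus signs.
  • FILTER_SANITIZE_NUMBER_FLOAT: Filter removes all characters except digits and the plus and minus signs. Flags such as FILTER_FLAG_ALLOW_FRACTION additionally keep the decimal point.
  • FILTER_SANITIZE_MAGIC_QUOTES: Filter performs the addslashes() function to a string. Removed in PHP 8.0, where FILTER_SANITIZE_ADD_SLASHES takes its place.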
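
To make a few of the sanitizing filters above concrete, here is a small sketch with made-up form values. FILTER_FLAG_ALLOW_FRACTION is a flag of the filter extension that the list above doesn't cover in detail; everything else uses the constants as listed.

```php
<?php
// Hypothetical form values, for illustration only.
$price   = 'EUR 1,299.95';
$email   = "<jane.doe@example.com>\r\n";
$comment = '<b>Tom & "Jerry"</b>';

// Without a flag, NUMBER_FLOAT drops the decimal point along with everything else.
$digits_only = filter_var($price, FILTER_SANITIZE_NUMBER_FLOAT);
// FILTER_FLAG_ALLOW_FRACTION keeps '.', so the value stays a usable decimal.
$decimal = filter_var($price, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);

// SANITIZE_EMAIL strips characters that can't appear in an address,
// including the angle brackets and the trailing CR/LF.
$clean_email = filter_var($email, FILTER_SANITIZE_EMAIL);

// SPECIAL_CHARS replaces <, >, & and quotes with HTML entities.
$escaped = filter_var($comment, FILTER_SANITIZE_SPECIAL_CHARS);

echo $digits_only, PHP_EOL; // 129995
echo $decimal, PHP_EOL;     // 1299.95
echo $clean_email, PHP_EOL; // jane.doe@example.com
echo $escaped, PHP_EOL;
```

Sanitizing only cleans up the string; it doesn't guarantee the result is valid. A common pattern is to sanitize first and then run the matching FILTER_VALIDATE_* filter on the outcome.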

Validating input

  • FILTER_VALIDATE_INT: Validates value as integer.
    Options and flags available:

    • min_range -- option that specifies the minimum integer value
    • max_range -- option that specifies the maximum integer value
    • FILTER_FLAG_ALLOW_OCTAL -- allows octal number values
    • FILTER_FLAG_ALLOW_HEX -- allows hexadecimal number values
  • FILTER_VALIDATE_BOOLEAN: Validates value as a boolean option.
  • FILTER_VALIDATE_FLOAT: Validates value as a float number.
  • FILTER_VALIDATE_REGEXP: Validates value against a Perl-compatible regular expression.
  • FILTER_VALIDATE_URL: Validates value as a URL.
    Optional flags available:

    • FILTER_FLAG_SCHEME_REQUIRED -- Requires URL to include a scheme (like http://example). Always implied by the filter; the flag itself was removed in PHP 8.0
    • FILTER_FLAG_HOST_REQUIRED -- Requires URL to include host name (like http://www.example.com). Always implied by the filter; the flag itself was removed in PHP 8.0
    • FILTER_FLAG_PATH_REQUIRED -- Requires URL to have a path after the domain name (like http://www.example.com/example1/test2/)
    • FILTER_FLAG_QUERY_REQUIRED -- Requires URL to have a query string (like "example.php?name=Peter&age=37")
  • FILTER_VALIDATE_EMAIL: Validates value as an e-mail address.
  • FILTER_VALIDATE_IP: Validates value as an IPv4 or IPv6 address.
    Optional flags available:

    • FILTER_FLAG_IPV4 -- Requires the value to be a valid IPv4 IP (like 255.255.255.255)
    • FILTER_FLAG_IPV6 -- Requires the value to be a valid IPv6 IP (like 2001:0db8:85a3:08d3:1319:8a2e:0370:7334)
    • FILTER_FLAG_NO_PRIV_RANGE -- Rejects addresses that fall within a private range (like 192.168.0.1, 10.0.0.1, ...)
    • FILTER_FLAG_NO_RES_RANGE -- Rejects addresses that fall within a reserved range. This flag takes both IPv4 and IPv6 values. A reserved IP could be 255.255.255.255 (broadcast address).
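
A short sketch of how the options and flags from the list above are passed in. The input values are invented, and FILTER_NULL_ON_FAILURE is an extra flag of the filter extension that isn't in the list; it's used here as an assumption about what you'd typically want for booleans.

```php
<?php
// Hypothetical inputs, for illustration only.

// min_range / max_range go in the 'options' key of the third argument.
$age_options = ['options' => ['min_range' => 18, 'max_range' => 120]];
$age_ok  = filter_var('42', FILTER_VALIDATE_INT, $age_options); // int(42)
$age_bad = filter_var('7', FILTER_VALIDATE_INT, $age_options);  // false

// Hexadecimal input is only accepted when the flag is set.
$hex = filter_var('0xff', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX); // int(255)

// Restrict to IPv4 and reject private ranges such as 192.168.0.0/16.
$ip_flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE;
$public   = filter_var('8.8.8.8', FILTER_VALIDATE_IP, $ip_flags);            // '8.8.8.8'
$private  = filter_var('192.168.0.1', FILTER_VALIDATE_IP, $ip_flags);        // false
$ipv6     = filter_var('2001:db8::1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4); // false

// FILTER_NULL_ON_FAILURE lets the boolean filter tell "false" apart from "invalid".
$yes     = filter_var('yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);   // true
$no      = filter_var('off', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);   // false
$unknown = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE); // null

var_dump($age_ok, $age_bad, $hex, $public, $private, $ipv6, $yes, $no, $unknown);
```

Note that the validating filters return the converted value, so '42' comes back as an integer and 'yes' as a boolean.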

An excellent source for more code examples can be found at the Devolio blog post Data Filtering Using PHPs Filter Functions (Part One).

While filter_var() can't replace every possible type of input validation, it can help a great deal. Tricky validations such as e-mail, URL or IPv6 addresses in particular are much easier this way.
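
For input that has no dedicated filter, FILTER_VALIDATE_REGEXP keeps the same calling convention while still letting you supply your own pattern. The sketch below assumes a made-up username rule (3 to 16 letters, digits or underscores); is_valid_username() is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical rule: 3 to 16 characters, letters, digits and underscores only.
// \z anchors at the very end of the string, so a trailing newline is rejected too.
$username_options = ['options' => ['regexp' => '/^[A-Za-z0-9_]{3,16}\z/']];

// Hypothetical helper that wraps the filter call in a strict boolean check.
function is_valid_username(string $name, array $options): bool
{
    return filter_var($name, FILTER_VALIDATE_REGEXP, $options) !== false;
}

var_dump(is_valid_username('mattias_g', $username_options)); // bool(true)
var_dump(is_valid_username('ab', $username_options));        // bool(false): too short
var_dump(is_valid_username('bad name!', $username_options)); // bool(false): illegal characters
```

Keeping such rules in one helper means the pattern lives in a single place, whichever filter ends up doing the work.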

Want to get in touch? Find me as @mattiasgeniar on Twitter or via the contact-page on my blog.

20 comments on “Input Validation: Using filter_var() Over Regular Expressions”
  1. Anon says:

    Nice article, thanks. I’ve been coding in PHP for several years now and have never heard of this function! Much simpler than using regular expressions.

  2. Matti says:

    @Anon; I’m in the same position, and have only found out about it a couple of months ago. But it sure beats writing those regular expressions, especially when validating IPv6 addresses :-)

    It can’t replace everything yet, but it saves me quite some time!

  3. Twigglesbury says:

    On the PHP site the filter constants are here:

    http://au.php.net/manual/en/filter.constants.php

  4. Geert says:

    Still I prefer regular expressions. That way you know exactly what you are matching. Of course, be sure to put them in a helper function in order not to repeat the regexes needlessly.

    Also, according to a quick benchmark I just ran, regexes for email validation are at least twice as fast as using filter_var().

  5. Matti says:

    @Twigglesbur; there’s a list, but nothing more. It lacks a bit of explanation.

    @Geert; performance takes a small dive, but for input validation I wouldn’t make that such a big deal. It’s not like you’ll loop it 1k times, so chances are you won’t ever notice in a big project (unless you’re micro-optimizing, of course).

    You have a point in knowing what gets validated, and what doesn’t, but I personally wouldn’t weigh that against the ease of use. Once you get to the point where need specific e-mail validation, you’ll of course have to get back to regex’s.

  6. Dave says:

    @Geert: use the FILTER_VALIDATE_REGEXP then :)

    I’ve been using the filter extension since the 5.1.X days when it was a PECL extension. Very, very useful. I have it wrapped up in an object wrapper making it even easier to use; though admittedly I probably only use about a tenth of what is possible!

  7. Developer says:

    I’ve been working on PHP apps for over 5 years now, building out some serious apps and I’ve never seen / heard of this function. In the years I’ve came up with quiet a frankenfunction to do general input filtering, but now I can finally retire it. Thank you!!

  8. Tijuan says:

    filter_var was introduced with PHP 5.2, so this is not an old function, but that was one of the big feature of this release.

    Have fun and make your life easier with it ! :)

  9. Smily says:

    I made a similar discovery last year (http://fernando.id.au/2008/04/php5-filter/ note: mine is really old and out of date) except i used filter_input. It definitely is a big help, what made me nervous however was the idea that these functions were experimental. Because of this I ended up putting them inside helper functions so that if it ever did change I would only have to update the functions.

  10. eregi() is being removed from PHP for PHP6, should prolly stop using it to prevent future issues.

  11. ruby says:

    just want to ask about input validation. what is the right function that i am going to use if i am validating the user name?

  12. Matti says:

    @Ruby; that depends on your constraints. If you’re only allowing a-Z, a simple regular expression will suffice. Because a “username” isn’t a fixed subject (syntax may vary, depending on the project), I doubt there will be an appropriate filter_var() option for it.

  13. Xynder says:

    A Very Good article indeed. I’ve been head over heel trying to fix a email validation that is using a regex before I find this article. Just hoping they would create some to validate credit card number too later on :D

  14. isogashii says:

    helpful article, as you said the problem with filter_var documentation that you have to find an example to see explanation of what these FILTER constants mean, you arranged and explained them in a great way.
    Thank you.

  15. Hi..,

    This is nice article. You given the many validate functions its really helping to me..

    Thanks a lot..!

  16. Nico says:

    filter_var() accepts emails without TLD. “foo@bar” is considered valid, which kinda makes no sense to me.

  17. Matti says:

    In most cases I would agree, but consider domain (as in: active domain, windows) users with “username@domain” format. Not common in practice, but valid according to RFC’s I believe.

  18. Gary Chambers says:

    The filter_var() function in PHP 5.3.2 is broken. It will not permit domain names that contain a hyphen (e.g. http://www.this-is-my-domain.com).

  19. Steve says:

    Very informative article. I’ve known about filter_var() for quite some time but I was unclear about a few of the flags.

    Lately, I’ve rolled them all into two classes – one for sanitizing (user)data and one for validating (user)data. That saves me from needing to remember the syntax for all the ones I use a lot. Just call the appropriate class method and pass in the variable to be sanitized or validated.

  20. Alexander says:

    if i use: filter_var($email,FILTER_VALIDATE_EMAIL) then email like a someuser@somehost.com/asdfwefqeasd – valid email
