Max J Mammel
Guest Author: Max Mammel
Max Mammel is an Indianapolis-based Technical Lead of a web application development team at a global Fortune 500 firm. Max is an expert in design patterns, system architecture, and object-oriented programming, and brings many years of Java know-how to SolidlyStated.com.

I suspect that there are nearly as many solutions to the problem of email address validation as there are projects that require one. I’ve seen a fair number of them myself over the years. They range from simply checking for the presence of an @ in a string to extremely complex, and often flawed, subroutines written for the task.

In my mind the most logical solution to validating email addresses is through the use of regular expressions. In this article I will present a regular expression solution that can be used in four different languages. Hopefully this will be the last email validation regex you ever need.

The Wikipedia entry on email addresses provides a nice summary of the surprisingly complex standard that governs valid and invalid email addresses, and is what I used as a guide for generating these expressions.

To summarize the standard even further, an email address takes the following basic form: ‘local_part@domain’ with separate rules for ‘local_part’ and ‘domain’.
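
To make the two-part form concrete, here is a minimal JavaScript sketch that splits an address at its last @. The name `splitAddress` is a hypothetical helper introduced only for illustration. It does not validate either part; it only separates them.

```javascript
// Hypothetical helper (not from the downloadable scripts):
// split an address into its local part and its domain.
function splitAddress(address) {
  // Use the LAST "@": a quoted local part may itself contain "@",
  // whereas the domain never does.
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    return null; // no "@", an empty local part, or an empty domain
  }
  return {
    localPart: address.slice(0, at),
    domain: address.slice(at + 1),
  };
}

console.log(splitAddress("email.address@host.com"));
console.log(splitAddress("no-at-sign"));
```

Splitting on the last @ rather than the first is a deliberate choice here, so that a quoted local part containing @ is not cut in the wrong place.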

One rule for ‘local_part’ involves the use of quoted strings, which allows the presence of some special characters as long as they are contained within double quotes. An example of a valid email address using quoted strings is this monstrosity:
mail."<some>\ <xml>\"inside\"</xml></some>"@host.com

The very standard that defines this rule recommends against its use.

Another obscure rule defined in the standards involves the domain portion of the address. In addition to the normal rules governing hostnames, email addresses can have IP address domains such as this: email.address@[123.123.123.123]

According to the Wikipedia article, email addresses such as this are “rarely seen except in email spam”, so matching them may have some practical application in spam detection.
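
If bracketed IP domains are of interest for spam detection, a small check along these lines could flag them. This is only a sketch: `IP_DOMAIN` and `isIpDomain` are hypothetical names, and it assumes the domain has already been separated from the local part. It also checks that each octet is at most 255, which a plain `[0-9]{1,3}` pattern does not do on its own.

```javascript
// Hypothetical helper: recognise a bracketed IPv4 domain such as [123.123.123.123].
const IP_DOMAIN = /^\[([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\]$/;

function isIpDomain(domain) {
  const match = IP_DOMAIN.exec(domain);
  if (match === null) {
    return false; // not four dot-separated numbers inside square brackets
  }
  // match[1]..match[4] hold the four octets; each must be in the range 0..255.
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isIpDomain("[123.123.123.123]"));
console.log(isIpDomain("host.com"));
```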

Because these two features cause the regular expression to become unwieldy, I have provided three expressions for each language.

  • The basic expression allows neither IP address domains nor quoted strings.
  • The intermediate expression allows IP address domains, but not quoted strings.
  • The advanced expression allows both IP address domains and quoted strings.

BASIC VALIDATION

IP Address Domains Invalid
Quoted Strings Invalid
LANGUAGE EXPRESSION
Java
^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+@\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}$
JavaScript
^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+@\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}$
Bash
^(([-a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~])+\.)*[-a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~]+@\w((-|\w)*\w)*\.(\w((-|\w)*\w)*\.)*\w{2,4}$
PHP
/^(?:(?:[-a-zA-Z0-9!#$%&'*+\/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+\/=?^_`{|}~]+@\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}$/
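
The JavaScript expression above is written with doubled backslashes, which suggests it is meant to be placed in a string literal and handed to the `RegExp` constructor. Assuming that reading, a minimal usage sketch might look like this (`BASIC_EMAIL` and `isBasicEmail` are hypothetical names, not taken from the downloadable scripts):

```javascript
// The basic JavaScript expression, copied verbatim into a string literal.
const BASIC_EMAIL = new RegExp(
  "^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+@\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}$"
);

// Hypothetical wrapper: true when the whole string matches the basic expression.
function isBasicEmail(address) {
  return BASIC_EMAIL.test(address);
}

console.log(isBasicEmail("email.address@host.com"));
console.log(isBasicEmail("email.address@[123.123.123.123]"));
```

Note that the trailing `\w{2,4}` restricts the final domain label to between two and four characters, so an address at a longer top-level domain such as `.museum` is rejected by this expression.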

INTERMEDIATE VALIDATION

IP Address Domains Allowed
Quoted Strings Invalid
LANGUAGE EXPRESSION
Java
^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$
JavaScript
^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$
Bash
^(([-a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~])+\.)*[-a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~]+@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(\w((-|\w)*\w)*\.(\w((-|\w)*\w)*\.)*\w{2,4}))$
PHP
/^(?:(?:[-a-zA-Z0-9!#$%&'*+\/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+\/=?^_`{|}~]+@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$/
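
One possible use of the intermediate expression is to accept an address and, at the same time, note whether it uses an IP domain. The sketch below assumes the JavaScript expression is passed to `new RegExp` as a string; `INTERMEDIATE_EMAIL` and `classifyAddress` are hypothetical names chosen for this example.

```javascript
// The intermediate JavaScript expression, copied verbatim into a string literal.
const INTERMEDIATE_EMAIL = new RegExp(
  "^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~])+\\.)*[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$"
);

// Hypothetical helper: returns "invalid", "ip-domain", or "hostname".
function classifyAddress(address) {
  if (!INTERMEDIATE_EMAIL.test(address)) {
    return "invalid";
  }
  // After a successful match, only the IP alternative can end in "]",
  // because the hostname alternative must end in word characters.
  return address.endsWith("]") ? "ip-domain" : "hostname";
}

console.log(classifyAddress("email.address@[123.123.123.123]"));
console.log(classifyAddress("email.address@host.com"));
```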

ADVANCED VALIDATION

IP Address Domains Allowed
Quoted Strings Allowed
LANGUAGE EXPRESSION
Java
^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))\\.)*(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$
JavaScript
^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))\\.)*(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$
Bash
^(([-a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~]+|(\"([][,:;<>\&@a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~-]|(\\\\[\\ \"]))+\"))\.)*([-a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~]+|(\"([][,:;<>\&@a-zA-Z0-9\!#\$%\&\'*+/=?^_\`{\|}~-]|(\\\\[\\ \"]))+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(\w((-|\w)*\w)*\.(\w((-|\w)*\w)*\.)*\w{2,4}))$
PHP
/^(?:(?:[-a-zA-Z0-9!#$%&'*+\/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+\/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))\\.)*(?:[-a-zA-Z0-9!#$%&'*+\/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+\/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$/
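
As a quick check of the advanced expression, the sketch below runs it against the quoted-string example and the IP domain example from earlier in the article. It assumes the JavaScript expression is passed to `new RegExp` as a string; `ADVANCED_EMAIL` and `isAdvancedEmail` are hypothetical names.

```javascript
// The advanced JavaScript expression, copied verbatim into a string literal.
const ADVANCED_EMAIL = new RegExp(
  "^(?:(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))\\.)*(?:[-a-zA-Z0-9!#$%&'*+/=?^_`{|}~]+|(?:\"(?:[-\\[\\],:;<>&@()a-zA-Z0-9!#$%&'*+/=?^_`{|}~]|(?:\\\\[\\\\ \"]))+\"))@(?:(?:\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(?:\\w(?:(?:-|\\w)*\\w)*\\.(?:\\w(?:(?:-|\\w)*\\w)*\\.)*\\w{2,4}))$"
);

// Hypothetical wrapper: true when the whole string matches the advanced expression.
function isAdvancedEmail(address) {
  return ADVANCED_EMAIL.test(address);
}

// The quoted-string example from the article. Backslashes are doubled here
// only because this is a JavaScript string literal.
const quotedExample = 'mail."<some>\\ <xml>\\"inside\\"</xml></some>"@host.com';

console.log(isAdvancedEmail(quotedExample));
console.log(isAdvancedEmail("email.address@[123.123.123.123]"));
```

Inside the quotes, this expression accepts a space, a backslash, or a double quote only when it is preceded by a backslash, which is why the example address escapes them.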

Download Sample Scripts

I have packaged up working examples in PHP, Java, JavaScript, and Bash below. They show how to put the expressions to use in your own scripts.
Download – Java class, JavaScript HTML page, Bash shell script, and PHP script (zipped)

Try it out! – The live, working JavaScript test page.
Try it out! – The live, working PHP test page.