Regular Expressions for emailAI Pro Rules
emailAI Pro Use
This section covers the basics of regular expressions and the syntax used to create them. Regular expressions are capable of much more than is described here. For those who wish to delve into more complex expressions, more information on regular expressions and their construction can be found at http://msdn2.microsoft.com/en-us/library/az24scfc(VS.71).aspx.
Regular expressions in emailAI Pro are all case insensitive.
Regular expressions are used when defining rules for emailAI Pro to follow, whether a rule sends an email to spam, validates an email, or directs an email to a specific person. The rule system in emailAI Pro allows great flexibility in the ways email can be automated.
For more information on the emailAI Pro Rule system see the emailAI Pro Rule System section.
Quick Examples
The following examples demonstrate how basic regular expressions can quickly identify spam.
Viagra|Cialix|Penis|Phentrimine
Will match if the email contains any of the following words: Viagra, Cialix, Penis, or Phentrimine.
Viagra
Will match if the email contains the word Viagra.
(Viagra){2,}
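Will match if the email contains the word Viagra two or more times in a row with nothing in between (for example, ViagraViagra). To match two occurrences separated by other text, use a pattern such as Viagra[\s\S]*Viagra.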
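The quick examples above can be tried outside emailAI Pro. The following is a minimal sketch that uses Python's re module as a stand-in for the regular expression engine in emailAI Pro (an assumption made only for illustration; these patterns use syntax common to both). The IGNORECASE flag mimics the case-insensitive matching of emailAI Pro, and the sample message bodies are made up.

```python
import re

# Hypothetical message bodies, used only for illustration.
bodies = [
    "Cheap VIAGRA shipped overnight",
    "viagraviagra best prices",
    "Meeting notes attached",
]

# emailAI Pro rules are case insensitive, so IGNORECASE mirrors that here.
any_word = re.compile(r"Viagra|Cialix|Penis|Phentrimine", re.IGNORECASE)
repeated = re.compile(r"(Viagra){2,}", re.IGNORECASE)

# For each body, record (matches any_word, matches repeated).
results = {}
for body in bodies:
    results[body] = (
        any_word.search(body) is not None,
        repeated.search(body) is not None,
    )
    print(body, results[body])
```

Only the second body satisfies (Viagra){2,}, because the quantifier requires the repeats to be adjacent.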
Library
See the emailAI Pro Regular Expression Library for more examples.
Regular Expression Syntax
Character Classes
A character class represents a set of characters that can match an input string. Combine literal characters, escape characters, and character classes to form a regular expression pattern.
The following table summarizes the character classes and their syntax.
| Character class | Description |
|---|---|
| [character_group] | (Positive character group.) Matches any character in the specified character group. The character group consists of one or more literal characters, escape characters, character ranges, or character classes that are concatenated. For example, to specify all vowels, use [aeiou]. To specify all punctuation and decimal digit characters, use [\p{P}\d]. |
| [^character_group] | (Negative character group.) Matches any character not in the specified character group. The character group consists of one or more literal characters, escape characters, character ranges, or character classes that are concatenated. The leading caret character (^) is mandatory and indicates that the character group is a negative character group instead of a positive character group. For example, to specify all characters except vowels, use [^aeiou]. To specify all characters except punctuation and decimal digit characters, use [^\p{P}\d]. |
| [firstCharacter-lastCharacter] | (Character range.) Matches any character in a range of characters. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points. Two or more character ranges can be concatenated. For example, to specify the range of decimal digits from '0' through '9', the range of lowercase letters from 'a' through 'f', and the range of uppercase letters from 'A' through 'F', use [0-9a-fA-F]. |
| . | (The period character.) Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options. Note that a period character in a positive or negative character group (a period within square brackets) is treated as a literal period character, not a character class. |
| \p{name} | Matches any character in the Unicode general category or named block specified by name (for example, Ll, Nd, Z, IsGreek, and IsBoxDrawing). |
| \P{name} | Matches any character not in the Unicode general category or named block specified by name. |
| \w | Matches any word character. Equivalent to the Unicode general categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9]. |
| \W | Matches any nonword character. Equivalent to the Unicode general categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9]. |
| \s | Matches any white-space character. Equivalent to the escape sequences and Unicode general categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-white-space character. Equivalent to the escape sequences and Unicode general categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v]. |
| \d | Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior. |
| \D | Matches any nondigit character. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior. |
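Several of the character classes in the table can be seen in action with short inputs. This is a sketch only: it uses Python's re module as a stand-in for the engine described here, and the input strings are invented. The \p{name} and \P{name} classes are left out because Python's re module does not support them.

```python
import re

# Positive and negative character groups.
vowels = re.findall(r"[aeiou]", "regular")
non_vowels = re.findall(r"[^aeiou]", "regular")

# Concatenated character ranges: runs of hexadecimal digits.
hex_runs = re.findall(r"[0-9a-fA-F]+", "0x1F zz 3e")

# The period matches any character except \n by default,
# but is a literal period inside square brackets.
dot_default = re.findall(r"a.c", "abc a\nc a.c")
dot_literal = re.findall(r"a[.]c", "abc a\nc a.c")

# Shorthand classes for digits, word characters, and white space.
digits = re.findall(r"\d+", "3 items, 42 bytes")
non_digits = re.findall(r"\D+", "tel 555")
words = re.findall(r"\w+", "free_offer now!")
fields = re.split(r"\s+", "a  b\tc")

print(vowels, non_vowels, hex_runs)
print(dot_default, dot_literal)
print(digits, non_digits, words, fields)
```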
Quantifiers
Quantifiers add optional quantity data to a regular expression. A quantifier expression applies to the character, group, or character class that immediately precedes it. The .NET Framework regular expressions support minimal matching ("lazy") quantifiers.
The following table describes the metacharacters that affect matching. The quantities n and m are integer constants.
| Quantifier | Description |
|---|---|
| * | Specifies zero or more matches; for example, \w* or (abc)*. Equivalent to {0,}. |
| + | Specifies one or more matches; for example, \w+ or (abc)+. Equivalent to {1,}. |
| ? | Specifies zero or one matches; for example, \w? or (abc)?. Equivalent to {0,1}. |
| {n} | Specifies exactly n matches; for example, (pizza){2}. |
| {n,} | Specifies at least n matches; for example, (abc){2,}. |
| {n,m} | Specifies at least n, but no more than m, matches. |
| *? | Specifies the first match that consumes as few repeats as possible (equivalent to lazy *). |
| +? | Specifies as few repeats as possible, but at least one (equivalent to lazy +). |
| ?? | Specifies zero repeats if possible, or one (lazy ?). |
| {n}? | Equivalent to {n} (lazy {n}). |
| {n,}? | Specifies as few repeats as possible, but at least n (lazy {n,}). |
| {n,m}? | Specifies as few repeats as possible between n and m (lazy {n,m}). |
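The difference between the default (greedy) quantifiers and their lazy counterparts is easiest to see side by side. The sketch below uses Python's re module as a stand-in for the engine described here, on the assumption that these quantifiers behave the same way in both. The input strings are invented.

```python
import re

html = "<b>bold</b>"
# Greedy +: consumes as many characters as possible before the final >.
greedy = re.search(r"<.+>", html).group()
# Lazy +?: stops at the first > that allows a match.
lazy = re.search(r"<.+?>", html).group()

# Exact and open-ended counts applied to a group.
exactly_two = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None
at_least_two = re.fullmatch(r"(abc){2,}", "abcabcabc") is not None

# ? makes the preceding character optional (zero or one).
optional = [re.fullmatch(r"colou?r", w) is not None
            for w in ("color", "colour", "colouur")]

# {n,m} takes up to m repeats; {n,m}? takes as few as n.
bounded_greedy = re.findall(r"\d{2,3}", "4444")
bounded_lazy = re.findall(r"\d{2,3}?", "4444")

print(greedy, lazy)
print(exactly_two, at_least_two, optional)
print(bounded_greedy, bounded_lazy)
```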
Alternation Constructs
The following table lists special characters that modify a regular expression to allow either/or matching.
| Alternation construct | Definition |
|---|---|
| \| | Matches any one of the terms separated by the \| (vertical bar) character; for example, cat\|dog\|tiger. The leftmost successful match wins. |
| (?(expression)yes\|no) | Matches the "yes" part if the expression matches at this point; otherwise, matches the "no" part. The "no" part can be omitted. The expression can be any valid subexpression, but it is turned into a zero-width assertion, so this syntax is equivalent to (?(?=expression)yes\|no). Note that if the expression is the name of a named group or a capturing group number, the alternation construct is interpreted as a capture test (described in the next row of this table). To avoid confusion in these cases, you can spell out the inside (?=expression) explicitly. |
| (?(name)yes\|no) | Matches the "yes" part if the named capture string has a match; otherwise, matches the "no" part. The "no" part can be omitted. If the given name does not correspond to the name or number of a capturing group used in this expression, the alternation construct is interpreted as an expression test (described in the preceding row of this table). |
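The following sketch illustrates plain alternation and the capture test form of the conditional. It uses Python's re module as a stand-in, with two differences that should be kept in mind: Python writes a named group as (?P<name>...) where the syntax described here uses (?<name>...), and Python supports only the (?(name)yes|no) capture test, not the (?(expression)yes|no) form. The address pattern and sample strings are hypothetical.

```python
import re

# The match that starts earliest in the input is reported first.
first_in_text = re.search(r"cat|dog|tiger", "a tiger chased a dog").group()

# At a given position the alternatives are tried left to right,
# so a shorter alternative listed first can win.
first_alternative = re.search(r"tig|tiger", "tiger").group()

# Capture test: require a closing > only if an opening < was captured.
bracketed = re.compile(r"(?P<open><)?\w+@\w+(?:\.\w+)+(?(open)>)")

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
accepted = [s for s in samples if bracketed.fullmatch(s)]

print(first_in_text, first_alternative)
print(accepted)
```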
Grouping Constructs
Grouping constructs delineate subexpressions of a regular expression and typically capture substrings of an input string. The following table describes the regular expression grouping constructs.
| Grouping construct | Description |
|---|---|
| (subexpression) | Captures the matched subexpression (or noncapturing group; for more information, see the ExplicitCapture option in Regular Expression Options). Captures using () are numbered automatically based on the order of the opening parenthesis, starting from one. The first capture, capture element number zero, is the text matched by the whole regular expression pattern. |
| (?<name>subexpression) | Captures the matched subexpression into a group name or number name. The string used for name must not contain any punctuation and cannot begin with a number. You can use single quotes instead of angle brackets; for example, (?'name'). |
| (?<name1-name2>subexpression) | (Balancing group definition.) Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, (?'name1-name2'). |
| (?:subexpression) | (Noncapturing group.) Does not capture the substring matched by the subexpression. |
| (?imnsx-imnsx:subexpression) | Applies or disables the specified options within the subexpression. For example, (?i-s: ) turns on case insensitivity and disables single-line mode. For more information, see Regular Expression Options. |
| (?=subexpression) | (Zero-width positive lookahead assertion.) Continues match only if the subexpression matches at this position on the right. For example, \w+(?=\d) matches a word followed by a digit, without matching the digit. This construct does not backtrack. |
| (?!subexpression) | (Zero-width negative lookahead assertion.) Continues match only if the subexpression does not match at this position on the right. For example, \b(?!un)\w+\b matches words that do not begin with un. |
| (?<=subexpression) | (Zero-width positive lookbehind assertion.) Continues match only if the subexpression matches at this position on the left. For example, (?<=19)99 matches instances of 99 that follow 19. This construct does not backtrack. |
| (?<!subexpression) | (Zero-width negative lookbehind assertion.) Continues match only if the subexpression does not match at the position on the left. |
| (?>subexpression) | (Nonbacktracking subexpression, also known as a "greedy" subexpression.) The subexpression is fully matched once, and then does not participate piecemeal in backtracking. (That is, the subexpression matches only strings that would be matched by the subexpression alone.) By default, if a match does not succeed, backtracking searches for other possible matches. If you know backtracking cannot succeed, you can use a nonbacktracking subexpression to prevent unnecessary searching, which improves performance. |
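A few of the grouping constructs from the table can be sketched as follows. Python's re module is used as a stand-in, which means named groups are written (?P<name>...) rather than (?<name>...); balancing groups are not available in Python and nonbacktracking groups only in recent versions, so both are omitted. The address and the sample strings are invented for illustration.

```python
import re

# Named groups capture by name; (?:...) groups without capturing.
address = re.search(r"(?P<user>\w+)@(?:mail\.)?(?P<domain>\w+\.\w+)",
                    "bob@mail.example.com")
user, domain = address.group("user"), address.group("domain")
captured = address.groups()  # the noncapturing group is not included

# Negative lookahead: words that do not begin with "un".
not_un = re.findall(r"\b(?!un)\w+\b", "unhappy happy undo done")

# Positive lookbehind: 99 only where it follows 19 (start offsets).
after_19 = [m.start() for m in re.finditer(r"(?<=19)99", "1999 2099")]

# Positive lookahead: a digit must follow but is not part of the match.
# ([a-z]+ is used here instead of \w+ so the letters stand out.)
before_digit = re.search(r"[a-z]+(?=\d)", "abc123").group()

# Inline option: case insensitivity applies only inside the group.
scoped = re.findall(r"(?i:viagra) pills", "VIAGRA pills, Viagra PILLS")

print(user, domain, captured)
print(not_un, after_19, before_digit, scoped)
```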
Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but the numbering of named captures starts after all unnamed captures have been counted. For example, the pattern ((?<One>abc)\d+)?(?<Two>xyz)(.*) produces the following capturing groups by number and name. (The first capture, number 0, always refers to the entire pattern.)
| Number | Name | Pattern |
|---|---|---|
| 0 | 0 (default name) | ((?<One>abc)\d+)?(?<Two>xyz)(.*) |
| 1 | 1 (default name) | ((?<One>abc)\d+) |
| 2 | 2 (default name) | (.*) |
| 3 | One | (?<One>abc) |
| 4 | Two | (?<Two>xyz) |
Backreference Constructs
The following table lists optional parameters that add backreference modifiers to a regular expression.
| Backreference construct | Definition |
|---|---|
| \number | Backreference. For example, (\w)\1 finds doubled word characters. |
| \k<name> | Named backreference. For example, (?<char>\w)\k<char> finds doubled word characters. The expression (?<43>\w)\43 does the same. You can use single quotes instead of angle brackets; for example, \k'char'. |
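The doubled-character examples from the table can be tried with the sketch below. It uses Python's re module as a stand-in, so the named backreference is written (?P=char) where the syntax described here uses \k<char>. The numbered form \1 is the same in both. The repeated-word pattern and the input strings are illustrative additions, not taken from emailAI Pro.

```python
import re

# Numbered backreference: a word character followed by itself.
doubled_numbered = [m.group() for m in re.finditer(r"(\w)\1", "bookkeeper")]

# Named backreference (Python spelling of (?<char>\w)\k<char>).
doubled_named = [m.group()
                 for m in re.finditer(r"(?P<char>\w)(?P=char)", "bookkeeper")]

# A whole word immediately repeated, separated by white space.
repeated_word = re.search(r"\b(\w+)\s+\1\b", "free free offer").group()

print(doubled_numbered, doubled_named, repeated_word)
```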