A guide to regular expressions in JavaScript

Last update: 1.11.2015

Regular expressions describe a pattern that is used to search or modify a string. JavaScript provides the RegExp object for working with regular expressions.

There are two ways to define a regular expression:

var myExp = /hello/;
var myExp = new RegExp("hello");

The regular expression used here is quite simple: it consists of the single word "hello". In the first case, the expression is placed between two slashes, and in the second case, the RegExp constructor is used, in which the expression is passed as a string.

RegExp Methods

To determine if a regular expression matches a string, the RegExp object defines the test() method. This method returns true if the string matches the regular expression and false otherwise.

var initialText = "hello world!";
var exp = /hello/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // true
initialText = "beautiful weather";
result = exp.test(initialText);
document.write(result); // false - there is no "hello" in initialText

The exec() method works similarly: it also checks whether the string matches the regular expression, but instead of a boolean it returns an array whose first element is the part of the string that matches the expression. If there is no match, it returns null.

var initialText = "hello world!";
var exp = /hello/;
var result = exp.exec(initialText);
document.write(result + "<br/>"); // hello
initialText = "beautiful weather";
result = exp.exec(initialText);
document.write(result); // null

Character groups

A regular expression does not have to consist of plain text; it can also include special elements of regular expression syntax. One such element is a set of characters enclosed in square brackets. For example:

var initialText = "defensiveness";
var exp = /[abc]/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // false - none of a, b, c occurs
initialText = "city";
result = exp.test(initialText);
document.write(result); // true - "city" contains c

The expression [abc] requires the string to contain at least one of the three letters a, b or c.

To check whether a string contains any letter from a certain range, you can specify the whole range at once:

var initialText = "defensiveness";
var exp = /[a-z]/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // true
initialText = "30789";
result = exp.test(initialText);
document.write(result); // false - no letters from a to z

In this case, the string must contain at least one character from the range a-z.

Conversely, to match any character that is not in the set, put the ^ sign right after the opening square bracket, before the list of characters. The test then succeeds if the string contains at least one character outside the set:

var initialText = "defensiveness";
var exp = /[^a-z]/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // false
initialText = "3di0789";
exp = /[^0-9]/;
result = exp.test(initialText);
document.write(result); // true

In the first case, the string must contain at least one character outside the range a-z. Since "defensiveness" consists only of characters from this range, the test() method returns false, that is, the regular expression does not match the string.

In the second case, the string "3di0789" must contain at least one character that is not a digit. Since it also contains letters, it matches the regular expression, so the test() method returns true.

Character sets can also be combined with ordinary characters in one expression:

var initialText = "tomorrow";
var exp = /[dt]o[nm]/;
var result = exp.test(initialText);
document.write(result); // true - "tomorrow" contains "tom"

The expression [dt]o[nm] matches strings that contain one of the substrings "don", "dom", "ton" or "tom".

Expression Properties

    The global property (flag g) makes the regular expression find all matching substrings. By default, a search stops at the first substring that matches the expression, even though the string may contain many more matches.

    The ignoreCase property (flag i) makes the search case-insensitive, so substrings match the regular expression regardless of the case of their characters.

    The multiline property (flag m) makes the anchors ^ and $ match at the beginning and end of every line in multiline text, not only at the beginning and end of the whole string.

For example:

var initialText = "hello World";
var exp = /world/;
var result = exp.test(initialText); // false

There is no match here, because "World" and "world" differ in case. To fix this, add the ignoreCase flag i to the regular expression:

var exp = /world/i;

Several flags can also be used at once, for example /world/gi.
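A minimal sketch of combining flags, with console.log standing in for document.write so it also runs in Node.js; the sample string is made up for illustration:

```javascript
// g finds every match, i ignores case: together they collect every spelling
const text = "World, world, WORLD";
const allWorlds = text.match(/world/gi);
console.log(allWorlds); // ["World", "world", "WORLD"]

// Without g, match() stops after the first case-insensitive match
const firstWorld = text.match(/world/i)[0];
console.log(firstWorld); // "World"
```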

A regular expression is an object that describes a character pattern. The RegExp class in JavaScript represents regular expressions, and the objects of the String and RegExp classes define methods that use regular expressions to perform pattern matching and text replacement operations.

Regular expressions are a powerful tool for processing incoming data. A task that requires text replacement or search can be beautifully solved with this “language within a language”.

Creation

In JavaScript, regular expressions are represented by RegExp objects. RegExp objects can be created using the RegExp() constructor, but more often they are created using special literal syntax. Ways to create:

// Using a regular expression literal:
var re = /ab+c/;

Regular expression literals cause the regular expression to be precompiled when the script is parsed.

// Calling the constructor function of the RegExp object:
var re = new RegExp("ab+c");

When the constructor is used, the regular expression is compiled at run time. Use the constructor when the pattern is going to change or is not known in advance, for example when it is built from user input.
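As a sketch of a pattern that is only known at run time, the hypothetical helper containsWord below builds a RegExp from a search term. It assumes the term contains no regex special characters, since it does not escape them:

```javascript
// Hypothetical helper: builds a pattern at run time from a search term.
// Assumes `word` contains no regex special characters; they are not escaped here.
function containsWord(text, word) {
  // In a string literal "\\b" produces the two characters \b, a word boundary
  const exp = new RegExp("\\b" + word + "\\b", "i");
  return exp.test(text);
}

console.log(containsWord("I like JavaScript and Java", "java")); // true
console.log(containsWord("I like JavaScript", "java"));          // false
```

In the second call "Java" is immediately followed by "S", so there is no word boundary after it and the test fails.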

Special characters in regular expressions

\ – Before an ordinary character, gives it a special meaning. For example, the expression /s/ just looks for the character 's', while /\s/ stands for a whitespace character. Before a special character, it removes the special meaning: /a\*/ matches 'a' followed by a literal '*'.

^ – Indicates the beginning of the input data. If the multiline search flag (“m”) is set, then it will also work at the beginning of a new line.

$ – Indicates the end of the input data. If the multiline search flag is set, it will also work at the end of the line.

* – Denotes repetition of 0 or more times. For example, /bo*/ will find 'boooo' in "A ghost booooed" and 'b' in "A bird warbled" but will not find anything in "A goat grunted".

+ – Denotes repetition 1 or more times. Equivalent to {1,}. For example, /a+/ will find 'a' in 'candy' and all the a's in 'caaaaaaandy'.

? – Indicates that the element may or may not be present.

. – (Decimal point) denotes any character other than a newline: \n \r \u2028 or \u2029. (you can use [\s\S] to search for any character, including newlines).

(x)– Finds x and remembers it. These are called capturing parentheses. For example, /(foo)/ finds and remembers 'foo' in "foo bar." The captured substring is stored in the result array or in the predefined properties of the RegExp object: $1, ..., $9.

(?:x)– Finds x but does not remember the match. These are called non-capturing parentheses. The matched substring is not stored in the result array or in the RegExp properties. Like all parentheses, they group their contents into a single subpattern.

x(?=y)– Finds x only if x is followed by y. For example, /Jack(?=Sprat)/ will only find 'Jack' if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ will only match 'Jack' if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' will appear in the search result.

x(?!y)– Finds x only if x is not followed by y. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point: /\d+(?!\.)/.exec("3.141") finds 141, not 3.

x|y– Finds x or y. For example, /green|red/ will match 'green' in “green apple” and 'red' in “red apple.”

{n}– n is a non-negative integer. Finds exactly n repetitions of the preceding element.

{n,}– n is a non-negative integer. Finds n or more repetitions of the preceding element.

{n,m}– n and m are non-negative integers with n ≤ m. Finds from n to m repetitions of the preceding element.

[xyz]– Character set. Finds any one of the listed characters. You can specify a range using a hyphen. For example, [abcd] is the same as [a-d].

[^xyz]– Any character other than those specified in the set. You can also specify a span. For example, [^abc] is the same as [^a-c].

[\b]– Finds a backspace character.

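\b– Finds a word boundary: a position between a word character (a Latin letter, digit or underscore) and a non-word character, or the start or end of the string.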

\B– Denotes not a word boundary. For example, /\w\Bn/ will match 'on' in "noonday" and /y\B\w/ will match 'ye' in "possibly yesterday."

\cX– X is a letter from A to Z. Designates a control character in a string. For example, /\cM/ stands for the Ctrl-M character.

\d– Finds an ASCII digit. Equivalent to [0-9].

\D– Finds any character that is not an ASCII digit. Equivalent to [^0-9].

\f,\r,\n– Match the special characters form feed, carriage return and line feed (newline).

\s– Matches any whitespace character, including spaces, tabs, newlines, and other unicode whitespace characters.

\S– Matches any character except whitespace.

\t– Tab character.

\v– Vertical tab character.

\w– Matches any Latin word character: letters, digits and the underscore. Equivalent to [A-Za-z0-9_].

\W– Matches any character that is not a Latin word character. Equivalent to [^A-Za-z0-9_].

\0 – Finds the NUL character.

\xhh– Looks for character with code hh (2 hexadecimal digits).

\uhhhh– Looks for character with code hhhh (4 hexadecimal digits).
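A few of the special characters above in action; this is an illustrative sketch with made-up sample strings, using console.log instead of document.write:

```javascript
// {n,m} with \d: two to four digits; \b stops the match inside a longer number
const year = /\b\d{2,4}\b/;
console.log(year.test("in 1999"));  // true
console.log(year.test("in 19999")); // false

// x(?=y): a number only when a percent sign follows; the % is not part of the match
const percent = /\d+(?=%)/;
console.log(percent.exec("up 15% from 300")[0]); // "15"

// (?:x) groups without capturing, (x) captures
const m = /(?:Mr|Ms)\. (\w+)/.exec("Hello, Ms. Smith");
console.log(m[1]); // "Smith": the only captured group
```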

Flags

Regular expression flags define high-level pattern matching rules. Unlike the rest of the regular expression grammar, flags are not written between the slash characters but after the second one. This guide covers three flags: i, g and m (later versions of JavaScript added more, such as u, y and s).

The flag i specifies that pattern matching should be case-insensitive, and the flag g that the search must be global, i.e. all matches in the string must be found. The flag m performs a pattern search in multiline mode: if the string being searched contains newlines, the anchor characters ^ and $ match not only the beginning and end of the entire string but also the beginning and end of each line of text. Flags can be used in any combination.
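A short sketch of what the m flag changes for ^, using an illustrative two-line string:

```javascript
const text = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string
const singleLine = text.match(/^\w+/g);
console.log(singleLine); // ["first"]

// With m, ^ also matches right after each newline
const perLine = text.match(/^\w+/gm);
console.log(perLine); // ["first", "second"]
```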

String class methods

Strings support four methods using regular expressions.

search() method

It takes a regular expression as an argument and returns either the position of the first character of the substring found, or -1 if no match is found. For example, the following call will return 4:

var result = "JavaScript".search(/script/i); // 4

If the argument of search() is not a regular expression, it is first converted by passing it to the RegExp constructor. The search() method does not support global searches and ignores the g flag in its argument.

replace() method

It performs a search-and-replace operation. It takes a regular expression as its first argument and a replacement string as its second. The method searches the string it is called on for matches of the pattern. If the regular expression has the g flag, replace() replaces all matches with the replacement string; otherwise, it replaces only the first match. The result is returned as a new string; the original string is not changed.
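A sketch contrasting replace() with and without the g flag; the strings are illustrative. The last line also uses $2 and $1 in the replacement string, which insert the text captured by the second and first parenthesized groups:

```javascript
const text = "one cat, two cats";

// Without g only the first match is replaced
const once = text.replace(/cat/, "dog");  // "one dog, two cats"

// With g every match is replaced
const all = text.replace(/cat/g, "dog");  // "one dog, two dogs"

// $1 and $2 in the replacement refer to the captured groups
const swapped = "Smith, John".replace(/(\w+), (\w+)/, "$2 $1"); // "John Smith"

console.log(once, "|", all, "|", swapped);
```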

match() method

It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp() constructor) and returns an array containing the search results. If the g flag is set in the regular expression, the method returns an array of all matches present in the string. For example:

// returns ["1", "2", "3"]
var result = "1 plus 2 equals 3".match(/\d+/g);

If the regular expression does not have the g flag, match() does not perform a global search; it looks only for the first match. However, match() still returns an array in this case: its first element is the matched substring, and the remaining elements are the substrings matched by the parenthesized subexpressions of the regular expression.
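A sketch of the non-global case with a made-up date string. The index property, which holds the position of the match, is part of the standard result array although the text above does not mention it:

```javascript
const pattern = /(\d{4})-(\d{2})-(\d{2})/;

// Without g: the full match followed by one element per group
const date = "Due 2015-11-01".match(pattern);
console.log(date[0]);                   // "2015-11-01"
console.log(date[1], date[2], date[3]); // "2015" "11" "01"
console.log(date.index);                // 4: where the match starts

// With g: only the full matches, group contents are not reported
const all = "Due 2015-11-01".match(/(\d{4})-(\d{2})-(\d{2})/g);
console.log(all); // ["2015-11-01"]
```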

split() method

This method splits the string it is called on into an array of substrings, using the argument as the delimiter. For example:

"123,456,789".split(","); // Returns ["123","456","789"]

The split() method can also take a regular expression as its argument, which lets you split on a pattern rather than on a fixed string.
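For example, a delimiter pattern can absorb inconsistent spacing around commas; a small sketch with an illustrative input:

```javascript
// A fixed string delimiter leaves the surrounding spaces in place
const plain = "1, 2 ,3".split(",");         // ["1", " 2 ", "3"]

// A regular expression swallows any whitespace around each comma
const trimmed = "1, 2 ,3".split(/\s*,\s*/); // ["1", "2", "3"]

console.log(plain, trimmed);
```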

RegExp object

The RegExp() constructor takes one or two string arguments and creates a new RegExp object. The first argument is a string containing the body of the regular expression, i.e. the text that would appear between the slashes in a regular expression literal. The second argument is optional; if present, it specifies the regular expression flags and must be one of the characters g, i, m or a combination of them.
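Because the pattern is passed as a string, every backslash in it has to be doubled. A minimal sketch showing that the constructor call and the literal describe the same pattern:

```javascript
// "\\d+" in a string literal is the three characters \d+
const fromString = new RegExp("\\d+", "g");
const fromLiteral = /\d+/g;

console.log(fromString.source === fromLiteral.source); // true
console.log("10 apples, 25 pears".match(fromString)); // ["10", "25"]
```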

RegExp properties

Each RegExp object has five properties:

  • source– a read-only string containing the text of the regular expression.
  • global– a read-only boolean value indicating whether the regular expression has the flag g.
  • ignoreCase– a read-only boolean value indicating whether the regular expression has the flag i.
  • multiline– a read-only boolean value indicating whether the regular expression has the flag m.
  • lastIndex– an integer that can be read and written. For patterns with the flag g, this property holds the position in the string at which the next search starts.
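Reading the five properties of a sample expression; an illustrative sketch using console.log:

```javascript
const exp = /java/gi;

console.log(exp.source);     // "java"
console.log(exp.global);     // true
console.log(exp.ignoreCase); // true
console.log(exp.multiline);  // false
console.log(exp.lastIndex);  // 0: no search has been made yet

exp.test("I use JavaScript");
console.log(exp.lastIndex);  // 10: the position right after "Java"
```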

RegExp Methods

RegExp objects define two methods that perform pattern matching.

exec() method

The exec() method executes the regular expression for the specified string, i.e. looks for a match in a string. If no match is found, the method returns null. However, if a match is found, it returns the same array as the array returned by the match() method for searching without the flag g.

Element 0 of the array contains the string that matches the regular expression, and the subsequent elements contain the substrings that match the parenthesized subexpressions. Unlike match(), exec() returns an array with the same structure whether or not the regular expression has the g flag.

When exec() is called again on the same global regular expression (one with the g flag), it starts searching at the character position stored in the lastIndex property. After each successful match, lastIndex is set to the position immediately following the match; if exec() does not find a match, lastIndex is reset to 0.
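The usual way to walk through every match with exec() is a loop that runs until it returns null. A sketch with an illustrative input, recording each match together with the lastIndex value it leaves behind:

```javascript
const exp = /\d+/g;
const text = "a1 b22 c333";
const found = [];
let match;

// Each call resumes at exp.lastIndex; when no match is left, exec() returns null
// and lastIndex goes back to 0
while ((match = exp.exec(text)) !== null) {
  found.push([match[0], exp.lastIndex]);
}

console.log(found);         // [["1", 2], ["22", 6], ["333", 11]]
console.log(exp.lastIndex); // 0
```

The g flag is essential here: without it exec() ignores lastIndex, always returns the first match, and the loop would never end.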

test() method

It takes a string and returns true if the string matches the regular expression:

var pattern = /java/i;
pattern.test("JavaScript"); // returns true

Calling test() is equivalent to calling exec() and returning true if exec() returns a non-null value. For this reason, test() behaves like exec() when called on a global regular expression: it starts searching the string at the position given by lastIndex, and if it finds a match, it sets lastIndex to the position immediately following the match.
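A consequence worth knowing: reusing a global expression with test() on the same string can give alternating results, because lastIndex carries over between calls. A minimal sketch:

```javascript
const exp = /java/gi;

const first = exp.test("JavaScript");  // true; lastIndex is now 4
const second = exp.test("JavaScript"); // false: the search starts at index 4, then lastIndex resets to 0
const third = exp.test("JavaScript");  // true again

console.log(first, second, third);
```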

Writing a Pattern

A regular expression pattern consists of ordinary characters, such as /abc/, or a combination of ordinary and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which act as a memory mechanism: the part of the string matched by the parenthesized part of the pattern is remembered for later use.

Using Simple Patterns

Simple patterns are used to find direct matches in text. For example, the pattern /abc/ matches a combination of characters in a string only when the characters 'abc' occur together and in the same order.

Regular Expressions

A regular expression is an object that describes a character pattern. The RegExp class in JavaScript represents regular expressions, and the objects of the String and RegExp classes define methods that use regular expressions to perform pattern matching and text replacement operations. The regular expression grammar in JavaScript contains a fairly complete subset of the regular expression syntax used in Perl 5, so if you are familiar with Perl, you should be able to write patterns in JavaScript programs with ease.

Features of Perl regular expressions that are not supported in ECMAScript 5 include the s (single-line mode) and x (extended syntax) flags; the escape sequences \a, \e, \l, \u, \L, \U, \E, \Q, \A, \Z, \z and \G; and other extended constructs beginning with (?. (ECMAScript 2018 later added the s flag, look-behind assertions and named capture groups.)

Defining regular expressions

In JavaScript, regular expressions are represented by RegExp objects. RegExp objects can be created using the RegExp() constructor, but more often they are created using special literal syntax. Just as string literals are specified as characters enclosed in quotation marks, regular expression literals are specified as characters enclosed in a pair of slash characters (/). Thus, JavaScript code may contain lines like this:

var pattern = /s$/;

This line creates a new RegExp object and assigns it to the variable pattern. This RegExp object matches any string that ends with the letter "s". The same regular expression can be defined using the RegExp() constructor:

var pattern = new RegExp("s$");

A regular expression pattern specification consists of a sequence of characters. Most characters, including all alphanumeric characters, literally describe the characters that must be present. That is, the regular expression /java/ matches all strings containing the substring "java".

Other characters in regular expressions are not matched literally but have a special meaning. For example, the regular expression /s$/ contains two characters. The first, s, matches the literal character s. The second, $, is a special metacharacter that marks the end of the string. So this regular expression matches any string that ends with s.

The following sections describe the various characters and metacharacters used in JavaScript regular expressions.

Literal characters

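As noted earlier, all alphabetic characters and digits in regular expressions match themselves literally. JavaScript's regular expression syntax also supports specifying certain non-alphabetic characters with escape sequences that begin with a backslash (\). For example, the sequence \n matches the newline character. These characters are listed in the table below:

JavaScript regular expression literal characters
Symbol Meaning
Alphanumeric character Itself
\0 The NUL character (\u0000)
\t Tab (\u0009)
\n Newline (\u000A)
\v Vertical tab (\u000B)
\f Form feed (\u000C)
\r Carriage return (\u000D)
\xnn The Latin character with the hexadecimal code nn; for example, \x0A is the same as \n
\uxxxx The Unicode character with the hexadecimal code xxxx; for example, \u0009 is the same as \t
\cX The control character ^X; for example, \cJ is equivalent to the newline character \n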

Some punctuation marks have special meaning in regular expressions:

^ $ . * + ? = ! : | \ / ( ) [ ] { } -

The meaning of these symbols is explained in the following sections. Some of them have special meaning only in certain contexts of regular expressions, while in other contexts they are taken literally. However, in general, to include any of these characters literally in a regular expression, you must precede it with a backslash character. Other characters, such as quotation marks and @, have no special meaning and simply match themselves in regular expressions.

If you can't remember exactly which characters must be preceded by a \, you can safely put a backslash before any of these punctuation characters. However, keep in mind that many letters and digits take on a special meaning when preceded by a backslash, so letters and digits that you want to match literally should not be escaped. To include the backslash character itself in a regular expression, precede it with another backslash. For example, the following regular expression matches any string containing a backslash character: /\\/.
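A short sketch of escaping in practice, with illustrative strings. Note that in a JavaScript string literal "C:\\temp" itself contains only one backslash:

```javascript
// An unescaped . matches any character; \. matches only a literal dot
const anyChar = /1.5/;
const literalDot = /1\.5/;
console.log(anyChar.test("1x5"));    // true
console.log(literalDot.test("1x5")); // false
console.log(literalDot.test("1.5")); // true

// /\\/ matches a single backslash character
console.log(/\\/.test("C:\\temp")); // true
console.log(/\\/.test("C:/temp"));  // false
```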

Character classes

Individual literal characters can be combined into character classes by enclosing them in square brackets. A character class matches any one character contained in it. Therefore, the regular expression /[abc]/ matches any one of the characters a, b or c.

Negated character classes can also be defined; they match any character other than those listed in the brackets. A negated character class is specified by a ^ as the first character after the left bracket. The regular expression /[^abc]/ matches any character other than a, b or c. In character classes, a range of characters can be specified with a hyphen. All lowercase letters of the Latin alphabet can be matched with the expression /[a-z]/, and any letter or digit of the Latin character set with the expression /[a-zA-Z0-9]/.

Certain character classes are used particularly often, so JavaScript's regular expression syntax includes special characters and escape sequences to denote them. For example, \s matches spaces, tabs and any other whitespace characters from the Unicode set, and \S matches any character that is not Unicode whitespace.

The table below lists these special characters and the syntax of character classes. (Note that some of the character class escape sequences match only ASCII characters and are not extended to work with Unicode characters. You can explicitly define your own Unicode character classes; for example, /[\u0400-\u04FF]/ matches any Cyrillic character.)

JavaScript regular expression character classes
Symbol Meaning
[...] Any one of the characters in the brackets
[^...] Any one character not in the brackets
. Any character other than a newline or another Unicode line terminator
\w Any ASCII word character. Equivalent to [a-zA-Z0-9_]
\W Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_]
\s Any Unicode whitespace character
\S Any character that is not Unicode whitespace. Note that \w and \S are not the same
\d Any ASCII digit. Equivalent to [0-9]
\D Any character other than an ASCII digit. Equivalent to [^0-9]
[\b] backspace character literal

Note that the special class escape sequences can also be used inside square brackets. \s matches any whitespace character and \d matches any digit, so /[\s\d]/ matches any whitespace character or digit.
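A minimal sketch of class escapes inside brackets, on an illustrative string:

```javascript
// [\s\d]: one character that is either whitespace or a digit
const found = "a1 b2".match(/[\s\d]/g);   // ["1", " ", "2"]

// [^\s\d]: one character that is neither whitespace nor a digit
const others = "a1 b2".match(/[^\s\d]/g); // ["a", "b"]

console.log(found, others);
```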

Repetition

With the knowledge of regular expression syntax gained so far, we can describe a two-digit number as /\d\d/ or a four-digit number as /\d\d\d\d/, but we cannot, for example, describe a number consisting of any number of digits, or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax to specify how many times a given regular expression element can be repeated.

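Repetition characters always follow the pattern they apply to. Some kinds of repetition are used so often that there are special characters for them; for example, + matches one or more instances of the previous pattern. The following table summarizes the repetition syntax:

JavaScript regular expression repetition characters
Symbol Meaning
{n,m} Matches the previous item at least n times but no more than m times
{n,} Matches the previous item n or more times
{n} Matches exactly n occurrences of the previous item
? Matches zero or one occurrence of the previous item, that is, the previous item is optional. Equivalent to {0,1}
+ Matches one or more occurrences of the previous item. Equivalent to {1,}
* Matches zero or more occurrences of the previous item. Equivalent to {0,}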

The following lines show some examples:

var pattern = /\d{2,4}/;  // Matches between two and four digits
pattern = /\w{3}\d?/;     // Matches exactly three word characters and an optional digit
pattern = /\s+java\s+/;   // Matches the word "java" with one or more spaces before and after it
pattern = /[^(]*/;        // Matches zero or more characters other than an opening parenthesis

Be careful when using the repetition characters * and ?. Because they allow zero occurrences of the preceding item, they can succeed when nothing is there. For example, the regular expression /a*/ matches even the string "bbbb", which contains no a at all: it matches the empty string at the start.

The repetition characters listed in the table match as many repetitions as possible while still allowing the rest of the regular expression to match. This is called "greedy" repetition. Repetition can also be "non-greedy": just put a question mark after the repetition character(s): ??, +?, *? or even {1,5}?.

For example, the regular expression /a+/ matches one or more instances of the letter a. Applied to the string "aaa", it matches all three letters. On the other hand, /a+?/ matches one or more instances of the letter a and selects the fewest possible number of characters. Applied to the same string, this pattern only matches the first letter a.

"Non-greedy" repetition does not always give the expected result. Consider the pattern /a+b/ that matches one or more a's followed by a b. For the string "aaab", it matches the entire string.

Now consider the non-greedy version, /a+?b/. One might expect it to match a b preceded by as few a's as possible, so that on the same string "aaab" it would match only the last a and the b. In fact, the entire string matches this pattern, just as with the greedy version. This is because a regular expression search looks for the first position in the string at which a match is possible. Since a match is possible starting from the first character of the string, shorter matches starting from later characters are never considered.
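The three cases discussed above can be checked directly; a small sketch:

```javascript
const greedy = "aaa".match(/a+/)[0]; // "aaa": as many a's as possible
const lazy = "aaa".match(/a+?/)[0];  // "a": as few a's as possible

// The lazy pattern still starts at the leftmost position where a match is possible,
// so it returns the whole string rather than just "ab"
const lazyB = "aaab".match(/a+?b/);
console.log(greedy, lazy, lazyB[0], lazyB.index); // aaa a aaab 0
```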

Alternatives, grouping and references

The grammar of regular expressions includes special characters for defining alternatives, grouping subexpressions and referring to previous subexpressions. The vertical bar | separates alternatives. For example, /ab|cd|ef/ matches either the string "ab", the string "cd" or the string "ef", and the pattern /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Note that the alternatives are processed from left to right until a match is found. If a match is found with the left alternative, the right alternative is ignored, even if a "better" match can be achieved. So when /a|ab/ is applied to the string "ab", it will only match the first character.
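A quick sketch of the left-to-right rule: the order of the alternatives decides what is matched.

```javascript
const first = "ab".match(/a|ab/)[0];  // "a": the left alternative already succeeds
const longer = "ab".match(/ab|a/)[0]; // "ab": put the longer alternative first

console.log(first, longer);
```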

Parentheses have several purposes in regular expressions. One of them is to group individual elements into a single subexpression, so that special characters such as |, *, +, ? treat the group as one unit. For example, the pattern /java(script)?/ matches the word "java" followed by the optional word "script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of either of the strings "ab" or "cd".

Another use of parentheses in regular expressions is to define subpatterns within a pattern. When a regular expression match is found in the target string, the part of the target string that matches any particular parenthesized subpattern can be extracted.

Suppose you want to find one or more lowercase letters followed by one or more digits. You can use the pattern /[a-z]+\d+/ for this. But suppose also that we only want the digits at the end of each match. If we put this part of the pattern in parentheses (/[a-z]+(\d+)/), we can extract the digits from any match we find. How this is done will be described below.
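One way to get at the captured digits is through the array returned by exec(); a sketch on an illustrative string:

```javascript
const exp = /[a-z]+(\d+)/;
const match = exp.exec("order abc123 shipped");

console.log(match[0]); // "abc123": the whole match
console.log(match[1]); // "123": only the text matched inside the parentheses
```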

Related to this is another use of parenthesized subexpressions to refer to subexpressions from the previous part of the same regular expression. This is achieved by specifying one or more digits after the \ character. The numbers refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers to the first subexpression, and \3 refers to the third. Note that subexpressions can be nested, so the position of the left parenthesis is used in the count. For example, in the following regular expression, the reference to the nested subexpression (script) will look like \2:

/([Jj]ava(script)?)\sis\s(fun\w*)/

The reference to the previous subexpression does not point to the template of this subexpression, but to the found text that matches this template. Therefore, references can be used to impose a constraint that selects parts of a string that contain exactly the same characters. For example, the following regular expression matches zero or more characters inside single or double quotes. However, it does not require that the opening and closing quotes match (that is, that both quotes be single or double):

/['"][^'"]*['"]/

We can require the quotes to match with a reference:

/(['"])[^'"]*\1/

Here \1 matches whatever the first parenthesized subexpression matched. In this example, the reference imposes the constraint that the closing quote must be the same as the opening quote. This regular expression does not allow single quotes inside double quotes, and vice versa.
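Comparing the two quote patterns side by side; the test strings are illustrative and use template literals so both quote kinds can appear unescaped:

```javascript
const loose = /['"][^'"]*['"]/;  // opening and closing quotes need not match
const strict = /(['"])[^'"]*\1/; // \1 repeats whichever quote opened the string

console.log(loose.test(`'mixed"`));      // true
console.log(strict.test(`'mixed"`));     // false
console.log(strict.exec(`say "hi"`)[0]); // "hi" including its double quotes
```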

It is also possible to group elements in a regular expression without creating a numbered reference to them. Instead of grouping elements between ( and ), start the group with the characters (?: and end it with ). Consider, for example, the following pattern:

/([Jj]ava(?:script)?)\sis\s(fun\w*)/

Here the subexpression (?:script) is needed only for grouping, so that the repetition character ? can be applied to the group. These modified parentheses do not create a reference, so \2 in this regular expression refers to the text matched by the pattern (fun\w*).
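The effect on group numbering can be seen by running both versions of the pattern on the same illustrative string:

```javascript
const capturing = /([Jj]ava(script)?)\sis\s(fun\w*)/;
const nonCapturing = /([Jj]ava(?:script)?)\sis\s(fun\w*)/;
const text = "javascript is funny";

const a = capturing.exec(text);
console.log(a[2]); // "script": group 2 is the nested (script)
console.log(a[3]); // "funny"

const b = nonCapturing.exec(text);
console.log(b[2]); // "funny": (?:script) is not counted
```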

The following table lists the select-from-alternatives, grouping, and referencing operators in regular expressions:

JavaScript regular expression alternation, grouping and reference characters
Symbol Meaning
| Alternative. Matches either the subexpression on the left or the subexpression on the right.
(...) Grouping. Groups elements into a single entity that can be used with *, +, ?, | etc. Also remembers the characters corresponding to this group for use in subsequent links.
(?:...) Grouping only. Groups elements together, but does not remember the characters corresponding to this group.
\number Matches the same characters that were matched by the group with that number (for example, \1 refers to group 1). Groups are subexpressions inside parentheses (possibly nested). Group numbers are assigned by counting left parentheses from left to right. Groups formed with (?: are not numbered.

Specifying a match position

As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches one whitespace character. Other elements of regular expressions match positions between characters, not the characters themselves. For example, \b matches a word boundary - a boundary between \w (an ASCII text character) and \W (a non-text character), or a boundary between an ASCII text character and the beginning or end of a line.

Elements such as \b do not define any characters that must be present in the matched string, but they do define valid positions for matching. These elements are sometimes called regular expression anchor elements because they anchor the pattern to a specific position in the string. More often than others, anchor elements such as ^ and $ are used, which anchor patterns to the beginning and end of the line, respectively.

For example, the word "JavaScript" on a line of its own can be matched with the regular expression /^JavaScript$/. To find a single word "Java" (rather than a prefix, for example in the word "JavaScript"), you can try using the pattern /\sJava\s/, which requires a space before and after the word.

But this solution raises two problems. First, it will only find the word "Java" if it is surrounded by spaces on both sides, and will not find it at the beginning or end of the string. Second, when this pattern does match, the string it returns will contain leading and trailing spaces, which is not exactly what we want. So instead of a pattern that matches whitespace characters \s, we'll use a pattern (or anchor) that matches word boundaries \b. The following expression will be obtained: /\bJava\b/.
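Both problems, and the \b fix, can be reproduced with a couple of illustrative strings:

```javascript
const text = "Java is not JavaScript";

// Problem 1: "Java" at the start of the string has no space before it
const spaced = text.match(/\sJava\s/); // null

// Problem 2: when it does match, the spaces are part of the result
const withSpaces = "I use Java daily".match(/\sJava\s/)[0]; // " Java "

// \b matches a position, so only the standalone word is returned, without spaces
const bounded = text.match(/\bJava\b/g); // ["Java"]

console.log(spaced, JSON.stringify(withSpaces), bounded);
```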

The anchor element \B matches a position that is not a word boundary. That is, the pattern /\Bcript/ will match the words "JavaScript" and "postscript" and will not match the words "script" or "Scripting".

Arbitrary regular expressions can also act as anchor conditions. Putting an expression between the characters (?= and) turns it into a look-ahead match for subsequent characters, requiring those characters to match the specified pattern but not included in the match string.

For example, to match the name of a common programming language followed by a colon, you can use the expression /ava(script)?(?=\:)/. This pattern matches the word "JavaScript" in the string "JavaScript: The Definitive Guide", but it will not match the word "Java" in the string "Java in a Nutshell" because it is not followed by a colon.

Starting the condition with (?! instead makes it a negative lookahead assertion, which requires that the following characters do not match the specified pattern. For example, the pattern /Java(?!Script)([A-Z]\w*)/ matches "Java" followed by an uppercase letter and any number of ASCII word characters, provided that "Java" is not followed by "Script". It matches "JavaBeans" but not "Javanese", and it matches "JavaScrip" but not "JavaScript" or "JavaScripter".
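Both lookahead examples can be checked with a short script. The test words come from the paragraph above; the colon is written unescaped here, (?=:), which behaves the same as (?=\:) in a pattern without the u flag.

```javascript
// Positive lookahead: a colon must follow, but it is not part of the match.
const beforeColon = /[Jj]ava([Ss]cript)?(?=:)/;
// Negative lookahead: "Java" must not be followed by "Script".
const notScript = /Java(?!Script)([A-Z]\w*)/;

const m = beforeColon.exec("JavaScript: The Definitive Guide");
console.log(m[0]);                                   // "JavaScript" (no colon)
console.log(beforeColon.test("Java in a Nutshell")); // false

const words = ["JavaBeans", "Javanese", "JavaScrip", "JavaScript", "JavaScripter"];
const accepted = words.filter((w) => notScript.test(w));
console.log(accepted);                               // [ 'JavaBeans', 'JavaScrip' ]
```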

The table below lists the anchor characters in regular expressions:

Regular Expression Anchors
Symbol Meaning
^ Matches the beginning of the string, or the beginning of a line in a multiline search.
$ Matches the end of the string, or the end of a line in a multiline search.
\b Matches a word boundary, that is, the position between a \w character and a \W character, or between a \w character and the beginning or end of the string. (Note, however, that [\b] matches a backspace character.)
\B Matches a position that is not a word boundary.
(?=p) Positive lookahead assertion. Requires the following characters to match the pattern p, but does not include them in the matched string.
(?!p) Negative lookahead assertion. Requires that the following characters do not match the pattern p.

Flags

One last element of regular expression grammar remains: flags, which define high-level pattern matching rules. Unlike the rest of the grammar, flags are not written between the slashes but after the second one. This section covers the three classic flags, i, g and m; newer versions of JavaScript add others, such as u and y, which appear in the flag table later in this guide.

The i flag makes the search case-insensitive, and the g flag makes it global, meaning that all matches in the string are found. The m flag turns on multiline mode: if the string being searched contains newlines, the anchors ^ and $ match not only the beginning and end of the whole string but also the beginning and end of each line. For example, the pattern /java$/im matches both "java" and "Java\nis fun".

Flags can be combined freely. For example, to find the first occurrence of the word "java" (or "Java", "JAVA", etc.) regardless of case, you can use /\bjava\b/i. To find all occurrences of the word in a string, add the g flag: /\bjava\b/gi.
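A short sketch of the three flags in action; the sample text is made up for illustration.

```javascript
const text = "Java and JAVA\njava is fun";

// i: case-insensitive; without g only the first occurrence is returned.
const first = text.match(/\bjava\b/i);
// g + i: every occurrence, regardless of case.
const all = text.match(/\bjava\b/gi);
// m: $ also matches just before each line break.
const withM = /java$/im.test("Java\nis fun");
// Without m, $ matches only at the very end of the string.
const withoutM = /java$/i.test("Java\nis fun");

console.log(first[0], first.index); // Java 0
console.log(all);                   // [ 'Java', 'JAVA', 'java' ]
console.log(withM, withoutM);       // true false
```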

String class methods for pattern matching

Up to this point, we have discussed the grammar of regular expressions, but not how to actually use them in JavaScript code. This section covers the String methods that use regular expressions for pattern matching and for search-and-replace operations. After that, we continue with pattern matching by looking at the RegExp object, its methods, and its properties.

Strings support four methods that use regular expressions. The simplest is search(). It takes a regular expression as an argument and returns either the position of the first character of the first matching substring, or -1 if there is no match. For example, the following call returns 4:

var result = "JavaScript".search(/script/i); // 4

If the search() method argument is not a regular expression, it is first converted by passing it to the RegExp constructor. The search() method does not support global searches and ignores the g flag in its argument.
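The points above about search() can be checked quickly; the strings are arbitrary examples.

```javascript
// search() returns the index of the first match, or -1 if there is none.
const pos = "JavaScript".search(/script/i);     // 4
const missing = "JavaScript".search(/python/);  // -1

// A string argument is passed to new RegExp(), so "." means "any character"
// here: it matches the "v" at index 0, not the period at index 2.
const dotAsRegex = "v1.2".search(".");          // 0

// The g flag is ignored: search() always reports the first match.
const firstOnly = "java java".search(/java/g);  // 0

console.log(pos, missing, dotAsRegex, firstOnly);
```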

The replace() method performs a search-and-replace operation. It takes a regular expression as its first argument and a replacement string as its second, and it searches the string on which it is called for matches of the specified pattern.

If the regular expression contains the g flag, the replace() method replaces all matches found with the replacement string. Otherwise, it replaces only the first match found. If the first argument of the replace() method is a string rather than a regular expression, then the method performs a literal search for the string, rather than converting it to a regular expression using the RegExp() constructor, as the search() method does.

As an example, we can use the replace() method to uniformly capitalize the word "JavaScript" for an entire line of text:

// Whatever the case of its characters, replace the word with the correctly capitalized form
var result = "javascript".replace(/JavaScript/ig, "JavaScript");

The replace() method is more powerful than this example suggests. Recall that parenthesized subexpressions in a regular expression are numbered from left to right, and that the regular expression remembers the text matched by each subexpression. If the replacement string contains a $ sign followed by a number, replace() substitutes those two characters with the text matched by the corresponding subexpression. This is a very useful feature. We can use it, for example, to replace straight double quotes in a string with typographic (curly) quotes:

// A quote is a double quote, followed by any number of characters
// other than double quotes (which we capture), followed by
// another double quote
var quote = /"([^"]*)"/g;
// Replace the straight quotes with typographic ones, leaving the
// quoted content (stored in $1) unchanged
var text = '"JavaScript" is an interpreted programming language.';
var result = text.replace(quote, "“$1”");
// result: “JavaScript” is an interpreted programming language.

An important thing to note is that the second argument to replace() can be a function that dynamically calculates the replacement string.
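Here is the quotes example as a runnable sketch, together with a function used as the replacement argument. The function receives the whole match followed by the captured groups; upper-casing the quoted text is just an arbitrary illustration.

```javascript
const quote = /"([^"]*)"/g;
const text = '"JavaScript" is an interpreted programming language.';

// $1 in the replacement string inserts the text captured by group 1.
const curly = text.replace(quote, "\u201C$1\u201D");

// A replacement function is called once per match with the whole match
// and then each captured group; its return value replaces the match.
const shouted = text.replace(quote, (whole, inner) => '"' + inner.toUpperCase() + '"');

console.log(curly);   // “JavaScript” is an interpreted programming language.
console.log(shouted); // "JAVASCRIPT" is an interpreted programming language.
```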

The match() method is the most general of the String methods that use regular expressions. It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp() constructor) and returns an array containing the search results. If the regular expression has the g flag, the method returns an array of all matches in the string. For example:

// returns ["1", "2", "3"]
var result = "1 plus 2 equals 3".match(/\d+/g);

If the regular expression does not have the g flag, match() does not perform a global search; it just looks for the first match. However, match() returns an array even when it does not perform a global search. In this case, the first element of the array is the matched substring, and the remaining elements are the substrings matched by the parenthesized subexpressions of the regular expression. So if match() returns an array arr, then arr[0] contains the entire matched string, arr[1] the substring corresponding to the first subexpression, and so on. Drawing a parallel with the replace() method, arr[n] holds the same text as $n.

For example, take a look at the following code that parses a URL:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit our site http://www..php";
var result = text.match(url);
if (result != null) {
  var fullurl = result[0];  // Contains "http://www..php"
  var protocol = result[1]; // Contains "http"
  var host = result[2];     // Contains "www..php"
}

Note that for a regular expression without the g flag, the match() method returns the same value as the regular expression's exec() method: the returned array has index and input properties, as described in the discussion of exec() below.
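A sketch of both forms of match(). Because the URL in the example above is incomplete, this version uses an invented address, http://www.example.com/docs/regex.html, purely for illustration; the regular expression is the one from the example.

```javascript
// With g: an array of every match, without group details.
const numbers = "1 plus 2 equals 3".match(/\d+/g);
console.log(numbers); // [ '1', '2', '3' ]

// Without g: the first match, its groups, and the index/input properties.
const url = /(\w+):\/\/([\w.]+)\/(\S*)/;
const text = "Visit our site http://www.example.com/docs/regex.html today";
const result = text.match(url);

if (result !== null) {
  console.log(result[0]);    // http://www.example.com/docs/regex.html
  console.log(result[1]);    // http (protocol)
  console.log(result[2]);    // www.example.com (host)
  console.log(result[3]);    // docs/regex.html (path)
  console.log(result.index); // 15
}

// For a pattern without g, exec() returns an equivalent array.
const viaExec = url.exec(text);
```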

The last method of the String object that uses regular expressions is split(). This method splits the string it is called on into an array of substrings, using the argument as the delimiter. For example:

"123,456,789".split(","); // Returns ["123","456","789"]

The split() method can also take a regular expression as an argument. This makes the method more powerful. For example, you can specify a delimiter that allows an arbitrary number of whitespace characters on both sides:

"1, 2, 3 , 4 , 5".split(/\s*,\s*/); // Returns ["1","2","3","4","5"]
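A few split() calls for comparison. The second call relies on the standard behavior that text matched by capturing groups in the separator is included in the result; the optional second argument in the third call caps the number of elements returned.

```javascript
// A regular expression separator absorbs the whitespace around each comma.
const parts = "1, 2, 3 , 4 , 5".split(/\s*,\s*/);
// Capturing groups in the separator are spliced into the result.
const withOps = "1+2-3".split(/([+-])/);
// The optional second argument limits the number of elements.
const limited = "a,b,c,d".split(",", 2);

console.log(parts);   // [ '1', '2', '3', '4', '5' ]
console.log(withOps); // [ '1', '+', '2', '-', '3' ]
console.log(limited); // [ 'a', 'b' ]
```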

RegExp object

As mentioned, regular expressions are represented as RegExp objects. In addition to the RegExp() constructor, RegExp objects support three methods and several properties.

The RegExp() constructor takes one or two string arguments and creates a new RegExp object. The first argument to the constructor is a string containing the body of the regular expression, i.e. the text that must appear between the slashes in the regular expression literal. Note that string literals and regular expressions use the \ character to denote escape sequences, so when passing a regular expression as a string literal to the RegExp() constructor, you must replace each \ character with a pair of \\ characters.

The second argument to RegExp() is optional. If present, it specifies the regular expression flags and must be one of the characters g, i or m, or a combination of them (newer versions of JavaScript also accept flags such as u and y). For example:

// Find all five-digit numbers in a string. Note the use
// of \\ characters in this example
var zipcode = new RegExp("\\d{5}", "g");

The RegExp() constructor is useful when a regular expression is generated dynamically and therefore cannot be represented using regular expression literal syntax. For example, to find a string entered by the user, you must create a regular expression at run time using RegExp().
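As a sketch of building a pattern at run time: escapeRegExp() below is a hypothetical helper, not a built-in (it uses the common idiom of prefixing every metacharacter with a backslash), and findAll() and the sample strings are likewise illustrative. The zip code line reuses the constructor call from the example above.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters so that
// arbitrary user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: find every case-insensitive occurrence of userInput.
function findAll(text, userInput) {
  const pattern = new RegExp(escapeRegExp(userInput), "gi");
  return text.match(pattern) || [];
}

// In a string literal, "\\d" is needed to pass \d to the constructor.
const zipcode = new RegExp("\\d{5}", "g");

console.log(findAll("Price: $1.50, price: $1.50!", "$1.50")); // [ '$1.50', '$1.50' ]
console.log(findAll("a.b axb", "a.b"));                      // [ 'a.b' ]
console.log("Zip codes 02134 and 90210".match(zipcode));     // [ '02134', '90210' ]
```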

RegExp properties

Each RegExp object has five basic properties. The source property is a read-only string containing the text of the regular expression. The global property is a read-only boolean that indicates whether the g flag is present in the regular expression. The ignoreCase property is a read-only boolean that indicates whether the i flag is present. The multiline property is a read-only boolean that indicates whether the m flag is present. The last property, lastIndex, is a read/write integer: for patterns with the g flag, it holds the position in the string at which the next search should start. As described below, it is used by the exec() and test() methods.
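The following sketch prints these properties for a sample pattern and shows lastIndex moving after a search with the g flag.

```javascript
const re = /java/gi;

console.log(re.source);     // java
console.log(re.global);     // true
console.log(re.ignoreCase); // true
console.log(re.multiline);  // false

const before = re.lastIndex; // 0
re.test("JavaScript");       // with g, the search starts at lastIndex
const after = re.lastIndex;  // 4, the position right after "Java"
console.log(before, after);
```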

RegExp Methods

RegExp objects define two methods that perform pattern matching; they behave similarly to the String methods described above. The main RegExp method for pattern matching is exec(). It resembles the match() method of String, except that it is a RegExp method that takes a string as its argument, rather than a String method that takes a RegExp argument.

The exec() method executes the regular expression on the specified string, that is, it looks for a match in the string. If no match is found, it returns null. If a match is found, it returns the same kind of array that match() returns for a search without the g flag: element 0 contains the string that matched the regular expression, and the subsequent elements contain the substrings that matched each subexpression. In addition, the index property holds the position of the character at which the match begins, and the input property refers to the string being searched.

Unlike match(), the exec() method returns an array whose structure does not depend on the presence of the g flag in the regular expression. Let me remind you that when passing a global regular expression, the match() method returns an array of found matches. And exec() always returns one match, but provides complete information about it. When exec() is called on a regular expression containing the g flag, the method sets the lastIndex property of the regular expression object to the position number of the character immediately following the matched substring.

When exec() is called on the same regular expression a second time, it starts searching at the character position stored in the lastIndex property. If exec() finds no match, lastIndex is reset to 0. (You can also set lastIndex to zero at any time, and you should do so whenever you stop searching a string before finding its last match and then start searching another string with the same RegExp object.) This special behavior lets you call exec() repeatedly to iterate over all matches of the regular expression in a string. For example:

var pattern = /Java/g;
var text = "JavaScript is more fun than Java!";
var result;
while ((result = pattern.exec(text)) != null) {
  console.log("Found '" + result[0] + "' at position " + result.index +
              "; next search will start at " + pattern.lastIndex);
}

The other RegExp method, test(), is much simpler than exec(). It takes a string and returns true if the string contains a match for the regular expression:

var pattern = /java/i;
pattern.test("JavaScript"); // returns true

Calling test() is equivalent to calling exec() and returning true if exec() returns a non-null value. For this reason, test() behaves the same way as exec() when called on a global regular expression: it starts searching the specified string at the position given by the lastIndex property, and if it finds a match, it sets lastIndex to the position of the character immediately following the match. Therefore, you can also use test() to loop through a string, just as with exec().
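Because test() honors lastIndex on a global pattern, reusing one object can give surprising results. This sketch shows the effect, the reset that avoids it, and a traversal loop built on test(); the sample strings are arbitrary.

```javascript
const pattern = /java/gi;

// Repeated calls on the same string resume from lastIndex.
const first = pattern.test("JavaScript");  // true, lastIndex is now 4
const second = pattern.test("JavaScript"); // false, nothing after index 4; lastIndex reset to 0
const third = pattern.test("JavaScript");  // true again

// Resetting lastIndex before reusing the object avoids the surprise.
pattern.lastIndex = 0;
const afterReset = pattern.test("JavaScript"); // true

// A traversal loop using test(): lastIndex records where each match ended.
const loop = /Java/g;
const ends = [];
while (loop.test("JavaScript is more fun than Java!")) {
  ends.push(loop.lastIndex);
}
console.log(first, second, third, afterReset, ends); // true false true true [ 4, 32 ]
```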

In JavaScript, the RegExp class represents regular expressions: objects that describe a pattern of characters. RegExp objects are usually created with the special literal syntax shown below, but they can also be created with the RegExp() constructor.

Syntax

// using the literal syntax
var regex = /pattern/flags;
// using the constructor
var regex = new RegExp("pattern", "flags");
var regex = new RegExp(/pattern/, "flags");

Parameter values:

Regular expression flags

Flag Description
g Finds all matches instead of stopping after the first one (global match flag).
i Makes matching case-insensitive (ignore case flag).
m Enables multiline matching: the anchors ^ and $ match at the beginning or end of each line (lines are separated by \n or \r), not just at the beginning or end of the entire string (multiline flag).
u Treats the pattern as a sequence of Unicode code points (unicode flag).
y Matches only at the index given by the lastIndex property of the regular expression, not at any later or earlier index (sticky flag).
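A sketch of the two newer flags, u and y. The sample strings are arbitrary; U+1F600 is used only because it lies outside the Basic Multilingual Plane and is therefore stored as two UTF-16 code units.

```javascript
// y (sticky): the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test("barfoo"); // true: "foo" starts at index 3
sticky.lastIndex = 1;
const atOne = sticky.test("barfoo");   // false: no match at index 1, and y does not scan ahead

// u (unicode): the pattern works on code points instead of UTF-16 code units.
const smiley = "\u{1F600}";                    // one code point, two code units
const withoutU = /^.$/.test(smiley);           // false: "." matches a single code unit
const withU = /^.$/u.test(smiley);             // true: "." matches the whole code point
const byCodePoint = /\u{1F600}/u.test(smiley); // true: \u{...} escapes need the u flag

console.log(smiley.length, atThree, atOne, withoutU, withU, byCodePoint);
```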

Character sets

Metacharacters

Symbol Description
. Matches any single character except a line terminator (\n, \r, \u2028 or \u2029).
\d Matches a digit of the basic Latin alphabet. Equivalent to the character set [0-9].
\D Matches any character that is not a digit of the basic Latin alphabet. Equivalent to the character set [^0-9].
\s Matches a single whitespace character: space, tab, form feed, line feed and other Unicode whitespace characters. Equivalent to the character set [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
\S Matches a single character that is not whitespace. Equivalent to the character set [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
[\b] Matches the backspace character (U+0008).
\0 Matches the NUL character (U+0000). Do not follow it with another digit.
\n Matches a line feed (newline) character.
\f Matches a form feed character.
\r Matches a carriage return character.
\t Matches a horizontal tab character.
\v Matches a vertical tab character.
\w Matches any alphanumeric character of the basic Latin alphabet, including the underscore. Equivalent to the character set [A-Za-z0-9_].
\W Matches any character that is not a word character of the basic Latin alphabet. Equivalent to the character set [^A-Za-z0-9_].
\cX Matches a control character, where X is a letter from A to Z. For example, /\cM/ matches the Ctrl-M character.
\xhh Matches the character with the hexadecimal code hh (two hexadecimal digits).
\uhhhh Matches the UTF-16 code unit with the hexadecimal value hhhh (four hexadecimal digits).
\u{hhhh} or \u{hhhhh} Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal). Works only when the u flag is set.
\ Indicates that the following character is special and should not be interpreted literally (for example, \d). For characters that are normally treated as special, indicates that the following character is not special and should be interpreted literally (for example, \.).

Restrictions

Quantifiers

Symbol Description
n* Matches the preceding element n zero or more times.
n+ Matches the preceding element n one or more times.
n? Matches the preceding element n zero times or once.
n{x} Matches exactly x occurrences of the preceding element n. x must be a non-negative integer.
n{x,} Matches at least x occurrences of the preceding element n. x must be a non-negative integer.
n{x,y} Matches at least x and at most y occurrences of the preceding element n. x and y must be non-negative integers, with y not less than x.
n*?
n+?
n??
n{x}?
n{x,}?
n{x,y}?
Match like the quantifiers *, +, ? and {...}, but look for the smallest possible match. By default, quantifiers are "greedy"; a ? after the quantifier switches it to "non-greedy" (lazy) mode, in which the element is repeated the minimum possible number of times.
x(?=y) Matches x only if x is followed by y.
x(?!y) Matches x only if x is not followed by y.
x|y Matches either of the alternatives x or y.
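A few of these quantifiers applied to sample strings; each match() call without g returns only the first match, so [0] is the matched text.

```javascript
const exactly = "aaaa".match(/a{2}/)[0];     // "aa"
const atLeast = "aaaa".match(/a{2,}/)[0];    // "aaaa"
const upToThree = "aaaa".match(/a{1,3}/)[0]; // "aaa" (greedy)
const lazy = "aaaa".match(/a{1,3}?/)[0];     // "a" (non-greedy)

// ? makes the preceding element optional.
const spellings = ["color", "colour", "colouur"].map((w) => /^colou?r$/.test(w)); // [true, true, false]

// x|y matches either alternative; g collects every match.
const animals = "cat and dog".match(/cat|dog/g); // ["cat", "dog"]

// A run of 10 to 20 letters "a".
const tenToTwenty = /^a{10,20}$/;
const ok = tenToTwenty.test("a".repeat(15));      // true
const tooLong = tenToTwenty.test("a".repeat(21)); // false
```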

Grouping and Backreferences

Symbol Description
(x) Matches x and remembers the match ("capturing parentheses"). The matched substring can be retrieved from the elements [1], ..., [n] of the resulting array, or from the properties $1, ..., $9 of the predefined RegExp object.
(?:x) Matches x but does not remember the match ("non-capturing parentheses"). The matched substring cannot be retrieved from the elements of the resulting array or from the RegExp properties $1, ..., $9.
\n Backreference to the last substring matched by the nth parenthesized group in the regular expression (groups are numbered from left to right). n must be a positive integer.
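The difference between capturing and non-capturing parentheses shows up in the length of the result array. The date string is an arbitrary example.

```javascript
const date = "2015-11-01";

// Three capturing groups: the array holds the full match plus three groups.
const capturing = date.match(/(\d{4})-(\d{2})-(\d{2})/);
// The year is grouped with (?:...), so it is not stored in the array.
const nonCapturing = date.match(/(?:\d{4})-(\d{2})-(\d{2})/);

console.log(capturing.slice(1));    // [ '2015', '11', '01' ]
console.log(nonCapturing.slice(1)); // [ '11', '01' ]

// Captured groups can also be reused in a replacement string as $1, $2, ...
const reordered = date.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3.$2.$1"); // "01.11.2015"
```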

Modifiers

A minus sign (-) placed before a modifier (except U) negates it.

Special characters

Symbol Analog Description
() subpattern, nested expression
[] character class
{a,b} number of occurrences from "a" to "b"
| logical "or"; for single-character alternatives, use [] instead
\ escapes a special character
. any character except newline
\d [0-9] decimal digit
\D [^\d] any character other than a decimal digit
\f form feed (page break)
\n line feed
\pL letter encoded in UTF-8 when using the u modifier
\r carriage return
\s [ \t\v\r\n\f] whitespace character
\S [^\s] any character except whitespace
\t tab
\w [A-Za-z0-9_] any digit, letter, or underscore
\W [^\w] any character other than a digit, letter, or underscore
\v vertical tab

Special characters inside a character class

Position within a string

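Symbol Example Description
^ ^a Start of line: in "aaa aaa" it matches only the first a
$ a$ End of line: in "aaa aaa" it matches only the last a
\A \Aa Beginning of the text, even in multiline mode (not supported in JavaScript)
\z a\z End of the text, even in multiline mode (not supported in JavaScript)
\b a\b, \ba Word boundary; the assertion holds when the previous character is a word character and the next one is not, or vice versa
\B \Ba\B No word boundary: the a must have word characters on both sides
\G \Ga Position where the previous successful search ended: in "aaaa aaa" the search stops at position 4, where a is no longer found (not supported in JavaScript)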

Anchors

Anchors in regular expressions mark the beginning or end of something, such as a line or a word. They are represented by special symbols. For example, a pattern matching a string that starts with a number could look like this:

^[0-9]+

Here the ^ character denotes the beginning of a line. Without it, the pattern would match any string containing a digit.

Character classes

Character classes in regular expressions match any one character from a certain set. For example, \d matches any digit from 0 to 9 inclusive, \w matches letters, digits and the underscore, and \W matches every other character. A pattern that matches letters, digits, and whitespace looks like this:

[\w\s]

POSIX

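POSIX character classes are another member of the regular expression family. The idea, as with character classes, is to use abbreviations that stand for a group of characters, such as [:alpha:] for letters or [:digit:] for digits (used inside brackets, as in [[:alpha:]]). Note that JavaScript regular expressions do not support POSIX classes.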

Assertions

At first, almost everyone has difficulty understanding assertions, but as you become more familiar with them, you will use them quite often. Assertions provide a way to say "I want to find every word in this document that includes the letter 'q' that is not followed by 'werty'".

[^\s]*q(?!werty)[^\s]*

The pattern above starts by looking for any characters other than whitespace ([^\s]*) followed by q. The parser then reaches the lookahead assertion. This makes the preceding element (character, group, or character class) conditional: it matches only if the assertion is true. In our case the assertion is negative (?!), so it is true if what it looks for is not found.

So the parser checks the next few characters against the proposed pattern (werty). If they are found, the assertion is false, which means that the character q is "ignored", that is, it does not match the pattern. If werty is not found, the assertion is true and q is accepted. The parser then continues by looking for any characters other than whitespace ([^\s]*).
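In JavaScript, the same assertion can be run over a sample sentence with the g flag. The sentence is made up for illustration.

```javascript
// Words containing a "q" that is not immediately followed by "werty".
const qWords = /[^\s]*q(?!werty)[^\s]*/g;
const text = "qwerty queen Iraq qwertyuiop";

const found = text.match(qWords);
console.log(found); // [ 'queen', 'Iraq' ]
```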

Quantifiers

Quantifiers let you specify that part of a pattern must be repeated several times in a row. For example, to find out whether a document contains a run of 10 to 20 (inclusive) letters "a", you can use this pattern:

a{10,20}

By default, quantifiers are greedy. So the quantifier +, meaning "one or more times", matches as many characters as possible. Sometimes this causes problems, and then you can tell the quantifier to stop being greedy (become "lazy") using a special modifier. Look at this code:

".*"

This pattern matches text enclosed in double quotes. However, the string you are searching might look like this:

<a href="helloworld.htm" title="Hello World">Hello World</a>

The pattern above finds the following substring in this string:

"helloworld.htm" title="Hello World"

It was too greedy, capturing the largest piece of text it could.

".*?"

This pattern also matches any characters enclosed in double quotes. But the lazy version (note the ? modifier) looks for the smallest possible match, and therefore finds each double-quoted substring separately:

"helloworld.htm" "Hello World"
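The greedy and lazy patterns can be compared directly in JavaScript. The sample is an assumed HTML link consistent with the matches shown above.

```javascript
const html = '<a href="helloworld.htm" title="Hello World">Hello World</a>';

// Greedy: .* runs to the last double quote it can find.
const greedy = html.match(/".*"/g);
// Lazy: .*? stops at the first closing double quote.
const lazy = html.match(/".*?"/g);

console.log(greedy); // [ '"helloworld.htm" title="Hello World"' ]
console.log(lazy);   // [ '"helloworld.htm"', '"Hello World"' ]
```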

Escaping in regular expressions

Regular expressions use certain characters to represent different parts of a pattern. However, this is a problem if you need to find one of these characters in a string as an ordinary character. A dot, for example, means "any character except a line break" in a regular expression. If you need to find a period in a string, you can't just use "." in the pattern, because as a wildcard it matches almost anything. So you need to tell the parser that this dot should be treated as a literal dot and not as "any character". This is done with an escape character.

An escape character preceding a character such as a dot causes the parser to ignore its function and treat it as a normal character. There are several characters that require such escaping in most templates and languages. You can find them in the lower right corner of the cheat sheet ("Meta Symbols").

The pattern for finding a period is:

\.

Other special characters in regular expressions match unusual elements in text. Line breaks and tabs, for example, can be typed on the keyboard, but are likely to confuse programming languages. The escape character is used here to tell the parser to treat the next character as a special character and not as a normal letter or number.
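A short JavaScript sketch of escaping; the sample strings are arbitrary. Note that when a pattern is passed to the RegExp() constructor as a string, the backslash itself has to be doubled.

```javascript
const text = "version 1.2";

const anyChar = text.search(/./);  // 0: an unescaped "." matches the "v"
const period = text.search(/\./);  // 9: "\." matches only the literal period

// \t in a pattern stands for the tab character.
const cells = "a\tb".split(/\t/);  // [ 'a', 'b' ]

// In a string literal, "\\." produces the two characters \ and .
const viaCtor = new RegExp("\\.").test("1.2"); // true

console.log(anyChar, period, cells, viaCtor);
```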

Escape characters in regular expressions

String substitution

String substitution is described in detail in the next section, "Groups and ranges", but the existence of "passive" groups should be mentioned here. These are groups that are ignored during substitution (in JavaScript they are written as (?:...)), which is very useful if you want to use an "or" condition in a pattern but do not want that group to take part in the substitution.

Groups and ranges

Groups and ranges are very, very useful. It's probably easier to start with ranges, which let you specify a set of acceptable characters. For example, to check whether a string contains hexadecimal digits (0 to 9 and A to F), use the following range:

[A-Fa-f0-9]

To test the opposite, use a negative range, which in our case matches any character except numbers from 0 to 9 and letters from A to F:

[^A-Fa-f0-9]

Groups are most often used when a pattern needs an "or" condition, when one part of a pattern must refer to another part, and when substituting strings.

Using "or" is very simple: the following pattern looks for "ab" or "bc":

(ab|bc)

If you need to refer to a preceding group in the regular expression, use \n, replacing n with the number of the desired group. For example, you may want a pattern that matches the letters "aaa" or "bbb", followed by a number, followed by the same three letters. This pattern is implemented using groups:

(aaa|bbb)[0-9]+\1

The first part of the pattern looks for "aaa" or "bbb" and captures the letters it found in a group. Next comes a search for one or more digits ([0-9]+), and finally \1. This last part refers to the first group and looks for the same text again: it matches the text that the first group actually found, not just anything the group's pattern could match. So "aaa123bbb" does not match the pattern above, because \1 looks for "aaa" after the number.
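The backreference pattern can be tried in JavaScript, where \1 works the same way. The test strings are illustrative.

```javascript
const sameLetters = /(aaa|bbb)[0-9]+\1/;

console.log(sameLetters.test("aaa123aaa")); // true
console.log(sameLetters.test("bbb7bbb"));   // true
console.log(sameLetters.test("aaa123bbb")); // false: \1 must repeat "aaa"

// Group 1 holds the letters that \1 had to repeat.
const m = "xx bbb42bbb yy".match(sameLetters);
console.log(m[0], m[1]); // bbb42bbb bbb
```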

One of the most useful tools in regular expressions is string substitution. When replacing text, you can refer to a captured group using $n. Let's say you want to make every occurrence of the word "wish" in your text bold. To do this, you would use a regular expression replace function, which might look like this:

replace(pattern, replacement, subject)

The first parameter will be something like this (you may need a few extra characters for this particular function):

([^A-Za-z0-9])(wish)([^A-Za-z0-9])

It will find any occurrence of the word "wish" along with the previous and next characters, unless they are letters or numbers. Then your substitution could be like this:

$1<b>$2</b>$3

This replaces the entire string found by the pattern. The replacement starts with the first captured character (the one that is not a letter or a number), marked $1; without it, that character would simply be removed from the text. The same goes for the end of the replacement ($3). In the middle, we wrap the second group found by the pattern ($2), the word itself, in the HTML tag for bold (of course, you could use CSS or <strong> instead).
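Here is the "wish" example as a JavaScript sketch, assuming String.prototype.replace() plays the role of the generic replace function. The function names are hypothetical. A second variant uses \b instead of the surrounding character groups, which also handles the word at the very start or end of the text; $& in the replacement inserts the whole match.

```javascript
// Pattern from the text: keep the neighboring characters in $1 and $3.
function boldWish(text) {
  return text.replace(/([^A-Za-z0-9])(wish)([^A-Za-z0-9])/g, "$1<b>$2</b>$3");
}

// Variant with word boundaries: no neighbors to capture and restore.
function boldWishBoundary(text) {
  return text.replace(/\bwish\b/g, "<b>$&</b>");
}

console.log(boldWish("I wish you well, and I wish!"));
// I <b>wish</b> you well, and I <b>wish</b>!
console.log(boldWish("wish me luck"));         // unchanged: no character before "wish"
console.log(boldWishBoundary("wish me luck")); // <b>wish</b> me luck
```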

Pattern modifiers

Pattern modifiers are used in several languages, notably Perl. They let you change how the parser works. For example, the i modifier makes the parser ignore case.

Regular expressions in Perl are framed by the same character at the beginning and at the end. It can be any character (more commonly "/") and looks like this:

/pattern/

Modifiers are added to the end of this line, like this:

/pattern/i

Meta characters

Finally, the last part of the table contains meta characters. These are characters that have a special meaning in regular expressions, so if you want to use one of them as an ordinary character, it must be escaped. To check for a parenthesis in a text, use the following pattern:

\(

The cheat sheet is a general guide to regular expression patterns without regard to the specifics of any language. It is presented in the form of a table that fits on one printed A4 sheet. Created under a Creative Commons license based on a cheat sheet by Dave Child.

