Regular Expressions
Intro
- This tutorial covers JS Regular Expressions.
- It is recommended that you follow along with each section below. Paste or type the code into your own JavaScript file and view the variable or expression values either with console.logs or using the VS Code Debug sidebar.
- The example code for the entire tutorial series is at github.com/LearnByCheating/javascript-tutorial. There is a separate file or folder for each tutorial, including the To-Do List example project. The Readme file has instructions on how to run the code.
1. Create a Regular Expression
- A regular expression is used for matching text with a pattern.
- Patterns give you more flexibility than having to match the text exactly.
- Use it to test if certain text is present, or to find and replace text.
- "Regular expression" is often abbreviated as "regex".
- There are two ways to create a regular expression.
- The preferred way is to use literal notation. Put the pattern between two forward slashes. Optional flags go after the second slash.
/pattern/[flags];
- The simplest type of pattern is a sequence of characters. For instance, if you are searching for the text "findme", the pattern would be /findme/.
let re = /findme/;
- The second way to create a regular expression is to call the constructor function with the new operator. It has two parameters: the pattern and optional flags.
- The pattern can be either a string or a regex literal. A string pattern is still interpreted as regex syntax, not as exact text.
new RegExp('pattern'[, 'flags']);
new RegExp(/pattern/[, 'flags']);
Examples:
let re2 = new RegExp('findme');
let re3 = new RegExp(/findme/);
- The constructor function is useful if you need to pass in a variable as the pattern.
let input = "findme";
let re4 = new RegExp(input);
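- As a quick sketch (the 'i' flag argument and the sample text are illustrative additions), the literal and constructor forms below build equivalent regex objects. You can confirm this by comparing their source and flags properties.

```javascript
const input = 'findme'; // illustrative value, e.g. text typed by a user

// The same case-insensitive pattern built three ways
const literal = /findme/i;
const fromString = new RegExp('findme', 'i');
const fromVariable = new RegExp(input, 'i');

// source holds the pattern text; flags holds the flag letters
console.log(literal.source, literal.flags);        // findme i
console.log(fromString.source === literal.source); // true
console.log(fromVariable.test('Can you FINDME?')); // true
```

Because a string passed to the constructor is parsed as a pattern, characters such as . or * inside the variable keep their regex meaning.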
2. The RegExp built-in object
- RegExp is a JavaScript standard built-in object. It is available globally because it is a property of the global object (window in the browser, global in Node.js, or globalThis in either).
RegExp === window.RegExp; // returns true in the browser environment
RegExp === global.RegExp; // returns true in the Node.js environment
- The RegExp global object is a constructor function that creates regular expression objects. It has static and instance properties and methods.
- JavaScript regular expressions are objects, and instances of the RegExp constructor.
let re = /findme/;
typeof re; // returns 'object'
re instanceof RegExp; // returns true
re.constructor === RegExp; // returns true
- RegExp.prototype is the prototype of regular expressions and has some instance methods you can use on them like test and exec (covered below).
Object.getPrototypeOf(re); // RegExp.prototype
- The String prototype has some instance methods that take regex objects as arguments, including: match, matchAll, search, replace, and split.
3. Flags
- There are a number of regex special characters that fall into different categories including: flags, character classes, groups, boundaries, and quantifiers. We will start with flags.
- Flags are single letters you place after the regex pattern. The most commonly used flags are g and i.
- In the example below the str variable contains a string. The replace method searches the string for the pattern in the first argument, and replaces it with the value in the second argument.
- The replace method searches for the first instance of the lower case "a" character, replaces it with the letter "x", and returns the string: "Replxce at least one letter A."
let str = 'Replace at least one letter A.';
let view = str.replace(/a/, 'x'); // Replxce at least one letter A.
Global "g":
- In the next example, the replace method applies the global flag "g". That means it replaces all matches. It returns: "Replxce xt lexst one letter A."
str = 'Replace at least one letter A.';
view = str.replace(/a/g, 'x'); // Replxce xt lexst one letter A.
Case Insensitive "i":
- In this next example, replace applies both the global flag "g" and the case insensitive flag "i", so that it applies to all matches of both lower case and upper case "a". It returns: "Replxce xt lexst one letter x."
str = 'Replace at least one letter A.';
view = str.replace(/a/gi, 'x'); // Replxce xt lexst one letter x.
Multiline Matching "m":
- The string below spans multiple lines.
str = `Multi-line matching.
At least one line starts with "a".`;
view = /^A/.test(str); // false
view = /^A/m.test(str); // true
- test is a RegExp instance method. It searches the string for the pattern and returns true if found, false if not.
- We are using the test method above to search for an upper case A.
- The caret character ^ means that the pattern must be at the beginning of the string. Since the capital A is not at the beginning of the string it returns false.
- But when we apply the multiline m flag to the regex, the caret matches at the beginning of each line, not just at the beginning of the whole string. Since there is a capital A at the beginning of the second line, this returns true.
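- Combining m with the global flag g shows the difference more clearly. In this sketch (the sample text is made up), ^ matches only at the very start of the string without m, and at the start of every line with it.

```javascript
const text = 'apple\nBanana\navocado';

// Without m, ^ anchors only to the start of the whole string
const withoutM = text.match(/^a\w*/g); // ['apple']

// With m, ^ also anchors right after each newline
const withM = text.match(/^a\w*/gm);   // ['apple', 'avocado']

console.log(withoutM, withM);
```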
4. Character Classes
- Character classes distinguish different types of characters such as word characters, digits, and white space.
Any character ".":
- In a regular expression, a dot represents any single character except a line terminator such as a newline.
- In the below example, the str variable is set to a string of miscellaneous characters.
- The replace method searches for the "." pattern with the global flag. It replaces all matches with a dash. It returns a string with every character replaced except new line "\n".
let str = 'Letters 0123, ^_.$%/]\n';
let view = str.replace(/./g, '-'); // returns a string of 21 dashes followed by '\n'
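- Here is a sketch of the dot inside a longer pattern (the sample string is illustrative): c.t matches c, then any one character, then t. The s flag (dotAll), which appears in the RegExp property list later in this tutorial, lets the dot match newlines too.

```javascript
const words = 'cat cot cut c\nt';

// The dot matches any single character except a line terminator
const plain = words.match(/c.t/g);   // ['cat', 'cot', 'cut']

// With the s (dotAll) flag the dot also matches '\n'
const dotAll = words.match(/c.t/gs); // ['cat', 'cot', 'cut', 'c\nt']

console.log(plain.length, dotAll.length); // 3 4
```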
Word characters "\w":
- Backslash w is a special regex character that represents all word characters. It includes numbers, letters, and underscore: [A-Za-z0-9_]
- Applying it to the str variable replaces all letters, numbers, and the underscore with dashes.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\w/g, '-'); // returns '------- ----, ^-.$%/]\n'
Non-word characters "\W":
- Backslash capital W represents all NON-word characters, so everything except numbers, letters, and underscore.
- Applying it to str replaces everything except the letters, numbers, and underscore with dashes.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\W/g, '-'); // returns 'Letters-0123---_------'
Digits "\d":
- Backslash d represents digits 0 through 9.
- Applying it to str replaces the numbers with dashes.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\d/g, '-'); // returns 'Letters ----, ^_.$%/]\n'
Non-digits "\D":
- Backslash capital D represents all non-digit characters.
- Below everything except the numbers are replaced with dashes.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\D/g, '-'); // returns '--------0123----------'
Whitespace "\s":
- Backslash s represents whitespace characters. Whitespace includes spaces, tabs, and new line characters.
- Applying it to str replaces the space characters and the newline character at the end with dashes.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\s/g, '-'); // returns 'Letters-0123,-^_.$%/]-'
Non-whitespace "\S":
- Backslash capital S represents all non-whitespace characters, as you can see in the example.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\S/g, '-'); // returns '------- ----- -------\n'
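- To check the examples above, this sketch counts how many characters of the same str each shorthand class matches. countMatches is a hypothetical helper. Each class and its capital-letter opposite together cover every character, so each pair adds up to str.length (22).

```javascript
const str = 'Letters 0123, ^_.$%/]\n';

// Hypothetical helper: number of matches, or 0 when match returns null
const countMatches = (re) => (str.match(re) || []).length;

const counts = {
  word: countMatches(/\w/g),     // 12: 7 letters, 4 digits, 1 underscore
  nonWord: countMatches(/\W/g),  // 10
  digit: countMatches(/\d/g),    // 4
  nonDigit: countMatches(/\D/g), // 18
  space: countMatches(/\s/g),    // 3: two spaces and the newline
  nonSpace: countMatches(/\S/g), // 19
};
console.log(counts);
```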
Escape special characters "\":
- To match characters that happen to be regex special characters you must put the backslash escape character before them.
- Characters with special meaning in a regex (some only in certain positions) include:
/ \ . + - * ^ = ! | : ? $ ( ) [ ] { } < >
- In the below example we want to replace the dot character with a dash. Since the dot is also a regex special character, we need to escape it first with a backslash (\.). In the returned string the dot is replaced by a dash.
- In the last statement we want to replace the forward slash with a dash, so we escape it with a backslash (\/). In the returned string the forward slash is replaced by a dash.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/\./g, '-'); // returns 'Letters 0123, ^_-$%/]\n'
view = str.replace(/\//g, '-'); // returns 'Letters 0123, ^_.$%-]\n'
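- When the text to match comes from a variable, you can add the backslashes in code instead of by hand. The sketch below uses a hypothetical escapeRegExp helper (a widely used idiom, not a built-in function) that puts a backslash before every special character. In a replacement string, $& inserts the whole match.

```javascript
// Hypothetical helper: backslash-escape regex special characters in a string
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const price = '$5.00';

// Unescaped, $ is an end-of-string anchor, so this pattern can never match
const raw = new RegExp(price);
// Escaped, $ and . are matched literally
const safe = new RegExp(escapeRegExp(price));

console.log(raw.test('Total: $5.00'));  // false
console.log(safe.test('Total: $5.00')); // true
console.log(safe.test('Total: $5x00')); // false
```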
- Square brackets in a regular expression are used to search for one of multiple characters.
- There are regex special characters specific to square brackets: ^ (when it is the first character), ], \, and - (when it is between two characters).
- In the first replace statement below we want to search for the characters "^" and "]" and replace them with a dash. Since both have special meaning inside the brackets, we need to escape them first with a backslash. In the return value the caret and closing bracket are replaced with dashes.
- In the last example we want to replace all the special characters except the newline at the end with dashes. So we put them all in the square brackets, escaping only ^ and ]. The result is a string with all the special characters replaced by dashes except the \n newline character.
str = 'Letters 0123, ^_.$%/]\n';
view = str.replace(/[\^\]]/g, '-'); // returns 'Letters 0123, -_.$%/-\n'
view = str.replace(/[\^_.$%/\]]/g, '-'); // returns 'Letters 0123, -------\n'
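- The hyphen rule is easiest to see side by side. In this sketch (the sample string is made up), a hyphen between two characters creates a range, while a hyphen at the end of the brackets, or an escaped hyphen, is matched literally.

```javascript
const sample = 'a-z b';

// a-z is a range: every lowercase letter is replaced
const range = sample.replace(/[a-z]/g, '#');    // '#-# #'

// A trailing hyphen is literal: only a, z, and - are replaced
const literal = sample.replace(/[az-]/g, '#');  // '### b'

// Escaping the hyphen has the same effect as moving it to the end
const escaped = sample.replace(/[a\-z]/g, '#'); // '### b'
```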
5. Groups
- There are a few ways to group regex characters depending on the pattern you are trying to achieve.
Sequential Characters:
- If you want to match a set of one or more sequential characters just put them in your regex as is.
- In the example below we want to find the sequential characters "ers" and replace them with a dash so we just put them in the regex literal. The three characters are found as a group and replaced by a single dash.
let str = 'Letters 0123, _$%^/.';
let view = str.replace(/ers/, '-'); // returns 'Lett- 0123, _$%^/.'
Or "|":
- The OR operator is a single vertical bar. Use it to match either the pattern on its left or the pattern on its right.
- The below example matches 12 OR e. It doesn't have the global flag so it just finds the first match going left to right and replaces it with a dash.
- The next example includes the global flag so it finds all the matches and replaces them with dashes.
str = 'Letters 0123, _$%^/.';
view = str.replace(/12|e/, '-'); // returns 'L-tters 0123, _$%^/.'
view = str.replace(/12|e/g, '-'); // returns 'L-tt-rs 0-3, _$%^/.'
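- The | operator splits the whole pattern into alternatives, so /12|e/ means the sequence 12 or the letter e, not 1 followed by 2 or e. This sketch (the pets string is illustrative) uses match with the g flag to list what the alternatives found.

```javascript
const str = 'Letters 0123, _$%^/.';

// Matches are returned in the order they appear in the string
const found = str.match(/12|e/g);        // ['e', 'e', '12']

// Alternatives can be whole words
const pets = 'I have a cat, a dog, and a bird.';
const catOrDog = pets.match(/cat|dog/g); // ['cat', 'dog']
```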
Square brackets "[]":
- Use square brackets to match any one of the characters between the brackets. It can be a range, like the letters a-z.
- The first example below uses square brackets to look for an upper case letter with the range A-Z: [A-Z]. There's only one upper case letter, "L", so it gets replaced with a dash.
- The next example looks for an upper case letter A-Z, a digit character, or the dollar sign character: [A-Z\d$]. Those characters all get replaced by dashes.
- The last example matches e, s, 2, or a comma: [es2,], and replaces them with dashes.
str = 'Letters 0123, _$%^/.';
view = str.replace(/[A-Z]/g, '-'); // returns '-etters 0123, _$%^/.'
view = str.replace(/[A-Z\d$]/g, '-'); // returns '-etters ----, _-%^/.'
view = str.replace(/[es2,]/g, '-'); // returns 'L-tt-r- 01-3- _$%^/.'
- If the first character in the square brackets is the caret symbol, it matches any character NOT between the brackets. The below example matches all characters except e, s, 2, and the comma: [^es2,], and replaces them with dashes.
view = str.replace(/[^es2,]/g, '-'); // returns '-e--e-s---2-,-------'
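- A common practical use of a negated class is stripping unwanted characters. In this sketch (the phone string is made up), [^0-9] matches every character that is not a digit, and replacing those matches with an empty string leaves only the digits. [^0-9] is equivalent to \D.

```javascript
const raw = '(415) 555-1234';

// Remove every character that is not a digit
const digitsOnly = raw.replace(/[^0-9]/g, ''); // '4155551234'

// \D is shorthand for the same negated class
const same = raw.replace(/\D/g, '');           // '4155551234'
```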
6. Boundaries
String boundary "^" and "$":
- There are special characters for adding boundaries to the regex pattern.
- ^str: The caret symbol means the pattern must match at the beginning of the string.
- str$: The dollar symbol means the pattern must match at the end of the string.
- The below str variable is set to a string that contains "word" in it three times.
let str = 'word, word and word';
let view = str.replace(/^word/g, '-'); // returns '-, word and word';
view = str.replace(/word$/g, '-'); // returns 'word, word and -';
view = str.replace(/(^word$)/g, '-'); // returns 'word, word and word';
- The first replace method above uses the caret symbol so the pattern will only match the text at the start of the string. There is a match, and the pattern "word" at the beginning of the string is replaced by a single dash.
- The second regex uses the dollar symbol so it only matches text at the end of the string. The pattern "word" at the end is replaced by a single dash.
- The third regex uses both the caret and dollar symbols so the pattern must match the entire string from beginning to end. It doesn't match so there are no replacements.
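- Because ^ and $ together require the pattern to cover the entire string, they are the usual way to check that a whole value matches. The isExactlyWord function below is a hypothetical helper for illustration.

```javascript
// Hypothetical helper: true only when the whole string is exactly "word"
const isExactlyWord = (s) => /^word$/.test(s);

console.log(isExactlyWord('word'));                // true
console.log(isExactlyWord('word, word and word')); // false
console.log(isExactlyWord(' word'));               // false: leading space
```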
Word boundary "\b":
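- \b matches a word boundary: a position where a word character is next to a non-word character or to the start or end of the string. Put \b at the start or the end of a pattern to require the match to begin or end a word.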
str = 'cat, category, cat, concat';
view = str.replace(/\bcat/g, '-'); // returns '-, -egory, -, concat';
view = str.replace(/cat\b/g, '-'); // returns '-, category, -, con-';
view = str.replace(/\bcat\b/g, '-'); // returns '-, category, -, concat';
- The first example above places \b at the beginning of the pattern: /\bcat/. So it only matches if the word starts with "cat". It replaces "cat" with a dash in the words "cat" and "category" but not "concat".
- The second example uses \b at the end of the pattern: /cat\b/. So it only matches if the word ends with "cat". It replaces "cat" with a dash in the words "cat" and "concat" but not "category".
- The third example surrounds "cat" with \b: /\bcat\b/. So only the full word "cat" is replaced with a dash, not "category" or "concat".
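- Wrapping a pattern in \b on both sides is a simple way to count whole-word occurrences. countWord is a hypothetical helper; it builds the pattern with the RegExp constructor from section 1 and assumes word contains no regex special characters. Note the doubled backslash: inside a string, \\b produces the two characters \b.

```javascript
// Hypothetical helper; assumes `word` has no regex special characters
function countWord(text, word) {
  const re = new RegExp(`\\b${word}\\b`, 'g'); // '\\b' in a string becomes \b
  return (text.match(re) || []).length;
}

const str = 'cat, category, cat, concat';
console.log(countWord(str, 'cat'));    // 2
console.log(countWord(str, 'concat')); // 1
console.log(countWord(str, 'dog'));    // 0
```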
Not a word boundary "\B":
- Backslash capital B matches any position that is NOT a word boundary. Depending on where you put it, the pattern must not be at the beginning, or not at the end, of a word.
str = 'cat, category, cat, concat';
view = str.replace(/\Bcat/g, '-'); // returns 'cat, category, cat, con-';
view = str.replace(/cat\B/g, '-'); // returns 'cat, -egory, cat, concat';
- The first example above replaces "cat" with a dash only where it does NOT start a word: /\Bcat/. So "cat" and "category" are not changed, but the "cat" in "concat" is replaced with a dash.
- The last example replaces "cat" with a dash only where it does NOT end a word: /cat\B/. So "cat" and "concat" are not changed, but the "cat" in "category" is replaced with a dash.
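- You can also put \B on both sides to find a pattern only in the middle of a word. In this sketch (the sample words are made up), only the "cat" inside "concatenate" has word characters on both sides, so it is the only one replaced.

```javascript
const str = 'cat concat concatenate';

// \B on both sides: "cat" must have word characters before and after it
const middleOnly = str.replace(/\Bcat\B/g, '-'); // 'cat concat con-enate'
```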
7. Quantifiers
- Quantifiers are special regex characters that specify the number of characters or expressions to match.
Match zero or one "?":
- Placing a question mark after a character means you can match zero or one occurrence of that character. So it will match with or without that character.
- In the below example str is set to a string with the word "jump" that has different endings to the word.
- In the replace method the pattern is the word "jump" followed by "s?". That means it searches for "jump" with or without an "s" at the end, so it matches "jump" and "jumps", and replaces each match with a dash. In "jumpsssss" and "jumped" only the "jumps" and "jump" parts are replaced.
let str = 'jump jumps jumpsssss jumped';
let view = str.replace(/jumps?/g, '-'); // returns '- - -ssss -ed'
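- A typical use of ? is accepting an optional letter, such as the two common spellings of color. The sample sentence is illustrative.

```javascript
const text = 'The color or colour of the sky';

// u? means the u may appear zero or one time
const spellings = text.match(/colou?r/g); // ['color', 'colour']
```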
Match one or more "+":
- Placing the plus sign after a character will match one or more occurrences of the character.
- The below example matches "jump" followed by one or more "s" characters. So it replaces "jumps" and "jumpsssss" with a dash. It does not match "jump" or "jumped".
str = 'jump jumps jumpsssss jumped';
view = str.replace(/jumps+/g, '-'); // returns 'jump - - jumped'
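- The + quantifier is often combined with a character class. In this sketch (the sample sentence is made up), \d+ grabs each run of consecutive digits as one match instead of one digit at a time.

```javascript
const order = 'Order 12 has 3 items costing 450 dollars';

// Each match is the longest run of digits at that position
const numbers = order.match(/\d+/g); // ['12', '3', '450']

// Without +, each digit is a separate match
const digits = order.match(/\d/g);   // ['1', '2', '3', '4', '5', '0']
```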
Match zero or more "*":
- Placing an asterisk after a character will match zero, one, or more occurrences of that character.
- In the below example, placing an asterisk after the "s" in "jumps" matches "jump" followed by zero or more "s" characters. So it replaces "jump", "jumps", "jumpsssss", and the "jump" portion of "jumped" with a single dash each.
str = 'jump jumps jumpsssss jumped';
view = str.replace(/jumps*/g, '-'); // returns '- - - -ed'
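- The * quantifier is handy for optional whitespace. This sketch (the sample string is made up) splits a list on commas while \s* absorbs any number of spaces, including none, on either side of each comma.

```javascript
const list = 'apples ,pears,  plums';

// \s* matches zero or more whitespace characters around each comma
const items = list.split(/\s*,\s*/); // ['apples', 'pears', 'plums']
```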
Match sequence "{x}":
- To match a character a specific number of times, follow it with curly braces containing that number.
- The str variable below is set to a series of numbers, each followed by x characters.
- Below that, the replace method searches the string for 3 x's in a row and replaces each match with a dash.
str = '1x 2xx 3xxx 4xxxx 5xxxxx';
view = str.replace(/x{3}/g, '-'); // returns '1x 2xx 3- 4-x 5-xx'
- If you include a second number in the curly braces, it matches anywhere from the first number to the second number of occurrences. Quantifiers are greedy, so each match takes as many characters as it can. In the below example it matches 3 to 4 x's and replaces each match with a dash.
str = '1x 2xx 3xxx 4xxxx 5xxxxx';
view = str.replace(/x{3,4}/g, '-'); // returns '1x 2xx 3- 4- 5-x'
- If you put a comma after the first number but nothing after the comma, it matches at least that many occurrences with no maximum. The below example replaces 3 or more x's with a dash.
str = '1x 2xx 3xxx 4xxxx 5xxxxx';
view = str.replace(/x{3,}/g, '-'); // returns '1x 2xx 3- 4- 5-'
- Below is a more practical example. The phone variable is set to a phone number string. The statement tests that phone is in the following format:
- Starts with 3 digits: ^\d{3}
- Followed by a dash, period, or space: [-. ]
- Then 3 more digits: \d{3}
- A dash, period, or space: [-. ]
- And ends with 4 digits: \d{4}$
let phone = '415.555.1234';
const isValid = /^\d{3}[-. ]\d{3}[-. ]\d{4}$/.test(phone); // returns true
- The below example puts valid phone numbers in the same format, with the numbers separated by dashes. The replace method searches the string for dots or spaces, and replaces them with dashes.
if (isValid) {
phone = phone.replace(/[. ]/g, '-'); // returns '415-555-1234'
}
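- The validation and formatting steps above can be wrapped in one function. normalizePhone is a hypothetical name, and returning null for invalid input is one possible design choice; you could throw an error instead.

```javascript
// Hypothetical helper: validate, then convert separators to dashes
function normalizePhone(phone) {
  const isValid = /^\d{3}[-. ]\d{3}[-. ]\d{4}$/.test(phone);
  if (!isValid) return null;
  return phone.replace(/[. ]/g, '-');
}

const dotted = normalizePhone('415.555.1234'); // '415-555-1234'
const mixed = normalizePhone('415 555-1234');  // '415-555-1234'
const bare = normalizePhone('4155551234');     // null: no separators
```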
8. $1-$9 Substring Matches
- You can group parts of a regular expression into capture groups using parentheses. In the replacement string passed to replace, $1 refers to the text matched by the first group, $2 to the second, and so on. Below is an example.
let str = 'John Smith';
let view = str.replace(/(\w+)\s(\w+)/, 'First name: $1, Last name: $2'); // returns "First name: John, Last name: Smith"
- The variable str is set to the string "John Smith".
- The regex pattern first looks for a contiguous group of word characters. This part of the pattern is grouped in parentheses so it becomes regex group 1.
- Then comes a space.
- Then another contiguous group of word characters, also in parentheses, so it becomes group 2.
- Then comes the replacement string. You can use $1 and $2 to reference those grouped matches. "John Smith" is matched and replaced by "First name: John, Last name: Smith".
- The below example is similar. It finds the first and last names in the string, then displays them as "last name, first name".
str = 'John Smith';
view = str.replace(/(\w+)\s(\w+)/, '$2, $1'); // 'Smith, John'
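- Capture groups are also useful for reordering structured text. This sketch (the date string and target format are illustrative) swaps a year-month-day date into month/day/year order. The array returned by match also holds each group, at indexes 1, 2, and 3.

```javascript
const date = '2024-03-15';
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// $1 = year, $2 = month, $3 = day
const usFormat = date.replace(datePattern, '$2/$3/$1'); // '03/15/2024'

// The same groups are available from match at indexes 1 to 3
const [whole, year, month, day] = date.match(datePattern);
// whole: '2024-03-15', year: '2024', month: '03', day: '15'
```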
9. RegExp objects
- While Regex literals are preferred, you can create a regular expression using the RegExp constructor function, and pass in the pattern and flags as arguments.
- You can't use variables in regular expression literals so if you need to pass in a variable then you'll need to use the RegExp constructor.
- The below sets the str variable to "Hello Joey".
- The name variable is set to "Joey".
- To test whether the variable name is in the string we can use the test method. The test method must be applied to a regular expression instance. Variables cannot be used in a regex literal so we must use the RegExp constructor function.
let str = 'Hello Joey';
const name = 'Joey';
let view = new RegExp(name).test(str); // returns true
- Let's use the RegExp constructor function again. The below example calls the replace method on the str string. The first argument instantiates a new RegExp object so we can pass in the variable name as the pattern. The second argument replaces the match with "Johnny". The method returns "Hello Johnny".
view = str.replace(new RegExp(name), 'Johnny'); // returns 'Hello Johnny'
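- The constructor also takes flags, so a variable pattern can be replaced everywhere and regardless of case. replaceName is a hypothetical helper, and it assumes oldName contains no regex special characters.

```javascript
// Hypothetical helper: replace every occurrence of oldName, ignoring case
function replaceName(text, oldName, newName) {
  return text.replace(new RegExp(oldName, 'gi'), newName);
}

const greeting = replaceName('Hello Joey. joey, are you there?', 'Joey', 'Johnny');
// 'Hello Johnny. Johnny, are you there?'
```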
10. RegExp instance properties and methods
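- The RegExp prototype object contains the following instance properties and methods. The exact list depends on the JavaScript engine version; for example, newer engines also list unicodeSets.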
Object.getOwnPropertyNames(RegExp.prototype); // Returns: constructor, exec, dotAll, flags, global, hasIndices, ignoreCase, multiline, source, sticky, unicode, compile, toString, test
- You use these on a regular expression by chaining them to it: read a property such as /abc/g.global, or call a method such as /abc/.test(str).
Instance property
- The constructor property is a reference to the RegExp() constructor function:
/pattern/.constructor; // returns RegExp
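- Most of the other names in the list are read-only properties that report the pattern and its flags. A quick sketch of reading them (the pattern is illustrative):

```javascript
const re = /smith/gi;

const info = {
  source: re.source,                   // 'smith'
  flags: re.flags,                     // 'gi'
  global: re.global,                   // true
  ignoreCase: re.ignoreCase,           // true
  multiline: re.multiline,             // false
  text: re.toString(),                 // '/smith/gi'
  isRegExp: re.constructor === RegExp, // true
};
```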
Instance methods
- RegExp has two instance methods that can run regular expressions: test and exec.
RegExp.prototype.test(str)
- The test method tests if there is a match in a string, and returns true or false. We have already used this method in previous examples.
- The below example tests if /smith/i matches anywhere in the str variable.
- We chain the test method to the regular expression. The i flag makes it case insensitive. We pass in the str string to test against as the method's argument.
let str = 'John Smith';
let view = /smith/i.test(str); // returns true
RegExp.prototype.exec(str)
- The exec method is similar to test, except instead of returning true or false it returns an array of information on the first match, or null if there is no match.
- The array contains the matched characters, the index starting position of the match, the input string value, and a groups property with any named capture groups (undefined if the pattern has none).
str = 'John Smith';
view = /smith/i.exec(str); // ["Smith", index: 5, input: "John Smith", groups: undefined]
- The above example runs the exec method searching for "smith", case insensitive.
- It returns an array with the match string "Smith", the index starting point 5, the input string "John Smith" and groups undefined.
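- With the g flag, exec remembers where the last match ended (in the regex's lastIndex property), so calling it in a loop walks through every match. A sketch, with an illustrative pattern:

```javascript
const str = 'Hello world';
const re = /o/g;
const positions = [];

let match;
// exec returns null once there are no more matches
while ((match = re.exec(str)) !== null) {
  positions.push(match.index);
}
// positions: [4, 7]
```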
11. String Methods that use RegExp
- This section covers five string instance methods that use or can use regular expressions: match, matchAll, search, replace, and split.
String.prototype.match(regex)
- The string match method does essentially what the regex exec method does. It searches a string for a match against a regular expression. It returns an array object with match information for the first match. With the global flag it returns an array of all matched values. If no match it returns null.
- The str variable below is set to "Hello world".
- We call the match method on it to search for the pattern "world".
- It returns an array of information about the match including the matched text, the start index where it was found, the string input "Hello world", and groups undefined.
let str = 'Hello world';
let view = str.match(/world/);
// returns ["world", index: 6, input: "Hello world", groups: undefined]
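- A sketch of the other two cases described above, with the global flag and with no match:

```javascript
const str = 'Hello world';

// With g, match returns only the matched strings, without index or input
const allOs = str.match(/o/g);     // ['o', 'o']

// With no match, the result is null, so check before using it
const none = str.match(/xyz/);     // null
const count = (none || []).length; // 0
```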
String.prototype.matchAll(regex)
- The matchAll method returns an iterator (not an array) over all results of matching a string against a regular expression. Each result is the same kind of match array that the exec method returns.
- The below example searches the string with matchAll for the letter "l", and assigns the result to a variable named matches.
- The iterator yields three matches.
- The Array.from method converts the iterator into an array.
str = 'Hello world';
const matches = str.matchAll('l'); // returns an iterator that yields:
// ["l", index: 2, input: "Hello world", groups: undefined]
// ["l", index: 3, input: "Hello world", groups: undefined]
// ["l", index: 9, input: "Hello world", groups: undefined]
view = Array.from(matches);
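- A for...of loop reads the iterator directly without converting it. matchAll also accepts a regular expression; in that case the regex must have the g flag, otherwise matchAll throws a TypeError. Sketch:

```javascript
const str = 'Hello world';
const indexes = [];

// Each item is a match array like the ones exec returns
for (const m of str.matchAll(/l/g)) {
  indexes.push(m.index);
}
// indexes: [2, 3, 9]

let threw = false;
try {
  str.matchAll(/l/); // regex without the g flag
} catch (err) {
  threw = err instanceof TypeError; // true
}
```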
String.prototype.search(regex)
- The search method searches the string for a match with the regular expression. It returns the index position where the first occurrence of the regex occurs. It returns -1 if not found. The global option g is ignored.
- The below example searches for the pattern "world". The result is the index 6.
str = 'Hello world';
view = str.search(/world/); // returns 6
String.prototype.replace(regexpOrSubstr, newSubstr|function)
- We have used the replace method in many of the previous examples.
- The first argument is the search criteria. You can use either a string or a regular expression.
- The second argument is the replacement string, or a function that returns the replacement string.
- The return value is a new string with any replacements; the original string is not changed.
- The below example searches for "world" and replaces it with "planet".
str = 'Hello world';
view = str.replace(/world/, 'planet'); // returns 'Hello planet'
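- When the second argument is a function, replace calls it for each match and uses its return value. The function receives the matched text first, followed by any capture groups. This sketch uses illustrative transformations.

```javascript
// Capitalize the first letter of each word
const titled = 'hello world'.replace(/\b\w/g, (letter) => letter.toUpperCase());
// 'Hello World'

// Capture groups arrive as extra arguments after the full match
const swapped = 'John Smith'.replace(/(\w+)\s(\w+)/, (match, first, last) => `${last.toUpperCase()}, ${first}`);
// 'SMITH, John'
```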
String.prototype.split([separator[, limit]])
- The split method splits a string on a separator and returns an array. The separator can be a string or a regular expression.
- The below example splits the "Hello world" string. The separator is either a space or a comma: [ ,]. It returns an array with elements for "Hello" and "world".
str = 'Hello world';
view = str.split(/[ ,]/); // returns ['Hello', 'world']
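- A regex separator can cover several delimiters at once. In this sketch (the sample string is made up), [,;\s]+ splits on any run of commas, semicolons, or whitespace, and the optional second argument limits how many pieces are returned.

```javascript
const messy = 'red, green;blue  yellow';

// One or more commas, semicolons, or whitespace characters
const colors = messy.split(/[,;\s]+/);      // ['red', 'green', 'blue', 'yellow']

// The limit argument keeps only the first n elements
const firstTwo = messy.split(/[,;\s]+/, 2); // ['red', 'green']
```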
- And that concludes this tutorial on JavaScript regular expressions.
Conclusion
The topics in this tutorial correspond with the JavaScript CheatSheet Regular Expressions category. Make sure you understand each topic so you can refer back to the CheatSheet when working on your own projects. If you have the CheatSheet desktop app and downloaded the JavaScript CheatSheet, then go through the flashcards for this category to the point where you can answer them all in order and shuffled.