How to understand the search pattern that regex defines?
Summary
Regex stands for Regular Expression. A regex is a sequence of characters that defines a search pattern, which you can use to search through large amounts of text.
Use the following table of contents to learn more about a specific part of regex.
Table of Contents
- Components
- Anchors
- Recursive
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Boundaries
- Back-references
- Look-ahead and Look-behind
- Author
Components
Single Characters
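Any character from the keyboard matches itself,
except the operator characters:
`^ . [ $ ( ) | * + ? { \`
To match an operator character literally, put a backslash in front of it, for example \. or \?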
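As a minimal sketch of why operator characters need escaping, the snippet below compares an unescaped and an escaped `+`. The variable names are illustrative assumptions and do not come from the text above.

```javascript
// "+" is an operator character, so unescaped it acts as a quantifier.
const unescapedPlus = /1+1/;   // one or more "1", followed by another "1"
const escapedPlus = /1\+1/;    // "1", then a literal "+", then "1"

console.log(unescapedPlus.test("1+1")); // false
console.log(unescapedPlus.test("111")); // true
console.log(escapedPlus.test("1+1"));   // true
```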
Wild Card
The period by itself: .
It matches any single character except \n (newline).
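A small illustration of the wild card in JavaScript. The pattern `a.c` and the test strings are made up for this sketch.

```javascript
// The dot stands for any single character except a newline.
const wildcard = /a.c/;

console.log(wildcard.test("abc"));   // true
console.log(wildcard.test("a-c"));   // true
console.log(wildcard.test("a\nc"));  // false: "." does not match the newline
```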
Bracket Expressions
Represents a character set.
It matches any one character from the list enclosed by '[' and ']'.
Control characters
A backslash, \,
followed by one of the characters
`a, b, f, n, r, t, v`
matches the corresponding control character, for example \n for newline and \t for tab.
Escape character sets
Shorthand escape sequences that stand for a character set:
\d is a digit character: [0-9]
\w is a word (program identifier) character: [A-Za-z0-9_]
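The following sketch shows \d and \w at work on an invented sample string. The names `sample`, `digits`, and `words` are assumptions for illustration only.

```javascript
const sample = "user_42 paid $7";

// \d picks out each digit; \w+ picks out runs of word characters.
const digits = sample.match(/\d/g);
const words = sample.match(/\w+/g);

console.log(digits); // ["4", "2", "7"]
console.log(words);  // ["user_42", "paid", "7"]
```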
Anchors
Special sequences that match an empty substring:
^ matches at the beginning of the target string
$ matches at the end of the target string
\b matches at a word boundary
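A brief sketch of the three anchors, using the hypothetical word "cat" as the search target:

```javascript
const startsWithCat = /^cat/;   // "cat" at the beginning of the string
const endsWithCat = /cat$/;     // "cat" at the end of the string
const wholeWordCat = /\bcat\b/; // "cat" as a separate word

console.log(startsWithCat.test("catalog"));  // true
console.log(startsWithCat.test("bobcat"));   // false
console.log(endsWithCat.test("bobcat"));     // true
console.log(wholeWordCat.test("a cat sat")); // true
console.log(wholeWordCat.test("catalog"));   // false
```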
Recursive
Any regular expression surrounded by parentheses is an atom:
( regular_expression )
Quantifiers
Quantifiers specify how many times an atom may repeat, including an unbounded number of times.
An atom can optionally be followed by one of these quantifiers:
* represents 0 or more occurrences of the atom
+ represents 1 or more occurrences of the atom
? represents 0 or 1 occurrences of the atom
{n} represents exactly n occurrences of the atom
{m,n} represents between m and n occurrences of the atom
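Each quantifier can be tried on a tiny pattern. This is only a sketch: the atom `b` and the helper `matchesWhole` are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: wraps a pattern in ^...$ so the whole string must match.
function matchesWhole(pattern, text) {
  return new RegExp("^" + pattern + "$").test(text);
}

console.log(matchesWhole("ab*c", "ac"));         // true  (zero b's)
console.log(matchesWhole("ab+c", "ac"));         // false (needs at least one b)
console.log(matchesWhole("ab?c", "abbc"));       // false (at most one b)
console.log(matchesWhole("ab{2}c", "abbc"));     // true  (exactly two b's)
console.log(matchesWhole("ab{2,3}c", "abbbc"));  // true  (two to three b's)
console.log(matchesWhole("ab{2,3}c", "abbbbc")); // false (four b's)
```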
OR Operator
The | operator character separates alternatives: a|b matches either a or b.
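A minimal sketch of the `|` operator character used for alternation. The pattern and the name `pet` are illustrative assumptions.

```javascript
// Matches "cat" or "dog", optionally followed by "s".
const pet = /^(cat|dog)s?$/;

console.log(pet.test("cats")); // true
console.log(pet.test("dog"));  // true
console.log(pet.test("bird")); // false
```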
Character Classes
A character class defines the type of character to match, such as \d for digits or \w for word characters.
Flags
A regular expression consists of a pattern and optional flags that change how the search is performed.
regexp = new RegExp("pattern", "flags");
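To make the effect of flags concrete, here is a sketch using two standard JavaScript flags, `g` (find all matches) and `i` (ignore case). The sample text and variable names are assumptions for the example.

```javascript
const text = "Regex, regex, REGEX";

// No flags: case-sensitive, first match only.
const plain = new RegExp("regex");
// "g" finds every match, "i" ignores case.
const globalIgnoreCase = new RegExp("regex", "gi");

console.log(text.match(plain)[0]);        // "regex"
console.log(text.match(globalIgnoreCase)); // ["Regex", "regex", "REGEX"]
```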
Grouping and Capturing
A way to treat multiple characters as a single unit.
(regex)
// This lets you apply regex operators, such as quantifiers, to the entire group, and it captures the matched text for later use.
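The sketch below shows both uses of parentheses: applying a quantifier to a whole group, and reading back what each group captured. The date format and names are invented for illustration.

```javascript
// Grouping: the "+" applies to the whole unit "ab".
const repeatedAb = /^(ab)+$/;
console.log(repeatedAb.test("ababab")); // true
console.log(repeatedAb.test("aba"));    // false

// Capturing: each pair of parentheses stores the text it matched.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateMatch = "Released 2024-03-15".match(datePattern);
console.log(dateMatch[0]); // "2024-03-15" (the whole match)
console.log(dateMatch[1]); // "2024"
console.log(dateMatch[2]); // "03"
console.log(dateMatch[3]); // "15"
```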
Bracket Expressions
Is a list of characters enclosed by:
'[' and ']'
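A few bracket expressions in action, as a sketch. It also uses ranges such as `a-f` and negation with a leading `^`, which are standard bracket-expression features even though the text above does not list them. The sample strings are assumptions.

```javascript
// A plain list: any one of the listed vowels.
const vowels = "regular".match(/[aeiou]/g);
console.log(vowels); // ["e", "u", "a"]

// Ranges: digits and the letters a to f.
const isHex = /^[0-9a-f]+$/;
console.log(isHex.test("c0ffee")); // true
console.log(isHex.test("coffee")); // false ("o" is outside the set)

// Negation: a leading ^ inside the brackets means "any character NOT listed".
const digitsOnly = "a1b2".replace(/[^0-9]/g, "");
console.log(digitsOnly); // "12"
```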
Greedy and Lazy Match
'Greedy' means match the longest possible string. Quantifiers are greedy by default.
'Lazy' means match the shortest possible string. Adding ? after a quantifier makes it lazy.
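The difference shows up when the same pattern is run with `.*` (greedy) and `.*?` (lazy). This is a sketch on an invented HTML-like string.

```javascript
const html = "<b>one</b><b>two</b>";

// Greedy: .* grabs as much as possible, so the match runs to the last </b>.
const greedy = html.match(/<b>.*<\/b>/)[0];
// Lazy: .*? grabs as little as possible, so the match stops at the first </b>.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // "<b>one</b><b>two</b>"
console.log(lazy);   // "<b>one</b>"
```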
Boundaries
A word boundary (\b) is a position between
\w and \W (non-word char),
or at the beginning or the end of a string if it begins or ends (respectively) with a word character.
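One way to see where word boundaries fall is to insert a marker at every `\b` position. The marker `|` and the sample strings are arbitrary choices for this sketch.

```javascript
// \b matches a position, not a character, so replacing it inserts text.
const marked = "hello world".replace(/\b/g, "|");
console.log(marked); // "|hello| |world|"

// A string that starts and ends with spaces has no boundary at its edges.
const padded = "  hi  ".replace(/\b/g, "|");
console.log(padded); // "  |hi|  "
```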
Back-references
A backreference in regex identifies a previously matched group and looks for exactly the same text again.
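As a sketch, a back-reference such as `\1` can detect an accidentally doubled word. The sentence and names below are hypothetical.

```javascript
// (\w+) captures a word; \1 requires exactly the same text again.
const doubledWord = /\b(\w+) \1\b/;

const found = "this is is a test".match(doubledWord);
console.log(found[0]); // "is is"
console.log(found[1]); // "is"

console.log(doubledWord.test("this is a test")); // false
```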
Look-ahead and Look-behind
Also known as “lookaround”
(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind
(?>) - atomic group (not a lookaround, but written with similar syntax)
Given the string foobarbarfoo:
bar(?=bar) finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar) finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar finds the 2nd bar ("bar" which does not have "foo" before it)
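The four lookaround patterns can be checked in JavaScript. This sketch assumes the sample string foobarbarfoo, in which the 1st bar starts at index 3 and the 2nd at index 6. It also assumes an engine with lookbehind support (ES2018 or later).

```javascript
const subject = "foobarbarfoo"; // assumed sample string

// search() returns the index where the first match starts.
console.log(subject.search(/bar(?=bar)/));  // 3 (1st bar: followed by "bar")
console.log(subject.search(/bar(?!bar)/));  // 6 (2nd bar: not followed by "bar")
console.log(subject.search(/(?<=foo)bar/)); // 3 (1st bar: preceded by "foo")
console.log(subject.search(/(?<!foo)bar/)); // 6 (2nd bar: not preceded by "foo")
```

Lookarounds check the surrounding text without including it in the match, so in every case the matched text is just "bar".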