8.8 regexp — Specification
Reference material. Method matrix, syntax tables, and the formal guarantees of the RE2-based engine. For prose explanations see senior.md; for production patterns see professional.md.
1. Construction
| Function | Returns | Behavior on bad pattern |
| Compile(expr string) | (*Regexp, error) | Returns error |
| MustCompile(expr string) | *Regexp | Panics |
| CompilePOSIX(expr string) | (*Regexp, error) | Returns error; rejects Perl features |
| MustCompilePOSIX(expr string) | *Regexp | Panics; rejects Perl features |
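CompilePOSIX restricts the syntax to POSIX ERE and enables leftmost-longest semantics. (*Regexp).Longest() switches a Compile-built regex to leftmost-longest matching while keeping the Perl syntax; call it before the regex is shared between goroutines.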
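A minimal sketch of how the four constructors are typically divided up. The helper names compileUserPattern and posixAccepts are hypothetical, chosen for illustration only; the regexp calls are the standard ones listed in the table above.

```go
package main

import (
	"fmt"
	"regexp"
)

// datePattern is fixed at build time, so a bad pattern is a programmer
// error and MustCompile's panic during initialization is acceptable.
var datePattern = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)

// compileUserPattern (hypothetical helper) handles patterns that arrive at
// run time: it returns the error instead of panicking.
func compileUserPattern(expr string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", expr, err)
	}
	return re, nil
}

// posixAccepts (hypothetical helper) reports whether CompilePOSIX accepts expr.
func posixAccepts(expr string) bool {
	_, err := regexp.CompilePOSIX(expr)
	return err == nil
}

func main() {
	fmt.Println(datePattern.MatchString("2024-01-31"))

	_, err := compileUserPattern(`a(b`) // unbalanced parenthesis
	fmt.Println(err != nil)

	// [0-9]+ is valid POSIX ERE; \d+ is a Perl extension.
	fmt.Println(posixAccepts(`[0-9]+`), posixAccepts(`\d+`))
}
```

Keeping MustCompile for package-level patterns and Compile for anything user-supplied keeps panics out of request paths.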
2. Method matrix
For an input of type T (either string or []byte):
| Operation | bool | first match | all matches | submatch | submatch all |
| string in | MatchString | FindString | FindAllString | FindStringSubmatch | FindAllStringSubmatch |
| []byte in | Match | Find | FindAll | FindSubmatch | FindAllSubmatch |
| io.RuneReader in | MatchReader | FindReaderIndex (indexes only) | — | FindReaderSubmatchIndex (indexes only) | — |
| Index variants | — | FindStringIndex / FindIndex | FindAllStringIndex / FindAllIndex | FindStringSubmatchIndex / FindSubmatchIndex | FindAllStringSubmatchIndex / FindAllSubmatchIndex |
Replace operations (string and []byte forms):
| Operation | Form |
| Replace by template ($ references expanded) | ReplaceAllString(src, repl) / ReplaceAll(src, repl) |
| Replace by literal (no $ interpretation) | ReplaceAllLiteralString / ReplaceAllLiteral |
| Replace by callback (whole match) | ReplaceAllStringFunc(src, fn) / ReplaceAllFunc(src, fn) |
| Apply replacement template to one match | ExpandString(dst, tpl, src, match) / Expand(dst, tpl, src, match) |
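The four replace forms can be compared side by side. This is a small sketch; the pattern and the helper names (swapTemplate, swapLiteral, upperMatches, expandFirst) are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var kv = regexp.MustCompile(`(\w+)=(\w+)`)

// swapTemplate expands $1 and $2 in the replacement.
func swapTemplate(s string) string { return kv.ReplaceAllString(s, "$2=$1") }

// swapLiteral inserts the replacement text verbatim, dollar signs included.
func swapLiteral(s string) string { return kv.ReplaceAllLiteralString(s, "$2=$1") }

// upperMatches passes each whole match to a callback.
func upperMatches(s string) string { return kv.ReplaceAllStringFunc(s, strings.ToUpper) }

// expandFirst applies a template to the first match only.
func expandFirst(s, tpl string) string {
	m := kv.FindStringSubmatchIndex(s)
	if m == nil {
		return ""
	}
	return string(kv.ExpandString(nil, tpl, s, m))
}

func main() {
	const in = "a=1 b=2"
	fmt.Println(swapTemplate(in))            // 1=a 2=b
	fmt.Println(swapLiteral(in))             // $2=$1 $2=$1
	fmt.Println(upperMatches(in))            // A=1 B=2
	fmt.Println(expandFirst(in, "$1 -> $2")) // a -> 1
}
```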
Splitting:
| Method | Returns |
| Split(s string, n int) | []string; n < 0 means no cap, n == 0 returns nil, n > 0 returns at most n substrings (the last is the unsplit remainder) |
Pattern introspection:
| Method | Purpose |
| String() | The source pattern as compiled |
| NumSubexp() | Count of capturing groups |
| SubexpNames() | Slice of names (index 0 always "") |
| SubexpIndex(name) | Numeric index for a name, or -1 |
| LiteralPrefix() | (prefix, complete): the literal string every match must begin with; complete reports whether that literal is the whole pattern |
| Longest() | Switch to leftmost-longest semantics (mutating) |
| Copy() | Added in Go 1.6; deprecated because it is unnecessary as of Go 1.12. Do not use |
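The introspection methods are easiest to read against a concrete pattern. A brief sketch follows; the pattern and the helper namedGroup are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var userRe = regexp.MustCompile(`user-(?P<id>\d+)@(?P<host>\w+)`)

// namedGroup returns the text captured by the named group, or "" when the
// input does not match or the name is unknown.
func namedGroup(s, name string) string {
	m := userRe.FindStringSubmatch(s)
	idx := userRe.SubexpIndex(name)
	if m == nil || idx < 0 {
		return ""
	}
	return m[idx]
}

func main() {
	fmt.Println(userRe.NumSubexp())          // count of capturing groups
	fmt.Printf("%q\n", userRe.SubexpNames()) // index 0 stands for the whole match
	prefix, complete := userRe.LiteralPrefix()
	fmt.Printf("%q %v\n", prefix, complete)
	fmt.Println(namedGroup("user-42@example", "host"))
}
```

Looking groups up by name through SubexpIndex keeps callers stable when groups are later reordered in the pattern.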
3. FindAll* cap argument
| n | Meaning |
| n < 0 | All non-overlapping matches |
| n > 0 | At most n matches; stop early |
| n == 0 | Returns nil (zero matches requested) |
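A short sketch of the three cases in the table, using a hypothetical helper firstN around FindAllString.

```go
package main

import (
	"fmt"
	"regexp"
)

var digits = regexp.MustCompile(`\d+`)

// firstN returns up to n matches; a negative n means no limit.
func firstN(s string, n int) []string {
	return digits.FindAllString(s, n)
}

func main() {
	const in = "1 22 333"
	fmt.Println(firstN(in, -1))       // [1 22 333]
	fmt.Println(firstN(in, 2))        // [1 22]
	fmt.Println(firstN(in, 0) == nil) // true
}
```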
4. Submatch index encoding
FindSubmatchIndex(src) returns nil when there is no match; otherwise it returns a []int of length 2*(NumSubexp()+1):
| Index pair | Meaning |
| [0:2] | Start/end of whole match |
| [2:4] | Start/end of capture group 1 |
| [4:6] | Start/end of capture group 2 |
| ... | ... |
A capture that did not participate is encoded as -1, -1. Always check >= 0 before slicing.
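The encoding above can be decoded with a simple loop. This is an illustrative sketch; rangeRe and groups are hypothetical names, and the "<unset>" marker is an arbitrary choice for groups that did not participate.

```go
package main

import (
	"fmt"
	"regexp"
)

// The second group is optional, so it may not participate in a match.
var rangeRe = regexp.MustCompile(`(\d+)(?:-(\d+))?`)

// groups decodes the index pairs into strings, marking unset groups.
func groups(s string) []string {
	loc := rangeRe.FindStringSubmatchIndex(s)
	if loc == nil {
		return nil
	}
	out := make([]string, 0, len(loc)/2)
	for i := 0; i < len(loc); i += 2 {
		if loc[i] < 0 { // -1, -1: the group did not participate
			out = append(out, "<unset>")
			continue
		}
		out = append(out, s[loc[i]:loc[i+1]])
	}
	return out
}

func main() {
	fmt.Printf("%q\n", groups("10-20")) // ["10-20" "10" "20"]
	fmt.Printf("%q\n", groups("10"))    // ["10" "10" "<unset>"]
}
```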
5. Replacement-string syntax
| Syntax | Meaning |
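| $0 | Whole match |
| $N (N >= 1) | Capture group N. The reference name is the longest run of letters, digits, and underscores after the $, so $10 means group 10, never group 1 followed by 0; a reference to a group that does not exist expands to the empty string |
| ${N} | Capture group N (explicit boundary) |
| ${name} | Named capture |
| $$ | Literal $ |
| $Nabc | Parsed as ${Nabc}, a named group that normally does not exist, so it expands to the empty string; write ${N}abc instead |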
ReplaceAllLiteralString skips this interpretation entirely.
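The boundary rules for $ references are a common source of silently empty output. A minimal sketch, with a hypothetical helper tpl that applies a template to a one-group pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

var word = regexp.MustCompile(`(\w+)`)

// tpl replaces every match in s using the template t.
func tpl(s, t string) string {
	return word.ReplaceAllString(s, t)
}

func main() {
	fmt.Printf("%q\n", tpl("go", "${1}fmt")) // "gofmt"
	fmt.Printf("%q\n", tpl("go", "$1fmt"))   // "": read as ${1fmt}, no such group
	fmt.Printf("%q\n", tpl("go", "$10"))     // "": read as group 10, no such group
	fmt.Printf("%q\n", tpl("go", "$$1"))     // "$1": $$ is a literal dollar sign
	fmt.Printf("%q\n", tpl("go", "$0!"))     // "go!"
}
```

Using the braced form whenever a reference is followed by a letter, digit, or underscore avoids the ambiguity entirely.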
6. Pattern syntax — characters and escapes
| Syntax | Matches |
| literal char | itself |
| . | any char except \n (any char including \n with (?s)) |
| [xyz] | any of x, y, z |
| [^xyz] | not in set |
| [a-z] | range |
| \d | [0-9] |
| \D | [^0-9] |
| \s | [\t\n\f\r ] |
| \S | [^\t\n\f\r ] |
| \w | [0-9A-Za-z_] |
| \W | [^0-9A-Za-z_] |
| \pX / \PX | Unicode property X / not X (one-letter) |
| \p{Name} / \P{Name} | Unicode property by full name |
| \xFF | rune by two-digit hex code (the code point U+00FF, not the raw byte 0xFF) |
| \x{10FFFF} | rune by hex |
| \Q...\E | literal text (everything between is literal) |
| \\, \., \*, ... | metacharacters as literals |
7. Pattern syntax — operators
| Syntax | Meaning |
| xy | concatenation |
| x\|y | alternation |
| x* | zero or more (greedy) |
| x+ | one or more (greedy) |
| x? | zero or one (greedy) |
| x{n} | exactly n |
| x{n,} | n or more |
| x{n,m} | between n and m inclusive |
| x*?, x+?, x??, x{n,m}? | non-greedy variants |
| (re) | numbered capturing group |
| (?P<name>re) | named capturing group ((?<name>re) is also accepted since Go 1.22) |
| (?:re) | non-capturing group |
| (?flags) | set flags from this point |
| (?flags:re) | set flags scoped to this group |
Flags (any combination, prefix with - to clear):
| Flag | Effect |
| i | case-insensitive |
| m | multi-line: ^ and $ also match at the start and end of each line |
| s | let . match \n |
| U | ungreedy: swap the meaning of x* and x*?, x+ and x+?, and so on |
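The flag effects can be checked directly. This sketch uses two hypothetical helpers, matches and find, that compile the pattern on each call for brevity.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// find returns the first match of pattern in s.
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(matches(`(?i)hello`, "HeLLo"))                     // true
	fmt.Println(matches(`a.b`, "a\nb"), matches(`(?s)a.b`, "a\nb")) // false true
	fmt.Println(find(`a+`, "aaa"), find(`(?U)a+`, "aaa"))           // aaa a
	fmt.Println(find(`(?U)a+?`, "aaa"))                             // aaa

	// A scoped flag applies only inside its group.
	fmt.Println(matches(`(?i:go)Lang`, "GOLang"), matches(`(?i:go)Lang`, "GOlang")) // true false
}
```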
8. Anchors
| Anchor | Meaning |
| ^ | beginning of text (or line with (?m)) |
| $ | end of text, like \z (or end of line with (?m)) |
| \A | beginning of text (always) |
| \z | end of text (always) |
| \b | ASCII word boundary (\w on one side, \W or a text edge on the other) |
| \B | not an ASCII word boundary |
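A quick sketch of how (?m) changes ^ and $ while \A stays fixed; findAll is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	const text = "one\ntwo"
	fmt.Println(findAll(`^\w+`, text))           // [one]
	fmt.Println(findAll(`(?m)^\w+`, text))       // [one two]
	fmt.Println(findAll(`(?m)\A\w+`, text))      // [one]
	fmt.Println(findAll(`\w+$`, text))           // [two]
	fmt.Println(findAll(`\bcat\b`, "concat cat")) // [cat]
}
```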
9. Unicode property classes
The available classes are the Unicode general categories and scripts known to the unicode package of the Go release in use:
| Class | Examples |
| General categories | \p{L}, \p{Ll}, \p{Lu}, \p{N}, \p{Nd}, \p{P}, \p{S}, \p{Z}, \p{C}, \p{M} |
| Scripts | \p{Latin}, \p{Cyrillic}, \p{Greek}, \p{Han}, \p{Arabic}, \p{Hiragana}, \p{Katakana} |
\PX is the negation of \pX. The Perl and POSIX class tables are in regexp/syntax/perl_groups.go; the Unicode category and script tables come from the unicode package (unicode.Categories and unicode.Scripts).
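A few of the classes above in use. The helper findAll and the sample strings are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	fmt.Println(findAll(`\p{Greek}+`, "alpha αβγ beta")) // [αβγ]
	fmt.Println(findAll(`\p{Lu}`, "Go Is Fun"))          // [G I F]
	fmt.Println(findAll(`\pN+`, "v1.20"))                // [1 20]
	fmt.Println(findAll(`\PL+`, "ab12cd"))               // [12]
}
```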
10. Match-time guarantees
| Property | Guarantee |
| Time complexity | O(input × pattern), regardless of pattern shape |
| Space complexity | O(pattern) for match state |
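| Backtracking | No unbounded backtracking; the engine simulates an NFA, and for small pattern/input pairs may use a memoized backtracker that visits each (instruction, position) pair at most once |
| ReDoS | No input produces super-linear time for a fixed pattern; a very large pattern on a very large input can still be slow in absolute terms |
| UTF-8 handling | Patterns and inputs are UTF-8; . matches one rune; each byte of invalid UTF-8 input is treated as U+FFFD |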
| Determinism | Same *Regexp + same input always returns the same match |
11. Concurrency
| Method | Concurrent calls on same *Regexp |
| Match*, Find*, ReplaceAll*, Split | Safe |
| LiteralPrefix, NumSubexp, SubexpNames, SubexpIndex, String | Safe |
| Longest | Not safe to call concurrently with matches |
| Copy | Deprecated; do not use |
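Because the matching methods are safe for concurrent use, one compiled *Regexp can be shared by many goroutines without copying or locking. A minimal sketch, with a hypothetical helper countNumbers:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// numberRe is compiled once and shared by every goroutine.
var numberRe = regexp.MustCompile(`\d+`)

// countNumbers counts matches across inputs, one goroutine per input.
func countNumbers(inputs []string) int {
	counts := make([]int, len(inputs)) // each goroutine writes its own slot
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			counts[i] = len(numberRe.FindAllString(in, -1))
		}(i, in)
	}
	wg.Wait()

	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	fmt.Println(countNumbers([]string{"a1b22", "333", "x"})) // 3
}
```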
12. Sentinel errors
| Error | Source | Meaning |
| *syntax.Error (package regexp/syntax; there is no regexp.Error type) | Compile | Pattern syntax invalid |
| regexp/syntax.Error{Code: ErrInvalidEscape} | Compile | Bad \ escape, including backreferences such as \1 |
| regexp/syntax.Error{Code: ErrInvalidCharClass} | Compile | Bad character class |
| regexp/syntax.Error{Code: ErrInvalidPerlOp} | Compile | Used unsupported Perl syntax such as lookaround, (?=re) or (?!re) |
| regexp/syntax.Error{Code: ErrInvalidRepeatOp} | Compile | Nested repetition operator (a**) |
| regexp/syntax.Error{Code: ErrInvalidRepeatSize} | Compile | Repeat count above 1000, or min greater than max (x{5,2}); x{,} is not an error and is parsed as literal text |
The Error.Code is comparable; the Error.Expr field is the substring that triggered the error.
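Because Code is comparable, callers can branch on it after unwrapping the error. The sketch below uses a hypothetical helper classify that maps a pattern to the name of the code it triggers; the sample patterns are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"regexp/syntax"
)

// classify compiles expr and names the syntax error code, or "ok".
func classify(expr string) string {
	_, err := regexp.Compile(expr)
	if err == nil {
		return "ok"
	}
	var se *syntax.Error
	if !errors.As(err, &se) {
		return "other"
	}
	switch se.Code {
	case syntax.ErrInvalidEscape:
		return "ErrInvalidEscape"
	case syntax.ErrInvalidPerlOp:
		return "ErrInvalidPerlOp"
	case syntax.ErrInvalidRepeatOp:
		return "ErrInvalidRepeatOp"
	case syntax.ErrInvalidRepeatSize:
		return "ErrInvalidRepeatSize"
	default:
		return string(se.Code) // the code's own message text
	}
}

func main() {
	for _, expr := range []string{`(?=x)`, `(a)\1`, `a**`, `x{5,2}`, `x{1001}`, `x{,}`} {
		fmt.Printf("%-8s %s\n", expr, classify(expr))
	}
}
```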
13. regexp/syntax — pattern AST
| Type | Purpose |
| Regexp | Parsed expression node (the AST form, not the compiled *regexp.Regexp) |
| Op (uint8) | Operator type (OpLiteral, OpCharClass, OpStar, ...) |
| Prog | Compiled NFA program |
| Inst | One instruction in the program |
| Flags (uint16) | Parser flags |
| Function | Purpose |
| Parse(s, flags) | Parse to AST |
| (*Regexp).String() | Canonical string form |
| (*Regexp).Simplify() | AST normalization |
| Compile(*Regexp) | AST to Prog; the AST should already be simplified |
| IsWordChar(r) | Whether r is in \w |
| Parser flag | Meaning |
| FoldCase | (?i) |
| Literal | Pattern is a literal string |
| ClassNL | Allow character classes such as [^a-z] to match newline |
| DotNL | (?s) |
| OneLine | ^ and $ match only at the beginning and end of text; (?m) clears this flag |
| NonGreedy | Default to non-greedy, (?U) |
| PerlX | Allow Perl extensions |
| UnicodeGroups | Allow \p{Name} and \P{Name} |
| WasDollar | Internal: the end-of-text node was written as $, not \z |
| Simple | Internal: the expression contains no counted repetition |
| MatchNL | Shorthand for ClassNL \| DotNL |
| Perl | ClassNL \| OneLine \| PerlX \| UnicodeGroups |
| POSIX | 0 (no extensions) |
regexp.Compile uses Perl. regexp.CompilePOSIX uses POSIX.
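The Parse, Simplify, Compile pipeline can be driven by hand to inspect what a pattern turns into. This is a sketch under the assumption that only the instruction count is of interest; programSize and parsesAsPOSIX are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// programSize parses expr with the Perl flag set (the one regexp.Compile
// uses), simplifies the AST, compiles it, and returns the instruction count.
func programSize(expr string) (int, error) {
	ast, err := syntax.Parse(expr, syntax.Perl)
	if err != nil {
		return 0, err
	}
	prog, err := syntax.Compile(ast.Simplify())
	if err != nil {
		return 0, err
	}
	return len(prog.Inst), nil
}

// parsesAsPOSIX reports whether expr is valid under the POSIX flag set.
func parsesAsPOSIX(expr string) bool {
	_, err := syntax.Parse(expr, syntax.POSIX)
	return err == nil
}

func main() {
	for _, expr := range []string{`a{5}`, `a{50}`, `(?i)go|rust`} {
		n, err := programSize(expr)
		fmt.Println(expr, n, err)
	}
	fmt.Println(parsesAsPOSIX(`[[:digit:]]+`), parsesAsPOSIX(`\d+`))
}
```

Counted repetitions are expanded during simplification, which is why a{50} yields a noticeably larger program than a{5}.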
14. POSIX vs default differences
| Aspect | Default (Compile) | POSIX (CompilePOSIX) |
| Match selection | Leftmost-first | Leftmost-longest |
| Allowed syntax | All Perl extensions | POSIX ERE only |
| \d, \w, \s | Allowed | Rejected |
| (?i), (?m), (?s) | Allowed | Rejected |
| (?P<name>...) | Allowed | Rejected |
| (?:...) | Allowed | Rejected |
| Backreferences | Rejected | Rejected |
| Lookaround | Rejected | Rejected |
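The match-selection row is the one most likely to change results. A small sketch with the alternation a|ab, where the two rules pick different matches; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	perlAlt  = regexp.MustCompile(`a|ab`)      // leftmost-first
	posixAlt = regexp.MustCompilePOSIX(`a|ab`) // leftmost-longest
)

// firstPerl returns the leftmost-first match.
func firstPerl(s string) string { return perlAlt.FindString(s) }

// firstPOSIX returns the leftmost-longest match.
func firstPOSIX(s string) string { return posixAlt.FindString(s) }

// firstLongest builds a fresh Perl-syntax regexp and switches it to
// leftmost-longest before use, so no shared regexp is mutated.
func firstLongest(s string) string {
	re := regexp.MustCompile(`a|ab`)
	re.Longest()
	return re.FindString(s)
}

func main() {
	fmt.Println(firstPerl("ab"))    // a
	fmt.Println(firstPOSIX("ab"))   // ab
	fmt.Println(firstLongest("ab")) // ab
}
```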
15. Empty-match semantics
| Situation | Behavior |
| FindAll* with a pattern that can match empty | After an empty match the cursor advances by one rune to avoid an infinite loop; an empty match immediately after the previous match is skipped |
| Split with empty pattern | Splits between every UTF-8 rune |
| ReplaceAll* of empty matches | Replaces each, including the synthesized empty matches |
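The three rows can be observed with patterns that match the empty string. A minimal sketch; the helper names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitRunes splits s with an empty pattern, which cuts between runes.
func splitRunes(s string) []string {
	return regexp.MustCompile(``).Split(s, -1)
}

// dashEveryGap replaces every (empty) match of x* with a dash.
func dashEveryGap(s string) string {
	return regexp.MustCompile(`x*`).ReplaceAllString(s, "-")
}

// aStarMatches returns all matches of a*, including empty ones.
func aStarMatches(s string) []string {
	return regexp.MustCompile(`a*`).FindAllString(s, -1)
}

func main() {
	fmt.Printf("%q\n", splitRunes("héllo"))   // ["h" "é" "l" "l" "o"]
	fmt.Printf("%q\n", dashEveryGap("abc"))   // "-a-b-c-"
	fmt.Printf("%q\n", aStarMatches("baaab")) // ["" "aaa" ""]
}
```

In the last call the empty match directly after "aaa" is dropped, so only the empty matches at the start and end of the input survive.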
16. Allocation profile of common methods
| Method | Allocations |
| MatchString(s) | 0 (in fast path) |
| Match(b) | 0 (in fast path) |
| MatchReader(r) | 0 (depends on reader) |
| FindString(s) | 0 (the result is a substring sharing the input's memory) |
| Find(b) | 0 (returns a sub-slice) |
| FindStringIndex(s) | 1 (the []int{start,end}) |
| FindIndex(b) | 1 |
| FindStringSubmatch(s) | 1-2 (one result slice of length 1 + N, N = NumSubexp; the elements share the input's memory) |
| FindSubmatch(b) | 1-2 |
| FindAllString(s, -1) | roughly 1 + matches |
| FindAllStringSubmatchIndex(s, -1) | roughly 1 + matches |
| ReplaceAllString(s, repl) | 1-2 (output buffer and result string) |
These counts are indicative and can vary between Go releases; measure with testing.AllocsPerRun or a benchmark before relying on them.
The package internally maintains a sync.Pool of match-state buffers, so the per-call state is amortized to zero across many calls.
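Allocation counts like those in the table can be checked on a given Go release with testing.AllocsPerRun. A rough sketch; the pattern, input, and the helper allocsPer are illustrative, and the printed numbers are whatever the local toolchain produces.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

var pairRe = regexp.MustCompile(`(\w+)@(\w+)`)

// allocsPer reports the average number of allocations per call of f.
func allocsPer(f func()) float64 {
	return testing.AllocsPerRun(100, f)
}

func main() {
	const input = "contact: alice@example"
	fmt.Println("MatchString:       ", allocsPer(func() { pairRe.MatchString(input) }))
	fmt.Println("FindStringIndex:   ", allocsPer(func() { pairRe.FindStringIndex(input) }))
	fmt.Println("FindStringSubmatch:", allocsPer(func() { pairRe.FindStringSubmatch(input) }))
}
```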
17. Regexp cost in pattern complexity
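Compile cost is roughly O(pattern_length²) in the worst case. It stays bounded because the parser rejects oversized patterns before they become pathological: repeat counts above 1000 fail with syntax.ErrInvalidRepeatSize, and excessively deep or large expressions fail with syntax.ErrNestingDepth (Go 1.19+) or syntax.ErrLarge (Go 1.20+).
The match-time program size grows linearly with pattern length once counted repetitions are expanded (x{n} contributes n copies of x), with small constants. A typical pattern compiles to 10-50 instructions; a large alternation can produce thousands.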
18. Cross-references in this leaf