monotone documentation: Regexp Summary

7.4.1 Regexp Syntax Summary

This is a quick-reference summary of the regular expression syntax used in Monotone.

Quoting

\x: where x is non-alphanumeric is a literal x
\Q...\E: treat enclosed characters as literal
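
As a rough illustration of quoting, the sketch below uses Python's re module as a stand-in; this is an assumption for demonstration only, since Monotone uses PCRE. Python's re has no ‘\Q...\E’, so re.escape() is swapped in to show the same idea of treating a whole string literally.

```python
import re

# A backslash before a non-alphanumeric character makes it literal.
literal_dot = re.compile(r"1\.0")
matches_literal = literal_dot.fullmatch("1.0") is not None   # the dot must be a real dot
matches_other = literal_dot.fullmatch("1x0") is not None     # "x" is not accepted

# Stand-in for \Q...\E (not supported by Python's re):
# re.escape() quotes every special character in the string.
quoted = re.compile(re.escape("a+b*c"))
matches_quoted = quoted.fullmatch("a+b*c") is not None

# Without quoting, "+" and "*" act as quantifiers instead.
matches_unquoted = re.fullmatch(r"a+b*c", "a+b*c") is not None
```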

Characters

\a: alarm, that is, the BEL character (hex 07)
\cx: “control-x”, where x is any character
\e: escape (hex 1B)
\f: formfeed (hex 0C)
\n: newline (hex 0A)
\r: carriage return (hex 0D)
\t: tab (hex 09)
\ddd: character with octal code ddd, or backreference
\xhh: character with hex code hh
\x{hhh...}: character with hex code hhh...
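
A minimal sketch of the hex, octal, and tab escapes, again assuming Python's re module as a stand-in for PCRE. Only ‘\xhh’, ‘\ddd’, and ‘\t’ are shown; forms such as ‘\x{hhh...}’, ‘\e’, and ‘\cx’ are not accepted by Python and are left out.

```python
import re

text = "A\tB"

# \x41 is hex for "A", \t is a tab, \102 is octal for "B".
pattern = re.compile(r"\x41\t\102")
hit = pattern.fullmatch(text) is not None
```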

Character Types

.: any character except newline; in dotall mode, any character whatsoever
\C: one byte, even in UTF-8 mode (best avoided)
\d: a decimal digit
\D: a character that is not a decimal digit
\h: a horizontal whitespace character
\H: a character that is not a horizontal whitespace character
\p{xx}: a character with the xx property
\P{xx}: a character without the xx property
\R: a newline sequence
\s: a whitespace character
\S: a character that is not a whitespace character
\v: a vertical whitespace character
\V: a character that is not a vertical whitespace character
\w: a “word” character
\W: a “non-word” character
\X: an extended Unicode sequence

‘\d’, ‘\D’, ‘\s’, ‘\S’, ‘\w’, and ‘\W’ recognize only ASCII characters.
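
The common character types can be tried out as follows. This sketch assumes Python's re module rather than PCRE; the re.ASCII flag is passed so that ‘\d’, ‘\s’, and ‘\w’ recognize only ASCII characters, as described above. The sample string is made up for illustration.

```python
import re

line = "rev 42\tbuild_7 ok"

digits = re.findall(r"\d+", line, re.ASCII)   # runs of decimal digits
words = re.findall(r"\w+", line, re.ASCII)    # runs of "word" characters
gaps = re.findall(r"\s", line, re.ASCII)      # single whitespace characters

# An ARABIC-INDIC DIGIT THREE is not a digit under ASCII-only matching.
non_ascii_digits = re.findall(r"\d", "\u0663", re.ASCII)
```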

Script names for ‘\p’ and ‘\P’

Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.

Character Classes

[...]: positive character class
[^...]: negative character class
[x-y]: range (can be used for hex characters)
[[:xxx:]]: positive POSIX named set
[[:^xxx:]]: negative POSIX named set
alnum: alphanumeric
alpha: alphabetic
ascii: 0-127
blank: space or tab
cntrl: control character
digit: decimal digit
graph: printing, excluding space
lower: lower case letter
print: printing, including space
punct: printing, excluding alphanumeric
space: whitespace
upper: upper case letter
word: same as ‘\w’
xdigit: hexadecimal digit

In PCRE, POSIX character set names recognize only ASCII characters. You can use ‘\Q...\E’ inside a character class.
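
A short sketch of positive and negative classes with ranges, assuming Python's re module as a stand-in. POSIX named sets such as ‘[[:alpha:]]’ are not shown because Python's re does not implement them.

```python
import re

text = "deadbeef-xyz-cafe"

# Positive class with two ranges: runs of lower-case hex digits.
hex_runs = re.findall(r"[0-9a-f]+", text)

# Negative class: runs of anything that is not a lower-case hex digit.
non_hex = re.findall(r"[^0-9a-f]+", text)
```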

Quantifiers

?: 0 or 1, greedy
?+: 0 or 1, possessive
??: 0 or 1, lazy
*: 0 or more, greedy
*+: 0 or more, possessive
*?: 0 or more, lazy
+: 1 or more, greedy
++: 1 or more, possessive
+?: 1 or more, lazy
{n}: exactly n
{n,m}: at least n, no more than m, greedy
{n,m}+: at least n, no more than m, possessive
{n,m}?: at least n, no more than m, lazy
{n,}: n or more, greedy
{n,}+: n or more, possessive
{n,}?: n or more, lazy
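
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below assumes Python's re module; possessive forms are omitted because older Python versions reject them.

```python
import re

text = "<a><b>"

# Greedy: .+ takes as much as possible, so the match runs to the last ">".
greedy = re.match(r"<.+>", text).group()

# Lazy: .+? takes as little as possible, so the match stops at the first ">".
lazy = re.match(r"<.+?>", text).group()

# {n,m} is greedy by default; {n,m}? prefers the minimum count.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")
bounded_lazy = re.findall(r"\d{2,3}?", "4444")
```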

Anchors and Simple Assertions

\b: word boundary
\B: not a word boundary
^: start of subject; also after internal newline in multiline mode
\A: start of subject
$: end of subject; also before newline at end of subject; also before internal newline in multiline mode
\Z: end of subject; also before newline at end of subject
\z: end of subject
\G: first matching position in subject
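
A sketch of how ‘^’, ‘$’, ‘\A’, and ‘\b’ behave with and without multiline mode, assuming Python's re module. ‘\Z’, ‘\z’, and ‘\G’ are deliberately left out, because Python treats them differently from PCRE or does not provide them.

```python
import re

text = "one two\nthree four\n"

# Without multiline mode, ^ matches only at the start of the subject.
first_only = re.findall(r"^\w+", text)

# In multiline mode, ^ also matches after each internal newline.
each_line_start = re.findall(r"^\w+", text, re.MULTILINE)

# $ matches before the newline at the end of the subject ...
at_end = re.search(r"four$", text) is not None

# ... and, in multiline mode, before internal newlines as well.
each_line_end = re.findall(r"\w+$", text, re.MULTILINE)

# \A is unaffected by multiline mode.
subject_start = re.findall(r"\A\w+", text, re.MULTILINE)

# \b requires a word boundary, so "one" inside "bone" is skipped.
whole_words = re.findall(r"\bone\b", "one bone")
```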

Match Point Reset

\K: reset start of match

Alternation

expr|expr|expr...

Capturing

(...): capturing group
(?<name>...): named capturing group (like Perl)
(?'name'...): named capturing group (like Perl)
(?P<name>...): named capturing group (like Python)
(?:...): non-capturing group
(?|...): non-capturing group; reset group numbers for capturing groups in each alternative
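
Named and non-capturing groups can be combined as in the following sketch, which assumes Python's re module and therefore uses the Python-style ‘(?P<name>...)’ form. The group names and sample string are hypothetical.

```python
import re

# Two named capturing groups around one non-capturing group.
pattern = re.compile(r"(?P<key>\w+)=(?:v)?(?P<num>\d+)")
m = pattern.fullmatch("release=v12")

key = m.group("key")
num = m.group("num")

# The (?:v)? group does not receive a number, so only two groups exist.
all_groups = m.groups()
```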

Atomic Groups

(?>...): atomic, non-capturing group

Comment

(?#....): comment (not nestable)

Option Setting

(?i): caseless
(?J): allow duplicate names
(?m): multiline
(?s): single line (dotall)
(?U): default ungreedy (lazy)
(?x): extended (ignore white space)
(?-...): unset option(s)
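
A brief sketch of inline option setting, assuming Python's re module. Python accepts ‘(?i)’, ‘(?s)’, and ‘(?x)’ only at the very start of the pattern, and has no ‘(?J)’ or ‘(?U)’, so only these three are shown.

```python
import re

# (?i): caseless matching.
caseless = re.fullmatch(r"(?i)readme", "ReadMe") is not None

# (?s): dotall, so "." also matches a newline.
dotall = re.fullmatch(r"(?s)a.b", "a\nb") is not None
default_dot = re.fullmatch(r"a.b", "a\nb") is not None

# (?x): extended, so white space and "#" comments in the pattern are ignored.
extended = re.fullmatch(r"(?x) \d+ \. \d+   # major.minor", "1.0") is not None
```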

Lookahead and Lookbehind Assertions

(?=...): positive look ahead
(?!...): negative look ahead
(?<=...): positive look behind
(?<!...): negative look behind

Each top-level branch of a look behind must be of a fixed length.
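
The four assertions can be compared on one sample string. This sketch assumes Python's re module; note that Python is stricter than PCRE about look behind, requiring the whole assertion to be of a single fixed width, so only fixed-width cases appear here.

```python
import re

text = "10 USD, 20 EUR, $30"

# Positive look ahead: digits followed by " USD".
before_usd = re.findall(r"\d+(?= USD)", text)

# Negative look ahead: whole numbers not followed by " USD".
not_before_usd = re.findall(r"\b\d+\b(?! USD)", text)

# Positive look behind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative look behind: numbers not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)
```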

Backreferences

\n: reference by number (can be ambiguous)
\gn: reference by number
\g{n}: reference by number
\g{-n}: relative reference by number
\k<name>: reference by name (like Perl)
\k'name': reference by name (like Perl)
\g{name}: reference by name (like Perl)
\k{name}: reference by name (like .NET)
(?P=name): reference by name (like Python)
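
A sketch of numbered and named backreferences, assuming Python's re module. Only ‘\n’ and the Python-style ‘(?P=name)’ are used; the ‘\g’ and ‘\k’ forms are not accepted inside Python patterns.

```python
import re

# \1 must repeat whatever group 1 matched: this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)

# (?P=q) must repeat the opening quote character captured as "q",
# so mismatched quote pairs are rejected.
sample = "say 'hi' or \"yo\" but not 'oops\""
quoted = [m.group(0) for m in re.finditer(r"(?P<q>['\"])\w+(?P=q)", sample)]
```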

Subroutine References (possibly recursive)

(?R): recurse whole pattern
(?n): call subpattern by absolute number
(?+n): call subpattern by relative number
(?-n): call subpattern by relative number
(?&name): call subpattern by name (like Perl)
(?P>name): call subpattern by name (like Python)

Conditional Patterns

(?(condition)yes-pattern)
(?(condition)yes-pattern|no-pattern)
(?(n)...: absolute reference condition
(?(+n)...: relative reference condition
(?(-n)...: relative reference condition
(?(<name>)...: named reference condition (like Perl)
(?('name')...: named reference condition (like Perl)
(?(name)...: named reference condition (PCRE only)
(?(R)...: overall recursion condition
(?(Rn)...: specific group recursion condition
(?(R&name)...: specific recursion condition
(?(DEFINE)...: define subpattern for reference
(?(assert)...: assertion condition
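
The absolute reference condition ‘(?(n)...’ can be sketched as below, assuming Python's re module, which supports this form but not the recursion, DEFINE, or assertion conditions. The pattern requires a closing ‘>’ only when an opening ‘<’ was captured.

```python
import re

# Group 1 optionally captures "<"; the condition (?(1)>) then demands ">"
# only if group 1 took part in the match.
pattern = re.compile(r"(<)?\w+(?(1)>)")

samples = ["<user>", "user", "<user", "user>"]
results = {s: pattern.fullmatch(s) is not None for s in samples}
```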

Backtracking Control

The following act as soon as they are reached:

(*ACCEPT): force successful match
(*FAIL): force backtrack; synonym ‘(*F)’

The following act only when a subsequent match failure causes a backtrack to reach them. They all force a match failure, but they differ in what happens afterwards. Those that advance the start-of-match point do so only if the pattern is not anchored.

(*COMMIT): overall failure, no advance of starting point
(*PRUNE): advance to next starting character
(*SKIP): advance start to current matching position
(*THEN): local failure, backtrack to next alternation

Newline Conventions

These are recognized only at the very start of the pattern or after a ‘(*BSR_...)’ option.

(*CR)
(*LF)
(*CRLF)
(*ANYCRLF)
(*ANY)

What ‘\R’ Matches

These are recognized only at the very start of the pattern or after a ‘(*...)’ option that sets the newline convention.

(*BSR_ANYCRLF)
(*BSR_UNICODE)
