Regular Expressions

Below is a list of all the metacharacters that Ruby supports.

…is the rule.

Back References

The regular expressions \1, \2, … \n are back references. \n matches the same character string that was matched by the nth pair of parentheses (see Regular Expression ( ) Grouping).

/((foo)bar)\1\2/

is the same as:

/((foo)bar)foobarfoo/
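
As a quick check of how the nested groups are numbered, the sketch below matches this pattern against a sample string and prints each group. It relies on groups being numbered by the position of their opening parenthesis; the constant name NESTED and the sample string are illustrative only and do not come from the original text.

```ruby
# Illustrative names; groups are numbered by their opening parenthesis.
NESTED = /((foo)bar)\1\2/

m = NESTED.match("foobarfoobarfoo")
p m[1]   # => "foobar"  (outer group, referred to by \1)
p m[2]   # => "foo"     (inner group, referred to by \2)
p m[0]   # => "foobarfoobarfoo"
```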

Example:

re = /(foo|bar|baz)\1/
p re =~ 'foofoo'   # => 0
p re =~ 'barbar'   # => 0
p re =~ 'bazbaz'   # => 0
p re =~ 'foobar'   # => nil
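
Because a back reference matches the text that was actually captured, not the pattern inside the parentheses, it can be used to find repeated text. The following is a minimal sketch of that idea; the method name doubled_word and the sample sentences are hypothetical and are not part of the original manual.

```ruby
# Hypothetical helper: returns the first word that appears twice in a row,
# separated by a single space, or nil if there is none.
def doubled_word(text)
  m = /\b(\w+) \1\b/.match(text)
  m && m[1]
end

p doubled_word("this is is a test")   # => "is"
p doubled_word("this is a test")      # => nil
```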

The parentheses that a back reference refers to must appear to the left of the back reference.

If a back reference appears inside the parentheses it refers to, the match always fails. The match also always fails when a single-digit back reference has no corresponding parentheses.

p /(\1)/ =~ "foofoofoo" # => nil
p /(foo)\2/ =~ "foo\2"  # => nil

Back references of two or more digits can also be specified, but be careful not to confuse them with the backslash notation \nnn (the character corresponding to octal nnn). If the number has one digit, it is a back reference. If it has two or more digits, it is treated as an octal code when there are no corresponding parentheses.

Conversely, to write a one-digit octal code in a regular expression, you must start it with 0 (such as \01). (There is no back reference \0, so this is not ambiguous.)

p   /\1/ =~ "\1"   # => nil     back reference with no corresponding parentheses
p  /\01/ =~ "\1"   # => 0       octal code
p  /\11/ =~ "\11"  # => 0       octal code

# octal code (because there are no corresponding parentheses)
p /(.)\10/ =~ "1\10" # => 0

# back reference (because there are corresponding parentheses)
p /((((((((((.))))))))))\10/ =~ "aa"  # => 0

# octal code (however, there is no octal code \08,
# so this is "\0" + "8")
p /(.)\08/ =~ "1\0008" # => 0

# To write a digit directly after a back reference,
# put the back reference in parentheses to separate it from the digit.
p /(.)(\1)1/ =~ "111"   # => 0

Character Class

The regular expression [] specifies a character class. It matches any one of the characters listed inside the [].

For example, /[abc]/ matches one of "a", "b" or "c". When the characters are consecutive in ASCII code order, they can be written as a range using "-", like this: /[a-c]/. Also, if the first character is ^, the class matches one character other than those listed.
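
The three forms described above can be compared side by side. This is only a small sketch; the constant names LISTED, RANGE and NEGATED and the sample strings are made up for illustration.

```ruby
# Illustrative constants, not part of the original manual.
LISTED  = /[abc]/    # one of "a", "b", "c"
RANGE   = /[a-c]/    # the same set, written as a range
NEGATED = /[^a-c]/   # any one character other than "a", "b", "c"

p(LISTED  =~ "xbz")    # => 1
p(RANGE   =~ "xyzc")   # => 3
p(NEGATED =~ "abcd")   # => 3
p(NEGATED =~ "abc")    # => nil
```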

A "^" that is not at the beginning matches that character itself. Likewise, a "-" at the beginning or end of the character class matches that character itself.

p /[a^]/ =~ "^"   # => 0
p /[-a]/ =~ "-"   # => 0
p /[a-]/ =~ "-"   # => 0
p /[-]/ =~ "-"   # => 0

An empty character class results in an error.

p /[]/ =~ ""
p /[^]/ =~ "^"
# => invalid regular expression; empty character class: /[^]/
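
When a pattern is built from a string at run time, the same error can be caught instead of stopping the program. The sketch below assumes the pattern is compiled with Regexp.new and that the failure is raised as RegexpError; the method name valid_regexp? is hypothetical, and the exact error message may differ between Ruby versions.

```ruby
# Hypothetical helper: returns true if +source+ compiles as a regular
# expression, and false if Regexp.new rejects it.
def valid_regexp?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

p valid_regexp?("[]")      # => false (empty character class)
p valid_regexp?("[^]")     # => false (empty negated character class)
p valid_regexp?("[a-c]")   # => true
```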

A "]" at the beginning of the character class (or directly after the negating "^") does not end the character class; it is just an ordinary "]". It is recommended that such a "]" be escaped with a backslash.

p /[]]/ =~ "]"       # => 0
p /[^]]/ =~ "]"      # => nil

"^", "-", "]" and "\\" (backslash) can each be escaped with a backslash so that they match the character itself.

p /[\^]/ =~ "^"   # => 0
p /[\-]/ =~ "-"   # => 0
p /[\]]/ =~ "]"   # => 0
p /[\\]/ =~ "\\"  # => 0
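
The same escaping is useful when a character class is assembled from arbitrary characters. The sketch below is one possible approach, not something the original manual prescribes: it uses Regexp.escape to add the backslashes, and the method name char_class is hypothetical. An empty argument would produce an empty character class and therefore an error.

```ruby
# Hypothetical helper: builds a character class that matches any one of
# the characters in +chars+, escaping "^", "-", "]" and "\" as needed.
def char_class(chars)
  Regexp.new("[" + Regexp.escape(chars) + "]")
end

specials = char_class("^-]\\")
p(specials =~ "a]b")    # => 1
p(specials =~ "a\\")    # => 1
p(specials =~ "abc")    # => nil
```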

Inside the [] you can use the same backslash notation as in character strings, and also the regular expressions \w, \W, \s, \S, \d and \D (these are shorthand for character classes).

Note that, because of the negation, a character class like the one below also matches a line feed character (the same is true of the regular expressions \W and \D).

p /[^a-z]/ =~ "\n"    # => 0
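
If the line feed should not match, it can be listed in the negated class as well. The following short sketch shows the difference, along with the \W and \D cases mentioned above; the sample strings are illustrative.

```ruby
p(/[^a-z]/   =~ "abc\n")   # => 3    the negated class matches "\n"
p(/[^a-z\n]/ =~ "abc\n")   # => nil  "\n" is now excluded too
p(/\W/ =~ "\n")            # => 0
p(/\D/ =~ "\n")            # => 0
```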