Regular Expressions

Learn how to use regular expressions to search, match, and manipulate text.


Regular Expressions: Quantifiers and Repetition

Regular expressions are powerful tools for pattern matching in text. Much of that power comes from quantifiers, which specify how many times a character, character class, or group should be repeated.

What are Quantifiers?

Quantifiers in regular expressions allow you to specify the number of times a preceding element (character, group, character class) can occur in the text you are searching. They define repetition patterns, making your regex more flexible and precise.
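As a minimal sketch of what "preceding element" means, the snippet below applies the same + quantifier to a single character, a character class, and a group. The sample strings are made up for illustration, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# A quantifier binds only to the element immediately before it.
single_char = re.fullmatch(r"ab+", "abbb")     # + applies to the character "b" only
char_class = re.fullmatch(r"[0-9]+", "2024")   # + applies to the class [0-9]
group = re.fullmatch(r"(ab)+", "ababab")       # + applies to the whole group (ab)

print(single_char is not None)  # True
print(char_class is not None)   # True
print(group is not None)        # True

# Without the parentheses, the + repeats only "b", so repeated "ab" pairs fail.
no_group = re.fullmatch(r"ab+", "ababab")
print(no_group)  # None
```

Grouping with parentheses is what lets a quantifier repeat more than one character at a time.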

Common Quantifiers

Here are the most common quantifiers:

  • *: Zero or more occurrences.
  • +: One or more occurrences.
  • ?: Zero or one occurrence.
  • {n}: Exactly n occurrences.
  • {n,}: n or more occurrences.
  • {n,m}: Between n and m occurrences (inclusive).
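The list above can be checked with a small sketch. The table of sample strings and the helper name check are hypothetical, chosen only for illustration; re.fullmatch is used so that each string must match the pattern from start to end.

```python
import re

# Hypothetical table: pattern -> (strings that should match, strings that should not).
cases = {
    r"ab*":     (["a", "ab", "abbb"], ["b"]),
    r"ab+":     (["ab", "abbb"], ["a"]),
    r"ab?":     (["a", "ab"], ["abb"]),
    r"ab{2}":   (["abb"], ["ab", "abbb"]),
    r"ab{2,}":  (["abb", "abbbb"], ["ab"]),
    r"ab{1,2}": (["ab", "abb"], ["a", "abbb"]),
}

def check(pattern, accepted, rejected):
    """Return True if fullmatch accepts and rejects the strings as expected."""
    all_accepted = all(re.fullmatch(pattern, s) for s in accepted)
    none_rejected = not any(re.fullmatch(pattern, s) for s in rejected)
    return all_accepted and none_rejected

for pattern, (accepted, rejected) in cases.items():
    print(pattern, check(pattern, accepted, rejected))  # each line ends with True
```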

Using Quantifiers in Python

The re module in Python provides support for regular expressions. The examples below use re.findall, which returns all non-overlapping matches as a list, and re.search, which returns a match object for the first match (or None if there is no match).

Example 1: Zero or More (*)

This example searches for the pattern "ab*" which means "a" followed by zero or more "b"s.

import re

text = "ac, abc, abbc, accc"
pattern = r"ab*"

matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'abb', 'a']

match = re.search(pattern, text)  # returns the first match object, or None
if match:
    print(match.group(0))  # prints the first matching string: a
else:
    print("No match found")

Example 2: One or More (+)

This example searches for the pattern "ab+" which means "a" followed by one or more "b"s.

import re

text = "ac, abc, abbc, accc"
pattern = r"ab+"

matches = re.findall(pattern, text)
print(matches)  # Output: ['ab', 'abb']

match = re.search(pattern, text)  # returns the first match object, or None
if match:
    print(match.group(0))  # prints the first matching string: ab
else:
    print("No match found")

Example 3: Zero or One (?)

This example searches for the pattern "ab?" which means "a" followed by zero or one "b".

import re

text = "ac, abc, abbc, accc"
pattern = r"ab?"

matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'ab', 'a']

match = re.search(pattern, text)  # returns the first match object, or None
if match:
    print(match.group(0))  # prints the first matching string: a
else:
    print("No match found")

Example 4: Exactly n ({n})

This example searches for the pattern "ab{2}" which means "a" followed by exactly two "b"s. Because the search is not anchored, the first two "b"s of "abbb" also match, so the text below yields two matches.

import re

text = "ac, abc, abbc, abbbc, a"
pattern = r"ab{2}"

matches = re.findall(pattern, text)
print(matches)  # Output: ['abb', 'abb']

match = re.search(pattern, text)  # returns the first match object, or None
if match:
    print(match.group(0))  # prints the first matching string: abb
else:
    print("No match found")
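Since re.findall and re.search look for the pattern anywhere in the string, "ab{2}" also matches inside a longer run of "b"s. If the intent is to skip runs of three or more, one possible approach (an assumption about the goal, not the only option) is a negative lookahead, (?!b), which rejects a match that is immediately followed by another "b". A minimal sketch:

```python
import re

text = "ac, abc, abbc, abbbc, a"

# b{2} on its own also matches the first two "b"s of "abbb".
loose = re.findall(r"ab{2}", text)
print(loose)   # ['abb', 'abb']

# Assumption: "exactly two" should exclude runs of three or more "b"s.
# The negative lookahead (?!b) fails if another "b" follows the match.
strict = re.findall(r"ab{2}(?!b)", text)
print(strict)  # ['abb']
```

The lookahead checks the next character without consuming it, so the matched text is still just "abb".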

Example 5: n or More ({n,})

This example searches for the pattern "ab{2,}" which means "a" followed by two or more "b"s.

import re

text = "ac, abc, abbc, abbbc, a"
pattern = r"ab{2,}"

matches = re.findall(pattern, text)
print(matches)  # Output: ['abb', 'abbb']

match = re.search(pattern, text)  # returns the first match object, or None
if match:
    print(match.group(0))  # prints the first matching string: abb
else:
    print("No match found")

Example 6: Between n and m ({n,m})

This example searches for the pattern "ab{1,2}" which means "a" followed by one or two "b"s. The quantifier is greedy, so it takes two "b"s whenever they are available, including the first two "b"s of "abbb".

import re

text = "ac, abc, abbc, abbbc, a"
pattern = r"ab{1,2}"

matches = re.findall(pattern, text)
print(matches)  # Output: ['ab', 'abb', 'abb']

match = re.search(pattern, text)  # returns the first match object, or None
if match:
    print(match.group(0))  # prints the first matching string: ab
else:
    print("No match found")

Greedy vs. Lazy Quantifiers

By default, quantifiers are greedy: they match as much text as possible while still allowing the rest of the pattern to match. You can make a quantifier lazy (or non-greedy) by adding a ? after it, so that it matches as little as possible. For example, .*? matches as few characters as the rest of the pattern allows.

Example 7: Greedy vs Lazy Quantifiers

Demonstrates the difference between greedy and lazy matching with .* and .*?

import re

text = "<p>This is some text</p><p>This is more text</p>"

# Greedy
pattern_greedy = r"<p>.*</p>"
matches_greedy = re.findall(pattern_greedy, text)
print("Greedy:", matches_greedy)  # Output: Greedy: ['<p>This is some text</p><p>This is more text</p>']

# Lazy
pattern_lazy = r"<p>.*?</p>"
matches_lazy = re.findall(pattern_lazy, text)
print("Lazy:", matches_lazy)  # Output: Lazy: ['<p>This is some text</p>', '<p>This is more text</p>']
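The ? suffix is not limited to .* and can follow any of the quantifiers covered earlier. The sketch below compares greedy and lazy forms on a made-up string of four "a"s; the variable names are chosen only for illustration.

```python
import re

text = "aaaa"

greedy_plus = re.search(r"a+", text).group(0)
lazy_plus = re.search(r"a+?", text).group(0)
greedy_range = re.search(r"a{2,3}", text).group(0)
lazy_range = re.search(r"a{2,3}?", text).group(0)
lazy_optional = re.search(r"a??", text).group(0)

print(greedy_plus)          # aaaa (as many as possible)
print(lazy_plus)            # a    (the minimum of one)
print(greedy_range)         # aaa  (up to the maximum of three)
print(lazy_range)           # aa   (stops at the minimum of two)
print(repr(lazy_optional))  # ''   (zero occurrences are enough)
```

A lazy quantifier still respects its minimum count; it only stops expanding as soon as the rest of the pattern can match.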

Conclusion

Quantifiers are essential for writing effective and flexible regular expressions. Understanding how to use them allows you to specify complex repetition patterns and extract the exact information you need from text. Remember to consider whether you need greedy or lazy matching for your specific use case.