Regular Expressions
Learn how to use regular expressions to search, match, and manipulate text.
Regular Expressions: Quantifiers and Repetition
Regular expressions are powerful tools for pattern matching in text. A key aspect of their power lies in their ability to specify how many times a character, character class, or group should be repeated. This is achieved using quantifiers.
What are Quantifiers?
Quantifiers in regular expressions specify how many times the preceding element (a character, character class, or group) can occur in the text you are searching. A quantifier applies only to the single element immediately before it: in ab*, the * repeats only the b, not ab. Quantifiers define repetition patterns, making your regex more flexible and precise.
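The scope of a quantifier is easy to get wrong, so here is a small sketch using Python's standard re module. It shows that a quantifier after a single character repeats only that character, that wrapping characters in a group (here a non-capturing group, (?:...)) repeats the whole group, and that a quantifier after a character class repeats the class. The sample strings are made up for illustration.

```python
import re

# The + in "ab+" repeats only the "b" immediately before it.
single = re.fullmatch(r"ab+", "abbb")
not_repeated = re.fullmatch(r"ab+", "ababab")

# Grouping with (?:...) makes the quantifier repeat the whole sequence "ab".
grouped = re.fullmatch(r"(?:ab)+", "ababab")

# A quantifier after a character class repeats the class: any run of digits.
numbers = re.findall(r"[0-9]+", "order 66, room 101")

print(single is not None)        # True
print(not_repeated is not None)  # False
print(grouped is not None)       # True
print(numbers)                   # ['66', '101']
```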
Common Quantifiers
Here are the most common quantifiers:
* : Zero or more occurrences.
+ : One or more occurrences.
? : Zero or one occurrence.
{n} : Exactly n occurrences.
{n,} : n or more occurrences.
{n,m} : Between n and m occurrences (inclusive).
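The three single-character quantifiers are shorthands for brace forms: * is {0,}, + is {1,}, and ? is {0,1}. The following sketch checks this with Python's re module on a made-up sample string by comparing the matches each spelling finds.

```python
import re

text = "ac, abc, abbc, abbbc"

# Each shorthand quantifier paired with its equivalent brace form.
pairs = [("ab*", "ab{0,}"), ("ab+", "ab{1,}"), ("ab?", "ab{0,1}")]

for shorthand, braces in pairs:
    shorthand_matches = re.findall(shorthand, text)
    brace_matches = re.findall(braces, text)
    print(shorthand, shorthand_matches, braces, brace_matches)
    # Both spellings find exactly the same matches.
    assert shorthand_matches == brace_matches
```

In practice the shorthand forms are preferred when they fit, since they are shorter and more familiar to readers.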
Using Quantifiers in Python
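The re module in Python provides support for regular expressions. The examples below show how to use each quantifier with re.findall, which returns every non-overlapping match as a list of strings, and re.search, which returns a match object for the first match (or None if there is no match):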
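To make the difference between these functions concrete, here is a short sketch using the same sample text as the examples. It also uses re.finditer, another function in the re module, which yields one match object per match so you can see where each match starts and ends. The positions are string indices, with the end index exclusive.

```python
import re

text = "ac, abc, abbc, accc"
pattern = r"ab+"

# findall returns every non-overlapping match as a list of strings.
all_matches = re.findall(pattern, text)

# search stops at the first match and returns a Match object (or None).
first = re.search(pattern, text)

# finditer yields a Match object per match, which also carries its position.
spans = [(m.group(0), m.span()) for m in re.finditer(pattern, text)]

print(all_matches)                   # ['ab', 'abb']
print(first.group(0), first.span())  # ab (4, 6)
print(spans)                         # [('ab', (4, 6)), ('abb', (9, 12))]
```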
Example 1: Zero or More (*)
This example searches for the pattern "ab*" which means "a" followed by zero or more "b"s.
import re
text = "ac, abc, abbc, accc"
pattern = "ab*"
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'abb', 'a']
match = re.search(pattern, text)  # returns the first Match object, or None
if match:
    print(match.group(0))  # prints the first matching string: a
else:
    print("No match found")
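Because * allows zero occurrences, a pattern made only of a starred element can match the empty string, and then it matches at every position in the text. The sketch below shows this with re.findall; the exact list assumes Python 3.7 or later, where empty matches directly after a non-empty match are also reported. When you only want real runs, + is usually the better choice, or you can filter out the empty strings as shown.

```python
import re

# "b*" can match zero characters, so it succeeds at every position,
# producing empty strings wherever there is no "b".
everywhere = re.findall(r"b*", "abc")

# Filtering out the empty matches leaves only the real runs of "b".
non_empty = [m for m in everywhere if m]

print(everywhere)  # ['', 'b', '', '']
print(non_empty)   # ['b']
```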
Example 2: One or More (+)
This example searches for the pattern "ab+" which means "a" followed by one or more "b"s.
import re
text = "ac, abc, abbc, accc"
pattern = "ab+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['ab', 'abb']
match = re.search(pattern, text)  # returns the first Match object, or None
if match:
    print(match.group(0))  # prints the first matching string: ab
else:
    print("No match found")
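A common practical use of + is normalizing whitespace. In this sketch, \s+ matches each run of one or more whitespace characters (spaces, tabs, newlines), and re.sub replaces every run with a single space. The input string is a made-up example.

```python
import re

messy = "  one   two\tthree \n"

# \s+ matches each run of one or more whitespace characters,
# so every run is replaced by a single space; strip() trims the ends.
collapsed = re.sub(r"\s+", " ", messy).strip()

print(repr(collapsed))  # 'one two three'
```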
Example 3: Zero or One (?)
This example searches for the pattern "ab?", which means "a" followed by zero or one "b".
import re
text = "ac, abc, abbc, accc"
pattern = "ab?"
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'ab', 'a']
match = re.search(pattern, text)  # returns the first Match object, or None
if match:
    print(match.group(0))  # prints the first matching string: a
else:
    print("No match found")
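The ? quantifier is handy for optional characters, such as spelling variants. This sketch uses a made-up sample string: colou?r accepts both "color" and "colour", but not "colouur", because ? allows at most one "u".

```python
import re

text = "color, colour, colouur"

# u? makes the "u" optional: it may appear once or not at all.
spellings = re.findall(r"colou?r", text)

print(spellings)  # ['color', 'colour']
```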
Example 4: Exactly n ({n})
This example searches for the pattern "ab{2}" which means "a" followed by exactly two "b"s.
import re
text = "ac, abc, abbc, abbbc, a"
pattern = "ab{2}"
matches = re.findall(pattern, text)
# "abbbc" also yields "abb": {2} consumes exactly two b's,
# but nothing in the pattern forbids a third "b" from following.
print(matches)  # Output: ['abb', 'abb']
match = re.search(pattern, text)  # returns the first Match object, or None
if match:
    print(match.group(0))  # prints the first matching string: abb
else:
    print("No match found")
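As the example shows, {2} inside a larger string does not stop extra repetitions from following the match. One way to make "exactly two" hold for a whole token is re.fullmatch, which succeeds only if the entire string matches. This sketch splits the sample text on ", " (an assumption that fits this particular string) and checks each token against ab{2}c.

```python
import re

text = "ac, abc, abbc, abbbc, a"

# Without anchoring, ab{2} also matches the first three characters of "abbbc".
unanchored = re.findall(r"ab{2}", text)

# re.fullmatch requires the whole token to match, so {2} really means "exactly two".
tokens = text.split(", ")
exact = [t for t in tokens if re.fullmatch(r"ab{2}c", t)]

print(unanchored)  # ['abb', 'abb']
print(exact)       # ['abbc']
```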
Example 5: n or More ({n,})
This example searches for the pattern "ab{2,}" which means "a" followed by two or more "b"s.
import re
text = "ac, abc, abbc, abbbc, a"
pattern = "ab{2,}"
matches = re.findall(pattern, text)
print(matches)  # Output: ['abb', 'abbb']
match = re.search(pattern, text)  # returns the first Match object, or None
if match:
    print(match.group(0))  # prints the first matching string: abb
else:
    print("No match found")
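The {n,} form is useful for minimum-length rules. In this sketch, [a-z]{5,} picks out lowercase words of at least five letters, and the \b word boundaries keep each match aligned to whole words. The sentence is a made-up example.

```python
import re

sentence = "the quick brown fox jumps over a lazy dog"

# [a-z]{5,} requires at least five letters; \b keeps each match to whole words.
long_words = re.findall(r"\b[a-z]{5,}\b", sentence)

print(long_words)  # ['quick', 'brown', 'jumps']
```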
Example 6: Between n and m ({n,m})
This example searches for the pattern "ab{1,2}" which means "a" followed by between one and two "b"s.
import re
text = "ac, abc, abbc, abbbc, a"
pattern = "ab{1,2}"
matches = re.findall(pattern, text)
# "abbbc" contributes "abb": the match takes at most two b's.
print(matches)  # Output: ['ab', 'abb', 'abb']
match = re.search(pattern, text)  # returns the first Match object, or None
if match:
    print(match.group(0))  # prints the first matching string: ab
else:
    print("No match found")
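Bounded repetition is a natural fit for length checks. The sketch below validates a hypothetical rule that a PIN must be four to six digits long; combining \d{4,6} with re.fullmatch rejects strings that are too short, too long, or contain non-digit characters. The candidate values are invented for illustration.

```python
import re

candidates = ["123", "4567", "890123", "1234567", "12a4"]

# \d{4,6} accepts between four and six digits; fullmatch rejects
# strings that are too short, too long, or contain other characters.
valid = [c for c in candidates if re.fullmatch(r"\d{4,6}", c)]

print(valid)  # ['4567', '890123']
```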
Greedy vs. Lazy Quantifiers
By default, quantifiers are greedy: they match as much text as possible while still letting the rest of the pattern succeed. You can make a quantifier lazy (or non-greedy) by adding a ? after it. For example, .*? matches as few characters as possible to satisfy the pattern.
Example 7: Greedy vs Lazy Quantifiers
This example demonstrates the difference between greedy (.*) and lazy (.*?) matching on a string that contains two <p> elements.
import re
text = "<p>This is some text</p><p>This is more text</p>"
# Greedy: .* runs on to the last </p> in the string
pattern_greedy = "<p>.*</p>"
matches_greedy = re.findall(pattern_greedy, text)
print("Greedy:", matches_greedy)  # Output: Greedy: ['<p>This is some text</p><p>This is more text</p>']
# Lazy: .*? stops at the first </p> that completes the match
pattern_lazy = "<p>.*?</p>"
matches_lazy = re.findall(pattern_lazy, text)
print("Lazy:", matches_lazy)  # Output: Lazy: ['<p>This is some text</p>', '<p>This is more text</p>']
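The trailing ? works with every quantifier, not only *: +?, ??, and {n,m}? are the lazy forms of +, ?, and {n,m}. This sketch runs re.search on a made-up string to show each one taking the fewest repetitions it can, and also shows that a lazy quantifier still expands when the rest of the pattern needs it to.

```python
import re

s = "abbbc"

# Adding ? after a quantifier makes it lazy: it takes the fewest repetitions it can.
greedy_plus = re.search(r"ab+", s).group(0)      # 'abbb'
lazy_plus = re.search(r"ab+?", s).group(0)       # 'ab'
lazy_range = re.search(r"ab{2,3}?", s).group(0)  # 'abb'
lazy_optional = re.search(r"ab??", s).group(0)   # 'a'

# A lazy quantifier still expands when the rest of the pattern requires it.
lazy_then_c = re.search(r"ab+?c", s).group(0)    # 'abbbc'

print(greedy_plus, lazy_plus, lazy_range, lazy_optional, lazy_then_c)
```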
Conclusion
Quantifiers are essential for writing effective and flexible regular expressions. Understanding how to use them allows you to specify complex repetition patterns and extract the exact information you need from text. Remember to consider whether you need greedy or lazy matching for your specific use case.