Regular Expressions
Learn how to use regular expressions to search, match, and manipulate text.
Alternation (OR Operator) in Python Regex
Explanation of Alternation (OR Operator)
In regular expressions, the alternation operator, represented by the pipe symbol (|), acts as a logical OR. It lets you match one of several possible patterns. At each position in the text, the regex engine tries the alternatives from left to right and uses the first one that lets the overall pattern match.
Think of it as saying: "Match this OR that OR something else." The power of alternation comes from its flexibility in defining multiple possibilities within a single regex.
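To make the left-to-right behaviour concrete, here is a minimal sketch. The sample strings and variable names are chosen only for illustration.

```python
import re

# Alternatives are tried left to right; the first one that succeeds is used,
# even if a later alternative would have matched more text.
short_first = re.search(r"a|ab", "ab").group()
long_first = re.search(r"ab|a", "ab").group()

print(short_first)  # a
print(long_first)   # ab

# The engine also scans the text from left to right, so the earliest position
# in the string wins regardless of the order of the alternatives.
earliest = re.search(r"dog|cat", "cat dog").group()
print(earliest)  # cat
```

The same two alternatives give different results depending on their order, which is why ordering comes up again under Key Considerations.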
Exploring the Alternation Operator with Python's re Module
The Python re module provides robust support for regular expressions, including the alternation operator. Here are some examples demonstrating its usage:
Example 1: Matching Different Words
Let's say you want to match either "cat", "dog", or "bird" in a given string.
import re
text = "I have a cat, a dog, and a bird."
pattern = r"cat|dog|bird" # Matches 'cat' OR 'dog' OR 'bird'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'dog', 'bird']
In this example, re.findall() finds all occurrences of "cat", "dog", or "bird" in the string.
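One caveat worth sketching: a bare alternation also matches inside longer words. Assuming you only want whole words, one common approach is to wrap the alternatives in a non-capturing group with \b word boundaries on both sides. The sample sentence below is made up for illustration.

```python
import re

text = "My cat read the catalog about birds and hotdogs."

# Bare alternation: also matches inside 'catalog', 'birds', and 'hotdogs'.
loose = re.findall(r"cat|dog|bird", text)
print(loose)  # ['cat', 'cat', 'bird', 'dog']

# \b anchors the match to word boundaries; (?:...) keeps the | inside the group
# so the boundaries apply to every alternative.
whole_words = re.findall(r"\b(?:cat|dog|bird)\b", text)
print(whole_words)  # ['cat']
```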
Example 2: Matching Different Number Formats
Suppose you need to match a phone number, which could be in the format "123-456-7890" or "(123) 456-7890".
import re
text = "My phone number is 123-456-7890 or (123) 456-7890."
pattern = r"\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}" # Matches either format
matches = re.findall(pattern, text)
print(matches) # Output: ['123-456-7890', '(123) 456-7890']
Here, the pattern matches either three digits followed by a hyphen, then three digits followed by a hyphen, and finally four digits, OR it matches an opening parenthesis, three digits, a closing parenthesis, a space, three digits, a hyphen, and four digits.
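Since both formats end with the same "456-7890" shape, the alternation can be limited to the part that differs. The following is a sketch of that refactoring, not the only way to write it; it uses a non-capturing group (?:...) so that re.findall() still returns whole matches.

```python
import re

text = "My phone number is 123-456-7890 or (123) 456-7890."

# Only the area-code prefix differs between the two formats:
#   \(\d{3}\)<space>   e.g. "(123) "
#   \d{3}-             e.g. "123-"
# The shared tail \d{3}-\d{4} is written once, after the group.
pattern = r"(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}"

matches = re.findall(pattern, text)
print(matches)  # ['123-456-7890', '(123) 456-7890']
```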
Example 3: When a Quantifier Can Replace Alternation
Alternation is not always the most compact way to express a choice. When the alternatives differ only by one optional character, a quantifier does the same job. For example, matching "color" or "colour":
import re
text = "I like the color blue. The colour is important."
pattern = r"colou?r" # Matches 'color' OR 'colour'
matches = re.findall(pattern, text)
print(matches) # Output: ['color', 'colour']
In this case, u? means "zero or one occurrence of the character 'u'", so the pattern matches both spellings. It is equivalent to the alternation color|colour.
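For comparison, here is a small sketch of the same idea written three ways: a plain alternation, a character class, and an optional character. The sample sentence and variable names are made up for illustration.

```python
import re

text = "The gray cat and the grey dog like the color blue and the colour red."

# Alternatives that differ in one character can use a character class instead.
by_alternation = re.findall(r"gray|grey", text)
by_char_class = re.findall(r"gr[ae]y", text)
print(by_alternation)  # ['gray', 'grey']
print(by_char_class)   # ['gray', 'grey']

# Alternatives that differ by one optional character can use the ? quantifier.
spelled_out = re.findall(r"color|colour", text)
by_quantifier = re.findall(r"colou?r", text)
print(spelled_out)     # ['color', 'colour']
print(by_quantifier)   # ['color', 'colour']
```

Alternation remains the tool to reach for when the choices are whole words or otherwise unrelated strings.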
Example 4: Alternation with Groups
Alternation can be used within groups, allowing you to capture specific parts of the matched alternatives.
import re
text = "The file is named report.pdf or document.docx"
pattern = r"(\w+)\.(pdf|docx)"
matches = re.findall(pattern, text)
print(matches) # Output: [('report', 'pdf'), ('document', 'docx')]
This example captures the filename (without the extension) in group 1 and the file extension (either 'pdf' or 'docx') in group 2.
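If you need the alternation only for grouping and not for capturing, a non-capturing group (?:...) is one option. The sketch below reuses the text from Example 4 and also shows how re.finditer() exposes the captured parts by group number.

```python
import re

text = "The file is named report.pdf or document.docx"

# (?:...) limits the scope of | without creating a capture group,
# so findall returns the whole match instead of tuples.
whole_names = re.findall(r"\w+\.(?:pdf|docx)", text)
print(whole_names)  # ['report.pdf', 'document.docx']

# With capturing groups, each match object gives access to the parts:
# group(0) is the whole match, group(1) the name, group(2) the extension.
parts = [
    (m.group(0), m.group(1), m.group(2))
    for m in re.finditer(r"(\w+)\.(pdf|docx)", text)
]
print(parts)
# [('report.pdf', 'report', 'pdf'), ('document.docx', 'document', 'docx')]
```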
Key Considerations
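- Order matters: The regex engine tries the alternatives from left to right and takes the first one that works. When one alternative is a prefix of another (for example, Jan and January), put the longer one first; otherwise the shorter one can match first and leave the rest of the word unmatched.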
- Parentheses for clarity: Use parentheses to group parts of the pattern, especially when combining alternation with other operators. This improves readability and ensures the pattern behaves as intended.
- Escaping special characters: Remember to escape special characters such as the parentheses ( and ), and the pipe symbol | itself, with a backslash \ if you want to match them literally.
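The grouping and escaping points above can be sketched as follows. The sample strings are made up for illustration; re.escape() is shown as one convenient way to escape a whole literal string at once.

```python
import re

text = "gray cat, grey cat"

# Without a group, | splits the entire pattern: 'gray' OR 'grey cat'.
ungrouped = re.findall(r"gray|grey cat", text)
print(ungrouped)  # ['gray', 'grey cat']

# With a group, only the colour word alternates and ' cat' is always required.
grouped = re.findall(r"(?:gray|grey) cat", text)
print(grouped)  # ['gray cat', 'grey cat']

# An escaped pipe matches a literal '|' character.
fields = re.split(r"\|", "a|b|c")
print(fields)  # ['a', 'b', 'c']

# re.escape() adds the backslashes for you when the text to match is a literal.
literal = re.escape("(a|b)")
found = re.findall(literal, "x (a|b) y")
print(found)  # ['(a|b)']
```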