Regular Expressions

Learn how to use regular expressions to search, match, and manipulate text.


Alternation (OR Operator) in Python Regex

Explanation of Alternation (OR Operator)

In regular expressions, the alternation operator, represented by the pipe symbol (|), acts as a logical OR. It allows you to match one of several possible patterns. At each position in the text, the regex engine tries the alternatives from left to right and takes the first one that succeeds, even if a later alternative would have matched more text.

Think of it as saying: "Match this OR that OR something else." The power of alternation comes from its flexibility in defining multiple possibilities within a single regex.
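The left-to-right behavior described above can be seen in a minimal sketch. The sample string "ab" and the variable names are chosen only for illustration:

```python
import re

# Both alternatives could match at position 0 of "ab".
# The engine takes whichever one is listed first.
first_wins = re.search(r"a|ab", "ab").group()
longer_first = re.search(r"ab|a", "ab").group()

print(first_wins)    # a
print(longer_first)  # ab
```

The pattern a|ab never reports "ab" here, because the shorter alternative on the left already succeeds.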

Exploring the Alternation Operator with Python's re Module

The Python re module provides robust support for regular expressions, including the alternation operator. Here are some examples demonstrating its usage:

Example 1: Matching Different Words

Let's say you want to match either "cat", "dog", or "bird" in a given string.

    import re

    text = "I have a cat, a dog, and a bird."
    pattern = r"cat|dog|bird"  # Matches 'cat' OR 'dog' OR 'bird'
    matches = re.findall(pattern, text)
    print(matches)  # Output: ['cat', 'dog', 'bird']

In this example, re.findall() finds all occurrences of "cat", "dog", or "bird" in the string.

Example 2: Matching Different Number Formats

Suppose you need to match a phone number, which could be in the format "123-456-7890" or "(123) 456-7890".

    import re

    text = "My phone number is 123-456-7890 or (123) 456-7890."
    pattern = r"\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}"  # Matches either format
    matches = re.findall(pattern, text)
    print(matches)  # Output: ['123-456-7890', '(123) 456-7890']

Here, the first alternative matches three digits, a hyphen, three digits, a hyphen, and four digits. The second alternative matches an opening parenthesis, three digits, a closing parenthesis, a space, three digits, a hyphen, and four digits. The parentheses are written as \( and \) so that they match literally instead of forming a group.
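When the alternatives get this long, it may help to put each one on its own line. The following is one possible layout using the re.VERBOSE flag, not the only way to write it. The name phone_pattern is hypothetical, and because verbose mode ignores unescaped whitespace, the literal space is written as [ ]:

```python
import re

# re.VERBOSE ignores whitespace and allows comments inside the pattern.
phone_pattern = re.compile(
    r"""
    \d{3}-\d{3}-\d{4}          # first alternative: 123-456-7890
    |                          # OR
    \(\d{3}\)[ ]\d{3}-\d{4}    # second alternative, with a literal space
    """,
    re.VERBOSE,
)

text = "My phone number is 123-456-7890 or (123) 456-7890."
phone_matches = phone_pattern.findall(text)
print(phone_matches)  # ['123-456-7890', '(123) 456-7890']
```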

Example 3: Replacing Alternation with an Optional Quantifier

When the alternatives differ by only a single optional character, a quantifier can express the same choice more compactly than the pipe symbol. For example, matching "color" or "colour":

    import re

    text = "I like the color blue. The colour is important."
    pattern = r"colou?r"  # Matches 'color' OR 'colour'
    matches = re.findall(pattern, text)
    print(matches)  # Output: ['color', 'colour']

In this case, u? means "zero or one occurrence of the character 'u'", so the pattern matches both spellings and behaves like the alternation color|colour without using the pipe symbol.
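As a quick check of that equivalence, the sketch below runs three spellings of the same idea against the sample sentence. The variable names are illustrative only:

```python
import re

text = "I like the color blue. The colour is important."

with_quantifier = re.findall(r"colou?r", text)        # optional 'u'
with_alternation = re.findall(r"color|colour", text)  # explicit OR
with_group = re.findall(r"col(?:o|ou)r", text)        # OR limited to the middle

print(with_quantifier)   # ['color', 'colour']
print(with_alternation)  # ['color', 'colour']
print(with_group)        # ['color', 'colour']
```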

Example 4: Alternation with Groups

Alternation can be used within groups, allowing you to capture specific parts of the matched alternatives.

    import re

    text = "The file is named report.pdf or document.docx"
    pattern = r"(\w+)\.(pdf|docx)"
    matches = re.findall(pattern, text)
    print(matches)  # Output: [('report', 'pdf'), ('document', 'docx')]

Because the pattern contains two capturing groups, re.findall() returns one tuple per match. Group 1 holds the filename (without the extension) and group 2 holds the file extension (either 'pdf' or 'docx').
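If the alternation is needed only for grouping and not for capturing, a non-capturing group (?:...) changes what re.findall() returns. A small sketch, with illustrative variable names:

```python
import re

text = "The file is named report.pdf or document.docx"

# Non-capturing group: findall returns the whole match.
full_names = re.findall(r"\w+\.(?:pdf|docx)", text)

# One capturing group: findall returns only that group's text.
extensions_only = re.findall(r"\w+\.(pdf|docx)", text)

print(full_names)       # ['report.pdf', 'document.docx']
print(extensions_only)  # ['pdf', 'docx']
```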

Key Considerations

  • Order matters: The regex engine tries the alternatives from left to right and takes the first one that succeeds. When one alternative is a prefix of another (for example, Java and JavaScript), list the longer one first; otherwise the shorter one always wins.
  • Parentheses control scope: Alternation has the lowest precedence of the regex operators, so gra|ey means "gra" OR "ey", not "gray" OR "grey". Wrap the alternatives in a group, as in gr(a|e)y, to limit the OR to the part of the pattern you intend. This also improves readability.
  • Escaping special characters: To match parentheses or the pipe symbol literally, escape them with a backslash: \(, \), and \|.
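The three considerations above can be sketched in a few lines. The sample strings and variable names are made up for illustration:

```python
import re

# Order: a shorter alternative listed first hides a longer one with the same prefix.
short_first = re.findall(r"Java|JavaScript", "JavaScript and Java")
long_first = re.findall(r"JavaScript|Java", "JavaScript and Java")
print(short_first)  # ['Java', 'Java']
print(long_first)   # ['JavaScript', 'Java']

# Grouping: without parentheses, the pipe splits the whole pattern in two.
ungrouped = re.findall(r"gra|ey", "gray grey")
grouped = re.findall(r"gr(?:a|e)y", "gray grey")
print(ungrouped)  # ['gra', 'ey']
print(grouped)    # ['gray', 'grey']

# Escaping: a backslash, or re.escape() for a whole string, makes | and () literal.
literal_pipe = re.findall(r"a\|b", "a|b or ab")
literal_all = re.findall(re.escape("(a|b)"), "call (a|b) now")
print(literal_pipe)  # ['a|b']
print(literal_all)   # ['(a|b)']
```

re.escape() is convenient when the literal text comes from a variable rather than being typed into the pattern by hand.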