String Algorithms: Pattern Matching in Java

Exploring various string algorithms for pattern matching, including brute force, KMP, and Boyer-Moore algorithms, implemented in Java.


String Matching in Competitive Programming (Java) with Solutions

Explanation: Practice Problems and Solutions

This section focuses on applying string matching algorithms to solve various competitive programming problems. We'll explore different problem types, learn how to identify when string matching techniques are applicable, and implement efficient Java solutions. Each problem will be accompanied by a clear explanation of the problem statement, the chosen algorithm, the reasoning behind the solution, and well-commented Java code. We emphasize understanding *why* a particular approach works, not just memorizing solutions.

Solving Various Competitive Programming Problems

We will tackle a range of problems involving string matching, including but not limited to:

  • Exact String Matching: Finding all occurrences of a pattern within a text. Algorithms covered will include:
    • Brute Force
    • Knuth-Morris-Pratt (KMP) Algorithm
    • Boyer-Moore Algorithm (Optional)
  • Approximate String Matching: Finding occurrences of a pattern within a text that are "close" to the pattern, allowing for errors. Algorithms considered (depending on complexity and relevance):
    • Levenshtein Distance (Edit Distance)
    • Dynamic Programming approaches
  • String Matching with Wildcards: Handling patterns that contain wildcard characters (e.g., '?' matching any single character, '*' matching any sequence of characters).
    • Backtracking (for simpler cases)
    • Dynamic Programming
  • Regular Expression Matching (Limited): Exploring basic regular expression matching concepts and techniques suitable for simpler patterns.
    • Using Java's java.util.regex package
  • Problems Involving String Manipulation and Pattern Recognition: Applying string matching concepts in combination with other string manipulation techniques to solve more complex problems.
    • Identifying palindromes, anagrams, or specific patterns within strings.
    • Using string matching as a building block for solving graph-related problems.

For each problem, we will provide:

  • Problem Statement: A clear and concise description of the problem.
  • Algorithm Explanation: A detailed explanation of the chosen algorithm and why it's suitable for the problem.
  • Java Code: A well-commented Java implementation of the algorithm.
  • Time and Space Complexity Analysis: An analysis of the algorithm's time and space complexity.
  • Example Test Cases: A set of test cases with expected outputs to verify the correctness of the solution.

Example Problem 1: Exact String Matching using KMP

Problem Statement

Given a text text and a pattern pattern, find all occurrences of the pattern within the text using the Knuth-Morris-Pratt (KMP) algorithm.

Algorithm Explanation

The KMP algorithm avoids redundant comparisons by pre-processing the pattern to build a "longest proper prefix suffix" (LPS) array. The LPS array indicates, for each position in the pattern, the length of the longest proper prefix that is also a suffix. This information is used to efficiently shift the pattern after a mismatch, avoiding unnecessary backtracking.

Java Code

 import java.util.ArrayList;
            import java.util.List;

            public class KMP {

                public static List<Integer> kmpSearch(String text, String pattern) {
                    List<Integer> occurrences = new ArrayList<>();
                    int[] lps = computeLPSArray(pattern);
                    int i = 0; // index for text
                    int j = 0; // index for pattern
                    while (i < text.length()) {
                        if (pattern.charAt(j) == text.charAt(i)) {
                            j++;
                            i++;
                        }
                        if (j == pattern.length()) {
                            occurrences.add(i - j); // Pattern found at index i-j
                            j = lps[j - 1]; // Prepare for the next match
                        } else if (i < text.length() && pattern.charAt(j) != text.charAt(i)) {
                            // Do not match lps[0..lps[j-1]] characters,
                            // they will match anyway
                            if (j != 0) {
                                j = lps[j - 1];
                            } else {
                                i = i + 1;
                            }
                        }
                    }
                    return occurrences;
                }

                private static int[] computeLPSArray(String pattern) {
                    int[] lps = new int[pattern.length()];
                    int length = 0; // length of the previous longest prefix suffix
                    int i = 1;
                    lps[0] = 0; // lps[0] is always 0
                    while (i < pattern.length()) {
                        if (pattern.charAt(i) == pattern.charAt(length)) {
                            length++;
                            lps[i] = length;
                            i++;
                        } else {
                            if (length != 0) {
                                length = lps[length - 1];
                            } else {
                                lps[i] = 0;
                                i++;
                            }
                        }
                    }
                    return lps;
                }

                public static void main(String[] args) {
                    String text = "ABABDABACDABABCABAB";
                    String pattern = "ABABCABAB";
                    List<Integer> occurrences = kmpSearch(text, pattern);
                    System.out.println("Pattern found at indices: " + occurrences);
                }
            } 

Time and Space Complexity Analysis

* **Time Complexity:** O(n + m), where n is the length of the text and m is the length of the pattern. The computeLPSArray function takes O(m) time, and the kmpSearch function takes O(n) time. * **Space Complexity:** O(m) for the LPS array.

Example Test Cases

  • Text: "ABABDABACDABABCABAB", Pattern: "ABABCABAB", Output: [10]
  • Text: "AAAAABAAABA", Pattern: "AAAA", Output: [0, 1]
  • Text: "THIS IS A TEST TEXT", Pattern: "TEST", Output: [10]