In computer science, string searching is a common problem where we need to find the occurrences of a pattern within a larger text. The **Knuth-Morris-Pratt (KMP)** algorithm is a powerful string searching algorithm that provides an efficient solution to this problem. It is particularly useful when the pattern has repetitive elements. In this blog post, we’ll explore the KMP algorithm and implement it in C++.

###### Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm improves the efficiency of string searching by utilizing the information gathered from previous character comparisons. It avoids unnecessary character comparisons by exploiting the knowledge of the pattern itself.

The main idea behind the KMP algorithm is to create a lookup table, often referred to as the “failure function” or “LPS” (Longest (proper) Prefix which is also a Suffix) table. This table helps in determining the next possible position to start the comparison when a mismatch occurs.

**The KMP algorithm consists of two main steps:**

- Preprocessing the pattern to compute a table of longest proper prefixes that are also suffixes (LPS table).
- Searching the text using the LPS table to skip comparisons.

###### Preprocessing

The preprocessing step takes `pattern`

as the input and computes an array of integers called the `LPS table`

. The `LPS table`

stores the length of the longest proper prefix that is also a suffix for each position in the pattern. A proper prefix is a prefix that is not equal to the whole string.

For example, for the `pattern = "ABA"`

, the `LPS table`

is `[0, 0, 1]`

.

Here, `LPS[2] = 1`

, since the character at `pattern[0] == pattern[2]`

.

**Pseudocode**

ComputeLPS(pattern): # Initialize LPS table with zeros lps = {0} # Initialize pointers i = 1 # Current position in pattern j = 0 # Length of current matching prefix # Loop until end of pattern for i = 1 to len(pattern): # Case 1: Characters match if pattern[i] == pattern[j]: j++ # Case 2: Characters do not match else: # Decrement j to previous matching prefix length while j > 0 && pattern[i] != pattern[j]: j = lps[j-1] lps[i] = j; # Return LPS table return lps

**Explanation**

The LPS table is computed by using a loop that iterates over the `pattern`

and maintains two pointers: `i and j`

. The `pointer i`

tracks the **current position** in the `pattern`

, while the `pointer j`

tracks the length of the **current matching prefix**.

The loop works as follows:

- If characters at
`i`

and`j`

match, increment both`i`

and`j`

and store`j`

in`LPS`

table at`i`

. - If the characters at
`i`

and`j`

do not match, decrement`j`

to the value stored in the`LPS`

table at`LPS[j-1]`

until`pattern[i] == pattern[j]`

OR`j == 0`

.

The loop ends when `i`

reaches the end of the `pattern`

.

###### Searching

The searching step takes the `text`

and the `pattern`

as input and uses the `LPS table`

to find all the occurrences of the `pattern`

in the `text`

. It also maintains two pointers: `i and j`

. The `pointer i`

tracks the current position in the `text`

, while the` pointer j`

tracks the current position in the `pattern`

. Initially, both `i`

and `j`

are set to `0`

.

**Pseudocode**

SearchKMP(text, pattern): # Compute LPS table for pattern lps = computeLPS(pattern) # Initialize pointers i = 0 # Current position in text j = 0 # Current position in pattern # Loop until end of text while i < len(text): # Case 1: Characters match if text[i] == pattern[j]: # Increment both pointers i += 1 j += 1 # If end of pattern is reached, report a match if j == len(pattern): print("Pattern found at index", i-j) # Reset j to previous matching prefix length j = lps[j-1] # Case 2: Characters do not match else: # Decrement j to previous matching prefix length if j > 0: j = lps[j-1] # Or increment i if there's no prefix else: i += 1

**Explanation**

**text = “BABABA”, pattern = “ABA”****LPS table** = **[0, 0, 1]**

Iteration | Value of i | Value of j | text[i] | pattern[j] | Pattern Found |

1 | 0 | 0 | B | A | No |

2 | 1 | 0 | A | A | No |

3 | 2 | 1 | B | B | No |

4 | 3 | 2 | A | A | Yes |

5 | 4 | 1 | B | B | No |

6 | 5 | 2 | A | A | Yes |

The loop works as follows:

- If i is equal to the length of the
`text`

, end the loop. - If the characters at
`i`

and`j`

match, increment both`i`

and`j`

. If`j`

is equal to the length of the`pattern`

, report a match at`i-j`

and reset j to`lps[j-1]`

. - If the characters at
`i`

and`j`

do not match, decrement`j`

to`lps[j-1]`

, or do nothing if`j`

is already`0`

. Increment`i`

in this case.

This implies that if we’re searching for the

in `pattern`

= "ABA"`text = "BABABA"`

, and we reach

and `text`

[3]`pattern[2]`

, we get a match. However now, we would set the `iterator j (traversing pattern)`

to `1`

ie `j = lps[2] = 1`

(instead of `0`

) and we could continue to search the remaining

. `pattern`

"BC"

This saves us the extra iterations by not resetting the `iterator j (traversing pattern)`

to `0`

and `iterator i (traversing text)`

to `(i - j + 1) or i = 2`

.

**Code Implementation**

```
//
// main.cpp
// KMP Algorithm
//
// Created by Himanshu on 31/05/23.
//
#include <iostream>
#include <vector>
#include <string>
using namespace std;
vector<int> constructLPS (string pattern) {
int n = (int) pattern.length();
vector<int> lps(n, 0);
int j = 0;
for (int i = 1; i < n; i++) {
if (pattern[i] == pattern[j]) {
j++;
} else {
while (j > 0 && pattern[i] != pattern[j]) {
j = lps[j-1];
}
}
lps[i] = j;
}
return lps;
}
void kmpStringSearch (string text, string pattern) {
int n = (int) text.length();
int m = (int) pattern.length();
vector<int> lps = constructLPS(pattern);
int i = 0;
int j = 0;
while (i < n) {
if (text[i] == pattern[j]) {
i++;
j++;
}
//Since, j = m (size of pattern), we set j to lps[j-1]
if (j == m) {
cout<<"Pattern found at index "<< i - j<<endl;
j = lps[j-1];
} else if (i < n && text[i] != pattern[j]) {
if (j != 0) {
j = lps[j-1];
} else {
i++;
}
}
}
}
int main() {
string text = "BABABA";
string pattern = "ABA";
kmpStringSearch(text, pattern);
return 0;
}
```

**Output**

Pattern found at index 1 Pattern found at index 3

**Time complexity: O(m + n)****where m is the length of the pattern and n is the length of the text.**

Here’s a working code with step-by-step code run: KMP Algorithm