Name  : Rabiya  Babar

Id : 15009065158

We Will Write a Custom Essay Specifically
For You For Only \$13.90/page!

order now

Teacher’s Name : Tayaba
anjum

Course name :  Analysis of algorithm

Title : Rabin-karp
algorithm

Date  : 3rd January 2018

1.  Abstract

There are many string
matching algorithms. The two main categories are, Exact string matching and
approximate string matching. Rabin-Karp is an exact string matching technique that
uses hashing for searching of string. It is an average case analysis of string
matching algorithm. Hashing is also known as finger print technique. Rabin Karp
converts these finger prints in constant time.
This technique sum ASCII values for each and every letter and take mod
of this letter by some prime number. We say it a good hash function if it looks
at every word in

the string. After the Computations
done with function of hash computation, If the answer appeared to be same then
we need to do normal string comparison. If both the strings appeared same,  this results in complete search. Next,
calculate the hash of next variable, continue the process until the actual
string is found. Or we reach to the end of text string.

Keywords— hashing, ASCII, hash function.

2.
Introduction

Rabin-Karp
algorithm is the most often used string matching algorithm. To find a hash code pattern ‘pt’ from a  given string ‘s’. first of all, it divides
the hash code pattern with a prime number ‘pri’ that is defined before to
compute the remainder of pattern ‘pt’. Then it takes input ‘l’. here ‘l’ is the
length of pattern ‘ptr’ from the string ‘str’ with the first shift ‘sh’ to
compute remainder of ‘l’ chracters from the string ‘s’. If the remainder of the
pattern ‘pt’ and the remainder of the string  ‘str’ are equal than we have comparison
between the string and pattern both. Except, In other cases it is not needed to
have a comparison. This is because if remainders are not equal to each other
than these numbers will not be equal in any case. This process is repeated for
every next characters from string for all possible shifts. Rabin-karp algorithm
practical application is detecting plagiarism. When source material is given,
the algorithm can instantaneously search through a given string. Ignoring the
minor details like punctuations. When good hashing is done, it results in good
performance for the encountered data. Similarly if hashing is done poorly then
the whole algorithm takes the worst case time O(mn).

Figure

Text t

R

A

B

I

y

A

i

y

A

Pattern pt

Fig.1
overview of string matching

Rabin
karp string matching problem is the problem of finding all possible shifts with
which it occurs in the described pattern. Figure 1 represent this description.

3. Literature review

Rabin
karp is the increment version to naïve approach by implementing a most powerful
implementation called hash function. Before processing takes place it computes
hash value of pattern ‘ptr’ (with length ‘l’) along with hash value for each
sub string in pattern ptr. Next it compares numerical values instead of
comparing the symbols. According to naïve approach, If any similarity is found
than it compares the pattern with the substring. Or it shifts to next substring
of ‘str’ with pattern ‘ptr’ by using hash values. It shows that performance of
Rabin karp depends on computation of hash values. The string used in the
algorithm can be treated as array of characters. These characters can be
converted to integers or hash values depending upon what type of translating
technique is being used. Hence the string can be used as array of integers.

Let
L be a character sequence as a L digit number in base B. following equations
such as equation
1 can be used to find the substring where ti is the integer at ith position.
Similarly equation 2 is used to compute a hash integer of the sub character where
P is the large prime number.

Ai=
ti B^ (L–1) + ti+1 B^(L–2) + … + ti+L–1 R^ 0

EQ.1

K(Ai)=
ti B (L–1) + ti+1 B(L–2) + … + ti+L–1 B^ 0 (mod Q)

EQ.2

Moreover, after
computing Ai we can compute Ai+1 for the next substring ti+1 ..i+L in
constant time.

xi+1 = ( xi – ti*BL–1 )
B + t i + B

Calculation of the
hash          value following equation
can be given.

A(xi+1)= ( xi – t
i*BL–1 ) B + t i + B (mod Q)

EQ.4

4. Methodology

Let P be the pattern
that has length L and string Str has length n. following are the ways that find
sub string in the string are:

1.      Hashing
calculations of P to get A(p)O(L).

2.      Keep
iterating through the length L of string and substrings of S, hashing integers
of those substrings and comparison with A(p) O(nB).

3.      If
the hash answer of substring does not matches the given string A(P). compute
the string comparison once more with the sub string and pattern. stop immediately
if the pattern match and continue if they do not match.

pseudocode

a ? inputstring

b ? searchstring

l ? length(a)

m ? length(b)

hSrchStr 😕 hash(b)

for i = 0 ? l ? m

do

hStr ? hash(ai ? m)

if ai ? m = b

then

return “F OUND”

else

return “NOT F OUND” end if

end for

Calculating the hash

S ? inputstring

Sum1 ? 0

y ? length(S)

x ? S

for i = 0 to y

do

sum1 = sum1 + Si

end for

return sum1 % 3

Figure

Fig.2 Calculation
of hash value

Time complexity

Table

Algorithm

Pre-processing

Searching

Execution
time

Rabin Karp

O(m)

O(mn)

O(mn)

Table.1
Time complexity of Rabin karp.

Table
1 shows the time complexity of
the rabin karp.

Working

hash function working describes
a string as a base b number. Where b is a constant that represents the total
number of characters. calculate the hash value of next character by shifting to
upcoming character.  The best practice is
to choose the prime number as large as possible which does not result in
overflow. Hence the time period of hash function is much bigger than time
complexity of normal flow algorithm.

Implementation

Next step to naïve
approach is to move a level above to match a string by using the comparisons that
are being done in each iteration and making use of it in next iteration.

1.      Firstly
define the string of numbers only. Let S={0,…10}. After which pattern and
string are defined. Then these numbers are converted to characters. So now
these are set of characters.

2.      Let
pt defines the decimal value of pattern pt1,…l. l represents the length of
the pattern.

3.      A
major problem that can occur is that the decimal values can cause over flow if
they are large enough. Hence, modulus is computed by a large prime number.

4.      Rabin
karp worst case time complexity is O(m-(n-m+1)). And best and average case time
complexity is O (n+m) that makes it better approach of string matching.

5.
Analysis and comparison with other algo’s solving
same problem.

Comparisons

1.
Knuth
Morris Pratt Algorithms

Knuth morris pratt
algorithm is a string searching algorithm that takes linear time for search. A
table of prefixes is build for the string that is to be searched before the
string matching phase starts. The string matching starts with the left most
character in the pattern. prefix table is build only for mismatch strings.

Complexity

It compares string from
left to right. Before processing its time complexity is O(m) that assumes to be
longest length time complexity of the string of chracters. Time complexity of search
mode is O(n+m). overall Total comparisons performed by knuth morris pratt
algorithm is 2n-1 which includes the searching phase and the delay.

2.
Boyer
Moore Horspool Algorithm

Boyer Moore algorithm
performs searching from right to left. It begins the comparison from right to
left. If string mismatches, two pre computed functions are designed to shift
the string window to the right. These sliding functions are called good suffix
shift and bad character shift. This algorithm is the simplified form of
boyer-moore algorithm. The string that is to be searched is preprocessed to
build a table that is designed for bad match occurrence. It contain the length
that shifts. It has values for every character in the string.

Val=len-ind-1

EQ.5 By using shift
valuesof
table2. Eq.5 is used to calculate the values.

Table

Letter

t

e

A

M

Value

8

6

1

3

match table

Figure

Text t

R

a

B

I

Y

A

i

y

A

Pattern pt

Comparing from right
to left

Fig.3
overview of BMH algorithm

3.
Brute force
Algorithm

Brute force algorithm compares one character at a
time. It starts comparing from left to right until a mismatch is found. This algorithm
has no separate preprocessing phase. It matches first character of pattern with
first character of string. If it does not match then it move towards the next
character of the string and then compares second character of string with first
character of pattern.if match is found than it compares second character of
pattern with the next character of the string.

Figure

Text t

R

a

B

I

Y

a

B

i

R

a

b

i

Y

a

b

I

R

a

b

i

Y

a

b

i

Fig.4 Brute force example

Analysis

Time complexity

Table

Algorithm

preprocessing

Execution
time

Accuracy

Knuth Morris Pratt

O(M)

O(M+N)

65%

Brute Force

None

O(MN)

66.7%

Boyer Moore

O(M+N)

O(MN)

75%

Rabin Karp

O(N)

O(MN)

70%

Table.3 Analysis
by             calculations.

Old algorithm
studies

The time complexity of exact string pattern matching
can be improved if the most efficient algorithm is used. String matching
algorithms were studied with proteins and DNA of biological sequences. After
the analysis it was concluded that KMP algorithm is easier to implement because
it does not back track the given sequence and consume extra space. Rabin karp
algorithm is used to check plagiarism that also requires additional space.
Brute force does not require additional space or preprocessing. It is slow and
does not produce efficient result. BM algorithm is very fast for large
sequences. It discard lots of unnecessary comparisons. Rasool
et.al (2012) had done a research which shows that KMP is more
efficient than Brute force algorithm.

6.
Major finding

·
Rabin Karp algorithm is independent of length of
pattern

·
It runs in real time and require a constant number
of storage location.

·
They are easy and simple to understand and
implement.

·
This algorithm can be used to readily generalize to
high length pattern matching problems.

·
It is also used for multiple pattern matching
problem.

·
One of the application of Rabin karp is detecting
plagiarism.

·
It is easy to implement with good hashing function
affectivity.

·
This algorithm is not faster than brute force, but
its time complexity is O(n+m).

·
There are many other algorithms that are fatser than
O(n+m).

·
It requires additional space like brute force and is
practically slow.

·
It is not so faster than brute force

7.
Conclusion

In this paper,
string searching algorithm especially Rabin karp is in main focus. Rabin karp
is a great and effective algorithm. It has many advantages one of the best
advantage is that it is used in multiple pattern matching. This makes it a
perfect algorithm for large strings. Finding the correct text In today’s online
scenario is most important thing in minimum time. String matching algorithms
are used for it. Work on software and hardware levels is being done to make
pattern searching faster. Approximately best algorithm is determined by applying
algorithms in different applications. The best algorithm thus have less time
complexity and reduced computation time. Algorithms may not be best optimal
algorithms but better than the general algorithms that are tested in various
practical applications. From the study it is concluded that Rabin karp requires
extra space as it is used to detect plagiarism but brute force do not require
preprocessing of the string. The only problem is it is very slow. It sometimes
produces effective results. Additionally boyer moore algorithm is relatively
faster for large strings as it avoids extra comparisons.

8.
References

2  S.

Miachel O.Rabin, “Computer Algorithms- String Pattern Matching”.

Nuser M, Hussain I,” Journal of Computer Science Issues,”.

5  Knuth
D.E, Singla N” International Journal of Soft Computing and Engineering”.