File Coverage

blib/lib/String/Similarity.pm
Criterion Covered Total %
statement 6 6 100.0
branch n/a
condition n/a
subroutine 2 2 100.0
pod n/a
total 8 8 100.0


line stmt bran cond sub pod time code
1             =head1 NAME
2              
3             String::Similarity - calculate the similarity of two strings
4              
5             =head1 SYNOPSIS
6              
7             use String::Similarity;
8              
9             $similarity = similarity $string1, $string2;
10             $similarity = similarity $string1, $string2, $limit;
11              
12             =head1 DESCRIPTION
13              
14             =over 4
15              
16             =cut
17              
18             package String::Similarity;
19              
20 1     1   995 use Exporter;
  1         1  
  1         29  
21 1     1   4 use DynaLoader;
  1         1  
  1         109  
22              
23             $VERSION = '1.04';
24             @ISA = qw/Exporter DynaLoader/;
25             @EXPORT = qw(similarity);
26             @EXPORT_OK = qw(fstrcmp);
27              
28             bootstrap String::Similarity $VERSION;
29              
30             =item $factor = similarity $string1, $string2, [$limit]
31              
32             The C-function calculates the similarity index of
33             its two arguments. A value of C<0> means that the strings are
34             entirely different. A value of C<1> means that the strings are
35             identical. Everything else lies between 0 and 1 and describes the amount
36             of similarity between the strings.
37              
38             It roughly works by looking at the smallest number of edits to change one
39             string into the other.
40              
41             You can add an optional argument C<$limit> (default 0) that gives the
42             minimum similarity the two strings must satisfy. C stops
43             analyzing the string as soon as the result drops below the given limit,
44             in which case the result will be invalid but lower than the given
45             C<$limit>. You can use this to speed up the common case of searching for
46             the most similar string from a set by specifing the maximum similarity
47             found so far.
48              
49             =cut
50              
51             # out of historical reasons, I prefer "fstrcmp" as the original name.
52             *similarity = *fstrcmp;
53              
54             1;
55              
56             =back
57              
58             =head1 SEE ALSO
59              
60             The basic algorithm is described in:
61             "An O(ND) Difference Algorithm and its Variations", Eugene Myers,
62             Algorithmica Vol. 1 No. 2, 1986, pp. 251-266;
63             see especially section 4.2, which describes the variation used below.
64              
65             The basic algorithm was independently discovered as described in:
66             "Algorithms for Approximate String Matching", E. Ukkonen,
67             Information and Control Vol. 64, 1985, pp. 100-118.
68              
69             =head1 AUTHOR
70              
71             Marc Lehmann
72             http://home.schmorp.de/
73              
74             (the underlying fstrcmp function was taken from gnu diffutils and
75             modified by Peter Miller and Marc Lehmann
76             ).
77              
78              
79