File Coverage

blib/lib/Text/NSP.pm
Criterion     Covered   Total       %
statement           9       9   100.0
branch            n/a
condition         n/a
subroutine          3       3   100.0
pod               n/a
total              12      12   100.0


line stmt bran cond sub pod time code
1             =head1 NAME
2              
3             Text::NSP - Extract collocations and Ngrams from text
4              
5             =head1 SYNOPSIS
6              
7             =head2 Basic Usage
8              
9               use Text::NSP::Measures::2D::MI::ll;
10
11               my $npp = 60; my $n1p = 20; my $np1 = 20; my $n11 = 10;
12
13               my $ll_value = calculateStatistic( n11=>$n11,
14                                                  n1p=>$n1p,
15                                                  np1=>$np1,
16                                                  npp=>$npp);
17
18               if( (my $errorCode = getErrorCode()))
19               {
20                 print STDERR $errorCode." - ".getErrorMessage()."\n";
21               }
22               else
23               {
24                 print getStatisticName()." value for bigram is ".$ll_value."\n";
25               }
26              
27             =head1 DESCRIPTION
28              
29             The Ngram Statistics Package (NSP) is a collection of Perl modules
30             that aid in analyzing Ngrams in text files. We define an Ngram as a
31             sequence of 'n' tokens that occur within a window of at least 'n'
32             tokens in the text; what constitutes a "token" can be defined by the
33             user.
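As a rough illustration of the window idea for the bigram case (n = 2), the Perl sketch below pairs each token with every token that follows it inside a window of the given size. The helper name `window_bigrams` and the `w1<>w2` key style (loosely modelled on count.pl output) are illustrative assumptions, not NSP's actual counting code.

```perl
use strict;
use warnings;

# Hypothetical helper (not NSP's counting code): pair each token with
# every token that follows it inside a window of $window tokens, so a
# window of 2 yields only adjacent bigrams. Keys use a "w1<>w2" style
# loosely modelled on count.pl output.
sub window_bigrams {
    my ($tokens, $window) = @_;
    my %count;
    for my $i (0 .. $#$tokens) {
        # Last position still inside the window that starts at $i.
        my $last = $i + $window - 1;
        $last = $#$tokens if $last > $#$tokens;
        for my $j ($i + 1 .. $last) {
            $count{"$tokens->[$i]<>$tokens->[$j]"}++;
        }
    }
    return \%count;
}

my @tokens   = qw(the cat sat on the mat);
my $adjacent = window_bigrams(\@tokens, 2);
my $wider    = window_bigrams(\@tokens, 3);

# prints "5 distinct bigrams (window 2), 9 (window 3)"
printf "%d distinct bigrams (window 2), %d (window 3)\n",
    scalar(keys %$adjacent), scalar(keys %$wider);
```

Widening the window lets a bigram skip over intervening tokens, which is why the window-3 run also finds pairs such as `the<>sat`.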
34              
35             NSP.pm is a stub with no functionality of its own. It serves as
36             the top-level module in the hierarchy and groups the
37             Text::NSP::Count and Text::NSP::Measures modules.
38              
39             The modules under Text::NSP::Measures implement measures of
40             association that are used to evaluate whether the co-occurrence of the
41             words in an Ngram is purely due to chance or statistically significant.
42             Each measure computes a numerical score for an Ngram. This score can be
43             used to decide whether there is enough evidence to reject the null
44             hypothesis (that the words in the Ngram occur together only by chance,
45             i.e. independently) for that Ngram.
46              
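To make the idea of a score concrete, here is a sketch of the log-likelihood ratio (G²) for a bigram's 2x2 contingency table, computed from the same four counts the SYNOPSIS passes to calculateStatistic. It uses the textbook formula G² = 2 Σ nij ln(nij / mij), where mij is the count expected under independence; it is not taken from Text::NSP::Measures::2D::MI::ll, and the helper name `ll_score` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: log-likelihood ratio (G^2) for a 2x2 table.
# Assumes the counts have already been validated and npp > 0.
sub ll_score {
    my (%c) = @_;
    my ($n11, $n1p, $np1, $npp) = @c{qw(n11 n1p np1 npp)};

    # Fill in the remaining observed cells from the marginals.
    my %obs = (
        11 => $n11,
        12 => $n1p - $n11,
        21 => $np1 - $n11,
        22 => $npp - $n1p - $np1 + $n11,
    );
    my $n2p = $npp - $n1p;
    my $np2 = $npp - $np1;

    # Expected counts under independence: row total * column total / npp.
    my %exp = (
        11 => $n1p * $np1 / $npp,
        12 => $n1p * $np2 / $npp,
        21 => $n2p * $np1 / $npp,
        22 => $n2p * $np2 / $npp,
    );

    my $sum = 0;
    for my $cell (keys %obs) {
        next if $obs{$cell} == 0;    # 0 * log(0) is taken as 0
        $sum += $obs{$cell} * log($obs{$cell} / $exp{$cell});
    }
    return 2 * $sum;
}

# prints "G^2 = 3.6690"
printf "G^2 = %.4f\n", ll_score(n11 => 10, n1p => 20, np1 => 20, npp => 60);
```

For the SYNOPSIS counts this gives roughly 3.669, which is below 3.841, the usual chi-squared critical value for one degree of freedom at the 0.05 level, so by that conventional cutoff these counts would not be enough to reject the null hypothesis.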
47             To use one of the measures, you can either run the program statistic.pl
48             provided under the utils directory, or write your own driver program.
49             The program statistic.pl takes as input a list of Ngrams with their
50             frequencies (in the format output by count.pl) and runs a
51             user-selected statistical measure of association to compute the score
52             for each Ngram. The Ngrams, along with their scores, are output in
53             descending order of this score. For help on using utils/statistic.pl,
54             please refer to its perldoc (perldoc utils/statistic.pl).
55              
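The ranking step that statistic.pl is described as performing can be sketched as follows. The input layout assumed here (a first line holding the total bigram count, then one `w1<>w2<>n11 n1p np1` line per bigram) is modelled on count.pl output but should be checked against count.pl's own perldoc; `rank_bigrams` and `pmi` are hypothetical helpers, and pointwise mutual information simply stands in for whichever measure the user selects.

```perl
use strict;
use warnings;

# Hypothetical scoring function: pointwise mutual information, log2 of
# observed n11 over the count expected under independence.
# Assumes all counts are positive.
sub pmi {
    my (%c) = @_;
    return log($c{n11} * $c{npp} / ($c{n1p} * $c{np1})) / log(2);
}

# Hypothetical driver: parse count.pl-style lines (assumed layout: first
# line is npp, then "w1<>w2<>n11 n1p np1"), score each bigram with the
# supplied code reference, and return the rows sorted by descending score.
sub rank_bigrams {
    my ($lines, $score) = @_;
    my @rows = @$lines;          # copy so the caller's data is untouched
    my $npp  = shift @rows;
    chomp $npp;
    my @ranked;
    for my $line (@rows) {
        chomp $line;
        my @fields = split /<>/, $line;     # tokens first, counts last
        my ($n11, $n1p, $np1) = split ' ', pop @fields;
        push @ranked, {
            ngram => join(' ', @fields),
            score => $score->(n11 => $n11, n1p => $n1p,
                              np1 => $np1, npp => $npp),
        };
    }
    return sort { $b->{score} <=> $a->{score} } @ranked;
}

# Made-up example input, not real corpus counts.
my @input = ("60\n", "new<>york<>10 12 11\n", "of<>the<>8 20 30\n");

# "new york" is printed before "of the"
for my $row (rank_bigrams(\@input, \&pmi)) {
    printf "%-10s %.3f\n", $row->{ngram}, $row->{score};
}
```

Passing the scoring function as a code reference keeps parsing and sorting independent of the measure, loosely mirroring how statistic.pl lets the user pick one.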
56             If you are writing your own driver program, a basic usage example is
57             provided above under SYNOPSIS. For further clarification please refer
58             to the documentation of Text::NSP::Measures (perldoc
59             Text::NSP::Measures).
60              
61              
62             =head2 Error Codes
63              
64             The following list describes the error codes used in the
65             implementation.
66
67             Error codes common to all the association measures:
68
69             100 - Trying to create an object of an abstract class.
70
71             200 - One of the required values is missing.
72
73             201 - One of the observed frequencies comes out negative.
74
75             202 - A frequency value (n11) exceeds the total number of
76             bigrams (npp) or a marginal total (n1p, np1).
77
78             203 - One of the marginal totals (n1p, np1) exceeds the total
79             bigram count (npp).
80
81             204 - One of the marginal totals is negative.
82
83             Error codes used by the mutual information measures:
84
85             211 - One of the expected values is zero.
86
87             212 - One of the expected values is negative.
88
89
90             Error codes used by the CHI measures:
91
92             221 - One of the expected values is zero.
93              
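The conditions behind these codes can be expressed as a validation routine. The sketch below is hypothetical: `check_counts`, the order in which it tests the conditions, and its check of the derived cell n22 are assumptions rather than Text::NSP's implementation, and it covers only the common codes plus 211.

```perl
use strict;
use warnings;

# Hypothetical validator that maps problems with a bigram's counts onto
# the error codes listed above; returns 0 when the counts look usable.
# The order of the checks is an assumption, not taken from Text::NSP.
sub check_counts {
    my (%c) = @_;
    for my $key (qw(n11 n1p np1 npp)) {
        return 200 unless defined $c{$key};    # required value missing
    }
    my ($n11, $n1p, $np1, $npp) = @c{qw(n11 n1p np1 npp)};

    return 201 if $n11 < 0;                    # negative observed count
    return 204 if $n1p < 0 || $np1 < 0;        # negative marginal
    return 202 if $n11 > $npp || $n11 > $n1p || $n11 > $np1;
    return 203 if $n1p > $npp || $np1 > $npp;

    # n12 and n21 are non-negative once 202 has passed; n22 may not be.
    return 201 if $npp - $n1p - $np1 + $n11 < 0;

    # An expected value (row total * column total / npp) is zero exactly
    # when the product of its marginals is zero: code 211 for MI measures.
    my ($n2p, $np2) = ($npp - $n1p, $npp - $np1);
    for my $product ($n1p * $np1, $n1p * $np2, $n2p * $np1, $n2p * $np2) {
        return 211 if $product == 0;
    }
    return 0;
}

printf "%d\n", check_counts(n11 => 10, n1p => 20, np1 => 20, npp => 60);  # prints 0
printf "%d\n", check_counts(n11 => 1,  n1p => 40, np1 => 40, npp => 60);  # prints 201
```

Code 212 never fires here, because once 203 and 204 have passed every marginal lies between 0 and npp, so no expected value can be negative. A CHI measure would report the zero-expected-value case as 221 instead of 211.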
94             =head2 Methods
95              
96             =over
97              
98             =cut
99              
100             package Text::NSP;
101              
102 29     29   765 use strict;
  29         47  
  29         1263  
103 29     29   136 use Carp;
  29         50  
  29         2405  
104 29     29   136 use warnings;
  29         45  
  29         2223  
105              
106             our ($VERSION, @ISA);
107              
108             @ISA = qw(Exporter);
109              
110             $VERSION = '1.29';
111              
112             1;
113              
114             __END__