File Coverage

blib/lib/Statistics/Sequences.pm
Criterion    Covered  Total      %
statement         10     12   83.3
branch                         n/a
condition                      n/a
subroutine         4      4  100.0
pod                            n/a
total             14     16   87.5


line stmt bran cond sub pod time code
1             package Statistics::Sequences;
2 2     2   53120 use strict;
  2         4  
  2         85  
3 2     2   9 use warnings FATAL => 'all';
  2         4  
  2         84  
4 2     2   10 use Carp qw(croak cluck);
  2         11  
  2         193  
5 2     2   941 use Statistics::Data 0.08;
  0            
  0            
6             use base qw(Statistics::Data);
7             use Scalar::Util qw(looks_like_number);
8             our $VERSION = '0.12';
9              
10             =pod
11              
12             =head1 NAME
13              
14             Statistics::Sequences - Manage sequences (ordered lists of literals) for testing their runs, joins, turns, trinomes, potential energy, etc.
15              
16             =head1 VERSION
17              
18             This is documentation for Version 0.12 of Statistics::Sequences.
19              
20             =head1 SYNOPSIS
21              
22                 use Statistics::Sequences 0.12;
23                 my $seq = Statistics::Sequences->new();
24                 my @data = (1, 'a', 'a', 1); # ordered list of literal scalars (numbers, strings), as permitted by the specific test
25                 $seq->load(\@data); # or: $seq->load(@data); or: $seq->load(dataname => \@data);
26                 print $seq->observed(stat => 'runs'); # likewise expected, variance, z_value, p_value; requires Statistics::Sequences::Runs
27                 print $seq->test(stat => 'vnomes', length => 2); # requires Statistics::Sequences::Vnomes
28                 $seq->dump(stat => 'runs', values => {observed => 1, z_value => 1, p_value => 1}, exact => 1, tails => 1);
29                 # see also Statistics::Data for inherited methods
30              
31             =head1 DESCRIPTION
32              
33             This module loads, updates and provides access to data as an ordered list of literal scalars (numbers, strings) for statistical tests of their sequential structure via L<Statistics::Sequences::Joins|Statistics::Sequences::Joins>, L<Statistics::Sequences::Pot|Statistics::Sequences::Pot>, L<Statistics::Sequences::Runs|Statistics::Sequences::Runs>, L<Statistics::Sequences::Turns|Statistics::Sequences::Turns> and L<Statistics::Sequences::Vnomes|Statistics::Sequences::Vnomes>. Note that none of these sub-modules is installed by default; to use this module as intended, install one or more of them.
34              
35             To access the tests, L<use|perlfunc/use> this base module to create a Statistics::Sequences object with L</new>, then L<load|/"load, add, access, unload"> data into it, and access each test by calling the L<test|/"p_value, test"> method, specifying the B<stat> attribute as one of joins, pot, runs, turns or vnomes, provided the relevant sub-module is installed. This allows several tests to be run on the same data, as the data are immediately available to each test. See the L<SYNOPSIS|Statistics::Sequences/SYNOPSIS> for a simple example.
36              
37             Alternatively, L<use|perlfunc/use> a sub-module directly and restrict analyses to its test; this module is then used implicitly as its base. That is, to perform a test of one type (e.g., runs), L<use|perlfunc/use> the relevant sub-package, create an object with its constructor, and load data into it, as shown in the SYNOPSIS for that test: L<Joins|Statistics::Sequences::Joins/SYNOPSIS>, L<Pot|Statistics::Sequences::Pot/SYNOPSIS>, L<Runs|Statistics::Sequences::Runs/SYNOPSIS>, L<Turns|Statistics::Sequences::Turns/SYNOPSIS> or L<Vnomes|Statistics::Sequences::Vnomes/SYNOPSIS>. With this approach, other tests cannot be applied to the same data unless you create another object for each test and pass the data from the earlier object into the new one.
38              
39             =head1 SUBROUTINES/METHODS
40              
41             =head2 new
42              
43                 $seq = Statistics::Sequences->new();
44              
45             Returns a new Statistics::Sequences object (inherited from L<Statistics::Data|Statistics::Data>) through which all the methods for caching, reading and testing data can be accessed, including each of the methods for performing the L<Runs-|Statistics::Sequences::Runs>, L<Joins-|Statistics::Sequences::Joins>, L<Pot-|Statistics::Sequences::Pot>, L<Turns-|Statistics::Sequences::Turns> or L<Vnomes-|Statistics::Sequences::Vnomes> tests.
46              
47             Sub-packages also have their own C<new> method, so, e.g., L<Statistics::Sequences::Runs|Statistics::Sequences::Runs> can be imported individually and its own L<new|Statistics::Sequences::Runs/new> method called:
48              
49                 use Statistics::Sequences::Runs;
50                 $runs = Statistics::Sequences::Runs->new();
51              
52             In this case, data are not automatically shared across packages, and only one test (here, the Runs-test) can be accessed through the object returned by L<new|Statistics::Sequences::Runs/new>.
53              
54             =head2 load, add, access, unload
55              
56             All these operations on the basic data are inherited from L<Statistics::Data|Statistics::Data>; see its documentation for details of these and other available methods.
57              
58             B<Dichotomous data>: Both the runs- and joins-tests expect dichotomous data: a binary (Bernoulli) sequence, using any two symbols to represent the two possible events. Both tests check the loaded data to make sure they are dichotomous. To reduce numerical and categorical data to a dichotomous level, see the L<pool|Statistics::Data::Dichotomize/pool>, L<match|Statistics::Data::Dichotomize/match>, L<split|Statistics::Data::Dichotomize/split, cut>, L<swing|Statistics::Data::Dichotomize/swing>, L<shrink (boolwin)|Statistics::Data::Dichotomize/shrink, boolwin> and other methods in L<Statistics::Data::Dichotomize|Statistics::Data::Dichotomize>.
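A dichotomy check of the kind described above can be sketched as follows. This is a minimal illustration, not the sub-modules' own code: `is_dichotomous` is a hypothetical helper, and it assumes that "dichotomous" means exactly two distinct values, compared as strings. The Runs and Joins modules may treat edge cases (such as a sequence holding only one value) differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences): returns 1 if the
# list holds exactly two distinct values, else 0. Values are compared as
# strings, so 1 and '1.0' would count as two different events.
sub is_dichotomous {
    my @seq = @_;
    my %seen;
    $seen{$_}++ for @seq;
    return scalar(keys %seen) == 2 ? 1 : 0;
}

print is_dichotomous(1, 'a', 'a', 1) ? "dichotomous\n" : "not dichotomous\n";
```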
59              
60             =head2 observed, observation
61              
62                 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
63                 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # data given directly, plus any args needed by the stat
64                 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', label => 'myLabelledLoadedData'); # named data from cache, plus any args needed by the stat
65              
66             Returns the observed value of the statistic for the L<load|/"load, add, access, unload">ed data, or for data passed with this call; e.g., the number of runs in the sequence (1, 1, 0, 1), which is 3. See the particular statistic's manpage for any other required or optional arguments.
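For the runs statistic, the observed value can be illustrated with a short sketch. `count_runs` is a hypothetical helper, not the Runs sub-module's implementation; it assumes a run is a maximal stretch of identical consecutive values, so (1, 1, 0, 1) splits into (1 1)(0)(1).

```perl
use strict;
use warnings;

# Hypothetical helper: a run is a maximal stretch of identical consecutive
# values, so the count is 1 plus the number of adjacent pairs that differ.
sub count_runs {
    my @seq = @_;
    return 0 if !@seq;
    my $runs = 1;
    for my $i (1 .. $#seq) {
        $runs++ if $seq[$i] ne $seq[$i - 1];
    }
    return $runs;
}

print count_runs(1, 1, 0, 1), "\n";    # (1 1)(0)(1): prints 3
```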
67              
68             =cut
69              
70             sub observed { return _feedme('observed', @_); } *observation = \&observed;
71              
72             =head2 expected, expectation
73              
74                 $v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
75                 $v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # data given directly, plus any args needed by the stat
76              
77             Returns the expected value of the statistic for the L<load|/"load, add, access, unload">ed data, or for data passed with this call; e.g., how many runs should occur in a sequence of length 4 composed of two possible events. See the statistic's manpage for any other required or optional arguments.
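As a worked illustration for runs, the sketch below uses the Wald-Wolfowitz expectation, 1 + 2 * n1 * n2 / (n1 + n2), computed from the observed counts n1 and n2 of the two events. That formula is an assumption here: the Runs sub-module may condition on different quantities, so treat `expected_runs` (a hypothetical helper) as an illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: Wald-Wolfowitz expectation of the number of runs,
# conditional on the observed counts of the two events.
sub expected_runs {
    my @seq = @_;
    my %n;
    $n{$_}++ for @seq;
    die "Expected exactly two distinct values\n" if scalar(keys %n) != 2;
    my ($n1, $n2) = values %n;
    my $total = $n1 + $n2;
    return 1 + 2 * $n1 * $n2 / $total;
}

print expected_runs(1, 1, 0, 1), "\n";    # n1 = 3, n2 = 1: prints 2.5
```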
78              
79             =cut
80              
81             sub expected { return _feedme('expected', @_); } *expectation = \&expected;
82              
83             =head2 variance
84              
85                 $v = $seq->variance(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
86                 $v = $seq->variance(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # data given directly, plus any args needed by the stat
87              
88             Returns the variance of the statistic, i.e., the expected squared deviation of its observed value from its expected value, for the given number of trials.
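Continuing the same assumption (the Wald-Wolfowitz runs test), the variance of the number of runs given counts n1 and n2, with N = n1 + n2, is 2*n1*n2*(2*n1*n2 - N) / (N**2 * (N - 1)). The hypothetical helper `runs_variance` below takes the two counts directly; for (1, 1, 0, 1), n1 = 3 and n2 = 1.

```perl
use strict;
use warnings;

# Hypothetical helper: Wald-Wolfowitz variance of the number of runs,
# given the counts of the two events.
sub runs_variance {
    my ($n1, $n2) = @_;
    my $total = $n1 + $n2;
    die "Need at least two observations\n" if $total < 2;
    my $prod = 2 * $n1 * $n2;
    return $prod * ($prod - $total) / ($total**2 * ($total - 1));
}

print runs_variance(3, 1), "\n";    # 6 * 2 / (16 * 3): prints 0.25
```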
89              
90             =cut
91              
92             sub variance { return _feedme('variance', @_); }
93              
94             =head2 obsdev, observed_deviation
95              
96                 $v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
97                 $v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # data given directly, plus any args needed by the stat
98              
99             Returns the deviation of (difference between) observed and expected values of the statistic for the loaded/given sequence (I<O> - I<E>).
100              
101             =cut
102              
103             sub obsdev {
104                 return observed(@_) - expected(@_);
105             }
106             *observed_deviation = \&obsdev;
107              
108             =head2 stdev, standard_deviation
109              
110                 $v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
111                 $v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # data given directly, plus any args needed by the stat
112              
113             Returns the square root of the variance.
114              
115             =cut
116              
117             sub stdev {
118                 return sqrt variance(@_);
119             }
120             *standard_deviation = \&stdev;
121              
122             =head2 z_value, zscore
123              
124                 $v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
125                 $v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # data given directly, plus any args needed by the stat
126              
127             Returns the deviation ratio, i.e., the observed deviation divided by the standard deviation. Set the argument B<ccorr> to a true value to apply a continuity correction.
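The chain from observed deviation (obsdev) through standard deviation (stdev) to the deviation ratio can be sketched as below. `z_score` is a hypothetical helper; its continuity correction (shrinking the absolute deviation by 0.5, but not past zero) is a common convention assumed here, and Statistics::Zed may apply its own rule. The inputs reuse the runs figures for (1, 1, 0, 1) under the Wald-Wolfowitz assumption: O = 3, E = 2.5, V = 0.25.

```perl
use strict;
use warnings;

# Hypothetical helper: z = (O - E) / sqrt(V). With $ccorr true, the absolute
# deviation is reduced by 0.5 (assumed convention), never below zero.
# Dies on division by zero if the variance is 0.
sub z_score {
    my ($observed, $expected, $variance, $ccorr) = @_;
    my $obsdev = $observed - $expected;    # obsdev: O - E
    my $stdev  = sqrt $variance;           # stdev: square root of V
    if ($ccorr) {
        my $abs = abs($obsdev) - 0.5;
        $abs = 0 if $abs < 0;
        $obsdev = $obsdev < 0 ? -$abs : $abs;
    }
    return $obsdev / $stdev;
}

printf "%.2f\n", z_score(3, 2.5, 0.25);       # prints 1.00
printf "%.2f\n", z_score(3, 2.5, 0.25, 1);    # prints 0.00
```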
128              
129             =cut
130              
131             sub zscore { return _feedme('zscore', @_); } *z_value = \&zscore;
132              
133             =head2 p_value, test
134              
135                 $p = $seq->test(stat => 'runs');
136                 $p = $seq->test(stat => 'joins');
137                 $p = $seq->test(stat => 'turns');
138                 $p = $seq->test(stat => 'pot', state => 'a value appearing in the data');
139                 $p = $seq->test(stat => 'vnomes', length => 'an integer greater than zero and less than sample-size');
140              
141             Returns the probability of a deviation from the expected number of runs, joins, etc., at least as large as the one observed, relative to the expected variance.
142              
143             When using a Statistics::Sequences object, this method requires naming which test to perform, i.e., runs, joins, pot, turns or vnomes. This is I<not> required when the object already belongs to one of the sub-modules, as created by the C<new> method within L<Statistics::Sequences::Runs|Statistics::Sequences::Runs/new>, L<Statistics::Sequences::Joins|Statistics::Sequences::Joins/new>, L<Statistics::Sequences::Pot|Statistics::Sequences::Pot/new>, L<Statistics::Sequences::Turns|Statistics::Sequences::Turns/new> and L<Statistics::Sequences::Vnomes|Statistics::Sequences::Vnomes/new>.
144              
145             =head3 Common options
146              
147             Options common to all the sub-package tests are as follows.
148              
149             =over 8
150              
151             =item data => 'I<string>'
152              
153             Optionally specify the name of the data to be tested. By default, this is not required: the data tested are those that were last loaded, either anonymously or as returned by one of the L<Statistics::Data::Dichotomize|Statistics::Data::Dichotomize> methods. Otherwise, I<if the data are already in a dichotomous format ready for testing>, data that were previously loaded by name can be tested individually. For example, here two sets of data are loaded by name, and then a single test of one of them is performed:
154              
155                 @chimps = (qw/banana banana cheese banana cheese banana banana banana/);
156                 @mice = (qw/banana cheese cheese cheese cheese cheese cheese cheese/);
157                 $seq->load(chimps => \@chimps, mice => \@mice);
158                 $p = $seq->test(stat => 'runs', data => 'chimps');
159              
160             =item ccorr => I<boolean>
161              
162             Specify whether or not to perform the continuity-correction on the observed deviation. Default is false. Relevant only for those tests relying on a I<Z>-test. See L<Statistics::Zed|Statistics::Zed>.
163              
164             =item tails => I<1>|I<2>
165              
166             Specify whether the probability is calculated from both tails of the normal (or chi-square) distribution (2, the default for most tested data) or from only one tail (1, the default for data prepared with the B<swing> method).
167              
168             =back
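How B<tails> changes the probability can be sketched for the normal case. The two-tailed probability of a z-value is erfc(|z| / sqrt(2)), and the one-tailed probability is half of that. `p_from_z` is a hypothetical helper, not Statistics::Zed's interface; it relies on `POSIX::erfc`, available from Perl 5.22, and its one-tailed value ignores the direction of z.

```perl
use strict;
use warnings;
use POSIX ();    # POSIX::erfc requires Perl 5.22 or later

# Hypothetical helper: normal-distribution p-value for a z-value.
# Two-tailed: P(|Z| >= |z|) = erfc(|z| / sqrt(2)); one-tailed is half of that.
sub p_from_z {
    my ($z, $tails) = @_;
    $tails ||= 2;
    my $two_tailed = POSIX::erfc(abs($z) / sqrt 2);
    return $tails == 1 ? $two_tailed / 2 : $two_tailed;
}

printf "%.4f\n", p_from_z(1.96, 2);    # prints 0.0500
printf "%.4f\n", p_from_z(1.96, 1);    # prints 0.0250
```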
169              
170             =head3 Test-specific required settings and options
171              
172             Some sub-package tests need to have parameters defined in the call to L<test|/"p_value, test">, and/or have specific options, as follows.
173              
174             B<Joins> : The Joins test I<optionally> allows the setting of a probability value; see C<test> in the L<Statistics::Sequences::Joins|Statistics::Sequences::Joins/test> manpage.
175              
176             B<Pot> : The Pot test I<requires> the setting of a state to be tested; see C<test> in the L<Statistics::Sequences::Pot|Statistics::Sequences::Pot/test> manpage.
177              
178             B<Vnomes> : The Serial test for v-nomes I<requires> a length, i.e., the value of I<v>; see C<test> in the L<Statistics::Sequences::Vnomes|Statistics::Sequences::Vnomes/test> manpage.
179              
180             B<Runs>, B<Turns> : There are presently no specific requirements or options for the Runs- and Turns-tests.
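What the B<length> (v) parameter of the Vnomes test refers to can be illustrated by counting the overlapping subsequences of length v. `count_vnomes` is a hypothetical helper: it only tallies v-nome frequencies, the raw material of a serial test, and does not compute the test statistic; the Vnomes sub-module may also treat the sequence differently (e.g., circularly). Following the documented constraint that v be greater than zero and less than the sample size, it dies otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: frequencies of the overlapping v-nomes (subsequences
# of length $v); elements are joined with a space to form the hash keys.
sub count_vnomes {
    my ($v, @seq) = @_;
    die "Length must be greater than zero and less than the sequence length\n"
        if $v < 1 || $v >= @seq;
    my %freq;
    for my $i (0 .. @seq - $v) {
        $freq{ join q{ }, @seq[$i .. $i + $v - 1] }++;
    }
    return \%freq;
}

my $freq = count_vnomes(2, qw/a b a a b/);
print "$_: $freq->{$_}\n" for sort keys %{$freq};    # a a: 1, a b: 2, b a: 1
```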
181              
182             =cut
183              
184             sub p_value { return _feedme('p_value', @_); } *test = \&p_value;
185              
186             =head2 stats_hash
187              
188                 $href = $seq->stats_hash(stat => 'runs', values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1});
189              
190             Returns a hashref with values for any of the descriptives and the probability value relevant to the specified B<stat>istic; if none are requested, the observed value and p-value are returned. Include other required or optional arguments relevant to any of the values requested, e.g., B<ccorr> if getting a z_value, B<tails> and B<exact> if getting a p_value, B<state> if testing pot, B<prob> if testing joins, ... B<precision_s>, B<precision_p> ...
191              
192             =cut
193              
194             sub stats_hash {
195                 my $self = shift;
196                 my $args = ref $_[0] ? $_[0] : {@_};
197                 my @methods = keys %{$args->{'values'}};
198                 my (%stats_hash) = ();
199                 no strict 'refs';
200                 foreach my $method(@methods) {
201                     if ($args->{'values'}->{$method} == 1) {
202                         eval {$stats_hash{$method} = $self->$method($args);};
203                         croak "Method $method is not defined or correctly called for " . __PACKAGE__ if $@;
204                     }
205                 }
206                 if (! scalar keys %stats_hash) { # get default stats:
207                     foreach my $method(qw/observed p_value/) {
208                         eval {$stats_hash{$method} = $self->$method($args);};
209                         croak "Method $method is not defined or correctly called for " . __PACKAGE__ if $@;
210                     }
211                 }
212                 return \%stats_hash;
213             }
214              
215             =head2 dump
216              
217                 $seq->dump(stat => 'runs|joins|pot ...', values => {}, format => 'csv|labline|table', flag => '1|0', precision_s => 'integer', precision_p => 'integer');
218              
219             I<Alias>: B<print_summary>
220              
221             Prints the results of the last-conducted test to STDOUT. By default, if no parameters to C<dump> are passed, a single line of test statistics is printed. Options are as follows.
222              
223             =over 8
224              
225             =item values => hashref
226              
227             Hashref of the statistical parameters to dump. Default is observed value and p-value for the given B<stat>.
228              
229             =item flag => I<boolean>
230              
231             If true, the I<p>-value associated with the I<z>-value is followed by a single asterisk if the value is below .05, and by two asterisks if it is below .01.
232              
233             If false (default), nothing is appended to the I<p>-value.
234              
235             =item format => 'table|labline|csv'
236              
237             Default is 'csv', to print the stats hash as a comma-separated string (no newline), e.g., "4.0000,0.8596800". If specifying 'labline', you get something like "observed = 4.0000, p_value = 0.8596800\n". If specifying 'table', this is a dump from L<Text::SimpleTable|Text::SimpleTable> with the stat methods as headers and each column's width set to the longest value to be printed (given the level of precision, flag, etc.), with a minimum of 8 characters. For example, with B<precision_s> => 4 and B<precision_p> => 7, you get:
238              
239                 .-----------+-----------.
240                 | observed  | p_value   |
241                 +-----------+-----------+
242                 | 4.0000    | 0.8596800 |
243                 '-----------+-----------'
244              
245             =item verbose => 1|0
246              
247             If true, includes a title giving the name of the statistic, details about the hypothesis tested (if B<p_value> => 1 in the B<values> hashref), etc. No effect if B<format> is not defined or equals 'csv'.
248              
249             =item precision_s => 'I<non-negative integer>'
250              
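251             Number of decimal places to which the statistic values (observed, expected, variance, obsdev, stdev, z_value) are rounded. If zero or undefined, the values are printed unrounded.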
252              
253             =item precision_p => 'I<non-negative integer>'
254              
255             Number of decimal places to which the probability associated with the I<z>-value is rounded. If zero or undefined, you get everything available.
256              
257             =back
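The 'csv' and 'labline' layouts, with rounding and flagging, can be sketched independently of the module. `format_stats` is a hypothetical helper, not the C<dump> method itself: it only handles the standard statistic names, in a fixed order, and it tests the unrounded p-value when flagging.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's dump method): formats a hash of
# statistics in the 'csv' or 'labline' layout, rounding with sprintf and
# optionally flagging the p-value with * (< .05) or ** (< .01).
sub format_stats {
    my (%opt) = @_;
    my $format = $opt{format} || 'csv';
    my @order  = grep { exists $opt{stats}{$_} }
        qw/observed expected variance obsdev stdev z_value p_value/;
    my @strs;
    for my $name (@order) {
        my $val  = $opt{stats}{$name};
        my $prec = $name eq 'p_value' ? $opt{precision_p} : $opt{precision_s};
        my $str  = $prec ? sprintf("%.${prec}f", $val) : $val;
        if ($name eq 'p_value' && $opt{flag}) {
            $str .= $val < .01 ? '**' : $val < .05 ? '*' : q{};
        }
        push @strs, $format eq 'labline' ? "$name = $str" : $str;
    }
    return $format eq 'labline' ? join(q{, }, @strs) . "\n" : join(q{,}, @strs);
}

my %stats = (observed => 4, p_value => 0.85968);
print format_stats(stats => \%stats, precision_s => 4, precision_p => 7), "\n";
# 4.0000,0.8596800
print format_stats(stats => \%stats, format => 'labline', precision_s => 4, precision_p => 7);
# observed = 4.0000, p_value = 0.8596800
```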
258              
259             =cut
260              
261             sub dump {
262                 my $self = shift;
263                 my $args = ref $_[0] ? $_[0] : {@_};
264                 my $stats_hash = $self->stats_hash($args);
265                 $args->{'format'} ||= 'csv';
266                 my @standard_methods = (qw/observed expected variance obsdev stdev z_value p_value/);
267                 my ($maxlen, @strs, @headers, @wanted_methods) = (0);
268                 foreach my $method(@standard_methods) { # set up what has been requested in a meaningful order:
269                     push(@wanted_methods, $method) if defined $stats_hash->{$method};
270                 }
271                 foreach my $method(keys %{$stats_hash}) {
272                     push(@wanted_methods, $method) if ! grep { $_ eq $method } @wanted_methods; # exact match, not a regex substring match
273                 }
274                 foreach my $method(@wanted_methods) {
275                     my $val = delete $stats_hash->{$method};
276                     my $len;
277                     if ($method eq 'p_value') {
278                         $val = _precisioned($args->{'precision_p'}, $val);
279                         $val .= ($val < .05 ? ($val < .01 ? q{**} : q{*} ) : q{}) if $args->{'flag'};
280                     }
281                     else {
282                         if (ref $val) {
283                             if (ref $val eq 'HASH') {
284                                 my %vals = %{$val};
285                                 $val = q{};
286                                 my $delim = $args->{'format'} eq 'table' ? "\n" : q{,};
287                                 my ($str, $this_len) = (q{});
288                                 while (my($k, $v) = each %vals) {
289                                     $str = "'$k' = $v";
290                                     $this_len = length($str);
291                                     $len = $this_len if not defined $len or $this_len > $len;
292                                     $val .= $str;
293                                     $val .= $delim;
294                                 }
295                                 if ($args->{'format'} ne 'table') {
296                                     chop $val;
297                                     $val = '(' . $val . ')';
298                                 }
299                             }
300                             else {
301                                 $val = join q{, }, @{$val};
302                             }
303                         }
304                         elsif(looks_like_number($val)) {
305                             $val = _precisioned($args->{'precision_s'}, $val);
306                         }
307                     }
308                     push @headers, $method;
309                     push(@strs, $val);
310                     $len = length $val if ! defined $len;
311                     $maxlen = $len if $len > $maxlen;
312                 }
313                 if ($args->{'format'} eq 'table') {
314                     $maxlen = 8 if $maxlen < 8;
315                     my $title = $args->{'verbose'} ? ucfirst($args->{'stat'}) . " statistics\n" : q{};
316                     print $title or croak 'Cannot print title for data-table';
317                     my @hh = ();
318                     push( @hh, [$maxlen, $_]) foreach @headers;
319                     require Text::SimpleTable;
320                     my $tbl = Text::SimpleTable->new(@hh);
321                     $tbl->row(@strs);
322                     print $tbl->draw or croak 'Cannot print data-table';
323                 }
324                 elsif ($args->{'format'} eq 'labline') {
325                     my @hh;
326                     for (my $i = 0; $i <= $#strs; $i++) {
327                         $hh[$i] = "$headers[$i] = $strs[$i]";
328                     }
329                     my $str = join(q{, }, @hh);
330                     if ($args->{'verbose'}) {
331                         $str = ucfirst($args->{'stat'}) . ': ' . $str;
332                     }
333                     print {*STDOUT} $str, "\n" or croak 'Cannot print data-string';
334                 }
335                 else { # csv
336                     print join(q{,}, @strs) or croak 'Cannot print data-string';
337                 }
338                 return;
339             }
340             *print_summary = \&dump;
341              
342             =head2 dump_data
343              
344                 $seq->dump_data(delim => "\n");
345              
346             Prints to STDOUT a space-separated line of the tested data, as dichotomized and put to test. Optionally, give a value for B<delim> to specify how the datapoints should be separated. Inherited from L<Statistics::Data|Statistics::Data/dump_data>.
347              
348             =cut
349              
350             # PRIVATE METHODS
351              
352             sub _feedme {
353                 my $method = shift;
354                 my $self = shift;
355                 my $args = ref $_[0] ? $_[0] : {@_};
356                 my $statname = $args->{'stat'} || q{};
357                 my $class = __PACKAGE__ . q{::} . ucfirst($statname);
358                 eval "require $class";
359                 croak __PACKAGE__, " error: Requested sequences module '$class' is not valid/available. You might need to install '$class'" if $@;
360                 my ($val, $nself) = (q{}, {});
361                 #my $nself = {};
362                 bless($nself, $class);#$nself = $class->new();
363                 $nself->{$_} = $self->{$_} foreach keys %{$self};
364                 no strict 'refs';
365                 eval '$val = $nself->$method($args)'; # but does not trap "deep recursion" if method not defined
366                 croak __PACKAGE__, " error: Method '$method' is not defined for $class" if $@;
367                 $self->{'stat'} = $statname;
368                 return $val;
369             }
370              
371             sub _precisioned {
372                 return $_[0] ? sprintf(q{%.} . $_[0] . 'f', $_[1]) : (defined $_[1] ? $_[1] : q{}); # don't lose any zero
373             }
374              
375             =head1 BUNDLING
376              
377             This module loads its sub-modules at runtime (via C<require>) rather than C<use>-ing them, so a bundled program using this module might need to explicitly C<use> the sub-modules it needs if these are to be included in the bundle itself.
378              
379             =head1 AUTHOR
380              
381             Roderick Garton, C<< <rgarton at cpan.org> >>
382              
383             =head1 LICENSE AND COPYRIGHT
384              
385             =over 4
386              
387             =item Copyright (c) 2006-2013 Roderick Garton
388              
389             This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see L<http://www.perl.com/perl/misc/Artistic.html>).
390              
391             =item Disclaimer
392              
393             To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
394              
395             =back
396              
397             =cut
398              
399             1; # end of Statistics::Sequences