File Coverage

blib/lib/Statistics/Sequences.pm

Criterion	Covered	Total	%
statement	10	12	83.3
branch			n/a
condition			n/a
subroutine	4	4	100.0
pod			n/a
total	14	16	87.5

line	stmt	sub	time	code
1				package Statistics::Sequences;
2	2	2	53120	use strict;
	2		4
	2		85
3	2	2	9	use warnings FATAL => 'all';
	2		4
	2		84
4	2	2	10	use Carp qw(croak cluck);
	2		11
	2		193
5	2	2	941	use Statistics::Data 0.08;
	0
	0
6				use base qw(Statistics::Data);
7				use Scalar::Util qw(looks_like_number);
8				our $VERSION = '0.12';
9
10				=pod
11
12				=head1 NAME
13
14				Statistics::Sequences - Manage sequences (ordered list of literals) for testing their runs, joins, turns, trinomes, potential energy, etc.
15
16				=head1 VERSION
17
18				This is documentation for Version 0.12 of Statistics::Sequences.
19
20				=head1 SYNOPSIS
21
22				use Statistics::Sequences 0.12;
23				$seq = Statistics::Sequences->new();
24				my @data = (1, 'a', 'a', 1); # ordered list of literal scalars (numbers, strings), as permitted by specific test
25				$seq->load(\@data); # or @data or dataname => \@data
26				print $seq->observed(stat => 'runs'); # expected, variance, z_value, p_value - assuming sub-module Runs.pm is installed
27				print $seq->test(stat => 'vnomes', length => 2); # assuming sub-module Vnomes.pm is installed
28				$seq->dump(stat => 'runs', values => {observed => 1, z_value => 1, p_value => 1}, exact => 1, tails => 1);
29				# see also Statistics::Data for inherited methods
30
31				=head1 DESCRIPTION
32
33				Loading, updating and accessing data as an ordered list of literal scalars (numbers, strings) for statistical tests of their sequential structure via L<Statistics::Sequences::Joins|Statistics::Sequences::Joins>, L<Statistics::Sequences::Pot|Statistics::Sequences::Pot>, L<Statistics::Sequences::Runs|Statistics::Sequences::Runs>, L<Statistics::Sequences::Turns|Statistics::Sequences::Turns> and L<Statistics::Sequences::Vnomes|Statistics::Sequences::Vnomes>. Note that none of these sub-modules are installed by default; to use this module as intended, install one or more of these sub-modules.
34
35				To access the tests, L<use|perlfunc/use> this base module to create a Statistics::Sequences object with L<new|new>, then L<load|load> data into it and access each test by calling the L<test|test> method, specifying the B<stat> attribute: either joins, pot, runs, turns or vnomes, where the relevant sub-module is installed. This allows running several tests on the same data, as the data are immediately available to each test (of joins, pot, runs, turns or vnomes). See the L<SYNOPSIS|Statistics::Sequences/SYNOPSIS> for a simple example.
36
37				Alternatively, L<use|perlfunc/use> each sub-module directly, and restrict analyses to the sub-module's test; this module is used implicitly as their base. That is, to perform a test of one type (e.g., runs), L<use|perlfunc/use> the relevant sub-package and load data via its constructor; see the SYNOPSIS for the particular test, i.e., L<Joins|Statistics::Sequences::Joins/SYNOPSIS>, L<Pot|Statistics::Sequences::Pot/SYNOPSIS>, L<Runs|Statistics::Sequences::Runs/SYNOPSIS>, L<Turns|Statistics::Sequences::Turns/SYNOPSIS> or L<Vnomes|Statistics::Sequences::Vnomes/SYNOPSIS>. You won't be able to access other tests of the same data by this approach, unless you create another object for that test, and then specifically pass the data from the earlier object into the new one.
38
39				=head1 SUBROUTINES/METHODS
40
41				=head2 new
42
43				$seq = Statistics::Sequences->new();
44
45				Returns a new Statistics::Sequences object (inherited from L<Statistics::Data|Statistics::Data>) by which all the methods for caching, reading and testing data can be accessed, including each of the methods for performing the L<Runs-|Statistics::Sequences::Runs>, L<Joins-|Statistics::Sequences::Joins>, L<Pot-|Statistics::Sequences::Pot>, L<Turns-|Statistics::Sequences::Turns> or L<Vnomes-|Statistics::Sequences::Vnomes>tests.
46
47				Sub-packages also have their own new method - so, e.g., L<Statistics::Sequences::Runs|Statistics::Sequences::Runs>, can be individually imported, and its own L<new|new> method can be called, e.g.:
48
49				use Statistics::Sequences::Runs;
50				$runs = Statistics::Sequences::Runs->new();
51
52				In this case, data are not automatically shared across packages, and only one test (in this case, the Runs-test) can be accessed through the class-object returned by L<new|new>.
53
54				=head2 load, add, access, unload
55
56				All these operations on the basic data are inherited from L<Statistics::Data|Statistics::Data> - see this doc for details of these and other possible methods.
57
58				B<Dichotomous data>: Both the runs- and joins-tests expect dichotomous data: a binary or binomial or Bernoulli sequence, but with whatever characters to symbolize the two possible events. They test their "loads" to make sure the data are dichotomous. To reduce numerical and categorical data to a dichotomous level, see the L<pool|Statistics::Data::Dichotomize/pool>, L<match|Statistics::Data::Dichotomize/match>, L<split|Statistics::Data::Dichotomize/split, cut>, L<swing|Statistics::Data::Dichotomize/swing>, L<shrink (boolwin)|Statistics::Data::Dichotomize/shrink, boolwin> and other methods in L<Statistics::Data::Dichotomize|Statistics::Data::Dichotomize>.
59
60				=head2 observed, observation
61
62				$v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
63				$v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for the particular stat
64				$v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', label => 'myLabelledLoadedData'); # just needs args for the particular stat
65
66				Return the observed value of the statistic for the L<load|Statistics::Sequences/load>ed data, or data sent with this call, e.g., how many runs in the sequence (1, 1, 0, 1). See the particular statistic's manpage for any other arguments needed or optional.
67
68				=cut
69
70				sub observed { return _feedme('observed', @_); } *observation = \&observed;
71
72				=head2 expected, expectation
73
74				$v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
75				$v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for the particular stat
76
77				Return the expected value of the statistic for the L<load|Statistics::Sequences/load>ed data, or data sent with this call, e.g., how many runs should occur in a 4-length sequence of two possible events. See the statistic's manpage for any other arguments needed or optional.
78
79				=cut
80
81				sub expected { return _feedme('expected', @_); } *expectation = \&expected;
82
83				=head2 variance
84
85				$seq->variance(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
86				$seq->variance(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for the particular stat
87
88				Returns the expected range of deviation in the statistic's observed value for the given number of trials.
89
90				=cut
91
92				sub variance { return _feedme('variance', @_); }
93
94				=head2 obsdev, observed_deviation
95
96				$v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
97				$v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for the particular stat
98
99				Returns the deviation of (difference between) observed and expected values of the statistic for the loaded/given sequence (I<O> - I<E>).
100
101				=cut
102
103				sub obsdev {
104				return observed(@_) - expected(@_);
105				}
106				*observed_deviation = \&obsdev;
107
108				=head2 stdev, standard_deviation
109
110				$v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
111				$v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for the particular stat
112
113				Returns square-root of the variance.
114
115				=cut
116
117				sub stdev {
118				return sqrt variance(@_);
119				}
120				*standard_deviation = \&stdev;
121
122				=head2 z_value, zscore
123
124				$v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
125				$v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for the particular stat
126
127				Return the deviation ratio: observed deviation to standard deviation. Use argument B<ccorr> for continuity correction.
128
129				=cut
130
131				sub zscore { return _feedme('zscore', @_); } *z_value = \&zscore;
132
133				=head2 p_value, test
134
135				$p = $seq->test(stat => 'runs');
136				$p = $seq->test(stat => 'joins');
137				$p = $seq->test(stat => 'turns');
138				$p = $seq->test(stat => 'pot', state => 'a value appearing in the data');
139				$p = $seq->test(stat => 'vnomes', length => 'an integer greater than zero and less than sample-size');
140
141				Returns the probability of observing so many runs, joins, etc., versus those expected, relative to the expected variance.
142
143				When using a Statistics::Sequences class-object, this method requires naming which test to perform, i.e., runs, joins, pot, turns or vnomes. This is I<not> required when the class-object already refers to one of the sub-modules, as created by the C<new> method within L<Statistics::Sequences::Runs|Statistics::Sequences::Runs/new>, L<Statistics::Sequences::Joins|Statistics::Sequences::Joins/new>, L<Statistics::Sequences::Pot|Statistics::Sequences::Pot/new>, L<Statistics::Sequences::Turns|Statistics::Sequences::Turns/new> and L<Statistics::Sequences::Vnomes|Statistics::Sequences::Vnomes/new>.
144
145				=head3 Common options
146
147				Options common to all the sub-package tests are as follows.
148
149				=over 8
150
151				=item data => 'I<string>'
152
153				Optionally specify the name of the data to be tested. By default, this is not required: the data tested are those that were last loaded, either anonymously, or as returned by one of the L<Statistics::Data::Dichotomize|Statistics::Data::Dichotomize> methods. Otherwise, I<if the data are already ready for testing in a dichotomous format>, data that were previously loaded by name can be individually tested. For example, here are two sets of data that are loaded by name, and then a single test of one of them is performed.
154
155				@chimps = (qw/banana banana cheese banana cheese banana banana banana/);
156				@mice = (qw/banana cheese cheese cheese cheese cheese cheese cheese/);
157				$seq->load(chimps => \@chimps, mice => \@mice);
158				$p = $seq->test(stat => 'runs', data => 'chimps');
159
160				=item ccorr => I<boolean>
161
162				Specify whether or not to perform the continuity-correction on the observed deviation. Default is false. Relevant only for those tests relying on a I<Z>-test. See L<Statistics::Zed|Statistics::Zed>.
163
164				=item tails => I<1>|I<2>
165
166				Specify whether the I<z>-value is calculated for both sides of the normal (or chi-square) distribution (2, the default for most tested data) or only one side (the default for data prepared with the B<swing> method).
167
168				=back
169
170				=head3 Test-specific required settings and options
171
172				Some sub-package tests need to have parameters defined in the call to L<test|test>, and/or have specific options, as follows.
173
174				B<Joins> : The Joins test I<optionally> allows the setting of a probability value; see C<test> in the L<Statistics::Sequences::Joins|Statistics::Sequences::Joins/test> manpage.
175
176				B<Pot> : The Pot test I<requires> the setting of a state to be tested; see C<test> in the L<Statistics::Sequences::Pot|Statistics::Sequences::Pot/test> manpage.
177
178				B<Vnomes> : The Serial test for v-nomes requires a length, i.e., the value of I<v>; see C<test> in the L<Statistics::Sequences::Vnomes|Statistics::Sequences::Vnomes/test> manpage.
179
180				B<Runs>, B<Turns> : There are presently no specific requirements or options for the Runs- and Turns-tests.
181
182				=cut
183
184				sub p_value { return _feedme('p_value', @_); } *test = \&p_value;
185
186				=head2 stats_hash
187
188				$href = $seq->stats_hash(stat => 'runs', values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1});
189
190				Returns a hashref with values for any of the descriptives and probability value relevant to the specified B<stat>istic. Include other required or optional arguments relevant to any of the values requested, e.g., B<ccorr> if getting a z_value, B<tails> and B<exact> if getting a p_value, B<state> if testing pot, B<prob> if testing joins, ... B<precision_s>, B<precision_p> ...
191
192				=cut
193
194				sub stats_hash {
195				my $self = shift;
196				my $args = ref $_[0] ? $_[0] : {@_};
197				my @methods = keys %{$args->{'values'}};
198				my (%stats_hash) = ();
199				no strict 'refs';
200				foreach my $method(@methods) {
201				if ($args->{'values'}->{$method} == 1) {
202				eval {$stats_hash{$method} = $self->$method($args);};
203				croak "Method $method is not defined or correctly called for " . __PACKAGE__ if $@;
204				}
205				}
206				if (! scalar keys %stats_hash) { # get default stats:
207				foreach my $method(qw/observed p_value/) {
208				eval {$stats_hash{$method} = $self->$method($args);};
209				croak "Method $method is not defined or correctly called for " . __PACKAGE__ if $@;
210				}
211				}
212				return \%stats_hash;
213				}
214
215				=head2 dump
216
217				$seq->dump(stat => 'runs|joins|pot ...', values => {}, format => 'table|labline|csv', flag => '1|0', precision_s => 'integer', precision_p => 'integer');
218
219				I<Alias>: B<print_summary>
220
221				Print results of the last-conducted test to STDOUT. By default, if no parameters to C<dump> are passed, a single line of test statistics is printed. Options are as follows.
222
223				=over 8
224
225				=item values => hashref
226
227				Hashref of the statistical parameters to dump. Default is observed value and p-value for the given B<stat>.
228
229				=item flag => I<boolean>
230
231				If true, the I<p>-value associated with the I<z>-value is appended with a single asterisk if the value is below .05, and with two asterisks if it is below .01.
232
233				If false (default), nothing is appended to the I<p>-value.
234
235				=item format => 'table|labline|csv'
236
237				Default is 'csv', to print the stats hash as a comma-separated string (no newline), e.g., '4.0000,0.8596800'. If specifying 'labline', you get something like "observed = 4.0000, p_value = 0.8596800\n". If specifying 'table', this is a dump from L<Text::SimpleTable|Text::SimpleTable> with the stat methods as headers and column length set to the maximum required for the given headers, level of precision, flag, etc. For example, with B<precision_s> => 4 and B<precision_p> => 7, you get:
238
239				.-----------+-----------.
240				| observed  | p_value   |
241				+-----------+-----------+
242				| 4.0000    | 0.8596800 |
243				'-----------+-----------'
244
245				=item verbose => 1|0
246
247				If true, includes a title giving the name of the statistic, details about the hypothesis tested (if B<p_value> => 1 in the B<values> hashref), etc. No effect if B<format> is not defined or equals 'csv'.
248
249				=item precision_s => 'I<non-negative integer>'
250
251				Precision of the statistic values (observed, expected, variance, z_value).
252
253				=item precision_p => 'I<non-negative integer>'
254
255				Specify rounding of the probability associated with the I<z>-value to so many digits. If zero or undefined, you get everything available.
256
257				=back
258
259				=cut
260
261				sub dump {
262				my $self = shift;
263				my $args = ref $_[0] ? $_[0] : {@_};
264				my $stats_hash = $self->stats_hash($args);
265				$args->{'format'} ||= 'csv';
266				my @standard_methods = (qw/observed expected variance obsdev stdev z_value p_value/);
267				my ($maxlen, @strs, @headers, @wanted_methods) = (0);
268				foreach my $method(@standard_methods) { # set up what has been requested in a meaningful order:
269				push(@wanted_methods, $method) if defined $stats_hash->{$method};
270				}
271				foreach my $method(keys %{$stats_hash}) {
272				push(@wanted_methods, $method) if ! grep { $_ eq $method } @wanted_methods; # exact match, not a regex/substring match
273				}
274				foreach my $method(@wanted_methods) {
275				my $val = delete $stats_hash->{$method};
276				my $len;
277				if ($method eq 'p_value') {
278				$val = _precisioned($args->{'precision_p'}, $val);
279				$val .= ($val < .05 ? ($val < .01 ? q{**} : q{*} ) : q{}) if $args->{'flag'}; # '*' below .05, '**' below .01, as documented
280				}
281				else {
282				if (ref $val) {
283				if (ref $val eq 'HASH') {
284				my %vals = %{$val};
285				$val = q{};
286				my $delim = $args->{'format'} eq 'table' ? "\n" : q{,};
287				my ($str, $this_len) = (q{});
288				while (my($k, $v) = each %vals) {
289				$str = "'$k' = $v";
290				$this_len = length($str);
291				$len = $this_len if not defined $len or $this_len > $len;
292				$val .= $str;
293				$val .= $delim;
294				}
295				if ($args->{'format'} ne 'table') {
296				chop $val;
297				$val = '(' . $val . ')';
298				}
299				}
300				else {
301				$val = join q{, }, @{$val};
302				}
303				}
304				elsif(looks_like_number($val)) {
305				$val = _precisioned($args->{'precision_s'}, $val);
306				}
307				}
308				push @headers, $method;
309				push(@strs, $val);
310				$len = length $val if ! defined $len;
311				$maxlen = $len if $len > $maxlen;
312				}
313				if ($args->{'format'} eq 'table') {
314				$maxlen = 8 if $maxlen < 8;
315				my $title = $args->{'verbose'} ? ucfirst($args->{'stat'}) . " statistics\n" : q{};
316				print $title or croak 'Cannot print title for data-table';
317				my @hh = ();
318				push( @hh, [$maxlen, $_]) foreach @headers;
319				require Text::SimpleTable;
320				my $tbl = Text::SimpleTable->new(@hh);
321				$tbl->row(@strs);
322				print $tbl->draw or croak 'Cannot print data-table';
323				}
324				elsif ($args->{'format'} eq 'labline') {
325				my @hh;
326				for (my $i = 0; $i <= $#strs; $i++) {
327				$hh[$i] = "$headers[$i] = $strs[$i]";
328				}
329				my $str = join(q{, }, @hh);
330				if ($args->{'verbose'}) {
331				$str = ucfirst($args->{'stat'}) . ': ' . $str;
332				}
333				print {*STDOUT} $str, "\n" or croak 'Cannot print data-string';
334				}
335				else { # csv
336				print join(q{,}, @strs) or croak 'Cannot print data-string'
337				}
338				return;
339				}
340				*print_summary = \&dump;
341
342				=head2 dump_data
343
344				$seq->dump_data(delim => "\n");
345
346				Prints to STDOUT a space-separated line of the tested data - as dichotomized and put to test. Optionally, give a value for B<delim> to specify how the datapoints should be separated. Inherited from L<Statistics::Data|Statistics::Data/dump_data>.
347
348				=cut
349
350				# PRIVATE METHODS
351
352				sub _feedme {
353				my $method = shift;
354				my $self = shift;
355				my $args = ref $_[0] ? $_[0] : {@_};
356				my $statname = $args->{'stat'} || q{};
357				my $class = __PACKAGE__ . q{::} . ucfirst($statname);
358				eval "require $class";
359				croak __PACKAGE__, " error: Requested sequences module '$class' is not valid/available. You might need to install '$class'" if $@;
360				my ($val, $nself) = (q{}, {});
361				#my $nself = {};
362				bless($nself, $class);#$nself = $class->new();
363				$nself->{$_} = $self->{$_} foreach keys %{$self};
364				no strict 'refs';
365				eval '$val = $nself->$method($args)'; # but does not trap "deep recursion" if method not defined
366				croak __PACKAGE__, " error: Method '$method' is not defined for $class" if $@;
367				$self->{'stat'} = $statname;
368				return $val;
369				}
370
371				sub _precisioned {
372				return $_[0] ? sprintf(q{%.} . $_[0] . 'f', $_[1]) : (defined $_[1] ? $_[1] : q{}); # don't lose any zero
373				}
374
375				=head1 BUNDLING
376
377				This module C<use>s its sub-modules implicitly - so a bundled program using this module might need to explicitly C<use> its sub-modules if these need to be included in the bundle itself.
378
379				=head1 AUTHOR
380
381				Roderick Garton, C<< <rgarton at cpan.org> >>
382
383				=head1 LICENSE AND COPYRIGHT
384
385				=over 4
386
387				=item Copyright (c) 2006-2013 Roderick Garton
388
389				This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see L<http://www.perl.com/perl/misc/Artistic.html>).
390
391				=item Disclaimer
392
393				To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
394
395				=back
396
397				=cut
398
399				1; # end of Statistics::Sequences
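
The B<flag> and B<precision_p> options documented for dump can be illustrated in isolation. The following is a minimal standalone sketch, not the module's own code: format_p_value is a hypothetical helper name, and it assumes the documented behaviour (a single asterisk below .05, two asterisks below .01, and no rounding when the precision is zero or undefined).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences): format a p-value
# the way the dump documentation describes for precision_p and flag.
sub format_p_value {
    my ($p, $precision, $flag) = @_;

    # Round only when a non-zero precision is given; otherwise keep the value as is.
    my $str = $precision ? sprintf("%.${precision}f", $p) : $p;

    # Append significance markers only when flagging is requested.
    if ($flag) {
        $str .= $p < .01 ? q{**} : $p < .05 ? q{*} : q{};
    }
    return $str;
}

print format_p_value(0.85968, 7, 0), "\n";    # 0.8596800
print format_p_value(0.03,    3, 1), "\n";    # 0.030*
print format_p_value(0.004,   3, 1), "\n";    # 0.004**
```

The marker is chosen from the numeric value before it is appended, so the comparison never sees a string that already carries an asterisk.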