File Coverage

blib/lib/Statistics/Sequences/Pot.pm
Criterion Covered Total %
statement 16 18 88.8
branch n/a
condition n/a
subroutine 6 6 100.0
pod n/a
total 22 24 91.6


line stmt bran cond sub pod time code
1             package Statistics::Sequences::Pot;
2              
3 1     1   21515 use 5.008008;
  1         4  
  1         32  
4 1     1   5 use strict;
  1         2  
  1         30  
5 1     1   4 use warnings;
  1         5  
  1         25  
6 1     1   5 use Carp 'croak';
  1         1  
  1         65  
7 1     1   5 use vars qw($VERSION @ISA);
  1         1  
  1         53  
8 1     1   561 use Statistics::Sequences 0.10;
  0            
  0            
9             @ISA = qw(Statistics::Sequences);
10             $VERSION = '0.12';
11             use Statistics::Zed 0.072;
12             our $zed = Statistics::Zed->new();
13              
14             =pod
15              
16             =head1 NAME
17              
18             Statistics::Sequences::Pot - Helmut Schmidt's test of force-like runs of a discrete state within a numerical or categorical sequence
19              
20             =head1 SYNOPSIS
21              
22             use strict;
23             use Statistics::Sequences::Pot 0.12; # methods/args here are not compatible with earlier versions
24             my $pot = Statistics::Sequences::Pot->new();
25             $pot->load([qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/]); # strings/numbers; or send as "data => $aref" with each stat call
26             my $val = $pot->observed(state => 5); # other methods include: expected(), variance(), obsdev() and stdev()
27             $val = $pot->zscore(state => 5, tails => 2, ccorr => 1); # # or want an array & get back both z- and p-value
28             $val = $pot->p_value(state => 5, tails => 1); # assuming data are loaded; alias: test()
29             my $href = $pot->stats_hash(values => {observed => 1, p_value => 1}, state => 5); # include any other stat-method as needed
30             $pot->dump(values => {observed => 1, expected => 1, p_value => 1}, state => 5, flag => 1, precision_s => 3, precision_p => 7);
31             # prints: observed = 4.310, expected = 4.529, p_value = 0.8090600
32              
33             =head1 DESCRIPTION
34              
35             The Pot statistic measures the bunching relative to the spacing of a single state within a series of other states, conceived by Helmut Schmidt as a targeted "potential" energy (or Pot) that dissipates exponentially between states. It's not limited to considering only clusters of I<consecutive> states (or bunches), as is the case with the more familiar Runs test of sequences.
36              
37             Say you're interested in the occurrence of the state B<3> within an array of digits: note how, in the following arrays, there are increasing breaks between the B<3>s (separated by 0, 1 and then 2 other states):
38              
39             4, 7, 3, 3
40             3, 4, 3, 7
41             3, 8, 1, 3
42              
43             The occurrence of B<3> is, with the Pot-test, of exponentially declining interest across these sequences, given the increasing breaks by other states between the occurrences of 3. The statistic does not ignore these ever remoter occurrences of the state of interest; it accounts for increased spacing between them as if there were an exponentially declining force, a I<pot>ential towards B<3>, within the data-stream (up to a theoretical or empirical asymptote that may be specified).
44              
45             Running the Pot-test involves testing its significance as a standard "z" score; Schmidt (2000) provided data demonstrating Pot's conformance with the normal distribution. This will naturally be improved by repeated sampling, and by using block averages.
46              
47             =head1 METHODS
48              
49             Methods are essentially as described in L<Statistics::Sequences>
50              
51             =head2 new
52              
53             $pot = Statistics::Sequences::Pot->new();
54              
55             Returns a new Pot object. Expects/accepts no arguments but the classname.
56              
57             =head2 load
58              
59             $pot->load(@data); # anonymously
60             $pot->load(\@data);
61             $pot->load('sample1' => \@data); # labelled whatever
62              
63             Loads data anonymously or by name - see L<load|Statistics::Data/load, load_data> in the Statistics::Data manpage for details on the various ways data can be loaded and then retrieved (more than shown here).
64              
65             Data can be categorical or numerical, and multi-valued - i.e, unlike in tests of L<runs|Statistics::Sequences::Runs> and L<joins|Statistics::Sequences::Joins>, they do not have to be dichotomous.
66              
67             =head2 add, read, unload
68              
69             See L<Statistics::Data> for these additional operations on data that have been loaded.
70              
71             =head2 observed, pot_observed, pvo
72              
73             $v = $pot->observed(state => 'x'); # use the first data loaded anonymously; specify a 'state' within it to test its pot
74             $v = $pot->observed(index => 1, state => 'x'); # ... or give the required "index" for the loaded data
75             $v = $pot->observed(label => 'mysequence', state => 'x'); # ... or its "label" value
76             $v = $pot->observed(data => [qw/x z x x p c/], state => 'x'); # ... or just give the data now
77              
78             Returns observed value of pot, a measure of the number and size of bunchings of the state that occurred within the array. The data to calculate this on can already have been L<load|load>ed, or you send it here as a flat referenced array keyed as B<data>. The observed value of pot is based on Schmidt (2000), Equations 6-7, and his Appended program, viz.:
79              
80             =for html <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i>I</i>,<i>J</i>=1..<i>N</i><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&Sigma;&nbsp;&nbsp;<i>r</i><sup>|<i>n</i>(<i>I</i>) - <i>n</i>(<i>J</i>)|</sup><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i>I</i>&lt;<i>J</i></p>
81              
82             where
83              
84             =for html <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i>r</i> = <i>e</i><sup>&ndash;<i>N</i>/<i>MS</i></sup></p>
85              
86             is the number of observations, and I<S> (for scale) is a constant determining the range I<r> of the potential energy between pairs of I<I> and I<J> states. These values are set as C<name =E<gt> value> pairs, as follows.
87              
88             B<state> => I<string>
89              
90             The state within the data whose bunching is to be tested. This is the only required argument; C<croak> if no state is specified. Returns 0 if this state does not exist in the data.
91              
92             B<scale> => I<numeric> E<gt>= 1
93              
94             Optionally, the scale of the range parameter, which should be greater than or equal to 1. Default = 1; values less than 1 are effected as 1.
95              
96             In most situations, should all states be equiprobable, or their probability be proportionate to their number, I<r> would reflect the average distance, or delay, between I<successive> states, equal to the number of all observations divided by the number of states. For example, if there were 10 possible states, and 100 observations have been made, then the probability of re-occurrence of any one of the 10 states within any slot will be equal to 100/10, with I<S> = 1, i.e., expecting that any one of the states would mostly occur by a spacing of 10, and then by an exponentially declining tendency toward consecutive occurrence. In this way, with I<S> = 1, Pot is a measure of "short-range bunching," as Schmidt called it. Bunching over a larger range than this minimally expected range can be measured with I<S> > 1. This is specified, optionally, as the argument named B<scale> to L<test|test>. Hypothesis-testing might be made with respect to various values of the B<scale> parameter.
97              
98             =cut
99              
100             sub observed {# measure pot in the given data for given state, Schmidt (2000) Equations 6-7:
101             my ($m, $n, $r, $scale, $state, $indices) = _set_terms(@_);
102             my ($pvo, $i, $j) = (0);
103             for $i (1 .. ($n - 1)) {
104             for $j (0 .. $i - 1) {
105             $pvo += $r**abs($indices->[$i] - $indices->[$j]);
106             }
107             }
108             return $pvo;
109             }
110             *pot_observed = \&observed;
111             *pvo = \&observed;
112              
113             =head2 expected, pot_expected, pve
114              
115             $v = $pot->expected(state => 'x'); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
116             $v = $pot->expected(data => [qw/x z x x p c/], state => 'x'); # use these data
117              
118             Returns the theoretically expected value of Pot, given I<N> states among I<M> observations, and I<r> range of clustering within these observations. It is calculated as follows from Schmidt (2000) Eq. 8, given the above definitions.
119              
120             =for html <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pot = ((<i>N</i>(<i>N</i> &ndash; 1))/(<i>M</i>(<i>M</i> &ndash; 1))) . (<i>r</i>/1 &ndash; <i>r</i>) . (<i>M</i> &ndash; (1/(1 &ndash; <i>r</i>)))</p>
121              
122             =cut
123              
124             sub expected { # calculate expected Pot: Schmidt (2000) Equation 8:
125             my ($m, $n, $r) = _set_terms(@_);
126             my $pve = 0;
127             if ($m > 1 && $r < 1) {
128             $pve = $n * ($n - 1) * $r * ( $m - 1 / (1 - $r) ) / ( $m * ($m - 1) * (1 - $r) );
129             }
130             return $pve;
131             }
132             *pot_expected = \&expected;
133             *pve = \&expected;
134              
135             =head2 variance, pot_variance, pvv
136              
137             $v = $pot->variance(state => 'x'); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
138             $v = $pot->variance(data => [qw/x z x x p c/], state => 'x'); # use these data
139              
140             Returns the variance in the theoretically expected value of pot, given by Schmidt (2000) Equation 9a:
141              
142             =for html <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Variance = (<i>r</i>&sup2;/ (1 &ndash; <i>r</i>&sup2;) . (<i>N</i> / <i>M</i>) . (1 &ndash; (<i>N</i> / <i>M</i>))&sup2; . <i>N</i></p>
143              
144             =cut
145              
146             sub variance {
147             my ($m, $n, $r) = _set_terms(@_);
148             my $var = 0;
149             my $rsq = $r**2;
150             if ($m && $rsq < 1) {
151             $var = ( $rsq * $n**2 * (1 - $n / $m)**2 )
152             /
153             ( $m * (1 - $rsq) );
154             }
155             return $var;
156             }
157             *pot_variance = \&variance;
158             *pvv = \&variance;
159              
160             =head2 obsdev, observed_deviation
161              
162             $v = $pot->obsdev(state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
163             $v = $pot->obsdev(data => [qw/x z x x p c/], state => 'x'); # use these data
164              
165             Returns the deviation of (difference between) observed and expected pot for the loaded/given sequence (I<O> - I<E>).
166              
167             =cut
168              
169             sub obsdev {
170             return observed(@_) - expected(@_);
171             }
172             *observed_deviation = \&obsdev;
173              
174             =head2 stdev, standard_deviation
175              
176             $v = $pot->stdev(state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
177             $v = $pot->stdev(data => [qw/x z x x p c/], state => 'x');
178              
179             Returns square-root of the variance.
180              
181             =cut
182              
183             sub stdev {
184             return sqrt(variance(@_));
185             }
186             *standard_deviation = \&stdev;
187              
188             =head2 z_value, zscore, pot_zscore, pzs
189              
190             $v = $pot->z_value(ccorr => 1, state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
191             $v = $pot->z_value(data => $aref, ccorr => 1, state => 'x');
192             ($zvalue, $pvalue) = $pot->z_value(data => $aref, ccorr => 1, tails => 2, state => 'x'); # same but wanting an array, get the p-value too
193              
194             Returns the zscore from a test of pot deviation, taking the pot expected away from that observed and dividing by the root expected pot variance, by default with a continuity correction to expectation. Called wanting an array, returns the z-value with its I<p>-value for the tails (1 or 2) given.
195              
196             The data to test can already have been L<load|load>ed, or sent directly as an aref keyed as B<data>.
197              
198             Other options are B<precision_s> (for the z_value) and B<precision_p> (for the p_value).
199              
200             =cut
201              
202             sub z_value {
203             my $self = shift;
204             my $args = ref $_[0] ? shift : {@_};
205             my $ccorr = defined $args->{'ccorr'} ? $args->{'ccorr'} : 1;
206             my $tails = $args->{'tails'} || 2;
207             my $precision_s = $args->{'precision_s'};
208             my $precision_p = $args->{'precision_p'};
209             my $pvo = defined $args->{'observed'} ? $args->{'observed'} : $self->pvo($args);
210            
211             my ($zval, $pval) = $zed->zscore(
212             observed => $pvo,
213             expected => $self->pve($args),
214             variance => $self->pvv($args),
215             ccorr => $ccorr,
216             tails => $tails,
217             precision_s => $precision_s,
218             precision_p => $precision_p,
219             );
220             return wantarray ? ($zval, $pval) : $zval;
221             }
222             *pzs = \&z_value;
223             *pot_zscore = \&z_value;
224             *zscore = \&z_value;
225              
226             =head2 p_value, test, pot_test, ptt
227              
228             $p = $pot->p_value(state => 'x'); # using loaded data and default args
229             $p = $pot->p_value(ccorr => 0|1, tails => 1|2, state => 'x'); # normal-approximation based on loaded data
230             $p = $pot->p_value(data => $aref, ccorr => 1, tails => 2, state => 'x'); # using given data (by-passing load and read)
231              
232             Test the currently loaded data for significance of the vale of pot. Returns the zscore from test of pot deviation, or, called wanting an array, the z-value with its I<p>-value for the tails (1 or 2) given.
233              
234             =cut
235              
236             sub p_value {
237             return (z_value(@_))[1];
238             }
239             *test = \&p_value;
240             *pot_test = \&p_value;
241             *ptt = \&p_value;
242              
243              
244             =head2 stats_hash
245              
246             $href = $pot->stats_hash(values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1}, prob => .5, ccorr => 1);
247              
248             Returns a hashref for the counts and stats as specified by hashref in its "values" argument, and with any options for calculating them. See L<Statistics::Sequences/stats_hash> for details.
249              
250             =head2 dump
251              
252             $pot->dump(values => { observed => 1, variance => 1, p_value => 1}, exact => 1, flag => 1, precision_s => 3); # among other options
253              
254             Print Pot-test results to STDOUT. See L<dump|Statistics::Sequences/dump> in the Statistics::Sequences manpage for details.
255              
256             =cut
257              
258             sub dump {
259             my $self = shift;
260             my $args = ref $_[0] ? $_[0] : {@_};
261             $args->{'stat'} = 'pot';
262             $args->{'stat'} .= " ($args->{'state'})" if defined $args->{'state'};
263             $self->SUPER::dump($args);
264             #No. of bunches of state '$self->{'state'}' = " . $self->{'bunches'}->count() .
265             # ', with a mean length of '. sprintf('%.2f', $self->{'bunches'}->mean()) .
266             # ', and a mean spacing of '. sprintf('%.2f', $self->{'spaces'}->mean()) ." between each bunch.\n" if $self->{'bunches'};
267             # $args->{'title'} .= ' (Calculated with a range of ' . sprintf('%.2f', $self->{'range'}) . " over a scale of $self->{'scale'})";
268             }
269              
270             sub _set_terms {
271             my $self = shift;
272             my $args = ref $_[0] ? $_[0] : {@_};
273             my $data = ref $args->{'data'} ? $args->{'data'} : $self->read($args);
274             my ($m, $n, $r, $scale, $state, $indices, $bunches, $spaces) = ();
275             $m = scalar(@{$data});
276             #$m = defined $args->{'trials'} ? $args->{'trials'} : scalar(@{$data});
277             #if (ref $args->{'indices'}) { # defunct by-pass from having data
278             # $n = scalar @{$args->{'indices'}};
279             # $indices = $args->{'indices'};
280             #}
281             #else {
282             $state = defined $args->{'state'} ? $args->{'state'} : croak __PACKAGE__, '::test A state for pot-testing is needed';
283             ($indices, $bunches, $spaces) = _state_indices($data, $state, $m);
284             $n = scalar(@{$indices});
285             #}
286             $scale = (!$args->{'scale'} or $args->{'scale'} < 1) ? 1 : $args->{'scale'}; # assume scale = 1 if not specified or invalid
287             $r = _range($n, $m, $scale);
288             return ($m, $n, $r, $scale, $state, $indices, $bunches, $spaces);
289             }
290              
291             sub _range { # init range parameter
292             my ($n, $m, $scale) = @_;
293             return exp(-$n / $m * $scale);
294             }
295              
296             sub _state_indices {# Init an array holding the indices at which the state appears in the given data:
297             my ($data, $state, $m) = @_;
298             # Meanwhile, build arrays of bunch and space frequencies, should this be requested:
299             my ($i, $j, $k, @indices, @bunches, @spaces) = (0, 0, 0);
300             for ($i = 0; $i < $m; $i++) {
301             # Allow for matching numerical or string values:
302             if ($data->[$i] eq $state) {
303             $k++ if $spaces[$k];
304             $j++ if ( scalar @indices ) and ( $indices[-1] != ($i - 1) );
305             $bunches[$j]++;
306             push @indices, $i;
307             }
308             else {
309             $spaces[$k]++;
310             }
311             }
312             return (\@indices, \@bunches, \@spaces);
313             }
314              
315             1;
316              
317             __END__
318              
319             =head1 EXAMPLE
320              
321             Using Pot as a test of bunching of a particular state within a collection of quasi-random events.
322              
323             use strict;
324             use Statistics::Sequences::Pot;
325             my ($i, @data) = ();
326             # Init random integers ranging from 0 to 15:
327             for ($i = 0; $i < 960; $i++) { $data[$i] = int(rand(16)); }
328             # Assess degree of bunching within these data with respect to a randomly selected target state:
329             my $state = int(rand(16));
330             my $pot = Statistics::Sequences::Pot->new();
331             $pot->load(\@data);
332             my %args = (state => $state, values => {p_value => 1, observed => 1, expected => 1, stdev => 1});
333             my $statsref = $pot->stats_hash(\%args);
334             # Access the results of this analysis:
335             print "The probability of obtaining as much bunching of $state as observed is $statsref->{'p_value'}.\n";
336             print "The observed value of Pot was $statsref->{'observed'}, with expected value $statsref->{'expected'} ($statsref->{'stdev'}).\n";
337             # or print the lot, and more, in English:
338             $pot->dump(%args, precision_s => 3, precision_p => 7,);
339              
340             =head1 REFERENCES
341              
342             Schmidt, H. (2000). A proposed measure for psi-induced bunching of randomly spaced events. I<Journal of Parapsychology>, I<64,> 301-316.
343              
344             =head1 SEE ALSO
345              
346             L<http://www.fourmilab.ch/rpkp/> for Schmidt's many papers on the physical conceptualisation and properties of psi.
347              
348             L<Statistics::Descriptive|Statistics::Descriptive> : The present module adds data to "Full" objects of this package in order to access descriptives re bunches and spaces.
349              
350             L<Statistics::Frequency|Statistics::Frequency> : the C<proportional_frequency()> method in this module could be informative when working with data of the kind used here.
351              
352             =head1 BUGS/LIMITATIONS
353              
354             No computational bugs as yet identfied. Hopefully this will change, given time.
355              
356             Limitations of the code, perhaps, concern the non-unique storage of data arrays (compared to, say, C<Statistics::DependantTTest>, but see C<Statistics::TTest>). This would require a unique name for each array of data, and explicit reference to one or another array with each L<test|test> (when, perhaps, you'd have only one data-set, after all). In any case, the data are accepted as array references.
357              
358             =head1 REVISION HISTORY
359              
360             See CHANGES in installation dist for revisions.
361              
362             =head1 AUTHOR/LICENSE
363              
364             =over 4
365              
366             =item Copyright (c) 2006-2013 Roderick Garton
367              
368             rgarton AT cpan DOT org
369              
370             This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see L<http://www.perl.com/perl/misc/Artistic.html>).
371              
372             =item Disclaimer
373              
374             To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
375              
376             =back
377              
378             =head1 END
379              
380             This ends documentation of a Perl implementation of Helmut Schmidt's test of pot (potential energy) of occurrence of a state among others in a categorical sequence.
381              
382             =cut