File Coverage

blib/lib/Statistics/Sequences/Pot.pm

Criterion	Covered	Total	%
statement	16	18	88.8
branch			n/a
condition			n/a
subroutine	6	6	100.0
pod			n/a
total	22	24	91.6

line	stmt	sub	time	code
1				package Statistics::Sequences::Pot;
2
3	1	1	21515	use 5.008008;
	1		4
	1		32
4	1	1	5	use strict;
	1		2
	1		30
5	1	1	4	use warnings;
	1		5
	1		25
6	1	1	5	use Carp 'croak';
	1		1
	1		65
7	1	1	5	use vars qw($VERSION @ISA);
	1		1
	1		53
8	1	1	561	use Statistics::Sequences 0.10;
	0
	0
9				@ISA = qw(Statistics::Sequences);
10				$VERSION = '0.12';
11				use Statistics::Zed 0.072;
12				our $zed = Statistics::Zed->new();
13
14				=pod
15
16				=head1 NAME
17
18				Statistics::Sequences::Pot - Helmut Schmidt's test of force-like runs of a discrete state within a numerical or categorical sequence
19
20				=head1 SYNOPSIS
21
22				use strict;
23				use Statistics::Sequences::Pot 0.12; # methods/args here are not compatible with earlier versions
24				my $pot = Statistics::Sequences::Pot->new();
25				$pot->load([qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/]); # strings/numbers; or send as "data => $aref" with each stat call
26				my $val = $pot->observed(state => 5); # other methods include: expected(), variance(), obsdev() and stdev()
27				$val = $pot->zscore(state => 5, tails => 2, ccorr => 1); # # or want an array & get back both z- and p-value
28				$val = $pot->p_value(state => 5, tails => 1); # assuming data are loaded; alias: test()
29				my $href = $pot->stats_hash(values => {observed => 1, p_value => 1}, state => 5); # include any other stat-method as needed
30				$pot->dump(values => {observed => 1, expected => 1, p_value => 1}, state => 5, flag => 1, precision_s => 3, precision_p => 7);
31				# prints: observed = 4.310, expected = 4.529, p_value = 0.8090600
32
33				=head1 DESCRIPTION
34
35				The Pot statistic measures the bunching relative to the spacing of a single state within a series of other states, conceived by Helmut Schmidt as a targeted "potential" energy (or Pot) that dissipates exponentially between states. It's not limited to considering only clusters of I<consecutive> states (or bunches), as is the case with the more familiar Runs test of sequences.
36
37				Say you're interested in the occurrence of the state B<3> within an array of digits: note how, in the following arrays, there are increasing breaks between the B<3>s (separated by 0, 1 and then 2 other states):
38
39				4, 7, 3, 3
40				3, 4, 3, 7
41				3, 8, 1, 3
42
43				The occurrence of B<3> is, with the Pot-test, of exponentially declining interest across these sequences, given the increasing breaks by other states between the occurrences of 3. The statistic does not ignore these ever remoter occurrences of the state of interest; it accounts for increased spacing between them as if there were an exponentially declining force, a I<pot>ential towards B<3>, within the data-stream (up to a theoretical or empirical asymptote that may be specified).
44
45				Running the Pot-test involves testing its significance as a standard "z" score; Schmidt (2000) provided data demonstrating Pot's conformance with the normal distribution. This will naturally be improved by repeated sampling, and by using block averages.
46
47				=head1 METHODS
48
49				Methods are essentially as described in L<Statistics::Sequences>
50
51				=head2 new
52
53				$pot = Statistics::Sequences::Pot->new();
54
55				Returns a new Pot object. Expects/accepts no arguments but the classname.
56
57				=head2 load
58
59				$pot->load(@data); # anonymously
60				$pot->load(\@data);
61				$pot->load('sample1' => \@data); # labelled whatever
62
63				Loads data anonymously or by name - see L<load\|Statistics::Data/load, load_data> in the Statistics::Data manpage for details on the various ways data can be loaded and then retrieved (more than shown here).
64
65				Data can be categorical or numerical, and multi-valued - i.e, unlike in tests of L<runs\|Statistics::Sequences::Runs> and L<joins\|Statistics::Sequences::Joins>, they do not have to be dichotomous.
66
67				=head2 add, read, unload
68
69				See L<Statistics::Data> for these additional operations on data that have been loaded.
70
71				=head2 observed, pot_observed, pvo
72
73				$v = $pot->observed(state => 'x'); # use the first data loaded anonymously; specify a 'state' within it to test its pot
74				$v = $pot->observed(index => 1, state => 'x'); # ... or give the required "index" for the loaded data
75				$v = $pot->observed(label => 'mysequence', state => 'x'); # ... or its "label" value
76				$v = $pot->observed(data => [qw/x z x x p c/], state => 'x'); # ... or just give the data now
77
78				Returns observed value of pot, a measure of the number and size of bunchings of the state that occurred within the array. The data to calculate this on can already have been L<load\|load>ed, or you send it here as a flat referenced array keyed as B<data>. The observed value of pot is based on Schmidt (2000), Equations 6-7, and his Appended program, viz.:
79
80				=for html <p>            <i>I</i>,<i>J</i>=1..<i>N</i><br>             Σ  <i>r</i><sup>\|<i>n</i>(<i>I</i>) - <i>n</i>(<i>J</i>)\|</sup><br>            <i>I</i><<i>J</i></p>
81
82				where
83
84				=for html <p>            <i>r</i> = <i>e</i><sup>–<i>N</i>/<i>MS</i></sup></p>
85
86				is the number of observations, and I<S> (for scale) is a constant determining the range I<r> of the potential energy between pairs of I<I> and I<J> states. These values are set as C<name =E<gt> value> pairs, as follows.
87
88				B<state> => I<string>
89
90				The state within the data whose bunching is to be tested. This is the only required argument; C<croak> if no state is specified. Returns 0 if this state does not exist in the data.
91
92				B<scale> => I<numeric> E<gt>= 1
93
94				Optionally, the scale of the range parameter, which should be greater than or equal to 1. Default = 1; values less than 1 are effected as 1.
95
96				In most situations, should all states be equiprobable, or their probability be proportionate to their number, I<r> would reflect the average distance, or delay, between I<successive> states, equal to the number of all observations divided by the number of states. For example, if there were 10 possible states, and 100 observations have been made, then the probability of re-occurrence of any one of the 10 states within any slot will be equal to 100/10, with I<S> = 1, i.e., expecting that any one of the states would mostly occur by a spacing of 10, and then by an exponentially declining tendency toward consecutive occurrence. In this way, with I<S> = 1, Pot is a measure of "short-range bunching," as Schmidt called it. Bunching over a larger range than this minimally expected range can be measured with I<S> > 1. This is specified, optionally, as the argument named B<scale> to L<test\|test>. Hypothesis-testing might be made with respect to various values of the B<scale> parameter.
97
98				=cut
99
100				sub observed {# measure pot in the given data for given state, Schmidt (2000) Equations 6-7:
101				my ($m, $n, $r, $scale, $state, $indices) = _set_terms(@_);
102				my ($pvo, $i, $j) = (0);
103				for $i (1 .. ($n - 1)) {
104				for $j (0 .. $i - 1) {
105				$pvo += $r**abs($indices->[$i] - $indices->[$j]);
106				}
107				}
108				return $pvo;
109				}
110				*pot_observed = \&observed;
111				*pvo = \&observed;
112
113				=head2 expected, pot_expected, pve
114
115				$v = $pot->expected(state => 'x'); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
116				$v = $pot->expected(data => [qw/x z x x p c/], state => 'x'); # use these data
117
118				Returns the theoretically expected value of Pot, given I<N> states among I<M> observations, and I<r> range of clustering within these observations. It is calculated as follows from Schmidt (2000) Eq. 8, given the above definitions.
119
120				=for html <p>              Pot = ((<i>N</i>(<i>N</i> – 1))/(<i>M</i>(<i>M</i> – 1))) . (<i>r</i>/1 – <i>r</i>) . (<i>M</i> – (1/(1 – <i>r</i>)))</p>
121
122				=cut
123
124				sub expected { # calculate expected Pot: Schmidt (2000) Equation 8:
125				my ($m, $n, $r) = _set_terms(@_);
126				my $pve = 0;
127				if ($m > 1 && $r < 1) {
128				$pve = $n * ($n - 1) * $r * ( $m - 1 / (1 - $r) ) / ( $m * ($m - 1) * (1 - $r) );
129				}
130				return $pve;
131				}
132				*pot_expected = \&expected;
133				*pve = \&expected;
134
135				=head2 variance, pot_variance, pvv
136
137				$v = $pot->variance(state => 'x'); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
138				$v = $pot->variance(data => [qw/x z x x p c/], state => 'x'); # use these data
139
140				Returns the variance in the theoretically expected value of pot, given by Schmidt (2000) Equation 9a:
141
142				=for html <p>              Variance = (<i>r</i>²/ (1 – <i>r</i>²) . (<i>N</i> / <i>M</i>) . (1 – (<i>N</i> / <i>M</i>))² . <i>N</i></p>
143
144				=cut
145
146				sub variance {
147				my ($m, $n, $r) = _set_terms(@_);
148				my $var = 0;
149				my $rsq = $r**2;
150				if ($m && $rsq < 1) {
151				$var = ( $rsq * $n*2 (1 - $n / $m)**2 )
152				/
153				( $m * (1 - $rsq) );
154				}
155				return $var;
156				}
157				*pot_variance = \&variance;
158				*pvv = \&variance;
159
160				=head2 obsdev, observed_deviation
161
162				$v = $pot->obsdev(state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
163				$v = $pot->obsdev(data => [qw/x z x x p c/], state => 'x'); # use these data
164
165				Returns the deviation of (difference between) observed and expected pot for the loaded/given sequence (I<O> - I<E>).
166
167				=cut
168
169				sub obsdev {
170				return observed(@_) - expected(@_);
171				}
172				*observed_deviation = \&obsdev;
173
174				=head2 stdev, standard_deviation
175
176				$v = $pot->stdev(state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
177				$v = $pot->stdev(data => [qw/x z x x p c/], state => 'x');
178
179				Returns square-root of the variance.
180
181				=cut
182
183				sub stdev {
184				return sqrt(variance(@_));
185				}
186				*standard_deviation = \&stdev;
187
188				=head2 z_value, zscore, pot_zscore, pzs
189
190				$v = $pot->z_value(ccorr => 1, state => 'x'); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
191				$v = $pot->z_value(data => $aref, ccorr => 1, state => 'x');
192				($zvalue, $pvalue) = $pot->z_value(data => $aref, ccorr => 1, tails => 2, state => 'x'); # same but wanting an array, get the p-value too
193
194				Returns the zscore from a test of pot deviation, taking the pot expected away from that observed and dividing by the root expected pot variance, by default with a continuity correction to expectation. Called wanting an array, returns the z-value with its I<p>-value for the tails (1 or 2) given.
195
196				The data to test can already have been L<load\|load>ed, or sent directly as an aref keyed as B<data>.
197
198				Other options are B<precision_s> (for the z_value) and B<precision_p> (for the p_value).
199
200				=cut
201
202				sub z_value {
203				my $self = shift;
204				my $args = ref $_[0] ? shift : {@_};
205				my $ccorr = defined $args->{'ccorr'} ? $args->{'ccorr'} : 1;
206				my $tails = $args->{'tails'} \|\| 2;
207				my $precision_s = $args->{'precision_s'};
208				my $precision_p = $args->{'precision_p'};
209				my $pvo = defined $args->{'observed'} ? $args->{'observed'} : $self->pvo($args);
210
211				my ($zval, $pval) = $zed->zscore(
212				observed => $pvo,
213				expected => $self->pve($args),
214				variance => $self->pvv($args),
215				ccorr => $ccorr,
216				tails => $tails,
217				precision_s => $precision_s,
218				precision_p => $precision_p,
219				);
220				return wantarray ? ($zval, $pval) : $zval;
221				}
222				*pzs = \&z_value;
223				*pot_zscore = \&z_value;
224				*zscore = \&z_value;
225
226				=head2 p_value, test, pot_test, ptt
227
228				$p = $pot->p_value(state => 'x'); # using loaded data and default args
229				$p = $pot->p_value(ccorr => 0\|1, tails => 1\|2, state => 'x'); # normal-approximation based on loaded data
230				$p = $pot->p_value(data => $aref, ccorr => 1, tails => 2, state => 'x'); # using given data (by-passing load and read)
231
232				Test the currently loaded data for significance of the vale of pot. Returns the zscore from test of pot deviation, or, called wanting an array, the z-value with its I<p>-value for the tails (1 or 2) given.
233
234				=cut
235
236				sub p_value {
237				return (z_value(@_))[1];
238				}
239				*test = \&p_value;
240				*pot_test = \&p_value;
241				*ptt = \&p_value;
242
243
244				=head2 stats_hash
245
246				$href = $pot->stats_hash(values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1}, prob => .5, ccorr => 1);
247
248				Returns a hashref for the counts and stats as specified by hashref in its "values" argument, and with any options for calculating them. See L<Statistics::Sequences/stats_hash> for details.
249
250				=head2 dump
251
252				$pot->dump(values => { observed => 1, variance => 1, p_value => 1}, exact => 1, flag => 1, precision_s => 3); # among other options
253
254				Print Pot-test results to STDOUT. See L<dump\|Statistics::Sequences/dump> in the Statistics::Sequences manpage for details.
255
256				=cut
257
258				sub dump {
259				my $self = shift;
260				my $args = ref $_[0] ? $_[0] : {@_};
261				$args->{'stat'} = 'pot';
262				$args->{'stat'} .= " ($args->{'state'})" if defined $args->{'state'};
263				$self->SUPER::dump($args);
264				#No. of bunches of state '$self->{'state'}' = " . $self->{'bunches'}->count() .
265				# ', with a mean length of '. sprintf('%.2f', $self->{'bunches'}->mean()) .
266				# ', and a mean spacing of '. sprintf('%.2f', $self->{'spaces'}->mean()) ." between each bunch.\n" if $self->{'bunches'};
267				# $args->{'title'} .= ' (Calculated with a range of ' . sprintf('%.2f', $self->{'range'}) . " over a scale of $self->{'scale'})";
268				}
269
270				sub _set_terms {
271				my $self = shift;
272				my $args = ref $_[0] ? $_[0] : {@_};
273				my $data = ref $args->{'data'} ? $args->{'data'} : $self->read($args);
274				my ($m, $n, $r, $scale, $state, $indices, $bunches, $spaces) = ();
275				$m = scalar(@{$data});
276				#$m = defined $args->{'trials'} ? $args->{'trials'} : scalar(@{$data});
277				#if (ref $args->{'indices'}) { # defunct by-pass from having data
278				# $n = scalar @{$args->{'indices'}};
279				# $indices = $args->{'indices'};
280				#}
281				#else {
282				$state = defined $args->{'state'} ? $args->{'state'} : croak __PACKAGE__, '::test A state for pot-testing is needed';
283				($indices, $bunches, $spaces) = _state_indices($data, $state, $m);
284				$n = scalar(@{$indices});
285				#}
286				$scale = (!$args->{'scale'} or $args->{'scale'} < 1) ? 1 : $args->{'scale'}; # assume scale = 1 if not specified or invalid
287				$r = _range($n, $m, $scale);
288				return ($m, $n, $r, $scale, $state, $indices, $bunches, $spaces);
289				}
290
291				sub _range { # init range parameter
292				my ($n, $m, $scale) = @_;
293				return exp(-$n / $m * $scale);
294				}
295
296				sub _state_indices {# Init an array holding the indices at which the state appears in the given data:
297				my ($data, $state, $m) = @_;
298				# Meanwhile, build arrays of bunch and space frequencies, should this be requested:
299				my ($i, $j, $k, @indices, @bunches, @spaces) = (0, 0, 0);
300				for ($i = 0; $i < $m; $i++) {
301				# Allow for matching numerical or string values:
302				if ($data->[$i] eq $state) {
303				$k++ if $spaces[$k];
304				$j++ if ( scalar @indices ) and ( $indices[-1] != ($i - 1) );
305				$bunches[$j]++;
306				push @indices, $i;
307				}
308				else {
309				$spaces[$k]++;
310				}
311				}
312				return (\@indices, \@bunches, \@spaces);
313				}
314
315				1;
316
317				__END__
318
319				=head1 EXAMPLE
320
321				Using Pot as a test of bunching of a particular state within a collection of quasi-random events.
322
323				use strict;
324				use Statistics::Sequences::Pot;
325				my ($i, @data) = ();
326				# Init random integers ranging from 0 to 15:
327				for ($i = 0; $i < 960; $i++) { $data[$i] = int(rand(16)); }
328				# Assess degree of bunching within these data with respect to a randomly selected target state:
329				my $state = int(rand(16));
330				my $pot = Statistics::Sequences::Pot->new();
331				$pot->load(\@data);
332				my %args = (state => $state, values => {p_value => 1, observed => 1, expected => 1, stdev => 1});
333				my $statsref = $pot->stats_hash(\%args);
334				# Access the results of this analysis:
335				print "The probability of obtaining as much bunching of $state as observed is $statsref->{'p_value'}.\n";
336				print "The observed value of Pot was $statsref->{'observed'}, with expected value $statsref->{'expected'} ($statsref->{'stdev'}).\n";
337				# or print the lot, and more, in English:
338				$pot->dump(%args, precision_s => 3, precision_p => 7,);
339
340				=head1 REFERENCES
341
342				Schmidt, H. (2000). A proposed measure for psi-induced bunching of randomly spaced events. I<Journal of Parapsychology>, I<64,> 301-316.
343
344				=head1 SEE ALSO
345
346				L<http://www.fourmilab.ch/rpkp/> for Schmidt's many papers on the physical conceptualisation and properties of psi.
347
348				L<Statistics::Descriptive\|Statistics::Descriptive> : The present module adds data to "Full" objects of this package in order to access descriptives re bunches and spaces.
349
350				L<Statistics::Frequency\|Statistics::Frequency> : the C<proportional_frequency()> method in this module could be informative when working with data of the kind used here.
351
352				=head1 BUGS/LIMITATIONS
353
354				No computational bugs as yet identfied. Hopefully this will change, given time.
355
356				Limitations of the code, perhaps, concern the non-unique storage of data arrays (compared to, say, C<Statistics::DependantTTest>, but see C<Statistics::TTest>). This would require a unique name for each array of data, and explicit reference to one or another array with each L<test\|test> (when, perhaps, you'd have only one data-set, after all). In any case, the data are accepted as array references.
357
358				=head1 REVISION HISTORY
359
360				See CHANGES in installation dist for revisions.
361
362				=head1 AUTHOR/LICENSE
363
364				=over 4
365
366				=item Copyright (c) 2006-2013 Roderick Garton
367
368				rgarton AT cpan DOT org
369
370				This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see L<http://www.perl.com/perl/misc/Artistic.html>).
371
372				=item Disclaimer
373
374				To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
375
376				=back
377
378				=head1 END
379
380				This ends documentation of a Perl implementation of Helmut Schmidt's test of pot (potential energy) of occurrence of a state among others in a categorical sequence.
381
382				=cut