line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package Data::Validator::Item;
|
2
|
|
|
|
|
|
|
|
3
|
|
|
|
|
|
|
=head1 NAME
|
4
|
|
|
|
|
|
|
|
5
|
|
|
|
|
|
|
Data::Validator::Item Factory Class to validate data items
|
6
|
|
|
|
|
|
|
|
7
|
|
|
|
|
|
|
=head1 DESCRIPTION
|
8
|
|
|
|
|
|
|
|
9
|
|
|
|
|
|
|
This is an attempt to create an object which will permit semi-automatic verification of a data value.
|
10
|
|
|
|
|
|
|
|
11
|
|
|
|
|
|
|
=head1 SYNOPSIS
|
12
|
|
|
|
|
|
|
|
13
|
|
|
|
|
|
|
use Data::Validator::Item;
|
14
|
|
|
|
|
|
|
my $item = Data::Validator::Item->new(); #Create a new Data::Validator::Item, called $item.
|
15
|
|
|
|
|
|
|
|
16
|
|
|
|
|
|
|
#Set values
|
17
|
|
|
|
|
|
|
$item->name('fred');
|
18
|
|
|
|
|
|
|
$item->values([1,2,3]); or $item->values(\@array);
|
19
|
|
|
|
|
|
|
$item->missing('*'); or $item->missing(''); #undef is unlikely to be sensible!
|
20
|
|
|
|
|
|
|
$item->min(0); $item->max(100);
|
21
|
|
|
|
|
|
|
$item->verify($reference_to_subroutine); #Used in the $item->validate() function
|
22
|
|
|
|
|
|
|
$item->transform($reference_to_subroutine); #Used in the $item->put() function
|
23
|
|
|
|
|
|
|
|
24
|
|
|
|
|
|
|
#Get values
|
25
|
|
|
|
|
|
|
my $name = $item->name();
|
26
|
|
|
|
|
|
|
my @values = $item->values();
|
27
|
|
|
|
|
|
|
my $missing = $item->missing();
|
28
|
|
|
|
|
|
|
etc...
|
29
|
|
|
|
|
|
|
|
30
|
|
|
|
|
|
|
#Use it..
|
31
|
|
|
|
|
|
|
$item->validate(); #Returns 1 for success, 0 for failure
|
32
|
|
|
|
|
|
|
$item->error(); #Returns the correct error message
|
33
|
|
|
|
|
|
|
$item->put();
|
34
|
|
|
|
|
|
|
|
35
|
|
|
|
|
|
|
=head1 USAGE
|
36
|
|
|
|
|
|
|
|
37
|
|
|
|
|
|
|
Many people work with data organised as records, each containing
|
38
|
|
|
|
|
|
|
(potentially many) variables. It is often necessary to process files
|
39
|
|
|
|
|
|
|
of such records, and to test every variable within every record to ensure that
|
40
|
|
|
|
|
|
|
each one is valid. I do this before putting data from very large flat files into my databases.
|
41
|
|
|
|
|
|
|
For each variable I had a need to define specific, sometimes complex rules for validity,
|
42
|
|
|
|
|
|
|
then implement them, and check them. This is what Data::Validator::Item is for.
|
43
|
|
|
|
|
|
|
|
44
|
|
|
|
|
|
|
Note carefully that Data::Validator::Item handles only one scalar vlaue at a time. This
|
45
|
|
|
|
|
|
|
value could come from a file, a database, an array, a hash or your granny's parrot.
|
46
|
|
|
|
|
|
|
Data::Validator::Item doesn't care.
|
47
|
|
|
|
|
|
|
|
48
|
|
|
|
|
|
|
I use Data::Validator::Item as follows. I create one for every named variable in my
|
49
|
|
|
|
|
|
|
data file. In many real applications most of this setup can be done by looping
|
50
|
|
|
|
|
|
|
over a list of variable names, creating many Data::Validator::Items each named for
|
51
|
|
|
|
|
|
|
the corresponding variable. Common features, like missing values, and names
|
52
|
|
|
|
|
|
|
can be set in this loop. Specifics, like values(), min(), max(), verify() and so on
|
53
|
|
|
|
|
|
|
can be set individually. I then create a hash to hold all of the Data::Validator::Items for
|
54
|
|
|
|
|
|
|
a particular data source, The keys of this hash are the names of the variables,
|
55
|
|
|
|
|
|
|
and the values are the Data:Validators themselves.
|
56
|
|
|
|
|
|
|
Y.M.M.V.
|
57
|
|
|
|
|
|
|
|
58
|
|
|
|
|
|
|
=head1 ROLE
|
59
|
|
|
|
|
|
|
|
60
|
|
|
|
|
|
|
A Data::Validator::Item exists (almost) solely to create two functions - validate() and put().
|
61
|
|
|
|
|
|
|
They make it easy to apply complex tests for 'validity' to data.
|
62
|
|
|
|
|
|
|
|
63
|
|
|
|
|
|
|
Typically you will set up many of these, one per variable, once at the start
|
64
|
|
|
|
|
|
|
of a program, and you then use them to validate() and put() each individual item of data.
|
65
|
|
|
|
|
|
|
Data::Validator::Item neither knows nor cares where the data comes from, you just feed data
|
66
|
|
|
|
|
|
|
items to the correct ->validate() and ->put() one at at time, and they get checked.
|
67
|
|
|
|
|
|
|
|
68
|
|
|
|
|
|
|
There is no useful way to check the values of a variable depending on the values
|
69
|
|
|
|
|
|
|
of another variable in the same record. This is a different problem, one which could
|
70
|
|
|
|
|
|
|
be approached with Data::Validator::Record, if it existed. Feel free to write it. I hope to
|
71
|
|
|
|
|
|
|
get around to this in 2003.
|
72
|
|
|
|
|
|
|
|
73
|
|
|
|
|
|
|
=head2 PROBLEM ADDRESSED
|
74
|
|
|
|
|
|
|
|
75
|
|
|
|
|
|
|
A fairly common problem in my work is the following:
|
76
|
|
|
|
|
|
|
I get a data file, which has been created, often using Excel or Access. It is
|
77
|
|
|
|
|
|
|
riddled with errors, because it wasn't checked at all during data
|
78
|
|
|
|
|
|
|
entry. (I'm a *very* good data entry person, and I make about
|
79
|
|
|
|
|
|
|
1 mistake per 100 data items.)
|
80
|
|
|
|
|
|
|
|
81
|
|
|
|
|
|
|
Before I can use it I need to check the actual values in the data file.
|
82
|
|
|
|
|
|
|
Typically my clients don't know exactly what the legitimate values are for
|
83
|
|
|
|
|
|
|
each variable. For example a variable called 'sex' is supposed to be 0 or 1,
|
84
|
|
|
|
|
|
|
(female or male) and there are actually 140 '2's in the data set. On enquiry,
|
85
|
|
|
|
|
|
|
it turns out that 2 is the missing value for that variable. (Of course for
|
86
|
|
|
|
|
|
|
other variables in the data set the missing value might be '3', or '8' or
|
87
|
|
|
|
|
|
|
'-' or '*' or just blank).
|
88
|
|
|
|
|
|
|
|
89
|
|
|
|
|
|
|
I need to check every individual value in every record in a file,
|
90
|
|
|
|
|
|
|
against the values it is supposed to have, and I also often need to
|
91
|
|
|
|
|
|
|
change a variable, so that I can stuff it into a database. Clearly these two
|
92
|
|
|
|
|
|
|
tasks are closely related, and so I wrote a module which can do both,
|
93
|
|
|
|
|
|
|
if you want. Let me have your views on this decision.
|
94
|
|
|
|
|
|
|
|
95
|
|
|
|
|
|
|
=cut
|
96
|
|
|
|
|
|
|
|
97
|
|
|
|
|
|
|
#use stuff
|
98
|
11
|
|
|
11
|
|
270099
|
use strict;
|
|
11
|
|
|
|
|
27
|
|
|
11
|
|
|
|
|
404
|
|
99
|
11
|
|
|
11
|
|
59
|
use Carp;
|
|
11
|
|
|
|
|
20
|
|
|
11
|
|
|
|
|
16500
|
|
100
|
|
|
|
|
|
|
|
101
|
|
|
|
|
|
|
#Package globals
|
102
|
|
|
|
|
|
|
our $VERSION = '0.75';
|
103
|
|
|
|
|
|
|
my $Debugging = 0;
|
104
|
|
|
|
|
|
|
|
105
|
|
|
|
|
|
|
=head1 PUBLIC FUNCTIONS
|
106
|
|
|
|
|
|
|
|
107
|
|
|
|
|
|
|
=head2 new()
|
108
|
|
|
|
|
|
|
|
109
|
|
|
|
|
|
|
The new() function initialises a blank Data::Validator::Item with all of it's contents set
|
110
|
|
|
|
|
|
|
explicitly to undef.
|
111
|
|
|
|
|
|
|
|
112
|
|
|
|
|
|
|
C<< my $item = Data::Validator::Item->new(); >>
|
113
|
|
|
|
|
|
|
|
114
|
|
|
|
|
|
|
=cut
|
115
|
|
|
|
|
|
|
|
116
|
|
|
|
|
|
|
#Initiate the Data::Validator::Item
|
117
|
|
|
|
|
|
|
sub new {
|
118
|
11
|
|
|
11
|
1
|
195
|
my $proto = shift;
|
119
|
11
|
|
33
|
|
|
93
|
my $class = ref($proto) || $proto;
|
120
|
11
|
|
|
|
|
29
|
my $self = {};
|
121
|
|
|
|
|
|
|
#Documentation only
|
122
|
11
|
|
|
|
|
40
|
$self->{NAME} = undef; # Name of the variable or whatever, not currently used
|
123
|
|
|
|
|
|
|
#Used for validation
|
124
|
11
|
|
|
|
|
33
|
$self->{MIN} = undef; # Numerically (or alphabetically) smallest value
|
125
|
11
|
|
|
|
|
32
|
$self->{MAX} = undef; # Numerically (or alphabetically) largest value
|
126
|
11
|
|
|
|
|
29
|
$self->{MATCH} = undef; # Reference to a function matching a regex
|
127
|
11
|
|
|
|
|
26
|
$self->{VALUES} = undef; # Reference to an array of all possible values
|
128
|
11
|
|
|
|
|
29
|
$self->{VERIFY} = undef; # Reference to a function capable of verifying variable e.g. dates
|
129
|
11
|
|
|
|
|
26
|
$self->{LOOKUP} = undef; # Reference to a DBI Satement handle to do lookup on possible values
|
130
|
|
|
|
|
|
|
#Used for validation and transformation
|
131
|
11
|
|
|
|
|
33
|
$self->{MISSING} = undef; # Missing value, accepted as a valid value, and transformed to undef in put()
|
132
|
|
|
|
|
|
|
#Used for transformation only
|
133
|
11
|
|
|
|
|
26
|
$self->{TRANSFORM}= undef; # Reference to a function capable of transforming variable for output
|
134
|
|
|
|
|
|
|
#Used for reporting on errors - Overwritten every time validate() is called
|
135
|
11
|
|
|
|
|
32
|
$self->{ERROR} = undef; # Error message from last failed validation
|
136
|
11
|
|
|
|
|
36
|
bless ($self, $class);
|
137
|
11
|
|
|
|
|
38
|
return $self;
|
138
|
|
|
|
|
|
|
} #End of subroutine new
|
139
|
|
|
|
|
|
|
|
140
|
|
|
|
|
|
|
=head2 zap()
|
141
|
|
|
|
|
|
|
|
142
|
|
|
|
|
|
|
The zap() function re-initialises an existing Data::Validator::Item with all of it's contents reset
|
143
|
|
|
|
|
|
|
explicitly back to undef. This is used in some of the test scripts, but may not have many other uses.
|
144
|
|
|
|
|
|
|
|
145
|
|
|
|
|
|
|
C<< $item->zap(); >>
|
146
|
|
|
|
|
|
|
|
147
|
|
|
|
|
|
|
=cut
|
148
|
|
|
|
|
|
|
|
149
|
|
|
|
|
|
|
sub zap {
|
150
|
8
|
|
|
8
|
1
|
812
|
my $self = shift;
|
151
|
8
|
|
|
|
|
22
|
$self->{NAME} = undef;
|
152
|
8
|
|
|
|
|
13
|
$self->{MIN} = undef;
|
153
|
8
|
|
|
|
|
15
|
$self->{MAX} = undef;
|
154
|
8
|
|
|
|
|
18
|
$self->{MATCH} = undef;
|
155
|
8
|
|
|
|
|
20
|
$self->{VALUES} = undef;
|
156
|
8
|
|
|
|
|
12
|
$self->{VERIFY} = undef;
|
157
|
8
|
|
|
|
|
13
|
$self->{LOOKUP} = undef;
|
158
|
8
|
|
|
|
|
12
|
$self->{MISSING} = undef;
|
159
|
8
|
|
|
|
|
13
|
$self->{TRANSFORM}= undef;
|
160
|
8
|
|
|
|
|
11
|
$self->{ERROR} = undef;
|
161
|
8
|
|
|
|
|
38
|
return $self;
|
162
|
|
|
|
|
|
|
} #End of subroutine zap
|
163
|
|
|
|
|
|
|
|
164
|
|
|
|
|
|
|
=head1 put() and validate()
|
165
|
|
|
|
|
|
|
|
166
|
|
|
|
|
|
|
These two functions are what Data::Validator::Item is meant to create.
|
167
|
|
|
|
|
|
|
validate() checks a scalar to see if it is acceptable.
|
168
|
|
|
|
|
|
|
put() is used to transform a scalar for otuput
|
169
|
|
|
|
|
|
|
|
170
|
|
|
|
|
|
|
=head2 validate()
|
171
|
|
|
|
|
|
|
|
172
|
|
|
|
|
|
|
validate() takes a scalar, and tests it, using all of the tests which you have
|
173
|
|
|
|
|
|
|
chosen to put into the particular Data::Validator::Item. It returns success (1)
|
174
|
|
|
|
|
|
|
or failure(0) if at least one test fails.
|
175
|
|
|
|
|
|
|
|
176
|
|
|
|
|
|
|
C<< $item->validate($datum); >>
|
177
|
|
|
|
|
|
|
|
178
|
|
|
|
|
|
|
It also sets an appropriate error message as
|
179
|
|
|
|
|
|
|
|
180
|
|
|
|
|
|
|
C<< $item->error(); >>
|
181
|
|
|
|
|
|
|
|
182
|
|
|
|
|
|
|
1 means the item was either ok (passed all tests) *or* the missing value, in other words, acceptable...
|
183
|
|
|
|
|
|
|
0 means that the item failed at least one test. Note that you can't get at how many tests an item
|
184
|
|
|
|
|
|
|
failed, and that the error message relates only to the first test failed by an item.
|
185
|
|
|
|
|
|
|
|
186
|
|
|
|
|
|
|
Do B ignore these return codes when using this module.
|
187
|
|
|
|
|
|
|
|
188
|
|
|
|
|
|
|
=cut
|
189
|
|
|
|
|
|
|
|
190
|
|
|
|
|
|
|
sub validate {
|
191
|
95
|
|
|
95
|
1
|
22996
|
my $self = shift;
|
192
|
95
|
|
|
|
|
120
|
my $datum = shift;
|
193
|
|
|
|
|
|
|
|
194
|
95
|
|
|
|
|
186
|
$self->error(undef);
|
195
|
|
|
|
|
|
|
|
196
|
|
|
|
|
|
|
#Tests placed in approximate order of cost!
|
197
|
|
|
|
|
|
|
|
198
|
95
|
100
|
100
|
|
|
170
|
if (defined($self->missing()) && ($datum eq $self->missing())) {
|
199
|
1
|
|
|
|
|
4
|
$self->error("$datum is missing");
|
200
|
1
|
|
|
|
|
16
|
return 1;};
|
201
|
|
|
|
|
|
|
#It's missing - return validated, and move on
|
202
|
|
|
|
|
|
|
|
203
|
94
|
100
|
|
|
|
201
|
unless (defined($datum)) {
|
204
|
2
|
|
|
|
|
9
|
$self->error("$datum is undefined");
|
205
|
2
|
|
|
|
|
13
|
return 0};
|
206
|
|
|
|
|
|
|
#It's undefined - complain! It shouldn't be.
|
207
|
|
|
|
|
|
|
|
208
|
92
|
100
|
100
|
|
|
163
|
if (defined($self->min()) && ($datum < $self->min())) {
|
209
|
1
|
|
|
|
|
23
|
$self->error("$datum is too small");
|
210
|
1
|
|
|
|
|
4
|
return 0;};
|
211
|
|
|
|
|
|
|
|
212
|
91
|
100
|
100
|
|
|
177
|
if (defined($self->max()) && ($datum > $self->max())) {
|
213
|
1
|
|
|
|
|
11
|
$self->error("$datum is too big");
|
214
|
1
|
|
|
|
|
5
|
return 0;};
|
215
|
|
|
|
|
|
|
#Too big or too small
|
216
|
|
|
|
|
|
|
|
217
|
90
|
100
|
|
|
|
184
|
if (defined($self->match())){
|
218
|
22
|
|
|
|
|
40
|
my $match = $self->match();
|
219
|
22
|
100
|
|
|
|
112
|
if ($datum !~ /$match/){
|
220
|
20
|
|
|
|
|
59
|
$self->error("$datum doesn't match the regex");
|
221
|
20
|
|
|
|
|
643
|
return 0;}
|
222
|
2
|
|
|
|
|
9
|
return 1;} # if defined $self->match()
|
223
|
|
|
|
|
|
|
#Doesn't match the regex supplied
|
224
|
|
|
|
|
|
|
|
225
|
68
|
100
|
|
|
|
699
|
if (defined($self->values())) {
|
226
|
60
|
|
|
|
|
57
|
my %hash = %{ $self->values()};
|
|
60
|
|
|
|
|
614
|
|
227
|
60
|
100
|
|
|
|
186
|
unless (exists $hash{$datum}) {
|
228
|
35
|
|
|
|
|
115
|
$self->error("$datum is not in the list of values");
|
229
|
35
|
|
|
|
|
178
|
return 0;};
|
230
|
|
|
|
|
|
|
};
|
231
|
|
|
|
|
|
|
# Not in the approved list of values
|
232
|
|
|
|
|
|
|
|
233
|
33
|
50
|
|
|
|
79
|
if (defined($self->verify())) {
|
234
|
0
|
|
|
|
|
0
|
my $coderef = $self->verify();
|
235
|
0
|
0
|
|
|
|
0
|
unless (&$coderef($datum)) {
|
236
|
0
|
|
|
|
|
0
|
$self->error("$datum is not verified");
|
237
|
0
|
|
|
|
|
0
|
return 0};
|
238
|
|
|
|
|
|
|
};
|
239
|
|
|
|
|
|
|
#Not confirmed by verification subroutine
|
240
|
|
|
|
|
|
|
|
241
|
33
|
|
|
|
|
138
|
return 1;
|
242
|
|
|
|
|
|
|
# All is well
|
243
|
|
|
|
|
|
|
} #End of subroutine validate
|
244
|
|
|
|
|
|
|
|
245
|
|
|
|
|
|
|
=head2 put()
|
246
|
|
|
|
|
|
|
|
247
|
|
|
|
|
|
|
put() returns the data value,
|
248
|
|
|
|
|
|
|
|
249
|
|
|
|
|
|
|
=over 4
|
250
|
|
|
|
|
|
|
|
251
|
|
|
|
|
|
|
=item *
|
252
|
|
|
|
|
|
|
|
253
|
|
|
|
|
|
|
or the transformed data value by the transform() function provided by you,
|
254
|
|
|
|
|
|
|
|
255
|
|
|
|
|
|
|
=item *
|
256
|
|
|
|
|
|
|
|
257
|
|
|
|
|
|
|
or undef, if the data value was the missing() value.
|
258
|
|
|
|
|
|
|
|
259
|
|
|
|
|
|
|
=back
|
260
|
|
|
|
|
|
|
|
261
|
|
|
|
|
|
|
=cut
|
262
|
|
|
|
|
|
|
|
263
|
|
|
|
|
|
|
sub put {
|
264
|
9
|
|
|
9
|
1
|
14
|
my $self = shift;
|
265
|
9
|
|
|
|
|
10
|
my $datum = shift;
|
266
|
|
|
|
|
|
|
|
267
|
9
|
100
|
66
|
|
|
99
|
if (defined($self->missing()) && ($datum eq $self->missing())) {return undef;};
|
|
1
|
|
|
|
|
9
|
|
268
|
|
|
|
|
|
|
# It's missing
|
269
|
8
|
50
|
|
|
|
14
|
if (defined($self->transform())) {
|
270
|
|
|
|
|
|
|
# It needs to be transformed, and it's not missing
|
271
|
8
|
|
|
|
|
13
|
my $coderef = $self->transform();
|
272
|
8
|
|
|
|
|
18
|
$datum =&$coderef($datum);
|
273
|
8
|
|
|
|
|
67
|
return $datum;
|
274
|
|
|
|
|
|
|
}
|
275
|
|
|
|
|
|
|
#Just pass it through
|
276
|
0
|
|
|
|
|
0
|
return $datum;
|
277
|
|
|
|
|
|
|
} #End of subroutine put
|
278
|
|
|
|
|
|
|
|
279
|
|
|
|
|
|
|
=head1 Get and Set functions
|
280
|
|
|
|
|
|
|
|
281
|
|
|
|
|
|
|
Data::Validator::Item implements a policy to decide on the acceptability or otherwise
|
282
|
|
|
|
|
|
|
of scalar value, and to transform this value for output. The B functions
|
283
|
|
|
|
|
|
|
allow you to define the policy. These functions require an argument. These
|
284
|
|
|
|
|
|
|
functions are most likely to be used when creating a Data::Validator::Item.
|
285
|
|
|
|
|
|
|
|
286
|
|
|
|
|
|
|
The corresponding B functions are intended for use B within the
|
287
|
|
|
|
|
|
|
Data::Validator::Item, when creating the put() and validate() functions. These are the
|
288
|
|
|
|
|
|
|
no argument functions.
|
289
|
|
|
|
|
|
|
|
290
|
|
|
|
|
|
|
=head2 name()
|
291
|
|
|
|
|
|
|
|
292
|
|
|
|
|
|
|
name() sets or gets the name of the Data:Validator - I use this just to remind me, and
|
293
|
|
|
|
|
|
|
I usually set it to the name of the variable. This doesn't get used anywhere else - it's just
|
294
|
|
|
|
|
|
|
icing, but it sure makes debugging easier.
|
295
|
|
|
|
|
|
|
|
296
|
|
|
|
|
|
|
C<< $item->name("Item"); >>
|
297
|
|
|
|
|
|
|
|
298
|
|
|
|
|
|
|
=cut
|
299
|
|
|
|
|
|
|
|
300
|
|
|
|
|
|
|
sub name {
|
301
|
7
|
|
|
7
|
1
|
19
|
my $self = shift;
|
302
|
7
|
100
|
|
|
|
29
|
if (@_) { $self->{NAME} = shift }
|
|
2
|
|
|
|
|
10
|
|
303
|
7
|
|
|
|
|
59
|
return $self->{NAME};
|
304
|
|
|
|
|
|
|
} #End of subroutine name
|
305
|
|
|
|
|
|
|
|
306
|
|
|
|
|
|
|
=head2 error()
|
307
|
|
|
|
|
|
|
|
308
|
|
|
|
|
|
|
error() sets or gets the last error message.
|
309
|
|
|
|
|
|
|
|
310
|
|
|
|
|
|
|
=cut
|
311
|
|
|
|
|
|
|
|
312
|
|
|
|
|
|
|
sub error {
|
313
|
171
|
|
|
171
|
1
|
204
|
my $self = shift;
|
314
|
171
|
100
|
|
|
|
358
|
if (@_) { $self->{ERROR} = shift }
|
|
157
|
|
|
|
|
245
|
|
315
|
171
|
|
|
|
|
299
|
return $self->{ERROR};
|
316
|
|
|
|
|
|
|
} #end of subroutine error
|
317
|
|
|
|
|
|
|
|
318
|
|
|
|
|
|
|
=head2 missing()
|
319
|
|
|
|
|
|
|
|
320
|
|
|
|
|
|
|
missing() gets or sets the missing value for a Data::Validator::Item. This does matter, because
|
321
|
|
|
|
|
|
|
missing values are acceptable to validate(), and because put() changes missing values to undef.
|
322
|
|
|
|
|
|
|
This is used by *both* put() and validate(). If you don't understand why missing values are
|
323
|
|
|
|
|
|
|
*acceptable* you need to think harder about the problem we're solving here.
|
324
|
|
|
|
|
|
|
Would you like missing() to accept several alternative missing values? Let me know...
|
325
|
|
|
|
|
|
|
|
326
|
|
|
|
|
|
|
C<< $item->missing(""); >>
|
327
|
|
|
|
|
|
|
C<< $item->missing('*'); >>
|
328
|
|
|
|
|
|
|
|
329
|
|
|
|
|
|
|
=cut
|
330
|
|
|
|
|
|
|
|
331
|
|
|
|
|
|
|
sub missing {
|
332
|
130
|
|
|
130
|
1
|
141
|
my $self = shift;
|
333
|
130
|
100
|
|
|
|
274
|
if (@_) { $self->{MISSING} = shift }
|
|
3
|
|
|
|
|
21
|
|
334
|
130
|
|
|
|
|
439
|
return $self->{MISSING};
|
335
|
|
|
|
|
|
|
} #End of subroutine missing
|
336
|
|
|
|
|
|
|
|
337
|
|
|
|
|
|
|
=head2 min()/max()
|
338
|
|
|
|
|
|
|
|
339
|
|
|
|
|
|
|
min() and max() get and set the lower and upper limits for a Data::Validator::Item. These are
|
340
|
|
|
|
|
|
|
used by validate() to check whether a value is greater than or less than a limit. These could
|
341
|
|
|
|
|
|
|
be used for character data, but really make more sense for numeric values. Note that I
|
342
|
|
|
|
|
|
|
don't really understand how min and max work for character data yet. Note also that perl
|
343
|
|
|
|
|
|
|
may occasionally require you to tell it that a variable is numeric. (try adding 0 to it if this
|
344
|
|
|
|
|
|
|
problem arises).
|
345
|
|
|
|
|
|
|
|
346
|
|
|
|
|
|
|
C<< $item->min(-5) >>
|
347
|
|
|
|
|
|
|
or
|
348
|
|
|
|
|
|
|
C<< $item->max(42) >>
|
349
|
|
|
|
|
|
|
|
350
|
|
|
|
|
|
|
=cut
|
351
|
|
|
|
|
|
|
|
352
|
|
|
|
|
|
|
sub min {
|
353
|
100
|
|
|
100
|
1
|
111
|
my $self = shift;
|
354
|
100
|
100
|
|
|
|
201
|
if (@_) { $self->{MIN} = shift }
|
|
2
|
|
|
|
|
10
|
|
355
|
100
|
|
|
|
|
354
|
return $self->{MIN};
|
356
|
|
|
|
|
|
|
} #End of subroutine min
|
357
|
|
|
|
|
|
|
|
358
|
|
|
|
|
|
|
sub max {
|
359
|
98
|
|
|
98
|
1
|
196
|
my $self = shift;
|
360
|
98
|
100
|
|
|
|
225
|
if (@_) { $self->{MAX} = shift }
|
|
2
|
|
|
|
|
6
|
|
361
|
98
|
|
|
|
|
285
|
return $self->{MAX};
|
362
|
|
|
|
|
|
|
} #End of subroutine max
|
363
|
|
|
|
|
|
|
|
364
|
|
|
|
|
|
|
=head2 match()
|
365
|
|
|
|
|
|
|
|
366
|
|
|
|
|
|
|
match() sets or gets a Perl regular expression. If you know the syntax of these
|
367
|
|
|
|
|
|
|
you can do clever stuff. Bear in mind that the validate function uses it internally like this
|
368
|
|
|
|
|
|
|
|
369
|
|
|
|
|
|
|
my $match = $self->match();
|
370
|
|
|
|
|
|
|
if ($datum !~ /$match/)
|
371
|
|
|
|
|
|
|
|
372
|
|
|
|
|
|
|
If this means nothing to you, just use it like these examples -
|
373
|
|
|
|
|
|
|
|
374
|
|
|
|
|
|
|
C<< $item->('r') >>
|
375
|
|
|
|
|
|
|
C<< $item->('dog') >>
|
376
|
|
|
|
|
|
|
|
377
|
|
|
|
|
|
|
=cut
|
378
|
|
|
|
|
|
|
|
379
|
|
|
|
|
|
|
sub match {
|
380
|
116
|
|
|
116
|
1
|
170
|
my $self = shift;
|
381
|
116
|
100
|
|
|
|
234
|
if (@_) {
|
382
|
1
|
|
|
|
|
3
|
my $regex = shift;
|
383
|
1
|
50
|
|
|
|
6
|
if (_is_valid_pattern($regex)) { #Is it a valid regex?
|
384
|
1
|
|
|
|
|
5
|
$self->{MATCH} = $regex;
|
385
|
1
|
|
|
|
|
5
|
return $self->{MATCH};
|
386
|
|
|
|
|
|
|
}
|
387
|
|
|
|
|
|
|
}# If @_
|
388
|
115
|
|
|
|
|
293
|
return $self->{MATCH};
|
389
|
|
|
|
|
|
|
} #End of subroutine match
|
390
|
|
|
|
|
|
|
|
391
|
|
|
|
|
|
|
=head2 transform()
|
392
|
|
|
|
|
|
|
|
393
|
|
|
|
|
|
|
transform() sets or gets a reference to a subroutine, a reference of type CODE. This
|
394
|
|
|
|
|
|
|
is used by put() to change the value of a variable. This is very flexible, and has covered
|
395
|
|
|
|
|
|
|
all of my needs so far.
|
396
|
|
|
|
|
|
|
|
397
|
|
|
|
|
|
|
C<< $item->transform(\&test) >>
|
398
|
|
|
|
|
|
|
|
399
|
|
|
|
|
|
|
=cut
|
400
|
|
|
|
|
|
|
|
401
|
|
|
|
|
|
|
sub transform {
|
402
|
26
|
|
|
26
|
1
|
48
|
my $self = shift;
|
403
|
26
|
100
|
|
|
|
53
|
if (@_) {
|
404
|
6
|
|
|
|
|
8
|
my $ref = shift;
|
405
|
|
|
|
|
|
|
|
406
|
6
|
100
|
|
|
|
16
|
if (_ref_check($ref,'CODE')) { # Is it a CODEREF??
|
407
|
3
|
|
|
|
|
27
|
$self->{TRANSFORM} = $ref;
|
408
|
3
|
|
|
|
|
23
|
return $self->{TRANSFORM};
|
409
|
|
|
|
|
|
|
}
|
410
|
|
|
|
|
|
|
} # if(@_)
|
411
|
23
|
|
|
|
|
64
|
return $self->{TRANSFORM};
|
412
|
|
|
|
|
|
|
} #End of subroutine transform
|
413
|
|
|
|
|
|
|
|
414
|
|
|
|
|
|
|
=head2 verify()
|
415
|
|
|
|
|
|
|
|
416
|
|
|
|
|
|
|
verify() sets or gets a reference to a subroutine, a reference of type CODE. This is
|
417
|
|
|
|
|
|
|
used by validate() to check if a variable complies with certain rules. This is the most
|
418
|
|
|
|
|
|
|
complicated method of testing a value but it can be very useful in some circumstances.
|
419
|
|
|
|
|
|
|
Remember there isn't any built in way to use the value of *another* variable from the
|
420
|
|
|
|
|
|
|
same record in this subroutine.
|
421
|
|
|
|
|
|
|
|
422
|
|
|
|
|
|
|
C<< $item->verify(\&test); >>
|
423
|
|
|
|
|
|
|
|
424
|
|
|
|
|
|
|
=cut
|
425
|
|
|
|
|
|
|
|
426
|
|
|
|
|
|
|
sub verify {
|
427
|
39
|
|
|
39
|
1
|
53
|
my $self = shift;
|
428
|
39
|
100
|
|
|
|
99
|
if (@_) {
|
429
|
4
|
|
|
|
|
5
|
my $ref = shift;
|
430
|
4
|
100
|
|
|
|
10
|
if (_ref_check($ref,'CODE')) { # Is it a CODEREF??
|
431
|
1
|
|
|
|
|
5
|
$self->{VERIFY} = $ref;
|
432
|
1
|
|
|
|
|
17
|
return $self->{VERIFY};
|
433
|
|
|
|
|
|
|
}
|
434
|
|
|
|
|
|
|
} # if(@_)
|
435
|
38
|
|
|
|
|
104
|
return $self->{VERIFY};
|
436
|
|
|
|
|
|
|
} #End of subroutine verify
|
437
|
|
|
|
|
|
|
|
438
|
|
|
|
|
|
|
=head2 values()
|
439
|
|
|
|
|
|
|
|
440
|
|
|
|
|
|
|
values() sets or gets an array reference containing all of the possible values of a variable.
|
441
|
|
|
|
|
|
|
This is used by validate() to check if a variable has one of a list of values. The array reference gets
|
442
|
|
|
|
|
|
|
turned into a hash internally so that I can use exists(), but in Perl 5.8 and up exists() works for arrays.
|
443
|
|
|
|
|
|
|
I chose to initialise this using array references because the syntax is easy -
|
444
|
|
|
|
|
|
|
|
445
|
|
|
|
|
|
|
C<< $item->values([0,1,2,3,4]); >>
|
446
|
|
|
|
|
|
|
or
|
447
|
|
|
|
|
|
|
C<< $item->values(\@array); >>
|
448
|
|
|
|
|
|
|
|
449
|
|
|
|
|
|
|
=cut
|
450
|
|
|
|
|
|
|
|
451
|
|
|
|
|
|
|
sub values {
|
452
|
135
|
|
|
135
|
1
|
153
|
my $self = shift;
|
453
|
135
|
100
|
|
|
|
891
|
if (@_) {
|
454
|
6
|
|
|
|
|
8
|
my $ref = shift;
|
455
|
6
|
100
|
|
|
|
22
|
if (_ref_check($ref,'ARRAY')) { # Is it an ARRAY reference?? $self->{TRANSFORM} = $ref;
|
456
|
4
|
|
|
|
|
24
|
my %hash;
|
457
|
4
|
|
|
|
|
27
|
grep { ! $hash{$_} ++ } @$ref; #Perl Cookbook Recipe 4.6 Thanks!
|
|
26
|
|
|
|
|
72
|
|
458
|
4
|
|
|
|
|
22
|
$self->{VALUES} = \%hash;
|
459
|
4
|
|
|
|
|
35
|
return $self->{VALUES};
|
460
|
|
|
|
|
|
|
}
|
461
|
|
|
|
|
|
|
} # if(@_)
|
462
|
131
|
|
|
|
|
533
|
return $self->{VALUES};
|
463
|
|
|
|
|
|
|
} #End of subroutine values
|
464
|
|
|
|
|
|
|
|
465
|
|
|
|
|
|
|
=head1 PRIVATE FUNCTIONS
|
466
|
|
|
|
|
|
|
|
467
|
|
|
|
|
|
|
=head2 _ref_check()
|
468
|
|
|
|
|
|
|
|
469
|
|
|
|
|
|
|
_ref_check() is a private subroutine which looks to see if a reference refers to what you expect. Don't
|
470
|
|
|
|
|
|
|
use it. Note that this produces a number of warnings during testing. you're meant to see these warnings!
|
471
|
|
|
|
|
|
|
|
472
|
|
|
|
|
|
|
=cut
|
473
|
|
|
|
|
|
|
|
474
|
|
|
|
|
|
|
sub _ref_check {
|
475
|
16
|
|
|
16
|
|
130
|
my ($test,$should_be) = @_;
|
476
|
|
|
|
|
|
|
#Why doesn't this get called with self as it's first argument?
|
477
|
|
|
|
|
|
|
|
478
|
16
|
|
|
|
|
33
|
my $ref = ref($test);
|
479
|
|
|
|
|
|
|
|
480
|
16
|
100
|
|
|
|
70
|
unless ($ref eq $should_be) {
|
481
|
8
|
100
|
|
|
|
28
|
if (length($ref) > 0) {
|
482
|
5
|
|
|
|
|
659
|
carp ("\n>> $test isn't a reference to an array, but rather a reference to a ".$ref."\n")
|
483
|
|
|
|
|
|
|
}
|
484
|
|
|
|
|
|
|
else
|
485
|
|
|
|
|
|
|
{
|
486
|
3
|
|
|
|
|
401
|
carp ("\n>> $test isn't an array reference at all, but a SCALAR\n")
|
487
|
|
|
|
|
|
|
}# if (defined($refref))
|
488
|
8
|
|
|
|
|
1079
|
return 0;
|
489
|
|
|
|
|
|
|
} # unless ($ref eq $should_be)
|
490
|
8
|
|
|
|
|
35
|
return 1;
|
491
|
|
|
|
|
|
|
} #End of subrotuine _ref_check
|
492
|
|
|
|
|
|
|
|
493
|
|
|
|
|
|
|
=head2 _is_valid_pattern()
|
494
|
|
|
|
|
|
|
|
495
|
|
|
|
|
|
|
_is_valid_pattern is a private function used internally to check if a supplied regex is valid.
|
496
|
|
|
|
|
|
|
It comes from Tom Christiansen and Nathan Torkington 'The Perl CookBook' Recipe 6.11.
|
497
|
|
|
|
|
|
|
Thanks! More details at L<< http://www.oreilly.com/catalog/cookbook/ >>
|
498
|
|
|
|
|
|
|
|
499
|
|
|
|
|
|
|
=cut
|
500
|
|
|
|
|
|
|
|
501
|
|
|
|
|
|
|
sub _is_valid_pattern {
|
502
|
1
|
|
|
1
|
|
2
|
my $pat = shift;
|
503
|
|
|
|
|
|
|
|
504
|
1
|
|
50
|
|
|
3
|
return eval { "" =~ /$pat/; 1 } || 0;
|
505
|
|
|
|
|
|
|
} #End of subroutine _is_valid_pattern
|
506
|
|
|
|
|
|
|
|
507
|
|
|
|
|
|
|
return 1; #Required for all modules
|
508
|
|
|
|
|
|
|
|
509
|
|
|
|
|
|
|
=head1 KNOWN BUGS
|
510
|
|
|
|
|
|
|
|
511
|
|
|
|
|
|
|
min() and max() don't really work for non-numeric values, arguably they should!
|
512
|
|
|
|
|
|
|
|
513
|
|
|
|
|
|
|
=head1 AUTHOR
|
514
|
|
|
|
|
|
|
|
515
|
|
|
|
|
|
|
Anthony Staines
|
516
|
|
|
|
|
|
|
|
517
|
|
|
|
|
|
|
=head1 VERSION
|
518
|
|
|
|
|
|
|
|
519
|
|
|
|
|
|
|
Version 0.7 first public (alpha) release
|
520
|
|
|
|
|
|
|
|
521
|
|
|
|
|
|
|
=head1 TO DO
|
522
|
|
|
|
|
|
|
|
523
|
|
|
|
|
|
|
This is an alpha release. I am actively seeking feedback on the user interface.
|
524
|
|
|
|
|
|
|
Please let me know what you think.
|
525
|
|
|
|
|
|
|
|
526
|
|
|
|
|
|
|
The validate and put functions are called a lot - several hundred thousand times
|
527
|
|
|
|
|
|
|
in my applications. The program spends most of it's time executing these. (Confirmed
|
528
|
|
|
|
|
|
|
by profiling). I will implement an eval based version of these, and see if it is faster - it should be!
|
529
|
|
|
|
|
|
|
|
530
|
|
|
|
|
|
|
Try anthony.staines@ucd.ie with your comments
|
531
|
|
|
|
|
|
|
|
532
|
|
|
|
|
|
|
=head1 SEE ALSO
|
533
|
|
|
|
|
|
|
|
534
|
|
|
|
|
|
|
L.
|
535
|
|
|
|
|
|
|
|
536
|
|
|
|
|
|
|
=head1 COPYRIGHT AND DISCLAIMER
|
537
|
|
|
|
|
|
|
|
538
|
|
|
|
|
|
|
This program is Copyright 2002 by Anthony Staines. This program is free software;
|
539
|
|
|
|
|
|
|
you can redistribute it and/or modify it under the terms of the Perl Artistic License or the
|
540
|
|
|
|
|
|
|
GNU General Public License as published by the Free Software Foundation; either
|
541
|
|
|
|
|
|
|
version 2 of the License, or (at your option) any later version.
|
542
|
|
|
|
|
|
|
|
543
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
|
544
|
|
|
|
|
|
|
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
545
|
|
|
|
|
|
|
See the GNU General Public License for more details.
|
546
|
|
|
|
|
|
|
|
547
|
|
|
|
|
|
|
If you do not have a copy of the GNU General Public License write to the Free Software Foundation, Inc.,
|
548
|
|
|
|
|
|
|
675 Mass Ave, Cambridge, MA 02139, USA.
|
549
|
|
|
|
|
|
|
|
550
|
|
|
|
|
|
|
=head1 Long Example
|
551
|
|
|
|
|
|
|
|
552
|
|
|
|
|
|
|
Please let me know if you feel that this example is B appropriate here.
|
553
|
|
|
|
|
|
|
This example is heavily edited and won't compile - If you want the original ask me.
|
554
|
|
|
|
|
|
|
at C<< anthony.staines@ucd.ie >>
|
555
|
|
|
|
|
|
|
|
556
|
|
|
|
|
|
|
#Load_Births.pl
|
557
|
|
|
|
|
|
|
#
|
558
|
|
|
|
|
|
|
# Copyright (c) 2002 Anthony Staines. All rights reserved.
|
559
|
|
|
|
|
|
|
# This program is free software; you can redistribute it and/or
|
560
|
|
|
|
|
|
|
# modify it under the same terms as Perl itself.
|
561
|
|
|
|
|
|
|
|
562
|
|
|
|
|
|
|
#use things...
|
563
|
|
|
|
|
|
|
|
564
|
|
|
|
|
|
|
use Data::Validator::Item; #My verification function factory
|
565
|
|
|
|
|
|
|
|
566
|
|
|
|
|
|
|
|
567
|
|
|
|
|
|
|
#Open the data file - we use STDIN (redirected to a file)
|
568
|
|
|
|
|
|
|
|
569
|
|
|
|
|
|
|
#Read the header - first line of file - a comma seperated list of variable names
|
570
|
|
|
|
|
|
|
|
571
|
|
|
|
|
|
|
my @fields = @{read_header()};
|
572
|
|
|
|
|
|
|
my $fields = join(", ",@fields); #List of field names for DBI INSERT
|
573
|
|
|
|
|
|
|
|
574
|
|
|
|
|
|
|
my @values = ('?') x scalar(@fields);
|
575
|
|
|
|
|
|
|
my $values = join(", ", @values); #Same number of question marks...
|
576
|
|
|
|
|
|
|
|
577
|
|
|
|
|
|
|
#Setup the data dictionary
|
578
|
|
|
|
|
|
|
my %dictionary = %{Births_setup()};
|
579
|
|
|
|
|
|
|
|
580
|
|
|
|
|
|
|
# Hash to store rejected variables
|
581
|
|
|
|
|
|
|
my %errors;
|
582
|
|
|
|
|
|
|
|
583
|
|
|
|
|
|
|
#Set up and prepare the SQL and the $sth
|
584
|
|
|
|
|
|
|
if ($entering) {
|
585
|
|
|
|
|
|
|
$sql = "INSERT INTO $table ($fields) VALUES ($values)";
|
586
|
|
|
|
|
|
|
$sth = $dbh->prepare($sql); #Putting this outside the loop reduces execution time significantly
|
587
|
|
|
|
|
|
|
} # if $entering
|
588
|
|
|
|
|
|
|
|
589
|
|
|
|
|
|
|
while (<>) { #This reads the input file, line by line
|
590
|
|
|
|
|
|
|
my @output; my $index = 0; my $error = 0; my $error_msg='';
|
591
|
|
|
|
|
|
|
|
592
|
|
|
|
|
|
|
$csv->parse($_);
|
593
|
|
|
|
|
|
|
my @data = $csv->fields();
|
594
|
|
|
|
|
|
|
|
595
|
|
|
|
|
|
|
foreach my $datum (@data) {
|
596
|
|
|
|
|
|
|
|
597
|
|
|
|
|
|
|
# B< Validate >
|
598
|
|
|
|
|
|
|
if ($dictionary{$fields[$index]}->validate($datum)!=1) {
|
599
|
|
|
|
|
|
|
$error_msg = "\t Line ".$line." ".$fields[$index]."-".$datum;
|
600
|
|
|
|
|
|
|
$errors{$error_msg} = 1; #Fill the hash of error messages for later printing
|
601
|
|
|
|
|
|
|
$error = 1;
|
602
|
|
|
|
|
|
|
} #if validate() returns invalid
|
603
|
|
|
|
|
|
|
|
604
|
|
|
|
|
|
|
# B< Put >
|
605
|
|
|
|
|
|
|
#if required, and no errors occurred
|
606
|
|
|
|
|
|
|
if ($entering && !$error) {
|
607
|
|
|
|
|
|
|
push @output, $dictionary{$fields[$index]}->put($datum);
|
608
|
|
|
|
|
|
|
} #If entering data
|
609
|
|
|
|
|
|
|
|
610
|
|
|
|
|
|
|
$index++;
|
611
|
|
|
|
|
|
|
} # foreach $datum in @data
|
612
|
|
|
|
|
|
|
|
613
|
|
|
|
|
|
|
$line++; #Increment the line counter for error reporting, note that lines beginning with the comment character will be included
|
614
|
|
|
|
|
|
|
} #End of while (<>)
|
615
|
|
|
|
|
|
|
|
616
|
|
|
|
|
|
|
|
617
|
|
|
|
|
|
|
print join("\n",sort(keys(%errors)))."\n"; # Produces a list of rejected values
|
618
|
|
|
|
|
|
|
|
619
|
|
|
|
|
|
|
exit(1);
|
620
|
|
|
|
|
|
|
|
621
|
|
|
|
|
|
|
#
|
622
|
|
|
|
|
|
|
# Read_header First line in data files must contain a list of field names.
|
623
|
|
|
|
|
|
|
#
|
624
|
|
|
|
|
|
|
sub read_header {
|
625
|
|
|
|
|
|
|
defined(my $header = <>) #First line in STDIN
|
626
|
|
|
|
|
|
|
or die("Error accessing STDIN - $!\n");
|
627
|
|
|
|
|
|
|
|
628
|
|
|
|
|
|
|
$csv->parse($header)
|
629
|
|
|
|
|
|
|
or die("Error parsing the header of the input file - $!\n");
|
630
|
|
|
|
|
|
|
|
631
|
|
|
|
|
|
|
my @fields = $csv->fields()
|
632
|
|
|
|
|
|
|
or die("Error retrieving contents of parsed header - Should never happen - $!\n");
|
633
|
|
|
|
|
|
|
|
634
|
|
|
|
|
|
|
foreach my $field (@fields) {
|
635
|
|
|
|
|
|
|
$field = lc($field);
|
636
|
|
|
|
|
|
|
}
|
637
|
|
|
|
|
|
|
|
638
|
|
|
|
|
|
|
return \@fields;
|
639
|
|
|
|
|
|
|
} #End of subroutine read_header
|
640
|
|
|
|
|
|
|
|
641
|
|
|
|
|
|
|
#
|
642
|
|
|
|
|
|
|
# B< Births_setup >
|
643
|
|
|
|
|
|
|
#
|
644
|
|
|
|
|
|
|
sub Births_setup {
|
645
|
|
|
|
|
|
|
|
646
|
|
|
|
|
|
|
my @variables = ('AGE_MAT','AGE_MAT_OBS','HOSP_NO','YEAR_RECORD','CASE_NO','INST_NO',
|
647
|
|
|
|
|
|
|
'DAY_BIRTH','MONTH_BIRTH','YEAR_BIRTH','YEAR_BIRTH_OBS',
|
648
|
|
|
|
|
|
|
[snip]
|
649
|
|
|
|
|
|
|
'ENT_NO','CO_REG','REGSTAMP','AGE_MARRIAGE','DURATION_MARRIAGE','ADJ_PREV_LIVE_BTHS');
|
650
|
|
|
|
|
|
|
|
651
|
|
|
|
|
|
|
my %dictionary;
|
652
|
|
|
|
|
|
|
|
653
|
|
|
|
|
|
|
#Write the boring bits of dictionary
|
654
|
|
|
|
|
|
|
foreach my $variable (@variables) {
|
655
|
|
|
|
|
|
|
my $code = 'my $'.lc($variable).'= Data::Validator::Item->new();' ;
|
656
|
|
|
|
|
|
|
$code .= '$dictionary{'.lc($variable).'} = $'.lc($variable).';';
|
657
|
|
|
|
|
|
|
$code .= '$'.lc($variable).'->name("'.lc($variable).'");';
|
658
|
|
|
|
|
|
|
$code .= '$'.lc($variable).'->missing(\'\');';
|
659
|
|
|
|
|
|
|
eval($code);
|
660
|
|
|
|
|
|
|
print "\$@ was $@\n" if $@;
|
661
|
|
|
|
|
|
|
}
|
662
|
|
|
|
|
|
|
|
663
|
|
|
|
|
|
|
#Each entry in the dictionary looks like this -
|
664
|
|
|
|
|
|
|
# my $age_mat=Data::Validator::Item->new(); #Set up the Data::Validator::Item called age_mat
|
665
|
|
|
|
|
|
|
# $dictionary{age_mat}=$age_mat; #Add it to the $dictionary hash
|
666
|
|
|
|
|
|
|
# $age_mat->name('age_mat'); #Set the name attribute of the $age_mat
|
667
|
|
|
|
|
|
|
# $age_mat->missing(''); #Set the missing attribute of the $age_mat
|
668
|
|
|
|
|
|
|
#
|
669
|
|
|
|
|
|
|
|
670
|
|
|
|
|
|
|
#Subroutines used for verification/transformation
|
671
|
|
|
|
|
|
|
my $sex_coderef = sub{
|
672
|
|
|
|
|
|
|
my $datum = shift;
|
673
|
|
|
|
|
|
|
my %transform = (
|
674
|
|
|
|
|
|
|
1 => 'M',
|
675
|
|
|
|
|
|
|
2 => 'F',
|
676
|
|
|
|
|
|
|
3 => 'U',
|
677
|
|
|
|
|
|
|
);
|
678
|
|
|
|
|
|
|
return $transform{$datum}
|
679
|
|
|
|
|
|
|
};
|
680
|
|
|
|
|
|
|
|
681
|
|
|
|
|
|
|
my $day_coderef = sub {
|
682
|
|
|
|
|
|
|
my $datum = shift;
|
683
|
|
|
|
|
|
|
if ($datum =~ /0+-$/){return 1};
|
684
|
|
|
|
|
|
|
if ($datum > 00 || $datum < 32) {return 1;}
|
685
|
|
|
|
|
|
|
return 0;
|
686
|
|
|
|
|
|
|
};
|
687
|
|
|
|
|
|
|
|
688
|
|
|
|
|
|
|
my $month_coderef = sub {
|
689
|
|
|
|
|
|
|
my $datum = shift;
|
690
|
|
|
|
|
|
|
if ($datum =~ /0+-$/){return 1};
|
691
|
|
|
|
|
|
|
if ($datum > 00 || $datum < 12) {return 1;}
|
692
|
|
|
|
|
|
|
return 0;
|
693
|
|
|
|
|
|
|
};
|
694
|
|
|
|
|
|
|
|
695
|
|
|
|
|
|
|
#
|
696
|
|
|
|
|
|
|
# This is where the specific rules are set for each variable
|
697
|
|
|
|
|
|
|
# This lot shoudl give you a fair idea of how this module can be used
|
698
|
|
|
|
|
|
|
#
|
699
|
|
|
|
|
|
|
|
700
|
|
|
|
|
|
|
# AGE_MAT
|
701
|
|
|
|
|
|
|
$dictionary{age_mat}->missing('99');
|
702
|
|
|
|
|
|
|
$dictionary{age_mat}->min(13);
|
703
|
|
|
|
|
|
|
$dictionary{age_mat}->max(52);
|
704
|
|
|
|
|
|
|
# AGE_MAT_OBS
|
705
|
|
|
|
|
|
|
$dictionary{age_mat_obs}->missing('99');
|
706
|
|
|
|
|
|
|
$dictionary{age_mat_obs}->min(13);
|
707
|
|
|
|
|
|
|
$dictionary{age_mat_obs}->max(52);
|
708
|
|
|
|
|
|
|
# YEAR_RECORD
|
709
|
|
|
|
|
|
|
$dictionary{year_record}->values([$year]);
|
710
|
|
|
|
|
|
|
# SEX
|
711
|
|
|
|
|
|
|
$dictionary{sex}->values([1,2,3]);
|
712
|
|
|
|
|
|
|
$dictionary{sex}->transform($sex_coderef);
|
713
|
|
|
|
|
|
|
# WEIGHT
|
714
|
|
|
|
|
|
|
$dictionary{weight}->missing('9999');
|
715
|
|
|
|
|
|
|
$dictionary{weight}->min(200);
|
716
|
|
|
|
|
|
|
$dictionary{weight}->max(6500);
|
717
|
|
|
|
|
|
|
[snip]
|
718
|
|
|
|
|
|
|
# PD_GEST
|
719
|
|
|
|
|
|
|
$dictionary{pd_gest}->missing('99');
|
720
|
|
|
|
|
|
|
$dictionary{pd_gest}->min(16);
|
721
|
|
|
|
|
|
|
$dictionary{pd_gest}->max(46);
|
722
|
|
|
|
|
|
|
[snip]
|
723
|
|
|
|
|
|
|
# DAY_BIRTH_MOTHER
|
724
|
|
|
|
|
|
|
#$dictionary{day_birth_mother}->();
|
725
|
|
|
|
|
|
|
$dictionary{day_birth_mother}->missing('99');
|
726
|
|
|
|
|
|
|
$dictionary{day_birth_mother}->verify($day_coderef);
|
727
|
|
|
|
|
|
|
# MONTH_BIRTH_MOTHER
|
728
|
|
|
|
|
|
|
$dictionary{month_birth_mother}->missing('99');
|
729
|
|
|
|
|
|
|
$dictionary{month_birth_mother}->verify($month_coderef);
|
730
|
|
|
|
|
|
|
# YEAR_BIRTH_MOTHER
|
731
|
|
|
|
|
|
|
$dictionary{year_birth_mother}->missing('9999');
|
732
|
|
|
|
|
|
|
$dictionary{year_birth_mother}->min($min_year);
|
733
|
|
|
|
|
|
|
$dictionary{year_birth_mother}->max($max_year);
|
734
|
|
|
|
|
|
|
[snip]
|
735
|
|
|
|
|
|
|
|
736
|
|
|
|
|
|
|
return \%dictionary; # this hash is the objective of this whole subroutine
|
737
|
|
|
|
|
|
|
}# End of Births_setup
|
738
|
|
|
|
|
|
|
|
739
|
|
|
|
|
|
|
=cut
|