File Coverage

blib/lib/Date/Parse/Lite.pm
Criterion Covered Total %
statement 124 124 100.0
branch 54 58 93.1
condition 24 34 70.5
subroutine 28 28 100.0
pod 10 10 100.0
total 240 254 94.4


line stmt bran cond sub pod time code
1             package Date::Parse::Lite;
2              
3 2     2   46735 use 5.008_001;
  2         8  
4 2     2   12 use strict;
  2         4  
  2         54  
5 2     2   10 use warnings FATAL => 'all';
  2         12  
  2         82  
6 2     2   10 use Carp;
  2         3  
  2         3908  
7              
8             =head1 NAME
9              
10             Date::Parse::Lite - Light weight parsing of human-entered date strings
11              
12             =head1 VERSION
13              
14             Version 0.03
15              
16             =cut
17              
18             our $VERSION = '0.03';
19              
20              
21             =head1 SYNOPSIS
22              
23             Parse human-entered strings that are intended to contain dates and attempt
24             to extract machine-readable date information from them while being as
25             generous as possible with format interpretation.
26              
27             use Date::Parse::Lite;
28              
29             my $parser = Date::Parse::Lite->new();
30             $parser->parse('June 1st 17');
31              
32             if($parser->parsed()) {
33             my $day = $parser->day();
34              
35             ...
36              
37             =head1 DESCRIPTION
38              
39             This simple module attempts to parse a day, a month and a year from a string
40             on the assumption that the string is intended to represent a date. Note that
41             it does B validate the date except to the extent that doing so informs the
42             parsing, e.g. numbers in the range 13 to 31 will be parsed as days rather than
43             months but 31 will still be parsed as a day even when the month is 2. The
44             responsibility for validating the results and/or representing them in a more
45             useful form remains with the caller, if it is interested in doing so.
46              
47             The parser will extract dates from a wide range of inputs, including a lot
48             which would not look like dates to a human reader. The intention is to
49             maximise the likelihood that a date entered by a human being will be accepted
50             as such without the need to place difficult restrictions on what may be
51             entered. To add to the flexibility there are some configuration options. These
52             are described with their corresponding methods, below.
53              
54             The API is entirely object oriented - you must instantiate an object, which
55             will encapsulate the configuration and the strings to be parsed, and then
56             query that object to get the results.
57              
58             =head1 DATE FORMATS
59              
60             The parser is very forgiving about date formats - anything that's not a string
61             of digits or letters is essentially treated as a separator and then the remaining
62             numbers and words are understood as a day, month and year with words describing
63             months taken as priority over numbers. Any trailing text is ignored and any
64             amount of non-alphanumeric text may surround and separate the recognised parts.
65             While this means that a wide range of formats are accepted it does also mean
66             that the fact that this parser was able to extract date information from a string
67             does not guarantee that the string would have looked like a date to a human
68             observer. The parser is founded on the assumption that the string to be parsed
69             is intended to be a date.
70              
71             There is a single special case: a string of 8 digits, with or without leading
72             and/or trailing whitespace, is treated as YYYYMMDD.
73              
74             =head1 METHODS
75              
76             =head2 new([param => $value [, ...]])
77              
78             Create a new parser object. You may pass in a hash to initialise the object
79             with the following keys:
80              
81             =over
82              
83             =item prefer_month_first_order
84              
85             =item literal_years_below_100
86              
87             =item month_names
88              
89             Initialise the configuration set by the methods of the same names -
90             see below.
91              
92             =item date
93              
94             A string to be parsed - this will be passed to the C method.
95             Note that this is optional; you can just call C later (and
96             repeatedly) if you wish.
97              
98             =back
99              
100             =cut
101              
102             sub new {
103 2     2 1 196 my $invocant = shift;
104 2         8 my %params = @_;
105              
106 2   33     14 my $self = bless {}, ref $invocant || $invocant;
107 2         8 $self->prefer_month_first_order(1);
108 2         7 $self->_set_default_month_names;
109 2         6 $self->_reset;
110 2         6 foreach my $initialiser (qw{prefer_month_first_order literal_years_below_100 month_names date}) {
111 8 100       35 $self->$initialiser($params{$initialiser}) if exists $params{$initialiser};
112             }
113 2         8 return $self;
114             }
115              
116             sub _reset {
117 2388     2388   3415 my $self = shift;
118              
119 2388         8594 delete @$self{qw{day month year parsed _possible_month_or_day}};
120             }
121              
122             sub _set_default_month_names {
123 2     2   4 my $self = shift;
124              
125             $self->{_month_names} = [
126 2         11 january => 1,
127             february => 2,
128             march => 3,
129             april => 4,
130             may => 5,
131             june => 6,
132             july => 7,
133             august => 8,
134             september => 9,
135             october => 10,
136             november => 11,
137             december => 12,
138             ];
139             }
140              
141             =head2 day()
142              
143             Returns the day parsed from the date string, if any. This will be a
144             number in the range 1 to 31 if the parse was succesful.
145              
146             =head2 month()
147              
148             Returns the month parsed from the date string, if any. This will be
149             a number in the range 1 to 12 if the parse was succesful.
150              
151             =head2 year()
152              
153             Returns the year parsed from the date string, if any. This will be
154             a number if the parse was succesful.
155              
156             =head2 parsed()
157              
158             Reaturns a flag indicating whether a date has been successfully parsed
159             from a string.
160              
161             =cut
162              
163 14262     14262 1 28833 sub day { return shift->_access('day'); }
164 14589     14589 1 30988 sub month { return shift->_access('month'); }
165 9507     9507 1 19425 sub year { return shift->_access('year'); }
166 7157     7157 1 14050 sub parsed { return shift->_access('parsed'); }
167             sub _access {
168 45515     45515   61829 my $self = shift;
169 45515         71540 my($attr) = @_;
170              
171 45515         184320 return $self->{$attr};
172             }
173              
174             =head2 prefer_month_first_order([$flag])
175              
176             Returns a flag indicating how day-month order ambiguity will be resolved,
177             e.g. in a date like C<1/2/2015>. Defaults to true so that American dates are
178             parsed as expected. You may optionally pass a value to set the flag.
179              
180             =head2 literal_years_below_100([$flag])
181              
182             Returns a flag indicating whether years below 100 will be interpreted literally
183             (i.e. as being in the first century). If this is not set then such years
184             will be intepreted as being the one nearest the system date that suits,
185             e.g. in 2015 the year C<15> is interpreted as 2015, C<50> as 2050 and
186             C<90> as 1990. Defaults to false. You may optionally pass a value to set
187             the flag.
188              
189             =cut
190              
191 1737     1737 1 515001 sub prefer_month_first_order { return shift->_mutate_bool('prefer_month_first_order', @_); }
192 42     42 1 6424 sub literal_years_below_100 { return shift->_mutate_bool('literal_years_below_100', @_); }
193             sub _mutate_bool {
194 1779     1779   3136 my($self, $attr) = (shift, shift);
195              
196 1779 100       5651 $self->{$attr} = ! ! $_[0] if @_;
197 1779 100       8205 return exists $self->{$attr} ? $self->{$attr} : '';
198             }
199              
200             =head2 parse($string)
201              
202             Parse a string and attempt to extract a date. Returns a success flag
203             (see the C method). You can call this as many times as you like if
204             you need to parse multiple strings. The results available from the methods
205             described above will always be for the most recently parsed date string.
206              
207             =cut
208              
209 1     1 1 4 sub date { &parse; }
210             sub parse {
211 2386     2386 1 559013 my $self = shift;
212 2386         3888 my($string) = @_;
213              
214 2386         4982 $self->_reset;
215 2386         4628 my $tokens = _extract_tokens($string);
216 2386 100       9838 $self->_parse_tokens($tokens) if @$tokens >= 3;
217              
218 2386 100       4877 delete @$self{qw{day month year}} unless $self->parsed;
219              
220 2386         4521 return $self->parsed;
221             }
222              
223             =head2 month_names($name => $number [, ...])
224              
225             Add new names to be recognised as months, typically for internationalisation. You
226             may pass an array with an even number of elements or a reference to the same. Month
227             names are matched by comparing the number of characters found in the
228             parsed string with the same number of characters at the start of the names
229             provided through this method. Thus abreviations are understood as long as they
230             are intial sections of the provided names. Other abbreviations must be specified
231             separately - you may pass as many names with the same month number as you
232             wish. Comparisons are case-insensitive.
233              
234             Multiple calls to this method will add to the list of names - to reset the list
235             you must create a new object but note that all objects include the twelve
236             common English month names. This means that you won't have much luck with
237             languages that have the same names, or abbreviations of them, for different
238             months. I don't know of any such though.
239              
240             =cut
241              
242             sub month_names {
243 3     3 1 300 my $self = shift;
244 3 100       16 my @params = ref $_[0] ? @{$_[0]} : @_;
  1         4  
245              
246 3         11 while(@params > 1) {
247 7         16 my($month_name, $month_number) = splice @params, 0, 2;
248 7   50     16 $month_name ||= '';
249 7 50       19 croak "Month name '$month_name' should be more than two characters" unless length $month_name > 2;
250 7   50     17 $month_number ||= 0;
251 7 50 33     35 croak "Invalid month number '$month_number'" unless $month_number >= 1 && $month_number <= 12;
252 7         9 push @{$self->{_month_names}}, lc $month_name, $month_number;
  7         37  
253             }
254             }
255              
256             sub _extract_tokens {
257 2386     2386   3986 my($string) = @_;
258              
259 2386 50       5780 $string = '' unless defined $string;
260 2386 100       13893 return [$1, $2, $3] if $string =~ m{^\s*(\d\d\d\d)(\d\d)(\d\d)\s*$};
261 1232         9502 return [ $string =~ m{[[:alpha:]]+|\d+}ig ];
262             }
263              
264             sub _parse_tokens {
265 2384     2384   3482 my $self = shift;
266 2384         3522 my($tokens) = @_;
267              
268 2384         4729 foreach my $token (@$tokens) {
269 7162 100       15416 return unless $self->_process_token($token);
270 7158 100 100     14980 if($self->day && $self->month && defined $self->year) {
      100        
271 2373         4017 $self->{parsed} = 1;
272 2373         5402 last;
273             }
274             }
275             }
276              
277             sub _process_token {
278 7162     7162   9316 my $self = shift;
279 7162         11692 my($token) = @_;
280              
281 7162 100       22507 if($token =~ m{^\d+$}) {
282 7103         15208 return $self->_process_numeric_token($token);
283             }
284             else {
285 59         123 return $self->_process_word_token($token);
286             }
287             }
288              
289             sub _process_numeric_token {
290 7103     7103   9953 my $self = shift;
291 7103         11017 my($token) = @_;
292              
293 7103 100 66     50340 if($token > 31 || $token == 0 || length $token > 2 || ($self->month && $self->day)) {
      66        
      100        
      33        
294 2384 100       4557 return 0 if defined $self->year;
295 2382         6745 $self->_set_year($token + 0);
296 2382         8653 return 1;
297             }
298             else {
299 4719         11666 return $self->_process_month_or_day_token($token + 0);
300             }
301             }
302              
303             sub _set_year {
304 2382     2382   3320 my $self = shift;
305 2382         3269 my($year) = @_;
306              
307 2382 100 100     6397 if($year < 100 && ! $self->literal_years_below_100) {
308 13         322 my $this_year = 1900 + (localtime)[5];
309 13         28 $year += $this_year - $this_year % 100;
310 13 100       42 $year -= 100 if $year > $this_year + 50;
311             }
312 2382         5310 $self->{year} = $year;
313             }
314              
315             sub _process_month_or_day_token {
316 4719     4719   6705 my $self = shift;
317 4719         6686 my($token) = @_;
318              
319 4719 100       10615 if($token > 12) {
320 1201 100       2329 return 0 if $self->day;
321 1199         2211 $self->{day} = $token;
322 1199 100       3727 $self->{month} = $self->{_possible_month_or_day} if exists $self->{_possible_month_or_day};
323             }
324             else {
325 3518         7104 $self->_store_month_or_day($token);
326             }
327 4717         18363 return 1;
328             }
329              
330             sub _store_month_or_day {
331 3518     3518   4519 my $self = shift;
332 3518         4629 my($token) = @_;
333              
334 3518 100       6683 if($self->month) {
    100          
335 6         16 $self->{day} = $token;
336             }
337             elsif($self->day) {
338 306         744 $self->{month} = $token;
339             }
340             else {
341 3206         6593 $self->_check_uncertain_month_or_day($token);
342             }
343             }
344              
345             sub _check_uncertain_month_or_day {
346 3206     3206   4419 my $self = shift;
347 3206         4368 my($token) = @_;
348              
349 3206 100       7207 if(exists $self->{_possible_month_or_day}) {
350 1160         1835 my $day = $self->{_possible_month_or_day};
351 1160         1619 my $month = $token;
352 1160 100 100     2295 ($day, $month) = ($month, $day) if defined $self->year || $self->prefer_month_first_order;
353 1160         2335 $self->{day} = $day;
354 1160         3494 $self->{month} = $month;
355             }
356             else {
357 2046         6025 $self->{_possible_month_or_day} = $token;
358             }
359             }
360              
361             sub _process_word_token {
362 59     59   76 my $self = shift;
363 59         79 my($token) = @_;
364              
365 59         126 my $check_month = $self->_month_from_name($token);
366 59 100       169 return 1 unless $check_month;
367 40 50       82 return 0 if $self->month;
368 40         70 $self->{month} = $check_month;
369 40 100       102 $self->{day} = $self->{_possible_month_or_day} if exists $self->{_possible_month_or_day};
370 40         133 return 1;
371             }
372              
373             sub _month_from_name {
374 59     59   80 my $self = shift;
375 59         79 my($name) = @_;
376              
377 59 100       148 return unless length $name > 2;
378 53         122 $name = lc $name;
379 53         85 for(my $index = 0; $index < $#{$self->{_month_names}}; $index += 2) {
  597         1537  
380 584 100       1780 return $self->{_month_names}->[$index + 1] if substr($self->{_month_names}->[$index], 0, length $name) eq $name;
381             }
382 13         31 return;
383             }
384              
385             =head1 AUTHOR
386              
387             Merlyn Kline, C<< >>
388              
389             =head1 BUGS
390              
391             Please report any bugs or feature requests to C, or through
392             the web interface at L. I will be notified, and then you'll
393             automatically be notified of progress on your bug as I make changes.
394              
395              
396              
397              
398             =head1 SUPPORT
399              
400             You can find documentation for this module with the perldoc command.
401              
402             perldoc Date::Parse::Lite
403              
404              
405             You can also look for information at:
406              
407             =over 4
408              
409             =item * RT: CPAN's request tracker (report bugs here)
410              
411             L
412              
413             =item * AnnoCPAN: Annotated CPAN documentation
414              
415             L
416              
417             =item * CPAN Ratings
418              
419             L
420              
421             =item * Search CPAN
422              
423             L
424              
425             =item * Source code on GitHub
426              
427             L
428              
429             =back
430              
431             =head1 LICENSE AND COPYRIGHT
432              
433             Copyright 2015 Merlyn Kline.
434              
435             This program is free software; you can redistribute it and/or modify it
436             under the terms of the the Artistic License (2.0). You may obtain a
437             copy of the full license at:
438              
439             L
440              
441             Any use, modification, and distribution of the Standard or Modified
442             Versions is governed by this Artistic License. By using, modifying or
443             distributing the Package, you accept this license. Do not use, modify,
444             or distribute the Package, if you do not accept this license.
445              
446             If your Modified Version has been derived from a Modified Version made
447             by someone other than you, you are nevertheless required to ensure that
448             your Modified Version complies with the requirements of this license.
449              
450             This license does not grant you the right to use any trademark, service
451             mark, tradename, or logo of the Copyright Holder.
452              
453             This license includes the non-exclusive, worldwide, free-of-charge
454             patent license to make, have made, use, offer to sell, sell, import and
455             otherwise transfer the Package with respect to any patent claims
456             licensable by the Copyright Holder that are necessarily infringed by the
457             Package. If you institute patent litigation (including a cross-claim or
458             counterclaim) against any party alleging that the Package constitutes
459             direct or contributory patent infringement, then this Artistic License
460             to you shall terminate on the date that such litigation is filed.
461              
462             Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
463             AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
464             THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
465             PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
466             YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
467             CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
468             CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
469             EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
470              
471              
472             =cut
473              
474             1; # End of Date::Parse::Lite