File Coverage

blib/lib/Date/Parse/Lite.pm
Criterion Covered Total %
statement 124 124 100.0
branch 54 58 93.1
condition 24 34 70.5
subroutine 28 28 100.0
pod 10 10 100.0
total 240 254 94.4


line stmt bran cond sub pod time code
1             package Date::Parse::Lite;
2              
3 2     2   64233 use 5.006;
  2         8  
4 2     2   11 use strict;
  2         4  
  2         48  
5 2     2   9 use warnings FATAL => 'all';
  2         15  
  2         72  
6 2     2   10 use Carp;
  2         4  
  2         3475  
7              
8             =head1 NAME
9              
10             Date::Parse::Lite - Light weight parsing of human-entered date strings
11              
12             =head1 VERSION
13              
14             Version 0.02
15              
16             =cut
17              
18             our $VERSION = '0.02';
19              
20              
21             =head1 SYNOPSIS
22              
23             Parse human-entered strings that are intended to contain dates and attempt
24             to extract machine-readable date information from them while being as
25             generous as possible with format interpretation.
26              
27             use Date::Parse::Lite;
28              
29             my $parser = Date::Parse::Lite->new();
30             $parser->parse('June 1st 17');
31              
32             if($parser->parsed()) {
33             my $day = $parser->day();
34              
35             ...
36              
37             =head1 DESCRIPTION
38              
39             This simple module attempts to parse a day, a month and a year from a string
40             on the assumption that the string is intended to represent a date. Note that
41             it does B validate the date except to the extent that doing so informs the
42             parsing, e.g. numbers in the range 13 to 31 will be parsed as days rather than
43             months but 31 will still be parsed as a day even when the month is 2. The
44             responsibility for validating the results and/or representing them in a more
45             useful form remains with the caller, if it is interested in doing so.
46              
47             The parser will extract dates from a wide range of inputs, including a lot
48             which would not look like dates to a human reader. The intention is to
49             maximise the likelihood that a date entered by a human being will be accepted
50             as such without the need to place difficult restrictions on what may be
51             entered. To add to the flexibility there are some configuration options. These
52             are described with their corresponding methods, below.
53              
54             The API is entirely object oriented - you must instantiate an object, which
55             will encapsulate the configuration and the strings to be parsed, and then
56             query that object to get the results.
57              
58             =head1 DATE FORMATS
59              
60             The parser is very forgiving about date formats - anything that's not a string
61             of digits or letters is essentially treated as a separator and then the remaining
62             numbers and words are understood as a day, month and year with words describing
63             months taken as priority over numbers. Any trailing text is ignored and any
64             amount of non-alphanumeric text may surround and separate the recognised parts.
65             While this means that a wide range of formats are accepted it does also mean
66             that the fact that this parser was able to extract date information from a string
67             does not guarantee that the string would have looked like a date to a human
68             observer. The parser is founded on the assumption that the string to be parsed
69             is intended to be a date.
70              
71             There is a single special case: a string of 8 digits, with or without leading
72             and/or trailing whitespace, is treated as YYYYMMDD.
73              
74             =head1 METHODS
75              
76             =head2 new([param => $value [, ...]])
77              
78             Create a new parser object. You may pass in a hash to initialise the object
79             with the following keys:
80              
81             =over
82              
83             =item prefer_month_first_order
84              
85             =item literal_years_below_100
86              
87             =item month_names
88              
89             Initialise the configuration set by the methods of the same names -
90             see below.
91              
92             =item date
93              
94             A string to be parsed - this will be passed to the C method.
95             Note that this is optional; you can just call C later (and
96             repeatedly) if you wish.
97              
98             =back
99              
100             =cut
101              
102             sub new {
103 2     2 1 197 my $invocant = shift;
104 2         7 my %params = @_;
105              
106 2   33     14 my $self = bless {}, ref $invocant || $invocant;
107 2         7 $self->prefer_month_first_order(1);
108 2         7 $self->_set_default_month_names;
109 2         6 $self->_reset;
110 2         5 foreach my $initialiser (qw{prefer_month_first_order literal_years_below_100 month_names date}) {
111 8 100       36 $self->$initialiser($params{$initialiser}) if exists $params{$initialiser};
112             }
113 2         8 return $self;
114             }
115              
116             sub _reset {
117 2388     2388   3143 my $self = shift;
118              
119 2388         8576 delete @$self{qw{day month year parsed _possible_month_or_day}};
120             }
121              
122             sub _set_default_month_names {
123 2     2   4 my $self = shift;
124              
125             $self->{_month_names} = [
126 2         13 january => 1,
127             february => 2,
128             march => 3,
129             april => 4,
130             may => 5,
131             june => 6,
132             july => 7,
133             august => 8,
134             september => 9,
135             october => 10,
136             november => 11,
137             december => 12,
138             ];
139             }
140              
141             =head2 day()
142              
143             Returns the day parsed from the date string, if any. This will be a
144             number in the range 1 to 31 if the parse was succesful.
145              
146             =head2 month()
147              
148             Returns the month parsed from the date string, if any. This will be
149             a number in the range 1 to 12 if the parse was succesful.
150              
151             =head2 year()
152              
153             Returns the year parsed from the date string, if any. This will be
154             a number if the parse was succesful.
155              
156             =head2 parsed()
157              
158             Reaturns a flag indicating whether a date has been successfully parsed
159             from a string.
160              
161             =cut
162              
163 14262     14262 1 29920 sub day { return shift->_access('day'); }
164 14589     14589 1 31558 sub month { return shift->_access('month'); }
165 9507     9507 1 19099 sub year { return shift->_access('year'); }
166 7157     7157 1 13967 sub parsed { return shift->_access('parsed'); }
167             sub _access {
168 45515     45515   63064 my $self = shift;
169 45515         70525 my($attr) = @_;
170              
171 45515         182541 return $self->{$attr};
172             }
173              
174             =head2 prefer_month_first_order([$flag])
175              
176             Returns a flag indicating how day-month order ambiguity will be resolved,
177             e.g. in a date like C<1/2/2015>. Defaults to true so that American dates are
178             parsed as expected. You may optionally pass a value to set the flag.
179              
180             =head2 literal_years_below_100([$flag])
181              
182             Returns a flag indicating whether years below 100 will be interpreted literally
183             (i.e. as being in the first century). If this is not set then such years
184             will be intepreted as being the one nearest the system date that suits,
185             e.g. in 2015 the year C<15> is interpreted as 2015, C<50> as 2050 and
186             C<90> as 1990. Defaults to false. You may optionally pass a value to set
187             the flag.
188              
189             =cut
190              
191 1737     1737 1 510450 sub prefer_month_first_order { return shift->_mutate_bool('prefer_month_first_order', @_); }
192 42     42 1 6201 sub literal_years_below_100 { return shift->_mutate_bool('literal_years_below_100', @_); }
193             sub _mutate_bool {
194 1779     1779   3387 my($self, $attr) = (shift, shift);
195              
196 1779 100       5840 $self->{$attr} = ! ! $_[0] if @_;
197 1779 100       8545 return exists $self->{$attr} ? $self->{$attr} : '';
198             }
199              
200             =head2 parse($string)
201              
202             Parse a string and attempt to extract a date. Returns a success flag
203             (see the C method). You can call this as many times as you like if
204             you need to parse multiple strings. The results available from the methods
205             described above will always be for the most recently parsed date string.
206              
207             =cut
208              
209 1     1 1 3 sub date { &parse; }
210             sub parse {
211 2386     2386 1 569744 my $self = shift;
212 2386         3855 my($string) = @_;
213              
214 2386         5184 $self->_reset;
215 2386         4639 my $tokens = _extract_tokens($string);
216 2386 100       10057 $self->_parse_tokens($tokens) if @$tokens >= 3;
217              
218 2386 100       5242 delete @$self{qw{day month year}} unless $self->parsed;
219              
220 2386         4865 return $self->parsed;
221             }
222              
223             =head2 month_names($name => $number [, ...])
224              
225             Add new names to be recognised as months, typically for internationalisation. You
226             may pass an array with an even number of elements or a reference to the same. Month
227             names are matched by comparing the number of characters found in the
228             parsed string with the same number of characters at the start of the names
229             provided through this method. Thus abreviations are understood as long as they
230             are intial sections of the provided names. Other abbreviations must be specified
231             separately - you may pass as many names with the same month number as you
232             wish. Comparisons are case-insensitive.
233              
234             Multiple calls to this method will add to the list of names - to reset the list
235             you must create a new object but note that all objects include the twelve
236             common English month names. This means that you won't have much luck with
237             languages that have the same names, or abbreviations of them, for different
238             months. I don't know of any such though.
239              
240             =cut
241              
242             sub month_names {
243 3     3 1 298 my $self = shift;
244 3 100       13 my @params = ref $_[0] ? @{$_[0]} : @_;
  1         3  
245              
246 3         11 while(@params > 1) {
247 7         16 my($month_name, $month_number) = splice @params, 0, 2;
248 7   50     17 $month_name ||= '';
249 7 50       18 croak "Month name '$month_name' should be more than two characters" unless length $month_name > 2;
250 7   50     16 $month_number ||= 0;
251 7 50 33     36 croak "Invalid month number '$month_number'" unless $month_number >= 1 && $month_number <= 12;
252 7         8 push @{$self->{_month_names}}, lc $month_name, $month_number;
  7         39  
253             }
254             }
255              
256             sub _extract_tokens {
257 2386     2386   3374 my($string) = @_;
258              
259 2386 50       6069 $string = '' unless defined $string;
260 2386 100       14041 return [$1, $2, $3] if $string =~ m{^\s*(\d\d\d\d)(\d\d)(\d\d)\s*$};
261 1232         9508 return [ $string =~ m{[[:alpha:]]+|\d+}ig ];
262             }
263              
264             sub _parse_tokens {
265 2384     2384   3238 my $self = shift;
266 2384         3505 my($tokens) = @_;
267              
268 2384         4858 foreach my $token (@$tokens) {
269 7162 100       14765 return unless $self->_process_token($token);
270 7158 100 100     15638 if($self->day && $self->month && defined $self->year) {
      100        
271 2373         4231 $self->{parsed} = 1;
272 2373         5224 last;
273             }
274             }
275             }
276              
277             sub _process_token {
278 7162     7162   9686 my $self = shift;
279 7162         12578 my($token) = @_;
280              
281 7162 100       23978 if($token =~ m{^\d+$}) {
282 7103         14864 return $self->_process_numeric_token($token);
283             }
284             else {
285 59         130 return $self->_process_word_token($token);
286             }
287             }
288              
289             sub _process_numeric_token {
290 7103     7103   9577 my $self = shift;
291 7103         10106 my($token) = @_;
292              
293 7103 100 66     47679 if($token > 31 || $token == 0 || length $token > 2 || ($self->month && $self->day)) {
      66        
      100        
      33        
294 2384 100       5192 return 0 if defined $self->year;
295 2382         5812 $self->_set_year($token + 0);
296 2382         9201 return 1;
297             }
298             else {
299 4719         11044 return $self->_process_month_or_day_token($token + 0);
300             }
301             }
302              
303             sub _set_year {
304 2382     2382   3203 my $self = shift;
305 2382         3620 my($year) = @_;
306              
307 2382 100 100     6678 if($year < 100 && ! $self->literal_years_below_100) {
308 13         323 my $this_year = 1900 + (localtime)[5];
309 13         30 $year += $this_year - $this_year % 100;
310 13 100       46 $year -= 100 if $year > $this_year + 50;
311             }
312 2382         5468 $self->{year} = $year;
313             }
314              
315             sub _process_month_or_day_token {
316 4719     4719   6181 my $self = shift;
317 4719         6540 my($token) = @_;
318              
319 4719 100       9537 if($token > 12) {
320 1201 100       2369 return 0 if $self->day;
321 1199         2180 $self->{day} = $token;
322 1199 100       3628 $self->{month} = $self->{_possible_month_or_day} if exists $self->{_possible_month_or_day};
323             }
324             else {
325 3518         7194 $self->_store_month_or_day($token);
326             }
327 4717         19080 return 1;
328             }
329              
330             sub _store_month_or_day {
331 3518     3518   4740 my $self = shift;
332 3518         4746 my($token) = @_;
333              
334 3518 100       6565 if($self->month) {
    100          
335 6         15 $self->{day} = $token;
336             }
337             elsif($self->day) {
338 306         780 $self->{month} = $token;
339             }
340             else {
341 3206         6689 $self->_check_uncertain_month_or_day($token);
342             }
343             }
344              
345             sub _check_uncertain_month_or_day {
346 3206     3206   4454 my $self = shift;
347 3206         4334 my($token) = @_;
348              
349 3206 100       7188 if(exists $self->{_possible_month_or_day}) {
350 1160         1901 my $day = $self->{_possible_month_or_day};
351 1160         1762 my $month = $token;
352 1160 100 100     2312 ($day, $month) = ($month, $day) if defined $self->year || $self->prefer_month_first_order;
353 1160         2385 $self->{day} = $day;
354 1160         3495 $self->{month} = $month;
355             }
356             else {
357 2046         6263 $self->{_possible_month_or_day} = $token;
358             }
359             }
360              
361             sub _process_word_token {
362 59     59   86 my $self = shift;
363 59         81 my($token) = @_;
364              
365 59         117 my $check_month = $self->_month_from_name($token);
366 59 100       179 return 1 unless $check_month;
367 40 50       78 return 0 if $self->month;
368 40         74 $self->{month} = $check_month;
369 40 100       137 $self->{day} = $self->{_possible_month_or_day} if exists $self->{_possible_month_or_day};
370 40         121 return 1;
371             }
372              
373             sub _month_from_name {
374 59     59   78 my $self = shift;
375 59         85 my($name) = @_;
376              
377 59 100       146 return unless length $name > 2;
378 53         103 $name = lc $name;
379 53         82 for(my $index = 0; $index < $#{$self->{_month_names}}; $index += 2) {
  597         1568  
380 584 100       1816 return $self->{_month_names}->[$index + 1] if substr($self->{_month_names}->[$index], 0, length $name) eq $name;
381             }
382 13         33 return;
383             }
384              
385             =head1 AUTHOR
386              
387             Merlyn Kline, C<< >>
388              
389             =head1 BUGS
390              
391             Please report any bugs or feature requests to C, or through
392             the web interface at L. I will be notified, and then you'll
393             automatically be notified of progress on your bug as I make changes.
394              
395              
396              
397              
398             =head1 SUPPORT
399              
400             You can find documentation for this module with the perldoc command.
401              
402             perldoc Date::Parse::Lite
403              
404              
405             You can also look for information at:
406              
407             =over 4
408              
409             =item * RT: CPAN's request tracker (report bugs here)
410              
411             L
412              
413             =item * AnnoCPAN: Annotated CPAN documentation
414              
415             L
416              
417             =item * CPAN Ratings
418              
419             L
420              
421             =item * Search CPAN
422              
423             L
424              
425             =item * Source code on GitHub
426              
427             L
428              
429             =back
430              
431             =head1 LICENSE AND COPYRIGHT
432              
433             Copyright 2015 Merlyn Kline.
434              
435             This program is free software; you can redistribute it and/or modify it
436             under the terms of the the Artistic License (2.0). You may obtain a
437             copy of the full license at:
438              
439             L
440              
441             Any use, modification, and distribution of the Standard or Modified
442             Versions is governed by this Artistic License. By using, modifying or
443             distributing the Package, you accept this license. Do not use, modify,
444             or distribute the Package, if you do not accept this license.
445              
446             If your Modified Version has been derived from a Modified Version made
447             by someone other than you, you are nevertheless required to ensure that
448             your Modified Version complies with the requirements of this license.
449              
450             This license does not grant you the right to use any trademark, service
451             mark, tradename, or logo of the Copyright Holder.
452              
453             This license includes the non-exclusive, worldwide, free-of-charge
454             patent license to make, have made, use, offer to sell, sell, import and
455             otherwise transfer the Package with respect to any patent claims
456             licensable by the Copyright Holder that are necessarily infringed by the
457             Package. If you institute patent litigation (including a cross-claim or
458             counterclaim) against any party alleging that the Package constitutes
459             direct or contributory patent infringement, then this Artistic License
460             to you shall terminate on the date that such litigation is filed.
461              
462             Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
463             AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
464             THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
465             PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
466             YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
467             CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
468             CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
469             EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
470              
471              
472             =cut
473              
474             1; # End of Date::Parse::Lite