File Coverage

lib/Text/CSV/Hashify.pm
Criterion    Covered  Total      %
statement         98     98  100.0
branch            46     48   95.8
condition         23     23  100.0
subroutine        16     16  100.0
pod                6      7   85.7
total            189    192   98.4


line stmt bran cond sub pod time code
1             package Text::CSV::Hashify;
2 5     5   115519 use strict;
  5         10  
  5         144  
3 5     5   87 use 5.8.0;
  5         16  
4 5     5   25 use Carp;
  5         8  
  5         421  
5 5     5   29 use Scalar::Util qw( reftype looks_like_number );
  5         9  
  5         302  
6 5     5   3775 use Text::CSV;
  5         90174  
  5         260  
7 5     5   2237 use open qw( :encoding(UTF-8) :std );
  5         4784  
  5         27  
8              
9             BEGIN {
10 5     5   58016 use Exporter ();
  5         9  
  5         99  
11 5     5   17 use vars qw($VERSION @ISA @EXPORT);
  5         6  
  5         425  
12 5     5   10 $VERSION = '0.08';
13 5         50 @ISA = qw(Exporter);
14 5         5549 @EXPORT = qw( hashify );
15             }
16              
17             =head1 NAME
18              
19             Text::CSV::Hashify - Turn a CSV file into a Perl hash
20              
21             =head1 VERSION
22              
23             This document refers to version 0.08 of Text::CSV::Hashify. This version was
24             released March 15 2017.
25              
26             =head1 SYNOPSIS
27              
28             # Simple functional interface
29             use Text::CSV::Hashify;
30             $hash_ref = hashify('/path/to/file.csv', 'primary_key');
31              
32             # Object-oriented interface
33             use Text::CSV::Hashify;
34             $obj = Text::CSV::Hashify->new( {
35             file => '/path/to/file.csv',
36             format => 'hoh', # hash of hashes, which is default
37             key => 'id', # needed except when format is 'aoh'
38             max_rows => 20, # number of records to read; defaults to all
39             ... # other key-value pairs possible for Text::CSV
40             } );
41              
42             # all records requested
43             $hash_ref = $obj->all;
44              
45             # arrayref of fields input
46             $fields_ref = $obj->fields;
47              
48             # hashref of specified record
49             $record_ref = $obj->record('value_of_key');
50              
51             # value of one field in one record
52             $datum = $obj->datum('value_of_key', 'field');
53              
54             # arrayref of all unique keys seen
55             $keys_ref = $obj->keys;
56              
57             =head1 DESCRIPTION
58              
59             The Comma-Separated-Value ('CSV') format is the most common way to store
60             spreadsheets or the output of relational database queries in plain-text
61             format. However, since commas (or other designated field-separator
62             characters) may be embedded within data entries, the parsing of delimited
63             records is non-trivial. Fortunately, in Perl this parsing is well handled by
64             CPAN distribution Text::CSV. This permits us to address more specific data
65             manipulation problems by building modules on top of Text::CSV.
66              
67             B<Note:> In this document we will use I<CSV> as a catch-all for tab-delimited
68             files, pipe-delimited files, and so forth. Please refer to the documentation
69             for Text::CSV to learn how to handle field separator characters other than the
70             comma.
71              
72             =head2 Primary Case: CSV (with primary key) to Hash of Hashes
73              
74             Text::CSV::Hashify is designed for the case where you simply want to turn a
75             CSV file into a Perl hash. In particular, it is designed for the case where
76             (a) the CSV file's first record is a list of fields in the ancestral database
77             table and (b) one field (column) functions as a B<primary key>, I<i.e.,> each
78             record's entry in that field is non-null and is distinct from every other
79             record's entry therein.
80              
81             Text::CSV::Hashify turns that kind of CSV file into one big hash of hashes.
82             Elements of this hash are keyed on the entries in the designated primary key
83             field and the value for each element is a hash reference of all the data in a
84             particular database record (including the primary key field and its value).
85              
86             =head2 Secondary Case: CSV (lacking primary key) to Array of Hashes
87              
88             You may, however, encounter cases where a CSV file's header row contains the
89             list of database fields but no field is capable of serving as a primary key,
90             I<i.e.,> there is no field in which the entry for that field in any record is
91             guaranteed to be distinct from the entries in that field for all other
92             records.
93              
94             In this case, while an individual record can be turned into a hash,
95             the CSV file as a whole cannot accurately be turned into a hash of hashes. As
96             a fallback, Text::CSV::Hashify can, upon request, turn this into an array of
97             hashes. In this case, you will not be able to look up a particular record by
98             its primary key. You will instead have to know its index position within the
99             array (which is equivalent to knowing its record number in the original CSV
100             file minus C<1>).
101              
102             =head2 Interfaces
103              
104             Text::CSV::Hashify provides two interfaces: one functional, one
105             object-oriented.
106              
107             Use the functional interface when all you want is to turn a CSV file with a
108             primary key field into a hash of hashes.
109              
110             Use the object-oriented interface for any more sophisticated manipulation of
111             the CSV file. This includes:
112              
113             =over 4
114              
115             =item * Text::CSV options
116              
117             Access to any of the options available to Text::CSV, such as use of a
118             separator character other than a comma.
119              
120             =item * Limit number of records
121              
122             Selection of a limited number of records from the CSV file, rather than
123             slurping the whole file into your in-memory hash.
124              
125             =item * Array of hash references format
126              
127             Probably better than the default hash of hash references format when the CSV
128             file has no field able to serve as a primary key.
129              
130             =item * Metadata
131              
132             Access to the list of fields, the list of all primary key values, the values
133             in an individual record, or the value of an individual field in an individual
134             record.
135              
136             =back
137              
138             B<Note:> On the recommendation of the authors/maintainers of Text::CSV,
139             Text::CSV::Hashify will internally always set Text::CSV's C<binary =E<gt> 1>
140             option.
141              
142             =head1 FUNCTIONAL INTERFACE
143              
144             Text::CSV::Hashify by default exports one function: C<hashify()>.
145              
146             $hash_ref = hashify('/path/to/file.csv', 'primary_key');
147              
148             The function takes two arguments: the path to the CSV file and the name of the
149             field in that file which serves as primary key.
150              
151             Returns a reference to a hash of hash references.
152              
153             =cut
154              
155             sub hashify {
156 3 100   3 0 988 croak "'hashify()' must have two arguments"
157             unless @_ == 2;
158 2         4 my @args = @_;
159 2         7 for (my $i=0;$i<=$#args;$i++) {
160 4 100       123 croak "'hashify()' argument at index '$i' not true" unless $args[$i];
161             }
162 1         9 my $obj = Text::CSV::Hashify->new( {
163             file => $args[0],
164             key => $args[1],
165             } );
166 1         5 return $obj->all();
167             }
168              
169             =head1 OBJECT-ORIENTED INTERFACE
170              
171             =head2 C<new()>
172              
173             =over 4
174              
175             =item * Purpose
176              
177             Text::CSV::Hashify constructor.
178              
179             =item * Arguments
180              
181             $obj = Text::CSV::Hashify->new( {
182             file => '/path/to/file.csv',
183             format => 'hoh', # hash of hashes, which is default
184             key => 'id', # needed except when format is 'aoh'
185             max_rows => 20, # number of records to read; defaults to all
186             ... # other key-value pairs possible for Text::CSV
187             } );
188              
189             Single hash reference. Required element is:
190              
191             =over 4
192              
193             =item * C<file>
194              
195             String: path to CSV file serving as input.
196              
197             =back
198              
199             Element usually needed:
200              
201             =over 4
202              
203             =item * C<key>
204              
205             String: name of field in CSV file serving as unique key. Needed except when
206             optional element C<format> is C<aoh>.
207              
208             =back
209              
210             Optional elements are:
211              
212             =over 4
213              
214             =item * C<format>
215              
216             String: possible values are C<hoh> and C<aoh>. Defaults to C<hoh> (hash of
217             hashes). C<new()> will fail if the same value is encountered in more than one
218             record's entry in the C<key> column. So if you know in advance that your data
219             cannot meet this condition, explicitly select C<format =E<gt> aoh>.
220              
221             =item * C<max_rows>
222              
223             Number: provide this if you do not wish to populate the hash with all data
224             records from the CSV file. (Will have no effect if the number provided is
225             greater than or equal to the number of data records in the CSV file.)
226              
227             =item * Any option available to Text::CSV
228              
229             See documentation for either Text::CSV or Text::CSV_XS.
230              
231             =back
232              
233             =item * Return Value
234              
235             Text::CSV::Hashify object.
236              
237             =item * Comment
238              
239             =back
240              
241             =cut
242              
243             sub new {
244 23     23 1 12767 my ($class, $args) = @_;
245 23         33 my %data;
246              
247 23 100 100     595 croak "Argument to 'new()' must be hashref"
248             unless (ref($args) and reftype($args) eq 'HASH');
249 21 100       167 croak "Argument to 'new()' must have 'file' element" unless $args->{file};
250             croak "Cannot locate file '$args->{file}'"
251 20 100       478 unless (-f $args->{file});
252 19         52 $data{file} = delete $args->{file};
253              
254 19 100 100     98 if ($args->{format} and ($args->{format} !~ m/^(?:h|a)oh$/i) ) {
255 1         117 croak "Entry '$args->{format}' for format is invalid";
256             }
257 18   100     71 $data{format} = delete $args->{format} || 'hoh';
258              
259 18 100 100     64 if (! exists $args->{key} and $data{format} ne 'aoh') {
260 1         112 croak "Argument to 'new()' must have 'key' element unless 'format' element is 'aoh'";
261             }
262 17         32 $data{key} = delete $args->{key};
263              
264 17 100       45 if (defined($args->{max_rows})) {
265 6 100       38 if ($args->{max_rows} !~ m/^[0-9]+$/) {
266 3         323 croak "'max_rows' option, if defined, must be numeric";
267             }
268             else {
269 3         7 $data{max_rows} = delete $args->{max_rows};
270             }
271             }
272             # We've now handled all the Text::CSV::Hashify::new-specific options.
273             # Any remaining options are assumed to be intended for Text::CSV::new().
274              
275 14         27 $args->{binary} = 1;
276 14 50       89 my $csv = Text::CSV->new ( $args )
277             or croak "Cannot use CSV: ".Text::CSV->error_diag ();
278             open my $IN, "<", $data{file}
279 14 50       2061 or croak "Unable to open '$data{file}' for reading";
280 14         1520 my $header_ref = $csv->getline($IN);
281 14         905 my %header_fields_seen;
282 14         18 for (@{$header_ref}) {
  14         35  
283 107 100       130 if (exists $header_fields_seen{$_}) {
284 1         140 croak "Duplicate field '$_' observed in '$data{file}'";
285             }
286             else {
287 106         170 $header_fields_seen{$_}++;
288             }
289             }
290 13         25 $data{fields} = $header_ref;
291 13         17 $csv->column_names(@{$header_ref});
  13         80  
292              
293             # 'hoh' format
294 13         458 my %keys_seen;
295 13         25 my @keys_list = ();
296 13         53 my %parsed_data;
297             # 'aoh' format
298             my @parsed_data;
299              
300 13         66 PARSE_FILE: while (my $record = $csv->getline_hr($IN)) {
301 133 100       7379 if ($data{format} eq 'hoh') {
302 123         155 my $kk = $record->{$data{key}};
303 123 100       139 if ($keys_seen{$kk}) {
304 1         169 croak "Key '$kk' already seen";
305             }
306             else {
307 122         187 $keys_seen{$kk}++;
308 122         142 push @keys_list, $kk;
309 122         114 $parsed_data{$kk} = $record;
310             last PARSE_FILE if (
311             defined $data{max_rows} and
312             scalar(keys %parsed_data) == $data{max_rows}
313 122 100 100     436 );
314             }
315             }
316             else { # format: 'aoh'
317 10         14 push @parsed_data, $record;
318             last PARSE_FILE if (
319             defined $data{max_rows} and
320             scalar(@parsed_data) == $data{max_rows}
321 10 100 100     65 );
322             }
323             }
324 12 100       590 $data{all} = ($data{format} eq 'aoh') ? \@parsed_data : \%parsed_data;
325 12 100       56 $data{keys} = \@keys_list if $data{format} eq 'hoh';
326 12         26 $data{csv} = $csv;
327 12         28 while (my ($k,$v) = each %{$args}) {
  26         88  
328 14         35 $data{$k} = $v;
329             }
330 12         233 return bless \%data, $class;
331             }
332              
333             =head2 C<all()>
334              
335             =over 4
336              
337             =item * Purpose
338              
339             Get a representation of all data found in a CSV input file.
340              
341             =item * Arguments
342              
343             $hash_ref = $obj->all; # when format is default or 'hoh'
344             $array_ref = $obj->all; # when format is 'aoh'
345              
346             =item * Return Value
347              
348             Reference representing all data records in the CSV input file. In the default
349             case, or if you have specifically requested C<format =E<gt> 'hoh'>, the return
350             value is a hash reference. When you have requested C<format =E<gt> 'aoh'>, the
351             return value is an array reference.
352              
353             =item * Comment
354              
355             In the default (C<hoh>) case, the return value is equivalent to that of
356             C<hashify()>.
357              
358             =back
359              
360             =cut
361              
362             sub all {
363 5     5 1 3296 my ($self) = @_;
364 5         37 return $self->{all};
365             }
366              
367             =head2 C<fields()>
368              
369             =over 4
370              
371             =item * Purpose
372              
373             Get a list of the fields in the CSV source.
374              
375             =item * Arguments
376              
377             $fields_ref = $obj->fields;
378              
379             =item * Return Value
380              
381             Array reference.
382              
383             =item * Comment
384              
385             If any field names are duplicate, you will not get this far, as C<new()> would
386             have died.
387              
388             =back
389              
390             =cut
391              
392             sub fields {
393 3     3 1 1297 my ($self) = @_;
394 3         7 return $self->{fields};
395             }
396              
397             =head2 C<record()>
398              
399             =over 4
400              
401             =item * Purpose
402              
403             Get a hash representing one record in the CSV input file.
404              
405             =item * Arguments
406              
407             $record_ref = $obj->record('value_of_key');
408              
409             One argument. In the default case (C<format =E<gt> 'hoh'>), this argument is the value in the record in the column serving as unique key.
410              
411             In the C<format =E<gt> 'aoh'> case, this will be the index position of the data record
412             in the array. (The header row is not stored in the array; the first data record is at index C<0>.)
413              
414             =item * Return Value
415              
416             Hash reference.
417              
418             =back
419              
420             =cut
421              
422             sub record {
423 15     15 1 9894 my ($self, $key) = @_;
424 15 100 100     844 croak "Argument to 'record()' either not defined or empty"
425             unless (defined $key and $key ne '');
426             ($self->{format} eq 'aoh')
427             ? return $self->{all}->[$key]
428 9 100       38 : return $self->{all}->{$key};
429             }
430              
431             =head2 C<datum()>
432              
433             =over 4
434              
435             =item * Purpose
436              
437             Get value of one field in one record.
438              
439             =item * Arguments
440              
441             $datum = $obj->datum('value_of_key', 'field');
442              
443             List of two arguments: the value in the record in the column serving as unique
444             key; the name of the field.
445              
446             =item * Return Value
447              
448             Scalar.
449              
450             =back
451              
452             =cut
453              
454             sub datum {
455 14     14 1 6781 my ($self, @args) = @_;
456 14 100       295 croak "'datum()' needs two arguments" unless @args == 2;
457 11         39 for (my $i=0;$i<=$#args;$i++) {
458 19 100 100     595 croak "Argument to 'datum()' at index '$i' either not defined or empty"
459             unless ((defined($args[$i])) and ($args[$i] ne ''));
460             }
461             ($self->{format} eq 'aoh')
462             ? return $self->{all}->[$args[0]]->{$args[1]}
463 5 100       36 : return $self->{all}->{$args[0]}->{$args[1]};
464             }
465              
466             =head2 C<keys()>
467              
468             =over 4
469              
470             =item * Purpose
471              
472             Get a list of all unique keys found in the input file.
473              
474             =item * Arguments
475              
476             $keys_ref = $obj->keys;
477              
478             =item * Return Value
479              
480             Array reference.
481              
482             =item * Comment
483              
484             If you have selected C<format =E<gt> 'aoh'> in the options to C<new()>, the
485             C<keys()> method is inappropriate and will cause your program to die.
486              
487             =back
488              
489             =cut
490              
491             sub keys {
492 3     3 1 1246 my ($self) = @_;
493 3 100       11 if (exists $self->{keys}) {
494 2         4 return $self->{keys};
495             }
496             else {
497 1         117 croak "'keys()' method not appropriate when 'format' is 'aoh'";
498             }
499             }
500              
501             =head1 AUTHOR
502              
503             James E Keenan
504             CPAN ID: jkeenan
505             jkeenan@cpan.org
506             http://thenceforward.net/perl/modules/Text-CSV-Hashify
507              
508             =head1 COPYRIGHT
509              
510             This program is free software; you can redistribute
511             it and/or modify it under the same terms as Perl itself.
512              
513             The full text of the license can be found in the
514             LICENSE file included with this module.
515              
516             Copyright 2012-2017, James E Keenan. All rights reserved.
517              
518             =head1 BUGS
519              
520             There are no bug reports outstanding on Text::CSV::Hashify as of the most recent
521             CPAN upload date of this distribution.
522              
523             =head1 SUPPORT
524              
525             To report any bugs or make any feature requests, please send mail to
526             C or use the web interface at
527             L.
528              
529             =head1 ACKNOWLEDGEMENTS
530              
531             Thanks to Christine Shieh for serving as the alpha consumer of this
532             library's output.
533              
534             =head1 OTHER CPAN DISTRIBUTIONS
535              
536             =head2 Text-CSV and Text-CSV_XS
537              
538             These distributions underlie Text-CSV-Hashify and provide all of its
539             file-parsing functionality. Where possible, install both. That will enable
540             you to process a file with a single, shared interface but have access to the
541             faster processing speeds of XS where available.
542              
543             =head2 Text-CSV-Slurp
544              
545             Like Text-CSV-Hashify, Text-CSV-Slurp slurps an entire CSV file into memory,
546             but stores it as an array of hashes instead.
547              
548             =head2 Text-CSV-Auto
549              
550             This distribution inspired the C<max_rows> option to C<new()>.
551              
552             =cut
553              
554             1;
555
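
The parse loop in new() (source lines 300-323 above) builds the hash of hashes, rejects repeated key values, and stops once max_rows records have been stored. The following is a minimal sketch of that logic using core Perl only. It is an illustration, not the module's implementation: rows_to_hoh is a hypothetical helper, the sample data is invented, and the plain split on commas is a naive stand-in for Text::CSV that assumes no quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV::Hashify): build a hash of
# hashes from header-first CSV lines, keyed on the field named $key.
# Stops after $max_rows records when $max_rows is defined.
# NOTE: split on ',' is a naive substitute for Text::CSV's parsing.
sub rows_to_hoh {
    my ($lines, $key, $max_rows) = @_;
    my @rows   = @{$lines};
    my @fields = split /,/, shift @rows;    # first line is the header
    my (%hoh, @keys);
    for my $line (@rows) {
        my %record;
        @record{@fields} = split /,/, $line, -1;   # -1 keeps trailing empty fields
        my $kk = $record{$key};
        die "Key '$kk' already seen\n" if exists $hoh{$kk};
        $hoh{$kk} = \%record;
        push @keys, $kk;                    # remember keys in file order
        last if defined $max_rows and @keys == $max_rows;
    }
    return (\%hoh, \@keys);
}

# Invented sample data: header row plus three records.
my @csv = ("id,name,color", "1,apple,red", "2,banana,yellow", "3,cherry,red");
my ($hoh, $keys) = rows_to_hoh(\@csv, 'id', 2);
print join(",", @{$keys}), "\n";    # keys stored before the row limit was hit
print $hoh->{2}{name}, "\n";        # one field of the record keyed on '2'
```

As in the module, each stored record keeps the primary key field itself, so $hoh->{2}{id} is also available. For real input, Text::CSV should do the parsing, since it handles quoting and embedded separators.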