File Coverage

blib/lib/Data/CTable.pm
Criterion Covered Total %
statement 1322 1527 86.5
branch 373 632 59.0
condition 207 423 48.9
subroutine 164 183 89.6
pod 6 151 3.9
total 2072 2916 71.0


line stmt bran cond sub pod time code
1             #!/usr/bin/perl
2             ## Emacs: -*- tab-width: 4; -*-
3              
4 29     29   1281380 use strict;
  29         80  
  29         1999  
5              
6             package Data::CTable;
7              
8 29     29   173 use vars qw($VERSION); $VERSION = '1.03_02';
  29         63  
  29         5636  
9              
10             =pod
11              
12             =head1 NAME
13              
14             Data::CTable - Read, write, manipulate tabular data
15              
16             =head1 SYNOPSIS
17              
18             ## Read some data files in various tabular formats
19             use Data::CTable;
20             my $People = Data::CTable->new("people.merge.mac.txt");
21             my $Stats = Data::CTable->new("stats.tabs.unix.txt");
22              
23             ## Clean stray whitespace in fields
24             $People->clean_ws();
25             $Stats ->clean_ws();
26              
27             ## Retrieve columns
28             my $First = $People->col('FirstName');
29             my $Last = $People->col('LastName' );
30              
31             ## Calculate a new column based on two others
32             my $Full = [map {"$First->[$_] $Last->[$_]"} @{$People->all()}];
33              
34             ## Add new column to the table
35             $People->col(FullName => $Full);
36              
37             ## Another way to calculate a new column
38             $People->col('Key');
39             $People->calc(sub {no strict 'vars'; $Key = "$Last,$First";});
40              
41             ## "Left join" records matching Stats:PersonID to People:Key
42             $Stats->join($People, PersonID => 'Key');
43              
44             ## Find certain records
45             $Stats->select_all();
46             $Stats->select(Department => sub {/Sale/i }); ## Sales depts
47             $Stats->omit (Department => sub {/Resale/i}); ## not Resales
48             $Stats->select(UsageIndex => sub {$_ > 20.0}); ## high usage
49              
50             ## Sort the found records
51             $Stats->sortspec('DeptNum' , {SortType => 'Integer'});
52             $Stats->sortspec('UsageIndex', {SortType => 'Number' });
53             $Stats->sort([qw(DeptNum UsageIndex Last First)]);
54              
55             ## Make copy of table with only found/sorted data, in order
56             my $Report = $Stats->snapshot();
57              
58             ## Write an output file
59             $Report->write(_FileName => "Rept.txt", _LineEnding => "mac");
60              
61             ## Print a final progress message.
62             $Stats->progress("Done!");
63              
64             ## Dozens more methods and parameters available...
65              
66             =head1 OVERVIEW
67              
68             Data::CTable is a comprehensive utility for reading, writing,
69             manipulating, cleaning and otherwise transforming tabular data. The
70             distribution includes several illustrative subclasses and utility
71             scripts.
72              
73             A Columnar Table represents a table as a hash of data columns, making
74             it easy to do data cleanup, formatting, searching, calculations,
75             joins, or other complex operations.
76              
77             The object's hash keys are the field names and the hash values hold
78             the data columns (as array references).
79              
80             Tables also store a "selection" -- a list of selected / sorted record
81             numbers, and a "field list" -- an ordered list of all or some fields
82             to be operated on. Select() and sort() methods manipulate the
83             selection list. Later, you can optionally rewrite the table in memory
84             or on disk to reflect changes in the selection list or field list.
85              
86             Data::CTable reads and writes any tabular text file format including
87             Merge, CSV, Tab-delimited, and variants. It transparently detects,
88             reads, and preserves Unix, Mac, and/or DOS line endings and tab or
89             comma field delimiters -- regardless of the runtime platform.
90              
91             In addition to reading data files, CTable is a good way to gather,
92             store, and operate on tabular data in memory, and to export data to
93             delimited text files to be read by other programs or interactive
94             productivity applications.
95              
96             To achieve extremely fast data loading, CTable caches data file
97             contents using the Storable module. This can be helpful in CGI
98             environments or when operating on very large data files. CTable can
99             read an entire cached table of about 120 megabytes into memory in
100             about 10 seconds on an average mid-range computer.
101              
102             For simple data-driven applications needing to store and quickly
103             retrieve simple tabular data sets, CTable provides a credible
104             alternative to DBM files or SQL.
105              
106             For data hygiene applications, CTable forms the foundation for writing
107             utility scripts or compilers to transfer data from external sources,
108             such as FileMaker, Excel, Access, personal organizers, etc. into
109             compiled or validated formats -- or even as a gateway to loading data
110             into SQL databases or other destinations. You can easily write short,
111             repeatable scripts in Perl to do reporting, error checking, analysis,
112             or validation that would be hard to duplicate in less-flexible
113             application environments.
114              
115             The data representation is simple and open so you can directly access
116             the data in the object if you feel like it -- or you can use accessors
117             to request "clean" structures containing only the data or copies of
118             it. Or you can build your own columns in memory and then when you're
119             ready, turn them into a table object using the very flexible new()
120             method.
121              
122             The highly factored interface and implementation allow fine-grained
123             subclassing so you can easily create useful lightweight subclasses.
124             Several subclasses are included with the distribution.
125              
126             Most defaults and parameters can be customized by subclassing,
127             overridden at the instance level (avoiding the need to subclass too
128             often), and further overridden via optional named-parameter arguments
129             to most major method calls.
130              
131             =head2 Similar / related modules on CPAN
132              
133             The Data::Table module by Yingyao Zhou & Guangzhou Zou offers similar
134             functionality, but uses a different underlying data representation
135             (2-dimensional array), and has a somewhat different feature set.
136             Check it out. Maybe you will prefer it for your application.
137              
138             http://search.cpan.org/search?mode=module&query=Data::Table
139              
140             The Data::ShowTable module renders tables in various viewable formats.
141             CTable relies on ShowTable's ShowBoxTable method to implement its own
142             format() and out() methods.
143              
144             http://search.cpan.org/search?mode=module&query=Data::ShowTable
145              
146              
147             =head2 Prerequisites
148              
149             The CTable documentation, source code, and examples assume familiarity
150             with large nested data structures, object-oriented syntax and
151             terminology, and comfort working with array and hash references and
152             array and hash slice syntax.
153              
154             See the perlref man page for more on these topics.
155              
156              
157             =head2 How to learn more
158              
159             Dozens more methods, parameters, and examples are described below.
160              
161             See the full source code in CTable.pm.
162              
163             Or, after installing, read the man page using:
164              
165             man Data::CTable
166             perldoc Data::CTable
167              
168             See the eg/ (examples) folder in the Data::CTable distribution and the
169             test.pl script for scripts demonstrating every CTable method.
170              
171             For latest version and other news, check the Data::CTable home page:
172              
173             http://christhorman.com/projects/perl/Data-CTable/
174              
175             Or search CPAN:
176              
177             http://search.cpan.org/search?mode=module&query=Data::CTable
178              
179             =head1 INSTALLATION
180              
181             Using CPAN module:
182              
183             perl -MCPAN -e 'install Data::CTable'
184              
185             Or manually:
186              
187             tar xzvf Data-CTable*gz
188             cd Data-CTable-?.??
189             perl Makefile.PL
190             make
191             make test
192             make install
193              
194             =head1 INCLUDED SUBCLASSES AND UTILITIES
195              
196             In addition to the module itself, there are a number of subclasses and
197             simple utilities included with the Data::CTable distribution.
198              
199             =head2 Subclases
200              
201             The Data::CTable distribution includes these example subclasses. Each
202             is installed in your Perl environment along with the main module, and
203             so may be used by your scripts. Each has its own man/perldoc page
204             containing more detail.
205              
206             B is a subclass that logs all progress
207             messages to a list within the object itself rather than (merely)
208             echoing them to STDERR. Later, you may retrieve and examine the list.
209              
210             B is a virtual subclass that includes class and
211             object methods that make it easy to write a simple interactive
212             command-line program that parses options and outputs a table.
213              
214             B is a very useful subclass of
215             Data::CTable::Script that implements a souped-up Unix-like "ls" (file
216             listing) command -- it first gets an optionally-recursive listing of
217             any number of files and/or directories, builds a list of their full
218             absolute or relative paths, then build a Data::CTable::Listing object
219             that contains all the paths, plus about 25+ other pieces of useful
220             information about each file or directory.
221              
222             The "tls" utility, below, is simply a command-line cover for this
223             class, but you could use this class in your own scripts in order to
224             get detailed file listings.
225              
226             =head2 Utilities
227              
228             Each of these utilities is provided mainly so you can see mature
229             examples of how to use Data::CTable in real-world scripts.
230              
231             But each is also genuinely useful, too, and you may enjoy adding them
232             to your regular bag of tricks or using them as an easily-modifiable
233             basis for scripts of your own.
234              
235             On most systems, these will be installed in an appropriate directory
236             in your path when you install CTable, and hence will be executable
237             just by typing their file name. (On Windows, they'll be wrapped by a
238             .bat file and installed in C:\Perl\bin or equivalent. On *nix,
239             they'll be in /usr/bin/ or equivalent.)
240              
241             B is a command-line utility that wraps Data::CTable::Listing to
242             implement a variant on the classic Unix "ls" command in which an
243             internal CTable object is used to hold and calculate a very large
244             amount of meta-data about each file and directory and then output that
245             data in formatted tables, or as any kind of delimited text file, and
246             with much more flexibility and control over included data, and sort
247             and sub-sort order than with ls.
248              
249             B is a command-line utility that reads each of its arguments as
250             a Data::CTable file and then calls the out() method to display its
251             entire contents. (Warning: out() is slow with very large data sets.)
252              
253             B is a command-line utility that takes a US zip code,
254             grabs the local weather report from a popular weather web site and
255             uses a CTable object to store, process, and clean, and present the
256             table of weather data that results in a simple text format.
257              
258             =cut
259            
260             ## Required dependencies
261              
262 29     29   85720 use IO::File;
  29         524819  
  29         6021  
263 29     29   305 use Config; qw(%Config);
  29         63  
  29         1837  
264              
265 29     29   363 use Carp qw(croak carp confess cluck);
  29         57  
  29         2445  
266 29     29   64709 use Storable qw(store nstore retrieve dclone);
  29         157544  
  29         3960  
267 29     29   452 use File::Basename qw(fileparse);
  29         66  
  29         3786  
268              
269              
270             ## Optional dependencies
271              
272             my $HaveDumper;
273             my $HaveShowTable;
274              
275             BEGIN
276             {
277 29     29   4581 eval "
  29     29   63673  
  29     29   574221  
  29         3644  
  29         32006  
  0         0  
  0         0  
278             use Data::Dumper qw(Dumper);
279             use Data::ShowTable qw(ShowBoxTable);
280             ";
281            
282 29         204 $HaveDumper = $Data::Dumper::VERSION;
283 29         1218 $HaveShowTable = exists($Data::ShowTable::{ShowBoxTable});
284             };
285              
286              
287             ## We optionally export a few general-purpose utility routines.
288              
289 29     29   173 use Exporter; use vars qw(@ISA @EXPORT_OK); @ISA=qw(Exporter);
  29     29   57  
  29         1376  
  29         160  
  29         56  
  29         153849  
290             @EXPORT_OK = qw(
291             &ISORoman8859_1ToMacRoman
292             &MacRomanToISORoman8859_1
293            
294             &ISORoman8859_1ToMacRoman_clean
295             &MacRomanToISORoman8859_1_clean
296              
297             &guess_endings
298             &guess_delimiter
299              
300             &path_info
301             &path_is_absolute
302              
303             &min
304             &max
305             );
306              
307             =pod
308              
309             =head1 CREATING TABLE OBJECTS
310              
311             ## Create an object / read file(s) / override params
312              
313             use Data::CTable;
314              
315             $t = Data::CTable->new()
316             $t = Data::CTable->new($File)
317             $t = Data::CTable->new($File1, $File2....)
318             $t = Data::CTable->new($Params)
319             $t = Data::CTable->new($Params, $File1)
320             $t = Data::CTable->new($Params, $File1, $File2....)
321              
322             ## Internal initializer (subclassable): called for you by new()
323              
324             $t->initialize()
325              
326             If the first argument to new() is a hash ref, it is the $Params hash
327             of initial parameters and/or data columns which, if supplied will form
328             the starting point for the object being created. Any non-hash and/or
329             further arguments to new() are treated as file names to be opened.
330              
331             If supplied, data in the $Params hash will be shallowly copied -- the
332             original hash object passed will not be used, but any sub-structures
333             within it will now "belong" to the resulting new object which will
334             feel free to manipulate them or discard them.
335              
336             Then, any parameters not supplied (usually most of them) will be
337             defaulted for you because new() will call the internal method
338             initialize() before the object is finished and returned.
339              
340             See the PARAMETER REFERENCE section below for the parameters you can
341             choose to supply or have defaulted for you.
342              
343             Any file name arguments will be read and appended into a single
344             object. If any of the files fails to be read, then new() will fail
345             (return a false value) and no object will be created.
346              
347             initialize() makes sure there is a legal and consistent value for
348             every internal parameter in the object. Generally, initialize()
349             leaves alone any parameters you supplied to new(), and simply sets
350             default values for any that were not yet supplied.
351              
352             You should never need to call initialize() directly.
353              
354             After calling initialize(), new() then calls the append_files_new()
355             method to process the filename arguments. This method then calls
356             read() on the first filename, and then append_file on all subsequent
357             file names, appending them to the first, in sequence.
358              
359             See append() for an explanation of how the data from multiple files is
360             combined into a single table.
361              
362             =head2 Advanced: Using a template object
363              
364             ## Calling new() with a template object
365              
366             $v = $t->new()
367             $v = $t->new($File)
368             $v = $t->new($File1, $File2....)
369             $v = $t->new($Params)
370             $v = $t->new($Params, $File1)
371             $v = $t->new($Params, $File1, $File2....)
372              
373             You can also call $t->new() to use an existing object, $t, as a
374             template for the new object. $t->new() will create a new object of
375             the same class or subclass as the template object. Furthermore, the
376             template object, if provided, will be used as a starting point for the
377             resulting object -- in fact, it will initially share shallow copies of
378             all data columns, if any, and all internal parameters and data
379             structures, if any.
380              
381             This advanced shared-data technique could be used to create two
382             separate table objects that share and operate on the same underlying
383             data columns in memory but have different custom field lists, custom
384             selections, sort behavior, etc. But don't do this unless you're sure
385             you understand what you're doing, because changing data in one table
386             would change it in the other.
387              
388             =head1 PARAMETER REFERENCE
389              
390             The parameters listed here are recognized by new(), initialize(), and
391             by many functions that use the named-parameter calling convention,
392             such as read(), write(), sort(), and others.
393              
394             Any parameter listed may be specified when using new().
395              
396             Most parameters should not be directly accessed or manipulated once
397             the object has been created, except those that have appropriate
398             accessor methods described throughout in this documentation.
399              
400             Each parameter in the lists below is listed along with its defaulting
401             logic as performed by new() via initialize().
402              
403             =head2 Custom field list and custom selection (record list)
404              
405             =over 4
406              
407             =item _FieldList ||= undef;
408              
409             This is the ordered list (array reference) of columns / fields present
410             in the object. It is set by read() to reflect the names and order of
411             the fields encountered or actually read from in the incoming data
412             file, if any. Initially this list is undefined, and if removed or
413             left undefined, then the de-facto field list will be a list of all
414             columns present in the table object, in alphabetical order (see
415             fieldlist() and fieldlist_all()).
416              
417             Normally, all fields present in the table object would be listed in
418             the field list.
419              
420             However, this parameter may be set in the object (or overridden in
421             named-parameter function calls like read(), write(), etc.) to cause a
422             subset of fields to be read, written or otherwise used. If a subset
423             field list is specified before reading a data file, then ONLY fields
424             listed will be read -- this is a way to read just certain fields from
425             a very large file, but may not always be what you want.
426              
427             Specifying the field list before calling read() is required if the
428             data file has no header row giving names to its fields -- the names
429             you specify in the field list, in order, will be applied to the data
430             file being read. See the _HeaderRow parameter, below.
431              
432             If a subset field list is used after columns are already loaded in
433             memory, the columns not listed in the field list will still be present
434             in the object (and can be listed by calling fieldlist_all()), but they
435             will be omitted from most operations that iterate over fields.
436              
437             =item _Selection ||= undef;
438              
439             This is a list of the record numbers of "selected" records in the
440             table, possibly indicating sorted order.
441              
442             If absent, then all records are considered to be selected, in
443             "natural" order -- i.e. the order they occur in the file.
444              
445             You can create and set your own selection list or get and modify an
446             existing one. Deleting it resets the selection (for example, by
447             calling select_all()).
448              
449             Calling sort() will create a _Selection if none existed. Otherwise it
450             operates by modifying the existing _Selection, which may be a subset
451             of all record numbers.
452              
453             =back
454              
455             =head2 Cache behavior controls
456              
457             See sections related to Cacheing, below.
458              
459             =over 4
460              
461             =item _CacheOnRead = 1 unless exists
462              
463             Boolean: whether data files read by the read() method should be cached
464             after reading. Once cached, the data will be read from the cache
465             instead of the original file the NEXT TIME READ() IS CALLED, but only
466             if: 1) the cache file is found, and 2) its date is later than the
467             original. Otherwise, the cache file is ignored or re-written.
468             Cacheing can be up to 10x faster than parsing the file, so it's almost
469             always worth doing in any situation where you'll be reading a data
470             file more often than writing it.
471              
472             This parameter defaults to true.
473              
474             =item _CacheOnWrite = 0 unless exists
475              
476             Boolean: whether tables written by the write() method should be cached
477             after writing. This defaults to false on the assumption that the
478             program won't need to re-read a file it just wrote. However, this
479             behavior would be useful if a later step in your program or another
480             program will be reading the file that was written and would benefit
481             from having the cacheing already done. Cacheing a file after writing
482             is quite fast since the data is already in memory and of course it
483             speeds up subsequent read() operations by up to 10x.
484              
485             =item _CacheExtension = ".cache" unless exists
486              
487             This is the file name extension that is added to a file's name to
488             determine the name of its corresponding cache file. (First, any
489             existing extension, if any, is removed.) If this extension is empty,
490             then the cache file will be named the same as the original assuming
491             the cache file is being stored in a different directory. (See next
492             setting.)
493              
494             =item _CacheSubDir = "cache" unless exists
495              
496             This is the absolute or relative path to the subdirectory that should
497             be used to store the cache files. The default value is the relative
498             path to a directory called "cache". Relative paths will be appended
499             to the directory path containing the original file being read.
500              
501             Absolute cache paths (such as /tmp or c:\temp\) can also be used.
502              
503             Override _CacheExtension and _CacheSubdir in a subclass, in each
504             object, or in each call to read() or write() in order to have the
505             cache files stored elsewhere. But remember: unless you use the same
506             cache settings next time you read the same file, the cache files will
507             be orphaned.
508              
509             =back
510              
511             =head2 Progress routine / setting
512              
513             =over 4
514              
515             =item _Progress = undef unless exists
516              
517             The _Progress setting controls the routing of diagnostic messages.
518             Four possible settings are recognized:
519              
520             undef (default) The class's progress settings are used.
521             subroutine reference Your own custom progress routine.
522             true Built-in progress_default() method used.
523             0/false No progress messages for this object.
524              
525             See the PROGRESS section for a description of the interface of custom
526             progress routines and for details on how the builtin one works.
527              
528             =back
529              
530             =head2 File format settings
531              
532             =over 4
533              
534             =item _LineEnding ||= undef;
535              
536             _LineEnding indicates the line ending string or setting to be used to
537             read a file, the setting that actually I used to read a file,
538             and/or the line ending that will be used to write a file.
539              
540             Set this parameter to force a particular encoding to be used.
541              
542             Otherwise, leave it undef. The program will Do What You Mean.
543              
544             If _LineEnding is undef when read() is called, read() will try to
545             guess the line ending type by inspecting the first file it reads.
546             Then it will set this setting for you. It can detect DOS, Unix, and
547             Mac line endings.
548              
549             If _LineEnding is undef when write() is called, write() will use
550             C<"\n">, which yields different strings depending on the current
551             runtime platform: \x0A on Unix; \x0D in MacPerl, \x0D\x0A on DOS.
552              
553             Otherwise, write() uses the value defined in _LineEnding, which would
554             match the value filled in by read() if this object's data originally
555             had been read from a file. So if you read a file and then later write
556             it out, the line endings in the written file will match the format of
557             original unless you override _LineEnding specifically.
558              
559             Since Data::CTable supports reading and writing all common endings,
560             base your decision on line ending format during write() on the needs
561             of other programs you might be using.
562              
563             For example: FileMaker Pro and Excel crash / hang if Unix line endings
564             are used, so be sure to use the ending format that matches the needs
565             of the other programs you plan to use.
566              
567             As a convenience, you may specify and retrieve the _LineEnding setting
568             using the mnemonic symbols "mac", "dos" and "unix." These special
569             values are converted to the string values shown in this chart:
570              
571             symbol string value chars decimal octal control
572             -------------------------------------------------------------
573             dos "\x0D\x0A" CR/LF 13,10 "\015\012" ^M^J
574             mac "\x0D" CR 13 "\015" ^M
575             unix "\x0A" LF 10 "\012" ^J
576              
577             See the section LINE ENDINGS, below, for accessor methods and
578             conversion utilities that help you get/set this parameter in either
579             symbolic format or string format as you prefer.
580              
581             =item _FDelimiter ||= undef;
582              
583             _FDelimiter is the field delimiter between field names in the header
584             row (if any) and also between fields in the body of the file. If
585             undef, read() will try to guess whether it is tab C<"\t"> or comma
586             <",">, and set this parameter accordingly. If there is only one field
587             in the file, then comma is assumed by read() and will be used by
588             write().
589              
590             To guess the delimiter, the program looks for the first comma or tab
591             character in the header row (if present) or in the first record.
592             Whichever character is found first is assumed to be the delimiter.
593              
594             If you don't want the program to guess, or you have a data file format
595             that uses a custom delimiter, specify the delimiter explicitly in the
596             object or when calling read() or make a subclass that initializes this
597             value differently. On write(), this will default to comma if it is
598             empty or undef.
599              
600             =item _QuoteFields = undef unless exists
601              
602             _QuoteFields controls how field values are quoted by write() when
603             writing the table to a delimited text file.
604              
605             An undef value (the default) means "auto" -- each field is checked
606             individually and if it contains either the _FDelimiter character or a
607             double-quote character, the field value will be surrounded by
608             double-quotes as it is written to the file. This method is slower to
609             write but faster to read, and may make the output easier for humans to
610             read.
611              
612             A true value means always put double-quotes around every field value.
613             This mode is faster to write but slower to read.
614              
615             A zero value means never to use double-quotes around field values and
616             not to check for the need to use them. This method is the fastest to
617             read and write. You may use it when you are certain that your data
618             can't contain any special characters. However, if you're wrong, this
619             mode will produce a corrupted file in the event that one of the fields
620             does contain the active delimiter (such as comma or tab) or a quote.
621              
622             =item _HeaderRow = 1 unless exists
623              
624             _HeaderRow is a boolean that says whether to expect a header row in
625             data files. The default is true: a header row is required. If false,
626             _FieldList MUST be present before calling read() or an error will be
627             generated. In this latter case, _FieldList will be assumed to give
628             the correct names of the fields in the file, in order, before the file
629             is read. In other words, the object expects that either a) it can get
630             the field names from the file's header row or b) you will supply them
631             before read() opens the file.
632              
633             =back
634              
635             =head2 Encoding of return characters within fields
636              
637             =over 4
638              
639             =item _ReturnMap = 1 unless exists
640              
641             _ReturnMap says that returns embedded in fields should be decoded on
642             read() and encoded again on write(). The industry-standard encoding
643             for embedded returns is ^K (ascii 11 -- but see next setting to change
644             it). This defaults to true but can be turned off if you want data
645             untouched by read(). This setting has no effect on data files where
646             no fields contain embedded returns. However, it is vital to leave
647             this option ON when writing any data file whose fields could contain
648             embedded returns -- if you have such data and call write() with
649             _ReturnMap turned off, the resulting file will be an invalid Merge/CSV
650             file and might not be re-readable.
651              
652             When these fields are decoded on read(), encoded returns are converted
653             to C<"\n"> in memory, whatever its interpretation may be on the current
654             platform (\x0A on Unix or DOS; \x0D on MacPerl).
655              
656             IMPORTANT NOTE: When these fields are encoded by write(), any
657             occurrence of the current _LineEnding being used to write the file is
658             searched and encoded FIRST, and THEN, any occurrence of "\n" is also
659             searched and encoded. For example, if using mac line endings (^M) to
660             write a file on a Unix machine, any ^M characters in fields will be
661             encoded, and then any "\n" (^J) characters will ALSO be encoded. This
662             may not be what you want, so be sure you know how your data is encoded
663             in cases where your field values might contain any ^J and/or ^M
664             characters.
665              
666             IMPORTANT NOTE: If you turn _ReturnMap off, fields with returns in
667             them will still be double-quoted correctly. Some parsers of tab- or
668             comma-delimited files are able to support reading such files.
669             HOWEVER, the parser in this module's read() method DOES NOT currently
670             support reading files in which a single field value appears to span
671             multiple lines in the file. If you have a need to read such a file,
672             you may need to write your own parser as a subclass of this module.
673              
674             =item _ReturnEncoding ||= "\x0B";
675              
676             This is the default encoding to assume when embedding return
677             characters within fields. The industry standard is "\x0B" (ascii 11 /
678             octal \013 / ^K) so you should probably not ever change this setting.
679              
680             When fields are encoded on write(), C<"\n"> is converted to this
681             value. Note that different platforms use different ascii values for
682             C<"\n">, which is another good reason to leave the ReturnEncoding
683             feature enabled when calling write().
684              
685             To summarize: this module likes to assume, and you should too, that
686             returns in data files on disk are encoded as "\x0B", but once loaded
687             into memory, they are encoded as the current platform's value of
688             C<"\n">.
689              
690             =item _MacRomanMap = undef unless exists
691              
692             Data::CTable assumes by default that you want field data in memory to
693             be in the ISO 8859-1 character set (the standard for Latin 1 Roman
694             characters on Unix and Windows in the English and Western European
695             languages -- and also the default encoding for HTML Web pages).
696              
697             _MacRomanMap controls the module's optional mapping of Roman
698             characters from Mac format on disk to ISO format in memory when
699             reading and writing data files. These settings are recognized:
700              
701             undef ## Auto: Read/write Mac chars if using Mac line endings
702             1 ## On: Assume Mac char set in all fields
703             0 ## Off: Don't do any character mapping at all
704              
705             The default setting is undef, which enables "Auto" mode: files found
706             to contain Mac line endings will be assumed to contain Mac upper-ASCII
707             characters and will be mapped to ISO on read(); and files to be
708             written with Mac line endings will mapped back from ISO to Mac format
709             on write().
710              
711             If your data uses any non-Latin-1 character sets, or binary data, or
712             you really want Mac upper-ASCII characters in memory, or you just
713             don't want this module messing with your encodings, set this option to
714             0 (Off) or make a subclass that always sets it to 0.
715              
716             See also the clean() methods that can help you translate just the
717             columns you want after reading a file or before writing it, which may
718             be faster for you if only a few fields might contain high-ASCII
719             characters.
720              
721             =item _FileName ||= undef;
722              
723             This is the name of the file that should be read from or WAS read
724             from. (read() will set _FileName to the value it used to read the
725             file, even if _FileName was only supplied as a named parameter.)
726              
727             This name will also be used, unless overridden, to re-write the file
728             again, but with an optional extension added. (See next setting.)
729              
730             =item _WriteExtension = ".out" unless exists
731              
732             The _WriteExtension is provided so that CTable won't overwrite your
733             input data file unless you tell it to.
734              
735             _WriteExtension will be added to the object's _FileName setting to
736             create a new, related file name, before writing.... UNLESS _FileName
737             is supplied as an direct or named parameter when calling write().
738              
739             In the latter case, write() uses the file name you supply and adds no
740             extension, even if this would mean overwriting the original data file.
741              
742             To add _WriteExtension, write() places it prior to any existing final
743             extension in the _FileName:
744              
745             _FileName default file name used by write()
746             --------------------------------------------------------------
747             People.merge.txt People.merge.out.txt
748             People People.out
749              
750             If you want to always overwrite the original file without having to
751             supply _FileName each time, simply set _WriteExtension to undef in a
752             subclass or in each instance.
753              
754             If _CacheOnWrite is true, then the _WriteExtension logic is applied
755             first to arrive at the actual name of the file to be written, and then
756             the _CacheExtension logic is applied to that name to arrive at the
757             name of the cache file to be written.
758              
759             =back
760              
761             =head2 Sorting-related parameters
762              
763             =over 4
764              
765             =item _SortOrder ||= undef;
766              
767             _SortOrder is the list of fields which should be used as primary,
768             secondary, etc. sort keys when sort() is called. Like other
769             parameters, it may be initialized by a subclass, stored in the object,
770             or provided as a named parameter on each call to sort().
771              
772             If _SortOrder is empty or undefined, then sort() sorts the records by
773             record number (i.e. they are returned to their "natural" order).
774              
775             =item _SortSpecs ||= {};
776              
777             _SortSpecs is a hash of specifications for the SortType and
778             SortDirection of fields on which sorting may be done. For any field
779             missing a sort spec or the SortType or SortDirection components of its
780             sort spec, the _DefaultSortType and _DefaultSortDirection settings
781             will be used. So, for example, if all fields are of type String and
782             you want them to sort Ascending, then you don't need to worry about
783             _SortSpecs. You only need to provide specs for fields that don't take
784             the default settings.
785              
786             _SortSpecs might look like this:
787              
788             {Age => {SortType => 'Integer'},
789             NameKey => {SortType => 'Text', SortDirection => -1}}
790              
791             =item _SRoutines ||= {};
792              
793             _SRoutines is a hash mapping any new SortTypes invented by you to your
794             custom subroutines for sorting that type of data. (See the section on
795             sort routines, below, for a full discussion.)
796              
797             =back
798              
799             =head2 Sorting defaults
800              
801             =over 4
802              
803             =item _DefaultSortType ||= 'String';
804              
805             If you sort using a field with no sort spec supplied, or whose sort
806             spec omits the SortType, it will get its SortType from this parameter.
807              
808             See the sections below on SORT TYPES and SORT ROUTINES.
809              
810             =item _DefaultSortDirection ||= 1;
811              
812             If you sort using a field with no sort spec supplied, or whose sort
813             spec omits the SortDirection, it will get its SortDirection from this
814             parameter.
815              
816             Legal sort directions are: 1 (Ascending) or -1 (Descending).
817              
818             See the section below on DEFAULT SORT DIRECTION.
819              
820             =back
821              
822             =head2 Miscellaneous parameters
823              
824             =over 4
825              
826             =item _ErrorMsg ||= "";
827              
828             This parameter is set by read() or write() methods that encounter an
829             error (usually a parameter error or file-system error) that prevents
830             them from completing. If those methods or any methods that call them
831             return a false value indicating failure, then _ErrorMsg will contain a
832             string explaining the problem. The message will also have been passed
833             to the progress() method for possible console feedback.
834              
835             =item _Subset
836              
837             This parameter is set to 1 (true) by read() if the last call to read()
838             brought in a subset of the fields available in the file; 0 otherwise.
839              
840             The object uses this field internally so it knows to abandon any cache
841             files that might not contain all requested fields upon read().
842              
843             =back
844              
845             =head1 SUBCLASSING
846              
847             Most subclasses will override initialize() to set default values for
848             the parameters of the parent class and then they may provide default
849             values for other subclass-specific parameters. Then, the subclass's
850             initialize() should call SUPER::initialize() to let the parent
851             class(es) take care of the remaining ones.
852              
853             Every initialize() method should always allow for parameters to have
854             already been provided by the $Params hash or template object. It
855             should not overwrite any valid values that already exist.
856              
857             The following sample subclass changes the default setting of the
858             _Progress parameter from undef to 1 and then overrides the
859             progress_default() method to log all progress messages into a new
860             "_ProgrLog" (progress log) parameter stored in the object.
861              
862             BEGIN
863             { ## Data::CTable::ProgressLogger: store messages in the object
864              
865             package Data::CTable::ProgressLogger;
866             use vars qw(@ISA); @ISA=qw(Data::CTable);
867              
868             sub initialize ## Add a new param; change one default
869             {
870             my $this = shift;
871             $this->{_Progress} = 1 unless exists($this->{_Progress});
872             $this->{_ProgrLog} ||= [];
873             $this->SUPER::initialize();
874             }
875              
876             sub progress_default ## Log message to object's ProgMsgs list
877             {
878             my $this = shift;
879             my ($msg) = @_;
880             chomp $msg;
881             push @{$this->{_ProgrLog}}, localtime() . " $msg";
882              
883             return(1);
884             }
885              
886             sub show_log ## Use Dumper to spit out the log list
887             {
888             my $this = shift;
889             $this->dump($this->{_ProgrLog});
890             }
891             }
892              
893             ## Later...
894              
895             my $Table = Data::CTable::ProgressLogger->new("mydata.txt");
896             # ... do stuff...
897             $Table->write();
898             $Table->show_log();
899              
900             =cut
901            
902             {}; ## Get emacs to indent correctly.
903              
904             sub new
905             {
906             ## First arg to new is always either class name or a template
907             ## object. This allows $obj->new() or CLASS->new().
908              
909             ## Second argument (if and only if it is a hash ref or an object
910             ## whose underlying representation is a hash ref) is an optional
911             ## anonymous hash of parameters which if supplied, will override
912             ## any parameters already found in the template object, if any.
913              
914             ## See the initialize method, below, for a list of parameters that
915             ## can be supplied (and will be defaulted for you if not
916             ## supplied).
917              
918             ## Note that the template object and the params hash will be
919             ## SHALLOWLY copied -- the original hash objects passed will not
920             ## be used, but any sub-structures within them will now "belong"
921             ## to the resulting new object which will feel free to manipulate
922             ## them, possibly invalidating the integrity of the original
923             ## template object.
924              
925 126     126 0 4971 my $ClassOrObj = shift;
926 126 100       1124 my ($Params) = {%{shift()}} if UNIVERSAL::isa($_[0], 'HASH');
  83         820  
927              
928             ## Shallow-copy all params from template object and/or optional
929             ## $Params hash into new hash. DON'T re-use caller's obj or hash.
930              
931 126 50       3711 my $this =
932 126 100       1454 {%{(UNIVERSAL::isa($ClassOrObj, 'HASH') ? $ClassOrObj : {})},
933 126         317 %{(UNIVERSAL::isa($Params, 'HASH') ? $Params : {})}};
934            
935             ## Bless the new object into the class
936            
937 126   33     1006 my $class = ref($ClassOrObj) || $ClassOrObj;
938 126         322 bless $this, $class;
939            
940 126         200 my $Success;
941              
942             ## Run the subclassable initialize() method to create default
943             ## settings for any private parameters.
944              
945 126 50       513 goto done unless $this->initialize();
946              
947             ## Finally, process any (other) arguments to new(), if any.
948            
949 126         333 my $RemainingArgs = [@_];
950            
951 126 50       702 goto done unless $this->process_new_args($RemainingArgs, $Params);
952            
953 126         194 $Success = 1;
954 126 50       1102 done:
955             return ($Success ? $this : undef);
956             }
957              
958             ### process_new_args
959              
960             ### Any optional remaining (non-HASH ref) arguments to new() are
961             ### treated as file names of files to open and append to the in-memory
962             ### table, creating new columns as necessary. We call the
963             ### subclassable append_files_new() method to process these.
964              
965             sub process_new_args
966             {
967 126     126 0 218 my $this = shift;
968 126         213 my ($RemainingArgs, $Params) = @_;
969              
970 126         189 my $Success;
971              
972 126 50       499 $Success = $this->append_files_new($RemainingArgs, $Params) or goto done;
973            
974 126         194 $Success = 1;
975 126         394 done:
976             return ($Success);
977             }
978              
979             ### initialize
980              
981             ### Assumptions made by initialize() (and all other methods, too):
982              
983             ### The blessed object is a hash ref.
984              
985             ### All hash keys beginning with _ are reserved for non-data columns.
986              
987             ### Hash keys beginning with a single _ are reserved for future
988             ### versions of this parent class implementation. Subclasses might
989             ### want to use double-underscore for additional slots.
990              
991             ### All other hash keys are field names; their values are data
992             ### columns (array references).
993              
994             ### initialize() sets / validates initial settings for all parameters
995             ### recognized by this parent class. It exercises caution to not
996             ### override any legal values previously set by the
997             ### subclass::initialize() or by new().
998              
999             sub initialize
1000             {
1001 126 50   126 0 1155 my $this = shift or goto done;
1002              
1003 126         204 my $Success;
1004              
1005             ## Reading / writing
1006              
1007 126   100     698 $this->{_FileName} ||= undef; ## Path of file that was read
1008              
1009 126 100       505 $this->{_WriteExtension} = ".out" unless exists($this->{_WriteExtension});
1010              
1011             ## Cache settings
1012              
1013 126 100       1102 $this->{_CacheOnRead} = 1 unless exists($this->{_CacheOnRead});
1014 126 100       458 $this->{_CacheOnWrite} = 0 unless exists($this->{_CacheOnWrite});
1015 126 100       480 $this->{_CacheExtension} = ".cache" unless exists($this->{_CacheExtension});
1016 126 100       456 $this->{_CacheSubDir} = "cache" unless exists($this->{_CacheSubDir});
1017              
1018             ## File format settings
1019              
1020 126   100     574 $this->{_LineEnding} ||= undef;
1021 126   100     697 $this->{_FDelimiter} ||= undef;
1022 126 100       500 $this->{_QuoteFields} = undef unless exists ($this->{_QuoteFields});
1023 126 100       441 $this->{_HeaderRow} = 1 unless exists ($this->{_HeaderRow});
1024              
1025             ## Return encodings
1026              
1027 126 100       11049 $this->{_ReturnMap} = 1 unless exists ($this->{_ReturnMap});
1028 126   100     803 $this->{_ReturnEncoding} ||= "\x0B"; ## Char to use for return chars
1029 126 100       480 $this->{_MacRomanMap} = undef unless exists ($this->{_MacRomanMap});
1030              
1031             ## Sorting defaults
1032              
1033 126   100     640 $this->{_DefaultSortType} ||= 'String';
1034 126   100     623 $this->{_DefaultSortDirection} ||= 1; ## Ascending (-1 = desc)
1035              
1036             ## Progress routine / setting
1037              
1038 126 100       549 $this->{_Progress} = undef unless exists ($this->{_Progress});
1039              
1040             ## Internal meta-structures
1041              
1042 126   100     551 $this->{_FieldList} ||= undef; ## List of fields; undef means all fields, alpha order
1043 126   100     684 $this->{_Selection} ||= undef; ## List of rec #s; undef means all records, natural order
1044 126   50     676 $this->{_SortOrder} ||= undef; ## List of fields; undef/empty means sort by record number
1045              
1046 126   100     591 $this->{_SortSpecs} ||= {}; ## Hash: FieldName => Sortspec
1047 126   100     702 $this->{_SRoutines} ||= {}; ## Hash: SortType => custom sort routine for type
1048              
1049             ## Miscellaneous
1050              
1051 126   50     675 $this->{_ErrorMsg} ||= ""; ## Explains last read/write failure
1052 126   50     737 $this->{_Subset} ||= 0; ## Flag indicating subset of available fields were read
1053 126   100     601 $this->{_IgnoreQuotes} ||= 0;
1054              
1055 126         178 $Success = 1;
1056 126         452 done:
1057             return($Success);
1058             }
1059              
1060             =pod
1061              
1062             =head1 FIELD LIST
1063              
1064             ## Getting / setting the object's _FieldList
1065              
1066             $t->fieldlist() ## Get _FieldList or fieldlist_all()
1067             $t->fieldlist_get()
1068             $t->fieldlist_hash() ## Get fieldlist() as keys in a hash
1069            
1070             $t->fieldlist_all() ## Get all fields (ignore _FieldList)
1071            
1072             $t->fieldlist($MyList) ## Set field list (_FieldList param)
1073             $t->fieldlist_set($MyList)
1074              
1075             $t->fieldlist(0) ## Remove field list (use default)
1076             $t->fieldlist_set()
1077              
1078             $t->fieldlist_force($MyList)## Set list; remove non-matching cols
1079              
1080             $t->fieldlist_truncate() ## Just remove nonmatching cols
1081              
1082             $t->fieldlist_default() ## Default field list (alpha-sorted)
1083              
1084             $t->fieldlist_add($MyName) ## Append new name to custom list.
1085             $t->fieldlist_delete($MyName) ## Delete name from custom list.
1086              
1087             A CTable object can optionally have a custom field list. The custom
1088             field list can store both the ORDER of the fields (which otherwise
1089             would be unordered since they are stored as keys in a hash), and also
1090             can be a subset of the fields actually in the object, allowing you to
1091             temporarily ignore certain effectively-hidden fields for the benefit
1092             of certain operations. The custom field list can be changed or
1093             removed at any time.
1094              
1095             The custom field list is stored in the private _FieldList parameter.
1096              
1097             fieldlist() always returns a list (reference). The list is either the
1098             same list as _FieldList, if present, or it is the result of calling
1099             fieldlist_default(). In CTable, fieldlist_default() in turn calls
1100             fieldlist_all() -- hence fieldlist() would yield an auto-generated
1101             list of all fields in alphabetical order.
1102              
1103             fieldlist_all() can be called directly to get a list of all fields
1104             present regardless of the presence of a _FieldList parameter. The
1105             list is an alphabetical case-insensitively sorted list of all hash
1106             keys whose names do not begin with an underscore.
1107              
1108             You could override this method if you want a different behavior. Or,
1109             you could create your own custom field list by calling fieldlist_all()
1110             and removing fields or ordering them differently.
1111              
1112             To set a custom field list (in _FieldList), call fieldlist() or
1113             fieldlist_set() with a list (reference). The list must be a list of
1114             strings (field names) that do not begin with underscore. The object
1115             owns the list you supply.
1116              
1117             To remove a custom field list (and let the default be used), call
1118             fieldlist(0) or fieldlist_set() with no arguments (these will return
1119             the fieldlist that was deleted, if any).
1120              
1121             fieldlist_freeze() "freezes" the fieldlist in its current state. This
1122             is equivalent to the following:
1123              
1124             $t->fieldlist_set($t->fieldlist());
1125              
1126             ... which would force the fieldlist to $t->fieldlist_all() if and only
1127             if there is not already a custom _FieldList present.
1128              
1129             IMPORTANT NOTE ABOUT PARTIAL FIELD LISTS: When setting a field list,
1130             the object ensures that all fields (columns) mentioned in the list are
1131             present in the object -- it creates empty columns of the correct
1132             length as necessary. However, it does NOT delete any fields not
1133             mentioned in the field list. This allows you to manipulate the field
1134             list in order to have certain fields be temporarily ignored by all
1135             other methods, then alter, restore, or remove it (allow it to revert
1136             to default) and they will be effectively unhidden again. Some methods
1137             (such as cols(), write(), etc.) also allow you to specify a custom
1138             field list that will override any other list just during the execution
1139             of that method call but will not modify the object itself.
1140              
1141             Call fieldlist_force() to set the list AND have any non-listed fields
1142             also deleted at the same time (by calling fieldlist_truncate()
1143             internally). You can also just delete individual columns one-by-one,
1144             of course, using the column-manipulation methods and the custom
1145             fieldlist, if any, will be appropriately updated for you.
1146              
1147             fieldlist_truncate() deletes any fields found in the table but not
1148             currently present in _FieldList. A hash of the deleted columns is
1149             returned to the caller. If there is no _FieldList, then this method
1150             does nothing.
1151              
1152             fieldlist_default() just calls fieldlist_all() in this implementation,
1153             but could be changed in subclasses.
1154              
1155             fieldlist_add() is the internal method that adds a new field name to
1156             the custom field list (if present) and if the field name was not
1157             already on the list. It is called by other methods any time a new
1158             column is added to the table. Don't call it directly unless you know
1159             what you're doing because the corresponding column won't be created.
1160             (Instead, use col().) The field name is appended to the end of the
1161             existing custom field list. If there is no custom field list, nothing
1162             is done.
1163              
1164             fieldlist_delete() is the internal method that deletes a field name
1165             from the custom field list (if present). It is called by other
1166             methods when columns are deleted, but it does not actually delete the
1167             columns themselves, so use with caution: deleting a field from the
1168             custom field list effectively hides the field. This method has no
1169             effect, however, if there is no custom field list present. So don't
1170             call this method directly unless you know what you're doing.
1171              
1172             =cut
1173              
1174             sub fieldlist_all
1175             {
1176 223     223 0 334 my $this = shift;
1177 223         1706 my $FieldList = [sort {lc($a) cmp lc($b)} grep {!/^_/} keys %$this];
  997         3159  
  5949         18474  
1178            
1179 223         1301 return($FieldList);
1180             }
1181              
1182             sub fieldlist_default ## Same as fieldlist_all() in this class.
1183             {
1184 40     40 0 51 my $this = shift;
1185 40         88 my $FieldList = $this->fieldlist_all();
1186              
1187 40         364 return($FieldList);
1188             }
1189              
1190             sub fieldlist
1191             {
1192 820     820 0 1093 my $this = shift;
1193 820         1954 my ($FieldList) = @_;
1194              
1195             ## Set if specified.
1196 820 100       4212 $this->fieldlist_set($FieldList) if defined($FieldList);
1197              
1198             ## Get and return.
1199 820         1756 $FieldList = $this->fieldlist_get();
1200              
1201 820         2261 return($FieldList);
1202             }
1203              
1204             sub fieldlist_get
1205             {
1206 820     820 0 1394 my $this = shift;
1207 820   66     2728 my $FieldList = $this->{_FieldList} || $this->fieldlist_default();
1208              
1209 820         1404 return($FieldList);
1210             }
1211              
1212             sub fieldlist_hash ## ([$FieldList])
1213             {
1214 113     113 0 168 my $this = shift;
1215 113         169 my ($FieldList) = @_;
1216 113   66     539 $FieldList ||= $this->fieldlist();
1217 113         197 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  113         448  
1218            
1219 113         264 return($FieldHash);
1220             }
1221              
1222             sub fieldlist_set
1223             {
1224 36     36 0 64 my $this = shift;
1225 36         59 my ($FieldList) = @_;
1226              
1227 36         242 return($this->fieldlist_set_internal($FieldList, 0));
1228             }
1229              
1230             sub fieldlist_freeze
1231             {
1232 0     0 0 0 my $this = shift;
1233 0         0 return($this->fieldlist_set($this->fieldlist()));
1234             }
1235              
1236             sub fieldlist_force
1237             {
1238 3     3 0 6 my $this = shift;
1239 3         5 my ($FieldList) = @_;
1240              
1241 3         8 return($this->fieldlist_set_internal($FieldList, 1));
1242             }
1243              
1244             sub fieldlist_set_internal
1245             {
1246 39     39 0 95 my $this = shift;
1247 39         198 my ($FieldList, $Force) = @_;
1248              
1249 39 100       151 if (ref($FieldList) eq 'ARRAY')
1250             {
1251              
1252             ## Whether forcing or not, ensure all fields mentioned in the
1253             ## list actually exist and are the correct length.
1254 33         100 $this->fieldlist_check($FieldList);
1255              
1256             ## Set the custom list
1257 33         574 $this->{_FieldList} = [@$FieldList];
1258            
1259             ## In "force" mode, remove any non-listed columns.
1260 33 100       129 $this->fieldlist_truncate() if ($Force);
1261             }
1262             else
1263             {
1264             ## Remove the custom field list.
1265 6         21 $FieldList = delete $this->{_FieldList};
1266             }
1267              
1268 39         181 return($FieldList); ## Return the one that was set or deleted.
1269             }
1270              
1271             sub fieldlist_check
1272             {
1273 33     33 0 59 my $this = shift;
1274 33         41 my ($FieldList) = @_;
1275            
1276 33   33     100 $FieldList ||= $this->fieldlist();
1277              
1278             ## Visit each field name in the current list. Make sure it is
1279             ## present.
1280              
1281 33         77 foreach my $FieldName (@$FieldList)
1282             {
1283             ## The col method will the column exist if not present.
1284 100         222 $this->col($FieldName);
1285             }
1286             }
1287              
1288             sub fieldlist_truncate
1289             {
1290 3     3 0 4 my $this = shift;
1291              
1292 3         9 my $FieldList = $this->fieldlist();
1293 3         7 my $AllFields = $this->fieldlist_all();
1294 3         6 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  3         21  
1295              
1296 3         5 my $DeletedCols = {};
1297            
1298 3         7 foreach my $FieldName (@$AllFields)
1299             {
1300 8 100       22 if (!exists($FieldHash->{$FieldName}))
1301             {
1302 3         8 $DeletedCols->{$FieldName} = delete $this->{$FieldName};
1303             }
1304             }
1305            
1306 3         13 return($DeletedCols);
1307             }
1308              
1309             sub fieldlist_add
1310             {
1311 43     43 0 58 my $this = shift;
1312 43         107 my ($FieldName) = @_;
1313            
1314 43 100       184 if (ref($this->{_FieldList}) eq 'ARRAY')
1315             {
1316 34         60 my $FieldList = $this->{_FieldList};
1317 34         62 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  34         212  
1318            
1319 34 50       113 if (!exists($FieldHash->{$FieldName}))
1320             {
1321 34         144 push @$FieldList, $FieldName;
1322             }
1323             }
1324             }
1325              
1326             sub fieldlist_delete
1327             {
1328 3     3 0 7 my $this = shift;
1329 3         6 my ($FieldName) = @_;
1330            
1331 3 50       15 if (ref($this->{_FieldList}) eq 'ARRAY')
1332             {
1333 3         5 $this->{_FieldList} = [grep {$_ ne $FieldName} @{$this->{_FieldList}}];
  11         33  
  3         30  
1334             }
1335             }
1336              
1337             =pod
1338              
1339             =head1 DATA COLUMNS (FIELD DATA)
1340              
1341             ## Getting or setting data in entire columns
1342              
1343             $t->{$ColName} ## Get a column you know exists
1344             $t->col($ColName) ## Get a column or make empty one.
1345             $t->col_get($ColName)
1346              
1347             $t->col($ColName, $ListRef) ## Set all of a column all at once.
1348             $t->col_set($ColName, $ListRef)
1349             $t->col_force($ColName, $ListRef) ## Add but don't check size or
1350             ## add to custom field list
1351              
1352             $t->col_set($ColName, undef) ## Delete a column completely
1353             $t->col_delete($ColName)
1354              
1355             $t->col_empty() ## An empty col presized for table
1356             $t->col_empty(22) ## An empty col of another length
1357             $t->col_empty($Col) ## An empty col sized to match another
1358              
1359             $t->col_default() ## Default if req. column not found.
1360              
1361             $t->col_exists($Field) ## Check existence of column
1362             $t->col_active($Field) ## Restrict check to fieldlist()
1363              
1364             $t->cols($ColList) ## Get list of multiple named columns
1365             $t->cols_hash($ColList) ## Get hash " " "
1366              
1367             $t->col_rename($Old => $New) ## Change name of columns
1368             $t->col_rename($Old1 => $New1, $Old2 => $New2) ## Change several
1369              
1370             A "column" is a field in the table and all its data. The column's
1371             field name is a key in the object itself, and may also optionally be
1372             listed in a custom field list if present. The column's data is the
1373             key's value in the hash and is an array ref of values presumed to be
1374             of the same data type (e. g. string, integer, etc.)
1375              
1376             Sometimes the terms "column" and "field" are used interchangeably in
1377             this documentation.
1378              
1379             If you already know that a column exists (because you got it from the
1380             fieldlist() method and you've not previously manipulated _FieldList
1381             directly but instead carefully used the method calls available for
1382             that), then you can safely get the column by just looking it up in the
1383             object itself.
1384              
1385             The col() method does the same thing, but forces the column to spring
1386             into existence if it did not already (which can also have the
1387             potentially unwanted side-effect of hiding coding errors in which you
1388             retreive mis-named columns: so beware). Columns brought into
1389             existence this way will automatically be pre-sized (i.e. they will
1390             will be created and set to whatever col_default() returns).
1391              
1392             The col() or col_set() methods can also be used to set a column. When
1393             the column is set, the list you pass is automatically sized
1394             (lengthened or truncated) to match the current length of the table.
1395             If this is not what you want, then call col_force() which will not
1396             check whether the new column matches the size of the others.
1397              
1398             No matter how you set it, the object now "owns" the list you gave it.
1399              
1400             As a convenience, col(), col_set() and col_force() return the column
1401             that was set. They silently discard any previous column.
1402              
1403             All three methods of column setting will append the column to the
1404             custom field list if one is present and the column name is not already
1405             listed there (by calling fieldlist_add()). They will also call the
1406             extend() method to ensure all columns have the same length (either
1407             others will be extended to match the length of the new one, or the new
1408             one will be extended to match the length of the others).
1409              
1410             col_delete() deletes a column.
1411              
1412             col_empty() returns an anonymous list reference that is pre-sized to
1413             the length of the table (by default). You could use it to get an
1414             empty column that you intend to fill up and then later insert into the
1415             table or use to hold the results of an operation on other columns. If
1416             you want a different length, specify it as a number or as an array ref
1417             whose length should be matched.
1418              
1419             col_default() is the internal method that implements the "springing
1420             into existence" of missing columns. Currently it just calls
1421             col_empty(). Other subclasses might want to have it return undef or a
1422             string like "NO_SUCH_COLUMN" in order to help track programming errors
1423             where nonexistent columns are requested.
1424              
1425             cols($FieldList) returns an ordered list of the requested column
1426             names. If no list is given, then fieldlist() is used.
1427              
1428             cols_hash($FieldList) does the same as cols(), but the result is a
1429             hash whose keys are the field names and whose values are the columns
1430             -- much like the original object itself, but not blessed into the
1431             class. The resulting hash, however, could be used as the prototype
1432             for a new Data::CTable object (by calling the new() method). However,
1433             be warned that both objects will think they "own" the resulting shared
1434             so be careful what you do..... which brings us to this:
1435              
1436             IMPORTANT NOTE ABOUT GETTING COLUMNS: The columns you retrieve from a
1437             table are still "owned" by the table object as long as it lives. If
1438             you modify them, you are modifying the table's data. If you change
1439             their length, then you may be invalidating the table's own
1440             expectations that all its columns have the same length. So beware.
1441              
1442             Just make yourself a copy of the data if that isn't what you want.
1443             For example, instead of this:
1444              
1445             my $Foo = $Table->col('Foo'); ## Reference to actual column
1446              
1447             Do this:
1448              
1449             my $Foo = [@{$Table->col('Foo')}]; ## Shallow copy of the column
1450              
1451             =cut
1452              
1453             sub col ## ($ColName, [$Vector])
1454             {
1455 1455     1455 0 1880 my $this = shift;
1456 1455         1916 my ($ColName, $Vector) = @_;
1457              
1458             ## Set if specified.
1459 1455 100       11010 my $FoundVector = $this->col_set($ColName, $Vector) if defined($Vector);
1460              
1461             ## Get and return.
1462             ## If not specified, create it with col_default()
1463 1455         2630 my $Col = $this->col_get($ColName);
1464              
1465 1455         5006 return($Col);
1466             }
1467              
1468             sub col_get
1469             {
1470 1458     1458 0 1572 my $this = shift;
1471 1458         3024 my ($ColName) = @_;
1472              
1473 1458   66     3552 my $Col = ($this->{$ColName} || $this->col_add($ColName));
1474              
1475 1458         2721 return($Col);
1476             }
1477              
1478             sub col_add
1479             {
1480 28     28 0 38 my $this = shift;
1481 28         38 my ($ColName) = @_;
1482 28         963 my $Col = $this->{$ColName} = $this->col_empty();
1483              
1484 28         78 $this->fieldlist_add($ColName);
1485 28         88 return($Col);
1486             }
1487              
1488             sub col_set_internal ## ($ColName, [$Vector], [$Force])
1489             {
1490 16     16 0 21 my $this = shift;
1491 16         27 my ($ColName, $Vector, $Force) = @_;
1492              
1493 16         43 my $Valid = (ref($Vector) eq 'ARRAY');
1494 16         41 my $Existing = (ref($this->{$ColName}) eq 'ARRAY');
1495            
1496             ## Delete existing vector by this name...
1497 16 100 66     1398 if (!$Valid && $Existing)
    100 66        
    50          
1498             {
1499 1         4 $Vector = delete $this->{$ColName}; ## Delete and save to return to caller.
1500 1         5 $this->fieldlist_delete($ColName); ## Delete from field list if needed.
1501             }
1502             ## ...or add one...
1503             elsif ($Valid && !$Existing)
1504             {
1505 11         34 $this->{$ColName} = $Vector;
1506              
1507 11 50       31 if (!$Force)
1508             {
1509 11         41 $this->extend(); ## Extend all vectors as needed to ensure same length.
1510 11         52 $this->fieldlist_add($ColName); ## Add to custom field list if needed.
1511             }
1512             }
1513             ## ...otherwise replace.
1514             elsif ($Valid)
1515             {
1516 4         8 $this->{$ColName} = $Vector;
1517              
1518 4 50       15 if (!$Force)
1519             {
1520 4         14 $this->extend(); ## Extend all vectors as needed to ensure same length.
1521             }
1522             }
1523            
1524 16         51 return($Vector); ## Return added or deleted vector for convenience.
1525             }
1526              
1527             sub col_delete ## ($ColName)
1528             {
1529 1     1 0 3 my $this = shift;
1530 1         2 my ($ColName) = @_;
1531              
1532 1         5 return($this->col_set_internal($ColName));
1533             }
1534              
1535             sub col_set ## ($ColName, $Vector)
1536             {
1537 15     15 0 25 my $this = shift;
1538 15         142 my ($ColName, $Vector) = @_;
1539              
1540 15         56 return($this->col_set_internal($ColName, $Vector));
1541             }
1542              
1543             sub col_force ## ($ColName, $Vector)
1544             {
1545 0     0 0 0 my $this = shift;
1546 0         0 my ($ColName, $Vector) = @_;
1547              
1548 0         0 return($this->col_set_internal($ColName, $Vector, 1));
1549             }
1550              
1551             sub col_empty
1552             {
1553 38     38 0 93 my $this = shift;
1554 38         51 my ($Length) = @_;
1555              
1556             ## Default to table length. Or get length from sample column.
1557 38 100       179 $Length = $this->length() unless defined($Length);
1558 38 100       279 $Length = @$Length if ref($Length) eq 'ARRAY';
1559              
1560 38         82 my $Col = [];
1561 38         141 $#$Col = $Length - 1;
1562              
1563 38         108 return($Col);
1564             }
1565              
1566             sub col_default
1567             {
1568 1     1 0 3 my $this = shift;
1569 1         4 my $Col = $this->col_empty();
1570              
1571 1         13 return($Col);
1572             }
1573              
1574             sub cols ## ($ColNames)
1575             {
1576 33     33 0 58 my $this = shift;
1577 33         47 my ($ColNames) = @_;
1578 33   66     101 $ColNames ||= $this->fieldlist();
1579 33         88 my $Cols = [map {$this->col($_)} @$ColNames];
  106         229  
1580              
1581 33         6896 return($Cols);
1582             }
1583              
1584             sub cols_hash ## ($ColNames)
1585             {
1586 25     25 0 50 my $this = shift;
1587 25         43 my ($ColNames) = @_;
1588 25   66     108 $ColNames ||= $this->fieldlist();
1589 25         243 my $Cols = $this->cols($ColNames);
1590 25         51 my $ColsHash = {}; @$ColsHash{@$ColNames} = @$Cols;
  25         107  
1591              
1592 25         249 return($ColsHash);
1593             }
1594              
1595             sub col_exists ## ($ColName, [$FieldList])
1596             {
1597 24     24 0 47 my $this = shift;
1598 24         43 my ($ColName, $FieldList) = @_;
1599              
1600             ## Default list to search is ALL fields in object.
1601 24   66     171 $FieldList ||= $this->fieldlist_all();
1602              
1603             ## Disallow column names starting with underscore.
1604 24 50       67 return(0) if $ColName =~ /^_/;
1605              
1606 24         140 my $FieldHash = $this->fieldlist_hash($FieldList);
1607 24         52 my $Exists = exists($FieldHash->{$ColName});
1608              
1609 24         133 return($Exists);
1610             }
1611              
1612             sub col_active ## ($ColName, [$FieldList])
1613             {
1614 8     8 0 17 my $this = shift;
1615 8         12 my ($ColName, $FieldList) = @_;
1616              
1617             ## Default list to search is only ACTIVE fields in object.
1618 8   33     31 $FieldList ||= $this->fieldlist();
1619              
1620             ## Disallow column names starting with underscore.
1621 8 50       20 return(0) if $ColName =~ /^_/;
1622              
1623 8         19 my $FieldHash = $this->fieldlist_hash($FieldList);
1624 8         17 my $Exists = exists($FieldHash->{$ColName});
1625              
1626 8         79 return($Exists);
1627             }
1628              
1629             sub col_rename ## ($Old => $New, [$Old => New...])
1630             {
1631 3     3 0 19 my $this = shift;
1632            
1633 3         5 my $Success;
1634              
1635 3         10 my $Fields = $this->fieldlist_all();
1636              
1637 3         6 my ($Old, $New);
1638 3         19 while (($Old, $New) = splice(@_, 0, 2))
1639             {
1640 3 50       16 $this->warn("Invalid column name: $New"), next
1641             unless ($New =~ /^[^_]+/);
1642            
1643 3 50       13 $this->warn("Column to be renamed does not exist: $Old"), next
1644             unless $this->col_exists($Old, $Fields);
1645            
1646 3 50 0     11 (($Old ne $New) && $this->warn("Failed to rename column $Old to $New: $New exists.")), next
1647             if $this->col_exists($New, $Fields);
1648            
1649 3         9 my $Col = $this->col($Old); ## Creates if not present.
1650            
1651             ## Rename the column...
1652 3         10 $this->{$New} = delete $this->{$Old};
1653            
1654             ## Then make the same change to _FieldList, _SortOrder, _SortSpecs
1655            
1656 3 100       14 $this->{_FieldList} = [map {$_ = $New if $_ eq $Old; $_} @{$this->{_FieldList}}] if (defined($this->{_FieldList}));
  10 50       25  
  10         24  
  3         7  
1657 3 0       13 $this->{_SortOrder} = [map {$_ = $New if $_ eq $Old; $_} @{$this->{_SortOrder}}] if (defined($this->{_SortOrder}));
  0 50       0  
  0         0  
  0         0  
1658 3 50 33     31 $this->{_SortSpecs}->{$New} = delete $this->{_SortSpecs}->{$Old} if (defined($this->{_SortSpecs}) &&
1659             ( $this->{_SortSpecs}->{$Old}));
1660             }
1661            
1662 3         7 $Success = 1;
1663 3         11 done:
1664             return($Success);
1665             }
1666              
1667             =pod
1668              
1669             =head1 CLEANUP AND VALIDATION
1670              
1671             ## Performing your own cleanups or validations
1672              
1673             $t->clean($Sub) ## Clean with custom subroutine
1674             $t->clean($Sub, $Fields) ## Clean specified columns only
1675              
1676             ## Cleaning whitespace
1677              
1678             $t->clean_ws() ## Clean whitespace in fieldlist() cols
1679             $t->clean_ws($Fields) ## Clean whitespace in specified cols
1680              
1681             ## Cleaning methods that map character sets
1682              
1683             $t->clean_mac_to_iso8859()
1684             $t->clean_mac_to_iso8859($Fields)
1685              
1686             $t->clean_iso8859_to_mac()
1687             $t->clean_iso8859_to_mac($Fields)
1688              
1689             ## Character mapping utilities (not methods)
1690              
1691             use Data::CTable qw(
1692             ISORoman8859_1ToMacRoman
1693             MacRomanToISORoman8859_1
1694              
1695             ISORoman8859_1ToMacRoman_clean
1696             MacRomanToISORoman8859_1_clean
1697             );
1698              
1699             &ISORoman8859_1ToMacRoman(\ $Str) ## Pass pointer to buffer
1700             &MacRomanToISORoman8859_1(\ $Str) ## Pass pointer to buffer
1701              
1702             &ISORoman8859_1ToMacRoman_clean() ## Operates on $_
1703             &MacRomanToISORoman8859_1_clean() ## Operates on $_
1704              
1705             One of the most important things you can do with your data once it's
1706             been placed in a table in Perl is to use the power of Perl to scrub it
1707             like crazy.
1708              
1709             The built-in clean_ws() method applies a standard white-space cleanup
1710             to selected records in every field in the fieldlist() or other list of
1711             fields you optionally supply (such as fieldlist_all()).
1712              
1713             It does the following cleanups that are deemed correct for the
1714             majority of data out there:
1715              
1716             - Remove all leading whitespace, including returns (\n)
1717             - Remove all trailing whitespace, including returns (\n)
1718             - Convert runs of spaces to a single space
1719             - Convert empty string values back to undef to save space
1720              
1721             Of course, depending on your data, clean_ws() might just be the first
1722             thing you do in your cleanup pass. There might be many more cleanups
1723             you'd like to apply.
1724              
1725             clean() is like clean_ws() except you supply as the first argument
1726             your own cleaning subroutine (code reference). It should do its work
1727             by modifying $_.
1728              
1729             Both clean_ws() and clean() apply cleaning ONLY to selected records.
1730             If this isn't what you want, then select_all() before cleaning.
1731              
1732             Since a cleanup subroutine can do ANY modifications to a field that it
1733             likes, you can imagine some cleanup routines that also supply default
1734             values and do other validations.
1735              
1736             For example, a cleanup routine could convert every value in each field
1737             to an integer, or apply minimum or maximum numerical limits:
1738              
1739             sub {$_ = int($_) }
1740             sub {$_ = max(int($_), 0) }
1741             sub {$_ = min(int($_), 200)}
1742              
1743             Or your cleanup routine could use regular expressions to do
1744             capitalizations or other regularizations of data:
1745              
1746             sub Capitalize {/\b([a-z])([a-z]+)\b)/\U$1\E$2/g}
1747              
1748             $t->clean(\ &Capitalize , ['FirstName', 'LastName']);
1749             $t->clean(\ &PhoneFormat, ['Phone', 'Fax' ]);
1750             $t->clean(\ &LegalUSZip, ['HomeZip', 'WorkZip' ]);
1751              
1752             ... and so on. Cleanups are easy to write and quick and easy to apply
1753             with Data::CTable. Do them early! Do them often!
1754              
1755             =head2 Hints for writing cleanup routines
1756              
1757             If your cleanup routine may be used to clean up fields that could be
1758             empty/undef and empty/undef is a legal value, it should not touch any
1759             undef values (unintentionally converting them to strings).
1760              
1761             Finally, instead of setting any values to the empty string, it should
1762             set them to undef instead. This includes any values it might have
1763             left empty during cleanup. (Using undef instead of empty string to
1764             represent empty values is one way that Data::CTable likes to save
1765             memory in tables that may have lots of those.)
1766              
1767             For an example of a well-behaved cleanup routine, consider the
1768             following implementation of the builtin CleanWhitespace behavior:
1769              
1770             sub CleanWhitespace
1771             {
1772             return unless defined; ## Empty/undef values stay that way
1773             s/ \s+$//sx; ## Remove trailing whitespace
1774             s/^\s+ //sx; ## Remove leading whitespace
1775             s/ +/ /g; ## Runs of spaces to single space
1776             $_ = undef unless length; ## (Newly?) empty strings to undef
1777             }
1778              
1779             =head2 Roman character set mapping
1780              
1781             The character set mapping cleanup routines can be used to convert
1782             upper-ASCII characters bidirectionally between two popular Roman
1783             Character sets -- Mac Roman 1 and ISO 8859-1 (also sometimes called
1784             ISO Latin 1) -- i.e. the Western European Roman character sets.
1785              
1786             By default, read() converts all incoming data fields in data files
1787             with Mac line endings to ISO format when reading in. Conversely,
1788             write() does the reverse mapping (ISO to Mac) when writing a file with
1789             Mac line endings.
1790              
1791             However, you may wish to turn off these default behaviors and instead
1792             apply the mappings manually, possibly just to certain fields.
1793              
1794             For example, if a table contains fields with non-Roman character sets,
1795             you would definitely not want to apply these mappings, and instead
1796             might want to apply some different ones that you create yourself.
1797              
1798             =head2 Utility routines for character mapping
1799              
1800             This module can optionally export four utility routines for mapping
1801             character Latin 1 character sets. Always be sure to map the correct
1802             direction -- otherwise you'll end up with garbage! Be careful to only
1803             pass Western Roman strings -- not double-byte strings or strings
1804             encoded in any single-byte Eastern European Roman or non-Roman
1805             character set.
1806              
1807             &ISORoman8859_1ToMacRoman(\ $Str) ## Pass pointer to buffer
1808             &MacRomanToISORoman8859_1(\ $Str) ## Pass pointer to buffer
1809              
1810             These routines translate characters whose values are 128-255 from one
1811             Western Roman encoding to another. The argument is a string buffer of
1812             any size passed by reference.
1813              
1814             The functions return a count of the number of characters that were
1815             mapped (zero or undef if none were).
1816              
1817             &ISORoman8859_1ToMacRoman_clean() ## Operates on $_
1818             &MacRomanToISORoman8859_1_clean() ## Operates on $_
1819              
1820             These routines are variants of the above, but they're versions that
1821             are compatible with clean() -- they operate on $_ and will take care
1822             to leave undefined values undefined. They do not have return values.
1823              
1824             =head2 More advanced cleaning and validation
1825              
1826             Unfortunately, clean() only lets you operate on a single field value
1827             at a time -- and there's no way to know the record number or other
1828             useful information inside the cleaning routine.
1829              
1830             For really powerful cleaning and validation involving access to all
1831             fields of a record as well as record numbers, see the discussion of
1832             the calc() method and other methods for doing complex field
1833             calculations in the next section.
1834              
1835             =cut
1836              
1837             sub clean
1838             {
1839 8     8 0 13 my $this = shift;
1840 8         13 my ($Sub, $Fields) = @_;
1841            
1842             ## Default is fields in the list.
1843 8   33     36 $Fields ||= $this->fieldlist();
1844              
1845 8         25 my $Sel = $this->selection();
1846            
1847 8         19 foreach (@$Fields) {foreach (@{$this->col($_)}[@$Sel]) {&$Sub()}};
  38         67  
  38         74  
  114         222  
1848 8         35 return 1;
1849             }
1850              
1851             sub clean_ws
1852             {
1853 6     6 0 28 my $this = shift;
1854 6         25 return($this->clean(\ &CleanWhitespace, @_));
1855             }
1856              
1857             sub CleanWhitespace
1858             {
1859 84 50   84 0 151 return unless defined; ## Empty/undef values stay that way
1860 84         313 s/ \s+$//sx; ## Remove trailing whitespace
1861 84         138 s/^\s+ //sx; ## Remove leading whitespace
1862 84         145 s/ +/ /g; ## Runs of spaces to single space
1863 84 50       224 $_ = undef unless length; ## (Newly?) empty strings to undef
1864             }
1865              
1866             sub clean_mac_to_iso8859
1867             {
1868 1     1 0 1033 my $this = shift;
1869 1         6 return($this->clean(\ &MacRomanToISORoman8859_1_clean, @_));
1870             }
1871              
1872             sub clean_iso8859_to_mac
1873             {
1874 0     0 0 0 my $this = shift;
1875 0         0 return($this->clean(\ &ISORoman8859_1ToMacRoman_clean, @_));
1876             }
1877              
1878             sub ISORoman8859_1ToMacRoman
1879             {
1880 20     20 0 23 return($ {$_[0]} =~
  20         49  
1881            
1882             tr/\x80-\xFF/\xDE\xDF\xE2\xC4\xE3\xC9\xA0\xE0\xF6\xE4\xBA\xDC\xCE\xAD\xB3\xB2\xB0\xD4\xD5\xD2\xD3\xA5\xF8\xD1\xF7\xAA\xF9\xDD\xCF\xF0\xDA\xD9\xCA\xC1\xA2\xA3\xDB\xB4\xF5\xA4\xAC\xA9\xBB\xC7\xC2\xD0\xA8\xC3\xA1\xB1\xFA\xFE\xAB\xB5\xA6\xE1\xFC\xFF\xBC\xC8\xC5\xFD\xFB\xC0\xCB\xE7\xE5\xCC\x80\x81\xAE\x82\xE9\x83\xE6\xE8\xED\xEA\xEB\xEC\xC6\x84\xF1\xEE\xEF\xCD\x85\xD7\xAF\xF4\xF2\xF3\x86\xB7\xB8\xA7\x88\x87\x89\x8B\x8A\x8C\xBE\x8D\x8F\x8E\x90\x91\x93\x92\x94\x95\xB6\x96\x98\x97\x99\x9B\x9A\xD6\xBF\x9D\x9C\x9E\x9F\xBD\xB9\xD8/);
1883              
1884             }
1885              
1886             sub ISORoman8859_1ToMacRoman_clean
1887             {
1888 0 0   0 0 0 return unless defined; ## Empty/undef values stay that way
1889              
1890 0         0 tr/\x80-\xFF/\xDE\xDF\xE2\xC4\xE3\xC9\xA0\xE0\xF6\xE4\xBA\xDC\xCE\xAD\xB3\xB2\xB0\xD4\xD5\xD2\xD3\xA5\xF8\xD1\xF7\xAA\xF9\xDD\xCF\xF0\xDA\xD9\xCA\xC1\xA2\xA3\xDB\xB4\xF5\xA4\xAC\xA9\xBB\xC7\xC2\xD0\xA8\xC3\xA1\xB1\xFA\xFE\xAB\xB5\xA6\xE1\xFC\xFF\xBC\xC8\xC5\xFD\xFB\xC0\xCB\xE7\xE5\xCC\x80\x81\xAE\x82\xE9\x83\xE6\xE8\xED\xEA\xEB\xEC\xC6\x84\xF1\xEE\xEF\xCD\x85\xD7\xAF\xF4\xF2\xF3\x86\xB7\xB8\xA7\x88\x87\x89\x8B\x8A\x8C\xBE\x8D\x8F\x8E\x90\x91\x93\x92\x94\x95\xB6\x96\x98\x97\x99\x9B\x9A\xD6\xBF\x9D\x9C\x9E\x9F\xBD\xB9\xD8/;
1891             }
1892              
1893             sub MacRomanToISORoman8859_1
1894             {
1895 61     61 0 87 return($ {$_[0]} =~
  61         163  
1896            
1897             tr/\x80-\xFF/\xC4\xC5\xC7\xC9\xD1\xD6\xDC\xE1\xE0\xE2\xE4\xE3\xE5\xE7\xE9\xE8\xEA\xEB\xED\xEC\xEE\xEF\xF1\xF3\xF2\xF4\xF6\xF5\xFA\xF9\xFB\xFC\x86\xB0\xA2\xA3\xA7\x95\xB6\xDF\xAE\xA9\x99\xB4\xA8\x8D\xC6\xD8\x90\xB1\x8F\x8E\xA5\xB5\xF0\xDD\xDE\xFE\x8A\xAA\xBA\xFD\xE6\xF8\xBF\xA1\xAC\xAF\x83\xBC\xD0\xAB\xBB\x85\xA0\xC0\xC3\xD5\x8C\x9C\xAD\x97\x93\x94\x91\x92\xF7\xD7\xFF\x9F\x9E\xA4\x8B\x9B\x80\x81\x87\xB7\x82\x84\x89\xC2\xCA\xC1\xCB\xC8\xCD\xCE\xCF\xCC\xD3\xD4\x9D\xD2\xDA\xDB\xD9\xA6\x88\x98\x96\x9A\xB2\xBE\xB8\xBD\xB3\xB9/);
1898              
1899             }
1900              
1901             sub MacRomanToISORoman8859_1_clean
1902             {
1903 15 50   15 0 28 return unless defined; ## Empty/undef values stay that way
1904              
1905 15         63 tr/\x80-\xFF/\xC4\xC5\xC7\xC9\xD1\xD6\xDC\xE1\xE0\xE2\xE4\xE3\xE5\xE7\xE9\xE8\xEA\xEB\xED\xEC\xEE\xEF\xF1\xF3\xF2\xF4\xF6\xF5\xFA\xF9\xFB\xFC\x86\xB0\xA2\xA3\xA7\x95\xB6\xDF\xAE\xA9\x99\xB4\xA8\x8D\xC6\xD8\x90\xB1\x8F\x8E\xA5\xB5\xF0\xDD\xDE\xFE\x8A\xAA\xBA\xFD\xE6\xF8\xBF\xA1\xAC\xAF\x83\xBC\xD0\xAB\xBB\x85\xA0\xC0\xC3\xD5\x8C\x9C\xAD\x97\x93\x94\x91\x92\xF7\xD7\xFF\x9F\x9E\xA4\x8B\x9B\x80\x81\x87\xB7\x82\x84\x89\xC2\xCA\xC1\xCB\xC8\xCD\xCE\xCF\xCC\xD3\xD4\x9D\xD2\xDA\xDB\xD9\xA6\x88\x98\x96\x9A\xB2\xBE\xB8\xBD\xB3\xB9/;
1906              
1907             }
1908              
1909             =pod
1910              
1911             =head1 CALCULATIONS USING calc()
1912              
1913             ## Calculate a new field's values based on two others
1914              
1915             $t->calc($Sub) ## Run $Sub for each row, with
1916             ## fields bound to local vars
1917              
1918             $t->calc($Sub, $Sel) ## Use these row nums
1919             $t->calc($Sub, undef, $Fields) ## Use only these fields
1920             $t->calc($Sub, $Sel, $Fields) ## Use custom rows, fields
1921              
1922             my $Col = $t->calc($Sub) ## Gather return vals in vector
1923              
1924              
1925             ## Example 1: Overwrite values in an existing column.
1926              
1927             $t->calc(sub{no strict 'vars'; $Size = (stat($Path))[7]});
1928              
1929              
1930             ## Example 2: Create empty column; fill fields 1 by 1
1931              
1932             $t->col('PersonID');
1933             $t->calc(sub{no strict 'vars'; $PersonID = "$Last$First"});
1934              
1935              
1936             ## Example 3: Calculate values; put into to table if desired
1937              
1938             $PersonID = $t->calc(sub{no strict 'vars'; "$Last$First"});
1939             $t->sel('PersonID', $PersonID);
1940              
1941              
1942             ## Example 4: Using fully-qualified variable names
1943              
1944             $t->calc(sub{$main::PersonID = "$main::Last$main::First"});
1945              
1946             calc() runs your custom calculation subroutine $Sub once for every row
1947             in the current selection() or other list of row numbers that you
1948             specify in the optional $Sel argument.
1949              
1950             This lets you apply a complex calculation to every record in a table
1951             in a single statement, storing the results in one or more columns, or
1952             retrieving them as a list.
1953              
1954             Your custom subroutine may refer to the value in any field in the
1955             current row by using a global variable with the field's name:
1956              
1957             For example, if the table has fields First, Last, and Age, then $Sub
1958             may use, modify, and set the variables $First, $Last, $Age. (Also
1959             known as $main::First, $main::Last, $main::Age).
1960              
1961             Modifying any of these specially-bound variables actually modifies the
1962             data in the correct record and field within the table.
1963              
1964             By default, the fields available to $Sub are all fields in the table.
1965             calc() must bind all the field names for you for each row, which can
1966             be time-consuming for tables with very large numbers of fields.
1967              
1968             You can speed up the operation of calc() by listing only the fields
1969             your $Sub needs in the optional parameter $Fields. Any field names
1970             you don't mention won't be available to $Sub. Conversely, calc() will
1971             run faster because it can bind only the fields you actually need.
1972              
1973             If you include non-existent fields in your custom $Fields list, calc()
1974             creates them for you before $Sub runs the first time. Then your $Sub
1975             can store field values into the new column, referring to it by name.
1976              
1977             Variables in $Sub are in the "main" package. So you should set $Sub
1978             to use pacakge "main" in case the rest of your code is not in "main".
1979              
1980             Similarly, if you "use strict", Perl will complain about global
1981             variables in $Sub. So you may need to assert "no strict 'vars'".
1982              
1983             { package Foo; use strict;
1984            
1985             $t = ...;
1986             $t->calc(sub {package main; no strict 'vars';
1987             $Age = int($Age)});
1988             }
1989              
1990             ## Or this:
1991              
1992             { package Foo; use strict;
1993              
1994             $t = ...;
1995             { package main; no strict 'vars';
1996             my $Sub = sub {$Age = int($Age)};
1997             }
1998             $t->calc($Sub);
1999             }
2000              
2001             You may be able to get around both problems more easily by prefixing
2002             each variable reference in $Sub with "main::". This takes care of the
2003             package name issue and bypasses "use strict" at the same time, at the
2004             slight cost of making the calculation itself a bit harder to read.
2005              
2006             $t->calc(sub {$main::Age = int($main::Age)}); ## OK in any package
2007              
2008             In addition to the field names, the following three values are defined
2009             during each invocation of $Sub:
2010              
2011             $_r ($main::_r) -- the row number in the entire table
2012             $_s ($main::_s) -- the item number in selection or $Recs
2013             $_t ($main::_t) -- the table object itself
2014              
2015             You could use these values to print diagnostic information or to
2016             access any of the data, parameters, or methods of the table itself
2017             from within $Sub. Or you could even calculate field values using $_r
2018             or $_s.
2019              
2020             For example, after searching & sorting, you could make a field which
2021             preserves the resulting sort order for future reference:
2022              
2023             $t->col('Ranking'); ## Create the empty column first.
2024             $t->calc(sub{$main::Ranking = $main::_s});
2025              
2026             This last example is equivalent to:
2027              
2028             $t->sel(Ranking => [0 .. $#{$t->selection()}]); ## See sel() below
2029              
2030              
2031             =head1 "MANUAL" CALCULATIONS
2032              
2033             calc() (see previous secion) is the briefer, more elegant way to do
2034             batch calculations on entire columns in a table, but it can be
2035             slightly slower than doing the calculations yourself.
2036              
2037             If you have extremely large tables, and you notice the processing time
2038             for your calculations taking more than a second, you might want to
2039             rewrite your calculations to use the more efficient techniques shown
2040             here.
2041              
2042             You will often need to create new calculated columns based on one or
2043             more existing ones, and then either insert the columns back in the
2044             tables or use them for further calculations or indexing.
2045              
2046             Examples 1a and 1b create a new field 'NameOK' containing either the
2047             string "OK" or undef (empty) depending on whether the field 'Name' is
2048             empty. Just use map() to iterate over the existing values in the
2049             other column, binding $_ to each value in turn.
2050              
2051             ### Example 1a: Calculation based on one other column
2052              
2053             ## Retrieve column
2054             my $Name = $t->col('Name');
2055            
2056             ## Make new column
2057             my $NameOK = [map {!!length && 'OK'} @$Name];
2058              
2059             ## Insert column back into table:
2060             $t->col(NameOK => $NameOK);
2061              
2062             ### Example 1b: Same calculation, in a single statement:
2063              
2064             $t->col(NameOK => [map {!!length && 'OK'} @{$t->col('Name')}]);
2065              
2066             In order to iterate over MULTIPLE columns at once, you need a list of
2067             the row numbers generated by $t->all() so you can index the two
2068             columns in tandem. Then, you use map to bind $_ to each row number,
2069             and then use the expression $t->col($ColName)->[$_] to retreive each
2070             value.
2071              
2072             Examples 2a and 2b demonstrate this method. They create a new field
2073             'FullName' which is a string joining the values in the 'First' and
2074             'Last' columns with a space between.
2075              
2076             ### Example 2a: Calculation based on multiple columns
2077              
2078             ## Retrieve columns
2079             my $First = $t->col('First');
2080             my $Last = $t->col('Last' );
2081              
2082             ## Retreive row nums
2083             my $Nums = $t->all();
2084              
2085             ## Calculate a new column based on two others
2086             my $Full = [map {"$First->[$_] $Last->[$_]"} @$Nums];
2087              
2088             ## Add new column to the table
2089             $t->col(FullName => $Full);
2090              
2091             ### Example 2b: Same calculation, in a single statement:
2092              
2093             $t->col(FullName =>
2094             [map {"$t->col('First')->[$_] t->col('Last')->[$_]"}
2095             @{$t->all()}]);
2096              
2097             In examples 1 and 2, you create entirely new columns and then add or
2098             replace them in the table.
2099              
2100             Using the approach in Examples 3a and 3b, you can assign calculated
2101             results directly into each value of an existing column as you go.
2102              
2103             ## Example 3a: Calculate by assigning directly into fields...
2104              
2105             my $A = $t->col->('A'); ## This column will be modified
2106             my $B = $t->col->('B');
2107             my $C = $t->col->('C');
2108              
2109             foreach @($t->all()) {$A->[$_] = $B->[$_] + $C->[$_];}
2110              
2111             ## Example 3b: Same calculation, in a single statement:
2112              
2113             foreach @($t->all()) {($t->col('A')->[$_] =
2114             $t->col('B')->[$_] +
2115             $t->col('C')->[$_])};
2116              
2117             Before writing your code, think about which calculation paradigms best
2118             suit your needs and your data set.
2119              
2120             Just as Perl Hackers know: There's More Than One Way To Do It!
2121              
2122             =cut
2123              
2124             {}; ## Get emacs to indent correctly.
2125              
2126             sub calc
2127             {
2128             ## We operate in package main for this subroutine so the local
2129             ## object and row number can be available to the caller's $Sub.
2130              
2131             ## The following local vars will be available to $Sub
2132             ## $_r ($main::_r) -- the row number in the entire table
2133             ## $_s ($main::_s) -- the row number in selection or $Recs
2134             ## $_t ($main::_t) -- the table object itself
2135              
2136             package main;
2137 29     29   487 use vars qw($_r $_s $_t);
  29         71  
  29         8095  
2138 3     3 0 26 local ($_r, $_s, $_t);
2139              
2140 3         6 $_t = shift;
2141 3         8 my ($Sub, $Recs, $Fields) = @_;
2142              
2143             ## These optional params default to current field and current sel
2144              
2145 3   33     23 $Recs ||= $_t->selection();
2146 3   33     21 $Fields ||= $_t->fieldlist_all();
2147            
2148             ## Local copy of symbol table. Didn't seem to help. Odd.
2149             ## local %main::;
2150            
2151             ## We'll build a column of return values from $Sub if needed.
2152 3         7 my $WantVals = defined(wantarray);
2153 3 100       16 my $Vals = $_t->col_empty() if $WantVals;
2154              
2155             ## Call col() on each field in list to make sure it exists.
2156 3         9 foreach (@$Fields) {$_t->col($_)};
  15         35  
2157              
2158 3         208 foreach $_s (0..$#$Recs)
2159             {
2160 9         33 $_r = $Recs->[$_s];
2161              
2162             ## Bind $FieldName1, $FieldName2, (etc. for each field name in
2163             ## $Fields) point to address of the current value for that
2164             ## field in this record.
2165              
2166 29     29   178 no strict 'refs';
  29         58  
  29         181568  
2167 9         16 foreach my $F (@$Fields) {*{$F} = \ $_t->{$F}->[$_r]};
  45         76  
  45         126  
2168              
2169             ## Now $Sub may refer to $_r, $_s, $_t, and ${any field name}
2170              
2171             ## Call $Sub and capture return values iff caller wants them
2172 9 100       34 ($WantVals ? $Vals->[$_r] = &$Sub() : &$Sub());
2173             }
2174              
2175             ## Return scalar column ref unless return context is undef
2176 3 100       36 return($WantVals ? $Vals : ());
2177             }
2178              
2179             =pod
2180              
2181             =head1 INDEXES
2182              
2183             ## Make indexes of columns or just selected data in columns
2184              
2185             my $Index1 = $t->index_all($Key); ## entire column
2186             my $Index2 = $t->index_sel($Key); ## selected data only
2187              
2188             ## Make hashes of 2 columns or just selected data in columns
2189              
2190             my $Index1 = $t->hash_all($KeyFld, $ValFld); ## entire column
2191             my $Index2 = $t->hash_sel($KeyFld, $ValFld); ## selected data
2192              
2193             index_all() creates and returns a hash (reference) that maps keys
2194             found in the column called $Key to corresponding record numbers.
2195              
2196             Ideally, values in $Key would be unique (that's up to you).
2197              
2198             If any values in $Key are NOT unique, then later values (higher record
2199             numbers) will be ignored.
2200              
2201             index_sel() creates and returns a hash (ref) that maps keys found in
2202             the SELECTED RECORDS of column $Key to corresponding record numbers.
2203              
2204             Any keys in unselected records are ignored. Otherwise, the behavior
2205             is equivalent to index_all().
2206              
2207             hash_all() and hash_sel() are similar, except they create and return
2208             hashes whose keys are taken from column $KeyFld, but whose values are
2209             from $ValFld in the corresponding records.
2210              
2211             So, for example, imagine you have a tab-delimited file on disk with
2212             just a single tab per line (2 fields) and no header row. The entries
2213             on the left side of the tab on each line are keys and the right side
2214             are values. You could convert that file into a hash in memory like
2215             this:
2216              
2217             my $t = Data::CTable->new({_HeaderRow=>0, _FieldList=>[qw(F1 F2)]},
2218             "DeptLookup.txt");
2219              
2220             my $DeptLookup = $t->hash_all(qw(F1 F2));
2221              
2222             =head2 Reverse indexes
2223              
2224             If you'd like an index mapping record number to key, just get
2225             $t->col($Key). That's what the data columns in Data::CTable are.
2226              
2227             =cut
2228              
2229             sub index_all
2230             {
2231 2     2 0 3897 my $this = shift;
2232 2         12 my ($Key) = @_;
2233              
2234 2         5 my $Index = {}; @$Index{reverse @{$this->col($Key)}} = reverse @{$this->all()};
  2         6  
  2         6  
  2         10  
2235              
2236 2         24 return($Index);
2237             }
2238              
2239             sub index_sel
2240             {
2241 2     2 0 4 my $this = shift;
2242 2         5 my ($Key) = @_;
2243            
2244 2         5 my $Index = {}; @$Index{reverse @{$this->sel($Key)}} = reverse @{$this->selection()};
  2         3  
  2         8  
  2         6  
2245            
2246 2         20 return($Index);
2247             }
2248              
2249              
2250             sub hash_all
2251             {
2252 2     2 0 5 my $this = shift;
2253 2         4 my ($Key, $Val) = @_;
2254              
2255 2         4 my $Hash = {}; @$Hash{reverse @{$this->col($Key)}} = reverse @{$this->col($Val)};
  2         5  
  2         5  
  2         7  
2256              
2257 2         11836 return($Hash);
2258             }
2259              
2260             sub hash_sel
2261             {
2262 2     2 0 6 my $this = shift;
2263 2         4 my ($Key, $Val) = @_;
2264            
2265 2         4 my $Hash = {}; @$Hash{reverse @{$this->sel($Key)}} = reverse @{$this->sel($Val)};
  2         5  
  2         4  
  2         21  
2266            
2267 2         29 return($Hash);
2268             }
2269              
2270              
2271             =pod
2272              
2273             =head1 DATA ROWS (RECORDS)
2274              
2275             ## Getting or setting rows / records
2276              
2277             $t->row($Num) ## Get a row or make empty one.
2278             $t->row_get($Num)
2279              
2280             $t->row($Num, $HashRef) ## Set all of a row all at once.
2281             $t->row_set($Num, $HashRef)
2282              
2283             $t->row_set($Num, undef) ## Delete a row completely
2284             $t->row_delete($Num)
2285             $t->row_delete($Beg, $End)## Delete a range of rows
2286              
2287             $t->row_move($Old, $New) ## Move a row to before $New
2288              
2289             $t->row_empty() ## An empty hash
2290             $t->row_exists($Num) ## True if $Num < $t->length()
2291              
2292             $t->rows($RowList) ## Get list of multiple row nums
2293              
2294             $t->row_list($Num) ## Get row vals as a list
2295             $t->row_list($Num, $Fields) ## Get row vals: specified fields
2296              
2297             $t->row_list_set($Num, undef, $Vals) ## Set row vals as a list
2298             $t->row_list_set($Num, $Fields, $Vals) ## Set row vals as a list
2299             $t->row_list_set($Num, $Fields) ## Set vals to empty/undef
2300              
2301             Usually, when working with Data::CTable objects, you are operating on
2302             entire columns or tables at a time (after all: any transformation you
2303             do on one record you almost always want to do on all records or all
2304             selected ones).
2305              
2306             You should very rarely need to access data by retrieving rows or
2307             setting rows, moving them around individually, and so on. (It's much
2308             cleaner, and much more efficient to manipulate the selection() (list
2309             of selected row numbers) instead -- just delete a row number from the
2310             selection, for example, and then for most operations it's almost as if
2311             the row is gone from the table, except the data is really still
2312             there.)
2313              
2314             However, if on rare occasions you really do need direct row
2315             operations, you're reading the right section.
2316              
2317             A row is generally accessed as a hash. The hash you provide or get
2318             back is not saved by the object in any way. Data values are always
2319             copied in or out of it, so you always "own" the hash.
2320              
2321             Rows are specified by $Num -- the row number with in the unsorted
2322             columns (the raw data in the table). These numbers are just array
2323             indices into the data columns, and so their legal range is:
2324              
2325             [0 .. ($t->length() - 1)] ## (Zero-based row numbering.)
2326              
2327             The row hash (or "record") has keys that are field names and values
2328             that are copies of the scalar values stored in the data columns within
2329             the table.
2330              
2331             row() always copies only the fields in fieldlist(), except for
2332             row_list() which allows you to specify an optional $Fields parameter
2333             which can override the current fieldlist().
2334              
2335             If the fieldlist happens to be a subset of all fields, but you really
2336             want to get all fields in your record, then call fieldlist_set(0)
2337             first to permanently or temporarily delete it.
2338              
2339             row() and row_get() always return a hash.
2340              
2341             row($Num, $Hash), row_set() take a hash and set just the fields you
2342             specify in the hash (in the given row of course). Any non-existent
2343             field names in the hash are created, so be careful.
2344              
2345             In fact, in general with either getting or setting rows, any
2346             non-existent fields mentioned will be created for you (by internally
2347             calling col()). So you could build a whole table of 100 rows by
2348             starting with an empty, new, table and setting row 99 from a hash that
2349             gives the field names.
2350              
2351             Setting a row number higher than any existing row number with row(),
2352             row_set() or row_force() will automatically set the new length of the
2353             entire table to match (extending all the columns with empty rows as
2354             necessary).
2355              
2356             IMPORTANT: IF YOU SIMPLY MUST ADD ROWS SEQUENTIALLY, do not let the
2357             table auto-extend by one with each row you set. This is slow and gets
2358             rapidly slower if there's lots of data because the arrays holding the
2359             data columns will keep getting reallocated on every insert. Instead,
2360             first pre-extend the table to your highest row number by calling
2361             length($Len), and then set your rows. Or easier: if convenient just
2362             set your rows starting with the highest-numbered one first. If you
2363             don't know how many you'll have, guess or estimate and pre-extend to
2364             the estimated number and then cut back later. This will be faster
2365             than extending all columns by one each time.
2366              
2367             row_delete() removes a row or range of rows completely from the table.
2368             Any rows above the deleted ones will move down and the table's
2369             length() will decrease. If the data columns are very large, this
2370             could be a bit slow because a lot of data could be moved around. The
2371             low and high row numbers will be limited for you to 0 and length() -
2372             1, respectively. Null ranges are OK and are silently ignored. The
2373             range is inclusive, so to delete just row 99, call row_delete(99) or
2374             row_delete(99,99).
2375              
2376             EFFICIENCY NOTE: Don't call row_delete() to remove lots of individual
2377             rows. Instead, select those row numbers by setting the selection (if
2378             not already selected), and then invert the selection using
2379             selection_invert(), so the undesired rows are deselected, and then use
2380             the cull() method to rewrite the entire table at once. The deselected
2381             rows will be omitted very efficiently this way.
2382              
2383             row_move() moves a row from its $Old row number to the position before
2384             the row currently in row $New (specify $New = length() to move the row
2385             to the end). Again, in shuffling data in columns, lots of data could
2386             get moved around by this operation, so expect it to be slow. If as
2387             with row_delete(), if you will be doing several moves, consider
2388             building an appropriate selection() first, and then using cull()
2389             instead.
2390              
2391             Using row_delete() and row_move() to shift records around changes the
2392             record numbers of the affected records and many others in the table.
2393             The record numbers in the custom selection, if any, are updated to
2394             reflect these changes, so the records that were selected before will
2395             still be selected after the move (except those that were deleted of
2396             course). If you had a private copy of the selection, your copy will
2397             likely become outdated after these operations. You should get it
2398             again by calling selection().
2399              
2400             row_empty() returns a hash whose keys are the entries in fieldlist()
2401             and whose values are undef. (You could use it to fill in values
2402             before calling row_set()). Note: in this class, row_empty() does
2403             exactly the same thing as fieldlist_hash() when the latter is called
2404             with no arguments.
2405              
2406             row_exists() returns true if C<(($Num E= 0) && ($Num E $t-Elength()))>.
2407              
2408             rows() calls row() for each row num in a list and returns a list of
2409             the resulting hashes.
2410              
2411             row_list() gets row values as a list instead of a hash. They appear
2412             in the order specified in fieldlist() unless you supply an optional
2413             $Fields parameter listing the fields you want to get.
2414              
2415             row_list_set() sets row values as a list instead of a hash. Pass your
2416             own $Fields list or undef and fieldlist() will be used. $Values
2417             should be a list with the same number of values as fields expected;
2418             any shortage will result in undef/empty values being set.
2419              
2420             =cut
2421              
2422             {}; ## Get emacs to indent correctly.
2423              
2424             sub row
2425             {
2426 69     69 0 144 my $this = shift;
2427 69         177 my ($Num, $Row) = @_;
2428              
2429             ## Set if specified.
2430 69 50       188 return($this->row_set($Num, $Row)) if defined($Row);
2431              
2432             ## Else get.
2433 69         161 return($this->row_get($Num));
2434             }
2435              
2436             sub row_get
2437             {
2438 69     69 0 82 my $this = shift;
2439 69         326 my ($Num) = @_;
2440              
2441 69         138 my $Fields = $this->fieldlist();
2442 69         111 my $Row = {}; @$Row{@$Fields} = map {$this->col($_)->[$Num]} @$Fields;
  69         128  
  270         1127  
2443              
2444 69         707 return($Row);
2445             }
2446              
2447             sub row_set
2448             {
2449 3     3 0 18 my $this = shift;
2450 3         7 my ($Num, $Row) = @_;
2451              
2452             ## We thoughtfully sort the keys in case columns will get created
2453             ## in this order.
2454              
2455 3         25 my $Fields = [sort keys %$Row];
2456              
2457             ## Pre-extend the table to accommodate row $Num if necessary.
2458             ## This will do nothing if there are not yet any fields in the
2459             ## table (and the length will still be effectively zero).
2460              
2461 3 100       13 $this->length($Num + 1) unless $this->length() >= $Num + 1;
2462            
2463             ## Insert into columns, creating them if necessary.
2464 3         8 foreach (@$Fields) {$this->col($_)->[$Num] = $Row->{$_}};
  10         31  
2465            
2466 3         11 return($Row); ## Why not?
2467             }
2468              
2469             sub row_delete
2470             {
2471 10     10 0 310 my $this = shift;
2472 10         17 my ($First, $Last) = @_;
2473              
2474             ## Nothing to do if $First not specified.
2475 10 100       31 return() unless defined($First);
2476              
2477             ## Default is $Last is same as first (remove one row only)
2478 9 100       25 $Last = $First unless defined($Last);
2479              
2480 9         22 my $LastIndex = $this->length() - 1;
2481              
2482             ## Restrict the range to meaningful values.
2483 9         231 $First = max($First, 0 ); ## First could be very high, to indicate a null range.
2484 9         46 $Last = min($Last, $LastIndex); ## Last could be negative, like -1, to indicate null range.
2485              
2486             ## Nothing to do if the range is empty.
2487 9 100       28 return() if $Last < $First;
2488              
2489 8         18 my $Fields = $this->fieldlist();
2490              
2491 8         15 my $RangeSize = ($Last - $First + 1);
2492            
2493 8         24 foreach (@$Fields) {splice @{$this->col($_)}, $First, $RangeSize};
  32         39  
  32         61  
2494            
2495             ## Here we could have trapped all the list segments we spliced out
2496             ## and return them in nice CTable-ish hash. Maybe we will some
2497             ## day. This would be a way to split a range of rows out of a
2498             ## table object to create another table object. We could even
2499             ## call it the "split" method...
2500            
2501             ## After deleting the rows, we need to adjust the _Selection if
2502             ## present. Deleted row numbers in the selection need to be
2503             ## omitted; row numbers greater than the range of deleted ones
2504             ## need to be decreased by the size of the range reduction, and
2505             ## others need to be left untouched.
2506            
2507 0 0       0 $this->{_Selection} =
    0          
2508             [map
2509             {
2510 0         0 ($_ < $First ? $_ : ## Before range: pass through.
2511             ($_ <= $Last ? () : ## In range: omit.
2512             $_ - $RangeSize)) ## After range: reduce by range size
2513 8 50       39 } @{ $this->{_Selection}}]
2514             if $this->{_Selection};
2515             }
2516              
2517             sub row_move
2518             {
2519 8     8 0 15 my $this = shift;
2520 8         12 my ($Old, $New) = @_;
2521            
2522             ## $Old and $New are required params and must not be undef.
2523 8 50 33     44 goto done unless defined($Old) && defined($New);
2524            
2525             ## If $Old and $New are the same or one apart, there's nothing to do.
2526 8 100       24 goto done if ($New == $Old); ## This would mean a no-op.
2527 7 100       26 goto done if ($New == $Old + 1); ## This would mean a no-op.
2528            
2529 5         13 my $Length = $this->length();
2530            
2531             ## Ensure both $Old and $New are legal indices.
2532 5 50 33     36 goto done if (($Old < 0) || ($Old > $Length - 1));
2533 5 50 33     26 goto done if (($New < 0) || ($New > $Length )); ## New has a range up to $Length, meaning move to end.
2534              
2535 5         16 my $Fields = $this->fieldlist_all();
2536              
2537 5 100       20 if ($Old < $New) ## Move forward (to higher / later row num)
2538             {
2539             ## Delete from the lower position and insert into higher -- MINUS ONE to account for prior shortening.
2540 2         5 my $Row = {};
2541 2         7 foreach (@$Fields) {$Row->{$_} = splice(@{$this->col($_)}, $Old , 1 )};
  8         10  
  8         16  
2542 2         6 foreach (@$Fields) { splice(@{$this->col($_)}, $New - 1, 0, $Row->{$_})};
  8         11  
  8         15  
2543              
2544             }
2545             else ## Move backward (to lower / earlier row num)
2546             {
2547             ## Delete from the higher position and insert into lower.
2548 3         6 my $Row = {};
2549 3         9 foreach (@$Fields) {$Row->{$_} = splice(@{$this->col($_)}, $Old , 1 )};
  12         13  
  12         24  
2550 3         9 foreach (@$Fields) { splice(@{$this->col($_)}, $New , 0, $Row->{$_})};
  12         14  
  12         21  
2551              
2552             }
2553              
2554             ## After moving the rows, we need to adjust the _Selection if
2555             ## present. Row numbers outside the shuffled range stay the same;
2556             ## the moved row number(s) change; others (inside the range) get
2557             ## shifted down or up by 1.
2558              
2559 5 100       16 if ($Old < $New) ## Move forward / higher / later
2560             {
2561 6 100       27 $this->{_Selection} =
    50          
    100          
2562             [map
2563             {
2564 2         5 ($_ == $Old ? $New - 1 : ## Moved row: change num to new - 1
2565             ($_ < $Old ? $_ : ## Less than $Old: no change
2566             ($_ >= $New ? $_ : ## Grtr= than $New: no change
2567             $_ - 1))) ## In range: shift down by 1.
2568 2 50       9 } @{ $this->{_Selection}}]
2569             if $this->{_Selection};
2570            
2571             }
2572             else ## Move backward / lower / earlier
2573             {
2574 9 100       39 $this->{_Selection} =
    100          
    100          
2575             [map
2576             {
2577 3         7 ($_ == $Old ? $New : ## Moved row: change num to new
2578             ($_ >= $Old ? $_ : ## Grtr= than $Old: no change
2579             ($_ < $New ? $_ : ## Less than $New: no change
2580             $_ + 1))) ## In range: shift up by 1.
2581 3 50       12 } @{ $this->{_Selection}}]
2582             if $this->{_Selection};
2583             }
2584              
2585 8         24 done:
2586             return;
2587             }
2588              
2589             sub row_empty
2590             {
2591 2     2 0 7 my $this = shift;
2592              
2593 2         6 my $Fields = $this->fieldlist();
2594 2         4 my $Row = {}; @$Row{@$Fields} = undef;
  2         9  
2595              
2596 2         16 return($Row);
2597             }
2598              
2599             sub row_exists
2600             {
2601 14     14 0 35 my $this = shift;
2602 14         27 my ($Num) = @_;
2603              
2604 14   100     75 return(($Num >= 0) && ($Num < $this->length()));
2605             }
2606              
2607             sub rows
2608             {
2609 4     4 0 13 my $this = shift;
2610 4         7 my ($Nums) = @_;
2611            
2612 4         10 return([map {$this->row($_)} @$Nums]);
  8         21  
2613             }
2614              
2615             sub row_list
2616             {
2617 18     18 0 781 my $this = shift;
2618 18         31 my ($Num, $Fields) = @_;
2619              
2620             ## $Fields argument is optional and defaults to fieldlist();
2621 18   66     96 $Fields ||= $this->fieldlist();
2622              
2623 18         32 my $Row = [map {$this->col($_)->[$Num]} @$Fields];
  88         226  
2624 18         118 return($Row);
2625             }
2626              
2627             sub row_list_set
2628             {
2629 6     6 0 37 my $this = shift;
2630 6         7 my ($Num, $Fields, $Vals) = @_;
2631              
2632             ## $Fields argument is optional and defaults to fieldlist();
2633 6   66     18 $Fields ||= $this->fieldlist();
2634              
2635             ## $Vals is optional and defaults to [].
2636 6   50     13 $Vals ||= [];
2637              
2638             ## Pre-extend the table to accommodate row $Num if necessary.
2639 6 50       10 $this->length($Num + 1) unless $this->length() >= $Num + 1;
2640            
2641             ## Set the $Vals in row $Num in the order given by $Fields.
2642 6         15 foreach (0..$#$Fields) {$this->col($Fields->[$_])->[$Num] = $Vals->[$_]}
  18         36  
2643             }
2644              
2645             =pod
2646              
2647             =head1 ROW / RECORD COUNT (TABLE LENGTH)
2648              
2649             ## Getting or setting table length
2650              
2651             $t->length() ## Get length
2652             $t->length_get()
2653              
2654             $t->length(22) ## Set length (truncate or pre-extend)
2655             $t->length_set(22)
2656              
2657             $t->extend() ## Set length of all columns to match longest
2658              
2659             The length* methods assume the table already has columns of equal
2660             length. So the length of the table is the length of any field taken
2661             at random. We choose the first one in the field list.
2662              
2663             Setting the length will truncate or pre-extend every column in the
2664             table to a given length as required.
2665              
2666             (Pre-extending means setting each column's length via $# so that it
2667             has the correct number of entries already allocated (and filled with
2668             undef) so that operations that fill up the table can be done much more
2669             quickly than with push().
2670              
2671             However, if a new column has been added directly, or a table has been
2672             constructed out of columns whose length may not initially match, the
2673             extend() method may be (should be) called to inspect all columns and
2674             extend them all to match the longest one. Note that extend() operates
2675             on all fields in the object, ignoring the custom _FieldList if any.
2676              
2677             The length of a table with no columns is zero.
2678              
2679             =cut
2680              
2681             sub length
2682             {
2683 471     471 0 1149 my $this = shift;
2684 471         649 my ($Length) = @_;
2685            
2686 471 100       1572 return(defined($Length) ?
2687             $this->length_set($Length) :
2688             $this->length_get());
2689             }
2690              
2691             sub length_get
2692             {
2693 469     469 0 548 my $this = shift;
2694              
2695 469         1781 my $FieldList = $this->fieldlist();
2696 469         905 my $FirstField = $FieldList->[0];
2697 469         792 my $Col = $this->{$FirstField};
2698 469 100       1240 my $Length = (ref($Col) eq 'ARRAY' ? @$Col+0 : 0);
2699              
2700 469         1967 return($Length);
2701             }
2702              
2703             sub length_set
2704             {
2705 26     26 0 73 my $this = shift;
2706 26         41 my ($Length) = @_;
2707              
2708             ## Apply the length-setting logic to any field found in the hash
2709             ## OR listed in the field list. They will all be created if not
2710             ## already present.
2711              
2712 26         963 my $FieldList = [@{$this->fieldlist_all()}, @{$this->fieldlist()}];
  26         63  
  26         1153  
2713            
2714 26         82 foreach my $FieldName (@$FieldList)
2715             {
2716 223         406 $#{$this->col($FieldName)} = ($Length - 1); ## $Length = 0 => $# = -1 => empty list.
  223         385  
2717             };
2718              
2719             ## Since records might have been deleted, re-validate the
2720             ## _Selection, if it is present.
2721              
2722 26         94 $this->selection_validate();
2723            
2724 26         76 return($Length);
2725             }
2726              
2727             sub extend
2728             {
2729 22     22 0 33 my $this = shift;
2730 22         43 my $Length = 0;
2731              
2732             ## Find the length of the longest vector...
2733            
2734 22         34 foreach (@{$this->fieldlist_all()}) {$Length = max($Length, $#{$this->{$_}} + 1)};
  22         217  
  114         135  
  114         307  
2735              
2736             ## ...and set them all to be that length.
2737              
2738 22         95 $this->length_set($Length);
2739             }
2740              
2741             =pod
2742              
2743             =head1 SELECTIONS
2744              
2745             ## Getting or setting the custom selection list itself (_Selection)
2746              
2747             $t->selection() ## Get sel if any; else all()
2748             $t->selection_get()
2749              
2750             $t->selection($List) ## Set sel (list of rec nums)
2751             $t->selection_set($List)
2752            
2753             $t->selection(0) ## Remove sel (select all)
2754             $t->selection_set(undef)
2755             $t->selection_delete()
2756             $t->select_all()
2757              
2758             $t->selection_inverse() ## Get inverse copy of selection
2759             $t->select_inverse() ## Invert the selection
2760              
2761             $t->selection_validate() ## Remove invalid #s from sel
2762              
2763             ## List of all rec nums present (regardless of selection)
2764              
2765             $t->all()
2766              
2767             ## Getting or setting just selected fields in columns
2768             ## (as contrasted with col() and friends).
2769              
2770             $t->sel($ColName) ## Get col but only records in sel
2771             $t->sel_get($ColName)
2772              
2773             $t->sel($ColName, $ListRef) ## Set selected fields in col...
2774             $t->sel_set($ColName, $ListRef) ##... in selection order
2775              
2776             $t->sel_set($ColName) ## Set selected fields to undef
2777             $t->sel_clear($ColName)
2778              
2779             $t->sels($ColList) ## Like cols, but selected fields
2780             $t->sels_hash($ColList) ## " " cols_hash()... " " " "
2781              
2782             ## Finding out size of selection (number of rows)
2783              
2784             $t->sel_len() ## Get number of selected rows.
2785              
2786             A selection is an ordered list of record numbers. The record numbers
2787             in the selection may be a subset of available records. Furthermore,
2788             they may be in non-record-number order, indicating that the records
2789             have been sorted.
2790              
2791             Record numbers are numeric array indices into the columns in the
2792             table. It is an error for any selection list to contain an index less
2793             than zero or greater than (length() - 1), so if you set a selection
2794             explicitly, be careful.
2795              
2796             Any selection list you get or set belongs to the object. Be careful
2797             of modifying its contents.
2798              
2799             The custom selection, if any, is stored internally in the _Selection
2800             parameter. If this parameter is absent, the selection defaults to
2801             all() -- i.e. a list of all record numbers, in order:
2802             [0..($this->length() - 1)] (which becomes [] if length() is 0).
2803              
2804             REMEMBER: length() is one-based, but record numbers are zero-based.
2805              
2806             Removing the selection (that is, removing the LIST itself of which
2807             records are selected), is the same as selecting all records.
2808             consequently, selection(0), selection_delete(), and select_all() are
2809             all synonymous.
2810              
2811             selection_validate() removes any entries from the current _Selection
2812             list (if any) that are not valid record numbers -- i.e. it removes any
2813             record whose integer value is < 0 or greater than length() - 1. This
2814             routine is mainly used by other methods that might delete records,
2815             such as length_set().
2816              
2817             Getting or setting just selected data from columns
2818              
2819             Sometimes, you don't want to get/set entire columns, you instead want
2820             to get or set data in just the selected fields in a column.
2821              
2822             The sel(), sel_get(), sel_set(), sels() and sels_hash() methods are
2823             analagous to the corresponding col(), ... cols_hash() methods except
2824             in these two ways:
2825              
2826             - the 'sels' variants get or set just selected data, as determined by
2827             the current selection(), which gives an ordered list of the selected /
2828             sorted records.
2829              
2830             - the 'sels' variants all make COPIES of the data you request or
2831             supply -- the data is copied out of or into the correspnding column.
2832             So, you "own" any vector you pass or receive in reply.
2833              
2834             So, for example, imagine you have just set selection() to only list
2835             record numbers where the LastName field is not empty. Then you have
2836             called sort() to sort those record numbers by the LastName field. You
2837             could then call $t->sel('LastName') to get a sorted list of all
2838             non-empty last names.
2839              
2840             It might be helpful to think of "sel" as short for "selected". So
2841             $t->sel('LastName') would mean "get the selected field values from the
2842             LastName field".
2843              
2844             =cut
2845              
2846             sub selection
2847             {
2848 477     477 1 703 my $this = shift;
2849 477         585 my ($Selection) = @_;
2850              
2851             ## Set if specified.
2852 477 100       1067 $this->selection_set($Selection) if defined($Selection);
2853            
2854             ## Get and return.
2855 477         1160 $Selection = $this->selection_get();
2856              
2857 477         1547 return($Selection);
2858             }
2859              
2860             sub selection_get
2861             {
2862 478     478 0 1715 my $this = shift;
2863              
2864 478   66     1604 my $Selection = $this->{_Selection} || $this->selection_default();
2865              
2866 478         877 return($Selection);
2867             }
2868              
2869             sub selection_set
2870             {
2871 176     176 0 260 my $this = shift;
2872 176         254 my ($Selection) = @_;
2873              
2874 176 100       458 if (ref($Selection) eq 'ARRAY')
2875             {
2876             ## Set if specified...
2877 36         75 $this->{_Selection} = $Selection;
2878             }
2879             else
2880             {
2881             ## Otherwise, delete and return original if any.
2882 140         336 $Selection = delete $this->{_Selection};
2883             }
2884            
2885 176         323 return($Selection);
2886             }
2887              
2888             sub selection_delete
2889             {
2890 2     2 0 4 my $this = shift;
2891 2         8 $this->selection_set();
2892             }
2893              
2894             sub select_all
2895             {
2896 87     87 0 142 my $this = shift;
2897 87         305 $this->selection_set();
2898             }
2899              
2900             sub select_none
2901             {
2902 4     4 0 10 my $this = shift;
2903 4         17 $this->selection_set([]);
2904             }
2905              
2906             sub selection_default
2907             {
2908 322     322 0 373 my $this = shift;
2909              
2910 322         653 my $Selection = $this->all();
2911              
2912 322         1049 return($Selection);
2913             }
2914              
2915             sub all
2916             {
2917 343     343 0 458 my $this = shift;
2918 343         857 my $RowNums = [0..($this->length() - 1)];
2919              
2920 343         631 return($RowNums);
2921             }
2922              
2923             sub selection_inverse
2924             {
2925 9     9 0 17 my $this = shift;
2926 9         24 my $Sel = $this->selection();
2927 9         26 my $All = $this->all();
2928              
2929 9         22 @$All[@$Sel] = undef;
2930 9         121 $All = [grep {defined} @$All];
  27         63  
2931              
2932 9         33 return($All);
2933             }
2934              
2935             sub select_inverse
2936             {
2937 1     1 0 4 my $this = shift;
2938              
2939 1         5 return($this->{_Selection} = $this->selection_inverse());
2940             }
2941              
2942             sub selection_validate
2943             {
2944 27     27 0 44 my $this = shift;
2945              
2946 27 100       203 if (ref($this->{_Selection}) eq 'ARRAY')
2947             {
2948 1         6 $this->{_Selection} = $this->selection_validate_internal($this->{_Selection});
2949             }
2950             }
2951              
2952             sub selection_validate_internal
2953             {
2954 19     19 0 33 my $this = shift;
2955 19         32 my ($Selection) = @_;
2956              
2957 19         59 my $Length = $this->length();
2958              
2959 19 100       41 $Selection = [grep {(($_ >= 0) && ($_ < $Length))} @$Selection];
  71         357  
2960              
2961 19         46 return($Selection);
2962             }
2963              
2964             sub sel ## ($ColName, [$Vector])
2965             {
2966 284     284 0 376 my $this = shift;
2967 284         484 my ($ColName, $Vector) = @_;
2968              
2969             ## Set if specified.
2970 284 50       819 if (defined($Vector))
2971             {
2972 0         0 $this->sel_set($ColName, $Vector);
2973             ## Nothing to return.
2974             }
2975             ## Get and return.
2976             else
2977             {
2978 284         1476 my $Sel = $this->sel_get($ColName);
2979 284         998 return($Sel);
2980             }
2981             }
2982              
2983             sub sel_get
2984             {
2985 318     318 0 372 my $this = shift;
2986 318         420 my ($ColName, $Selection) = @_;
2987              
2988 318         726 my $Col = $this->col($ColName);
2989 318   66     10828 $Selection ||= $this->selection();
2990 318         986 my $Sel = [@$Col[@$Selection]];
2991            
2992 318         661 return($Sel);
2993             }
2994              
2995             sub sel_set ## ($ColName, [$Vector])
2996             {
2997 5     5 0 8 my $this = shift;
2998 5         11 my ($ColName, $Vector) = @_;
2999              
3000 5         12 my $Col = $this->col($ColName);
3001 5         10 my $Selection = $this->selection();
3002              
3003 5 100 66     27 if (defined($Vector) && (ref($Vector) eq 'ARRAY'))
3004             {
3005 3         12 @$Col[@$Selection] = @$Vector;
3006             }
3007             else
3008             {
3009 2         8 @$Col[@$Selection] = undef;
3010             }
3011             }
3012              
3013             sub sel_clear ## ($ColName)
3014             {
3015 2     2 0 4 my $this = shift;
3016 2         4 my ($ColName) = @_;
3017            
3018 2         7 $this->sel_set($ColName);
3019             }
3020              
3021             sub sel_len
3022             {
3023 20     20 0 67 my $this = shift;
3024            
3025 16         162 return(ref($this->{_Selection}) eq 'ARRAY' ?
3026 20 100       69 @{$this->{_Selection}}+0 :
3027             $this->length());
3028             }
3029              
3030             sub sels ## ($ColNames)
3031             {
3032 56     56 0 98 my $this = shift;
3033 56         163 my ($ColNames) = @_;
3034 56   66     151 $ColNames ||= $this->fieldlist();
3035 56         322 my $Sels = [map {$this->sel($_)} @$ColNames];
  216         491  
3036              
3037 56         292 return($Sels);
3038             }
3039              
3040             sub sels_hash ## ($ColNames)
3041             {
3042 4     4 0 10 my $this = shift;
3043 4         10 my ($ColNames) = @_;
3044 4   66     23 $ColNames ||= $this->fieldlist();
3045 4         17 my $Sels = $this->sels($ColNames);
3046 4         11 my $SelsHash = {}; @$SelsHash{@$ColNames} = @$Sels;
  4         18  
3047              
3048 4         51 return($SelsHash);
3049             }
3050              
3051             =pod
3052              
3053             =head1 SEARCHING / SELECTING RECORDS
3054              
3055             ## Modifying the table's custom selection (_Selection)
3056              
3057             $t->select_all() ## Set _Selection = $t->all() or undef
3058             $t->select_none() ## Set _Selection = []
3059             $t->select_inverse() ## Invert the curr. sel. (and get it)
3060              
3061             ## Specific searches: "the select() methods"
3062              
3063             $t->select($Field1=>$Sub1, ## Del nonmatching recs from sel.
3064             $Field2=>$Sub2, ## i.e. narrow sel. to match
3065             ...);
3066              
3067             $t->omit ($Field1=>$Sub1, ## Del matching recs from sel.
3068             $Field2=>$Sub2,
3069             ...);
3070              
3071             $t->add ($Field1=>$Sub1, ## Add matching recs to sel.
3072             $Field2=>$Sub2,
3073             ...);
3074              
3075             $t->but ($Field1=>$Sub1, ## Add nonmatching recs to sel.
3076             $Field2=>$Sub2,
3077             ...);
3078              
3079             ## Getting useful lists of record numbers...
3080              
3081             $t->all() ## Get "full" sel. (all record #s)
3082             $t->selection() ## Get current selection
3083             $t->selection_inverse() ## Get inverse copy of curr. sel.
3084              
3085             ## Example 1: Refine a selection by narrowing down...
3086              
3087             $t->select_all()
3088             $t->select(Field1 => sub {$_});
3089             $t->select(Field2 => sub {$_});
3090             $t->select(Field3 => sub {$_});
3091              
3092             ## Example 2: Manually refine and set the selection...
3093              
3094             $Sel = [grep {$t->col($Field1)->[$_]} @{$t->all ()}];
3095             $Sel = [grep {$t->col($Field2)->[$_]} @$Sel];
3096             $Sel = [grep {$t->col($Field3)->[$_]} @$Sel];
3097             $t->selection($Sel); ## Set the selection when done.
3098              
3099             ## Example 3: Complex manual search using calculated value
3100            
3101             my $A = $t->col('A');
3102             my $B = $t->col('B');
3103             my $S = [grep
3104             {my $X = $A->[$_] + $B->[$_]; ($X > 100 && $X < 200);}
3105             @{$t->all()}]; ## Or could start with $t->selection().
3106             $t->selection($S); ## Set the selection when done.
3107              
3108             ## Example 4: Refine a selection by building up...
3109              
3110             $t->select_none()
3111             $t->add(Field1 => sub {$_});
3112             $t->add(Field2 => sub {$_});
3113             $t->add(Field3 => sub {$_});
3114              
3115             ## Example 5: Combining the select() methods to build a query...
3116              
3117             $t->select_all()
3118             $t->select(Status => sub {/prime/i });
3119             $t->omit (DueDate => sub {$_ > $Today});
3120             $t->add (Force => sub {$_ });
3121              
3122             select() and its friends omit(), add(), and but(), known collectively
3123             as "the select() methods," all work similarly: they take a series of
3124             one or more pairs indicating matches to be done, where each match is
3125             specified as (FieldName => Subroutine).
3126              
3127             In addition to the field names already present in the table, the
3128             FieldName in any Spec may also be one of these two special
3129             pseudo-fields:
3130              
3131             =over 4
3132              
3133             =item _RecNum
3134              
3135             the record number of the record being compared
3136              
3137             =item _SelNum
3138              
3139             the numerical position of the record being compared within the
3140             previous selection (only usable with select() and omit() since add()
3141             and but() by definition operate on non-selected records).
3142              
3143             =back
3144              
3145             For example:
3146              
3147             ## Match 2nd 100 rec numbers
3148             $t->select(_RecNum => sub {$_ >= 100 && $_ <= 199});
3149              
3150             ## Match 2nd 100 currently selected/sorted items
3151             $t->select(_SelNum => sub {$_ >= 100 && $_ <= 199});
3152              
3153             Be careful when using _SelNum in a search. In the above _SelNum search
3154             example, since the selection itself will be modified by select(), the
3155             items that were formerly selection items 100 - 199 will now be _SelNum
3156             0 - 99 in the new selection.
3157              
3158             The Subroutine is an anonymous grep-style predicate that operates on
3159             $_ and should return true/false to indicate a match with an element of
3160             the field FieldName.
3161              
3162             The order of multiple matches in a single method call is significant
3163             only in that the searches can be faster if the field that will match
3164             the fewest records is listed first.
3165              
3166             A given FieldName may be listed in the specs more than once if it has
3167             multiple search criteria that you prefer to execute as multiple
3168             subroutines (though it would be more efficient on very large tables to
3169             combine their logic into one subroutine joined with "&&").
3170              
3171             Each field match will be applied (with an implied AND joining them) to
3172             determine whether the record itself matches. Then, based on whether
3173             the record itself matches, it will either be added or deleted from the
3174             selection based on which method is being called:
3175              
3176             method... operates on... action....
3177             ------------------------------------------------------------------
3178             select() selected records Keep only recs that DO match
3179             omit() selected records Keep only recs that DO NOT match
3180             add() non-selected recs Add recs that DO match
3181             but() non-selected recs Add recs that DO NOT match
3182              
3183             Here's how to think about what's going on:
3184              
3185             methods... think...
3186             ------------------------------------------------------------------
3187             select() "SELECT things matching this"...
3188             omit() "... then OMIT those matching this."
3189              
3190             select() "SELECT things matching this"...
3191             add() "... and ADD any others matching this."
3192              
3193             select() "SELECT things matching this"...
3194             but() "... and add any others BUT those matching this."
3195              
3196             select() and omit() both NARROW the selection.
3197              
3198             add() and but() both INCREASE the selection.
3199              
3200             IMPORTANT: You DO NOT need to use these select() routines to work with
3201             selections. It may be much easier for you to clarify your logic, or
3202             more efficient to express your search, using a single grep or series
3203             of grep operations as in Examples 2 or 3 above.
3204              
3205             Building the selection manually is required if you want to filter
3206             based on any COMPLEX RELATIONSHIPS BETWEEN FIELDS. For example, if
3207             you want to add two fields and match or reject the record based on the
3208             sum of the fields.
3209              
3210             In Example 3 above, we add the values in fields "A" and "B" and then
3211             match the record only if the SUM is between 100 and 199. By grepping
3212             to produce a subset of @{$t->all()}, you end up with a Selection -- a
3213             list of record numbers you want "selected". Then you call
3214             $t->selection() to put the selection you built into the object.
3215              
3216             If you had instead wanted to narrow an existing selection in the above
3217             example, you would start with $t->selection() (which defaults to
3218             $t->all()) instead of starting with $t->all().
3219              
3220             Each of the select() methods returns $this->selection() as a
3221             convenience.
3222              
3223             =head2 The effects of modifying a sorted selection
3224              
3225             Generally, you should sort AFTER finding, and you should not generally
3226             rely on sort order after doing a find. But in case you want to know,
3227             the following chart explains what happens to the sort order after the
3228             various select() commands are called (at least in the current
3229             implementation, which may change without notice):
3230              
3231             method... effect on an existing sort order...
3232             ------------------------------------------------------------------
3233             select() relative sort order is preserved (stay sorted)
3234             omit() all selected recs restored to "natural" order (unsorted)
3235             add() orig. recs preserved; new recs appended: "natural" order
3236             but() orig. recs preserved; new recs appended: "natural" order
3237              
3238             In other words, you could sort() first and then call select() to
3239             narrow down the selection repeatedly without disrupting the sort
3240             order. However, any of the other methods will disrupt the sort order
3241             and you would need to re-sort. The preservation of order when using
3242             select(), and other sort order effects, are likely but not guaranteed
3243             to be preserved in future implementations.
3244              
3245             =head2 Hints about Boolean logic
3246              
3247             Consider the following example and the alternative below it. You
3248             might initially think these are equivalent, but they're not:
3249              
3250             ## Case 1:
3251              
3252             my $Sel = $t->add(Force => sub {$_ == 1 });
3253             my $Sel = $t->add(Status => sub {$_ eq 'Prime'});
3254              
3255             ## Case 2:
3256              
3257             my $Sel = $t->add(Force => sub {$_ == 1 },
3258             Status => sub {$_ eq 'Prime'});
3259              
3260             Case 1 extends the selection by adding all records where Force == 1,
3261             and then extends it again by adding all additional records where
3262             Status eq 'Prime'.
3263              
3264             Case 2 adds only those records where: Force == 1 AND ALSO, IN THE SAME
3265             RECORD, Status eq 'Prime'.
3266              
3267             One final note about logic. This is not SQL and these select() etc.
3268             routines are not meant to replace the full power of a programming
3269             language.
3270              
3271             If you want full Boolean expressions, use the power of Perl to form
3272             your own arbitrarily complex query using grep as in Example 3 above.
3273              
3274             Writing your own grep is also almost always faster than chaining the
3275             builtin select() methods or using multiple Field / Sub specifications,
3276             so keep that in mind when working with extremely large data sets.
3277              
3278             With tables of only a few thousand records or so, you probably won't
3279             notice the difference in efficiency.
3280              
3281             =cut
3282              
3283             {}; ## Get emacs to indent correctly.
3284              
3285 4     4 0 18 sub select {my $this = shift; return($this->select_internal(!'Add', !'Not', @_))}
  4         15  
3286 5     5 0 16 sub omit {my $this = shift; return($this->select_internal(!'Add', 'Not', @_))}
  5         24  
3287 3     3 0 5 sub add {my $this = shift; return($this->select_internal( 'Add', !'Not', @_))}
  3         11  
3288 1     1 0 3 sub but {my $this = shift; return($this->select_internal( 'Add', 'Not', @_))}
  1         6  
3289              
3290             sub select_internal ## Implements all 4 "select() methods"
3291             {
3292 13     13 0 19 my $this = shift;
3293 13         39 my ($Add, $Not, @Specs) = @_;
3294            
3295             ## In "Add" mode, we only operate on not-yet-selected records.
3296             ## Otherwise, we operate on the current selection.
3297              
3298             ## Either way, start out with all of one or the other.
3299              
3300 13 100       151 my $Start = ($Add ? $this->selection_inverse() : $this->selection());
3301              
3302             ## Then grep repeatedly for each Field/Sub spec we were given, in
3303             ## order. For a record to match, ALL specs must match -- i.e. the
3304             ## record number must make it through the grep gauntlet once for
3305             ## each Field/Sub in the Specs.
3306            
3307 13         60 my $Pseudo = {}; ## hold pseudo-columns _RecNum, _SelNum if requested.
3308            
3309 13         30 my $Matches = $Start;
3310            
3311 13         19 my $i = 0;
3312 13         39 while ($i < (@Specs - 1))
3313             {
3314 13         35 my ($Field, $Sub) = @Specs[$i++, $i++];
3315            
3316 13 50 33     78 next unless ((length($Field)) && (ref($Sub) eq 'CODE'));
3317              
3318             ## Create pseudo-fields _RecNum / _SelNum if needed, but at
3319             ## most once per invocation.
3320              
3321 13 50 0     34 $Pseudo->{_RecNum} ||= $this->all() if ($Field eq '_RecNum');
3322 13 50 0     39 $Pseudo->{_SelNum} ||= $this->selnum_mask() if ($Field eq '_SelNum');
3323            
3324             ## Narrow down $Matches using this Field/Spec, then move on to next.
3325 36 50 33     135 $Matches =
3326             [grep
3327             {
3328             ## Locally bind $_ to value of field in this column / this record.
3329 13         26 local $_ = $ {($this ->{$Field} || ## 98% of time: good field name
  36         134  
3330             $Pseudo->{$Field} || ## 1% of time: _RecNum or _SelNum
3331             ($this ->warn("Bad field name: $Field"),
3332             $this ->col_empty())
3333             )}[$_]; ## look up value in record $_ of column
3334            
3335             ## Call the sub & let it yield the Boolean value.
3336 36         224 &$Sub();
3337             }
3338             @$Matches];
3339             }
3340            
3341             ## IMPLEMENTATION NOTE:
3342            
3343             ## The logic below to support "Not" looks complicated, and indeed
3344             ## it could be made cleaner if we were to process the "Not" logic
3345             ## during the previous step. However, doing that would make the
3346             ## above nested loop(s) above much less efficient because we'd
3347             ## have to move the while() loop inside the grep -- repeating that
3348             ## loop up to several times for each record instead of just a few
3349             ## times total. So the logic below will actually save execution
3350             ## time. Besides, using array-slicing to achieve the selection
3351             ## masking is quite fast.
3352              
3353             ## "Add" means append the matching record numbers to the existing
3354             ## selection, if any.
3355            
3356             ## sub select()
3357 13 100 100     219 if (!$Not && !$Add) ## ... i.e. remove unfound (i.e. keep (only) the ones we found)...
    100 66        
    100 66        
    50 33        
3358             {
3359 4         9 $this->{_Selection} = $Matches; ## sort order is preserved...
3360             }
3361            
3362             ## sub add() ## ... i.e. add the ones we found...
3363             elsif (!$Not && $Add)
3364             {
3365 3         5 push @{$this->selection()}, @$Matches; ## order preserved in first part only
  3         9  
3366             }
3367            
3368             ## ! "Add" means replace the existing selection with those that
3369             ## matched (effectively removing any non-matching ones).
3370              
3371             ## "Not" means select the opposite of the set of records we just
3372             ## matched.
3373              
3374             ## sub omit()
3375             elsif ( $Not && !$Add) ## ... i.e. remove the opposite of the ones we found...
3376             {
3377 5         33 my $Sel = $this->col_empty(); ## Start with empty mask (all entries undef).
3378 5         17 @$Sel[@$Start] = @$Start; ## Mask in those in the original selection.
3379 5         14 @$Sel[@$Matches] = undef; ## Mask out those we found.
3380 5         14 my $NonMatches = [grep {defined} @$Sel]; ## The remaining ones are the non-matches.
  15         34  
3381              
3382 5         17 $this->{_Selection} = $NonMatches; ## The new selection IS the non-matches.
3383              
3384             ## selection order not preserved
3385             }
3386              
3387             ## sub but()
3388             elsif ( $Not && $Add) ## ... i.e. add the opposite of the ones we found...
3389             {
3390 1         5 my $Sel = $this->all(); ## Start with a full selection mask.
3391 1         3 @$Sel[@{$this->selection()}] = undef; ## Mask out those in original selection.
  1         3  
3392 1         3 @$Sel[@$Matches ] = undef; ## Mask out those we found.
3393 1         3 my $NonMatches = [grep {defined} @$Sel]; ## The remaining ones are the non-matches.
  3         8  
3394            
3395 1         2 push @{$this->selection()}, @$NonMatches; ## Add the non-matches to the selection.
  1         3  
3396              
3397             ## order preserved in first part only
3398             }
3399              
3400 13         35 return($this->selection());
3401             }
3402              
3403             sub selnum_mask ## Create mask mapping RecNum -> selected item num or undef if not selected
3404             {
3405 0     0 0 0 my $this = shift;
3406 0         0 my $Mask = $this->col_empty();
3407 0         0 my $Sel = $this->selection();
3408 0         0 @$Mask[@$Sel] = [0..$#$Sel];
3409              
3410 0         0 return($Mask);
3411             }
3412              
3413             =pod
3414              
3415             =head1 SORTING
3416              
3417             ## Sort the current table's _Selection
3418              
3419             $t->sort() ## Use existing/default params
3420             $t->sort([qw(Last First Phone)]) ## Specify _SortOrder (fields)
3421             $t->sort( ## Named-parameter call:
3422             _SortOrder => [...], ## override sort-related params.
3423             _Selection => [...], ## (See param lists above).
3424             _SortSpecs => {...},
3425             _SRoutines => {...},
3426             _DefaultSortType=>'Integer',
3427             _DefaultSortDirection=>-1,
3428             );
3429              
3430             The sort() method modifies the _Selection (creating one with all
3431             records if it was missing, undef, or not supplied by caller) so that
3432             the record numbers listed there are sorted according to the criteria
3433             implied by _SortOrder, _SortSpecs, _SRoutines, etc.
3434              
3435             For example, before sorting, a table's "natural" order might be:
3436              
3437             Rec# First Last Age State
3438             0 Chris Zack 43 CA
3439             1 Marco Bart 22 NV
3440             2 Pearl Muth 15 HI
3441              
3442             ... and its selection() method would yield: [0, 1, 2] -- which is a
3443             list of all the records, in order.
3444              
3445             After calling $t->sort([Last]), selection() would yield [1, 2, 0]. So
3446             displaying the table in "selection" order would yield:
3447              
3448             Rec# First Last Age State
3449             1 Marco Bart 22 NV
3450             2 Pearl Muth 15 HI
3451             0 Chris Zack 43 CA
3452              
3453             IMPORTANT: sorting does not alter any data in the table. It merely
3454             alters the _Selection parameter (which you can then get and set using
3455             the selection() methods described above).
3456              
3457             If you want to permanently alter the table's data in memory so that
3458             the new sorted order becomes the "natural" order, you can use the
3459             cull() method to modify the original object, the snapshot() method to
3460             make a new object, or use the write() method to write the data to disk
3461             in selected/sorted order and then read() it back again.
3462              
3463             =head2 Using the Named-parameter calling convention with sort()
3464              
3465             You may specify any combination of the parameters listed above when
3466             calling sort(). Any you specify will be used IN PLACE OF the
3467             corresponding parameters already found in the object.
3468              
3469             If you specify _Selection using the named-parameter calling, the
3470             sort() method reserves the right to "own" the list you provide, and
3471             use it as the object's new _Selection, possibly discarding the
3472             previous _Selection, if any and modifying the one you provided. So
3473             don't make any assumptions about ownership of that list object after
3474             calling sort(). Luckily, you will rarely need to provide _Selection
3475             explicitly since generally you'll want to be sorting the selection()
3476             already inherent in the object.
3477              
3478             sort() returns the _Selection list owned by the object (the same list
3479             that would be returned if you called the selection() method
3480             immediately after calling sort()).
3481              
3482             See the next sections for complete descriptions of _SortOrder and
3483             other sorting parameters.
3484              
3485             =cut
3486              
3487             {}; ## Get emacs to indent correctly.
3488              
3489             sub sort
3490             {
3491 10     10 1 36 my $this = shift;
3492 10 100       49 my $Params = (@_ == 1 ? {_SortOrder => $_[0]} : {@_});
3493              
3494 60         120 my($SortOrder, $Selection, $SortSpecs, $SRoutines, $DefaultSortType, $DefaultSortDirection)
3495 10         25 = map {$this->getparam($Params, $_)}
3496             qw(_SortOrder _Selection _SortSpecs _SRoutines _DefaultSortType _DefaultSortDirection);
3497            
3498             ## Validate / rectify all parameters...
3499              
3500             ## Default sort order is Record Number
3501              
3502 10 50 33     70 $SortOrder = [qw(_RecNum)] unless ((ref($SortOrder) eq 'ARRAY') && @$SortOrder);
3503              
3504             ## Note if we're going to sort on _RecNum ( requires extra work).
3505              
3506 10         20 my $NeedRecNum = grep {$_ eq '_RecNum'} @$SortOrder;
  14         38  
3507              
3508             ## Default list of record numbers is all of them.
3509              
3510 10 100       39 $Selection = $this->selection() unless (ref($Selection) eq 'ARRAY');
3511 10         34 $Selection = $this->selection_validate_internal($Selection);
3512              
3513             ## Our private copy of SortSpecs includes a spec for _RecNum
3514              
3515 10 50       38 $SortSpecs = {} unless (ref($SortSpecs) eq 'HASH');
3516 10         27 $SortSpecs = {%$SortSpecs};
3517 10 50 0     27 $SortSpecs ->{_RecNum} ||= {SortType => '_RecNum', SortDirection => 1} if $NeedRecNum;
3518              
3519             ## Our private copy of SRoutines also has the builtin entries
3520             ## added in (including one for _RecNum)
3521              
3522 10 50       30 $SRoutines = {} unless (ref($SRoutines) eq 'HASH');
3523 10         17 $SRoutines = {%$SRoutines, %{$this->sortroutines_builtin()}};
  10         41  
3524              
3525             ## Ensure that DefaultSortType has a reasonable value for which we
3526             ## have a sort routine.
3527            
3528 10 50 33     84 $DefaultSortType = 'String' unless (length($DefaultSortType) &&
3529             exists($SRoutines->{$DefaultSortType}));
3530            
3531             ## Ensure that DefaultSortDirection has a legal value (1 or -1;
3532             ## undef/0 will be treated as -1 (descending))
3533            
3534 10   50     36 $DefaultSortDirection = (max(min(int($DefaultSortDirection), 1), -1) || -1);
3535            
3536             ## Make some optimized lists of things to speed sorting.
3537              
3538             ## Get a hash of all data columns in $this plus a temporary one
3539             ## for _RecNum if needed.
3540 10 50       19 my $Cols = {%{$this->cols_hash($this->fieldlist_all())},
  10         32  
3541             ($NeedRecNum ? (_RecNum => $this->all()) : ())};
3542            
3543             ## Get a list mapping field numbers in $SortOrder to data columns
3544             ## Get a list mapping field numbers in $SortOrder to sort directions
3545             ## Get a list mapping field numbers in $SortOrder to sort types
3546             ## Get a list mapping field numbers in $SortOrder to sort routines
3547             ## Get a list of the field numbers in $SortOrder
3548              
3549 10 50       36 my $SortCols = [map { $Cols->{$_} || $this->col($_) } @$SortOrder];
  14         60  
3550 10 100       19 my $SortDirs = [map {$ {$SortSpecs->{$_}||{}}{SortDirection} || $DefaultSortDirection} @$SortOrder];
  14 100       18  
  14         132  
3551 10 100       21 my $SortTypes = [map {$ {$SortSpecs->{$_}||{}}{SortType } || $DefaultSortType } @$SortOrder];
  14 100       18  
  14         138  
3552 10 50 33 0   213 my $SortSubs = [map { $SRoutines->{$_} || $SRoutines->{'String'} || sub {0} } @$SortTypes];
  14         61  
  0         0  
3553 10         29 my $FieldNums = [0 .. $#$SortOrder];
3554              
3555             ## Construct a sort subroutine that sorts record numbers by
3556             ## examining values in given fields in the table, in the order
3557             ## specified in $SortOrder.
3558              
3559             ## If a given field's sort routine produces a zero $CmpVal, it
3560             ## means that the values are considered equal, and so to
3561             ## disambiguate, we keep trying the next fields in the sort order,
3562             ## until we've found one that compares non-zero or exhausted all
3563             ## the fields. If we get through all the specified sort fields
3564             ## and still get zeroes, the values must be equal in all the
3565             ## fields, and so the records are considered equal, so return 0.
3566              
3567 10         16 my $ProgCount;
3568             my $ShowedProgress;
3569 58         67 $Selection =
3570             [sort
3571             {
3572 10         31 my $CmpVal;
3573 58         102 foreach (@$FieldNums)
3574             {
3575             ## $a and $b are record numbers to be compared.
3576             ## $_ is the number of a field in the above lists.
3577            
3578 62         119 $CmpVal = (&{ $SortSubs->[$_] } ## Call the sort routine for field $_ with...
  62         132  
3579             (\ $SortCols->[$_]->[$a], ## 1st arg: ref to value in field $_, record $a
3580             \ $SortCols->[$_]->[$b]) ## 2nd arg: ref to value in field $_, record $b.
3581             * $SortDirs->[$_] ); ## Then invert cmp value if descending (-1)
3582            
3583             # print "($_, $SortCols->[$_]->[$a], $SortCols->[$_]->[$b]) ==> $CmpVal\n";
3584            
3585 62 100       158 last if $CmpVal; ## Keep going if $CmpVal == 0 (same)
3586             }
3587            
3588             ## Maybe show timed progress (only after 2 seconds have elapsed)
3589 58 100       167 my $Did = $this->progress_timed("Sorting", $ProgCount, undef, undef, 1)
3590             if ((($ProgCount++) % 200) == 0);
3591 58   33     195 $ShowedProgress ||= $Did;
3592              
3593 58         97 $CmpVal;
3594             }
3595             @$Selection];
3596              
3597             ## If no progress shown yet (sort took less than 2 seconds or 200
3598             ## operations), show a message now.
3599              
3600 10 50       44 $this->progress("Sorted.") unless $ShowedProgress;
3601            
3602             ## Replace any existing selection with the new, sorted, one.
3603 10         23 $this->{_Selection} = $Selection;
3604              
3605 10         66 done:
3606             return($Selection);
3607             }
3608              
3609             =pod
3610              
3611             =head1 SORT ORDER
3612              
3613             ## Getting / Setting table's default _SortOrder
3614              
3615             $t->sortorder() ## Get sortorder (default is [])
3616            
3617             my $Order = [qw(Last First State Zip)];
3618             $t->sortorder($Order) ## Set sortorder (use [] for none)
3619              
3620             $t->sortorder_default() ## Get the object's default sort order ([])
3621              
3622             The sort order is an optional list of field names on which to sort and
3623             sub-sort the data when sorting is requested. The field names must be
3624             the names of actual columns in the table. The names in the sort order
3625             do not necessarily need to coincide with the custom fieldlist if any.
3626              
3627             There is one special value that can be included: _RecNum. This sorts
3628             on the imaginary "record number" field. So for example, you could
3629             specify a sort order this way:
3630              
3631             [qw(Last First _RecNum)]
3632              
3633             (There is no point in putting _RecNum anywhere except at the end of
3634             the sort order because no two records will ever have the same record
3635             number so there will be no further need to disambiguate by referring
3636             to additional fields.)
3637              
3638             Sorting by _RecNum adds a bit of computational overhead because sort()
3639             first builds a record number vector for use in sorting, so for very
3640             large tables, don't do it unless you really need it.
3641              
3642             A sort order can be specified each time the object is sorted (see the
3643             sort() method for details).
3644              
3645             Or, the object's sort order can be set once, and then sort() will use
3646             that sort order when no other sort order is specified.
3647              
3648             If sorting is done when there is no sort order present in the object
3649             or specifed for the sort() method, the selection is sorted by record
3650             number (i.e. it is "unsorted" or returned to its "natural" order).
3651              
3652             In order words, a sortorder that is undef or [] is considered the same
3653             as: [qw(_RecNum)]. This is sometimes called "unsorting".
3654              
3655             In order to decide how values in each field should be compared, sort()
3656             is informed by SortSpecs (specifying SortType and SortDirection for
3657             each field) and by SortRoutines, each of which may similarly either be
3658             pre-set for the object or specified when calling sort() -- see below
3659             for further details.
3660              
3661             =cut
3662              
3663             {}; ## Get emacs to indent correctly.
3664              
3665             sub sortorder
3666             {
3667 8     8 0 15 my $this = shift;
3668 8         15 my ($SortOrder) = @_;
3669              
3670 8   100     35 my $Valid = ((ref($SortOrder) eq 'ARRAY') && (@$SortOrder > 0));
3671              
3672             ## Set if specified.
3673 8 100 100     41 if (defined($SortOrder) && $Valid)
    100          
3674             {
3675 2         6 $this->{_SortOrder} = $SortOrder;
3676             }
3677             elsif (defined($SortOrder))
3678             {
3679 1         3 $this->{_SortOrder} = undef; ## Store undef instead of []
3680             }
3681            
3682             ## Get and return.
3683 8   66     32 $SortOrder = $this->{_SortOrder} || $this->sortorder_default();
3684            
3685 8         33 return($SortOrder);
3686             }
3687              
3688             sub sortorder_default
3689             {
3690 5     5 0 7 my $this = shift;
3691              
3692 5         10 my $SortOrder = [];
3693              
3694 5         20 return($SortOrder);
3695             }
3696              
3697             sub sortorder_check
3698             {
3699 80     80 0 142 my $this = shift;
3700 80         267 my $FieldsHash = $this->fieldlist_hash();
3701              
3702             ## Remove any bogus field names from the sort order, if any.
3703              
3704 4         15 $this->{_SortOrder} = [grep {exists($FieldsHash->{$_})}
  1         4  
3705 80 100       404 @{$this->{_SortOrder}}] if defined $this->{_SortOrder};
3706             }
3707              
3708             =pod
3709              
3710             =head1 SORT SPECIFICATIONS
3711              
3712             ## Getting / Setting table's default _SortSpecs
3713              
3714             $t->sortspecs() ## Get sortspecs (default is {} -- none)
3715              
3716             my $Specs = {Last => {SortType => 'String' ,
3717             SortDirection => -1 },
3718             Zip => {SortType => 'Integer' }};
3719              
3720             $t->sortspecs($Specs) ## Set sortspecs
3721              
3722             $t->sortspecs_default() ## Get the object's default sort specs ({})
3723              
3724             The sortspecs are an optional hash mapping field names to "sort
3725             specifications".
3726              
3727             Each field's sort specification may specify zero or more of these
3728             values:
3729              
3730             =over 4
3731              
3732             =item SortType
3733              
3734             the sort type to use (For example: String, Integer)
3735              
3736             =item SortDirection
3737              
3738             the sort direction (1: ascending, -1: descending)
3739              
3740             =back
3741              
3742             Sortspecs can be specified when calling the sort() routine, or, a set
3743             of specs can be placed beforehand into the object itself and those
3744             will be used by sort() if no other specs are given.
3745              
3746             For any field listed in the sort order at the time of sorting, but
3747             lacking a sort spec or any component of the sort spec, the object's
3748             default sort type (see sorttype_default()) and default sort direction
3749             (see sortdirection_default()) will be used.
3750              
3751             In addition to getting/setting sort specs as a whole, they may be
3752             gotten/set on a per-field basis, too:
3753              
3754             sortspec($Field) ## Get sortspec for $Field or default spec
3755              
3756             my $Spec = {SortType => 'Integer', SortDirection => -1};
3757             sortspec('Zip', $Spec) ## Set sortspec
3758              
3759             sortspec_default() ## Get a sortspec with all defaults filled in
3760              
3761             For any $Field not found in the object's sortspecs, sortspec($Field)
3762             returns the same thing returned by sortspec_default(), which is a
3763             sortspec filled in with the default sort type and sort direction (see
3764             below).
3765              
3766             For a list of available built-in SortTypes, and instructions for how
3767             to define your own, see SORT ROUTINES, below.
3768              
3769             =cut
3770              
3771             {}; ## Get emacs to indent correctly.
3772              
3773             sub sortspecs
3774             {
3775 46     46 0 78 my $this = shift;
3776 46         60 my ($SortSpecs) = @_;
3777              
3778             ## Set if specified.
3779 46 100       105 $this->{_SortSpecs} = $SortSpecs if $SortSpecs;
3780            
3781             ## Get and return
3782 46   33     110 $SortSpecs = $this->{_SortSpecs} || $this->sortspecs_default();
3783              
3784 46         100 return($SortSpecs);
3785             }
3786              
3787             sub sortspecs_default
3788             {
3789 1     1 0 3 my $this = shift;
3790              
3791 1         3 my $SortSpecs = {};
3792              
3793 1         4 return($SortSpecs);
3794             }
3795              
3796             sub sortspec
3797             {
3798 4     4 0 9 my $this = shift;
3799 4         8 my ($FieldName, $SortSpec) = @_;
3800              
3801             ## Set if specified.
3802 4 100       18 $this->{_SortSpecs}->{$FieldName} = $SortSpec if $SortSpec;
3803            
3804             ## Get and return.
3805 4   33     29 my $SortSpec = ($this->{_SortSpecs}->{$FieldName} ||
3806             $this->sortspec_default($FieldName));
3807            
3808             ## Provide defaults for needed fields of sort spec.
3809 4   33     12 $SortSpec->{SortType} ||= $this->sorttype_default ();
3810 4 50       13 $SortSpec->{SortDirection} = $this->sortdirection_default()
3811             unless defined($SortSpec->{SortDirection});
3812            
3813 4         28 return($SortSpec);
3814             }
3815              
3816             sub sortspec_default
3817             {
3818 2     2 0 4 my $this = shift;
3819 2         6 my ($FieldName) = @_;
3820              
3821             ## Default sortspec for a field
3822              
3823 2         7 my $SortType = $this->sorttype_default ();
3824 2         7 my $SortDir = $this->sortdirection_default();
3825            
3826 2         6 my $Spec = {SortType => $SortType, SortDirection => $SortDir};
3827              
3828 2         7 return($Spec);
3829             }
3830              
3831              
3832             =pod
3833              
3834             =head1 DEFAULT SORT DIRECTION
3835              
3836             ## Getting / Setting table's default _DefaultSortDirection
3837              
3838             $t->sortdirection_default() ## Get default sort direction
3839              
3840             $t->sortdirection_default(-1) ## Set default sort direction
3841              
3842             Each element in a sort specification can optionally specify a sort
3843             direction.
3844              
3845             1 = ascending, -1 = descending
3846              
3847             For any sort specs that don't specify a direction, the object's
3848             default sort direction will be used. Use these routines to get/set
3849             the default sort direction.
3850              
3851             =cut
3852              
3853             {}; ## Get emacs to indent correctly.
3854              
3855             sub sortdirection_default
3856             {
3857 6     6 0 8 my $this = shift;
3858 6         19 my ($DefaultSortDir) = @_;
3859            
3860 6 100       16 if (defined($DefaultSortDir))
3861             {
3862             ## Set if specified. Force to 1 or -1. Treat 0 as -1 (descending).
3863 1   50     6 $this->{_DefaultSortDirection} =
3864             (max(min(int($DefaultSortDir), 1), -1) || -1);
3865             }
3866            
3867             ## Get and return. If not present, then 1 (ascending) is the default.
3868 6 50       20 my $SortDir = (defined($this->{_DefaultSortDirection}) ?
3869             $this->{_DefaultSortDirection} :
3870             1);
3871 6         22 return($SortDir);
3872             }
3873              
3874             =pod
3875              
3876             =head1 DEFAULT SORT TYPE
3877              
3878             ## Getting / Setting table's default _DefaultSortType
3879              
3880             $t->sorttype_default() ## Get default sort type
3881              
3882             $t->sorttype_default('Integer') ## Set default sort type
3883              
3884             Each element in a sort specification can optionally specify a sort
3885             type. The sort type is a string (like 'String' or 'Integer' or
3886             'Date') that selects from one or more sort routines. (See Sort
3887             Routines, below).
3888              
3889             There are several sort routines built into the CTable object, and you
3890             can also add as many of your own routines (and hence Sort Types) as
3891             you like or need. This allows for very flexible sorting.
3892              
3893             For any sort specs that don't specify a type, the object's default
3894             sort type will be used. Use these routines to get/set the default
3895             sort type, which initially is 'String'.
3896              
3897             =cut
3898              
3899             {}; ## Get emacs to indent correctly.
3900              
3901             sub sorttype_default
3902             {
3903 6     6 0 10 my $this = shift;
3904 6         10 my ($DefaultSortType) = @_;
3905            
3906 6 100       15 if (defined($DefaultSortType))
3907             {
3908             ## Set if specified.
3909 1         4 $this->{_DefaultSortType} = "$DefaultSortType";
3910             }
3911              
3912             ## Get and return. If not present, then 'String' is the default.
3913 6 50       21 my $SortDir = (defined($this->{_DefaultSortType}) ?
3914             $this->{_DefaultSortType} :
3915             'String');
3916 6         20 return($SortDir);
3917             }
3918              
3919             =pod
3920              
3921             =head1 SORT ROUTINES: BUILTIN AND CUSTOM
3922              
3923             ## Getting / Setting table's custom sort routines (_SRoutines)
3924              
3925             $t->sortroutine($Type) ## Get a sort routine for $Type
3926              
3927             $t->sortroutine($Type, $Sub) ## Set a sort routine for $Type
3928              
3929             $t->sortroutine($Type, 0 ) ## Remove sort routine for $Type
3930             $t->sortroutine_set($Type)
3931            
3932             $t->sortroutines() ## Get hash of any sort routines
3933            
3934             $t->sortroutines_builtin() ## Get hash of builtin routines
3935              
3936             Each SortType in the sortspecs should have a corresponding sort
3937             routine (any unrecognized type will be sorted using the 'String' sort
3938             routine).
3939              
3940             The sort() command looks up the appropriate sort routine for each
3941             field it is asked to sort, based on the SortType for that field, as
3942             given in the sortspecs, as described above.
3943              
3944             Builtin sort types, recognized and implemented natively by this
3945             module, are:
3946              
3947             String ## Fastest case-sensitive compare (data is string)
3948             Text ## Insensitive compare (lowercases, then compares)
3949             Number ## Number works for floats or integers
3950             Integer ## Faster than floats. Uses "use integer"
3951             DateSecs ## Same as integer; assumes date in seconds
3952             Boolean ## Treats item as a Perlish boolean (empty/undef = false)
3953              
3954             The above sort types are always recognized. Additional sort types may
3955             be added by subclasses (and could shadow the builtin implementations
3956             of the above types if desired) and/or may be added to instances (and
3957             again could shadow the above implementations), and/or may be specified
3958             when the sort() method is called, once again optionally shadowing any
3959             deeper definitions.
3960              
3961             =head1 CUSTOM SORT ROUTINE INTERFACE
3962              
3963             A custom sort routine is called with two arguments, each of which is a
3964             pointer to a scalar. The sort routine should dereference each pointer
3965             and compare the resulting scalars, returning -1 if the first scalar is
3966             smaller than the second, 1 if it is larger, and 0 if they are
3967             considered equal.
3968              
3969             For example, here is the built-in comparison routine for 'String':
3970              
3971             sub { $ {$_[0]} cmp $ {$_[1]} }
3972              
3973             NOTE: Your custom sort routines should NOT compare $a and $b as with
3974             Perl's builtin sort() command.
3975              
3976             Examine the variable $BuiltinSortRoutines in this module's
3977             implementation to see some additional examples of sort routines.
3978              
3979             Internally, sort() calls the sortroutines() method to get a hash that
3980             should consist of all builtin sort routines with the per-object sort
3981             routines, if any, overlaid. sortroutines() in turn calls the
3982             sortroutines_builtin() method to get a copy of the hash of all builtin
3983             sort routines for the object. (So a subclass could easily add
3984             additional SortTypes or reimplement them by just overriding
3985             sortroutines_builtin() and adding its own additional routines to the
3986             resulting hash.)
3987              
3988             sortroutine() may be called to get or set a custom sort routine for a
3989             given type in the given object.
3990              
3991             There is no way to directly manipulate the builtin sort routines for
3992             the entire class. To accomplish that, you should define and use a
3993             subclass that extends sortroutines_builtin() to add its own routines.
3994              
3995             For example:
3996              
3997             BEGIN
3998             { ## A subclass of Data::CTable with an INetAddr SortType.
3999             package IATable; use vars qw(@ISA); @ISA = qw(Data::CTable);
4000              
4001             sub sortroutines_builtin
4002             {
4003             my $this = shift;
4004             my $CustomRoutines =
4005             {INetAddr =>
4006             sub {use integer; ip2int($ {$_[0]}) <=> ip2int($ {$_[1]})}};
4007             my $AllRoutines =
4008             {%{$this->SUPER::sortroutines_builtin()} %$CustomRoutines};
4009             return($AllRoutines);
4010             };
4011              
4012             sub ip2int {.....} $# Could memoize & inline for efficiency
4013             }
4014              
4015             my $Table = IATable::new(......);
4016              
4017             The IATable class would then have all the same features of
4018             Data::CTable but would then also support the INetAddr SortType.
4019              
4020             =cut
4021              
4022             {}; ## Get emacs to indent correctly.
4023              
4024             BEGIN
4025             {
4026             my $BuiltinSortRoutines =
4027             {(
4028 64         72 String => sub { $ {$_[0]} cmp $ {$_[1]} },
  64         91  
  64         137  
4029 5         6 Text => sub { lc($ {$_[0]}) cmp lc($ {$_[1]})},
  5         9  
  5         19  
4030 8         9 Number => sub { $ {$_[0]} <=> $ {$_[1]} },
  8         13  
  8         34  
4031 29     29   38951 Integer => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
  29         350  
  29         171  
  8         10  
  8         12  
  8         23  
4032 29     29   1839 DateSecs => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
  29         672  
  29         311  
  4         6  
  4         7  
  4         17  
4033 29     29   1676 _RecNum => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
  29         71  
  29         117  
  4         6  
  4         8  
  4         15  
4034 4         6 Boolean => sub { !!$ {$_[0]} <=> !!$ {$_[1]} },
  4         5  
  4         16  
4035 29     29   84162 )};
4036            
4037             sub sortroutines_builtin ## Class or instance method
4038             {
4039 53     53 0 347 return({%$BuiltinSortRoutines}); ## Copy of above private hash.
4040             }
4041             }
4042              
4043             sub sortroutine
4044             {
4045 38     38 0 56 my $this = shift;
4046 38         59 my ($Type, $Routine) = @_;
4047              
4048 38 100       76 if (defined($Routine))
4049             {
4050             ## Set if $Routine provided.
4051 3         8 $this->sortroutine_set($Type, $Routine);
4052             }
4053              
4054             ## Get and return.
4055 38         65 $Routine = $this->sortroutine_get($Type);
4056              
4057 38         108 return($Routine);
4058             }
4059              
4060             sub sortroutine_get
4061             {
4062 38     38 0 44 my $this = shift;
4063 38         45 my ($Type) = @_;
4064 38         70 my $Routines = $this->sortroutines();
4065 38   66     89 my $Routine = $Routines->{$Type} || $Routines->{'String'};
4066            
4067 38         91 return($Routine);
4068             }
4069              
4070             sub sortroutine_set
4071             {
4072 4     4 0 5 my $this = shift;
4073 4         6 my ($Type, $Routine) = @_;
4074              
4075 4         7 my $Valid = (ref($Routine) eq 'CODE');
4076              
4077 4 100       8 if ($Valid)
4078             {
4079             ## Add / replace if a routine was supplied.
4080 2   50     3 $ {$this->{_SRoutines} ||= {}}{$Type} = $Routine;
  2         8  
4081             }
4082             else
4083             {
4084             ## Otherwise delete.
4085 2   50     3 $Routine = delete $ {$this->{_SRoutines} ||= {}}{$Type};
  2         9  
4086             }
4087            
4088 4         11 return($Routine);
4089             }
4090              
4091             sub sortroutines
4092             {
4093 42     42 0 54 my $this = shift;
4094            
4095 42 50       67 my $Routines = {%{$this->sortroutines_builtin()}, ## First builtin ones
  42         186  
4096 42         47 %{$this->{_SRoutines} || {}}}; ## Shadow with object's own
4097            
4098 42         189 return($Routines);
4099             }
4100              
4101             =pod
4102              
4103             =head1 FREEZING SELECTION & FIELD LIST
4104              
4105             ## Freeze data layout: re-order columns; omit unused fields
4106            
4107             $t->cull(...params...) ## Rebuild table in order
4108             my $s = $t->snapshot(...params...) ## Make copy as if rebuilt
4109              
4110             The cull() method re-writes all data in the table to be in the order
4111             indicated in _Selection (if present). This will cause any records not
4112             listed in _Selection to be omitted (unless selection is null in which
4113             case all records are retained in original order).
4114              
4115             In addition, if there is a custom field list present, it removes any
4116             fields NOT mentioned in _FieldList.
4117              
4118             The snapshot() method is similar, except instead of modifying the
4119             object itself, it makes a copy of the object that's equivalent to what
4120             cull() would have created, and returns that new object, leaving the
4121             original untouched. (All data structures are deep-copied from the old
4122             object to the new one, leaving the objects totally independent.)
4123              
4124             cull() and snapshot() both take two optional named parameters:
4125             _FieldList and/or _Selection to be used in place of the corresponding
4126             parameters found in the object.
4127              
4128             If only a single argument is supplied, it is assumed to be _Selection.
4129              
4130             =cut
4131              
4132             sub cull
4133             {
4134 1     1 0 4 my $this = shift;
4135 1 50       14 my $Params = (@_ == 1 ? {_Selection => $_[0]} : {@_});
4136            
4137 1         4 my($Selection, $FieldList) = map {$this->getparam($Params, $_)}
  2         5  
4138             qw(_Selection _FieldList);
4139            
4140 1   33     6 $FieldList ||= $this->{_FieldList};
4141 1   33     4 $Selection ||= $this->{_Selection};
4142              
4143             ## First cull any fields/columns not mentioned in _FieldList, if any.
4144 1 50       4 if ($FieldList)
4145             {
4146 1         2 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  1         5  
4147 1         3 my $DeadFields = [grep {!exists($FieldHash->{$_})} @{$this->fieldlist_all()}];
  4         17  
  1         5  
4148 1         4 delete @$this{@$DeadFields};
4149              
4150             ## Set the (possibly) new field list in the object.
4151 1         6 $this->fieldlist_set($FieldList);
4152             }
4153              
4154             ## Then cull / rearrange all the columns
4155 1 50       5 if ($Selection)
4156             {
4157             ## Temporarily set the selection() to be the one we may have been given...
4158 1         5 $this->selection($Selection);
4159            
4160             ## Get the de-facto field list if we don't already have it explicitly
4161 1   33     4 $FieldList ||= $this->fieldlist();
4162            
4163             ## Rearrange each column
4164 1         4 foreach my $FieldName (@$FieldList)
4165             {
4166 3         8 $this->{$FieldName} = $this->sel($FieldName);
4167             }
4168              
4169             ## Remove the _Selection since it is no longer valid.
4170 1         5 $this->selection_delete();
4171             }
4172             }
4173              
4174             sub snapshot
4175             {
4176 50     50 0 190 my $this = shift;
4177 50 50       588 my $Params = (@_ == 1 ? {_Selection => $_[0]} : {@_});
4178            
4179 50         116 my($Selection, $FieldList) = map {$this->getparam($Params, $_)}
  100         366  
4180             qw(_Selection _FieldList);
4181            
4182 50   33     266 $FieldList ||= $this->{_FieldList};
4183 50   66     294 $Selection ||= $this->{_Selection};
4184              
4185             ## First make a shallow copy of $this
4186 50         1091 my $copy = {%$this};
4187              
4188             ## Then delete any (references to) data columns owned by $this...
4189 50         162 delete @$copy{@{$this->fieldlist_all()}};
  50         177  
4190              
4191             ## Then deep-copy all other parameters from $this...
4192 50         5699 $copy = dclone($copy);
4193            
4194             ## Then new/bless/initialize the copy into the same class as $this...
4195 50         519 $copy = ref($this)->new($copy);
4196              
4197             ## Temporarily override selection if necessary.
4198 50         256 my $OldSel = $this->{_Selection};
4199 50         207 $this->selection($Selection);
4200              
4201             ## Then insert all the rearranged columns into $copy...
4202 50         75 @$copy{@$FieldList} = @{$this->sels($FieldList)};
  50         189  
4203            
4204             ## Restore old selection, if any.
4205 50         187 $this->selection_set($OldSel);
4206              
4207             ## Remove the selection in copy.
4208 50         110 delete $copy->{_Selection};
4209              
4210             ## Set copy's fieldlist to a copy of the one we used.
4211 50         183 $copy->{_FieldList} = [@$FieldList];
4212              
4213 50         243 return($copy);
4214             }
4215              
4216             =pod
4217              
4218             =head1 LINE ENDINGS
4219              
4220             ## Get current value
4221              
4222             $t->lineending() ## Get actual setting: string or symbol
4223             $t->lineending_symbol() ## Get setting's symbolic value if possible
4224             $t->lineending_string() ## Get setting's string value if possible
4225              
4226             ## Set value
4227              
4228             $t->lineending($Ending) ## Will be converted internally to symbol
4229              
4230             ## Convert a value to symbol or string form
4231              
4232             $t->lineending_symbol($L) ## Convert string form to symbolic form
4233             $t->lineending_string($L) ## Convert symbol form to string form
4234              
4235             ## Get internal conversion hash tables
4236              
4237             $t->lineending_symbols() ## Hash ref mapping known strings to symbols
4238             $t->lineending_strings() ## Hash ref mapping known symbols to strings
4239              
4240             Use these accessor functions to get/set the _LineEnding parameter.
4241              
4242             You can set the parameter in either string or symbol form as you wish.
4243             You can get it in its raw, as-stored, form, or, you can get it in
4244             string form or symbol form as desired.
4245              
4246             Finally, some utility conversion calls allows you to convert a string
4247             you have on hand to a symbolic form. For example:
4248              
4249             $L = "\x0D";
4250             print ("This file uses " . $t->lineending_symbol($L) . " endings.");
4251              
4252             This would print:
4253            
4254             This file uses mac endings.
4255              
4256             =cut
4257              
4258             {}; ## Get emacs to indent correctly.
4259              
4260             BEGIN
4261             {
4262             ## Map any recognized _LineEnding value to its actual string
4263 29     29   284 my $LineEnding_Strings =
4264             {(
4265             dos => "\x0D\x0A",
4266             mac => "\x0D",
4267             unix => "\x0A",
4268             "\x0D\x0A" => "\x0D\x0A",
4269             "\x0D" => "\x0D",
4270             "\x0A" => "\x0A",
4271             )};
4272            
4273             ## Map any recognized _LineEnding value to its logical form
4274 29         48723 my $LineEnding_Symbols =
4275             {(
4276             "\x0D\x0A" => "dos",
4277             "\x0D" => "mac",
4278             "\x0A" => "unix",
4279             dos => "dos",
4280             mac => "mac",
4281             unix => "unix",
4282             )};
4283            
4284             sub lineending_strings ## Class or instance method
4285             {
4286 33     33 0 2828 return({%$LineEnding_Strings}); ## Copy of above private hash.
4287             }
4288              
4289             sub lineending_symbols ## Class or instance method
4290             {
4291 25     25 0 7844 return({%$LineEnding_Symbols}); ## Copy of above private hash.
4292             }
4293             }
4294              
4295             sub lineending()
4296             {
4297 3     3 0 7 my $this = shift;
4298 3         5 my ($LineEnding) = @_;
4299            
4300             ## Set if specified. Try to convert to symbolic form if possible.
4301 3 50 0     11 $this->{_LineEnding} = $this->lineending_symbol($LineEnding) || $LineEnding if $LineEnding;
4302              
4303             ## Otherwise / either case, get whatever value we have and return it....
4304 3         5 my $LineEnding = $this->{_LineEnding};
4305              
4306 3         14 return($LineEnding);
4307             }
4308              
4309             sub lineending_symbol
4310             {
4311 22     22 0 34 my $this = shift;
4312 22         32 my ($LineEnding) = @_;
4313              
4314 22   66     69 $LineEnding ||= $this->{_LineEnding};
4315              
4316 22   33     27 return($ {$this->lineending_symbols()}{$LineEnding} || $LineEnding);
4317             }
4318              
4319              
4320             sub lineending_string
4321             {
4322 30     30 0 50 my $this = shift;
4323 30         54 my ($LineEnding) = @_;
4324              
4325 30   100     140 $LineEnding ||= $this->{_LineEnding};
4326              
4327 30   66     38 return($ {$this->lineending_strings()}{$LineEnding} || $LineEnding);
4328             }
4329              
4330             =pod
4331              
4332             =head1 AUTOMATIC CACHEING
4333              
4334             By default, Data::CTable makes cached versions of files it reads so it
4335             can read them much more quickly the next time. Optionally, it can
4336             also cache any file it writes for quicker re-reading later.
4337              
4338             On Unix systems, cache files are always created with 0666
4339             (world-write) permissions for easy cleanup.
4340              
4341             When reading files, Data::CTable checks the _CacheOnRead parameter.
4342             If that parameter is true, which it is by default, the module tries to
4343             find an up-to-date cache file to read instead of the original.
4344             Reading a cache file can be 10x faster than reading and parsing the
4345             original text file.
4346              
4347             In order to look for the cache file, it must first calculate the path
4348             where the cache file should be located, based on the _FileName of the
4349             file to be read.
4350              
4351             The path of the cache file is calculated as follows:
4352              
4353             If the _CacheSubDir parameter is a RELATIVE PATH, then it is appended
4354             to the directory component of _FileName to arrive at the directory to
4355             use to store the cache file. If it is an ABSOLUTE PATH, then
4356             _CacheSubDir is used by itself. (The trailing path separator is
4357             optional and an appropriate one will be added by Data::CTable if it is
4358             missing.)
4359              
4360             The file name of the cache file is calculated as follows:
4361              
4362             If the _CacheExtension parameter is specified, it is appended to the
4363             base file name component from the _FileName parameter. If you want
4364             the cached file name to be the same as the name of the original file,
4365             you can set _CacheExtension to "", which is not recommended.
4366              
4367             Then, the cache path and cache file name are joined to arrive at the
4368             name of the cache file. If both _CacheSubDir and _CacheExtension were
4369             empty, then the cache file path will be the same as the _FileName, and
4370             Data::CTable will refuse to either read or write a cache file, so
4371             setting these fields both to empty is equivalent to setting
4372             _CacheOnRead to false.
4373              
4374             The cache file contains a highly-efficient representation of all the
4375             following data that would otherwise have to be determined by reading
4376             and parsing the entire text file:
4377              
4378             - All the data columns (field values)
4379             - _FieldList: The list of fields, in order
4380             - _HeaderRow: Whether a header row is / should be present
4381             - _LineEnding: The line ending setting
4382             - _FDelimiter: The field delimiter setting
4383              
4384             If found prior to a read(), AND, the date of the cache file is LATER
4385             than the date of the original file, the cache file is used instead.
4386             (If the date is EARLIER, then the cache file is ignored because it can
4387             be presumed that the data inside the text file is newer.)
4388              
4389             If cacheing is ON, then after successfully reading the text file
4390             (either because there was no cache file yet or the cache file was out
4391             of date or corrupted or otherwise unusable), read() will then try to
4392             create a cache file. This, of course, takes some time, but the time
4393             taken will be more than made up in the speedup of the next read()
4394             operation on the same file.
4395              
4396             If creating the cache file fails (for example, because file
4397             permissions didn't allow the cache directory to be created or the
4398             cache file to be written), read() generates a warning explaining why
4399             cacheing failed, but the read() operation itself still succeeds.
4400              
4401             No parameters in the object itself are set or modified to indicate the
4402             success or failure of writing the cache file.
4403              
4404             Similarly, there is no way to tell whether a successful read()
4405             operation read from the cache or from the original data file. If you
4406             want to be SURE the reading was from the data file, either turn off
4407             _CacheOnRead, or call the read_file() method instead of read().
4408              
4409             NOTE: because the name of the cache file to be used is calculated just
4410             before the read() is actually done, the cache file can only be found
4411             if the _CacheSubDir and _CacheExtension are the same as they were when
4412             the cache was last created. If you change these parameters after
4413             having previously cached a file, the older caches could be "orphaned"
4414             and just sit around wasting disk space.
4415              
4416             =head2 Cacheing on write()
4417              
4418             You may optionally set _CacheOnWrite (default = false) to true. If
4419             done, then a cache file will be saved for files written using the
4420             write() command. Read about write() below for more about why you
4421             might want to do this.
4422              
4423             =head1 AUTOMATIC DIRECTORY CREATION
4424              
4425             When Data::CTable needs to write a file, (a cache file or a data
4426             file), it automatically tries to create any directories or
4427             subdirectories you specify in the _FileName or _CacheSubDir
4428             parameters.
4429              
4430             If it fails while writing a data file, write() will fail (and you will
4431             be warned). If it fails to create a directory while writing a cache
4432             file, a warning will be issued, but the overall read() or write()
4433             operation will still return a result indicating success.
4434              
4435             Any directories created will have the permissions 0777 (world-write)
4436             for easy cleanup.
4437              
4438             Generally, the only directory the module will have to create is a
4439             subdirectory to hold cache files.
4440              
4441             However, since other directories could be created, be sure to exercise
4442             caution when allowing the module to create any directories for you on
4443             any system where security might be an issue.
4444              
4445             Also, if the 0666 permissions on the cache files themselves are too
4446             liberal, you can either 1) turn off cacheing, or 2) call the
4447             prep_cache_file() method to get the name of the cache file that would
4448             have been written, if any, and then restrict its permissions:
4449              
4450             chmod (0600, $this->prep_cache_file());
4451              
4452             =head1 READING DATA FILES
4453              
4454             ## Replacing data in table with data read from a file
4455              
4456             $t->read($Path) ## Simple calling convention
4457              
4458             $t->read( ## Named-parameter convention
4459              
4460             ## Params that override params in the object if supplied...
4461              
4462             _FileName => $Path, ## Full or partial path of file to read
4463              
4464             _FieldList => [...], ## Fields to read; others to be discarded
4465              
4466             _HeaderRow => 0, ## No header row (_FieldList required!)
4467              
4468             _IgnoreQuotes => 0, ## Load files with unbalanced quotes
4469              
4470             _LineEnding => undef, ## Text line ending (undef means guess)
4471             _FDelimiter => undef, ## Field delimiter (undef means guess)
4472              
4473             _ReturnMap => 1, ## Whether to decode internal returns
4474             _ReturnEncoding=>"\x0B", ## How to decode returns.
4475             _MacRomanMap => undef, ## Whether/when to read Mac char set
4476              
4477             _CacheOnRead => 0, ## Enable/disable cacheing behavior
4478             _CacheExtension=> ".x", ## Extension to add to cache file name
4479             _CacheSubDir => "", ## (Sub-)dir, if any, for cache files
4480              
4481             ## Params specific to the read()/write() methods...
4482              
4483             _MaxRecords => 200, ## Limit on how many records to read
4484             )
4485              
4486             $t->read_file() ## Internal: same as read(); ignores cacheing
4487              
4488             read() opens a Merge, CSV, or Tab-delimited file and reads in all or
4489             some fields, and all or some records, REPLACING ANY EXISTING DATA in
4490             the CTable object.
4491              
4492             Using the simple calling convention, just pass it a file name. All
4493             other parameters will come from the object (or will be defaulted if
4494             absent). To specify additional parameters or override any parameters
4495             in the object while reading, use the named-parameter calling
4496             convention.
4497              
4498             See the full PARAMETER LIST, above, or read on for some extra details:
4499              
4500             _ReturnMap controls whether return characters encoded as ASCII 11
4501             should be mapped back to real newlines (C<"\n">) when read into memory.
4502             If false, they are left as ASCII 11 characters. (default is "true")
4503              
4504             _ReturnEncoding controls the character that returns are encoded as, if
4505             different from ASCII 11.
4506              
4507             _FieldList is an array (reference) listing the names of fields to
4508             import, in order (and will become the object's _FieldList upon
4509             successful completion of the read() operation). If not provided and
4510             not found in the object, or empty, then all fields found in the file
4511             are imported and the object's field list will be set from those found
4512             in the file, in the order found there. If _HeaderRow is false, then
4513             this parameter is required (either in the object or as a formal
4514             parameter) and is assumed to give the correct names for the fields as
4515             they actually occur in the file. If _HeaderRow is true and _FieldList
4516             is provided, then _FieldList specifies the (sub-)set of fields to be
4517             read from the file and others will be ignored.
4518              
4519             _HeaderRow, which defaults to true, if set to false, tells read() to
4520             not expect a header row showing the field names in the file. Instead,
4521             it assumes that the _FieldList gives those (and _FieldList must
4522             therefore be specified either as a parameter or an existing parameter
4523             in the object).
4524              
4525             _IgnoreQuotes is false by default. If true then we'll ignore quotes in
4526             the file upon import, which makes the loading of files with unbalanced
4527             quotes possible.
4528              
4529             _MaxRecords (optional) is an upper limit on the number of fields to
4530             import. If not specified, or zero, or undef, then there is no limit;
4531             all records will be imported or memory will be exhausted.
4532              
4533             read() returns a Boolean "success" code.
4534              
4535             If read() returns false, then it will also have set the _ErrorMsg
4536             parameter in the object. It may or may not have partially altered
4537             data in the object if an error is encountered.
4538              
4539             After a successful read:
4540              
4541             fieldlist() (the object's _FieldList parameter) tells which
4542             fields were actually read, in what order. It may omit any fields
4543             requested in _FieldList that were not actually found in the file for
4544             whatever reason.
4545              
4546             length() tells how many fields were read.
4547              
4548             The selection() is reset to no selection (all selected / unsorted)
4549              
4550             The object's _FileName parameter contains the path to the file
4551             that was read. If the _FileName you specified did not have a path,
4552             then _FileName will be prepended with a path component indicating
4553             "current directory" (e.g. "./" on Unix).
4554              
4555             _FDelimiter will contain the actual delimiter character that was
4556             used to read the file (either tab or comma if the delimiter was
4557             guessed, or whatever delimiter you specified).
4558              
4559             _LineEnding will contain the actual line-ending setting used to
4560             read the file. This will be either "mac" ("\x0D"), "unix" ("\x0D"),
4561             or "dos" ("\x0D\x0A") if the line endings were guessed by read().
4562             Otherwise it will be whatever _LineEnding you specified.
4563              
4564              
4565             =head1 FILE FORMAT NOTES
4566              
4567             As mentioned, read() allows the following flexibilities in reading
4568             text-based tabular data files:
4569              
4570             You may specify the line endings (record delimiters), or it can
4571             guess them (mac, unix, dos are supported).
4572              
4573             You may specify the field delimiters, or it can guess them (tab
4574             and comma are supported).
4575              
4576             It can get field names from a header row, or, if there is no
4577             header row, you can tell it the field names, in order.
4578              
4579             You can tell it whether or not to decode embedded returns in
4580             data fields, and if so, which character they were encoded as.
4581              
4582             Beyond supporting the above flexible options, read() makes the
4583             following non-flexible assumptions:
4584              
4585             Fields must NOT contain unencoded returns -- that is: whatever
4586             character sequence is specified for _LineEnding will NEVER occur
4587             inside a field in the text file; in addition, the current platform's
4588             definition of C<"\n"> will NEVER occur; these characters if present in
4589             field data, MUST have been encoded to some safe character string
4590             before the file was created.
4591              
4592             Each field may OPTIONALLY be surrounded with double-quote marks.
4593             However, if the field data itself contains either a double-quote
4594             character (C<">) or the current file's field delimiter (such as tab or
4595             comma), then the field MUST be surrounded with double-quotes.
4596             (Currently, all data written by Data::CTable have all field values
4597             surrounded by double-quotes, but a more selective policy may be used
4598             in the future.)
4599              
4600             If a field contains a double-quote character, then each double-quote
4601             character in the field must be encoded as C<""> -- i.e. each C<"> in the
4602             original data becomes C<""> in the text file.
4603              
4604             Data files may not mix line-ending types or field delimiter types.
4605             Once determined, the same endings and delimiters will be used to read
4606             the entire file.
4607              
4608             The fields recognized on each line will either be determined by
4609             the header row or the _FieldList provided by the caller. Any extra
4610             fields on any given line will be ignored. Any missing fields will be
4611             treated as undef/empty.
4612              
4613             If you are having trouble reading a delimited text file, check that
4614             all data in the file obeys these assumptions.
4615              
4616             =cut
4617              
4618             sub read ## Read, cacheing if possible
4619             {
4620 79     79 1 166 my $this = shift;
4621 79         383 return($this->read_file_or_cache(@_));
4622             }
4623              
4624             sub read_file ## Read, ignoring cacheing
4625             {
4626 19     19 0 52 my $this = shift;
4627 19 50       87 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
4628              
4629 190         361 my($FileName, $FieldList, $MaxRecords, $LineEnding, $FDelimiter, $ReturnMap, $ReturnEncoding, $MacRomanMap, $HeaderRow, $IgnoreQuotes)
4630 19         49 = map {$this->getparam($Params, $_)}
4631             qw(_FileName _FieldList _MaxRecords _LineEnding _FDelimiter _ReturnMap _ReturnEncoding _MacRomanMap _HeaderRow _IgnoreQuotes);
4632              
4633 19         39 my $Success;
4634              
4635             ## Default error message is none.
4636 19         40 $this->{_ErrorMsg} = "";
4637              
4638             ## Default for HeaderRow is true.
4639 19 50       148 $HeaderRow = 1 unless defined($HeaderRow);
4640              
4641             ## Default for ReturnEncoding is "\x0B" (control-K; ASCII 11)
4642 19 50       56 $ReturnEncoding = "\x0B" unless length($ReturnEncoding);
4643              
4644             ## Default for ReturnMap is true.
4645 19 50       44 $ReturnMap = 1 unless defined($ReturnMap);
4646              
4647             ## Default for MacRomanMap is undef ("Auto");
4648 19 100       196 $MacRomanMap = undef unless defined($MacRomanMap);
4649              
4650             ## Default for MaxRecords is 0 (meaning import all records)
4651 19 50       71 $MaxRecords = 0 unless (int($MaxRecords) == $MaxRecords);
4652              
4653             ## Precompile a regex for the return encoding since we'll call it often (on each field!) later.
4654 19         154 my $RetRegex = qr/$ReturnEncoding/;
4655              
4656 19         87 $this->progress("Reading $FileName...");
4657              
4658             ## Open the data file.
4659 0         0 my $File = IO::File->new("<$FileName") or
4660 19 50       143 do {$this->{_ErrorMsg} = "Failed to open $FileName: $!"; goto done};
  0         0  
4661            
4662             ## Get its total file size (useful for estimating table size later on).
4663 19 50       3027 my $FileSize = (stat($File))[7] or
4664             $this->{_ErrorMsg} = "File $FileName contains no data.", goto done;
4665            
4666             ## Convert from optional "dos", "mac", "unix" symbolic values.
4667 19         77 $LineEnding = $this->lineending_string($LineEnding);
4668              
4669             ## Default for LineEnding is found by inspecting data in the file.
4670 19 50 33     129 $LineEnding ||= guess_endings($File) or
4671             $this->{_ErrorMsg} = "Could not find any line endings in the file $FileName.", goto done;
4672            
4673             ## DoMacMapping is the actual setting for auto charset mapping
4674 19   66     140 my $DoMacMapping =
4675             ((!defined($MacRomanMap) && ($LineEnding eq "\x0D")) || ## Auto
4676             ($MacRomanMap)); ## On
4677              
4678 19 100       84 $this->progress("Will convert upper-ascii characters if any, from Mac Roman to ISO 8859-1.") if $DoMacMapping;
4679              
4680             ## FieldList is usable is it is a list and has at least one entry.
4681 19   66     70 my $FieldListValid = ((ref($FieldList) eq 'ARRAY') && @$FieldList);
4682            
4683             ## Set <$File> to use the line ending sequence we no known we are looking for.
4684 19         95 local $/ = $LineEnding;
4685            
4686             ## We use $_ explicitly, so must localize.
4687 19         23 local $_;
4688            
4689 19         26 my $IncomingFields;
4690              
4691 19 50       48 if ($HeaderRow)
4692             {
4693             ## Get the list of fields available in the file (first line of file).
4694              
4695 19 50       200 $_ = <$File> or
4696             $this->{_ErrorMsg} = "Could not find a first line with field names in $FileName.", goto done;
4697              
4698             ## Try to guess file delimiter from the header row if not yet specified.
4699 19 50 33     106 $FDelimiter ||= guess_delimiter($_) or
4700             $this->{_ErrorMsg} = "Could not find comma or tab delimiters in $FileName.", goto done;
4701            
4702             ## Maybe convert entire line (all records) Mac to ISO before splitting.
4703 19 100       93 &MacRomanToISORoman8859_1(\ $_) if $DoMacMapping;
4704              
4705 19         34 chomp;
4706            
4707 19         44 s/^\"//; s/\"$//; ## remove possible leading, trailing quotes surrounding header row (rare)
  19         35  
4708              
4709             ## Split header row into field names, removing optional "" around each at the same time.
4710 19         315 $IncomingFields = [split(/\"?$FDelimiter\"?/, $_)];
4711              
4712             ## Strip any leading and/or trailing control chars or spaces from field names in header.
4713 19         59 $IncomingFields = [map {s{(?:\A[\x00-\x20]+)|(?:[\x00-\x20]+\Z)}{}g; $_;} @$IncomingFields];
  79         218  
  79         184  
4714              
4715             }
4716             else
4717             {
4718             ## Otherwise, require that the caller specifies it in _FieldList
4719              
4720 0 0       0 $this->{_ErrorMsg} = "Must specify a _FieldList if _HeaderRow says no header row is present.", goto done
4721             unless $FieldListValid;
4722            
4723 0         0 $IncomingFields = [@$FieldList];
4724             }
4725              
4726             ## Remove any leading underscores in the names of the incoming
4727             ## fields (not allowed because such field names are reserved for
4728             ## other object data). Note: this could result in
4729             ## duplicate/overwritten field names that were otherwise
4730             ## apparently unique in the incoming data file.
4731              
4732 19         64 $IncomingFields = [map {(/^_*(.*)/)[0]} @$IncomingFields];
  79         293  
4733            
4734             ## Make a hash that can be used to map these fields' names to their numbers.
4735 29     29   37801 my $IncomingFieldNameToNum = {}; @$IncomingFieldNameToNum{@$IncomingFields} = ($[ .. $#$IncomingFields);
  29         32209  
  29         152618  
  19         59  
  19         214  
4736            
4737             ## Make a list of the fields we'll be importing (by taking the
4738             ## list the caller requested, and paring it down to only those
4739             ## fields that are actually available in the table.)
4740              
4741 71         179 my $FieldsToGet =
4742 19 100       79 [grep {exists($IncomingFieldNameToNum->{$_})}
4743             ($FieldListValid ? @$FieldList : @$IncomingFields)];
4744              
4745             ## Make a note of whether we're getting a subset of available
4746             ## fields because the caller requested such. If we are, we'll add
4747             ## a _Subset => 1 marker to the data for use later in ensuring the
4748             ## cache is OK.
4749            
4750 19   100     128 my $GettingSubset = ($FieldListValid && ("@{[sort @$IncomingFields]}" ne
4751             "@{[sort @$FieldList ]}"));
4752            
4753             ## Make an array of the incoming indices of these fields.
4754              
4755             ## Allocate a list of empty arrays into which we can import the
4756             ## data. Initially they'll each have 100 empty slots for data;
4757             ## after we have imported 100 records, we'll re-consider the size
4758             ## estimate. When we're all done, we'll prune them back.
4759              
4760 19         72 my $FieldNums = [@$IncomingFieldNameToNum{@$FieldsToGet}];
4761 19         34 my $FieldVectors = []; foreach (@$FieldNums) {$#{$FieldVectors->[$_] = []} = 100};
  19         41  
  71         89  
  71         325  
4762              
4763             ## We want to be cool and support any embedded NULL (ascii zero)
4764             ## characters should they exist in the data, even though we are
4765             ## going to use NULL chars to encode embedded delimiters before we
4766             ## split....
4767              
4768             ## First we create a sufficiently obscure placeholder for any
4769             ## ascii zero characters in the input text (a rare occurrence
4770             ## anyway).
4771              
4772 19         45 my $ZeroMarker = "\001ASCII_ZERO\001";
4773            
4774             ## Now ready to go through the file line-by-line (record-by-record)
4775              
4776 19         22 my $WroteProg;
4777 19         28 my $RecordsRead = 0;
4778 19         86 while (<$File>)
4779             {
4780             ## Try to guess file delimiter from the header row if not yet specified.
4781 57 50 33     187 $FDelimiter ||= guess_delimiter($_) or
4782             $this->{_ErrorMsg} = "Could not find comma or tab delimiters in $FileName.", goto done;
4783            
4784             ## Maybe convert entire line (all records) ISO to Mac before splitting.
4785 57 100       173 &MacRomanToISORoman8859_1(\ $_) if $DoMacMapping;
4786            
4787             ## Manipulate the single line of data fields into a splittable format.
4788            
4789 57         86 chomp;
4790            
4791             ## Replace any delimiters inside quotes with ASCII 0.
4792             ## Split fields on delimiters.
4793             ## Delete leading or trailing quote marks from each field.
4794             ## Restore delimiters ASCII 0 back to delimiters.
4795            
4796             ## Protect delimiters inside fields.
4797 57         89 s/\000/$ZeroMarker/go; ## Preserve genuine ASCII 0 chars.
4798 57         73 my $InQuote = 0; ## Initialize InQuote flag to zero.
4799 57 50       111 unless ($IgnoreQuotes) {
4800 57         524 s/(\")|($FDelimiter)/ ## Replace delimiters inside quotes with ASCII 0 ...
4801 180 50       1169 ($1 ? do{$InQuote^=1; $1} : ## ... if quote char, toggle InQuote flag
  0 50       0  
  0         0  
4802             ($InQuote ? "\000" : $2))/eg; ## ... if delimiter, InQuote sez whether to replace or retain.
4803             }
4804              
4805             ## Split record into fields, then clean each field.
4806              
4807 57         517 my $regex = qr/$FDelimiter/;
4808 57 50       132 unless ($IgnoreQuotes) {
4809 57         301 $regex = qr/\"?$FDelimiter\"?/;
4810             }
4811 57         144 s/^\"//; s/\"$//; ## Kill leading, trailing quotes surrounding each record
  57         82  
4812             my @FieldVals =
4813             map
4814 57 50       415 {if (length($_))
  237         456  
4815             {
4816 237         290 s/\"\"/\"/g; ## Restore Merge format's quoted double-quotes. ("" ==> ")
4817 237         258 s/\000/$FDelimiter/g; ## Restore delimiters inside fields
4818 237         571 s/\Q$ZeroMarker\E/\000/go; ## Restore preserved ASCII 0 chars.
4819 237 50       1950 s/$RetRegex/\n/g if $ReturnMap;## Restore return characters that were coded as ASCII 11 (^K)
4820             }
4821 237         542 $_;} ## Return field val after above mods.
4822             split($regex, $_); ## Split on delimiters, killing optional surrounding quotes at same time.
4823            
4824             ## Put the data into the vectors
4825 57         234 foreach (@$FieldNums)
4826             {
4827 213 50       681 $FieldVectors->[$_]->[$RecordsRead] = $FieldVals[$_] if (length($FieldVals[$_]));
4828             }
4829 57         84 $RecordsRead++;
4830            
4831             ## Stop if we've read all the records we wanted.
4832 57 50 33     226 last if ($MaxRecords && ($RecordsRead >= $MaxRecords));
4833            
4834             ## Optimization:
4835              
4836             ## After importing 100, 200, 300, 400, etc. records, we
4837             ## re-estimate the size of the table. To help make the field
4838             ## insertion more efficient (by avoiding frequent
4839             ## array-resizing), we can set the sizes of the field vectors
4840             ## to hold at least our estimated number of records).
4841             ## Ideally, this estimation/resize step will happen at most 2
4842             ## or 3 times no matter how big the incoming data file is.
4843              
4844 57         58 my $EstTotalRecords;
4845 57 50 33     276 if ((($RecordsRead % 100) == 0) && ## If we're on a record divisble by 100...
  0         0  
4846             (($RecordsRead + 100 > $#{$FieldVectors->[$FieldNums->[0]]}))) ## ... and we're getting close to max size...
4847             {
4848             ## Then estimate the size we'd like to resize it to.
4849 0         0 $EstTotalRecords = (100 + int($RecordsRead * ($FileSize / tell($File))));
4850 0 0 0     0 $EstTotalRecords = $MaxRecords if ($MaxRecords && ($MaxRecords < $EstTotalRecords));
4851            
4852             ## If this size is greater than the actual size...
4853 0 0       0 if ($EstTotalRecords > $#{$FieldVectors->[$FieldNums->[0]]})
  0         0  
4854             {
4855             ## Then resize all the vectors.
4856             ## $this->progress("$RecordsRead: Resizing to $EstTotalRecords...\n"); ## Debugging
4857 0         0 foreach (@$FieldNums) {$#{$FieldVectors->[$_]} = $EstTotalRecords};
  0         0  
  0         0  
4858             }
4859             }
4860              
4861             ## Try doing timed (throttled to 1 per 2 secs) progress at
4862             ## most every 100th record.
4863 57 0       137 my $Did = ($EstTotalRecords ?
    50          
4864             $this->progress_timed("Reading", "$RecordsRead of $EstTotalRecords (est.)", tell($File), $FileSize, 1) :
4865             $this->progress_timed("Reading", "$RecordsRead" , tell($File), $FileSize, 1))
4866             if (($RecordsRead % 100) == 0);
4867 57   33     543 $WroteProg ||= $Did;
4868             }
4869              
4870             ## If we wrote timed progress but didn't get to give the 100%
4871             ## message yet, print the 100% message now.
4872              
4873 19 50       52 if ($WroteProg)
4874             {
4875 0 0       0 $this->progress_timed("Reading", "$RecordsRead of $RecordsRead", $FileSize, $FileSize, 1)
4876             unless (($RecordsRead % 100) == 0);
4877             }
4878              
4879             ## Print the regular Done message.
4880 19         110 $this->progress("Read $FileName.");
4881              
4882             ## Set the field vectors' length to the exact length we really
4883             ## read.
4884              
4885             ## $this->progress("$RecordsRead: Truncating to @{[$RecordsRead - 1]}... \n"); ## Debugging
4886 19         144 foreach (@$FieldNums) {$#{$FieldVectors->[$_]} = ($RecordsRead - 1)};
  71         95  
  71         318  
4887            
4888             ## Delete any existing columns in the object.
4889 19         30 delete @$this{@{$this->fieldlist_all()}};
  19         66  
4890              
4891             ## Put the new columns into the object.
4892 19         96 @$this{@$FieldsToGet} = @$FieldVectors[@$FieldNums];
4893              
4894             ## Set fieldlist to fields we actually read, in order.
4895 19         69 $this->fieldlist($FieldsToGet);
4896              
4897             ## Remember the line ending char or chars that were successfully
4898             ## used to read the file. The same ending will be used by default
4899             ## to write any file based on this object.
4900 19         69 $this->{_LineEnding} = $this->lineending_symbol($LineEnding);
4901            
4902             ## Remember the field delimiter that was successfully used to read
4903             ## the file. The same delimiter will be used by default to write
4904             ## any file based on this object.
4905 19         68 $this->{_FDelimiter} = $FDelimiter;
4906              
4907             ## Remember the header row setting.
4908 19         30 $this->{_HeaderRow} = $HeaderRow;
4909              
4910             ## Remember the filename used for reading.
4911 19         37 $this->{_FileName} = $FileName;
4912              
4913             ## Remember whether we read (and maybe will cache) a subset of available fields.
4914 19   100     105 $this->{_Subset} = $GettingSubset || 0;
4915              
4916 19   50     82 $this->{_IgnoreQuotes} = $IgnoreQuotes || 0;
4917              
4918             ## Clean out _Selection and verify _SortOrder to ensure compatibility
4919             ## with current _FieldList.
4920 19         60 $this->read_postcheck();
4921              
4922             ## Other informational data and options, like sort specs, sort
4923             ## routines and so on, need not be changed or replaced when data
4924             ## changes.
4925              
4926 19         28 $Success = 1;
4927              
4928 19 50       45 done:
4929            
4930             $this->warn("FAILURE: $this->{_ErrorMsg}") unless $Success;
4931              
4932 19 50       277 close $File if $File;
4933 19         553 return($Success);
4934             }
4935              
4936             sub read_postcheck ## Called to clean up after a successful read
4937             {
4938 79     79 0 129 my $this = shift;
4939              
4940             ## Run select_all to empty out the _Selection.
4941 79         308 $this->select_all();
4942              
4943             ## Remove any bogus field names from the sort order, if any.
4944 79         309 $this->sortorder_check();
4945             }
4946              
4947             sub read_file_or_cache ## Read, cacheing if possible
4948             {
4949 79     79 0 132 my $this = shift;
4950 79         251 $DB::single = 1;
4951 79 100       438 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
4952            
4953 79         1364 my($FileName, $FieldList, $CacheOnRead, $CacheExtension, $CacheSubDir, $IgnoreQuotes) = map {$this->getparam($Params, $_)}
  474         1059  
4954             qw(_FileName _FieldList _CacheOnRead _CacheExtension _CacheSubDir, _IgnoreQuotes);
4955              
4956 79         542 my $Success;
4957              
4958             ## If cacheing is turned off, just bail prematurely and treat this
4959             ## as a call to read_file().
4960              
4961 79 100       318 return($this->read_file(%$Params)) unless $CacheOnRead;
4962              
4963             ## Otherwise... check if cacheing is possible.
4964              
4965             ## Calculate the cache file name. If it comes back empty, it
4966             ## means the cache directory probably could not be created, or the
4967             ## cache file itself either does not exist or could not be
4968             ## preflighted (either read or touched/deleted).
4969              
4970 68         332 my $CacheFileName = $this->prep_cache_file($FileName, $CacheExtension, $CacheSubDir);
4971              
4972             ## If the cache file preflight failed, treat this as a regular
4973             ## read_file() without cacheing.
4974            
4975 68 50       390 return($this->read_file(%$Params)) unless length($CacheFileName);
4976            
4977             ## At this point we believe we'll either be able to read or write
4978             ## the cache file as needed.
4979              
4980             ## Try to read the cache if both files exist and the mod date is
4981             ## later.
4982              
4983 68         117 my $Data;
4984 68 100 66     6840 if ((-e $FileName ) &&
      100        
4985             (-e $CacheFileName ) &&
4986             (((stat($CacheFileName))[9]) > ((stat($FileName))[9])) )
4987             {
4988 63         457 $this->progress("Thawing $CacheFileName...");
4989             eval
4990 63         333 {
4991 63         355 $Data = &retrieve($CacheFileName);
4992             };
4993 63 50       9820 $this-warn("Cache restore from $CacheFileName failed: $!"), unlink($CacheFileName)
4994             unless defined ($Data);
4995             }
4996            
4997 68 100       330 if (ref($Data) eq 'HASH')
4998             {
4999             ## Retrieval succeeded.
5000            
5001             ## Verify that the data in the cache is usable.
5002              
5003             ## First, check newline-encoding compatibility.
5004             (
5005 63 50       333 $this->warn("Abandoning cache due to incompatible newline encoding"),
5006             unlink $CacheFileName, goto cache_failed) unless $Data->{_Newline} eq "\n";
5007            
5008             ## Simulate an actual read_file() using data
5009             ## from the cache instead of the original file.
5010              
5011 63 100 66     425 if ((ref($FieldList) eq 'ARRAY') && @$FieldList)
    100          
5012             {
5013             ## If $FieldList requests fields not found in the cache,
5014             ## we abandon (delete and maybe rewrite) the cache: maybe
5015             ## we previously cached a different subset of fields from
5016             ## a previous request and so the cache is no longer
5017             ## adequate.
5018              
5019 18         45 my $MissingFields = [grep {!exists($Data->{$_})} @$FieldList];
  48         145  
5020             (
5021             ## $this->warn("Abandoning cache due to change in requested field list"),
5022 18 100       95 unlink $CacheFileName, goto cache_failed) if @$MissingFields;
5023            
5024             ## If there was a _FieldList supplied in $Params or $this,
5025             ## we might need to omit any fields read from cache but
5026             ## not mentioned (just as read_file() would have done).
5027              
5028 16         29 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  16         130  
5029 16         91 my $OmitFields = [grep {!exists($FieldHash->{$_})} grep {!/^_/} keys %$Data];
  64         145  
  176         607  
5030            
5031 16         62 delete @$Data{@$OmitFields};
5032            
5033             ## Finally, also pare down _FieldList to only mention
5034             ## those fields that were desired....
5035            
5036 16         34 my $AvailFields = $Data->{_FieldList};
5037 16         32 $Data->{_FieldList} = [grep {exists($Data->{$_})} @$FieldList];
  42         121  
5038              
5039             ## We might have ended up reading a subset of the
5040             ## available fields in the cache. (This is logically
5041             ## equivalent to reading a subset of the available fields
5042             ## in the real file.) If so, then change the value of
5043             ## $Data->{_Subset} to indicate that.
5044              
5045 16   100     110 my $GettingSubset = (@$OmitFields && 1 || 0);
5046 16   66     115 $Data->{_Subset} ||= $GettingSubset;
5047             }
5048             elsif ($Data->{_Subset})
5049             {
5050              
5051             ## Conversely, if no field list was specified (hence all
5052             ## fields are desired), but a subset of fields has
5053             ## previously been cached, we have to delete / abandon the
5054             ## cache and re-read the file so we are getting all
5055             ## fields.
5056              
5057             (
5058             ## $this->warn("Abandoning partial cache due to request of full field list"),
5059 1         24 unlink $CacheFileName, goto cache_failed);
5060             }
5061              
5062             ## Copy all elements from $Data, including possibly overridden
5063             ## _FieldList element if any, but excepting the cache-only
5064             ## _Newline element, into $this.
5065              
5066 60         163 $Data->{_IgnoreQuotes} = $this->{_IgnoreQuotes};
5067 60         154 delete $Data->{_Newline};
5068 60         530 @$this{keys %$Data} = values %$Data;
5069              
5070             ## If more records were written to the cache than are now
5071             ## desired, warn and truncate. This is potentially not good
5072             ## because in the reverse case (reading/saving the first time
5073             ## with a limit then re-reading with no limit) would not be
5074             ## caught.
5075              
5076 60 50 33     337 if ($this->{_MaxRecords} && $this->length() > $this->{_MaxRecords})
5077             {
5078 0         0 $this->warn("Truncating length of cached table (@{[$this->length()]}) to requested length (@{[$this->{_MaxRecords}]})");
  0         0  
  0         0  
5079 0         0 $this->length($this->{_MaxRecords});
5080             }
5081            
5082             ## Set the file name to the name of the original (not the
5083             ## cache) file, just as read() would have done.
5084              
5085 60         379 $this->{_FileName} = $FileName;
5086              
5087             ## Run the "read post-check" -- the same things we do in
5088             ## read_file() after completing the read process: Clean out
5089             ## _Selection and verify _SortOrder to ensure compatibility
5090             ## with current _FieldList.
5091              
5092 60         271 $this->read_postcheck();
5093              
5094 60         420 $this->progress("Thawed $FileName.");
5095              
5096 60         4832 $Success = 1;
5097 60         1379 goto done; ## Successful completion: we read from the cache.
5098             }
5099              
5100             ## Could not retrieve for whatever reason (maybe cache did not
5101             ## exist yet or was out of date or had to be abandoned). So just
5102             ## read normally and possibly write the cache.
5103              
5104             cache_failed:
5105             {
5106 8 50       15 $Success = $this->read_file(%$Params) or goto done;
  8         44  
5107              
5108             ## Now, having read successfully, we try to write the cache
5109             ## for next time. Writing the cache is optional; failing to
5110             ## write it is not a failure of the method.
5111              
5112             { ## Code in this block may fail and that's OK.
5113              
5114             ## First, pre-flight.
5115 8 50       15 $this->warn("Cache file $CacheFileName cannot be created/overwritten: $!"),
  8         30  
5116             goto done ## Successful completion.
5117             unless $this->try_file_write($CacheFileName);
5118              
5119             ## The data to be stored is:
5120              
5121             ## 1) All data columns read by read_file()
5122             ## 2) Any parameters set by read_file()
5123             ## 3) _Subset param indicating partial fieldlist was read from file.
5124             ## 4) _Newline setting so we know if it's compatabile when read back.
5125              
5126             ## No other parameters should be cached because we want a
5127             ## read from the cache to produce exactly the same result
5128             ## as a read from the file itself would have produced.
5129              
5130             ## After a read, fieldlist() will contain the fields
5131             ## actually read, so cols_hash() WILL yield all the
5132             ## columns.
5133              
5134 8         37 my $Data = {(
5135             ## Refs to each column read by read_file()
5136             %{ $this->cols_hash() },
5137              
5138             ## Other parameters set by read_file()
5139             _FieldList => $this->{_FieldList },
5140             _LineEnding => $this->{_LineEnding},
5141             _FDelimiter => $this->{_FDelimiter},
5142             _HeaderRow => $this->{_HeaderRow },
5143             _IgnoreQuotes => $this->{_IgnoreQuotes},
5144             _Subset => $this->{_Subset },
5145 8         19 _Newline => "\n",
5146             )};
5147            
5148 8 50       62 $this->warn("Failed to cache $CacheFileName"),
5149             unlink($CacheFileName),
5150             goto done ## Successful completion.
5151             unless $this->write_cache($Data, $CacheFileName);
5152 8         324 chmod 0666, $CacheFileName; ## Liberal perms if possible.
5153             }
5154              
5155 8         83 goto done; ## Successful completion: we read from the file & maybe saved cache.
5156             }
5157            
5158             done:
5159 68         581 return ($Success);
5160             }
5161              
5162             =pod
5163              
5164             =head1 WRITING DATA FILES
5165              
5166             ## Writing some or all data from table into a data file
5167              
5168             $t->write($Path) ## Simple calling convention
5169              
5170             $t->write( ## Named-parameter convention
5171              
5172             ## Params that override params in the object if supplied...
5173              
5174             _FileName => $Path, ## "Base path"; see _WriteExtension
5175              
5176             _WriteExtension=> ".out",## Insert/append extension to _FileName
5177              
5178             _FieldList => [...], ## Fields to write; others ignored
5179             _Selection => [...], ## Record (#s) to write; others ignored
5180              
5181             _HeaderRow => 0, ## Include header row in file
5182              
5183             _LineEnding => undef, ## Record delimiter (default is "\n")
5184             _FDelimiter => undef, ## Field delimiter (default is comma)
5185              
5186             _ReturnMap => 1, ## Whether to encode internal returns
5187             _ReturnEncoding=>"\x0B", ## How to encode returns
5188             _MacRomanMap => undef, ## Whether/when to write Mac char set
5189              
5190              
5191             _CacheOnWrite => 1, ## Enable saving cache after write()
5192             _CacheExtension=> ".x", ## Extension to add to cache file name
5193             _CacheSubDir => "", ## (Sub-)dir, if any, for cache files
5194              
5195             ## Params specific to the read()/write() methods...
5196              
5197             _MaxRecords => 200, ## Limit on how many records to write
5198             )
5199              
5200             $t->write_file() ## Internal: same as write(); ignores cacheing
5201              
5202             write() writes a Merge, CSV, or Tab-delimited file.
5203              
5204             It uses parameters as described above. Any parameters not supplied
5205             will be gotten from the object.
5206              
5207             Using the simple calling convention, just pass it a path which will
5208             override the _FileName parameter in the object, if any.
5209              
5210             All other parameters will come from the object (or will be defaulted
5211             if absent).
5212              
5213             If no _FileName or path is specified, or it is the special string "-"
5214             (dash), then the file handle \ * STDOUT will be used by default (and
5215             you could redirect it to a file). You can supply any open file handle
5216             or IO::File object of your own for the _FileName parameter.
5217              
5218             If write() is writing to a file handle by default or because you
5219             specified one, then no write-cacheing will occur.
5220              
5221             To specify additional parameters or override any parameters in the
5222             object while reading, use the named-parameter calling convention.
5223              
5224             If the object's data was previously filled in using new() or read(),
5225             then the file format parameters from the previous read() method will
5226             still be in the object, so the format of the written file will
5227             correspond as much as possible to the file that was read().
5228              
5229             write() returns the path name of the file actually written, or the
5230             empty string if a supplied file handle or STDOUT was written to, or
5231             undef if there was a failure.
5232              
5233             If write() returns undef, then it will also have set the _ErrorMsg
5234             parameter in the object.
5235              
5236             write() never modifies any data in the object itself.
5237              
5238             Consequently, if you specify a _FieldList or a _Selection, only those
5239             fields or records will be written, but the corresponding parameters in
5240             the object itself will be left untouched.
5241              
5242             =head2 How write() calculates the Path
5243              
5244             The _FileName parameter is shared with the read() method. This
5245             parameter is set by read() and may be overridden when calling write().
5246              
5247             In the base implementation of Data::CTable, write() will try not to
5248             overwrite the same file that was read, which could possibly cause data
5249             loss.
5250              
5251             To avoid this, it does not use the _FileName parameter directly.
5252             Instead, it starts with _FileName and inserts or appends the value of
5253             the _WriteExtension parameter (which defaults to ".out") into the file
5254             name before writing.
5255              
5256             If the _FileName already has an extension at the end, write() will
5257             place the _WriteExtension BEFORE the final extension; otherwise the
5258             _WriteExtension will be placed at the end of the _FileName.
5259              
5260             For example:
5261              
5262             Foobar.txt ==> Foobar.out.txt
5263             Foobar.merge.txt ==> Foobar.merge.out.txt
5264             My_Merge_Data ==> My_Merge_Data.out
5265              
5266             If you DON'T want write() to add a _WriteExtension to _FileName before
5267             it writes the file, then you must set _WriteExtension to empty/undef
5268             either in the object or when calling write(). Or, you could make a
5269             subclass that initializes _WriteExtension to be empty. If
5270             _WriteExtension is empty, then _FileName will be used exactly, which
5271             may result in overwriting the original data file.
5272              
5273             Remember: write() returns the path name it actually used to
5274             successfully write the file. Just as with read(), if the _FileName
5275             you specified did not have a path, then write() will prepend a path
5276             component indicating "current directory" (e.g. "./" on Unix) and this
5277             will be part of the return value.
5278              
5279              
5280             =head2 Cacheing with write()
5281              
5282             By default, Data::CTable only creates a cached version of a file when
5283             it reads that file for the first time (on the assumption that it will
5284             need to read the file again more often than the file's data will
5285             change.)
5286              
5287             But by default, it does not create a cached version of a file when
5288             writing it, on the assumption that the current program probably will
5289             not be re-reading the written file and any other program that wants to
5290             read it can cache it at that time.
5291              
5292             However, if you want write() to create a cache for its output file, it
5293             is much faster to create it on write() than waiting for the next
5294             read() because the next read() will be able to use the cache the very
5295             first time.
5296              
5297             To enable write-cacheing, set _CacheOnWrite to true. Then, after the
5298             write() successfully completes (and only if it does), the cached
5299             version will be written.
5300              
5301             =head1 FORMATTED TABLES (Using Data::ShowTable)
5302              
5303             ## Get formatted data in memory
5304              
5305             my $StringRef = $t->format(); ## Format same data as write()
5306             my $StringRef = $t->format(10); ## Limit records to 10
5307             my $StringRef = $t->format(...); ## Specify arbitrary params
5308             print $$StringRef;
5309              
5310             ## Write formatted table to file or terminal
5311              
5312             $t->out($Dest, ....);## $Dest as follows; other params to format()
5313             $t->out($Dest, 10, ....) ## Limit recs to 10; params to format()
5314              
5315             $t->out() ## print formatted data to STDOUT
5316             $t->out(\*STDERR) ## print to STDERR (or any named handle)
5317             $t->out("Foo.txt") ## print to any path (file to be overwritten)
5318             $t->out($FileObj) ## print to any object with a print() method
5319              
5320             out() takes a first argument specifying a destination for the output,
5321             then passes all other arguments to format() to create a nice-looking
5322             table designed to be human-readable; it takes the resulting buffer and
5323             print()s it to the destination you specified.
5324              
5325             Sample output:
5326              
5327             +-------+------+-----+-------+
5328             | First | Last | Age | State |
5329             +-------+------+-----+-------+
5330             | Chris | Zack | 43 | CA |
5331             | Marco | Bart | 22 | NV |
5332             | Pearl | Muth | 15 | HI |
5333             +-------+------+-----+-------+
5334              
5335             (Note extra space character before each line.)
5336              
5337             The destination may be a file handle (default if undef is \*STDOUT), a
5338             string (treated as a path to be overwritten), or any object that has a
5339             print() method, especially an object of type IO::File.
5340              
5341             The main purpose of out() is to give you a quick way to dump a table
5342             when debugging. out() calls format() to create the output, so read
5343             on...
5344              
5345             format() produces a human-readable version of a table, in the form of
5346             a reference to a string buffer (which could be very large), and
5347             returns the buffer to you. Dereference the resulting string reference
5348             before using.
5349              
5350             If format() is given one argument, that argument is the _MaxRecords
5351             parameter, which limits the length of the output.
5352              
5353             Otherwise, format() takes the following named-parameter arguments,
5354             which can optionally override the corresponding parameters, if any, in
5355             the object:
5356              
5357             _FieldList ## Fields to include in table
5358             _Selection ## Records to be included, in order
5359              
5360             _SortSpecs ## SortType controls number formatting
5361             _DefaultSortType
5362              
5363             _MaxRecords ## Limit number of records output
5364              
5365             _MaxWidth ## Limit width of per-col. data in printout
5366              
5367             format() will obey _MaxRecords, if you'd like to limit the number of
5368             rows to be output. _MaxRecords can also be a single argument to
5369             format(), or a second argument to out() if no other parameters are
5370             passed.
5371              
5372             format() also recognizes the _SortSpecs->{SortType} and
5373             _DefaultSortType parameters to help it determine the data types of the
5374             fields being formatted. Fields of type "Number" are output as
5375             right-justified floats; "Integer" or "Boolean" are output as
5376             right-justified integers, and all others (including the default:
5377             String) are output as left-justified strings.
5378              
5379             In addition, there is one parameter uniquely supported by format() and
5380             out():
5381              
5382             =over 4
5383              
5384             =item _MaxWidth ||= 15;
5385              
5386             =back
5387              
5388             _MaxWidth specifies the maximum width of columns. If unspecifed, this
5389             will be 15; the minimum legal value is 2. Each column may actually
5390             take up 3 more characters than _MaxWidth due to divider characters.
5391              
5392             The data to be output will be examined, and only the necessary width
5393             will be used for each column. _MaxWidth just limits the upper bound,
5394             not the lower.
5395              
5396             Data values that are too wide to fit in _MaxWidth spaces will be
5397             truncated and the tilde character "~" will appear as the last
5398             character to indicate the truncation.
5399              
5400             Data values with internal returns will have the return characters
5401             mapped to slashes for display.
5402              
5403             format() and out() will NOT wrap entries onto a second line,
5404             like you may have seen Data::ShowTable::ShowBoxTable do in some cases.
5405             Each record will get exactly one line.
5406              
5407             format() and out() ignore the _HeaderRow parameter. A header
5408             row showing the field names is always printed.
5409              
5410             format() and out() make no attempt to map upper-ascii characters from
5411             or to any particular dataset. The encoding used in memory (generally
5412             ISO 8859-1 by default) is the encoding used in the output. If you
5413             want to manipulate the encoding, first call format(), then change the
5414             encoding, then format the resulting table.
5415              
5416             =cut
5417              
5418             sub write ## Write, cacheing afterward if possible
5419             {
5420 8     8 1 55 my $this = shift;
5421 8         41 return($this->write_file_and_cache(@_));
5422             }
5423              
5424             sub write_file_and_cache ## Write, cacheing afterward if possible
5425             {
5426 8     8 0 14 my $this = shift;
5427              
5428 8 50       39 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
5429            
5430 8         91 my($FieldList, $LineEnding, $FDelimiter, $HeaderRow, $CacheOnWrite, $CacheExtension, $CacheSubDir, $IgnoreQuotes) = map {$this->getparam($Params, $_)}
  64         502  
5431             qw($FieldList _LineEnding _FDelimiter _HeaderRow _CacheOnWrite _CacheExtension _CacheSubDir _IgnoreQuotes);
5432              
5433             ## First write the file and go to done if it failed.
5434 8 50       42 my $WriteFileName = $this->write_file(@_) or goto done;
5435              
5436             ## Only try to cache if we got a non-empty $WriteFileName back.
5437             ## We won't get a name back in the case where we wrote directly to
5438             ## an open file handle.
5439            
5440 8 50       25 goto done unless $WriteFileName;
5441            
5442             ## Only try to cache if $CacheOnWrite has been turned ON.
5443 8 50       53 goto done unless $CacheOnWrite;
5444              
5445             ## Now, having written successfully, we try to write the cache for
5446             ## next time. Writing the cache is always optional; failing to
5447             ## write it is not a failure of this method. Consequently, any
5448             ## "goto done" statements beyond this point will still result in a
5449             ## successful outcome since $WriteFileName will have a value.
5450              
5451             ## Calculate the name of the cache file and fail the directory
5452             ## creation fails. prep_cache_file will have generated a warning
5453             ## if an attempt to create needed subdirectories has failed.
5454              
5455 0 0       0 my $CacheFileName = $this->prep_cache_file($WriteFileName, $CacheExtension, $CacheSubDir)
5456             or goto done;
5457            
5458             ## Pre-flight the cache file for writing.
5459 0 0       0 $this->warn("Cache file $CacheFileName cannot be created/overwritten: $!"),
5460             goto done ## Successful completion.
5461             unless $this->try_file_write($CacheFileName);
5462              
5463             ## The data to be stored is:
5464            
5465             ## 1) All data columns written by write_file()
5466             ## 2) Any file format parameters used by write_file()
5467              
5468             ## Calculate the main writing-related parameters using the same
5469             ## logic that write_file() uses...
5470              
5471             ## Default for FieldList is all fields.
5472 0   0     0 $FieldList ||= $this->fieldlist();
5473            
5474             ## Convert from optional "dos", "mac", "unix" symbolic values.
5475 0         0 $LineEnding = $this->lineending_string($LineEnding);
5476            
5477             ## Default for LineEnding is "\n" (CR on Mac; LF on Unix; CR/LF on DOS)
5478 0 0       0 $LineEnding = "\n" unless length($LineEnding);
5479            
5480             ## Default for FDelimiter is comma
5481 0 0       0 $FDelimiter = ',' unless length($FDelimiter);
5482              
5483             ## Default for HeaderRow is true.
5484 0 0       0 $HeaderRow = 1 unless defined($HeaderRow);
5485              
5486             ## No other parameters should be cached because we want a
5487             ## read from the cache to produce exactly the same result
5488             ## as a read from the file itself would have produced.
5489            
5490 0         0 my $Data = {(
5491             ## Refs to each column written
5492 0   0     0 %{ $this->cols_hash($FieldList)},
5493            
5494             ## Other relevant file-format parameters
5495             _FieldList => $FieldList,
5496             _LineEnding => $LineEnding,
5497             _FDelimiter => $FDelimiter,
5498             _HeaderRow => $HeaderRow,
5499             _Subset => $this->{_Subset} || 0,
5500             _Newline => "\n",
5501            
5502             ## We don't need to save _ReturnMap and
5503             ## _ReturnEncoding because those only are relevant
5504             ## when reading physical files. Cached data has the
5505             ## return chars already encoded as returns.
5506            
5507             )};
5508            
5509 0 0       0 $this->warn("Failed to cache $CacheFileName"),
5510             unlink($CacheFileName), ## Delete cache if failure
5511             goto done ## Successful completion.
5512             unless $this->write_cache($Data, $CacheFileName);
5513 0         0 chmod 0666, $CacheFileName; ## Liberal perms if possible.
5514            
5515 8         41 done:
5516             return($WriteFileName);
5517             }
5518              
5519             sub write_cache
5520             {
5521 8     8 0 13 my $this = shift;
5522 8         16 my ($Data, $CacheFileName) = @_;
5523            
5524 8         37 $this->progress("Storing $CacheFileName...");
5525            
5526 8         12 my $Success;
5527             eval
5528 8         17 {
5529 8         46 $Success = nstore($Data, $CacheFileName);
5530             };
5531              
5532 8 50       2316 $this->progress("Stored $CacheFileName.") if $Success;
5533            
5534 8         30 done:
5535             return($Success);
5536             }
5537              
5538             sub write_file ## Just write; don't worry about cacheing
5539             {
5540 8     8 0 13 my $this = shift;
5541 8 50       41 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
5542              
5543 96         166 my($FileName, $FieldList, $Selection, $MaxRecords, $LineEnding, $FDelimiter, $QuoteFields, $ReturnMap, $ReturnEncoding, $MacRomanMap, $HeaderRow, $WriteExtension)
5544 8         99 = map {$this->getparam($Params, $_)}
5545             qw(_FileName _FieldList _Selection _MaxRecords _LineEnding _FDelimiter _QuoteFields _ReturnMap _ReturnEncoding _MacRomanMap _HeaderRow _WriteExtension);
5546              
5547 8         22 my $Success;
5548            
5549 8         22 $this->{_ErrorMsg} = "";
5550              
5551             ## if FileName is unspecified, or is the single character "-",
5552             ## then default to STDOUT.
5553 8 50       50 $FileName = \ *STDOUT if ($FileName =~ /^-?$/);
5554            
5555             ## If we have a regular file handle, bless it into IO::File.
5556 8 50       24 $FileName = bless ($FileName, 'IO::File') if ref($FileName) =~ /(HANDLE)|(GLOB)/;
5557            
5558             ## If we have a file handle either passed or constructed, make note of that fact.
5559 8         26 my $GotHandle = ref($FileName) eq 'IO::File';
5560            
5561 8 50 33     49 $this->{_ErrorMsg} = "FileName must be specified for write()", goto done
5562             unless $GotHandle or length($FileName);
5563            
5564             ## Default for FieldList is all fields.
5565 8   33     111 $FieldList ||= $this->fieldlist();
5566            
5567             ## Default for Selection is all records.
5568 8   33     18258 $Selection ||= $this->selection();
5569              
5570             ## Default for MaxRecords is 0 (meaning write all records)
5571 8 50       37 $MaxRecords = 0 unless (int($MaxRecords) == $MaxRecords);
5572              
5573             ## Convert from optional "dos", "mac", "unix" symbolic values.
5574 8         30 $LineEnding = $this->lineending_string($LineEnding);
5575              
5576             ## Default for LineEnding is "\n" (CR on Mac; LF on Unix; CR/LF on DOS)
5577 8 50       39 $LineEnding = "\n" unless length($LineEnding);
5578              
5579             ## Default for FDelimiter is comma
5580 8 50       22 $FDelimiter = ',' unless length($FDelimiter);
5581              
5582             ## Default for QuoteFields is undef (auto)
5583 8 50       27 $QuoteFields = undef unless defined($QuoteFields);
5584              
5585             ## "QuoteCheck" mode means check each field -- this is the "auto"
5586             ## mode that kicks in when _QuoteFields is undef.
5587 8         19 my $QuoteCheck = (!defined($QuoteFields));
5588            
5589             ## Default for ReturnMap is true.
5590 8 50       24 $ReturnMap = 1 unless defined($ReturnMap);
5591              
5592             ## Default for MacRomanMap is undef ("Auto");
5593 8 50       20 $MacRomanMap = undef unless defined($MacRomanMap);
5594              
5595             ## DoMacMapping is the actual setting for auto charset mapping
5596 8   66     60 my $DoMacMapping =
5597             ((!defined($MacRomanMap) && ($LineEnding eq "\x0D")) || ## Auto
5598             ($MacRomanMap)); ## On
5599            
5600 8 100       30 $this->progress("Will convert upper-ascii characters if any, from ISO-8859-1 to Mac Roman.") if $DoMacMapping;
5601              
5602             ## Default for ReturnEncoding is "\x0B" (control-K; ASCII 11)
5603 8 50       40 $ReturnEncoding = "\x0B" unless length($ReturnEncoding);
5604              
5605             ## Default for HeaderRow is true.
5606 8 50       22 $HeaderRow = 1 unless defined($HeaderRow);
5607              
5608             ## Default for $WriteExtension is "" (none) -- meaning use exact $FileName
5609 8 50       22 $WriteExtension = "" unless defined($WriteExtension);
5610              
5611             ## Get a hash of fields actually present...
5612              
5613 8         30 my $AllFields = $this->fieldlist_all();
5614 8         18 my $AllFieldsHash = {}; @$AllFieldsHash{@$AllFields} = undef;
  8         33  
5615              
5616             ## Cull $FieldList to only include fields we have...
5617              
5618 8         16 $FieldList = [grep {exists($AllFieldsHash->{$_})} @$FieldList];
  33         328  
5619            
5620             ## Ensure $Selection contains only valid record numbers...
5621              
5622 8         36 $Selection = $this->selection_validate_internal($Selection);
5623              
5624             ## Get an ordered list of the columns.
5625 8         33 my $Columns = [@$this{@$FieldList}];
5626              
5627             ## Calculate the name of the file we'll write to, if any, and get
5628             ## the file handle either from the one we were given or by opening
5629             ## the specified file for writing.
5630              
5631 8         15 my $WriteFileName;
5632             my $OutFile;
5633              
5634 8 50       31 if ($GotHandle)
5635             {
5636 0         0 $WriteFileName = "";
5637 0         0 $OutFile = $FileName; ## Actually a handle.
5638             }
5639             else
5640             {
5641             ## Calculate the name of the file we're going to try to write to.
5642 8 50       53 $WriteFileName =
5643            
5644             ## Use file name exactly if specified in write() call.
5645             ($Params->{_FileName} ? $FileName :
5646            
5647             ## Otherwise, calculate the name by adding/appending the _WriteExtension.
5648             $this->write_file_name($FileName, $WriteExtension));
5649            
5650             ## Ensure the directory that will hold the file actually exists.
5651 29     29   367 use File::Basename qw(fileparse);
  29         69  
  29         8520  
5652 8         197 my ($Basename, $Path, $Ext) = fileparse($WriteFileName, '\.[^\.]+');
5653 8         34 my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
  8         29  
5654 8   33     30 $Path ||= $Cur; ## Once again, default $Path to cwd just in case.
5655            
5656 8 50       30 $this->{_ErrorMsg} = "Can't make directory $Path to save $WriteFileName: $!", goto done
5657             unless $this->verify_or_create_path($Path, $Sep);
5658            
5659             ## Ensure the directory is writeable and the file is overwriteable
5660             ## if it exists.
5661 8 50       177 $this->{_ErrorMsg} = "Directory $Path is not writeable.", goto done
5662             unless (-w $Path);
5663            
5664 8 50 33     679 $this->{_ErrorMsg} = "File $WriteFileName cannot be overwritten.", goto done
5665             if (-e $WriteFileName && !(-w $WriteFileName));
5666            
5667             ## Open the file for write.
5668            
5669 29     29   231 use IO::File;
  29         72  
  29         52340  
5670 8         85 $OutFile = IO::File->new(">$WriteFileName");
5671 8 50       3103 $this->{_ErrorMsg} = "Failed to open $WriteFileName for writing: $!", goto done
5672             unless $OutFile;
5673            
5674 8         47 $this->progress("Writing $WriteFileName...");
5675             }
5676              
5677             ## Figure out the line initiator & ender strings, and delimiter sequence.
5678              
5679 8         16 my ($LineStartQuote, $Delim, $LineEndQuote);
5680 8 50       31 if ($QuoteFields)
5681             {
5682             ## In "forced" quote mode, we just do the quoting by always
5683             ## putting them at the start and end of lines and in between
5684             ## each field.
5685              
5686 0         0 $LineStartQuote = "\"";
5687 0         0 $LineEndQuote = "\"";
5688 0         0 $Delim = "\"$FDelimiter\"";
5689             }
5690             else
5691             {
5692             ## In no-quote or auto-quote mode, we don't put the quotes in
5693             ## these places; they'll either be omitted entirely or
5694             ## inserted per-field.
5695              
5696 8         27 $LineStartQuote = '';
5697 8         12 $LineEndQuote = '';
5698 8         15 $Delim = $FDelimiter;
5699             }
5700              
5701             ## Precompile a regex that checks for quotes, the field delimiter
5702             ## sequence, the line ending sequence, or any line-ending-ish
5703             ## characters at all. In $QuoteCheck mode, we'll use this to
5704             ## identify fields needing double-quotes.
5705              
5706 8 50       329 my $QuoteOrDelimCheck = qr{(?:\Q$LineEnding\E)|(?:\Q$FDelimiter\E)|[\"\x0D\x0A]} if $QuoteCheck;
5707              
5708             ## Precompile a return-map-checking regex that checks first for
5709             ## the line ending actively in use and then for the platform's
5710             ## "\n". These may be the same, but are not necessarily always
5711             ## the same. We don't return-map any old \x0D or \x0A because,
5712             ## for example in a pure Unix world, \x0D would not be interpreted
5713             ## as line-ending-related and hence is a valid character in a
5714             ## field.
5715              
5716 8 50       137 my $ReturnMapCheck = qr{(?:\Q$LineEnding\E)|(?:\n)} if $ReturnMap;
5717              
5718             ## Print out the header row that lists the field names in order.
5719            
5720 8 50       29 if ($HeaderRow)
5721             {
5722 33         58 my $Line = ($LineStartQuote .
5723             join($Delim,
5724             map
5725             {
5726             ## Quote any " character as ""
5727 8         21 (my $X = $_) =~ s/\"/\"\"/g;
5728            
5729             ## In QuoteCheck mode _QuoteFields =>
5730             ## undef ("auto"): put quotes around
5731             ## field only if required.
5732            
5733 33 50 33     256 $X = "\"$X\"" if $QuoteCheck && $X =~ $QuoteOrDelimCheck;
5734            
5735             ## Convert returns back to \x0B
5736 33 50       140 $X =~ s/$ReturnMapCheck/$ReturnEncoding/g if $ReturnMap;
5737            
5738 33         92 $X;
5739             } @$FieldList) .
5740             $LineEndQuote .
5741             $LineEnding);
5742            
5743             ## Maybe convert entire line (all records) ISO to Mac before writing it.
5744 8 100       48 &ISORoman8859_1ToMacRoman(\ $Line) if $DoMacMapping;
5745            
5746 8 50       74 $OutFile->print($Line) if $HeaderRow;
5747             }
5748            
5749             ## Print out each row (record). Fields are output in $FieldList
5750             ## order (same order as they were in the header row, if any).
5751             ## Records are printed in the order specified in $Selection.
5752            
5753 8         143 my $WroteProg;
5754 8         18 my $TotalLen = @$Selection+0;
5755 8 50       23 my $RecordsToWrite = ($MaxRecords ? min($MaxRecords, $TotalLen) : $TotalLen);
5756 8         14 my $RecordsWritten = 0;
5757              
5758 8         20 foreach my $i (@$Selection)
5759             {
5760 99         321 my $Line = ($LineStartQuote .
5761             join($Delim,
5762             map
5763             {
5764             ## Quote any " character as ""
5765 24         52 (my $X = $_->[$i]) =~ s/\"/\"\"/g;
5766            
5767             ## In QuoteCheck mode _QuoteFields =>
5768             ## undef ("auto"): put quotes around
5769             ## field only if required.
5770            
5771 99 50 33     645 $X = "\"$X\"" if $QuoteCheck && $X =~ $QuoteOrDelimCheck;
5772            
5773             ## Convert returns back to \x0B
5774 99 50       371 $X =~ s/$ReturnMapCheck/$ReturnEncoding/g if $ReturnMap;
5775            
5776 99         219 $X;
5777             } @$Columns) .
5778             $LineEndQuote .
5779             $LineEnding);
5780            
5781             ## Maybe convert entire line (all records) ISO to Mac before writing it.
5782 24 100       78 &ISORoman8859_1ToMacRoman(\ $Line) if $DoMacMapping;
5783            
5784 24         93 $OutFile->print($Line);
5785            
5786 24         120 $RecordsWritten++;
5787              
5788             ## Try doing timed (throttled to 1 per 2 secs) progress at
5789             ## most every 100th record.
5790 24 50       193 my $Did = $this->progress_timed("Writing", $RecordsWritten, $RecordsWritten, $RecordsToWrite, 1)
5791             if (($RecordsWritten % 100) == 0);
5792 24   33     98 $WroteProg ||= $Did;
5793            
5794             ## Stop if we have written all the records we wanted.
5795 24 100       77 last if ($RecordsWritten >= $RecordsToWrite);
5796             }
5797            
5798             ## If we wrote timed progress but didn't get to give the 100%
5799             ## message yet, print the 100% message now.
5800 8 50       26 if ($WroteProg)
5801             {
5802 0 0       0 my $FinalProg = $this->progress_timed("Writing", $RecordsWritten, $RecordsWritten, $RecordsToWrite, 1)
5803             unless (($RecordsWritten % 100) == 0);
5804             }
5805            
5806             ## Print the regular Done message.
5807 8 50       20 if ($GotHandle)
5808             {
5809 0         0 $this->progress("Done writing.");
5810 0         0 $Success = 1;
5811             }
5812             else
5813             {
5814 8         34 $this->progress("Wrote $WriteFileName.");
5815             }
5816              
5817 8 50       27 if (!$GotHandle)
5818             {
5819             ## Close the file and check the exit code.
5820 8         58 $OutFile->close();
5821 8         739 $Success = (($?>>8) == 0);
5822            
5823 8 50       31 $this->{_ErrorMsg} = "Unexpected failure writing $WriteFileName ($!)", goto done
5824             unless $Success;
5825             }
5826              
5827             done:
5828 8 50       23 $this->warn("FAILURE: $this->{_ErrorMsg}") unless $Success;
5829              
5830 8 50       191 return($Success ? $WriteFileName : undef);
5831             }
5832              
5833             sub write_file_name ## Calculate the name of the file to be written.
5834             {
5835 8     8 0 14 my $this = shift;
5836 8         17 my ($FileName, $WriteExtension) = @_;
5837              
5838             ## Break the path into its parts...
5839 29     29   259 use File::Basename qw(fileparse);
  29         69  
  29         225715  
5840 8         416 my ($Basename, $Path, $Ext) = fileparse($FileName, '\.[^\.]+');
5841              
5842             ## If no directory is explicitly named, set $Path to be the
5843             ## implicit "current directory" (e.g. "./")
5844            
5845 8         18 my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
  8         34  
5846 8   33     31 $Path ||= $Cur;
5847            
5848             ## If $WriteExtension is empty, which is allowed, then the result
5849             ## will be the same as $FileName, which could in some cases result
5850             ## in the overwriting of the same file that was read in (and may
5851             ## be what is intended).
5852              
5853 8         26 my $WriteFileName = "$Path$Basename$WriteExtension$Ext";
5854              
5855 8         22 return($WriteFileName);
5856             }
5857              
5858             sub out
5859             {
5860 0     0 0 0 my $this = shift;
5861 0         0 my $Dest = shift;
5862              
5863 0         0 my $Success;
5864              
5865             ## First do the formatting (or fail) -- format() will warn if needed.
5866 0 0       0 my $Data = $this->format(@_) or goto done;
5867              
5868             ## If $Dest is empty or not defined, use STDOUT.
5869 0   0     0 $Dest ||= \*STDOUT;
5870            
5871             ## If given an IO handle (such as \*STDERR), bless and use it.
5872 0 0       0 if (ref($Dest) eq 'HANDLE') {$Dest = bless($Dest , 'IO::File')};
  0         0  
5873            
5874             ## If it is not an object, treat it as a file name to be opened.
5875 0 0 0     0 if (!ref($Dest)) {$Dest = (IO::File->new(">$Dest") or
  0         0  
5876             $this->warn("Can't open file $Dest: $!"), goto done)};
5877              
5878             ## At this point treat $Dest as an object with a print method and
5879             ## complain if the print method doesn't return a true value.
5880            
5881 0 0       0 $Dest->print($$Data) or $this->warn("Had trouble writing file: $!"), goto done;
5882            
5883 0         0 $Success = 1;
5884 0         0 done:
5885             return($Success);
5886             }
5887              
5888             sub format ## use Data::ShowTable to format the table in a pretty way.
5889             {
5890 1     1 1 9 my $this = shift;
5891 1 50       5 my $Params = (@_ == 1 ? {_MaxRecords => $_[0]} : {@_});
5892              
5893 1         2 my($Selection, $FieldList, $SortSpecs, $DefaultSortType, $MaxRecords, $MaxWidth) = map {$this->getparam($Params, $_)}
  6         14  
5894             qw(_Selection _FieldList _SortSpecs _DefaultSortType _MaxRecords _MaxWidth);
5895              
5896             ## This method relies on Data::ShowTable.
5897 1 50       6 $this->warn("@{[__PACKAGE__]}::show() requires optional Data::ShowTable module."), goto done
  1         9  
5898             unless $HaveShowTable;
5899            
5900 0   0     0 $FieldList ||= $this->fieldlist();
5901 0   0     0 $Selection ||= $this->selection();
5902 0   0     0 $SortSpecs ||= {};
5903 0 0       0 $DefaultSortType = 'String' unless (length($DefaultSortType));
5904 0   0     0 $MaxRecords ||= 0; ## Default is no maximum (all records).
5905 0   0     0 $MaxWidth ||= 15; ## Zero or undef means use default.
5906 0         0 $MaxWidth = max(2, $MaxWidth); ## MaxWidth must not be less than 2
5907            
5908 0         0 my $TypeMap = {qw(string char
5909             text text
5910             integer int
5911             number numeric
5912             boolean int)};
5913              
5914 0 0 0     0 my $Types = [map {$TypeMap->{lc(@{$SortSpecs->{$_} || {}}{SortType} ||
  0         0  
5915             $DefaultSortType) } || 'string'}
5916             @$FieldList];
5917              
5918 0         0 my $TotalLen = @$Selection+0;
5919 0 0       0 my $RecordsToWrite = ($MaxRecords ? min($MaxRecords, $TotalLen) : $TotalLen);
5920              
5921             ## The row-yielder subroutine and its private state variables.
5922 0         0 my $SelNum = 0;
5923             my $RowSub = sub ## A closure over the local vars in this subroutine.
5924             {
5925             ## We might be asked to rewind.
5926 0     0   0 my ($Rewind) = @_;
5927 0 0       0 $SelNum = 0, return(1) if ($Rewind);
5928            
5929             ## Done if we've written all rows.
5930 0 0       0 return() if $SelNum >= $RecordsToWrite;
5931            
5932             ## Otherwise, yield a row if we still can.
5933 0 0       0 my $List = [map
5934             {
5935             ## Truncate as needed.
5936 0         0 my $X = (length > $MaxWidth ?
5937             (substr($_, 0, ($MaxWidth - 1)) . '>') : $_);
5938              
5939             ## Encode returns, tabs as carets.
5940 0         0 $X =~ s{(?:\x0D\x0A)|[\x0D\x0A\x09]}{^}g;
5941            
5942 0         0 $X;
5943             }
5944 0         0 @{$this->row_list($Selection->[$SelNum++], $FieldList)}];
5945            
5946 0         0 return(@$List);
5947 0         0 };
5948              
5949             ## Locally replace put() and out() in Data::ShowTable so we can
5950             ## gather the data into memory instead of having it go right out
5951             ## to STDOUT before we may want it to.
5952              
5953             ## Too bad Data::ShowTable is not a subclassable object instead.
5954              
5955 0         0 my $Data = [""]; ## Array to hold output from out() and put()
5956             {
5957 0         0 local *{Data::ShowTable::out} = sub ## See sub out in ShowTable.pm
5958             {
5959 0     0   0 my $fmt = shift;
5960 0 0       0 $fmt .= "\n" unless $fmt =~ /\n$/;
5961 0         0 $Data->[-1] .= sprintf($fmt, @_);
5962 0         0 push @$Data, "";
5963 0         0 };
5964            
5965             local *{Data::ShowTable::put} = sub ## See sub put in ShowTable.pm
5966             {
5967 0     0   0 my $fmt = shift();
5968 0         0 $Data->[-1] .= sprintf($fmt, @_);
5969 0         0 };
5970            
5971 0         0 &ShowBoxTable({titles => $FieldList,
5972             types => $Types,
5973             row_sub => $RowSub,
5974             widths => [], ## Will calculate from the data
5975             });
5976             }
5977            
5978             ## Remove spurious extra newline entries at end of $Data
5979 0 0       0 pop @$Data if $Data->[-1] =~ /^\s*$/s;
5980 0 0       0 pop @$Data if $Data->[-1] =~ /^\s*$/s;
5981              
5982 0         0 my $Formatted = join("", @$Data);
5983              
5984 1         5 done:
5985             return(\ $Formatted);
5986             }
5987              
5988              
5989             =pod
5990              
5991             =head1 APPENDING / MERGING / JOINING TABLES
5992              
5993             ## Append all records from a second table
5994              
5995             $t->append($Other) ## Append records from $Other
5996             $t->append_file($File, $Params) ## Append from new($Params, $File)
5997             $t->append_files($Files, $Params) ## Call append_file for all files
5998             $t->append_files_new($Files, $Params) ## Internal helper routine
5999              
6000             ## Combine all fields from a second table
6001              
6002             $t->combine($Other) ## Combine fields from $Other
6003             $t->combine_file($File, $Params) ## Combine new($Params, $File)
6004             $t->combine_files($Files, $Params) ## combine_file on each file
6005              
6006             ## Left-join records from a second table (lookup field vals)
6007              
6008             $t->join ($Other, $KeyField1, [$KeyField2, $Fields])
6009             $t->join_file ($File, $Params, $KeyField1, [$KeyField2, $Fields])
6010             $t->join_files($Files, $Params, $KeyField1, [$KeyField2, $Fields])
6011              
6012             The append() method concatenates all the records from two CTable
6013             objects together -- even if the two tables didn't start out with
6014             exactly the same fields (or even any of the same fields).
6015              
6016             It takes all the data records from another CTable object and appends
6017             them into the present table. Any columns present in the $Other table
6018             but not in the first table, are created (and the corresponding field
6019             values in the first table will all be empty/undef). Similarly, any
6020             columns present in $t but not present in $Other will be extended
6021             to the correct new length as necessary and the field values in the
6022             original columns will be empty/undef. Columns present in both will,
6023             of course, have all the data from both the original sets of data.
6024              
6025             All data from the second table is brought into the first one. No
6026             attempt whatsoever is made to eliminate any duplicate records that
6027             might result.
6028              
6029             The number of records (length()) after this call is the sum of the
6030             length() of each of the tables before the operation.
6031              
6032             IMPORTANT NOTE: The data from the $Other object is COPIED in memory
6033             into the new object. This could be hard on memory if $Other is big.
6034             Might want to be sure to discard $Other when you're done with it.
6035              
6036             $Other is left untouched by the operation.
6037              
6038             All columns from both tables are combined whether or not they are
6039             mentioned in the custom field list of either.
6040              
6041             The custom field lists, if present in either table object, are
6042             concatenated into this object's custom field list, but with
6043             duplications eliminated, and order retained.
6044              
6045             Any existing custom selections, custom sort order, sort specs, and/or
6046             sort routines are also combined appropriately, with settings from this
6047             object taking precedence over those from $Other anywhere the two have
6048             conflicting settings.
6049              
6050             append_file() takes a file name and optional $Params hash. It uses
6051             those to create a new() object with data read from the file. Then,
6052             the new table is appended to $t using append() and then the new table
6053             is discarded.
6054              
6055             append_files() is a convenience function that calls append_file() on
6056             each file in a list, using the same optional $Params for each.
6057              
6058             append_files_new() is the internal routine that implements the
6059             processing done by new() on the optional list of files to be read. It
6060             does the following: It calls read() on the first file in the list.
6061             Then, it calls append_files() to read the remaining into their own
6062             new() objects of the same class as $t and using the same $Params to
6063             new() (if any were supplied). Then each of these is append()-ed in
6064             turn to $t and discarded. The final result will be that $t will hold
6065             a concatenation of all the data in all the files mentioned. However,
6066             consistent with the behavior of append(), the _FileName parameter and
6067             other read()-controlled settings will correspond to the first file
6068             read. The intermediate objects are discarded.
6069              
6070             NOTE: As with new() and read(), if a non-empty _FieldList Param is
6071             specified, the read() methods called internally by the append_file*()
6072             methods will read only the fields mentioned and will ignore any other
6073             fields in the files.
6074              
6075             =head2 Combining tables
6076              
6077             combine() adds columns from a second table into the current one.
6078              
6079             CAUTION: You should only use combine() when you have two tables where
6080             all the (possibly selected) records in the second table line up
6081             perfectly with all the (unselected) records in the first table -- in
6082             other words, each table before combine() should contain a few of the
6083             columns of the new table -- for example, maybe one table contains a
6084             column of file names, and the other contains columns of corresponding
6085             file sizes and modification times. If you don't understand the
6086             consequences of combine, don't use it or you could end up with some
6087             records whose field values don't refer to the same object. (Maybe you
6088             meant to use append() or join() instead.)
6089              
6090             If the second table has a custom field list, only those columns are
6091             brought in.
6092              
6093             If any column in the second table has the same name as one in the
6094             current table, the incoming column replaces the one by the same name.
6095              
6096             All columns are COPIED from the second table, so the first table owns
6097             the new data exclusively.
6098              
6099             If the second table has a selection, only those records are copied, in
6100             selection order. (select_all() first if that's not what you want.)
6101              
6102             The selection in the first table, if any, is ignored during the
6103             combine. If this isn't what you want, then consider using cull()
6104             before combine().
6105              
6106             Field list and sort order are concatenated (but retaining uniqueness:
6107             second mentions of a field in the combined lists are omitted).
6108              
6109             Custom sort routines and sort specs are combined, with those in the
6110             first table taking precedence over any copied in with the same name.
6111              
6112             The custom _Selection from the first table, if any, is retained. (It
6113             will initially omit any records added by extend()).
6114              
6115             All other parameters from the first table are retained, and from the
6116             second table are ignored.
6117              
6118             combine() calls extend() after combining to ensure that all columns
6119             have the same length: if either the older or newer columns were
6120             shorter, they will all be set to the length of the longest columns in
6121             the table -- creating some empty field values at the end of the
6122             lengthened columns.
6123              
6124             combine_file() does the same as combine() except starting with a file
6125             name, first creating the $Other object by creating it using
6126             new($Params, $File), then discarding it after combining.
6127              
6128             =head2 Joining tables (Looking up data from another table)
6129              
6130             join() looks up field values from a second table, based on common
6131             values in key fields which may have different or the same names in
6132             each table. It adds columns to the current table if necessary to hold
6133             any new field values that must be brought in.
6134              
6135             join() never adds any new or non-matching records to the table:
6136             records where the lookup fails will simply have empty/undef values in
6137             the corresponding columns.
6138              
6139             ## Example:
6140              
6141             $t->join ($People, 'FullName', 'FirstAndLast'); ## or
6142             $t->join_file("People.txt", {}, 'FullName', 'FirstAndLast');
6143              
6144              
6145             Here's how join() calculates the list of fields to bring in:
6146              
6147             - Legal field names from the optional $Fields list, if supplied
6148             - Otherwise, the fieldlist() from second table
6149             - ... minus any fields with same name as $KeyField1 or $KeyField2
6150              
6151             Join starts by adding new empty columns in the first table for any
6152             field to be brought in from the second but not yet present in the
6153             first.
6154              
6155             Here's how join() calculates the records eligible for lookup:
6156              
6157             - Join only modifies the selected records in the first table
6158             - Join only looks up values from selected records in second table
6159              
6160             (If you want all records to be used in both or either table, call the
6161             table's select_all() method before calling join().)
6162              
6163             Then, for every selected record in $t (using the example above), join
6164             examines the FullName field ($KeyField1), and looks up a corresponding
6165             entry (must be 'eq') in the FirstAndLast field ($KeyField2) in the
6166             second table.
6167              
6168             IMPORTANT NOTE ABOUT KEY LENGTH: To speed lookup, hash-based indices
6169             are made. The strings in $Key1 and $Key2 fields should not be so long
6170             that the hash lookups bog down or things could get ugly fast. There
6171             is no fixed limit to hash key length in Perl, but fewer than 128
6172             characters in length is longer than customary for such things. (Many
6173             systems require text-based keys to be no longer than 31 characters.)
6174             So be judicious about the values in $Key1 and $Key2 fields.
6175              
6176             The first record found in the second table's selection with a matching
6177             value in the key field is then copied over (but only the appropriate
6178             fields are copied, as explained above). Any field values being
6179             brought over will REPLACE corresponding field values in the first
6180             table, possibly overwriting any previous values if the field being
6181             looked up was already present in the first table and contained data.
6182              
6183             The first table's _FieldList is updated to reflect new fields added.
6184              
6185             Its _Selection is untouched.
6186              
6187             Its _SortOrder is untouched.
6188              
6189             Its _SortSpecs are augmented to include any entries from the second
6190             table that should be brought over due to the field additions.
6191              
6192             Its _SRoutines are augmented to add new ones from the second table.
6193              
6194             All other parameters of table 1 are untouched.
6195              
6196             The second table is not modified. No data structures will be shared
6197             between the tables. Data is only copied.
6198              
6199             join_file() calls join() after creating a seond table from your $File.
6200              
6201             join_files() calls join_file() repeatedly for each file in a list, but
6202             it is important to note that each file in the list of files to be
6203             joined must have a $Key2 field -- AND, that any values looked up from
6204             the second file will overwrite any values of the same key found in the
6205             first file, and so on. You probably will not ever need join_files().
6206             It is mainly here for completeness.
6207              
6208             =cut
6209            
6210             {}; ## Get emacs to indent correctly.
6211              
6212             sub append ## ($this, $OtherCTable)
6213             {
6214 10     10 0 44 my $this = shift;
6215 10         14 my ($that) = @_;
6216              
6217 10         10 my $Success;
6218              
6219             ## Get all fields in $this, but only selected ones in $that
6220 10         26 my $ThisFieldsAll = $this->fieldlist_all();
6221 10         25 my $ThatFields = $that->fieldlist();
6222              
6223             ## Figure out how many data fields in each of the tables.
6224 10         20 my $ThisFieldCount = @$ThisFieldsAll+0;
6225 10         18 my $ThatFieldCount = @$ThatFields +0;
6226              
6227             ## We're going to bring over only the selected records in $that.
6228 10         24 my $ThisSel = $this->selection();
6229 10         27 my $ThatSel = $that->selection();
6230              
6231             ## Figure out how many records there were to start with.
6232 10         23 my $ThisLength = $this->length();
6233 10         17 my $ThatLength = @$ThatSel+0;
6234              
6235             ## New record count is sum of the other two.
6236 10         16 my $NewLength = $ThisLength + $ThatLength;
6237              
6238             ## Create any missing columns not yet present in $this and,
6239             ## whether new or not, presize all vectors to the new length,
6240             ## which will create empty/undef entries as necessary.
6241              
6242 10   50     26 foreach (@$ThisFieldsAll, @$ThatFields) {$#{$this->{$_} ||= []} = ($NewLength - 1)};
  72         83  
  72         270  
6243              
6244             ## Then copy the field data from the second table into the already
6245             ## pre-sized columns in this one.
6246              
6247 10         23 foreach my $FieldName (@$ThatFields)
6248             {
6249 34         47 my $NewVector = $this->{$FieldName};
6250 34         679 my $OldVector = $that->sel_get($FieldName, $ThatSel);
6251            
6252 34         75 foreach my $RecordNum (0..$#$OldVector)
  86         227  
6253             {($NewVector->[$ThisLength + $RecordNum] =
6254             $OldVector->[ $RecordNum])};
6255             }
6256              
6257             ## Now all the data columns have been combined. We just have to
6258             ## combine any custom metadata.
6259            
6260             ## If either table had a custom fieldlist, then make a new custom
6261             ## field list which is the result of concatenating both field
6262             ## lists together, without duplicates, and of course preserving
6263             ## the original order as completely as possible (with the order
6264             ## given in the first table taking precedence).
6265              
6266 10 50 33     42 if (defined($this->{_FieldList}) ||
6267             defined($that->{_FieldList}))
6268            
6269             {
6270 10         26 my $ThisFields = $this->fieldlist();
6271 10         25 my $ThatFields = $that->fieldlist();
6272              
6273             ## Make a hash mapping field names from both tables to the
6274             ## order they should appear
6275              
6276 10         16 my $FieldOrderHash = {};
6277 10   66     25 foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} ||= (keys %$FieldOrderHash) + 1};
  69         285  
6278              
6279 10         42 my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
  44         82  
6280            
6281 10         41 $this->{_FieldList} = $FieldList;
6282             }
6283              
6284             ## If either table had a custom sortorder, then make a new custom
6285             ## sort order which is the result of concatenating both orders
6286             ## together, without duplicates, and of course preserving the
6287             ## original order as completely as possible (with the order given
6288             ## in the first table taking precedence).
6289            
6290 10 50 33     59 if (defined($this->{_SortOrder}) ||
6291             defined($that->{_SortOrder}))
6292             {
6293 0         0 my $ThisOrder = $this->sortorder();
6294 0         0 my $ThatOrder = $that->sortorder();
6295              
6296             ## Make a hash mapping field names from both lists to the
6297             ## order they should appear
6298              
6299 0         0 my $OrderHash = {};
6300 0   0     0 foreach (@$ThisOrder, @$ThatOrder) {$OrderHash->{$_} ||= (keys %$OrderHash) + 1};
  0         0  
6301            
6302 0         0 my $OrderList = [sort {$OrderHash->{$a} <=> $OrderHash->{$b}} keys %$OrderHash];
  0         0  
6303            
6304 0         0 $this->{_SortOrder} = $OrderList;
6305             }
6306              
6307             ## If either table had a custom selection, then create a new
6308             ## selection which is the concatenation of the two selections.
6309              
6310 10 100 100     3393 if (defined($this->{_Selection}) ||
6311             defined($that->{_Selection}))
6312             {
6313 3         10 $this->{_Selection} = [@$ThisSel, ## Original selected records...
6314             ## Plus an adjusted entry for newly-added ones.
6315             ($ThisLength .. ($ThisLength + @$ThatSel - 1))
6316             ];
6317             }
6318              
6319             ## If either table had custom sortspecs, then create a new
6320             ## sortspecs hash by starting with all the entries from the other
6321             ## table and adding/overwriting with those from this table.
6322              
6323 10 50 33     64 if (defined($this->{_SortSpecs}) ||
6324             defined($that->{_SortSpecs}))
6325             {
6326 10         39 my $ThisSpecs = $this->sortspecs();
6327 10         25 my $ThatSpecs = $that->sortspecs();
6328              
6329 10         87 $this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6330             }
6331              
6332             ## If either table had custom sortroutines, then create a new
6333             ## sortroutines hash by starting with all the entries from the
6334             ## other table and adding/overwriting with those from this table.
6335              
6336 10 50 33     44 if (defined($this->{_SRoutines}) ||
6337             defined($that->{_SRoutines}))
6338             {
6339 10   50     31 my $ThisRoutines = $this->{_SRoutines} || {};
6340 10   50     26 my $ThatRoutines = $that->{_SRoutines} || {};
6341              
6342 10         29 $this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6343             }
6344              
6345             ## All other settings are kept from $this and those from $that are
6346             ## ignored.
6347              
6348 10         17 $Success = 1;
6349 10         48 done:
6350             return($Success);
6351             }
6352              
6353             sub append_file
6354             {
6355 2     2 0 3 my $this = shift;
6356 2         4 my ($FileName, $Params) = @_;
6357              
6358 2         3 my $Success;
6359              
6360             ## $Params argument is optional. If supplied, it must be a hash.
6361 2   100     10 $Params ||= {};
6362              
6363             ## Create a new empty table object of the same class as $this and
6364             ## read just the specified file into it. (note: this could be a
6365             ## recursive call here).
6366              
6367 2 50       16 my $that = ref($this)->new($Params, $FileName) or goto done;
6368            
6369             ## Append the data from $that table into this one.
6370              
6371 2 50       8 $this->append($that) or goto done;
6372              
6373 2         4 $Success = 1;
6374 2         31 done:
6375             return($Success);
6376             }
6377              
6378             sub append_files
6379             {
6380 126     126 0 214 my $this = shift;
6381 126         318 my ($FileNames, $Params) = @_;
6382            
6383 126         168 my $Success;
6384            
6385 126         350 foreach my $FileName (@$FileNames)
6386             {
6387 2 50       7 goto done unless $this->append_file($FileName, $Params);
6388             }
6389            
6390 126         211 $Success = 1;
6391 126         483 done:
6392             return($Success);
6393             }
6394              
6395             sub append_files_new ## Called by new() to process its file name args.
6396             {
6397 126     126 0 240 my $this = shift;
6398 126         204 my ($FileNames, $Params) = @_;
6399              
6400 126         165 my $Success;
6401              
6402             ## First we read the first file, if any, into this object using
6403             ## the read() method.
6404              
6405 126         228 my $FirstFile = shift @$FileNames;
6406 126 100       415 if (defined($FirstFile))
6407             {
6408 74 100       122 goto done unless $this->read(%{$Params || {}}, _FileName => $FirstFile);
  74 50       707  
6409             }
6410            
6411 126 50       522 goto done unless $this->append_files($FileNames, $Params);
6412              
6413 126         407 $Success = 1;
6414 126         1970 done:
6415             return($Success);
6416             }
6417              
6418             sub combine
6419             {
6420 6     6 0 17 my $this = shift;
6421 6         12 my ($that) = @_;
6422              
6423 6         9 my $Success;
6424              
6425             ## Get a snapshot of field lists before any modifications.
6426 6         17 my $ThisFields = $this->fieldlist();
6427 6         19 my $ThatFields = $that->fieldlist();
6428              
6429             ## Bring in all (listed) fields from other table.
6430 6         14 my $IncomingFields = $ThatFields;
6431              
6432             ## Preserve any previous non-selection and force one to be saved
6433             ## in the interim. This will prevent $that->sel() from
6434             ## recalculating the selection each time if there is none.
6435 6         15 my $OldSel = $that->{_Selection};
6436 6         20 $that->{_Selection} = $that->selection(); ## Might be a no-op
6437            
6438             ## Copy columns from $that in selection order; (re)place into $this
6439 6         20 foreach (@$IncomingFields) {$this->col_set($_, $that->sel($_))};
  12         50  
6440            
6441             ## Restore the possibly-undef selection in other table.
6442 6         20 $that->{_Selection} = $OldSel; ## Might be a no-op
6443            
6444             ## Extend any short columns (whether originating from other table
6445             ## or from this one) to be the same length as all others.
6446            
6447 6         25 $this->extend();
6448              
6449             ## If either table had a custom fieldlist, then make a new custom
6450             ## field list which is the result of concatenating both field
6451             ## lists together, without duplicates, and of course preserving
6452             ## the original order as completely as possible (with the order
6453             ## given in the first table taking precedence).
6454            
6455 6 50 33     28 if (defined($this->{_FieldList}) ||
6456             defined($that->{_FieldList}))
6457             {
6458             ## Make a hash mapping field names from both tables to the
6459             ## order they should appear
6460 6         12 my $FieldOrderHash = {};
6461 6   66     16 foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} ||= (keys %$FieldOrderHash) + 1};
  47         165  
6462            
6463 6         28 my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
  49         84  
6464              
6465 6         25 $this->{_FieldList} = $FieldList;
6466             }
6467              
6468             ## If either table had a custom sortorder, then make a new custom
6469             ## sort order which is the result of concatenating both orders
6470             ## together, without duplicates, and of course preserving the
6471             ## original order as completely as possible (with the order given
6472             ## in the first table taking precedence).
6473            
6474 6 50 33     37 if (defined($this->{_SortOrder}) ||
6475             defined($that->{_SortOrder}))
6476             {
6477 0         0 my $ThisOrder = $this->sortorder();
6478 0         0 my $ThatOrder = $that->sortorder();
6479              
6480             ## Make a hash mapping field names from both lists to the
6481             ## order they should appear
6482              
6483 0         0 my $OrderHash = {};
6484 0   0     0 foreach (@$ThisOrder, @$ThatOrder) {$OrderHash->{$_} ||= (keys %$OrderHash) + 1};
  0         0  
6485            
6486 0         0 my $OrderList = [sort {$OrderHash->{$a} <=> $OrderHash->{$b}} keys %$OrderHash];
  0         0  
6487            
6488 0         0 $this->{_SortOrder} = $OrderList;
6489             }
6490              
6491             ## If either table had custom sortspecs, then create a new
6492             ## sortspecs hash by starting with all the entries from the other
6493             ## table and adding/overwriting with those from this table.
6494              
6495 6 50 33     1165 if (defined($this->{_SortSpecs}) ||
6496             defined($that->{_SortSpecs}))
6497             {
6498 6         28 my $ThisSpecs = $this->sortspecs();
6499 6         18 my $ThatSpecs = $that->sortspecs();
6500              
6501 6         24 $this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6502             }
6503              
6504             ## If either table had custom sortroutines, then create a new
6505             ## sortroutines hash by starting with all the entries from the
6506             ## other table and adding/overwriting with those from this table.
6507              
6508 6 50 33     27 if (defined($this->{_SRoutines}) ||
6509             defined($that->{_SRoutines}))
6510             {
6511 6   50     21 my $ThisRoutines = $this->{_SRoutines} || {};
6512 6   50     17 my $ThatRoutines = $that->{_SRoutines} || {};
6513              
6514 6         18 $this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6515             }
6516              
6517             ## All other settings are kept from $this and those from $that are
6518             ## ignored.
6519              
6520 6         10 $Success = 1;
6521 6         30 done:
6522             return($Success);
6523             }
6524              
6525             sub combine_file
6526             {
6527 5     5 0 52 my $this = shift;
6528 5         10 my ($FileName, $Params) = @_;
6529              
6530 5         7 my $Success;
6531              
6532             ## $Params argument is optional. If supplied, it must be a hash.
6533 5   50     15 $Params ||= {};
6534              
6535             ## Create a new empty table object of the same class as $this and
6536             ## read just the specified file into it.
6537              
6538 5 50       41 my $that = ref($this)->new($Params, $FileName) or goto done;
6539              
6540             ## Combine the data from $that table into this one.
6541              
6542 5 50       29 $this->combine($that) or goto done;
6543              
6544 5         14 $Success = 1;
6545 5         76 done:
6546             return($Success);
6547             }
6548              
6549             sub combine_files
6550             {
6551 1     1 0 15 my $this = shift;
6552 1         1 my ($FileNames, $Params) = @_;
6553            
6554 1         2 my $Success;
6555            
6556 1         3 foreach my $FileName (@$FileNames)
6557             {
6558 1 50       10 goto done unless $this->combine_file($FileName, $Params);
6559             }
6560            
6561 1         3 $Success = 1;
6562 1         5 done:
6563             return($Success);
6564             }
6565              
6566             sub join
6567             {
6568 5     5 0 20 my $this = shift;
6569 5         10 my ($that, $Key1, $Key2, $Fields) = @_;
6570              
6571 5         7 my $Success;
6572              
6573             ## $Key1 is required.
6574 5 50       14 $this->warn("Key1 is required for join()"), goto done
6575             unless ($Key1);
6576              
6577             ## $Key2 defaults to the same as $Key1.
6578 5   33     11 $Key2 ||= $Key1;
6579            
6580             ## The fields we'll be getting can optionally be overridden by
6581             ## caller; otherwise, they're the field list of the other table.
6582 5   33     27 my $IncomingFields = $Fields || $that->fieldlist();
6583            
6584             ## Cull $Key1 and $Key2 from the incoming field list.
6585 5 50       13 $IncomingFields = [grep {($_ ne $Key1) && ($_ ne $Key2)} @$IncomingFields];
  26         222  
6586            
6587             ## Preserve any previous non-selection and force one to be saved
6588             ## in the interim. This will prevent $that->sel() from
6589             ## recalculating the selection each time if there is none.
6590 5         11 my $OldSel = $that->{_Selection};
6591 5         13 $that->{_Selection} = $that->selection(); ## Might be a no-op
6592            
6593             ## Make an index mapping values in $Key2 to record numbers in
6594             ## $that. We reverse the order of insertion into the $Index
6595             ## because we want the items earliest in selection order to have
6596             ## precedence in case keys are not unique as they should be.
6597            
6598 5         8 my $Index = {}; @$Index{reverse @{$that->sel($Key2)}} = reverse @{$that->{_Selection}};
  5         10  
  5         15  
  5         13  
6599              
6600             ## Get a list of record numbers in $this that we'll be copying data into.
6601 5         17 my $Recs1 = $this->selection();
6602              
6603             ## Get a corresponding list of keys we're going to look up.
6604 5         11 my $Key1s = $this->sel($Key1);
6605              
6606             ## The default "record number" in table 2 is $that->length()
6607             ## ... i.e. an invalid record number past the end of the table.
6608             ## This will ensure that failed lookups result in lookups to this
6609             ## illegal record number, correctly producing undef in the
6610             ## corresponding joined fields, whereas looking up "undef" would
6611             ## have produced record number zero.
6612              
6613 5         14 my $DefaultRecNum = $that->length();
6614              
6615             ## Look up @$Key1s in @$Index to get a list of data-source record
6616             ## numbers in $that. Failed lookups map to $DefaultRecNum.
6617            
6618 5 100       90 my $Recs2 = [map {defined() ? $_ : $DefaultRecNum} @$Index{@$Key1s}];
  15         56  
6619              
6620             ## Copy data into selected positions within columns of $this, one
6621             ## column at a time, creating pre-sized columns in $this as
6622             ## necessary (col()). These array slice operations are very, very
6623             ## fast.
6624              
6625 5         13 foreach my $Field (@$IncomingFields)
6626             {
6627 21         34 (@{$this->col($Field)}[@$Recs1] = ## Put values into selected records of $this
  21         40  
6628 21         24 @{$that->col($Field)}[@$Recs2]); ## Get values from looked-up records of $that
6629             }
6630              
6631             ## Restore the possibly-undef selection in other table.
6632 5         11 $that->{_Selection} = $OldSel; ## Might be a no-op
6633              
6634             ## If this table had a custom fieldlist, then make a new custom
6635             ## field list which is the result of concatenating both field
6636             ## lists together, without duplicates, and of course preserving
6637             ## the original order as completely as possible (with the order
6638             ## given in the first table taking precedence).
6639              
6640 5 50       17 if (defined($this->{_FieldList}))
6641             {
6642 5         12 my $ThisFields = $this->fieldlist();
6643 5         7 my $ThatFields = $IncomingFields;
6644              
6645             ## Make a hash mapping field names from both tables to the
6646             ## order they should appear
6647              
6648 5         7 my $FieldOrderHash = {};
6649 5   66     10 foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} ||= (keys %$FieldOrderHash) + 1};
  56         166  
6650              
6651 5         20 my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
  62         79  
6652            
6653 5         22 $this->{_FieldList} = $FieldList;
6654             }
6655              
6656             ## If either table had custom sortspecs, then create a new
6657             ## sortspecs hash by starting with all the entries from the other
6658             ## table and adding/overwriting with those from this table.
6659              
6660 5 50 33     26 if (defined($this->{_SortSpecs}) ||
6661             defined($that->{_SortSpecs}))
6662             {
6663 5         14 my $ThisSpecs = $this->sortspecs();
6664 5         13 my $ThatSpecs = $that->sortspecs();
6665              
6666 5         17 $this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6667             }
6668              
6669             ## If either table had custom sortroutines, then create a new
6670             ## sortroutines hash by starting with all the entries from the
6671             ## other table and adding/overwriting with those from this table.
6672              
6673 5 50 33     19 if (defined($this->{_SRoutines}) ||
6674             defined($that->{_SRoutines}))
6675             {
6676 5   50     26 my $ThisRoutines = $this->{_SRoutines} || {};
6677 5   50     13 my $ThatRoutines = $that->{_SRoutines} || {};
6678              
6679 5         13 $this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6680             }
6681              
6682             ## All other settings are kept from $this and those from $that are
6683             ## ignored.
6684              
6685 5         8 $Success = 1;
6686 5         30 done:
6687             return($Success);
6688             }
6689              
6690             sub join_file
6691             {
6692 0     0 0 0 my $this = shift;
6693 0         0 my ($FileName, $Params, $Key1, $Key2, $Fields) = @_;
6694              
6695 0         0 my $Success;
6696              
6697             ## $Params argument may be undef. If supplied, it must be a hash.
6698 0   0     0 $Params ||= {};
6699              
6700             ## Create a new empty table object of the same class as $this and
6701             ## read just the specified file into it.
6702              
6703 0 0       0 my $that = ref($this)->new($Params, $FileName) or goto done;
6704            
6705             ## Join the data from $that table into this one.
6706              
6707 0 0       0 $this->join($that, $Key1, $Key2, $Fields) or goto done;
6708              
6709 0         0 $Success = 1;
6710 0         0 done:
6711             return($Success);
6712             }
6713              
6714             sub join_files
6715             {
6716 0     0 0 0 my $this = shift;
6717 0         0 my ($FileNames, $Params, $Key1, $Key2, $Fields) = @_;
6718            
6719 0         0 my $Success;
6720            
6721 0         0 foreach my $FileName (@$FileNames)
6722             {
6723 0 0       0 goto done unless $this->join_file($FileName, $Params, $Key1, $Key2, $Fields);
6724             }
6725            
6726 0         0 $Success = 1;
6727 0         0 done:
6728             return($Success);
6729             }
6730              
6731             =pod
6732              
6733             =head1 INVERTING A TABLE'S ROWS/COLUMNS
6734              
6735             ## Re-orient table's data using vals from $ColName as field names...
6736             $t-invert($ColName)
6737              
6738             Sometimes a situation gives you a table that's initially organized
6739             with column data in rows, and field names in one of the columns, so
6740             you need to flip the table in order to be able to work meaningfully
6741             with it.
6742              
6743             "Inverting" a table means to rewrite each row as a column. One row is
6744             designated to be used as the field names.
6745              
6746             For example, consider this table:
6747              
6748             F01 F02 F03 F04
6749             ------------------------
6750             First Chris Agnes James
6751             Last Bart Marco Nixon
6752             Age 22 33 44
6753              
6754             Calling invert() using field names from "F01"...
6755              
6756             $t->invert('F01');
6757              
6758             ... would change the table to look like this:
6759            
6760             First Last Age
6761             ----------------
6762             Chris Bart 22
6763             Agnes Marco 33
6764             James Nixon 44
6765              
6766             The field F01 which formerly contained the field names, is now gone,
6767             and the remaining data columns have been converted from their old row
6768             orientation into a column orientation.
6769              
6770             =cut
6771              
6772             sub invert
6773             {
6774 0     0 0 0 my $this = shift;
6775 0         0 my ($HeaderField) = @_;
6776              
6777 0         0 my $Success;
6778            
6779             ## Get new field names from an existing column and delete it at the same time.
6780 0 0       0 my $NewColNames = $this->col_delete($HeaderField) or
6781             $this->warn("Invalid field name given to invert() method"), goto done;
6782            
6783             ## Get a hash of all existing (remaining) data columns.
6784 0         0 my $OldColNames = $this->fieldlist_all();
6785 0         0 my $OldCols = $this->cols_hash($OldColNames);
6786            
6787             ## Make the new columns...
6788 0         0 my $NewCols = [map {$this->row_list($_, $OldColNames)} (0..$#$NewColNames)];
  0         0  
6789            
6790             ## Delete old columns from the object.
6791 0         0 delete @$this{@$OldColNames};
6792            
6793             ## Add new columns
6794 0         0 @$this{@$NewColNames} = @$NewCols;
6795              
6796             ## Set the field name list...
6797 0         0 $this->{_FieldList} = $NewColNames;
6798              
6799 0         0 $Success = 1;
6800 0         0 done:
6801             return($Success);
6802             }
6803              
6804             =pod
6805              
6806             =head1 PROGRESS MESSAGES
6807              
6808             ## Printing a progress message....
6809            
6810             $t->progress($Msg) ## Print a message per current settings
6811              
6812             ## Progress settings applying to this object only...
6813              
6814             $t->progress_get() ## Get current progress setting
6815              
6816             $t->progress_set(1) ## Use progress_default() method
6817             $t->progress_set($Sub) ## Set a custom progress routine
6818             $t->progress_set(0) ## Disable progress
6819             $t->progress_set(undef) ## Use class's settings (default)...
6820              
6821             ## Class's settings (for instances with _Progress == undef)
6822              
6823             $t->progress_class() ## Get current setting.
6824              
6825             $t->progress_class(1) ## Use progress_default() method
6826             $t->progress_class($Sub) ## Set shared custom prog routine
6827             $t->progress_class(0) ## Disable class-default progress
6828            
6829             Data::CTable->progress_class(..) ## Call without an object
6830              
6831             ## Call builtin default progress method regardless of settings
6832              
6833             $t->progress_default($Msg) ## Default prog. routine for class
6834              
6835             ## Generate a warning (used internally by other methods)
6836              
6837             $t->warn($Msg) ## In this class, calls progress_default()
6838              
6839             ## Timed progress: print msg to start, then at most once/2 sec
6840              
6841             $t->progress_timed($Op, $Item) ## Re-print msg every 2 sec
6842             $t->progress_timed($Op, $Item, $Pos, $Tot) ##... with % readout
6843             $t->progress_timed($Op, $Item, $Pos, $Tot, $Wait) ## Not 1st x
6844              
6845             $t->progress_timed_default($Msg) ## Called by progress_timed
6846              
6847             Data::CTable is especially useful in creating batch-oriented
6848             applications for processing data. As such, routines that may perform
6849             time-consuming tasks will, by default, generate helpful progress
6850             messages. The progress mechanism is highly customizable, however, to
6851             suit the needs of applications that don't require this output, or that
6852             require the output to go somewhere other than STDERR or the console.
6853              
6854             The default progress routine is one that prints a message with a
6855             date/time stamp to STDERR if and only if STDERR is an interactive
6856             terminal, and otherwise is silent.
6857              
6858             You could write a custom progress routine that does something else or
6859             something in addition (e.g. logs to a file or syslog). The custom
6860             routine could either be implemented by overriding the
6861             progress_default() method in a subclass, or by calling progress_set()
6862             in any instance.
6863              
6864             The custom progress routine, if any, is stored in the _Progress
6865             parameter of the object. But use progress_set() and progress_get() to
6866             access it.
6867              
6868             The interface for your custom progress routine should be:
6869              
6870             sub MyProgress {my ($Obj, $Message) = @_; chomp $Message; .....}
6871              
6872             In other words, the routine takes a single message which may or may
6873             not have a trailing newline. It should always chomp the newline if
6874             present, and then do its business... which generally will include
6875             printing or logging a message (usually with a newline added).
6876              
6877             The default, built-in progress routine for Data::CTable is:
6878              
6879             sub progress_default
6880             {
6881             my ($this, $msg) = @_;
6882             chomp $msg;
6883            
6884             print STDERR (localtime() . " $msg\n") if -t STDERR;
6885              
6886             return(1); ## Indicate progress actually completed
6887             }
6888              
6889             Of course, you are free to call this method directly at any time, and
6890             it will do its thing regardless of other progress-enabling settings.
6891             But the preferred way is to first set the settings and then call
6892             progress().
6893              
6894             The warn() method always calls progress_default() -- i.e. warnings
6895             will display even if progress is otherwise disabled or overridden at
6896             the object or class level. However, you could create a subclass that
6897             changes warn()'s behavior if desired. (For example, it could just
6898             call perl's builtin warn function, or be even more forceful,
6899             generating warnings even if STDERR is not a terminal, for example.)
6900              
6901             The progress_set() method may be used to override the progress routine
6902             for an individual object (set to 1/true for default behavior, or
6903             0/undef/false to disable progress for that object entirely).
6904              
6905             Call progress_class() to set similar values to control the global
6906             default behavior (e.g. turning on/off default progress behavior for
6907             all instances), but be cautious about using this approach in any
6908             environment where other programs might be accessing the same loaded
6909             class data, since the setting is stored in a class-owned global
6910             ($Data::CTable::DefaultProgress).
6911              
6912             Manipulating the class-default settings is only recommended in batch
6913             or shell-script environments, not in mod_perl Web applications where
6914             the module stays loaded into the Perl environment across multiple
6915             invocations, for example.
6916              
6917             If you want a particular method (e.g. read() but not write()) to be
6918             silent, you could make a subclass and could override that method with
6919             an implementation that first disables progress, calls the SUPER::
6920             method, and then restores the progress setting to its original
6921             setting.
6922              
6923             =head2 Timed progress
6924              
6925             Timed progress is a way of printing periodically-recurring progress
6926             messages about potentially time-consuming processes to the terminal.
6927              
6928             For example, consider the following messages which might appear every
6929             2 seconds during a lengthy read() operation:
6930              
6931             Reading... 0 (0%)
6932             Reading... 2000 (4%)
6933             ...
6934             Reading... 38000 (96%)
6935             Reading... 40000 (100%)
6936              
6937             The progress_timed() method is called internally by potentially
6938             time-consuming processes (read(), write(), and sort()), and you may
6939             want to call it yourself from your own scripts, to produce
6940             weary-programmer-soothing visual output during otherwise
6941             panic-producing long delays.
6942              
6943             Generally, progress_timed() is called with the $Wait parameter set to
6944             true, which delays the display of any messages until 2 seconds have
6945             passed, so no messages will be displayed unless the process actually
6946             does end up being slower than 2 seconds.
6947              
6948             Parameters are:
6949              
6950             $Op The string that identifies the "operation" taking place
6951             $Item A milestone such as a number or datum to indicate progress
6952             $Pos A numeric position against the (maybe estimated) baseline
6953             $Tot The baseline. If estimated, don't re-estimate too often
6954             $Wait If true, skip printing the first message for this $Op
6955              
6956             All parameters except $Op are optional.
6957              
6958             progress_timed() has a throttle that keeps it from re-triggering more
6959             often than every 2 seconds for any given sequence of the same $Op.
6960             The clock is restarted each time you call it with a different $Op or
6961             $Tot from the previous call (on the assumption that if the operation
6962             or the baseline changes then that fact should be noted).
6963              
6964             The messages printed will start with "$Op... ".
6965              
6966             If you supply $Item, which could be a number or a string, the messages
6967             will then show the $Item after the $Op.
6968              
6969             If you supply BOTH $Pos and $Tot, then a percentage will be calculated
6970             and added to the readout; otherwise omitted.
6971              
6972             If you supply $Wait, the first message (only) that uses this $Op will
6973             be skipped, and the next one won't appear for at least 2 seconds.
6974              
6975             If using $Pos and $Tot to display percentages for your user, be sure
6976             to call progress_timed() one final time when $Pos == $Tot so your user
6977             sees the satisfying 100% milestone. This "completion" call will not
6978             be skipped even if 2 seconds have not passed since the previous timed
6979             progress message was printed.
6980              
6981             Althought progress_timed() is designed to cut down on too much visual
6982             output when called often in a tight loop, remember that it still takes
6983             some processing time to call it and so if you call it too frequently,
6984             you're slowing down the very loop you wish were running faster.
6985              
6986             So, you might want to call it every tenth or 100th or even 1000th time
6987             through a tight loop, instead of every time through, using the mod (%)
6988             operator:
6989              
6990             $t->progress_timed(....) if ($LoopCount % 100) == 0;
6991              
6992             progress_timed_default() is the method called internally by
6993             progress_timed() to actually print the messages it has prepared. In
6994             this implementation, progress_timed_default() just calls
6995             progress_default(). That is, it ignores all other progress-inhibiting
6996             or -enhancing settings so delay-soothing messages will print on the
6997             terminal even if other messages are turned off.
6998              
6999             This is because the author assumes that even if you don't want all
7000             those other progress messages, you might still want these ones that
7001             explain long delays. If you REALLY don't, then just make yourself a
7002             lightweight subclass where progress_timed_default() is a no-op, or
7003             maybe calls regular progress(). For example:
7004              
7005             BEGIN {package Data::CTable::Silent; use vars qw(@ISA);
7006             @ISA=qw(Data::CTable); sub progress_timed_default{}}
7007              
7008             ## Later...
7009             my $t = Data::CTable::Silent->new(...);
7010              
7011              
7012             =cut
7013              
7014             $Data::CTable::DefaultProgress = 1;
7015              
7016             sub progress_set
7017             {
7018 5     5 0 1065 my $this = shift;
7019 5         7 my ($ProgSetting) = @_;
7020            
7021 5         14 $this->{_Progress} = $ProgSetting;
7022             }
7023              
7024             sub progress_class
7025             {
7026 232     232 0 3171 my $Ignored = shift;
7027 232         307 my ($ProgSetting) = @_;
7028              
7029             ## Set if specified...
7030 232 100       1127 $Data::CTable::DefaultProgress = $ProgSetting if defined($ProgSetting);
7031              
7032             ## Return..
7033 232         747 return($Data::CTable::DefaultProgress);
7034             }
7035              
7036             sub progress_get
7037             {
7038 0     0 0 0 my $this = shift;
7039            
7040 0         0 my $ProgSetting = $this->{_Progress};
7041              
7042 0         0 return($ProgSetting);
7043             }
7044              
7045             sub progress_default
7046             {
7047 217     217 0 539 my ($this, $msg) = @_;
7048 217         374 chomp $msg;
7049            
7050 217 50       1823 print STDERR (localtime() . " $msg\n") if -t STDERR;
7051              
7052 217         523 return(1); ## Indicate progress actually completed
7053             }
7054              
7055             sub progress
7056             {
7057 227     227 1 10303 my $this = shift;
7058 227         454 my ($msg) = @_;
7059            
7060 227         590 my $Prog1 = $this->{_Progress}; ## First check object's progress setting
7061 227         589 my $Prog2 = $this->progress_class(); ## Then check class's setting
7062              
7063             ## Calling regular progress resets the timers & ops in
7064             ## progress_timed...
7065              
7066 227         420 delete $this->{_ProgTimeInfo};
7067              
7068             ## First examine object setting to find a progress routine...
7069              
7070 227 100       1018 return(&$Prog1($this, $msg)) if ref($Prog1) eq 'CODE'; ## Code ref: return it.
7071 225 100       501 return($this->progress_default($msg)) if $Prog1; ## true: use default progress.
7072 219 100       470 return(undef) if defined($Prog1); ## false but defined: no progress.
7073            
7074             ## undef: fall through to class settings...
7075            
7076 215 100       527 return(&$Prog2($this, $msg)) if ref($Prog2) eq 'CODE'; ## Code ref: return it.
7077 213 100       859 return($this->progress_default($msg)) if $Prog2; ## true: use default progress.
7078 2         4 return(undef); ## false/undef: no progress.
7079             }
7080              
7081             sub progress_timed
7082             {
7083 214610     214610 0 651323 my $this = shift;
7084 214610         314706 my ($Op, $Item, $Pos, $Tot, $Wait) = @_;
7085              
7086             ## Get params from previous call if any.
7087 214610 100       227196 my ($LastOp, $LastItem, $LastPos, $LastTot, $LastTime) = @{$this->{_ProgTimeInfo} || []};
  214610         660913  
7088              
7089             ## print &Dumper([$Op, $Item, $Pos, $Tot, $Wait], [$LastOp, $LastItem, $LastPos, $LastTot, $LastTime]);
7090              
7091             ## Get elapsed time.
7092 214610         269250 my $Time = time();
7093 214610   66     491210 my $Elapsed = $Time - ($LastTime || $Time);
7094              
7095             ## We're on the "same" operation if the $Op name is the same and
7096             ## the total (baseline) is the same. Otherwise treat as new op.
7097              
7098 214610         281052 my $SameOp = (($Op eq $LastOp));
7099 214610   66     690117 my $SameOpAndTot = ($SameOp && ($Tot == $LastTot));
7100 214610   100     656402 my $Finished = ($Tot && ($Pos == $Tot));
7101              
7102             ## We trigger a message to print if we've been on the same op for
7103             ## 2 seconds or more, OR this is a new op.
7104              
7105 214610   66     1606127 my $Trigger = (($SameOpAndTot && ($Elapsed >= 2)) || ## Yes if same op & time has passed...
7106             ($SameOpAndTot && $Finished) || ## Yes if we're finished (100%).
7107             !$SameOpAndTot); ## Yes if new op
7108            
7109             ## Quit now if nothing to do.
7110 214610 100       536093 goto done unless $Trigger;
7111            
7112             ## Otherwise print message and save details for next time around.
7113 16 100 66     124 my $Percent = sprintf("(%2d\%)", int(($Pos * 100) / $Tot)) if (defined($Pos) && $Tot);
7114              
7115             ## If we've been asked to "wait", we skip actually printing the
7116             ## message this the first time, but act as if we did (starting the timer).
7117            
7118 16 100 100     107 my $RetVal = $this->progress_timed_default("$Op... $Item $Percent")
7119             unless (!$SameOp && $Wait); ## Skip first-time message if $Wait.
7120            
7121 16         59 $this->{_ProgTimeInfo} = [$Op, $Item, $Pos, $Tot, $Time];
7122            
7123 214610         506940 done:
7124             return($RetVal);
7125             }
7126              
7127             sub progress_timed_default
7128             {
7129 5     5 0 52 my $this = shift;
7130 5         8 my ($msg) = @_;
7131              
7132 5         25 return($this->progress_default("$msg"));
7133             }
7134              
7135             sub warn
7136             {
7137 1     1 0 2 my $this = shift;
7138 1         2 my ($msg) = @_;
7139              
7140 1         4 return($this->progress_default("WARNING: $msg"));
7141             }
7142              
7143             =pod
7144              
7145             =head1 Rejecting or reporting on groups of records and continuing
7146              
7147             Use utility methods omit_warn() and omit_note() to conditionally omit
7148             some records from a table and warn (or "note") if any were affected.
7149              
7150             Use select_extract() to do the same thing but without actually
7151             removing the extracted records from the table, and restoring the
7152             original selection before select_extract was called.
7153              
7154             If you supply a file name as the 4th argument, the omitted records
7155             will be extracted to a file for later reference.
7156              
7157             If you supply a message prefix as the 5th argument, a string other
7158             than "WARNING" or "Note" may be specified.
7159              
7160             # Reject with a progress message prefixed by "WARNING:"
7161              
7162             $t->omit_warn(FirstName => sub{!length($_)}, "First name is empty");
7163              
7164             Mon Aug 23 08:24:15 2004 WARNING: Omitting 2 of 78243 records (now 78241): First name is empty.
7165              
7166             # Reject with a progress message prefixed by "Note:", with output to a file
7167              
7168             $t->omit_note(FirstName => sub{!length($_)}, "First name is empty", "empty.names.txt");
7169              
7170             Mon Aug 23 08:24:15 2004 Note: Omitting 2 of 78243 records (now 78241): First name is empty.
7171             Mon Aug 23 08:24:15 2004 Writing bad.firstname.txt...
7172             Mon Aug 23 08:24:15 2004 Wrote bad.firstname.txt.
7173              
7174             # Extract some items, leaving original selection intact
7175              
7176             $t->select_extract(FirstName => sub{!length($_)}, "First name is empty", "empty.names.txt");
7177              
7178             Mon Aug 23 08:24:15 2004 Note: Extracting 2 of 78243 records: First name is empty.
7179              
7180             =cut
7181              
7182             sub omit_warn
7183             {
7184 0     0 0 0 my $this = shift;
7185 0         0 my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList) = @_;
7186              
7187 0 0       0 $MessagePrefix = 'WARNING' if !defined($MessagePrefix);
7188              
7189 0         0 $this->extract_with_message($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, 'DoOmit');
7190             }
7191              
7192             sub omit_note
7193             {
7194 0     0 0 0 my $this = shift;
7195 0         0 my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList) = @_;
7196              
7197 0 0       0 $MessagePrefix = 'Note' if !defined($MessagePrefix);
7198              
7199 0         0 $this->extract_with_message($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, 'DoOmit');
7200             }
7201              
7202             sub select_extract
7203             {
7204 0     0 0 0 my $this = shift;
7205 0         0 my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList) = @_;
7206              
7207 0 0       0 $MessagePrefix = 'Note' if !defined($MessagePrefix);
7208              
7209 0         0 $this->extract_with_message($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, !'DoOmit');
7210             }
7211              
7212             sub extract_with_message
7213             {
7214 0     0 0 0 my $this = shift;
7215 0         0 my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, $DoOmit) = @_;
7216            
7217             ## Find omittable items -- save and restore the selection.
7218              
7219 0         0 my $LengthBefore = $this->sel_len();
7220 0         0 my $SelBefore = $this->selection();
7221              
7222 0         0 $this->select($SelectField, $SelectSub);
7223 0         0 my $OmitCount = $this->sel_len();
7224 0         0 my $SelOmitted = [@{$this->selection()}];
  0         0  
7225              
7226 0         0 $this->selection($SelBefore);
7227            
7228             ## If we have some, perform the omit and report on the omitted ones.
7229              
7230 0 0       0 if ($OmitCount)
7231             {
7232 0         0 my $Sel = $this->col_empty(); ## Start with empty mask (all entries undef).
7233 0         0 @$Sel[@$SelBefore] = @$SelBefore; ## Mask in those in the original selection.
7234 0         0 @$Sel[@$SelOmitted] = undef; ## Mask out those we found.
7235 0         0 my $NewSel = [grep {defined} @$Sel];
  0         0  
7236 0         0 my $LengthAfter = @$NewSel + 0;
7237              
7238             ## Only actually alter the table if requested.
7239 0 0       0 $this->{_Selection} = $NewSel if $DoOmit; ## The remaining ones are the new selection.
7240            
7241 0 0       0 $this->progress("$MessagePrefix: @{[$DoOmit ? 'Omitting' : 'Extracting']} @{[$OmitCount]} of @{[$LengthBefore]} records@{[$DoOmit ? qq{ (now $LengthAfter)} : '']}: $Message");
  0 0       0  
  0         0  
  0         0  
  0         0  
7242 0 0       0 $this->write(_FileName => "$DebugFile",
    0          
7243             _FDelimiter => "\t",
7244             _LineEnding => undef,
7245             _Selection => $SelOmitted,
7246             ($ExtractFieldList ?
7247             (_FieldList => $ExtractFieldList) : ()),
7248             ) if $DebugFile;
7249             }
7250             else
7251             {
7252 0         0 unlink $DebugFile;
7253             }
7254            
7255 0         0 return($OmitCount);
7256             }
7257              
7258             =pod
7259              
7260             =head1 DEBUGGING / DUMPING
7261              
7262             ## Print some debugging output...
7263              
7264             $t->out() ## Pretty-print $t using Data::ShowTable
7265              
7266             $t->dump() ## Dump $t using Data::Dumper
7267             $t->dump($x, $y) ## Dump anything else using Data::Dumper
7268              
7269             ## Print some debugging output and then die.
7270              
7271             die $t->out() ## Same but die afterwards.
7272              
7273             die $t->dump() ## Same but die afterwards.
7274             die $t->dump($x, $y) ## Same but die afterwards.
7275              
7276             These two methods can be very helpful in debugging your scripts.
7277              
7278             The out() method, which has many options, is described in complete
7279             detail in the section below titled "FORMATTED TABLES". In short, it
7280             prints a nicely-formatted diagram of $t, obeying the custom field list
7281             if any and custom selection if any.
7282              
7283             The dump() method uses the Data::Dumper module to call &Dumper() on
7284             the table itself (by default) and prints the result to STDERR. If you
7285             specify any number of other values, those will be dumped instead using
7286             a single call to &Dumper (rather than individually).
7287              
7288             =head2 Optional module dependencies
7289              
7290             These methods require the otherwise-optional modules shown here:
7291              
7292             out() Data::ShowTable
7293             dump() Data::Dumper
7294              
7295             You'll get a warning at runtime if you try to call either method
7296             without the appropriate module installed on your system.
7297              
7298             =cut
7299              
7300             sub dump
7301             {
7302 0     0 0 0 my $this = shift;
7303 0         0 my (@Things) = @_;
7304              
7305             ## Default is to dump the object.
7306 0 0       0 @Things = ($this) unless @Things;
7307              
7308 0 0       0 if ($HaveDumper)
7309             {
7310 0         0 print STDERR &Dumper(@Things);
7311             }
7312             else
7313             {
7314 0         0 carp("Data::Dumper module is not installed. Can't dump.");
7315             }
7316            
7317 0         0 return(1);
7318             }
7319              
7320             =pod
7321              
7322             =head1 MISCELLANEOUS UTILITY METHODS
7323              
7324             The following utilities are methods of the Data::CTable object. They
7325             may be called directly by clients, subclassed, or used by subclass
7326             implementations as needed.
7327              
7328             ## Get cache file path (all args optional: defaulted from $t)
7329              
7330             $f = $t->prep_cache_file($FileName, $CacheExtension, $CacheSubDir)
7331              
7332             ## Verify all directories in a path, creating any needed ones.
7333              
7334             $ok = $t->verify_or_create_path($DirPath, $Sep)
7335              
7336             ## Testing readability / writeability of a proposed file
7337              
7338             $ok = $t->try_file_read ($Path); ## Opens for read; then closes
7339             $ok = $t->try_file_write($Path); ## Opens for write; then deletes
7340              
7341             ## Getting parameters from object with optional overrides
7342              
7343             $param = $t->getparam($Params, $Param)
7344              
7345             prep_cache_file() is the internal method used by both read() and
7346             write() to calculate the name of a cache file to be used for a given
7347             $FileName.
7348              
7349             It calculates the path to the cache file that corresponds to the given
7350             $FileName (which may be a bare file name, a relative path, or a
7351             partial path, as long as it obeys the current platform's path format
7352             rules). All arguments are optional and if absent (undef), will be
7353             defaulted from the corresponding parameters in $t.
7354              
7355             In addition to calculating the path and file name, it also prepends
7356             the "current directory" path if there was no path. Then it checks
7357             that all directories mentioned in the path actually exist. If not, it
7358             fails. Then, it checks that EITHER the file exists and is readable,
7359             OR it does not exist but would be writeable in that directory. If any
7360             of these directory creations or file checks fails, then undef is
7361             returned (and there would be no cache file).
7362              
7363             You may call it with no arguments on a file that has been read() to
7364             find the path to the cache file that may have been used and/or
7365             created, if any.
7366              
7367             You may call it with a file name that was written to, to see what the
7368             corresponding written cache file would be.
7369              
7370             For example:
7371              
7372             ## Get name of cache file used or created by read and delete it.
7373              
7374             $RCache = $t->prep_cache_file() and unlink($RCache);
7375              
7376             ## Cache on write() and get name of file and delete it.
7377              
7378             $Written = $t->write(_CacheOnWrite=>1, _FileName=>"Foo.txt");
7379             $WCache = $t->prep_cache_file($Written) and unlink($WCache);
7380              
7381             verify_or_create_path() is the internal routine used by read(),
7382             write(), and the cache-writing logic, that makes sure a requested file
7383             path exists (by creating it if necessary and possible) before any file
7384             is written by this module.
7385              
7386             (If you don't like this module's tendency to try to create
7387             directories, make yourself a subclass in which this routine simply
7388             checks -d on its $Path argument and returns the result.)
7389              
7390             It must be called with a full or partial path TO A DIRECTORY, NOT A
7391             FILE. You may supply $Sep, a platform-appropriate separator character
7392             (which defaults correctly for the runtime platform if you don't).
7393              
7394             Returns true if the path verification and/or creation ultimately
7395             succeeded, false otherwise (meaning that, after this call, there is no
7396             such directory on the system and so you should not try to write a file
7397             there).
7398              
7399             try_file_read() and try_file_write() are the internal methods called
7400             by prep_cache_file() as well as by read() and write() to preflight
7401             proposed file reading and writing locations.
7402              
7403             try_file_read() opens a file for read and closes it again; returns
7404             true if the open was possible.
7405              
7406             try_file_write() opens a file for write and closes it again, deleting
7407             it if successful. Returns true if the open for write and the delete
7408             were successful. (Be aware that this call will actually delete any
7409             existing file by this name.)
7410              
7411             The reason that failure to delete causes try_file_write() to fail is
7412             that successful cacheing depends on the ability to delete cache files
7413             as well as create them or write to them. A file in a location that
7414             couldn't be deleted will not be used for cacheing.
7415              
7416             getparam() looks up a named parameter in a params hash if it exists
7417             there, otherwise looks it up in the object, thereby allowing $Params
7418             to shadow any parameters in $this.
7419              
7420             This internal routine is used by any methods that allow overriding of
7421             parameters in the object when using a named-parameter calling
7422             interface. It should be used by any subclasses that also wish to use
7423             a named-parameter calling convention. For example:
7424              
7425             my $this = shift;
7426             my $Params = (@_ == 1 ? {_FieldList => $_[0]} : {@_});
7427              
7428             my($FieldList, $Selection) = map {$this->getparam($Params, $_)}
7429             qw(_FieldList _Selection);
7430              
7431             =cut
7432              
7433             {}; ## Get emacs to indent correctly.
7434              
7435             sub prep_cache_file
7436             {
7437 70     70 0 131 my $this = shift;
7438 70         151 my($FileName, $CacheExtension, $CacheSubDir) = @_;
7439            
7440 70         115 my $Success;
7441              
7442 70   66     244 $FileName ||= $this->{_FileName};
7443 70   66     211 $CacheExtension ||= $this->{_CacheExtension};
7444 70   33     555 $CacheSubDir ||= $this->{_CacheSubDir};
7445              
7446             ## Break the path into its parts...
7447 29     29   392 use File::Basename qw(fileparse);
  29         1720  
  29         22418  
7448 70         4287 my ($Basename, $Path, $Ext) = fileparse($FileName, '\.[^\.]+');
7449            
7450             ## Figure out what the path separator should be...
7451 70         177 my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
  70         367  
7452              
7453             ## FileDir is guaranteed to either be empty or have a trailing
7454             ## separator (see: man File::Basename). If empty, we set it to
7455             ## $Cur (the current directory).
7456              
7457 70 50       336 my $FileDir = (length($Path) ? $Path : $Cur);
7458              
7459             ## Ensure $CacheSubDir is either empty or has a trailing separator...
7460              
7461 70         1016 $CacheSubDir =~ s/([^\Q$Sep\E])$/$1$Sep/;
7462              
7463             ## Check whether it's an absolute path (and not really a "sub"-dir)
7464 70         309 my $Absolute = &path_is_absolute($CacheSubDir);
7465              
7466             ## $CacheDir is $FileDir with $CacheSubDir appended. $CacheSubDir
7467             ## may be empty, meaning $FileDir is to be used for the cache
7468             ## files. But if it's an absolute path, it stands alone.
7469              
7470 70 100       303 my $CacheDir = ($Absolute ? $CacheSubDir : "$FileDir$CacheSubDir");
7471              
7472             ## Now we need to make sure that CacheDir exists OR try to create
7473             ## it. (Warning will have already happened if necessary.)
7474 70 50       401 goto done unless $this->verify_or_create_path($CacheDir, $Sep);
7475              
7476             ## Verify that the dir is writeable.
7477 70 50       1880 $this->warn("Cache directory $CacheDir is read-only"), goto done
7478             unless -r $CacheDir;
7479              
7480             ## Full path: note $CacheExtension and $Ext may be empty.
7481 70         230 my $CacheFilePath = "$CacheDir$Basename$Ext$CacheExtension";
7482              
7483             ## If the cache path and the full path end up being the same
7484             ## (probably because the _CacheSubDir and _CacheExtension are both
7485             ## empty), we bail. Obviously, we don't want to risk overwriting
7486             ## the original data with the cache (or trying to read cache data
7487             ## from a text file).
7488              
7489 70 50       210 $this->warn("Can't cache $FileName without either _CacheSubDir or _CacheExtension"), goto done
7490             if $CacheFilePath eq $FileName;
7491            
7492             ## Pre-flight the cache file: ensure we can either read it or
7493             ## touch and then delete it.
7494            
7495 70 50 66     374 $this->warn("Cache file $CacheFilePath cannot be created/overwritten: $!"), goto done
7496             unless ($this->try_file_read ($CacheFilePath) ||
7497             $this->try_file_write($CacheFilePath));
7498            
7499 70         146 $Success = 1;
7500 70 50       825 done:
7501             return($Success ? $CacheFilePath : undef);
7502             }
7503              
7504             sub verify_or_create_path
7505             {
7506 78     78 0 1183 my $this = shift;
7507 78         164 my ($Dir, $Sep) = @_;
7508              
7509             ## Get default value for $Sep if not supplied.
7510 78   33     228 $Sep ||= ${$this->path_info()}{sep};
  0         0  
7511              
7512             ## $Dir might end in $Sep; split strips trailing one if so.
7513 78         2534 my $Parts = [split(/\Q$Sep\E/, $Dir)];
7514              
7515 78         231 my $WholePath = "";
7516 78         281 foreach (@$Parts)
7517             {
7518 533         999 $WholePath = "$WholePath$_$Sep";
7519 533 50       15780 next if -d $WholePath;
7520              
7521             ## Directory does not exist. We need to make it.
7522              
7523             ## mkdir($WholePath, 0777) or
7524             ## $this->warn("Failed to create directory '$WholePath': $!"), last;
7525            
7526             ## On some platforms (e.g. Darwin), perl's mkdir fails if
7527             ## there's a trailing separator. Others tolerate its absence,
7528             ## so we remove it.
7529            
7530 0         0 (my $TryDir = $WholePath) =~ s/\Q$Sep\E$//;
7531            
7532 0 0       0 mkdir($TryDir, 0777) or
7533             $this->warn("Failed to create directory '$TryDir': $!"), last;
7534             }
7535              
7536 78         1973 return(-d $Dir);
7537             }
7538              
7539             sub try_file_write ## like a "touch" but deletes the file if it succeeds.
7540             {
7541 10     10 0 95 my $this = shift;
7542 10         20 my ($Path) = @_;
7543              
7544 10         15 my $Success;
7545              
7546             ## Try creating it.
7547 29     29   267 use IO::File;
  29         87  
  29         13320  
7548 10         80 my $File = IO::File->new(">$Path");
7549            
7550             ## Created: close and delete it.
7551 10 50       1552 $File->close(), unlink($Path) if $File;
7552              
7553             ## Failed: bail.
7554 10 50       1127 goto done unless $File;
7555              
7556             ## If we couldn't unlink, fail. The ability to delete a failed or
7557             ## part-written cache file is a critical part of cacheing.
7558              
7559 10 50       284 goto done if -e $Path;
7560              
7561 10         18 $Success = 1;
7562 10         56 done:
7563             return($Success);
7564             }
7565              
7566             sub try_file_read ## Verifies that a file exists / can be read...
7567             {
7568 70     70 0 127 my $this = shift;
7569 70         140 my ($Path) = @_;
7570              
7571 70         101 my $Success;
7572              
7573             ## Try opening it.
7574 29     29   221 use IO::File;
  29         86  
  29         33269  
7575 70 100       1402 my $File = IO::File->new("<$Path") or goto done;
7576              
7577 68         8054 $Success = 1;
7578 70         1505 done:
7579             return($Success);
7580             }
7581              
7582             sub getparam
7583             {
7584 992     992 0 1133 my $this = shift;
7585 992         1458 my ($Params, $Param) = @_;
7586              
7587 992 100       3865 return(exists($Params->{$Param}) ?
7588             ( $Params->{$Param}) :
7589             ( $this->{$Param}) );
7590             }
7591              
7592              
7593              
7594             =pod
7595              
7596             =head1 GENERAL-PURPOSE UTILITY FUNCTIONS
7597              
7598             These general-purpose utility routines are defined in the Data::CTable
7599             module but are not method calls. You may optionally import them or
7600             call them by their fully-qualified name.
7601              
7602             use Data::CTable qw(
7603             guess_endings
7604             guess_delimiter
7605             path_info
7606             path_is_absolute
7607             min
7608             max
7609             );
7610              
7611             ## File-format guessing
7612              
7613             my $E = &guess_endings($IOHandle) ## Guess txt file line endings
7614             my $D = &guess_delimiter($String) ## Tab if found, else comma
7615              
7616             ## Cross-platform file path analysis
7617              
7618             my $Info = path_info(); ## Hash: 3 of platform's path values:
7619             my ($Sep, ## ... path separator ( / on Unix)
7620             $Up, ## ... "up" component (../ on Unix)
7621             $Cur) = ## ... curr. dir path ( ./ on Unix)
7622             @$Info{qw(sep up cur)};
7623              
7624             my $Abs = path_is_absolute($Path) ## Check path type
7625              
7626             ## Our old favorites min and max
7627              
7628             $x = max($x, 0); ## Should have been part of Perl...
7629             $x = min($x, 100);
7630              
7631             guess_endings() tries to figure out whether an open IO::File handle
7632             has DOS, Mac, or Unix file endings. It reads successively large
7633             blocks of the file until it either finds evidence of at least two
7634             separate line endings (of any type, but presumably they are the same),
7635             or until it reaches the end of the file. Then, it takes the resulting
7636             block and searches for the first qualifying line ending sequence it
7637             finds, if any. This sequence is then returned to the caller. If it
7638             returns undef, it was not able to find any evidence of line endings in
7639             the file.
7640              
7641             guess_delimiter() takes a string buffer and returns a "," unless it
7642             finds a tab character before the first comma in the $String, if any,
7643             in which case a tab is returned.
7644              
7645             path_info() returns a hash of three helpful strings for building and
7646             parsing paths on the current platform. Knows about Mac, Dos/Win, and
7647             otherwise defaults to Unix.
7648              
7649             path_is_absolute($Path) returns true if it thinks the given path
7650             string is an absolute path on the current platform.
7651              
7652             =cut
7653            
7654             {}; ## Get emacs to indent correctly.
7655              
7656             sub guess_endings
7657             {
7658 19     19 0 36 my ($File) = @_;
7659              
7660 19         32 my $Ending = undef;
7661              
7662 19         39 my $ReadCount = 0;
7663 19         30 my $BlockSize = 512;
7664              
7665 19         29 my $Buf;
7666             my $Actual;
7667              
7668 19         190 while ($File->seek(0, 0), $Actual = $File->read($Buf, ($BlockSize * ++$ReadCount)))
7669             {
7670             ## Break out of the loop if it appears a line ending match is
7671             ## found (but disallow initial match at very end of buffer).
7672            
7673 19 50       1170 last if $Buf =~ /((?:\x0D\x0A)|(?:\x0D)|(?:\x0A))[^\x0D\x0A]/;
7674            
7675             ## Break out of the loop if we just read any less than
7676             ## attempted (we are probably at the end of a very short,
7677             ## maybe one-line or even zero-line, file).
7678            
7679 0 0       0 last if $Actual < ($BlockSize * $ReadCount);
7680             }
7681              
7682             ## We can presume that the buffer we now have must either have
7683             ## line endings in it, or there is no line ending in the file at
7684             ## all. So we extract the first one we come to, (trying the DOS
7685             ## ending first since it contains the other two), if any, and we
7686             ## return it.
7687              
7688 19         196 my $Ending = ($Buf =~ /((\x0D\x0A)|(\x0D)|(\x0A))/)[0];
7689            
7690             ## &progress_default(undef, "DOS line endings") if $2; ## Debugging.
7691             ## &progress_default(undef, "Mac line endings") if $3; ## Debugging.
7692             ## &progress_default(undef, "Unix line endings") if $4; ## Debugging.
7693              
7694 19         74 done:
7695              
7696             ## We always seek back to zero when done.
7697             $File->seek(0, 0);
7698              
7699 19         245 return($Ending);
7700             }
7701              
7702             sub guess_delimiter
7703             {
7704 19     19 0 43 my ($String) = @_;
7705            
7706 19   50     170 return(($String =~ /([,\t])/)[0] || ",");
7707             }
7708              
7709             sub path_info
7710             {
7711 29     29   356 use Config qw(%Config);
  29         89  
  29         5742  
7712 86     86 0 1378 my $OSName = $Config{osname};
7713            
7714 86 50       552 return({sep =>':' , up =>'::' , cur =>':' }) if $OSName =~ /mac /ix;
7715 86 50       791 return({sep =>'\\', up =>'..\\', cur =>'.\\'}) if $OSName =~ /(?
7716 86         710 return({sep =>'/' , up =>'../' , cur =>'./' }) ;
7717             }
7718              
7719             sub path_is_absolute
7720             {
7721 70     70 0 161 my ($Path) = @_;
7722              
7723 29     29   182 use Config qw(%Config);
  29         63  
  29         21216  
7724 70         481 my $OSName = $Config{osname};
7725            
7726 70 50       339 return($Path =~ /^[^:]/) if $OSName =~ /mac /ix;
7727 70 50       891 return($Path =~ /^(([a-z][:])|(\\\\))/i) if $OSName =~ /(?
7728 70         291 return($Path =~ /^\//) ;
7729             }
7730              
7731             ### min and max
7732              
7733 214620 100   214620 0 1042393 sub min {return($_[0] < $_[1] ? $_[0] : $_[1])}
7734 134 100   134 0 756 sub max {return($_[0] > $_[1] ? $_[0] : $_[1])}
7735              
7736              
7737             =pod
7738              
7739             =head1 IMPLEMENTATION LIMITATIONS
7740              
7741             =over 4
7742              
7743             =item Column (field) names must not start with underscore
7744              
7745             This object is implemented as a blessed hash reference. By
7746             convention, keys that do not start with underscore are data columns
7747             and the key is the field name. Keys that do start with underscore
7748             refer to parameters or other data structures stored in the object.
7749              
7750             Consequently, no field names may start with underscore. When a file
7751             is read from disk, any field names that DO start with underscores will
7752             have the leading underscores stripped off. Strange things could then
7753             occur if the field names are then no longer unique. For example,
7754             field "A" and "_A" in the data file would be treated as the single
7755             field "A" after the file was read.
7756              
7757             =item Field values are always read as strings
7758              
7759             Field values when written to a file are necessarily converted to
7760             strings. When read back in, they are read as strings, regardless of
7761             original format. The sole exception is the empty string which is read
7762             back in as undef for efficiency.
7763              
7764             An exception is when the _CacheOnWrite feature is used: field values
7765             stored internally as integers or other scalar types may be saved and
7766             later restored as such. However, you should not rely on this
7767             behavior.
7768              
7769             =item Undef vs. empty
7770              
7771             Empty field values are stored as "undef" for efficiency. This means
7772             that programs should generally not rely on any differences between ""
7773             and undef in field values. However, when working with large but
7774             sparse tables, programs should take care not to convert undef values
7775             to empty strings unnecessarily since the separate string objects
7776             consume considerably more memory than undef.
7777              
7778             =back
7779              
7780             =head1 CONTRIBUTIONS
7781              
7782             Corrections, bug reports, bug fixes, or feature additions are
7783             encouraged. Please send additions or patches with a clear explanation
7784             of their purpose. Consider making additions in the form of a subclass
7785             if possible.
7786              
7787             I'm committed to bundling useful subclasses contributed by myself or
7788             others with this main distribution.
7789              
7790             So, if you've got a subclass of Data::CTable (which should have a name
7791             like Data::CTable::YourClassName) and you would like it included in
7792             the main distribution, please send it along with a test script and
7793             I'll review the code and add it (at my discretion).
7794              
7795             If you've got a module that uses, augments, or complements this one,
7796             let me know that, too, and I'll make appropriate mention of it.
7797              
7798             =head1 AUTHORS
7799              
7800             Chris Thorman
7801              
7802             Jay Hannah
7803              
7804             =head1 SEE ALSO
7805              
7806             The Data::CTable home page: http://christhorman.com/projects/perl/Data-CTable/
7807              
7808             Version control: https://github.com/jhannah/data-ctable/issues
7809              
7810             Report bugs: https://rt.cpan.org/Public/Dist/Display.html?Name=Data-CTable
7811              
7812             The implementation in CTable.pm.
7813              
7814             The test.pl script, other subclasses, and examples.
7815              
7816             The Data::ShowTable module.
7817              
7818             The Data::Table module by Yingyao Zhou & Guangzhou Zou.
7819              
7820             The perlref manual page.
7821              
7822             =head1 LICENSE
7823              
7824             Copyright (c) 1995-2012 Chris Thorman. All rights reserved.
7825              
7826             This program is free software; you can redistribute it and/or modify
7827             it under the same terms as Perl itself.
7828              
7829             =cut
7830              
7831             1;
7832              
7833              
7834