File Coverage

blib/lib/Data/CTable.pm

Criterion	Covered	Total	%
statement	1322	1527	86.5
branch	373	632	59.0
condition	207	423	48.9
subroutine	164	183	89.6
pod	6	151	3.9
total	2072	2916	71.0

line	stmt	bran	cond	sub	pod	time	code
1							#!/usr/bin/perl
2							## Emacs: -- tab-width: 4; --
3
4	29			29		1281380	use strict;
	29					80
	29					1999
5
6							package Data::CTable;
7
8	29			29		173	use vars qw($VERSION); $VERSION = '1.03_02';
	29					63
	29					5636
9
10							=pod
11
12							=head1 NAME
13
14							Data::CTable - Read, write, manipulate tabular data
15
16							=head1 SYNOPSIS
17
18							## Read some data files in various tabular formats
19							use Data::CTable;
20							my $People = Data::CTable->new("people.merge.mac.txt");
21							my $Stats = Data::CTable->new("stats.tabs.unix.txt");
22
23							## Clean stray whitespace in fields
24							$People->clean_ws();
25							$Stats ->clean_ws();
26
27							## Retrieve columns
28							my $First = $People->col('FirstName');
29							my $Last = $People->col('LastName' );
30
31							## Calculate a new column based on two others
32							my $Full = [map {"$First->[$_] $Last->[$_]"} @{$People->all()}];
33
34							## Add new column to the table
35							$People->col(FullName => $Full);
36
37							## Another way to calculate a new column
38							$People->col('Key');
39							$People->calc(sub {no strict 'vars'; $Key = "$Last,$First";});
40
41							## "Left join" records matching Stats:PersonID to People:Key
42							$Stats->join($People, PersonID => 'Key');
43
44							## Find certain records
45							$Stats->select_all();
46							$Stats->select(Department => sub {/Sale/i }); ## Sales depts
47							$Stats->omit (Department => sub {/Resale/i}); ## not Resales
48							$Stats->select(UsageIndex => sub {$_ > 20.0}); ## high usage
49
50							## Sort the found records
51							$Stats->sortspec('DeptNum' , {SortType => 'Integer'});
52							$Stats->sortspec('UsageIndex', {SortType => 'Number' });
53							$Stats->sort([qw(DeptNum UsageIndex Last First)]);
54
55							## Make copy of table with only found/sorted data, in order
56							my $Report = $Stats->snapshot();
57
58							## Write an output file
59							$Report->write(_FileName => "Rept.txt", _LineEnding => "mac");
60
61							## Print a final progress message.
62							$Stats->progress("Done!");
63
64							## Dozens more methods and parameters available...
65
66							=head1 OVERVIEW
67
68							Data::CTable is a comprehensive utility for reading, writing,
69							manipulating, cleaning and otherwise transforming tabular data. The
70							distribution includes several illustrative subclasses and utility
71							scripts.
72
73							A Columnar Table represents a table as a hash of data columns, making
74							it easy to do data cleanup, formatting, searching, calculations,
75							joins, or other complex operations.
76
77							The object's hash keys are the field names and the hash values hold
78							the data columns (as array references).
79
80							Tables also store a "selection" -- a list of selected / sorted record
81							numbers, and a "field list" -- an ordered list of all or some fields
82							to be operated on. Select() and sort() methods manipulate the
83							selection list. Later, you can optionally rewrite the table in memory
84							or on disk to reflect changes in the selection list or field list.
85
86							Data::CTable reads and writes any tabular text file format including
87							Merge, CSV, Tab-delimited, and variants. It transparently detects,
88							reads, and preserves Unix, Mac, and/or DOS line endings and tab or
89							comma field delimiters -- regardless of the runtime platform.
90
91							In addition to reading data files, CTable is a good way to gather,
92							store, and operate on tabular data in memory, and to export data to
93							delimited text files to be read by other programs or interactive
94							productivity applications.
95
96							To achieve extremely fast data loading, CTable caches data file
97							contents using the Storable module. This can be helpful in CGI
98							environments or when operating on very large data files. CTable can
99							read an entire cached table of about 120 megabytes into memory in
100							about 10 seconds on an average mid-range computer.
101
102							For simple data-driven applications needing to store and quickly
103							retrieve simple tabular data sets, CTable provides a credible
104							alternative to DBM files or SQL.
105
106							For data hygiene applications, CTable forms the foundation for writing
107							utility scripts or compilers to transfer data from external sources,
108							such as FileMaker, Excel, Access, personal organizers, etc. into
109							compiled or validated formats -- or even as a gateway to loading data
110							into SQL databases or other destinations. You can easily write short,
111							repeatable scripts in Perl to do reporting, error checking, analysis,
112							or validation that would be hard to duplicate in less-flexible
113							application environments.
114
115							The data representation is simple and open so you can directly access
116							the data in the object if you feel like it -- or you can use accessors
117							to request "clean" structures containing only the data or copies of
118							it. Or you can build your own columns in memory and then when you're
119							ready, turn them into a table object using the very flexible new()
120							method.
121
122							The highly factored interface and implementation allow fine-grained
123							subclassing so you can easily create useful lightweight subclasses.
124							Several subclasses are included with the distribution.
125
126							Most defaults and parameters can be customized by subclassing,
127							overridden at the instance level (avoiding the need to subclass too
128							often), and further overridden via optional named-parameter arguments
129							to most major method calls.
130
131							=head2 Similar / related modules on CPAN
132
133							The Data::Table module by Yingyao Zhou & Guangzhou Zou offers similar
134							functionality, but uses a different underlying data representation
135							(2-dimensional array), and has a somewhat different feature set.
136							Check it out. Maybe you will prefer it for your application.
137
138							http://search.cpan.org/search?mode=module&query=Data::Table
139
140							The Data::ShowTable module renders tables in various viewable formats.
141							CTable relies on ShowTable's ShowBoxTable method to implement its own
142							format() and out() methods.
143
144							http://search.cpan.org/search?mode=module&query=Data::ShowTable
145
146
147							=head2 Prerequisites
148
149							The CTable documentation, source code, and examples assume familiarity
150							with large nested data structures, object-oriented syntax and
151							terminology, and comfort working with array and hash references and
152							array and hash slice syntax.
153
154							See the perlref man page for more on these topics.
155
156
157							=head2 How to learn more
158
159							Dozens more methods, parameters, and examples are described below.
160
161							See the full source code in CTable.pm.
162
163							Or, after installing, read the man page using:
164
165							man Data::CTable
166							perldoc Data::CTable
167
168							See the eg/ (examples) folder in the Data::CTable distribution and the
169							test.pl script for scripts demonstrating every CTable method.
170
171							For latest version and other news, check the Data::CTable home page:
172
173							http://christhorman.com/projects/perl/Data-CTable/
174
175							Or search CPAN:
176
177							http://search.cpan.org/search?mode=module&query=Data::CTable
178
179							=head1 INSTALLATION
180
181							Using CPAN module:
182
183							perl -MCPAN -e 'install Data::CTable'
184
185							Or manually:
186
187							tar xzvf Data-CTable*gz
188							cd Data-CTable-?.??
189							perl Makefile.PL
190							make
191							make test
192							make install
193
194							=head1 INCLUDED SUBCLASSES AND UTILITIES
195
196							In addition to the module itself, there are a number of subclasses and
197							simple utilities included with the Data::CTable distribution.
198
199							=head2 Subclases
200
201							The Data::CTable distribution includes these example subclasses. Each
202							is installed in your Perl environment along with the main module, and
203							so may be used by your scripts. Each has its own man/perldoc page
204							containing more detail.
205
206							B is a subclass that logs all progress
207							messages to a list within the object itself rather than (merely)
208							echoing them to STDERR. Later, you may retrieve and examine the list.
209
210							B is a virtual subclass that includes class and
211							object methods that make it easy to write a simple interactive
212							command-line program that parses options and outputs a table.
213
214							B is a very useful subclass of
215							Data::CTable::Script that implements a souped-up Unix-like "ls" (file
216							listing) command -- it first gets an optionally-recursive listing of
217							any number of files and/or directories, builds a list of their full
218							absolute or relative paths, then build a Data::CTable::Listing object
219							that contains all the paths, plus about 25+ other pieces of useful
220							information about each file or directory.
221
222							The "tls" utility, below, is simply a command-line cover for this
223							class, but you could use this class in your own scripts in order to
224							get detailed file listings.
225
226							=head2 Utilities
227
228							Each of these utilities is provided mainly so you can see mature
229							examples of how to use Data::CTable in real-world scripts.
230
231							But each is also genuinely useful, too, and you may enjoy adding them
232							to your regular bag of tricks or using them as an easily-modifiable
233							basis for scripts of your own.
234
235							On most systems, these will be installed in an appropriate directory
236							in your path when you install CTable, and hence will be executable
237							just by typing their file name. (On Windows, they'll be wrapped by a
238							.bat file and installed in C:\Perl\bin or equivalent. On *nix,
239							they'll be in /usr/bin/ or equivalent.)
240
241							B is a command-line utility that wraps Data::CTable::Listing to
242							implement a variant on the classic Unix "ls" command in which an
243							internal CTable object is used to hold and calculate a very large
244							amount of meta-data about each file and directory and then output that
245							data in formatted tables, or as any kind of delimited text file, and
246							with much more flexibility and control over included data, and sort
247							and sub-sort order than with ls.
248
249							B is a command-line utility that reads each of its arguments as
250							a Data::CTable file and then calls the out() method to display its
251							entire contents. (Warning: out() is slow with very large data sets.)
252
253							B is a command-line utility that takes a US zip code,
254							grabs the local weather report from a popular weather web site and
255							uses a CTable object to store, process, and clean, and present the
256							table of weather data that results in a simple text format.
257
258							=cut
259
260							## Required dependencies
261
262	29			29		85720	use IO::File;
	29					524819
	29					6021
263	29			29		305	use Config; qw(%Config);
	29					63
	29					1837
264
265	29			29		363	use Carp qw(croak carp confess cluck);
	29					57
	29					2445
266	29			29		64709	use Storable qw(store nstore retrieve dclone);
	29					157544
	29					3960
267	29			29		452	use File::Basename qw(fileparse);
	29					66
	29					3786
268
269
270							## Optional dependencies
271
272							my $HaveDumper;
273							my $HaveShowTable;
274
275							BEGIN
276							{
277	29			29		4581	eval "
	29			29		63673
	29			29		574221
	29					3644
	29					32006
	0					0
	0					0
278							use Data::Dumper qw(Dumper);
279							use Data::ShowTable qw(ShowBoxTable);
280							";
281
282	29					204	$HaveDumper = $Data::Dumper::VERSION;
283	29					1218	$HaveShowTable = exists($Data::ShowTable::{ShowBoxTable});
284							};
285
286
287							## We optionally export a few general-purpose utility routines.
288
289	29			29		173	use Exporter; use vars qw(@ISA @EXPORT_OK); @ISA=qw(Exporter);
	29			29		57
	29					1376
	29					160
	29					56
	29					153849
290							@EXPORT_OK = qw(
291							&ISORoman8859_1ToMacRoman
292							&MacRomanToISORoman8859_1
293
294							&ISORoman8859_1ToMacRoman_clean
295							&MacRomanToISORoman8859_1_clean
296
297							&guess_endings
298							&guess_delimiter
299
300							&path_info
301							&path_is_absolute
302
303							&min
304							&max
305							);
306
307							=pod
308
309							=head1 CREATING TABLE OBJECTS
310
311							## Create an object / read file(s) / override params
312
313							use Data::CTable;
314
315							$t = Data::CTable->new()
316							$t = Data::CTable->new($File)
317							$t = Data::CTable->new($File1, $File2....)
318							$t = Data::CTable->new($Params)
319							$t = Data::CTable->new($Params, $File1)
320							$t = Data::CTable->new($Params, $File1, $File2....)
321
322							## Internal initializer (subclassable): called for you by new()
323
324							$t->initialize()
325
326							If the first argument to new() is a hash ref, it is the $Params hash
327							of initial parameters and/or data columns which, if supplied will form
328							the starting point for the object being created. Any non-hash and/or
329							further arguments to new() are treated as file names to be opened.
330
331							If supplied, data in the $Params hash will be shallowly copied -- the
332							original hash object passed will not be used, but any sub-structures
333							within it will now "belong" to the resulting new object which will
334							feel free to manipulate them or discard them.
335
336							Then, any parameters not supplied (usually most of them) will be
337							defaulted for you because new() will call the internal method
338							initialize() before the object is finished and returned.
339
340							See the PARAMETER REFERENCE section below for the parameters you can
341							choose to supply or have defaulted for you.
342
343							Any file name arguments will be read and appended into a single
344							object. If any of the files fails to be read, then new() will fail
345							(return a false value) and no object will be created.
346
347							initialize() makes sure there is a legal and consistent value for
348							every internal parameter in the object. Generally, initialize()
349							leaves alone any parameters you supplied to new(), and simply sets
350							default values for any that were not yet supplied.
351
352							You should never need to call initialize() directly.
353
354							After calling initialize(), new() then calls the append_files_new()
355							method to process the filename arguments. This method then calls
356							read() on the first filename, and then append_file on all subsequent
357							file names, appending them to the first, in sequence.
358
359							See append() for an explanation of how the data from multiple files is
360							combined into a single table.
361
362							=head2 Advanced: Using a template object
363
364							## Calling new() with a template object
365
366							$v = $t->new()
367							$v = $t->new($File)
368							$v = $t->new($File1, $File2....)
369							$v = $t->new($Params)
370							$v = $t->new($Params, $File1)
371							$v = $t->new($Params, $File1, $File2....)
372
373							You can also call $t->new() to use an existing object, $t, as a
374							template for the new object. $t->new() will create a new object of
375							the same class or subclass as the template object. Furthermore, the
376							template object, if provided, will be used as a starting point for the
377							resulting object -- in fact, it will initially share shallow copies of
378							all data columns, if any, and all internal parameters and data
379							structures, if any.
380
381							This advanced shared-data technique could be used to create two
382							separate table objects that share and operate on the same underlying
383							data columns in memory but have different custom field lists, custom
384							selections, sort behavior, etc. But don't do this unless you're sure
385							you understand what you're doing, because changing data in one table
386							would change it in the other.
387
388							=head1 PARAMETER REFERENCE
389
390							The parameters listed here are recognized by new(), initialize(), and
391							by many functions that use the named-parameter calling convention,
392							such as read(), write(), sort(), and others.
393
394							Any parameter listed may be specified when using new().
395
396							Most parameters should not be directly accessed or manipulated once
397							the object has been created, except those that have appropriate
398							accessor methods described throughout in this documentation.
399
400							Each parameter in the lists below is listed along with its defaulting
401							logic as performed by new() via initialize().
402
403							=head2 Custom field list and custom selection (record list)
404
405							=over 4
406
407							=item _FieldList \|\|= undef;
408
409							This is the ordered list (array reference) of columns / fields present
410							in the object. It is set by read() to reflect the names and order of
411							the fields encountered or actually read from in the incoming data
412							file, if any. Initially this list is undefined, and if removed or
413							left undefined, then the de-facto field list will be a list of all
414							columns present in the table object, in alphabetical order (see
415							fieldlist() and fieldlist_all()).
416
417							Normally, all fields present in the table object would be listed in
418							the field list.
419
420							However, this parameter may be set in the object (or overridden in
421							named-parameter function calls like read(), write(), etc.) to cause a
422							subset of fields to be read, written or otherwise used. If a subset
423							field list is specified before reading a data file, then ONLY fields
424							listed will be read -- this is a way to read just certain fields from
425							a very large file, but may not always be what you want.
426
427							Specifying the field list before calling read() is required if the
428							data file has no header row giving names to its fields -- the names
429							you specify in the field list, in order, will be applied to the data
430							file being read. See the _HeaderRow parameter, below.
431
432							If a subset field list is used after columns are already loaded in
433							memory, the columns not listed in the field list will still be present
434							in the object (and can be listed by calling fieldlist_all()), but they
435							will be omitted from most operations that iterate over fields.
436
437							=item _Selection \|\|= undef;
438
439							This is a list of the record numbers of "selected" records in the
440							table, possibly indicating sorted order.
441
442							If absent, then all records are considered to be selected, in
443							"natural" order -- i.e. the order they occur in the file.
444
445							You can create and set your own selection list or get and modify an
446							existing one. Deleting it resets the selection (for example, by
447							calling select_all()).
448
449							Calling sort() will create a _Selection if none existed. Otherwise it
450							operates by modifying the existing _Selection, which may be a subset
451							of all record numbers.
452
453							=back
454
455							=head2 Cache behavior controls
456
457							See sections related to Cacheing, below.
458
459							=over 4
460
461							=item _CacheOnRead = 1 unless exists
462
463							Boolean: whether data files read by the read() method should be cached
464							after reading. Once cached, the data will be read from the cache
465							instead of the original file the NEXT TIME READ() IS CALLED, but only
466							if: 1) the cache file is found, and 2) its date is later than the
467							original. Otherwise, the cache file is ignored or re-written.
468							Cacheing can be up to 10x faster than parsing the file, so it's almost
469							always worth doing in any situation where you'll be reading a data
470							file more often than writing it.
471
472							This parameter defaults to true.
473
474							=item _CacheOnWrite = 0 unless exists
475
476							Boolean: whether tables written by the write() method should be cached
477							after writing. This defaults to false on the assumption that the
478							program won't need to re-read a file it just wrote. However, this
479							behavior would be useful if a later step in your program or another
480							program will be reading the file that was written and would benefit
481							from having the cacheing already done. Cacheing a file after writing
482							is quite fast since the data is already in memory and of course it
483							speeds up subsequent read() operations by up to 10x.
484
485							=item _CacheExtension = ".cache" unless exists
486
487							This is the file name extension that is added to a file's name to
488							determine the name of its corresponding cache file. (First, any
489							existing extension, if any, is removed.) If this extension is empty,
490							then the cache file will be named the same as the original assuming
491							the cache file is being stored in a different directory. (See next
492							setting.)
493
494							=item _CacheSubDir = "cache" unless exists
495
496							This is the absolute or relative path to the subdirectory that should
497							be used to store the cache files. The default value is the relative
498							path to a directory called "cache". Relative paths will be appended
499							to the directory path containing the original file being read.
500
501							Absolute cache paths (such as /tmp or c:\temp\) can also be used.
502
503							Override _CacheExtension and _CacheSubdir in a subclass, in each
504							object, or in each call to read() or write() in order to have the
505							cache files stored elsewhere. But remember: unless you use the same
506							cache settings next time you read the same file, the cache files will
507							be orphaned.
508
509							=back
510
511							=head2 Progress routine / setting
512
513							=over 4
514
515							=item _Progress = undef unless exists
516
517							The _Progress setting controls the routing of diagnostic messages.
518							Four possible settings are recognized:
519
520							undef (default) The class's progress settings are used.
521							subroutine reference Your own custom progress routine.
522							true Built-in progress_default() method used.
523							0/false No progress messages for this object.
524
525							See the PROGRESS section for a description of the interface of custom
526							progress routines and for details on how the builtin one works.
527
528							=back
529
530							=head2 File format settings
531
532							=over 4
533
534							=item _LineEnding \|\|= undef;
535
536							_LineEnding indicates the line ending string or setting to be used to
537							read a file, the setting that actually I used to read a file,
538							and/or the line ending that will be used to write a file.
539
540							Set this parameter to force a particular encoding to be used.
541
542							Otherwise, leave it undef. The program will Do What You Mean.
543
544							If _LineEnding is undef when read() is called, read() will try to
545							guess the line ending type by inspecting the first file it reads.
546							Then it will set this setting for you. It can detect DOS, Unix, and
547							Mac line endings.
548
549							If _LineEnding is undef when write() is called, write() will use
550							C<"\n">, which yields different strings depending on the current
551							runtime platform: \x0A on Unix; \x0D in MacPerl, \x0D\x0A on DOS.
552
553							Otherwise, write() uses the value defined in _LineEnding, which would
554							match the value filled in by read() if this object's data originally
555							had been read from a file. So if you read a file and then later write
556							it out, the line endings in the written file will match the format of
557							original unless you override _LineEnding specifically.
558
559							Since Data::CTable supports reading and writing all common endings,
560							base your decision on line ending format during write() on the needs
561							of other programs you might be using.
562
563							For example: FileMaker Pro and Excel crash / hang if Unix line endings
564							are used, so be sure to use the ending format that matches the needs
565							of the other programs you plan to use.
566
567							As a convenience, you may specify and retrieve the _LineEnding setting
568							using the mnemonic symbols "mac", "dos" and "unix." These special
569							values are converted to the string values shown in this chart:
570
571							symbol string value chars decimal octal control
572							-------------------------------------------------------------
573							dos "\x0D\x0A" CR/LF 13,10 "\015\012" ^M^J
574							mac "\x0D" CR 13 "\015" ^M
575							unix "\x0A" LF 10 "\012" ^J
576
577							See the section LINE ENDINGS, below, for accessor methods and
578							conversion utilities that help you get/set this parameter in either
579							symbolic format or string format as you prefer.
580
581							=item _FDelimiter \|\|= undef;
582
583							_FDelimiter is the field delimiter between field names in the header
584							row (if any) and also between fields in the body of the file. If
585							undef, read() will try to guess whether it is tab C<"\t"> or comma
586							<",">, and set this parameter accordingly. If there is only one field
587							in the file, then comma is assumed by read() and will be used by
588							write().
589
590							To guess the delimiter, the program looks for the first comma or tab
591							character in the header row (if present) or in the first record.
592							Whichever character is found first is assumed to be the delimiter.
593
594							If you don't want the program to guess, or you have a data file format
595							that uses a custom delimiter, specify the delimiter explicitly in the
596							object or when calling read() or make a subclass that initializes this
597							value differently. On write(), this will default to comma if it is
598							empty or undef.
599
600							=item _QuoteFields = undef unless exists
601
602							_QuoteFields controls how field values are quoted by write() when
603							writing the table to a delimited text file.
604
605							An undef value (the default) means "auto" -- each field is checked
606							individually and if it contains either the _FDelimiter character or a
607							double-quote character, the field value will be surrounded by
608							double-quotes as it is written to the file. This method is slower to
609							write but faster to read, and may make the output easier for humans to
610							read.
611
612							A true value means always put double-quotes around every field value.
613							This mode is faster to write but slower to read.
614
615							A zero value means never to use double-quotes around field values and
616							not to check for the need to use them. This method is the fastest to
617							read and write. You may use it when you are certain that your data
618							can't contain any special characters. However, if you're wrong, this
619							mode will produce a corrupted file in the event that one of the fields
620							does contain the active delimiter (such as comma or tab) or a quote.
621
622							=item _HeaderRow = 1 unless exists
623
624							_HeaderRow is a boolean that says whether to expect a header row in
625							data files. The default is true: a header row is required. If false,
626							_FieldList MUST be present before calling read() or an error will be
627							generated. In this latter case, _FieldList will be assumed to give
628							the correct names of the fields in the file, in order, before the file
629							is read. In other words, the object expects that either a) it can get
630							the field names from the file's header row or b) you will supply them
631							before read() opens the file.
632
633							=back
634
635							=head2 Encoding of return characters within fields
636
637							=over 4
638
639							=item _ReturnMap = 1 unless exists
640
641							_ReturnMap says that returns embedded in fields should be decoded on
642							read() and encoded again on write(). The industry-standard encoding
643							for embedded returns is ^K (ascii 11 -- but see next setting to change
644							it). This defaults to true but can be turned off if you want data
645							untouched by read(). This setting has no effect on data files where
646							no fields contain embedded returns. However, it is vital to leave
647							this option ON when writing any data file whose fields could contain
648							embedded returns -- if you have such data and call write() with
649							_ReturnMap turned off, the resulting file will be an invalid Merge/CSV
650							file and might not be re-readable.
651
652							When these fields are decoded on read(), encoded returns are converted
653							to C<"\n"> in memory, whatever its interpretation may be on the current
654							platform (\x0A on Unix or DOS; \x0D on MacPerl).
655
656							IMPORTANT NOTE: When these fields are encoded by write(), any
657							occurrence of the current _LineEnding being used to write the file is
658							searched and encoded FIRST, and THEN, any occurrence of "\n" is also
659							searched and encoded. For example, if using mac line endings (^M) to
660							write a file on a Unix machine, any ^M characters in fields will be
661							encoded, and then any "\n" (^J) characters will ALSO be encoded. This
662							may not be what you want, so be sure you know how your data is encoded
663							in cases where your field values might contain any ^J and/or ^M
664							characters.
665
666							IMPORTANT NOTE: If you turn _ReturnMap off, fields with returns in
667							them will still be double-quoted correctly. Some parsers of tab- or
668							comma-delimited files are able to support reading such files.
669							HOWEVER, the parser in this module's read() method DOES NOT currently
670							support reading files in which a single field value appears to span
671							multiple lines in the file. If you have a need to read such a file,
672							you may need to write your own parser as a subclass of this module.
673
674							=item _ReturnEncoding \|\|= "\x0B";
675
676							This is the default encoding to assume when embedding return
677							characters within fields. The industry standard is "\x0B" (ascii 11 /
678							octal \013 / ^K) so you should probably not ever change this setting.
679
680							When fields are encoded on write(), C<"\n"> is converted to this
681							value. Note that different platforms use different ascii values for
682							C<"\n">, which is another good reason to leave the ReturnEncoding
683							feature enabled when calling write().
684
685							To summarize: this module likes to assume, and you should too, that
686							returns in data files on disk are encoded as "\x0B", but once loaded
687							into memory, they are encoded as the current platform's value of
688							C<"\n">.
689
690							=item _MacRomanMap = undef unless exists
691
692							Data::CTable assumes by default that you want field data in memory to
693							be in the ISO 8859-1 character set (the standard for Latin 1 Roman
694							characters on Unix and Windows in the English and Western European
695							languages -- and also the default encoding for HTML Web pages).
696
697							_MacRomanMap controls the module's optional mapping of Roman
698							characters from Mac format on disk to ISO format in memory when
699							reading and writing data files. These settings are recognized:
700
701							undef ## Auto: Read/write Mac chars if using Mac line endings
702							1 ## On: Assume Mac char set in all fields
703							0 ## Off: Don't do any character mapping at all
704
705							The default setting is undef, which enables "Auto" mode: files found
706							to contain Mac line endings will be assumed to contain Mac upper-ASCII
707							characters and will be mapped to ISO on read(); and files to be
708							written with Mac line endings will mapped back from ISO to Mac format
709							on write().
710
711							If your data uses any non-Latin-1 character sets, or binary data, or
712							you really want Mac upper-ASCII characters in memory, or you just
713							don't want this module messing with your encodings, set this option to
714							0 (Off) or make a subclass that always sets it to 0.
715
716							See also the clean() methods that can help you translate just the
717							columns you want after reading a file or before writing it, which may
718							be faster for you if only a few fields might contain high-ASCII
719							characters.
720
721							=item _FileName \|\|= undef;
722
723							This is the name of the file that should be read from or WAS read
724							from. (read() will set _FileName to the value it used to read the
725							file, even if _FileName was only supplied as a named parameter.)
726
727							This name will also be used, unless overridden, to re-write the file
728							again, but with an optional extension added. (See next setting.)
729
730							=item _WriteExtension = ".out" unless exists
731
732							The _WriteExtension is provided so that CTable won't overwrite your
733							input data file unless you tell it to.
734
735							_WriteExtension will be added to the object's _FileName setting to
736							create a new, related file name, before writing.... UNLESS _FileName
737							is supplied as an direct or named parameter when calling write().
738
739							In the latter case, write() uses the file name you supply and adds no
740							extension, even if this would mean overwriting the original data file.
741
742							To add _WriteExtension, write() places it prior to any existing final
743							extension in the _FileName:
744
745							_FileName default file name used by write()
746							--------------------------------------------------------------
747							People.merge.txt People.merge.out.txt
748							People People.out
749
750							If you want to always overwrite the original file without having to
751							supply _FileName each time, simply set _WriteExtension to undef in a
752							subclass or in each instance.
753
754							If _CacheOnWrite is true, then the _WriteExtension logic is applied
755							first to arrive at the actual name of the file to be written, and then
756							the _CacheExtension logic is applied to that name to arrive at the
757							name of the cache file to be written.
758
759							=back
760
761							=head2 Sorting-related parameters
762
763							=over 4
764
765							=item _SortOrder \|\|= undef;
766
767							_SortOrder is the list of fields which should be used as primary,
768							secondary, etc. sort keys when sort() is called. Like other
769							parameters, it may be initialized by a subclass, stored in the object,
770							or provided as a named parameter on each call to sort().
771
772							If _SortOrder is empty or undefined, then sort() sorts the records by
773							record number (i.e. they are returned to their "natural" order).
774
775							=item _SortSpecs \|\|= {};
776
777							_SortSpecs is a hash of specifications for the SortType and
778							SortDirection of fields on which sorting may be done. For any field
779							missing a sort spec or the SortType or SortDirection components of its
780							sort spec, the _DefaultSortType and _DefaultSortDirection settings
781							will be used. So, for example, if all fields are of type String and
782							you want them to sort Ascending, then you don't need to worry about
783							_SortSpecs. You only need to provide specs for fields that don't take
784							the default settings.
785
786							_SortSpecs might look like this:
787
788							{Age => {SortType => 'Integer'},
789							NameKey => {SortType => 'Text', SortDirection => -1}}
790
791							=item _SRoutines \|\|= {};
792
793							_SRoutines is a hash mapping any new SortTypes invented by you to your
794							custom subroutines for sorting that type of data. (See the section on
795							sort routines, below, for a full discussion.)
796
797							=back
798
799							=head2 Sorting defaults
800
801							=over 4
802
803							=item _DefaultSortType \|\|= 'String';
804
805							If you sort using a field with no sort spec supplied, or whose sort
806							spec omits the SortType, it will get its SortType from this parameter.
807
808							See the sections below on SORT TYPES and SORT ROUTINES.
809
810							=item _DefaultSortDirection \|\|= 1;
811
812							If you sort using a field with no sort spec supplied, or whose sort
813							spec omits the SortDirection, it will get its SortDirection from this
814							parameter.
815
816							Legal sort directions are: 1 (Ascending) or -1 (Descending).
817
818							See the section below on DEFAULT SORT DIRECTION.
819
820							=back
821
822							=head2 Miscellaneous parameters
823
824							=over 4
825
826							=item _ErrorMsg \|\|= "";
827
828							This parameter is set by read() or write() methods that encounter an
829							error (usually a parameter error or file-system error) that prevents
830							them from completing. If those methods or any methods that call them
831							return a false value indicating failure, then _ErrorMsg will contain a
832							string explaining the problem. The message will also have been passed
833							to the progress() method for possible console feedback.
834
835							=item _Subset
836
837							This parameter is set to 1 (true) by read() if the last call to read()
838							brought in a subset of the fields available in the file; 0 otherwise.
839
840							The object uses this field internally so it knows to abandon any cache
841							files that might not contain all requested fields upon read().
842
843							=back
844
845							=head1 SUBCLASSING
846
847							Most subclasses will override initialize() to set default values for
848							the parameters of the parent class and then they may provide default
849							values for other subclass-specific parameters. Then, the subclass's
850							initialize() should call SUPER::initialize() to let the parent
851							class(es) take care of the remaining ones.
852
853							Every initialize() method should always allow for parameters to have
854							already been provided by the $Params hash or template object. It
855							should not overwrite any valid values that already exist.
856
857							The following sample subclass changes the default setting of the
858							_Progress parameter from undef to 1 and then overrides the
859							progress_default() method to log all progress messages into a new
860							"_ProgrLog" (progress log) parameter stored in the object.
861
862							BEGIN
863							{ ## Data::CTable::ProgressLogger: store messages in the object
864
865							package Data::CTable::ProgressLogger;
866							use vars qw(@ISA); @ISA=qw(Data::CTable);
867
868							sub initialize ## Add a new param; change one default
869							{
870							my $this = shift;
871							$this->{_Progress} = 1 unless exists($this->{_Progress});
872							$this->{_ProgrLog} \|\|= [];
873							$this->SUPER::initialize();
874							}
875
876							sub progress_default ## Log message to object's ProgMsgs list
877							{
878							my $this = shift;
879							my ($msg) = @_;
880							chomp $msg;
881							push @{$this->{_ProgrLog}}, localtime() . " $msg";
882
883							return(1);
884							}
885
886							sub show_log ## Use Dumper to spit out the log list
887							{
888							my $this = shift;
889							$this->dump($this->{_ProgrLog});
890							}
891							}
892
893							## Later...
894
895							my $Table = Data::CTable::ProgressLogger->new("mydata.txt");
896							# ... do stuff...
897							$Table->write();
898							$Table->show_log();
899
900							=cut
901
902							{}; ## Get emacs to indent correctly.
903
904							sub new
905							{
906							## First arg to new is always either class name or a template
907							## object. This allows $obj->new() or CLASS->new().
908
909							## Second argument (if and only if it is a hash ref or an object
910							## whose underlying representation is a hash ref) is an optional
911							## anonymous hash of parameters which if supplied, will override
912							## any parameters already found in the template object, if any.
913
914							## See the initialize method, below, for a list of parameters that
915							## can be supplied (and will be defaulted for you if not
916							## supplied).
917
918							## Note that the template object and the params hash will be
919							## SHALLOWLY copied -- the original hash objects passed will not
920							## be used, but any sub-structures within them will now "belong"
921							## to the resulting new object which will feel free to manipulate
922							## them, possibly invalidating the integrity of the original
923							## template object.
924
925	126			126	0	4971	my $ClassOrObj = shift;
926	126	100				1124	my ($Params) = {%{shift()}} if UNIVERSAL::isa($_[0], 'HASH');
	83					820
927
928							## Shallow-copy all params from template object and/or optional
929							## $Params hash into new hash. DON'T re-use caller's obj or hash.
930
931	126	50				3711	my $this =
932	126	100				1454	{%{(UNIVERSAL::isa($ClassOrObj, 'HASH') ? $ClassOrObj : {})},
933	126					317	%{(UNIVERSAL::isa($Params, 'HASH') ? $Params : {})}};
934
935							## Bless the new object into the class
936
937	126		33			1006	my $class = ref($ClassOrObj) \|\| $ClassOrObj;
938	126					322	bless $this, $class;
939
940	126					200	my $Success;
941
942							## Run the subclassable initialize() method to create default
943							## settings for any private parameters.
944
945	126	50				513	goto done unless $this->initialize();
946
947							## Finally, process any (other) arguments to new(), if any.
948
949	126					333	my $RemainingArgs = [@_];
950
951	126	50				702	goto done unless $this->process_new_args($RemainingArgs, $Params);
952
953	126					194	$Success = 1;
954	126	50				1102	done:
955							return ($Success ? $this : undef);
956							}
957
958							### process_new_args
959
960							### Any optional remaining (non-HASH ref) arguments to new() are
961							### treated as file names of files to open and append to the in-memory
962							### table, creating new columns as necessary. We call the
963							### subclassable append_files_new() method to process these.
964
965							sub process_new_args
966							{
967	126			126	0	218	my $this = shift;
968	126					213	my ($RemainingArgs, $Params) = @_;
969
970	126					189	my $Success;
971
972	126	50				499	$Success = $this->append_files_new($RemainingArgs, $Params) or goto done;
973
974	126					194	$Success = 1;
975	126					394	done:
976							return ($Success);
977							}
978
979							### initialize
980
981							### Assumptions made by initialize() (and all other methods, too):
982
983							### The blessed object is a hash ref.
984
985							### All hash keys beginning with _ are reserved for non-data columns.
986
987							### Hash keys beginning with a single _ are reserved for future
988							### versions of this parent class implementation. Subclasses might
989							### want to use double-underscore for additional slots.
990
991							### All other hash keys are field names; their values are data
992							### columns (array references).
993
994							### initialize() sets / validates initial settings for all parameters
995							### recognized by this parent class. It exercises caution to not
996							### override any legal values previously set by the
997							### subclass::initialize() or by new().
998
999							sub initialize
1000							{
1001	126	50		126	0	1155	my $this = shift or goto done;
1002
1003	126					204	my $Success;
1004
1005							## Reading / writing
1006
1007	126		100			698	$this->{_FileName} \|\|= undef; ## Path of file that was read
1008
1009	126	100				505	$this->{_WriteExtension} = ".out" unless exists($this->{_WriteExtension});
1010
1011							## Cache settings
1012
1013	126	100				1102	$this->{_CacheOnRead} = 1 unless exists($this->{_CacheOnRead});
1014	126	100				458	$this->{_CacheOnWrite} = 0 unless exists($this->{_CacheOnWrite});
1015	126	100				480	$this->{_CacheExtension} = ".cache" unless exists($this->{_CacheExtension});
1016	126	100				456	$this->{_CacheSubDir} = "cache" unless exists($this->{_CacheSubDir});
1017
1018							## File format settings
1019
1020	126		100			574	$this->{_LineEnding} \|\|= undef;
1021	126		100			697	$this->{_FDelimiter} \|\|= undef;
1022	126	100				500	$this->{_QuoteFields} = undef unless exists ($this->{_QuoteFields});
1023	126	100				441	$this->{_HeaderRow} = 1 unless exists ($this->{_HeaderRow});
1024
1025							## Return encodings
1026
1027	126	100				11049	$this->{_ReturnMap} = 1 unless exists ($this->{_ReturnMap});
1028	126		100			803	$this->{_ReturnEncoding} \|\|= "\x0B"; ## Char to use for return chars
1029	126	100				480	$this->{_MacRomanMap} = undef unless exists ($this->{_MacRomanMap});
1030
1031							## Sorting defaults
1032
1033	126		100			640	$this->{_DefaultSortType} \|\|= 'String';
1034	126		100			623	$this->{_DefaultSortDirection} \|\|= 1; ## Ascending (-1 = desc)
1035
1036							## Progress routine / setting
1037
1038	126	100				549	$this->{_Progress} = undef unless exists ($this->{_Progress});
1039
1040							## Internal meta-structures
1041
1042	126		100			551	$this->{_FieldList} \|\|= undef; ## List of fields; undef means all fields, alpha order
1043	126		100			684	$this->{_Selection} \|\|= undef; ## List of rec #s; undef means all records, natural order
1044	126		50			676	$this->{_SortOrder} \|\|= undef; ## List of fields; undef/empty means sort by record number
1045
1046	126		100			591	$this->{_SortSpecs} \|\|= {}; ## Hash: FieldName => Sortspec
1047	126		100			702	$this->{_SRoutines} \|\|= {}; ## Hash: SortType => custom sort routine for type
1048
1049							## Miscellaneous
1050
1051	126		50			675	$this->{_ErrorMsg} \|\|= ""; ## Explains last read/write failure
1052	126		50			737	$this->{_Subset} \|\|= 0; ## Flag indicating subset of available fields were read
1053	126		100			601	$this->{_IgnoreQuotes} \|\|= 0;
1054
1055	126					178	$Success = 1;
1056	126					452	done:
1057							return($Success);
1058							}
1059
1060							=pod
1061
1062							=head1 FIELD LIST
1063
1064							## Getting / setting the object's _FieldList
1065
1066							$t->fieldlist() ## Get _FieldList or fieldlist_all()
1067							$t->fieldlist_get()
1068							$t->fieldlist_hash() ## Get fieldlist() as keys in a hash
1069
1070							$t->fieldlist_all() ## Get all fields (ignore _FieldList)
1071
1072							$t->fieldlist($MyList) ## Set field list (_FieldList param)
1073							$t->fieldlist_set($MyList)
1074
1075							$t->fieldlist(0) ## Remove field list (use default)
1076							$t->fieldlist_set()
1077
1078							$t->fieldlist_force($MyList)## Set list; remove non-matching cols
1079
1080							$t->fieldlist_truncate() ## Just remove nonmatching cols
1081
1082							$t->fieldlist_default() ## Default field list (alpha-sorted)
1083
1084							$t->fieldlist_add($MyName) ## Append new name to custom list.
1085							$t->fieldlist_delete($MyName) ## Delete name from custom list.
1086
1087							A CTable object can optionally have a custom field list. The custom
1088							field list can store both the ORDER of the fields (which otherwise
1089							would be unordered since they are stored as keys in a hash), and also
1090							can be a subset of the fields actually in the object, allowing you to
1091							temporarily ignore certain effectively-hidden fields for the benefit
1092							of certain operations. The custom field list can be changed or
1093							removed at any time.
1094
1095							The custom field list is stored in the private _FieldList parameter.
1096
1097							fieldlist() always returns a list (reference). The list is either the
1098							same list as _FieldList, if present, or it is the result of calling
1099							fieldlist_default(). In CTable, fieldlist_default() in turn calls
1100							fieldlist_all() -- hence fieldlist() would yield an auto-generated
1101							list of all fields in alphabetical order.
1102
1103							fieldlist_all() can be called directly to get a list of all fields
1104							present regardless of the presence of a _FieldList parameter. The
1105							list is an alphabetical case-insensitively sorted list of all hash
1106							keys whose names do not begin with an underscore.
1107
1108							You could override this method if you want a different behavior. Or,
1109							you could create your own custom field list by calling fieldlist_all()
1110							and removing fields or ordering them differently.
1111
1112							To set a custom field list (in _FieldList), call fieldlist() or
1113							fieldlist_set() with a list (reference). The list must be a list of
1114							strings (field names) that do not begin with underscore. The object
1115							owns the list you supply.
1116
1117							To remove a custom field list (and let the default be used), call
1118							fieldlist(0) or fieldlist_set() with no arguments (these will return
1119							the fieldlist that was deleted, if any).
1120
1121							fieldlist_freeze() "freezes" the fieldlist in its current state. This
1122							is equivalent to the following:
1123
1124							$t->fieldlist_set($t->fieldlist());
1125
1126							... which would force the fieldlist to $t->fieldlist_all() if and only
1127							if there is not already a custom _FieldList present.
1128
1129							IMPORTANT NOTE ABOUT PARTIAL FIELD LISTS: When setting a field list,
1130							the object ensures that all fields (columns) mentioned in the list are
1131							present in the object -- it creates empty columns of the correct
1132							length as necessary. However, it does NOT delete any fields not
1133							mentioned in the field list. This allows you to manipulate the field
1134							list in order to have certain fields be temporarily ignored by all
1135							other methods, then alter, restore, or remove it (allow it to revert
1136							to default) and they will be effectively unhidden again. Some methods
1137							(such as cols(), write(), etc.) also allow you to specify a custom
1138							field list that will override any other list just during the execution
1139							of that method call but will not modify the object itself.
1140
1141							Call fieldlist_force() to set the list AND have any non-listed fields
1142							also deleted at the same time (by calling fieldlist_truncate()
1143							internally). You can also just delete individual columns one-by-one,
1144							of course, using the column-manipulation methods and the custom
1145							fieldlist, if any, will be appropriately updated for you.
1146
1147							fieldlist_truncate() deletes any fields found in the table but not
1148							currently present in _FieldList. A hash of the deleted columns is
1149							returned to the caller. If there is no _FieldList, then this method
1150							does nothing.
1151
1152							fieldlist_default() just calls fieldlist_all() in this implementation,
1153							but could be changed in subclasses.
1154
1155							fieldlist_add() is the internal method that adds a new field name to
1156							the custom field list (if present) and if the field name was not
1157							already on the list. It is called by other methods any time a new
1158							column is added to the table. Don't call it directly unless you know
1159							what you're doing because the corresponding column won't be created.
1160							(Instead, use col().) The field name is appended to the end of the
1161							existing custom field list. If there is no custom field list, nothing
1162							is done.
1163
1164							fieldlist_delete() is the internal method that deletes a field name
1165							from the custom field list (if present). It is called by other
1166							methods when columns are deleted, but it does not actually delete the
1167							columns themselves, so use with caution: deleting a field from the
1168							custom field list effectively hides the field. This method has no
1169							effect, however, if there is no custom field list present. So don't
1170							call this method directly unless you know what you're doing.
1171
1172							=cut
1173
1174							sub fieldlist_all
1175							{
1176	223			223	0	334	my $this = shift;
1177	223					1706	my $FieldList = [sort {lc($a) cmp lc($b)} grep {!/^_/} keys %$this];
	997					3159
	5949					18474
1178
1179	223					1301	return($FieldList);
1180							}
1181
1182							sub fieldlist_default ## Same as fieldlist_all() in this class.
1183							{
1184	40			40	0	51	my $this = shift;
1185	40					88	my $FieldList = $this->fieldlist_all();
1186
1187	40					364	return($FieldList);
1188							}
1189
1190							sub fieldlist
1191							{
1192	820			820	0	1093	my $this = shift;
1193	820					1954	my ($FieldList) = @_;
1194
1195							## Set if specified.
1196	820	100				4212	$this->fieldlist_set($FieldList) if defined($FieldList);
1197
1198							## Get and return.
1199	820					1756	$FieldList = $this->fieldlist_get();
1200
1201	820					2261	return($FieldList);
1202							}
1203
1204							sub fieldlist_get
1205							{
1206	820			820	0	1394	my $this = shift;
1207	820		66			2728	my $FieldList = $this->{_FieldList} \|\| $this->fieldlist_default();
1208
1209	820					1404	return($FieldList);
1210							}
1211
1212							sub fieldlist_hash ## ([$FieldList])
1213							{
1214	113			113	0	168	my $this = shift;
1215	113					169	my ($FieldList) = @_;
1216	113		66			539	$FieldList \|\|= $this->fieldlist();
1217	113					197	my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
	113					448
1218
1219	113					264	return($FieldHash);
1220							}
1221
1222							sub fieldlist_set
1223							{
1224	36			36	0	64	my $this = shift;
1225	36					59	my ($FieldList) = @_;
1226
1227	36					242	return($this->fieldlist_set_internal($FieldList, 0));
1228							}
1229
1230							sub fieldlist_freeze
1231							{
1232	0			0	0	0	my $this = shift;
1233	0					0	return($this->fieldlist_set($this->fieldlist()));
1234							}
1235
1236							sub fieldlist_force
1237							{
1238	3			3	0	6	my $this = shift;
1239	3					5	my ($FieldList) = @_;
1240
1241	3					8	return($this->fieldlist_set_internal($FieldList, 1));
1242							}
1243
1244							sub fieldlist_set_internal
1245							{
1246	39			39	0	95	my $this = shift;
1247	39					198	my ($FieldList, $Force) = @_;
1248
1249	39	100				151	if (ref($FieldList) eq 'ARRAY')
1250							{
1251
1252							## Whether forcing or not, ensure all fields mentioned in the
1253							## list actually exist and are the correct length.
1254	33					100	$this->fieldlist_check($FieldList);
1255
1256							## Set the custom list
1257	33					574	$this->{_FieldList} = [@$FieldList];
1258
1259							## In "force" mode, remove any non-listed columns.
1260	33	100				129	$this->fieldlist_truncate() if ($Force);
1261							}
1262							else
1263							{
1264							## Remove the custom field list.
1265	6					21	$FieldList = delete $this->{_FieldList};
1266							}
1267
1268	39					181	return($FieldList); ## Return the one that was set or deleted.
1269							}
1270
1271							sub fieldlist_check
1272							{
1273	33			33	0	59	my $this = shift;
1274	33					41	my ($FieldList) = @_;
1275
1276	33		33			100	$FieldList \|\|= $this->fieldlist();
1277
1278							## Visit each field name in the current list. Make sure it is
1279							## present.
1280
1281	33					77	foreach my $FieldName (@$FieldList)
1282							{
1283							## The col method will the column exist if not present.
1284	100					222	$this->col($FieldName);
1285							}
1286							}
1287
1288							sub fieldlist_truncate
1289							{
1290	3			3	0	4	my $this = shift;
1291
1292	3					9	my $FieldList = $this->fieldlist();
1293	3					7	my $AllFields = $this->fieldlist_all();
1294	3					6	my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
	3					21
1295
1296	3					5	my $DeletedCols = {};
1297
1298	3					7	foreach my $FieldName (@$AllFields)
1299							{
1300	8	100				22	if (!exists($FieldHash->{$FieldName}))
1301							{
1302	3					8	$DeletedCols->{$FieldName} = delete $this->{$FieldName};
1303							}
1304							}
1305
1306	3					13	return($DeletedCols);
1307							}
1308
1309							sub fieldlist_add
1310							{
1311	43			43	0	58	my $this = shift;
1312	43					107	my ($FieldName) = @_;
1313
1314	43	100				184	if (ref($this->{_FieldList}) eq 'ARRAY')
1315							{
1316	34					60	my $FieldList = $this->{_FieldList};
1317	34					62	my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
	34					212
1318
1319	34	50				113	if (!exists($FieldHash->{$FieldName}))
1320							{
1321	34					144	push @$FieldList, $FieldName;
1322							}
1323							}
1324							}
1325
1326							sub fieldlist_delete
1327							{
1328	3			3	0	7	my $this = shift;
1329	3					6	my ($FieldName) = @_;
1330
1331	3	50				15	if (ref($this->{_FieldList}) eq 'ARRAY')
1332							{
1333	3					5	$this->{_FieldList} = [grep {$_ ne $FieldName} @{$this->{_FieldList}}];
	11					33
	3					30
1334							}
1335							}
1336
1337							=pod
1338
1339							=head1 DATA COLUMNS (FIELD DATA)
1340
1341							## Getting or setting data in entire columns
1342
1343							$t->{$ColName} ## Get a column you know exists
1344							$t->col($ColName) ## Get a column or make empty one.
1345							$t->col_get($ColName)
1346
1347							$t->col($ColName, $ListRef) ## Set all of a column all at once.
1348							$t->col_set($ColName, $ListRef)
1349							$t->col_force($ColName, $ListRef) ## Add but don't check size or
1350							## add to custom field list
1351
1352							$t->col_set($ColName, undef) ## Delete a column completely
1353							$t->col_delete($ColName)
1354
1355							$t->col_empty() ## An empty col presized for table
1356							$t->col_empty(22) ## An empty col of another length
1357							$t->col_empty($Col) ## An empty col sized to match another
1358
1359							$t->col_default() ## Default if req. column not found.
1360
1361							$t->col_exists($Field) ## Check existence of column
1362							$t->col_active($Field) ## Restrict check to fieldlist()
1363
1364							$t->cols($ColList) ## Get list of multiple named columns
1365							$t->cols_hash($ColList) ## Get hash " " "
1366
1367							$t->col_rename($Old => $New) ## Change name of columns
1368							$t->col_rename($Old1 => $New1, $Old2 => $New2) ## Change several
1369
1370							A "column" is a field in the table and all its data. The column's
1371							field name is a key in the object itself, and may also optionally be
1372							listed in a custom field list if present. The column's data is the
1373							key's value in the hash and is an array ref of values presumed to be
1374							of the same data type (e. g. string, integer, etc.)
1375
1376							Sometimes the terms "column" and "field" are used interchangeably in
1377							this documentation.
1378
1379							If you already know that a column exists (because you got it from the
1380							fieldlist() method and you've not previously manipulated _FieldList
1381							directly but instead carefully used the method calls available for
1382							that), then you can safely get the column by just looking it up in the
1383							object itself.
1384
1385							The col() method does the same thing, but forces the column to spring
1386							into existence if it did not already (which can also have the
1387							potentially unwanted side-effect of hiding coding errors in which you
1388							retreive mis-named columns: so beware). Columns brought into
1389							existence this way will automatically be pre-sized (i.e. they will
1390							will be created and set to whatever col_default() returns).
1391
1392							The col() or col_set() methods can also be used to set a column. When
1393							the column is set, the list you pass is automatically sized
1394							(lengthened or truncated) to match the current length of the table.
1395							If this is not what you want, then call col_force() which will not
1396							check whether the new column matches the size of the others.
1397
1398							No matter how you set it, the object now "owns" the list you gave it.
1399
1400							As a convenience, col(), col_set() and col_force() return the column
1401							that was set. They silently discard any previous column.
1402
1403							All three methods of column setting will append the column to the
1404							custom field list if one is present and the column name is not already
1405							listed there (by calling fieldlist_add()). They will also call the
1406							extend() method to ensure all columns have the same length (either
1407							others will be extended to match the length of the new one, or the new
1408							one will be extended to match the length of the others).
1409
1410							col_delete() deletes a column.
1411
1412							col_empty() returns an anonymous list reference that is pre-sized to
1413							the length of the table (by default). You could use it to get an
1414							empty column that you intend to fill up and then later insert into the
1415							table or use to hold the results of an operation on other columns. If
1416							you want a different length, specify it as a number or as an array ref
1417							whose length should be matched.
1418
1419							col_default() is the internal method that implements the "springing
1420							into existence" of missing columns. Currently it just calls
1421							col_empty(). Other subclasses might want to have it return undef or a
1422							string like "NO_SUCH_COLUMN" in order to help track programming errors
1423							where nonexistent columns are requested.
1424
1425							cols($FieldList) returns an ordered list of the requested column
1426							names. If no list is given, then fieldlist() is used.
1427
1428							cols_hash($FieldList) does the same as cols(), but the result is a
1429							hash whose keys are the field names and whose values are the columns
1430							-- much like the original object itself, but not blessed into the
1431							class. The resulting hash, however, could be used as the prototype
1432							for a new Data::CTable object (by calling the new() method). However,
1433							be warned that both objects will think they "own" the resulting shared
1434							so be careful what you do..... which brings us to this:
1435
1436							IMPORTANT NOTE ABOUT GETTING COLUMNS: The columns you retrieve from a
1437							table are still "owned" by the table object as long as it lives. If
1438							you modify them, you are modifying the table's data. If you change
1439							their length, then you may be invalidating the table's own
1440							expectations that all its columns have the same length. So beware.
1441
1442							Just make yourself a copy of the data if that isn't what you want.
1443							For example, instead of this:
1444
1445							my $Foo = $Table->col('Foo'); ## Reference to actual column
1446
1447							Do this:
1448
1449							my $Foo = [@{$Table->col('Foo')}]; ## Shallow copy of the column
1450
1451							=cut
1452
1453							sub col ## ($ColName, [$Vector])
1454							{
1455	1455			1455	0	1880	my $this = shift;
1456	1455					1916	my ($ColName, $Vector) = @_;
1457
1458							## Set if specified.
1459	1455	100				11010	my $FoundVector = $this->col_set($ColName, $Vector) if defined($Vector);
1460
1461							## Get and return.
1462							## If not specified, create it with col_default()
1463	1455					2630	my $Col = $this->col_get($ColName);
1464
1465	1455					5006	return($Col);
1466							}
1467
1468							sub col_get
1469							{
1470	1458			1458	0	1572	my $this = shift;
1471	1458					3024	my ($ColName) = @_;
1472
1473	1458		66			3552	my $Col = ($this->{$ColName} \|\| $this->col_add($ColName));
1474
1475	1458					2721	return($Col);
1476							}
1477
1478							sub col_add
1479							{
1480	28			28	0	38	my $this = shift;
1481	28					38	my ($ColName) = @_;
1482	28					963	my $Col = $this->{$ColName} = $this->col_empty();
1483
1484	28					78	$this->fieldlist_add($ColName);
1485	28					88	return($Col);
1486							}
1487
1488							sub col_set_internal ## ($ColName, [$Vector], [$Force])
1489							{
1490	16			16	0	21	my $this = shift;
1491	16					27	my ($ColName, $Vector, $Force) = @_;
1492
1493	16					43	my $Valid = (ref($Vector) eq 'ARRAY');
1494	16					41	my $Existing = (ref($this->{$ColName}) eq 'ARRAY');
1495
1496							## Delete existing vector by this name...
1497	16	100	66			1398	if (!$Valid && $Existing)
		100	66
		50
1498							{
1499	1					4	$Vector = delete $this->{$ColName}; ## Delete and save to return to caller.
1500	1					5	$this->fieldlist_delete($ColName); ## Delete from field list if needed.
1501							}
1502							## ...or add one...
1503							elsif ($Valid && !$Existing)
1504							{
1505	11					34	$this->{$ColName} = $Vector;
1506
1507	11	50				31	if (!$Force)
1508							{
1509	11					41	$this->extend(); ## Extend all vectors as needed to ensure same length.
1510	11					52	$this->fieldlist_add($ColName); ## Add to custom field list if needed.
1511							}
1512							}
1513							## ...otherwise replace.
1514							elsif ($Valid)
1515							{
1516	4					8	$this->{$ColName} = $Vector;
1517
1518	4	50				15	if (!$Force)
1519							{
1520	4					14	$this->extend(); ## Extend all vectors as needed to ensure same length.
1521							}
1522							}
1523
1524	16					51	return($Vector); ## Return added or deleted vector for convenience.
1525							}
1526
1527							sub col_delete ## ($ColName)
1528							{
1529	1			1	0	3	my $this = shift;
1530	1					2	my ($ColName) = @_;
1531
1532	1					5	return($this->col_set_internal($ColName));
1533							}
1534
1535							sub col_set ## ($ColName, $Vector)
1536							{
1537	15			15	0	25	my $this = shift;
1538	15					142	my ($ColName, $Vector) = @_;
1539
1540	15					56	return($this->col_set_internal($ColName, $Vector));
1541							}
1542
1543							sub col_force ## ($ColName, $Vector)
1544							{
1545	0			0	0	0	my $this = shift;
1546	0					0	my ($ColName, $Vector) = @_;
1547
1548	0					0	return($this->col_set_internal($ColName, $Vector, 1));
1549							}
1550
1551							sub col_empty
1552							{
1553	38			38	0	93	my $this = shift;
1554	38					51	my ($Length) = @_;
1555
1556							## Default to table length. Or get length from sample column.
1557	38	100				179	$Length = $this->length() unless defined($Length);
1558	38	100				279	$Length = @$Length if ref($Length) eq 'ARRAY';
1559
1560	38					82	my $Col = [];
1561	38					141	$#$Col = $Length - 1;
1562
1563	38					108	return($Col);
1564							}
1565
1566							sub col_default
1567							{
1568	1			1	0	3	my $this = shift;
1569	1					4	my $Col = $this->col_empty();
1570
1571	1					13	return($Col);
1572							}
1573
1574							sub cols ## ($ColNames)
1575							{
1576	33			33	0	58	my $this = shift;
1577	33					47	my ($ColNames) = @_;
1578	33		66			101	$ColNames \|\|= $this->fieldlist();
1579	33					88	my $Cols = [map {$this->col($_)} @$ColNames];
	106					229
1580
1581	33					6896	return($Cols);
1582							}
1583
1584							sub cols_hash ## ($ColNames)
1585							{
1586	25			25	0	50	my $this = shift;
1587	25					43	my ($ColNames) = @_;
1588	25		66			108	$ColNames \|\|= $this->fieldlist();
1589	25					243	my $Cols = $this->cols($ColNames);
1590	25					51	my $ColsHash = {}; @$ColsHash{@$ColNames} = @$Cols;
	25					107
1591
1592	25					249	return($ColsHash);
1593							}
1594
1595							sub col_exists ## ($ColName, [$FieldList])
1596							{
1597	24			24	0	47	my $this = shift;
1598	24					43	my ($ColName, $FieldList) = @_;
1599
1600							## Default list to search is ALL fields in object.
1601	24		66			171	$FieldList \|\|= $this->fieldlist_all();
1602
1603							## Disallow column names starting with underscore.
1604	24	50				67	return(0) if $ColName =~ /^_/;
1605
1606	24					140	my $FieldHash = $this->fieldlist_hash($FieldList);
1607	24					52	my $Exists = exists($FieldHash->{$ColName});
1608
1609	24					133	return($Exists);
1610							}
1611
1612							sub col_active ## ($ColName, [$FieldList])
1613							{
1614	8			8	0	17	my $this = shift;
1615	8					12	my ($ColName, $FieldList) = @_;
1616
1617							## Default list to search is only ACTIVE fields in object.
1618	8		33			31	$FieldList \|\|= $this->fieldlist();
1619
1620							## Disallow column names starting with underscore.
1621	8	50				20	return(0) if $ColName =~ /^_/;
1622
1623	8					19	my $FieldHash = $this->fieldlist_hash($FieldList);
1624	8					17	my $Exists = exists($FieldHash->{$ColName});
1625
1626	8					79	return($Exists);
1627							}
1628
1629							sub col_rename ## ($Old => $New, [$Old => New...])
1630							{
1631	3			3	0	19	my $this = shift;
1632
1633	3					5	my $Success;
1634
1635	3					10	my $Fields = $this->fieldlist_all();
1636
1637	3					6	my ($Old, $New);
1638	3					19	while (($Old, $New) = splice(@_, 0, 2))
1639							{
1640	3	50				16	$this->warn("Invalid column name: $New"), next
1641							unless ($New =~ /^[^_]+/);
1642
1643	3	50				13	$this->warn("Column to be renamed does not exist: $Old"), next
1644							unless $this->col_exists($Old, $Fields);
1645
1646	3	50	0			11	(($Old ne $New) && $this->warn("Failed to rename column $Old to $New: $New exists.")), next
1647							if $this->col_exists($New, $Fields);
1648
1649	3					9	my $Col = $this->col($Old); ## Creates if not present.
1650
1651							## Rename the column...
1652	3					10	$this->{$New} = delete $this->{$Old};
1653
1654							## Then make the same change to _FieldList, _SortOrder, _SortSpecs
1655
1656	3	100				14	$this->{_FieldList} = [map {$_ = $New if $_ eq $Old; $_} @{$this->{_FieldList}}] if (defined($this->{_FieldList}));
	10	50				25
	10					24
	3					7
1657	3	0				13	$this->{_SortOrder} = [map {$_ = $New if $_ eq $Old; $_} @{$this->{_SortOrder}}] if (defined($this->{_SortOrder}));
	0	50				0
	0					0
	0					0
1658	3	50	33			31	$this->{_SortSpecs}->{$New} = delete $this->{_SortSpecs}->{$Old} if (defined($this->{_SortSpecs}) &&
1659							( $this->{_SortSpecs}->{$Old}));
1660							}
1661
1662	3					7	$Success = 1;
1663	3					11	done:
1664							return($Success);
1665							}
1666
1667							=pod
1668
1669							=head1 CLEANUP AND VALIDATION
1670
1671							## Performing your own cleanups or validations
1672
1673							$t->clean($Sub) ## Clean with custom subroutine
1674							$t->clean($Sub, $Fields) ## Clean specified columns only
1675
1676							## Cleaning whitespace
1677
1678							$t->clean_ws() ## Clean whitespace in fieldlist() cols
1679							$t->clean_ws($Fields) ## Clean whitespace in specified cols
1680
1681							## Cleaning methods that map character sets
1682
1683							$t->clean_mac_to_iso8859()
1684							$t->clean_mac_to_iso8859($Fields)
1685
1686							$t->clean_iso8859_to_mac()
1687							$t->clean_iso8859_to_mac($Fields)
1688
1689							## Character mapping utilities (not methods)
1690
1691							use Data::CTable qw(
1692							ISORoman8859_1ToMacRoman
1693							MacRomanToISORoman8859_1
1694
1695							ISORoman8859_1ToMacRoman_clean
1696							MacRomanToISORoman8859_1_clean
1697							);
1698
1699							&ISORoman8859_1ToMacRoman(\ $Str) ## Pass pointer to buffer
1700							&MacRomanToISORoman8859_1(\ $Str) ## Pass pointer to buffer
1701
1702							&ISORoman8859_1ToMacRoman_clean() ## Operates on $_
1703							&MacRomanToISORoman8859_1_clean() ## Operates on $_
1704
1705							One of the most important things you can do with your data once it's
1706							been placed in a table in Perl is to use the power of Perl to scrub it
1707							like crazy.
1708
1709							The built-in clean_ws() method applies a standard white-space cleanup
1710							to selected records in every field in the fieldlist() or other list of
1711							fields you optionally supply (such as fieldlist_all()).
1712
1713							It does the following cleanups that are deemed correct for the
1714							majority of data out there:
1715
1716							- Remove all leading whitespace, including returns (\n)
1717							- Remove all trailing whitespace, including returns (\n)
1718							- Convert runs of spaces to a single space
1719							- Convert empty string values back to undef to save space
1720
1721							Of course, depending on your data, clean_ws() might just be the first
1722							thing you do in your cleanup pass. There might be many more cleanups
1723							you'd like to apply.
1724
1725							clean() is like clean_ws() except you supply as the first argument
1726							your own cleaning subroutine (code reference). It should do its work
1727							by modifying $_.
1728
1729							Both clean_ws() and clean() apply cleaning ONLY to selected records.
1730							If this isn't what you want, then select_all() before cleaning.
1731
1732							Since a cleanup subroutine can do ANY modifications to a field that it
1733							likes, you can imagine some cleanup routines that also supply default
1734							values and do other validations.
1735
1736							For example, a cleanup routine could convert every value in each field
1737							to an integer, or apply minimum or maximum numerical limits:
1738
1739							sub {$_ = int($_) }
1740							sub {$_ = max(int($_), 0) }
1741							sub {$_ = min(int($_), 200)}
1742
1743							Or your cleanup routine could use regular expressions to do
1744							capitalizations or other regularizations of data:
1745
1746							sub Capitalize {/\b([a-z])([a-z]+)\b)/\U$1\E$2/g}
1747
1748							$t->clean(\ &Capitalize , ['FirstName', 'LastName']);
1749							$t->clean(\ &PhoneFormat, ['Phone', 'Fax' ]);
1750							$t->clean(\ &LegalUSZip, ['HomeZip', 'WorkZip' ]);
1751
1752							... and so on. Cleanups are easy to write and quick and easy to apply
1753							with Data::CTable. Do them early! Do them often!
1754
1755							=head2 Hints for writing cleanup routines
1756
1757							If your cleanup routine may be used to clean up fields that could be
1758							empty/undef and empty/undef is a legal value, it should not touch any
1759							undef values (unintentionally converting them to strings).
1760
1761							Finally, instead of setting any values to the empty string, it should
1762							set them to undef instead. This includes any values it might have
1763							left empty during cleanup. (Using undef instead of empty string to
1764							represent empty values is one way that Data::CTable likes to save
1765							memory in tables that may have lots of those.)
1766
1767							For an example of a well-behaved cleanup routine, consider the
1768							following implementation of the builtin CleanWhitespace behavior:
1769
1770							sub CleanWhitespace
1771							{
1772							return unless defined; ## Empty/undef values stay that way
1773							s/ \s+$//sx; ## Remove trailing whitespace
1774							s/^\s+ //sx; ## Remove leading whitespace
1775							s/ +/ /g; ## Runs of spaces to single space
1776							$_ = undef unless length; ## (Newly?) empty strings to undef
1777							}
1778
1779							=head2 Roman character set mapping
1780
1781							The character set mapping cleanup routines can be used to convert
1782							upper-ASCII characters bidirectionally between two popular Roman
1783							Character sets -- Mac Roman 1 and ISO 8859-1 (also sometimes called
1784							ISO Latin 1) -- i.e. the Western European Roman character sets.
1785
1786							By default, read() converts all incoming data fields in data files
1787							with Mac line endings to ISO format when reading in. Conversely,
1788							write() does the reverse mapping (ISO to Mac) when writing a file with
1789							Mac line endings.
1790
1791							However, you may wish to turn off these default behaviors and instead
1792							apply the mappings manually, possibly just to certain fields.
1793
1794							For example, if a table contains fields with non-Roman character sets,
1795							you would definitely not want to apply these mappings, and instead
1796							might want to apply some different ones that you create yourself.
1797
1798							=head2 Utility routines for character mapping
1799
1800							This module can optionally export four utility routines for mapping
1801							character Latin 1 character sets. Always be sure to map the correct
1802							direction -- otherwise you'll end up with garbage! Be careful to only
1803							pass Western Roman strings -- not double-byte strings or strings
1804							encoded in any single-byte Eastern European Roman or non-Roman
1805							character set.
1806
1807							&ISORoman8859_1ToMacRoman(\ $Str) ## Pass pointer to buffer
1808							&MacRomanToISORoman8859_1(\ $Str) ## Pass pointer to buffer
1809
1810							These routines translate characters whose values are 128-255 from one
1811							Western Roman encoding to another. The argument is a string buffer of
1812							any size passed by reference.
1813
1814							The functions return a count of the number of characters that were
1815							mapped (zero or undef if none were).
1816
1817							&ISORoman8859_1ToMacRoman_clean() ## Operates on $_
1818							&MacRomanToISORoman8859_1_clean() ## Operates on $_
1819
1820							These routines are variants of the above, but they're versions that
1821							are compatible with clean() -- they operate on $_ and will take care
1822							to leave undefined values undefined. They do not have return values.
1823
1824							=head2 More advanced cleaning and validation
1825
1826							Unfortunately, clean() only lets you operate on a single field value
1827							at a time -- and there's no way to know the record number or other
1828							useful information inside the cleaning routine.
1829
1830							For really powerful cleaning and validation involving access to all
1831							fields of a record as well as record numbers, see the discussion of
1832							the calc() method and other methods for doing complex field
1833							calculations in the next section.
1834
1835							=cut
1836
1837							sub clean
1838							{
1839	8			8	0	13	my $this = shift;
1840	8					13	my ($Sub, $Fields) = @_;
1841
1842							## Default is fields in the list.
1843	8		33			36	$Fields \|\|= $this->fieldlist();
1844
1845	8					25	my $Sel = $this->selection();
1846
1847	8					19	foreach (@$Fields) {foreach (@{$this->col($_)}[@$Sel]) {&$Sub()}};
	38					67
	38					74
	114					222
1848	8					35	return 1;
1849							}
1850
1851							sub clean_ws
1852							{
1853	6			6	0	28	my $this = shift;
1854	6					25	return($this->clean(\ &CleanWhitespace, @_));
1855							}
1856
1857							sub CleanWhitespace
1858							{
1859	84	50		84	0	151	return unless defined; ## Empty/undef values stay that way
1860	84					313	s/ \s+$//sx; ## Remove trailing whitespace
1861	84					138	s/^\s+ //sx; ## Remove leading whitespace
1862	84					145	s/ +/ /g; ## Runs of spaces to single space
1863	84	50				224	$_ = undef unless length; ## (Newly?) empty strings to undef
1864							}
1865
1866							sub clean_mac_to_iso8859
1867							{
1868	1			1	0	1033	my $this = shift;
1869	1					6	return($this->clean(\ &MacRomanToISORoman8859_1_clean, @_));
1870							}
1871
1872							sub clean_iso8859_to_mac
1873							{
1874	0			0	0	0	my $this = shift;
1875	0					0	return($this->clean(\ &ISORoman8859_1ToMacRoman_clean, @_));
1876							}
1877
1878							sub ISORoman8859_1ToMacRoman
1879							{
1880	20			20	0	23	return($ {$_[0]} =~
	20					49
1881
1882							tr/\x80-\xFF/\xDE\xDF\xE2\xC4\xE3\xC9\xA0\xE0\xF6\xE4\xBA\xDC\xCE\xAD\xB3\xB2\xB0\xD4\xD5\xD2\xD3\xA5\xF8\xD1\xF7\xAA\xF9\xDD\xCF\xF0\xDA\xD9\xCA\xC1\xA2\xA3\xDB\xB4\xF5\xA4\xAC\xA9\xBB\xC7\xC2\xD0\xA8\xC3\xA1\xB1\xFA\xFE\xAB\xB5\xA6\xE1\xFC\xFF\xBC\xC8\xC5\xFD\xFB\xC0\xCB\xE7\xE5\xCC\x80\x81\xAE\x82\xE9\x83\xE6\xE8\xED\xEA\xEB\xEC\xC6\x84\xF1\xEE\xEF\xCD\x85\xD7\xAF\xF4\xF2\xF3\x86\xB7\xB8\xA7\x88\x87\x89\x8B\x8A\x8C\xBE\x8D\x8F\x8E\x90\x91\x93\x92\x94\x95\xB6\x96\x98\x97\x99\x9B\x9A\xD6\xBF\x9D\x9C\x9E\x9F\xBD\xB9\xD8/);
1883
1884							}
1885
1886							sub ISORoman8859_1ToMacRoman_clean
1887							{
1888	0	0		0	0	0	return unless defined; ## Empty/undef values stay that way
1889
1890	0					0	tr/\x80-\xFF/\xDE\xDF\xE2\xC4\xE3\xC9\xA0\xE0\xF6\xE4\xBA\xDC\xCE\xAD\xB3\xB2\xB0\xD4\xD5\xD2\xD3\xA5\xF8\xD1\xF7\xAA\xF9\xDD\xCF\xF0\xDA\xD9\xCA\xC1\xA2\xA3\xDB\xB4\xF5\xA4\xAC\xA9\xBB\xC7\xC2\xD0\xA8\xC3\xA1\xB1\xFA\xFE\xAB\xB5\xA6\xE1\xFC\xFF\xBC\xC8\xC5\xFD\xFB\xC0\xCB\xE7\xE5\xCC\x80\x81\xAE\x82\xE9\x83\xE6\xE8\xED\xEA\xEB\xEC\xC6\x84\xF1\xEE\xEF\xCD\x85\xD7\xAF\xF4\xF2\xF3\x86\xB7\xB8\xA7\x88\x87\x89\x8B\x8A\x8C\xBE\x8D\x8F\x8E\x90\x91\x93\x92\x94\x95\xB6\x96\x98\x97\x99\x9B\x9A\xD6\xBF\x9D\x9C\x9E\x9F\xBD\xB9\xD8/;
1891							}
1892
1893							sub MacRomanToISORoman8859_1
1894							{
1895	61			61	0	87	return($ {$_[0]} =~
	61					163
1896
1897							tr/\x80-\xFF/\xC4\xC5\xC7\xC9\xD1\xD6\xDC\xE1\xE0\xE2\xE4\xE3\xE5\xE7\xE9\xE8\xEA\xEB\xED\xEC\xEE\xEF\xF1\xF3\xF2\xF4\xF6\xF5\xFA\xF9\xFB\xFC\x86\xB0\xA2\xA3\xA7\x95\xB6\xDF\xAE\xA9\x99\xB4\xA8\x8D\xC6\xD8\x90\xB1\x8F\x8E\xA5\xB5\xF0\xDD\xDE\xFE\x8A\xAA\xBA\xFD\xE6\xF8\xBF\xA1\xAC\xAF\x83\xBC\xD0\xAB\xBB\x85\xA0\xC0\xC3\xD5\x8C\x9C\xAD\x97\x93\x94\x91\x92\xF7\xD7\xFF\x9F\x9E\xA4\x8B\x9B\x80\x81\x87\xB7\x82\x84\x89\xC2\xCA\xC1\xCB\xC8\xCD\xCE\xCF\xCC\xD3\xD4\x9D\xD2\xDA\xDB\xD9\xA6\x88\x98\x96\x9A\xB2\xBE\xB8\xBD\xB3\xB9/);
1898
1899							}
1900
1901							sub MacRomanToISORoman8859_1_clean
1902							{
1903	15	50		15	0	28	return unless defined; ## Empty/undef values stay that way
1904
1905	15					63	tr/\x80-\xFF/\xC4\xC5\xC7\xC9\xD1\xD6\xDC\xE1\xE0\xE2\xE4\xE3\xE5\xE7\xE9\xE8\xEA\xEB\xED\xEC\xEE\xEF\xF1\xF3\xF2\xF4\xF6\xF5\xFA\xF9\xFB\xFC\x86\xB0\xA2\xA3\xA7\x95\xB6\xDF\xAE\xA9\x99\xB4\xA8\x8D\xC6\xD8\x90\xB1\x8F\x8E\xA5\xB5\xF0\xDD\xDE\xFE\x8A\xAA\xBA\xFD\xE6\xF8\xBF\xA1\xAC\xAF\x83\xBC\xD0\xAB\xBB\x85\xA0\xC0\xC3\xD5\x8C\x9C\xAD\x97\x93\x94\x91\x92\xF7\xD7\xFF\x9F\x9E\xA4\x8B\x9B\x80\x81\x87\xB7\x82\x84\x89\xC2\xCA\xC1\xCB\xC8\xCD\xCE\xCF\xCC\xD3\xD4\x9D\xD2\xDA\xDB\xD9\xA6\x88\x98\x96\x9A\xB2\xBE\xB8\xBD\xB3\xB9/;
1906
1907							}
1908
1909							=pod
1910
1911							=head1 CALCULATIONS USING calc()
1912
1913							## Calculate a new field's values based on two others
1914
1915							$t->calc($Sub) ## Run $Sub for each row, with
1916							## fields bound to local vars
1917
1918							$t->calc($Sub, $Sel) ## Use these row nums
1919							$t->calc($Sub, undef, $Fields) ## Use only these fields
1920							$t->calc($Sub, $Sel, $Fields) ## Use custom rows, fields
1921
1922							my $Col = $t->calc($Sub) ## Gather return vals in vector
1923
1924
1925							## Example 1: Overwrite values in an existing column.
1926
1927							$t->calc(sub{no strict 'vars'; $Size = (stat($Path))[7]});
1928
1929
1930							## Example 2: Create empty column; fill fields 1 by 1
1931
1932							$t->col('PersonID');
1933							$t->calc(sub{no strict 'vars'; $PersonID = "$Last$First"});
1934
1935
1936							## Example 3: Calculate values; put into to table if desired
1937
1938							$PersonID = $t->calc(sub{no strict 'vars'; "$Last$First"});
1939							$t->sel('PersonID', $PersonID);
1940
1941
1942							## Example 4: Using fully-qualified variable names
1943
1944							$t->calc(sub{$main::PersonID = "$main::Last$main::First"});
1945
1946							calc() runs your custom calculation subroutine $Sub once for every row
1947							in the current selection() or other list of row numbers that you
1948							specify in the optional $Sel argument.
1949
1950							This lets you apply a complex calculation to every record in a table
1951							in a single statement, storing the results in one or more columns, or
1952							retrieving them as a list.
1953
1954							Your custom subroutine may refer to the value in any field in the
1955							current row by using a global variable with the field's name:
1956
1957							For example, if the table has fields First, Last, and Age, then $Sub
1958							may use, modify, and set the variables $First, $Last, $Age. (Also
1959							known as $main::First, $main::Last, $main::Age).
1960
1961							Modifying any of these specially-bound variables actually modifies the
1962							data in the correct record and field within the table.
1963
1964							By default, the fields available to $Sub are all fields in the table.
1965							calc() must bind all the field names for you for each row, which can
1966							be time-consuming for tables with very large numbers of fields.
1967
1968							You can speed up the operation of calc() by listing only the fields
1969							your $Sub needs in the optional parameter $Fields. Any field names
1970							you don't mention won't be available to $Sub. Conversely, calc() will
1971							run faster because it can bind only the fields you actually need.
1972
1973							If you include non-existent fields in your custom $Fields list, calc()
1974							creates them for you before $Sub runs the first time. Then your $Sub
1975							can store field values into the new column, referring to it by name.
1976
1977							Variables in $Sub are in the "main" package. So you should set $Sub
1978							to use pacakge "main" in case the rest of your code is not in "main".
1979
1980							Similarly, if you "use strict", Perl will complain about global
1981							variables in $Sub. So you may need to assert "no strict 'vars'".
1982
1983							{ package Foo; use strict;
1984
1985							$t = ...;
1986							$t->calc(sub {package main; no strict 'vars';
1987							$Age = int($Age)});
1988							}
1989
1990							## Or this:
1991
1992							{ package Foo; use strict;
1993
1994							$t = ...;
1995							{ package main; no strict 'vars';
1996							my $Sub = sub {$Age = int($Age)};
1997							}
1998							$t->calc($Sub);
1999							}
2000
2001							You may be able to get around both problems more easily by prefixing
2002							each variable reference in $Sub with "main::". This takes care of the
2003							package name issue and bypasses "use strict" at the same time, at the
2004							slight cost of making the calculation itself a bit harder to read.
2005
2006							$t->calc(sub {$main::Age = int($main::Age)}); ## OK in any package
2007
2008							In addition to the field names, the following three values are defined
2009							during each invocation of $Sub:
2010
2011							$_r ($main::_r) -- the row number in the entire table
2012							$_s ($main::_s) -- the item number in selection or $Recs
2013							$_t ($main::_t) -- the table object itself
2014
2015							You could use these values to print diagnostic information or to
2016							access any of the data, parameters, or methods of the table itself
2017							from within $Sub. Or you could even calculate field values using $_r
2018							or $_s.
2019
2020							For example, after searching & sorting, you could make a field which
2021							preserves the resulting sort order for future reference:
2022
2023							$t->col('Ranking'); ## Create the empty column first.
2024							$t->calc(sub{$main::Ranking = $main::_s});
2025
2026							This last example is equivalent to:
2027
2028							$t->sel(Ranking => [0 .. $#{$t->selection()}]); ## See sel() below
2029
2030
2031							=head1 "MANUAL" CALCULATIONS
2032
2033							calc() (see previous secion) is the briefer, more elegant way to do
2034							batch calculations on entire columns in a table, but it can be
2035							slightly slower than doing the calculations yourself.
2036
2037							If you have extremely large tables, and you notice the processing time
2038							for your calculations taking more than a second, you might want to
2039							rewrite your calculations to use the more efficient techniques shown
2040							here.
2041
2042							You will often need to create new calculated columns based on one or
2043							more existing ones, and then either insert the columns back in the
2044							tables or use them for further calculations or indexing.
2045
2046							Examples 1a and 1b create a new field 'NameOK' containing either the
2047							string "OK" or undef (empty) depending on whether the field 'Name' is
2048							empty. Just use map() to iterate over the existing values in the
2049							other column, binding $_ to each value in turn.
2050
2051							### Example 1a: Calculation based on one other column
2052
2053							## Retrieve column
2054							my $Name = $t->col('Name');
2055
2056							## Make new column
2057							my $NameOK = [map {!!length && 'OK'} @$Name];
2058
2059							## Insert column back into table:
2060							$t->col(NameOK => $NameOK);
2061
2062							### Example 1b: Same calculation, in a single statement:
2063
2064							$t->col(NameOK => [map {!!length && 'OK'} @{$t->col('Name')}]);
2065
2066							In order to iterate over MULTIPLE columns at once, you need a list of
2067							the row numbers generated by $t->all() so you can index the two
2068							columns in tandem. Then, you use map to bind $_ to each row number,
2069							and then use the expression $t->col($ColName)->[$_] to retreive each
2070							value.
2071
2072							Examples 2a and 2b demonstrate this method. They create a new field
2073							'FullName' which is a string joining the values in the 'First' and
2074							'Last' columns with a space between.
2075
2076							### Example 2a: Calculation based on multiple columns
2077
2078							## Retrieve columns
2079							my $First = $t->col('First');
2080							my $Last = $t->col('Last' );
2081
2082							## Retreive row nums
2083							my $Nums = $t->all();
2084
2085							## Calculate a new column based on two others
2086							my $Full = [map {"$First->[$_] $Last->[$_]"} @$Nums];
2087
2088							## Add new column to the table
2089							$t->col(FullName => $Full);
2090
2091							### Example 2b: Same calculation, in a single statement:
2092
2093							$t->col(FullName =>
2094							[map {"$t->col('First')->[$_] t->col('Last')->[$_]"}
2095							@{$t->all()}]);
2096
2097							In examples 1 and 2, you create entirely new columns and then add or
2098							replace them in the table.
2099
2100							Using the approach in Examples 3a and 3b, you can assign calculated
2101							results directly into each value of an existing column as you go.
2102
2103							## Example 3a: Calculate by assigning directly into fields...
2104
2105							my $A = $t->col->('A'); ## This column will be modified
2106							my $B = $t->col->('B');
2107							my $C = $t->col->('C');
2108
2109							foreach @($t->all()) {$A->[$_] = $B->[$_] + $C->[$_];}
2110
2111							## Example 3b: Same calculation, in a single statement:
2112
2113							foreach @($t->all()) {($t->col('A')->[$_] =
2114							$t->col('B')->[$_] +
2115							$t->col('C')->[$_])};
2116
2117							Before writing your code, think about which calculation paradigms best
2118							suit your needs and your data set.
2119
2120							Just as Perl Hackers know: There's More Than One Way To Do It!
2121
2122							=cut
2123
2124							{}; ## Get emacs to indent correctly.
2125
2126							sub calc
2127							{
2128							## We operate in package main for this subroutine so the local
2129							## object and row number can be available to the caller's $Sub.
2130
2131							## The following local vars will be available to $Sub
2132							## $_r ($main::_r) -- the row number in the entire table
2133							## $_s ($main::_s) -- the row number in selection or $Recs
2134							## $_t ($main::_t) -- the table object itself
2135
2136							package main;
2137	29			29		487	use vars qw($_r $_s $_t);
	29					71
	29					8095
2138	3			3	0	26	local ($_r, $_s, $_t);
2139
2140	3					6	$_t = shift;
2141	3					8	my ($Sub, $Recs, $Fields) = @_;
2142
2143							## These optional params default to current field and current sel
2144
2145	3		33			23	$Recs \|\|= $_t->selection();
2146	3		33			21	$Fields \|\|= $_t->fieldlist_all();
2147
2148							## Local copy of symbol table. Didn't seem to help. Odd.
2149							## local %main::;
2150
2151							## We'll build a column of return values from $Sub if needed.
2152	3					7	my $WantVals = defined(wantarray);
2153	3	100				16	my $Vals = $_t->col_empty() if $WantVals;
2154
2155							## Call col() on each field in list to make sure it exists.
2156	3					9	foreach (@$Fields) {$_t->col($_)};
	15					35
2157
2158	3					208	foreach $_s (0..$#$Recs)
2159							{
2160	9					33	$_r = $Recs->[$_s];
2161
2162							## Bind $FieldName1, $FieldName2, (etc. for each field name in
2163							## $Fields) point to address of the current value for that
2164							## field in this record.
2165
2166	29			29		178	no strict 'refs';
	29					58
	29					181568
2167	9					16	foreach my $F (@$Fields) {*{$F} = \ $_t->{$F}->[$_r]};
	45					76
	45					126
2168
2169							## Now $Sub may refer to $_r, $_s, $_t, and ${any field name}
2170
2171							## Call $Sub and capture return values iff caller wants them
2172	9	100				34	($WantVals ? $Vals->[$_r] = &$Sub() : &$Sub());
2173							}
2174
2175							## Return scalar column ref unless return context is undef
2176	3	100				36	return($WantVals ? $Vals : ());
2177							}
2178
2179							=pod
2180
2181							=head1 INDEXES
2182
2183							## Make indexes of columns or just selected data in columns
2184
2185							my $Index1 = $t->index_all($Key); ## entire column
2186							my $Index2 = $t->index_sel($Key); ## selected data only
2187
2188							## Make hashes of 2 columns or just selected data in columns
2189
2190							my $Index1 = $t->hash_all($KeyFld, $ValFld); ## entire column
2191							my $Index2 = $t->hash_sel($KeyFld, $ValFld); ## selected data
2192
2193							index_all() creates and returns a hash (reference) that maps keys
2194							found in the column called $Key to corresponding record numbers.
2195
2196							Ideally, values in $Key would be unique (that's up to you).
2197
2198							If any values in $Key are NOT unique, then later values (higher record
2199							numbers) will be ignored.
2200
2201							index_sel() creates and returns a hash (ref) that maps keys found in
2202							the SELECTED RECORDS of column $Key to corresponding record numbers.
2203
2204							Any keys in unselected records are ignored. Otherwise, the behavior
2205							is equivalent to index_all().
2206
2207							hash_all() and hash_sel() are similar, except they create and return
2208							hashes whose keys are taken from column $KeyFld, but whose values are
2209							from $ValFld in the corresponding records.
2210
2211							So, for example, imagine you have a tab-delimited file on disk with
2212							just a single tab per line (2 fields) and no header row. The entries
2213							on the left side of the tab on each line are keys and the right side
2214							are values. You could convert that file into a hash in memory like
2215							this:
2216
2217							my $t = Data::CTable->new({_HeaderRow=>0, _FieldList=>[qw(F1 F2)]},
2218							"DeptLookup.txt");
2219
2220							my $DeptLookup = $t->hash_all(qw(F1 F2));
2221
2222							=head2 Reverse indexes
2223
2224							If you'd like an index mapping record number to key, just get
2225							$t->col($Key). That's what the data columns in Data::CTable are.
2226
2227							=cut
2228
2229							sub index_all
2230							{
2231	2			2	0	3897	my $this = shift;
2232	2					12	my ($Key) = @_;
2233
2234	2					5	my $Index = {}; @$Index{reverse @{$this->col($Key)}} = reverse @{$this->all()};
	2					6
	2					6
	2					10
2235
2236	2					24	return($Index);
2237							}
2238
2239							sub index_sel
2240							{
2241	2			2	0	4	my $this = shift;
2242	2					5	my ($Key) = @_;
2243
2244	2					5	my $Index = {}; @$Index{reverse @{$this->sel($Key)}} = reverse @{$this->selection()};
	2					3
	2					8
	2					6
2245
2246	2					20	return($Index);
2247							}
2248
2249
2250							sub hash_all
2251							{
2252	2			2	0	5	my $this = shift;
2253	2					4	my ($Key, $Val) = @_;
2254
2255	2					4	my $Hash = {}; @$Hash{reverse @{$this->col($Key)}} = reverse @{$this->col($Val)};
	2					5
	2					5
	2					7
2256
2257	2					11836	return($Hash);
2258							}
2259
2260							sub hash_sel
2261							{
2262	2			2	0	6	my $this = shift;
2263	2					4	my ($Key, $Val) = @_;
2264
2265	2					4	my $Hash = {}; @$Hash{reverse @{$this->sel($Key)}} = reverse @{$this->sel($Val)};
	2					5
	2					4
	2					21
2266
2267	2					29	return($Hash);
2268							}
2269
2270
2271							=pod
2272
2273							=head1 DATA ROWS (RECORDS)
2274
2275							## Getting or setting rows / records
2276
2277							$t->row($Num) ## Get a row or make empty one.
2278							$t->row_get($Num)
2279
2280							$t->row($Num, $HashRef) ## Set all of a row all at once.
2281							$t->row_set($Num, $HashRef)
2282
2283							$t->row_set($Num, undef) ## Delete a row completely
2284							$t->row_delete($Num)
2285							$t->row_delete($Beg, $End)## Delete a range of rows
2286
2287							$t->row_move($Old, $New) ## Move a row to before $New
2288
2289							$t->row_empty() ## An empty hash
2290							$t->row_exists($Num) ## True if $Num < $t->length()
2291
2292							$t->rows($RowList) ## Get list of multiple row nums
2293
2294							$t->row_list($Num) ## Get row vals as a list
2295							$t->row_list($Num, $Fields) ## Get row vals: specified fields
2296
2297							$t->row_list_set($Num, undef, $Vals) ## Set row vals as a list
2298							$t->row_list_set($Num, $Fields, $Vals) ## Set row vals as a list
2299							$t->row_list_set($Num, $Fields) ## Set vals to empty/undef
2300
2301							Usually, when working with Data::CTable objects, you are operating on
2302							entire columns or tables at a time (after all: any transformation you
2303							do on one record you almost always want to do on all records or all
2304							selected ones).
2305
2306							You should very rarely need to access data by retrieving rows or
2307							setting rows, moving them around individually, and so on. (It's much
2308							cleaner, and much more efficient to manipulate the selection() (list
2309							of selected row numbers) instead -- just delete a row number from the
2310							selection, for example, and then for most operations it's almost as if
2311							the row is gone from the table, except the data is really still
2312							there.)
2313
2314							However, if on rare occasions you really do need direct row
2315							operations, you're reading the right section.
2316
2317							A row is generally accessed as a hash. The hash you provide or get
2318							back is not saved by the object in any way. Data values are always
2319							copied in or out of it, so you always "own" the hash.
2320
2321							Rows are specified by $Num -- the row number with in the unsorted
2322							columns (the raw data in the table). These numbers are just array
2323							indices into the data columns, and so their legal range is:
2324
2325							[0 .. ($t->length() - 1)] ## (Zero-based row numbering.)
2326
2327							The row hash (or "record") has keys that are field names and values
2328							that are copies of the scalar values stored in the data columns within
2329							the table.
2330
2331							row() always copies only the fields in fieldlist(), except for
2332							row_list() which allows you to specify an optional $Fields parameter
2333							which can override the current fieldlist().
2334
2335							If the fieldlist happens to be a subset of all fields, but you really
2336							want to get all fields in your record, then call fieldlist_set(0)
2337							first to permanently or temporarily delete it.
2338
2339							row() and row_get() always return a hash.
2340
2341							row($Num, $Hash), row_set() take a hash and set just the fields you
2342							specify in the hash (in the given row of course). Any non-existent
2343							field names in the hash are created, so be careful.
2344
2345							In fact, in general with either getting or setting rows, any
2346							non-existent fields mentioned will be created for you (by internally
2347							calling col()). So you could build a whole table of 100 rows by
2348							starting with an empty, new, table and setting row 99 from a hash that
2349							gives the field names.
2350
2351							Setting a row number higher than any existing row number with row(),
2352							row_set() or row_force() will automatically set the new length of the
2353							entire table to match (extending all the columns with empty rows as
2354							necessary).
2355
2356							IMPORTANT: IF YOU SIMPLY MUST ADD ROWS SEQUENTIALLY, do not let the
2357							table auto-extend by one with each row you set. This is slow and gets
2358							rapidly slower if there's lots of data because the arrays holding the
2359							data columns will keep getting reallocated on every insert. Instead,
2360							first pre-extend the table to your highest row number by calling
2361							length($Len), and then set your rows. Or easier: if convenient just
2362							set your rows starting with the highest-numbered one first. If you
2363							don't know how many you'll have, guess or estimate and pre-extend to
2364							the estimated number and then cut back later. This will be faster
2365							than extending all columns by one each time.
2366
2367							row_delete() removes a row or range of rows completely from the table.
2368							Any rows above the deleted ones will move down and the table's
2369							length() will decrease. If the data columns are very large, this
2370							could be a bit slow because a lot of data could be moved around. The
2371							low and high row numbers will be limited for you to 0 and length() -
2372							1, respectively. Null ranges are OK and are silently ignored. The
2373							range is inclusive, so to delete just row 99, call row_delete(99) or
2374							row_delete(99,99).
2375
2376							EFFICIENCY NOTE: Don't call row_delete() to remove lots of individual
2377							rows. Instead, select those row numbers by setting the selection (if
2378							not already selected), and then invert the selection using
2379							selection_invert(), so the undesired rows are deselected, and then use
2380							the cull() method to rewrite the entire table at once. The deselected
2381							rows will be omitted very efficiently this way.
2382
2383							row_move() moves a row from its $Old row number to the position before
2384							the row currently in row $New (specify $New = length() to move the row
2385							to the end). Again, in shuffling data in columns, lots of data could
2386							get moved around by this operation, so expect it to be slow. If as
2387							with row_delete(), if you will be doing several moves, consider
2388							building an appropriate selection() first, and then using cull()
2389							instead.
2390
2391							Using row_delete() and row_move() to shift records around changes the
2392							record numbers of the affected records and many others in the table.
2393							The record numbers in the custom selection, if any, are updated to
2394							reflect these changes, so the records that were selected before will
2395							still be selected after the move (except those that were deleted of
2396							course). If you had a private copy of the selection, your copy will
2397							likely become outdated after these operations. You should get it
2398							again by calling selection().
2399
2400							row_empty() returns a hash whose keys are the entries in fieldlist()
2401							and whose values are undef. (You could use it to fill in values
2402							before calling row_set()). Note: in this class, row_empty() does
2403							exactly the same thing as fieldlist_hash() when the latter is called
2404							with no arguments.
2405
2406							row_exists() returns true if C<(($Num E= 0) && ($Num E $t-Elength()))>.
2407
2408							rows() calls row() for each row num in a list and returns a list of
2409							the resulting hashes.
2410
2411							row_list() gets row values as a list instead of a hash. They appear
2412							in the order specified in fieldlist() unless you supply an optional
2413							$Fields parameter listing the fields you want to get.
2414
2415							row_list_set() sets row values as a list instead of a hash. Pass your
2416							own $Fields list or undef and fieldlist() will be used. $Values
2417							should be a list with the same number of values as fields expected;
2418							any shortage will result in undef/empty values being set.
2419
2420							=cut
2421
2422							{}; ## Get emacs to indent correctly.
2423
2424							sub row
2425							{
2426	69			69	0	144	my $this = shift;
2427	69					177	my ($Num, $Row) = @_;
2428
2429							## Set if specified.
2430	69	50				188	return($this->row_set($Num, $Row)) if defined($Row);
2431
2432							## Else get.
2433	69					161	return($this->row_get($Num));
2434							}
2435
2436							sub row_get
2437							{
2438	69			69	0	82	my $this = shift;
2439	69					326	my ($Num) = @_;
2440
2441	69					138	my $Fields = $this->fieldlist();
2442	69					111	my $Row = {}; @$Row{@$Fields} = map {$this->col($_)->[$Num]} @$Fields;
	69					128
	270					1127
2443
2444	69					707	return($Row);
2445							}
2446
2447							sub row_set
2448							{
2449	3			3	0	18	my $this = shift;
2450	3					7	my ($Num, $Row) = @_;
2451
2452							## We thoughtfully sort the keys in case columns will get created
2453							## in this order.
2454
2455	3					25	my $Fields = [sort keys %$Row];
2456
2457							## Pre-extend the table to accommodate row $Num if necessary.
2458							## This will do nothing if there are not yet any fields in the
2459							## table (and the length will still be effectively zero).
2460
2461	3	100				13	$this->length($Num + 1) unless $this->length() >= $Num + 1;
2462
2463							## Insert into columns, creating them if necessary.
2464	3					8	foreach (@$Fields) {$this->col($_)->[$Num] = $Row->{$_}};
	10					31
2465
2466	3					11	return($Row); ## Why not?
2467							}
2468
2469							sub row_delete
2470							{
2471	10			10	0	310	my $this = shift;
2472	10					17	my ($First, $Last) = @_;
2473
2474							## Nothing to do if $First not specified.
2475	10	100				31	return() unless defined($First);
2476
2477							## Default is $Last is same as first (remove one row only)
2478	9	100				25	$Last = $First unless defined($Last);
2479
2480	9					22	my $LastIndex = $this->length() - 1;
2481
2482							## Restrict the range to meaningful values.
2483	9					231	$First = max($First, 0 ); ## First could be very high, to indicate a null range.
2484	9					46	$Last = min($Last, $LastIndex); ## Last could be negative, like -1, to indicate null range.
2485
2486							## Nothing to do if the range is empty.
2487	9	100				28	return() if $Last < $First;
2488
2489	8					18	my $Fields = $this->fieldlist();
2490
2491	8					15	my $RangeSize = ($Last - $First + 1);
2492
2493	8					24	foreach (@$Fields) {splice @{$this->col($_)}, $First, $RangeSize};
	32					39
	32					61
2494
2495							## Here we could have trapped all the list segments we spliced out
2496							## and return them in nice CTable-ish hash. Maybe we will some
2497							## day. This would be a way to split a range of rows out of a
2498							## table object to create another table object. We could even
2499							## call it the "split" method...
2500
2501							## After deleting the rows, we need to adjust the _Selection if
2502							## present. Deleted row numbers in the selection need to be
2503							## omitted; row numbers greater than the range of deleted ones
2504							## need to be decreased by the size of the range reduction, and
2505							## others need to be left untouched.
2506
2507	0	0				0	$this->{_Selection} =
		0
2508							[map
2509							{
2510	0					0	($_ < $First ? $_ : ## Before range: pass through.
2511							($_ <= $Last ? () : ## In range: omit.
2512							$_ - $RangeSize)) ## After range: reduce by range size
2513	8	50				39	} @{ $this->{_Selection}}]
2514							if $this->{_Selection};
2515							}
2516
2517							sub row_move
2518							{
2519	8			8	0	15	my $this = shift;
2520	8					12	my ($Old, $New) = @_;
2521
2522							## $Old and $New are required params and must not be undef.
2523	8	50	33			44	goto done unless defined($Old) && defined($New);
2524
2525							## If $Old and $New are the same or one apart, there's nothing to do.
2526	8	100				24	goto done if ($New == $Old); ## This would mean a no-op.
2527	7	100				26	goto done if ($New == $Old + 1); ## This would mean a no-op.
2528
2529	5					13	my $Length = $this->length();
2530
2531							## Ensure both $Old and $New are legal indices.
2532	5	50	33			36	goto done if (($Old < 0) \|\| ($Old > $Length - 1));
2533	5	50	33			26	goto done if (($New < 0) \|\| ($New > $Length )); ## New has a range up to $Length, meaning move to end.
2534
2535	5					16	my $Fields = $this->fieldlist_all();
2536
2537	5	100				20	if ($Old < $New) ## Move forward (to higher / later row num)
2538							{
2539							## Delete from the lower position and insert into higher -- MINUS ONE to account for prior shortening.
2540	2					5	my $Row = {};
2541	2					7	foreach (@$Fields) {$Row->{$_} = splice(@{$this->col($_)}, $Old , 1 )};
	8					10
	8					16
2542	2					6	foreach (@$Fields) { splice(@{$this->col($_)}, $New - 1, 0, $Row->{$_})};
	8					11
	8					15
2543
2544							}
2545							else ## Move backward (to lower / earlier row num)
2546							{
2547							## Delete from the higher position and insert into lower.
2548	3					6	my $Row = {};
2549	3					9	foreach (@$Fields) {$Row->{$_} = splice(@{$this->col($_)}, $Old , 1 )};
	12					13
	12					24
2550	3					9	foreach (@$Fields) { splice(@{$this->col($_)}, $New , 0, $Row->{$_})};
	12					14
	12					21
2551
2552							}
2553
2554							## After moving the rows, we need to adjust the _Selection if
2555							## present. Row numbers outside the shuffled range stay the same;
2556							## the moved row number(s) change; others (inside the range) get
2557							## shifted down or up by 1.
2558
2559	5	100				16	if ($Old < $New) ## Move forward / higher / later
2560							{
2561	6	100				27	$this->{_Selection} =
		50
		100
2562							[map
2563							{
2564	2					5	($_ == $Old ? $New - 1 : ## Moved row: change num to new - 1
2565							($_ < $Old ? $_ : ## Less than $Old: no change
2566							($_ >= $New ? $_ : ## Grtr= than $New: no change
2567							$_ - 1))) ## In range: shift down by 1.
2568	2	50				9	} @{ $this->{_Selection}}]
2569							if $this->{_Selection};
2570
2571							}
2572							else ## Move backward / lower / earlier
2573							{
2574	9	100				39	$this->{_Selection} =
		100
		100
2575							[map
2576							{
2577	3					7	($_ == $Old ? $New : ## Moved row: change num to new
2578							($_ >= $Old ? $_ : ## Grtr= than $Old: no change
2579							($_ < $New ? $_ : ## Less than $New: no change
2580							$_ + 1))) ## In range: shift up by 1.
2581	3	50				12	} @{ $this->{_Selection}}]
2582							if $this->{_Selection};
2583							}
2584
2585	8					24	done:
2586							return;
2587							}
2588
2589							sub row_empty
2590							{
2591	2			2	0	7	my $this = shift;
2592
2593	2					6	my $Fields = $this->fieldlist();
2594	2					4	my $Row = {}; @$Row{@$Fields} = undef;
	2					9
2595
2596	2					16	return($Row);
2597							}
2598
2599							sub row_exists
2600							{
2601	14			14	0	35	my $this = shift;
2602	14					27	my ($Num) = @_;
2603
2604	14		100			75	return(($Num >= 0) && ($Num < $this->length()));
2605							}
2606
2607							sub rows
2608							{
2609	4			4	0	13	my $this = shift;
2610	4					7	my ($Nums) = @_;
2611
2612	4					10	return([map {$this->row($_)} @$Nums]);
	8					21
2613							}
2614
2615							sub row_list
2616							{
2617	18			18	0	781	my $this = shift;
2618	18					31	my ($Num, $Fields) = @_;
2619
2620							## $Fields argument is optional and defaults to fieldlist();
2621	18		66			96	$Fields \|\|= $this->fieldlist();
2622
2623	18					32	my $Row = [map {$this->col($_)->[$Num]} @$Fields];
	88					226
2624	18					118	return($Row);
2625							}
2626
2627							sub row_list_set
2628							{
2629	6			6	0	37	my $this = shift;
2630	6					7	my ($Num, $Fields, $Vals) = @_;
2631
2632							## $Fields argument is optional and defaults to fieldlist();
2633	6		66			18	$Fields \|\|= $this->fieldlist();
2634
2635							## $Vals is optional and defaults to [].
2636	6		50			13	$Vals \|\|= [];
2637
2638							## Pre-extend the table to accommodate row $Num if necessary.
2639	6	50				10	$this->length($Num + 1) unless $this->length() >= $Num + 1;
2640
2641							## Set the $Vals in row $Num in the order given by $Fields.
2642	6					15	foreach (0..$#$Fields) {$this->col($Fields->[$_])->[$Num] = $Vals->[$_]}
	18					36
2643							}
2644
2645							=pod
2646
2647							=head1 ROW / RECORD COUNT (TABLE LENGTH)
2648
2649							## Getting or setting table length
2650
2651							$t->length() ## Get length
2652							$t->length_get()
2653
2654							$t->length(22) ## Set length (truncate or pre-extend)
2655							$t->length_set(22)
2656
2657							$t->extend() ## Set length of all columns to match longest
2658
2659							The length* methods assume the table already has columns of equal
2660							length. So the length of the table is the length of any field taken
2661							at random. We choose the first one in the field list.
2662
2663							Setting the length will truncate or pre-extend every column in the
2664							table to a given length as required.
2665
2666							(Pre-extending means setting each column's length via $# so that it
2667							has the correct number of entries already allocated (and filled with
2668							undef) so that operations that fill up the table can be done much more
2669							quickly than with push().
2670
2671							However, if a new column has been added directly, or a table has been
2672							constructed out of columns whose length may not initially match, the
2673							extend() method may be (should be) called to inspect all columns and
2674							extend them all to match the longest one. Note that extend() operates
2675							on all fields in the object, ignoring the custom _FieldList if any.
2676
2677							The length of a table with no columns is zero.
2678
2679							=cut
2680
2681							sub length
2682							{
2683	471			471	0	1149	my $this = shift;
2684	471					649	my ($Length) = @_;
2685
2686	471	100				1572	return(defined($Length) ?
2687							$this->length_set($Length) :
2688							$this->length_get());
2689							}
2690
2691							sub length_get
2692							{
2693	469			469	0	548	my $this = shift;
2694
2695	469					1781	my $FieldList = $this->fieldlist();
2696	469					905	my $FirstField = $FieldList->[0];
2697	469					792	my $Col = $this->{$FirstField};
2698	469	100				1240	my $Length = (ref($Col) eq 'ARRAY' ? @$Col+0 : 0);
2699
2700	469					1967	return($Length);
2701							}
2702
2703							sub length_set
2704							{
2705	26			26	0	73	my $this = shift;
2706	26					41	my ($Length) = @_;
2707
2708							## Apply the length-setting logic to any field found in the hash
2709							## OR listed in the field list. They will all be created if not
2710							## already present.
2711
2712	26					963	my $FieldList = [@{$this->fieldlist_all()}, @{$this->fieldlist()}];
	26					63
	26					1153
2713
2714	26					82	foreach my $FieldName (@$FieldList)
2715							{
2716	223					406	$#{$this->col($FieldName)} = ($Length - 1); ## $Length = 0 => $# = -1 => empty list.
	223					385
2717							};
2718
2719							## Since records might have been deleted, re-validate the
2720							## _Selection, if it is present.
2721
2722	26					94	$this->selection_validate();
2723
2724	26					76	return($Length);
2725							}
2726
2727							sub extend
2728							{
2729	22			22	0	33	my $this = shift;
2730	22					43	my $Length = 0;
2731
2732							## Find the length of the longest vector...
2733
2734	22					34	foreach (@{$this->fieldlist_all()}) {$Length = max($Length, $#{$this->{$_}} + 1)};
	22					217
	114					135
	114					307
2735
2736							## ...and set them all to be that length.
2737
2738	22					95	$this->length_set($Length);
2739							}
2740
2741							=pod
2742
2743							=head1 SELECTIONS
2744
2745							## Getting or setting the custom selection list itself (_Selection)
2746
2747							$t->selection() ## Get sel if any; else all()
2748							$t->selection_get()
2749
2750							$t->selection($List) ## Set sel (list of rec nums)
2751							$t->selection_set($List)
2752
2753							$t->selection(0) ## Remove sel (select all)
2754							$t->selection_set(undef)
2755							$t->selection_delete()
2756							$t->select_all()
2757
2758							$t->selection_inverse() ## Get inverse copy of selection
2759							$t->select_inverse() ## Invert the selection
2760
2761							$t->selection_validate() ## Remove invalid #s from sel
2762
2763							## List of all rec nums present (regardless of selection)
2764
2765							$t->all()
2766
2767							## Getting or setting just selected fields in columns
2768							## (as contrasted with col() and friends).
2769
2770							$t->sel($ColName) ## Get col but only records in sel
2771							$t->sel_get($ColName)
2772
2773							$t->sel($ColName, $ListRef) ## Set selected fields in col...
2774							$t->sel_set($ColName, $ListRef) ##... in selection order
2775
2776							$t->sel_set($ColName) ## Set selected fields to undef
2777							$t->sel_clear($ColName)
2778
2779							$t->sels($ColList) ## Like cols, but selected fields
2780							$t->sels_hash($ColList) ## " " cols_hash()... " " " "
2781
2782							## Finding out size of selection (number of rows)
2783
2784							$t->sel_len() ## Get number of selected rows.
2785
2786							A selection is an ordered list of record numbers. The record numbers
2787							in the selection may be a subset of available records. Furthermore,
2788							they may be in non-record-number order, indicating that the records
2789							have been sorted.
2790
2791							Record numbers are numeric array indices into the columns in the
2792							table. It is an error for any selection list to contain an index less
2793							than zero or greater than (length() - 1), so if you set a selection
2794							explicitly, be careful.
2795
2796							Any selection list you get or set belongs to the object. Be careful
2797							of modifying its contents.
2798
2799							The custom selection, if any, is stored internally in the _Selection
2800							parameter. If this parameter is absent, the selection defaults to
2801							all() -- i.e. a list of all record numbers, in order:
2802							[0..($this->length() - 1)] (which becomes [] if length() is 0).
2803
2804							REMEMBER: length() is one-based, but record numbers are zero-based.
2805
2806							Removing the selection (that is, removing the LIST itself of which
2807							records are selected), is the same as selecting all records.
2808							consequently, selection(0), selection_delete(), and select_all() are
2809							all synonymous.
2810
2811							selection_validate() removes any entries from the current _Selection
2812							list (if any) that are not valid record numbers -- i.e. it removes any
2813							record whose integer value is < 0 or greater than length() - 1. This
2814							routine is mainly used by other methods that might delete records,
2815							such as length_set().
2816
2817							Getting or setting just selected data from columns
2818
2819							Sometimes, you don't want to get/set entire columns, you instead want
2820							to get or set data in just the selected fields in a column.
2821
2822							The sel(), sel_get(), sel_set(), sels() and sels_hash() methods are
2823							analagous to the corresponding col(), ... cols_hash() methods except
2824							in these two ways:
2825
2826							- the 'sels' variants get or set just selected data, as determined by
2827							the current selection(), which gives an ordered list of the selected /
2828							sorted records.
2829
2830							- the 'sels' variants all make COPIES of the data you request or
2831							supply -- the data is copied out of or into the correspnding column.
2832							So, you "own" any vector you pass or receive in reply.
2833
2834							So, for example, imagine you have just set selection() to only list
2835							record numbers where the LastName field is not empty. Then you have
2836							called sort() to sort those record numbers by the LastName field. You
2837							could then call $t->sel('LastName') to get a sorted list of all
2838							non-empty last names.
2839
2840							It might be helpful to think of "sel" as short for "selected". So
2841							$t->sel('LastName') would mean "get the selected field values from the
2842							LastName field".
2843
2844							=cut
2845
2846							sub selection
2847							{
2848	477			477	1	703	my $this = shift;
2849	477					585	my ($Selection) = @_;
2850
2851							## Set if specified.
2852	477	100				1067	$this->selection_set($Selection) if defined($Selection);
2853
2854							## Get and return.
2855	477					1160	$Selection = $this->selection_get();
2856
2857	477					1547	return($Selection);
2858							}
2859
2860							sub selection_get
2861							{
2862	478			478	0	1715	my $this = shift;
2863
2864	478		66			1604	my $Selection = $this->{_Selection} \|\| $this->selection_default();
2865
2866	478					877	return($Selection);
2867							}
2868
2869							sub selection_set
2870							{
2871	176			176	0	260	my $this = shift;
2872	176					254	my ($Selection) = @_;
2873
2874	176	100				458	if (ref($Selection) eq 'ARRAY')
2875							{
2876							## Set if specified...
2877	36					75	$this->{_Selection} = $Selection;
2878							}
2879							else
2880							{
2881							## Otherwise, delete and return original if any.
2882	140					336	$Selection = delete $this->{_Selection};
2883							}
2884
2885	176					323	return($Selection);
2886							}
2887
2888							sub selection_delete
2889							{
2890	2			2	0	4	my $this = shift;
2891	2					8	$this->selection_set();
2892							}
2893
2894							sub select_all
2895							{
2896	87			87	0	142	my $this = shift;
2897	87					305	$this->selection_set();
2898							}
2899
2900							sub select_none
2901							{
2902	4			4	0	10	my $this = shift;
2903	4					17	$this->selection_set([]);
2904							}
2905
2906							sub selection_default
2907							{
2908	322			322	0	373	my $this = shift;
2909
2910	322					653	my $Selection = $this->all();
2911
2912	322					1049	return($Selection);
2913							}
2914
2915							sub all
2916							{
2917	343			343	0	458	my $this = shift;
2918	343					857	my $RowNums = [0..($this->length() - 1)];
2919
2920	343					631	return($RowNums);
2921							}
2922
2923							sub selection_inverse
2924							{
2925	9			9	0	17	my $this = shift;
2926	9					24	my $Sel = $this->selection();
2927	9					26	my $All = $this->all();
2928
2929	9					22	@$All[@$Sel] = undef;
2930	9					121	$All = [grep {defined} @$All];
	27					63
2931
2932	9					33	return($All);
2933							}
2934
2935							sub select_inverse
2936							{
2937	1			1	0	4	my $this = shift;
2938
2939	1					5	return($this->{_Selection} = $this->selection_inverse());
2940							}
2941
2942							sub selection_validate
2943							{
2944	27			27	0	44	my $this = shift;
2945
2946	27	100				203	if (ref($this->{_Selection}) eq 'ARRAY')
2947							{
2948	1					6	$this->{_Selection} = $this->selection_validate_internal($this->{_Selection});
2949							}
2950							}
2951
2952							sub selection_validate_internal
2953							{
2954	19			19	0	33	my $this = shift;
2955	19					32	my ($Selection) = @_;
2956
2957	19					59	my $Length = $this->length();
2958
2959	19	100				41	$Selection = [grep {(($_ >= 0) && ($_ < $Length))} @$Selection];
	71					357
2960
2961	19					46	return($Selection);
2962							}
2963
2964							sub sel ## ($ColName, [$Vector])
2965							{
2966	284			284	0	376	my $this = shift;
2967	284					484	my ($ColName, $Vector) = @_;
2968
2969							## Set if specified.
2970	284	50				819	if (defined($Vector))
2971							{
2972	0					0	$this->sel_set($ColName, $Vector);
2973							## Nothing to return.
2974							}
2975							## Get and return.
2976							else
2977							{
2978	284					1476	my $Sel = $this->sel_get($ColName);
2979	284					998	return($Sel);
2980							}
2981							}
2982
2983							sub sel_get
2984							{
2985	318			318	0	372	my $this = shift;
2986	318					420	my ($ColName, $Selection) = @_;
2987
2988	318					726	my $Col = $this->col($ColName);
2989	318		66			10828	$Selection \|\|= $this->selection();
2990	318					986	my $Sel = [@$Col[@$Selection]];
2991
2992	318					661	return($Sel);
2993							}
2994
2995							sub sel_set ## ($ColName, [$Vector])
2996							{
2997	5			5	0	8	my $this = shift;
2998	5					11	my ($ColName, $Vector) = @_;
2999
3000	5					12	my $Col = $this->col($ColName);
3001	5					10	my $Selection = $this->selection();
3002
3003	5	100	66			27	if (defined($Vector) && (ref($Vector) eq 'ARRAY'))
3004							{
3005	3					12	@$Col[@$Selection] = @$Vector;
3006							}
3007							else
3008							{
3009	2					8	@$Col[@$Selection] = undef;
3010							}
3011							}
3012
3013							sub sel_clear ## ($ColName)
3014							{
3015	2			2	0	4	my $this = shift;
3016	2					4	my ($ColName) = @_;
3017
3018	2					7	$this->sel_set($ColName);
3019							}
3020
3021							sub sel_len
3022							{
3023	20			20	0	67	my $this = shift;
3024
3025	16					162	return(ref($this->{_Selection}) eq 'ARRAY' ?
3026	20	100				69	@{$this->{_Selection}}+0 :
3027							$this->length());
3028							}
3029
3030							sub sels ## ($ColNames)
3031							{
3032	56			56	0	98	my $this = shift;
3033	56					163	my ($ColNames) = @_;
3034	56		66			151	$ColNames \|\|= $this->fieldlist();
3035	56					322	my $Sels = [map {$this->sel($_)} @$ColNames];
	216					491
3036
3037	56					292	return($Sels);
3038							}
3039
3040							sub sels_hash ## ($ColNames)
3041							{
3042	4			4	0	10	my $this = shift;
3043	4					10	my ($ColNames) = @_;
3044	4		66			23	$ColNames \|\|= $this->fieldlist();
3045	4					17	my $Sels = $this->sels($ColNames);
3046	4					11	my $SelsHash = {}; @$SelsHash{@$ColNames} = @$Sels;
	4					18
3047
3048	4					51	return($SelsHash);
3049							}
3050
3051							=pod
3052
3053							=head1 SEARCHING / SELECTING RECORDS
3054
3055							## Modifying the table's custom selection (_Selection)
3056
3057							$t->select_all() ## Set _Selection = $t->all() or undef
3058							$t->select_none() ## Set _Selection = []
3059							$t->select_inverse() ## Invert the curr. sel. (and get it)
3060
3061							## Specific searches: "the select() methods"
3062
3063							$t->select($Field1=>$Sub1, ## Del nonmatching recs from sel.
3064							$Field2=>$Sub2, ## i.e. narrow sel. to match
3065							...);
3066
3067							$t->omit ($Field1=>$Sub1, ## Del matching recs from sel.
3068							$Field2=>$Sub2,
3069							...);
3070
3071							$t->add ($Field1=>$Sub1, ## Add matching recs to sel.
3072							$Field2=>$Sub2,
3073							...);
3074
3075							$t->but ($Field1=>$Sub1, ## Add nonmatching recs to sel.
3076							$Field2=>$Sub2,
3077							...);
3078
3079							## Getting useful lists of record numbers...
3080
3081							$t->all() ## Get "full" sel. (all record #s)
3082							$t->selection() ## Get current selection
3083							$t->selection_inverse() ## Get inverse copy of curr. sel.
3084
3085							## Example 1: Refine a selection by narrowing down...
3086
3087							$t->select_all()
3088							$t->select(Field1 => sub {$_});
3089							$t->select(Field2 => sub {$_});
3090							$t->select(Field3 => sub {$_});
3091
3092							## Example 2: Manually refine and set the selection...
3093
3094							$Sel = [grep {$t->col($Field1)->[$_]} @{$t->all ()}];
3095							$Sel = [grep {$t->col($Field2)->[$_]} @$Sel];
3096							$Sel = [grep {$t->col($Field3)->[$_]} @$Sel];
3097							$t->selection($Sel); ## Set the selection when done.
3098
3099							## Example 3: Complex manual search using calculated value
3100
3101							my $A = $t->col('A');
3102							my $B = $t->col('B');
3103							my $S = [grep
3104							{my $X = $A->[$_] + $B->[$_]; ($X > 100 && $X < 200);}
3105							@{$t->all()}]; ## Or could start with $t->selection().
3106							$t->selection($S); ## Set the selection when done.
3107
3108							## Example 4: Refine a selection by building up...
3109
3110							$t->select_none()
3111							$t->add(Field1 => sub {$_});
3112							$t->add(Field2 => sub {$_});
3113							$t->add(Field3 => sub {$_});
3114
3115							## Example 5: Combining the select() methods to build a query...
3116
3117							$t->select_all()
3118							$t->select(Status => sub {/prime/i });
3119							$t->omit (DueDate => sub {$_ > $Today});
3120							$t->add (Force => sub {$_ });
3121
3122							select() and its friends omit(), add(), and but(), known collectively
3123							as "the select() methods," all work similarly: they take a series of
3124							one or more pairs indicating matches to be done, where each match is
3125							specified as (FieldName => Subroutine).
3126
3127							In addition to the field names already present in the table, the
3128							FieldName in any Spec may also be one of these two special
3129							pseudo-fields:
3130
3131							=over 4
3132
3133							=item _RecNum
3134
3135							the record number of the record being compared
3136
3137							=item _SelNum
3138
3139							the numerical position of the record being compared within the
3140							previous selection (only usable with select() and omit() since add()
3141							and but() by definition operate on non-selected records).
3142
3143							=back
3144
3145							For example:
3146
3147							## Match 2nd 100 rec numbers
3148							$t->select(_RecNum => sub {$_ >= 100 && $_ <= 199});
3149
3150							## Match 2nd 100 currently selected/sorted items
3151							$t->select(_SelNum => sub {$_ >= 100 && $_ <= 199});
3152
3153							Be careful when using _SelNum in a search. In the above _SelNum search
3154							example, since the selection itself will be modified by select(), the
3155							items that were formerly selection items 100 - 199 will now be _SelNum
3156							0 - 99 in the new selection.
3157
3158							The Subroutine is an anonymous grep-style predicate that operates on
3159							$_ and should return true/false to indicate a match with an element of
3160							the field FieldName.
3161
3162							The order of multiple matches in a single method call is significant
3163							only in that the searches can be faster if the field that will match
3164							the fewest records is listed first.
3165
3166							A given FieldName may be listed in the specs more than once if it has
3167							multiple search criteria that you prefer to execute as multiple
3168							subroutines (though it would be more efficient on very large tables to
3169							combine their logic into one subroutine joined with "&&").
3170
3171							Each field match will be applied (with an implied AND joining them) to
3172							determine whether the record itself matches. Then, based on whether
3173							the record itself matches, it will either be added or deleted from the
3174							selection based on which method is being called:
3175
3176							method... operates on... action....
3177							------------------------------------------------------------------
3178							select() selected records Keep only recs that DO match
3179							omit() selected records Keep only recs that DO NOT match
3180							add() non-selected recs Add recs that DO match
3181							but() non-selected recs Add recs that DO NOT match
3182
3183							Here's how to think about what's going on:
3184
3185							methods... think...
3186							------------------------------------------------------------------
3187							select() "SELECT things matching this"...
3188							omit() "... then OMIT those matching this."
3189
3190							select() "SELECT things matching this"...
3191							add() "... and ADD any others matching this."
3192
3193							select() "SELECT things matching this"...
3194							but() "... and add any others BUT those matching this."
3195
3196							select() and omit() both NARROW the selection.
3197
3198							add() and but() both INCREASE the selection.
3199
3200							IMPORTANT: You DO NOT need to use these select() routines to work with
3201							selections. It may be much easier for you to clarify your logic, or
3202							more efficient to express your search, using a single grep or series
3203							of grep operations as in Examples 2 or 3 above.
3204
3205							Building the selection manually is required if you want to filter
3206							based on any COMPLEX RELATIONSHIPS BETWEEN FIELDS. For example, if
3207							you want to add two fields and match or reject the record based on the
3208							sum of the fields.
3209
3210							In Example 3 above, we add the values in fields "A" and "B" and then
3211							match the record only if the SUM is between 100 and 199. By grepping
3212							to produce a subset of @{$t->all()}, you end up with a Selection -- a
3213							list of record numbers you want "selected". Then you call
3214							$t->selection() to put the selection you built into the object.
3215
3216							If you had instead wanted to narrow an existing selection in the above
3217							example, you would start with $t->selection() (which defaults to
3218							$t->all()) instead of starting with $t->all().
3219
3220							Each of the select() methods returns $this->selection() as a
3221							convenience.
3222
3223							=head2 The effects of modifying a sorted selection
3224
3225							Generally, you should sort AFTER finding, and you should not generally
3226							rely on sort order after doing a find. But in case you want to know,
3227							the following chart explains what happens to the sort order after the
3228							various select() commands are called (at least in the current
3229							implementation, which may change without notice):
3230
3231							method... effect on an existing sort order...
3232							------------------------------------------------------------------
3233							select() relative sort order is preserved (stay sorted)
3234							omit() all selected recs restored to "natural" order (unsorted)
3235							add() orig. recs preserved; new recs appended: "natural" order
3236							but() orig. recs preserved; new recs appended: "natural" order
3237
3238							In other words, you could sort() first and then call select() to
3239							narrow down the selection repeatedly without disrupting the sort
3240							order. However, any of the other methods will disrupt the sort order
3241							and you would need to re-sort. The preservation of order when using
3242							select(), and other sort order effects, are likely but not guaranteed
3243							to be preserved in future implementations.
3244
3245							=head2 Hints about Boolean logic
3246
3247							Consider the following example and the alternative below it. You
3248							might initially think these are equivalent, but they're not:
3249
3250							## Case 1:
3251
3252							my $Sel = $t->add(Force => sub {$_ == 1 });
3253							my $Sel = $t->add(Status => sub {$_ eq 'Prime'});
3254
3255							## Case 2:
3256
3257							my $Sel = $t->add(Force => sub {$_ == 1 },
3258							Status => sub {$_ eq 'Prime'});
3259
3260							Case 1 extends the selection by adding all records where Force == 1,
3261							and then extends it again by adding all additional records where
3262							Status eq 'Prime'.
3263
3264							Case 2 adds only those records where: Force == 1 AND ALSO, IN THE SAME
3265							RECORD, Status eq 'Prime'.
3266
3267							One final note about logic. This is not SQL and these select() etc.
3268							routines are not meant to replace the full power of a programming
3269							language.
3270
3271							If you want full Boolean expressions, use the power of Perl to form
3272							your own arbitrarily complex query using grep as in Example 3 above.
3273
3274							Writing your own grep is also almost always faster than chaining the
3275							builtin select() methods or using multiple Field / Sub specifications,
3276							so keep that in mind when working with extremely large data sets.
3277
3278							With tables of only a few thousand records or so, you probably won't
3279							notice the difference in efficiency.
3280
3281							=cut
3282
3283							{}; ## Get emacs to indent correctly.
3284
3285	4			4	0	18	sub select {my $this = shift; return($this->select_internal(!'Add', !'Not', @_))}
	4					15
3286	5			5	0	16	sub omit {my $this = shift; return($this->select_internal(!'Add', 'Not', @_))}
	5					24
3287	3			3	0	5	sub add {my $this = shift; return($this->select_internal( 'Add', !'Not', @_))}
	3					11
3288	1			1	0	3	sub but {my $this = shift; return($this->select_internal( 'Add', 'Not', @_))}
	1					6
3289
3290							sub select_internal ## Implements all 4 "select() methods"
3291							{
3292	13			13	0	19	my $this = shift;
3293	13					39	my ($Add, $Not, @Specs) = @_;
3294
3295							## In "Add" mode, we only operate on not-yet-selected records.
3296							## Otherwise, we operate on the current selection.
3297
3298							## Either way, start out with all of one or the other.
3299
3300	13	100				151	my $Start = ($Add ? $this->selection_inverse() : $this->selection());
3301
3302							## Then grep repeatedly for each Field/Sub spec we were given, in
3303							## order. For a record to match, ALL specs must match -- i.e. the
3304							## record number must make it through the grep gauntlet once for
3305							## each Field/Sub in the Specs.
3306
3307	13					60	my $Pseudo = {}; ## hold pseudo-columns _RecNum, _SelNum if requested.
3308
3309	13					30	my $Matches = $Start;
3310
3311	13					19	my $i = 0;
3312	13					39	while ($i < (@Specs - 1))
3313							{
3314	13					35	my ($Field, $Sub) = @Specs[$i++, $i++];
3315
3316	13	50	33			78	next unless ((length($Field)) && (ref($Sub) eq 'CODE'));
3317
3318							## Create pseudo-fields _RecNum / _SelNum if needed, but at
3319							## most once per invocation.
3320
3321	13	50	0			34	$Pseudo->{_RecNum} \|\|= $this->all() if ($Field eq '_RecNum');
3322	13	50	0			39	$Pseudo->{_SelNum} \|\|= $this->selnum_mask() if ($Field eq '_SelNum');
3323
3324							## Narrow down $Matches using this Field/Spec, then move on to next.
3325	36	50	33			135	$Matches =
3326							[grep
3327							{
3328							## Locally bind $_ to value of field in this column / this record.
3329	13					26	local $_ = $ {($this ->{$Field} \|\| ## 98% of time: good field name
	36					134
3330							$Pseudo->{$Field} \|\| ## 1% of time: _RecNum or _SelNum
3331							($this ->warn("Bad field name: $Field"),
3332							$this ->col_empty())
3333							)}[$_]; ## look up value in record $_ of column
3334
3335							## Call the sub & let it yield the Boolean value.
3336	36					224	&$Sub();
3337							}
3338							@$Matches];
3339							}
3340
3341							## IMPLEMENTATION NOTE:
3342
3343							## The logic below to support "Not" looks complicated, and indeed
3344							## it could be made cleaner if we were to process the "Not" logic
3345							## during the previous step. However, doing that would make the
3346							## above nested loop(s) above much less efficient because we'd
3347							## have to move the while() loop inside the grep -- repeating that
3348							## loop up to several times for each record instead of just a few
3349							## times total. So the logic below will actually save execution
3350							## time. Besides, using array-slicing to achieve the selection
3351							## masking is quite fast.
3352
3353							## "Add" means append the matching record numbers to the existing
3354							## selection, if any.
3355
3356							## sub select()
3357	13	100	100			219	if (!$Not && !$Add) ## ... i.e. remove unfound (i.e. keep (only) the ones we found)...
		100	66
		100	66
		50	33
3358							{
3359	4					9	$this->{_Selection} = $Matches; ## sort order is preserved...
3360							}
3361
3362							## sub add() ## ... i.e. add the ones we found...
3363							elsif (!$Not && $Add)
3364							{
3365	3					5	push @{$this->selection()}, @$Matches; ## order preserved in first part only
	3					9
3366							}
3367
3368							## ! "Add" means replace the existing selection with those that
3369							## matched (effectively removing any non-matching ones).
3370
3371							## "Not" means select the opposite of the set of records we just
3372							## matched.
3373
3374							## sub omit()
3375							elsif ( $Not && !$Add) ## ... i.e. remove the opposite of the ones we found...
3376							{
3377	5					33	my $Sel = $this->col_empty(); ## Start with empty mask (all entries undef).
3378	5					17	@$Sel[@$Start] = @$Start; ## Mask in those in the original selection.
3379	5					14	@$Sel[@$Matches] = undef; ## Mask out those we found.
3380	5					14	my $NonMatches = [grep {defined} @$Sel]; ## The remaining ones are the non-matches.
	15					34
3381
3382	5					17	$this->{_Selection} = $NonMatches; ## The new selection IS the non-matches.
3383
3384							## selection order not preserved
3385							}
3386
3387							## sub but()
3388							elsif ( $Not && $Add) ## ... i.e. add the opposite of the ones we found...
3389							{
3390	1					5	my $Sel = $this->all(); ## Start with a full selection mask.
3391	1					3	@$Sel[@{$this->selection()}] = undef; ## Mask out those in original selection.
	1					3
3392	1					3	@$Sel[@$Matches ] = undef; ## Mask out those we found.
3393	1					3	my $NonMatches = [grep {defined} @$Sel]; ## The remaining ones are the non-matches.
	3					8
3394
3395	1					2	push @{$this->selection()}, @$NonMatches; ## Add the non-matches to the selection.
	1					3
3396
3397							## order preserved in first part only
3398							}
3399
3400	13					35	return($this->selection());
3401							}
3402
3403							sub selnum_mask ## Create mask mapping RecNum -> selected item num or undef if not selected
3404							{
3405	0			0	0	0	my $this = shift;
3406	0					0	my $Mask = $this->col_empty();
3407	0					0	my $Sel = $this->selection();
3408	0					0	@$Mask[@$Sel] = [0..$#$Sel];
3409
3410	0					0	return($Mask);
3411							}
3412
3413							=pod
3414
3415							=head1 SORTING
3416
3417							## Sort the current table's _Selection
3418
3419							$t->sort() ## Use existing/default params
3420							$t->sort([qw(Last First Phone)]) ## Specify _SortOrder (fields)
3421							$t->sort( ## Named-parameter call:
3422							_SortOrder => [...], ## override sort-related params.
3423							_Selection => [...], ## (See param lists above).
3424							_SortSpecs => {...},
3425							_SRoutines => {...},
3426							_DefaultSortType=>'Integer',
3427							_DefaultSortDirection=>-1,
3428							);
3429
3430							The sort() method modifies the _Selection (creating one with all
3431							records if it was missing, undef, or not supplied by caller) so that
3432							the record numbers listed there are sorted according to the criteria
3433							implied by _SortOrder, _SortSpecs, _SRoutines, etc.
3434
3435							For example, before sorting, a table's "natural" order might be:
3436
3437							Rec# First Last Age State
3438							0 Chris Zack 43 CA
3439							1 Marco Bart 22 NV
3440							2 Pearl Muth 15 HI
3441
3442							... and its selection() method would yield: [0, 1, 2] -- which is a
3443							list of all the records, in order.
3444
3445							After calling $t->sort([Last]), selection() would yield [1, 2, 0]. So
3446							displaying the table in "selection" order would yield:
3447
3448							Rec# First Last Age State
3449							1 Marco Bart 22 NV
3450							2 Pearl Muth 15 HI
3451							0 Chris Zack 43 CA
3452
3453							IMPORTANT: sorting does not alter any data in the table. It merely
3454							alters the _Selection parameter (which you can then get and set using
3455							the selection() methods described above).
3456
3457							If you want to permanently alter the table's data in memory so that
3458							the new sorted order becomes the "natural" order, you can use the
3459							cull() method to modify the original object, the snapshot() method to
3460							make a new object, or use the write() method to write the data to disk
3461							in selected/sorted order and then read() it back again.
3462
3463							=head2 Using the Named-parameter calling convention with sort()
3464
3465							You may specify any combination of the parameters listed above when
3466							calling sort(). Any you specify will be used IN PLACE OF the
3467							corresponding parameters already found in the object.
3468
3469							If you specify _Selection using the named-parameter calling, the
3470							sort() method reserves the right to "own" the list you provide, and
3471							use it as the object's new _Selection, possibly discarding the
3472							previous _Selection, if any and modifying the one you provided. So
3473							don't make any assumptions about ownership of that list object after
3474							calling sort(). Luckily, you will rarely need to provide _Selection
3475							explicitly since generally you'll want to be sorting the selection()
3476							already inherent in the object.
3477
3478							sort() returns the _Selection list owned by the object (the same list
3479							that would be returned if you called the selection() method
3480							immediately after calling sort()).
3481
3482							See the next sections for complete descriptions of _SortOrder and
3483							other sorting parameters.
3484
3485							=cut
3486
3487							{}; ## Get emacs to indent correctly.
3488
3489							sub sort
3490							{
3491	10			10	1	36	my $this = shift;
3492	10	100				49	my $Params = (@_ == 1 ? {_SortOrder => $_[0]} : {@_});
3493
3494	60					120	my($SortOrder, $Selection, $SortSpecs, $SRoutines, $DefaultSortType, $DefaultSortDirection)
3495	10					25	= map {$this->getparam($Params, $_)}
3496							qw(_SortOrder _Selection _SortSpecs _SRoutines _DefaultSortType _DefaultSortDirection);
3497
3498							## Validate / rectify all parameters...
3499
3500							## Default sort order is Record Number
3501
3502	10	50	33			70	$SortOrder = [qw(_RecNum)] unless ((ref($SortOrder) eq 'ARRAY') && @$SortOrder);
3503
3504							## Note if we're going to sort on _RecNum ( requires extra work).
3505
3506	10					20	my $NeedRecNum = grep {$_ eq '_RecNum'} @$SortOrder;
	14					38
3507
3508							## Default list of record numbers is all of them.
3509
3510	10	100				39	$Selection = $this->selection() unless (ref($Selection) eq 'ARRAY');
3511	10					34	$Selection = $this->selection_validate_internal($Selection);
3512
3513							## Our private copy of SortSpecs includes a spec for _RecNum
3514
3515	10	50				38	$SortSpecs = {} unless (ref($SortSpecs) eq 'HASH');
3516	10					27	$SortSpecs = {%$SortSpecs};
3517	10	50	0			27	$SortSpecs ->{_RecNum} \|\|= {SortType => '_RecNum', SortDirection => 1} if $NeedRecNum;
3518
3519							## Our private copy of SRoutines also has the builtin entries
3520							## added in (including one for _RecNum)
3521
3522	10	50				30	$SRoutines = {} unless (ref($SRoutines) eq 'HASH');
3523	10					17	$SRoutines = {%$SRoutines, %{$this->sortroutines_builtin()}};
	10					41
3524
3525							## Ensure that DefaultSortType has a reasonable value for which we
3526							## have a sort routine.
3527
3528	10	50	33			84	$DefaultSortType = 'String' unless (length($DefaultSortType) &&
3529							exists($SRoutines->{$DefaultSortType}));
3530
3531							## Ensure that DefaultSortDirection has a legal value (1 or -1;
3532							## undef/0 will be treated as -1 (descending))
3533
3534	10		50			36	$DefaultSortDirection = (max(min(int($DefaultSortDirection), 1), -1) \|\| -1);
3535
3536							## Make some optimized lists of things to speed sorting.
3537
3538							## Get a hash of all data columns in $this plus a temporary one
3539							## for _RecNum if needed.
3540	10	50				19	my $Cols = {%{$this->cols_hash($this->fieldlist_all())},
	10					32
3541							($NeedRecNum ? (_RecNum => $this->all()) : ())};
3542
3543							## Get a list mapping field numbers in $SortOrder to data columns
3544							## Get a list mapping field numbers in $SortOrder to sort directions
3545							## Get a list mapping field numbers in $SortOrder to sort types
3546							## Get a list mapping field numbers in $SortOrder to sort routines
3547							## Get a list of the field numbers in $SortOrder
3548
3549	10	50				36	my $SortCols = [map { $Cols->{$_} \|\| $this->col($_) } @$SortOrder];
	14					60
3550	10	100				19	my $SortDirs = [map {$ {$SortSpecs->{$_}\|\|{}}{SortDirection} \|\| $DefaultSortDirection} @$SortOrder];
	14	100				18
	14					132
3551	10	100				21	my $SortTypes = [map {$ {$SortSpecs->{$_}\|\|{}}{SortType } \|\| $DefaultSortType } @$SortOrder];
	14	100				18
	14					138
3552	10	50	33	0		213	my $SortSubs = [map { $SRoutines->{$_} \|\| $SRoutines->{'String'} \|\| sub {0} } @$SortTypes];
	14					61
	0					0
3553	10					29	my $FieldNums = [0 .. $#$SortOrder];
3554
3555							## Construct a sort subroutine that sorts record numbers by
3556							## examining values in given fields in the table, in the order
3557							## specified in $SortOrder.
3558
3559							## If a given field's sort routine produces a zero $CmpVal, it
3560							## means that the values are considered equal, and so to
3561							## disambiguate, we keep trying the next fields in the sort order,
3562							## until we've found one that compares non-zero or exhausted all
3563							## the fields. If we get through all the specified sort fields
3564							## and still get zeroes, the values must be equal in all the
3565							## fields, and so the records are considered equal, so return 0.
3566
3567	10					16	my $ProgCount;
3568							my $ShowedProgress;
3569	58					67	$Selection =
3570							[sort
3571							{
3572	10					31	my $CmpVal;
3573	58					102	foreach (@$FieldNums)
3574							{
3575							## $a and $b are record numbers to be compared.
3576							## $_ is the number of a field in the above lists.
3577
3578	62					119	$CmpVal = (&{ $SortSubs->[$_] } ## Call the sort routine for field $_ with...
	62					132
3579							(\ $SortCols->[$_]->[$a], ## 1st arg: ref to value in field $_, record $a
3580							\ $SortCols->[$_]->[$b]) ## 2nd arg: ref to value in field $_, record $b.
3581							* $SortDirs->[$_] ); ## Then invert cmp value if descending (-1)
3582
3583							# print "($_, $SortCols->[$_]->[$a], $SortCols->[$_]->[$b]) ==> $CmpVal\n";
3584
3585	62	100				158	last if $CmpVal; ## Keep going if $CmpVal == 0 (same)
3586							}
3587
3588							## Maybe show timed progress (only after 2 seconds have elapsed)
3589	58	100				167	my $Did = $this->progress_timed("Sorting", $ProgCount, undef, undef, 1)
3590							if ((($ProgCount++) % 200) == 0);
3591	58		33			195	$ShowedProgress \|\|= $Did;
3592
3593	58					97	$CmpVal;
3594							}
3595							@$Selection];
3596
3597							## If no progress shown yet (sort took less than 2 seconds or 200
3598							## operations), show a message now.
3599
3600	10	50				44	$this->progress("Sorted.") unless $ShowedProgress;
3601
3602							## Replace any existing selection with the new, sorted, one.
3603	10					23	$this->{_Selection} = $Selection;
3604
3605	10					66	done:
3606							return($Selection);
3607							}
3608
3609							=pod
3610
3611							=head1 SORT ORDER
3612
3613							## Getting / Setting table's default _SortOrder
3614
3615							$t->sortorder() ## Get sortorder (default is [])
3616
3617							my $Order = [qw(Last First State Zip)];
3618							$t->sortorder($Order) ## Set sortorder (use [] for none)
3619
3620							$t->sortorder_default() ## Get the object's default sort order ([])
3621
3622							The sort order is an optional list of field names on which to sort and
3623							sub-sort the data when sorting is requested. The field names must be
3624							the names of actual columns in the table. The names in the sort order
3625							do not necessarily need to coincide with the custom fieldlist if any.
3626
3627							There is one special value that can be included: _RecNum. This sorts
3628							on the imaginary "record number" field. So for example, you could
3629							specify a sort order this way:
3630
3631							[qw(Last First _RecNum)]
3632
3633							(There is no point in putting _RecNum anywhere except at the end of
3634							the sort order because no two records will ever have the same record
3635							number so there will be no further need to disambiguate by referring
3636							to additional fields.)
3637
3638							Sorting by _RecNum adds a bit of computational overhead because sort()
3639							first builds a record number vector for use in sorting, so for very
3640							large tables, don't do it unless you really need it.
3641
3642							A sort order can be specified each time the object is sorted (see the
3643							sort() method for details).
3644
3645							Or, the object's sort order can be set once, and then sort() will use
3646							that sort order when no other sort order is specified.
3647
3648							If sorting is done when there is no sort order present in the object
3649							or specifed for the sort() method, the selection is sorted by record
3650							number (i.e. it is "unsorted" or returned to its "natural" order).
3651
3652							In order words, a sortorder that is undef or [] is considered the same
3653							as: [qw(_RecNum)]. This is sometimes called "unsorting".
3654
3655							In order to decide how values in each field should be compared, sort()
3656							is informed by SortSpecs (specifying SortType and SortDirection for
3657							each field) and by SortRoutines, each of which may similarly either be
3658							pre-set for the object or specified when calling sort() -- see below
3659							for further details.
3660
3661							=cut
3662
3663							{}; ## Get emacs to indent correctly.
3664
3665							sub sortorder
3666							{
3667	8			8	0	15	my $this = shift;
3668	8					15	my ($SortOrder) = @_;
3669
3670	8		100			35	my $Valid = ((ref($SortOrder) eq 'ARRAY') && (@$SortOrder > 0));
3671
3672							## Set if specified.
3673	8	100	100			41	if (defined($SortOrder) && $Valid)
		100
3674							{
3675	2					6	$this->{_SortOrder} = $SortOrder;
3676							}
3677							elsif (defined($SortOrder))
3678							{
3679	1					3	$this->{_SortOrder} = undef; ## Store undef instead of []
3680							}
3681
3682							## Get and return.
3683	8		66			32	$SortOrder = $this->{_SortOrder} \|\| $this->sortorder_default();
3684
3685	8					33	return($SortOrder);
3686							}
3687
3688							sub sortorder_default
3689							{
3690	5			5	0	7	my $this = shift;
3691
3692	5					10	my $SortOrder = [];
3693
3694	5					20	return($SortOrder);
3695							}
3696
3697							sub sortorder_check
3698							{
3699	80			80	0	142	my $this = shift;
3700	80					267	my $FieldsHash = $this->fieldlist_hash();
3701
3702							## Remove any bogus field names from the sort order, if any.
3703
3704	4					15	$this->{_SortOrder} = [grep {exists($FieldsHash->{$_})}
	1					4
3705	80	100				404	@{$this->{_SortOrder}}] if defined $this->{_SortOrder};
3706							}
3707
3708							=pod
3709
3710							=head1 SORT SPECIFICATIONS
3711
3712							## Getting / Setting table's default _SortSpecs
3713
3714							$t->sortspecs() ## Get sortspecs (default is {} -- none)
3715
3716							my $Specs = {Last => {SortType => 'String' ,
3717							SortDirection => -1 },
3718							Zip => {SortType => 'Integer' }};
3719
3720							$t->sortspecs($Specs) ## Set sortspecs
3721
3722							$t->sortspecs_default() ## Get the object's default sort specs ({})
3723
3724							The sortspecs are an optional hash mapping field names to "sort
3725							specifications".
3726
3727							Each field's sort specification may specify zero or more of these
3728							values:
3729
3730							=over 4
3731
3732							=item SortType
3733
3734							the sort type to use (For example: String, Integer)
3735
3736							=item SortDirection
3737
3738							the sort direction (1: ascending, -1: descending)
3739
3740							=back
3741
3742							Sortspecs can be specified when calling the sort() routine, or, a set
3743							of specs can be placed beforehand into the object itself and those
3744							will be used by sort() if no other specs are given.
3745
3746							For any field listed in the sort order at the time of sorting, but
3747							lacking a sort spec or any component of the sort spec, the object's
3748							default sort type (see sorttype_default()) and default sort direction
3749							(see sortdirection_default()) will be used.
3750
3751							In addition to getting/setting sort specs as a whole, they may be
3752							gotten/set on a per-field basis, too:
3753
3754							sortspec($Field) ## Get sortspec for $Field or default spec
3755
3756							my $Spec = {SortType => 'Integer', SortDirection => -1};
3757							sortspec('Zip', $Spec) ## Set sortspec
3758
3759							sortspec_default() ## Get a sortspec with all defaults filled in
3760
3761							For any $Field not found in the object's sortspecs, sortspec($Field)
3762							returns the same thing returned by sortspec_default(), which is a
3763							sortspec filled in with the default sort type and sort direction (see
3764							below).
3765
3766							For a list of available built-in SortTypes, and instructions for how
3767							to define your own, see SORT ROUTINES, below.
3768
3769							=cut
3770
3771							{}; ## Get emacs to indent correctly.
3772
3773							sub sortspecs
3774							{
3775	46			46	0	78	my $this = shift;
3776	46					60	my ($SortSpecs) = @_;
3777
3778							## Set if specified.
3779	46	100				105	$this->{_SortSpecs} = $SortSpecs if $SortSpecs;
3780
3781							## Get and return
3782	46		33			110	$SortSpecs = $this->{_SortSpecs} \|\| $this->sortspecs_default();
3783
3784	46					100	return($SortSpecs);
3785							}
3786
3787							sub sortspecs_default
3788							{
3789	1			1	0	3	my $this = shift;
3790
3791	1					3	my $SortSpecs = {};
3792
3793	1					4	return($SortSpecs);
3794							}
3795
3796							sub sortspec
3797							{
3798	4			4	0	9	my $this = shift;
3799	4					8	my ($FieldName, $SortSpec) = @_;
3800
3801							## Set if specified.
3802	4	100				18	$this->{_SortSpecs}->{$FieldName} = $SortSpec if $SortSpec;
3803
3804							## Get and return.
3805	4		33			29	my $SortSpec = ($this->{_SortSpecs}->{$FieldName} \|\|
3806							$this->sortspec_default($FieldName));
3807
3808							## Provide defaults for needed fields of sort spec.
3809	4		33			12	$SortSpec->{SortType} \|\|= $this->sorttype_default ();
3810	4	50				13	$SortSpec->{SortDirection} = $this->sortdirection_default()
3811							unless defined($SortSpec->{SortDirection});
3812
3813	4					28	return($SortSpec);
3814							}
3815
3816							sub sortspec_default
3817							{
3818	2			2	0	4	my $this = shift;
3819	2					6	my ($FieldName) = @_;
3820
3821							## Default sortspec for a field
3822
3823	2					7	my $SortType = $this->sorttype_default ();
3824	2					7	my $SortDir = $this->sortdirection_default();
3825
3826	2					6	my $Spec = {SortType => $SortType, SortDirection => $SortDir};
3827
3828	2					7	return($Spec);
3829							}
3830
3831
3832							=pod
3833
3834							=head1 DEFAULT SORT DIRECTION
3835
3836							## Getting / Setting table's default _DefaultSortDirection
3837
3838							$t->sortdirection_default() ## Get default sort direction
3839
3840							$t->sortdirection_default(-1) ## Set default sort direction
3841
3842							Each element in a sort specification can optionally specify a sort
3843							direction.
3844
3845							1 = ascending, -1 = descending
3846
3847							For any sort specs that don't specify a direction, the object's
3848							default sort direction will be used. Use these routines to get/set
3849							the default sort direction.
3850
3851							=cut
3852
3853							{}; ## Get emacs to indent correctly.
3854
3855							sub sortdirection_default
3856							{
3857	6			6	0	8	my $this = shift;
3858	6					19	my ($DefaultSortDir) = @_;
3859
3860	6	100				16	if (defined($DefaultSortDir))
3861							{
3862							## Set if specified. Force to 1 or -1. Treat 0 as -1 (descending).
3863	1		50			6	$this->{_DefaultSortDirection} =
3864							(max(min(int($DefaultSortDir), 1), -1) \|\| -1);
3865							}
3866
3867							## Get and return. If not present, then 1 (ascending) is the default.
3868	6	50				20	my $SortDir = (defined($this->{_DefaultSortDirection}) ?
3869							$this->{_DefaultSortDirection} :
3870							1);
3871	6					22	return($SortDir);
3872							}
3873
3874							=pod
3875
3876							=head1 DEFAULT SORT TYPE
3877
3878							## Getting / Setting table's default _DefaultSortType
3879
3880							$t->sorttype_default() ## Get default sort type
3881
3882							$t->sorttype_default('Integer') ## Set default sort type
3883
3884							Each element in a sort specification can optionally specify a sort
3885							type. The sort type is a string (like 'String' or 'Integer' or
3886							'Date') that selects from one or more sort routines. (See Sort
3887							Routines, below).
3888
3889							There are several sort routines built into the CTable object, and you
3890							can also add as many of your own routines (and hence Sort Types) as
3891							you like or need. This allows for very flexible sorting.
3892
3893							For any sort specs that don't specify a type, the object's default
3894							sort type will be used. Use these routines to get/set the default
3895							sort type, which initially is 'String'.
3896
3897							=cut
3898
3899							{}; ## Get emacs to indent correctly.
3900
3901							sub sorttype_default
3902							{
3903	6			6	0	10	my $this = shift;
3904	6					10	my ($DefaultSortType) = @_;
3905
3906	6	100				15	if (defined($DefaultSortType))
3907							{
3908							## Set if specified.
3909	1					4	$this->{_DefaultSortType} = "$DefaultSortType";
3910							}
3911
3912							## Get and return. If not present, then 'String' is the default.
3913	6	50				21	my $SortDir = (defined($this->{_DefaultSortType}) ?
3914							$this->{_DefaultSortType} :
3915							'String');
3916	6					20	return($SortDir);
3917							}
3918
3919							=pod
3920
3921							=head1 SORT ROUTINES: BUILTIN AND CUSTOM
3922
3923							## Getting / Setting table's custom sort routines (_SRoutines)
3924
3925							$t->sortroutine($Type) ## Get a sort routine for $Type
3926
3927							$t->sortroutine($Type, $Sub) ## Set a sort routine for $Type
3928
3929							$t->sortroutine($Type, 0 ) ## Remove sort routine for $Type
3930							$t->sortroutine_set($Type)
3931
3932							$t->sortroutines() ## Get hash of any sort routines
3933
3934							$t->sortroutines_builtin() ## Get hash of builtin routines
3935
3936							Each SortType in the sortspecs should have a corresponding sort
3937							routine (any unrecognized type will be sorted using the 'String' sort
3938							routine).
3939
3940							The sort() command looks up the appropriate sort routine for each
3941							field it is asked to sort, based on the SortType for that field, as
3942							given in the sortspecs, as described above.
3943
3944							Builtin sort types, recognized and implemented natively by this
3945							module, are:
3946
3947							String ## Fastest case-sensitive compare (data is string)
3948							Text ## Insensitive compare (lowercases, then compares)
3949							Number ## Number works for floats or integers
3950							Integer ## Faster than floats. Uses "use integer"
3951							DateSecs ## Same as integer; assumes date in seconds
3952							Boolean ## Treats item as a Perlish boolean (empty/undef = false)
3953
3954							The above sort types are always recognized. Additional sort types may
3955							be added by subclasses (and could shadow the builtin implementations
3956							of the above types if desired) and/or may be added to instances (and
3957							again could shadow the above implementations), and/or may be specified
3958							when the sort() method is called, once again optionally shadowing any
3959							deeper definitions.
3960
3961							=head1 CUSTOM SORT ROUTINE INTERFACE
3962
3963							A custom sort routine is called with two arguments, each of which is a
3964							pointer to a scalar. The sort routine should dereference each pointer
3965							and compare the resulting scalars, returning -1 if the first scalar is
3966							smaller than the second, 1 if it is larger, and 0 if they are
3967							considered equal.
3968
3969							For example, here is the built-in comparison routine for 'String':
3970
3971							sub { $ {$_[0]} cmp $ {$_[1]} }
3972
3973							NOTE: Your custom sort routines should NOT compare $a and $b as with
3974							Perl's builtin sort() command.
3975
3976							Examine the variable $BuiltinSortRoutines in this module's
3977							implementation to see some additional examples of sort routines.
3978
3979							Internally, sort() calls the sortroutines() method to get a hash that
3980							should consist of all builtin sort routines with the per-object sort
3981							routines, if any, overlaid. sortroutines() in turn calls the
3982							sortroutines_builtin() method to get a copy of the hash of all builtin
3983							sort routines for the object. (So a subclass could easily add
3984							additional SortTypes or reimplement them by just overriding
3985							sortroutines_builtin() and adding its own additional routines to the
3986							resulting hash.)
3987
3988							sortroutine() may be called to get or set a custom sort routine for a
3989							given type in the given object.
3990
3991							There is no way to directly manipulate the builtin sort routines for
3992							the entire class. To accomplish that, you should define and use a
3993							subclass that extends sortroutines_builtin() to add its own routines.
3994
3995							For example:
3996
3997							BEGIN
3998							{ ## A subclass of Data::CTable with an INetAddr SortType.
3999							package IATable; use vars qw(@ISA); @ISA = qw(Data::CTable);
4000
4001							sub sortroutines_builtin
4002							{
4003							my $this = shift;
4004							my $CustomRoutines =
4005							{INetAddr =>
4006							sub {use integer; ip2int($ {$_[0]}) <=> ip2int($ {$_[1]})}};
4007							my $AllRoutines =
4008							{%{$this->SUPER::sortroutines_builtin()} %$CustomRoutines};
4009							return($AllRoutines);
4010							};
4011
4012							sub ip2int {.....} $# Could memoize & inline for efficiency
4013							}
4014
4015							my $Table = IATable::new(......);
4016
4017							The IATable class would then have all the same features of
4018							Data::CTable but would then also support the INetAddr SortType.
4019
4020							=cut
4021
4022							{}; ## Get emacs to indent correctly.
4023
4024							BEGIN
4025							{
4026							my $BuiltinSortRoutines =
4027							{(
4028	64					72	String => sub { $ {$_[0]} cmp $ {$_[1]} },
	64					91
	64					137
4029	5					6	Text => sub { lc($ {$_[0]}) cmp lc($ {$_[1]})},
	5					9
	5					19
4030	8					9	Number => sub { $ {$_[0]} <=> $ {$_[1]} },
	8					13
	8					34
4031	29			29		38951	Integer => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
	29					350
	29					171
	8					10
	8					12
	8					23
4032	29			29		1839	DateSecs => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
	29					672
	29					311
	4					6
	4					7
	4					17
4033	29			29		1676	_RecNum => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
	29					71
	29					117
	4					6
	4					8
	4					15
4034	4					6	Boolean => sub { !!$ {$_[0]} <=> !!$ {$_[1]} },
	4					5
	4					16
4035	29			29		84162	)};
4036
4037							sub sortroutines_builtin ## Class or instance method
4038							{
4039	53			53	0	347	return({%$BuiltinSortRoutines}); ## Copy of above private hash.
4040							}
4041							}
4042
4043							sub sortroutine
4044							{
4045	38			38	0	56	my $this = shift;
4046	38					59	my ($Type, $Routine) = @_;
4047
4048	38	100				76	if (defined($Routine))
4049							{
4050							## Set if $Routine provided.
4051	3					8	$this->sortroutine_set($Type, $Routine);
4052							}
4053
4054							## Get and return.
4055	38					65	$Routine = $this->sortroutine_get($Type);
4056
4057	38					108	return($Routine);
4058							}
4059
4060							sub sortroutine_get
4061							{
4062	38			38	0	44	my $this = shift;
4063	38					45	my ($Type) = @_;
4064	38					70	my $Routines = $this->sortroutines();
4065	38		66			89	my $Routine = $Routines->{$Type} \|\| $Routines->{'String'};
4066
4067	38					91	return($Routine);
4068							}
4069
4070							sub sortroutine_set
4071							{
4072	4			4	0	5	my $this = shift;
4073	4					6	my ($Type, $Routine) = @_;
4074
4075	4					7	my $Valid = (ref($Routine) eq 'CODE');
4076
4077	4	100				8	if ($Valid)
4078							{
4079							## Add / replace if a routine was supplied.
4080	2		50			3	$ {$this->{_SRoutines} \|\|= {}}{$Type} = $Routine;
	2					8
4081							}
4082							else
4083							{
4084							## Otherwise delete.
4085	2		50			3	$Routine = delete $ {$this->{_SRoutines} \|\|= {}}{$Type};
	2					9
4086							}
4087
4088	4					11	return($Routine);
4089							}
4090
4091							sub sortroutines
4092							{
4093	42			42	0	54	my $this = shift;
4094
4095	42	50				67	my $Routines = {%{$this->sortroutines_builtin()}, ## First builtin ones
	42					186
4096	42					47	%{$this->{_SRoutines} \|\| {}}}; ## Shadow with object's own
4097
4098	42					189	return($Routines);
4099							}
4100
4101							=pod
4102
4103							=head1 FREEZING SELECTION & FIELD LIST
4104
4105							## Freeze data layout: re-order columns; omit unused fields
4106
4107							$t->cull(...params...) ## Rebuild table in order
4108							my $s = $t->snapshot(...params...) ## Make copy as if rebuilt
4109
4110							The cull() method re-writes all data in the table to be in the order
4111							indicated in _Selection (if present). This will cause any records not
4112							listed in _Selection to be omitted (unless selection is null in which
4113							case all records are retained in original order).
4114
4115							In addition, if there is a custom field list present, it removes any
4116							fields NOT mentioned in _FieldList.
4117
4118							The snapshot() method is similar, except instead of modifying the
4119							object itself, it makes a copy of the object that's equivalent to what
4120							cull() would have created, and returns that new object, leaving the
4121							original untouched. (All data structures are deep-copied from the old
4122							object to the new one, leaving the objects totally independent.)
4123
4124							cull() and snapshot() both take two optional named parameters:
4125							_FieldList and/or _Selection to be used in place of the corresponding
4126							parameters found in the object.
4127
4128							If only a single argument is supplied, it is assumed to be _Selection.
4129
4130							=cut
4131
4132							sub cull
4133							{
4134	1			1	0	4	my $this = shift;
4135	1	50				14	my $Params = (@_ == 1 ? {_Selection => $_[0]} : {@_});
4136
4137	1					4	my($Selection, $FieldList) = map {$this->getparam($Params, $_)}
	2					5
4138							qw(_Selection _FieldList);
4139
4140	1		33			6	$FieldList \|\|= $this->{_FieldList};
4141	1		33			4	$Selection \|\|= $this->{_Selection};
4142
4143							## First cull any fields/columns not mentioned in _FieldList, if any.
4144	1	50				4	if ($FieldList)
4145							{
4146	1					2	my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
	1					5
4147	1					3	my $DeadFields = [grep {!exists($FieldHash->{$_})} @{$this->fieldlist_all()}];
	4					17
	1					5
4148	1					4	delete @$this{@$DeadFields};
4149
4150							## Set the (possibly) new field list in the object.
4151	1					6	$this->fieldlist_set($FieldList);
4152							}
4153
4154							## Then cull / rearrange all the columns
4155	1	50				5	if ($Selection)
4156							{
4157							## Temporarily set the selection() to be the one we may have been given...
4158	1					5	$this->selection($Selection);
4159
4160							## Get the de-facto field list if we don't already have it explicitly
4161	1		33			4	$FieldList \|\|= $this->fieldlist();
4162
4163							## Rearrange each column
4164	1					4	foreach my $FieldName (@$FieldList)
4165							{
4166	3					8	$this->{$FieldName} = $this->sel($FieldName);
4167							}
4168
4169							## Remove the _Selection since it is no longer valid.
4170	1					5	$this->selection_delete();
4171							}
4172							}
4173
4174							sub snapshot
4175							{
4176	50			50	0	190	my $this = shift;
4177	50	50				588	my $Params = (@_ == 1 ? {_Selection => $_[0]} : {@_});
4178
4179	50					116	my($Selection, $FieldList) = map {$this->getparam($Params, $_)}
	100					366
4180							qw(_Selection _FieldList);
4181
4182	50		33			266	$FieldList \|\|= $this->{_FieldList};
4183	50		66			294	$Selection \|\|= $this->{_Selection};
4184
4185							## First make a shallow copy of $this
4186	50					1091	my $copy = {%$this};
4187
4188							## Then delete any (references to) data columns owned by $this...
4189	50					162	delete @$copy{@{$this->fieldlist_all()}};
	50					177
4190
4191							## Then deep-copy all other parameters from $this...
4192	50					5699	$copy = dclone($copy);
4193
4194							## Then new/bless/initialize the copy into the same class as $this...
4195	50					519	$copy = ref($this)->new($copy);
4196
4197							## Temporarily override selection if necessary.
4198	50					256	my $OldSel = $this->{_Selection};
4199	50					207	$this->selection($Selection);
4200
4201							## Then insert all the rearranged columns into $copy...
4202	50					75	@$copy{@$FieldList} = @{$this->sels($FieldList)};
	50					189
4203
4204							## Restore old selection, if any.
4205	50					187	$this->selection_set($OldSel);
4206
4207							## Remove the selection in copy.
4208	50					110	delete $copy->{_Selection};
4209
4210							## Set copy's fieldlist to a copy of the one we used.
4211	50					183	$copy->{_FieldList} = [@$FieldList];
4212
4213	50					243	return($copy);
4214							}
4215
4216							=pod
4217
4218							=head1 LINE ENDINGS
4219
4220							## Get current value
4221
4222							$t->lineending() ## Get actual setting: string or symbol
4223							$t->lineending_symbol() ## Get setting's symbolic value if possible
4224							$t->lineending_string() ## Get setting's string value if possible
4225
4226							## Set value
4227
4228							$t->lineending($Ending) ## Will be converted internally to symbol
4229
4230							## Convert a value to symbol or string form
4231
4232							$t->lineending_symbol($L) ## Convert string form to symbolic form
4233							$t->lineending_string($L) ## Convert symbol form to string form
4234
4235							## Get internal conversion hash tables
4236
4237							$t->lineending_symbols() ## Hash ref mapping known strings to symbols
4238							$t->lineending_strings() ## Hash ref mapping known symbols to strings
4239
4240							Use these accessor functions to get/set the _LineEnding parameter.
4241
4242							You can set the parameter in either string or symbol form as you wish.
4243							You can get it in its raw, as-stored, form, or, you can get it in
4244							string form or symbol form as desired.
4245
4246							Finally, some utility conversion calls allows you to convert a string
4247							you have on hand to a symbolic form. For example:
4248
4249							$L = "\x0D";
4250							print ("This file uses " . $t->lineending_symbol($L) . " endings.");
4251
4252							This would print:
4253
4254							This file uses mac endings.
4255
4256							=cut
4257
4258							{}; ## Get emacs to indent correctly.
4259
4260							BEGIN
4261							{
4262							## Map any recognized _LineEnding value to its actual string
4263	29			29		284	my $LineEnding_Strings =
4264							{(
4265							dos => "\x0D\x0A",
4266							mac => "\x0D",
4267							unix => "\x0A",
4268							"\x0D\x0A" => "\x0D\x0A",
4269							"\x0D" => "\x0D",
4270							"\x0A" => "\x0A",
4271							)};
4272
4273							## Map any recognized _LineEnding value to its logical form
4274	29					48723	my $LineEnding_Symbols =
4275							{(
4276							"\x0D\x0A" => "dos",
4277							"\x0D" => "mac",
4278							"\x0A" => "unix",
4279							dos => "dos",
4280							mac => "mac",
4281							unix => "unix",
4282							)};
4283
4284							sub lineending_strings ## Class or instance method
4285							{
4286	33			33	0	2828	return({%$LineEnding_Strings}); ## Copy of above private hash.
4287							}
4288
4289							sub lineending_symbols ## Class or instance method
4290							{
4291	25			25	0	7844	return({%$LineEnding_Symbols}); ## Copy of above private hash.
4292							}
4293							}
4294
4295							sub lineending()
4296							{
4297	3			3	0	7	my $this = shift;
4298	3					5	my ($LineEnding) = @_;
4299
4300							## Set if specified. Try to convert to symbolic form if possible.
4301	3	50	0			11	$this->{_LineEnding} = $this->lineending_symbol($LineEnding) \|\| $LineEnding if $LineEnding;
4302
4303							## Otherwise / either case, get whatever value we have and return it....
4304	3					5	my $LineEnding = $this->{_LineEnding};
4305
4306	3					14	return($LineEnding);
4307							}
4308
4309							sub lineending_symbol
4310							{
4311	22			22	0	34	my $this = shift;
4312	22					32	my ($LineEnding) = @_;
4313
4314	22		66			69	$LineEnding \|\|= $this->{_LineEnding};
4315
4316	22		33			27	return($ {$this->lineending_symbols()}{$LineEnding} \|\| $LineEnding);
4317							}
4318
4319
4320							sub lineending_string
4321							{
4322	30			30	0	50	my $this = shift;
4323	30					54	my ($LineEnding) = @_;
4324
4325	30		100			140	$LineEnding \|\|= $this->{_LineEnding};
4326
4327	30		66			38	return($ {$this->lineending_strings()}{$LineEnding} \|\| $LineEnding);
4328							}
4329
4330							=pod
4331
4332							=head1 AUTOMATIC CACHEING
4333
4334							By default, Data::CTable makes cached versions of files it reads so it
4335							can read them much more quickly the next time. Optionally, it can
4336							also cache any file it writes for quicker re-reading later.
4337
4338							On Unix systems, cache files are always created with 0666
4339							(world-write) permissions for easy cleanup.
4340
4341							When reading files, Data::CTable checks the _CacheOnRead parameter.
4342							If that parameter is true, which it is by default, the module tries to
4343							find an up-to-date cache file to read instead of the original.
4344							Reading a cache file can be 10x faster than reading and parsing the
4345							original text file.
4346
4347							In order to look for the cache file, it must first calculate the path
4348							where the cache file should be located, based on the _FileName of the
4349							file to be read.
4350
4351							The path of the cache file is calculated as follows:
4352
4353							If the _CacheSubDir parameter is a RELATIVE PATH, then it is appended
4354							to the directory component of _FileName to arrive at the directory to
4355							use to store the cache file. If it is an ABSOLUTE PATH, then
4356							_CacheSubDir is used by itself. (The trailing path separator is
4357							optional and an appropriate one will be added by Data::CTable if it is
4358							missing.)
4359
4360							The file name of the cache file is calculated as follows:
4361
4362							If the _CacheExtension parameter is specified, it is appended to the
4363							base file name component from the _FileName parameter. If you want
4364							the cached file name to be the same as the name of the original file,
4365							you can set _CacheExtension to "", which is not recommended.
4366
4367							Then, the cache path and cache file name are joined to arrive at the
4368							name of the cache file. If both _CacheSubDir and _CacheExtension were
4369							empty, then the cache file path will be the same as the _FileName, and
4370							Data::CTable will refuse to either read or write a cache file, so
4371							setting these fields both to empty is equivalent to setting
4372							_CacheOnRead to false.
4373
4374							The cache file contains a highly-efficient representation of all the
4375							following data that would otherwise have to be determined by reading
4376							and parsing the entire text file:
4377
4378							- All the data columns (field values)
4379							- _FieldList: The list of fields, in order
4380							- _HeaderRow: Whether a header row is / should be present
4381							- _LineEnding: The line ending setting
4382							- _FDelimiter: The field delimiter setting
4383
4384							If found prior to a read(), AND, the date of the cache file is LATER
4385							than the date of the original file, the cache file is used instead.
4386							(If the date is EARLIER, then the cache file is ignored because it can
4387							be presumed that the data inside the text file is newer.)
4388
4389							If cacheing is ON, then after successfully reading the text file
4390							(either because there was no cache file yet or the cache file was out
4391							of date or corrupted or otherwise unusable), read() will then try to
4392							create a cache file. This, of course, takes some time, but the time
4393							taken will be more than made up in the speedup of the next read()
4394							operation on the same file.
4395
4396							If creating the cache file fails (for example, because file
4397							permissions didn't allow the cache directory to be created or the
4398							cache file to be written), read() generates a warning explaining why
4399							cacheing failed, but the read() operation itself still succeeds.
4400
4401							No parameters in the object itself are set or modified to indicate the
4402							success or failure of writing the cache file.
4403
4404							Similarly, there is no way to tell whether a successful read()
4405							operation read from the cache or from the original data file. If you
4406							want to be SURE the reading was from the data file, either turn off
4407							_CacheOnRead, or call the read_file() method instead of read().
4408
4409							NOTE: because the name of the cache file to be used is calculated just
4410							before the read() is actually done, the cache file can only be found
4411							if the _CacheSubDir and _CacheExtension are the same as they were when
4412							the cache was last created. If you change these parameters after
4413							having previously cached a file, the older caches could be "orphaned"
4414							and just sit around wasting disk space.
4415
4416							=head2 Cacheing on write()
4417
4418							You may optionally set _CacheOnWrite (default = false) to true. If
4419							done, then a cache file will be saved for files written using the
4420							write() command. Read about write() below for more about why you
4421							might want to do this.
4422
4423							=head1 AUTOMATIC DIRECTORY CREATION
4424
4425							When Data::CTable needs to write a file, (a cache file or a data
4426							file), it automatically tries to create any directories or
4427							subdirectories you specify in the _FileName or _CacheSubDir
4428							parameters.
4429
4430							If it fails while writing a data file, write() will fail (and you will
4431							be warned). If it fails to create a directory while writing a cache
4432							file, a warning will be issued, but the overall read() or write()
4433							operation will still return a result indicating success.
4434
4435							Any directories created will have the permissions 0777 (world-write)
4436							for easy cleanup.
4437
4438							Generally, the only directory the module will have to create is a
4439							subdirectory to hold cache files.
4440
4441							However, since other directories could be created, be sure to exercise
4442							caution when allowing the module to create any directories for you on
4443							any system where security might be an issue.
4444
4445							Also, if the 0666 permissions on the cache files themselves are too
4446							liberal, you can either 1) turn off cacheing, or 2) call the
4447							prep_cache_file() method to get the name of the cache file that would
4448							have been written, if any, and then restrict its permissions:
4449
4450							chmod (0600, $this->prep_cache_file());
4451
4452							=head1 READING DATA FILES
4453
4454							## Replacing data in table with data read from a file
4455
4456							$t->read($Path) ## Simple calling convention
4457
4458							$t->read( ## Named-parameter convention
4459
4460							## Params that override params in the object if supplied...
4461
4462							_FileName => $Path, ## Full or partial path of file to read
4463
4464							_FieldList => [...], ## Fields to read; others to be discarded
4465
4466							_HeaderRow => 0, ## No header row (_FieldList required!)
4467
4468							_IgnoreQuotes => 0, ## Load files with unbalanced quotes
4469
4470							_LineEnding => undef, ## Text line ending (undef means guess)
4471							_FDelimiter => undef, ## Field delimiter (undef means guess)
4472
4473							_ReturnMap => 1, ## Whether to decode internal returns
4474							_ReturnEncoding=>"\x0B", ## How to decode returns.
4475							_MacRomanMap => undef, ## Whether/when to read Mac char set
4476
4477							_CacheOnRead => 0, ## Enable/disable cacheing behavior
4478							_CacheExtension=> ".x", ## Extension to add to cache file name
4479							_CacheSubDir => "", ## (Sub-)dir, if any, for cache files
4480
4481							## Params specific to the read()/write() methods...
4482
4483							_MaxRecords => 200, ## Limit on how many records to read
4484							)
4485
4486							$t->read_file() ## Internal: same as read(); ignores cacheing
4487
4488							read() opens a Merge, CSV, or Tab-delimited file and reads in all or
4489							some fields, and all or some records, REPLACING ANY EXISTING DATA in
4490							the CTable object.
4491
4492							Using the simple calling convention, just pass it a file name. All
4493							other parameters will come from the object (or will be defaulted if
4494							absent). To specify additional parameters or override any parameters
4495							in the object while reading, use the named-parameter calling
4496							convention.
4497
4498							See the full PARAMETER LIST, above, or read on for some extra details:
4499
4500							_ReturnMap controls whether return characters encoded as ASCII 11
4501							should be mapped back to real newlines (C<"\n">) when read into memory.
4502							If false, they are left as ASCII 11 characters. (default is "true")
4503
4504							_ReturnEncoding controls the character that returns are encoded as, if
4505							different from ASCII 11.
4506
4507							_FieldList is an array (reference) listing the names of fields to
4508							import, in order (and will become the object's _FieldList upon
4509							successful completion of the read() operation). If not provided and
4510							not found in the object, or empty, then all fields found in the file
4511							are imported and the object's field list will be set from those found
4512							in the file, in the order found there. If _HeaderRow is false, then
4513							this parameter is required (either in the object or as a formal
4514							parameter) and is assumed to give the correct names for the fields as
4515							they actually occur in the file. If _HeaderRow is true and _FieldList
4516							is provided, then _FieldList specifies the (sub-)set of fields to be
4517							read from the file and others will be ignored.
4518
4519							_HeaderRow, which defaults to true, if set to false, tells read() to
4520							not expect a header row showing the field names in the file. Instead,
4521							it assumes that the _FieldList gives those (and _FieldList must
4522							therefore be specified either as a parameter or an existing parameter
4523							in the object).
4524
4525							_IgnoreQuotes is false by default. If true then we'll ignore quotes in
4526							the file upon import, which makes the loading of files with unbalanced
4527							quotes possible.
4528
4529							_MaxRecords (optional) is an upper limit on the number of fields to
4530							import. If not specified, or zero, or undef, then there is no limit;
4531							all records will be imported or memory will be exhausted.
4532
4533							read() returns a Boolean "success" code.
4534
4535							If read() returns false, then it will also have set the _ErrorMsg
4536							parameter in the object. It may or may not have partially altered
4537							data in the object if an error is encountered.
4538
4539							After a successful read:
4540
4541							fieldlist() (the object's _FieldList parameter) tells which
4542							fields were actually read, in what order. It may omit any fields
4543							requested in _FieldList that were not actually found in the file for
4544							whatever reason.
4545
4546							length() tells how many fields were read.
4547
4548							The selection() is reset to no selection (all selected / unsorted)
4549
4550							The object's _FileName parameter contains the path to the file
4551							that was read. If the _FileName you specified did not have a path,
4552							then _FileName will be prepended with a path component indicating
4553							"current directory" (e.g. "./" on Unix).
4554
4555							_FDelimiter will contain the actual delimiter character that was
4556							used to read the file (either tab or comma if the delimiter was
4557							guessed, or whatever delimiter you specified).
4558
4559							_LineEnding will contain the actual line-ending setting used to
4560							read the file. This will be either "mac" ("\x0D"), "unix" ("\x0D"),
4561							or "dos" ("\x0D\x0A") if the line endings were guessed by read().
4562							Otherwise it will be whatever _LineEnding you specified.
4563
4564
4565							=head1 FILE FORMAT NOTES
4566
4567							As mentioned, read() allows the following flexibilities in reading
4568							text-based tabular data files:
4569
4570							You may specify the line endings (record delimiters), or it can
4571							guess them (mac, unix, dos are supported).
4572
4573							You may specify the field delimiters, or it can guess them (tab
4574							and comma are supported).
4575
4576							It can get field names from a header row, or, if there is no
4577							header row, you can tell it the field names, in order.
4578
4579							You can tell it whether or not to decode embedded returns in
4580							data fields, and if so, which character they were encoded as.
4581
4582							Beyond supporting the above flexible options, read() makes the
4583							following non-flexible assumptions:
4584
4585							Fields must NOT contain unencoded returns -- that is: whatever
4586							character sequence is specified for _LineEnding will NEVER occur
4587							inside a field in the text file; in addition, the current platform's
4588							definition of C<"\n"> will NEVER occur; these characters if present in
4589							field data, MUST have been encoded to some safe character string
4590							before the file was created.
4591
4592							Each field may OPTIONALLY be surrounded with double-quote marks.
4593							However, if the field data itself contains either a double-quote
4594							character (C<">) or the current file's field delimiter (such as tab or
4595							comma), then the field MUST be surrounded with double-quotes.
4596							(Currently, all data written by Data::CTable have all field values
4597							surrounded by double-quotes, but a more selective policy may be used
4598							in the future.)
4599
4600							If a field contains a double-quote character, then each double-quote
4601							character in the field must be encoded as C<""> -- i.e. each C<"> in the
4602							original data becomes C<""> in the text file.
4603
4604							Data files may not mix line-ending types or field delimiter types.
4605							Once determined, the same endings and delimiters will be used to read
4606							the entire file.
4607
4608							The fields recognized on each line will either be determined by
4609							the header row or the _FieldList provided by the caller. Any extra
4610							fields on any given line will be ignored. Any missing fields will be
4611							treated as undef/empty.
4612
4613							If you are having trouble reading a delimited text file, check that
4614							all data in the file obeys these assumptions.
4615
4616							=cut
4617
4618							sub read ## Read, cacheing if possible
4619							{
4620	79			79	1	166	my $this = shift;
4621	79					383	return($this->read_file_or_cache(@_));
4622							}
4623
4624							sub read_file ## Read, ignoring cacheing
4625							{
4626	19			19	0	52	my $this = shift;
4627	19	50				87	my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
4628
4629	190					361	my($FileName, $FieldList, $MaxRecords, $LineEnding, $FDelimiter, $ReturnMap, $ReturnEncoding, $MacRomanMap, $HeaderRow, $IgnoreQuotes)
4630	19					49	= map {$this->getparam($Params, $_)}
4631							qw(_FileName _FieldList _MaxRecords _LineEnding _FDelimiter _ReturnMap _ReturnEncoding _MacRomanMap _HeaderRow _IgnoreQuotes);
4632
4633	19					39	my $Success;
4634
4635							## Default error message is none.
4636	19					40	$this->{_ErrorMsg} = "";
4637
4638							## Default for HeaderRow is true.
4639	19	50				148	$HeaderRow = 1 unless defined($HeaderRow);
4640
4641							## Default for ReturnEncoding is "\x0B" (control-K; ASCII 11)
4642	19	50				56	$ReturnEncoding = "\x0B" unless length($ReturnEncoding);
4643
4644							## Default for ReturnMap is true.
4645	19	50				44	$ReturnMap = 1 unless defined($ReturnMap);
4646
4647							## Default for MacRomanMap is undef ("Auto");
4648	19	100				196	$MacRomanMap = undef unless defined($MacRomanMap);
4649
4650							## Default for MaxRecords is 0 (meaning import all records)
4651	19	50				71	$MaxRecords = 0 unless (int($MaxRecords) == $MaxRecords);
4652
4653							## Precompile a regex for the return encoding since we'll call it often (on each field!) later.
4654	19					154	my $RetRegex = qr/$ReturnEncoding/;
4655
4656	19					87	$this->progress("Reading $FileName...");
4657
4658							## Open the data file.
4659	0					0	my $File = IO::File->new("<$FileName") or
4660	19	50				143	do {$this->{_ErrorMsg} = "Failed to open $FileName: $!"; goto done};
	0					0
4661
4662							## Get its total file size (useful for estimating table size later on).
4663	19	50				3027	my $FileSize = (stat($File))[7] or
4664							$this->{_ErrorMsg} = "File $FileName contains no data.", goto done;
4665
4666							## Convert from optional "dos", "mac", "unix" symbolic values.
4667	19					77	$LineEnding = $this->lineending_string($LineEnding);
4668
4669							## Default for LineEnding is found by inspecting data in the file.
4670	19	50	33			129	$LineEnding \|\|= guess_endings($File) or
4671							$this->{_ErrorMsg} = "Could not find any line endings in the file $FileName.", goto done;
4672
4673							## DoMacMapping is the actual setting for auto charset mapping
4674	19		66			140	my $DoMacMapping =
4675							((!defined($MacRomanMap) && ($LineEnding eq "\x0D")) \|\| ## Auto
4676							($MacRomanMap)); ## On
4677
4678	19	100				84	$this->progress("Will convert upper-ascii characters if any, from Mac Roman to ISO 8859-1.") if $DoMacMapping;
4679
4680							## FieldList is usable is it is a list and has at least one entry.
4681	19		66			70	my $FieldListValid = ((ref($FieldList) eq 'ARRAY') && @$FieldList);
4682
4683							## Set <$File> to use the line ending sequence we no known we are looking for.
4684	19					95	local $/ = $LineEnding;
4685
4686							## We use $_ explicitly, so must localize.
4687	19					23	local $_;
4688
4689	19					26	my $IncomingFields;
4690
4691	19	50				48	if ($HeaderRow)
4692							{
4693							## Get the list of fields available in the file (first line of file).
4694
4695	19	50				200	$_ = <$File> or
4696							$this->{_ErrorMsg} = "Could not find a first line with field names in $FileName.", goto done;
4697
4698							## Try to guess file delimiter from the header row if not yet specified.
4699	19	50	33			106	$FDelimiter \|\|= guess_delimiter($_) or
4700							$this->{_ErrorMsg} = "Could not find comma or tab delimiters in $FileName.", goto done;
4701
4702							## Maybe convert entire line (all records) Mac to ISO before splitting.
4703	19	100				93	&MacRomanToISORoman8859_1(\ $_) if $DoMacMapping;
4704
4705	19					34	chomp;
4706
4707	19					44	s/^\"//; s/\"$//; ## remove possible leading, trailing quotes surrounding header row (rare)
	19					35
4708
4709							## Split header row into field names, removing optional "" around each at the same time.
4710	19					315	$IncomingFields = [split(/\"?$FDelimiter\"?/, $_)];
4711
4712							## Strip any leading and/or trailing control chars or spaces from field names in header.
4713	19					59	$IncomingFields = [map {s{(?:\A[\x00-\x20]+)\|(?:[\x00-\x20]+\Z)}{}g; $_;} @$IncomingFields];
	79					218
	79					184
4714
4715							}
4716							else
4717							{
4718							## Otherwise, require that the caller specifies it in _FieldList
4719
4720	0	0				0	$this->{_ErrorMsg} = "Must specify a _FieldList if _HeaderRow says no header row is present.", goto done
4721							unless $FieldListValid;
4722
4723	0					0	$IncomingFields = [@$FieldList];
4724							}
4725
4726							## Remove any leading underscores in the names of the incoming
4727							## fields (not allowed because such field names are reserved for
4728							## other object data). Note: this could result in
4729							## duplicate/overwritten field names that were otherwise
4730							## apparently unique in the incoming data file.
4731
4732	19					64	$IncomingFields = [map {(/^_(.)/)[0]} @$IncomingFields];
	79					293
4733
4734							## Make a hash that can be used to map these fields' names to their numbers.
4735	29			29		37801	my $IncomingFieldNameToNum = {}; @$IncomingFieldNameToNum{@$IncomingFields} = ($[ .. $#$IncomingFields);
	29					32209
	29					152618
	19					59
	19					214
4736
4737							## Make a list of the fields we'll be importing (by taking the
4738							## list the caller requested, and paring it down to only those
4739							## fields that are actually available in the table.)
4740
4741	71					179	my $FieldsToGet =
4742	19	100				79	[grep {exists($IncomingFieldNameToNum->{$_})}
4743							($FieldListValid ? @$FieldList : @$IncomingFields)];
4744
4745							## Make a note of whether we're getting a subset of available
4746							## fields because the caller requested such. If we are, we'll add
4747							## a _Subset => 1 marker to the data for use later in ensuring the
4748							## cache is OK.
4749
4750	19		100			128	my $GettingSubset = ($FieldListValid && ("@{[sort @$IncomingFields]}" ne
4751							"@{[sort @$FieldList ]}"));
4752
4753							## Make an array of the incoming indices of these fields.
4754
4755							## Allocate a list of empty arrays into which we can import the
4756							## data. Initially they'll each have 100 empty slots for data;
4757							## after we have imported 100 records, we'll re-consider the size
4758							## estimate. When we're all done, we'll prune them back.
4759
4760	19					72	my $FieldNums = [@$IncomingFieldNameToNum{@$FieldsToGet}];
4761	19					34	my $FieldVectors = []; foreach (@$FieldNums) {$#{$FieldVectors->[$_] = []} = 100};
	19					41
	71					89
	71					325
4762
4763							## We want to be cool and support any embedded NULL (ascii zero)
4764							## characters should they exist in the data, even though we are
4765							## going to use NULL chars to encode embedded delimiters before we
4766							## split....
4767
4768							## First we create a sufficiently obscure placeholder for any
4769							## ascii zero characters in the input text (a rare occurrence
4770							## anyway).
4771
4772	19					45	my $ZeroMarker = "\001ASCII_ZERO\001";
4773
4774							## Now ready to go through the file line-by-line (record-by-record)
4775
4776	19					22	my $WroteProg;
4777	19					28	my $RecordsRead = 0;
4778	19					86	while (<$File>)
4779							{
4780							## Try to guess file delimiter from the header row if not yet specified.
4781	57	50	33			187	$FDelimiter \|\|= guess_delimiter($_) or
4782							$this->{_ErrorMsg} = "Could not find comma or tab delimiters in $FileName.", goto done;
4783
4784							## Maybe convert entire line (all records) ISO to Mac before splitting.
4785	57	100				173	&MacRomanToISORoman8859_1(\ $_) if $DoMacMapping;
4786
4787							## Manipulate the single line of data fields into a splittable format.
4788
4789	57					86	chomp;
4790
4791							## Replace any delimiters inside quotes with ASCII 0.
4792							## Split fields on delimiters.
4793							## Delete leading or trailing quote marks from each field.
4794							## Restore delimiters ASCII 0 back to delimiters.
4795
4796							## Protect delimiters inside fields.
4797	57					89	s/\000/$ZeroMarker/go; ## Preserve genuine ASCII 0 chars.
4798	57					73	my $InQuote = 0; ## Initialize InQuote flag to zero.
4799	57	50				111	unless ($IgnoreQuotes) {
4800	57					524	s/(\")\|($FDelimiter)/ ## Replace delimiters inside quotes with ASCII 0 ...
4801	180	50				1169	($1 ? do{$InQuote^=1; $1} : ## ... if quote char, toggle InQuote flag
	0	50				0
	0					0
4802							($InQuote ? "\000" : $2))/eg; ## ... if delimiter, InQuote sez whether to replace or retain.
4803							}
4804
4805							## Split record into fields, then clean each field.
4806
4807	57					517	my $regex = qr/$FDelimiter/;
4808	57	50				132	unless ($IgnoreQuotes) {
4809	57					301	$regex = qr/\"?$FDelimiter\"?/;
4810							}
4811	57					144	s/^\"//; s/\"$//; ## Kill leading, trailing quotes surrounding each record
	57					82
4812							my @FieldVals =
4813							map
4814	57	50				415	{if (length($_))
	237					456
4815							{
4816	237					290	s/\"\"/\"/g; ## Restore Merge format's quoted double-quotes. ("" ==> ")
4817	237					258	s/\000/$FDelimiter/g; ## Restore delimiters inside fields
4818	237					571	s/\Q$ZeroMarker\E/\000/go; ## Restore preserved ASCII 0 chars.
4819	237	50				1950	s/$RetRegex/\n/g if $ReturnMap;## Restore return characters that were coded as ASCII 11 (^K)
4820							}
4821	237					542	$_;} ## Return field val after above mods.
4822							split($regex, $_); ## Split on delimiters, killing optional surrounding quotes at same time.
4823
4824							## Put the data into the vectors
4825	57					234	foreach (@$FieldNums)
4826							{
4827	213	50				681	$FieldVectors->[$_]->[$RecordsRead] = $FieldVals[$_] if (length($FieldVals[$_]));
4828							}
4829	57					84	$RecordsRead++;
4830
4831							## Stop if we've read all the records we wanted.
4832	57	50	33			226	last if ($MaxRecords && ($RecordsRead >= $MaxRecords));
4833
4834							## Optimization:
4835
4836							## After importing 100, 200, 300, 400, etc. records, we
4837							## re-estimate the size of the table. To help make the field
4838							## insertion more efficient (by avoiding frequent
4839							## array-resizing), we can set the sizes of the field vectors
4840							## to hold at least our estimated number of records).
4841							## Ideally, this estimation/resize step will happen at most 2
4842							## or 3 times no matter how big the incoming data file is.
4843
4844	57					58	my $EstTotalRecords;
4845	57	50	33			276	if ((($RecordsRead % 100) == 0) && ## If we're on a record divisble by 100...
	0					0
4846							(($RecordsRead + 100 > $#{$FieldVectors->[$FieldNums->[0]]}))) ## ... and we're getting close to max size...
4847							{
4848							## Then estimate the size we'd like to resize it to.
4849	0					0	$EstTotalRecords = (100 + int($RecordsRead * ($FileSize / tell($File))));
4850	0	0	0			0	$EstTotalRecords = $MaxRecords if ($MaxRecords && ($MaxRecords < $EstTotalRecords));
4851
4852							## If this size is greater than the actual size...
4853	0	0				0	if ($EstTotalRecords > $#{$FieldVectors->[$FieldNums->[0]]})
	0					0
4854							{
4855							## Then resize all the vectors.
4856							## $this->progress("$RecordsRead: Resizing to $EstTotalRecords...\n"); ## Debugging
4857	0					0	foreach (@$FieldNums) {$#{$FieldVectors->[$_]} = $EstTotalRecords};
	0					0
	0					0
4858							}
4859							}
4860
4861							## Try doing timed (throttled to 1 per 2 secs) progress at
4862							## most every 100th record.
4863	57	0				137	my $Did = ($EstTotalRecords ?
		50
4864							$this->progress_timed("Reading", "$RecordsRead of $EstTotalRecords (est.)", tell($File), $FileSize, 1) :
4865							$this->progress_timed("Reading", "$RecordsRead" , tell($File), $FileSize, 1))
4866							if (($RecordsRead % 100) == 0);
4867	57		33			543	$WroteProg \|\|= $Did;
4868							}
4869
4870							## If we wrote timed progress but didn't get to give the 100%
4871							## message yet, print the 100% message now.
4872
4873	19	50				52	if ($WroteProg)
4874							{
4875	0	0				0	$this->progress_timed("Reading", "$RecordsRead of $RecordsRead", $FileSize, $FileSize, 1)
4876							unless (($RecordsRead % 100) == 0);
4877							}
4878
4879							## Print the regular Done message.
4880	19					110	$this->progress("Read $FileName.");
4881
4882							## Set the field vectors' length to the exact length we really
4883							## read.
4884
4885							## $this->progress("$RecordsRead: Truncating to @{[$RecordsRead - 1]}... \n"); ## Debugging
4886	19					144	foreach (@$FieldNums) {$#{$FieldVectors->[$_]} = ($RecordsRead - 1)};
	71					95
	71					318
4887
4888							## Delete any existing columns in the object.
4889	19					30	delete @$this{@{$this->fieldlist_all()}};
	19					66
4890
4891							## Put the new columns into the object.
4892	19					96	@$this{@$FieldsToGet} = @$FieldVectors[@$FieldNums];
4893
4894							## Set fieldlist to fields we actually read, in order.
4895	19					69	$this->fieldlist($FieldsToGet);
4896
4897							## Remember the line ending char or chars that were successfully
4898							## used to read the file. The same ending will be used by default
4899							## to write any file based on this object.
4900	19					69	$this->{_LineEnding} = $this->lineending_symbol($LineEnding);
4901
4902							## Remember the field delimiter that was successfully used to read
4903							## the file. The same delimiter will be used by default to write
4904							## any file based on this object.
4905	19					68	$this->{_FDelimiter} = $FDelimiter;
4906
4907							## Remember the header row setting.
4908	19					30	$this->{_HeaderRow} = $HeaderRow;
4909
4910							## Remember the filename used for reading.
4911	19					37	$this->{_FileName} = $FileName;
4912
4913							## Remember whether we read (and maybe will cache) a subset of available fields.
4914	19		100			105	$this->{_Subset} = $GettingSubset \|\| 0;
4915
4916	19		50			82	$this->{_IgnoreQuotes} = $IgnoreQuotes \|\| 0;
4917
4918							## Clean out _Selection and verify _SortOrder to ensure compatibility
4919							## with current _FieldList.
4920	19					60	$this->read_postcheck();
4921
4922							## Other informational data and options, like sort specs, sort
4923							## routines and so on, need not be changed or replaced when data
4924							## changes.
4925
4926	19					28	$Success = 1;
4927
4928	19	50				45	done:
4929
4930							$this->warn("FAILURE: $this->{_ErrorMsg}") unless $Success;
4931
4932	19	50				277	close $File if $File;
4933	19					553	return($Success);
4934							}
4935
4936							sub read_postcheck ## Called to clean up after a successful read
4937							{
4938	79			79	0	129	my $this = shift;
4939
4940							## Run select_all to empty out the _Selection.
4941	79					308	$this->select_all();
4942
4943							## Remove any bogus field names from the sort order, if any.
4944	79					309	$this->sortorder_check();
4945							}
4946
4947							sub read_file_or_cache ## Read, cacheing if possible
4948							{
4949	79			79	0	132	my $this = shift;
4950	79					251	$DB::single = 1;
4951	79	100				438	my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
4952
4953	79					1364	my($FileName, $FieldList, $CacheOnRead, $CacheExtension, $CacheSubDir, $IgnoreQuotes) = map {$this->getparam($Params, $_)}
	474					1059
4954							qw(_FileName _FieldList _CacheOnRead _CacheExtension _CacheSubDir, _IgnoreQuotes);
4955
4956	79					542	my $Success;
4957
4958							## If cacheing is turned off, just bail prematurely and treat this
4959							## as a call to read_file().
4960
4961	79	100				318	return($this->read_file(%$Params)) unless $CacheOnRead;
4962
4963							## Otherwise... check if cacheing is possible.
4964
4965							## Calculate the cache file name. If it comes back empty, it
4966							## means the cache directory probably could not be created, or the
4967							## cache file itself either does not exist or could not be
4968							## preflighted (either read or touched/deleted).
4969
4970	68					332	my $CacheFileName = $this->prep_cache_file($FileName, $CacheExtension, $CacheSubDir);
4971
4972							## If the cache file preflight failed, treat this as a regular
4973							## read_file() without cacheing.
4974
4975	68	50				390	return($this->read_file(%$Params)) unless length($CacheFileName);
4976
4977							## At this point we believe we'll either be able to read or write
4978							## the cache file as needed.
4979
4980							## Try to read the cache if both files exist and the mod date is
4981							## later.
4982
4983	68					117	my $Data;
4984	68	100	66			6840	if ((-e $FileName ) &&
			100
4985							(-e $CacheFileName ) &&
4986							(((stat($CacheFileName))[9]) > ((stat($FileName))[9])) )
4987							{
4988	63					457	$this->progress("Thawing $CacheFileName...");
4989							eval
4990	63					333	{
4991	63					355	$Data = &retrieve($CacheFileName);
4992							};
4993	63	50				9820	$this-warn("Cache restore from $CacheFileName failed: $!"), unlink($CacheFileName)
4994							unless defined ($Data);
4995							}
4996
4997	68	100				330	if (ref($Data) eq 'HASH')
4998							{
4999							## Retrieval succeeded.
5000
5001							## Verify that the data in the cache is usable.
5002
5003							## First, check newline-encoding compatibility.
5004							(
5005	63	50				333	$this->warn("Abandoning cache due to incompatible newline encoding"),
5006							unlink $CacheFileName, goto cache_failed) unless $Data->{_Newline} eq "\n";
5007
5008							## Simulate an actual read_file() using data
5009							## from the cache instead of the original file.
5010
5011	63	100	66			425	if ((ref($FieldList) eq 'ARRAY') && @$FieldList)
		100
5012							{
5013							## If $FieldList requests fields not found in the cache,
5014							## we abandon (delete and maybe rewrite) the cache: maybe
5015							## we previously cached a different subset of fields from
5016							## a previous request and so the cache is no longer
5017							## adequate.
5018
5019	18					45	my $MissingFields = [grep {!exists($Data->{$_})} @$FieldList];
	48					145
5020							(
5021							## $this->warn("Abandoning cache due to change in requested field list"),
5022	18	100				95	unlink $CacheFileName, goto cache_failed) if @$MissingFields;
5023
5024							## If there was a _FieldList supplied in $Params or $this,
5025							## we might need to omit any fields read from cache but
5026							## not mentioned (just as read_file() would have done).
5027
5028	16					29	my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
	16					130
5029	16					91	my $OmitFields = [grep {!exists($FieldHash->{$_})} grep {!/^_/} keys %$Data];
	64					145
	176					607
5030
5031	16					62	delete @$Data{@$OmitFields};
5032
5033							## Finally, also pare down _FieldList to only mention
5034							## those fields that were desired....
5035
5036	16					34	my $AvailFields = $Data->{_FieldList};
5037	16					32	$Data->{_FieldList} = [grep {exists($Data->{$_})} @$FieldList];
	42					121
5038
5039							## We might have ended up reading a subset of the
5040							## available fields in the cache. (This is logically
5041							## equivalent to reading a subset of the available fields
5042							## in the real file.) If so, then change the value of
5043							## $Data->{_Subset} to indicate that.
5044
5045	16		100			110	my $GettingSubset = (@$OmitFields && 1 \|\| 0);
5046	16		66			115	$Data->{_Subset} \|\|= $GettingSubset;
5047							}
5048							elsif ($Data->{_Subset})
5049							{
5050
5051							## Conversely, if no field list was specified (hence all
5052							## fields are desired), but a subset of fields has
5053							## previously been cached, we have to delete / abandon the
5054							## cache and re-read the file so we are getting all
5055							## fields.
5056
5057							(
5058							## $this->warn("Abandoning partial cache due to request of full field list"),
5059	1					24	unlink $CacheFileName, goto cache_failed);
5060							}
5061
5062							## Copy all elements from $Data, including possibly overridden
5063							## _FieldList element if any, but excepting the cache-only
5064							## _Newline element, into $this.
5065
5066	60					163	$Data->{_IgnoreQuotes} = $this->{_IgnoreQuotes};
5067	60					154	delete $Data->{_Newline};
5068	60					530	@$this{keys %$Data} = values %$Data;
5069
5070							## If more records were written to the cache than are now
5071							## desired, warn and truncate. This is potentially not good
5072							## because in the reverse case (reading/saving the first time
5073							## with a limit then re-reading with no limit) would not be
5074							## caught.
5075
5076	60	50	33			337	if ($this->{_MaxRecords} && $this->length() > $this->{_MaxRecords})
5077							{
5078	0					0	$this->warn("Truncating length of cached table (@{[$this->length()]}) to requested length (@{[$this->{_MaxRecords}]})");
	0					0
	0					0
5079	0					0	$this->length($this->{_MaxRecords});
5080							}
5081
5082							## Set the file name to the name of the original (not the
5083							## cache) file, just as read() would have done.
5084
5085	60					379	$this->{_FileName} = $FileName;
5086
5087							## Run the "read post-check" -- the same things we do in
5088							## read_file() after completing the read process: Clean out
5089							## _Selection and verify _SortOrder to ensure compatibility
5090							## with current _FieldList.
5091
5092	60					271	$this->read_postcheck();
5093
5094	60					420	$this->progress("Thawed $FileName.");
5095
5096	60					4832	$Success = 1;
5097	60					1379	goto done; ## Successful completion: we read from the cache.
5098							}
5099
5100							## Could not retrieve for whatever reason (maybe cache did not
5101							## exist yet or was out of date or had to be abandoned). So just
5102							## read normally and possibly write the cache.
5103
5104							cache_failed:
5105							{
5106	8	50				15	$Success = $this->read_file(%$Params) or goto done;
	8					44
5107
5108							## Now, having read successfully, we try to write the cache
5109							## for next time. Writing the cache is optional; failing to
5110							## write it is not a failure of the method.
5111
5112							{ ## Code in this block may fail and that's OK.
5113
5114							## First, pre-flight.
5115	8	50				15	$this->warn("Cache file $CacheFileName cannot be created/overwritten: $!"),
	8					30
5116							goto done ## Successful completion.
5117							unless $this->try_file_write($CacheFileName);
5118
5119							## The data to be stored is:
5120
5121							## 1) All data columns read by read_file()
5122							## 2) Any parameters set by read_file()
5123							## 3) _Subset param indicating partial fieldlist was read from file.
5124							## 4) _Newline setting so we know if it's compatabile when read back.
5125
5126							## No other parameters should be cached because we want a
5127							## read from the cache to produce exactly the same result
5128							## as a read from the file itself would have produced.
5129
5130							## After a read, fieldlist() will contain the fields
5131							## actually read, so cols_hash() WILL yield all the
5132							## columns.
5133
5134	8					37	my $Data = {(
5135							## Refs to each column read by read_file()
5136							%{ $this->cols_hash() },
5137
5138							## Other parameters set by read_file()
5139							_FieldList => $this->{_FieldList },
5140							_LineEnding => $this->{_LineEnding},
5141							_FDelimiter => $this->{_FDelimiter},
5142							_HeaderRow => $this->{_HeaderRow },
5143							_IgnoreQuotes => $this->{_IgnoreQuotes},
5144							_Subset => $this->{_Subset },
5145	8					19	_Newline => "\n",
5146							)};
5147
5148	8	50				62	$this->warn("Failed to cache $CacheFileName"),
5149							unlink($CacheFileName),
5150							goto done ## Successful completion.
5151							unless $this->write_cache($Data, $CacheFileName);
5152	8					324	chmod 0666, $CacheFileName; ## Liberal perms if possible.
5153							}
5154
5155	8					83	goto done; ## Successful completion: we read from the file & maybe saved cache.
5156							}
5157
5158							done:
5159	68					581	return ($Success);
5160							}
5161
5162							=pod
5163
5164							=head1 WRITING DATA FILES
5165
5166							## Writing some or all data from table into a data file
5167
5168							$t->write($Path) ## Simple calling convention
5169
5170							$t->write( ## Named-parameter convention
5171
5172							## Params that override params in the object if supplied...
5173
5174							_FileName => $Path, ## "Base path"; see _WriteExtension
5175
5176							_WriteExtension=> ".out",## Insert/append extension to _FileName
5177
5178							_FieldList => [...], ## Fields to write; others ignored
5179							_Selection => [...], ## Record (#s) to write; others ignored
5180
5181							_HeaderRow => 0, ## Include header row in file
5182
5183							_LineEnding => undef, ## Record delimiter (default is "\n")
5184							_FDelimiter => undef, ## Field delimiter (default is comma)
5185
5186							_ReturnMap => 1, ## Whether to encode internal returns
5187							_ReturnEncoding=>"\x0B", ## How to encode returns
5188							_MacRomanMap => undef, ## Whether/when to write Mac char set
5189
5190
5191							_CacheOnWrite => 1, ## Enable saving cache after write()
5192							_CacheExtension=> ".x", ## Extension to add to cache file name
5193							_CacheSubDir => "", ## (Sub-)dir, if any, for cache files
5194
5195							## Params specific to the read()/write() methods...
5196
5197							_MaxRecords => 200, ## Limit on how many records to write
5198							)
5199
5200							$t->write_file() ## Internal: same as write(); ignores cacheing
5201
5202							write() writes a Merge, CSV, or Tab-delimited file.
5203
5204							It uses parameters as described above. Any parameters not supplied
5205							will be gotten from the object.
5206
5207							Using the simple calling convention, just pass it a path which will
5208							override the _FileName parameter in the object, if any.
5209
5210							All other parameters will come from the object (or will be defaulted
5211							if absent).
5212
5213							If no _FileName or path is specified, or it is the special string "-"
5214							(dash), then the file handle \ * STDOUT will be used by default (and
5215							you could redirect it to a file). You can supply any open file handle
5216							or IO::File object of your own for the _FileName parameter.
5217
5218							If write() is writing to a file handle by default or because you
5219							specified one, then no write-cacheing will occur.
5220
5221							To specify additional parameters or override any parameters in the
5222							object while reading, use the named-parameter calling convention.
5223
5224							If the object's data was previously filled in using new() or read(),
5225							then the file format parameters from the previous read() method will
5226							still be in the object, so the format of the written file will
5227							correspond as much as possible to the file that was read().
5228
5229							write() returns the path name of the file actually written, or the
5230							empty string if a supplied file handle or STDOUT was written to, or
5231							undef if there was a failure.
5232
5233							If write() returns undef, then it will also have set the _ErrorMsg
5234							parameter in the object.
5235
5236							write() never modifies any data in the object itself.
5237
5238							Consequently, if you specify a _FieldList or a _Selection, only those
5239							fields or records will be written, but the corresponding parameters in
5240							the object itself will be left untouched.
5241
5242							=head2 How write() calculates the Path
5243
5244							The _FileName parameter is shared with the read() method. This
5245							parameter is set by read() and may be overridden when calling write().
5246
5247							In the base implementation of Data::CTable, write() will try not to
5248							overwrite the same file that was read, which could possibly cause data
5249							loss.
5250
5251							To avoid this, it does not use the _FileName parameter directly.
5252							Instead, it starts with _FileName and inserts or appends the value of
5253							the _WriteExtension parameter (which defaults to ".out") into the file
5254							name before writing.
5255
5256							If the _FileName already has an extension at the end, write() will
5257							place the _WriteExtension BEFORE the final extension; otherwise the
5258							_WriteExtension will be placed at the end of the _FileName.
5259
5260							For example:
5261
5262							Foobar.txt ==> Foobar.out.txt
5263							Foobar.merge.txt ==> Foobar.merge.out.txt
5264							My_Merge_Data ==> My_Merge_Data.out
5265
5266							If you DON'T want write() to add a _WriteExtension to _FileName before
5267							it writes the file, then you must set _WriteExtension to empty/undef
5268							either in the object or when calling write(). Or, you could make a
5269							subclass that initializes _WriteExtension to be empty. If
5270							_WriteExtension is empty, then _FileName will be used exactly, which
5271							may result in overwriting the original data file.
5272
5273							Remember: write() returns the path name it actually used to
5274							successfully write the file. Just as with read(), if the _FileName
5275							you specified did not have a path, then write() will prepend a path
5276							component indicating "current directory" (e.g. "./" on Unix) and this
5277							will be part of the return value.
5278
5279
5280							=head2 Cacheing with write()
5281
5282							By default, Data::CTable only creates a cached version of a file when
5283							it reads that file for the first time (on the assumption that it will
5284							need to read the file again more often than the file's data will
5285							change.)
5286
5287							But by default, it does not create a cached version of a file when
5288							writing it, on the assumption that the current program probably will
5289							not be re-reading the written file and any other program that wants to
5290							read it can cache it at that time.
5291
5292							However, if you want write() to create a cache for its output file, it
5293							is much faster to create it on write() than waiting for the next
5294							read() because the next read() will be able to use the cache the very
5295							first time.
5296
5297							To enable write-cacheing, set _CacheOnWrite to true. Then, after the
5298							write() successfully completes (and only if it does), the cached
5299							version will be written.
5300
5301							=head1 FORMATTED TABLES (Using Data::ShowTable)
5302
5303							## Get formatted data in memory
5304
5305							my $StringRef = $t->format(); ## Format same data as write()
5306							my $StringRef = $t->format(10); ## Limit records to 10
5307							my $StringRef = $t->format(...); ## Specify arbitrary params
5308							print $$StringRef;
5309
5310							## Write formatted table to file or terminal
5311
5312							$t->out($Dest, ....);## $Dest as follows; other params to format()
5313							$t->out($Dest, 10, ....) ## Limit recs to 10; params to format()
5314
5315							$t->out() ## print formatted data to STDOUT
5316							$t->out(\*STDERR) ## print to STDERR (or any named handle)
5317							$t->out("Foo.txt") ## print to any path (file to be overwritten)
5318							$t->out($FileObj) ## print to any object with a print() method
5319
5320							out() takes a first argument specifying a destination for the output,
5321							then passes all other arguments to format() to create a nice-looking
5322							table designed to be human-readable; it takes the resulting buffer and
5323							print()s it to the destination you specified.
5324
5325							Sample output:
5326
5327							+-------+------+-----+-------+
5328							\| First \| Last \| Age \| State \|
5329							+-------+------+-----+-------+
5330							\| Chris \| Zack \| 43 \| CA \|
5331							\| Marco \| Bart \| 22 \| NV \|
5332							\| Pearl \| Muth \| 15 \| HI \|
5333							+-------+------+-----+-------+
5334
5335							(Note extra space character before each line.)
5336
5337							The destination may be a file handle (default if undef is \*STDOUT), a
5338							string (treated as a path to be overwritten), or any object that has a
5339							print() method, especially an object of type IO::File.
5340
5341							The main purpose of out() is to give you a quick way to dump a table
5342							when debugging. out() calls format() to create the output, so read
5343							on...
5344
5345							format() produces a human-readable version of a table, in the form of
5346							a reference to a string buffer (which could be very large), and
5347							returns the buffer to you. Dereference the resulting string reference
5348							before using.
5349
5350							If format() is given one argument, that argument is the _MaxRecords
5351							parameter, which limits the length of the output.
5352
5353							Otherwise, format() takes the following named-parameter arguments,
5354							which can optionally override the corresponding parameters, if any, in
5355							the object:
5356
5357							_FieldList ## Fields to include in table
5358							_Selection ## Records to be included, in order
5359
5360							_SortSpecs ## SortType controls number formatting
5361							_DefaultSortType
5362
5363							_MaxRecords ## Limit number of records output
5364
5365							_MaxWidth ## Limit width of per-col. data in printout
5366
5367							format() will obey _MaxRecords, if you'd like to limit the number of
5368							rows to be output. _MaxRecords can also be a single argument to
5369							format(), or a second argument to out() if no other parameters are
5370							passed.
5371
5372							format() also recognizes the _SortSpecs->{SortType} and
5373							_DefaultSortType parameters to help it determine the data types of the
5374							fields being formatted. Fields of type "Number" are output as
5375							right-justified floats; "Integer" or "Boolean" are output as
5376							right-justified integers, and all others (including the default:
5377							String) are output as left-justified strings.
5378
5379							In addition, there is one parameter uniquely supported by format() and
5380							out():
5381
5382							=over 4
5383
5384							=item _MaxWidth \|\|= 15;
5385
5386							=back
5387
5388							_MaxWidth specifies the maximum width of columns. If unspecifed, this
5389							will be 15; the minimum legal value is 2. Each column may actually
5390							take up 3 more characters than _MaxWidth due to divider characters.
5391
5392							The data to be output will be examined, and only the necessary width
5393							will be used for each column. _MaxWidth just limits the upper bound,
5394							not the lower.
5395
5396							Data values that are too wide to fit in _MaxWidth spaces will be
5397							truncated and the tilde character "~" will appear as the last
5398							character to indicate the truncation.
5399
5400							Data values with internal returns will have the return characters
5401							mapped to slashes for display.
5402
5403							format() and out() will NOT wrap entries onto a second line,
5404							like you may have seen Data::ShowTable::ShowBoxTable do in some cases.
5405							Each record will get exactly one line.
5406
5407							format() and out() ignore the _HeaderRow parameter. A header
5408							row showing the field names is always printed.
5409
5410							format() and out() make no attempt to map upper-ascii characters from
5411							or to any particular dataset. The encoding used in memory (generally
5412							ISO 8859-1 by default) is the encoding used in the output. If you
5413							want to manipulate the encoding, first call format(), then change the
5414							encoding, then format the resulting table.
5415
5416							=cut
5417
5418							sub write ## Write, cacheing afterward if possible
5419							{
5420	8			8	1	55	my $this = shift;
5421	8					41	return($this->write_file_and_cache(@_));
5422							}
5423
5424							sub write_file_and_cache ## Write, cacheing afterward if possible
5425							{
5426	8			8	0	14	my $this = shift;
5427
5428	8	50				39	my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
5429
5430	8					91	my($FieldList, $LineEnding, $FDelimiter, $HeaderRow, $CacheOnWrite, $CacheExtension, $CacheSubDir, $IgnoreQuotes) = map {$this->getparam($Params, $_)}
	64					502
5431							qw($FieldList _LineEnding _FDelimiter _HeaderRow _CacheOnWrite _CacheExtension _CacheSubDir _IgnoreQuotes);
5432
5433							## First write the file and go to done if it failed.
5434	8	50				42	my $WriteFileName = $this->write_file(@_) or goto done;
5435
5436							## Only try to cache if we got a non-empty $WriteFileName back.
5437							## We won't get a name back in the case where we wrote directly to
5438							## an open file handle.
5439
5440	8	50				25	goto done unless $WriteFileName;
5441
5442							## Only try to cache if $CacheOnWrite has been turned ON.
5443	8	50				53	goto done unless $CacheOnWrite;
5444
5445							## Now, having written successfully, we try to write the cache for
5446							## next time. Writing the cache is always optional; failing to
5447							## write it is not a failure of this method. Consequently, any
5448							## "goto done" statements beyond this point will still result in a
5449							## successful outcome since $WriteFileName will have a value.
5450
5451							## Calculate the name of the cache file and fail the directory
5452							## creation fails. prep_cache_file will have generated a warning
5453							## if an attempt to create needed subdirectories has failed.
5454
5455	0	0				0	my $CacheFileName = $this->prep_cache_file($WriteFileName, $CacheExtension, $CacheSubDir)
5456							or goto done;
5457
5458							## Pre-flight the cache file for writing.
5459	0	0				0	$this->warn("Cache file $CacheFileName cannot be created/overwritten: $!"),
5460							goto done ## Successful completion.
5461							unless $this->try_file_write($CacheFileName);
5462
5463							## The data to be stored is:
5464
5465							## 1) All data columns written by write_file()
5466							## 2) Any file format parameters used by write_file()
5467
5468							## Calculate the main writing-related parameters using the same
5469							## logic that write_file() uses...
5470
5471							## Default for FieldList is all fields.
5472	0		0			0	$FieldList \|\|= $this->fieldlist();
5473
5474							## Convert from optional "dos", "mac", "unix" symbolic values.
5475	0					0	$LineEnding = $this->lineending_string($LineEnding);
5476
5477							## Default for LineEnding is "\n" (CR on Mac; LF on Unix; CR/LF on DOS)
5478	0	0				0	$LineEnding = "\n" unless length($LineEnding);
5479
5480							## Default for FDelimiter is comma
5481	0	0				0	$FDelimiter = ',' unless length($FDelimiter);
5482
5483							## Default for HeaderRow is true.
5484	0	0				0	$HeaderRow = 1 unless defined($HeaderRow);
5485
5486							## No other parameters should be cached because we want a
5487							## read from the cache to produce exactly the same result
5488							## as a read from the file itself would have produced.
5489
5490	0					0	my $Data = {(
5491							## Refs to each column written
5492	0		0			0	%{ $this->cols_hash($FieldList)},
5493
5494							## Other relevant file-format parameters
5495							_FieldList => $FieldList,
5496							_LineEnding => $LineEnding,
5497							_FDelimiter => $FDelimiter,
5498							_HeaderRow => $HeaderRow,
5499							_Subset => $this->{_Subset} \|\| 0,
5500							_Newline => "\n",
5501
5502							## We don't need to save _ReturnMap and
5503							## _ReturnEncoding because those only are relevant
5504							## when reading physical files. Cached data has the
5505							## return chars already encoded as returns.
5506
5507							)};
5508
5509	0	0				0	$this->warn("Failed to cache $CacheFileName"),
5510							unlink($CacheFileName), ## Delete cache if failure
5511							goto done ## Successful completion.
5512							unless $this->write_cache($Data, $CacheFileName);
5513	0					0	chmod 0666, $CacheFileName; ## Liberal perms if possible.
5514
5515	8					41	done:
5516							return($WriteFileName);
5517							}
5518
5519							sub write_cache
5520							{
5521	8			8	0	13	my $this = shift;
5522	8					16	my ($Data, $CacheFileName) = @_;
5523
5524	8					37	$this->progress("Storing $CacheFileName...");
5525
5526	8					12	my $Success;
5527							eval
5528	8					17	{
5529	8					46	$Success = nstore($Data, $CacheFileName);
5530							};
5531
5532	8	50				2316	$this->progress("Stored $CacheFileName.") if $Success;
5533
5534	8					30	done:
5535							return($Success);
5536							}
5537
5538							sub write_file ## Just write; don't worry about cacheing
5539							{
5540	8			8	0	13	my $this = shift;
5541	8	50				41	my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
5542
5543	96					166	my($FileName, $FieldList, $Selection, $MaxRecords, $LineEnding, $FDelimiter, $QuoteFields, $ReturnMap, $ReturnEncoding, $MacRomanMap, $HeaderRow, $WriteExtension)
5544	8					99	= map {$this->getparam($Params, $_)}
5545							qw(_FileName _FieldList _Selection _MaxRecords _LineEnding _FDelimiter _QuoteFields _ReturnMap _ReturnEncoding _MacRomanMap _HeaderRow _WriteExtension);
5546
5547	8					22	my $Success;
5548
5549	8					22	$this->{_ErrorMsg} = "";
5550
5551							## if FileName is unspecified, or is the single character "-",
5552							## then default to STDOUT.
5553	8	50				50	$FileName = \ *STDOUT if ($FileName =~ /^-?$/);
5554
5555							## If we have a regular file handle, bless it into IO::File.
5556	8	50				24	$FileName = bless ($FileName, 'IO::File') if ref($FileName) =~ /(HANDLE)\|(GLOB)/;
5557
5558							## If we have a file handle either passed or constructed, make note of that fact.
5559	8					26	my $GotHandle = ref($FileName) eq 'IO::File';
5560
5561	8	50	33			49	$this->{_ErrorMsg} = "FileName must be specified for write()", goto done
5562							unless $GotHandle or length($FileName);
5563
5564							## Default for FieldList is all fields.
5565	8		33			111	$FieldList \|\|= $this->fieldlist();
5566
5567							## Default for Selection is all records.
5568	8		33			18258	$Selection \|\|= $this->selection();
5569
5570							## Default for MaxRecords is 0 (meaning write all records)
5571	8	50				37	$MaxRecords = 0 unless (int($MaxRecords) == $MaxRecords);
5572
5573							## Convert from optional "dos", "mac", "unix" symbolic values.
5574	8					30	$LineEnding = $this->lineending_string($LineEnding);
5575
5576							## Default for LineEnding is "\n" (CR on Mac; LF on Unix; CR/LF on DOS)
5577	8	50				39	$LineEnding = "\n" unless length($LineEnding);
5578
5579							## Default for FDelimiter is comma
5580	8	50				22	$FDelimiter = ',' unless length($FDelimiter);
5581
5582							## Default for QuoteFields is undef (auto)
5583	8	50				27	$QuoteFields = undef unless defined($QuoteFields);
5584
5585							## "QuoteCheck" mode means check each field -- this is the "auto"
5586							## mode that kicks in when _QuoteFields is undef.
5587	8					19	my $QuoteCheck = (!defined($QuoteFields));
5588
5589							## Default for ReturnMap is true.
5590	8	50				24	$ReturnMap = 1 unless defined($ReturnMap);
5591
5592							## Default for MacRomanMap is undef ("Auto");
5593	8	50				20	$MacRomanMap = undef unless defined($MacRomanMap);
5594
5595							## DoMacMapping is the actual setting for auto charset mapping
5596	8		66			60	my $DoMacMapping =
5597							((!defined($MacRomanMap) && ($LineEnding eq "\x0D")) \|\| ## Auto
5598							($MacRomanMap)); ## On
5599
5600	8	100				30	$this->progress("Will convert upper-ascii characters if any, from ISO-8859-1 to Mac Roman.") if $DoMacMapping;
5601
5602							## Default for ReturnEncoding is "\x0B" (control-K; ASCII 11)
5603	8	50				40	$ReturnEncoding = "\x0B" unless length($ReturnEncoding);
5604
5605							## Default for HeaderRow is true.
5606	8	50				22	$HeaderRow = 1 unless defined($HeaderRow);
5607
5608							## Default for $WriteExtension is "" (none) -- meaning use exact $FileName
5609	8	50				22	$WriteExtension = "" unless defined($WriteExtension);
5610
5611							## Get a hash of fields actually present...
5612
5613	8					30	my $AllFields = $this->fieldlist_all();
5614	8					18	my $AllFieldsHash = {}; @$AllFieldsHash{@$AllFields} = undef;
	8					33
5615
5616							## Cull $FieldList to only include fields we have...
5617
5618	8					16	$FieldList = [grep {exists($AllFieldsHash->{$_})} @$FieldList];
	33					328
5619
5620							## Ensure $Selection contains only valid record numbers...
5621
5622	8					36	$Selection = $this->selection_validate_internal($Selection);
5623
5624							## Get an ordered list of the columns.
5625	8					33	my $Columns = [@$this{@$FieldList}];
5626
5627							## Calculate the name of the file we'll write to, if any, and get
5628							## the file handle either from the one we were given or by opening
5629							## the specified file for writing.
5630
5631	8					15	my $WriteFileName;
5632							my $OutFile;
5633
5634	8	50				31	if ($GotHandle)
5635							{
5636	0					0	$WriteFileName = "";
5637	0					0	$OutFile = $FileName; ## Actually a handle.
5638							}
5639							else
5640							{
5641							## Calculate the name of the file we're going to try to write to.
5642	8	50				53	$WriteFileName =
5643
5644							## Use file name exactly if specified in write() call.
5645							($Params->{_FileName} ? $FileName :
5646
5647							## Otherwise, calculate the name by adding/appending the _WriteExtension.
5648							$this->write_file_name($FileName, $WriteExtension));
5649
5650							## Ensure the directory that will hold the file actually exists.
5651	29			29		367	use File::Basename qw(fileparse);
	29					69
	29					8520
5652	8					197	my ($Basename, $Path, $Ext) = fileparse($WriteFileName, '\.[^\.]+');
5653	8					34	my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
	8					29
5654	8		33			30	$Path \|\|= $Cur; ## Once again, default $Path to cwd just in case.
5655
5656	8	50				30	$this->{_ErrorMsg} = "Can't make directory $Path to save $WriteFileName: $!", goto done
5657							unless $this->verify_or_create_path($Path, $Sep);
5658
5659							## Ensure the directory is writeable and the file is overwriteable
5660							## if it exists.
5661	8	50				177	$this->{_ErrorMsg} = "Directory $Path is not writeable.", goto done
5662							unless (-w $Path);
5663
5664	8	50	33			679	$this->{_ErrorMsg} = "File $WriteFileName cannot be overwritten.", goto done
5665							if (-e $WriteFileName && !(-w $WriteFileName));
5666
5667							## Open the file for write.
5668
5669	29			29		231	use IO::File;
	29					72
	29					52340
5670	8					85	$OutFile = IO::File->new(">$WriteFileName");
5671	8	50				3103	$this->{_ErrorMsg} = "Failed to open $WriteFileName for writing: $!", goto done
5672							unless $OutFile;
5673
5674	8					47	$this->progress("Writing $WriteFileName...");
5675							}
5676
5677							## Figure out the line initiator & ender strings, and delimiter sequence.
5678
5679	8					16	my ($LineStartQuote, $Delim, $LineEndQuote);
5680	8	50				31	if ($QuoteFields)
5681							{
5682							## In "forced" quote mode, we just do the quoting by always
5683							## putting them at the start and end of lines and in between
5684							## each field.
5685
5686	0					0	$LineStartQuote = "\"";
5687	0					0	$LineEndQuote = "\"";
5688	0					0	$Delim = "\"$FDelimiter\"";
5689							}
5690							else
5691							{
5692							## In no-quote or auto-quote mode, we don't put the quotes in
5693							## these places; they'll either be omitted entirely or
5694							## inserted per-field.
5695
5696	8					27	$LineStartQuote = '';
5697	8					12	$LineEndQuote = '';
5698	8					15	$Delim = $FDelimiter;
5699							}
5700
5701							## Precompile a regex that checks for quotes, the field delimiter
5702							## sequence, the line ending sequence, or any line-ending-ish
5703							## characters at all. In $QuoteCheck mode, we'll use this to
5704							## identify fields needing double-quotes.
5705
5706	8	50				329	my $QuoteOrDelimCheck = qr{(?:\Q$LineEnding\E)\|(?:\Q$FDelimiter\E)\|[\"\x0D\x0A]} if $QuoteCheck;
5707
5708							## Precompile a return-map-checking regex that checks first for
5709							## the line ending actively in use and then for the platform's
5710							## "\n". These may be the same, but are not necessarily always
5711							## the same. We don't return-map any old \x0D or \x0A because,
5712							## for example in a pure Unix world, \x0D would not be interpreted
5713							## as line-ending-related and hence is a valid character in a
5714							## field.
5715
5716	8	50				137	my $ReturnMapCheck = qr{(?:\Q$LineEnding\E)\|(?:\n)} if $ReturnMap;
5717
5718							## Print out the header row that lists the field names in order.
5719
5720	8	50				29	if ($HeaderRow)
5721							{
5722	33					58	my $Line = ($LineStartQuote .
5723							join($Delim,
5724							map
5725							{
5726							## Quote any " character as ""
5727	8					21	(my $X = $_) =~ s/\"/\"\"/g;
5728
5729							## In QuoteCheck mode _QuoteFields =>
5730							## undef ("auto"): put quotes around
5731							## field only if required.
5732
5733	33	50	33			256	$X = "\"$X\"" if $QuoteCheck && $X =~ $QuoteOrDelimCheck;
5734
5735							## Convert returns back to \x0B
5736	33	50				140	$X =~ s/$ReturnMapCheck/$ReturnEncoding/g if $ReturnMap;
5737
5738	33					92	$X;
5739							} @$FieldList) .
5740							$LineEndQuote .
5741							$LineEnding);
5742
5743							## Maybe convert entire line (all records) ISO to Mac before writing it.
5744	8	100				48	&ISORoman8859_1ToMacRoman(\ $Line) if $DoMacMapping;
5745
5746	8	50				74	$OutFile->print($Line) if $HeaderRow;
5747							}
5748
5749							## Print out each row (record). Fields are output in $FieldList
5750							## order (same order as they were in the header row, if any).
5751							## Records are printed in the order specified in $Selection.
5752
5753	8					143	my $WroteProg;
5754	8					18	my $TotalLen = @$Selection+0;
5755	8	50				23	my $RecordsToWrite = ($MaxRecords ? min($MaxRecords, $TotalLen) : $TotalLen);
5756	8					14	my $RecordsWritten = 0;
5757
5758	8					20	foreach my $i (@$Selection)
5759							{
5760	99					321	my $Line = ($LineStartQuote .
5761							join($Delim,
5762							map
5763							{
5764							## Quote any " character as ""
5765	24					52	(my $X = $_->[$i]) =~ s/\"/\"\"/g;
5766
5767							## In QuoteCheck mode _QuoteFields =>
5768							## undef ("auto"): put quotes around
5769							## field only if required.
5770
5771	99	50	33			645	$X = "\"$X\"" if $QuoteCheck && $X =~ $QuoteOrDelimCheck;
5772
5773							## Convert returns back to \x0B
5774	99	50				371	$X =~ s/$ReturnMapCheck/$ReturnEncoding/g if $ReturnMap;
5775
5776	99					219	$X;
5777							} @$Columns) .
5778							$LineEndQuote .
5779							$LineEnding);
5780
5781							## Maybe convert entire line (all records) ISO to Mac before writing it.
5782	24	100				78	&ISORoman8859_1ToMacRoman(\ $Line) if $DoMacMapping;
5783
5784	24					93	$OutFile->print($Line);
5785
5786	24					120	$RecordsWritten++;
5787
5788							## Try doing timed (throttled to 1 per 2 secs) progress at
5789							## most every 100th record.
5790	24	50				193	my $Did = $this->progress_timed("Writing", $RecordsWritten, $RecordsWritten, $RecordsToWrite, 1)
5791							if (($RecordsWritten % 100) == 0);
5792	24		33			98	$WroteProg \|\|= $Did;
5793
5794							## Stop if we have written all the records we wanted.
5795	24	100				77	last if ($RecordsWritten >= $RecordsToWrite);
5796							}
5797
5798							## If we wrote timed progress but didn't get to give the 100%
5799							## message yet, print the 100% message now.
5800	8	50				26	if ($WroteProg)
5801							{
5802	0	0				0	my $FinalProg = $this->progress_timed("Writing", $RecordsWritten, $RecordsWritten, $RecordsToWrite, 1)
5803							unless (($RecordsWritten % 100) == 0);
5804							}
5805
5806							## Print the regular Done message.
5807	8	50				20	if ($GotHandle)
5808							{
5809	0					0	$this->progress("Done writing.");
5810	0					0	$Success = 1;
5811							}
5812							else
5813							{
5814	8					34	$this->progress("Wrote $WriteFileName.");
5815							}
5816
5817	8	50				27	if (!$GotHandle)
5818							{
5819							## Close the file and check the exit code.
5820	8					58	$OutFile->close();
5821	8					739	$Success = (($?>>8) == 0);
5822
5823	8	50				31	$this->{_ErrorMsg} = "Unexpected failure writing $WriteFileName ($!)", goto done
5824							unless $Success;
5825							}
5826
5827							done:
5828	8	50				23	$this->warn("FAILURE: $this->{_ErrorMsg}") unless $Success;
5829
5830	8	50				191	return($Success ? $WriteFileName : undef);
5831							}
5832
5833							sub write_file_name ## Calculate the name of the file to be written.
5834							{
5835	8			8	0	14	my $this = shift;
5836	8					17	my ($FileName, $WriteExtension) = @_;
5837
5838							## Break the path into its parts...
5839	29			29		259	use File::Basename qw(fileparse);
	29					69
	29					225715
5840	8					416	my ($Basename, $Path, $Ext) = fileparse($FileName, '\.[^\.]+');
5841
5842							## If no directory is explicitly named, set $Path to be the
5843							## implicit "current directory" (e.g. "./")
5844
5845	8					18	my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
	8					34
5846	8		33			31	$Path \|\|= $Cur;
5847
5848							## If $WriteExtension is empty, which is allowed, then the result
5849							## will be the same as $FileName, which could in some cases result
5850							## in the overwriting of the same file that was read in (and may
5851							## be what is intended).
5852
5853	8					26	my $WriteFileName = "$Path$Basename$WriteExtension$Ext";
5854
5855	8					22	return($WriteFileName);
5856							}
5857
5858							sub out
5859							{
5860	0			0	0	0	my $this = shift;
5861	0					0	my $Dest = shift;
5862
5863	0					0	my $Success;
5864
5865							## First do the formatting (or fail) -- format() will warn if needed.
5866	0	0				0	my $Data = $this->format(@_) or goto done;
5867
5868							## If $Dest is empty or not defined, use STDOUT.
5869	0		0			0	$Dest \|\|= \*STDOUT;
5870
5871							## If given an IO handle (such as \*STDERR), bless and use it.
5872	0	0				0	if (ref($Dest) eq 'HANDLE') {$Dest = bless($Dest , 'IO::File')};
	0					0
5873
5874							## If it is not an object, treat it as a file name to be opened.
5875	0	0	0			0	if (!ref($Dest)) {$Dest = (IO::File->new(">$Dest") or
	0					0
5876							$this->warn("Can't open file $Dest: $!"), goto done)};
5877
5878							## At this point treat $Dest as an object with a print method and
5879							## complain if the print method doesn't return a true value.
5880
5881	0	0				0	$Dest->print($$Data) or $this->warn("Had trouble writing file: $!"), goto done;
5882
5883	0					0	$Success = 1;
5884	0					0	done:
5885							return($Success);
5886							}
5887
5888							sub format ## use Data::ShowTable to format the table in a pretty way.
5889							{
5890	1			1	1	9	my $this = shift;
5891	1	50				5	my $Params = (@_ == 1 ? {_MaxRecords => $_[0]} : {@_});
5892
5893	1					2	my($Selection, $FieldList, $SortSpecs, $DefaultSortType, $MaxRecords, $MaxWidth) = map {$this->getparam($Params, $_)}
	6					14
5894							qw(_Selection _FieldList _SortSpecs _DefaultSortType _MaxRecords _MaxWidth);
5895
5896							## This method relies on Data::ShowTable.
5897	1	50				6	$this->warn("@{[__PACKAGE__]}::show() requires optional Data::ShowTable module."), goto done
	1					9
5898							unless $HaveShowTable;
5899
5900	0		0			0	$FieldList \|\|= $this->fieldlist();
5901	0		0			0	$Selection \|\|= $this->selection();
5902	0		0			0	$SortSpecs \|\|= {};
5903	0	0				0	$DefaultSortType = 'String' unless (length($DefaultSortType));
5904	0		0			0	$MaxRecords \|\|= 0; ## Default is no maximum (all records).
5905	0		0			0	$MaxWidth \|\|= 15; ## Zero or undef means use default.
5906	0					0	$MaxWidth = max(2, $MaxWidth); ## MaxWidth must not be less than 2
5907
5908	0					0	my $TypeMap = {qw(string char
5909							text text
5910							integer int
5911							number numeric
5912							boolean int)};
5913
5914	0	0	0			0	my $Types = [map {$TypeMap->{lc(@{$SortSpecs->{$_} \|\| {}}{SortType} \|\|
	0					0
5915							$DefaultSortType) } \|\| 'string'}
5916							@$FieldList];
5917
5918	0					0	my $TotalLen = @$Selection+0;
5919	0	0				0	my $RecordsToWrite = ($MaxRecords ? min($MaxRecords, $TotalLen) : $TotalLen);
5920
5921							## The row-yielder subroutine and its private state variables.
5922	0					0	my $SelNum = 0;
5923							my $RowSub = sub ## A closure over the local vars in this subroutine.
5924							{
5925							## We might be asked to rewind.
5926	0			0		0	my ($Rewind) = @_;
5927	0	0				0	$SelNum = 0, return(1) if ($Rewind);
5928
5929							## Done if we've written all rows.
5930	0	0				0	return() if $SelNum >= $RecordsToWrite;
5931
5932							## Otherwise, yield a row if we still can.
5933	0	0				0	my $List = [map
5934							{
5935							## Truncate as needed.
5936	0					0	my $X = (length > $MaxWidth ?
5937							(substr($_, 0, ($MaxWidth - 1)) . '>') : $_);
5938
5939							## Encode returns, tabs as carets.
5940	0					0	$X =~ s{(?:\x0D\x0A)\|[\x0D\x0A\x09]}{^}g;
5941
5942	0					0	$X;
5943							}
5944	0					0	@{$this->row_list($Selection->[$SelNum++], $FieldList)}];
5945
5946	0					0	return(@$List);
5947	0					0	};
5948
5949							## Locally replace put() and out() in Data::ShowTable so we can
5950							## gather the data into memory instead of having it go right out
5951							## to STDOUT before we may want it to.
5952
5953							## Too bad Data::ShowTable is not a subclassable object instead.
5954
5955	0					0	my $Data = [""]; ## Array to hold output from out() and put()
5956							{
5957	0					0	local *{Data::ShowTable::out} = sub ## See sub out in ShowTable.pm
5958							{
5959	0			0		0	my $fmt = shift;
5960	0	0				0	$fmt .= "\n" unless $fmt =~ /\n$/;
5961	0					0	$Data->[-1] .= sprintf($fmt, @_);
5962	0					0	push @$Data, "";
5963	0					0	};
5964
5965							local *{Data::ShowTable::put} = sub ## See sub put in ShowTable.pm
5966							{
5967	0			0		0	my $fmt = shift();
5968	0					0	$Data->[-1] .= sprintf($fmt, @_);
5969	0					0	};
5970
5971	0					0	&ShowBoxTable({titles => $FieldList,
5972							types => $Types,
5973							row_sub => $RowSub,
5974							widths => [], ## Will calculate from the data
5975							});
5976							}
5977
5978							## Remove spurious extra newline entries at end of $Data
5979	0	0				0	pop @$Data if $Data->[-1] =~ /^\s*$/s;
5980	0	0				0	pop @$Data if $Data->[-1] =~ /^\s*$/s;
5981
5982	0					0	my $Formatted = join("", @$Data);
5983
5984	1					5	done:
5985							return(\ $Formatted);
5986							}
5987
5988
5989							=pod
5990
5991							=head1 APPENDING / MERGING / JOINING TABLES
5992
5993							## Append all records from a second table
5994
5995							$t->append($Other) ## Append records from $Other
5996							$t->append_file($File, $Params) ## Append from new($Params, $File)
5997							$t->append_files($Files, $Params) ## Call append_file for all files
5998							$t->append_files_new($Files, $Params) ## Internal helper routine
5999
6000							## Combine all fields from a second table
6001
6002							$t->combine($Other) ## Combine fields from $Other
6003							$t->combine_file($File, $Params) ## Combine new($Params, $File)
6004							$t->combine_files($Files, $Params) ## combine_file on each file
6005
6006							## Left-join records from a second table (lookup field vals)
6007
6008							$t->join ($Other, $KeyField1, [$KeyField2, $Fields])
6009							$t->join_file ($File, $Params, $KeyField1, [$KeyField2, $Fields])
6010							$t->join_files($Files, $Params, $KeyField1, [$KeyField2, $Fields])
6011
6012							The append() method concatenates all the records from two CTable
6013							objects together -- even if the two tables didn't start out with
6014							exactly the same fields (or even any of the same fields).
6015
6016							It takes all the data records from another CTable object and appends
6017							them into the present table. Any columns present in the $Other table
6018							but not in the first table, are created (and the corresponding field
6019							values in the first table will all be empty/undef). Similarly, any
6020							columns present in $t but not present in $Other will be extended
6021							to the correct new length as necessary and the field values in the
6022							original columns will be empty/undef. Columns present in both will,
6023							of course, have all the data from both the original sets of data.
6024
6025							All data from the second table is brought into the first one. No
6026							attempt whatsoever is made to eliminate any duplicate records that
6027							might result.
6028
6029							The number of records (length()) after this call is the sum of the
6030							length() of each of the tables before the operation.
6031
6032							IMPORTANT NOTE: The data from the $Other object is COPIED in memory
6033							into the new object. This could be hard on memory if $Other is big.
6034							Might want to be sure to discard $Other when you're done with it.
6035
6036							$Other is left untouched by the operation.
6037
6038							All columns from both tables are combined whether or not they are
6039							mentioned in the custom field list of either.
6040
6041							The custom field lists, if present in either table object, are
6042							concatenated into this object's custom field list, but with
6043							duplications eliminated, and order retained.
6044
6045							Any existing custom selections, custom sort order, sort specs, and/or
6046							sort routines are also combined appropriately, with settings from this
6047							object taking precedence over those from $Other anywhere the two have
6048							conflicting settings.
6049
6050							append_file() takes a file name and optional $Params hash. It uses
6051							those to create a new() object with data read from the file. Then,
6052							the new table is appended to $t using append() and then the new table
6053							is discarded.
6054
6055							append_files() is a convenience function that calls append_file() on
6056							each file in a list, using the same optional $Params for each.
6057
6058							append_files_new() is the internal routine that implements the
6059							processing done by new() on the optional list of files to be read. It
6060							does the following: It calls read() on the first file in the list.
6061							Then, it calls append_files() to read the remaining into their own
6062							new() objects of the same class as $t and using the same $Params to
6063							new() (if any were supplied). Then each of these is append()-ed in
6064							turn to $t and discarded. The final result will be that $t will hold
6065							a concatenation of all the data in all the files mentioned. However,
6066							consistent with the behavior of append(), the _FileName parameter and
6067							other read()-controlled settings will correspond to the first file
6068							read. The intermediate objects are discarded.
6069
6070							NOTE: As with new() and read(), if a non-empty _FieldList Param is
6071							specified, the read() methods called internally by the append_file*()
6072							methods will read only the fields mentioned and will ignore any other
6073							fields in the files.
6074
6075							=head2 Combining tables
6076
6077							combine() adds columns from a second table into the current one.
6078
6079							CAUTION: You should only use combine() when you have two tables where
6080							all the (possibly selected) records in the second table line up
6081							perfectly with all the (unselected) records in the first table -- in
6082							other words, each table before combine() should contain a few of the
6083							columns of the new table -- for example, maybe one table contains a
6084							column of file names, and the other contains columns of corresponding
6085							file sizes and modification times. If you don't understand the
6086							consequences of combine, don't use it or you could end up with some
6087							records whose field values don't refer to the same object. (Maybe you
6088							meant to use append() or join() instead.)
6089
6090							If the second table has a custom field list, only those columns are
6091							brought in.
6092
6093							If any column in the second table has the same name as one in the
6094							current table, the incoming column replaces the one by the same name.
6095
6096							All columns are COPIED from the second table, so the first table owns
6097							the new data exclusively.
6098
6099							If the second table has a selection, only those records are copied, in
6100							selection order. (select_all() first if that's not what you want.)
6101
6102							The selection in the first table, if any, is ignored during the
6103							combine. If this isn't what you want, then consider using cull()
6104							before combine().
6105
6106							Field list and sort order are concatenated (but retaining uniqueness:
6107							second mentions of a field in the combined lists are omitted).
6108
6109							Custom sort routines and sort specs are combined, with those in the
6110							first table taking precedence over any copied in with the same name.
6111
6112							The custom _Selection from the first table, if any, is retained. (It
6113							will initially omit any records added by extend()).
6114
6115							All other parameters from the first table are retained, and from the
6116							second table are ignored.
6117
6118							combine() calls extend() after combining to ensure that all columns
6119							have the same length: if either the older or newer columns were
6120							shorter, they will all be set to the length of the longest columns in
6121							the table -- creating some empty field values at the end of the
6122							lengthened columns.
6123
6124							combine_file() does the same as combine() except starting with a file
6125							name, first creating the $Other object by creating it using
6126							new($Params, $File), then discarding it after combining.
6127
6128							=head2 Joining tables (Looking up data from another table)
6129
6130							join() looks up field values from a second table, based on common
6131							values in key fields which may have different or the same names in
6132							each table. It adds columns to the current table if necessary to hold
6133							any new field values that must be brought in.
6134
6135							join() never adds any new or non-matching records to the table:
6136							records where the lookup fails will simply have empty/undef values in
6137							the corresponding columns.
6138
6139							## Example:
6140
6141							$t->join ($People, 'FullName', 'FirstAndLast'); ## or
6142							$t->join_file("People.txt", {}, 'FullName', 'FirstAndLast');
6143
6144
6145							Here's how join() calculates the list of fields to bring in:
6146
6147							- Legal field names from the optional $Fields list, if supplied
6148							- Otherwise, the fieldlist() from second table
6149							- ... minus any fields with same name as $KeyField1 or $KeyField2
6150
6151							Join starts by adding new empty columns in the first table for any
6152							field to be brought in from the second but not yet present in the
6153							first.
6154
6155							Here's how join() calculates the records eligible for lookup:
6156
6157							- Join only modifies the selected records in the first table
6158							- Join only looks up values from selected records in second table
6159
6160							(If you want all records to be used in both or either table, call the
6161							table's select_all() method before calling join().)
6162
6163							Then, for every selected record in $t (using the example above), join
6164							examines the FullName field ($KeyField1), and looks up a corresponding
6165							entry (must be 'eq') in the FirstAndLast field ($KeyField2) in the
6166							second table.
6167
6168							IMPORTANT NOTE ABOUT KEY LENGTH: To speed lookup, hash-based indices
6169							are made. The strings in $Key1 and $Key2 fields should not be so long
6170							that the hash lookups bog down or things could get ugly fast. There
6171							is no fixed limit to hash key length in Perl, but fewer than 128
6172							characters in length is longer than customary for such things. (Many
6173							systems require text-based keys to be no longer than 31 characters.)
6174							So be judicious about the values in $Key1 and $Key2 fields.
6175
6176							The first record found in the second table's selection with a matching
6177							value in the key field is then copied over (but only the appropriate
6178							fields are copied, as explained above). Any field values being
6179							brought over will REPLACE corresponding field values in the first
6180							table, possibly overwriting any previous values if the field being
6181							looked up was already present in the first table and contained data.
6182
6183							The first table's _FieldList is updated to reflect new fields added.
6184
6185							Its _Selection is untouched.
6186
6187							Its _SortOrder is untouched.
6188
6189							Its _SortSpecs are augmented to include any entries from the second
6190							table that should be brought over due to the field additions.
6191
6192							Its _SRoutines are augmented to add new ones from the second table.
6193
6194							All other parameters of table 1 are untouched.
6195
6196							The second table is not modified. No data structures will be shared
6197							between the tables. Data is only copied.
6198
6199							join_file() calls join() after creating a seond table from your $File.
6200
6201							join_files() calls join_file() repeatedly for each file in a list, but
6202							it is important to note that each file in the list of files to be
6203							joined must have a $Key2 field -- AND, that any values looked up from
6204							the second file will overwrite any values of the same key found in the
6205							first file, and so on. You probably will not ever need join_files().
6206							It is mainly here for completeness.
6207
6208							=cut
6209
6210							{}; ## Get emacs to indent correctly.
6211
6212							sub append ## ($this, $OtherCTable)
6213							{
6214	10			10	0	44	my $this = shift;
6215	10					14	my ($that) = @_;
6216
6217	10					10	my $Success;
6218
6219							## Get all fields in $this, but only selected ones in $that
6220	10					26	my $ThisFieldsAll = $this->fieldlist_all();
6221	10					25	my $ThatFields = $that->fieldlist();
6222
6223							## Figure out how many data fields in each of the tables.
6224	10					20	my $ThisFieldCount = @$ThisFieldsAll+0;
6225	10					18	my $ThatFieldCount = @$ThatFields +0;
6226
6227							## We're going to bring over only the selected records in $that.
6228	10					24	my $ThisSel = $this->selection();
6229	10					27	my $ThatSel = $that->selection();
6230
6231							## Figure out how many records there were to start with.
6232	10					23	my $ThisLength = $this->length();
6233	10					17	my $ThatLength = @$ThatSel+0;
6234
6235							## New record count is sum of the other two.
6236	10					16	my $NewLength = $ThisLength + $ThatLength;
6237
6238							## Create any missing columns not yet present in $this and,
6239							## whether new or not, presize all vectors to the new length,
6240							## which will create empty/undef entries as necessary.
6241
6242	10		50			26	foreach (@$ThisFieldsAll, @$ThatFields) {$#{$this->{$_} \|\|= []} = ($NewLength - 1)};
	72					83
	72					270
6243
6244							## Then copy the field data from the second table into the already
6245							## pre-sized columns in this one.
6246
6247	10					23	foreach my $FieldName (@$ThatFields)
6248							{
6249	34					47	my $NewVector = $this->{$FieldName};
6250	34					679	my $OldVector = $that->sel_get($FieldName, $ThatSel);
6251
6252	34					75	foreach my $RecordNum (0..$#$OldVector)
	86					227
6253							{($NewVector->[$ThisLength + $RecordNum] =
6254							$OldVector->[ $RecordNum])};
6255							}
6256
6257							## Now all the data columns have been combined. We just have to
6258							## combine any custom metadata.
6259
6260							## If either table had a custom fieldlist, then make a new custom
6261							## field list which is the result of concatenating both field
6262							## lists together, without duplicates, and of course preserving
6263							## the original order as completely as possible (with the order
6264							## given in the first table taking precedence).
6265
6266	10	50	33			42	if (defined($this->{_FieldList}) \|\|
6267							defined($that->{_FieldList}))
6268
6269							{
6270	10					26	my $ThisFields = $this->fieldlist();
6271	10					25	my $ThatFields = $that->fieldlist();
6272
6273							## Make a hash mapping field names from both tables to the
6274							## order they should appear
6275
6276	10					16	my $FieldOrderHash = {};
6277	10		66			25	foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} \|\|= (keys %$FieldOrderHash) + 1};
	69					285
6278
6279	10					42	my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
	44					82
6280
6281	10					41	$this->{_FieldList} = $FieldList;
6282							}
6283
6284							## If either table had a custom sortorder, then make a new custom
6285							## sort order which is the result of concatenating both orders
6286							## together, without duplicates, and of course preserving the
6287							## original order as completely as possible (with the order given
6288							## in the first table taking precedence).
6289
6290	10	50	33			59	if (defined($this->{_SortOrder}) \|\|
6291							defined($that->{_SortOrder}))
6292							{
6293	0					0	my $ThisOrder = $this->sortorder();
6294	0					0	my $ThatOrder = $that->sortorder();
6295
6296							## Make a hash mapping field names from both lists to the
6297							## order they should appear
6298
6299	0					0	my $OrderHash = {};
6300	0		0			0	foreach (@$ThisOrder, @$ThatOrder) {$OrderHash->{$_} \|\|= (keys %$OrderHash) + 1};
	0					0
6301
6302	0					0	my $OrderList = [sort {$OrderHash->{$a} <=> $OrderHash->{$b}} keys %$OrderHash];
	0					0
6303
6304	0					0	$this->{_SortOrder} = $OrderList;
6305							}
6306
6307							## If either table had a custom selection, then create a new
6308							## selection which is the concatenation of the two selections.
6309
6310	10	100	100			3393	if (defined($this->{_Selection}) \|\|
6311							defined($that->{_Selection}))
6312							{
6313	3					10	$this->{_Selection} = [@$ThisSel, ## Original selected records...
6314							## Plus an adjusted entry for newly-added ones.
6315							($ThisLength .. ($ThisLength + @$ThatSel - 1))
6316							];
6317							}
6318
6319							## If either table had custom sortspecs, then create a new
6320							## sortspecs hash by starting with all the entries from the other
6321							## table and adding/overwriting with those from this table.
6322
6323	10	50	33			64	if (defined($this->{_SortSpecs}) \|\|
6324							defined($that->{_SortSpecs}))
6325							{
6326	10					39	my $ThisSpecs = $this->sortspecs();
6327	10					25	my $ThatSpecs = $that->sortspecs();
6328
6329	10					87	$this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6330							}
6331
6332							## If either table had custom sortroutines, then create a new
6333							## sortroutines hash by starting with all the entries from the
6334							## other table and adding/overwriting with those from this table.
6335
6336	10	50	33			44	if (defined($this->{_SRoutines}) \|\|
6337							defined($that->{_SRoutines}))
6338							{
6339	10		50			31	my $ThisRoutines = $this->{_SRoutines} \|\| {};
6340	10		50			26	my $ThatRoutines = $that->{_SRoutines} \|\| {};
6341
6342	10					29	$this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6343							}
6344
6345							## All other settings are kept from $this and those from $that are
6346							## ignored.
6347
6348	10					17	$Success = 1;
6349	10					48	done:
6350							return($Success);
6351							}
6352
6353							sub append_file
6354							{
6355	2			2	0	3	my $this = shift;
6356	2					4	my ($FileName, $Params) = @_;
6357
6358	2					3	my $Success;
6359
6360							## $Params argument is optional. If supplied, it must be a hash.
6361	2		100			10	$Params \|\|= {};
6362
6363							## Create a new empty table object of the same class as $this and
6364							## read just the specified file into it. (note: this could be a
6365							## recursive call here).
6366
6367	2	50				16	my $that = ref($this)->new($Params, $FileName) or goto done;
6368
6369							## Append the data from $that table into this one.
6370
6371	2	50				8	$this->append($that) or goto done;
6372
6373	2					4	$Success = 1;
6374	2					31	done:
6375							return($Success);
6376							}
6377
6378							sub append_files
6379							{
6380	126			126	0	214	my $this = shift;
6381	126					318	my ($FileNames, $Params) = @_;
6382
6383	126					168	my $Success;
6384
6385	126					350	foreach my $FileName (@$FileNames)
6386							{
6387	2	50				7	goto done unless $this->append_file($FileName, $Params);
6388							}
6389
6390	126					211	$Success = 1;
6391	126					483	done:
6392							return($Success);
6393							}
6394
6395							sub append_files_new ## Called by new() to process its file name args.
6396							{
6397	126			126	0	240	my $this = shift;
6398	126					204	my ($FileNames, $Params) = @_;
6399
6400	126					165	my $Success;
6401
6402							## First we read the first file, if any, into this object using
6403							## the read() method.
6404
6405	126					228	my $FirstFile = shift @$FileNames;
6406	126	100				415	if (defined($FirstFile))
6407							{
6408	74	100				122	goto done unless $this->read(%{$Params \|\| {}}, _FileName => $FirstFile);
	74	50				707
6409							}
6410
6411	126	50				522	goto done unless $this->append_files($FileNames, $Params);
6412
6413	126					407	$Success = 1;
6414	126					1970	done:
6415							return($Success);
6416							}
6417
6418							sub combine
6419							{
6420	6			6	0	17	my $this = shift;
6421	6					12	my ($that) = @_;
6422
6423	6					9	my $Success;
6424
6425							## Get a snapshot of field lists before any modifications.
6426	6					17	my $ThisFields = $this->fieldlist();
6427	6					19	my $ThatFields = $that->fieldlist();
6428
6429							## Bring in all (listed) fields from other table.
6430	6					14	my $IncomingFields = $ThatFields;
6431
6432							## Preserve any previous non-selection and force one to be saved
6433							## in the interim. This will prevent $that->sel() from
6434							## recalculating the selection each time if there is none.
6435	6					15	my $OldSel = $that->{_Selection};
6436	6					20	$that->{_Selection} = $that->selection(); ## Might be a no-op
6437
6438							## Copy columns from $that in selection order; (re)place into $this
6439	6					20	foreach (@$IncomingFields) {$this->col_set($_, $that->sel($_))};
	12					50
6440
6441							## Restore the possibly-undef selection in other table.
6442	6					20	$that->{_Selection} = $OldSel; ## Might be a no-op
6443
6444							## Extend any short columns (whether originating from other table
6445							## or from this one) to be the same length as all others.
6446
6447	6					25	$this->extend();
6448
6449							## If either table had a custom fieldlist, then make a new custom
6450							## field list which is the result of concatenating both field
6451							## lists together, without duplicates, and of course preserving
6452							## the original order as completely as possible (with the order
6453							## given in the first table taking precedence).
6454
6455	6	50	33			28	if (defined($this->{_FieldList}) \|\|
6456							defined($that->{_FieldList}))
6457							{
6458							## Make a hash mapping field names from both tables to the
6459							## order they should appear
6460	6					12	my $FieldOrderHash = {};
6461	6		66			16	foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} \|\|= (keys %$FieldOrderHash) + 1};
	47					165
6462
6463	6					28	my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
	49					84
6464
6465	6					25	$this->{_FieldList} = $FieldList;
6466							}
6467
6468							## If either table had a custom sortorder, then make a new custom
6469							## sort order which is the result of concatenating both orders
6470							## together, without duplicates, and of course preserving the
6471							## original order as completely as possible (with the order given
6472							## in the first table taking precedence).
6473
6474	6	50	33			37	if (defined($this->{_SortOrder}) \|\|
6475							defined($that->{_SortOrder}))
6476							{
6477	0					0	my $ThisOrder = $this->sortorder();
6478	0					0	my $ThatOrder = $that->sortorder();
6479
6480							## Make a hash mapping field names from both lists to the
6481							## order they should appear
6482
6483	0					0	my $OrderHash = {};
6484	0		0			0	foreach (@$ThisOrder, @$ThatOrder) {$OrderHash->{$_} \|\|= (keys %$OrderHash) + 1};
	0					0
6485
6486	0					0	my $OrderList = [sort {$OrderHash->{$a} <=> $OrderHash->{$b}} keys %$OrderHash];
	0					0
6487
6488	0					0	$this->{_SortOrder} = $OrderList;
6489							}
6490
6491							## If either table had custom sortspecs, then create a new
6492							## sortspecs hash by starting with all the entries from the other
6493							## table and adding/overwriting with those from this table.
6494
6495	6	50	33			1165	if (defined($this->{_SortSpecs}) \|\|
6496							defined($that->{_SortSpecs}))
6497							{
6498	6					28	my $ThisSpecs = $this->sortspecs();
6499	6					18	my $ThatSpecs = $that->sortspecs();
6500
6501	6					24	$this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6502							}
6503
6504							## If either table had custom sortroutines, then create a new
6505							## sortroutines hash by starting with all the entries from the
6506							## other table and adding/overwriting with those from this table.
6507
6508	6	50	33			27	if (defined($this->{_SRoutines}) \|\|
6509							defined($that->{_SRoutines}))
6510							{
6511	6		50			21	my $ThisRoutines = $this->{_SRoutines} \|\| {};
6512	6		50			17	my $ThatRoutines = $that->{_SRoutines} \|\| {};
6513
6514	6					18	$this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6515							}
6516
6517							## All other settings are kept from $this and those from $that are
6518							## ignored.
6519
6520	6					10	$Success = 1;
6521	6					30	done:
6522							return($Success);
6523							}
6524
6525							sub combine_file
6526							{
6527	5			5	0	52	my $this = shift;
6528	5					10	my ($FileName, $Params) = @_;
6529
6530	5					7	my $Success;
6531
6532							## $Params argument is optional. If supplied, it must be a hash.
6533	5		50			15	$Params \|\|= {};
6534
6535							## Create a new empty table object of the same class as $this and
6536							## read just the specified file into it.
6537
6538	5	50				41	my $that = ref($this)->new($Params, $FileName) or goto done;
6539
6540							## Combine the data from $that table into this one.
6541
6542	5	50				29	$this->combine($that) or goto done;
6543
6544	5					14	$Success = 1;
6545	5					76	done:
6546							return($Success);
6547							}
6548
6549							sub combine_files
6550							{
6551	1			1	0	15	my $this = shift;
6552	1					1	my ($FileNames, $Params) = @_;
6553
6554	1					2	my $Success;
6555
6556	1					3	foreach my $FileName (@$FileNames)
6557							{
6558	1	50				10	goto done unless $this->combine_file($FileName, $Params);
6559							}
6560
6561	1					3	$Success = 1;
6562	1					5	done:
6563							return($Success);
6564							}
6565
6566							sub join
6567							{
6568	5			5	0	20	my $this = shift;
6569	5					10	my ($that, $Key1, $Key2, $Fields) = @_;
6570
6571	5					7	my $Success;
6572
6573							## $Key1 is required.
6574	5	50				14	$this->warn("Key1 is required for join()"), goto done
6575							unless ($Key1);
6576
6577							## $Key2 defaults to the same as $Key1.
6578	5		33			11	$Key2 \|\|= $Key1;
6579
6580							## The fields we'll be getting can optionally be overridden by
6581							## caller; otherwise, they're the field list of the other table.
6582	5		33			27	my $IncomingFields = $Fields \|\| $that->fieldlist();
6583
6584							## Cull $Key1 and $Key2 from the incoming field list.
6585	5	50				13	$IncomingFields = [grep {($_ ne $Key1) && ($_ ne $Key2)} @$IncomingFields];
	26					222
6586
6587							## Preserve any previous non-selection and force one to be saved
6588							## in the interim. This will prevent $that->sel() from
6589							## recalculating the selection each time if there is none.
6590	5					11	my $OldSel = $that->{_Selection};
6591	5					13	$that->{_Selection} = $that->selection(); ## Might be a no-op
6592
6593							## Make an index mapping values in $Key2 to record numbers in
6594							## $that. We reverse the order of insertion into the $Index
6595							## because we want the items earliest in selection order to have
6596							## precedence in case keys are not unique as they should be.
6597
6598	5					8	my $Index = {}; @$Index{reverse @{$that->sel($Key2)}} = reverse @{$that->{_Selection}};
	5					10
	5					15
	5					13
6599
6600							## Get a list of record numbers in $this that we'll be copying data into.
6601	5					17	my $Recs1 = $this->selection();
6602
6603							## Get a corresponding list of keys we're going to look up.
6604	5					11	my $Key1s = $this->sel($Key1);
6605
6606							## The default "record number" in table 2 is $that->length()
6607							## ... i.e. an invalid record number past the end of the table.
6608							## This will ensure that failed lookups result in lookups to this
6609							## illegal record number, correctly producing undef in the
6610							## corresponding joined fields, whereas looking up "undef" would
6611							## have produced record number zero.
6612
6613	5					14	my $DefaultRecNum = $that->length();
6614
6615							## Look up @$Key1s in @$Index to get a list of data-source record
6616							## numbers in $that. Failed lookups map to $DefaultRecNum.
6617
6618	5	100				90	my $Recs2 = [map {defined() ? $_ : $DefaultRecNum} @$Index{@$Key1s}];
	15					56
6619
6620							## Copy data into selected positions within columns of $this, one
6621							## column at a time, creating pre-sized columns in $this as
6622							## necessary (col()). These array slice operations are very, very
6623							## fast.
6624
6625	5					13	foreach my $Field (@$IncomingFields)
6626							{
6627	21					34	(@{$this->col($Field)}[@$Recs1] = ## Put values into selected records of $this
	21					40
6628	21					24	@{$that->col($Field)}[@$Recs2]); ## Get values from looked-up records of $that
6629							}
6630
6631							## Restore the possibly-undef selection in other table.
6632	5					11	$that->{_Selection} = $OldSel; ## Might be a no-op
6633
6634							## If this table had a custom fieldlist, then make a new custom
6635							## field list which is the result of concatenating both field
6636							## lists together, without duplicates, and of course preserving
6637							## the original order as completely as possible (with the order
6638							## given in the first table taking precedence).
6639
6640	5	50				17	if (defined($this->{_FieldList}))
6641							{
6642	5					12	my $ThisFields = $this->fieldlist();
6643	5					7	my $ThatFields = $IncomingFields;
6644
6645							## Make a hash mapping field names from both tables to the
6646							## order they should appear
6647
6648	5					7	my $FieldOrderHash = {};
6649	5		66			10	foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} \|\|= (keys %$FieldOrderHash) + 1};
	56					166
6650
6651	5					20	my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
	62					79
6652
6653	5					22	$this->{_FieldList} = $FieldList;
6654							}
6655
6656							## If either table had custom sortspecs, then create a new
6657							## sortspecs hash by starting with all the entries from the other
6658							## table and adding/overwriting with those from this table.
6659
6660	5	50	33			26	if (defined($this->{_SortSpecs}) \|\|
6661							defined($that->{_SortSpecs}))
6662							{
6663	5					14	my $ThisSpecs = $this->sortspecs();
6664	5					13	my $ThatSpecs = $that->sortspecs();
6665
6666	5					17	$this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6667							}
6668
6669							## If either table had custom sortroutines, then create a new
6670							## sortroutines hash by starting with all the entries from the
6671							## other table and adding/overwriting with those from this table.
6672
6673	5	50	33			19	if (defined($this->{_SRoutines}) \|\|
6674							defined($that->{_SRoutines}))
6675							{
6676	5		50			26	my $ThisRoutines = $this->{_SRoutines} \|\| {};
6677	5		50			13	my $ThatRoutines = $that->{_SRoutines} \|\| {};
6678
6679	5					13	$this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6680							}
6681
6682							## All other settings are kept from $this and those from $that are
6683							## ignored.
6684
6685	5					8	$Success = 1;
6686	5					30	done:
6687							return($Success);
6688							}
6689
6690							sub join_file
6691							{
6692	0			0	0	0	my $this = shift;
6693	0					0	my ($FileName, $Params, $Key1, $Key2, $Fields) = @_;
6694
6695	0					0	my $Success;
6696
6697							## $Params argument may be undef. If supplied, it must be a hash.
6698	0		0			0	$Params \|\|= {};
6699
6700							## Create a new empty table object of the same class as $this and
6701							## read just the specified file into it.
6702
6703	0	0				0	my $that = ref($this)->new($Params, $FileName) or goto done;
6704
6705							## Join the data from $that table into this one.
6706
6707	0	0				0	$this->join($that, $Key1, $Key2, $Fields) or goto done;
6708
6709	0					0	$Success = 1;
6710	0					0	done:
6711							return($Success);
6712							}
6713
6714							sub join_files
6715							{
6716	0			0	0	0	my $this = shift;
6717	0					0	my ($FileNames, $Params, $Key1, $Key2, $Fields) = @_;
6718
6719	0					0	my $Success;
6720
6721	0					0	foreach my $FileName (@$FileNames)
6722							{
6723	0	0				0	goto done unless $this->join_file($FileName, $Params, $Key1, $Key2, $Fields);
6724							}
6725
6726	0					0	$Success = 1;
6727	0					0	done:
6728							return($Success);
6729							}
6730
6731							=pod
6732
6733							=head1 INVERTING A TABLE'S ROWS/COLUMNS
6734
6735							## Re-orient table's data using vals from $ColName as field names...
6736							$t-invert($ColName)
6737
6738							Sometimes a situation gives you a table that's initially organized
6739							with column data in rows, and field names in one of the columns, so
6740							you need to flip the table in order to be able to work meaningfully
6741							with it.
6742
6743							"Inverting" a table means to rewrite each row as a column. One row is
6744							designated to be used as the field names.
6745
6746							For example, consider this table:
6747
6748							F01 F02 F03 F04
6749							------------------------
6750							First Chris Agnes James
6751							Last Bart Marco Nixon
6752							Age 22 33 44
6753
6754							Calling invert() using field names from "F01"...
6755
6756							$t->invert('F01');
6757
6758							... would change the table to look like this:
6759
6760							First Last Age
6761							----------------
6762							Chris Bart 22
6763							Agnes Marco 33
6764							James Nixon 44
6765
6766							The field F01 which formerly contained the field names, is now gone,
6767							and the remaining data columns have been converted from their old row
6768							orientation into a column orientation.
6769
6770							=cut
6771
6772							sub invert
6773							{
6774	0			0	0	0	my $this = shift;
6775	0					0	my ($HeaderField) = @_;
6776
6777	0					0	my $Success;
6778
6779							## Get new field names from an existing column and delete it at the same time.
6780	0	0				0	my $NewColNames = $this->col_delete($HeaderField) or
6781							$this->warn("Invalid field name given to invert() method"), goto done;
6782
6783							## Get a hash of all existing (remaining) data columns.
6784	0					0	my $OldColNames = $this->fieldlist_all();
6785	0					0	my $OldCols = $this->cols_hash($OldColNames);
6786
6787							## Make the new columns...
6788	0					0	my $NewCols = [map {$this->row_list($_, $OldColNames)} (0..$#$NewColNames)];
	0					0
6789
6790							## Delete old columns from the object.
6791	0					0	delete @$this{@$OldColNames};
6792
6793							## Add new columns
6794	0					0	@$this{@$NewColNames} = @$NewCols;
6795
6796							## Set the field name list...
6797	0					0	$this->{_FieldList} = $NewColNames;
6798
6799	0					0	$Success = 1;
6800	0					0	done:
6801							return($Success);
6802							}
6803
6804							=pod
6805
6806							=head1 PROGRESS MESSAGES
6807
6808							## Printing a progress message....
6809
6810							$t->progress($Msg) ## Print a message per current settings
6811
6812							## Progress settings applying to this object only...
6813
6814							$t->progress_get() ## Get current progress setting
6815
6816							$t->progress_set(1) ## Use progress_default() method
6817							$t->progress_set($Sub) ## Set a custom progress routine
6818							$t->progress_set(0) ## Disable progress
6819							$t->progress_set(undef) ## Use class's settings (default)...
6820
6821							## Class's settings (for instances with _Progress == undef)
6822
6823							$t->progress_class() ## Get current setting.
6824
6825							$t->progress_class(1) ## Use progress_default() method
6826							$t->progress_class($Sub) ## Set shared custom prog routine
6827							$t->progress_class(0) ## Disable class-default progress
6828
6829							Data::CTable->progress_class(..) ## Call without an object
6830
6831							## Call builtin default progress method regardless of settings
6832
6833							$t->progress_default($Msg) ## Default prog. routine for class
6834
6835							## Generate a warning (used internally by other methods)
6836
6837							$t->warn($Msg) ## In this class, calls progress_default()
6838
6839							## Timed progress: print msg to start, then at most once/2 sec
6840
6841							$t->progress_timed($Op, $Item) ## Re-print msg every 2 sec
6842							$t->progress_timed($Op, $Item, $Pos, $Tot) ##... with % readout
6843							$t->progress_timed($Op, $Item, $Pos, $Tot, $Wait) ## Not 1st x
6844
6845							$t->progress_timed_default($Msg) ## Called by progress_timed
6846
6847							Data::CTable is especially useful in creating batch-oriented
6848							applications for processing data. As such, routines that may perform
6849							time-consuming tasks will, by default, generate helpful progress
6850							messages. The progress mechanism is highly customizable, however, to
6851							suit the needs of applications that don't require this output, or that
6852							require the output to go somewhere other than STDERR or the console.
6853
6854							The default progress routine is one that prints a message with a
6855							date/time stamp to STDERR if and only if STDERR is an interactive
6856							terminal, and otherwise is silent.
6857
6858							You could write a custom progress routine that does something else or
6859							something in addition (e.g. logs to a file or syslog). The custom
6860							routine could either be implemented by overriding the
6861							progress_default() method in a subclass, or by calling progress_set()
6862							in any instance.
6863
6864							The custom progress routine, if any, is stored in the _Progress
6865							parameter of the object. But use progress_set() and progress_get() to
6866							access it.
6867
6868							The interface for your custom progress routine should be:
6869
6870							sub MyProgress {my ($Obj, $Message) = @_; chomp $Message; .....}
6871
6872							In other words, the routine takes a single message which may or may
6873							not have a trailing newline. It should always chomp the newline if
6874							present, and then do its business... which generally will include
6875							printing or logging a message (usually with a newline added).
6876
6877							The default, built-in progress routine for Data::CTable is:
6878
6879							sub progress_default
6880							{
6881							my ($this, $msg) = @_;
6882							chomp $msg;
6883
6884							print STDERR (localtime() . " $msg\n") if -t STDERR;
6885
6886							return(1); ## Indicate progress actually completed
6887							}
6888
6889							Of course, you are free to call this method directly at any time, and
6890							it will do its thing regardless of other progress-enabling settings.
6891							But the preferred way is to first set the settings and then call
6892							progress().
6893
6894							The warn() method always calls progress_default() -- i.e. warnings
6895							will display even if progress is otherwise disabled or overridden at
6896							the object or class level. However, you could create a subclass that
6897							changes warn()'s behavior if desired. (For example, it could just
6898							call perl's builtin warn function, or be even more forceful,
6899							generating warnings even if STDERR is not a terminal, for example.)
6900
6901							The progress_set() method may be used to override the progress routine
6902							for an individual object (set to 1/true for default behavior, or
6903							0/undef/false to disable progress for that object entirely).
6904
6905							Call progress_class() to set similar values to control the global
6906							default behavior (e.g. turning on/off default progress behavior for
6907							all instances), but be cautious about using this approach in any
6908							environment where other programs might be accessing the same loaded
6909							class data, since the setting is stored in a class-owned global
6910							($Data::CTable::DefaultProgress).
6911
6912							Manipulating the class-default settings is only recommended in batch
6913							or shell-script environments, not in mod_perl Web applications where
6914							the module stays loaded into the Perl environment across multiple
6915							invocations, for example.
6916
6917							If you want a particular method (e.g. read() but not write()) to be
6918							silent, you could make a subclass and could override that method with
6919							an implementation that first disables progress, calls the SUPER::
6920							method, and then restores the progress setting to its original
6921							setting.
6922
6923							=head2 Timed progress
6924
6925							Timed progress is a way of printing periodically-recurring progress
6926							messages about potentially time-consuming processes to the terminal.
6927
6928							For example, consider the following messages which might appear every
6929							2 seconds during a lengthy read() operation:
6930
6931							Reading... 0 (0%)
6932							Reading... 2000 (4%)
6933							...
6934							Reading... 38000 (96%)
6935							Reading... 40000 (100%)
6936
6937							The progress_timed() method is called internally by potentially
6938							time-consuming processes (read(), write(), and sort()), and you may
6939							want to call it yourself from your own scripts, to produce
6940							weary-programmer-soothing visual output during otherwise
6941							panic-producing long delays.
6942
6943							Generally, progress_timed() is called with the $Wait parameter set to
6944							true, which delays the display of any messages until 2 seconds have
6945							passed, so no messages will be displayed unless the process actually
6946							does end up being slower than 2 seconds.
6947
6948							Parameters are:
6949
6950							$Op The string that identifies the "operation" taking place
6951							$Item A milestone such as a number or datum to indicate progress
6952							$Pos A numeric position against the (maybe estimated) baseline
6953							$Tot The baseline. If estimated, don't re-estimate too often
6954							$Wait If true, skip printing the first message for this $Op
6955
6956							All parameters except $Op are optional.
6957
6958							progress_timed() has a throttle that keeps it from re-triggering more
6959							often than every 2 seconds for any given sequence of the same $Op.
6960							The clock is restarted each time you call it with a different $Op or
6961							$Tot from the previous call (on the assumption that if the operation
6962							or the baseline changes then that fact should be noted).
6963
6964							The messages printed will start with "$Op... ".
6965
6966							If you supply $Item, which could be a number or a string, the messages
6967							will then show the $Item after the $Op.
6968
6969							If you supply BOTH $Pos and $Tot, then a percentage will be calculated
6970							and added to the readout; otherwise omitted.
6971
6972							If you supply $Wait, the first message (only) that uses this $Op will
6973							be skipped, and the next one won't appear for at least 2 seconds.
6974
6975							If using $Pos and $Tot to display percentages for your user, be sure
6976							to call progress_timed() one final time when $Pos == $Tot so your user
6977							sees the satisfying 100% milestone. This "completion" call will not
6978							be skipped even if 2 seconds have not passed since the previous timed
6979							progress message was printed.
6980
6981							Althought progress_timed() is designed to cut down on too much visual
6982							output when called often in a tight loop, remember that it still takes
6983							some processing time to call it and so if you call it too frequently,
6984							you're slowing down the very loop you wish were running faster.
6985
6986							So, you might want to call it every tenth or 100th or even 1000th time
6987							through a tight loop, instead of every time through, using the mod (%)
6988							operator:
6989
6990							$t->progress_timed(....) if ($LoopCount % 100) == 0;
6991
6992							progress_timed_default() is the method called internally by
6993							progress_timed() to actually print the messages it has prepared. In
6994							this implementation, progress_timed_default() just calls
6995							progress_default(). That is, it ignores all other progress-inhibiting
6996							or -enhancing settings so delay-soothing messages will print on the
6997							terminal even if other messages are turned off.
6998
6999							This is because the author assumes that even if you don't want all
7000							those other progress messages, you might still want these ones that
7001							explain long delays. If you REALLY don't, then just make yourself a
7002							lightweight subclass where progress_timed_default() is a no-op, or
7003							maybe calls regular progress(). For example:
7004
7005							BEGIN {package Data::CTable::Silent; use vars qw(@ISA);
7006							@ISA=qw(Data::CTable); sub progress_timed_default{}}
7007
7008							## Later...
7009							my $t = Data::CTable::Silent->new(...);
7010
7011
7012							=cut
7013
7014							$Data::CTable::DefaultProgress = 1;
7015
7016							sub progress_set
7017							{
7018	5			5	0	1065	my $this = shift;
7019	5					7	my ($ProgSetting) = @_;
7020
7021	5					14	$this->{_Progress} = $ProgSetting;
7022							}
7023
7024							sub progress_class
7025							{
7026	232			232	0	3171	my $Ignored = shift;
7027	232					307	my ($ProgSetting) = @_;
7028
7029							## Set if specified...
7030	232	100				1127	$Data::CTable::DefaultProgress = $ProgSetting if defined($ProgSetting);
7031
7032							## Return..
7033	232					747	return($Data::CTable::DefaultProgress);
7034							}
7035
7036							sub progress_get
7037							{
7038	0			0	0	0	my $this = shift;
7039
7040	0					0	my $ProgSetting = $this->{_Progress};
7041
7042	0					0	return($ProgSetting);
7043							}
7044
7045							sub progress_default
7046							{
7047	217			217	0	539	my ($this, $msg) = @_;
7048	217					374	chomp $msg;
7049
7050	217	50				1823	print STDERR (localtime() . " $msg\n") if -t STDERR;
7051
7052	217					523	return(1); ## Indicate progress actually completed
7053							}
7054
7055							sub progress
7056							{
7057	227			227	1	10303	my $this = shift;
7058	227					454	my ($msg) = @_;
7059
7060	227					590	my $Prog1 = $this->{_Progress}; ## First check object's progress setting
7061	227					589	my $Prog2 = $this->progress_class(); ## Then check class's setting
7062
7063							## Calling regular progress resets the timers & ops in
7064							## progress_timed...
7065
7066	227					420	delete $this->{_ProgTimeInfo};
7067
7068							## First examine object setting to find a progress routine...
7069
7070	227	100				1018	return(&$Prog1($this, $msg)) if ref($Prog1) eq 'CODE'; ## Code ref: return it.
7071	225	100				501	return($this->progress_default($msg)) if $Prog1; ## true: use default progress.
7072	219	100				470	return(undef) if defined($Prog1); ## false but defined: no progress.
7073
7074							## undef: fall through to class settings...
7075
7076	215	100				527	return(&$Prog2($this, $msg)) if ref($Prog2) eq 'CODE'; ## Code ref: return it.
7077	213	100				859	return($this->progress_default($msg)) if $Prog2; ## true: use default progress.
7078	2					4	return(undef); ## false/undef: no progress.
7079							}
7080
7081							sub progress_timed
7082							{
7083	214610			214610	0	651323	my $this = shift;
7084	214610					314706	my ($Op, $Item, $Pos, $Tot, $Wait) = @_;
7085
7086							## Get params from previous call if any.
7087	214610	100				227196	my ($LastOp, $LastItem, $LastPos, $LastTot, $LastTime) = @{$this->{_ProgTimeInfo} \|\| []};
	214610					660913
7088
7089							## print &Dumper([$Op, $Item, $Pos, $Tot, $Wait], [$LastOp, $LastItem, $LastPos, $LastTot, $LastTime]);
7090
7091							## Get elapsed time.
7092	214610					269250	my $Time = time();
7093	214610		66			491210	my $Elapsed = $Time - ($LastTime \|\| $Time);
7094
7095							## We're on the "same" operation if the $Op name is the same and
7096							## the total (baseline) is the same. Otherwise treat as new op.
7097
7098	214610					281052	my $SameOp = (($Op eq $LastOp));
7099	214610		66			690117	my $SameOpAndTot = ($SameOp && ($Tot == $LastTot));
7100	214610		100			656402	my $Finished = ($Tot && ($Pos == $Tot));
7101
7102							## We trigger a message to print if we've been on the same op for
7103							## 2 seconds or more, OR this is a new op.
7104
7105	214610		66			1606127	my $Trigger = (($SameOpAndTot && ($Elapsed >= 2)) \|\| ## Yes if same op & time has passed...
7106							($SameOpAndTot && $Finished) \|\| ## Yes if we're finished (100%).
7107							!$SameOpAndTot); ## Yes if new op
7108
7109							## Quit now if nothing to do.
7110	214610	100				536093	goto done unless $Trigger;
7111
7112							## Otherwise print message and save details for next time around.
7113	16	100	66			124	my $Percent = sprintf("(%2d\%)", int(($Pos * 100) / $Tot)) if (defined($Pos) && $Tot);
7114
7115							## If we've been asked to "wait", we skip actually printing the
7116							## message this the first time, but act as if we did (starting the timer).
7117
7118	16	100	100			107	my $RetVal = $this->progress_timed_default("$Op... $Item $Percent")
7119							unless (!$SameOp && $Wait); ## Skip first-time message if $Wait.
7120
7121	16					59	$this->{_ProgTimeInfo} = [$Op, $Item, $Pos, $Tot, $Time];
7122
7123	214610					506940	done:
7124							return($RetVal);
7125							}
7126
7127							sub progress_timed_default
7128							{
7129	5			5	0	52	my $this = shift;
7130	5					8	my ($msg) = @_;
7131
7132	5					25	return($this->progress_default("$msg"));
7133							}
7134
7135							sub warn
7136							{
7137	1			1	0	2	my $this = shift;
7138	1					2	my ($msg) = @_;
7139
7140	1					4	return($this->progress_default("WARNING: $msg"));
7141							}
7142
7143							=pod
7144
7145							=head1 Rejecting or reporting on groups of records and continuing
7146
7147							Use utility methods omit_warn() and omit_note() to conditionally omit
7148							some records from a table and warn (or "note") if any were affected.
7149
7150							Use select_extract() to do the same thing but without actually
7151							removing the extracted records from the table, and restoring the
7152							original selection before select_extract was called.
7153
7154							If you supply a file name as the 4th argument, the omitted records
7155							will be extracted to a file for later reference.
7156
7157							If you supply a message prefix as the 5th argument, a string other
7158							than "WARNING" or "Note" may be specified.
7159
7160							# Reject with a progress message prefixed by "WARNING:"
7161
7162							$t->omit_warn(FirstName => sub{!length($_)}, "First name is empty");
7163
7164							Mon Aug 23 08:24:15 2004 WARNING: Omitting 2 of 78243 records (now 78241): First name is empty.
7165
7166							# Reject with a progress message prefixed by "Note:", with output to a file
7167
7168							$t->omit_note(FirstName => sub{!length($_)}, "First name is empty", "empty.names.txt");
7169
7170							Mon Aug 23 08:24:15 2004 Note: Omitting 2 of 78243 records (now 78241): First name is empty.
7171							Mon Aug 23 08:24:15 2004 Writing bad.firstname.txt...
7172							Mon Aug 23 08:24:15 2004 Wrote bad.firstname.txt.
7173
7174							# Extract some items, leaving original selection intact
7175
7176							$t->select_extract(FirstName => sub{!length($_)}, "First name is empty", "empty.names.txt");
7177
7178							Mon Aug 23 08:24:15 2004 Note: Extracting 2 of 78243 records: First name is empty.
7179
7180							=cut
7181
7182							sub omit_warn
7183							{
7184	0			0	0	0	my $this = shift;
7185	0					0	my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList) = @_;
7186
7187	0	0				0	$MessagePrefix = 'WARNING' if !defined($MessagePrefix);
7188
7189	0					0	$this->extract_with_message($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, 'DoOmit');
7190							}
7191
7192							sub omit_note
7193							{
7194	0			0	0	0	my $this = shift;
7195	0					0	my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList) = @_;
7196
7197	0	0				0	$MessagePrefix = 'Note' if !defined($MessagePrefix);
7198
7199	0					0	$this->extract_with_message($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, 'DoOmit');
7200							}
7201
7202							sub select_extract
7203							{
7204	0			0	0	0	my $this = shift;
7205	0					0	my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList) = @_;
7206
7207	0	0				0	$MessagePrefix = 'Note' if !defined($MessagePrefix);
7208
7209	0					0	$this->extract_with_message($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, !'DoOmit');
7210							}
7211
7212							sub extract_with_message
7213							{
7214	0			0	0	0	my $this = shift;
7215	0					0	my ($SelectField, $SelectSub, $Message, $DebugFile, $MessagePrefix, $ExtractFieldList, $DoOmit) = @_;
7216
7217							## Find omittable items -- save and restore the selection.
7218
7219	0					0	my $LengthBefore = $this->sel_len();
7220	0					0	my $SelBefore = $this->selection();
7221
7222	0					0	$this->select($SelectField, $SelectSub);
7223	0					0	my $OmitCount = $this->sel_len();
7224	0					0	my $SelOmitted = [@{$this->selection()}];
	0					0
7225
7226	0					0	$this->selection($SelBefore);
7227
7228							## If we have some, perform the omit and report on the omitted ones.
7229
7230	0	0				0	if ($OmitCount)
7231							{
7232	0					0	my $Sel = $this->col_empty(); ## Start with empty mask (all entries undef).
7233	0					0	@$Sel[@$SelBefore] = @$SelBefore; ## Mask in those in the original selection.
7234	0					0	@$Sel[@$SelOmitted] = undef; ## Mask out those we found.
7235	0					0	my $NewSel = [grep {defined} @$Sel];
	0					0
7236	0					0	my $LengthAfter = @$NewSel + 0;
7237
7238							## Only actually alter the table if requested.
7239	0	0				0	$this->{_Selection} = $NewSel if $DoOmit; ## The remaining ones are the new selection.
7240
7241	0	0				0	$this->progress("$MessagePrefix: @{[$DoOmit ? 'Omitting' : 'Extracting']} @{[$OmitCount]} of @{[$LengthBefore]} records@{[$DoOmit ? qq{ (now $LengthAfter)} : '']}: $Message");
	0	0				0
	0					0
	0					0
	0					0
7242	0	0				0	$this->write(_FileName => "$DebugFile",
		0
7243							_FDelimiter => "\t",
7244							_LineEnding => undef,
7245							_Selection => $SelOmitted,
7246							($ExtractFieldList ?
7247							(_FieldList => $ExtractFieldList) : ()),
7248							) if $DebugFile;
7249							}
7250							else
7251							{
7252	0					0	unlink $DebugFile;
7253							}
7254
7255	0					0	return($OmitCount);
7256							}
7257
7258							=pod
7259
7260							=head1 DEBUGGING / DUMPING
7261
7262							## Print some debugging output...
7263
7264							$t->out() ## Pretty-print $t using Data::ShowTable
7265
7266							$t->dump() ## Dump $t using Data::Dumper
7267							$t->dump($x, $y) ## Dump anything else using Data::Dumper
7268
7269							## Print some debugging output and then die.
7270
7271							die $t->out() ## Same but die afterwards.
7272
7273							die $t->dump() ## Same but die afterwards.
7274							die $t->dump($x, $y) ## Same but die afterwards.
7275
7276							These two methods can be very helpful in debugging your scripts.
7277
7278							The out() method, which has many options, is described in complete
7279							detail in the section below titled "FORMATTED TABLES". In short, it
7280							prints a nicely-formatted diagram of $t, obeying the custom field list
7281							if any and custom selection if any.
7282
7283							The dump() method uses the Data::Dumper module to call &Dumper() on
7284							the table itself (by default) and prints the result to STDERR. If you
7285							specify any number of other values, those will be dumped instead using
7286							a single call to &Dumper (rather than individually).
7287
7288							=head2 Optional module dependencies
7289
7290							These methods require the otherwise-optional modules shown here:
7291
7292							out() Data::ShowTable
7293							dump() Data::Dumper
7294
7295							You'll get a warning at runtime if you try to call either method
7296							without the appropriate module installed on your system.
7297
7298							=cut
7299
7300							sub dump
7301							{
7302	0			0	0	0	my $this = shift;
7303	0					0	my (@Things) = @_;
7304
7305							## Default is to dump the object.
7306	0	0				0	@Things = ($this) unless @Things;
7307
7308	0	0				0	if ($HaveDumper)
7309							{
7310	0					0	print STDERR &Dumper(@Things);
7311							}
7312							else
7313							{
7314	0					0	carp("Data::Dumper module is not installed. Can't dump.");
7315							}
7316
7317	0					0	return(1);
7318							}
7319
7320							=pod
7321
7322							=head1 MISCELLANEOUS UTILITY METHODS
7323
7324							The following utilities are methods of the Data::CTable object. They
7325							may be called directly by clients, subclassed, or used by subclass
7326							implementations as needed.
7327
7328							## Get cache file path (all args optional: defaulted from $t)
7329
7330							$f = $t->prep_cache_file($FileName, $CacheExtension, $CacheSubDir)
7331
7332							## Verify all directories in a path, creating any needed ones.
7333
7334							$ok = $t->verify_or_create_path($DirPath, $Sep)
7335
7336							## Testing readability / writeability of a proposed file
7337
7338							$ok = $t->try_file_read ($Path); ## Opens for read; then closes
7339							$ok = $t->try_file_write($Path); ## Opens for write; then deletes
7340
7341							## Getting parameters from object with optional overrides
7342
7343							$param = $t->getparam($Params, $Param)
7344
7345							prep_cache_file() is the internal method used by both read() and
7346							write() to calculate the name of a cache file to be used for a given
7347							$FileName.
7348
7349							It calculates the path to the cache file that corresponds to the given
7350							$FileName (which may be a bare file name, a relative path, or a
7351							partial path, as long as it obeys the current platform's path format
7352							rules). All arguments are optional and if absent (undef), will be
7353							defaulted from the corresponding parameters in $t.
7354
7355							In addition to calculating the path and file name, it also prepends
7356							the "current directory" path if there was no path. Then it checks
7357							that all directories mentioned in the path actually exist. If not, it
7358							fails. Then, it checks that EITHER the file exists and is readable,
7359							OR it does not exist but would be writeable in that directory. If any
7360							of these directory creations or file checks fails, then undef is
7361							returned (and there would be no cache file).
7362
7363							You may call it with no arguments on a file that has been read() to
7364							find the path to the cache file that may have been used and/or
7365							created, if any.
7366
7367							You may call it with a file name that was written to, to see what the
7368							corresponding written cache file would be.
7369
7370							For example:
7371
7372							## Get name of cache file used or created by read and delete it.
7373
7374							$RCache = $t->prep_cache_file() and unlink($RCache);
7375
7376							## Cache on write() and get name of file and delete it.
7377
7378							$Written = $t->write(_CacheOnWrite=>1, _FileName=>"Foo.txt");
7379							$WCache = $t->prep_cache_file($Written) and unlink($WCache);
7380
7381							verify_or_create_path() is the internal routine used by read(),
7382							write(), and the cache-writing logic, that makes sure a requested file
7383							path exists (by creating it if necessary and possible) before any file
7384							is written by this module.
7385
7386							(If you don't like this module's tendency to try to create
7387							directories, make yourself a subclass in which this routine simply
7388							checks -d on its $Path argument and returns the result.)
7389
7390							It must be called with a full or partial path TO A DIRECTORY, NOT A
7391							FILE. You may supply $Sep, a platform-appropriate separator character
7392							(which defaults correctly for the runtime platform if you don't).
7393
7394							Returns true if the path verification and/or creation ultimately
7395							succeeded, false otherwise (meaning that, after this call, there is no
7396							such directory on the system and so you should not try to write a file
7397							there).
7398
7399							try_file_read() and try_file_write() are the internal methods called
7400							by prep_cache_file() as well as by read() and write() to preflight
7401							proposed file reading and writing locations.
7402
7403							try_file_read() opens a file for read and closes it again; returns
7404							true if the open was possible.
7405
7406							try_file_write() opens a file for write and closes it again, deleting
7407							it if successful. Returns true if the open for write and the delete
7408							were successful. (Be aware that this call will actually delete any
7409							existing file by this name.)
7410
7411							The reason that failure to delete causes try_file_write() to fail is
7412							that successful cacheing depends on the ability to delete cache files
7413							as well as create them or write to them. A file in a location that
7414							couldn't be deleted will not be used for cacheing.
7415
7416							getparam() looks up a named parameter in a params hash if it exists
7417							there, otherwise looks it up in the object, thereby allowing $Params
7418							to shadow any parameters in $this.
7419
7420							This internal routine is used by any methods that allow overriding of
7421							parameters in the object when using a named-parameter calling
7422							interface. It should be used by any subclasses that also wish to use
7423							a named-parameter calling convention. For example:
7424
7425							my $this = shift;
7426							my $Params = (@_ == 1 ? {_FieldList => $_[0]} : {@_});
7427
7428							my($FieldList, $Selection) = map {$this->getparam($Params, $_)}
7429							qw(_FieldList _Selection);
7430
7431							=cut
7432
7433							{}; ## Get emacs to indent correctly.
7434
7435							sub prep_cache_file
7436							{
7437	70			70	0	131	my $this = shift;
7438	70					151	my($FileName, $CacheExtension, $CacheSubDir) = @_;
7439
7440	70					115	my $Success;
7441
7442	70		66			244	$FileName \|\|= $this->{_FileName};
7443	70		66			211	$CacheExtension \|\|= $this->{_CacheExtension};
7444	70		33			555	$CacheSubDir \|\|= $this->{_CacheSubDir};
7445
7446							## Break the path into its parts...
7447	29			29		392	use File::Basename qw(fileparse);
	29					1720
	29					22418
7448	70					4287	my ($Basename, $Path, $Ext) = fileparse($FileName, '\.[^\.]+');
7449
7450							## Figure out what the path separator should be...
7451	70					177	my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
	70					367
7452
7453							## FileDir is guaranteed to either be empty or have a trailing
7454							## separator (see: man File::Basename). If empty, we set it to
7455							## $Cur (the current directory).
7456
7457	70	50				336	my $FileDir = (length($Path) ? $Path : $Cur);
7458
7459							## Ensure $CacheSubDir is either empty or has a trailing separator...
7460
7461	70					1016	$CacheSubDir =~ s/([^\Q$Sep\E])$/$1$Sep/;
7462
7463							## Check whether it's an absolute path (and not really a "sub"-dir)
7464	70					309	my $Absolute = &path_is_absolute($CacheSubDir);
7465
7466							## $CacheDir is $FileDir with $CacheSubDir appended. $CacheSubDir
7467							## may be empty, meaning $FileDir is to be used for the cache
7468							## files. But if it's an absolute path, it stands alone.
7469
7470	70	100				303	my $CacheDir = ($Absolute ? $CacheSubDir : "$FileDir$CacheSubDir");
7471
7472							## Now we need to make sure that CacheDir exists OR try to create
7473							## it. (Warning will have already happened if necessary.)
7474	70	50				401	goto done unless $this->verify_or_create_path($CacheDir, $Sep);
7475
7476							## Verify that the dir is writeable.
7477	70	50				1880	$this->warn("Cache directory $CacheDir is read-only"), goto done
7478							unless -r $CacheDir;
7479
7480							## Full path: note $CacheExtension and $Ext may be empty.
7481	70					230	my $CacheFilePath = "$CacheDir$Basename$Ext$CacheExtension";
7482
7483							## If the cache path and the full path end up being the same
7484							## (probably because the _CacheSubDir and _CacheExtension are both
7485							## empty), we bail. Obviously, we don't want to risk overwriting
7486							## the original data with the cache (or trying to read cache data
7487							## from a text file).
7488
7489	70	50				210	$this->warn("Can't cache $FileName without either _CacheSubDir or _CacheExtension"), goto done
7490							if $CacheFilePath eq $FileName;
7491
7492							## Pre-flight the cache file: ensure we can either read it or
7493							## touch and then delete it.
7494
7495	70	50	66			374	$this->warn("Cache file $CacheFilePath cannot be created/overwritten: $!"), goto done
7496							unless ($this->try_file_read ($CacheFilePath) \|\|
7497							$this->try_file_write($CacheFilePath));
7498
7499	70					146	$Success = 1;
7500	70	50				825	done:
7501							return($Success ? $CacheFilePath : undef);
7502							}
7503
7504							sub verify_or_create_path
7505							{
7506	78			78	0	1183	my $this = shift;
7507	78					164	my ($Dir, $Sep) = @_;
7508
7509							## Get default value for $Sep if not supplied.
7510	78		33			228	$Sep \|\|= ${$this->path_info()}{sep};
	0					0
7511
7512							## $Dir might end in $Sep; split strips trailing one if so.
7513	78					2534	my $Parts = [split(/\Q$Sep\E/, $Dir)];
7514
7515	78					231	my $WholePath = "";
7516	78					281	foreach (@$Parts)
7517							{
7518	533					999	$WholePath = "$WholePath$_$Sep";
7519	533	50				15780	next if -d $WholePath;
7520
7521							## Directory does not exist. We need to make it.
7522
7523							## mkdir($WholePath, 0777) or
7524							## $this->warn("Failed to create directory '$WholePath': $!"), last;
7525
7526							## On some platforms (e.g. Darwin), perl's mkdir fails if
7527							## there's a trailing separator. Others tolerate its absence,
7528							## so we remove it.
7529
7530	0					0	(my $TryDir = $WholePath) =~ s/\Q$Sep\E$//;
7531
7532	0	0				0	mkdir($TryDir, 0777) or
7533							$this->warn("Failed to create directory '$TryDir': $!"), last;
7534							}
7535
7536	78					1973	return(-d $Dir);
7537							}
7538
7539							sub try_file_write ## like a "touch" but deletes the file if it succeeds.
7540							{
7541	10			10	0	95	my $this = shift;
7542	10					20	my ($Path) = @_;
7543
7544	10					15	my $Success;
7545
7546							## Try creating it.
7547	29			29		267	use IO::File;
	29					87
	29					13320
7548	10					80	my $File = IO::File->new(">$Path");
7549
7550							## Created: close and delete it.
7551	10	50				1552	$File->close(), unlink($Path) if $File;
7552
7553							## Failed: bail.
7554	10	50				1127	goto done unless $File;
7555
7556							## If we couldn't unlink, fail. The ability to delete a failed or
7557							## part-written cache file is a critical part of cacheing.
7558
7559	10	50				284	goto done if -e $Path;
7560
7561	10					18	$Success = 1;
7562	10					56	done:
7563							return($Success);
7564							}
7565
7566							sub try_file_read ## Verifies that a file exists / can be read...
7567							{
7568	70			70	0	127	my $this = shift;
7569	70					140	my ($Path) = @_;
7570
7571	70					101	my $Success;
7572
7573							## Try opening it.
7574	29			29		221	use IO::File;
	29					86
	29					33269
7575	70	100				1402	my $File = IO::File->new("<$Path") or goto done;
7576
7577	68					8054	$Success = 1;
7578	70					1505	done:
7579							return($Success);
7580							}
7581
7582							sub getparam
7583							{
7584	992			992	0	1133	my $this = shift;
7585	992					1458	my ($Params, $Param) = @_;
7586
7587	992	100				3865	return(exists($Params->{$Param}) ?
7588							( $Params->{$Param}) :
7589							( $this->{$Param}) );
7590							}
7591
7592
7593
7594							=pod
7595
7596							=head1 GENERAL-PURPOSE UTILITY FUNCTIONS
7597
7598							These general-purpose utility routines are defined in the Data::CTable
7599							module but are not method calls. You may optionally import them or
7600							call them by their fully-qualified name.
7601
7602							use Data::CTable qw(
7603							guess_endings
7604							guess_delimiter
7605							path_info
7606							path_is_absolute
7607							min
7608							max
7609							);
7610
7611							## File-format guessing
7612
7613							my $E = &guess_endings($IOHandle) ## Guess txt file line endings
7614							my $D = &guess_delimiter($String) ## Tab if found, else comma
7615
7616							## Cross-platform file path analysis
7617
7618							my $Info = path_info(); ## Hash: 3 of platform's path values:
7619							my ($Sep, ## ... path separator ( / on Unix)
7620							$Up, ## ... "up" component (../ on Unix)
7621							$Cur) = ## ... curr. dir path ( ./ on Unix)
7622							@$Info{qw(sep up cur)};
7623
7624							my $Abs = path_is_absolute($Path) ## Check path type
7625
7626							## Our old favorites min and max
7627
7628							$x = max($x, 0); ## Should have been part of Perl...
7629							$x = min($x, 100);
7630
7631							guess_endings() tries to figure out whether an open IO::File handle
7632							has DOS, Mac, or Unix file endings. It reads successively large
7633							blocks of the file until it either finds evidence of at least two
7634							separate line endings (of any type, but presumably they are the same),
7635							or until it reaches the end of the file. Then, it takes the resulting
7636							block and searches for the first qualifying line ending sequence it
7637							finds, if any. This sequence is then returned to the caller. If it
7638							returns undef, it was not able to find any evidence of line endings in
7639							the file.
7640
7641							guess_delimiter() takes a string buffer and returns a "," unless it
7642							finds a tab character before the first comma in the $String, if any,
7643							in which case a tab is returned.
7644
7645							path_info() returns a hash of three helpful strings for building and
7646							parsing paths on the current platform. Knows about Mac, Dos/Win, and
7647							otherwise defaults to Unix.
7648
7649							path_is_absolute($Path) returns true if it thinks the given path
7650							string is an absolute path on the current platform.
7651
7652							=cut
7653
7654							{}; ## Get emacs to indent correctly.
7655
7656							sub guess_endings
7657							{
7658	19			19	0	36	my ($File) = @_;
7659
7660	19					32	my $Ending = undef;
7661
7662	19					39	my $ReadCount = 0;
7663	19					30	my $BlockSize = 512;
7664
7665	19					29	my $Buf;
7666							my $Actual;
7667
7668	19					190	while ($File->seek(0, 0), $Actual = $File->read($Buf, ($BlockSize * ++$ReadCount)))
7669							{
7670							## Break out of the loop if it appears a line ending match is
7671							## found (but disallow initial match at very end of buffer).
7672
7673	19	50				1170	last if $Buf =~ /((?:\x0D\x0A)\|(?:\x0D)\|(?:\x0A))[^\x0D\x0A]/;
7674
7675							## Break out of the loop if we just read any less than
7676							## attempted (we are probably at the end of a very short,
7677							## maybe one-line or even zero-line, file).
7678
7679	0	0				0	last if $Actual < ($BlockSize * $ReadCount);
7680							}
7681
7682							## We can presume that the buffer we now have must either have
7683							## line endings in it, or there is no line ending in the file at
7684							## all. So we extract the first one we come to, (trying the DOS
7685							## ending first since it contains the other two), if any, and we
7686							## return it.
7687
7688	19					196	my $Ending = ($Buf =~ /((\x0D\x0A)\|(\x0D)\|(\x0A))/)[0];
7689
7690							## &progress_default(undef, "DOS line endings") if $2; ## Debugging.
7691							## &progress_default(undef, "Mac line endings") if $3; ## Debugging.
7692							## &progress_default(undef, "Unix line endings") if $4; ## Debugging.
7693
7694	19					74	done:
7695
7696							## We always seek back to zero when done.
7697							$File->seek(0, 0);
7698
7699	19					245	return($Ending);
7700							}
7701
7702							sub guess_delimiter
7703							{
7704	19			19	0	43	my ($String) = @_;
7705
7706	19		50			170	return(($String =~ /([,\t])/)[0] \|\| ",");
7707							}
7708
7709							sub path_info
7710							{
7711	29			29		356	use Config qw(%Config);
	29					89
	29					5742
7712	86			86	0	1378	my $OSName = $Config{osname};
7713
7714	86	50				552	return({sep =>':' , up =>'::' , cur =>':' }) if $OSName =~ /mac /ix;
7715	86	50				791	return({sep =>'\\', up =>'..\\', cur =>'.\\'}) if $OSName =~ /(?
7716	86					710	return({sep =>'/' , up =>'../' , cur =>'./' }) ;
7717							}
7718
7719							sub path_is_absolute
7720							{
7721	70			70	0	161	my ($Path) = @_;
7722
7723	29			29		182	use Config qw(%Config);
	29					63
	29					21216
7724	70					481	my $OSName = $Config{osname};
7725
7726	70	50				339	return($Path =~ /^[^:]/) if $OSName =~ /mac /ix;
7727	70	50				891	return($Path =~ /^(([a-z][:])\|(\\\\))/i) if $OSName =~ /(?
7728	70					291	return($Path =~ /^\//) ;
7729							}
7730
7731							### min and max
7732
7733	214620	100		214620	0	1042393	sub min {return($_[0] < $_[1] ? $_[0] : $_[1])}
7734	134	100		134	0	756	sub max {return($_[0] > $_[1] ? $_[0] : $_[1])}
7735
7736
7737							=pod
7738
7739							=head1 IMPLEMENTATION LIMITATIONS
7740
7741							=over 4
7742
7743							=item Column (field) names must not start with underscore
7744
7745							This object is implemented as a blessed hash reference. By
7746							convention, keys that do not start with underscore are data columns
7747							and the key is the field name. Keys that do start with underscore
7748							refer to parameters or other data structures stored in the object.
7749
7750							Consequently, no field names may start with underscore. When a file
7751							is read from disk, any field names that DO start with underscores will
7752							have the leading underscores stripped off. Strange things could then
7753							occur if the field names are then no longer unique. For example,
7754							field "A" and "_A" in the data file would be treated as the single
7755							field "A" after the file was read.
7756
7757							=item Field values are always read as strings
7758
7759							Field values when written to a file are necessarily converted to
7760							strings. When read back in, they are read as strings, regardless of
7761							original format. The sole exception is the empty string which is read
7762							back in as undef for efficiency.
7763
7764							An exception is when the _CacheOnWrite feature is used: field values
7765							stored internally as integers or other scalar types may be saved and
7766							later restored as such. However, you should not rely on this
7767							behavior.
7768
7769							=item Undef vs. empty
7770
7771							Empty field values are stored as "undef" for efficiency. This means
7772							that programs should generally not rely on any differences between ""
7773							and undef in field values. However, when working with large but
7774							sparse tables, programs should take care not to convert undef values
7775							to empty strings unnecessarily since the separate string objects
7776							consume considerably more memory than undef.
7777
7778							=back
7779
7780							=head1 CONTRIBUTIONS
7781
7782							Corrections, bug reports, bug fixes, or feature additions are
7783							encouraged. Please send additions or patches with a clear explanation
7784							of their purpose. Consider making additions in the form of a subclass
7785							if possible.
7786
7787							I'm committed to bundling useful subclasses contributed by myself or
7788							others with this main distribution.
7789
7790							So, if you've got a subclass of Data::CTable (which should have a name
7791							like Data::CTable::YourClassName) and you would like it included in
7792							the main distribution, please send it along with a test script and
7793							I'll review the code and add it (at my discretion).
7794
7795							If you've got a module that uses, augments, or complements this one,
7796							let me know that, too, and I'll make appropriate mention of it.
7797
7798							=head1 AUTHORS
7799
7800							Chris Thorman
7801
7802							Jay Hannah
7803
7804							=head1 SEE ALSO
7805
7806							The Data::CTable home page: http://christhorman.com/projects/perl/Data-CTable/
7807
7808							Version control: https://github.com/jhannah/data-ctable/issues
7809
7810							Report bugs: https://rt.cpan.org/Public/Dist/Display.html?Name=Data-CTable
7811
7812							The implementation in CTable.pm.
7813
7814							The test.pl script, other subclasses, and examples.
7815
7816							The Data::ShowTable module.
7817
7818							The Data::Table module by Yingyao Zhou & Guangzhou Zou.
7819
7820							The perlref manual page.
7821
7822							=head1 LICENSE
7823
7824							Copyright (c) 1995-2012 Chris Thorman. All rights reserved.
7825
7826							This program is free software; you can redistribute it and/or modify
7827							it under the same terms as Perl itself.
7828
7829							=cut
7830
7831							1;
7832
7833
7834