File Coverage

blib/lib/Data/CompactReadonly.pm
Criterion   Covered  Total      %
statement        39     40   97.5
branch           16     18   88.8
condition         2      3   66.6
subroutine        5      5  100.0
pod               2      2  100.0
total            64     68   94.1


line stmt bran cond sub pod time code
1             package Data::CompactReadonly;
2              
3 9     9   829477 use warnings;
  9         84  
  9         264  
4 9     9   46 use strict;
  9         15  
  9         192  
5              
6 9     9   3951 use Data::CompactReadonly::V0::Node;
  9         22  
  9         4685  
7              
8             # Yuck, semver. I give in, the stupid cult that doesn't understand
9             # what the *number* bit of *version number* means has won.
10             our $VERSION = '0.0.6';
11              
12             =head1 NAME
13              
14             Data::CompactReadonly
15              
16             =head1 DESCRIPTION
17              
18             A Compact Read Only Database that consumes very little memory. Once created a
19             database can not be practically updated except by re-writing the whole thing.
20             The aim is for random-access read performance to be on a par with L
21             and for files to be much smaller.
22              
23             =head1 VERSION 'NUMBERS'
24              
25             This module uses semantic versioning. That means that the version 'number' isn't
26             really a number but has three parts: C<MAJOR.MINOR.PATCH>.
27              
28             The C<MAJOR> number will increase when the API changes incompatibly;
29              
30             The C<MINOR> number will increase when backward-compatible additions are made to the API;
31              
32             The C<PATCH> number will increase when bugs are fixed backward-compatibly.
33              
34             =head1 FILE FORMAT VERSIONS
35              
36             All versions so far support file format version 0 only.
37              
38             See L for details of what that means.
39              
40             =head1 METHODS
41              
42             =head2 create
43              
44             Takes two arguments, the name of a file into which to write a database, and some
45             data. The data can be undef, a number, some text, or a reference to an array
46             or hash that in turn consists of undefs, numbers, text, references to arrays or
47             hashes, and so on ad infinitum.
48              
49             This method may be very slow. It constructs a file by making lots
50             of little writes and seek()ing all over the place. It doesn't do anything
51             clever to figure out what pointer size to use, it just tries the shortest
52             first, and then if that's not enough tries again, and again, bigger each time.
53             See L for more on pointer sizes. It may also eat B<lots> of
54             memory. It keeps a cache of everything it has seen while building your
55             database, so that it can re-use data by just pointing at it instead of writing
56             multiple copies of the same data into the file.
57              
58             It tries really hard to preserve data types. So for example, C<60000> is stored
59             and read back as an integer, but C<"60000"> is stored and read back as a string.
60             This means that you can correctly store and retrieve C<"007"> but that C<007>
61             will have the leading zeroes removed before Data::CompactReadonly ever sees it
62             and so will be treated as exactly equivalent to C<7>. The same applies to floating
63             point values too. C<"7.10"> is stored as a four byte string, but C<7.10> is stored
64             the same as C<7.1>, as an eight byte IEEE754 double precision float. Note that
65             perl parses values like C<7.0> as floating point, and thus so does this module.
66              
67             Finally, while the file format permits numeric keys in hashes, this method
68             always coerces them to text. This is because if you allow numeric keys,
69             numbers that can't be represented in an C<int>, such as 1e100 or 3.14, will
70             be subject to floating point imprecision, and so it is unlikely that you
71             will ever be able to retrieve them as no exact match is possible.
72              
73             =head2 read
74              
75             Takes a single compulsory argument, which is a filename or an already open file
76             handle, and some options.
77              
78             If the first argument is a filehandle, the current file pointer should be at
79             the start of the database (not necessarily at the start of the file; the
80             database could be in a C<__DATA__> segment) and B<must> have been opened in
81             "just the bytes ma'am" mode.
82              
83             It is a fatal error to pass in a filehandle which was not opened correctly or
84             the name of a file that can't be opened or which doesn't contain a valid
85             database.
86              
87             The options are name/value pairs. Valid options are:
88              
89             =over
90              
91             =item tie
92              
93             If true return tied objects instead of normal objects. This means that you will
94             be able to access data by de-referencing and pretending to access elements
95             directly. Under the bonnet this wraps around the objects as documented below,
96             so is just a layer of indirection. On modern hardware you probably won't notice
97             the concomitant slow down but may appreciate the convenience.
98              
99             =item fast_collections
100              
101             If true Dictionary keys and values will be permanently cached in memory the
102             first time they are seen, instead of being fetched from the file when needed.
103             Yes, this means that objects will grow in memory, potentially very large.
104             Only use this if it is an acceptable trade-off for much faster access.
105              
106             This is not yet implemented for Arrays.
107              
108             =back
109              
110             Returns the "root node" of the database. If that root node is a number, some
111             piece of text, or Null, then it is decoded and the value returned. Otherwise an
112             object (possibly a tied object) representing an Array or a Dictionary is returned.
113              
114             =head1 OBJECTS
115              
116             If you asked for normal objects to be returned instead of tied objects, then
117             these are sub-classes of either C or
118             C. Both implement the following methods:
119              
120             =head2 id
121              
122             Returns a unique id for this object within the database. Note that circular data
123             structures are supported, and looking at the C<id> is the only way to detect them.
124              
125             This is not accessible when using tied objects.
126              
127             =head2 count
128              
129             Returns the number of elements in the structure.
130              
131             =head2 indices
132              
133             Returns a list of all the available indices in the structure.
134              
135             =head2 element
136              
137             Takes a single argument, which must match one of the values that would be returned
138             by C<indices>, and returns the associated data.
139              
140             If the data is a number, Null, or text, the value will be returned directly. If the
141             data is in turn another array or dictionary, an object will be returned.
142              
143             =head2 exists
144              
145             Takes a single argument and tells you whether an index exists for it. It will still
146             die if you ask it for something stupid such as a floating point array index or
147             a Null dictionary entry.
148              
149             =head1 UNSUPPORTED PERL TYPES
150              
151             Globs, Regexes, References (except to Arrays and Dictionaries)
152              
153             =head1 BUGS/FEEDBACK
154              
155             Please report bugs at L, including, if possible, a test case.
156              
157             =head1 SEE ALSO
158              
159             L if you need updateable databases.
160              
161             =head1 SOURCE CODE REPOSITORY
162              
163             L
164              
165             =head1 AUTHOR, COPYRIGHT and LICENCE
166              
167             Copyright 2020 David Cantrell
168              
169             This software is free-as-in-speech software, and may be used,
170             distributed, and modified under the terms of either the GNU
171             General Public Licence version 2 or the Artistic Licence. It's
172             up to you which one you use. The full text of the licences can
173             be found in the files GPL2.txt and ARTISTIC.txt, respectively.
174              
175             =head1 CONSPIRACY
176              
177             This module is also free-as-in-mason software.
178              
179             =cut
180              
181             sub create {
182 53     53 1 56980 my($class, $file, $data) = @_;
183              
184 53         111 my $version = 0;
185              
186 53         135 PTR_SIZE: foreach my $ptr_size (1 .. 8) {
187 57         196 my $byte5 = chr(($version << 3) + $ptr_size - 1);
188 57 50       72418 open(my $fh, '>:unix', $file) || die("Can't write $file: $! \n");
189 57         1451 print $fh "CROD$byte5";
190 57         184 eval {
191 57         965 "Data::CompactReadonly::V${version}::Node"->_create(
192             filename => $file,
193             fh => $fh,
194             ptr_size => $ptr_size,
195             data => $data,
196             globals => { next_free_ptr => tell($fh), already_seen => {} }
197             );
198             };
199 57 100 66     12657 if($@ && index($@, "Data::CompactReadonly::V${version}::Node"->_ptr_blown()) != -1) {
    50          
200 4         412 next PTR_SIZE;
201 0         0 } elsif($@) { die($@); }
202 53         6414 last PTR_SIZE;
203             }
204             }
205              
206             sub read {
207 114     114 1 50372 my($class, $file, %args) = @_;
208 114         190 my $fh;
209 114 100       314 if(ref($file)) {
210 58         86 $fh = $file;
211 58         256 my @layers = PerlIO::get_layers($fh);
212 58 100       139 if(grep { $_ !~ /^(unix|perlio|scalar)$/ } @layers) {
  63         455  
213 2         23 die(
214             "$class: file handle has invalid encoding [".
215             join(', ', @layers).
216             "]\n"
217             );
218             }
219             } else {
220 56 100       2124 open($fh, '<', $file) || die("$class couldn't open file $file: $!\n");
221 55         277 binmode($fh);
222             }
223            
224 111         248 my $original_file_pointer = tell($fh);
225              
226 111         1583 read($fh, my $header, 5);
227 111         584 (my $byte5) = ($header =~ /^CROD(.)/);
228 111 100       333 die("$class: $file header invalid: doesn't match /CROD./\n") unless(defined($byte5));
229              
230 110         270 my $version = (ord($byte5) & 0b11111000) >> 3;
231 110         190 my $ptr_size = (ord($byte5) & 0b00000111) + 1;
232 110 100       271 die("$class: $file header invalid: bad version\n") if($version == 0b11111);
233              
234             return "Data::CompactReadonly::V${version}::Node"->_init(
235             ptr_size => $ptr_size,
236             fh => $fh,
237             db_base => $original_file_pointer,
238             map {
239 109 100       376 exists($args{$_}) ? ($_ => 1 ) : ()
  218         984  
240             } qw(fast_collections tie)
241             );
242             }
243              
244             1;
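
The listing above writes a five-byte header ("CROD" plus one byte) in create and unpacks it again in read. As a rough illustration of how that fifth byte carries both the file format version and the pointer size, here is a minimal standalone sketch. The helper names encode_header and decode_header are hypothetical and are not part of Data::CompactReadonly; the arithmetic is copied from the listing, and the /s regex flag is an addition made here so that a header byte equal to a newline would also match.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): pack a format version
# and a pointer size (1..8 bytes) into the fifth header byte, using the
# same arithmetic as create() in the listing.
sub encode_header {
    my ($version, $ptr_size) = @_;
    die "version out of range\n"  unless $version >= 0 && $version < 0b11111;
    die "ptr_size out of range\n" unless $ptr_size >= 1 && $ptr_size <= 8;
    return 'CROD' . chr(($version << 3) + $ptr_size - 1);
}

# Hypothetical helper (not part of the module): unpack the header with
# the same masks and shifts as read() in the listing.
sub decode_header {
    my ($header) = @_;
    # /s is an addition in this sketch: without it "." would not match
    # a byte with value 0x0A (version 1, pointer size 3).
    my ($byte5) = ($header =~ /^CROD(.)/s);
    die "header invalid: doesn't match /CROD./\n" unless defined $byte5;
    my $version  = (ord($byte5) & 0b11111000) >> 3;   # top five bits
    my $ptr_size = (ord($byte5) & 0b00000111) + 1;    # low three bits, stored minus one
    die "header invalid: bad version\n" if $version == 0b11111;
    return ($version, $ptr_size);
}

my $header = encode_header(0, 3);
my ($version, $ptr_size) = decode_header($header);
printf "version=%d ptr_size=%d\n", $version, $ptr_size;   # prints "version=0 ptr_size=3"
```

Storing the pointer size minus one lets all eight sizes fit in three bits, which leaves five bits for the version; the listing treats the all-ones version value as invalid.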