File Coverage

blib/lib/Linux/NFS/BigDir.pm
Criterion    Covered  Total      %
statement         24     24  100.0
branch           n/a
condition        n/a
subroutine         8      8  100.0
pod              n/a
total             32     32  100.0


line stmt bran cond sub pod time code
1             package Linux::NFS::BigDir;
2 2     2   32632 use strict;
  2         5  
  2         48  
3 2     2   9 use warnings;
  2         4  
  2         48  
4 2     2   11 use Exporter 'import';
  2         4  
  2         44  
5 2     2   9 use Carp;
  2         3  
  2         87  
6 2     2   10 use Fcntl;
  2         2  
  2         351  
7 2     2   11 use Config;
  2         4  
  2         62  
8              
9 2     2   10 use constant BUF_SIZE => 4096;
  2         2  
  2         122  
10             use constant SYS_getdents => do {
11 2     2   948 use Inline 0.80 C => <<'...';
  2         25886  
  2         14  
12             #include <sys/syscall.h>
13              
14             int _get_syscall_num() {
15             return SYS_getdents;
16             }
17             ...
18              
19             _get_syscall_num();
20             };
21              
22             our $VERSION = '0.002'; # VERSION
23              
24             =pod
25              
26             =head1 NAME
27              
28             Linux::NFS::BigDir - use Linux getdents syscall to read large directories over NFS
29              
30             =head1 SYNOPSIS
31              
32             use Linux::NFS::BigDir qw(getdents);
33             # entries_ref is an array reference
34             my $entries_ref = getdents($very_large_dir);
35              
36             =head1 DESCRIPTION
37              
38             This module was created to solve a very specific problem: you have a directory over NFS, mounted by
39             a Linux OS, and that directory has a very large number of items (files, directories, etc). The number of entries
40             is so large that you have trouble listing the contents with C<readdir> or even C<ls> from the shell. In extreme
41             cases, the operation just "hangs" and only returns hours later.
42              
43             I observed this behavior only with NFS version 3 (and wasn't able to reproduce it with local EXT3/EXT4). You might see it in other situations,
44             but in that case it might be caused by a misconfigured filesystem. Ask your administrator first.
45              
46             If you can't fix the problem (or get it fixed), then you might want to try this module. It uses the C<getdents>
47             syscall from Linux. You can check the documentation about this syscall with C<man getdents> in a shell.
48              
49             In short, this syscall returns a data structure for each entry, but you will probably want to use only the name of each entry in the directory.
50              
51             How can this be useful? Here are some directions:
52              
53             =over
54              
55             =item 1.
56              
57             You want to remove all directory content.
58              
59             =item 2.
60              
61             You want to remove files from the directory with a pattern in their filename (using regular expressions, for example).
62              
63             =item 3.
64              
65             You want to select specific files by their filenames and then test something else (like atime).
66              
67             =back
68              
69             These are only examples, but they should cover the vast majority of what you want to do. The C<getdents> syscall is more effective because
70             it does not call C<stat> on each of those files before returning the information to you. That means you have the opportunity to filter
71             whatever you need and then call C<stat> only if you really need to.
72              
73             I came across C<getdents> after researching "how to remove millions of files". After a while I found a C program example that uses C<getdents>
74             to print the filenames under the directory. By using it, I was able to clean up directories with thousands (or even millions) of files in a couple of minutes,
75             instead of many hours.
76              
77             This module is a Perl implementation of that.
78              
79             =head1 FUNCTIONS
80              
81             The subs C<getdents> and C<getdents_safe> are exported on demand.
82              
83             =cut
84              
85             our @EXPORT_OK = qw(getdents getdents_safe);
86              
87             =head2 getdents
88              
89             Expects the complete path to the directory as a parameter.
90              
91             Returns an array reference with all entries inside that directory except the '.' and '..' entries.
92              
93             While this function is simple (and probably faster), you should be careful about memory restrictions when using it.
94              
95             If you have too many files, your program may try to allocate too much memory, with all the undesired effects. See C<getdents_safe>.
96              
97             =cut
98              
99             sub getdents {
100             my $dir = shift;
101             confess "directory $dir is not available" unless ( -d $dir );
102             sysopen( my $fd, $dir, O_RDONLY | O_DIRECTORY ) or confess "cannot open directory $dir: $!";
103             my @items;
104              
105             while (1) {
106             my $buf = "\0" x BUF_SIZE;
107             my $read = syscall( SYS_getdents, fileno($fd), $buf, BUF_SIZE );
108              
109             if ( ( $read == -1 ) and ( $! != 0 ) ) {
110             confess "failed to syscall getdents: $!";
111             }
112              
113             last if ( $read == 0 );
114              
115             while ( $read != 0 ) {
116             my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
117             push( @items, $name );
118             substr( $buf, 0, $len ) = '';
119             $read -= $len;
120             }
121              
122             }
123              
124             # removing '.' and '..'
125             shift(@items);
126             shift(@items);
127             return \@items;
128             }
129              
130             =head2 getdents_safe
131              
132             "Safe" version of C<getdents>, because it writes each entry read to a text file instead of storing
133             all the entries in memory.
134              
135             Expects as parameters:
136              
137             =over
138              
139             =item *
140              
141             The complete path to the directory to be read.
142              
143             =item *
144              
145             The complete path to the file that will be used to print each entry, one per line. As a convenience, all filenames will be
146             prepended with the complete path to the directory given as parameter.
147              
148             =back
149              
150             The filename given will be created. If it already exists, this function will C<die>.
151              
152             This function returns the number of files read from the given directory.
153              
154             =cut
155              
156             sub getdents_safe {
157             my ( $dir, $output ) = @_;
158             confess "directory $dir is not available" unless ( -d $dir );
159             sysopen( my $fd, $dir, O_RDONLY | O_DIRECTORY ) or confess "cannot open directory $dir: $!";
160             sysopen( my $out, $output, O_CREAT | O_RDWR | O_EXCL )
161             or die "Cannot create $output: $!";
162             my $dots = 0;
163             my $counter = 0;
164              
165             while (1) {
166             my $buf = "\0" x BUF_SIZE;
167             my $read = syscall( SYS_getdents, fileno($fd), $buf, BUF_SIZE );
168              
169             if ( ( $read == -1 ) and ( $! != 0 ) ) {
170             confess "failed to syscall getdents: $!";
171             }
172              
173             last if ( $read == 0 );
174              
175             if ( $dots == 2 ) {
176              
177             while ( $read != 0 ) {
178             my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
179             print $out $dir, '/', $name, "\n";
180             $counter++;
181             substr( $buf, 0, $len ) = '';
182             $read -= $len;
183             }
184              
185             }
186             else {
187              
188             while ( $read != 0 ) {
189             my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
190              
191             unless ( ( $name eq '.' ) or ( $name eq '..' ) ) {
192             print $out $dir, '/', $name, "\n";
193             $counter++;
194             }
195             else {
196             $dots++;
197             }
198              
199             substr( $buf, 0, $len ) = '';
200             $read -= $len;
201             }
202              
203             }
204              
205             }
206              
207             close($out);
208             return $counter;
209             }
210              
211              
212             =head1 TO DO
213              
214             Create C versions of C<getdents> and C<getdents_safe> with L<Inline::C> to see if they get close to C<readdir>
215             speed when running over a B<local> file system (currently they are slower).
216              
217             =head1 SEE ALSO
218              
219             =over
220              
221             =item *
222              
223             L
224              
225             =item *
226              
227             L
228              
229             =item *
230              
231             The manual page of C<getdents>.
232              
233             =item *
234              
235             L.
236              
237             =back
238              
239             =head1 AUTHOR
240              
241             Alceu Rodrigues de Freitas Junior, E<lt>arfreitas@cpan.orgE<gt>
242              
243             =head1 COPYRIGHT AND LICENSE
244              
245             This software is copyright (c) 2016 of Alceu Rodrigues de Freitas Junior, E<lt>arfreitas@cpan.orgE<gt>
246              
247             This file is part of Linux-NFS-BigDir distribution.
248              
249             Linux-NFS-BigDir is free software: you can redistribute it and/or modify
250             it under the terms of the GNU General Public License as published by
251             the Free Software Foundation, either version 3 of the License, or
252             (at your option) any later version.
253              
254             Linux-NFS-BigDir is distributed in the hope that it will be useful,
255             but WITHOUT ANY WARRANTY; without even the implied warranty of
256             MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
257             GNU General Public License for more details.
258              
259             You should have received a copy of the GNU General Public License
260             along with Linux-NFS-BigDir. If not, see .
261              
262             =cut
263              
264             1;
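
The record-walking loop used by getdents and getdents_safe can be tried in isolation, without NFS or the syscall itself. The sketch below is not part of the module: make_record and parse_dirents are hypothetical helpers, and the fake records only approximate what the kernel returns (inode, offset, record length, NUL-terminated name, padding). It assumes the same "L!L!SZ*" layout that the listing above unpacks.

```perl
use strict;
use warnings;

# Pack template for the fixed part of a record: d_ino, d_off, d_reclen.
# Together with "Z*" it matches the "L!L!SZ*" template used in the listing.
my $HEADER = 'L!L!S';

# Hypothetical helper: builds one fake record so the parser can be
# exercised without calling the real syscall.
sub make_record {
    my ( $ino, $off, $name ) = @_;

    # header + name + terminating NUL + one trailing byte
    my $raw = length( pack( $HEADER, 0, 0, 0 ) ) + length($name) + 2;
    my $len = 8 * int( ( $raw + 7 ) / 8 );    # round up to a multiple of 8
    my $rec = pack( "${HEADER}Z*", $ino, $off, $len, $name );
    return $rec . ( "\0" x ( $len - length($rec) ) );
}

# Hypothetical helper: walks a buffer of records and collects the names.
# It advances an offset by each record length instead of trimming the buffer.
sub parse_dirents {
    my ($buf) = @_;
    my @names;
    my $pos = 0;

    while ( $pos < length($buf) ) {
        my ( $ino, $off, $len, $name ) =
          unpack( "${HEADER}Z*", substr( $buf, $pos ) );
        last unless $len;    # guard against a zero record length
        push @names, $name;
        $pos += $len;
    }

    return \@names;
}

my $buf =
    make_record( 1, 10, '.' )
  . make_record( 2, 20, '..' )
  . make_record( 3, 30, 'report.txt' );
my $names = parse_dirents($buf);

# Drop '.' and '..' by name rather than by position.
print join( ' ', grep { $_ ne '.' and $_ ne '..' } @$names ), "\n";
```

The sketch filters '.' and '..' by name, on the assumption that the order in which a directory returns its entries should not be relied upon; the listing above removes the first two entries instead.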