File Coverage

blib/lib/Linux/NFS/BigDir.pm
Criterion Covered Total %
statement 69 71 97.1
branch 13 18 72.2
condition 5 9 55.5
subroutine 10 10 100.0
pod 2 2 100.0
total 99 110 90.0


line stmt bran cond sub pod time code
1             package Linux::NFS::BigDir;
2 2     2   33559 use strict;
  2         6  
  2         52  
3 2     2   10 use warnings;
  2         3  
  2         61  
4 2     2   8 use Exporter 'import';
  2         14  
  2         45  
5 2     2   9 use Carp;
  2         3  
  2         161  
6 2     2   12 use Fcntl;
  2         3  
  2         407  
7 2     2   13 use Config;
  2         11  
  2         61  
8 2     2   659 use Linux::NFS::BigDir::Syscalls;
  2         4  
  2         83  
9              
10 2     2   10 use constant BUF_SIZE => 4096;
  2         4  
  2         1024  
11             our $VERSION = '0.004'; # VERSION
12              
13             =pod
14              
15             =head1 NAME
16              
17             Linux::NFS::BigDir - use Linux getdents syscall to read large directories over NFS
18              
19             =head1 SYNOPSIS
20              
21             use Linux::NFS::BigDir qw(getdents);
22             # entries_ref is an array reference
23             my $entries_ref = getdents($very_large_dir);
24              
25             =head1 DESCRIPTION
26              
27             This module was created to solve a very specific problem: you have a directory over NFS, mounted by
28             a Linux OS, and that directory has a very large number of items (files, directories, etc). The number of entries
29             is so large that you have trouble to list the contents with C or even C from the shell. In extreme
30             cases, the operation just "hangs" and only returns a result hours later.
31              
32             I observed this behavior only with NFS version 3 (and wasn't able to reproduce it with local EXT3/EXT4). You might run into it in other situations,
33             but in that case it might be caused by a misconfigured filesystem. Ask your administrator first.
34              
35             If you can't fix the problem (or get it fixed), then you might want to try this module. It uses the C<getdents>
36             syscall from Linux. You can read the documentation for this syscall with C<man getdents> in a shell.
37              
38             In short, this syscall will return a data structure, but you probably will want to use only the name of each entry in the directory.
39              
40             How can this be useful? Here are some directions:
41              
42             =over
43              
44             =item 1.
45              
46             You want to remove all directory content.
47              
48             =item 2.
49              
50             You want to remove files from the directory with a pattern in their filename (using regular expressions, for example).
51              
52             =item 3.
53              
54             You want to select specific files by their filenames and then test something else (like atime).
55              
56             =back
57              
58             These are only examples, but they should cover the vast majority of what you want to do. The C<getdents> syscall is more efficient because
59             it does not call C<stat> on each of those files before returning the information to you. That means you have the opportunity to filter
60             whatever you need and then call C<stat> only if you really need to.
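
The "filter on the name first, stat later" idea can be sketched as below. This is a minimal illustration, not the module's own code: the scratch directory, the file names and the C<.log> pattern are made up for the example, and C<readdir> is used only as a stand-in for C<getdents> so the sketch runs without the module installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Scratch directory with a few hypothetical files, so the sketch runs anywhere.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(app.log app.log.1 data.csv notes.txt)) {
    open( my $fh, '>', File::Spec->catfile( $dir, $name ) )
      or die "Cannot create $name: $!";
    close($fh);
}

# Stand-in for getdents($dir): we only have names, no stat() was done yet.
opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
my @names = grep { $_ ne '.' and $_ ne '..' } readdir($dh);
closedir($dh);

# Step 1: filter on the name alone, which needs no extra syscall per file.
my @logs = sort grep { /\.log(?:\.\d+)?\z/ } @names;

# Step 2: stat() only the survivors, here to read their atime (field 8).
my %atime;
for my $name (@logs) {
    my @st = stat( File::Spec->catfile( $dir, $name ) ) or next;
    $atime{$name} = $st[8];
}

print "$_ last accessed at $atime{$_}\n" for @logs;
```

With a real large directory, the list from step 1 would come from C<getdents>, and step 2 would touch only the entries that matched.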
61              
62             I came across C<getdents> after researching "how to remove millions of files". After a while I found a C program example that uses C<getdents>
63             to print the filenames under the directory. By using it, I was able to clean up directories with thousands (or even millions) of files in a couple of minutes,
64             instead of many hours.
65              
66             This module is a Perl implementation of that.
67              
68             =head1 FUNCTIONS
69              
70             The subs C<getdents> and C<getdents_safe> are exported on demand.
71              
72             =cut
73              
74             our @EXPORT_OK = qw(getdents getdents_safe);
75              
76             =head2 getdents
77              
78             Expects the complete path to the directory as a parameter.
79              
80             Returns an array reference with all entries inside that directory except the '.' and '..' entries.
81              
82             Although simple (and probably faster), you should be careful regarding memory restrictions when using this function.
83              
84             If you have too many files, your program may try to allocate too much memory, with all the undesired effects. See C<getdents_safe>.
85              
86             =cut
87              
88             sub getdents {
89 1     1 1 6714671 my $dir = shift;
90 1 50       43 confess "directory $dir is not available" unless ( -d $dir );
91 1         71 sysopen( my $fd, $dir, O_RDONLY | O_DIRECTORY );
92 1         7 my @items;
93              
94 1         5 while (1) {
95 783         1503 my $buf = "\0" x BUF_SIZE;
96 783         1959 my $read = syscall( SYS_getdents, fileno($fd), $buf, BUF_SIZE );
97              
98 783 50 33     2122 if ( ( $read == -1 ) and ( $! != 0 ) ) {
99 0         0 confess "failed to syscall getdents: $!";
100             }
101              
102 783 100       1511 last if ( $read == 0 );
103              
104 782         1462 while ( $read != 0 ) {
105 100002         216944 my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
106 100002         154748 push( @items, $name );
107 100002         142167 substr( $buf, 0, $len ) = '';
108 100002         191733 $read -= $len;
109             }
110              
111             }
112              
113             # removing '.' and '..'
114 1         5 shift(@items);
115 1         4 shift(@items);
116 1         825 return \@items;
117             }
118              
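
The inner loop above walks the buffer one record at a time: unpack the header and name, then drop the record's length from the front. A minimal sketch of that walk on a synthetic buffer follows. The helper names C<fake_dirent> and C<parse_dirents>, the sample entries, and the 8-byte padding are assumptions made for the illustration; a real buffer comes from the kernel and its exact padding may differ.

```perl
use strict;
use warnings;

# Build one fake record: inode, offset, record length, NUL-terminated name.
# The record length covers the whole record, including trailing padding.
sub fake_dirent {
    my ( $ino, $off, $name ) = @_;
    my $header_len = length( pack( 'L!L!S', 0, 0, 0 ) );

    # name + NUL terminator + one spare byte, rounded up to 8 (assumed padding)
    my $len = $header_len + length($name) + 2;
    $len += ( 8 - $len % 8 ) % 8;
    my $rec = pack( 'L!L!S', $ino, $off, $len ) . $name . "\0";
    return $rec . ( "\0" x ( $len - length($rec) ) );
}

# Walk the buffer like the listing does: unpack one record, keep the name,
# then remove that record's bytes from the front of the buffer.
sub parse_dirents {
    my ( $buf, $read ) = @_;
    my @names;
    while ( $read > 0 ) {
        my ( $ino, $off, $len, $name ) = unpack( 'L!L!SZ*', $buf );
        last if $len == 0;    # guard against a malformed record
        push( @names, $name );
        substr( $buf, 0, $len ) = '';
        $read -= $len;
    }
    return \@names;
}

my $buf = join( '',
    map { fake_dirent(@$_) }
      ( [ 1, 10, '.' ], [ 2, 20, '..' ], [ 3, 30, 'report.txt' ] ) );
my $names = parse_dirents( $buf, length($buf) );

# Drop '.' and '..' by name instead of by position.
my @entries = grep { $_ ne '.' and $_ ne '..' } @$names;
print join( ',', @entries ), "\n";
```

Filtering '.' and '..' by name, as in the last step, does not depend on those two entries being returned first.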
119             =head2 getdents_safe
120              
121             "Safe" version of C<getdents> because it writes each entry read to a text file instead of storing
122             all the entries in memory.
123              
124             Expects as parameters:
125              
126             =over
127              
128             =item *
129              
130             The complete path to the directory to be read.
131              
132             =item *
133              
134             The complete path to the file that will be used to print each entry, one per line. As a convenience, all filenames will be
135             prepended with the complete path to the directory given as parameter.
136              
137             =back
138              
139             The filename given will be created. If it already exists, this function will C<die>.
140              
141             This function returns the number of files read from the given directory.
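
A listing file in the format described above (one full path per line) can then be processed as a stream, without loading every entry into memory. The sketch below is only an illustration under stated assumptions: the listing is written by hand as a stand-in for the C<getdents_safe> output, and the file names and the C<.tmp> pattern are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with hypothetical files to clean up.
my $dir = tempdir( CLEANUP => 1 );
my @created;
for my $name (qw(a.tmp b.tmp keep.dat)) {
    my $path = "$dir/$name";
    open( my $fh, '>', $path ) or die "Cannot create $path: $!";
    close($fh);
    push( @created, $path );
}

# Stand-in for getdents_safe($dir, $listing): one full path per line.
my $listing_dir = tempdir( CLEANUP => 1 );
my $listing     = "$listing_dir/entries.txt";
open( my $out, '>', $listing ) or die "Cannot create $listing: $!";
print $out "$_\n" for @created;
close($out);

# Read the listing line by line and remove only the matching files.
my $removed = 0;
open( my $in, '<', $listing ) or die "Cannot read $listing: $!";
while ( my $path = <$in> ) {
    chomp($path);
    next unless $path =~ /\.tmp\z/;
    if   ( unlink($path) ) { $removed++ }
    else                   { warn "Cannot remove $path: $!" }
}
close($in);

print "removed $removed file(s)\n";
```

Note that a one-path-per-line format cannot represent file names that themselves contain a newline, so such names would need separate handling.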
142              
143             =cut
144              
145             sub getdents_safe {
146 1     1 1 121333 my ( $dir, $output ) = @_;
147 1 50       26 confess "directory $dir is not available" unless ( -d $dir );
148 1         43 sysopen( my $fd, $dir, O_RDONLY | O_DIRECTORY );
149 1 50       120 sysopen( my $out, $output, O_CREAT | O_RDWR | O_EXCL )
150             or die "Cannot create $output: $!";
151 1         5 my $dots = 0;
152 1         3 my $counter = 0;
153              
154 1         3 while (1) {
155 783         1640 my $buf = "\0" x BUF_SIZE;
156 783         2172 my $read = syscall( SYS_getdents, fileno($fd), $buf, BUF_SIZE );
157              
158 783 50 33     2068 if ( ( $read == -1 ) and ( $! != 0 ) ) {
159 0         0 confess "failed to syscall getdents: $!";
160             }
161              
162 783 100       1527 last if ( $read == 0 );
163              
164 782 100       1452 if ( $dots == 2 ) {
165              
166 781         1485 while ( $read != 0 ) {
167 99874         220477 my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
168 99874         202602 print $out $dir, '/', $name, "\n";
169 99874         132403 $counter++;
170 99874         144354 substr( $buf, 0, $len ) = '';
171 99874         199011 $read -= $len;
172             }
173              
174             }
175             else {
176              
177 1         5 while ( $read != 0 ) {
178 128         311 my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
179              
180 128 100 100     435 unless ( ( $name eq '.' ) or ( $name eq '..' ) ) {
181 126         262 print $out $dir, '/', $name, "\n";
182 126         164 $counter++;
183             }
184             else {
185 2         5 $dots++;
186             }
187              
188 128         175 substr( $buf, 0, $len ) = '';
189 128         251 $read -= $len;
190             }
191              
192             }
193              
194             }
195              
196 1         92 close($out);
197 1         756 return $counter;
198             }
199              
200              
201             =head1 TO DO
202              
203             Create C versions of C and C with L to see if they get close to C
204             speed when running over a B file system (currently they are slower).
205              
206             =head1 SEE ALSO
207              
208             =over
209              
210             =item *
211              
212             L
213              
214             =item *
215              
216             L
217              
218             =item *
219              
220             The manual page of C<getdents>.
221              
222             =item *
223              
224             L.
225              
226             =back
227              
228             =head1 AUTHOR
229              
230             Alceu Rodrigues de Freitas Junior, E<lt>arfreitas@cpan.orgE<gt>
231              
232             =head1 COPYRIGHT AND LICENSE
233              
234             This software is copyright (c) 2016 of Alceu Rodrigues de Freitas Junior, E<lt>arfreitas@cpan.orgE<gt>
235              
236             This file is part of Linux-NFS-BigDir distribution.
237              
238             Linux-NFS-BigDir is free software: you can redistribute it and/or modify
239             it under the terms of the GNU General Public License as published by
240             the Free Software Foundation, either version 3 of the License, or
241             (at your option) any later version.
242              
243             Linux-NFS-BigDir is distributed in the hope that it will be useful,
244             but WITHOUT ANY WARRANTY; without even the implied warranty of
245             MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
246             GNU General Public License for more details.
247              
248             You should have received a copy of the GNU General Public License
249             along with Linux-NFS-BigDir. If not, see .
250              
251             =cut
252              
253             1;