File Coverage

blib/lib/Linux/NFS/BigDir.pm
Criterion    Covered  Total      %
statement         21     37   56.7
branch             0      6    0.0
condition          0      3    0.0
subroutine         7      8   87.5
pod                            n/a
total             28     54   51.8


line stmt bran cond sub pod time code
1             package Linux::NFS::BigDir;
2 1     1   530 use strict;
  1         1  
  1         24  
3 1     1   3 use warnings;
  1         1  
  1         27  
4 1     1   8 use Exporter 'import';
  1         1  
  1         25  
5 1     1   3 use Carp;
  1         1  
  1         69  
6 1     1   7 use constant BUF_SIZE => 4096;
  1         1  
  1         53  
7 1     1   651 use File::Temp 'tempfile';
  1         12418  
  1         49  
8 1     1   5 use Fcntl;
  1         2  
  1         368  
9             require 'syscall.ph';
10              
11             our $VERSION = '0.001'; # VERSION
12              
13             =pod
14              
15             =head1 NAME
16              
17             Linux::NFS::BigDir - use Linux getdents syscall to read large directories over NFS
18              
19             =head1 SYNOPSIS
20              
21             use Linux::NFS::BigDir qw(getdents);
22             # entries_ref is an array reference
23             my $entries_ref = getdents($very_large_dir);
24              
25             =head1 DESCRIPTION
26              
27             This module was created to solve a very specific problem: you have a directory over NFS, mounted by
28             a Linux OS, and that directory has a very large number of items (files, directories, etc). The number of entries
29             is so large that you have trouble listing the contents with C or even C from the shell. In extreme
30             cases, the operation just "hangs" and only returns output hours later.
31              
32             I observed this behavior only with NFS version 3 (and wasn't able to reproduce it with local EXT3/EXT4). You might find it in different situations,
33             but in that case it might be caused by a wrong filesystem configuration. Ask your administrator first.
34              
35             If you can't fix the problem (or get it fixed), then you might want to try this module. It uses the C<getdents>
36             syscall from Linux. You can check the documentation of this syscall with C<man getdents> in a shell.
37              
38             In short, this syscall fills a buffer with variable-length records (inode number, offset, record length and name for each entry), but you will probably want to use only the name of each entry in the directory.
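
As a rough illustration of the records described above, the sketch below builds a small fake buffer and walks it with the same C<unpack> template this module uses (C<L!L!SZ*>). The helper names C<make_record> and C<names_from_buffer> are hypothetical and not part of the module. The fake records are also simplified: in a buffer filled by the kernel, the record length additionally covers padding and a trailing type byte.

```perl
use strict;
use warnings;

# Hypothetical helper: build one fake record (inode, offset, record
# length, NUL-terminated name) only to have data to parse below.
sub make_record {
    my ( $ino, $off, $name ) = @_;
    my $header_len = length pack( 'L!L!S', 0, 0, 0 );
    my $len        = $header_len + length($name) + 1;    # +1 for the NUL
    return pack( 'L!L!SZ*', $ino, $off, $len, $name );
}

# Hypothetical helper: walk a buffer of records, keeping only the names.
sub names_from_buffer {
    my ($buf) = @_;
    my @names;
    while ( length $buf ) {
        my ( $ino, $off, $len, $name ) = unpack( 'L!L!SZ*', $buf );
        last unless $len;    # guard against a malformed record
        push @names, $name;
        substr( $buf, 0, $len ) = '';    # drop the record just parsed
    }
    return \@names;
}

my $buf   = make_record( 11, 1, 'foo.txt' ) . make_record( 12, 2, 'bar.log' );
my $names = names_from_buffer($buf);
print join( ',', @$names ), "\n";
```

The record length is what makes the walk possible: each iteration removes exactly one record from the front of the buffer, whatever the length of its name.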
39              
40             How can this be useful? Here are some directions:
41              
42             =over
43              
44             =item 1.
45              
46             You want to remove all directory content.
47              
48             =item 2.
49              
50             You want to remove files from the directory with a pattern in their filename (using regular expressions, for example).
51              
52             =item 3.
53              
54             You want to select specific files by their filenames and then test something else (like atime).
55              
56             =back
57              
58             These are examples, but they should cover the vast majority of what you want to do. The C<getdents> syscall is more efficient because
59             it does not call C<stat> on each of those files before returning the information to you. That means you have the opportunity to filter
60             whatever you need and then call C<stat> only if you really need it.
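
The "filter first, stat later" idea can be sketched as follows. This is only an illustration under assumptions: C<select_old> is a hypothetical helper, and the list of paths is built by hand in a temporary directory so that the sketch runs without C<syscall.ph>. In real use the list would be the array reference returned by C<getdents>.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

# Hypothetical helper: keep only the paths matching $pattern, then stat()
# just those and return the ones not accessed for $max_age seconds or more.
sub select_old {
    my ( $entries_ref, $pattern, $max_age ) = @_;
    my $now = time;
    my @selected;
    for my $path ( grep { $_ =~ $pattern } @$entries_ref ) {
        my @st = stat($path) or next;    # the entry may have vanished
        push @selected, $path if ( $now - $st[8] ) >= $max_age;    # 8 = atime
    }
    return \@selected;
}

# Demo data: three empty files, one of them with an old access time.
my $dir     = tempdir( CLEANUP => 1 );
my @entries = map { "$dir/$_" } qw(a.log b.log c.txt);
for my $path (@entries) {
    open( my $fh, '>', $path ) or die "cannot create $path: $!";
    close($fh);
}
my $two_hours_ago = time - 7200;
utime( $two_hours_ago, $two_hours_ago, "$dir/a.log" )
  or die "utime failed: $!";

my $old = select_old( \@entries, qr/\.log\z/, 3600 );
print "$_\n" for @$old;
```

Only the two C<.log> paths are ever passed to C<stat>; C<c.txt> is discarded by the name filter alone, which is where the savings come from on a huge directory.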
61              
62             I came across C<getdents> after researching "how to remove millions of files". After a while I found a C program example that uses C<getdents>
63             to print the filenames under the directory. By using it, I was able to clean up directories with thousands (or even millions) of files in a couple of minutes,
64             instead of many hours.
65              
66             This module is a Perl implementation of that.
67              
68             =head1 FUNCTIONS
69              
70             The sub C<getdents> is exported on demand.
71              
72             =cut
73              
74             our @EXPORT_OK = qw(getdents);
75              
76             =head2 getdents
77              
78             Expects the full path to the directory as a parameter.
79              
80             Returns an array reference with the full path of each entry inside that directory.
81              
82             Although simple, you should be careful regarding memory restrictions. If you have too many files, your program may try to allocate too much memory, with all the
83             undesired effects.
84              
85             =cut
86              
87             sub getdents {
88 0     0         my ( $dir, $output ) = @_;
89 0 0             confess "directory $dir is not available" unless ( -d $dir );
90 0               sysopen( my $fd, $dir, O_RDONLY | O_DIRECTORY ) or confess "failed to open directory $dir: $!";
91 0               my @items;
92              
93 0               while (1) {
94 0                   my $buf = "\0" x BUF_SIZE;
95 0                   my $read = syscall( &SYS_getdents, fileno($fd), $buf, BUF_SIZE );
96              
97 0 0 0               if ( ( $read == -1 ) and ( $! != 0 ) ) {
98 0                       confess "failed to syscall getdents: $!";
99                     }
100              
101 0 0                 last if ( $read == 0 );
102              
103 0                   while ( $read != 0 ) {
104 0                       my ( $ino, $off, $len, $name ) = unpack( "L!L!SZ*", $buf );
105 0                       push( @items, ( $dir . '/' . $name ) );
106 0                       substr( $buf, 0, $len ) = '';
107 0                       $read -= $len;
108                     }
109              
110                 }
111              
112 0               return \@items;    # array reference, as documented above
113             }
114              
115             =head1 INSTALL
116              
117             Install this module like any other Perl module, but be sure to execute L<h2ph> before trying to run any function from it!
118              
119             On some systems, you might need to use the system administrator account to run L<h2ph>, or even perform some manual steps to fix file locations.
120              
121             If you get errors like:
122              
123             Error: Can't locate bits/syscall.ph in @INC (did you run h2ph?) (@INC contains: /home/me/Projetos/Linux-NFS-BigDir/.build/MHr69O96uB/blib/lib /home/me/Projetos/Linux-NFS-BigDir/.build/MHr69O96uB/blib/arch /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/x86_64-linux /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0 /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0/x86_64-linux /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0 .) at /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/sys/syscall.ph line 9.
124              
125             It might mean that the header files are not in the expected standard location. For instance, on an Ubuntu system you might need to create additional links:
126              
127             ln -s /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/x86_64-linux/x86_64-linux-gnu/bits /home/me/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/bits
128              
129             You will have to troubleshoot this by looking at C<$Config{'installsitearch'}> to see where your .ph files are located, then check the content of each .ph file and compare it with the real location of the C header files.
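
A possible starting point for that troubleshooting is sketched below. It prints C<$Config{'installsitearch'}> and looks for a few .ph files under every directory in C<@INC>. The helper C<find_ph> is hypothetical, and the list of file names is taken from the error message above, so it is an assumption about what your system needs.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: for each wanted .ph file, list every place under
# the directories of @INC where it actually exists.
sub find_ph {
    my @wanted = @_;
    my %found;
    for my $ph (@wanted) {
        $found{$ph} = [
            grep { -f $_ }
            map  { File::Spec->catfile( $_, $ph ) }
            grep { !ref $_ } @INC    # skip code references in @INC
        ];
    }
    return \%found;
}

print "installsitearch: $Config{installsitearch}\n";

my $found = find_ph( 'syscall.ph', 'sys/syscall.ph', 'bits/syscall.ph' );
for my $ph ( sort keys %$found ) {
    my @paths = @{ $found->{$ph} };
    print "$ph: ", ( @paths ? join( ', ', @paths ) : 'not found' ), "\n";
}
```

A file reported as "not found" is a candidate for either running C<h2ph> again or creating a link like the one shown above.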
130              
131             Even though you might be using something like perlbrew (or compiling perl yourself), you will need to use the root account to fix this.
132              
133             =head1 SEE ALSO
134              
135             =over
136              
137             =item *
138              
139             L
140              
141             =item *
142              
143             L.
144              
145             =item *
146              
147             L
148              
149             =item *
150              
151             The manual page of C<getdents>.
152              
153             =item *
154              
155             L.
156              
157             =back
158              
159             =head1 AUTHOR
160              
161             Alceu Rodrigues de Freitas Junior, E<lt>arfreitas@cpan.orgE<gt>
162              
163             =head1 COPYRIGHT AND LICENSE
164              
165             This software is copyright (c) 2016 of Alceu Rodrigues de Freitas Junior, E<lt>arfreitas@cpan.orgE<gt>
166              
167             This file is part of the Linux-NFS-BigDir distribution.
168              
169             Linux-NFS-BigDir is free software: you can redistribute it and/or modify
170             it under the terms of the GNU General Public License as published by
171             the Free Software Foundation, either version 3 of the License, or
172             (at your option) any later version.
173              
174             Linux-NFS-BigDir is distributed in the hope that it will be useful,
175             but WITHOUT ANY WARRANTY; without even the implied warranty of
176             MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
177             GNU General Public License for more details.
178              
179             You should have received a copy of the GNU General Public License
180             along with Linux-NFS-BigDir. If not, see .
181              
182             =cut
183              
184             1;