File Coverage

blib/lib/ETL/Pipeline/Input.pm
Criterion Covered Total %
statement 11 11 100.0
branch n/a
condition n/a
subroutine 4 4 100.0
pod n/a
total 15 15 100.0


line stmt bran cond sub pod time code
1             =pod
2              
3             =head1 NAME
4              
5             ETL::Pipeline::Input - Role for ETL::Pipeline input sources
6              
7             =head1 SYNOPSIS
8              
9             use Moose;
10             with 'ETL::Pipeline::Input';
11              
12             sub run {
13             # Add code to read your data here
14             ...
15             }
16              
17             =head1 DESCRIPTION
18              
19             An I<input source> feeds the B<extract> part of B<ETL>. This is where data comes
20             from. These are your data sources.
21              
22             A data source may be anything - a file, a database, or maybe a socket. Each
23             I<format> is an L<ETL::Pipeline> input source. For example, Excel files
24             represent one input source. Perl reads every Excel file the same way. With a few
25             judicious attributes, we can re-use the same input source for just about any
26             type of Excel file.
27              
28             L<ETL::Pipeline> defines an I<input source> as a Moose object with at least one
29             method - C<run>. This role basically defines the requirement for the B<run>
30             method. It should be consumed by B<all> input source classes. L<ETL::Pipeline>
31             relies on the input source having this role.
32              
33             =head2 How do I create an I<input source>?
34              
35             =over
36              
37             =item 1. Start a new Perl module. I recommend putting it in the C<ETL::Pipeline::Input> namespace. L<ETL::Pipeline> will pick it up automatically.
38              
39             =item 2. Make your module a L<Moose> class - C<use Moose;>.
40              
41             =item 3. Consume this role - C<with 'ETL::Pipeline::Input';>.
42              
43             =item 4. Write the L</run> method. L</run> follows this basic algorithm (there is a full sketch at the end of this section)...
44              
45             =over
46              
47             =item a. Open the source.
48              
49             =item b. Loop reading the records. Each iteration should call L<ETL::Pipeline/record> to trigger the I<transform> step.
50              
51             =item c. Close the source.
52              
53             =back
54              
55             =item 5. Add any attributes for your class.
56              
57             =back
58              
59             The new source is ready to use, like this...
60              
61             $etl->input( 'YourNewSource' );
62              
63             You can leave off the leading B<ETL::Pipeline::Input::>.
64              
65             When L<ETL::Pipeline> calls L</run>, it passes the L<ETL::Pipeline> object as
66             the only parameter.
67              
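Putting those steps together, a complete input source can be quite small. This
is only a sketch - the package name and the hard coded records are made up for
illustration.

    package ETL::Pipeline::Input::Example;

    use Moose;

    with 'ETL::Pipeline::Input';

    # "run" receives the ETL::Pipeline object as its only parameter.
    sub run {
        my ($self, $etl) = @_;

        # a. Open the source. Hard coded data stands in for a real source.
        my @records = (
            { Name => 'Alice', Age => 34 },
            { Name => 'Bob',   Age => 41 },
        );

        # b. Loop over the records, sending each one to the "transform" step.
        $etl->record( $_ ) foreach @records;

        # c. Close the source - nothing to do for hard coded data.
    }

    no Moose;

    1;
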
68             =head2 Why this way?
69              
70             Input sources mostly follow the basic algorithm of open, read, process, and
71             close. I originally had the role define methods for each of these steps. That
72             was a lot of work, and kind of confusing. This way, the input source only
73             I<needs> one code block that does all of these steps - in one place. So it's
74             easier to troubleshoot and write new sources.
75              
76             In the work that I do, we have one output destination that rarely changes. It's
77             far more common to write new input sources - especially customized sources.
78             Making new sources easier saves time. Making it simpler means that more
79             developers can pick up those tasks.
80              
81             =head2 Does B<ETL::Pipeline> only work with files?
82              
83             No. B<ETL::Pipeline::Input> works for any source of data, such as SQL queries,
84             CSV files, or network sockets. Tailor the C<run> method to suit your needs -
85             the sketch below shows one possibility.
86              
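For instance, a database backed source might look something like this sketch.
The C<dbh> attribute and the query are made up for illustration; the important
part is the loop that hands each row to C<< $etl->record >>.

    sub run {
        my ($self, $etl) = @_;

        # "dbh" is a hypothetical attribute holding an open DBI handle.
        my $sth = $self->dbh->prepare( 'SELECT name, age FROM people' );
        $sth->execute;

        while (my $row = $sth->fetchrow_hashref) {
            $etl->record( $row );
        }
        $sth->finish;
    }
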
87             Because files are most common, B<ETL::Pipeline> comes with a helpful role -
88             L<ETL::Pipeline::Input::File>. Consume L<ETL::Pipeline::Input::File> in your
89             input source to access some standardized attributes.
90              
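Consuming both roles takes two C<with> statements. The package name here is
hypothetical, and the body of C<run> is left to you - see
L<ETL::Pipeline::Input::File> for the attributes it provides.

    package ETL::Pipeline::Input::MyFileFormat;

    use Moose;

    with 'ETL::Pipeline::Input';
    with 'ETL::Pipeline::Input::File';

    sub run {
        my ($self, $etl) = @_;
        ...;    # open the file, read records, call $etl->record for each one
    }

    1;
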
91             =head2 Upgrading from older versions
92              
93             L<ETL::Pipeline> version 3 is not compatible with input sources from older
94             versions. You will need to rewrite your custom input sources.
95              
96             =over
97              
98             =item Merge the C<setup>, C<finish>, and C<next_record> methods into L</run>.
99              
100             =item Have L</run> call C<< $etl->record >> in place of C<next_record>, as shown in the sketch after this list.
101              
102             =item Adjust attributes as necessary.
103              
104             =back
105              
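A rough sketch of that merge, with hypothetical helper methods standing in for
your old C<setup>, C<next_record>, and C<finish> code...

    sub run {
        my ($self, $etl) = @_;

        my $handle = $self->_open_source;                    # was "setup"
        while (defined( my $record = $self->_read( $handle ) )) {
            $etl->record( $record );                         # replaces "next_record"
        }
        close $handle;                                       # was "finish"
    }
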
106             =cut
107              
108             package ETL::Pipeline::Input;
109              
110 10     10   14269 use 5.014000;
  10         43  
111 10     10   52 use warnings;
  10         22  
  10         309  
112              
113 10     10   53 use Moose::Role;
  10         20  
  10         107  
114              
115              
116             our $VERSION = '3.00';
117              
118              
119             =head1 METHODS & ATTRIBUTES
120              
121             =head3 path (optional)
122              
123             If you define this, the standard logging will include it. The name reflects
124             file based inputs, but it can return any value that is meaningful to your
125             users.
126              
127             =head3 position (optional)
128              
129             If you define this, the standard logging includes it with error or informational
130             messages. It can be any value that helps users locate the correct place to
131             troubleshoot.
132              
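One possible way to provide them is as plain L<Moose> attributes in the
consuming class. The names matter; the types and comments here are only
suggestions.

    has 'path' => (
        is  => 'rw',
        isa => 'Str',    # e.g. the current file name
    );

    has 'position' => (
        is  => 'rw',
        isa => 'Int',    # e.g. the current record number
    );
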
133             =head3 run (required)
134              
135             You define this method in the consuming class. It should open the file, read
136             each record, call L<ETL::Pipeline/record> for each one, and close the file.
137             This method is the workhorse. It defines the main ETL loop.
138             L<ETL::Pipeline/record> acts as a callback.
139              
140             I say I<file>. It really means I<input source> - whatever that might be.
141              
142             Some important things to remember about C<run>...
143              
144             =over
145              
146             =item C<run> receives one parameter - the L<ETL::Pipeline> object.
147              
148             =item C<run> should include all the code to open, read, and close the input source.
149              
150             =item After reading a record, call L<ETL::Pipeline/record>.
151              
152             =back
153              
154             If your code encounters an error, B<run> can call L<ETL::Pipeline/status> with
155             the error message. L<ETL::Pipeline/status> should automatically include the
156             record count with the error message. You should add any other troubleshooting
157             information such as file names or key fields.
158              
159             $etl->status( "ERROR", "Error message here for id $id" );
160              
161             For fatal errors, I recommend using the C<croak> command from L<Carp>.
162              
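Putting that together, a file based C<run> might handle errors like this
sketch. The C<file> attribute and the C<_parse> helper are hypothetical.

    use Carp qw(croak);

    sub run {
        my ($self, $etl) = @_;

        my $file = $self->file;

        # Fatal: the whole source is unusable, so stop the pipeline.
        open my $handle, '<', $file or croak "Cannot open $file: $!";

        while (my $line = <$handle>) {
            my $record = $self->_parse( $line );
            unless (defined $record) {
                # Recoverable: report it, with context, and keep going.
                $etl->status( 'ERROR', "Unparseable line in $file" );
                next;
            }
            $etl->record( $record );
        }
        close $handle;
    }
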
163             =cut
164              
165             requires 'run';
166              
167              
168             =head3 source
169              
170             The location in the input source of the current record. For example, for files
171             this would be the file name and character position. The consuming class can set
172             this value in its L<run|ETL::Pipeline::Input/run> method.
173              
174             L<Logging|ETL::Pipeline/log> uses this when displaying errors or informational
175             messages. The value should be something that helps the user troubleshoot issues.
176             It can be whatever is appropriate for the input source.
177              
178             B<NOTE:> Don't capitalize the first letter unless the word is normally
179             capitalized, such as a proper noun. L<Logging|ETL::Pipeline/log> upper cases
180             the first letter when appropriate.
181              
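For example, a line oriented file reader might update C<source> on every pass
through its read loop. The C<$file> name and C<_parse> helper are hypothetical.

    while (my $line = <$handle>) {
        $self->source( "$file, line $." );
        $etl->record( $self->_parse( $line ) );
    }
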
182             =cut
183              
184             has 'source' => (
185             default => '',
186             is => 'rw',
187             isa => 'Str',
188             );
189              
190              
191             =head1 SEE ALSO
192              
193             L<ETL::Pipeline>, L<ETL::Pipeline::Input::File>, L<ETL::Pipeline::Output>
194              
195             =head1 AUTHOR
196              
197             Robert Wohlfarth <robert.j.wohlfarth@vumc.org>
198              
199             =head1 LICENSE
200              
201             Copyright 2021 (c) Vanderbilt University Medical Center
202              
203             This program is free software; you can redistribute it and/or modify it under
204             the same terms as Perl itself.
205              
206             =cut
207              
208 10     10   33949 no Moose;
  10         28  
  10         76  
209              
210             # Required by Perl to load the module.
211             1;