File Coverage

blib/lib/Bio/ToolBox/db_helper.pm
Criterion    Covered   Total      %
statement        155     662   23.4
branch            66     526   12.5
condition         26     125   20.8
subroutine        15      30   50.0
pod               16      17   94.1
total            278    1360   20.4


line stmt bran cond sub pod time code
1             package Bio::ToolBox::db_helper;
2             our $VERSION = '1.69';
3              
4             =head1 NAME
5              
6             Bio::ToolBox::db_helper - helper interface to various database formats
7              
8             =head1 DESCRIPTION
9              
10             In most cases, this module does not need to be used directly. The
11             methods available to L<Bio::ToolBox::Data> and L<Bio::ToolBox::Data::Feature>
12             provide convenient access to the methods described here.
13              
14             These are helper subroutines to work with relevant databases that can be
15             accessed through BioPerl modules. These include the L<Bio::DB::SeqFeature::Store>
16             relational database as well as L<Bio::DB::HTS>, L<Bio::DB::Sam>,
17             L<Bio::DB::BigWig>, L<Bio::DB::BigBed>, L<Bio::DB::USeq>, and
18             L<Bio::DB::Fasta> databases. The primary functions
19             include opening connections to these databases, verifying features found
20             within the database, generating lists of features or genomic intervals for
21             subsequent analysis, and collecting features and/or scores within regions
22             of interest. Collected scores may be summarized using a variety of statistical
23             methods.
24              
25             When collecting scores or features, the data may be hosted in a variety of
26             formats and locations. These include:
27              
28             =over 4
29              
30             =item SeqFeature database
31              
32             Genomic feature annotation, including genes, transcripts, exons, etc,
33             may be stored in a L<Bio::DB::SeqFeature::Store> database. These
34             databases are backed by a relational database, such as
35             MySQL, PostgreSQL, or SQLite. Small GFF3 files may also be loaded into
36             an in-memory database. These are typically loaded from GFF3
37             files; see the adapter documentation for more information. Dataset
38             scores, such as from microarray, may also be stored as the score
39             value in the source GFF file.
40              
41             References to local, binary, indexed files may also be included as
42             attributes to features stored in the database. This is legacy support
43             from the L<GBrowse genome browser|http://gmod.org>, and should
44             not be used in preference to other methods. Supported files include the
45             legacy binary wig files (F<.wib>, see L<Bio::Graphics::Wiggle>) using the
46             C<wigfile> attribute, or bigWig files using the C<wigfile> or C<bigwigfile>
47             attribute. The attribute values must be full paths.
48              
49             =item BigWig files
50              
51             BigWig files (F<.bw> and F<.bigwig>) are compressed, binary, indexed
52             versions of text wig files and may be accessed either locally or remotely.
53             They support extremely fast score retrieval from any genomic location of
54             any size without sacrificing resolution (spatial and numeric). See
55             L<http://genome.ucsc.edu/goldenPath/help/bigWig.html> for more information.
56             BigWig files are supported by either the L<Bio::DB::Big> or the
57             L<Bio::DB::BigWig> adapter, based on either the libBigWig
58             or the UCSC kent C libraries,
59             respectively; see their respective documentation for more information.
60              
61             =item Directory of BigWig files
62              
63             A directory containing two or more BigWig files is assembled into a
64             BigWigSet, allowing for metadata, such as strand, to be associated with
65             BigWig files. Additional metadata beyond the filename may be included in
66             a text file F<metadata.txt> within the directory. See the
67             L<Bio::DB::BigWigSet> adapter documentation for more information. When using
68             L<Bio::DB::Big> adapters, BigWigSet support is provided natively by
69             the BioToolBox package.
70              
71             =item BigBed files
72              
73             BigBed files are compressed, binary, indexed versions of text BED files. See
74             L<http://genome.ucsc.edu/goldenPath/help/bigBed.html> for more information.
75             Both local and remote files may be accessed. BigBed files are supported by
76             either the L<Bio::DB::Big> or the L<Bio::DB::BigBed> adapter, based on either
77             the libBigWig or the UCSC kent C libraries,
78             respectively; see their respective documentation for more information.
79              
80             =item Bam files
81              
82             Bam files (F<.bam>) are compressed, binary, indexed versions of the text SAM file,
83             or sequence alignment map. They are used with next generation sequencing
84             technologies. They support individual alignment retrieval as well as
85             read depth coverage. Two different Bam file adapters are supported.
86              
87             The L<Bio::DB::HTS> adapter is an interface to the Bam (or Cram) file.
88             This is based on the modern HTSlib C library (version E<gt>= 1.0), successor
89             to the original samtools library. See L<http://www.htslib.org> for
90             more information.
91
92             The L<Bio::DB::Sam> adapter is an older interface to the Bam alignment
93             file. This is based on the samtools C library version E<lt>= 0.1.19. While
94             this adapter is still supported, the above should be used for new installs.
95              
96             =item Cram files
97              
98             Cram files (F<.cram>) are similar to Bam files, but are much smaller due to
99             only storing sequence differences for each alignment. As such, they require
100             an indexed, reference fasta to regenerate the complete alignment. Cram files
101             are only supported by the L<Bio::DB::HTS> adapter.
102              
103             =item USeq files
104              
105             USeq files (F<.useq>) are compressed, binary, indexed files that support
106             multiple information types, including region intervals, BED type
107             annotations, or wig type scores distributed across the genome. They
108             support rapid, random access across the genome and are comparable to
109             both BigWig and BigBed files. See L<http://useq.sourceforge.net/useqArchiveFormat.html>
110             for more information. Files may only be local. These files are supported
111             by the L<Bio::DB::USeq> adapter.
112              
113             =item Fasta files
114              
115             Fasta files (F<.fa> or F<.fasta>) may be opened. Fasta files are indexed to
116             rapidly and randomly fetch genomic sequence. Three different adapters are
117             available for indexing the fasta file.
118
119             The L<Bio::DB::HTS::Faidx> adapter is preferentially used, and requires a
120             F<.fai> index file.
121
122             The L<Bio::DB::Sam::Fai> adapter may alternatively be used, and also requires a
123             F<.fai> index file.
124
125             The older style L<Bio::DB::Fasta> adapter is much slower, but will index
126             either a single genomic fasta file or a directory of individual chromosome
127             or contig fasta files.
128
129             Additionally, genomic sequence stored in a L<Bio::DB::SeqFeature::Store>
130             annotation database may also be used.
131              
132             =back
133              
134             While the functions within this library may appear to be simply a rehashing of
135             the methods and functions in L<Bio::DB::SeqFeature::Store> and other modules, they
136             either provide a simpler interface to frequently used database methodologies or are
137             designed to work intimately with the BioToolBox data format file and data structures
138             (see L<Bio::ToolBox::Data>). One key advantage to these functions is the ability
139             to work with datasets that are stranded (transcriptome data, for example).
140
141             Historically, this module was initially written to use L<Bio::DB::GFF> for
142             database usage. It has since been re-written to use the L<Bio::DB::SeqFeature::Store>
143             database schema. Additional functionality has also been added for other databases
144             and file formats.
145              
146             Complete usage and examples for the functions are provided below.
147              
148             =head1 USAGE
149              
150             Call the module at the beginning of your perl script and include the
151             subroutine names to export. None are exported by default.
152              
153             use Bio::ToolBox::db_helper qw(
154             get_new_feature_list
155             get_db_feature
156             );
157              
158             This will export the indicated subroutine names into the current namespace.
159             Their usage is detailed below.
160              
161             =head1 EXPORTED SUBROUTINES
162              
163             =over 4
164              
165             =item C<$BAM_ADAPTER> variable
166              
167             =item C<$BIG_ADAPTER> variable
168              
169             These variables control which Bam and bigWig/bigBed adapters are used
170             during execution in installations where both adapters are present. These
171             variables are not set initially; adapters are chosen automatically when an
172             adapter is requested during execution and modules are evaluated for loading.
173              
174             =item use_bam_adapter
175              
176             =item use_big_adapter
177              
178             These are convenience methods for explicitly setting and retrieving the
179             C<$BAM_ADAPTER> and C<$BIG_ADAPTER> variables, respectively.
180              
181             Optionally pass the value to set. It will always return the value being used.
182             Values include the following:
183              
184             =over 4
185              
186             =item Bam adapter
187              
188             * hts Bio::DB::HTS (default if available)
189             * sam Bio::DB::Sam
190             * none no adapters allowed
191              
192             =item Big adapter
193              
194             * big Bio::DB::Big (default if available)
195             * ucsc Bio::DB::BigWig and Bio::DB::BigBed
196             * none no adapters allowed
197              
198             =back
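
For example, to prefer the legacy samtools-based adapter when both are
installed (a sketch; the adapter may not be changed once helper modules
have been loaded):

    use Bio::ToolBox::db_helper qw(use_bam_adapter open_db_connection);

    # request the samtools-based adapter before any Bam file is opened
    my $adapter = use_bam_adapter('sam');
    print "using Bam adapter: $adapter\n";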
199              
200             =item open_db_connection
201              
202             my $db_name = 'cerevisiae';
203             my $db = open_db_connection($db_name);
204            
205             my $file = '/path/to/file.bam';
206             my $db = open_db_connection($file);
207            
208             # within a forked child process
209             # reusing the same variable to simplify code
210             # pass second true value to avoid cache
211             $db = open_db_connection($file, 1);
212              
213             This is a simplified, generalized method that will open a connection to a
214             database or indexed file using any one of a number of different
215             BioPerl-style adapters. For local and remote files, the appropriate
216             adapter is determined by the file extension. Relational database
217             connection and authentication information is checked in a configuration
218             file. Once identified, the appropriate Perl module adapter is loaded
219             automatically at run time, saving the user from having to C<use> the
220             appropriate module in the script in advance.
221              
222             Pass the name of a relational database or the path of the database file to
223             the subroutine. The opened database object is returned. If it fails, then
224             an error message should be generated and nothing is returned.
225              
226             B<IMPORTANT:> If you are forking your perl process, B<always> re-open your
227             database objects in the child process, and pass a second true value
228             to avoid using the cached database object. By default, opened databases are
229             cached to improve efficiency, but this will be disastrous when crossing forks.
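
As a minimal sketch of safe fork handling, assuming L<Parallel::ForkManager>
is available and C<@bam_files> is a hypothetical list of file paths:

    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(4);
    foreach my $file (@bam_files) {
        $pm->start and next;   # fork a child process
        # re-open in the child, passing a true second value to bypass the cache
        my $db = open_db_connection($file, 1);
        # ... collect data from $db ...
        $pm->finish;
    }
    $pm->wait_all_children;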
230              
231             =over 4
232              
233             =item SeqFeature Store database
234              
235             Provide the name of the relational database to open. Parameters for
236             connecting to a relational database are stored in the BioToolBox
237             configuration file, F<.biotoolbox.cfg>. These include database adaptors,
238             user name, password, etc. Further information regarding the configuration
239             file may be found in L<Bio::ToolBox::db_helper::config>.
240              
241             For SQLite databases, provide the path to the F<.sqlite> or F<.db> file.
242              
243             For in-memory databases, provide the path to the F<.gff> or F<.gff3> file.
244             The larger the file, the more memory and processing time is consumed as
245             the file is parsed into memory. This may be fine for small model organisms
246             like yeast, but not for vertebrate annotation.
247              
248             =item Bam file database
249              
250             Provide the path to the F<.bam> file. This may be a local file, or a remote
251             URL (generally supported by the adapter). If a local F<.bam.bai> index file
252             is not present, it will automatically be built prior to opening; this may
253             fail if the bam file is not sorted. Remote files are not indexed.
254              
255             =item Cram file database
256              
257             Provide the path to the local F<.cram> file. Currently, supplementary fasta
258             files are not supported, so the Cram file header must point to a valid
259             reference. Cram files will be indexed automatically if an index is not available.
260              
261             =item BigWig file database
262              
263             Provide the path to a local F<.bw> or F<.bigwig> file, or URL to a
264             remote file.
265              
266             =item BigWigSet database
267              
268             A local or remote directory of one or more BigWig files that can be treated
269             collectively as a database. A special text file may be included in the
270             directory to assign metadata to each BigWig file, including attributes such
271             as type, source, display name, strand, etc. See L<Bio::DB::BigWigSet> for
272             more information on the formatting of the metadata file. If a metadata
273             file is not included (it's not necessary), then the filenames are used
274             as the name and type. Strand may be parsed from the filenames if the
275             basename ends in F<.f> or F<.r>.
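
A hypothetical F<metadata.txt> for a stranded pair of files might look like
the following sketch; the attribute names and values shown here are
illustrative, so consult L<Bio::DB::BigWigSet> for the recognized keys:

    [rnaseq_forward.bw]
    type   = rnaseq
    strand = +

    [rnaseq_reverse.bw]
    type   = rnaseq
    strand = -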
276              
277             =item BigBed database
278              
279             Provide the path to a local F<.bb> or F<.bigbed> file, or URL to a
280             remote file.
281              
282             =item USeq database
283              
284             Provide the path to a local F<.useq> file. Remote files are not
285             supported.
286              
287             =item Fasta database
288              
289             Provide the path to a local fasta file for rapidly fetching genomic
290             sequence. If the fasta file is not indexed, then it will be indexed
291             automatically upon opening.
292              
293             =back
294              
295             =item get_db_name
296              
297             my $name = get_db_name($db);
298              
299             This method will attempt to get the name of an opened Database
300             object if, for some reason, it's unknown, e.g. the user only provided an
301             opened db object. This only works for some databases, at least those I've
302             bothered to track down and find a usable API call to use.
303              
304             =item get_dataset_list
305              
306             my $db_name = 'cerevisiae';
307             my @types = get_dataset_list($db_name);
308              
309             This method will retrieve a list of the available features stored in the
310             database and return an array of the features' types.
311              
312             Pass either the name of the database or an established database object.
313             Supported databases include both L<Bio::DB::SeqFeature::Store> and
314             L<Bio::DB::BigWigSet> databases.
315
316             For L<Bio::DB::SeqFeature::Store> databases, the type is represented as
317             "C<method:source>", corresponding to the third and second GFF columns,
318             respectively. The types are sorted alphabetically first by source,
319             then by method.
320
321             For L<Bio::DB::BigWigSet> databases, the C<type>, C<method>, C<primary_tag>,
322             or C<display_name> attribute may be used, in that respective order of
323             availability. The list is sorted alphabetically.
324              
325             =item verify_or_request_feature_types
326              
327             # verify a dataset, either file or database type
328             my $db_name = 'cerevisiae';
329             my $dataset = 'microarray_data';
330             my @features = verify_or_request_feature_types(
331             db => $db_name,
332             feature => $dataset,
333             );
334             unless (@features) {
335             die " no valid feature provided!\n";
336             }
337
338             # select a list of features
339             my $db_name = 'cerevisiae';
340             my @types = (); # user not provided
341             @types = verify_or_request_feature_types(
342             db => $db_name,
343             feature => \@types,
344             );
345             # user will be prompted to select from database list
346             if (@types) {
347             print "using types " . join(", ", @types);
348             }
349             else {
350             die " no valid features selected!\n";
351             }
352              
353             This subroutine will process a list of feature types or data sources to be
354             used for data or feature collection. There are two modes of action. If a
355             list was provided, it will process and verify the list. If no list was
356             provided, then the user will be prompted to select from a list of available
357             feature types in the provided database.
358              
359             The provided list may be a list of feature types or a list of single
360             file data sources, including Bam, bigWig, or bigBed files. If the list
361             includes feature types, they are verified as existing in the provided
362             database. If the list includes file names, the local files are checked
363             and the names prefixed with "C<file:>" for use downstream.
364
365             If no list was provided, then a list of available feature types in the
366             provided database will be presented to the user, and the user prompted
367             to make a selection. One or more types may be selected, and a single
368             item may be enforced if requested. The response is filtered through
369             the parse_list method from L<Bio::ToolBox::utility>, so a mix of single
370             numbers or a range of numbers may be accepted. The responses are then
371             validated.
372              
373             Two types of databases are supported: L<Bio::DB::SeqFeature::Store> and
374             L<Bio::DB::BigWigSet> databases. For SeqFeature Store databases, the
375             type is comprised of "C<method:source>", corresponding to the third and
376             second GFF columns. For BigWigSet databases, types correspond to either
377             the C<type>, C<method>, C<primary_tag>, or C<display_name> attributes,
378             in that order of availability.
379
380             For feature types or files that pass validation, they are returned as a
381             list. Types of files that do not pass validation are printed to C<STDOUT>.
382              
383             To use this subroutine, pass an array with the following keys
384             and values. Not every key is required.
385              
386             =over 4
387              
388             =item db
389              
390             The name of the database or a reference to an established BioPerl
391             database object. A L<Bio::DB::SeqFeature::Store> or
392             L<Bio::DB::BigWigSet> database is typically used. Only required when
393             no features are provided.
394              
395             =item feature
396              
397             Pass either a single dataset name as a scalar or an anonymous array
398             reference of a list of dataset names. These may have been provided as
399             a command line option and need to be verified. If nothing is passed,
400             then a list of possible datasets will be presented to the user to be
401             chosen.
402              
403             =item prompt
404              
405             Provide a phrase to present to the user as a prompt to help in selecting
406             datasets from the list. If none is provided, a generic prompt will be
407             used.
408              
409             =item single
410              
411             A Boolean value (1 or 0) indicating whether only a single dataset is
412             allowed when selecting datasets from a presented list. If true, only
413             one dataset choice is accepted. If false, one or more dataset choices
414             are allowed. Default is false.
415              
416             =item limit
417              
418             Optionally provide a word or regular expression that matches the
419             feature type (C<method> only; C<source>, if present, is ignored).
420             For example, provide "geneE<verbar>mrna" to only present gene and mRNA
421             features to the user. This is only applicable when a user must select
422             from a database list. The default is to list all available feature
423             types.
424              
425             =back
426              
427             =item check_dataset_for_rpm_support($dataset, [$cpu])
428              
429             # count the total number of alignments
430             my $dataset = '/path/to/file.bam';
431             my $total_count = check_dataset_for_rpm_support($dataset);
432            
433             # use multithreading
434             my $total_count = check_dataset_for_rpm_support($dataset, $cpu);
435              
436             This subroutine will check a dataset for RPM, or Reads Per Million mapped,
437             support. Only two types of database files support this, Bam files and
438             BigBed files. If the dataset is either one of these, or the name of a
439             database feature which points to one of these files, then it will
440             calculate the total number of mapped alignments (Bam file) or features
441             (BigBed file). It will return this total number. If the dataset does
442             not support RPM (because it is not a Bam or BigBed file, for example),
443             then it will return undefined.
444              
445             Pass this subroutine one or two values. The first is the name of the
446             dataset. Ideally it should be validated using verify_or_request_feature_types()
447             and have an appropriate prefix (file, http, or ftp).
448              
449             For multi-threaded execution pass a second value, the number of CPU cores
450             available to count Bam files. This will speed up counting bam files
451             considerably. The default is 2 for environments where L<Parallel::ForkManager>
452             is installed, or 1 where it is not.
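
As a sketch, a collected read count may be converted to RPM once the total
is known (the file path and C<$count> here are illustrative):

    my $total = check_dataset_for_rpm_support('file:/path/to/file.bam', 4);
    if (defined $total) {
        # convert a region's read count to Reads Per Million mapped
        my $rpm = $count * 1_000_000 / $total;
    }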
453              
454             =item get_new_feature_list
455              
456             my $Data = Bio::ToolBox::Data->new;
457             my $db_name = 'cerevisiae';
458             my %data = get_new_feature_list(
459             data => $Data,
460             db => $db_name,
461             features => 'genes',
462             );
463              
464             This subroutine will generate a new feature list collected from an annotation
465             database, such as a L<Bio::DB::SeqFeature::Store> database. It will populate
466             an empty L<Bio::ToolBox::Data> object data table with a list of the collected
467             features. Columns include the Primary ID, Name, and feature Type.
468              
469             The subroutine is passed an array of key => value pairs; three of the
470             arguments are required. The keys include
471              
472             =over 4
473              
474             =item data
475              
476             An empty L<Bio::ToolBox::Data> object. This is required.
477              
478             =item db
479              
480             The name of the database or a reference to an established database object.
481             Currently, this must be a L<Bio::DB::SeqFeature::Store> database.
482             This is required.
483              
484             =item features
485              
486             A scalar value containing a name representing the feature type(s) of
487             feature(s) to collect. The type may be a C<primary_tag> or C<primary_tag:source>.
488             It may be a single element or a comma-delimited list. This is required.
489              
490             =item chrskip
491              
492             Provide a regular-expression compatible string for skipping features from
493             specific chromosomes, for example mitochondrial or unmapped contigs.
494              
495             =back
496              
497             Status messages, including the number of features found and kept, are always
498             printed to C<STDOUT>.
499              
500             =item get_new_genome_list
501              
502             my $Data = Bio::ToolBox::Data->new();
503             my $db_name = 'cerevisiae';
504             my $window_size = 500;
505             my $step_size = 250;
506             my %data = get_new_genome_list(
507             data => $Data,
508             db => $db_name,
509             win => $window_size,
510             step => $step_size,
511             );
512              
513             This subroutine will generate a new list of genomic windows or intervals.
514             The genome is split into intervals of a specified size with the specified
515             step size.
516              
517             The subroutine is passed an array containing the required arguments.
518              
519             =over 4
520              
521             =item data
522              
523             An empty L<Bio::ToolBox::Data> object. This is required.
524              
525             =item db
526              
527             The name of the database or a reference to an established database
528             object. Any BioPerl adapter, including those described in the
529             L</DESCRIPTION>, is supported. Required.
530              
531             =item win
532              
533             A scalar value containing an integer representing the size of the
534             window in basepairs. The default value is 500 bp.
535              
536             =item step
537              
538             A scalar value containing an integer representing the step size for
539             advancing the window across the genome. The default is the window size.
540              
541             =item chrskip
542              
543             =item skip
544              
545             =item exclude
546              
547             Provide a regular expression compatible string for excluding specific
548             chromosomes or classes of chromosomes, such as the mitochondrial or
549             unmapped contigs.
550              
551             =back
552              
553             Status messages are printed to C<STDOUT>.
554              
555             =item get_db_feature
556              
557             my $db = open_db_connection('annotation.db');
558             my $seqfeature = get_db_feature(
559             db => $db,
560             type => 'gene',
561             name => 'ABC1',
562             );
563              
564             This subroutine will retrieve a specific feature from a L<Bio::DB::SeqFeature::Store>
565             database for subsequent analysis, manipulation, and/or score retrieval using the
566             L<Bio::ToolBox::Data::Feature> methods. It relies upon unique information to pull out a
567             single, unique feature.
568
569             Several attributes may be used to pull out the feature, including the feature's
570             unique database primary ID, name and/or aliases, and GFF type (C<primary_tag>
571             and/or C<source>). The L</get_new_feature_list> subroutine will generate a list
572             of features with their unique identifiers.
573
574             The C<primary_id> attribute is preferentially used as it provides the best
575             performance. However, it is not always portable between databases or even a
576             database re-load. In that case, the C<display_name> and C<type> are used to identify
577             potential features. Note that the C<display_name> may not always be unique in the
578             database. In this case, the addition of aliases may help. If all else fails, a new
579             feature list should be generated.
580              
581             To get a feature, pass an array of arguments.
582              
583             =over 4
584              
585             =item db
586              
587             The name of the L<Bio::DB::SeqFeature::Store> database or a reference
588             to an established database object.
589
590             =item id
591
592             Provide the C<primary_id> tag. In the L<Bio::DB::SeqFeature::Store>
593             database schema this is a (usually) non-portable, unique identifier
594             specific to a database. It provides the fastest lookup.
595
596             =item name
597
598             A scalar value representing the feature C<display_name>. Aliases may be
599             appended with semicolon delimiters.
600
601             =item type
602
603             Provide the feature type, which is typically expressed as
604             C<primary_tag:source>. Alternatively, provide just the C<primary_tag> only.
605              
606             =back
607              
608             While it is possible to identify features with any two attributes
609             (or possibly just name or ID), the best performance is obtained with
610             all three together. The first SeqFeature object is returned if multiple
611             are found.
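
A sketch passing all three attributes together (the values shown are
illustrative):

    my $feature = get_db_feature(
        db   => $db,
        id   => $primary_id,   # from a previously generated feature list
        name => 'ABC1',
        type => 'gene:SGD',    # hypothetical primary_tag:source
    );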
612              
613             =item get_segment_score
614              
615             my $score = get_segment_score(
616             $seqfeature->seq_id, # chromosome
617             $seqfeature->start, # start
618             $seqfeature->end, # end
619             $seqfeature->strand, # strand
620             'sense', # strandedness
621             'mean', # method
622             0, # return type
623             $db, # database
624             'scores.bw' # datasets
625             );
626              
627             Generic method for collecting score(s) from a database. This is the
628             primary interface to all of the database handling packages, including
629             Bam, bigWig, bigBed, and USeq files, as well as SeqFeature databases.
630             By passing common parameters here, the appropriate database or file
631             handling packages are loaded during run time and the appropriate
632             methods are called. Essentially, this is where the magic happens.
633              
634             Pass the method an array of nine (or more) parameters. The parameters
635             are not keyed as a hash, and the order is explicit. However, see the
636             L<Bio::ToolBox::db_helper::constants> module for named index positions.
637             The order is as follows.
638              
639             =over 4
640              
641             =item * 0 Chromosome
642              
643             String representing chromosome or sequence ID.
644              
645             =item * 1 Start
646              
647             Start coordinate (integer, 1-based)
648              
649             =item * 2 Stop
650              
651             Stop or end coordinate (integer, 1-based)
652              
653             =item * 3 Strand
654              
655             BioPerl style strand value, including -1, 0, and 1.
656              
657             =item * 4 Strandedness
658              
659             A string indicating strandedness: sense, antisense, all.
660              
661             =item * 5 Method
662              
663             A string representing the method of data collection.
664              
665             Current acceptable methods include mean, median, sum, min, max,
666             stddev, count, ncount (named count), and pcount (precise count).
667             See the individual db_helper sub classes for more details. Not all
668             adapters support all methods.
669              
670             =item * 6 Return type
671              
672             An integer (0, 1, 2) representing the type of value to return.
673              
674             A return type of 0 indicates a single score should be returned,
675             calculated using the specified method. A return type of 1 is an
676             array or array reference of all scores found. A return type of 2
677             is a hash or hash reference of coordinate => score.
678              
679             =item * 7 Dataset database
680              
681             This is either a L<Bio::DB::SeqFeature::Store> or a L<Bio::DB::BigWigSet>
682             database containing multiple score dataset features. If your dataset
683             does not contain multiple different feature types, pass C<undef>.
684
685             Pass either a string containing the name of the database (which will be opened)
686             or an already opened database object.
687              
688             =item * 8 Dataset
689              
690             This is the verified dataset from which the scores will be collected. The
691             dataset should be verified using the L</verify_or_request_feature_types>
692             subroutine. For indexed file datasets, e.g. F<.bam> or F<.bw> files,
693             this will be the verified local path or URL. For a database, this will be
694             a database type, such as C<primary_tag> or C<primary_tag:source>,
695             or the path to a data file, e.g. Bam, bigWig, bigBed, or USeq file.
696              
697             If multiple datasets are to be combined, simply append them to the
698             array (index 9, 10, etc).
699              
700             =back
701              
702             The returned item is dependent on the value of the return type code.
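
For example, a return type of 2 collects positioned scores (a sketch
assuming a hypothetical local bigWig file):

    my %pos2score = get_segment_score(
        'chrI', 1000, 2000, 0,       # coordinates, unstranded
        'all',                       # strandedness
        'mean',                      # method
        2,                           # return coordinate => score pairs
        undef,                       # no dataset database needed
        'file:/path/to/scores.bw'    # hypothetical dataset file
    );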
703              
704             =item calculate_score
705              
706             my $scores = get_segment_score($chr,$start,$stop,$strand,
707             'sense','ncount',1,undef,'file:/path/to/dataset.bam');
708             my $score = calculate_score('ncount', $scores);
709              
710             This subroutine will perform the math to calculate a single score
711             from an array of values. Current acceptable methods include mean,
712             median, sum, min, max, stddev, count, ncount, and pcount.
713              
714             Pass the subroutine two items: the name of the mathematical method
715             enumerated above, and an array reference of the values to work on.
716             A single score will be returned.
717              
718             =item get_chromosome_list
719              
720             my $db = open_db_connection('cerevisiae');
721             # get all chromosomes in the database
722             my @chromosomes = get_chromosome_list($db);
723             foreach (@chromosomes) {
724             my $name = $_->[0];
725             my $length = $_->[1];
726             print "chromosome $name is $length bp\n";
727             }
728              
729             This subroutine will collect a list of chromosomes or reference sequences
730             in a Bio::DB database and return the list along with their sizes in bp.
731             Many BioPerl-based databases are supported, including
732             L<Bio::DB::SeqFeature::Store>, L<Bio::DB::HTS>, L<Bio::DB::Sam>,
733             L<Bio::DB::BigWig>, L<Bio::DB::BigBed>, L<Bio::DB::USeq>,
734             L<Bio::DB::Fasta>, or any others that support the C<seq_ids> method.
735             L<Bio::DB::Big> adapters for big files are also supported, though see the note
736             below. See the L</open_db_connection> subroutine for more information.
737              
738             Pass the subroutine either the name of the database, the path to the
739             database file, or an opened database object.
740              
741             Optionally pass a second value, a regular expression compatible string
742             for skipping specific chromosomes or chromosome classes, such as mitochondrial
743             or unmapped contigs. The default is to return all chromosomes.
744              
745             The subroutine will return an array, with each element representing each
746             chromosome or reference sequence in the database. Each element is an anonymous
747             array of two elements, the chromosome name and length in bp.
748              
749             B<Note>: L<Bio::DB::Big> databases for bigWig and bigBed files do
750             not return chromosomes in the file's original order; they are returned
751             in a sorted order that may differ from the original.
752              
753             =item low_level_bam_coverage
754              
755             my $coverage = low_level_bam_coverage($sam, $tid, $start, $stop);
756              
757             This is a convenience method for running the low level bam coverage method.
758             Since both L<Bio::DB::Sam> and L<Bio::DB::HTS> bam file adapters are
759             supported, and each has slight variation in the API syntax, this method helps
760             to abstract the actual method and use the appropriate syntax depending on
761             which adapter is loaded. It is best if the C<$sam> object was opened using the
762             L</open_db_connection> method, or that C<$BAM_ADAPTER> is set.
763
764             B<Note> that this is the B<low level> coverage method based on the index object,
765             and not the similarly named high level API method. Read the adapter
766             documentation for proper usage.
767              
768             =item low_level_bam_fetch
769              
770             my $success = low_level_bam_fetch($sam, $tid, $start, $stop, $callback, $data);
771              
772             This is a convenience method for running the low level bam fetch method.
773             Since both L<Bio::DB::Sam> and L<Bio::DB::HTS> bam file adapters are
774             supported, and each has slight variation in the API syntax, this method helps
775             to abstract the actual method and use the appropriate syntax depending on
776             which adapter is loaded. It is best if the C<$sam> object was opened using the
777             L</open_db_connection> method, or that C<$BAM_ADAPTER> is set.
778
779             B<Note> that this is the B<low level> fetch method based on the index object,
780             and not the similarly named high level API method. Read the adapter
781             documentation for proper usage.
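
A sketch of a fetch callback that tallies alignment start positions; the
region parsing through the adapter's header object and the data structure
are illustrative:

    # assumes $sam was opened via open_db_connection()
    my ($tid, $zstart, $end) = $sam->header->parse_region('chrI:1000-2000');
    my %depth;
    my $callback = sub {
        my ($alignment, $data) = @_;
        $data->{ $alignment->pos } += 1;   # 0-based leftmost position
    };
    low_level_bam_fetch($sam, $tid, $zstart, $end, $callback, \%depth);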
782              
783             =item get_genomic_sequence
784              
785             my $sequence = get_genomic_sequence($db, $chrom, $start, $stop);
786              
787             This is a convenience wrapper function for fetching genomic sequence from
788             three different supported fasta database adapters, which, of course, all
789             have different APIs, namely the L<Bio::DB::HTS::Faidx>, L<Bio::DB::Sam::Fai>,
790             and L<Bio::DB::Fasta> adapters, as well as L<Bio::DB::SeqFeature::Store>
791             and possibly other BioPerl databases.
792              
793             Pass the opened database object, chromosome, start, and stop coordinates. This
794             assumes BioPerl standard 1-base coordinates. Only the forward sequence is
795             retrieved. The sequence is returned as a simple string.
796              
797             =back
798              
799             =head1 INTERNAL SUBROUTINES
800              
801             These are not intended for normal general consumption but are documented here
802             so that at least some of us know what is going on.
803              
804             =over 4
805              
806             =item _lookup_db_method
807              
808             Internal method for determining which database adapter and method call
809             should be used given a set of parameters. This result is cached for
810             future data queries (likely hundreds or thousands of repeated calls).
811             The method is stored in the global hash C<%DB_METHODS>.
812
813             Pass the parameter array reference from L</get_segment_score>. This
814             should load the appropriate db_helper sub module during run time.
815              
816             =item _load_helper_module
817              
818             Subroutine to load and import a db_helper module during run time.
819             Sets the appropriate global variable if successful.
820              
821             =item _load_bam_helper_module
822              
823             Subroutine to determine which Bam adapter db_helper module to load,
824             either the older samtools-based adapter or the newer HTSlib-based adapter.
825             Uses the global and exportable variable C<$BAM_ADAPTER> to set a
826             preference for which adapter to use. Use 'sam' or 'hts' or some
827             string containing these names. Do B<not> change this variable after
828             loading the helper module; it will not do what you think it will do.
829             If both are available and a preference is not set, then the hts helper
830             is the default.
831              
832             =back
833              
834             =cut
835              
836 3     3   16 use strict;
  3         6  
  3         89  
837             require Exporter;
838 3     3   11 use Carp qw(carp cluck croak confess);
  3         4  
  3         148  
839 3     3   1136 use Module::Load; # for dynamic loading during runtime
  3         2491  
  3         15  
840 3     3   124 use List::Util qw(min max sum0 uniq);
  3         5  
  3         166  
841 3     3   1040 use Statistics::Lite qw(median range stddevp);
  3         3542  
  3         157  
842 3     3   1104 use Bio::ToolBox::db_helper::constants;
  3         6  
  3         145  
843 3     3   1168 use Bio::ToolBox::utility;
  3         6  
  3         20771  
844              
845             # check values for dynamically loaded helper modules
846             # these are loaded only when needed during runtime to avoid wasting resources
847             our $BAM_OK = 0;
848             our $BIGBED_OK = 0;
849             our $BIGWIG_OK = 0;
850             our $SEQFASTA_OK = 0;
851             our $USEQ_OK = 0;
852             our $BAM_ADAPTER = undef; # preference for which bam adapter to use
853             our $BIG_ADAPTER = undef;
854              
855             # define reusable variables
856             my %TOTAL_READ_NUMBER; # for rpm calculations
857             my $PRIMARY_ID_WARNING; # for out of date primary IDs
858             my %OPENED_DB; # cache for some opened Bio::DB databases
859             my %DB_METHODS; # cache for database score methods
860              
861             # score calculators
862             my %SCORE_CALCULATOR_SUB = (
863             'mean' => sub {
864             my $s = shift;
865             return sum0(@$s)/(scalar(@$s) || 1);
866             },
867             'sum' => sub {
868             my $s = shift;
869             return sum0(@$s);
870             },
871             'median' => sub {
872             my $s = shift;
873             return '.' unless scalar(@$s);
874             return median(@$s);
875             },
876             'min' => sub {
877             my $s = shift;
878             return '.' unless scalar(@$s);
879             return min(@$s);
880             },
881             'max' => sub {
882             my $s = shift;
883             return '.' unless scalar(@$s);
884             return max(@$s);
885             },
886             'count' => sub {
887             my $s = shift;
888             return scalar(@$s);
889             },
890             'pcount' => sub {
891             my $s = shift;
892             return scalar(@$s);
893             },
894             'ncount' => sub {
895             # Convert names into unique counts
896             my $s = shift;
897             my %name2count;
898             foreach my $n (@$s) {
899             if (ref($n) eq 'ARRAY') {
900             # this is likely from a ncount indexed hash
901             foreach (@$n) {
902             $name2count{$_} += 1;
903             }
904             }
905             else {
906             $name2count{$n} += 1;
907             }
908             }
909             return scalar(keys %name2count);
910             },
911             'range' => sub {
912             # the range value is 'min-max'
913             my $s = shift;
914             return '.' unless scalar(@$s);
915             return range(@$s);
916             },
917             'stddev' => sub {
918             # we are using the standard deviation of the population,
919             # since these are the only scores we are considering
920             my $s = shift;
921             return '.' unless scalar(@$s);
922             return stddevp(@$s);
923             },
924             'rpm' => sub {
925             confess " The rpm methods have been deprecated due to complexity and " .
926             "the variable way of calculating the value. Collect counts and " .
927             "calculate your preferred way.\n";
928             },
929             'rpkm' => sub {
930             confess " The rpm methods have been deprecated due to complexity and " .
931             "the variable way of calculating the value. Collect counts and " .
932             "calculate your preferred way.\n";
933             }
934             );
935              
936              
937             # Exported names
938             our @ISA = qw(Exporter);
939             our @EXPORT = qw();
940             our @EXPORT_OK = qw(
941             $BAM_ADAPTER
942             $BIG_ADAPTER
943             use_bam_adapter
944             use_big_adapter
945             open_db_connection
946             get_dataset_list
947             verify_or_request_feature_types
948             check_dataset_for_rpm_support
949             get_new_feature_list
950             get_new_genome_list
951             validate_included_feature
952             get_db_feature
953             get_segment_score
954             calculate_score
955             get_chromosome_list
956             low_level_bam_coverage
957             low_level_bam_fetch
958             get_genomic_sequence
959             );
960              
961             # The true statement
962             1;
963              
964              
965             ### Adapter preference and database connection subroutines
966              
967             sub use_bam_adapter {
968 0   0 0 1 0 my $a = shift || undef;
969 0 0       0 $BAM_ADAPTER = $a if $a;
970 0         0 return $BAM_ADAPTER;
971             }
972              
973              
974             sub use_big_adapter {
975 0   0 0 1 0 my $a = shift || undef;
976 0 0       0 $BIG_ADAPTER = $a if $a;
977 0         0 return $BIG_ADAPTER;
978             }
979              
980              
981             sub open_db_connection {
982 8     8 1 12 my $database = shift;
983 8   50     20 my $no_cache = shift || 0;
984 8 50       12 unless ($database) {
985             # cluck 'no database name passed!';
986 0         0 return;
987             }
988            
989             # first check if it is a database reference
990 8         14 my $db_ref = ref $database;
991 8 100       25 if ($db_ref =~ /DB|big::BigWigSet/) {
992             # the provided database is already an open database object
993             # nothing to open, return as is
994 2         5 return $database;
995             }
996            
997             # check to see if we have already opened it
998 6 100 66     23 if (exists $OPENED_DB{$database} and not $no_cache) {
999             # return the cached database connection
1000             # but NOT if user explicitly requested no cached databases
1001             # DO NOT reuse database objects if you have forked!!! Bad things happen
1002 5         11 return $OPENED_DB{$database};
1003             }
1004            
1005             # skip parsed databases
1006 1 50       3 return if $database =~ /^Parsed:/; # we don't open parsed annotation files
1007            
1008             # remove file prefix, just in case
1009 1         2 $database =~ s/^file://;
1010            
1011            
1012             ### Attempt to open the database
1013             # we go through a series of checks to determine if it is remote, local,
1014             # an indexed big data file, SQLite file, etc
1015             # when all else fails, try to open a SQL connection
1016 1         2 my $db;
1017             my $error;
1018            
1019             # check if it is a remote file
1020 1 50       20 if ($database =~ /^(?:https?|ftp)/i) {
    50          
    0          
1021            
1022             # a remote Bam database
1023 0 0       0 if ($database =~ /\.bam$/i) {
    0          
    0          
    0          
    0          
1024             # open using Bam adaptor
1025 0 0       0 _load_bam_helper_module() unless $BAM_OK;
1026 0 0       0 if ($BAM_OK) {
1027 0         0 $db = open_bam_db($database);
1028 0 0       0 unless ($db) {
1029 0         0 $error = " ERROR: could not open remote Bam file" .
1030             " '$database'! $!\n";
1031             }
1032             }
1033             else {
1034 0         0 $error = " Bam database cannot be loaded because\n" .
1035             " Bio::DB::Sam or Bio::DB::HTS is not installed\n";
1036             }
1037             }
1038            
1039             # a remote BigBed database
1040             elsif ($database =~ /\.(?:bb|bigbed)$/i) {
1041             # open using BigBed adaptor
1042 0 0       0 $BIGBED_OK = _load_bigbed_helper_module() unless $BIGBED_OK;
1043 0 0       0 if ($BIGBED_OK) {
1044 0         0 $db = open_bigbed_db($database);
1045 0 0       0 unless ($db) {
1046 0         0 $error = " ERROR: could not open remote BigBed file" .
1047             " '$database'! $!\n";
1048             }
1049             }
1050             else {
1051 0         0 $error = " BigBed database cannot be loaded because\n" .
1052             " Bio::DB::Big or Bio::DB::BigBed is not installed\n";
1053             }
1054             }
1055            
1056             # a remote BigWig database
1057             elsif ($database =~ /\.(?:bw|bigwig)$/i) {
1058             # open using BigWig adaptor
1059 0 0       0 $BIGWIG_OK = _load_bigwig_helper_module() unless $BIGWIG_OK;
1060 0 0       0 if ($BIGWIG_OK) {
1061 0         0 $db = open_bigwig_db($database);
1062 0 0       0 unless ($db) {
1063 0         0 $error = " ERROR: could not open remote BigWig file" .
1064             " '$database'! $!\n";
1065             }
1066             }
1067             else {
1068 0         0 $error = " BigWig database cannot be loaded because\n" .
1069             " Bio::DB::Big or Bio::DB::BigWig is not installed\n";
1070             }
1071             }
1072            
1073             # a remote useq file
1074             elsif ($database =~ /\.useq$/) {
1075             # uh oh! remote useq files are not supported
1076 0         0 $error = " ERROR: remote useq files are not supported!\n";
1077             }
1078            
1079             # a remote fasta file???
1080             elsif ($database =~ /\.fa(?:sta)?$/i) {
1081             # uh oh! remote fasta files are not supported
1082 0         0 $error = " ERROR: remote fasta files are not supported!\n";
1083             }
1084            
1085             # a presumed remote directory, presumably of bigwig files
1086             else {
1087             # open using BigWigSet adaptor
1088 0 0       0 $BIGWIG_OK = _load_helper_module('Bio::ToolBox::db_helper::bigwig')
1089             unless $BIGWIG_OK;
1090 0 0       0 if ($BIGWIG_OK) {
1091 0         0 $db = open_bigwigset_db($database);
1092 0 0       0 unless ($db) {
1093 0         0 $error = " ERROR: could not open presumed remote " .
1094             "BigWigSet directory '$database'! $!\n";
1095             }
1096             }
1097             else {
1098 0         0 $error = " Presumed BigWigSet database cannot be loaded because\n" .
1099             " Bio::DB::BigWigSet is not installed\n";
1100             }
1101             }
1102            
1103             }
1104            
1105             # a local existing file
1106             elsif (-f $database) {
1107            
1108             # a Bam database
1109 1 50       17 if ($database =~ /\.bam$/i) {
    50          
    50          
    50          
    0          
    0          
    0          
1110             # open using appropriate bam adaptor
1111 0 0       0 _load_bam_helper_module() unless $BAM_OK;
1112 0 0       0 if ($BAM_OK) {
1113 0         0 undef $@;
1114 0         0 $db = open_bam_db($database);
1115 0 0       0 unless ($db) {
1116 0         0 $error = " ERROR: could not open local Bam file" .
1117             " '$database'! $@\n";
1118             }
1119             }
1120             else {
1121 0         0 $error = " Bam database cannot be loaded because\n" .
1122             " Bio::DB::Sam or Bio::DB::HTS is not installed\n";
1123             }
1124             }
1125            
1126             # a BigBed database
1127             elsif ($database =~ /\.(?:bb|bigbed)$/i) {
1128             # open using BigBed adaptor
1129 0 0       0 $BIGBED_OK = _load_bigbed_helper_module() unless $BIGBED_OK;
1130 0 0       0 if ($BIGBED_OK) {
1131 0         0 undef $@;
1132 0         0 $db = open_bigbed_db($database);
1133 0 0       0 unless ($db) {
1134 0         0 $error = " ERROR: could not open local BigBed file" .
1135             " '$database'! $@\n";
1136             }
1137             }
1138             else {
1139 0         0 $error = " BigBed database cannot be loaded because\n" .
1140             " Bio::DB::Big or Bio::DB::BigBed is not installed\n";
1141             }
1142             }
1143            
1144             # a BigWig database
1145             elsif ($database =~ /\.(?:bw|bigwig)$/i) {
1146             # open using BigWig adaptor
1147 0 0       0 $BIGWIG_OK = _load_bigwig_helper_module() unless $BIGWIG_OK;
1148 0 0       0 if ($BIGWIG_OK) {
1149 0         0 undef $@;
1150 0         0 $db = open_bigwig_db($database);
1151 0 0       0 unless ($db) {
1152 0         0 $error = " ERROR: could not open local BigWig file" .
1153             " '$database'! $@\n";
1154             }
1155             }
1156             else {
1157 0         0 $error = " BigWig database cannot be loaded because\n" .
1158             " Bio::DB::Big or Bio::DB::BigWig is not installed\n";
1159             }
1160             }
1161            
1162             # a useq database
1163             elsif ($database =~ /\.useq$/i) {
1164             # open using USeq adaptor
1165 1 50       35 $USEQ_OK = _load_helper_module('Bio::ToolBox::db_helper::useq')
1166             unless $USEQ_OK;
1167 1 50       3 if ($USEQ_OK) {
1168 1         2 $db = open_useq_db($database);
1169 1 50       3 unless ($db) {
1170 0         0 $error = " ERROR: could not open local useq file" .
1171             " '$database'! $!\n";
1172             }
1173             }
1174             else {
1175 0         0 $error = " Useq database cannot be loaded because\n" .
1176             " Bio::DB::USeq is not installed\n";
1177             }
1178             }
1179            
1180             # a Fasta File
1181             elsif ($database =~ /\.fa(?:sta)?(?:\.gz)?$/i) {
1182             # first try a modern fai indexed adapter
1183 0 0       0 _load_bam_helper_module() unless $BAM_OK;
1184 0 0       0 if ($BAM_OK) {
1185 0         0 $db = open_indexed_fasta($database);
1186 0 0       0 unless ($db) {
1187 0         0 $error .= " ERROR: could not open indexed fasta file '$database'!\n";
1188             }
1189             }
1190             else {
1191             # open using the old slow BioPerl Fasta adaptor
1192 0 0       0 $SEQFASTA_OK = _load_helper_module('Bio::ToolBox::db_helper::seqfasta')
1193             unless $SEQFASTA_OK;
1194 0 0       0 if ($SEQFASTA_OK) {
1195 0         0 $db = open_fasta_db($database);
1196 0 0       0 unless ($db) {
1197 0         0 $error .= " ERROR: could not open fasta file '$database'!\n";
1198 0 0       0 if (-e "$database\.index") {
1199 0         0 $error .= " Try deleting $database\.index and try again\n";
1200             }
1201             }
1202             }
1203             else {
1204 0         0 $error .= " Fasta file could not be loaded because none of" .
1205             " Bio::DB::HTS, Bio::DB::Sam, or Bio::DB::Fasta is installed\n";
1206             }
1207             }
1208             }
1209            
1210             # a gff3 or sqlite database
1211             elsif ($database =~ /\.(?:gff3?|gff3?\.gz|db|sqlite)$/) {
1212 0 0       0 $SEQFASTA_OK = _load_helper_module('Bio::ToolBox::db_helper::seqfasta')
1213             unless $SEQFASTA_OK;
1214 0 0       0 if ($SEQFASTA_OK) {
1215 0         0 $db = open_store_db($database);
1216 0 0       0 unless ($db) {
1217 0         0 $error = " ERROR: could not load SeqFeature database file '$database'!\n";
1218             }
1219             }
1220             else {
1221 0         0 $error .= " Module Bio::DB::SeqFeature::Store is required to load " .
1222             "GFF and database files\n";
1223             }
1224             }
1225            
1226             # a cram database
1227             elsif ($database =~ /\.cram$/i) {
1228             # open using HTS bam adaptor only
1229 0   0     0 $BAM_ADAPTER ||= 'hts';
1230 0 0       0 if ($BAM_ADAPTER eq 'sam') {
1231 0         0 $error .= " ERROR: Only HTS adapters support Cram files!\n";
1232             }
1233 0 0       0 _load_bam_helper_module() unless $BAM_OK;
1234 0 0       0 if ($BAM_OK) {
1235 0         0 undef $@;
1236 0         0 $db = open_bam_db($database);
1237 0 0       0 unless ($db) {
1238 0         0 $error = " ERROR: could not open local Cram file" .
1239             " '$database'! $@\n";
1240             }
1241             }
1242             else {
1243 0         0 $error = " Cram database cannot be loaded because\n" .
1244             " Bio::DB::HTS is not installed\n";
1245             }
1246             }
1247            
1248             # something unrecognized?
1249             else {
1250 0         0 $error .= " ERROR! Cannot identify database type based on extension for $database!\n";
1251             }
1252             }
1253            
1254             # a directory, presumably of bigwig files
1255             elsif (-d $database) {
1256             # try opening using the BigWigSet adaptor
1257 0 0       0 $BIGWIG_OK = _load_bigwig_helper_module() unless $BIGWIG_OK;
1258 0 0       0 if ($BIGWIG_OK) {
1259 0         0 $db = open_bigwigset_db($database);
1260 0 0       0 unless ($db) {
1261 0         0 $error = " ERROR: could not open local BigWigSet " .
1262             "directory '$database'!\n";
1263 0         0 $error .= " Does directory contain bigWig .bw files?\n";
1264             }
1265             }
1266             else {
1267 0         0 $error = " Presumed BigWigSet database cannot be loaded because\n" .
1268             " Bio::DB::Big or Bio::DB::BigWigSet is not installed\n";
1269             }
1270            
1271             # try opening with the Fasta adaptor
1272 0 0       0 unless ($db) {
1273 0 0       0 $SEQFASTA_OK = _load_helper_module('Bio::ToolBox::db_helper::seqfasta')
1274             unless $SEQFASTA_OK;
1275 0 0       0 if ($SEQFASTA_OK) {
1276 0         0 $db = open_fasta_db($database);
1277 0 0       0 unless ($db) {
1278 0         0 $error .= " ERROR: could not open fasta directory '$database'!\n";
1279 0         0 $error .= " Does directory contain fasta files? If it contains a" .
1280             " directory.index file,\n try deleting it and try again.\n";
1281             }
1282             }
1283             else {
1284 0         0 $error .= " Module Bio::DB::Fasta is required to open presumed fasta " .
1285             "directory\n";
1286             }
1287             }
1288             }
1289            
1290             # otherwise assume the name of a SeqFeature::Store database in the configuration
1291             else {
1292             # attempt to open using information from the configuration
1293             # using default connection information as necessary
1294 0 0       0 $SEQFASTA_OK = _load_helper_module('Bio::ToolBox::db_helper::seqfasta')
1295             unless $SEQFASTA_OK;
1296 0 0       0 if ($SEQFASTA_OK) {
1297 0         0 $db = open_store_db($database);
1298 0 0       0 unless ($db) {
1299 0         0 $error .= " unable to open relational database named '$database'\n";
1300             }
1301             }
1302             else {
1303 0         0 $error .= " Module Bio::DB::SeqFeature::Store is required to connect " .
1304             "to databases\n";
1305             }
1306             }
1307            
1308             # conditional return
1309 1 50       3 if ($db) {
1310             # cache the opened connection for use later
1311 1 50       3 $OPENED_DB{$database} = $db unless $no_cache;
1312            
1313             # return the opened database object
1314 1         3 return $db;
1315             }
1316             else {
1317 0         0 print STDERR $error;
1318 0         0 return;
1319             }
1320             }
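
A minimal sketch of how this connection subroutine might be called from a script, assuming open_db_connection is imported from Bio::ToolBox::db_helper; the file path is hypothetical:

    use Bio::ToolBox::db_helper qw(open_db_connection);

    # the file extension determines which adapter is tried
    my $db = open_db_connection('data/alignments.bam')    # hypothetical path
        or die "unable to open database!\n";

    # a second call with the same name returns the cached object
    my $same_db = open_db_connection('data/alignments.bam');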
1321              
1322              
1323             ### Retrieve a database name from an db object
1324             sub get_db_name {
1325 0     0 1 0 my $db = shift;
1326 0 0       0 return unless $db;
1327 0         0 my $db_ref = ref($db);
1328 0 0       0 return $db unless $db_ref; # presumption that non-object is just a database name
1329 0         0 my $db_name;
1330 0 0       0 if ($db_ref =~ /^Bio::DB::SeqFeature::Store/) {
    0          
    0          
1331             # a SeqFeature database, using any DBI adapter
1332 0         0 $db_name = $db->{'dbh'}->{'name'};
1333             # dig through the object internals to identify the original
1334             # name of the database
1335             # this should be relatively well documented through DBI
1336             # but could break in the future since it's not official API
1337             }
1338             elsif ($db_ref eq 'Bio::DB::Sam') {
1339             # a Samtools Bam database
1340             # old versions <=1.39 don't have the bam_path method, so it's easier
1341             # to explicitly grab it from the object internals
1342 0         0 $db_name = $db->{bam_path};
1343             }
1344             elsif ($db_ref eq 'Bio::DB::HTS') {
1345             # a HTSlib Bam database
1346 0         0 $db_name = $db->hts_path;
1347             }
1348             # determining the database name from other sources is
1349             # either not possible or not easy, so won't bother unless
1350             # there is a really really good need
1351 0         0 return $db_name;
1352             }
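
A short sketch of recovering the original name from an opened object; the path is hypothetical:

    use Bio::ToolBox::db_helper qw(open_db_connection get_db_name);

    my $db   = open_db_connection('data/alignments.bam');  # hypothetical path
    my $name = get_db_name($db);  # returns the original path for Bam adapters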
1353              
1354              
1355             ### Retrieve a list of the microarray data sets from the db
1356             sub get_dataset_list {
1357            
1358 0     0 1 0 my $database = shift;
1359 0         0 my $use_all_features = shift;
1360            
1361             # Open a db connection
1362 0         0 my $db = open_db_connection($database);
1363 0 0       0 unless ($db) {
1364 0         0 carp 'no database connected!';
1365 0         0 return;
1366             }
1367 0         0 my $db_ref = ref $db;
1368            
1369             # process the database types, according to the type of database
1370 0         0 my @types;
1371            
1372             # a SeqFeature database
1373 0 0       0 if ($db_ref =~ /^Bio::DB::SeqFeature::Store/) {
    0          
1374            
1375             # collect the database types,
1376             # sort first by source, then by method
1377             @types = (
1378             map $_->[2],
1379 0 0       0 sort { ($a->[0] cmp $b->[0]) or ($a->[1] cmp $b->[1]) }
  0         0  
1380             map [$_->source, $_->method, $_],
1381             $db->types
1382             );
1383             }
1384            
1385             # a BigWigSet database
1386             elsif ($db_ref =~ /BigWigSet/i) {
1387            
1388             # get the metadata
1389 0         0 my $metadata = $db->metadata;
1390            
1391             # collect
1392 0         0 foreach my $file (keys %{ $metadata }) {
  0         0  
1393             # get the type for each file
1394 0         0 my ($primary, $type, $name);
1395            
1396             # get the appropriate tags
1397 0         0 foreach my $attribute (keys %{ $metadata->{$file} } ) {
  0         0  
1398 0 0       0 if ($attribute =~ m/^type/i) {
    0          
    0          
1399 0         0 $type = $metadata->{$file}{$attribute};
1400             }
1401             elsif ($attribute =~ m/name/i) {
1402 0         0 $name = $metadata->{$file}{$attribute};
1403             }
1404             elsif ($attribute =~ m/^primary_tag|method$/i) {
1405 0         0 $primary = $metadata->{$file}{$attribute};
1406             }
1407             }
1408            
1409             # store the type
1410 0   0     0 push @types, $type || $primary || $name;
1411             }
1412            
1413             # sort the types in alphabetical order
1414             # and discard duplicate types - which may occur with stranded entries
1415 0         0 @types = uniq(sort {$a cmp $b} @types);
  0         0  
1416             }
1417            
1418             # some other database
1419             else {
1420 0         0 carp " no dataset lists for database type $db_ref!\n";
1421             }
1422            
1423             # finish
1424 0         0 return @types;
1425             }
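
A sketch of listing the feature types in an annotation database; the SeqFeature::Store SQLite file name is hypothetical:

    use Bio::ToolBox::db_helper qw(get_dataset_list);

    # types come back as method:source strings, sorted by source then method
    my @types = get_dataset_list('annotation.db');   # hypothetical database
    printf " %d feature types found\n", scalar @types;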
1426              
1427              
1428              
1429             ### Process and verify a dataset
1430             sub verify_or_request_feature_types {
1431            
1432             # Retrieve passed values
1433 1     1 1 2 my %args = @_; # the passed argument values as a hash
1434            
1435             # Check for single option
1436 1   50     5 $args{'single'} ||= 0;
1437            
1438             # Collect the datasets
1439 1         1 my @datasets;
1440 1   50     3 $args{'feature'} ||= undef;
1441 1 50       4 if (ref $args{'feature'} eq 'ARRAY') {
    50          
1442             # an anonymous array of datasets
1443 0         0 @datasets = @{ $args{'feature'} };
  0         0  
1444             }
1445             elsif (defined $args{'feature'}) {
1446 1         2 push @datasets, $args{'feature'};
1447             }
1448             # else it is null and @datasets remains empty, prompting user input
1449            
1450             # feature limits
1451             # this is a regex to limit the feature types presented to the user
1452             # otherwise everything in the database is presented to the user
1453 1   50     12 my $limit = $args{'limit'} ||= undef;
1454            
1455            
1456             # Open database object and collect features
1457 1   50     4 $args{'db'} ||= undef;
1458 1 50       3 my $db = $args{'db'} ? open_db_connection( $args{'db'} ) : undef;
1459            
1460             # Initialize main output arrays
1461 1         3 my @good_datasets;
1462             my @bad_datasets;
1463 1         0 my %db_features; # hash for features found in the database, use when needed
1464            
1465             # Check provided datasets
1466 1 50       2 if (@datasets) {
1467            
1468             # check for multiple comma-delimited datasets
1469 1         1 my @list_to_check;
1470 1         2 foreach my $item (@datasets) {
1471 1 50       3 if ($item =~ /,/) {
1472             # this one has a comma, therefore it has more than one dataset
1473 0         0 push @list_to_check, split(/,/, $item);
1474             }
1475             else {
1476             # a singleton
1477 1         4 push @list_to_check, $item;
1478             }
1479             }
1480            
1481             # now verify the datasets
1482 1         2 foreach my $dataset (@list_to_check) {
1483            
1484             # check for a remote file
1485 1 50       9 if ($dataset =~ /^(?: http | ftp) .+ \. (?: bam | bw | bb) $/xi) {
    50          
1486             # a remote file
1487             # assume it is good, no verification here though
1488             # it will either work or won't work
1489 0         0 push @good_datasets, $dataset;
1490             }
1491            
1492             # a local file
1493             elsif ($dataset =~ /\.(?:bam|bw|bigwig|bb|bigbed|useq)$/i) {
1494             # presume we have a local indexed data file
1495            
1496             # user may have requested two or more files to be merged
1497             # these should be combined with an ampersand
1498             # check each one
1499 1         2 my @files;
1500 1         3 foreach my $file (split /\&/, $dataset) {
1501 1 50       17 if (-e $file) {
1502             # file exists
1503 1         4 push @files, "file:$file";
1504             }
1505             else {
1506             # file doesn't exist! can't use this set of files
1507 0         0 @files = ();
1508 0         0 last;
1509             }
1510             }
1511 1 50       3 if (@files) {
1512 1         10 push @good_datasets, join("&", @files);
1513             }
1514             else {
1515 0         0 push @bad_datasets, $dataset;
1516             }
1517             }
1518            
1519             # a feature type in a database
1520             else {
1521             # must be a database feature type
1522            
1523             # check for database features hash
1524 0 0       0 unless (%db_features) {
1525 0 0       0 if ($db) {
1526             # collect the database features as needed
1527             # I want both method:source and method in the hash
1528 0         0 foreach my $type ( get_dataset_list($db) ) {
1529 0         0 my ($method, $source) = split /:/, $type;
1530 0 0       0 if ($source) {
1531 0         0 $db_features{$type} = 1;
1532 0         0 $db_features{$method} = 1;
1533             }
1534             else {
1535 0         0 $db_features{$type} = 1;
1536             }
1537             }
1538            
1539             # verify
1540 0 0       0 unless (%db_features) {
1541 0         0 carp " provided database has no feature types " .
1542             "to verify dataset(s) against!\n";
1543             }
1544             }
1545             else {
1546             # we need a database
1547 0         0 carp " unable to verify dataset without database";
1548             }
1549             }
1550            
1551             # validate the given dataset
1552             # user may have requested two or more features to be merged
1553             # these should be combined with an ampersand
1554             # check each one
1555 0         0 my $check = 0;
1556 0         0 foreach my $d (split /&/, $dataset) {
1557             # validate this dataset
1558 0 0       0 if (exists $db_features{$d}) {
1559 0         0 $check++;
1560             }
1561             else {
1562             # at least one of these is not good, fail check
1563 0         0 $check = 0;
1564 0         0 last;
1565             }
1566             }
1567            
1568             # report the verification
1569 0 0       0 if ($check) {
1570 0         0 push @good_datasets, $dataset;
1571             }
1572             else {
1573 0         0 push @bad_datasets, $dataset;
1574             }
1575             }
1576             }
1577             }
1578            
1579             # User must select datasets
1580             else {
1581             # dataset not specified
1582             # present the dataset list to the user and get an answer
1583            
1584             # get the dataset list
1585 0 0       0 if ($db) {
1586             # collect the database features as needed
1587 0         0 my $i = 1;
1588 0         0 foreach my $type ( get_dataset_list($db) ) {
1589 0 0       0 if ($limit) {
1590 0         0 my ($p, $s) = split /:/, $type; # split into primary_tag and source
1591             # only keep those types that match our limiter
1592 0 0       0 next unless $p =~ /$limit/i;
1593             }
1594 0         0 $db_features{$i} = $type;
1595 0         0 $i++;
1596             }
1597            
1598             # verify
1599 0 0       0 unless (%db_features) {
1600 0         0 carp " provided database has no feature types " .
1601             "to collect!\n";
1602 0         0 return;
1603             }
1604             }
1605             else {
1606             # we need a database
1607 0         0 carp " no database provided from which to collect features!\n";
1608 0         0 return;
1609             }
1610            
1611             # present the list
1612 0         0 print "\n These are the available data sets in the database:\n";
1613 0         0 foreach (sort {$a <=> $b} keys %db_features) {
  0         0  
1614             # print out the list of microarray data sets
1615 0         0 print " $_\t$db_features{$_}\n";
1616             }
1617            
1618             # prompt the user
1619 0   0     0 $args{'prompt'} ||= undef;
1620 0 0       0 if ($args{'prompt'}) {
1621             # provided custom prompt
1622 0         0 print $args{'prompt'};
1623             }
1624             else {
1625             # generic prompt
1626 0 0       0 if ($args{'single'}) {
1627 0         0 print " Enter one number for the data set or feature ";
1628             }
1629             else {
1630 0         0 print " Enter the number(s) or range of the data set(s) or" .
1631             " feature(s) ";
1632             }
1633             }
1634            
1635             # get answer from the user
1636 0         0 my $answer = <STDIN>;
1637 0         0 chomp $answer;
1638 0         0 my @answer_list = parse_list($answer);
1639            
1640             # take the first one if requested
1641 0 0       0 if ($args{'single'}) {
1642 0 0       0 unless (scalar @answer_list == 1) {
1643 0         0 splice(@answer_list, 1);
1644             }
1645             }
1646            
1647             # verify the answer list
1648 0         0 foreach my $answer (@answer_list) {
1649            
1650             # check for merged datasets
1651 0 0       0 if ($answer =~ /&/) {
1652             # a merged dataset
1653 0         0 my @list = split /&/, $answer;
1654 0         0 my $check = 1;
1655            
1656             # check all are good
1657 0         0 foreach (@list) {
1658 0 0       0 unless (exists $db_features{$_}) {
1659 0         0 $check = 0;
1660             }
1661             }
1662            
1663             # if all are good
1664 0 0       0 if ($check) {
1665             push @good_datasets,
1666 0         0 join( "&", map { $db_features{$_} } @list);
  0         0  
1667             }
1668             else {
1669 0         0 push @bad_datasets, $answer;
1670             }
1671             }
1672            
1673             else {
1674             # a single dataset
1675             # check if it is good
1676            
1677 0 0       0 if (exists $db_features{$answer}) {
1678 0         0 push @good_datasets, $db_features{$answer};
1679             }
1680             else {
1681 0         0 push @bad_datasets, $answer; # no feature exists for this number
1682             }
1683             }
1684             }
1685             }
1686            
1687             # Print bad results
1688 1 50       3 if (@bad_datasets) {
1689 0         0 print " The following datasets could not be verified:\n";
1690 0         0 foreach (@bad_datasets) {
1691 0         0 print " $_\n";
1692             }
1693             }
1694            
1695             # Return good results
1696 1 50       3 if ($args{'single'}) {
1697 0         0 return $good_datasets[0];
1698             }
1699             else {
1700 1         4 return @good_datasets;
1701             }
1702             }
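
A sketch of verifying datasets non-interactively; the bigWig file names are hypothetical, and two files joined with an ampersand are verified together as a set to be merged:

    use Bio::ToolBox::db_helper qw(verify_or_request_feature_types);

    my @good = verify_or_request_feature_types(
        feature => 'sample1.bw&sample2.bw',   # hypothetical local files
    );
    die "datasets failed verification\n" unless @good;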
1703              
1704              
1705             ### Check a dataset for RPM normalization support
1706             sub check_dataset_for_rpm_support {
1707            
1708             # get passed dataset and databases
1709 0     0 1 0 my $dataset = shift;
1710 0         0 my $cpu = shift;
1711            
1712             # Calculate the total number of reads
1713             # previously computed sums are cached in the global %TOTAL_READ_NUMBER
1714 0         0 my $rpm_read_sum;
1715            
1716 0 0       0 if (exists $TOTAL_READ_NUMBER{$dataset}) {
    0          
    0          
1717             # this dataset has already been summed
1718             # no need to do it again
1719 0         0 $rpm_read_sum = $TOTAL_READ_NUMBER{$dataset};
1720             }
1721            
1722             elsif ($dataset =~ /\.bam$/) {
1723             # a bam file dataset
1724            
1725 0 0       0 _load_bam_helper_module() unless $BAM_OK;
1726 0 0       0 if ($BAM_OK) {
1727             # Bio::ToolBox::db_helper::bam was loaded ok
1728             # sum the number of reads in the dataset
1729 0         0 $rpm_read_sum = sum_total_bam_alignments($dataset, 0, 0, $cpu);
1730             }
1731             else {
1732 0         0 carp " Bam support is not available! " .
1733             "Is Bio::DB::Sam or Bio::DB::HTS installed?\n";
1734 0         0 return;
1735             }
1736             }
1737            
1738             elsif ($dataset =~ /\.bb$/) {
1739             # a bigbed file dataset
1740            
1741 0 0       0 $BIGBED_OK = _load_helper_module('Bio::ToolBox::db_helper::bigbed') unless
1742             $BIGBED_OK;
1743 0 0       0 if ($BIGBED_OK) {
1744             # Bio::ToolBox::db_helper::bigbed was loaded ok
1745             # sum the number of features in the dataset
1746 0         0 $rpm_read_sum = sum_total_bigbed_features($dataset);
1747             }
1748             else {
1749 0         0 carp " BigBed support is not available! " .
1750             "Is Bio::DB::BigBed installed?\n";
1751 0         0 return;
1752             }
1753             }
1754            
1755             else {
1756             # some other non-supported dataset
1757 0         0 return;
1758             }
1759            
1760             # return the sum value if we've made it this far
1761 0         0 $TOTAL_READ_NUMBER{$dataset} = $rpm_read_sum;
1762 0         0 return $rpm_read_sum;
1763             }
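
A sketch of pre-computing a total read count for RPM normalization, assuming a hypothetical bam file and four threads:

    use Bio::ToolBox::db_helper qw(check_dataset_for_rpm_support);

    my $total = check_dataset_for_rpm_support('data/alignments.bam', 4);
    # an undefined value means the dataset type cannot support RPM
    printf " %d total alignments\n", $total if defined $total;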
1764              
1765              
1766             ### Generate a new list of features
1767             sub get_new_feature_list {
1768              
1769             # Retrieve passed values
1770 0     0 1 0 my %args = @_; # the passed argument values
1771            
1772             # check data object
1773 0   0     0 my $data = $args{data} || undef;
1774 0 0       0 unless ($data) {
1775 0         0 confess "must pass a 'data' key and Bio::ToolBox::Data object!";
1776 0         0 return;
1777             }
1778 0 0       0 unless (ref($data) eq 'Bio::ToolBox::Data') {
1779 0         0 confess 'must pass a Bio::ToolBox::Data object!';
1780 0         0 return;
1781             }
1782            
1783             # Open a db connection
1784 0   0     0 $args{'db'} ||= undef;
1785 0         0 my $db = open_db_connection($args{'db'});
1786 0 0       0 unless ($db) {
1787 0         0 carp 'no database connected!';
1788 0         0 return;
1789             }
1790              
1791             # Verify a SeqFeature::Store database
1792 0         0 my $db_ref = ref $db;
1793 0 0       0 unless ($db_ref =~ /^Bio::DB::SeqFeature::Store/) {
1794 0         0 carp "Database type $db_ref doesn't support generating feature lists!\n";
1795 0         0 return;
1796             }
1797            
1798             # Features to search for
1799 0   0     0 my $searchFeature = $args{'feature'} || $args{'features'} || undef;
1800 0 0       0 unless ($searchFeature) {
1801 0         0 carp "no search feature types passed!";
1802 0         0 return;
1803             }
1804 0         0 my @classes = split(',', $searchFeature); # it may or may not be a list
1805            
1806             # chromosomes to skip
1807 0   0     0 my $chr_exclude = $args{'chrskip'} || undef;
1808            
1809             # Add table columns
1810 0         0 my $pid_i = $data->add_column('Primary_ID');
1811 0         0 my $name_i = $data->add_column('Name');
1812 0         0 my $type_i = $data->add_column('Type');
1813            
1814             # Set up the database iterator
1815 0         0 print " Searching for $searchFeature features\n";
1816 0 0       0 my $iterator = $db->get_seq_stream(
1817             -types => scalar @classes ? \@classes : $searchFeature,
1818             );
1819 0 0       0 unless ($iterator) {
1820             # there should be some features found in the database
1821 0         0 carp "could not get feature iterator for database";
1822 0         0 return;
1823             }
1824            
1825             # Walk through the collected features
1826 0         0 my $total_count = 0; # total found features
1827 0         0 while (my $feature = $iterator->next_seq) {
1828 0         0 $total_count++;
1829            
1830             # skip genes from excluded chromosomes
1831 0 0 0     0 next if (defined $chr_exclude and $feature->seq_id =~ $chr_exclude);
1832            
1833             # Record the feature information
1834             # in the B::DB::SF::S database, the primary_ID is a number unique to the
1835             # the specific database, and is not portable between databases
1836 0         0 $data->add_row( [
1837             $feature->primary_id,
1838             $feature->display_name,
1839             $feature->type,
1840             ] );
1841             }
1842            
1843             # print result of search
1844 0         0 printf " Found %s features in the database\n", format_with_commas($total_count);
1845 0         0 printf " Kept %s features.\n", format_with_commas($data->last_row);
1846            
1847             # return the new data structure
1848 0         0 return 1;
1849             }
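
A sketch of collecting a gene list into an empty Bio::ToolBox::Data table; the database name is hypothetical. In practice, constructing Bio::ToolBox::Data with db and feature options invokes this subroutine internally:

    use Bio::ToolBox::Data;
    use Bio::ToolBox::db_helper qw(get_new_feature_list);

    my $Data = Bio::ToolBox::Data->new();   # empty data table
    get_new_feature_list(
        data    => $Data,
        db      => 'annotation.db',         # hypothetical database
        feature => 'gene',
    ) or die "feature collection failed\n";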
1850              
1851              
1852             ### Generate a new list of genomic windows
1853             sub get_new_genome_list {
1854              
1855             # Collect the passed arguments
1856 1     1 1 5 my %args = @_;
1857            
1858             # check data object
1859 1   50     3 my $data = $args{data} || undef;
1860 1 50       4 unless ($data) {
1861 0         0 confess "must pass a 'data' key and Bio::ToolBox::Data object!";
1862             }
1863 1 50       3 unless (ref($data) eq 'Bio::ToolBox::Data') {
1864 0         0 confess 'must pass a Bio::ToolBox::Data object!';
1865             }
1866            
1867             # Open a db connection
1868 1   50     3 $args{'db'} ||= undef;
1869 1         3 my $db = open_db_connection($args{'db'});
1870 1 50       3 unless ($db) {
1871 0         0 carp 'no database connected!';
1872 0         0 return;
1873             }
1874            
1875             # Determine win and step sizes
1876 1   50     3 $args{'win'} ||= 500; # hard coded default size
1877 1   33     5 $args{'step'} ||= $args{'win'};
1878            
1879            
1880             # Prepare data structures
1881 1         5 my $chr_i = $data->add_column('Chromosome');
1882 1         3 my $start_i = $data->add_column('Start');
1883 1         66 my $stop_i = $data->add_column('Stop');
1884 1         7 $data->metadata($start_i, 'win', $args{'win'});
1885 1         3 $data->metadata($start_i, 'step', $args{'step'});
1886            
1887             # Collect the chromosomes
1888 1   50     6 my $chr_exclude = $args{'skip'} || $args{'chrskip'} || undef;
1889 1         5 my @chromosomes = get_chromosome_list($db, $chr_exclude);
1890 1 50       36 unless (@chromosomes) {
1891 0         0 carp " no sequence IDs were found in the database!\n";
1892 0         0 return;
1893             }
1894            
1895             # Load exclusion intervals
1896 1         2 my %exclusion_tree;
1897 1   50     7 $args{exclude} ||= $args{blacklist} || undef;
      33        
1898 1 50 33     6 if (exists $args{exclude} and defined $args{exclude}) {
1899 0 0       0 if (ref($args{exclude}) eq 'Bio::ToolBox::Data') {
1900 0 0       0 if (_load_helper_module('Set::IntervalTree')) {
1901             # iterate through the data object of intervals
1902             # prepare an Interval Tree for each chromosome
1903             # and load the exclusion interval into it
1904             $args{exclude}->iterate( sub {
1905 0     0   0 my $row = shift;
1906 0 0       0 unless (exists $exclusion_tree{$row->seq_id} ) {
1907 0         0 $exclusion_tree{ $row->seq_id } = Set::IntervalTree->new();
1908             }
1909 0         0 $exclusion_tree{ $row->seq_id }->insert(1, $row->start - 1, $row->end);
1910 0         0 } );
1911             }
1912             else {
1913 0         0 carp " Set::IntervalTree must be installed to use exclusion intervals!";
1914             }
1915             }
1916             else {
1917 0         0 confess " Exclusion data must be a Bio::ToolBox::Data object!";
1918             }
1919             }
1920            
1921             # Collect the genomic windows
1922 1         31 print " Generating $args{win} bp windows in $args{step} bp increments\n";
1923 1         6 foreach (@chromosomes) {
1924            
1925             # name and length as sub-array in each element
1926 1         1 my ($chr, $length) = @{$_};
  1         3  
1927 1   50     5 my $Tree = $exclusion_tree{$chr} || undef;
1928            
1929             # iterate through chromosome
1930 1         4 for (my $start = 1; $start <= $length; $start += $args{step}) {
1931 122         127 my $end = $start + $args{win} - 1;
1932 122 100       143 if ($end > $length) {
1933             # fix end to the length of the chromosome
1934 1         2 $end = $length;
1935             }
1936             # skip if excluded
1937 122 50 33     152 next if ($Tree and scalar( @{ $Tree->fetch($start - 1, $end) } ) >= 1);
  0         0  
1938 122         184 $data->add_row( [ $chr, $start, $end] );
1939             }
1940             }
1941 1         27 print " Kept " . $data->{'last_row'} . " windows.\n";
1942            
1943             # Return the data structure
1944 1         9 return 1;
1945             }
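
A sketch of tiling a genome into windows, with hypothetical names; when win and step are not provided, both fall back to the 500 bp default noted above:

    use Bio::ToolBox::Data;
    use Bio::ToolBox::db_helper qw(get_new_genome_list);

    my $Data = Bio::ToolBox::Data->new();   # empty data table
    get_new_genome_list(
        data => $Data,
        db   => 'genome.fa',                # hypothetical indexed fasta
        win  => 1000,
        step => 500,
    ) or die "window generation failed\n";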
1946              
1947              
1948             sub get_db_feature {
1949 0     0 1 0 my %args = @_;
1950            
1951             # Open a db connection
1952 0   0     0 $args{db} ||= undef;
1953 0 0       0 unless ($args{db}) {
1954 0         0 croak 'no database provided for getting a feature!';
1955             }
1956 0         0 my $db = open_db_connection($args{db});
1957 0         0 my $db_ref = ref $db;
1958            
1959             # get the name of the feature
1960 0   0     0 my $name = $args{'name'} || undef;
1961            
1962             # check for values and internal nulls
1963 0 0       0 $args{'id'} = exists $args{'id'} ? $args{'id'} : undef;
1964 0   0     0 $args{'type'} ||= undef;
1965 0 0 0     0 undef $name if $name and $name eq '.';
1966 0 0 0     0 undef $args{'id'} if $args{'id'} and $args{'id'} eq '.';
1967 0 0 0     0 undef $args{'type'} if $args{'type'} and $args{'type'} eq '.';
1968            
1969             # quick method for feature retrieval
1970 0 0 0     0 if (defined $args{'id'} and $db->can('fetch')) {
1971             # we have a unique primary_id to identify the feature
1972             # usually this pulls out the feature directly
1973 0   0     0 my $feature = $db->fetch($args{'id'}) || undef;
1974            
1975             # check that the feature we pulled out is what we want
1976 0 0       0 my $check = $feature ? 1 : 0;
1977 0 0       0 if ($check) {
1978 0 0 0     0 $check = 0 if (defined $name and $feature->display_name ne $name);
1979 0 0 0     0 $check = 0 if (defined $args{'type'} and $feature->type ne $args{'type'});
1980             }
1981            
1982             # return if things match up as best we can tell
1983 0 0       0 if ($check) {
1984 0         0 return $feature;
1985             }
1986             else {
1987             # the primary_ids are out of date
1988 0 0       0 unless ($PRIMARY_ID_WARNING) {
1989 0         0 warn "CAUTION: Some primary IDs in Input file appear to be out of date\n";
1990 0         0 $PRIMARY_ID_WARNING++;
1991             }
1992             }
1993             }
1994            
1995             # otherwise use name and type
1996 0 0       0 return unless $name; # otherwise db will return all features! Not what we want!
1997             my @features = $db->features(
1998             -name => $name, # we assume this name will be unique
1999             -aliases => 0,
2000 0         0 -type => $args{'type'},
2001             );
2002            
2003             # if none are found
2004 0 0       0 unless (@features) {
2005             # try again with aliases enabled
2006             @features = $db->features(
2007             -name => $name,
2008             -aliases => 1,
2009 0         0 -type => $args{'type'},
2010             );
2011             }
2012 0 0 0     0 if (!@features and $name =~ /[;,\|]/) {
2013             # I used to append aliases to the end of the name in old versions of biotoolbox
2014             # crazy, I know, but just in case this happened, let's try breaking them apart
2015 0         0 my $name2 = (split(/\s*[;,\|]\s*/, $name))[0];
2016             # multiple names present using common delimiters ;,|
2017             # take the first name only, assume others are aliases that we don't need
2018             @features = $db->features(
2019             -name => $name2,
2020             -aliases => 1,
2021 0         0 -type => $args{'type'},
2022             );
2023             }
2024            
2025             # check the number of features returned
2026 0 0       0 if (scalar @features > 1) {
    0          
2027             # there should only be one feature found
2028             # if more are returned, there's redundant or duplicated data in the db
2029            
2030             # first check whether addition of aliases may improve accuracy
2031 0 0       0 if ($args{'name'} =~ /;/) {
2032             # we have presumed aliases in the name value
2033 0         0 my $check = $args{'name'};
2034 0         0 $check =~ s/\s*;\s*/;/g; # strip spaces just in case
2035            
2036             # check each feature and try to match name and aliases
2037 0         0 my @candidates;
2038 0         0 foreach my $f (@features) {
2039 0         0 my $f_name = join(';', $f->display_name, ($f->get_tag_values('Alias')));
2040 0 0       0 if ($check eq $f_name) {
2041             # found one
2042 0         0 push @candidates, $f;
2043             }
2044             }
2045            
2046 0 0       0 if (scalar @candidates == 1) {
    0          
2047             # we found it!
2048 0         0 return shift @candidates;
2049             }
2050             elsif (scalar @candidates > 1) {
2051             # hopefully we've improved a little bit?
2052 0         0 @features = @candidates;
2053             }
2054             }
2055            
2056             # warn the user, this should be fixed
2057             printf " Found %s %s features named '$name' in the database! Using first one.\n",
2058 0         0 scalar(@features), $args{'type'};
2059             }
2060             elsif (!@features) {
2061 0         0 printf " Found no %s features named '$name' in the database!\n", $args{'type'};
2062 0         0 return;
2063             }
2064            
2065             # done
2066 0         0 return shift @features;
2067             }
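
A sketch of fetching a single feature back out of an annotation database; the id, name, and type values here are hypothetical. The primary id is tried first, with name and type as the fallback lookup:

    use Bio::ToolBox::db_helper qw(get_db_feature);

    my $feature = get_db_feature(
        db   => 'annotation.db',   # hypothetical database
        id   => 12345,             # database-specific primary_id, if known
        name => 'YAL001C',
        type => 'gene',
    );
    printf " %s:%d-%d\n", $feature->seq_id, $feature->start, $feature->end
        if $feature;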
2068              
2069              
2070             ### Get a dataset score for a single region
2071             sub get_segment_score {
2072             # parameters passed as an array
2073             # we will be passing this array on as a reference to the appropriate
2074             # imported helper subroutine
2075             # chromosome, start, stop, strand, strandedness, method, return type, db, dataset
2076 12 50   12 1 23 confess "incorrect number of parameters passed!" unless scalar @_ == 9;
2077            
2078             # check the database
2079 12 100 66     41 $_[DB] = open_db_connection($_[DB]) if ($_[DB] and not ref($_[DB]));
2080            
2081             # check for combined datasets
2082 12 50       49 if ($_[DATA] =~ /&/) {
2083 0         0 push @_, (split '&', pop @_);
2084             }
2085            
2086             # determine method
2087             # yes, we're only checking the first dataset, but they should all
2088             # be the same type
2089 12   66     71 my $db_method = $DB_METHODS{$_[METH]}{$_[RETT]}{$_[DB]}{$_[DATA]} ||
2090             _lookup_db_method(\@_);
2091            
2092             # return type values
2093             # 0 = calculate score
2094             # 1 = score array
2095             # 2 = hash position scores
2096             # immediately return calculated score if appropriate
2097 12 100       24 if ($_[RETT] > 0) {
2098             # immediately return either indexed hash or array of scores
2099 5         7 return &{$db_method}(\@_);
  5         11  
2100             }
2101             else {
2102             # calculate a score
2103 7         9 my $scores = &{$db_method}(\@_);
  7         16  
2104             # this might be an array reference of scores that need to be combined
2105             # or it could be a single scalar score which just needs to be returned
2106 7 50       18 if (ref($scores)) {
2107 7         18 return calculate_score($_[METH], $scores);
2108             }
2109             else {
2110 0 0       0 if (not defined $scores) {
2111 0 0       0 return 0 if $_[METH] =~ /count|sum/;
2112 0         0 return '.';
2113             }
2114 0         0 return $scores;
2115             }
2116             }
2117             }
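
A sketch of scoring a region against a bigWig file using the nine positional parameters in the order documented above; the coordinates, parameter values, and file name are illustrative only:

    use Bio::ToolBox::db_helper qw(get_segment_score);

    my $score = get_segment_score(
        'chr1',           # chromosome
        5000,             # start
        6000,             # stop
        1,                # strand
        'all',            # strandedness
        'mean',           # method
        0,                # return type: 0 returns a single calculated score
        undef,            # db is not needed for file-based datasets
        'file:data.bw',   # hypothetical dataset file
    );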
2118              
2119              
2120             sub calculate_score {
2121 7     7 1 12 my ($method, $scores) = @_;
2122 7   50     14 $scores ||= []; # just in case
2123 7 50       14 if (exists $SCORE_CALCULATOR_SUB{$method}) {
2124 7         11 return &{$SCORE_CALCULATOR_SUB{$method}}($scores);
  7         17  
2125             }
2126             else {
2127 0         0 confess " unrecognized method '$method'!";
2128             }
2129             }
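
A sketch of combining raw scores directly, assuming the method name matches one of the registered calculator keys:

    use Bio::ToolBox::db_helper qw(calculate_score);

    my $mean = calculate_score('mean', [ 1, 2, 3, 4 ]);   # 2.5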
2130              
2131              
2132             sub get_chromosome_list {
2133            
2134             # options
2135 2     2 1 297 my $database = shift;
2136 2   50     8 my $chr_exclude = shift || undef;
2137            
2138             # Open a db connection
2139 2         5 my $db = open_db_connection($database);
2140 2 50       5 unless ($db) {
2141 0         0 carp 'no database connected!';
2142 0         0 return;
2143             }
2144            
2145             # Check for BigWigSet database
2146             # these need to be handled a little differently
2147 2 50       7 if (ref($db) =~ /BigWigSet/) {
2148             # BigWigSet databases are the only databases that don't
2149             # support the seq_ids method
2150             # instead we have to look at one of the bigwigs in the set
2151 0         0 my $bw_file = ($db->bigwigs)[0];
2152 0         0 $db = open_db_connection($bw_file);
2153             }
2154            
2155            
2156             # Collect chromosome lengths
2157             # we want to maintain the original order of the chromosomes so we
2158             # won't be putting it into a hash
2159             # instead an array of arrays
2160 2         4 my @chrom_lengths;
2161            
2162             # Database specific approaches to collecting chromosomes
2163             # I used to have one generic approach, but idiosyncrasies and potential
2164             # bugs make me use different approaches for better consistency
2165 2         3 my $type = ref($db);
2166            
2167             # SeqFeature::Store
2168 2 50 33     33 if ($type =~ /^Bio::DB::SeqFeature::Store/) {
    50 33        
    50          
    50          
    50          
    50          
    50          
2169 0         0 for my $chr ($db->seq_ids) {
2170            
2171             # check for excluded chromosomes
2172 0 0 0     0 next if (defined $chr_exclude and $chr =~ /$chr_exclude/i);
2173            
2174             # get chromosome size
2175 0         0 my ($seqf) = $db->get_features_by_name($chr);
2176 0 0       0 my $length = $seqf ? $seqf->length : 0;
2177            
2178             # store
2179 0         0 push @chrom_lengths, [ $chr, $length ];
2180             }
2181             }
2182            
2183             # libBigWig Bigfile, including both bigWig and bigBed
2184             elsif ($type eq 'Bio::DB::Big::File') {
2185             # this doesn't follow the typical BioPerl convention
2186             # it's a hash, so randomly sorted!!!!! Never will be in same order as file!!!!
2187 0         0 my $chroms = $db->chroms();
2188            
2189 0         0 my @list;
2190 0         0 foreach (values %$chroms) {
2191             # check for excluded chromosomes
2192 0 0 0     0 next if (defined $chr_exclude and $_->{name} =~ /$chr_exclude/i);
2193 0         0 push @list, [$_->{name}, $_->{length}];
2194             }
2195             # sort consistently by a sane system
2196 0         0 @chrom_lengths = sane_chromo_sort(@list);
2197             }
2198            
2199             # UCSC kent Bigfile
2200             elsif ($type eq 'Bio::DB::BigWig' or $type eq 'Bio::DB::BigBed') {
2201 0         0 foreach my $chr ($db->seq_ids) {
2202            
2203             # check for excluded chromosomes
2204 0 0 0     0 next if (defined $chr_exclude and $chr =~ /$chr_exclude/i);
2205            
2206             # get chromosome size
2207 0         0 my $length = $db->length($chr);
2208            
2209             # store
2210 0         0 push @chrom_lengths, [ $chr, $length ];
2211             }
2212             }
2213            
2214             # Samtools Bam
2215             elsif ($type eq 'Bio::DB::Sam' or $type eq 'Bio::DB::HTS') {
2216 0         0 for my $tid (0 .. $db->n_targets - 1) {
2217             # each chromosome is internally represented in the bam file as
2218             # a numeric target identifier
2219             # we can easily convert this to an actual sequence name
2220             # we will force the conversion to go one chromosome at a time
2221            
2222             # sequence info
2223 0         0 my $chr = $db->target_name($tid);
2224 0         0 my $length = $db->target_len($tid);
2225            
2226             # check for excluded chromosomes
2227 0 0 0     0 next if (defined $chr_exclude and $chr =~ /$chr_exclude/i);
2228            
2229             # store
2230 0         0 push @chrom_lengths, [ $chr, $length ];
2231             }
2232             }
2233            
2234             # Fasta through modern HTS fai index
2235             elsif ($type eq 'Bio::DB::HTS::Faidx') {
2236 0         0 for my $chr ($db->get_all_sequence_ids) {
2237             # check for excluded
2238 0 0 0     0 next if (defined $chr_exclude and $chr =~ /$chr_exclude/i);
2239            
2240             # get length and store it
2241 0         0 my $length = $db->length($chr);
2242 0         0 push @chrom_lengths, [$chr, $length];
2243             }
2244             }
2245            
2246             # Fasta through old BioPerl adapter
2247             elsif ($type eq 'Bio::DB::Fasta') {
2248 0         0 for my $chr ($db->get_all_ids) {
2249            
2250             # check for excluded chromosomes
2251 0 0 0     0 next if (defined $chr_exclude and $chr =~ /$chr_exclude/i);
2252            
2253             # get chromosome size
2254 0         0 my $seq = $db->get_Seq_by_id($chr);
2255 0 0       0 my $length = $seq ? $seq->length : 0;
2256            
2257             # store
2258 0         0 push @chrom_lengths, [ $chr, $length ];
2259             }
2260             }
2261            
2262             # other Bioperl adapter?????
2263             elsif ($db->can('seq_ids')) {
2264 2         8 foreach my $chr ($db->seq_ids) {
2265            
2266             # check for excluded chromosomes
2267 2 50 33     20 next if (defined $chr_exclude and $chr =~ /$chr_exclude/i);
2268            
2269             # generate a segment representing the chromosome
2270             # due to fuzzy name matching, we may get more than one back
2271 2         18 my @segments = $db->segment($chr);
2272             # need to find the right one
2273 2         247 my $segment;
2274 2         12 while (@segments) {
2275 2         4 $segment = shift @segments;
2276 2 50       8 last if $segment->seq_id eq $chr;
2277             }
2278            
2279             # check segment
2280 2 50       25 unless ($segment) {
2281 0         0 carp " No genome segment for seq_id $chr!!?? skipping\n";
2282 0         0 next;
2283             }
2284            
2285             # get the chromosome length
2286 2         10 my $length = $segment->length;
2287            
2288             # store
2289 2         49 push @chrom_lengths, [ $chr, $length ];
2290             }
2291             }
2292            
2293             else {
2294 0         0 carp " Unable to identify chromosomes in database type '$type'. Try using a different database file or adapter\n";
2295 0         0 return;
2296             }
2297            
2298             # Return
2299 2 50       6 unless (@chrom_lengths) {
2300 0         0 carp " no chromosome sequences identified in database!\n";
2301 0         0 return;
2302             }
2303 2         6 return @chrom_lengths;
2304             }
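
A sketch of listing chromosomes while skipping mitochondrial and contig names via the exclusion regex; the file name is hypothetical:

    use Bio::ToolBox::db_helper qw(get_chromosome_list);

    foreach ( get_chromosome_list('data/alignments.bam', 'chrM|contig') ) {
        my ($chr, $length) = @$_;   # each element is a [name, length] pair
        print "$chr\t$length\n";
    }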
2305              
2306              
2307             sub low_level_bam_fetch {
2308 0 0   0 1 0 confess "incorrect number of parameters passed!" unless scalar @_ == 6;
2309 0         0 my ($sam, $tid, $start, $stop, $callback, $data) = @_;
2310             # run the low level bam fetch based on which adapter is being used
2311 0 0       0 unless ($BAM_ADAPTER) {
2312 0 0       0 $BAM_ADAPTER = ref($sam) =~ /hts/i ? 'hts' : 'sam';
2313             }
2314 0 0       0 if ($BAM_ADAPTER eq 'hts') {
    0          
2315             # using Bio::DB::HTS
2316 0         0 my $index = $sam->hts_index;
2317 0 0       0 return unless $index;
2318 0         0 return $index->fetch($sam->hts_file, $tid, $start, $stop, $callback, $data);
2319             }
2320             elsif ($BAM_ADAPTER eq 'sam') {
2321             # using Bio::DB::Sam
2322 0         0 my $index = $sam->bam_index;
2323 0 0       0 return unless $index;
2324 0         0 return $index->fetch($sam->bam, $tid, $start, $stop, $callback, $data);
2325             }
2326             else {
2327 0         0 confess "no bam adapter loaded!\n";
2328             }
2329             }
2330              
2331              
2332             sub low_level_bam_coverage {
2333 0 0   0 1 0 confess "incorrect number of parameters passed!" unless scalar @_ == 4;
2334 0         0 my ($sam, $tid, $start, $stop) = @_;
2335             # run the the low level bam coverage based on which adapter is being used
2336 0 0       0 unless ($BAM_ADAPTER) {
2337 0 0       0 $BAM_ADAPTER = ref($sam) =~ /hts/i ? 'hts' : 'sam';
2338             }
2339 0 0       0 if ($BAM_ADAPTER eq 'hts') {
    0          
2340             # using Bio::DB::HTS
2341 0         0 my $index = $sam->hts_index;
2342 0 0       0 return unless $index;
2343 0         0 return $index->coverage($sam->hts_file, $tid, $start, $stop);
2344             }
2345             elsif ($BAM_ADAPTER eq 'sam') {
2346             # using Bio::DB::Sam
2347 0         0 my $index = $sam->bam_index;
2348 0 0       0 return unless $index;
2349 0         0 return $index->coverage($sam->bam, $tid, $start, $stop);
2350             }
2351             else {
2352 0         0 confess "no bam adapter loaded!\n";
2353             }
2354             }
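
A sketch of the low-level coverage call against an opened bam object; the target id and zero-based coordinates are hypothetical, and the result is expected to be an array reference of per-position depths:

    use Bio::ToolBox::db_helper qw(open_db_connection low_level_bam_coverage);

    my $sam      = open_db_connection('data/alignments.bam');  # hypothetical
    my $coverage = low_level_bam_coverage($sam, 0, 0, 10_000);
    printf " %d positions returned\n", scalar @$coverage;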
2355              
2356              
2357             sub get_genomic_sequence {
2358 0 0   0 1 0 confess "incorrect number of parameters passed!" unless scalar @_ == 4;
2359 0         0 my ($db, $chrom, $start, $stop) = @_;
2360            
2361             # check database
2362 0         0 $db = open_db_connection($db) unless ref($db);
2363 0 0       0 my $type = ref($db); # determined after any new connection is opened
2364            
2365             # return sequence based on the type of database adapter we're using
2366 0 0       0 if ($type eq 'Bio::DB::HTS::Faidx') {
    0          
    0          
2367 0         0 return $db->get_sequence_no_length("$chrom:$start-$stop");
2368             }
2369             elsif ($type eq 'Bio::DB::Sam::Fai') {
2370 0         0 return $db->fetch("$chrom:$start-$stop");
2371             }
2372             elsif ($db->can('seq')) {
2373             # BioPerl database including Bio::DB::Fasta and Bio::DB::SeqFeature::Store
2374 0         0 return $db->seq($chrom, $start, $stop);
2375             }
2376             else {
2377 0         0 confess "unsupported database $type for collecting genomic sequence!\n";
2378             }
2379             }
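
A sketch of pulling a sequence slice out of an indexed fasta; the file name and coordinates are hypothetical:

    use Bio::ToolBox::db_helper qw(open_db_connection get_genomic_sequence);

    my $fasta = open_db_connection('genome.fa');   # hypothetical fasta file
    my $seq   = get_genomic_sequence($fasta, 'chr1', 101, 200);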
2380              
2381              
2382             ### deprecated methods
2383             # just in case
2384             sub validate_included_feature {
2385 0     0 0 0 confess "validate_included_feature() is no longer a valid method. " .
2386             "Please update your script to the current API.\n";
2387             }
2388              
2389              
2390             ### Internal subroutine to look up the appropriate score collection method
2391             sub _lookup_db_method {
2392             # parameters passed as an array reference
2393 9     9   14 my $param = shift;
2394             # determine the appropriate score method
2395 9         10 my $score_method;
2396 9 50       31 if ($param->[DATA] =~ /^file|http|ftp/) {
    0          
    0          
2397             # collect the data according to file type
2398            
2399             # BigWig Data file
2400 9 50       69 if ($param->[DATA] =~ /\.(?:bw|bigwig)$/i) {
    50          
    50          
    50          
2401             # file is in bigwig format
2402             # get the dataset scores using Bio::ToolBox::db_helper::bigwig
2403             # this uses either Bio::DB::BigWig or Bio::DB::Big
2404            
2405             # check that we have bigwig support
2406 0 0       0 $BIGWIG_OK = _load_bigwig_helper_module() unless $BIGWIG_OK;
2407 0 0       0 if ($BIGWIG_OK) {
2408 0 0       0 if ($param->[RETT] == 2) {
    0          
    0          
    0          
2409 0         0 $score_method = \&collect_bigwig_position_scores;
2410             }
2411             elsif ($param->[RETT] == 1) {
2412 0         0 $score_method = \&collect_bigwig_scores;
2413             }
2414             elsif ($param->[METH] =~ /min|max|mean/) {
2415 0         0 $score_method = \&collect_bigwig_score;
2416             }
2417             elsif ($param->[METH] =~ /sum|count/) {
2418             # difference between ucsc and libBigWig libraries
2419 0 0       0 $score_method = $BIG_ADAPTER eq 'ucsc' ?
2420             \&collect_bigwig_score : \&collect_bigwig_scores;
2421             }
2422             else {
2423 0         0 $score_method = \&collect_bigwig_scores;
2424             }
2425             }
2426             else {
2427 0         0 croak " BigWig support is not enabled! Is Bio::DB::Big or Bio::DB::BigFile installed?";
2428             }
2429             }
2430            
2431             # BigBed Data file
2432             elsif ($param->[DATA] =~ /\.(?:bb|bigbed)$/i) {
2433             # data is in bigbed format
2434             # get the dataset scores using Bio::ToolBox::db_helper::bigbed
2435             # this uses either Bio::DB::BigBed or Bio::DB::Big
2436            
2437             # check that we have bigbed support
2438 0 0       0 $BIGBED_OK = _load_bigbed_helper_module() unless $BIGBED_OK;
2439 0 0       0 if ($BIGBED_OK) {
2440 0 0       0 if ($param->[RETT] == 2) {
2441 0         0 $score_method = \&collect_bigbed_position_scores;
2442             }
2443             else {
2444 0         0 $score_method = \&collect_bigbed_scores;
2445             }
2446             }
2447             else {
2448 0         0 croak " BigBed support is not enabled! Is Bio::DB::Big or Bio::DB::BigFile installed?";
2449             }
2450             }
2451            
2452             # BAM data file
2453             elsif ($param->[DATA] =~ /\.bam$/i) {
2454             # data is in bam format
2455             # get the dataset scores using Bio::ToolBox::db_helper::bam
2456             # this uses the Bio::DB::Sam or Bio::DB::HTS adaptor
2457            
2458             # check that we have Bam support
2459 0 0       0 _load_bam_helper_module() unless $BAM_OK;
2460 0 0       0 if ($BAM_OK) {
2461 0         0 $score_method = \&collect_bam_scores;
2462             }
2463             else {
2464 0         0 croak " Bam support is not enabled! " .
2465             "Is Bio::DB::HTS or Bio::DB::Sam installed?\n";
2466             }
2467             }
2468            
2469             # USeq Data file
2470             elsif ($param->[DATA] =~ /\.useq$/i) {
2471             # data is in useq format
2472             # get the dataset scores using Bio::ToolBox::db_helper::useq
2473             # this uses the Bio::DB::USeq adaptor
2474            
2475             # check that we have useq support
2476 9 50       21 $USEQ_OK = _load_helper_module('Bio::ToolBox::db_helper::useq')
2477             unless $USEQ_OK;
2478 9 50       13 if ($USEQ_OK) {
2479 9 100       13 if ($param->[RETT] == 2) {
2480 4         8 $score_method = \&collect_useq_position_scores;
2481             }
2482             else {
2483 5         11 $score_method = \&collect_useq_scores;
2484             }
2485             }
2486             else {
2487 0         0 croak " USeq support is not enabled! " .
2488             "Is Bio::DB::USeq installed?\n";
2489             }
2490             }
2491            
2492             # Unsupported Data file
2493             else {
2494 0         0 confess sprintf " Unsupported or unrecognized file type for file '%s'!\n",
2495             $param->[DATA];
2496             }
2497             }
2498            
2499            
2500             ### BigWigSet database
2501             elsif (ref($param->[DB]) =~ m/BigWigSet/) {
2502             # calling features from a BigWigSet database object
2503             # this uses either Bio::DB::BigWig or Bio::DB::Big
2504            
2505             # check that we have bigwig support
2506             # duh! we should, we probably opened the stupid database!
2507 0 0       0 $BIGWIG_OK = _load_bigwig_helper_module() unless $BIGWIG_OK;
2508 0 0       0 croak " BigWigSet support is not enabled! Is Bio::DB::Big or Bio::DB::BigFile installed?"
2509             unless $BIGWIG_OK;
2510            
2511             # the data collection depends on the method
2512 0 0       0 if ($param->[RETT] == 2) {
    0          
    0          
2513 0         0 $score_method = \&collect_bigwigset_position_scores;
2514             }
2515             elsif ($param->[RETT] == 1) {
2516 0         0 $score_method = \&collect_bigwigset_scores;
2517             }
2518             elsif ($param->[METH] =~ /min|max|mean|sum|count/) {
2519             # difference between ucsc and libBigWig libraries
2520             # the ucsc library works with summaries and we can handle multiple of these
2521             # but the big adapter doesn't
2522 0 0       0 $score_method = $BIG_ADAPTER eq 'ucsc' ?
2523             \&collect_bigwigset_score : \&collect_bigwigset_scores;
2524             }
2525             else {
2526 0         0 $score_method = \&collect_bigwigset_scores;
2527             }
2528             }
2529            
2530              
2531             ### BioPerl style database
2532             elsif (ref($param->[DB]) =~ m/^Bio::DB/) {
2533             # a BioPerl style database, including Bio::DB::SeqFeature::Store
2534             # most or all Bio::DB databases support generic get_seq_stream() methods
2535             # that return seqfeature objects, which we can use in a generic sense
2536            
2537             # check that we have support
2538             # duh! we should, we probably opened the stupid database!
2539 0 0       0 $SEQFASTA_OK = _load_helper_module('Bio::ToolBox::db_helper::seqfasta')
2540             unless $SEQFASTA_OK;
2541 0 0       0 if ($SEQFASTA_OK) {
2542             # get the dataset scores using Bio::ToolBox::db_helper::seqfasta
2543             # check that we support methods
2544 0 0       0 unless ($param->[DB]->can('get_seq_stream')) {
2545 0         0 confess sprintf "unsupported database! cannot use %s as it does not support get_seq_stream method or equivalent",
2546             ref($param->[DB]);
2547             }
2548 0         0 $score_method = \&collect_store_scores;
2549             }
2550             else {
2551 0         0 croak " SeqFeature Store support is not enabled! " .
2552             "Is BioPerl and Bio::DB::SeqFeature::Store properly installed?\n";
2553             }
2554             }
2555            
2556            
2557             ### Some other database?
2558             else {
2559 0 0       0 confess "no recognizable dataset provided!" unless $param->[DATA];
2560 0 0       0 confess "no database passed!" unless $param->[DB];
2561 0         0 confess "something went wrong! parameters: @$param";
2562             }
2563            
2564            
2565             ### Cache and return the results
2566 9         27 $DB_METHODS{$param->[METH]}{$param->[RETT]}{$param->[DB]}{$param->[DATA]} =
2567             $score_method;
2568 9         19 return $score_method;
2569             }
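
For reference, the cache above keys a resolved code reference by method, return type, database, and dataset, so repeat calls skip the dispatch chain entirely. A minimal sketch of that lookup-then-resolve pattern (the wrapper and resolver names here are hypothetical; %DB_METHODS and the METH, RETT, DB, and DATA indices are from the code above):

    # sketch only: consult the four-level cache before re-resolving
    sub _score_method_for {
        my $param = shift;
        my $code  = $DB_METHODS{ $param->[METH] }{ $param->[RETT] }
                               { $param->[DB] }{ $param->[DATA] };
        return $code if $code;                   # cached code reference
        return _resolve_score_method($param);    # hypothetical resolver that caches
    }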
2570              
2571              
2572             sub _load_helper_module {
2573 1     1   3 my $class = shift;
2574 1         2 my $success = 0;
2575 1         1 eval {
2576 1         4 load $class; # using Module::Load to dynamically load modules
2577 1         36 $class->import; # must be done particularly for my modules
2578 1         3 $success = 1;
2579             };
2580 1         2 return $success;
2581             }
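
The helper above wraps Module::Load in an eval so that a missing optional dependency degrades gracefully instead of dying at compile time. A self-contained sketch of the same pattern, using only the documented Module::Load API and the core JSON::PP module as a stand-in dependency:

    use strict;
    use warnings;
    use Module::Load;

    # attempt the optional module; load() dies if it cannot be found
    my $HAS_JSON = eval { load 'JSON::PP'; JSON::PP->import; 1 } ? 1 : 0;

    if ($HAS_JSON) {
        print JSON::PP->new->encode( { loaded => 1 } ), "\n";
    }
    else {
        warn "JSON::PP not available, continuing without it\n";
    }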
2582              
2583              
2584             sub _load_bam_helper_module {
2585 0 0   0     if ($BAM_ADAPTER) {
2586 0 0         if ($BAM_ADAPTER =~ /sam/i) {
    0          
    0          
2587 0           $BAM_OK = _load_helper_module('Bio::ToolBox::db_helper::bam');
2588 0           $BAM_ADAPTER = 'sam'; # for internal consistency
2589             }
2590             elsif ($BAM_ADAPTER =~ /hts/i) {
2591 0           $BAM_OK = _load_helper_module('Bio::ToolBox::db_helper::hts');
2592 0           $BAM_ADAPTER = 'hts'; # for internal consistency
2593             }
2594             elsif ($BAM_ADAPTER =~ /none/i) {
2595             # basically for testing purposes, don't use a module
2596 0           return 0;
2597             }
2598             else {
2599             # unrecognized
2600 0           $@ = 'unrecognized';
2601             }
2602 0 0         if (not $BAM_OK) {
2603 0           print "Requested '$BAM_ADAPTER' adapter could not be loaded: $@\n";
2604             }
2605             }
2606             else {
2607             # try hts first, then sam
2608             # be sure to set BAM_ADAPTER upon success
2609 0           $BAM_OK = _load_helper_module('Bio::ToolBox::db_helper::hts');
2610 0 0         if ($BAM_OK) {
2611 0           $BAM_ADAPTER = 'hts';
2612             }
2613             else {
2614 0           $BAM_OK = _load_helper_module('Bio::ToolBox::db_helper::bam');
2615 0 0         $BAM_ADAPTER = 'sam' if $BAM_OK;
2616             }
2617             }
2618 0           return $BAM_OK;
2619             }
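
Because the adapter choice is cached in the package variable $BAM_ADAPTER, the fallback order (hts first, then sam) only applies when no adapter has been requested. A hedged example of pinning the adapter before any BAM file is touched, assuming the variable is reachable by its fully qualified name as used above:

    # request the HTS adapter explicitly before first use
    $Bio::ToolBox::db_helper::BAM_ADAPTER = 'hts';

    # or disable BAM support entirely, e.g. for testing
    # $Bio::ToolBox::db_helper::BAM_ADAPTER = 'none';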
2620              
2621              
2622             sub _load_bigwig_helper_module {
2623 0 0   0     if ($BIG_ADAPTER) {
2624 0 0         if ($BIG_ADAPTER =~ /ucsc|kent/i) {
    0          
    0          
2625 0           $BIGWIG_OK = _load_helper_module('Bio::ToolBox::db_helper::bigwig');
2626 0           $BIG_ADAPTER = 'ucsc'; # for internal consistency
2627             }
2628             elsif ($BIG_ADAPTER =~ /big/i) {
2629 0           $BIGWIG_OK = _load_helper_module('Bio::ToolBox::db_helper::big');
2630 0           $BIGBED_OK = $BIGWIG_OK; # bigbed too!
2631 0           $BIG_ADAPTER = 'big'; # for internal consistency
2632             }
2633             elsif ($BIG_ADAPTER =~ /none/i) {
2634             # basically for testing purposes, don't use a module
2635 0           return 0;
2636             }
2637             else {
2638             # unrecognized
2639 0           $@ = 'unrecognized';
2640             }
2641 0 0         if (not $BIGWIG_OK) {
2642 0           print "Requested '$BIG_ADAPTER' adapter could not be loaded: $@\n";
2643             }
2644             }
2645             else {
2646             # we have to try each one out
2647             # try the modern big adapter first
2648 0           $BIGWIG_OK = _load_helper_module('Bio::ToolBox::db_helper::big');
2649 0 0         if ($BIGWIG_OK) {
2650             # success!
2651 0           $BIGBED_OK = $BIGWIG_OK; # bigbed too!
2652 0           $BIG_ADAPTER = 'big';
2653             }
2654             else {
2655 0           $BIGWIG_OK = _load_helper_module('Bio::ToolBox::db_helper::bigwig');
2656 0 0         $BIG_ADAPTER = 'ucsc' if $BIGWIG_OK;
2657             }
2658             }
2659 0           return $BIGWIG_OK;
2660             }
2661              
2662             sub _load_bigbed_helper_module {
2663 0 0   0     if ($BIG_ADAPTER) {
2664 0 0         if ($BIG_ADAPTER =~ /ucsc|kent/i) {
    0          
    0          
    0          
2665 0           $BIGBED_OK = _load_helper_module('Bio::ToolBox::db_helper::bigbed');
2666 0           $BIG_ADAPTER = 'ucsc'; # for internal consistency
2667             }
2668             elsif ($BIG_ADAPTER =~ /big/i) {
2669 0           $BIGBED_OK = _load_helper_module('Bio::ToolBox::db_helper::big');
2670 0           $BIGWIG_OK = $BIGBED_OK; # bigwig too!
2671 0           $BIG_ADAPTER = 'big'; # for internal consistency
2672             }
2673             elsif ($BIG_ADAPTER =~ /none/i) {
2674             # basically for testing purposes, don't use a module
2675 0           return 0;
2676             }
2677             else {
2678             # unrecognized
2679 0           $@ = 'unrecognized';
2680             }
2681 0 0         if (not $BIGBED_OK) {
2682 0           print "Requested '$BIG_ADAPTER' adapter could not be loaded: $@\n";
2683             }
2684             }
2685             else {
2686             # we have to try each one out
2687             # try the modern big adapter first
2688 0           $BIGBED_OK = _load_helper_module('Bio::ToolBox::db_helper::big');
2689 0 0         if ($BIGBED_OK) {
2690             # success!
2691 0           $BIGWIG_OK = $BIGBED_OK; # bigwig too!
2692 0           $BIG_ADAPTER = 'big';
2693             }
2694             else {
2695 0           $BIGBED_OK = _load_helper_module('Bio::ToolBox::db_helper::bigbed');
2696 0 0         $BIG_ADAPTER = 'ucsc' if $BIGBED_OK;
2697             }
2698             }
2699 0           return $BIGBED_OK;
2700             }
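
The two big-file loaders mirror each other: the libBigWig-based 'big' adapter provides bigWig and bigBed support together, while the UCSC adapter loads each format through its own helper module. A hedged example of pinning that choice, assuming $BIG_ADAPTER is settable like $BAM_ADAPTER above:

    # prefer the UCSC/Kent libraries for both bigWig and bigBed access
    $Bio::ToolBox::db_helper::BIG_ADAPTER = 'ucsc';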
2701              
2702             =head1 AUTHOR
2703              
2704             Timothy J. Parnell, PhD
2705             Howard Hughes Medical Institute
2706             Dept of Oncological Sciences
2707             Huntsman Cancer Institute
2708             University of Utah
2709             Salt Lake City, UT, 84112
2710              
2711             This package is free software; you can redistribute it and/or modify
2712             it under the terms of the Artistic License 2.0.
2713