File Coverage

Bio/Restriction/EnzymeI.pm
Criterion Covered Total %
statement 6 37 16.2
branch n/a
condition n/a
subroutine 2 33 6.0
pod 30 31 96.7
total 38 101 37.6


line stmt bran cond sub pod time code
1             #------------------------------------------------------------------
2             #
3             # BioPerl module Bio::Restriction::EnzymeI
4             #
5             # Please direct questions and support issues to <bioperl-l@bioperl.org>
6             #
7             # Cared for by Heikki Lehvaslaiho, heikki-at-bioperl-dot-org
8             #
9             # You may distribute this module under the same terms as perl itself
10             #------------------------------------------------------------------
11              
12             ## POD Documentation:
13              
14             =head1 NAME
15              
16             Bio::Restriction::EnzymeI - Interface class for restriction endonuclease
17              
18             =head1 SYNOPSIS
19              
20             # do not run this class directly
21              
22             =head1 DESCRIPTION
23              
24             This module defines methods for a single restriction endonuclease. For an
25             implementation, see L<Bio::Restriction::Enzyme>.
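
The interface pattern used by every method in this file (the body is just
shift->throw_not_implemented) can be sketched as below. The packages My::RootI,
My::EnzymeI and My::Enzyme are hypothetical stand-ins so that the example runs
without BioPerl installed; the real root class does considerably more.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the root class, so the sketch needs no BioPerl.
package My::RootI;
sub new { my ($class, %args) = @_; return bless {%args}, $class; }
sub throw_not_implemented {
    my ($self) = @_;
    my $caller = (caller(1))[3];    # fully qualified name of the calling sub
    die "Abstract method '$caller' is not implemented by " . ref($self) . "\n";
}

# Interface: every method defers to throw_not_implemented.
package My::EnzymeI;
our @ISA = ('My::RootI');
sub name { shift->throw_not_implemented; }
sub site { shift->throw_not_implemented; }

# Implementation: overrides only the methods it supports.
package My::Enzyme;
our @ISA = ('My::EnzymeI');
sub name {
    my ($self, $new) = @_;
    $self->{name} = $new if defined $new;
    return $self->{name};
}

package main;
my $re = My::Enzyme->new(name => 'EcoRI');
print $re->name, "\n";                      # EcoRI
my $ok = eval { $re->site; 1 };
print "site() died: $@" unless $ok;
```

Because the interface subs only throw, a test suite that exercises the
implementation class never executes them, which is consistent with the low
statement coverage reported above.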
26              
27             =head1 FEEDBACK
28              
29             =head2 Mailing Lists
30              
31             User feedback is an integral part of the evolution of this and other
32             Bioperl modules. Send your comments and suggestions preferably to one
33             of the Bioperl mailing lists. Your participation is much appreciated.
34              
35             bioperl-l@bioperl.org - General discussion
36             http://bioperl.org/wiki/Mailing_lists - About the mailing lists
37              
38             =head2 Support
39              
40             Please direct usage questions or support issues to the mailing list:
41              
42             I<bioperl-l@bioperl.org>
43              
44             rather than to the module maintainer directly. Many experienced and
45             responsive experts will be able to look at the problem and quickly
46             address it. Please include a thorough description of the problem
47             with code and data examples if at all possible.
48              
49             =head2 Reporting Bugs
50              
51             Report bugs to the Bioperl bug tracking system to help us keep track
52             of the bugs and their resolution. Bug reports can be submitted via the
53             web:
54              
55             https://github.com/bioperl/bioperl-live/issues
56              
57             =head1 AUTHOR
58              
59             Heikki Lehvaslaiho, heikki-at-bioperl-dot-org
60              
61             =head1 CONTRIBUTORS
62              
63             Rob Edwards, redwards@utmem.edu
64              
65             =head1 SEE ALSO
66              
67             L<Bio::Restriction::Enzyme>
68              
69             =head1 APPENDIX
70              
71             Methods beginning with a leading underscore are considered private and
72             are intended for internal use by this module. They are not considered
73             part of the public interface and are described here for documentation
74             purposes only.
75              
76             =cut
77              
78             package Bio::Restriction::EnzymeI;
79 4     4   17 use strict;
  4         6  
  4         99  
80              
81 4     4   12 use base qw(Bio::Root::RootI);
  4         4  
  4         1846  
82              
83             =head1 Essential methods
84              
85             =cut
86              
87             =head2 name
88              
89             Title : name
90             Usage : $re->name($newval)
91             Function : Gets/Sets the restriction enzyme name
92             Example : $re->name('EcoRI')
93             Returns : value of name
94             Args : newvalue (optional)
95              
96             This will also clean up the name. I have added this because some
97             people get confused about restriction enzyme names. The name should
98             be one upper case letter and two lower case letters (because it is
99             derived from the organism name, e.g. EcoRI is from E. coli). After
100             that the conventions are less consistent, but trailing numbers should
101             be Roman numerals, not Arabic digits, so we'll correct those. At least
102             this will provide some standard, I hope.
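
One way to read the clean-up rule described above is sketched here. The
helpers to_roman and normalize_name are hypothetical and are not taken from
any BioPerl class; the sketch assumes that only a trailing run of digits is
the enzyme number.

```perl
use strict;
use warnings;

# Hypothetical helpers; not the code used by any BioPerl class.
my @ROMAN = ([10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I']);

sub to_roman {
    my ($n) = @_;
    my $out = '';
    for my $pair (@ROMAN) {
        my ($value, $glyph) = @$pair;
        while ($n >= $value) {
            $out .= $glyph;
            $n   -= $value;
        }
    }
    return $out;
}

sub normalize_name {
    my ($name) = @_;
    return $name if length($name) < 4;
    # Organism part: one upper case letter followed by two lower case ones.
    my $prefix = ucfirst lc substr($name, 0, 3);
    my $rest   = substr($name, 3);
    # Only a trailing run of digits is treated as the enzyme number;
    # digits in the middle (e.g. the strain in Acc36I) are left alone.
    $rest =~ s/(\d+)$/to_roman($1)/e;
    return $prefix . $rest;
}

print normalize_name('ecoR1'),  "\n";    # EcoRI
print normalize_name('BAMH1'),  "\n";    # BamHI
print normalize_name('Acc36I'), "\n";    # Acc36I
```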
103              
104             =cut
105              
106 0     0 1   sub name { shift->throw_not_implemented; }
107              
108             =head2 site
109              
110             Title : site
111             Usage : $re->site();
112             Function : Gets/sets the recognition sequence for the enzyme.
113             Example : $seq_string = $re->site();
114             Returns : String containing recognition sequence indicating
115             : cleavage site as in 'G^AATTC'.
116             Argument : n/a
117             Throws : n/a
118              
119             Side effect: the sequence is always converted to upper case.
120              
121             The cut site can also be set by using methods L<cut> and
122             L<complementary_cut>.
123              
124             This will pad out missing sequence with N's. For example the enzyme
125             Acc36I cuts at ACCTGC(4/8). This will be returned as ACCTGCNNNN^
126              
127             Note that the common notation ACCTGC(4/8) means that the forward
128             strand cut is four nucleotides after the END of the recognition
129             site. The forward cut() in the coordinates used here in Acc36I
130             ACCTGC(4/8) is at 6+4 i.e. 10.
131              
132             ** This is the main settable method for the recognition site.
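
The N padding described above can be illustrated with a small sketch. The
function padded_site is hypothetical and only covers the case where the
forward cut lies at or after the end of the recognition sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: expand REBASE "SEQ(f/r)" notation into a site string
# with the forward-strand cut marked by '^'. Only f >= 0 is handled.
sub padded_site {
    my ($rebase) = @_;
    my ($seq, $fwd) = $rebase =~ m{^([A-Za-z]+)\((\d+)/-?\d+\)$}
        or die "expected SEQ(f/r) with f >= 0, got '$rebase'\n";
    # Pad with one N per nucleotide between the site end and the cut.
    return uc($seq) . ('N' x $fwd) . '^';
}

print padded_site('ACCTGC(4/8)'), "\n";    # ACCTGCNNNN^
```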
133              
134             =cut
135              
136 0     0 1   sub site { shift->throw_not_implemented; }
137              
138             =head2 revcom_site
139              
140             Title : revcom_site
141             Usage : $re->revcom_site();
142             Function : Gets/sets the complementary recognition sequence for the enzyme.
143             Example : $seq_string = $re->revcom_site();
144             Returns : String containing recognition sequence indicating
145             : cleavage site as in 'G^AATTC'.
146             Argument : Sequence of the site
147             Throws : n/a
148              
149             This is the same as site, except it returns the revcom site. For
150             palindromic enzymes these two are identical. For non-palindromic
151             enzymes they are not!
152              
153             See also L<site> above.
154              
155             =cut
156              
157 0     0 0   sub cuts_after { shift->throw_not_implemented; }
158              
159             =head2 cut
160              
161             Title : cut
162             Usage : $num = $re->cut(1);
163             Function : Sets/gets an integer indicating the position of cleavage
164             relative to the 5' end of the recognition sequence in the
165             forward strand.
166              
167             For type II enzymes, sets the symmetrically positioned
168             reverse strand cut site by calling complementary_cut().
169              
170             Returns : Integer, 0 if not set
171             Argument : an integer for the forward strand cut site (optional)
172              
173              
174             Note that the common notation ACCTGC(4/8) means that the forward
175             strand cut is four nucleotides after the END of the recognition
176             site. The forward cut in the coordinates used here in Acc36I
177             ACCTGC(4/8) is at 6+4 i.e. 10.
178              
179             Note that REBASE uses notation where cuts within symmetric sites are
180             marked by '^' within the forward sequence, but if the site is
181             asymmetric the parenthesis syntax is used, where numbering ALWAYS
182             starts from the last nucleotide in the forward strand. That's why AciI,
183             whose site is usually written as CCGC(-3/-1), actually cuts in
184              
185             C^C G C
186             G G C^G
187              
188             In our notation, these locations are 1 and 3.
189              
190             The cut locations in the notation used are relative to the first
191             (non-N) nucleotide of the reported forward strand of the recognition
192             sequence. The following diagram numbers the phosphodiester bonds
193             (marked by + ) which can be cut by the restriction enzymes:
194              
195                                        1   2   3   4   5   6   7   8  ...
196                  N + N + N + N + N + G + A + C + T + G + G + N + N + N
197             ...   -5  -4  -3  -2  -1
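
The conversion from the REBASE parenthesis notation to the coordinates
described above can be sketched as follows. The function cut_positions is
hypothetical. It assumes both offsets count from the last nucleotide of the
forward strand and that both results are reported in forward-strand
coordinates, and it refuses cuts upstream of the site, where the numbering in
the diagram skips zero.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "SEQ(f/r)" into (forward cut, reverse-strand cut),
# both counted from the first nucleotide of the forward strand.
sub cut_positions {
    my ($rebase) = @_;
    my ($seq, $fwd, $rev) = $rebase =~ m{^([A-Za-z]+)\((-?\d+)/(-?\d+)\)$}
        or die "expected SEQ(f/r), got '$rebase'\n";
    my $len = length $seq;
    my ($cut, $comp) = ($len + $fwd, $len + $rev);
    # Positions upstream of the site need the zero-free negative numbering
    # from the diagram; this sketch does not attempt that.
    die "cut upstream of the site is not handled\n" if $cut < 1 || $comp < 1;
    return ($cut, $comp);
}

printf "%d %d\n", cut_positions('CCGC(-3/-1)');    # 1 3
printf "%d %d\n", cut_positions('ACCTGC(4/8)');    # 10 14
```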
198              
199             =cut
200              
201 0     0 1   sub cut { shift->throw_not_implemented; }
202              
203             =head2 complementary_cut
204              
205             Title : complementary_cut
206             Usage : $num = $re->complementary_cut('1');
207             Function : Sets/Gets an integer indicating the position of cleavage
208             : on the reverse strand of the restriction site.
209             Returns : Integer
210             Argument : An integer (optional)
211             Throws : Exception if argument is non-numeric.
212              
213             This method determines the cut on the reverse strand of the sequence.
214             For most enzymes this will be within the sequence, and will be set
215             automatically based on the forward strand cut, but it need not be.
216              
217             B<Note> that the returned location indicates the location AFTER the
218             first non-N site nucleotide in the FORWARD strand.
219              
220             =cut
221              
222 0     0 1   sub complementary_cut { shift->throw_not_implemented; }
223              
224             =head1 Read only (usually) recognition site descriptive methods
225              
226             =cut
227              
228             =head2 type
229              
230             Title : type
231             Usage : $re->type();
232             Function : Get/set the restriction system type
233             Returns :
234             Argument : optional type: ('I'|'II'|'III')
235              
236             Restriction enzymes have been categorized into three types. Some
237             REBASE formats give the type, but the following rules can be used to
238             classify the known enzymes:
239              
240             =over 4
241              
242             =item 1
243              
244             Bipartite site (with 6-8 Ns in the middle and the cut site
245             is E<gt> 50 nt away) =E<gt> type I
246              
247             =item 2
248              
249             Site length E<lt> 3 =E<gt> type I
250              
251             =item 3
252              
253             5-6 asymmetric site and cuts E<gt>20 nt away =E<gt> type III
254              
255             =item 4
256              
257             All other =E<gt> type II
258              
259             =back
260              
261             There are some enzymes in REBASE which have a bipartite recognition site
262             and cut far from the site but are still classified as type I. I've no
263             idea if this is really so.
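
The four rules above could be applied in order as in the following sketch.
The function guess_type and its two arguments (the recognition string and the
distance of the cut from the site in nucleotides) are assumptions made for
illustration. "Asymmetric" is read here as "not equal to its own reverse
complement", and the thresholds follow the rules as listed.

```perl
use strict;
use warnings;

# Reverse complement for IUPAC nucleotide codes.
sub revcomp {
    my $s = scalar reverse $_[0];
    $s =~ tr/ACGTRYKMBVDHSWN/TGCAYRMKVBHDSWN/;
    return $s;
}

# Hypothetical classifier applying the four rules in order.
sub guess_type {
    my ($site, $distance) = @_;
    $site = uc $site;
    my $len = length $site;
    # Rule 1: bipartite site with 6-8 Ns in the middle, cutting far away.
    return 'I'   if $site =~ /^[^N]+N{6,8}[^N]+$/ && $distance > 50;
    # Rule 2: very short site.
    return 'I'   if $len < 3;
    # Rule 3: 5-6 nt asymmetric site cutting more than 20 nt away.
    return 'III' if $len >= 5 && $len <= 6
                 && $site ne revcomp($site) && $distance > 20;
    # Rule 4: everything else.
    return 'II';
}

print guess_type('GAATTC', 1),         "\n";    # II
print guess_type('CAGCAG', 25),        "\n";    # III
print guess_type('AACNNNNNNGTGC', 60), "\n";    # I
```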
264              
265             =cut
266              
267 0     0 1   sub type { shift->throw_not_implemented; }
268              
269             =head2 seq
270              
271             Title : seq
272             Usage : $re->seq();
273             Function : Get the Bio::PrimarySeq.pm object representing
274             : the recognition sequence
275             Returns : A Bio::PrimarySeq object representing the
276             enzyme recognition site
277             Argument : n/a
278             Throws : n/a
279              
280              
281             =cut
282              
283 0     0 1   sub seq { shift->throw_not_implemented; }
284              
285             =head2 string
286              
287             Title : string
288             Usage : $re->string();
289             Function : Get a string representing the recognition sequence.
290             Returns : String. Does NOT contain a '^' representing the cut location
291             as returned by the site() method.
292             Argument : n/a
293             Throws : n/a
294              
295             =cut
296              
297 0     0 1   sub string { shift->throw_not_implemented; }
298              
299             =head2 revcom
300              
301             Title : revcom
302             Usage : $re->revcom();
303             Function : Get a string representing the reverse complement of
304             : the recognition sequence.
305             Returns : String
306             Argument : n/a
307             Throws : n/a
308              
309             =cut
310              
311 0     0 1   sub revcom { shift->throw_not_implemented; }
312              
313             =head2 recognition_length
314              
315             Title : recognition_length
316             Usage : $re->recognition_length();
317             Function : Get the length of the RECOGNITION sequence.
318             This is the total recognition sequence,
319             including the ambiguous codes.
320             Returns : An integer
321             Argument : Nothing
322              
323             See also: L<non_ambiguous_length>
324              
325             =cut
326              
327 0     0 1   sub recognition_length { shift->throw_not_implemented; }
328              
329             =head2 non_ambiguous_length
330              
331             Title : non_ambiguous_length
332             Usage : $re->non_ambiguous_length();
333             Function : Get the nonambiguous length of the RECOGNITION sequence.
334             This is the total recognition sequence,
335             excluding the ambiguous codes.
336             Returns : An integer
337             Argument : Nothing
338              
339             See also: L<recognition_length>
340              
341             =cut
342              
343 0     0 1   sub non_ambiguous_length { shift->throw_not_implemented; }
344              
345             =head2 cutter
346              
347             Title : cutter
348             Usage : $re->cutter
349             Function : Returns the "cutter" value of the recognition site.
350              
351             This is a value relative to site length and lack of
352             ambiguity codes. Hence: 'RCATGY' is a five (5) cutter site
353             and 'CCTNAGG' a six cutter
354              
355             This measure correlates to the frequency of the enzyme
356             cuts much better than plain recognition site length.
357              
358             Example : $re->cutter
359             Returns : integer or float number
360             Args : none
361              
362             Why is this better than just stripping the ambiguous codes? Think about
363             it like this: You have a random sequence; all nucleotides are equally
364             probable. You have a four nucleotide re site. The probability of that
365             site finding a match is one out of 4^4 or 256, meaning that on average
366             a four cutter finds a match every 256 nucleotides. For a six cutter,
367             the average fragment length is 4^6 or 4096. In the case of ambiguity
368             codes the chances of finding the match are better: an R (A|G) has 1/2
369             chance of finding a match in a random sequence. Therefore, for RGCGCY
370             the probability is one out of (2*4*4*4*4*2), which is exactly the same as
371             for a five cutter! Cutter, although it can have non-integer values,
372             turns out to be a useful and simple measure.
373              
374             From bug 2178: VHDB are ambiguity symbols that match three different
375             nucleotides, so they contribute less to the effective recognition sequence
376             length than e.g. Y which matches only two nucleotides. A symbol which matches n
377             of the 4 nucleotides has an effective length of 1 - log(n) / log(4).
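
The effective-length formula above can be checked with a short sketch. The
function cutter_value and the %DEGENERACY table are hypothetical names. The
table lists how many of the four nucleotides each IUPAC symbol matches.

```perl
use strict;
use warnings;

# Number of nucleotides matched by each IUPAC symbol.
my %DEGENERACY = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, K => 2, M => 2, S => 2, W => 2,
    B => 3, D => 3, H => 3, V => 3,
    N => 4,
);

# Hypothetical helper: sum 1 - log(n)/log(4) over the symbols of a site.
sub cutter_value {
    my ($site) = @_;
    my $total = 0;
    for my $symbol (split //, uc $site) {
        my $n = $DEGENERACY{$symbol} // die "unknown symbol '$symbol'\n";
        $total += 1 - log($n) / log(4);
    }
    return $total;
}

# Prints 6.00, 5.00 and 6.00 for the three sites.
printf "%-8s %.2f\n", $_, cutter_value($_) for qw(GAATTC RGCGCY CCTNAGG);
```

A symbol such as B, which matches three nucleotides, contributes
1 - log(3)/log(4), roughly 0.21, so it counts for less than Y (0.5), as the
note from bug 2178 says.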
378              
379             =cut
380              
381 0     0 1   sub cutter { shift->throw_not_implemented; }
382              
383             =head2 is_palindromic
384              
385             Title : is_palindromic
386             Usage : $re->is_palindromic();
387             Function : Determines if the recognition sequence is palindromic
388             : for the current restriction enzyme.
389             Returns : Boolean
390             Argument : n/a
391             Throws : n/a
392              
393             A palindromic site (EcoRI):
394              
395             5-GAATTC-3
396             3-CTTAAG-5
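
A palindromic site reads the same as its own reverse complement, which
suggests the following check. The function names are hypothetical and the
sketch works on plain recognition strings rather than enzyme objects.

```perl
use strict;
use warnings;

# Reverse complement for IUPAC nucleotide codes.
sub revcomp {
    my $s = scalar reverse $_[0];
    $s =~ tr/ACGTRYKMBVDHSWN/TGCAYRMKVBHDSWN/;
    return $s;
}

# Hypothetical helper: 1 if the site equals its reverse complement, else 0.
sub is_palindromic_site {
    my $site = uc $_[0];
    return $site eq revcomp($site) ? 1 : 0;
}

print is_palindromic_site('GAATTC'), "\n";    # 1 (EcoRI)
print is_palindromic_site('ACCTGC'), "\n";    # 0 (Acc36I)
```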
397              
398             =cut
399              
400 0     0 1   sub is_palindromic { shift->throw_not_implemented; }
401              
402             =head2 overhang
403              
404             Title : overhang
405             Usage : $re->overhang();
406             Function : Determines the overhang of the restriction enzyme
407             Returns : "5'", "3'", "blunt" or undef
408             Argument : n/a
409             Throws : n/a
410              
411             A blunt site in SmaI returns C<blunt>
412              
413             5' C C C^G G G 3'
414             3' G G G^C C C 5'
415              
416             A 5' overhang in EcoRI returns C<5'>
417              
418             5' G^A A T T C 3'
419             3' C T T A A^G 5'
420              
421             A 3' overhang in KpnI returns C<3'>
422              
423             5' G G T A C^C 3'
424             3' C^C A T G G 5'
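
Given the two cut positions in forward-strand coordinates, the three cases in
the diagrams above can be told apart by a simple comparison. The function
overhang_kind is a hypothetical sketch and the positions in the example are
read off the diagrams.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the overhang from the forward-strand cut and
# the reverse-strand cut, both in forward-strand coordinates.
sub overhang_kind {
    my ($cut, $comp) = @_;
    return unless defined $cut && defined $comp;    # unknown cut: undef
    return 'blunt' if $cut == $comp;
    return $cut < $comp ? "5'" : "3'";
}

print overhang_kind(3, 3), "\n";    # blunt (SmaI,  CCC^GGG)
print overhang_kind(1, 5), "\n";    # 5'    (EcoRI, G^AATTC)
print overhang_kind(5, 1), "\n";    # 3'    (KpnI,  GGTAC^C)
```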
425              
426             =cut
427              
428 0     0 1   sub overhang { shift->throw_not_implemented; }
429              
430             =head2 overhang_seq
431              
432             Title : overhang_seq
433             Usage : $re->overhang_seq();
434             Function : Determines the overhang sequence of the restriction enzyme
435             Returns : a Bio::LocatableSeq
436             Argument : n/a
437             Throws : n/a
438              
439             I do not think it is necessary to create a seq object of these. (Heikki)
440              
441             Note: returns empty string for blunt sequences and undef for ones that
442             we don't know. Compare these:
443              
444             A blunt site in SmaI returns empty string
445              
446             5' C C C^G G G 3'
447             3' G G G^C C C 5'
448              
449             A 5' overhang in EcoRI returns C<AATT>
450              
451             5' G^A A T T C 3'
452             3' C T T A A^G 5'
453              
454             A 3' overhang in KpnI returns C<GTAC>
455              
456             5' G G T A C^C 3'
457             3' C^C A T G G 5'
458              
459             Note that you need to use method L<overhang> to decide
460             whether it is a 5' or 3' overhang!!!
461              
462             Note: The overhang stuff does not work if the site is asymmetric! Rethink!
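
For symmetric sites, the overhang string is the stretch of the recognition
sequence between the two cuts. The sketch below uses a hypothetical function
overhang_sequence that returns a plain string rather than a sequence object,
and, in line with the note above, it makes no claim about asymmetric sites.

```perl
use strict;
use warnings;

# Hypothetical helper: the single-stranded stretch between the two cuts,
# given the site string and both cut positions in forward coordinates.
sub overhang_sequence {
    my ($site, $cut, $comp) = @_;
    return unless defined $cut && defined $comp;    # unknown cut: undef
    return '' if $cut == $comp;                     # blunt end
    my ($start, $end) = $cut < $comp ? ($cut, $comp) : ($comp, $cut);
    return substr(uc($site), $start, $end - $start);
}

print overhang_sequence('GAATTC', 1, 5), "\n";    # AATT (EcoRI)
print overhang_sequence('GGTACC', 5, 1), "\n";    # GTAC (KpnI)
```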
463              
464             =cut
465              
466 0     0 1   sub overhang_seq { shift->throw_not_implemented; }
467              
468             =head2 compatible_ends
469              
470             Title : compatible_ends
471             Usage : $re->compatible_ends($re2);
472             Function : Determines if the two restriction enzyme cut sites
473             have compatible ends.
474             Returns : 0 if not, 1 if only one pair ends match, 2 if both ends.
475             Argument : a Bio::Restriction::Enzyme
476             Throws : unless the argument is a Bio::Restriction::Enzyme and
477             if there are Ns in the overhangs
478              
479             In case of type II enzymes which cut symmetrically, this
480             function can be considered to return a boolean value.
481              
482             =cut
483              
484 0     0 1   sub compatible_ends {shift->throw_not_implemented;}
485              
486             =head2 is_ambiguous
487              
488             Title : is_ambiguous
489             Usage : $re->is_ambiguous();
490             Function : Determines if the restriction enzyme contains ambiguous sequences
491             Returns : Boolean
492             Argument : n/a
493             Throws : n/a
494              
495             =cut
496              
497 0     0 1   sub is_ambiguous { shift->throw_not_implemented; }
498              
499             =head2 Additional methods from Rebase
500              
501             =cut
502              
503              
504             =head2 is_prototype
505              
506             Title : is_prototype
507             Usage : $re->is_prototype
508             Function : Get/Set method for finding out if this enzyme is a prototype
509             Example : $re->is_prototype(1)
510             Returns : Boolean
511             Args : optional boolean
512              
513             Prototype enzymes are the most commonly available and usually the first
514             enzymes discovered that have the same recognition site. Using only
515             prototype enzymes in restriction analysis avoids redundancy and
516             speeds things up.
517              
518             =cut
519              
520 0     0 1   sub is_prototype { shift->throw_not_implemented; }
521              
522             =head2 prototype_name
523              
524             Title : prototype_name
525             Usage : $re->prototype_name
526             Function : Get/Set method for the name of prototype for
527             this enzyme's recognition site
528             Example : $re->prototype_name('EcoRI')
529             Returns : prototype enzyme name string or an empty string
530             Args : optional prototype enzyme name string
531              
532             If the enzyme itself is the prototype, its own name is returned. To avoid
533             confusing a negative result with an unset value, use method
534             L<is_prototype>.
535              
536             This method is called I<prototype_name> rather than I<prototype>,
537             because it returns a string rather than an object.
538              
539             =cut
540              
541 0     0 1   sub prototype_name { shift->throw_not_implemented; }
542              
543             =head2 isoschizomers
544              
545             Title : isoschizomers
546             Usage : $re->isoschizomers(@list);
547             Function : Gets/Sets a list of known isoschizomers (enzymes that
548             recognize the same site, but don't necessarily cut at
549             the same position).
550             Arguments : A reference to an array that contains the isoschizomers
551             Returns : A reference to an array of the known isoschizomers or 0
552             if not defined.
553              
554             Added for compatibility to REBASE
555              
556             =cut
557              
558 0     0 1   sub isoschizomers { shift->throw_not_implemented; }
559              
560             =head2 purge_isoschizomers
561              
562             Title : purge_isoschizomers
563             Usage : $re->purge_isoschizomers();
564             Function : Purges the set of isoschizomers for this enzyme
565             Arguments :
566             Returns : 1
567              
568             =cut
569              
570 0     0 1   sub purge_isoschizomers { shift->throw_not_implemented; }
571              
572             =head2 methylation_sites
573              
574             Title : methylation_sites
575             Usage : $re->methylation_sites(\%sites);
576             Function : Gets/Sets known methylation sites (positions on the sequence
577             that get modified to promote or prevent cleavage).
578             Arguments : A reference to a hash that contains the methylation sites
579             Returns : A reference to a hash of the methylation sites or
580             an empty string if not defined.
581              
582             There are three types of methylation sites:
583              
584             =over 3
585              
586             =item * (6) = N6-methyladenosine
587              
588             =item * (5) = 5-methylcytosine
589              
590             =item * (4) = N4-methylcytosine
591              
592             =back
593              
594             These are stored as 6, 5, and 4 respectively. The hash has the
595             sequence position as the key and the type of methylation as the value.
596             A negative number in the sequence position indicates that the DNA is
597             methylated on the complementary strand.
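
The hash layout described above (position as key, methylation code as value,
negative positions for the complementary strand) can be decoded as in this
sketch. The function describe_methylation and the example positions are
illustrative assumptions, not data taken from REBASE.

```perl
use strict;
use warnings;

my %METHYLATION_NAME = (
    6 => 'N6-methyladenosine',
    5 => '5-methylcytosine',
    4 => 'N4-methylcytosine',
);

# Hypothetical helper: turn a { position => code } hash into readable lines.
sub describe_methylation {
    my ($sites) = @_;
    my @lines;
    for my $pos (sort { $a <=> $b } keys %$sites) {
        my $strand = $pos < 0 ? 'complementary' : 'forward';
        my $name   = $METHYLATION_NAME{ $sites->{$pos} } // 'unknown';
        push @lines, sprintf('position %d, %s strand: %s',
                             abs($pos), $strand, $name);
    }
    return @lines;
}

# Illustrative input only: code 6 at position 3 on both strands.
print "$_\n" for describe_methylation({ 3 => 6, -3 => 6 });
```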
598              
599             Note that in REBASE, the methylation positions are given
600             Added for compatibility to REBASE.
601              
602             =cut
603              
604 0     0 1   sub methylation_sites { shift->throw_not_implemented; }
605              
606             =head2 purge_methylation_sites
607              
608             Title : purge_methylation_sites
609             Usage : $re->purge_methylation_sites();
610             Function : Purges the set of methylation_sites for this enzyme
611             Arguments :
612             Returns :
613              
614             =cut
615              
616 0     0 1   sub purge_methylation_sites { shift->throw_not_implemented; }
617              
618             =head2 microbe
619              
620             Title : microbe
621             Usage : $re->microbe($microbe);
622             Function : Gets/Sets microorganism where the restriction enzyme was found
623             Arguments : A scalar containing the microbe's name
624             Returns : A scalar containing the microbe's name or 0 if not defined
625              
626             Added for compatibility to REBASE
627              
628             =cut
629              
630 0     0 1   sub microbe { shift->throw_not_implemented; }
631              
632             =head2 source
633              
634             Title : source
635             Usage : $re->source('Rob Edwards');
636             Function : Gets/Sets the person who provided the enzyme
637             Arguments : A scalar containing the person's name
638             Returns : A scalar containing the person's name or 0 if not defined
639              
640             Added for compatibility to REBASE
641              
642             =cut
643              
644 0     0 1   sub source { shift->throw_not_implemented; }
645              
646             =head2 vendors
647              
648             Title : vendors
649             Usage : $re->vendors(@list_of_companies);
650             Function : Gets/Sets a list of companies that you can get the enzyme from.
651             Also sets the commercially_available boolean
652             Arguments : A reference to an array containing the names of companies
653             that you can get the enzyme from
654             Returns : A reference to an array containing the names of companies
655             that you can get the enzyme from
656              
657             Added for compatibility to REBASE
658              
659             =cut
660              
661 0     0 1   sub vendors { shift->throw_not_implemented; }
662              
663             =head2 purge_vendors
664              
665             Title : purge_vendors
666             Usage : $re->purge_vendors();
667             Function : Purges the set of vendors for this enzyme
668             Arguments :
669             Returns :
670              
671             =cut
672              
673 0     0 1   sub purge_vendors { shift->throw_not_implemented; }
674              
675             =head2 vendor
676              
677             Title : vendor
678             Usage : $re->vendor(@list_of_companies);
679             Function : Gets/Sets a list of companies that you can get the enzyme from.
680             Also sets the commercially_available boolean
681             Arguments : A reference to an array containing the names of companies
682             that you can get the enzyme from
683             Returns : A reference to an array containing the names of companies
684             that you can get the enzyme from
685              
686             Added for compatibility to REBASE
687              
688             =cut
689              
690 0     0 1   sub vendor { shift->throw_not_implemented; }
691              
692             =head2 references
693              
694             Title : references
695             Usage : $re->references(string);
696             Function : Gets/Sets the references for this enzyme
697             Arguments : an array of string reference(s) (optional)
698             Returns : an array of references
699              
700             Use L<purge_references> to reset the list of references
701              
702             This should be a L or L object, but it's not (yet)
703              
704             =cut
705              
706 0     0 1   sub references { shift->throw_not_implemented; }
707              
708             =head2 purge_references
709              
710             Title : purge_references
711             Usage : $re->purge_references();
712             Function : Purges the set of references for this enzyme
713             Arguments :
714             Returns : 1
715              
716             =cut
717              
718 0     0 1   sub purge_references { shift->throw_not_implemented; }
719              
720             =head2 clone
721              
722             Title : clone
723             Usage : $re->clone
724             Function : Deep copy of the object
725             Arguments : -
726             Returns : new Bio::Restriction::EnzymeI object
727              
728             This works as long as the object is a clean in-memory object using
729             scalars, arrays and hashes. You have been warned.
730              
731             If you have module Storable, it is used, otherwise local code is used.
732             Todo: local code cuts circular references.
733              
734             =cut
735              
736 0     0 1   sub clone { shift->throw_not_implemented; }
737              
738             1;
739