File Coverage

Bio/Restriction/EnzymeI.pm

Criterion	Covered	Total	%
statement	6	37	16.2
branch			n/a
condition			n/a
subroutine	2	33	6.0
pod	30	31	96.7
total	38	101	37.6

line	stmt	sub	pod	time	code
1					#------------------------------------------------------------------
2					#
3					# BioPerl module Bio::Restriction::EnzymeI
4					#
5					# Please direct questions and support issues to <bioperl-l@bioperl.org>
6					#
7					# Cared for by Heikki Lehvaslaiho, heikki-at-bioperl-dot-org
8					#
9					# You may distribute this module under the same terms as perl itself
10					#------------------------------------------------------------------
11
12					## POD Documentation:
13
14					=head1 NAME
15
16					Bio::Restriction::EnzymeI - Interface class for restriction endonuclease
17
18					=head1 SYNOPSIS
19
20					# do not run this class directly
21
22					=head1 DESCRIPTION
23
24					This module defines methods for a single restriction endonuclease. For an
25					implementation, see L<Bio::Restriction::Enzyme>.
26
27					=head1 FEEDBACK
28
29					=head2 Mailing Lists
30
31					User feedback is an integral part of the evolution of this and other
32					Bioperl modules. Send your comments and suggestions preferably to one
33					of the Bioperl mailing lists. Your participation is much appreciated.
34
35					bioperl-l@bioperl.org - General discussion
36					http://bioperl.org/wiki/Mailing_lists - About the mailing lists
37
38					=head2 Support
39
40					Please direct usage questions or support issues to the mailing list:
41
42					I<bioperl-l@bioperl.org>
43
44					rather than to the module maintainer directly. Many experienced and
45					responsive experts will be able to look at the problem and quickly
46					address it. Please include a thorough description of the problem
47					with code and data examples if at all possible.
48
49					=head2 Reporting Bugs
50
51					Report bugs to the Bioperl bug tracking system to help us keep track
52					of the bugs and their resolution. Bug reports can be submitted via the
53					web:
54
55					https://github.com/bioperl/bioperl-live/issues
56
57					=head1 AUTHOR
58
59					Heikki Lehvaslaiho, heikki-at-bioperl-dot-org
60
61					=head1 CONTRIBUTORS
62
63					Rob Edwards, redwards@utmem.edu
64
65					=head1 SEE ALSO
66
67					L<Bio::Restriction::Enzyme>
68
69					=head1 APPENDIX
70
71					Methods beginning with a leading underscore are considered private and
72					are intended for internal use by this module. They are not considered
73					part of the public interface and are described here for documentation
74					purposes only.
75
76					=cut
77
78					package Bio::Restriction::EnzymeI;
79	4	4		17	use strict;
	4			6
	4			99
80
81	4	4		12	use base qw(Bio::Root::RootI);
	4			4
	4			1846
82
83					=head1 Essential methods
84
85					=cut
86
87					=head2 name
88
89					Title : name
90					Usage : $re->name($newval)
91					Function : Gets/Sets the restriction enzyme name
92					Example : $re->name('EcoRI')
93					Returns : value of name
94					Args : newvalue (optional)
95
96					This method also cleans up the name, because restriction enzyme names
97					are often written inconsistently. The name should start with one upper
98					case letter followed by two lower case letters (it is derived from the
99					organism name, e.g. EcoRI is from E. coli). The rest of the name is
100					less regular, but the trailing number should be a Roman numeral rather
101					than an Arabic number, so those are corrected. This provides at least
102					some standardization.
103
104					=cut
105
106	0	0	1		sub name { shift->throw_not_implemented; }
107
108					=head2 site
109
110					Title : site
111					Usage : $re->site();
112					Function : Gets/sets the recognition sequence for the enzyme.
113					Example : $seq_string = $re->site();
114					Returns : String containing recognition sequence indicating
115					: cleavage site as in 'G^AATTC'.
116					Argument : n/a
117					Throws : n/a
118
119					Side effect: the sequence is always converted to upper case.
120
121					The cut site can also be set by using methods L</cut> and
122					L</complementary_cut>.
123
124					This will pad out missing sequence with N's. For example the enzyme
125					Acc36I cuts at ACCTGC(4/8). This will be returned as ACCTGCNNNN^
126
127					Note that the common notation ACCTGC(4/8) means that the forward
128					strand cut is four nucleotides after the END of the recognition
129					site. The forward cut() in the coordinates used here in Acc36I
130					ACCTGC(4/8) is at 6+4 i.e. 10.
131
132					** This is the main settable method for the recognition site.
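
					A minimal sketch of the padding described above, for illustration only.
					The helper C<_pad_site_sketch> is hypothetical and not part of this
					interface. It assumes REBASE-style input with a non-negative forward
					offset, as in the Acc36I example:

					    use strict;
					    use warnings;

					    # Hypothetical helper: expand 'ACCTGC(4/8)' to 'ACCTGCNNNN^'
					    sub _pad_site_sketch {
					        my ($rebase) = @_;
					        # sequence, then (forward offset/reverse offset)
					        my ($seq, $fwd) = $rebase =~ m{^([A-Za-z]+)\((-?\d+)/-?\d+\)$}
					            or return;
					        return if $fwd < 0;    # cuts inside the site are not handled here
					        return uc($seq) . ('N' x $fwd) . '^';
					    }

					    print _pad_site_sketch('ACCTGC(4/8)'), "\n";    # ACCTGCNNNN^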
133
134					=cut
135
136	0	0	1		sub site { shift->throw_not_implemented; }
137
138					=head2 revcom_site
139
140					Title : revcom_site
141					Usage : $re->revcom_site();
142					Function : Gets/sets the complementary recognition sequence for the enzyme.
143					Example : $seq_string = $re->revcom_site();
144					Returns : String containing recognition sequence indicating
145					: cleavage site as in 'G^AATTC'.
146					Argument : Sequence of the site
147					Throws : n/a
148
149					This is the same as site, except it returns the revcom site. For
150					palindromic enzymes these two are identical. For non-palindromic
151					enzymes they are not!
152
153					See also L</site> above.
154
155					=cut
156
157	0	0	0		sub cuts_after { shift->throw_not_implemented; }
158
159					=head2 cut
160
161					Title : cut
162					Usage : $num = $re->cut(1);
163					Function : Sets/gets an integer indicating the position of cleavage
164					relative to the 5' end of the recognition sequence in the
165					forward strand.
166
167					For type II enzymes, sets the symmetrically positioned
168					reverse strand cut site by calling complementary_cut().
169
170					Returns : Integer, 0 if not set
171					Argument : an integer for the forward strand cut site (optional)
172
173
174					Note that the common notation ACCTGC(4/8) means that the forward
175					strand cut is four nucleotides after the END of the recognition
176					site. The forward cut in the coordinates used here in Acc36I
177					ACCTGC(4/8) is at 6+4 i.e. 10.
178
179					Note that REBASE uses notation where cuts within symmetric sites are
180					marked by '^' within the forward sequence but if the site is
181					asymmetric the parenthesis syntax is used where numbering ALWAYS
182					starts from last nucleotide in the forward strand. That's why AciI, whose
183					site is usually written as CCGC(-3/-1), actually cuts in
184
185					C^C G C
186					G G C^G
187
188					In our notation, these locations are 1 and 3.
189
190					The cut locations in the notation used are relative to the first
191					(non-N) nucleotide of the reported forward strand of the recognition
192					sequence. The following diagram numbers the phosphodiester bonds
193					(marked by + ) which can be cut by the restriction enzymes:
194
195					                            1   2   3   4   5   6   7   8  ...
196					      N + N + N + N + N + G + A + C + T + G + G + N + N + N
197					  ...  -5  -4  -3  -2  -1
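
					The conversion from the REBASE parenthesis notation to the coordinates
					used here can be sketched as follows. This is an illustration only: the
					helper C<_rebase_to_cuts_sketch> is hypothetical, and it assumes both
					offsets are counted from the last nucleotide of the forward strand, as
					described above.

					    use strict;
					    use warnings;

					    # Hypothetical helper: returns (forward cut, complementary cut)
					    sub _rebase_to_cuts_sketch {
					        my ($rebase) = @_;
					        my ($seq, $fwd, $rev) = $rebase =~ m{^([A-Za-z]+)\((-?\d+)/(-?\d+)\)$}
					            or return;
					        my $len = length $seq;
					        # shift each offset by the length of the recognition sequence
					        return ($len + $fwd, $len + $rev);
					    }

					    my @acc36i = _rebase_to_cuts_sketch('ACCTGC(4/8)');    # forward cut is 6+4 = 10
					    my @acii   = _rebase_to_cuts_sketch('CCGC(-3/-1)');    # cuts at 1 and 3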
198
199					=cut
200
201	0	0	1		sub cut { shift->throw_not_implemented; }
202
203					=head2 complementary_cut
204
205					Title : complementary_cut
206					Usage : $num = $re->complementary_cut('1');
207					Function : Sets/Gets an integer indicating the position of cleavage
208					: on the reverse strand of the restriction site.
209					Returns : Integer
210					Argument : An integer (optional)
211					Throws : Exception if argument is non-numeric.
212
213					This method determines the cut on the reverse strand of the sequence.
214					For most enzymes this will be within the sequence, and will be set
215					automatically based on the forward strand cut, but it need not be.
216
217					B<Note> that the returned location indicates the location AFTER the
218					first non-N site nucleotide in the FORWARD strand.
219
220					=cut
221
222	0	0	1		sub complementary_cut { shift->throw_not_implemented; }
223
224					=head1 Read only (usually) recognition site descriptive methods
225
226					=cut
227
228					=head2 type
229
230					Title : type
231					Usage : $re->type();
232					Function : Get/set the restriction system type
233					Returns :
234					Argument : optional type: ('I'|II|III)
235
236					Restriction enzymes have been categorized into three types. Some
237					REBASE formats give the type, but the following rules can be used to
238					classify the known enzymes:
239
240					=over 4
241
242					=item 1
243
244					Bipartite site (with 6-8 Ns in the middle and the cut site
245					is E<gt> 50 nt away) =E<gt> type I
246
247					=item 2
248
249					Site length E<lt> 3 =E<gt> type I
250
251					=item 3
252
253					5-6 asymmetric site and cuts E<gt>20 nt away =E<gt> type III
254
255					=item 4
256
257					All other =E<gt> type II
258
259					=back
260
261					There are some enzymes in REBASE which have bipartite recognition site
262					and cut far from the site but are still classified as type I. I've no
263					idea if this is really so.
264
265					=cut
266
267	0	0	1		sub type { shift->throw_not_implemented; }
268
269					=head2 seq
270
271					Title : seq
272					Usage : $re->seq();
273					Function : Get the Bio::PrimarySeq.pm object representing
274					: the recognition sequence
275					Returns : A Bio::PrimarySeq object representing the
276					enzyme recognition site
277					Argument : n/a
278					Throws : n/a
279
280
281					=cut
282
283	0	0	1		sub seq { shift->throw_not_implemented; }
284
285					=head2 string
286
287					Title : string
288					Usage : $re->string();
289					Function : Get a string representing the recognition sequence.
290					Returns : String. Does NOT contain a '^' representing the cut location
291					as returned by the site() method.
292					Argument : n/a
293					Throws : n/a
294
295					=cut
296
297	0	0	1		sub string { shift->throw_not_implemented; }
298
299					=head2 revcom
300
301					Title : revcom
302					Usage : $re->revcom();
303					Function : Get a string representing the reverse complement of
304					: the recognition sequence.
305					Returns : String
306					Argument : n/a
307					Throws : n/a
308
309					=cut
310
311	0	0	1		sub revcom { shift->throw_not_implemented; }
312
313					=head2 recognition_length
314
315					Title : recognition_length
316					Usage : $re->recognition_length();
317					Function : Get the length of the RECOGNITION sequence.
318					This is the total recognition sequence,
319					including the ambiguous codes.
320					Returns : An integer
321					Argument : Nothing
322
323					See also: L</non_ambiguous_length>
324
325					=cut
326
327	0	0	1		sub recognition_length { shift->throw_not_implemented; }
328
329					=head2 non_ambiguous_length
330
331					Title : non_ambiguous_length
332					Usage : $re->non_ambiguous_length();
333					Function : Get the nonambiguous length of the RECOGNITION sequence.
334					This is the total recognition sequence,
335					excluding the ambiguous codes.
336					Returns : An integer
337					Argument : Nothing
338
339					See also: L</recognition_length>
340
341					=cut
342
343	0	0	1		sub non_ambiguous_length { shift->throw_not_implemented; }
344
345					=head2 cutter
346
347					Title : cutter
348					Usage : $re->cutter
349					Function : Returns the "cutter" value of the recognition site.
350
351					This is a value relative to site length and lack of
352					ambiguity codes. Hence: 'RCATGY' is a five (5) cutter site
353					and 'CCTNAGG' a six cutter
354
355					This measure correlates to the frequency of the enzyme
356					cuts much better than plain recognition site length.
357
358					Example : $re->cutter
359					Returns : integer or float number
360					Args : none
361
362					Why is this better than just stripping the ambiguous codes? Think about
363					it like this: You have a random sequence; all nucleotides are equally
364					probable. You have a four nucleotide re site. The probability of that
365					site finding a match is one out of 4^4 or 256, meaning that on average
366					a four cutter finds a match every 256 nucleotides. For a six cutter,
367					the average fragment length is 4^6 or 4096. In the case of ambiguity
368					codes the chances of finding a match are better: an R (A|G) has 1/2
369					chance of finding a match in a random sequence. Therefore, for RGCGCY
370					the probability is one out of (2*4*4*4*4*2), which is exactly the same as
371					for a five cutter! Cutter, although it can have non-integer values,
372					turns out to be a useful and simple measure.
373
374					From bug 2178: VHDB are ambiguity symbols that match three different
375					nucleotides, so they contribute less to the effective recognition sequence
376					length than e.g. Y which matches only two nucleotides. A symbol which matches n
377					of the 4 nucleotides has an effective length of 1 - log(n) / log(4).
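
					The formula above can be sketched as follows. This is a minimal
					illustration, not the implementation: the helper C<_cutter_sketch> and
					the C<%matches> table are hypothetical names, and the table simply lists
					how many nucleotides each IUPAC symbol matches.

					    use strict;
					    use warnings;

					    # Number of nucleotides matched by each IUPAC symbol
					    my %matches = (
					        A => 1, C => 1, G => 1, T => 1,
					        R => 2, Y => 2, S => 2, W => 2, K => 2, M => 2,
					        B => 3, D => 3, H => 3, V => 3,
					        N => 4,
					    );

					    # Sum the effective length 1 - log(n)/log(4) over all symbols
					    sub _cutter_sketch {
					        my ($site) = @_;
					        my $total = 0;
					        for my $symbol (split //, uc $site) {
					            my $n = $matches{$symbol} or next;    # skip '^' and unknown symbols
					            $total += 1 - log($n) / log(4);
					        }
					        return $total;
					    }

					    printf "%.2f\n", _cutter_sketch('RCATGY');     # 5.00
					    printf "%.2f\n", _cutter_sketch('CCTNAGG');    # 6.00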
378
379					=cut
380
381	0	0	1		sub cutter { shift->throw_not_implemented; }
382
383					=head2 is_palindromic
384
385					Title : is_palindromic
386					Usage : $re->is_palindromic();
387					Function : Determines if the recognition sequence is palindromic
388					: for the current restriction enzyme.
389					Returns : Boolean
390					Argument : n/a
391					Throws : n/a
392
393					A palindromic site (EcoRI):
394
395					5-GAATTC-3
396					3-CTTAAG-5
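
					One way to test for a palindromic site is to compare the recognition
					sequence with its reverse complement. The sketch below is illustrative
					only: C<_is_palindromic_sketch> is a hypothetical helper, and it assumes
					the site is given as a string of IUPAC symbols, optionally containing
					'^' cut markers.

					    use strict;
					    use warnings;

					    sub _is_palindromic_sketch {
					        my ($site) = @_;
					        (my $seq = uc $site) =~ s/[^A-Z]//g;    # drop '^' cut markers
					        # reverse the string, then complement it (S, W and N are unchanged)
					        (my $revcom = reverse $seq) =~ tr/ACGTRYKMBDHV/TGCAYRMKVHDB/;
					        return $seq eq $revcom ? 1 : 0;
					    }

					    print _is_palindromic_sketch('G^AATTC'), "\n";    # 1 (EcoRI)
					    print _is_palindromic_sketch('ACCTGC'), "\n";     # 0 (Acc36I)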
397
398					=cut
399
400	0	0	1		sub is_palindromic { shift->throw_not_implemented; }
401
402					=head2 overhang
403
404					Title : overhang
405					Usage : $re->overhang();
406					Function : Determines the overhang of the restriction enzyme
407					Returns : "5'", "3'", "blunt" or undef
408					Argument : n/a
409					Throws : n/a
410
411					A blunt site in SmaI returns C<blunt>
412
413					5' C C C^G G G 3'
414					3' G G G^C C C 5'
415
416					A 5' overhang in EcoRI returns C<5'>
417
418					5' G^A A T T C 3'
419					3' C T T A A^G 5'
420
421					A 3' overhang in KpnI returns C<3'>
422
423					5' G G T A C^C 3'
424					3' C^C A T G G 5'
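
					The three cases above can be told apart by comparing the two cut
					positions. This is a sketch under the assumption that both positions
					are given in forward strand coordinates, as cut() and
					complementary_cut() describe. The helper C<_overhang_sketch> is
					hypothetical.

					    use strict;
					    use warnings;

					    sub _overhang_sketch {
					        my ($cut, $complementary_cut) = @_;
					        return unless defined $cut && defined $complementary_cut;
					        return 'blunt' if $cut == $complementary_cut;
					        # forward strand cut first leaves a 5' overhang
					        return $cut < $complementary_cut ? "5'" : "3'";
					    }

					    print _overhang_sketch(3, 3), "\n";    # SmaI  CCC^GGG: blunt
					    print _overhang_sketch(1, 5), "\n";    # EcoRI G^AATTC: 5'
					    print _overhang_sketch(5, 1), "\n";    # KpnI  GGTAC^C: 3'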
425
426					=cut
427
428	0	0	1		sub overhang { shift->throw_not_implemented; }
429
430					=head2 overhang_seq
431
432					Title : overhang_seq
433					Usage : $re->overhang_seq();
434					Function : Determines the overhang sequence of the restriction enzyme
435					Returns : a Bio::LocatableSeq
436					Argument : n/a
437					Throws : n/a
438
439					I do not think it is necessary to create a seq object of these. (Heikki)
440
441					Note: returns empty string for blunt sequences and undef for ones that
442					we don't know. Compare these:
443
444					A blunt site in SmaI returns empty string
445
446					5' C C C^G G G 3'
447					3' G G G^C C C 5'
448
449					A 5' overhang in EcoRI returns C<AATT>
450
451					5' G^A A T T C 3'
452					3' C T T A A^G 5'
453
454					A 3' overhang in KpnI returns C<GTAC>
455
456					5' G G T A C^C 3'
457					3' C^C A T G G 5'
458
459					Note that you need to use method L</overhang> to decide
460					whether it is a 5' or 3' overhang!!!
461
462					Note: The overhang stuff does not work if the site is asymmetric! Rethink!
463
464					=cut
465
466	0	0	1		sub overhang_seq { shift->throw_not_implemented; }
467
468					=head2 compatible_ends
469
470					Title : compatible_ends
471					Usage : $re->compatible_ends($re2);
472					Function : Determines if the two restriction enzyme cut sites
473					have compatible ends.
474					Returns : 0 if not, 1 if only one pair ends match, 2 if both ends.
475					Argument : a Bio::Restriction::Enzyme
476					Throws : unless the argument is a Bio::Restriction::Enzyme and
477					if there are Ns in the overhangs
478
479					In case of type II enzymes which cut symmetrically, this
480					function can be considered to return a boolean value.
481
482					=cut
483
484	0	0	1		sub compatible_ends {shift->throw_not_implemented;}
485
486					=head2 is_ambiguous
487
488					Title : is_ambiguous
489					Usage : $re->is_ambiguous();
490					Function : Determines if the restriction enzyme contains ambiguous sequences
491					Returns : Boolean
492					Argument : n/a
493					Throws : n/a
494
495					=cut
496
497	0	0	1		sub is_ambiguous { shift->throw_not_implemented; }
498
499					=head2 Additional methods from Rebase
500
501					=cut
502
503
504					=head2 is_prototype
505
506					Title : is_prototype
507					Usage : $re->is_prototype
508					Function : Get/Set method for finding out if this enzyme is a prototype
509					Example : $re->is_prototype(1)
510					Returns : Boolean
511					Args : none
512
513					Prototype enzymes are the most commonly available and usually first
514					enzymes discovered that have the same recognition site. Using only
515					prototype enzymes in restriction analysis avoids redundancy and
516					speeds things up.
517
518					=cut
519
520	0	0	1		sub is_prototype { shift->throw_not_implemented; }
521
522					=head2 prototype_name
523
524					Title : prototype_name
525					Usage : $re->prototype_name
526					Function : Get/Set method for the name of prototype for
527					this enzyme's recognition site
528					Example : $re->prototype_name(1)
529					Returns : prototype enzyme name string or an empty string
530					Args : optional prototype enzyme name string
531
532					If the enzyme itself is the prototype, its own name is returned. Not to
533					confuse the negative result with an unset value, use method
534					L</is_prototype>.
535
536					This method is called I<prototype_name> rather than I<prototype>,
537					because it returns a string rather than an object.
538
539					=cut
540
541	0	0	1		sub prototype_name { shift->throw_not_implemented; }
542
543					=head2 isoschizomers
544
545					Title : isoschizomers
546					Usage : $re->isoschizomers(@list);
547					Function : Gets/Sets a list of known isoschizomers (enzymes that
548					recognize the same site, but don't necessarily cut at
549					the same position).
550					Arguments : A reference to an array that contains the isoschizomers
551					Returns : A reference to an array of the known isoschizomers or 0
552					if not defined.
553
554					Added for compatibility to REBASE
555
556					=cut
557
558	0	0	1		sub isoschizomers { shift->throw_not_implemented; }
559
560					=head2 purge_isoschizomers
561
562					Title : purge_isoschizomers
563					Usage : $re->purge_isoschizomers();
564					Function : Purges the set of isoschizomers for this enzyme
565					Arguments :
566					Returns : 1
567
568					=cut
569
570	0	0	1		sub purge_isoschizomers { shift->throw_not_implemented; }
571
572					=head2 methylation_sites
573
574					Title : methylation_sites
575					Usage : $re->methylation_sites(\%sites);
576					Function : Gets/Sets known methylation sites (positions on the sequence
577					that get modified to promote or prevent cleavage).
578					Arguments : A reference to a hash that contains the methylation sites
579					Returns : A reference to a hash of the methylation sites or
580					an empty string if not defined.
581
582					There are three types of methylation sites:
583
584					=over 3
585
586					=item * (6) = N6-methyladenosine
587
588					=item * (5) = 5-methylcytosine
589
590					=item * (4) = N4-methylcytosine
591
592					=back
593
594					These are stored as 6, 5, and 4 respectively. The hash has the
595					sequence position as the key and the type of methylation as the value.
596					A negative number in the sequence position indicates that the DNA is
597					methylated on the complementary strand.
598
599					Note that in REBASE, the methylation positions are given
600					Added for compatibility to REBASE.
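
					A small sketch of the hash layout described above. The positions in
					C<%sites> are made up for illustration and do not describe a real
					enzyme. C<%methylation_name> is a hypothetical lookup table built from
					the list of types given earlier.

					    use strict;
					    use warnings;

					    my %methylation_name = (
					        6 => 'N6-methyladenosine',
					        5 => '5-methylcytosine',
					        4 => 'N4-methylcytosine',
					    );

					    # Hypothetical data: sequence position => methylation type
					    my %sites = ( 3 => 6, -2 => 5 );

					    for my $pos (sort { $a <=> $b } keys %sites) {
					        # a negative position means the complementary strand
					        my $strand = $pos < 0 ? 'complementary' : 'forward';
					        printf "%d (%s strand): %s\n",
					            $pos, $strand, $methylation_name{ $sites{$pos} };
					    }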
601
602					=cut
603
604	0	0	1		sub methylation_sites { shift->throw_not_implemented; }
605
606					=head2 purge_methylation_sites
607
608					Title : purge_methylation_sites
609					Usage : $re->purge_methylation_sites();
610					Function : Purges the set of methylation_sites for this enzyme
611					Arguments :
612					Returns :
613
614					=cut
615
616	0	0	1		sub purge_methylation_sites { shift->throw_not_implemented; }
617
618					=head2 microbe
619
620					Title : microbe
621					Usage : $re->microbe($microbe);
622					Function : Gets/Sets microorganism where the restriction enzyme was found
623					Arguments : A scalar containing the microbe's name
624					Returns : A scalar containing the microbe's name or 0 if not defined
625
626					Added for compatibility to REBASE
627
628					=cut
629
630	0	0	1		sub microbe { shift->throw_not_implemented; }
631
632					=head2 source
633
634					Title : source
635					Usage : $re->source('Rob Edwards');
636					Function : Gets/Sets the person who provided the enzyme
637					Arguments : A scalar containing the person's name
638					Returns : A scalar containing the person's name or 0 if not defined
639
640					Added for compatibility to REBASE
641
642					=cut
643
644	0	0	1		sub source { shift->throw_not_implemented; }
645
646					=head2 vendors
647
648					Title : vendors
649					Usage : $re->vendors(@list_of_companies);
650					Function : Gets/Sets a list of companies that you can get the enzyme from.
651					Also sets the commercially_available boolean
652					Arguments : A reference to an array containing the names of companies
653					that you can get the enzyme from
654					Returns : A reference to an array containing the names of companies
655					that you can get the enzyme from
656
657					Added for compatibility to REBASE
658
659					=cut
660
661	0	0	1		sub vendors { shift->throw_not_implemented; }
662
663					=head2 purge_vendors
664
665					Title : purge_vendors
666					Usage : $re->purge_vendors();
667					Function : Purges the set of vendors for this enzyme
668					Arguments :
669					Returns :
670
671					=cut
672
673	0	0	1		sub purge_vendors { shift->throw_not_implemented; }
674
675					=head2 vendor
676
677					Title : vendor
678					Usage : $re->vendor(@list_of_companies);
679					Function : Gets/Sets a list of companies that you can get the enzyme from.
680					Also sets the commercially_available boolean
681					Arguments : A reference to an array containing the names of companies
682					that you can get the enzyme from
683					Returns : A reference to an array containing the names of companies
684					that you can get the enzyme from
685
686					Added for compatibility to REBASE
687
688					=cut
689
690	0	0	1		sub vendor { shift->throw_not_implemented; }
691
692					=head2 references
693
694					Title : references
695					Usage : $re->references(string);
696					Function : Gets/Sets the references for this enzyme
697					Arguments : an array of string reference(s) (optional)
698					Returns : an array of references
699
700					Use L</purge_references> to reset the list of references
701
702					This should be a L or L object, but it's not (yet)
703
704					=cut
705
706	0	0	1		sub references { shift->throw_not_implemented; }
707
708					=head2 purge_references
709
710					Title : purge_references
711					Usage : $re->purge_references();
712					Function : Purges the set of references for this enzyme
713					Arguments :
714					Returns : 1
715
716					=cut
717
718	0	0	1		sub purge_references { shift->throw_not_implemented; }
719
720					=head2 clone
721
722					Title : clone
723					Usage : $re->clone
724					Function : Deep copy of the object
725					Arguments : -
726					Returns : new Bio::Restriction::EnzymeI object
727
728					This works as long as the object is a clean in-memory object using
729					scalars, arrays and hashes. You have been warned.
730
731					If you have module Storable, it is used, otherwise local code is used.
732					Todo: local code cuts circular references.
733
734					=cut
735
736	0	0	1		sub clone { shift->throw_not_implemented; }
737
738					1;
739