File Coverage

blib/lib/Class/DBI/ConceptSearch.pm
Criterion Covered Total %
statement 7 9 77.7
branch n/a
condition n/a
subroutine 3 3 100.0
pod n/a
total 10 12 83.3


line stmt bran cond sub pod time code
1             # $Id $
2             #
3             # Perl module for Class::DBI::ConceptSearch
4             #
5             # Cared for by Allen Day
6             #
7             # Copyright Allen Day
8             #
9             # You may distribute this module under the same terms as perl itself
10              
11             # POD documentation - main docs before the code
12              
13             =head1 NAME
14              
15             Class::DBI::ConceptSearch - Retrieve Class::DBI aggregates from high-level conceptual searches
16              
17             =head1 SYNOPSIS
18              
19             my $cs = Class::DBI::ConceptSearch->new(xml => $config); #see CONFIGURATION
20             $cs->use_wildcards(1);
21             $cs->use_implicit_wildcards(1);
22             $cs->use_search_ilike(1);
23             $cs->use_search_lower(1);
24              
25             my(@tracks) = $cs->search( albums => 'Britney' );
26              
27             =head1 DESCRIPTION
28              
29             Given the example Class::DBI classes (Music::CD, Music::Artist,
30             Music::Track), lets add another one, Music::Dbxref, which contains
31             external database accessions outside our control. Music::Dbxref includes
32             things like UPC IDs, ASIN and ISBN numbers, vendor and manufacturer part
33             numbers, person IDs (for artists), etc.
34              
35             Now, imagine a website with a basic search function that gives the users
36             the option of searching in "Albums", "Artists", "Tracks", and (my favorite)
37             "Anything".
38              
39             (1) In a simple implementation, a user search for "Britney Spears" in
40             "Artists" is going to do something like:
41              
42             Music::Artist->search( name => 'Britney Spears');
43              
44             (2) But suppose the user had accidentally searched in "Albums". The executed
45             search would be:
46              
47             Music::CD->search( title => 'Britney Spears');
48              
49             which doesn't produce any hits, and wouldn't even using search_like().
50             Doh!
51              
52             (3) Likewise, if the user were to search in *any* category for Britney's
53             CD "In the Zone" by its ASIN B0000DD7LB, no hits would be found.
54              
55             In a slightly more complex implementation, searches in each category might
56             try to match fields in multiple different tables. Query (2) might try to
57             match "Britney Spears" in both Artist.name and CD.title, but this would be
58             hardcoded into a class that performs the search. If the search should be
59             returning only CDs, we would also have to hardcode how to transform any
60             matching Music::Artist instance to Music::CD instance(s).
61              
62             This is where Class::DBI::ConceptSearch comes in. It contains a generic
63             search function that, given a configuration file, allows arbitrary
64             mappings of search categories to database table fields. You specify what
65             the available categories are, and where to look for data when a category
66             is searched.
67              
68             You also specify any transforms that need to be performed on the resulting
69             matches. This is where the Artist->CD mapping in query (2) is set up.
70              
71             You're also able to search in sections of the database which are private
72             internals, and return public data. For instance, in query (3), we might
73             have searched in "Artist" for the ASID. Behind the scenes,
74             Class::DBI::ConceptSearch finds the ID and follows up with a:
75              
76             Dbxref -> CD -> Artist
77              
78             transform and returns the Music::Artist objects.
79              
80             As we can imagine, there may be multiple possible paths within the
81             database between Dbxref and Artist. It is also possible to specify these,
82             see CONFIGURATION for details on how to define multiple sources
83              
84             NOTE: This example is contrived, and the usefulness of
85              
86             Concept -> Table.Field(s)
87              
88             mapping may not be readily apparent. Class::DBI::ConceptSearch really
89             shines when you have a more complex data model.
90              
91             =head2 CONFIGURATION aka CONCEPT MAP FORMAT
92              
93             =head3 An example
94              
95            
96            
97              
98            
101            
102            
103            
104              
105            
112            
113            
114            
115            
116            
117            
118            
119            
120            
121            
122              
123            
130            
131            
132            
133            
134            
135            
136            
137            
138            
139            
140              
141            
142              
143             =head3 Allowed elements and attributes
144              
145             conceptsearch # root container for searchable concepts
146             attributes:
147             name (optional)
148             page_size (optional) # number of search results per page if the DBI object uses Class::DBI::Pager
149             subelements:
150             concept (0..*)
151              
152             concept # a searchable concept
153             attributes:
154             name (required) # name of the concept
155             label (optional) # label of the concept, used for display UI, for
156             # instance
157             target (optional) # class of object returned by source
158             subelements:
159             source (0..*)
160              
161             source # class in which to search
162             attributes:
163             class (required) # name of class
164             field (required) # attribute of class to match search pattern
165             subelements:
166             transform (0..*)
167              
168             transform # rule to transform one class to another ; an edge
169             # between nodes
170             # a sourceclass.sourcefield = targetclass.targetfield
171             # join is performed
172             attributes:
173             sourceclass (required) # source class (defaults to parent source.class for
174             # first element
175             sourcefield (required) # source field which equals target field
176             targetclass (required) # target class returned
177             targetfield (required) # target field which equals source field
178             subelements:
179             none
180              
181             =head1 FEEDBACK
182              
183             =head2 Mailing Lists
184              
185             Email the author, or cdbi-talk@groups.kasei.com
186              
187             =head2 Reporting Bugs
188              
189             Email the author.
190              
191             =head1 AUTHOR
192              
193             Allen Day Eallenday@ucla.eduE
194              
195             =head1 SEE ALSO
196              
197             Concept Mapping
198             http://www.google.com/search?q=concept+mapping
199              
200             =head1 APPENDIX
201              
202             The rest of the documentation details each of the object methods.
203             Internal methods are usually preceded with a _
204              
205             =cut
206              
207             package Class::DBI::ConceptSearch;
208 1     1   12157 use strict;
  1         3  
  1         46  
209              
210 1     1   6 no strict 'refs';
  1         2  
  1         24  
211              
212 1     1   1540 use XML::XPath;
  0            
  0            
213              
214             our $VERSION = '0.04';
215              
216             use constant DEBUG => 0;
217              
218             =head2 new
219              
220             Title : new
221             Usage : my $obj = new Class::DBI::ConceptSearch(xml => $xml);
222             Function: Builds a new Class::DBI::ConceptSearch object
223             Returns : an instance of Class::DBI::ConceptSearch
224             Args : xml (required): an xml string describing the behavior of
225             this instance. See CONFIGURATION
226              
227              
228             =cut
229              
230             sub new {
231             my($class,%arg) = @_;
232              
233             my $self = bless {}, $class;
234             $self->_init(%arg);
235              
236             die(__PACKAGE__.' requires an "xml" argument.') unless $self->xml;
237              
238             return $self;
239             }
240              
241             =head2 _init
242              
243             Title : _init
244             Usage : $obj->_init(%arg);
245             Function: internal method. initializes a new Class::DBI::ConceptSearch object
246             Returns : true on success
247             Args : args passed to new()
248              
249              
250             =cut
251              
252             sub _init {
253             my($self,%arg) = @_;
254              
255             foreach my $arg (keys %arg){
256             $self->$arg($arg{$arg}) if $self->can($arg);
257             }
258              
259             *Class::DBI::_do_search = sub {
260             my ($proto, $search_type, @args) = @_;
261             my $class = ref $proto || $proto;
262              
263             @args = %{ $args[0] } if ref $args[0] eq "HASH";
264             my (@cols, @vals);
265             my $search_opts = @args % 2 ? pop @args : {};
266             while (my ($col, $val) = splice @args, 0, 2) {
267             #this regex allows the field being searched to be transformed,
268             #which can be useful for certain indexes, eg, in postgres:
269             # SELECT * FROM book WHERE lower(title) LIKE 'symbolic logic'
270             #can use a functional index defined as:
271             # CREATE INDEX ON book(lower(title))
272             #which performs much better than the ILIKE version of the same query:
273             # SELECT * FROM book WHERE title ILIKE 'symbolic logic';
274              
275             my($x,$y,$z) = $col =~ /^(.+\()(.+)(\))$/;
276             $col = $y if $y;
277              
278             my $column = $class->find_column($col)
279             || (List::Util::first { $_->accessor eq $col } $class->columns)
280             || $class->_croak("$col is not a column of $class");
281             push @cols, $y ? "$x$col$z" : $col;
282             push @vals, $class->_deflated_column($column, $val);
283             }
284              
285             my $frag = join " AND ",
286             map defined($vals[$_]) ? "$cols[$_] $search_type ?" : "$cols[$_] IS NULL",
287             0 .. $#cols;
288             $frag .= " ORDER BY $search_opts->{order_by}"
289             if $search_opts->{order_by};
290             return $class->sth_to_objects($class->sql_Retrieve($frag),
291             [ grep defined, @vals ]);
292             };
293              
294             return 1;
295             }
296              
297             =head2 search
298              
299             Title : search
300             Usage : $cs->search(concept => 'gene', pattern => 'GH1');
301             Function:
302             Returns : a (possibly heterogenous) list of objects inheriting from
303             Class::DBI.
304             Args : concept (required): conceptual domain to be searched
305             pattern (required): pattern to match in each source
306             table.field of concept search, as configured. See CONFIGURATION
307              
308              
309             =cut
310              
311             sub search {
312             #FIXME: the pod doc for this sub says args should come in as a hash but here they are used as an array.
313             my($self,$category,$pattern,$page_num) = @_;
314              
315             $page_num = 1 unless defined($page_num);
316              
317             return () unless defined($category) and defined($pattern);
318              
319             my $search_strategy;
320              
321             if(($pattern =~ /\*/s and $self->use_wildcards) or $self->use_implicit_wildcards){
322             $pattern =~ s/\*/%/gs;
323              
324             $pattern = '%'.$pattern.'%' if $self->use_implicit_wildcards;
325             }
326              
327             if($self->use_search_ilike){
328             $search_strategy = 'search_ilike';
329             } elsif($self->use_search_lower){
330             $search_strategy = 'search_lower';
331             } elsif($pattern =~ /%/) {
332             $search_strategy = 'search_like';
333             } else {
334             $search_strategy = 'search';
335             }
336              
337             my $config = XML::XPath->new( xml => $self->xml ) or die "couldn't instantiate XML::XPath: $!";
338              
339             my @concepts;
340             my @hits;
341             my @concept_hits =();
342             my $page_size = 20;
343              
344             #find the page_size for Class::DBI objects that support paging
345             foreach my $conceptsearch ($config->find('/conceptsearch')->get_nodelist){
346             if(defined($conceptsearch->getAttribute('page_size'))) { $page_size = $conceptsearch->getAttribute('page_size'); }
347             }
348              
349             #a driver to test the search
350             warn "iterate over concepts using $search_strategy" if DEBUG;
351             foreach my $concept ($config->find('/conceptsearch/concept')->get_nodelist){
352             warn "concept: $category" if DEBUG;
353             next unless $category eq $concept->getAttribute('name');
354             warn " searching..." if DEBUG;
355              
356             foreach my $source ($concept->find('source')->get_nodelist){
357             my $sourceclass = $source->getAttribute('class');
358             my $sourcefield = $source->getAttribute('field');
359              
360             warn "searching: $sourceclass.$sourcefield for '$pattern' with $search_strategy" if DEBUG;
361              
362             my @source_matches;
363             # check if the targetclass is able to use the Class::DBI::Pager API
364             if ($sourceclass->can("pager")) {
365             my $pager = $sourceclass->pager($page_size,$page_num);
366             $self->pager($pager);
367             (@source_matches) = $pager->$search_strategy($sourcefield => $pattern);
368             } else {
369             (@source_matches) = $sourceclass->$search_strategy($sourcefield => $pattern);
370             }
371              
372             #my(@source_matches) = $sourceclass->$search_strategy( $sourcefield => $pattern );
373              
374             if(@source_matches){
375             warn "xforms start" if DEBUG;
376              
377             foreach my $transform ($source->find('transform')->get_nodelist){
378             warn "xform" if DEBUG;
379              
380             my $t_sourceclass = $transform->getAttribute('sourceclass'); #unused;
381             my $t_sourcefield = $transform->getAttribute('sourcefield');
382             my $t_targetclass = $transform->getAttribute('targetclass');
383             my $t_targetfield = $transform->getAttribute('targetfield');
384              
385             my @t = ();
386              
387             foreach my $source_match (@source_matches){
388             warn Data::Dumper::Dumper($source_match) if DEBUG;
389             warn "$t_targetclass->search( $t_targetfield => ".$source_match->$t_sourcefield." );" if DEBUG;
390              
391             my $v = ref($source_match->$t_sourcefield)
392             ? $source_match->$t_sourcefield->id
393             : scalar($source_match->$t_sourcefield);
394              
395             warn $v if DEBUG;
396              
397             # this call is fragile, handle it with care
398             #
399             # it would add power to allow search_like, search_ilike, or fuzzy searches (eg soundex) here
400             # but requires extension of the xml format and *a lot* more code
401             my @u = $t_targetclass->search( $t_targetfield => $v );
402             push @t, @u;
403             }
404             @source_matches = @t;
405             }
406              
407             push @concept_hits, @source_matches;
408             }
409             warn "xforms end" if DEBUG;
410             }
411              
412             my %unique_hits = ();
413             $unique_hits{ref($_).'_'.$_->id} = $_ foreach @concept_hits;
414             push @hits, values %unique_hits;
415             }
416             # FIXME: should I close the db connection here???
417             return @hits;
418             }
419              
420             =head2 pager
421              
422             Title : pager
423             Usage : $obj->pager($newval)
424             Function: sets/returns the pager object, useful for getting information
425             about the complete set of results
426             Returns : value of pager
427             Args : on set, new value (a scalar or undef, optional)
428              
429              
430             =cut
431              
432             sub pager {
433             my $self = shift;
434              
435             return $self->{'pager'} = shift if @_;
436             return $self->{'pager'};
437             }
438              
439             =head2 use_wildcards
440              
441             Title : use_wildcards
442             Usage : $obj->use_wildcards($newval)
443             Function: when true, enables search_like/search_ilike from
444             search()
445             Returns : value of use_wildcards (a scalar)
446             Args : on set, new value (a scalar or undef, optional)
447              
448              
449             =cut
450              
451             sub use_wildcards {
452             my $self = shift;
453              
454             return $self->{'use_wildcards'} = shift if @_;
455             return $self->{'use_wildcards'};
456             }
457              
458             =head2 use_implicit_wildcards
459              
460             Title : use_implicit_wildcards
461             Usage : $obj->use_implicit_wildcards($newval)
462             Function: assume wildcards on the beginning and end of the
463             search string
464             Returns : value of use_implicit_wildcards (a scalar)
465             Args : on set, new value (a scalar or undef, optional)
466              
467              
468             =cut
469              
470             sub use_implicit_wildcards {
471             my $self = shift;
472              
473             return $self->{'use_implicit_wildcards'} = shift if @_;
474             return $self->{'use_implicit_wildcards'};
475             }
476              
477             =head2 use_search_ilike
478              
479             Title : use_search_ilike
480             Usage : $obj->use_search_ilike($newval)
481             Function: when true, search() uses search_ilike()
482             where search_like() would have been used
483             Returns : value of use_search_ilike (a scalar)
484             Args : on set, new value (a scalar or undef, optional)
485              
486              
487             =cut
488              
489             sub use_search_ilike {
490             my $self = shift;
491              
492             return $self->{'use_search_ilike'} = shift if @_;
493             return $self->{'use_search_ilike'};
494             }
495              
496              
497             =head2 use_search_lower
498              
499             Title : use_search_lower
500             Usage : $obj->use_search_lower($newval)
501             Function: when true, search() uses search_lower()
502             where search_like() would have been used
503             Returns : value of use_search_lower (a scalar)
504             Args : on set, new value (a scalar or undef, optional)
505              
506              
507             =cut
508              
509             sub use_search_lower {
510             my $self = shift;
511              
512             return $self->{'use_search_lower'} = shift if @_;
513             return $self->{'use_search_lower'};
514             }
515              
516              
517             =head2 xml
518              
519             Title : xml
520             Usage : $obj->xml($newval)
521             Function: stores the configuration for this instance. See
522             CONFIGURATION
523             Returns : value of xml (a scalar)
524             Args : on set, new value (a scalar or undef, optional)
525              
526              
527             =cut
528              
529             sub xml {
530             my $self = shift;
531              
532             return $self->{'xml'} = shift if @_;
533             return $self->{'xml'};
534             }
535              
536             1;