File Coverage

blib/lib/XML/Filter/Dispatcher.pm
Criterion Covered Total %
statement 12 15 80.0
branch n/a
condition n/a
subroutine 4 5 80.0
pod n/a
total 16 20 80.0


line stmt bran cond sub pod time code
1             package XML::Filter::Dispatcher ;
2              
3             $VERSION = 0.52;
4              
5             =head1 NAME
6              
7             XML::Filter::Dispatcher - Path based event dispatching with DOM support
8              
9             =head1 SYNOPSIS
10              
11             use XML::Filter::Dispatcher qw( :all );
12              
13             my $f = XML::Filter::Dispatcher->new(
14             Rules => [
15             'foo' => \&handle_foo_start_tag,
16             '@bar' => \&handle_bar_attr,
17              
18             ## Send any elts and their contents to $handler
19             'snarf//self::node()' => $handler,
20              
21             ## Print the text of all elements
22             'description'
23             => [ 'string()' => sub { push @out, xvalue } ],
24             ],
25              
26             Vars => {
27             "id" => [ string => "12a" ],
28             },
29             );
30              
31             =head1 DESCRIPTION
32              
33             B: Beta code alert.
34              
35             A SAX2 filter that dispatches SAX events based on "EventPath" patterns
36             as the SAX events arrive. The SAX events are not buffered or converted
37             to an in-memory document representation like a DOM tree. This provides
38             for low lag operation because the actions associated with each pattern
39             are executed as soon as possible, usually in an element's
40             C event method.
41              
42             This differs from traditional XML pattern matching tools like
43             XPath and XSLT (which is XPath-based) which require the entire
44             document to be built in memory (as a "DOM tree") before queries can be
45             executed. In SAX terms, this means that they have to build a DOM tree
46             from SAX events and delay pattern matching until the C
47             event method is called.
48              
49             =head2 Rules
50              
51             A rule is composed of a pattern and an action. Each
52             XML::Filter::Dispatcher instance has a list of rules. As SAX events are
53             received, the rules are evaluated and one rule's action is executed. If
54             more than one rule matches an event, the rule with the highest score
55             wins; by default a rule's score is its position in the rule list, so
56             the last matching rule the list will be acted on.
57              
58             A simple rule list looks like:
59              
60             Rules => [
61             'a' => \&handle_a,
62             'b' => \&handle_b,
63             ],
64              
65             =head3 Actions
66              
67             There are several types of actions:
68              
69             =over
70              
71             =item *
72              
73             CODE reference
74              
75             Rules => [
76             'a' => \&foo,
77             'b' => sub { print "got a !\n" },
78             ],
79              
80             =item *
81              
82             SAX handler
83              
84             Handler => $h, ## A downstream handler
85             Rules => [
86             'a' => "Handler",
87             'b' => $h2, ## Another handler
88             ],
89              
90             =item *
91              
92             undef
93              
94             Rules => [
95             '//node()' => $h,
96             'b' => undef,
97             ],
98              
99             Useful for preventing other actions for some events
100              
101             =item *
102              
103             Perl code
104              
105             Rules => [
106             'b' => \q{print "got a !\n"},
107             ],
108              
109             Lower overhead than a CODE reference.
110              
111             B.
112              
113             =back
114              
115              
116             =head2 EventPath Patterns
117              
118             Note: this section describes EventPath and discusses differences between
119             EventPath and XPath. If you are not familiar with XPath you may want
120             to skim those bits; they're provided for the benefit of people coming
121             from an XPath background but hopefully don't hinder others. A working
122             knowledge of SAX is necessary for the advanced bits.
123              
124             EventPath patterns may match the document, elements, attributes, text
125             nodes, comments, processing instructions, and (not yet implemented)
126             namespace nodes. Patterns like this are referred to as "location paths"
127             and resemble Unix file paths or URIs in appearance and functionality.
128              
129             Location paths describe a location (or set of locations) in the document
130             much the same way a filespec describes a location in a filesystem. The
131             path C could refer to a directory named C on a filesystem or
132             a set of CcE> elements in an XML document. In either case,
133             the path indicates that C must be a child of C, C must be
134             's, and is a root level entity. More examples later.
135              
136             EventPath patterns may also extract strings, numbers and boolean values
137             from a document. These are called "expression patterns" and are only
138             said to match when the values they extract are "true" according to XPath
139             semantics (XPath truth-ness differs from Perl truth-ness, see
140             EventPath Truth below). Expression patterns look
141             like C or C, and if the result
142             is true, the action will be executed and the result can be retrieved
143             using the L method.
144              
145             TODO: rename xvalue to be ep_result or something.
146              
147             We cover patterns in more detail below, starting with some examples.
148              
149             If you'd like to get some experience with pattern matching in an
150             interactive XPath web site, there's a really good XPath/XSLT based
151             tutorial and lab at
152             L.
153              
154             =head2 Actions
155              
156             Two kinds of actions are supported: Perl subroutine calls and
157             dispatching events to other SAX processors. When a pattern matches, the
158             associated action
159              
160             =head2 Examples
161              
162             This is perhaps best introduced by some examples. Here's a routine that runs a
163             rather knuckleheaded document through a dispatcher:
164              
165             use XML::SAX::Machines qw( Pipeline );
166              
167             sub run { Pipeline( shift )->parse_string( <
168            
169            
170             Bully
171            
172            
173             Klutz
174            
175             Middleman
176            
177            
178            
179             Fool
180            
181            
182            
183            
184            
185            
186            
187             XML_END
188              
189             =over
190              
191             =item Counting Stooges
192              
193             Let's count the number of stooge characters in that document. To do that, we'd
194             like a rule that fires on almost all CstoogeE> elements:
195              
196             my $count;
197              
198             run(
199             XML::Filter::Dispatcher->new(
200             Rules => [
201             'stooge' => sub { ++$count },
202             ],
203             )
204             );
205              
206             print "$count\n"; ## 7
207              
208             Hmmm, that's one too many: it's picking up on Shemp twice since the document
209             shows that Shemp had two periods of stoogedom. The second node has a
210             convenient C attribute we can use to ignore the duplicate.
211              
212             We can ignore the duplicate element by adding a "predicate"
213             expression to the pattern to accept only those elements with no C
214             attribute. Changing that rule to
215              
216             'stooge[not(@repeat)]' => ...
217              
218             or even the more pedantic
219              
220             'stooge[not(@repeat) or not(@repeat = "yes")]' => ...
221              
222             yields the expected answer (6).
223              
224             =item Hairstyles and Attitudes
225              
226             Now let's try to figure out the hairstyles the stooges wore. To extract
227             just the names of hairstyles, we could do something like:
228              
229             my %styles;
230              
231             run(
232             XML::Filter::Dispatcher->new(
233             Rules => [
234             'stooge' => [
235             'string( @hairstyle )' => sub { $styles{xvalue()} = 1 },
236             ],
237             ],
238             )
239             );
240              
241             print join( ", ", sort keys %styles ), "\n";
242              
243             which prints "bald, bowl cut, bushy, mop". That rule extracts the text
244             of each C attribute and the C returns it.
245              
246             The text contents of elements like CattitudesE> can also be
247             sussed out by using a rule like:
248              
249             'string( attitude )' => sub { $styles{xvalue()} = 1 },
250              
251             which prints "Bully, Fool, Klutz, Middleman".
252              
253             Finally, we might want to correlate hairstyles and attitudes by using
254             a rule like:
255              
256             my %styles;
257              
258             run(
259             XML::Filter::Dispatcher->new(
260             Rules => [
261             'stooge' => [
262             'concat(@hairstyle,"=>",attitude)' => sub {
263             $styles{$1} = $2 if xvalue() =~ /(.+)=>(.+)/;
264             },
265             ],
266             ],
267             )
268             );
269              
270             print map "$_ => $styles{$_}\n", sort keys %styles;
271              
272             which prints:
273              
274             bald => Fool
275             bowl cut => Bully
276             bushy => Middleman
277             mop => Klutz
278              
279             =back
280              
281             =head3 Examples that need to be written
282              
283             =over
284              
285             =item * Examples for accumulating data
286              
287             =item * Advanced pattern matching examples
288              
289             =back
290              
291             =cut
292              
293             =head2 Sending Trees and Events to Other SAX Handlers
294              
295             When a blessed object C<$handler> is provided as an action for a rule:
296              
297             my $foo = XML::Handler::Foo->new();
298             my $d = XML::Filter::Dispatcher->new(
299             Rules => [
300             'foo' => $handler,
301             ],
302             Handler => $h,
303             );
304              
305             the selected SAX events are sent to C<$handler>.
306              
307             =head3 Element Forwarding
308              
309             If the event is selected is a C or C
310             event and it is selected without using the C or
311             C axes, then the handler (C<$foo>) replaces the
312             existing handler of the dispatcher (C<$h>) until after the corresponding
313             C event is received.
314              
315             This causes the entire element (CfooE>) to be sent to the
316             temporary handler (C<$foo>). In the example, each CfooE>
317             element will be sent to C<$foo> as a separate document, so if
318             (whitespace shown as underscores)
319              
320            
321             ____....
322             ____....
323             ____....
324            
325              
326             is fed in to C<$d>, then C<$foo> will receive 3 separate
327              
328             ...
329              
330             documents (C and C events are emitted
331             as necessary) and C<$h> will receive a single document without any
332             CfooE> elements:
333              
334            
335             ____
336             ____
337             ____
338            
339              
340             This can be useful for parsing large XML files in small chunks, often in
341             conjunction with L or
342             L.
343              
344             But what if you don't want C<$foo> to see three separate documents?
345             What if you're excerpting chunks of a document to create another
346             document? This can be done by telling the dispatcher to emit the main
347             document to C<$foo> and using rules with an action of C to elide
348             the events that are not wanted. This setup:
349              
350             my $foo = XML::Handler::Foo->new();
351             my $d = XML::Filter::Dispatcher->new(
352             Rules => [
353             '/' => $foo,
354             'bar' => undef,
355             'foo' => $foo,
356             ],
357             Handler => $h,
358             );
359              
360             , when fed this document:
361              
362            
363             __hork
364             __
365             __....
366             __....
367             __....
368             __
369             __
370             __
371            
372              
373             results in C<$foo> receiving a single document of input looking like
374             this:
375              
376            
377             __
378             __....
379             __....
380             __....
381             __
382            
383              
384             XML::Filter::Dispatcher keeps track of each handler and sends
385             C and C at the appropriate times, so
386             the CfooE> elements are "hoisted" out of the CbarE>
387             element in this example without any untoward C<..._document()> events.
388              
389             B: support forwarding to multiple documents at a time. At the
390             present, using multiple handlers for the same event is not supported.
391              
392             =head3 Discrete Event Forwarding
393              
394             B: At the moment, selecting and forwarding individual events is
395             not supported. When it is, any events other than those covered above
396             will be forwarded individually
397              
398             =head2 Tracing
399              
400             XML::Filter::Dispatcher checks when it is first loaded to see if
401             L is loaded. If so, it will emit tracing
402             messages. Typical use looks like
403              
404             perl -d:Devel::TraceSAX script_using_x_f_dispatcher
405              
406             If you are Cing Devel::TraceSAX in source code, make sure that it is
407             loaded before XML::Filter::Dispatcher.
408              
409             TODO: Allow tracing to be enabled/disabled independantly of Devel::TraceSAX.
410              
411             =head2 Namespace Support
412              
413             XML::Filter::Dispatcher offers namespace support in matching and by
414             providing functions like local-name(). If the documents you are
415             processing don't use namespaces, or you only care about elements and
416             attributes in the default namespace (ie without a "foo:" namespace
417             prefix), then you need not worry about engaging
418             XML::Filter::Dispatcher's namespace support. You do need it if your
419             patterns contain the C construct (that C<*> is literal).
420              
421             To specify the namespaces, pass in an option like
422              
423             Namespaces => {
424             "" => "uri0", ## Default namespace
425             prefix1 => "uri1",
426             prefix2 => "uri2",
427             },
428              
429             Then use C and C whereever necessary in patterns.
430              
431             A missing prefix on an element always maps to the default namespace URI,
432             which is "" by default. Attributes are treated likewise, though this
433             is probably a bug.
434              
435             If your patterns contain prefixes (like the C in C), and
436             you don't provide a Namespaces option, then the element names will
437             silently be matched literally as "foo:bar", whether or not the source
438             document declares namespaces. B
439             much user confusion>.
440              
441             XML::Filter::Dispatcher follows the XPath specification rather literally
442             and does not allow C<:*>, which you might think would match all nodes in
443             the default namespace. To do this, ass a prefixe for the default
444             namespace URI:
445              
446             Namespaces => {
447             "" => "uri0", ## Default namespace
448             "default" => "uri0", ## Default namespace
449             prefix1 => "uri1",
450             prefix2 => "uri2",
451             },
452              
453             then use "default:*" to match it.
454              
455             B: Currently, all rules must exist in the same namespace
456             context. This will be changed when I need to change it (contact me
457             if you need it changed). The current idear is to allow a special
458             function "Namespaces( { .... }, @rules )" that enables a temporary
459             namespace context, although abbreviated forms might be possible.
460              
461             =head2 EventPath Dialect
462              
463             "EventPath" patterns are that large subset of XPath patterns that can be
464             run in a SAX environment without a DOM. There are a few crucial
465             differences between the environments that EventPath and XPath each
466             operate in.
467              
468             XPath operates on a tree of "nodes" where each entity in an XML document
469             has only one corresponding node. The tree metaphor used in XPath has a
470             literal representation in memory. For instance, an element
471             CfooE> is represented by a single node which contains other
472             nodes.
473              
474             EventPath operates on a series of events instead of a tree of nodes.
475             For instance elements, which are represented by nodes in DOM trees, are
476             represented by two event method calls, C and
477             C. This means that EventPath patterns may match in a
478             C method or an C method, or even both if you try
479             hard enough.
480              
481             The only times an EventPath pattern will match in an
482             C method are when the pattern refers to an element's contents
483             or it uses the XXXX function (described below) to do so
484             intentionally.
485              
486             The tree metaphor is used to arrange and describe the
487             relationships between events. In the DOM trees an XPath engine operates
488             on, a document or an element is represented by a single entity, called a
489             node. In the event streams that EventPath operates on, documents and
490             element
491              
492             =head3 Why EventPath and not XPath?
493              
494             EventPath is not a standard of any kind, but XPath can't cope with
495             situations where there is no DOM and there are some features that
496             EventPath need (start_element() vs. end_element() processing for
497             example) that are not compatible with XPath.
498              
499             Some of the features of XPath require that the source document be fully
500             translated in to a DOM tree of nodes before the features can be evaluated.
501             (Nodes are things like elements, attributes, text, comments, processing
502             instructions, namespace mappings etc).
503              
504             These features are not supported and are not likely to be, you might
505             want to use L for "full" XPath
506             support (tho it be in an XSLT framework) or wait for
507             L to grow SAX support.
508              
509             Rather than build a DOM, XML::Filter::Dispatcher only keeps a bare
510             minimum of nodes: the current node and its parent, grandparent, and so
511             on, up to the document ("root") node (basically the /ancestor-or-self::
512             axis). This is called the "context stack", although you may not need to
513             know that term unless you delve in to the guts.
514              
515             =head3 EventPath Truth
516              
517             EventPath borrows a lot from XPath including its notion of truth.
518             This is different from Perl's notion of truth; presumably to make
519             document processing easier. Here's a table that may help, the
520             important differences are towards the end:
521              
522             Expression EventPath XPath Perl
523             ========== ========= ===== ====
524             false() FALSE FALSE n/a (not applicable)
525             true() TRUE TRUE n/a
526             0 FALSE FALSE FALSE
527             -0 FALSE** FALSE n/a
528             NaN FALSE** FALSE n/a (not fully, anyway)
529             1 TRUE TRUE TRUE
530             "" FALSE FALSE FALSE
531             "1" TRUE TRUE TRUE
532              
533             "0" TRUE TRUE FALSE
534              
535             * To be regarded as a bug in this implementation
536             ** Only partially implemented/supported in this implementation
537              
538             Note: it looks like XPath 2.0 is defining a more workable concept
539             for document processing that uses something resembling Perl's empty
540             lists, C<()>, to indicate empty values, so C<""> and C<()> will be
541             distinct and C<"0"> can be interpreted as false like in Perl. XPath2
542             is I provided by this module yet and won't be for a long time
543             (patches welcome ;).
544              
545             =head3 EventPath Examples
546              
547             All of this means that only a portion of XPath is available. Luckily,
548             that portion is also quite useful. Here are examples of working XPath
549             expressions, followed by known unimplemented features.
550              
551             TODO: There is also an extension function available to differentiate between
552             C and C events. By default
553              
554             =head2 Examples
555              
556             Expression Event Type Description (event type)
557             ========== ========== ========================
558             / start_document Selects the document node
559              
560             /a start_element Root elt, if it's ""
561              
562             a start_element All "a" elements
563              
564             b//c start_element All "c" descendants of "b" elt.s
565              
566             @id start_element All "id" attributes
567              
568             string( foo ) end_element matches at the first or
569             in the current element;
570             xvalue() returns the
571             text contained in "..."
572              
573             string( @name ) start_element the first "name" attributes;
574             xvalue() returns the
575             text of the attribute.
576              
577             =head2 Methods and Functions
578              
579             There are several APIs provided: general, xstack, and EventPath
580             variable handling.
581              
582             The general API provides C and C, C, and
583             C.
584              
585             The variables API provides C and C.
586              
587             The xstack API provides C, C, C,
588             C, C and C.
589              
590             All of the "xfoo()" APIs may be called as a method or,
591             within rule handlers, called as a function:
592              
593             $d = XML::Filter::Dispatcher->new(
594             Rules => [
595             "/" => sub {
596             xpush "foo\n";
597             print xpeek; ## Prints "foo\n"
598              
599             my $self = shift;
600             print $self->xpeek; ## Also prints "foo\n"
601             },
602             ],
603             );
604              
605             print $d->xpeek; ## Yup, prints "foo\n" as well.
606              
607             This dual nature allows you to import the APIs as functions and call them
608             using a concise function-call style, or to leave them as methods and
609             use object-oriented style.
610              
611             Each call may be imported by name:
612              
613             use XML::Filter::Dispatcher qw( xpush xpeek );
614              
615             or by one of three API category tags:
616              
617             use XML::Filter::Dispatcher ":general"; ## xvalue()
618             use XML::Filter::Dispatcher ":variables"; ## xset_var(), xget_var()
619             use XML::Filter::Dispatcher ":xstack"; ## xpush(), xpop(), and xpeek()
620              
621             or en mass:
622              
623             use XML::Filter::Dispatcher ":all";
624              
625             =cut
626              
627             require Exporter;
628             *import = \&Exporter::import;
629              
630             BEGIN {
631 16     16   410291 my @general_API = qw( xvalue xvaluetype xvalue_type xevent_type xrun_next_action );
632 16         58 my @xstack_API = qw( xpeek xpop xadd xset xoverwrite xpush xstack_empty xstack_max );
633 16         38 my @variables_API = qw( xset_var xget_var );
634 16         57 @EXPORT_OK = ( @general_API, @variables_API, @xstack_API );
635 16         413 %EXPORT_TAGS = (
636             all => \@EXPORT_OK,
637             general => \@general_API,
638             xstack => \@xstack_API,
639             autostack => \@xstack_API, # deprecated
640             variables => \@variables_API,
641             );
642             }
643              
644              
645 16     16   122 use strict ;
  16         24  
  16         583  
646              
647 16     16   80 use Carp qw( confess );
  16         39  
  16         1271  
648 0     0     sub croak { goto &Carp::confess }
649              
650             #use XML::SAX::Base;
651             #use XML::NamespaceSupport;
652 16     16   25373 use XML::SAX::EventMethodMaker qw( compile_missing_methods sax_event_names );
  0            
  0            
653              
654             use constant is_tracing => defined $Devel::TraceSAX::VERSION;
655             # Devel::TraceSAX does not work under perl5.8.0
656             #use constant is_tracing => 1;
657             #sub emit_trace_SAX_message { warn @_ };
658              
659             use constant show_buffer_highwater =>
660             $ENV{XFDSHOWBUFFERHIGHWATER} || 0;
661              
662             BEGIN {
663             eval( is_tracing
664             ? 'use Devel::TraceSAX qw( emit_trace_SAX_message ); 1'
665             : 'sub emit_trace_SAX_message; 1'
666             ) or die $@;
667             }
668              
669              
670             ## TODO: Prefix all of the hash keys in $self with XFD_ to avoid
671             ## conflict with X::S::B and subclasses / hander CODE.
672              
673             ## TODO: local $_ = xvalue before calling in to a sub
674              
675             ##
676             ## $ctx->{Vars} a HASH of variables passed in from the parent context
677             ## (or the Perl world in the doc root node context). Set
678             ## in all child contexts.
679             ##
680             ## $ctx->{ChildVars} a HASH of variables set in by this node, passed
681             ## on to all child contexts, but erased before this
682             ## node's siblings can see it.
683             ##
684              
685             =head1 General API
686              
687             =over
688              
689             =item new
690              
691             my $f = XML::Filter::Dispatcher->new(
692             Rules => [ ## Order is significant
693             "/foo/bar" => sub {
694             ## Code to execute
695             },
696             ],
697             );
698              
699             Must be called as a method, unlike other API calls provided.
700              
701             =cut
702              
703             my @every_names = qw(
704             attribute
705             characters
706             comment
707             processing_instruction
708             start_element
709             start_prefix_mapping
710             );
711              
712              
713             sub new {
714             my $proto = shift ;
715             my %handlers;
716             my $self = bless {
717             FoldContstants => 1,
718             Handlers => {
719             Handler => undef, ## Setting this here always allows "Handler"
720             ## in actions, so that a call to
721             ## set_handler( $h ) can be used when the
722             ## handler is not set when new() is called.
723             },
724             }, ref $proto || $proto;
725              
726             while ( @_ ) {
727             my ( $key, $value ) = ( shift, shift );
728              
729             if ( substr( $key, -7 ) eq "Handler" ) {
730             $self->{Handlers}->{$key} = $value;
731             }
732             elsif ( $key eq "Handlers" ) {
733             $self->{Handlers}->{$_} = $value->{$_}
734             for keys %$value;
735             }
736             else {
737             $self->{$key} = $value;
738             }
739              
740             }
741              
742             $self->{Debug} ||= $ENV{XFDDEBUG} || 0;
743              
744             $self->{SortAttributes} = 1
745             unless defined $self->{SortAttributes};
746             $self->{SetXValue} = 1
747             unless defined $self->{SetXValue};
748              
749              
750             $self->{Rules} ||= [];
751             # $self->{Rules} = [ %{$self->{Rules}} ]
752             # if ref $self->{Rules} eq "HASH";
753              
754             my $doc_ctx = $self->{DocCtx} = $self->{CtxStack}->[0] = {};
755             $doc_ctx->{ChildCtx} = {};
756            
757             for ( keys %{$self->{Vars}} ) {
758             $self->xset_var( $_, @{$self->{Vars}->{$_}} );
759             }
760              
761             if ( @{$self->{Rules}} ) {
762             require XML::Filter::Dispatcher::Compiler;
763             my $c = XML::Filter::Dispatcher::Compiler->new( %$self );
764              
765             my $code;
766            
767             ## Use the internal use only compiler internals.
768             ( $code, $self->{Actions} ) = $c->_compile;
769              
770             $self->{DocSub} = eval $code;
771             if ( ! $self->{DocSub} ) {
772             my $c = $code;
773             my $ln = 1;
774             $c =~ s{^}{sprintf "%4d|", $ln++}gme;
775             die $@, $c;
776             }
777              
778             }
779              
780             return $self ;
781             }
782              
783              
784             =item xvalue
785              
786             "string( foo )" => sub { my $v = xvalue }, # if imported
787             "string( foo )" => sub { my $v = shift->xvalue }, # if not
788              
789             Returns the result of the last EventPath expression evaluated; this is
790             the result that fired the current rule. The example prints all text
791             node children of CfooE> elements, for instance.
792              
793             For matching expressions, this is equivalent to $_[1] in action
794             subroutines.
795              
796             =cut
797              
798             sub xvalue() {
799             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
800              
801             return $XFD::cur_self->{XValue};
802             exists $XFD::cur_self->{XValue}
803             ? $XFD::cur_self->{XValue}
804             : $XFD::ctx && $XFD::ctx->{Node};
805             }
806              
807             =item xvalue_type
808              
809             Returns the type of the result returned by xvalue. This is either a SAX
810             event name or "attribute" for path rules ("//a"), or "" (for a string),
811             "HASH" for a hash (note that struct() also returns a hash; these types
812             are Perl data structure types, not EventPath types).
813              
814             This is the same as xeventtype for all rules that don't evaluate
815             functions like "string()" as their top level expression.
816              
817             =cut
818              
819             sub xvalue_type() {
820             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
821              
822             return $XFD::cur_self->{XValue} == $XFD::ctx->{Node}
823             ? $XFD::ctx->{EventType}
824             : ref $XFD::ctx->{Node};
825             }
826             sub xvaluetype() { goto \&xvalue_type } ## deprecated syntax
827              
828             =item xeventtype
829              
830             Returns the type of the current SAX event.
831              
832             =cut
833              
834             sub xevent_type() {
835             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
836              
837             return $XFD::ctx->{EventType};
838             }
839              
840             =item xrun_next_action
841              
842             Runs the next action for the current node. Ordinarily,
843             XML::Filter::Dispatcher runs only one action per node; this allows an
844             action to call down to the next action.
845              
846             This is especially useful in filters that tweak a document on the way
847             by. This tweaky sort of filter establishes a default "pass-through"
848             rule and then additional override rules to tweak the values being
849             passed through.
850              
851             Let's suppose you want to convert some mtimes from seconds since the
852             epoch to a human readable format. Here's a set of rules that might
853             do that:
854              
855             Rules => [
856             'node()' => "Handler", ## Pass everything through by default.
857              
858             'file[@mtime]' => sub { ## intercept and tweak the node.
859             my $attr = $_[1]->{Attributes}->{"{}mtime"};
860              
861             ## Localize the changes: never assume that it is safe
862             ## to alter SAX elements on the way by in a general purpose
863             ## filter. Some smart aleck might send the same events
864             ## to another filter with a Tee fitting or even back
865             ## through your filter multiple times from a cache.
866             local $attr->{Value} = localtime $attr->{Value};
867              
868             ## Now that the changes are localised, fall through to
869             ## the default rule.
870             xrun_next_action;
871              
872             ## We could emit other events here as well, but need not
873             ## in this example.
874             },
875             ],
876              
877             =cut
878              
879             sub xrun_next_action() {
880             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
881             $XFD::cur_self->_execute_next_action;
882             }
883              
884              
885             =back
886              
887             =head2 EventPath Variables
888              
889             EventPath variables may be set in the current context using
890             C, and accessed using C. Variables set in a
891             given context are visible to all child contexts. If you want a variable
892             to be set in an enclosed context and later retrieved in an enclosing
893             context, you must set it in the enclosing context first, then alter it
894             in the enclosed context, then retrieve it.
895              
896             EventPath variables are typed.
897              
898             EventPath variables set in a context are visible within that context and
899             all enclosed contexts, but not outside of them.
900              
901             =cut
902              
903             =over
904              
905             =item xset_var
906              
907             "foo" => sub { xset_var( bar => string => "bingo" ) }, # if imported
908             "foo" => sub { shift->xset_var( bar => boolean => 1 ) },
909              
910             Sets an XPath variables visible in the current context and all child
911             contexts. Will not be visible in parent contexts or sibling contexts.
912              
913             Legal types are C, C, and C. Node sets and
914             nodes are unsupported at this time, and "other" types are not useful
915             unless you work in your own functions that handle them.
916              
917             Variables are visible as C<$bar> variable references in XPath expressions and
918             using xget_var in Perl code. Setting a variable to a new value temporarily
919             overrides any existing value, somewhat like using Perl's C.
920              
921             =cut
922              
923             sub xset_var {
924             my $self = @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ) ? shift : $XFD::cur_self;
925             croak
926             "Wrong number of parameters (" . @_ . ") passed to xset_var, need 3.\n"
927             if @_ != 3;
928              
929             my ( $name, $type, $value ) = @_;
930             croak "undefined type passed to xset_var\n" unless defined $type;
931             croak "undefined name passed to xset_var\n" unless defined $name;
932             croak "undefined value passed to xset_var\n" unless defined $name;
933              
934             ## TODO: rename the type non-classes to st other than "string", etc.
935             $self->{CtxStack}->[-1]->{Vars}->{$name} = bless \$value, $type;
936             }
937              
938              
939             ## Used in compiled XPath exprs only; only minimal safeties engaged.
940             sub _look_up_var {
941             my $self = shift;
942             my ( $vname ) = @_;
943              
944             my $ctx = $self->{CtxStack}->[-1];
945             return $ctx->{Vars}->{$vname} if exists $ctx->{Vars}->{$vname};
946              
947             die "Unknown variable '\$$vname' referenced in XPath expression\n";
948             }
949              
950              
951             =item xget_var
952              
953             "bar" => sub { print xget_var( "bar" ) }, # if imported
954             "bar" => sub { print shift->xget_var( "bar" ) },
955              
956             Retrieves a single variable from the current context. This may have
957             been set by a parent or by a previous rule firing on this node, but
958             not by children or preceding siblings.
959              
960             Returns C if the variable is not set (or if it was set to undef).
961              
962             =cut
963              
964             sub xget_var {
965             my $self = @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ) ? shift : $XFD::cur_self;
966             croak "No variable name passed to xget_var.\n"
967             unless @_;
968             croak "More than one variable name passed to xget_var.\n"
969             unless @_ == 1;
970              
971             my ( $vname ) = @_;
972              
973             croak "Undefined variable name passed to xget_var.\n"
974             unless defined $vname;
975              
976             my $ctx = $self->{CtxStack}->[-1];
977             return
978             exists $ctx->{Vars}->{$vname}
979             ? ${$ctx->{Vars}->{$vname}}
980             : undef;
981             }
982              
983              
984             =item xget_var_type
985              
986             "bar" => sub { print xget_var_type( "bar" ) }, # if imported
987             "bar" => sub { shift->xget_var_type( "bar" ) },
988              
989             Retrieves the type of a variable from the current context. This may have
990             been set by a parent or by a previous rule firing on this node, but
991             not by children or preceding siblings.
992              
993             Returns C if the variable is not set.
994              
995             =cut
996              
997             sub xget_var_type {
998             my $self = @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ) ? shift : $XFD::cur_self;
999             croak "No variable name passed to xget_var_type.\n"
1000             unless @_;
1001             croak "More than one variable name passed to xget_var_type.\n"
1002             unless @_ == 1;
1003              
1004             my ( $vname ) = @_;
1005              
1006             croak "Undefined variable name passed to xget_var.\n"
1007             unless defined $vname;
1008              
1009             my $ctx = $self->{CtxStack}->[-1];
1010             return
1011             exists $ctx->{Vars}->{$vname}
1012             ? ref $ctx->{Vars}->{$vname}
1013             : undef;
1014             }
1015              
1016             =back
1017              
1018             =head2 Handlers
1019              
1020             XML::Filter::Dispatcher allows you to register handlers using
1021             C and C, and then to refer to them
1022             by name in actions. These are part of the "general API".
1023              
1024             You may use any string for handler names that you like, including
1025             strings with spaces. It is wise to avoid those standard, rarely used
1026             handlers recognized by parsers, such as:
1027              
1028             DTDHandler
1029             ContentHandler
1030             DocumentHandler
1031             DeclHandler
1032             ErrorHandler
1033             EntityResolver
1034             LexicalHandler
1035              
1036             unless you are using them for the stated purpose. (List taken from
1037             L).
1038              
1039             Handlers may be set in the constructor in two ways: by using a name
1040             ending in "Handler" and passing it as a top level option:
1041              
1042             my $f = XML::Filter::Dispatcher->new(
1043             Handler => $h,
1044             FooHandler => $foo,
1045             BarHandler => $bar,
1046             Rules => [
1047             ...
1048             ],
1049             );
1050              
1051             Or, for oddly named handlers, by passing them in the Handlers hash:
1052              
1053             my $f = XML::Filter::Dispatcher->new(
1054             Handlers => {
1055             Manny => $foo,
1056             Moe => $bar,
1057             Jack => $bat,
1058             },
1059             Rules => [
1060             ...
1061             ],
1062             );
1063              
1064             Once declared in new(), handler names can be
1065             used as actions. The "well known" handler name "Handler" need not be
1066             predeclared.
1067              
1068             For exampled, this forwards all events except the C
1069             and C events for the root element's children, thus
1070             "hoisting" everything two levels below the root up a level:
1071              
1072             Rules => [
1073             '/*/*' => undef,
1074             'node()' => "Handler",
1075             ],
1076              
1077             By default, no events are forwarded to any handlers: you must send
1078             individual events to an individual handlers.
1079              
1080             Normally, when a handler is used in this manner, XML::Filter::Dispatcher
1081             makes sure to send C and C events to
1082             it just before the first event and just after the last event. This
1083             prevents sending the document events unless a handler actually receives
1084             other events, which is what most people expect (the alternative would be
1085             to preemptively always send a C to all handlers when
1086             when the dispatcher receives its C: ugh).
1087              
1088             To disable this for all handlers, pass the C
1089             => 1> option.
1090              
1091             =over
1092              
1093             =item set_handler
1094              
1095             $self->set_handler( $handler );
1096             $self->set_handler( $name => $handler );
1097              
1098             =cut
1099              
1100             sub set_handler {
1101             my $self = shift;
1102             my $name = @_ > 1 ? shift : "Handler";
1103             $self->{Handlers}->{$name} = shift;
1104             }
1105              
1106             =item get_handler
1107              
1108             $self->set_handler( $handler );
1109             $self->set_handler( $name => $handler );
1110              
1111             =cut
1112              
1113             sub get_handler {
1114             my $self = shift;
1115             my $name = @_ > 1 ? shift : "Handler";
1116             return $self->{Handlers}->{$name}
1117             if exists $self->{Handlers}->{$name}
1118             }
1119              
1120             =back
1121              
1122              
1123             =head2 The xstack
1124              
1125             The xstack is a stack mechanism provided by XML::Filter::Dispatcher that
1126             is automatically unwound after end_element, end_document, and all other
1127             events other than start_element or start_document. This sounds
1128             limiting, but it's quite useful for building data structures that mimic
1129             the structure of the XML input. I've found this to be common when
1130             dealing with data structures in XML and a creating nested hierarchies of
1131             objects and/or Perl data structures.
1132              
1133             Here's an example of how to build and return a graph:
1134              
1135             use Graph;
1136              
1137             my $d = XML::Filter::Dispatcher->new(
1138             Rules => [
1139             ## These two create and, later, return the Graph object.
1140             'graph' => sub { xpush( Graph->new ) },
1141             'end::graph' => \&xpop,
1142              
1143             ## Every vertex must have a name, so collect in and add it
1144             ## to the Graph object using its add_vertex( $name ) method.
1145             'vertex' => [ 'string( @name )' => sub { xadd } ],
1146              
1147             ## Edges are a little more complex: we need to collect the
1148             ## from and to attributes, which we do using a hash, then
1149             ## pop the hash and use it to add an edge. You could
1150             ## also use a single rule, see below.
1151             'edge' => [ 'string()' => sub { xpush {} } ],
1152             'edge/@*' => [ 'string()' => sub { xset } ],
1153             'end::edge' => sub {
1154             my $edge = xpop;
1155             xpeek->add_edge( @$edge{"from","to"} );
1156             },
1157             ],
1158             );
1159              
1160             my $graph = QB->new( "graph", <playback( $d );
1161            
1162            
1163            
1164            
1165            
1166             END_XML
1167              
1168             print $graph, $graph->is_sparse ? " is sparse!\n" : "\n";
1169              
1170             should print "0,1-2,2-1 is sparse!\n".
1171              
1172             This is good if you can tell what object to add to the stack before
1173             seeing content. Some XML parsing is more general than that: if you see
1174             no child elements, you want to create one class to contain just
1175             character content, otherwise you want to add a container class to
1176             contain the child nodes.
1177              
1178             An faster alternative to the 3 edge rules relies on the fact that
1179             SAX's start_element events carry the attributes, so you can actually
1180             do a single rule instead of the three we show above:
1181              
1182             'edge' => sub {
1183             xpeek->add_edge(
1184             $_[1]->{Attributes}->{"{}from"}->{Value},
1185             $_[1]->{Attributes}->{"{}to" }->{Value},
1186             );
1187             },
1188              
1189             =over
1190              
1191             =item xpush
1192              
1193             Push values on to the xstack. These will be removed from the xstack at
1194             the end of the current element. The topmost item on the
1195             xstack is available through the peek method. Elements xpushed before
1196             the first element (usually in the C event) remain on
1197             the stack after the document has been parsed and a call like
1198              
1199             my $elt = $dispatcher->xpop;
1200              
1201             can be used to retrieve them.
1202              
1203             =cut
1204              
1205             sub xpush {
1206             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1207             emit_trace_SAX_message "EventPath: xpush()ing ", @_ if is_tracing;
1208             push @{$XFD::cur_self->{XStack}}, map( { "#elt" => $_ }, @_ );
1209             }
1210              
1211             =item xadd
1212              
1213             Tries to add a possibly named item to the element on the top of the
1214             stack and push the item on to the stack. It makes a guess about how to
1215             add items depending on what the current top of the stack is.
1216              
1217             xadd $name, $new_item;
1218              
1219             does this:
1220              
1221             Top of Stack Action
1222             ============ ======
1223             scalar xpeek .= $new_item;
1224             SCALAR ref ${xpeek} .= $new_item;
1225             ARRAY ref push @{xpeek()}, $new_item;
1226             HASH ref push @{xpeek->{$name}} = $new_item;
1227             blessed object xpeek->$method( $new_item );
1228              
1229             The $method in the last item is one of (in order) "add_$name",
1230             "push_$name", or "$name".
1231              
1232             After the above action, an
1233              
1234             xpush $new_item;
1235              
1236             is done.
1237              
1238             $name defaults to the LocalName of the current node if it is an
1239             attribute or element, so
1240              
1241             xadd $foo;
1242              
1243             will DWYM. TODO: search up the current node's ancestry for a LocalName
1244             when handling other event types.
1245              
1246             If no parameters are provided, xvalue is used.
1247              
1248             If the stack is empty, it just xpush()es on the stack.
1249              
1250             =cut
1251              
1252             sub xadd {
1253             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1254             my $name = @_ > 1
1255             ? shift
1256             : do {
1257             croak "$XFD::ctx->{EventType} has no LocalName"
1258             unless exists $XFD::ctx->{Node}->{LocalName}
1259             && defined exists $XFD::ctx->{Node}->{LocalName};
1260             croak "$XFD::ctx->{EventType} a LocalName of ''"
1261             unless length $XFD::ctx->{Node}->{LocalName};
1262             $XFD::ctx->{Node}->{LocalName};
1263             };
1264              
1265             my $new_item = @_ ? shift : $XFD::cur_self->xvalue;
1266              
1267             emit_trace_SAX_message "EventPath: xadd()ing ", $name, " => ", $new_item if is_tracing;
1268              
1269             if ( @{$XFD::cur_self->{XStack}} ) {
1270             my $e = $XFD::cur_self->{XStack}->[-1];
1271             my $top = $e->{"#elt"};
1272             my $t = ref $top;
1273             my $meth;
1274              
1275             if ( $t eq "" ) {
1276             $XFD::cur_self->{XStack}->[-1]->{"#elt"} .= "";
1277             $e->{scalar}++;
1278             }
1279             elsif ( $t eq "SCALAR" ) {
1280             $e->{scalar}++;
1281             $$top .= "";
1282             }
1283             elsif ( $t eq "ARRAY" ) {
1284             $e->{scalar}++;
1285             push @$top, $new_item;
1286             }
1287             elsif ( $t eq "HASH" ) {
1288             croak
1289             "element '",
1290             $name,
1291             "' of the HASH on top of the xstack is a ",
1292             do {
1293             my $t = ref $top->{$name};
1294             ! $t ? "scalar" : "$t reference";
1295             },
1296             ", not an ARRAY ref"
1297             if defined $top->{$name} && ! ref $top->{$name};
1298             push @{$top->{$name}}, $new_item;
1299             $e->{$name}++;
1300             }
1301             ## See if it's a blessed object that can add thingamies"
1302             elsif ( $meth = (
1303             UNIVERSAL::can( $top, "add_$name" )
1304             || UNIVERSAL::can( $top, "push_$name" )
1305             || UNIVERSAL::can( $top, "add" )
1306             ) ) {
1307             $top->$meth( $new_item );
1308             $e->{$name}++;
1309             }
1310             else {
1311             croak "don't know how to xadd() a '",
1312             ref( $new_item ) || "scalar",
1313             "' ",
1314             defined $name ? $name : "item",
1315             " to a '$t' (which is what is on the top of the xstack)";
1316             }
1317              
1318             }
1319              
1320             $XFD::cur_self->xpush( $new_item )
1321             if ref $new_item && ref $new_item ne "SCALAR";
1322             return $new_item;
1323             }
1324              
1325              
1326             =item xset
1327              
1328             Like C, but tries to set a named value. Dies if the value is
1329             already defined (so duplicate values aren't silently ignored).
1330              
1331             xset $name, $new_item;
1332              
1333             does this:
1334              
1335             Top of Stack Action
1336             ============ ======
1337             scalar xpeek = $new_item;
1338             SCALAR ref ${xpeek} = $new_item;
1339             HASH ref xpeek->{$name} = $new_item;
1340             blessed object xpeek->$name( $new_item );
1341              
1342             Trying to xset any other types results in an exception.
1343              
1344             After the above action (except when the top is a scalar or SCALAR ref), an
1345              
1346             xpush $new_item;
1347              
1348             is done so that more may be added to the item.
1349              
1350             $name defaults to the LocalName of the current node if it is an
1351             attribute or element, so
1352              
1353             xset $foo;
1354              
1355             will DWYM. TODO: search up the current node's ancestry for a LocalName
1356             when handling other event types.
1357              
1358             If no parameters are provided, xvalue is used.
1359              
1360             =cut
1361              
1362             sub xset {
1363             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1364             my $name = @_ > 1
1365             ? shift
1366             : do {
1367             croak "$XFD::ctx->{EventType} has no LocalName"
1368             unless exists $XFD::ctx->{Node}->{LocalName}
1369             && defined exists $XFD::ctx->{Node}->{LocalName};
1370             croak "$XFD::ctx->{EventType} has a LocalName of ''"
1371             unless length $XFD::ctx->{Node}->{LocalName};
1372             $XFD::ctx->{Node}->{LocalName};
1373             };
1374              
1375              
1376             my $new_item = @_ ? shift : $XFD::cur_self->xvalue;
1377             emit_trace_SAX_message "EventPath: xset()ing ", $name, " => ", $new_item if is_tracing;
1378              
1379             unless ( @{$XFD::cur_self->{XStack}} ) {
1380             $XFD::cur_self->xpush( $new_item );
1381             }
1382             else {
1383             my $e = $XFD::cur_self->{XStack}->[-1];
1384             my $top = $e->{"#elt"};
1385             my $t = ref $top;
1386             my $meth;
1387              
1388             if ( $t eq "" ) {
1389             croak "already xset() scalar on top of xstack"
1390             if $e->{scalar}++;
1391             $e->{"#elt"} = $new_item;
1392             }
1393             elsif ( $t eq "SCALAR" ) {
1394             croak "already xset() SCALAR reference on top of xstack"
1395             if $e->{scalar}++;
1396             $$top = $new_item;
1397             }
1398             elsif ( $t eq "HASH" ) {
1399             croak "already xset() element '$name' of HASH on top of xstack"
1400             if $e->{$name}++;
1401             $top->{$name} = $new_item;
1402             }
1403             ## See if it's a blessed object that can add thingamies"
1404             elsif (
1405             ( $meth = UNIVERSAL::can( $top, $name ) )
1406             || ( $meth = UNIVERSAL::can( $top, "set_$name" ) )
1407             ) {
1408             croak "already xset() accessor $name() of ", ref $top, " on top of xstack"
1409             if $e->{$name}++;
1410             $top->$meth( $new_item );
1411             }
1412             else {
1413             croak "don't know how to xset() $name for a '$t' (which is what is on the top of the xstack)";
1414             }
1415              
1416             $XFD::cur_self->xpush( $new_item )
1417             if ref $new_item && ref $new_item ne "SCALAR";
1418             }
1419              
1420             return $new_item;
1421             }
1422              
1423              
1424              
1425             =item xoverwrite
1426              
1427             Exactly like xset but does not complain if the value has already been
1428             xadd(), xset() or xoverwrite().
1429              
1430             =cut
1431              
1432             sub xoverwrite {
1433             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1434             my $name = @_ > 1
1435             ? shift
1436             : do {
1437             croak "$XFD::ctx->{EventType} has no LocalName"
1438             unless exists $XFD::ctx->{Node}->{LocalName}
1439             && defined exists $XFD::ctx->{Node}->{LocalName};
1440             croak "$XFD::ctx->{EventType} a LocalName of ''"
1441             unless length $XFD::ctx->{Node}->{LocalName};
1442             $XFD::ctx->{Node}->{LocalName};
1443             };
1444              
1445              
1446             my $new_item = @_ ? shift : $XFD::cur_self->xvalue;
1447             emit_trace_SAX_message "EventPath: xoverwrite()ing ", $name, " => ", $new_item if is_tracing;
1448              
1449             unless ( @{$XFD::cur_self->{XStack}} ) {
1450             $XFD::cur_self->xpush( $new_item );
1451             }
1452             else {
1453             my $e = $XFD::cur_self->{XStack}->[-1];
1454             my $top = $e->{"#elt"};
1455              
1456             my $t = ref $top;
1457             my $meth;
1458             if ( $t eq "" ) {
1459             $XFD::cur_self->{XStack}->[-1]->{"#elt"} = $new_item;
1460             $e->{scalar}++;
1461             }
1462             elsif ( $t eq "SCALAR" ) {
1463             $$top = $new_item;
1464             $e->{scalar}++;
1465             }
1466             elsif ( $t eq "HASH" ) {
1467             $top->{$name} = $new_item;
1468             $e->{$name}++;
1469             }
1470             ## See if it's a blessed object that can add thingamies"
1471             elsif (
1472             ( $meth = UNIVERSAL::can( $top, $name ) )
1473             || ( $meth = UNIVERSAL::can( $top, "set_$name" ) )
1474             ) {
1475             $top->$meth( $new_item );
1476             $e->{$name}++;
1477             }
1478             else {
1479             croak "don't know how to xoverwrite $name for a '$t' (which is what is on the top of the xstack)";
1480             }
1481             $XFD::cur_self->xpush( $new_item )
1482             if ref $new_item && ref $new_item ne "SCALAR";
1483             }
1484              
1485             return $new_item;
1486             }
1487              
1488              
1489              
1490             =item xpeek
1491              
1492             Rules => [
1493             "foo" => sub {
1494             my $elt = $_[1];
1495             xpeek->set_name( $elt->{Attributes}->{"{}name"} );
1496             },
1497             "/end::*" => sub {
1498             my $self = shift;
1499             XXXXXXXXXXXXXXXXXXXX
1500             }
1501             ],
1502              
1503              
1504             Returns the top element on the xstack, which was the last thing
1505             pushed in the current context. Throws an exception if the xstack is
1506             empty. To check for an empty stack, use eval:
1507              
1508             my $stack_not_empty = eval { xpeek };
1509              
1510             To peek down the xstack, use a Perlish index value. The most
1511             recently pushed element is index number -1:
1512              
1513             $xpeek( -1 ); ## Same as $self->peek
1514              
1515             The first element pushed on the xstack is element 0:
1516              
1517             $xpeek( 0 );
1518              
1519             An exception is thrown if the index is off either end of the stack.
1520              
1521             =cut
1522              
1523             sub xpeek {
1524             unless ( @_ ) {
1525             croak "xpeek() called on empty stack"
1526             unless @{$XFD::cur_self->{XStack}};
1527              
1528             return $XFD::cur_self->{XStack}->[-1]->{"#elt"};
1529             }
1530              
1531             local $XFD::cur_self = shift if UNIVERSAL::isa( $_[0], __PACKAGE__ );
1532              
1533             my $index = shift;
1534             $index = -1 unless defined $index;
1535              
1536             croak "xpeek( $index ) off the end of the stack"
1537             if $index > $#{$XFD::cur_self->{XStack}}
1538             || $index < -1 - $#{$XFD::cur_self->{XStack}};
1539              
1540             return $XFD::cur_self->{XStack}->[$index]->{"#elt"};
1541             }
1542              
1543             =item xpop
1544              
1545              
1546             my $d = XML::Filter::Dispatcher->new(
1547             Rules => [
1548             ....rules to build an object hierarchy...
1549             ],
1550             );
1551              
1552             my $result = $d->xpop
1553              
1554             Removes an element from the xstack and returns it. Usually
1555             called in a end_document handler or after the document returns to
1556             retrieve a "root" object placed on the stack before the root element
1557             was started.
1558              
1559             =cut
1560              
1561             sub xpop {
1562             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1563              
1564             croak "xpop() called on empty stack"
1565             unless @{$XFD::cur_self->{XStack}};
1566              
1567             emit_trace_SAX_message "EventPath: xpop()ing ", $XFD::cur_self->{XStack}->[-1]->{"#elt"} if is_tracing;
1568             return (pop @{$XFD::cur_self->{XStack}})->{"#elt"};
1569             }
1570              
1571             =item xstack_empty
1572              
1573             Handy for detecting a nonempty stack:
1574              
1575             warn xpeek unless xstack_empty;
1576              
1577             Because C and C throw exceptions on an empty stack,
1578             C is needed to detect whether it's safe to call them.
1579              
1580             =cut
1581              
1582             sub xstack_empty {
1583             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1584             return ! @{$XFD::cur_self->{XStack}};
1585             }
1586              
1587             =item xstack_max
1588              
1589             Handy for walking the stack:
1590              
1591             for my $i ( reverse 0 .. xstack_max ) { ## from top to bottom
1592             use BFD;d xpeek( $i );
1593             }
1594              
1595             Because C and C throw exceptions on an empty stack,
1596             C may be used to walk the stack safely.
1597              
1598             =cut
1599              
1600             sub xstack_max {
1601             local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ );
1602             return $#{$XFD::cur_self->{XStack}};
1603             }
1604              
1605             =back
1606              
1607             =cut
1608              
1609             ##
1610             ## A little help for debugging
1611             ##
1612             {
1613             ## This allows us to number events as the arrive and then
1614             ## look those numbers up using their memory addresses
1615             ## This is a 1..N numbering system: the left hand side of the
1616             ## ||= preallocates the key, so there's already 1 key in the
1617             ## hash when this is first called.
1618             ## We also hold on to the events to keep them from reusing numbers.
1619             my %events;
1620             sub _ev($) {
1621             return (
1622             $events{int $_[0]} ||=
1623             { ctx => $_, ev => 0+keys %events }
1624             )->{ev};
1625             }
1626             }
1627              
1628             {
1629             ## This allows us to number events as the arrive and then
1630             ## look those numbers up using their memory addresses
1631             ## This is a 1..N numbering system: the left hand side of the
1632             ## ||= preallocates the key, so there's already 1 key in the
1633             ## hash when this is first called.
1634             my %postponements;
1635             sub _po($) {
1636             return (
1637             $postponements{int $_[0]} ||=
1638             { ctx => $_, po => 0+keys %postponements}
1639             )->{po};
1640             }
1641             }
1642              
1643             ##
1644             ## SAX handlers
1645             ##
1646              
1647             ## a helper called by the handlers...
1648             ## TODO: optimize most of this away for all events except start_document and
1649             ## start_ element, since these are the only two events that can have
1650             ## child events. The others should be able to get away with much simpler logic.
1651              
1652             sub _call_queued_subs {
1653             my $self = shift ;
1654             my $event_type = shift;
1655              
1656             local $XFD::cur_self = $self;
1657              
1658             $XFD::ctx->{EventType} = $event_type;
1659             $XFD::ctx->{Node} = $_[0];
1660             $XFD::ctx->{HighScore} = -1;
1661              
1662             for ( @{$XFD::ctx->{$event_type}} ) {
1663             $_->[0]->( @{$_}[1..$#$_], @_ );
1664             }
1665              
1666             $self->_queue_pending_event( $XFD::ctx );
1667             }
1668              
1669              
1670             sub _call_queued_end_subs {
1671             my $self = shift ;
1672             my $event_type = shift;
1673             my $start_ctx = shift;
1674              
1675             local $XFD::cur_self = $self;
1676              
1677             local $XFD::ctx = $self->_build_end_ctx_from_start_ctx( $start_ctx );
1678             $XFD::ctx->{EventType} = $event_type;
1679             $XFD::ctx->{Node} = $_[0];
1680             $XFD::ctx->{HighScore} = -1;
1681              
1682             my $i = 0;
1683             if ( exists $start_ctx->{EndSubs} ) {
1684             ## EndSubs are subroutines that are placed in a start_ context
1685             ## to be run when the matching end_ event is reached. Their
1686             ## purpose is usually to evaluate some postponement and queue
1687             ## an action for that postponement if need be.
1688             my $end_subs = $start_ctx->{EndSubs};
1689             while ( @$end_subs ) {
1690             local $_ = pop @$end_subs;
1691             emit_trace_SAX_message "EventPath: *** calling EndSub ", $i++, " for event ", _ev $start_ctx, " ***" if is_tracing;
1692              
1693             $_->[0]->( @{$_}[1..$#$_], @_ )
1694             }
1695             }
1696              
1697             # $start_ctx->{EndContext} = $XFD::ctx;
1698              
1699             $self->_queue_pending_event( $XFD::ctx );
1700             }
1701              
1702             sub _execute_next_action {
1703             my $self = shift;
1704              
1705             my $actions = $XFD::ctx->{PendingActions};
1706             my $key = ( sort { $a <=> $b } keys %$actions)[-1];
1707             return unless defined $key;
1708              
1709             emit_trace_SAX_message "EventPath: *** executing action $key for event ", _ev $XFD::ctx, " ***" if is_tracing;
1710             my $sub = shift @{$actions->{$key}};
1711             delete $actions->{$key} unless @{$actions->{$key}};
1712              
1713             $self->{LastHandlerResult} = $sub->( $self, $XFD::ctx->{Node} );
1714              
1715             emit_trace_SAX_message "EventPath: result set to ", defined $self->{LastHandlerResult} ? "'$self->{LastHandlerResult}'" : "undef" if is_tracing;
1716              
1717             # if ( exists $XFD::ctx->{EndContext} && exists $XFD::ctx->{EndSubs} ) {
1718             # my $i = 0;
1719             # while ( @{$XFD::ctx->{EndSubs}} ) {
1720             # local $_ = shift @{$XFD::ctx->{EndSubs}};
1721             # emit_trace_SAX_message "EventPath: *** calling delayed (due to postponed start_) EndSub ", $i++, " for event ", _ev $XFD::ctx, " ***" if is_tracing;
1722             # local $XFD::ctx = $XFD::ctx->{EndContext};
1723             #
1724             # $_->[0]->( @{$_}[1..$#$_], @_ )
1725             # }
1726             # }
1727              
1728             }
1729              
1730              
1731             sub _queue_pending_event {
1732             my $self = shift;
1733             local $XFD::cur_self = $self;
1734              
1735             my ( $ctx ) = @_;
1736              
1737             # if ( exists $ctx->{PendingActions}
1738             # || ( $ctx->{Postponements} && @{$ctx->{Postponements}} )
1739             # ) {
1740             # emit_trace_SAX_message "EventPath: queuing pending event ", _ev $ctx if is_tracing;
1741             push @{$self->{PendingEvents}}, $ctx;
1742             # }
1743             # else {
1744             # emit_trace_SAX_message "EventPath: not queuing event ", _ev $ctx if is_tracing;
1745             # }
1746              
1747             while ( @{$self->{PendingEvents}}
1748             && ! $self->{PendingEvents}->[0]->{PostponementCount}
1749             ) {
1750             my $c = shift @{$self->{PendingEvents}};
1751              
1752             if ( show_buffer_highwater
1753             && @{$self->{PendingEvents}}
1754             && @{$self->{PendingEvents}} >= $self->{BufferHighwater}
1755             ) {
1756             $self->{BufferHighwater} = @{$self->{PendingEvents}};
1757             if ( @{$self->{PendingEvents}} > $self->{BufferHighwater} ) {
1758             @{$self->{BufferHighwaterEvents}} = ( $c );
1759             }
1760             else {
1761             push @{$self->{BufferHighwaterEvents}}, $c;
1762             }
1763             }
1764              
1765             if (
1766             substr( $c->{EventType}, 0, 4 ) ne "end_"
1767             # substr( $c->{EventType}, 0, 6 ) eq "start_"
1768             # || $c->{EventType} eq "attribute"
1769             ) {
1770             push @{$self->{XStackMarks}}, scalar @{$self->{XStack}};
1771             }
1772              
1773             if ( exists $c->{PendingActions} ) {
1774             ## The "-1" here implements the "last match wins" logic.
1775             ## All rules are evaluated in order; the last rule to evaluate
1776             ## queues its action last. TODO: test this in the face of
1777             ## precursors; actions may need to be set based on action
1778             ## numbers or something.
1779             local $XFD::ctx = $c;
1780             $self->_execute_next_action;
1781             }
1782             else {
1783             emit_trace_SAX_message "EventPath: discarding event ", _ev $c if is_tracing;
1784             }
1785              
1786             if (
1787             # $c->{EventType} eq "end_element"
1788             ## Note that we don't unwind on end_document. Perhaps we should.
1789             # || $c->{EventType} eq "attribute"
1790             substr( $c->{EventType}, 0, 6 ) ne "start_"
1791             ) {
1792             my $level = pop @{$self->{XStackMarks}};
1793             if ( $level < @{$self->{XStack}} ) {
1794             emit_trace_SAX_message "EventPath: unwinding from xstack: ", splice @{$self->{XStack}}, $level if is_tracing;
1795             splice @{$self->{XStack}}, $level;
1796             }
1797             }
1798             }
1799              
1800             emit_trace_SAX_message
1801             "EventPath: ",
1802             @{$self->{PendingEvents}} . " events pending (",
1803             join( ", ",
1804             map
1805             $_->{PostponementCount}
1806             ? _ev( $_ ).":$_->{PostponementCount}:".(
1807             exists $_->{PendingActions}
1808             ? "action" ## TODO: dump actions?
1809             : ""
1810             )
1811             : (),
1812             @{$self->{PendingEvents}}
1813             ),
1814             ")" if is_tracing;
1815             }
1816              
1817              
1818             sub _build_ctx {
1819             my $self = shift;
1820              
1821             my $parent_ctx = $self->{CtxStack}->[-1];
1822             my $ctx = { %{$parent_ctx->{ChildCtx}} };
1823             $ctx->{Vars} = { %{$parent_ctx->{Vars}} }
1824             if exists $parent_ctx->{Vars};
1825              
1826             if ( exists $ctx->{EndSubs} ) {
1827             ## When descendant-or-self:: queues up cloned postponements
1828             ## for the child contexts, the child contexts don't exist yet
1829             ## so it puts undef where they should be. This loop replaces
1830             ## those undefs with the freshly minted context. Only
1831             ## descendant-or-self:: does this, so we can assume those are
1832             ## the only kind of EndSubs we'll find.
1833             $ctx->{EndSubs} = [ map [ @$_ ], @{$ctx->{EndSubs}} ];
1834             for ( @{$ctx->{EndSubs}} ) {
1835             die "The first param to the child's EndSubs should be undef not $_->[1]"
1836             if defined $_->[1];
1837             $_->[1] = $ctx;
1838             }
1839             }
1840             emit_trace_SAX_message "EventPath: built event ", _ev $ctx, " (from event ", _ev $parent_ctx, ")" if is_tracing;
1841             return $ctx;
1842             }
1843              
1844              
1845             sub _build_end_ctx_from_start_ctx {
1846             my $self = shift;
1847              
1848             my ( $start_ctx ) = @_;
1849              
1850             ## The $start_ctx's actions have yet to run. They may
1851             ## add actions to the end event.
1852             $start_ctx->{PendingEndActions} ||= {};
1853              
1854             my $end_ctx = {
1855             PendingActions => $start_ctx->{PendingEndActions},
1856             };
1857             emit_trace_SAX_message "EventPath: built end_ event ", _ev $end_ctx, " (from event ", _ev $start_ctx, ")" if is_tracing;
1858             return $end_ctx;
1859             }
1860              
1861              
1862             sub start_document {
1863             my $self = shift ;
1864              
1865             $self->{XStack} = [];
1866             $self->{XStackMarks} = [];
1867             delete $self->{DocStartedFlags};
1868             $self->{PendingEvents} = [];
1869              
1870             if ( $self->{DocSub} ) {
1871             ## The "[]" is the postponement record to pass in.
1872             @{$self->{DocCtx}->{start_document}} = [ $self->{DocSub}, $self, [] ];
1873             }
1874              
1875              
1876             if ( show_buffer_highwater ) {
1877             $self->{BufferHighwater} = 0;
1878             $self->{BufferHighwaterEvents} = [];
1879             }
1880              
1881             local $XFD::ctx = $self->{DocCtx};
1882             emit_trace_SAX_message "EventPath: using doc event ", _ev $XFD::ctx if is_tracing;
1883             $self->{CtxStack} = [ $XFD::ctx ];
1884             $self->_call_queued_subs( "start_document", @_ );
1885              
1886             return;
1887             }
1888              
1889              
1890             sub end_document {
1891             my $self = shift ;
1892             my ( $doc ) = @_;
1893              
1894             confess "Bug: context stack should not be empty!"
1895             unless @{$self->{CtxStack}};
1896              
1897             my $start_ctx = pop @{$self->{CtxStack}};
1898             die "end_document() mismatch: ",
1899             defined $start_ctx ? $start_ctx->{EventType} : "undef",
1900             " from the context stack\n"
1901             unless $start_ctx->{EventType} eq "start_document";
1902              
1903             confess "Bug: context stack should be empty!"
1904             unless ! @{$self->{CtxStack}};
1905              
1906             $self->_call_queued_end_subs( end_document => $start_ctx, @_ );
1907              
1908             if ( exists $self->{AutoStartedHandlers} ) {
1909             for ( reverse @{$self->{AutoStartedHandlers}} ) {
1910             $self->{LastHandlerResult} = $_->end_document( {} );
1911             }
1912             }
1913              
1914             @{$self->{XStack}} = ();
1915              
1916             if ( show_buffer_highwater ) {
1917             warn ref $self,
1918             " buffer highwater mark was ",
1919             $self->{BufferHighwater} + 1,
1920             $self->{BufferHighwater}
1921             ? (
1922             " for event",
1923             @{$self->{BufferHighwaterEvents}} > 1
1924             ? "s"
1925             : (),
1926             ":\n",
1927             map {
1928             my $n = $_->{Node};
1929             join( "",
1930             " $_->{EventType}",
1931             defined $n->{Name}
1932             ? ( " ", $n->{Name} )
1933             : (),
1934             defined $n->{Data}
1935             ? ( " \"",
1936             length $n->{Data} > 40
1937             ? ( substr( $n->{Data}, 0, 40 ), "..." )
1938             : $n->{Data},
1939             "\""
1940             )
1941             : (),
1942             "\n"
1943             );
1944             } @{$self->{BufferHighwaterEvents}}
1945             )
1946             : ( " (no events were buffered)\n" );
1947             @{$self->{BufferHighwaterEvents}} = ();
1948             }
1949              
1950             return $self->{LastHandlerResult};
1951             }
1952              
1953              
1954             sub start_element {
1955             my $self = shift ;
1956             my ( $elt ) = @_ ;
1957              
1958             push @{$self->{CtxStack}}, local $XFD::ctx = $self->_build_ctx;
1959              
1960             {
1961             local $XFD::cur_self = $self;
1962              
1963             $XFD::ctx->{EventType} = "start_element";
1964             $XFD::ctx->{Node} = $_[0];
1965             $XFD::ctx->{HighScore} = -1;
1966              
1967             for ( @{$XFD::ctx->{start_element}} ) {
1968             $_->[0]->( @{$_}[1..$#$_], @_ );
1969             }
1970              
1971             $self->{start_elementSub}->( $self, [] )
1972             if $self->{start_elementSub};
1973              
1974             $self->_queue_pending_event( $XFD::ctx );
1975             }
1976              
1977             if (
1978             (
1979             $self->{attributeSub}
1980             || exists $XFD::ctx->{ChildCtx}->{attribute} ## Any attr handlers?
1981             )
1982             && exists $elt->{Attributes} ## Any attrs?
1983             ) {
1984             $XFD::ctx->{ChildCtx} ||= {};
1985              
1986             ## Put attrs in a reproducible order. perl5.6.1 and perl5.8.0
1987             ## use different hashing algs, this helps keep code stable
1988             ## across versions.
1989             my @attrs = values %{$elt->{Attributes}};
1990             @attrs = sort {
1991             ( $a->{Name} || "" ) cmp ( $b->{Name} || "" )
1992             } @attrs if $self->{SortAttributes};
1993              
1994             for my $attr ( @attrs ) {
1995             local $XFD::ctx = $self->_build_ctx;
1996              
1997             $XFD::ctx->{EventType} = "attribute";
1998             $XFD::ctx->{Node} = $attr;
1999             $XFD::ctx->{HighScore} = -1;
2000              
2001             for ( @{$XFD::ctx->{attribute}} ) {
2002             $_->[0]->( @{$_}[1..$#$_], @_ );
2003             }
2004              
2005             $self->{attributeSub}->( $self, [] )
2006             if $self->{attributeSub};
2007              
2008             $self->_queue_pending_event( $XFD::ctx );
2009             }
2010             }
2011              
2012             return;
2013             }
2014              
2015              
2016             sub end_element {
2017             my $self = shift ;
2018             my ( $elt ) = @_ ;
2019              
2020             my $start_ctx = pop @{$self->{CtxStack}}; # Remove the child context
2021              
2022             $self->_call_queued_end_subs( end_element => $start_ctx, @_ );
2023              
2024             return;
2025             }
2026              
2027              
2028             sub start_prefix_mapping {
2029             my $self = shift ;
2030             my ( $elt ) = @_ ;
2031              
2032             ## Prefix mappings aren't containers, but they need to
2033             ## have contexts saved and restored in order like containers.
2034             ## So we have a stack within a stack to take care of them.
2035             push @{$self->{CtxStack}->[-1]->{PrefixContexts}},
2036             local $XFD::ctx = $self->_build_ctx;
2037              
2038             {
2039             local $XFD::cur_self = $self;
2040              
2041             $XFD::ctx->{EventType} = "start_prefix_mapping";
2042             $XFD::ctx->{Node} = $_[0];
2043             $XFD::ctx->{HighScore} = -1;
2044              
2045             for ( @{$XFD::ctx->{start_prefix_mapping}} ) {
2046             $_->[0]->( @{$_}[1..$#$_], @_ );
2047             }
2048              
2049             $self->{start_prefix_mappingSub}->( $self, [] )
2050             if $self->{start_prefix_mappingSub};
2051              
2052             $self->_queue_pending_event( $XFD::ctx );
2053             }
2054              
2055             return;
2056             }
2057              
2058              
2059             sub end_prefix_mapping {
2060             my $self = shift ;
2061             my ( $elt ) = @_ ;
2062              
2063             my $start_ctx = pop @{$self->{CtxStack}->[-1]->{PrefixContexts}};
2064              
2065             $self->_call_queued_end_subs( end_prefix_mapping => $start_ctx, @_ );
2066              
2067             return;
2068             }
2069              
2070              
2071             compile_missing_methods __PACKAGE__, <<'CODE_END', sax_event_names;
2072             #line 1 XML::Filter::Dispatcher::()
2073             sub {
2074             my $self = shift ;
2075             return unless (
2076             @{$self->{CtxStack}}
2077             && $self->{CtxStack}->[-1]->{ChildCtx}->{}
2078             )
2079             || $self->{Sub};
2080              
2081             my ( $data ) = @_;
2082              
2083             local $XFD::cur_self = $self;
2084              
2085             local $XFD::ctx = $self->_build_ctx;
2086              
2087             $XFD::ctx->{EventType} = "";
2088             $XFD::ctx->{Node} = $data;
2089             $XFD::ctx->{HighScore} = -1;
2090              
2091             for ( @{$XFD::ctx->{}} ) {
2092             $_->[0]->( @{$_}[1..$#$_], @_ );
2093             }
2094              
2095             $self->{Sub}->( $self, [] )
2096             if $self->{Sub};
2097              
2098             $self->_queue_pending_event( $XFD::ctx );
2099              
2100             $self->_call_queued_end_subs( @_ ) if $XFD::ctx->{EndSubs};
2101              
2102             return undef;
2103             }
2104             CODE_END
2105              
2106             =head2 Notes for XPath Afficianados
2107              
2108             This section assumes familiarity with XPath in order to explain some of
2109             the particulars and side effects of the incremental XPath engine.
2110              
2111             =over
2112              
2113             =item *
2114              
2115             Much of XPath's power comes from the concept of a "node set". A node
2116             set is a set of nodes returned by many XPath expressions.
2117             Event XPath fires a rule once for each node the rule applies to. If there
2118             is a location path in the expression, the rule will fire once for each
2119             matching event (perhaps twice if both start and end SAX events are
2120             trapped, see XXXX below.
2121              
2122             Expressions like C<0>, C, C<1>, and C<'a'> have no location
2123             path and apply to all nodes (including namespace nodes and processing
2124             instructions).
2125              
2126             =item *
2127              
2128             The XPath parser catches some simple mistakes Perlers might make in typing
2129             XPath expressions, such as using C<&&> or C<==> instead of C or C<=>.
2130              
2131             =item *
2132              
2133             SAX does not define events for attributes; these are passed in to the
2134             start_element (but not end_element) methods as part of the element node.
2135             XML::Filter::Dispatcher emulates an event for each attribute in order to
2136             allow selecting attribute nodes.
2137              
2138             =item *
2139              
2140             Axes in path steps (/foo::...)
2141              
2142             Only some axes can be reasonably supported within a SAX framework without
2143             building a DOM and/or queueing SAX events for in-document-order delivery.
2144              
2145             On the other hand, lots of SAX-specific Axes are supported.
2146              
2147             =item *
2148              
2149             text node aggregation
2150              
2151             SAX does not guarantee that C events will be aggregated as
2152             much as possible, as text() nodes do in XPath. Generally, however,
2153             this is not a problem; instead of writing
2154              
2155             "quotation/text()" => sub {
2156             ## BUG: may be called several times within each quotation elt.
2157             my $self = shift;
2158             print "He said '", $self->current_node->{Data}, "'\n'";
2159             },
2160              
2161             write
2162              
2163             "string( quotation )" => sub {
2164             my $self = shift;
2165             print "He said '", xvalue, "'\n'";
2166             },
2167              
2168             The former is unsafe; consider the XML:
2169              
2170             I am GREAT!
2171              
2172             Rules like C<.../text()> will fire twice, which is not what is needed
2173             here.
2174              
2175             Rules like C will fire once, at the end_element event,
2176             with all descendant text of quotation as the expression result.
2177              
2178             You can also place an L
2179             instance upstream of XML::Filter::Dispatcher if you really want to use
2180             the former syntax (but the C example will still generate more
2181             than one event due to the comment).
2182              
2183             =item *
2184              
2185             Axes
2186              
2187             All axes are implemented except for those noted below as "todo" or "not
2188             soon".
2189              
2190             Also except where noted, axes have a principal event type of
2191             C. This node type is used by the C<*> node type test.
2192              
2193             Note: XML::Filter::Dispatcher tries to die() on nonsensical paths like
2194             C or C, but it may miss some.
2195             This is meant to help in debugging user code; the eventual goal is to
2196             catch all such nonsense.
2197              
2198             =over
2199              
2200             =item *
2201              
2202             ancestor:: (XPath, todo, will be limited)
2203              
2204             =item *
2205              
2206             ancestor-or-self:: (XPath, todo, will be limited)
2207              
2208             =item *
2209              
2210             C (XPath, C)
2211              
2212             =item *
2213              
2214             C (XPath)
2215              
2216             Selects start_element, end_element, start_prefix_mapping,
2217             end_prefix_mapping, characters, comment, and
2218             processing_instruction events that are direct "children" of the context
2219             element or document.
2220              
2221             =item *
2222              
2223             C (XPath)
2224              
2225             =item *
2226              
2227             C (XPath)
2228              
2229             =item *
2230              
2231             C (SAX, C)
2232              
2233             Like C, but selects the C event of the
2234             element context node.
2235              
2236             This is usually used in preference to C due to its
2237             brevity.
2238              
2239             Because this selects the end element event, most of the path tests that
2240             may follow other axes are not valid following this axis. self:: and
2241             attribute:: are the only legal axes that may occur to the right of this
2242             axis.
2243              
2244             =item *
2245              
2246             C (SAX, C)
2247              
2248             Like C, but selects the C event of the document
2249             context node.
2250              
2251             Note: Because this selects the end document event, most of the path tests
2252             that may follow other axes are not valid following this axis.
2253             self:: are the only legal axes that may occur to the
2254             right of this axis.
2255              
2256             =item *
2257              
2258             C (SAX, C)
2259              
2260             B. This axis is not necessary given C.
2261              
2262             Like C, but selects the C event of the element
2263             context node. This is like C, but different from
2264             C.
2265              
2266             Note: Because this selects the end element event, most of the path tests
2267             that may follow other axes are not valid following this axis.
2268             attribute:: and self:: are the only legal axes that may occur to the
2269             right of this axis.
2270              
2271             =item *
2272              
2273             C (XPath, B)
2274              
2275             =item *
2276              
2277             C (XPath, B)
2278              
2279             Implementing following axes will take some fucky postponement logic and
2280             are likely to wait until I have time. Until then, setting a flag in
2281             $self in one handler and checking in another should suffice for most
2282             uses.
2283              
2284             =item *
2285              
2286             C (XPath, C, B)
2287              
2288             =item *
2289              
2290             C (XPath, B)
2291              
2292             parent/ancestor paths will not allow you to descend the tree, that would
2293             require DOM building and SAX event queueing.
2294              
2295             =item *
2296              
2297             C (XPath, B)
2298              
2299             =item *
2300              
2301             C (XPath, B)
2302              
2303             Implementing reverse axes will take some fucky postponement logic and
2304             are likely to wait until I have time. Until then, setting a flag in
2305             $self in one handler and checking in another should suffice for most
2306             uses.
2307              
2308             =item *
2309              
2310             C (XPath)
2311              
2312             =item *
2313              
2314             C (SAX, C )
2315              
2316             This is like child::, but selects the C events. This is
2317             usually used in preference to C due to its brevity.
2318              
2319             C is rarely used to drive code handlers because rules that
2320             match document or element events already only fire code handlers on the
2321             C event and not the C event (however, when a
2322             SAX handler is used, such expressions send both start and end events to
2323             the downstream handler, so start:: has utility there).
2324              
2325             =item *
2326              
2327             C (SAX, C)
2328              
2329             B. This axis is confusing compared to and
2330             C, and is not necessary given C.
2331              
2332             This is like C, but selects only the C events.
2333              
2334             =item *
2335              
2336             C (SAX, C)
2337              
2338             B. This axis is not necessary given C.
2339              
2340             This is like C, but selects only the C events.
2341              
2342             =back
2343              
2344             =item *
2345              
2346             Implemented XPath Features
2347              
2348             Anything not on this list or listed as unimplemented is a TODO. Ring me
2349             up if you need it.
2350              
2351             =over
2352              
2353             =item *
2354              
2355             String Functions
2356              
2357             =over
2358              
2359             =item *
2360              
2361             concat( string, string, string* )
2362              
2363             =item *
2364              
2365             contains( string, string )
2366              
2367             =item *
2368              
2369             normalize-space( string? )
2370              
2371             C is equivalent to C.
2372              
2373             =item *
2374              
2375             starts-with( string, string )
2376              
2377             =item *
2378              
2379             string(), string( object )
2380              
2381             Object may be a number, boolean, string, or the result of a location path:
2382              
2383             string( 10 );
2384             string( /a/b/c );
2385             string( @id );
2386              
2387             C is equivalent to C.
2388              
2389             =item *
2390              
2391             string-length( string? )
2392              
2393             string-length() not supported; can't stringify the context node without
2394             keeping all of the context node's children in mempory. Could enable it
2395             for leaf nodes, I suppose, like attrs and #PCDATA containing elts. Drop
2396             me a line if you need this (it's not totally trivial or I'd have done it).
2397              
2398             =item *
2399              
2400             substring( string, number, number? )
2401              
2402             =item *
2403              
2404             substring-after( string, string )
2405              
2406             =item *
2407              
2408             substring-before( string, string )
2409              
2410             =item *
2411              
2412             translate( string, string, string )
2413              
2414             =back
2415              
2416             =item *
2417              
2418             Boolean Functions, Operators
2419              
2420             =over
2421              
2422             =item *
2423              
2424             boolean( object )
2425              
2426             See notes about node sets for the string() function above.
2427              
2428             =item *
2429              
2430             false()
2431              
2432             =item *
2433              
2434             lang( string ) B.
2435              
2436             =item *
2437              
2438             not( boolean )
2439              
2440             =item *
2441              
2442             true()
2443              
2444             =back
2445              
2446             =item *
2447              
2448             Number Functions, Operators
2449              
2450             =over
2451              
2452             =item *
2453              
2454             ceil( number )
2455              
2456             =item *
2457              
2458             floor( number )
2459              
2460             =item *
2461              
2462             number( object? )
2463              
2464             Converts strings, numbers, booleans, or the result of a location path
2465             (C).
2466              
2467             Unlike real XPath, this dies if the object cannot be cleanly converted
2468             in to a number. This is due to Perl's varying level of support for NaN,
2469             and may change in the future.
2470              
2471             C is equivalent to C.
2472              
2473             =item *
2474              
2475             round ( number )
2476              
2477             =item * sum( node-set ) B.
2478              
2479             =back
2480              
2481             =item *
2482              
2483             Node Set Functions
2484              
2485             Many of these cannot be fully implemented in an event oriented
2486             environment.
2487              
2488             =over
2489              
2490             =item *
2491              
2492             last() B.
2493              
2494             =item *
2495              
2496             position() B.
2497              
2498             =item *
2499              
2500             count( node-set ) B.
2501              
2502             =back
2503              
2504             =item *
2505              
2506             id( object ) B.
2507              
2508             =item *
2509              
2510             local-name( node-set? )
2511              
2512             =item *
2513              
2514             namespace-uri( node-set? )
2515              
2516             =item *
2517              
2518             name( node-set? )
2519              
2520             =item *
2521              
2522             All relational operators
2523              
2524             No support for nodesets, though.
2525              
2526             =item *
2527              
2528             All logical operators
2529              
2530             Supports limited nodesets, see the string() function description for details.
2531              
2532             =back
2533              
2534             =item *
2535              
2536             Missing Features
2537              
2538             Some features are entirely or just currently missing due to the lack of
2539             nodesets or the time needed to work around their lack. This is an
2540             incomplete list; it's growing as I find new things not to implement.
2541              
2542             =over
2543              
2544             =item *
2545              
2546             count()
2547              
2548             No nodesets => no count() of nodes in a node set.
2549              
2550             =item *
2551              
2552             last()
2553              
2554             With SAX, you can't tell when you are at the end of what would be a node set
2555             in XPath.
2556              
2557             =item *
2558              
2559             position()
2560              
2561             I will implement pieces of this as I can. None are implemented as yet.
2562              
2563             =back
2564              
2565             =item *
2566              
2567             Todo features
2568              
2569             =over
2570              
2571             =item *
2572              
2573             id()
2574              
2575             =item *
2576              
2577             lang()
2578              
2579             =item *
2580              
2581             sum( node-set )
2582              
2583             =back
2584              
2585             =item *
2586              
2587             Extensions
2588              
2589             =over
2590              
2591             =item *
2592              
2593             is-start-event(), is-end-event()
2594              
2595             XPath has no concept of time; it's meant to operate on a tree of nodes. SAX
2596             has C and C events and C and
2597             C events.
2598              
2599             By default, XML::Filter::Dispatcher acts on start events and not end events
2600             (note that all rules are evaluated on both, but the actions are not run on end_
2601             events by default).
2602              
2603             By including a call to the C or C functions in a
2604             predicate the rule may be forced to fire only on end events or on both start
2605             and end events (using a C<[is-start-event() or is-end-event()]> idiom).
2606              
2607             =back
2608              
2609             =back
2610              
2611             =head1 TODO
2612              
2613             =over
2614              
2615             =item *
2616              
2617             Namespace support.
2618              
2619             =item *
2620              
2621             Text node aggregation so C handlers fire once per text node
2622             instead of once per C event.
2623              
2624             =item *
2625              
2626             Nice messages on legitimate but unsupported axes.
2627              
2628             =item *
2629              
2630             /../ (parent node)
2631              
2632             =item *
2633              
2634             C, C, C methods.
2635              
2636             =back
2637              
2638             =head1 OPTIMIZING
2639              
2640             Pass Assume_xvalue => 0 flag to tell X::F::D not to support xvalue
2641             and xvalue_type, which lets it skip some instructions and run faster.
2642              
2643             Pass SortAttributes => 0 flag to prevent calling sort() for each
2644             element's attributes (note that Perl changes hashing algorithms
2645             occasionally, so setting this to 0 may expose ordering dependancies
2646             in your code).
2647              
2648             =head1 DEBUGGING
2649              
2650             NOTE: this section describes things that may change from version to
2651             version as I need different views in to the internals.
2652              
2653             Set the option Debug => 1 to see the Perl code for the compiled ruleset.
2654             If you have GraphViz.pm and ee installed and working, set Debug => 2 to
2655             see a graph diagram of the intermediate tree generated by the compiler.
2656              
2657             Set the env. var XFDSHOWBUFFERHIGHWATER=1 to see what events were
2658             postponed the most (in terms of how many events had to pile up behind
2659             them). This can be of some help if you experience lots of buffering or
2660             high latency through the filter. Latency meaning the lag between when
2661             an event arrives at this filter and when it is dispatched to its
2662             actions. This will only report events that were actually postponed. If
2663             you have a 0 latency filter, the report will list no events.
2664              
2665             Set the env. var XFDOPTIMIZE=0 to prevent all sorts of optimizations.
2666              
2667             =head1 LIMITATIONS
2668              
2669             =over
2670              
2671             =item *
2672              
2673             NaN is not handled properly due to mediocre support in C,
2674             especially across some platforms that it apparently isn't easily supported on.
2675              
2676             =item *
2677              
2678             -0 (negative zero) is not provided or handled properly
2679              
2680             =item *
2681              
2682             +/- Infinity is not handled properly due to mediocre support in C,
2683             especially across some platforms that it apparently isn't easily supported on.
2684              
2685             =back
2686              
2687             This is more of a frustration than a limitation, but this class requires that
2688             you pass in a type when setting variables (in the C ctor parameter or
2689             when calling C). This is so that the engine can tell what type a
2690             variable is, since string(), number() and boolean() all treat the Perlian C<0>
2691             differently depending on its type. In Perl the digit C<0> means C,
2692             C<0> or C<'0'>, depending on context, but it's a consistent semantic. When
2693             passing a C<0> from Perl lands to XPath-land, we need to give it a type so that
2694             C can, for instance, decide whether to convert it to C<'0'> or
2695             C<'false'>.
2696              
2697             =head1 THANKS
2698              
2699             ...to Kip Hampton, Robin Berjon and Matt Sergeant for sanity checks and
2700             to James Clark (of Expat fame) for posting a Yacc XPath grammar where
2701             I could snarf it years later and add lots of Perl code to it.
2702              
2703             =head1 AUTHOR
2704              
2705             Barrie Slaymaker
2706              
2707             =head1 COPYRIGHT
2708              
2709             Copyright 2002, Barrie Slaymaker, All Rights Reserved.
2710              
2711             You may use this module under the terms of the Artistic or GNU Pulic
2712             licenses your choice. Also, a portion of XML::Filter::Dispatcher::Parser
2713             is covered by:
2714              
2715             The Parse::Yapp module and its related modules and shell scripts are
2716             copyright (c) 1998-1999 Francois Desarmenien, France. All rights
2717             reserved.
2718              
2719             You may use and distribute them under the terms of either the GNU
2720             General Public License or the Artistic License, as specified in the
2721             Perl README file.
2722              
2723             Note: Parse::Yapp is only needed if you want to modify
2724             lib/XML/Filter/Dispatcher/Grammar.pm
2725              
2726             =cut
2727              
2728             1 ;