File Coverage

blib/lib/XML/Snap.pm
Criterion Covered Total %
statement 10 12 83.3
branch n/a
condition n/a
subroutine 4 4 100.0
pod n/a
total 14 16 87.5


line stmt bran cond sub pod time code
1             package XML::Snap;
2            
3 1     1   26799 use 5.006;
  1         3  
  1         57  
4 1     1   5 use strict;
  1         3  
  1         41  
5 1     1   5 use warnings FATAL => 'all';
  1         6  
  1         50  
6            
7 1     1   438 use XML::Parser;
  0            
  0            
8             use Scalar::Util qw(reftype refaddr);
9             use Carp;
10             #use Data::Dumper;
11            
12             =head1 NAME
13            
14             XML::Snap - Makes simple XML tasks a snap!
15            
16             =head1 VERSION
17            
18             Version 0.04
19            
20             =cut
21            
22             our $VERSION = '0.04';
23            
24            
25             =head1 SYNOPSIS
26            
27             XML::Snap is a quick and relatively modern way to work with XML. If, like me, you have little patience for the endless reams of standards
28             the XML community burdens you with, maybe this is the module for you. If you want to maintain compatibility with normal people, though, and you want
29             to avoid scaling problems later, you're probably better off sitting down and understanding XML::LibXML and the SAX ecosystem.
30            
31             The other large omission from the model at present is namespaces. If you use namespaces (and honestly, most applications do) then again, you
32             should be using libxml or one of the SAX parsers.
33            
34             Still here? Cool. XML::Snap is my personal way of dealing with XML when I can't avoid it. It's roughly based on my experiences with my ANSI C
35             library "xmlapi", which I wrote back in 2000 to wrap the Expat parser. Along the way, I ended up building a lot of handy functionality into the
36             library that made C programming palatable - and a lot of that was string and list manipulation that Perl renders superfluous. So after working
37             with a port for a while, I tossed it. This is what I ended up with.
38            
39             XML::Snap works in DOM mode. That is, it reads in XML from a string or file and puts it into a tree for you to manipulate, then allows
40             you to write it back out. The tree is pretty minimalistic. The children of a node can be either plain text (as strings) or elements (as XML::Snap
41             objects or a subclass), and each element can have a hash of attributes. Order of attributes is maintained, as this is actually significant in XML.
42             There is also a clear distinction between content and tags. So some of the drawbacks to XML::Simple are averted with this setup.
43            
44             Right at the moment, comments in the XML are not preserved. If you need to work with XML comments, XML::Snap is not your module.
45            
46             Right at the moment, a streaming mode (like SAX) is also not provided, but it's something I want to get to soon. In streaming mode, comments
47             I be preserved, but not available to the user until further notice. But since streaming has not yet been implemented, that's kind of moot.
48             Streaming will be implemented in a separate module, probably to be named XML::Skim.
49            
50             Some examples!
51            
52             use XML::Snap;
53            
54             XML::Snap->load ('myfile.xml');
55             my $query = XML::Snap->search ('mynode');
56             while (my $hit = <$query>) {
57             ... do things with $hit ...
58             }
59            
60             =head1 CREATING AND LOADING XML ELEMENTS
61            
62             =head2 new (name, [attribute, value, ...])
63            
64             The C function just creates a new, empty XML node, simple as that. It has a name and optional attributes with values.
65             Note that the order of attributes will be retained. Duplicates are not permitted (storage is in a hash); this departs from the XML
66             model so it might cause you troubles - but I know I've never personally encountered XML where it would make a difference.
67            
68             =cut
69            
70             sub new {
71             my ($class, $name) = @_;
72            
73             bless ({
74             name=>$name,
75             parent=>undef,
76             attrs=>[],
77             attrval=>{},
78             children=>[]}, $class);
79             }
80            
81             =head2 parse (string), parse_with_refs (string)
82            
83             The C function uses the Expat parser wrapped in XML::Parse to parse the string supplied, building a tree from it.
84             If you want text to be blessed scalar refs instead of just strings, use C. (This can be easier, depending
85             on what you're going to do with the data structure later.)
86            
87             =cut
88            
89             sub _prepare_parser {
90             my $s = shift;
91             XML::Parser->new (
92             Handlers => {
93             Start => sub {
94             my $p = shift;
95             my $elem = XML::Snap->new (shift);
96             $elem->set (@_);
97             $s->{output}->add ($elem) if defined $s->{output};
98             $s->{output} = $elem;
99             },
100             End => sub {
101             my ($p, $el) = @_;
102             my $parent = $s->{output}->parent;
103             $s->{output} = $parent if defined $parent and ref($parent);
104             },
105             Char => sub {
106             my ($p, $str) = @_;
107             $s->{output}->add($s->{refs} ? \$str : $str) if defined $s->{output}; # Note that plain text not enclosed in nodes will be lost. I'm OK with that.
108             }
109             }
110             );
111             }
112            
113             sub parse {
114             my ($whatever, $string) = @_;
115            
116             my $stash = {refs=>0};
117             my $parser = _prepare_parser($stash);
118             $parser->parse ($string);
119             return $stash->{output};
120             }
121             sub parse_with_refs {
122             my ($whatever, $string) = @_;
123            
124             my $stash = {refs=>1};
125             my $parser = _prepare_parser($stash);
126             $parser->parse ($string);
127             return $stash->{output};
128             }
129            
130             =head2 load (filename)
131            
132             The C function does the same as C but takes a filename instead.
133            
134             =cut
135            
136             sub load {
137             my ($whatever, $string) = @_;
138            
139             my $stash = {};
140             my $parser = _prepare_parser($stash);
141             $parser->parsefile ($string);
142             return $stash->{output};
143             }
144            
145             =head2 name, is
146            
147             The C method returns the name of the node, that is, the tag used to create it, while
148             the C method tests for equality to a given string (it's just a convenience function).
149            
150             =cut
151            
152             sub name { reftype($_[0]) eq 'HASH' ? $_[0]->{name} : '' }
153             sub is { reftype($_[0]) eq 'HASH' ? $_[0]->{name} eq $_[1] : 0 }
154            
155             use overload ('""' => sub { $_[0]->name . '(' . ref($_[0]) . ':' . refaddr($_[0]) . ')' },
156             '==' => sub { defined(refaddr($_[0])) and defined(refaddr($_[1])) and refaddr($_[0]) eq refaddr($_[1]) },
157             'eq' => sub { refaddr($_[0]) eq refaddr($_[1]) },
158             '!=' => sub { refaddr($_[0]) ne refaddr($_[1]) });
159            
160             =head2 oob(key, value), unoob(key)
161            
162             Sets/gets an out-of-band (OOB) value on a node. This isn't anything special, just a hash
163             attached to each node, but it can be used by a template output for parameterization,
164             and it doesn't affect the output or actions of the XML in any other way.
165            
166             If a value isn't set in a given node, it will ask its parent.
167            
168             Call C to remove an OOB value, or C to remove all OOB values on a node.
169            
170             =cut
171            
172             sub oob {
173             my ($self, $key, $value) = @_;
174            
175             $self->{oob}->{$key} = $value if defined $value;
176             $value = $self->{oob}->{$key};
177             return $value if defined $value;
178             return undef unless defined $self->{parent};
179             return $self->{parent}->oob($key);
180             }
181             sub unoob {
182             my ($self, $key) = @_;
183             if (defined $key) {
184             undef $self->{oob}->{$key};
185             } else {
186             undef $self->{oob};
187             }
188             }
189            
190             =head2 parent, ancestor, root
191            
192             C returns the node's parent, if it has been attached to a parent, while C finds the ancestor with the tag you supply, or the root if you
193             don't give a tag. C is provided as a shorthand for ancestor().
194            
195             =cut
196             sub parent { reftype($_[0]) eq 'HASH' ? $_[0]->{parent} : undef }
197             sub root { $_[0]->ancestor }
198             sub ancestor {
199             my ($self, $name) = @_;
200             my $p = $self->parent;
201             if (not defined $p) {
202             return $self if not defined $name;
203             return undef;
204             }
205             return $p if defined $name and $p->is($name);
206             return $p->ancestor($name);
207             }
208            
209             =head2 delete
210            
211             Deletes a child from a node. Pass the actual reference to the child - or if you're using non-referenced text, the text itself.
212             (In this case, duplicate text will all be deleted.)
213            
214             =cut
215            
216             sub delete {
217             my $self = shift;
218             my $child = shift;
219             my @children = $self->children;
220             my @new_list = grep {$_ != $child} $self->children;
221             $self->{children} = \@new_list;
222             }
223            
224             =head2 detach
225            
226             Detaches the node from its parent, if it is attached. This not only removes the parent reference, but also removes the child
227             from its parent's list of children.
228            
229             =cut
230            
231             sub detach {
232             my $self = shift;
233             return unless $self->{parent};
234             $self->{parent}->delete($self);
235             $self->{parent} = undef;
236             }
237            
238            
239             =head1 WORKING WITH ATTRIBUTES
240            
241             Each tag in XML can have zero or more attributes, each of which has a value. Order is significant and preserved.
242            
243             =head2 set, unset
244            
245             The C method sets one or more attributes; its parameter list is considered to be key, value, key, value, etc.
246             The C method removes one or more attributes from the list.
247            
248             =cut
249             sub set {
250             my $self = shift;
251            
252             my $value;
253             while (@_) {
254             my $key = shift;
255             $value = shift;
256             return $self->get($key) unless defined $value;
257             push @{$self->{attrs}}, $key if !grep {$_ eq $key} @{$self->{attrs}};
258             $self->{attrval}->{$key} = $value;
259             }
260             return $value;
261             }
262             sub unset {
263             my ($self, $key) = @_;
264             return unless defined $key;
265             my @attributes = grep {$_ ne $key} @{$self->{attrs}};
266             $self->{attrs} = \@attributes;
267             $self->{attrval}->{$key} = undef;
268             }
269            
270            
271             =head2 get (attribute, default), attr_eq (attribute, value)
272            
273             Obviously, C retrieves an attribute value - specify a default value to be used if the attribute is not found,
274             otherwise returns undef.
275            
276             Since it's inconvenient to test attributes that can be undefined, there's a C method that checks that
277             the given attribute is defined I equal to the value given.
278            
279             =cut
280            
281             sub get {
282             my $self = shift;
283             my $key = shift;
284             my $value = $self->{attrval}->{$key};
285             return $value if defined $value;
286             return shift;
287             }
288            
289             sub attr_eq {
290             my $self = shift;
291             my $key = shift;
292             my $value = $self->{attrval}->{$key};
293             return undef unless defined $value;
294             return 1 if $value eq shift;
295             }
296            
297             =head2 attrs (attribute list)
298            
299             The C method retrieves a list of the attributes set.
300            
301             =cut
302            
303             sub attrs { reftype($_[0]) eq 'HASH' ? @{$_[0]->{attrs}} : () }
304            
305             =head2 getlist (attribute list)
306            
307             The C method retrieves a list of attribute values given a list of attributes.
308             (It's just a map.)
309            
310             =cut
311            
312             sub getlist {
313             my $self = shift;
314             map { $self->get($_) } @_;
315             }
316            
317             =head2 getctx (attribute, default)
318            
319             The C method looks at an attribute in the given node, but if it's not found, looks in the parent instead.
320             If there is no parent, the default value is returned.
321            
322             =cut
323            
324             sub getctx {
325             my $self = shift;
326             my $key = shift;
327             my $default = shift;
328            
329             my $value = $self->get($key);
330             return $value if defined $value;
331             return $default unless defined $self->{parent};
332             $self->{parent}->getctx($key, $default);
333             }
334            
335             =head2 attr_order (attribute list)
336            
337             Moves the named attributes to the front of the list; if any appear that aren't set, they stay unset.
338            
339             =cut
340            
341             sub attr_order {
342             my $self = shift;
343            
344             my @list = @_;
345             foreach my $a (@{$self->{attrs}}) {
346             push @list, $a unless grep { $a eq $_ } @list;
347             }
348             $self->{attrs} = \@list;
349             }
350            
351             =head1 WORKING WITH PLAIN TEXT CONTENT
352            
353             Depending on your needs, XML::Snap can store plain text embedded in an XML structure as simple strings,
354             or as scalar references blessed to XML::Snap. Since text may therefore I be blessed, you need
355             to handle it with care unless you're sure it's all references (by parsing with C,
356             for instance).
357            
358             =head2 istext
359            
360             Returns a flag whether a given thing is text or not. "Text" means a scalar or a scalar reference;
361             anything else will not be considered text.
362            
363             This is a class method or an instance method - note that if you're using it as an instance method
364             and you try to call it on a string, your call will die.
365            
366             =cut
367            
368             sub istext {
369             my $thing = shift;
370             my $text = shift || $thing;
371             return 1 unless ref($text);
372             reftype ($text) eq 'SCALAR';
373             }
374            
375             =head2 gettext
376            
377             Returns the actual text of either a string (which is obviously just the string) or a scalar reference.
378             Again, can be called as an instance method if you're sure it's an instance.
379            
380             =cut
381            
382             sub gettext {
383             my $thing = shift;
384             my $text = shift || $thing;
385             return $text unless ref $text;
386             return undef unless reftype ($text) eq 'SCALAR';
387             return $$text;
388             }
389            
390             =head2 bless_text
391            
392             Iterates through the node given, and converts all plain texts into referenced texts.
393            
394             =cut
395            
396             sub _bless_text {
397             my $thing = shift;
398             return $thing if ref($thing);
399             my $r = \$thing;
400             bless $r, 'XML::Snap';
401             return $r;
402             }
403             sub bless_text {
404             my $self = shift;
405             my @children = map {_bless_text($_)} @{$self->{children}};
406             $self->{children} = \@children;
407             foreach my $child ($self->elements) {
408             $child->bless_text;
409             }
410             }
411            
412             =head2 unbless_text
413            
414             Iterates through the node given, and converts all referenced texts into plain texts.
415            
416             =cut
417            
418             sub _unbless_text {
419             my $thing = shift;
420             return $thing if not ref $thing;
421             return $thing unless reftype($thing) eq 'SCALAR';
422             return $$thing;
423             }
424             sub unbless_text {
425             my $self = shift;
426             my @children = map {_unbless_text($_)} @{$self->{children}};
427             $self->{children} = \@children;
428             foreach my $child ($self->elements) {
429             $child->bless_text;
430             }
431             }
432            
433             =head1 WORKING WITH XML STRUCTURE
434            
435             =head2 add, add_pretty
436            
437             The C method adds nodes and text as children to the current node. The C method is a convenience
438             method that ensures that there is a line break if a node is inserted directly at the beginning of its parent
439             (this makes building human-readable XML easier).
440            
441             In addition to nodes and text, you can also add a coderef. This will have no effect on normal operations except
442             for appearing in the list of children for the node, but during writing operations (either for string output or
443             to streams) the coderef will be called to retrieve an iterator that delivers XML snippets. Those snippets will be
444             inserted into the output as though they appeared at the point in the structure where the coderef appears.
445             Extraction from the iterator stops when it returns undef.
446            
447             The next time the writer is used, the original coderef will be called again to retrieve a new iterator.
448            
449             The writer functions (string, stringcontent, write, etc.) can be called with optional parameters that will be passed
450             to each coderef in the structure, if any. This allows an XML::Snap structure to be used as a generic template,
451             for example for writing XML structures extracted from database queries.
452            
453             When adding a node that is already a child of another node, the source node will be copied into the target, not just
454             added. (Otherwise confusion could ensue!)
455            
456             Text is normally added as a simple string, but this can cause problems for consumers, as the output of an
457             iterator might then return a mixture of unblessed strings and blessed nodes, so you end up having to test for
458             blessedness when processing them. For ease of use, you can also add a I to a string; it will work
459             the same in terms of neighboring strings being coalesced, but they'll be stored as blessed string references.
460             Then, use istext or is_node to determine what each element is when iterating through structure.
461            
462             =cut
463            
464             sub add {
465             my $self = shift;
466             foreach my $child (@_) {
467             my $r = ref $child;
468             if (!$r) {
469             my $last = ${$self->{children}}[-1];
470             if (defined $last and istext($last)) {
471             if (ref $last eq '') {
472             ${$self->{children}}[-1] = $last . $child;
473             } else {
474             $$last .= $child;
475             }
476             } else {
477             push @{$self->{children}}, $child;
478             }
479             } elsif ($r eq 'CODE') {
480             push @{$self->{children}}, $child;
481             } elsif ($r eq 'SCALAR') {
482             my $copy = $child;
483             bless $copy, ref $self;
484             my $last = ${$self->{children}}[-1];
485             if (defined $last and istext($last)) {
486             $$copy = gettext($last) . $$copy;
487             ${$self->{children}}[-1] = $copy;
488             } else {
489             push @{$self->{children}}, $copy;
490             }
491             } elsif ($child->can('parent')) {
492             $child = $child->copy if defined $child->parent;
493             $child->{parent} = $self;
494             push @{$self->{children}}, $child;
495             }
496             }
497             }
498             sub add_pretty {
499             my $self = shift;
500             $self->add ("\n") if (!@{$self->{children}});
501             foreach my $child (@_) {
502             $self->add ($child, "\n");
503             }
504             }
505            
506             =head2 prepend, prepend_pretty
507            
508             These do the same as C and C except at the beginning of the child list.
509            
510             =cut
511             sub prepend {
512             my $self = shift;
513             foreach my $child (reverse @_) {
514             my $r = ref $child;
515             if (!$r) {
516             my $first = ${$self->{children}}[0];
517             if (defined $first and istext($first)) {
518             if (ref $first eq '') {
519             ${$self->{children}}[0] = $child . $first;
520             } else {
521             $$first = $child . $$first;
522             }
523             } else {
524             unshift @{$self->{children}}, $child;
525             }
526             } elsif ($r eq 'CODE') {
527             unshift @{$self->{children}}, $child;
528             } elsif ($r eq 'SCALAR') {
529             my $copy = $child;
530             bless $copy, ref $self;
531             my $first = ${$self->{children}}[0];
532             if (defined $first and istext($first)) {
533             $$copy = $$copy . gettext($first);
534             ${$self->{children}}[0] = $copy;
535             } else {
536             unshift @{$self->{children}}, $copy;
537             }
538             } elsif ($child->can('parent')) {
539             $child = $child->copy if defined $child->parent;
540             $child->{parent} = $self;
541             unshift @{$self->{children}}, $child;
542             }
543             }
544             }
545             sub prepend_pretty {
546             my $self = shift;
547             $self->prepend ("\n") if (!@{$self->{children}});
548             foreach my $child (reverse @_) {
549             $self->prepend ("\n", $child);
550             }
551             }
552            
553             =head2 replacecontent, replacecontent_from
554            
555             The C method first deletes the node's children, then calls C to add its parameters.
556             Use C to use the I of the first parameter, with optional matches to effect
557             filtration as the rest of the parameters.
558            
559             These are holdovers from my old xmlapi C library, where I was using in-memory XML structures as
560             "bags of data". Since Perl is basically built on bags of data to start with, I'm not sure these will
561             ever get used in a real situation (certainly I've never needed them yet in Perl).
562            
563             =cut
564            
565             sub replacecontent {
566             my $self = shift;
567             $self->{children} = [];
568             $self->add(@_);
569             }
570             sub replacecontent_from {
571             my $self = shift;
572             my $from = shift;
573             $self->{children} = [];
574             $self->copy_from ($from, @_);
575             }
576            
577             =head2 replace
578            
579             The C method is a little odd; it actually acts on the given node's I, by replacing the callee
580             with the passed parameters. In other words, the parent's children list is modified directly. If there's nothing
581             provided as a replacement, this simply deletes the callee from its parent's child list.
582            
583             =cut
584            
585             sub replace {
586             my $self = shift;
587             my $parent = $self->{parent};
588             return unless $parent;
589             my @children = @{$parent->{children}};
590             my $index = 0;
591             my $count = scalar @children;
592             $index++ until $children[$index] == $self or $index == $count;
593             return if $index == $count;
594             splice @children, $index, 1, @_;
595             $parent->{children} = \@children;
596             }
597            
598            
599             =head2 children, elements
600            
601             The C method just returns the list of children added with C (or the other addition-type methods).
602             The C method returns only those children that are elements, omitting text, comments, and generators.
603            
604             =cut
605            
606             sub children { reftype($_[0]) eq 'HASH' ? @{$_[0]->{children}} : () }
607             sub elements { return () unless reftype($_[0]) eq 'HASH';
608             defined $_[1] ? grep { ref $_ && reftype($_) ne 'SCALAR' && $_->can('is') && $_->is($_[1]) } @{$_[0]->{children}}
609             : grep { ref $_ && reftype($_) ne 'SCALAR' && $_->can('parent') } @{$_[0]->{children}}
610             }
611            
612             =head1 COPYING AND TRANSFORMATION
613            
614             =head2 copy, copy_from, filter
615            
616             The C method copies out a new node (recursively) that is independent, i.e. has no parent.
617             If you give it some matches of the form [name, key, value, coderef], then the coderef will be
618             called on the copy before it gets added, if the copy matches the match.
619             If a match is just a coderef, it'll apply to all text instead.
620            
621             C is just an alias that's a little more self-documenting.
622            
623             Note that the transformations specified will I fire for the root node you're copying,
624             just its children.
625            
626             =cut
627            
628             sub filter { my $self = shift; $self->copy(@_); }
629             sub copy {
630             my $self = shift;
631            
632             my $ret = XML::Snap->new ($self->name);
633             foreach my $key ($self->attrs) {
634             $ret->set ($key, $self->get ($key));
635             }
636            
637             $ret->copy_from ($self, @_);
638             return $ret;
639             }
640             sub copy_from {
641             my $self = shift;
642             my $other = shift;
643            
644             foreach my $child ($other->children) {
645             if (ref $child eq 'CODE') {
646             $self->add ($child);
647             } elsif (not ref $child) {
648             my $child_copy = $child;
649             foreach (@_) {
650             if (ref $_ eq 'CODE') {
651             $child_copy = $_->($child_copy);
652             }
653             }
654             $self->add ($child_copy);
655             } elsif (reftype $child eq 'SCALAR') {
656             my $child_copy = $$child;
657             foreach (@_) {
658             if (ref $_ eq 'CODE') {
659             $child_copy = $_->($child_copy);
660             }
661             }
662             $self->add (\$child_copy);
663             } else {
664             my $child_copy = $child->copy(@_);
665             foreach (@_) {
666             if (ref $_ eq 'ARRAY') {
667             my @match = @$_;
668             if (not defined $match[0] or $child_copy->is($match[0])) {
669             if (not defined $match[1] or $child->copy->get($match[1]) eq $match[2]) {
670             $child_copy = $match[3]->($child_copy);
671             }
672             }
673             }
674             }
675             $self->add ($child_copy);
676             }
677             }
678            
679             return $self;
680             }
681            
682            
683             =head1 STRING/FILE OUTPUT
684            
685             The obvious thing to do with an XML structure once constructed is of course to write it to a file or extract a
686             string from it. XML::Snap gives you one powerful option, which is the use of embedded generators to act as a
687             live template.
688            
689             =head2 string, rawstring
690            
691             Extracts a string from the XML node passed in; C gives you an escaped string that can be parsed back
692             into an equivalent XML structure, while C does not escape anything, so you can't count on equivalence
693             or even legal XML. This is useful if your XML structure is being used to build strings, otherwise it's the wrong
694             tool to use.
695            
696             =cut
697            
698             sub _stringchild {
699             my $self = shift;
700             my $child = shift;
701            
702             return $self->escape ($child) unless ref $child;
703             if (reftype ($child) eq 'SCALAR') {
704             return $self->escape ($$child);
705             }
706             if (ref $child eq 'CODE') {
707             my $generator = $child->($self);
708             my @genreturn;
709             my $ret = '';
710             do {
711             @genreturn = grep { defined $_ } ($generator->($self));
712             foreach my $return (@genreturn) {
713             $ret .= $self->_stringchild($return);
714             }
715             } while (@genreturn);
716             return $ret;
717             }
718             return $child->string;
719             }
720            
721             sub string {
722             my $self = shift;
723             return $$self if reftype($self) eq 'SCALAR';
724             my $ret = '';
725            
726             $ret .= "<" . $self->name;
727             foreach ($self->attrs) {
728             $ret .= " $_=\"" . $self->escape($self->get($_)) . "\"";
729             }
730            
731             my @children = $self->children;
732             if (!@children) {
733             $ret .= "/>";
734             } else {
735             $ret .= ">";
736             foreach my $child (@children) {
737             $ret .= $self->_stringchild ($child);
738             }
739             $ret .= "name . ">";
740             }
741            
742             return $ret;
743             }
744            
745            
746             sub _rawstringchild {
747             my $self = shift;
748             my $child = shift;
749            
750             return $child unless ref $child;
751             if (reftype ($child) eq 'SCALAR') {
752             return $$child;
753             }
754             if (ref $child eq 'CODE') {
755             my $generator = $child->($self);
756             my @genreturn = ();
757             my $ret = '';
758             do {
759             @genreturn = grep { defined $_ } ($generator->($self));
760             foreach my $return (@genreturn) {
761             $ret .= $self->_rawstringchild($return);
762             }
763             } while (@genreturn);
764             return $ret;
765             }
766             return $child->string;
767             }
768             sub rawstring {
769             my $self = shift;
770             return $$self if reftype($self) eq 'SCALAR';
771            
772             my $ret = '';
773            
774             $ret .= "<" . $self->name;
775             foreach ($self->attrs) {
776             $ret .= " $_=\"" . $self->get($_) . "\"";
777             }
778            
779             my @children = $self->children;
780             if (!@children) {
781             $ret .= "/>";
782             } else {
783             $ret .= ">";
784             foreach my $child (@children) {
785             $ret .= $self->_rawstringchild ($child);
786             }
787             $ret .= "name . ">";
788             }
789            
790             return $ret;
791             }
792            
793             =head2 content, rawcontent
794            
795             These do the same, but don't include the parent tag or its closing tag in the string.
796            
797             =cut
798            
799             sub content {
800             my $self = shift;
801             return $$self if reftype($self) eq 'SCALAR';
802            
803             my $ret = '';
804             foreach my $child ($self->children) {
805             $ret .= $self->_stringchild ($child);
806             }
807             return $ret;
808             } # Boy, that's simpler than in the xmlapi version...
809             sub rawcontent {
810             my $self = shift;
811             return $$self if reftype($self) eq 'SCALAR';
812            
813             my $ret = '';
814             foreach my $child ($self->children) {
815             $ret .= $self->_rawstringchild ($child);
816             }
817             return $ret;
818             }
819            
820            
821             =head2 write
822            
823             Given a filename, an optional prefix to write to the file, writes the XML
824             to a file.
825            
826             =cut
827            
828             sub write {
829             my ($self, $f, $prefix) = @_;
830            
831             my $file;
832             open $file, ">:utf8", $f or croak $!;
833             print $file $prefix if defined $prefix;
834             $self->writestream($file);
835             close $file;
836             }
837            
838            
839             =head2 writestream
840            
841             Writes the XML to an open stream.
842            
843             =cut
844            
845             sub _streamchild {
846             my $self = shift;
847             my $child = shift;
848             my $file = shift;
849            
850             if (not ref $child) {
851             print $file $self->escape ($child);
852             return;
853             }
854             if (ref $child eq 'CODE') {
855             my $generator = $child->($self);
856             my @genreturn = ();
857             my $ret = '';
858             do {
859             @genreturn = grep { defined $_ } ($generator->($self));
860             foreach my $return (@genreturn) {
861             $self->_streamchild($return, $file);
862             }
863             } while (@genreturn);
864             return;
865             }
866             $child->writestream($file);
867             }
868            
869             sub writestream {
870             my $self = shift;
871             my $file = shift;
872            
873             if ($self->istext) {
874             print $file $self->escape ($self->gettext);
875             return;
876             }
877             print $file "<" . $self->name;
878             foreach ($self->attrs) {
879             print $file " $_=\"" . $self->escape($self->get($_)) . "\"";
880             }
881            
882             my @children = $self->children;
883             if (!@children) {
884             print $file "/>";
885             } else {
886             print $file ">";
887             foreach my $child (@children) {
888             $self->_streamchild ($child, $file);
889             }
890             print $file "name . ">";
891             }
892             }
893            
894            
895             =head2 escape/unescape
896            
897             These are convenience functions that escape a string for use in XML, or unescape the escaped string for non-XML use.
898            
899             =cut
900            
901             sub escape {
902             my ($whatever, $str) = @_;
903            
904             $str =~ s/&/&/g;
905             $str =~ s/
906             $str =~ s/>/>/g;
907             $str =~ s/\"/"/g;
908             return $str;
909             }
910             sub unescape {
911             my ($whatever, $ret) = @_;
912            
913             $ret =~ s/</
914             $ret =~ s/>/>/g;
915             $ret =~ s/"/"/g;
916             $ret =~ s/&/&/g;
917             return $ret;
918             }
919            
920             =head1 BOOKMARKING AND SEARCHING
921            
922             Finally, there are searching and bookmarking functions for finding and locating given XML in a tree.
923            
924             =head2 getloc
925            
926             Retrieves a location for a given node in its tree, effectively a bookmark. The rules are simple.
927             The bookmark consists of a set of dotted pairs, each being the name of the tag plus a disambiguator
928             if necessary. If the tag is the first of its sibs with its own tag, no disambiguator is necessary.
929             If the tag has an attribute named 'id' that doesn't have a dot or square brackets in it, then
930             square brackets surrounding that value are used as the disambiguator. Otherwise, a number in
931             parentheses identifies the sequence of the tag within the list of siblings with its own tag name.
932            
933             So C matches C and C matches the second 'mytag' in its
934             parent's list of elements. C matches the fourth 'next' in C.
935            
936             This is essentially a much simplified XMLpath (I may be wrong, but I think I came up with it
937             before XMLpaths had been defined). It's quick and dirty, but works.
938            
939             =cut
940            
941             sub getloc {
942             my $self = shift;
943             my $parent = $self->parent;
944             return '' unless $parent;
945             my $ploc = $self->parent->getloc;
946             $ploc .= '.' if $ploc;
947            
948             my $name = $self->name;
949             my $id = $self->get('id');
950             if (defined $id and not $id =~ /[\.\[\]]/) {
951             my $t = $name . "[$id]";
952             my $try = $parent->loc($t);
953             return $ploc . $t if $try == $self;
954             }
955             my $try = $parent->first($name);
956             return $ploc . $name if $try == $self;
957             my $count = 0;
958             foreach my $try ($parent->elements($name)) {
959             return $ploc . "$name($count)" if $try == $self;
960             $count++;
961             }
962             # We shouldn't ever get here; returns undef but we might consider croaking.
963             }
964            
965            
966             =head2 loc
967            
968             Given such a bookmark and the tree it pertains to, finds the bookmarked node.
969            
970             =cut
971            
972             sub loc {
973             my $self = shift;
974             my $l = shift;
975             return $self unless $l;
976             if ($l =~ /\./) {
977             @_ = (split (/\s*\.\s*/, $l), @_);
978             $l = shift;
979             }
980             my $target;
981             if ($l =~ /\s*(.*)\s*\[(.*)\]\s*/) {
982             my ($tag, $id) = ($1, $2);
983             foreach my $child ($self->elements($tag)) {
984             if ($child->attr_eq ('id', $id)) {
985             $target = $child;
986             last;
987             }
988             }
989             } elsif ($l =~ /(.*)\((\d*)\)\s*/) {
990             my ($tag, $count) = ($1, $2);
991             foreach my $child ($self->elements($tag)) {
992             $target = $child unless $count;
993             $count--;
994             }
995             } else {
996             my @children = $self->elements($l);
997             $target = $children[0] if @children;
998             }
999             return undef unless defined $target;
1000             return $target unless @_;
1001             $target->loc(@_);
1002             }
1003            
1004             =head2 all
1005            
1006             Returns a list of XML snippets that meet the search criteria.
1007            
1008             =cut
1009            
1010             sub _test_item {
1011             my ($self, $name, $attr, $val) = @_;
1012             return 0 unless not defined $name or $self->is($name);
1013             return 1 unless defined $attr;
1014             return $self->attr_eq ($attr, $val);
1015             }
1016            
1017             sub all {
1018             my ($self, $name, $attr, $val) = @_;
1019             my @retlist = ();
1020             foreach my $child ($self->elements) {
1021             push @retlist, $child if $child->_test_item($name, $attr, $val);
1022             push @retlist, $child->all ($name, $attr, $val);
1023             }
1024             return @retlist;
1025             }
1026            
1027             =head1 WALKING THE TREE
1028            
1029             XML is a tree structure, and what do we do with trees? We walk them!
1030            
1031             A walker is an iterator that visits each node in turn, then its children, one by one. Walkers come in two flavors:
1032             full walk or element walk; the element walk ignores text.
1033            
1034             The walker constructor optionally takes a closure that will be called on each node before it's returned; the return
1035             from that closure will be what's returned. If it returns undef, the walk will skip that node and go on with the
1036             walk in the same order that it otherwise would have; if it returns a list of C<(value, 'prune')> then the walk will
1037             not visit that node's children, and "value" will be taken as the return value (and it can obviously be undef as well).
1038            
1039             =cut
1040            
1041             =head2 walk
1042            
1043             C is the complete walk. It returns an iterator. Pass it a closure to be called on each node as it's visited.
1044             Modifying the tree's structure is entirely fine as long as you're just manipulating the children of the current node;
1045             if you do other things, the walker might get confused.
1046            
1047             =cut
1048            
1049             sub walk {
1050             my $xml = shift;
1051             my @coord = ('-');
1052             my @stack = ($xml);
1053             my $process = shift;
1054            
1055             return sub {
1056             my $retval;
1057             my $action;
1058             AGAIN:
1059             return undef unless @stack;
1060             if ($coord[-1] eq '-') {
1061             ($retval, $action) = $process ? $process->($stack[-1]) : $stack[-1];
1062             $coord[-1] = 0;
1063             if (defined $action and $action eq 'prune') {
1064             $coord[-1] = '*';
1065             }
1066             } else {
1067             my @c = ref $stack[-1] ? $stack[-1]->children : ();
1068             if ($coord[-1] eq '*' or $coord[-1] >= @c) {
1069             pop @coord;
1070             pop @stack;
1071             return undef unless @stack;
1072             $coord[-1]++;
1073             goto AGAIN;
1074             }
1075             push @stack, $c[$coord[-1]];
1076             push @coord, '-';
1077             goto AGAIN;
1078             }
1079             goto AGAIN unless defined $retval;
1080             $retval;
1081             }
1082             }
1083            
1084             =head2 walk_elem
1085            
1086             For the sake of convenience, C does the same thing, except it only visits nodes, not text.
1087            
1088             =cut
1089            
1090             sub walk_elem {
1091             my $xml = shift;
1092             my @coord = ('-');
1093             my @stack = ($xml);
1094             my $process = shift;
1095            
1096             return sub {
1097             my $retval;
1098             my $action;
1099             AGAIN:
1100             return undef unless @stack;
1101             if ($coord[-1] eq '-') {
1102             ($retval, $action) = $process ? $process->($stack[-1]) : $stack[-1];
1103             $coord[-1] = 0;
1104             if (defined $action and $action eq 'prune') {
1105             $coord[-1] = '*';
1106             }
1107             } else {
1108             my @c = ref $stack[-1] ? $stack[-1]->elements : ();
1109             if ($coord[-1] eq '*' or $coord[-1] >= @c) {
1110             pop @coord;
1111             pop @stack;
1112             return undef unless @stack;
1113             $coord[-1]++;
1114             goto AGAIN;
1115             }
1116             push @stack, $c[$coord[-1]];
1117             push @coord, '-';
1118             goto AGAIN;
1119             }
1120             goto AGAIN unless defined $retval;
1121             $retval;
1122             }
1123             }
1124            
1125             =head2 walk_all
1126            
1127             A simplified walk that simply returns matching nodes.
1128            
1129             my $w = $self->{body}->walk(sub {
1130             my $node = shift;
1131             return ($node, 'prune') if $node->is('trans-unit'); # Segments are returned whole.
1132             return undef; # We don't want the details for anything else, but still walk into its children if it has any.
1133             });
1134            
1135            
1136            
1137             =head2 first
1138            
1139             Returns the first XML element (i.e. non-node thing) that meets the search criteria.
1140            
1141             =cut
1142            
1143             sub first {
1144             my ($self, $name, $attr, $val) = @_;
1145             foreach my $child ($self->children) {
1146             next unless ref($child) and reftype($child) ne 'SCALAR';
1147             next if ref($child) eq 'CODE';
1148             return $child if $child->_test_item($name, $attr, $val);
1149             my $ret = $child->first ($name, $attr, $val);
1150             return $ret if defined $ret;
1151             }
1152             return;
1153             }
1154            
1155            
1156             =head1 AUTHOR
1157            
1158             Michael Roberts, C<< >>
1159            
1160             =head1 BUGS
1161            
1162             Please report any bugs or feature requests to C, or through
1163             the web interface at L. I will be notified, and then you'll
1164             automatically be notified of progress on your bug as I make changes.
1165            
1166            
1167            
1168            
1169             =head1 SUPPORT
1170            
1171             You can find documentation for this module with the perldoc command.
1172            
1173             perldoc XML::Snap
1174            
1175            
1176             You can also look for information at:
1177            
1178             =over 4
1179            
1180             =item * RT: CPAN's request tracker (report bugs here)
1181            
1182             L
1183            
1184             =item * AnnoCPAN: Annotated CPAN documentation
1185            
1186             L
1187            
1188             =item * CPAN Ratings
1189            
1190             L
1191            
1192             =item * Search CPAN
1193            
1194             L
1195            
1196             =back
1197            
1198            
1199             =head1 ACKNOWLEDGEMENTS
1200            
1201            
1202             =head1 LICENSE AND COPYRIGHT
1203            
1204             Copyright 2013 Michael Roberts.
1205            
1206             This program is free software; you can redistribute it and/or modify it
1207             under the terms of the the Artistic License (2.0). You may obtain a
1208             copy of the full license at:
1209            
1210             L
1211            
1212             Any use, modification, and distribution of the Standard or Modified
1213             Versions is governed by this Artistic License. By using, modifying or
1214             distributing the Package, you accept this license. Do not use, modify,
1215             or distribute the Package, if you do not accept this license.
1216            
1217             If your Modified Version has been derived from a Modified Version made
1218             by someone other than you, you are nevertheless required to ensure that
1219             your Modified Version complies with the requirements of this license.
1220            
1221             This license does not grant you the right to use any trademark, service
1222             mark, tradename, or logo of the Copyright Holder.
1223            
1224             This license includes the non-exclusive, worldwide, free-of-charge
1225             patent license to make, have made, use, offer to sell, sell, import and
1226             otherwise transfer the Package with respect to any patent claims
1227             licensable by the Copyright Holder that are necessarily infringed by the
1228             Package. If you institute patent litigation (including a cross-claim or
1229             counterclaim) against any party alleging that the Package constitutes
1230             direct or contributory patent infringement, then this Artistic License
1231             to you shall terminate on the date that such litigation is filed.
1232            
1233             Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
1234             AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
1235             THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
1236             PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
1237             YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
1238             CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
1239             CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
1240             EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1241            
1242            
1243             =cut
1244            
1245             1; # End of XML::Snap