File Coverage

blib/lib/XML/Snap.pm

Criterion	Covered	Total	%
statement	10	12	83.3
branch			n/a
condition			n/a
subroutine	4	4	100.0
pod			n/a
total	14	16	87.5

line	stmt	sub	time	code
1				package XML::Snap;
2
3	1	1	26799	use 5.006;
	1		3
	1		57
4	1	1	5	use strict;
	1		3
	1		41
5	1	1	5	use warnings FATAL => 'all';
	1		6
	1		50
6
7	1	1	438	use XML::Parser;
	0
	0
8				use Scalar::Util qw(reftype refaddr);
9				use Carp;
10				#use Data::Dumper;
11
12				=head1 NAME
13
14				XML::Snap - Makes simple XML tasks a snap!
15
16				=head1 VERSION
17
18				Version 0.04
19
20				=cut
21
22				our $VERSION = '0.04';
23
24
25				=head1 SYNOPSIS
26
27				XML::Snap is a quick and relatively modern way to work with XML. If, like me, you have little patience for the endless reams of standards
28				the XML community burdens you with, maybe this is the module for you. If you want to maintain compatibility with normal people, though, and you want
29				to avoid scaling problems later, you're probably better off sitting down and understanding XML::LibXML and the SAX ecosystem.
30
31				The other large omission from the model at present is namespaces. If you use namespaces (and honestly, most applications do) then again, you
32				should be using libxml or one of the SAX parsers.
33
34				Still here? Cool. XML::Snap is my personal way of dealing with XML when I can't avoid it. It's roughly based on my experiences with my ANSI C
35				library "xmlapi", which I wrote back in 2000 to wrap the Expat parser. Along the way, I ended up building a lot of handy functionality into the
36				library that made C programming palatable - and a lot of that was string and list manipulation that Perl renders superfluous. So after working
37				with a port for a while, I tossed it. This is what I ended up with.
38
39				XML::Snap works in DOM mode. That is, it reads in XML from a string or file and puts it into a tree for you to manipulate, then allows
40				you to write it back out. The tree is pretty minimalistic. The children of a node can be either plain text (as strings) or elements (as XML::Snap
41				objects or a subclass), and each element can have a hash of attributes. Order of attributes is maintained, as this is actually significant in XML.
42				There is also a clear distinction between content and tags. So some of the drawbacks to XML::Simple are averted with this setup.
43
44				Right at the moment, comments in the XML are not preserved. If you need to work with XML comments, XML::Snap is not your module.
45
46				Right at the moment, a streaming mode (like SAX) is also not provided, but it's something I want to get to soon. In streaming mode, comments
47				I be preserved, but not available to the user until further notice. But since streaming has not yet been implemented, that's kind of moot.
48				Streaming will be implemented in a separate module, probably to be named XML::Skim.
49
50				Some examples!
51
52				use XML::Snap;
53
54				XML::Snap->load ('myfile.xml');
55				my $query = XML::Snap->search ('mynode');
56				while (my $hit = <$query>) {
57				... do things with $hit ...
58				}
59
60				=head1 CREATING AND LOADING XML ELEMENTS
61
62				=head2 new (name, [attribute, value, ...])
63
64				The C function just creates a new, empty XML node, simple as that. It has a name and optional attributes with values.
65				Note that the order of attributes will be retained. Duplicates are not permitted (storage is in a hash); this departs from the XML
66				model so it might cause you troubles - but I know I've never personally encountered XML where it would make a difference.
67
68				=cut
69
70				sub new {
71				my ($class, $name) = @_;
72
73				bless ({
74				name=>$name,
75				parent=>undef,
76				attrs=>[],
77				attrval=>{},
78				children=>[]}, $class);
79				}
80
81				=head2 parse (string), parse_with_refs (string)
82
83				The C function uses the Expat parser wrapped in XML::Parse to parse the string supplied, building a tree from it.
84				If you want text to be blessed scalar refs instead of just strings, use C. (This can be easier, depending
85				on what you're going to do with the data structure later.)
86
87				=cut
88
89				sub _prepare_parser {
90				my $s = shift;
91				XML::Parser->new (
92				Handlers => {
93				Start => sub {
94				my $p = shift;
95				my $elem = XML::Snap->new (shift);
96				$elem->set (@_);
97				$s->{output}->add ($elem) if defined $s->{output};
98				$s->{output} = $elem;
99				},
100				End => sub {
101				my ($p, $el) = @_;
102				my $parent = $s->{output}->parent;
103				$s->{output} = $parent if defined $parent and ref($parent);
104				},
105				Char => sub {
106				my ($p, $str) = @_;
107				$s->{output}->add($s->{refs} ? \$str : $str) if defined $s->{output}; # Note that plain text not enclosed in nodes will be lost. I'm OK with that.
108				}
109				}
110				);
111				}
112
113				sub parse {
114				my ($whatever, $string) = @_;
115
116				my $stash = {refs=>0};
117				my $parser = _prepare_parser($stash);
118				$parser->parse ($string);
119				return $stash->{output};
120				}
121				sub parse_with_refs {
122				my ($whatever, $string) = @_;
123
124				my $stash = {refs=>1};
125				my $parser = _prepare_parser($stash);
126				$parser->parse ($string);
127				return $stash->{output};
128				}
129
130				=head2 load (filename)
131
132				The C function does the same as C but takes a filename instead.
133
134				=cut
135
136				sub load {
137				my ($whatever, $string) = @_;
138
139				my $stash = {};
140				my $parser = _prepare_parser($stash);
141				$parser->parsefile ($string);
142				return $stash->{output};
143				}
144
145				=head2 name, is
146
147				The C method returns the name of the node, that is, the tag used to create it, while
148				the C method tests for equality to a given string (it's just a convenience function).
149
150				=cut
151
152				sub name { reftype($_[0]) eq 'HASH' ? $_[0]->{name} : '' }
153				sub is { reftype($_[0]) eq 'HASH' ? $_[0]->{name} eq $_[1] : 0 }
154
155				use overload ('""' => sub { $_[0]->name . '(' . ref($_[0]) . ':' . refaddr($_[0]) . ')' },
156				'==' => sub { defined(refaddr($_[0])) and defined(refaddr($_[1])) and refaddr($_[0]) eq refaddr($_[1]) },
157				'eq' => sub { refaddr($_[0]) eq refaddr($_[1]) },
158				'!=' => sub { refaddr($_[0]) ne refaddr($_[1]) });
159
160				=head2 oob(key, value), unoob(key)
161
162				Sets/gets an out-of-band (OOB) value on a node. This isn't anything special, just a hash
163				attached to each node, but it can be used by a template output for parameterization,
164				and it doesn't affect the output or actions of the XML in any other way.
165
166				If a value isn't set in a given node, it will ask its parent.
167
168				Call C to remove an OOB value, or C to remove all OOB values on a node.
169
170				=cut
171
172				sub oob {
173				my ($self, $key, $value) = @_;
174
175				$self->{oob}->{$key} = $value if defined $value;
176				$value = $self->{oob}->{$key};
177				return $value if defined $value;
178				return undef unless defined $self->{parent};
179				return $self->{parent}->oob($key);
180				}
181				sub unoob {
182				my ($self, $key) = @_;
183				if (defined $key) {
184				undef $self->{oob}->{$key};
185				} else {
186				undef $self->{oob};
187				}
188				}
189
190				=head2 parent, ancestor, root
191
192				C returns the node's parent, if it has been attached to a parent, while C finds the ancestor with the tag you supply, or the root if you
193				don't give a tag. C is provided as a shorthand for ancestor().
194
195				=cut
196				sub parent { reftype($_[0]) eq 'HASH' ? $_[0]->{parent} : undef }
197				sub root { $_[0]->ancestor }
198				sub ancestor {
199				my ($self, $name) = @_;
200				my $p = $self->parent;
201				if (not defined $p) {
202				return $self if not defined $name;
203				return undef;
204				}
205				return $p if defined $name and $p->is($name);
206				return $p->ancestor($name);
207				}
208
209				=head2 delete
210
211				Deletes a child from a node. Pass the actual reference to the child - or if you're using non-referenced text, the text itself.
212				(In this case, duplicate text will all be deleted.)
213
214				=cut
215
216				sub delete {
217				my $self = shift;
218				my $child = shift;
219				my @children = $self->children;
220				my @new_list = grep {$_ != $child} $self->children;
221				$self->{children} = \@new_list;
222				}
223
224				=head2 detach
225
226				Detaches the node from its parent, if it is attached. This not only removes the parent reference, but also removes the child
227				from its parent's list of children.
228
229				=cut
230
231				sub detach {
232				my $self = shift;
233				return unless $self->{parent};
234				$self->{parent}->delete($self);
235				$self->{parent} = undef;
236				}
237
238
239				=head1 WORKING WITH ATTRIBUTES
240
241				Each tag in XML can have zero or more attributes, each of which has a value. Order is significant and preserved.
242
243				=head2 set, unset
244
245				The C method sets one or more attributes; its parameter list is considered to be key, value, key, value, etc.
246				The C method removes one or more attributes from the list.
247
248				=cut
249				sub set {
250				my $self = shift;
251
252				my $value;
253				while (@_) {
254				my $key = shift;
255				$value = shift;
256				return $self->get($key) unless defined $value;
257				push @{$self->{attrs}}, $key if !grep {$_ eq $key} @{$self->{attrs}};
258				$self->{attrval}->{$key} = $value;
259				}
260				return $value;
261				}
262				sub unset {
263				my ($self, $key) = @_;
264				return unless defined $key;
265				my @attributes = grep {$_ ne $key} @{$self->{attrs}};
266				$self->{attrs} = \@attributes;
267				$self->{attrval}->{$key} = undef;
268				}
269
270
271				=head2 get (attribute, default), attr_eq (attribute, value)
272
273				Obviously, C retrieves an attribute value - specify a default value to be used if the attribute is not found,
274				otherwise returns undef.
275
276				Since it's inconvenient to test attributes that can be undefined, there's a C method that checks that
277				the given attribute is defined I equal to the value given.
278
279				=cut
280
281				sub get {
282				my $self = shift;
283				my $key = shift;
284				my $value = $self->{attrval}->{$key};
285				return $value if defined $value;
286				return shift;
287				}
288
289				sub attr_eq {
290				my $self = shift;
291				my $key = shift;
292				my $value = $self->{attrval}->{$key};
293				return undef unless defined $value;
294				return 1 if $value eq shift;
295				}
296
297				=head2 attrs (attribute list)
298
299				The C method retrieves a list of the attributes set.
300
301				=cut
302
303				sub attrs { reftype($_[0]) eq 'HASH' ? @{$_[0]->{attrs}} : () }
304
305				=head2 getlist (attribute list)
306
307				The C method retrieves a list of attribute values given a list of attributes.
308				(It's just a map.)
309
310				=cut
311
312				sub getlist {
313				my $self = shift;
314				map { $self->get($_) } @_;
315				}
316
317				=head2 getctx (attribute, default)
318
319				The C method looks at an attribute in the given node, but if it's not found, looks in the parent instead.
320				If there is no parent, the default value is returned.
321
322				=cut
323
324				sub getctx {
325				my $self = shift;
326				my $key = shift;
327				my $default = shift;
328
329				my $value = $self->get($key);
330				return $value if defined $value;
331				return $default unless defined $self->{parent};
332				$self->{parent}->getctx($key, $default);
333				}
334
335				=head2 attr_order (attribute list)
336
337				Moves the named attributes to the front of the list; if any appear that aren't set, they stay unset.
338
339				=cut
340
341				sub attr_order {
342				my $self = shift;
343
344				my @list = @_;
345				foreach my $a (@{$self->{attrs}}) {
346				push @list, $a unless grep { $a eq $_ } @list;
347				}
348				$self->{attrs} = \@list;
349				}
350
351				=head1 WORKING WITH PLAIN TEXT CONTENT
352
353				Depending on your needs, XML::Snap can store plain text embedded in an XML structure as simple strings,
354				or as scalar references blessed to XML::Snap. Since text may therefore I be blessed, you need
355				to handle it with care unless you're sure it's all references (by parsing with C,
356				for instance).
357
358				=head2 istext
359
360				Returns a flag whether a given thing is text or not. "Text" means a scalar or a scalar reference;
361				anything else will not be considered text.
362
363				This is a class method or an instance method - note that if you're using it as an instance method
364				and you try to call it on a string, your call will die.
365
366				=cut
367
368				sub istext {
369				my $thing = shift;
370				my $text = shift \|\| $thing;
371				return 1 unless ref($text);
372				reftype ($text) eq 'SCALAR';
373				}
374
375				=head2 gettext
376
377				Returns the actual text of either a string (which is obviously just the string) or a scalar reference.
378				Again, can be called as an instance method if you're sure it's an instance.
379
380				=cut
381
382				sub gettext {
383				my $thing = shift;
384				my $text = shift \|\| $thing;
385				return $text unless ref $text;
386				return undef unless reftype ($text) eq 'SCALAR';
387				return $$text;
388				}
389
390				=head2 bless_text
391
392				Iterates through the node given, and converts all plain texts into referenced texts.
393
394				=cut
395
396				sub _bless_text {
397				my $thing = shift;
398				return $thing if ref($thing);
399				my $r = \$thing;
400				bless $r, 'XML::Snap';
401				return $r;
402				}
403				sub bless_text {
404				my $self = shift;
405				my @children = map {_bless_text($_)} @{$self->{children}};
406				$self->{children} = \@children;
407				foreach my $child ($self->elements) {
408				$child->bless_text;
409				}
410				}
411
412				=head2 unbless_text
413
414				Iterates through the node given, and converts all referenced texts into plain texts.
415
416				=cut
417
418				sub _unbless_text {
419				my $thing = shift;
420				return $thing if not ref $thing;
421				return $thing unless reftype($thing) eq 'SCALAR';
422				return $$thing;
423				}
424				sub unbless_text {
425				my $self = shift;
426				my @children = map {_unbless_text($_)} @{$self->{children}};
427				$self->{children} = \@children;
428				foreach my $child ($self->elements) {
429				$child->bless_text;
430				}
431				}
432
433				=head1 WORKING WITH XML STRUCTURE
434
435				=head2 add, add_pretty
436
437				The C method adds nodes and text as children to the current node. The C method is a convenience
438				method that ensures that there is a line break if a node is inserted directly at the beginning of its parent
439				(this makes building human-readable XML easier).
440
441				In addition to nodes and text, you can also add a coderef. This will have no effect on normal operations except
442				for appearing in the list of children for the node, but during writing operations (either for string output or
443				to streams) the coderef will be called to retrieve an iterator that delivers XML snippets. Those snippets will be
444				inserted into the output as though they appeared at the point in the structure where the coderef appears.
445				Extraction from the iterator stops when it returns undef.
446
447				The next time the writer is used, the original coderef will be called again to retrieve a new iterator.
448
449				The writer functions (string, stringcontent, write, etc.) can be called with optional parameters that will be passed
450				to each coderef in the structure, if any. This allows an XML::Snap structure to be used as a generic template,
451				for example for writing XML structures extracted from database queries.
452
453				When adding a node that is already a child of another node, the source node will be copied into the target, not just
454				added. (Otherwise confusion could ensue!)
455
456				Text is normally added as a simple string, but this can cause problems for consumers, as the output of an
457				iterator might then return a mixture of unblessed strings and blessed nodes, so you end up having to test for
458				blessedness when processing them. For ease of use, you can also add a I to a string; it will work
459				the same in terms of neighboring strings being coalesced, but they'll be stored as blessed string references.
460				Then, use istext or is_node to determine what each element is when iterating through structure.
461
462				=cut
463
464				sub add {
465				my $self = shift;
466				foreach my $child (@_) {
467				my $r = ref $child;
468				if (!$r) {
469				my $last = ${$self->{children}}[-1];
470				if (defined $last and istext($last)) {
471				if (ref $last eq '') {
472				${$self->{children}}[-1] = $last . $child;
473				} else {
474				$$last .= $child;
475				}
476				} else {
477				push @{$self->{children}}, $child;
478				}
479				} elsif ($r eq 'CODE') {
480				push @{$self->{children}}, $child;
481				} elsif ($r eq 'SCALAR') {
482				my $copy = $child;
483				bless $copy, ref $self;
484				my $last = ${$self->{children}}[-1];
485				if (defined $last and istext($last)) {
486				$$copy = gettext($last) . $$copy;
487				${$self->{children}}[-1] = $copy;
488				} else {
489				push @{$self->{children}}, $copy;
490				}
491				} elsif ($child->can('parent')) {
492				$child = $child->copy if defined $child->parent;
493				$child->{parent} = $self;
494				push @{$self->{children}}, $child;
495				}
496				}
497				}
498				sub add_pretty {
499				my $self = shift;
500				$self->add ("\n") if (!@{$self->{children}});
501				foreach my $child (@_) {
502				$self->add ($child, "\n");
503				}
504				}
505
506				=head2 prepend, prepend_pretty
507
508				These do the same as C and C except at the beginning of the child list.
509
510				=cut
511				sub prepend {
512				my $self = shift;
513				foreach my $child (reverse @_) {
514				my $r = ref $child;
515				if (!$r) {
516				my $first = ${$self->{children}}[0];
517				if (defined $first and istext($first)) {
518				if (ref $first eq '') {
519				${$self->{children}}[0] = $child . $first;
520				} else {
521				$$first = $child . $$first;
522				}
523				} else {
524				unshift @{$self->{children}}, $child;
525				}
526				} elsif ($r eq 'CODE') {
527				unshift @{$self->{children}}, $child;
528				} elsif ($r eq 'SCALAR') {
529				my $copy = $child;
530				bless $copy, ref $self;
531				my $first = ${$self->{children}}[0];
532				if (defined $first and istext($first)) {
533				$$copy = $$copy . gettext($first);
534				${$self->{children}}[0] = $copy;
535				} else {
536				unshift @{$self->{children}}, $copy;
537				}
538				} elsif ($child->can('parent')) {
539				$child = $child->copy if defined $child->parent;
540				$child->{parent} = $self;
541				unshift @{$self->{children}}, $child;
542				}
543				}
544				}
545				sub prepend_pretty {
546				my $self = shift;
547				$self->prepend ("\n") if (!@{$self->{children}});
548				foreach my $child (reverse @_) {
549				$self->prepend ("\n", $child);
550				}
551				}
552
553				=head2 replacecontent, replacecontent_from
554
555				The C method first deletes the node's children, then calls C to add its parameters.
556				Use C to use the I of the first parameter, with optional matches to effect
557				filtration as the rest of the parameters.
558
559				These are holdovers from my old xmlapi C library, where I was using in-memory XML structures as
560				"bags of data". Since Perl is basically built on bags of data to start with, I'm not sure these will
561				ever get used in a real situation (certainly I've never needed them yet in Perl).
562
563				=cut
564
565				sub replacecontent {
566				my $self = shift;
567				$self->{children} = [];
568				$self->add(@_);
569				}
570				sub replacecontent_from {
571				my $self = shift;
572				my $from = shift;
573				$self->{children} = [];
574				$self->copy_from ($from, @_);
575				}
576
577				=head2 replace
578
579				The C method is a little odd; it actually acts on the given node's I, by replacing the callee
580				with the passed parameters. In other words, the parent's children list is modified directly. If there's nothing
581				provided as a replacement, this simply deletes the callee from its parent's child list.
582
583				=cut
584
585				sub replace {
586				my $self = shift;
587				my $parent = $self->{parent};
588				return unless $parent;
589				my @children = @{$parent->{children}};
590				my $index = 0;
591				my $count = scalar @children;
592				$index++ until $children[$index] == $self or $index == $count;
593				return if $index == $count;
594				splice @children, $index, 1, @_;
595				$parent->{children} = \@children;
596				}
597
598
599				=head2 children, elements
600
601				The C method just returns the list of children added with C (or the other addition-type methods).
602				The C method returns only those children that are elements, omitting text, comments, and generators.
603
604				=cut
605
606				sub children { reftype($_[0]) eq 'HASH' ? @{$_[0]->{children}} : () }
607				sub elements { return () unless reftype($_[0]) eq 'HASH';
608				defined $_[1] ? grep { ref $_ && reftype($_) ne 'SCALAR' && $_->can('is') && $_->is($_[1]) } @{$_[0]->{children}}
609				: grep { ref $_ && reftype($_) ne 'SCALAR' && $_->can('parent') } @{$_[0]->{children}}
610				}
611
612				=head1 COPYING AND TRANSFORMATION
613
614				=head2 copy, copy_from, filter
615
616				The C method copies out a new node (recursively) that is independent, i.e. has no parent.
617				If you give it some matches of the form [name, key, value, coderef], then the coderef will be
618				called on the copy before it gets added, if the copy matches the match.
619				If a match is just a coderef, it'll apply to all text instead.
620
621				C is just an alias that's a little more self-documenting.
622
623				Note that the transformations specified will I fire for the root node you're copying,
624				just its children.
625
626				=cut
627
628				sub filter { my $self = shift; $self->copy(@_); }
629				sub copy {
630				my $self = shift;
631
632				my $ret = XML::Snap->new ($self->name);
633				foreach my $key ($self->attrs) {
634				$ret->set ($key, $self->get ($key));
635				}
636
637				$ret->copy_from ($self, @_);
638				return $ret;
639				}
640				sub copy_from {
641				my $self = shift;
642				my $other = shift;
643
644				foreach my $child ($other->children) {
645				if (ref $child eq 'CODE') {
646				$self->add ($child);
647				} elsif (not ref $child) {
648				my $child_copy = $child;
649				foreach (@_) {
650				if (ref $_ eq 'CODE') {
651				$child_copy = $_->($child_copy);
652				}
653				}
654				$self->add ($child_copy);
655				} elsif (reftype $child eq 'SCALAR') {
656				my $child_copy = $$child;
657				foreach (@_) {
658				if (ref $_ eq 'CODE') {
659				$child_copy = $_->($child_copy);
660				}
661				}
662				$self->add (\$child_copy);
663				} else {
664				my $child_copy = $child->copy(@_);
665				foreach (@_) {
666				if (ref $_ eq 'ARRAY') {
667				my @match = @$_;
668				if (not defined $match[0] or $child_copy->is($match[0])) {
669				if (not defined $match[1] or $child->copy->get($match[1]) eq $match[2]) {
670				$child_copy = $match[3]->($child_copy);
671				}
672				}
673				}
674				}
675				$self->add ($child_copy);
676				}
677				}
678
679				return $self;
680				}
681
682
683				=head1 STRING/FILE OUTPUT
684
685				The obvious thing to do with an XML structure once constructed is of course to write it to a file or extract a
686				string from it. XML::Snap gives you one powerful option, which is the use of embedded generators to act as a
687				live template.
688
689				=head2 string, rawstring
690
691				Extracts a string from the XML node passed in; C gives you an escaped string that can be parsed back
692				into an equivalent XML structure, while C does not escape anything, so you can't count on equivalence
693				or even legal XML. This is useful if your XML structure is being used to build strings, otherwise it's the wrong
694				tool to use.
695
696				=cut
697
698				sub _stringchild {
699				my $self = shift;
700				my $child = shift;
701
702				return $self->escape ($child) unless ref $child;
703				if (reftype ($child) eq 'SCALAR') {
704				return $self->escape ($$child);
705				}
706				if (ref $child eq 'CODE') {
707				my $generator = $child->($self);
708				my @genreturn;
709				my $ret = '';
710				do {
711				@genreturn = grep { defined $_ } ($generator->($self));
712				foreach my $return (@genreturn) {
713				$ret .= $self->_stringchild($return);
714				}
715				} while (@genreturn);
716				return $ret;
717				}
718				return $child->string;
719				}
720
721				sub string {
722				my $self = shift;
723				return $$self if reftype($self) eq 'SCALAR';
724				my $ret = '';
725
726				$ret .= "<" . $self->name;
727				foreach ($self->attrs) {
728				$ret .= " $_=\"" . $self->escape($self->get($_)) . "\"";
729				}
730
731				my @children = $self->children;
732				if (!@children) {
733				$ret .= "/>";
734				} else {
735				$ret .= ">";
736				foreach my $child (@children) {
737				$ret .= $self->_stringchild ($child);
738				}
739				$ret .= "name . ">";
740				}
741
742				return $ret;
743				}
744
745
746				sub _rawstringchild {
747				my $self = shift;
748				my $child = shift;
749
750				return $child unless ref $child;
751				if (reftype ($child) eq 'SCALAR') {
752				return $$child;
753				}
754				if (ref $child eq 'CODE') {
755				my $generator = $child->($self);
756				my @genreturn = ();
757				my $ret = '';
758				do {
759				@genreturn = grep { defined $_ } ($generator->($self));
760				foreach my $return (@genreturn) {
761				$ret .= $self->_rawstringchild($return);
762				}
763				} while (@genreturn);
764				return $ret;
765				}
766				return $child->string;
767				}
768				sub rawstring {
769				my $self = shift;
770				return $$self if reftype($self) eq 'SCALAR';
771
772				my $ret = '';
773
774				$ret .= "<" . $self->name;
775				foreach ($self->attrs) {
776				$ret .= " $_=\"" . $self->get($_) . "\"";
777				}
778
779				my @children = $self->children;
780				if (!@children) {
781				$ret .= "/>";
782				} else {
783				$ret .= ">";
784				foreach my $child (@children) {
785				$ret .= $self->_rawstringchild ($child);
786				}
787				$ret .= "name . ">";
788				}
789
790				return $ret;
791				}
792
793				=head2 content, rawcontent
794
795				These do the same, but don't include the parent tag or its closing tag in the string.
796
797				=cut
798
799				sub content {
800				my $self = shift;
801				return $$self if reftype($self) eq 'SCALAR';
802
803				my $ret = '';
804				foreach my $child ($self->children) {
805				$ret .= $self->_stringchild ($child);
806				}
807				return $ret;
808				} # Boy, that's simpler than in the xmlapi version...
809				sub rawcontent {
810				my $self = shift;
811				return $$self if reftype($self) eq 'SCALAR';
812
813				my $ret = '';
814				foreach my $child ($self->children) {
815				$ret .= $self->_rawstringchild ($child);
816				}
817				return $ret;
818				}
819
820
821				=head2 write
822
823				Given a filename, an optional prefix to write to the file, writes the XML
824				to a file.
825
826				=cut
827
828				sub write {
829				my ($self, $f, $prefix) = @_;
830
831				my $file;
832				open $file, ">:utf8", $f or croak $!;
833				print $file $prefix if defined $prefix;
834				$self->writestream($file);
835				close $file;
836				}
837
838
839				=head2 writestream
840
841				Writes the XML to an open stream.
842
843				=cut
844
845				sub _streamchild {
846				my $self = shift;
847				my $child = shift;
848				my $file = shift;
849
850				if (not ref $child) {
851				print $file $self->escape ($child);
852				return;
853				}
854				if (ref $child eq 'CODE') {
855				my $generator = $child->($self);
856				my @genreturn = ();
857				my $ret = '';
858				do {
859				@genreturn = grep { defined $_ } ($generator->($self));
860				foreach my $return (@genreturn) {
861				$self->_streamchild($return, $file);
862				}
863				} while (@genreturn);
864				return;
865				}
866				$child->writestream($file);
867				}
868
869				sub writestream {
870				my $self = shift;
871				my $file = shift;
872
873				if ($self->istext) {
874				print $file $self->escape ($self->gettext);
875				return;
876				}
877				print $file "<" . $self->name;
878				foreach ($self->attrs) {
879				print $file " $_=\"" . $self->escape($self->get($_)) . "\"";
880				}
881
882				my @children = $self->children;
883				if (!@children) {
884				print $file "/>";
885				} else {
886				print $file ">";
887				foreach my $child (@children) {
888				$self->_streamchild ($child, $file);
889				}
890				print $file "name . ">";
891				}
892				}
893
894
895				=head2 escape/unescape
896
897				These are convenience functions that escape a string for use in XML, or unescape the escaped string for non-XML use.
898
899				=cut
900
901				sub escape {
902				my ($whatever, $str) = @_;
903
904				$str =~ s/&/&/g;
905				$str =~ s/
906				$str =~ s/>/>/g;
907				$str =~ s/\"/"/g;
908				return $str;
909				}
910				sub unescape {
911				my ($whatever, $ret) = @_;
912
913				$ret =~ s/</
914				$ret =~ s/>/>/g;
915				$ret =~ s/"/"/g;
916				$ret =~ s/&/&/g;
917				return $ret;
918				}
919
920				=head1 BOOKMARKING AND SEARCHING
921
922				Finally, there are searching and bookmarking functions for finding and locating given XML in a tree.
923
924				=head2 getloc
925
926				Retrieves a location for a given node in its tree, effectively a bookmark. The rules are simple.
927				The bookmark consists of a set of dotted pairs, each being the name of the tag plus a disambiguator
928				if necessary. If the tag is the first of its sibs with its own tag, no disambiguator is necessary.
929				If the tag has an attribute named 'id' that doesn't have a dot or square brackets in it, then
930				square brackets surrounding that value are used as the disambiguator. Otherwise, a number in
931				parentheses identifies the sequence of the tag within the list of siblings with its own tag name.
932
933				So C matches C and C matches the second 'mytag' in its
934				parent's list of elements. C matches the fourth 'next' in C.
935
936				This is essentially a much simplified XMLpath (I may be wrong, but I think I came up with it
937				before XMLpaths had been defined). It's quick and dirty, but works.
938
939				=cut
940
941				sub getloc {
942				my $self = shift;
943				my $parent = $self->parent;
944				return '' unless $parent;
945				my $ploc = $self->parent->getloc;
946				$ploc .= '.' if $ploc;
947
948				my $name = $self->name;
949				my $id = $self->get('id');
950				if (defined $id and not $id =~ /[\.\[\]]/) {
951				my $t = $name . "[$id]";
952				my $try = $parent->loc($t);
953				return $ploc . $t if $try == $self;
954				}
955				my $try = $parent->first($name);
956				return $ploc . $name if $try == $self;
957				my $count = 0;
958				foreach my $try ($parent->elements($name)) {
959				return $ploc . "$name($count)" if $try == $self;
960				$count++;
961				}
962				# We shouldn't ever get here; returns undef but we might consider croaking.
963				}
964
965
966				=head2 loc
967
968				Given such a bookmark and the tree it pertains to, finds the bookmarked node.
969
970				=cut
971
972				sub loc {
973				my $self = shift;
974				my $l = shift;
975				return $self unless $l;
976				if ($l =~ /\./) {
977				@_ = (split (/\s\.\s/, $l), @_);
978				$l = shift;
979				}
980				my $target;
981				if ($l =~ /\s(.)\s\[(.)\]\s*/) {
982				my ($tag, $id) = ($1, $2);
983				foreach my $child ($self->elements($tag)) {
984				if ($child->attr_eq ('id', $id)) {
985				$target = $child;
986				last;
987				}
988				}
989				} elsif ($l =~ /(.)$(\d)$\s*/) {
990				my ($tag, $count) = ($1, $2);
991				foreach my $child ($self->elements($tag)) {
992				$target = $child unless $count;
993				$count--;
994				}
995				} else {
996				my @children = $self->elements($l);
997				$target = $children[0] if @children;
998				}
999				return undef unless defined $target;
1000				return $target unless @_;
1001				$target->loc(@_);
1002				}
1003
1004				=head2 all
1005
1006				Returns a list of XML snippets that meet the search criteria.
1007
1008				=cut
1009
1010				sub _test_item {
1011				my ($self, $name, $attr, $val) = @_;
1012				return 0 unless not defined $name or $self->is($name);
1013				return 1 unless defined $attr;
1014				return $self->attr_eq ($attr, $val);
1015				}
1016
1017				sub all {
1018				my ($self, $name, $attr, $val) = @_;
1019				my @retlist = ();
1020				foreach my $child ($self->elements) {
1021				push @retlist, $child if $child->_test_item($name, $attr, $val);
1022				push @retlist, $child->all ($name, $attr, $val);
1023				}
1024				return @retlist;
1025				}
1026
1027				=head1 WALKING THE TREE
1028
1029				XML is a tree structure, and what do we do with trees? We walk them!
1030
1031				A walker is an iterator that visits each node in turn, then its children, one by one. Walkers come in two flavors:
1032				full walk or element walk; the element walk ignores text.
1033
1034				The walker constructor optionally takes a closure that will be called on each node before it's returned; the return
1035				from that closure will be what's returned. If it returns undef, the walk will skip that node and go on with the
1036				walk in the same order that it otherwise would have; if it returns a list of C<(value, 'prune')> then the walk will
1037				not visit that node's children, and "value" will be taken as the return value (and it can obviously be undef as well).
1038
1039				=cut
1040
1041				=head2 walk
1042
1043				C is the complete walk. It returns an iterator. Pass it a closure to be called on each node as it's visited.
1044				Modifying the tree's structure is entirely fine as long as you're just manipulating the children of the current node;
1045				if you do other things, the walker might get confused.
1046
1047				=cut
1048
1049				sub walk {
1050				my $xml = shift;
1051				my @coord = ('-');
1052				my @stack = ($xml);
1053				my $process = shift;
1054
1055				return sub {
1056				my $retval;
1057				my $action;
1058				AGAIN:
1059				return undef unless @stack;
1060				if ($coord[-1] eq '-') {
1061				($retval, $action) = $process ? $process->($stack[-1]) : $stack[-1];
1062				$coord[-1] = 0;
1063				if (defined $action and $action eq 'prune') {
1064				$coord[-1] = '*';
1065				}
1066				} else {
1067				my @c = ref $stack[-1] ? $stack[-1]->children : ();
1068				if ($coord[-1] eq '*' or $coord[-1] >= @c) {
1069				pop @coord;
1070				pop @stack;
1071				return undef unless @stack;
1072				$coord[-1]++;
1073				goto AGAIN;
1074				}
1075				push @stack, $c[$coord[-1]];
1076				push @coord, '-';
1077				goto AGAIN;
1078				}
1079				goto AGAIN unless defined $retval;
1080				$retval;
1081				}
1082				}
1083
1084				=head2 walk_elem
1085
1086				For the sake of convenience, C does the same thing, except it only visits nodes, not text.
1087
1088				=cut
1089
1090				sub walk_elem {
1091				my $xml = shift;
1092				my @coord = ('-');
1093				my @stack = ($xml);
1094				my $process = shift;
1095
1096				return sub {
1097				my $retval;
1098				my $action;
1099				AGAIN:
1100				return undef unless @stack;
1101				if ($coord[-1] eq '-') {
1102				($retval, $action) = $process ? $process->($stack[-1]) : $stack[-1];
1103				$coord[-1] = 0;
1104				if (defined $action and $action eq 'prune') {
1105				$coord[-1] = '*';
1106				}
1107				} else {
1108				my @c = ref $stack[-1] ? $stack[-1]->elements : ();
1109				if ($coord[-1] eq '*' or $coord[-1] >= @c) {
1110				pop @coord;
1111				pop @stack;
1112				return undef unless @stack;
1113				$coord[-1]++;
1114				goto AGAIN;
1115				}
1116				push @stack, $c[$coord[-1]];
1117				push @coord, '-';
1118				goto AGAIN;
1119				}
1120				goto AGAIN unless defined $retval;
1121				$retval;
1122				}
1123				}
1124
1125				=head2 walk_all
1126
1127				A simplified walk that simply returns matching nodes.
1128
1129				my $w = $self->{body}->walk(sub {
1130				my $node = shift;
1131				return ($node, 'prune') if $node->is('trans-unit'); # Segments are returned whole.
1132				return undef; # We don't want the details for anything else, but still walk into its children if it has any.
1133				});
1134
1135
1136
1137				=head2 first
1138
1139				Returns the first XML element (i.e. non-node thing) that meets the search criteria.
1140
1141				=cut
1142
1143				sub first {
1144				my ($self, $name, $attr, $val) = @_;
1145				foreach my $child ($self->children) {
1146				next unless ref($child) and reftype($child) ne 'SCALAR';
1147				next if ref($child) eq 'CODE';
1148				return $child if $child->_test_item($name, $attr, $val);
1149				my $ret = $child->first ($name, $attr, $val);
1150				return $ret if defined $ret;
1151				}
1152				return;
1153				}
1154
1155
1156				=head1 AUTHOR
1157
1158				Michael Roberts, C<< >>
1159
1160				=head1 BUGS
1161
1162				Please report any bugs or feature requests to C, or through
1163				the web interface at L. I will be notified, and then you'll
1164				automatically be notified of progress on your bug as I make changes.
1165
1166
1167
1168
1169				=head1 SUPPORT
1170
1171				You can find documentation for this module with the perldoc command.
1172
1173				perldoc XML::Snap
1174
1175
1176				You can also look for information at:
1177
1178				=over 4
1179
1180				=item * RT: CPAN's request tracker (report bugs here)
1181
1182				L
1183
1184				=item * AnnoCPAN: Annotated CPAN documentation
1185
1186				L
1187
1188				=item * CPAN Ratings
1189
1190				L
1191
1192				=item * Search CPAN
1193
1194				L
1195
1196				=back
1197
1198
1199				=head1 ACKNOWLEDGEMENTS
1200
1201
1202				=head1 LICENSE AND COPYRIGHT
1203
1204				Copyright 2013 Michael Roberts.
1205
1206				This program is free software; you can redistribute it and/or modify it
1207				under the terms of the the Artistic License (2.0). You may obtain a
1208				copy of the full license at:
1209
1210				L
1211
1212				Any use, modification, and distribution of the Standard or Modified
1213				Versions is governed by this Artistic License. By using, modifying or
1214				distributing the Package, you accept this license. Do not use, modify,
1215				or distribute the Package, if you do not accept this license.
1216
1217				If your Modified Version has been derived from a Modified Version made
1218				by someone other than you, you are nevertheless required to ensure that
1219				your Modified Version complies with the requirements of this license.
1220
1221				This license does not grant you the right to use any trademark, service
1222				mark, tradename, or logo of the Copyright Holder.
1223
1224				This license includes the non-exclusive, worldwide, free-of-charge
1225				patent license to make, have made, use, offer to sell, sell, import and
1226				otherwise transfer the Package with respect to any patent claims
1227				licensable by the Copyright Holder that are necessarily infringed by the
1228				Package. If you institute patent litigation (including a cross-claim or
1229				counterclaim) against any party alleging that the Package constitutes
1230				direct or contributory patent infringement, then this Artistic License
1231				to you shall terminate on the date that such litigation is filed.
1232
1233				Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
1234				AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
1235				THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
1236				PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
1237				YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
1238				CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
1239				CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
1240				EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1241
1242
1243				=cut
1244
1245				1; # End of XML::Snap