File Coverage

blib/lib/WWW/Meta/XML/Browser.pm
Criterion Covered Total %
statement 25 27 92.5
branch n/a
condition n/a
subroutine 9 9 100.0
pod n/a
total 34 36 94.4


line stmt bran cond sub pod time code
1             package WWW::Meta::XML::Browser;
2              
3 1     1   43779 use strict;
  1         2  
  1         39  
4 1     1   4 use warnings;
  1         2  
  1         627  
5              
6             =head1 NAME
7              
8             WWW::Meta::XML::Browser - Perl module to simulate a browser session described in a XML file
9              
10             =head1 SYNOPSIS
11              
12             use WWW::Meta::XML::Browser;
13              
14             my $session = WWW::Meta::XML::Browser->new();
15             $session->process_file('file.xml');
16             $session->process_all_request_nodes();
17             $session->print_all_request_results();
18              
19             =head1 ABSTRACT
20              
21             This module reads a XML file from a given source and makes the HTTP-requests defined in this XML file.
22             The result of such a request can be filtered using a XSL stylesheet.
23             The following requests can be build using results from the transformation.
24              
25             =head1 DESCRIPTION
26              
27             =head2 WRITING A SESSION DESCRIPTION FILE
28              
29             The most important part when working with C is to write a session description file. Such a file describes which http requests are made and how the results of the requests are handled.
30              
31             The session description file is a simple XML file. The root element is Ewww-meta-xml-browserE and the DTD can be found at L, which leads us to the following construct:
32              
33            
34            
35            
36            
37            
38              
39             The optional meta-element can be specified as a child of the root element. The element acts as a container for different information regarding the handling of the request elements.
40              
41             =head3 META-PERL INFORMATION
42              
43             The perl element is a child of the meta element and can contain perl related information. The perl element can have one of the child elements described below.
44              
45             =head4 ELEMENT: callback; ATTRIBUTES: name
46              
47             The callback element is used to define an anonymous subroutine which can later be used as a callback. The name under which the callback can be accessed is specified by the required name attribut. The form of the callback (parameters, return value) depends on the later usage, an example for a (not very useful :-)) result-callback is the following:
48              
49            
50             sub {
51             my ($result) = @_;
52              
53             return $result;
54             }
55             ]]>
56              
57              
58             =head3 REQUEST DEFINITIONS
59              
60             A session description file must contain at least one request.
61              
62             =head4 DEFINING A REQUEST WITHOUT CONTENT
63              
64             Under the root element we will add some elements for the requests we want to make. A very complete request will look like the following:
65              
66            
67            
68              
69             The only attribute of the request-element that is required is url, all other attributes can be left out.
70              
71             If method is left out the default method get will be used.
72              
73             If stylesheet is left out, the raw html will be transformed to a valid XML document which will than be stored as the result of that request.
74              
75             The result-callback gives the user the possibility to change the raw html before it will be transformed to a XML document by calling the specified callback. This callback can be an element of the callbacks hash specified when the instance is created or a callback specified in the XML file (L). If a callback is specified in the callbacks hash as well as in the XML file the callback from the hash will be used. A result callback is called with the raw html as the only parameter and is required to return a valid html document.
76              
77             =head4 DEFINING A REQUEST WITH CONTENT
78              
79             The request-element has an optional child element, which can be used to specify the content of a request. The element is called content and is used as a child of the request element as follows (remember that & has to be written as & in XML):
80              
81            
82             q=42&ie=ISO-8859-1&hl=de&meta=
83            
84              
85             This example shows that the content will be sent using the specified method (get in this case) to the url of the request (http://www.google.de/search).
86              
87             =head4 EMBEDDED REQUESTS
88              
89             Embedded request can be used to fetch pages from a result page. They can be created in the XSL stylesheet to dynamically parse a tree of pages.
90              
91             As soon as a www-meta-xml-browser-request-element is created in the XSL stylesheet it is processed like a normal request-element and the result is inserted.
92              
93             If the result consists of multiple pages the container-attribute has to be specified and is used as the new root for the merged (optionally transformed) pages.
94              
95             =head3 REPLACEMENT EXPRESSIONS IN A SESSION DESCRIPTION FILE
96              
97             There are some cases in which static urls and a static content don't fit the requirements of what has to be done.
98              
99             For this case WWW::Meta::XML::Browser has an easy way to use arguments passed to the instance during creation or values from a previous result.
100              
101             To access arguments passed to the instance during creation the following simple syntax is used:
102              
103             #{args:key}
104              
105             The word key has to be replaced with the key of the hash containing the arguments. This will lead to the replacement of C<#{args:key}> with the appropriate value from the hash.
106              
107             Accessing previous results basically goes the same way, some example show, that it even offers more possibilities:
108              
109             #{0:0:/foo}
110             #{4:1-3:/foo/too}
111             #{1::/foo/@argument}
112             #escape{0:0:/foo}
113             #escape{4:1-3:/foo/too}
114             #escape{1::/foo/@argument}
115              
116             The first three example and the last three examples have only one difference, which is the word escape. This command simply tells the module to url-escape the value that is returned by that later part of the expression.
117              
118             Let's look at these expressions in detail:
119              
120             The first part (the number before the first colon) specifies the index (starting with 0) of the request which we want to access. This index can be mapped directly to the session description file.
121              
122             The second part (between the first and the second colon) specifies the subrequest results (more about subrequests later) that will be looked at. 0 in the first example specifies the first subrequest. 1-3 in the second example specifies the subrequests 2,3 and 4 (remember, we begin indexing with 0). The third example accesses all subrequests.
123              
124             The last part (after the second colon) specifies an XPath-Expression, which is looked up in each of the subrequest results and a list of all values which match the Expression is generated.
125              
126             This list is taken and each value of the list will replace the whole replacement expression, and for each replacement one http request is made.
127              
128             Naturally if the url or the content contains more than one replacement expression all possible combinations are requested (which actually is the product of the different numbers of matching XPath-Expressions).
129              
130             These different http requests make up the subrequests which are stored and can be accessed, when needed. Please not that subrequests can be merged into a singele subrequest result using L.
131              
132             =head2 CREATING A NEW BROWSER OBJECT
133              
134             To create a new browser object the L-method is called, with an optional hash containing options.
135              
136             $browser = WWW::Meta::XML::Browser->new(%options);
137              
138             The following options are possible:
139              
140             args => \%args
141              
142             C<\%args> is the pointer to a hash which values can be accessed from the session description file by their keys. The syntax to access the hash values from the session file is C<#{args:key}>, where key is a key from the hash.
143              
144             debug => 1
145              
146             When the debug option is set, the module produces a lot of debug output about execution times.
147              
148             debug_callback => \&debug
149              
150             C<\&debug> has to be a pointer to a subroutine taking two parameters. The first parameter is a number >= 0 which describes the logging level. The second parameter is the string which is the message to be printed.
151             Please note that there is a default routine L<_debug()>.
152              
153             result_doc_callback => \&result
154              
155             C<\&result> has to be a pointer to a subroutine taking one parameter. The parameter is an instance of C and can be manipulated. The subroutine must return an instance of C.
156             Please note that there is a default routine L<_result()>.
157              
158             callbacks => \%callbacks
159              
160             C<\%callbacks> is a pointer to a hash of references to subroutines. These subroutines can be used in various situations during the processing of the XML file.
161              
162             =head2 PROCESSING A SESSION DESCRIPTION FILE
163              
164             To read the session description file one of the following methods is called, depending on the source of the file.
165              
166             $browser->process_file($file);
167             -or-
168             $browser->process_url($url);
169             -or-
170             $browser->process_string($string);
171             -or-
172             $browser->process_xml_doc($doc);
173              
174             The names of the methods should be self-explaining:
175              
176             L is called when the session description file is on a local disk an read by the script directly (this should be the most common case).
177              
178             L is called when the session description file is accessed by an http request.
179              
180             L is called when the session description data is available in a scalar variable.
181              
182             L is called when the XML document has already been parsed (as done by the three methods above and we have a instance of XML::LibXML::Document.
183              
184             =head2 PROCESSING THE REQUESTS FROM THE SESSION DESCRIPTION FILE
185              
186             After the session description file has been processed as shown above, the request nodes contained in the XML document can be processed.
187              
188             $browser->process_all_request_nodes();
189             -or-
190             while (my $r_node = $browser->get_next_request_node()) {
191             $subrequest_result = $browser->process_request_node($r_node);
192             }
193              
194             L encapsulates the second construction with the while loop.
195             Both constructions execute all http requests generated from the session description file and store the results of the (optionally transformed) requests.
196              
197             =head2 ACCESSING THE RESULTS
198              
199             The result of a spceific request can be accessed with a simple call which returns an instance of C.
200              
201             $result = $browser->get_request_result($request_index, $subrequest_index);
202              
203             To access the results one has to understand how results are stored. The results are stored in a two-dimensional array.
204              
205             The first index (which starts with 0 for the first request) describes the request which can be found in the session description file.
206              
207             The second index describes the real index after all permutations caused by possible replacements in the url or content have been generated.
208              
209             For example C<$browser-Eget_request_result(0, 2)> returns the result of the third request generated from the first request node in the session description file.
210              
211             =head1 EXPORT
212              
213             None by default.
214              
215             =cut
216              
217             require Exporter;
218              
219             our @ISA = qw(Exporter);
220              
221             # Items to export into callers namespace by default. Note: do not export
222             # names by default without a very good reason. Use EXPORT_OK instead.
223             # Do not simply export all your public functions/methods/constants.
224              
225             # This allows declaration use WWW::Meta::XML::Browser ':all';
226             # If you do not need this, moving things directly into @EXPORT or @EXPORT_OK
227             # will save memory.
228             our %EXPORT_TAGS = ( 'all' => [ qw(
229            
230             ) ] );
231              
232             our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
233              
234             our @EXPORT = qw(
235            
236             );
237              
238             our $VERSION = '0.08';
239              
240 1     1   7 use Digest::MD5;
  1         7  
  1         72  
241 1     1   1034 use HTTP::Cookies;
  1         16356  
  1         33  
242 1     1   796 use HTTP::Request;
  1         47417  
  1         34  
243 1     1   40157 use LWP::UserAgent;
  1         44906  
  1         49  
244 1     1   2894 use Time::HiRes;
  1         4282  
  1         9  
245 1     1   291 use URI::Escape;
  1         2  
  1         139  
246 1     1   1065 use XML::LibXML;
  0            
  0            
247             use XML::LibXSLT;
248              
249             my $ROOT_XPATH = '/www-meta-xml-browser';
250              
251             my $META_XPATH = $ROOT_XPATH.'/meta';
252             my $PERL_META_XPATH = $META_XPATH.'/perl';
253             my $CALLBACK_XPATH = $PERL_META_XPATH.'/callback';
254              
255             my $REQUEST_XPATH = $ROOT_XPATH.'/request';
256             my $AUTHORIZATION_XPATH = './authorization';
257             my $CONTENT_XPATH = './content';
258              
259             my $XPATH_REGEXP = '\#(escape)*\{(\d+?):(.*?):(.+?)\}';
260             my $ARGS_REGEXP = '\#(escape)*\{args:(.*?)\}';
261              
262             my $URL_ATTRIBUTE = 'url';
263             my $METHOD_ATTRIBUTE = 'method';
264             my $RESULT_CALLBACK_ATTRIBUTE = 'result-callback';
265             my $STYLESHEET_ATTRIBUTE = 'stylesheet';
266              
267             my $CALLBACK_NAME_ATTRIBUTE = 'name';
268              
269             my $EMBEDDED_REQUEST_CONTAINER_ATTRIBUTE = 'container';
270              
271             my $XML_VERSION = '1.0';
272              
273             my $USER_AGENT = "WWW::Meta::XML::Browser ".$VERSION;
274             my $TIMEOUT = 30;
275              
276             =head1 METHODS
277              
278             The following methods are available:
279              
280             =over 4
281              
282             =cut
283              
284              
285              
286             =item $browser = WWW::Meta::XML::Browser->new(%options);
287              
288             This class method contructs a new C object and returns a reference to it.
289              
290             The hash C<%options> can be used to control the behaviour of the module and to provide some data to it as well. At the moment the following Key/Value pairs are supported:
291              
292             KEY: VALUE: DESCRIPTION:
293             --------------- ----------- -------------
294             args \%args a pointer to a hash of arguments which can be used in
295             requests
296             debug 0/1 a boolean true or boolean false value can be passed to
297             the module to control weather debugging information are
298             printed or not
299             debug_callback \&debug a pointer to a debug-callback
300             result_doc_callback \&result a pointer to a result-doc-callback
301             callbacks \%callbacks a pointer to a hash of subroutines which can be used as
302             callbacks in different situations
303              
304             =cut
305              
306             sub new {
307             my $type = shift;
308             my (%cnf) = @_;
309            
310             my $this = {};
311            
312             bless $this, $type;
313              
314             $this->{debug_callback} = \&_debug;
315             $this->{result_doc_callback} = \&_result;
316             $this->{callbacks} = {};
317            
318             $this->{args} = $cnf{'args'} if $cnf{'args'};
319             $this->{debug} = 1 if $cnf{'debug'};
320             $this->{debug_callback} = $cnf{'debug_callback'} if $cnf{'debug_callback'};
321             $this->{result_doc_callback} = $cnf{'result_doc_callback'} if $cnf{'result_doc_callback'};
322             $this->{callbacks} = $cnf{'callbacks'} if $cnf{'callbacks'};
323              
324             $this->{request_nodes} = ();
325              
326             $this->{request_results} = ();
327              
328             $this->{ua} = LWP::UserAgent->new(cookie_jar => HTTP::Cookies->new(), requests_redirectable => ['GET', 'POST', 'HEAD']);
329             $this->{ua}->agent($USER_AGENT);
330             $this->{ua}->timeout($TIMEOUT);
331             &{$this->{debug_callback}}(0, "LWP::UserAgent created") if $this->{debug};
332              
333             $this->{xml_parser} = XML::LibXML->new();
334             $this->{xml_parser}->validation(1);
335             $this->{xml_parser}->load_ext_dtd(1);
336             &{$this->{debug_callback}}(0, "XML::LibXML-Parser created") if $this->{debug};
337            
338             $this->{xml_doc} = undef;
339            
340             return $this;
341             }
342              
343              
344              
345             =item process_url($url);
346              
347             Reads the XML file containing session description from the specified url and constructs a XML document from it which is then passed to L.
348              
349             =cut
350              
351             sub process_url {
352             my $this = shift;
353             my ($url) = @_;
354              
355             &{$this->{debug_callback}}(0, "process_url() called") if $this->{debug};
356             my $source = LWP::UserAgent::get($url);
357             &{$this->{debug_callback}}(1, "LWP::UserAgent::get($url) succeeded") if $this->{debug};
358              
359             my $parser = XML::LibXML->new();
360             $parser->recover(1);
361             my $doc = $parser->parse_html_string($source);
362            
363             &{$this->{debug_callback}}(1, "parse_html_string() succeeded") if $this->{debug};
364              
365             $this->process_xml_doc($doc);
366             }
367              
368              
369              
370             =item process_file($file);
371              
372             Reads the XML file containing session description and constructs a XML document from it which is then passed to L.
373              
374             =cut
375              
376             sub process_file {
377             my $this = shift;
378             my ($file) = @_;
379              
380             &{$this->{debug_callback}}(0, "process_file() called") if $this->{debug};
381             my $doc = $this->{xml_parser}->parse_file($file);
382             &{$this->{debug_callback}}(1, "parse_file($file) succeeded") if $this->{debug};
383            
384             $this->process_xml_doc($doc);
385             }
386              
387              
388              
389             =item process_string($string);
390              
391             Constructs a XML document from the given string which is then passed to L.
392              
393             =cut
394              
395             sub process_string {
396             my $this = shift;
397             my ($string) = @_;
398            
399             &{$this->{debug_callback}}(0, "process_string() called") if $this->{debug};
400             my $doc = $this->{xml_parser}->parse_string($string);
401              
402             $this->process_xml_doc($doc);
403             }
404              
405              
406              
407             =item process_xml_doc($doc);
408              
409             Takes the given XML ocument and reads the request-nodes in the XML file. These request nodes are stored internally to be processed.
410              
411             =cut
412              
413             sub process_xml_doc {
414             my $this = shift;
415             my ($doc) = @_;
416              
417             &{$this->{debug_callback}}(0, "xml_doc stored") if $this->{debug};
418             $this->{xml_doc} = $doc;
419              
420             &{$this->{debug_callback}}(0, "process_xml_doc() called") if $this->{debug};
421             my $r_nodeset = $doc->findnodes($REQUEST_XPATH);
422              
423             foreach my $r_node ($r_nodeset->get_nodelist()) {
424             push(@{$this->{request_nodes}}, $r_node);
425             }
426              
427             &{$this->{debug_callback}}(1, ($#{$this->{request_nodes}} + 1)." request nodes read") if $this->{debug};
428             }
429              
430              
431              
432             =item $node = get_next_request_node();
433              
434             Returns the next request-node which than can be processed using L
435              
436             =cut
437              
438             sub get_next_request_node {
439             my $this = shift;
440              
441             return shift(@{$this->{request_nodes}});
442             }
443              
444              
445              
446             =item process_all_request_nodes();
447              
448             Iterates over all request nodes and processes each of them.
449              
450             =cut
451              
452             sub process_all_request_nodes {
453             my $this = shift;
454              
455             while (my $r_node = $this->get_next_request_node()) {
456             push(@{$this->{request_results}}, $this->process_request_node($r_node));
457             }
458             }
459              
460              
461              
462             =item $subrequest_result = process_request_node($r_node);
463              
464             Processes the request node. This subroutine does the actual work:
465             It generates all permutations of the url
466             It genarates all permutations of the content
467             It generates all permutations ot the url and the content
468             It makes the requests and processes the results
469             it returns the (optionally transformed) results
470              
471             =cut
472              
473             sub process_request_node {
474             my $this = shift;
475             my ($r_node) = @_;
476              
477             &{$this->{debug_callback}}(0, "process_request_node() called") if $this->{debug};
478             &{$this->{debug_callback}}(1, "processing url: ".$r_node->getAttribute($URL_ATTRIBUTE)) if $this->{debug};
479              
480             my @processed_url = ();
481             $this->parse_string($r_node->getAttribute($URL_ATTRIBUTE), \@processed_url);
482              
483             if ($this->{debug}) {
484             foreach my $url (@processed_url) {
485             &{$this->{debug_callback}}(2, "expanded url: ".$url) if $this->{debug};
486             }
487             }
488              
489            
490             # process the content specified for the request
491             my $c_nodeset = $r_node->findnodes($CONTENT_XPATH);
492              
493             &{$this->{debug_callback}}(1, "processing content") if $this->{debug};
494              
495             my @processed_content = $this->process_content_nodeset($c_nodeset);
496              
497             if ($this->{debug}) {
498             foreach my $content (@processed_content) {
499             &{$this->{debug_callback}}(2, "expanded content: ".$content) if $this->{debug};
500             }
501             }
502              
503              
504             my @subrequest_result = ();
505              
506             foreach my $url (@processed_url) {
507              
508             foreach my $content (@processed_content) {
509             my ($res, $doc);
510            
511             $res = $this->make_request($url, $r_node->getAttribute($METHOD_ATTRIBUTE), $content);
512              
513             my $result_callback = $r_node->getAttribute($RESULT_CALLBACK_ATTRIBUTE);
514              
515             if ($res && $result_callback) {
516             my ($result, $callback);
517            
518             &{$this->{debug_callback}}(1, "result callback called: ".$result_callback."(\$res->content())") if $this->{debug};
519            
520             if ($callback = $this->_read_callback($result_callback)) {
521              
522             my $t0 = [Time::HiRes::gettimeofday()] if $this->{debug};
523              
524             $result = &{$callback}($res->content());
525            
526             &{$this->{debug_callback}}(3, "time to process callback \"".$result_callback."\": ".Time::HiRes::tv_interval($t0)) if $this->{debug};
527              
528             $doc = $this->process_result($result, $r_node->getAttribute($STYLESHEET_ATTRIBUTE));
529             }
530             else {
531             $doc = $this->process_result_doc($res, $r_node->getAttribute($STYLESHEET_ATTRIBUTE));
532             }
533             }
534             elsif ($res) {
535             $doc = $this->process_result_doc($res, $r_node->getAttribute($STYLESHEET_ATTRIBUTE));
536             }
537            
538             if ($doc) {
539             push(@subrequest_result, $doc);
540             }
541             }
542              
543             }
544              
545             return \@subrequest_result;
546             }
547              
548              
549              
550             =item @processed_content = process_content_nodeset($c_nodeset);
551              
552             Processes a content nodeset and generates all possible permutations by replacing the tokens.
553              
554             =cut
555              
556             sub process_content_nodeset {
557             my $this = shift;
558              
559             my ($c_nodeset) = @_;
560              
561             my @content;
562            
563             foreach my $c_node ($c_nodeset->get_nodelist()) {
564            
565             my $content = $c_node->string_value();
566            
567             # strip all whitespaces
568             $content =~ s/\s*//gs;
569            
570             # strip leading '&'s
571             $content =~ s/^&*//gs;
572            
573             my $ctx = Digest::MD5->new();
574             $ctx->add($content);
575             my $digest = $ctx->hexdigest();
576            
577             $content =~ s/&/$digest/gis;
578            
579             my @raw_content = split(/&/, $content);
580              
581             foreach my $pair (@raw_content) {
582             $pair =~ s/$digest/&/gis;
583            
584             my ($name, $value);
585            
586             if ($pair =~ /(.+?)=(.*)/) {
587             ($name, $value) = ($1, $2);
588            
589             if (($value !~ /$XPATH_REGEXP/) && ($value !~ /$ARGS_REGEXP/)) {
590             $value = uri_escape($value);
591             }
592            
593             push(@content, $name.'='.$value);
594             }
595             else {
596             if (($pair !~ /$XPATH_REGEXP/) && ($pair !~ /$ARGS_REGEXP/)) {
597             $value = uri_escape($pair);
598             }
599             else {
600             $value = $pair;
601             }
602            
603             push(@content, $value);
604             }
605             }
606             }
607            
608             my $content = join('&', @content);
609            
610             my @processed_content = ();
611             $this->parse_string($content, \@processed_content);
612              
613             return @processed_content;
614             }
615              
616              
617              
618             =item make_request($url, $method, $content);
619              
620             Makes a request to C<$url> sending the C<$content> using C<$method> and returns the result. If a username and a password have bee specified within the url, they will be used for HTTP-Basic authentication if necessary.
621              
622             =cut
623              
624             sub make_request {
625             my $this = shift;
626             my ($url, $method, $content) = @_;
627              
628             my $username = undef;
629             my $password = undef;
630              
631             if ($url =~ /^(http:\/\/)(.+?):(.+?)\@(.+)$/) {
632             my $username = $2;
633             my $password = $3;
634             my $url = $1.$4;
635             }
636              
637             &{$this->{debug_callback}}(1, "make_request() called") if $this->{debug};
638             &{$this->{debug_callback}}(2, "url: ".$url) if $this->{debug};
639             &{$this->{debug_callback}}(2, "content: ".$content) if $this->{debug};
640             &{$this->{debug_callback}}(2, "method: ".$method) if $this->{debug};
641              
642             if (defined($username) && defined($password)) {
643             &{$this->{debug_callback}}(2, "authorization: ".$username." ".$password) if $this->{debug};
644             }
645              
646             my $t0 = [Time::HiRes::gettimeofday()] if $this->{debug};
647            
648             my $req;
649              
650             if ($method =~ /get/i) {
651             $req = HTTP::Request->new('GET' => $url.'?'.$content);
652             }
653             elsif ($method =~ /post/i) {
654             $req = HTTP::Request->new('POST' => $url);
655             $req->content_type('application/x-www-form-urlencoded');
656             $req->content($content);
657             }
658              
659             if (defined($username) && defined($password)) {
660             $req->authorization($username => $password)
661             }
662            
663             my $res = $this->{ua}->request($req);
664            
665             &{$this->{debug_callback}}(2, "time: ".Time::HiRes::tv_interval($t0)) if $this->{debug};
666              
667             if ($res->is_success()) {
668             return $res;
669             }
670             elsif ($res->is_redirect()) {
671             warn "Redirect (".$res->code().") to \"".$res->headers->header('Location')."\"\n";
672             return 0;
673             }
674             else {
675             warn "Error (".$res->code().") while processing request result from ".$method."-request to ".$url." with content ".$content."\n";
676             warn $res->content()."\n";
677             return 0;
678             }
679             }
680              
681              
682              
683             =item $doc = process_result_doc($res, $stylesheet);
684              
685             Processes the result (C<$res>) as returned by L by transforming it into a XML document.
686             Internally L is called with C<$res>->content() and C<$stylesheet>.
687              
688             =cut
689              
690             sub process_result_doc {
691             my $this = shift;
692             my ($res, $stylesheet) = @_;
693              
694             return $this->process_result($res->content(), $stylesheet);
695             }
696              
697              
698              
699             =item $doc = process_result($result, $stylesheet);
700              
701             Processes the result-string (C<$result>) by transforming it into a XML document.
702             If a XSL-Stylesheet (C<$stylesheet>) has been specified for the given request the XML document will be transformed using that stylesheet.
703             The resulting XML document is then returned.
704              
705             =cut
706              
707             sub process_result {
708             my $this = shift;
709             my ($result, $stylesheet) = @_;
710              
711             &{$this->{debug_callback}}(1, "process_result() called") if $this->{debug};
712            
713             # the result doc is undef by default and will not change if the request was not successfull
714             my $doc = undef;
715              
716             my $t0 = [Time::HiRes::gettimeofday()] if $this->{debug};
717              
718             # create a parser for the result
719             my $parser = XML::LibXML->new();
720             $parser->recover(1);
721              
722             # parse the html and generate the result doc
723             $doc = $parser->parse_html_string($result);
724              
725             &{$this->{debug_callback}}(2, "time to parse html: ".Time::HiRes::tv_interval($t0)) if $this->{debug};
726              
727             # if a stylesheet has been specified use it to transform the result doc
728             if ($stylesheet) {
729             my $t0 = [Time::HiRes::gettimeofday()] if $this->{debug};
730              
731             my $style_doc = $parser->parse_file($stylesheet);
732              
733             my $xslt = XML::LibXSLT->new();
734             my $stylesheet = $xslt->parse_stylesheet($style_doc);
735              
736             # overwrite the old result doc with the new result doc
737             $doc = $stylesheet->transform($doc);
738              
739             &{$this->{debug_callback}}(2, "time to transform result: ".Time::HiRes::tv_interval($t0)) if $this->{debug};
740             }
741              
742              
743              
744             # processing embedded requests after having applied the stylesheet if it has been specified
745             my $doc_string = $doc->toString();
746              
747              
748             my $contains_embedded_request = 0;
749             if ($doc_string =~ /()/gis) {
750             $doc_string =~ s/()/$this->process_embedded_request($parser->parse_string($1)->getDocumentElement())/egis;
751             $contains_embedded_request = 1;
752             }
753             if ($doc_string =~ /(.+?<\/www-meta-xml-browser-request>)/gis) {
754             $doc_string =~ s/(.+?<\/www-meta-xml-browser-request>)/$this->process_embedded_request($parser->parse_string($1)->getDocumentElement())/egis;
755             $contains_embedded_request = 1;
756             }
757              
758             if ($contains_embedded_request) {
759             $doc = $parser->parse_string($doc_string);
760             }
761              
762             return &{$this->{result_doc_callback}}($doc);
763             }
764              
765              
766              
767             =item $xml_string = process_embedded_request($embedded_request_node);
768              
769             Processes an embedded request node, by processing it as a normal node (using L).
770             If the embedded request node returns only one XML document it is transformed to a string and returned.
771             If the embedded request node returns more than one XML documents they are merged unded the name specified by the C<$EMBEDDED_REQUEST_CONTAINER_ATTRIBUTE>-attribute of the embedded requst node.
772              
773             =cut
774              
775             sub process_embedded_request {
776             my $this = shift;
777             my ($er_node) = @_;
778            
779             my $subrequest_result = $this->process_request_node($er_node);
780              
781             if (scalar(@{$subrequest_result}) > 1) {
782             my $doc = $this->merge_xml_array($subrequest_result, $er_node->getAttribute($EMBEDDED_REQUEST_CONTAINER_ATTRIBUTE));
783             return $doc->documentElement()->toString();
784             }
785             else {
786             return ${$subrequest_result}[0]->documentElement()->toString();
787             }
788             }
789              
790              
791              
792             =item $result = get_request_result($request_index, $subrequest_index);
793              
794             Returns the request-result specified by C<$request_index> and C<$subrequest_index>.
795              
796             =cut
797              
798             sub get_request_result {
799             my $this = shift;
800             my ($request_index, $subrequest_index) = @_;
801              
802             return ${$this->{request_results}}[$request_index][$subrequest_index];
803             }
804              
805              
806              
807             =item print_all_request_results();
808              
809             Iterates over all the request results and prints them.
810              
811             =cut
812              
813             sub print_all_request_results {
814             my $this = shift;
815              
816             my @requests = @{$this->{request_results}};
817             my $r = 0;
818              
819             foreach my $request (@requests) {
820             my @subrequests = @{$request};
821              
822             print "-------------------- REQUEST (".($r++ + 1)."/".($#requests + 1).") --------------------\n";
823              
824             my $s = 0;
825            
826             foreach (@subrequests) {
827             print "-------------------- SUBREQUEST (".($s++ + 1)."/".($#subrequests + 1).")--------------------\n";
828             $this->print_request_result($_);
829             }
830             }
831             }
832              
833              
834              
835             =item print_request_result($result);
836              
837             Prints the specified request result.
838              
839             =cut
840              
841             sub print_request_result {
842             my $this = shift;
843             my ($doc) = @_;
844            
845             print $doc->toString();
846             }
847              
848              
849              
850             =item merge_subrequests($request_index, $wrapper_name);
851              
852             Merges the subrequest of the request (specified by C<$request_index>) in a new XML document which consists of a new root element (C<$wrapper_name>) and all the subrequests as children of this root element.
853              
854             =cut
855              
856             sub merge_subrequests {
857             my $this = shift;
858             my ($request_index, $wrapper_name) = @_;
859              
860             my $doc = $this->merge_xml_array($this->{request_results}->[$request_index], $wrapper_name);
861            
862             my @doc = ($doc);
863             $this->{request_results}->[$request_index] = \@doc;
864             }
865              
866              
867              
868             =item merge_xml_array($array, $wrapper_name)
869              
870             Merges the XML documents in C<@{$array}> by building a new XML document with a new root element (C<$wrapper_name>) and the XML documents in C<@{$array}> as children of the root element.
871              
872             =cut
873              
874             sub merge_xml_array {
875             my $this = shift;
876             my ($array, $wrapper_name) = @_;
877              
878             my $root = XML::LibXML::Element->new($wrapper_name);
879              
880             foreach my $xml (@{$array}) {
881             $root->appendChild($xml->documentElement());
882             }
883              
884             my $doc = XML::LibXML->createDocument($XML_VERSION);
885             $doc->setDocumentElement($root);
886              
887             return $doc;
888             }
889              
890              
891              
892             =item parse_string($s, $r);
893              
894             Recursively parses the string passed as C<$s> and writes the replacement results to C<@{$r}>, which will be an array containing all possible permutations, created by the replacement of the specified tokens.
895              
896             =cut
897              
898             sub parse_string {
899             my $this = shift;
900             my ($s, $r) = @_;
901              
902             if ($s =~ /(.*?)$XPATH_REGEXP(.*)/) {
903             my $pre = $1;
904             my $escape = $2;
905             my $request_index = $3;
906             my $subrequest_index = $4;
907             my $xpath = $5;
908             my $post = $6;
909              
910             my @subrequests = @{$this->{request_results}->[$request_index]};
911            
912             my @xml_docs = ();
913              
914             if ($subrequest_index =~ /^(\d*)-(\d*)$/) {
915             my $start = $1;
916             my $end = $2;
917            
918             if (!$start) {
919             $start = 0;
920             }
921             if (!$end) {
922             $end = $#subrequests;
923             }
924            
925             for (my $i = $start; $i <= $end; $i++) {
926             push(@xml_docs, $subrequests[$i]);
927             }
928             }
929             elsif ($subrequest_index =~ /^(\d+)$/) {
930             my $index = $1;
931            
932             push(@xml_docs, $subrequests[$index]);
933             }
934             else {
935             my $start = 0;
936             my $end = $#subrequests;
937              
938             for (my $i = $start; $i <= $end; $i++) {
939             push(@xml_docs, $subrequests[$i]);
940             }
941             }
942              
943             foreach my $xml_doc (@xml_docs) {
944             my $nodeset = $xml_doc->findnodes($xpath);
945            
946             my $i = 0;
947            
948             my @return = ();
949            
950             foreach my $node ($nodeset->get_nodelist()) {
951             my $value = $node->string_value();
952            
953             if ($escape) {
954             $value =~ s/[\s]*(.*?)[\s]*/uri_escape($1)/egs;
955             }
956             else {
957             $value =~ s/[\s]*(.*?)[\s]*/$1/egs;
958             }
959            
960             $this->parse_string($pre.$value.$post, $r, $escape);
961             }
962             }
963             }
964             elsif ($s =~ /(.*?)$ARGS_REGEXP(.*)/) {
965             my $pre = $1;
966             my $escape = $2;
967             my $arg = $3;
968             my $post = $4;
969              
970             if ($escape) {
971             $this->parse_string($pre.uri_escape($this->{args}->{$arg}).$post, $r, $escape);
972             }
973             else {
974             $this->parse_string($pre.$this->{args}->{$arg}.$post, $r, $escape);
975             }
976             }
977             else {
978             push(@{$r}, $s);
979             }
980             }
981              
982              
983              
984             =item $callback = _read_callback($result_callback);
985              
986             Reads the callback from the callbacks hash or from the XML file and returns a reference to it. If the callback can not be found 'undef' is returned.
987              
988             =cut
989              
990             sub _read_callback {
991             my $this = shift;
992             my ($result_callback) = @_;
993            
994             if (ref($this->{callbacks}->{$result_callback}) eq 'CODE') {
995             &{$this->{debug_callback}}(2, "read result callback \"".$result_callback."\" from callback hash") if $this->{debug};
996             return $this->{callbacks}->{$result_callback};
997             }
998             else {
999             my $perl = $this->{xml_doc}->findvalue($CALLBACK_XPATH."[\@".$CALLBACK_NAME_ATTRIBUTE." = '".$result_callback."']");
1000             eval('$this->{callbacks}->{$result_callback} = '.$perl.';');
1001              
1002             if (ref($this->{callbacks}->{$result_callback}) eq 'CODE') {
1003             &{$this->{debug_callback}}(2, "installed result callback \"".$result_callback."\" from XML file in callback hash") if $this->{debug};
1004             return $this->_read_callback($result_callback);
1005             }
1006             else {
1007             &{$this->{debug_callback}}(2, "callback \"".$result_callback."\" was not found") if $this->{debug};
1008             return undef;
1009             }
1010             }
1011             }
1012              
1013              
1014              
1015             =item _debug($l, $msg);
1016              
1017             Default debug-callback. Prints C<$msg> as a debugging message to STDERR. C<$l> gives information about the logging level.
1018              
1019             =cut
1020              
1021             sub _debug {
1022             my ($l, $msg) = @_;
1023              
1024             print STDERR " " x $l;
1025             print STDERR "DEBUG: ".$msg."\n";
1026             }
1027              
1028              
1029              
1030             =item $doc = _result($doc);
1031              
1032             Default result-doc-callback. Just returns C<$doc> as it was passed to the subroutine.
1033              
1034             =cut
1035              
1036             sub _result {
1037             my ($doc) = @_;
1038              
1039             return $doc;
1040             }
1041              
1042              
1043              
1044             # Preloaded methods go here.
1045              
1046             1;
1047             __END__