File Coverage

blib/lib/PBib/PBib.pm
Criterion Covered Total %
statement 141 190 74.2
branch 27 64 42.1
condition 3 11 27.2
subroutine 26 32 81.2
pod 11 15 73.3
total 208 312 66.6

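The summary figures above can be cross-checked with a few lines of Perl. This is only a sketch: the covered/total pairs are copied from the table, while the `percent` helper and the truncation to one decimal place are assumptions that happen to reproduce every percentage shown; the report generator's own formatting code is not part of this listing.

```perl
use strict;
use warnings;

# Covered/total pairs copied from the summary table.
my @rows = (
    [ statement  => 141, 190 ],
    [ branch     =>  27,  64 ],
    [ condition  =>   3,  11 ],
    [ subroutine =>  26,  32 ],
    [ pod        =>  11,  15 ],
);

# Hypothetical helper: truncate (do not round) to one decimal place.
# Truncation is an assumption; it matches the figures in the table.
sub percent {
    my ($covered, $total) = @_;
    return sprintf('%.1f', int(1000 * $covered / $total) / 10);
}

my ($covered_sum, $total_sum) = (0, 0);
for my $row (@rows) {
    my ($name, $covered, $total) = @$row;
    $covered_sum += $covered;
    $total_sum   += $total;
    printf "%-10s %4d %4d %5s\n", $name, $covered, $total, percent($covered, $total);
}

# The "total" row is the sum over all criteria.
printf "%-10s %4d %4d %5s\n", 'total', $covered_sum, $total_sum,
    percent($covered_sum, $total_sum);
```

The sums come out to 208 covered of 312, matching the "total" row.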

line stmt bran cond sub pod time code
1             # --*-Perl-*--
2             # $Id: PBib.pm 24 2005-07-19 11:56:01Z tandler $
3             #
4            
5             =head1 NAME
6            
7             PBib::PBib - Something like BibTeX, but written in perl and designed to be extensible in three dimensions: bibliographic databases (e.g. BibTeX, OpenOffice), document file formats (e.g. Word, RTF, OpenOffice), styles (e.g. ACM, IEEE)
8            
9             =head1 SYNOPSIS
10            
11             use PBib::PBib;
12             use Biblio::Biblio;
13             my $bib = new Biblio::Biblio();
14             my $pbib = new PBib::PBib('refs' => $bib->queryPapers());
15             $pbib->convertFile($file);
16            
17             =head1 DESCRIPTION
18            
19             I wrote PBib to have something like BibTeX for MS Word that can use various sources for bibliographic references: not just BibTeX files, but also database systems. In particular, I wanted to use the StarOffice bibliographic database.
20            
21             Now, PBib can be extended in three dimensions:
22            
23             =over
24            
25             =item - bibliographic styles
26            
27             such as ACM style or IEEE style.
28            
29             =item - document format
30            
31             such as Plain text, (La)TeX, Word, RTF, OpenOffice
32            
33             =item - bibliographic database format
34            
35             such as bibtex, refer, tib, but also database systems with different mappings to database fields.
36            
37             =back
38            
39             =head1 QUICK START
40            
41             =head2 SETUP BIBLIOGRAPHY DATABASE
42            
43             Once you've installed the distribution you have to set up a bibliography database in order to start using PBib and PBibTk.
44            
45             Several formats are supported:
46            
47             =over
48            
49             =item - Perl:DBI databases
50            
51             You can configure the database schema to use, see F, F and some for DBMSs, see
52             F, F.
53             You can C the files in your F file if you are
54             using one of these systems.
55            
56             =item - bibtex files
57            
58             =item - several other file types that are supported by the bp package.
59            
60             =back
61            
62             I'd recommend using a MySQL database; this works fine for me.
63             See the config/sample user.pbib file for some examples.
64            
65             You should specify your default settings in a user.pbib file, which is searched for in several places, e.g. your home directory. (On Windows, check that the HOME environment variable is set.) In case you want to provide defaults for your organization, use the local.pbib file.
66            
67             You can adapt the mapping of PBib fields to DB fields; see the file config/OOo-table.pbib for an example if you want to use an OpenOffice.org bibliography database.
68            
69             PBib provides no support for editing the bibliography database, as there are lots of tools around. Check docs/Edit_Bibliography.sxw for an OpenOffice.org document to edit a bibliography database. (That's the form that I use.) Ensure that it's attached to the correct database (Tools>>Data Sources, Edit>>Exchange Database).
70            
71            
72             =head2 CREATE INPUT DOCUMENTS
73            
74             =over
75            
76             =item Cite references
77            
78             In your documents, use [[Cite-Key]] (double brackets) to place references in the document. PBib replaces these with a reference formatted according to the selected style, e.g. (Tandler, 2004).
79            
80             The Cite-Key is the key defined in the bibliography database.
81            
82             =item Generate the list of references used
83            
84             Use [{}] as the placeholder for the list of references.
85            
86             =back
87            
88             See L for a more detailed description.
89             You can find sample files in the test folder F.
90            
91             =head2 Supported document formats
92            
93             =over
94            
95             =item - MS Word .doc, .rtf
96            
97             .doc will be converted to .rtf before processing (requires MS Word to be installed)
98            
99             =item - Plain Text
100            
101             TeX input is currently handled as plain text; there is no specific style for TeX yet.
102            
103             =item - OpenOffice .sxw
104            
105             An OpenOffice Text file (.sxw) is actually a zipped XML document. (You need the L and L modules to use this.)
106            
107             =back
108            
109             Not yet supported:
110            
111             =over
112            
113             =item LaTeX and TeX
114            
115             Should generate something similar to BibTeX. But wait, if you write with TeX, you can I<use> BibTeX!
116            
117             For now, this is treated as plain text.
118            
119             =item HTML
120            
121             For now, this is treated as plain text.
122            
123             At minimum, the correct character encoding should be ensured and
124             some formatting provided for the References section.
125            
126             =item XML
127            
128             There is support for XML, but of course the generic XML support is very limited. A future version might support DocBook, or provide an easy way to specify the tags to be used.
129            
130             =back
131            
132             =head2 RUN PBIB
133            
134             The following scripts are provided as front ends for the modules:
135            
136            
137             bin/pbib.pl <>
138            
139             Process an input document and write the converted output to a new file
140             named after the input file, with C<-pbib> inserted before the file extension.
141            
142            
143             bin/PBibTk.pl [<>]
144            
145             Open a Tk GUI that allows you to browse your bibliography database and browse the items referenced in your document.
146            
147            
148             =head1 SUCCESS STORIES ;-)
149            
150             I've used PBib/PBibTk to format citations and generate the bibliography for my thesis and several other papers;
151             in fact, I wrote it as I couldn't find another tool that matched my requirements.
152             To get an idea of the scope that PBib can handle: my thesis references about 360 papers, the database has more than 900 entries, and the thesis converted to an RTF file is about 50 MB. You may want to have a look at
153             L or
154             L.
155            
156             The bibliographic database I used is available in BibTeX format at L (with lots of HCI, CSCW, UbiComp references).
157            
158            
159             =head1 CONFIGURATION
160            
161             You can configure PBib in a number of ways, e.g. using config files and
162             environment variables. For detailed information, please refer to
163             module L.
164            
165             You can use a filename.pbib config file to specify configuration that applies to a single file.
166            
167             =head2 Environment Variables
168            
169             =over
170            
171             =item PBIBDIR
172            
173             The directory where the PBib scripts are located, e.g. /usr/local/bin.
174            
175             =item PBIBPATH
176            
177             Path to look for config files (and also styles), separated by ';'.
178            
179             =item PBIBSTYLES
180            
181             Path to look for PBib styles, separated by ';'.
182            
183             =item PBIBCONFIG
184            
185             Path to look for PBib config, separated by ';'.
186            
187             =item HOME
188            
189             If set, PBib looks for the user's personal config at
190            
191             =over
192            
193             =item $HOME/.pbib/styles
194            
195             =item $HOME/.pbib/conf
196            
197             =item $HOME
198            
199             =back
200            
201             =item APPDATA
202            
203             If set, PBib looks for the user's personal config at
204            
205             =over
206            
207             =item $APPDATA/PBib/styles
208            
209             =item $APPDATA/PBib/conf
210            
211             =back
212            
213             On Windows XP, $APPDATA points to something like "C:\Documents and Settings\<>\Application Data".
214            
215             =back
216            
217             =head2 Config Files
218            
219             I, look at L and
220             the examples provided with this distribution.
221            
222            
223             =head1 DEPENDENCIES
224            
225             PBib itself consists of three packages that can be used independently:
226            
227             =over
228            
229             =item Biblio
230            
231             Provides an interface to bibliographic databases. The main class is L.
232            
233             L uses L and L that encapsulate the "bp" package mentioned above.
234            
235             =item PBib
236            
237             Main functionality to process documents that contain references.
238            
239             PBib uses the format for references returned by Biblio, so the two are designed to be used together. However, PBib can be used with any hash of references that contains the same keys.
240            
241             The main class is L. The main script is L.
242            
243             =item PBibTk
244            
245             PBibTk provides a GUI for PBib. It uses PBib and Biblio.
246            
247             The main class is L. It is started with the script L.
248            
249             =back
250            
251             I've thought about deploying these as separate packages, but currently I believe that this way it's easier to install and use.
252            
253             This module requires these other modules and libraries:
254            
255             =over
256            
257             =item bp
258            
259             The Perl Bibliography Package "bp", by Dana Jacobsen (dana@acm.org) is used. An adapted version of it (with some bug fixes and
260             enhancements) is included in this distribution.
261            
262             In fact, bp is really helpful to generate the hashes with literature references from various sources.
263             Please check http://www.ecst.csuchico.edu/~jacobsd/bib/bp/ and the bp README located in F.
264            
265             =item Config::General
266            
267             by Thomas Linden
268            
269             =item Archive::Zip and XML::Parser
270            
271             for OpenOffice support.
272            
273             =back
274            
275            
276             =cut
277            
278             package PBib::PBib;
279 1     1   2541 use 5.006;
  1         5  
  1         43  
280 1     1   6 use strict;
  1         2  
  1         38  
281 1     1   5 use warnings;
  1         3  
  1         37  
282             #use English;
283            
284 1     1   1602 use Time::HiRes qw(gettimeofday tv_interval);
  1         2222  
  1         9  
285            
286            
287            
288             BEGIN {
289 1     1   255 use vars qw($Revision $VERSION);
  1         3  
  1         128  
290             # SVN for generating version numbers is somehow strange ...
291             # maybe there's a better way?
292 1 50   1   4 my $major = 2; q$Revision: 24 $ =~ /: (\d+)/; my $minor = $1 - 10; $VERSION = "$major." . ($minor<10 ? '0' : '') . $minor;
  1         6  
  1         5  
  1         35  
293             }
294            
295             # superclass
296             #use base qw(YYYY);
297            
298             # used modules
299             #use FileHandle;
300             #use File::Basename;
301 1     1   8 use Data::Dumper;
  1         3  
  1         77  
302            
303             # used own modules
304 1     1   706 use Biblio::BP;
  1         3  
  1         7  
305            
306 1     1   10 use PBib::Config;
  1         12  
  1         27  
307            
308 1     1   4610 use PBib::Document;
  1         4  
  1         38  
309            
310 1     1   2424 use PBib::ReferenceConverter;
  1         4  
  1         41  
311 1     1   795 use PBib::ReferenceStyle;
  1         4  
  1         58  
312 1     1   1689 use PBib::BibliographyStyle;
  1         5  
  1         56  
313 1     1   2288 use PBib::BibItemStyle;
  1         5  
  1         62  
314 1     1   1816 use PBib::LabelStyle;
  1         3  
  1         3154  
315            
316             # register extra reference converters
317             # the reference converters can extend the document classes
318             # to specify a different converter.
319             ##### use PBib::ReferenceConverter::MSWord; # to be able to convert word documents
320             ##### PBib::ReferenceConverter::MSWord is not yet working ...
321            
322            
323            
324             # binmode(STDOUT, ":locale");
325             # binmode(STDERR, ":locale");
326            
327            
328             =head1 METHODS
329            
330             These methods are exported.
331            
332             =over
333            
334             =cut
335            
336            
337             #
338             #
339             # constructor
340             #
341             #
342            
343             =item $conf = new PBib::PBib(I)
344            
345             Supported Options:
346            
347             =over
348            
349             =item refs
350            
351            
352             =item config
353            
354            
355             =item inDoc
356            
357            
358             =item outDoc
359            
360            
361             =back
362            
363             =cut
364            
365             sub new {
366 1     1 1 744 my $self = shift;
367 1   33     9 my $class = ref($self) || $self;
368 1         12 my %args = @_;
369             # foreach my $arg qw/XXX/ {
370             # print STDERR "argument $arg missing in call to new $class\n"
371             # unless exists $args{$arg};
372             # }
373 1         3 $self = \%args;
374 1         5 return bless $self, $class;
375             }
376            
377             #
378             #
379             # access methods
380             #
381             #
382            
383 2   50 2 1 10 sub refs { return shift->{'refs'} || {}; }
384 2     2 1 9 sub inDoc { return shift->{'inDoc'}; }
385 2     2 1 8 sub outDoc { return shift->{'outDoc'}; }
386             sub config {
387 26     26 1 457 my ($self) = @_;
388 26         60 my $config = $self->{'config'};
389 26 50       72 unless( $config ) {
390 0         0 $config = new PBib::Config();
391 0         0 $self->{'config'} = $config;
392             }
393 26         224 return $config;
394             }
395 14     14 0 30 sub beVerbose { my $self = shift; return $self->config()->beVerbose(); }
  14         34  
396 8     8 0 15 sub beQuiet { my $self = shift; return $self->config()->beQuiet(); }
  8         21  
397 0     0 0 0 sub options { my $self = shift; return $self->config()->options(@_); }
  0         0  
398 0     0 0 0 sub option { my ($self, $opt) = @_; return $self->options()->{$opt}; }
  0         0  
399            
400             #
401             #
402             # processing of documents
403             #
404             #
405            
406             =item $conv = $pbib->processFile($infile, $outfile, $config, $refs)
407            
408             Calls convertFile() & optionally opens result in editor.
409            
410             =cut
411            
412             sub processFile {
413 0     0 1 0 my ($self, $infile, $outfile, $config, $refs) = @_;
414 0 0       0 $config = $self->config() unless defined $config;
415 0         0 my $conv = $self->convertFile($infile, $outfile, $config, $refs, @_);
416 0 0       0 return unless $conv;
417 0         0 my $outDoc = $conv->outDoc();
418 0 0 0     0 if( $outDoc && $config->option('pbib.showresult') ) {
419 0         0 $outDoc->openInEditor();
420             }
421 0         0 return $conv;
422             }
423            
424            
425             #
426             #
427             # converting
428             #
429             #
430            
431             =item $conv = $pbib->convertFile($infile, $outfile, $config, $refs)
432            
433             If $infile (filename) is undef, inDoc (document) is used.
434            
435             If $outfile (filename) is undef, outDoc (document) is used.
436            
437             If $config or $refs is undef, the default values are used (the ones passed to the constructor).
438            
439             The converter $conv is returned to the caller.
440            
441             =cut
442            
443             sub convertFile {
444 2     2 1 6 my ($self, $infile, $outfile, $config, $refs) = @_;
445 2 50       13 $config = $self->config() unless defined $config;
446 2 50       16 $refs = $self->refs() unless defined $refs;
447            
448 2         21 my $start_time = [gettimeofday()];
449            
450             # create documents
451            
452 2         32 my $inDoc = $self->inDoc();
453 2         10 my $outDoc = $self->outDoc();
454            
455 2 50       8 if( defined $infile ) {
456 2 50       45 $inDoc = new PBib::Document(
457             'filename' => $infile,
458             'mode' => '<',
459             'verbose' => $self->beVerbose(),
460             'quiet' => $self->beQuiet(),
461 2         10 %{$config->{doc} || {}},
462             );
463 2 50       11 if( ! defined $outfile ) {
464 2 50       15 if( $infile =~ /\.(\w+)$/ ) {
465 2         7 $outfile = $infile;
466 2         20 $outfile =~ s/\.(\w+)$/-pbib\.$1/;
467             } else {
468 0         0 $outfile = "$infile-pbib";
469             }
470             }
471             }
472            
473 2 50       8 if( defined $outfile ) {
474 2 50       22 $outDoc = new PBib::Document(
475             'filename' => $outfile,
476             'mode' => '>',
477             'verbose' => $self->beVerbose(),
478             'quiet' => $self->beQuiet(),
479 2         9 %{$config->{doc} || {}},
480             );
481             }
482            
483 2 50       9 print STDERR "convert ", $inDoc->filename(), "\nwrite ", $outDoc->filename(), "\n" unless $self->beQuiet();
484            
485             # read config
486 2         12 my $options = $config->options('file' => $inDoc->filename());
487            
488             # create converter and styles
489            
490             # print STDERR Dumper $options;
491 2 50       6 my $rs = new PBib::ReferenceStyle(%{$options->{'ref'}||{}}, 'verbose' => $self->beVerbose());
  2         16  
492 2 50       7 my $bs = new PBib::BibliographyStyle(%{$options->{'bib'}||{}}, 'verbose' => $self->beVerbose());
  2         26  
493 2 50       5 my $is = new PBib::BibItemStyle(%{$options->{'item'}||{}}, 'verbose' => $self->beVerbose());
  2         16  
494 2 50       8 my $ls = new PBib::LabelStyle(%{$options->{'label'}||{}}, 'verbose' => $self->beVerbose());
  2         19  
495 2         18 my $conv = new PBib::ReferenceConverter(
496             'inDoc' => $inDoc,
497             'outDoc' => $outDoc,
498             'refStyle' => $rs,
499             'labelStyle' => $ls,
500             'bibStyle' => $bs,
501             'itemStyle' => $is,
502             'refOptions' => $options->{'ref'},
503             'bibOptions' => $options->{'bib'},
504             'itemOptions' => $options->{'item'},
505             'labelOptions' => $options->{'label'},
506             'verbose' => $self->beVerbose(),
507             'quiet' => $self->beQuiet(),
508             );
509            
510 2         14 $conv->convert($refs);
511 2         17 $inDoc->close();
512 2         9 $outDoc->close();
513            
514             # remember values
515 2         6 $self->{'inDoc'} = $inDoc;
516 2         6 $self->{'outDoc'} = $outDoc;
517 2         6 $self->{'refs'} = $refs;
518            
519 2         14 my $duration = tv_interval($start_time);
520 2         48 logStatistics("$outfile.log", $conv, $options, $duration);
521 2         21 return $conv;
522             }
523            
524             =item logStatistics($logfile, $conv, $options, $duration)
525            
526             Write log file.
527            
528             =cut
529            
530             sub logStatistics {
531 2     2 1 5 my ($logfile, $conv, $options, $duration) = @_;
532            
533 2         146 open LOG, ">:utf8", $logfile;
534 2         13 print LOG "pbib conversion statistics\n\n";
535            
536 2 50       11 if( ! defined $conv->inDoc() ) {
537 0         0 print LOG "There was an error opening the input document.\n";
538 0         0 close LOG;
539 0         0 return;
540             }
541 2         10 print LOG "read ", $conv->inDoc()->filename(), "\n";
542 2         10 print LOG "write ", $conv->outDoc()->filename(), "\n\n";
543            
544 2         10 my $messages = $conv->messages();
545 2 50 33     18 if( $messages && @$messages ) {
546 2         15 print LOG "\n\nMessages (", scalar(@$messages), " items)\n====\n\n";
547 2         6 foreach my $item (@$messages) {
548 30         53 print LOG "$item\n";
549             }
550             }
551            
552 2         11 my $todo = $conv->toDoItems();
553 2 100       8 if( @$todo ) {
554 1         7 print LOG "\n\nToDo (", scalar(@$todo), " items)\n====\n\n";
555 1 50       6 print STDERR "\n\nToDo (", scalar(@$todo), " items)\n====\n\n" unless $options->{'quiet'};
556 1         4 foreach my $item (@$todo) {
557 2         11 my $text = "par $item->{'par'}: $item->{'text'}\n";
558 2         3 print LOG $text;
559 2 50       9 print STDERR $text unless $options->{'quiet'};
560             }
561             }
562            
563 2         10 my $unknownIDs = $conv->unknownIDs();
564 2 50       8 if( @$unknownIDs ) {
565 2         11 print LOG "\nCAUTION: ", scalar(@$unknownIDs), " unknown references found:\n",
566             "===========================================\n\n";
567 2 50       8 print STDERR "\nCAUTION: ", scalar(@$unknownIDs), " unknown references found:\n",
568             "===========================================\n\n" unless $options->{'quiet'};
569 2         5 foreach my $r (@$unknownIDs) {
570 4         9 print LOG "$r\n";
571 4 50       19 print STDERR PBib::ReferenceConverter::utf8_to_ascii("$r\n") unless $options->{'quiet'};
572             }
573             }
574            
575 2         8 my $foundInfo = $conv->foundInfo();
576 2         11 print LOG "\n", scalar(keys(%$foundInfo)), " references found:\n",
577             "===========================================\n\n";
578 2         16 foreach my $r (keys(%$foundInfo)) { print LOG "$r ($foundInfo->{$r})\n"; }
  116         246  
579            
580 2         17 my $knownIDs = $conv->knownIDs();
581 2         12 print LOG "\n", scalar(@$knownIDs), " references known:\n",
582             "===========================================\n\n";
583 2         6 foreach my $r (@$knownIDs) { print LOG "$r\n"; }
  112         157  
584            
585 2 50       10 if( $options ) {
586 2         6 print LOG "\n\nOptions:\n";
587 1 50   1   563 if( eval("use YAML; 1") ) {
  0     1   0  
  0         0  
  1         6286  
  0         0  
  0         0  
  2         229  
588             # use YAML if available
589 0         0 print LOG Store($options);
590             } else {
591 2         18 print LOG Dumper($options);
592             }
593             }
594            
595             # $duration
596 2 50       1308 print STDERR "\ndone (", sprintf('%.2f', $duration), " seconds)\n" unless $options->{'quiet'};
597 2         35 print LOG "\ndone (", sprintf('%.2f', $duration), " seconds)\n";
598            
599 2         77 close LOG;
600             }
601            
602            
603             #
604             #
605             # scanning
606             #
607             #
608            
609             =item $pbib->scanFile($infile, $config)
610            
611             Returns the foundInfo for the $infile.
612            
613             =cut
614            
615             sub scanFile {
616 0     0 1   my ($self, $infile, $config) = @_;
617 0 0         my $inDoc = new PBib::Document(
618             'filename' => $infile,
619             'mode' => '<',
620             'verbose' => $self->beVerbose(),
621             'quiet' => $self->beQuiet(),
622 0           %{$config->{doc} || {}},
623             );
624 0           my $conv = new PBib::ReferenceConverter(
625             'inDoc' => $inDoc,
626             'verbose' => $self->beVerbose(),
627             'quiet' => $self->beQuiet(),
628             );
629 0           my $foundInfo = $conv->foundInfo();
630 0           $inDoc->close();
631 0           return $foundInfo;
632             }
633            
634             =item \%foundIDs = $pbib->filterReferencesForFiles(@files)
635            
636             Filter the known references to the ones used in @files, a hash reference is returned.
637             CrossRefs are also included (filterReferences() is used).
638            
639             =cut
640            
641             sub filterReferencesForFiles ($@) {
642 0     0 1   my ($self, @files) = @_;
643 0           my %foundIDs;
644            
645 0           while( my $file = shift(@files) ) {
646 0           my $foundInfo = $self->scanFile($file);
647 0           foreach my $id (keys(%$foundInfo)) {
648 0           $foundIDs{$id} = 1;
649             }
650             }
651 0           return $self->filterReferences(\%foundIDs);
652             }
653            
654             =item $pbib->filterReferences($filter_refs)
655            
656             Scan the passed refs for the known ones and return a new hash reference with all known references (including CrossRefs).
657            
658             =cut
659            
660             sub filterReferences ($$) {
661 0     0 1   my ($self, $filter_refs) = @_;
662 0           my $all_refs = $self->refs();
663 0           my @filterIDs = keys(%$filter_refs);
664 0           my %known_refs;
665             my $id;
666            
667 0           while ($id = shift(@filterIDs)) {
668 0           my $ref = $all_refs->{$id};
669 0 0         if( ! defined($ref) ) {
670 0           print STDERR "Unknown reference '$id'\n";
671             } else {
672 0           $known_refs{$id} = $ref;
673 0 0         if( exists $ref->{'CrossRef'} ) {
674             # if there is a CrossRef field, add all xref IDs
675             # to the list of refs to export
676 0           push @filterIDs, split(/,/, $ref->{'CrossRef'});
677             }
678             }
679             }
680            
681 0           return \%known_refs;
682             }
683            
684             1;
685            
686            
687             __END__
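
The CrossRef handling in filterReferences() is a worklist loop: every known reference is kept, and any IDs named in its CrossRef field are appended to the list still to be processed. The stand-alone sketch below illustrates that idea with invented sample data. The function name `filter_with_crossrefs` and the `%all_refs` entries are hypothetical, and it deviates from the method above in three ways: it skips IDs it has already seen (so cyclic CrossRefs terminate), it trims whitespace around the commas, and it collects unknown IDs instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical reference data, keyed by cite key like the hash from refs().
my %all_refs = (
    'Tandler2004' => { Title => 'Thesis',      CrossRef => 'Proc2004' },
    'Proc2004'    => { Title => 'Proceedings' },
    'Other2001'   => { Title => 'Unrelated' },
);

# Hypothetical helper: returns (\%known, \@unknown) for the requested IDs,
# following CrossRef fields transitively.
sub filter_with_crossrefs {
    my ($all_refs, @ids) = @_;
    my (%known, @unknown);
    while (defined(my $id = shift @ids)) {
        next if exists $known{$id};          # already handled; avoids cycles
        my $ref = $all_refs->{$id};
        if (!defined $ref) {
            push @unknown, $id;
            next;
        }
        $known{$id} = $ref;
        # Queue every cross-referenced ID for processing.
        push @ids, split /\s*,\s*/, $ref->{CrossRef}
            if exists $ref->{CrossRef};
    }
    return (\%known, \@unknown);
}

my ($known, $unknown) = filter_with_crossrefs(\%all_refs, 'Tandler2004', 'Missing');
print "known:   ", join(', ', sort keys %$known), "\n";
print "unknown: ", join(', ', @$unknown), "\n";
```

With the sample data, the requested key pulls in its cross-referenced proceedings entry, while the unrelated entry is left out.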