File Coverage

blib/lib/Text/Refer.pm
Criterion    Covered  Total      %
statement         86    115   74.7
branch            31     50   62.0
condition          7     12   58.3
subroutine        15     34   44.1
pod                9     25   36.0
total            148    236   62.7


line stmt bran cond sub pod time code
1             package Text::Refer;
2              
3             =head1 NAME
4              
5             Text::Refer - parse Unix "refer" files
6              
7             I<This is Alpha code, and may be subject to changes in its public
8             interface. It will stabilize by June 1997, at which point this
9             notice will be removed. Until then, if you have any feedback,
10             please let me know!>
11              
12              
13             =head1 SYNOPSIS
14              
15             Pull in the module:
16              
17             use Text::Refer;
18              
19             Parse a refer stream from a filehandle:
20              
21             while ($ref = input Text::Refer \*FH) {
22             # ...do stuff with $ref...
23             }
24             defined($ref) or die "error parsing input";
25              
26             Same, but using a parser object for more control:
27            
28             # Create a new parser:
29             $parser = new Text::Refer::Parser LeadWhite=>'KEEP';
30            
31             # Parse:
32             while ($ref = $parser->input(\*FH)) {
33             # ...do stuff with $ref...
34             }
35             defined($ref) or die "error parsing input";
36              
37             Manipulating reference objects, using high-level methods:
38              
39             # Get the title, author, etc.:
40             $title = $ref->title;
41             @authors = $ref->author; # list context
42             $lastAuthor = $ref->author; # scalar context
43            
44             # Set the title and authors:
45             $ref->title("Cyberiad");
46             $ref->author(["S. Trurl", "C. Klapaucius"]); # arrayref for >1 value!
47            
48             # Delete the abstract:
49             $ref->abstract(undef);
50              
51             Same, using low-level methods:
52              
53             # Get the title, author, etc.:
54             $title = $ref->get('T');
55             @authors = $ref->get('A'); # list context
56             $lastAuthor = $ref->get('A'); # scalar context
57            
58             # Set the title and authors:
59             $ref->set('T', "Cyberiad");
60             $ref->set('A', "S. Trurl", "C. Klapaucius");
61            
62             # Delete the abstract:
63             $ref->set('X'); # sets to empty array of values
64              
65             Output:
66              
67             print $ref->as_string;
68              
69              
70             =head1 DESCRIPTION
71              
72              
73              
74             This module provides routines for parsing in the contents of
75             "refer"-format bibliographic databases: these are simple text files
76             which contain one or more bibliography records. They are usually found
77             lurking on Unix-like operating systems, with the extension F<.bib>.
78              
79             Each record in a "refer" file describes a single paper, book, or article.
80             Users of nroff/troff often employ such databases when typesetting papers.
81              
82             Even if you don't use *roff, this simple, easily-parsed parameter-value
83             format is still useful for recording/exchanging bibliographic
84             information. With this module, you can easily post-process
85             "refer" files: search them, convert them into LaTeX, whatever.
86              
87              
88             =head2 Example
89              
90             Here's a possible "refer" file with three entries:
91              
92             %T Cyberiad
93             %A Stanislaw Lem
94             %K robot fable
95             %I Harcourt/Brace/Jovanovich
96            
97             %T Invisible Cities
98             %A Italo Calvino
99             %K city fable philosophy
100             %X In this surreal series of fables, Marco Polo tells an
101             aged Kublai Khan of the many cities he has visited in
102             his lifetime.
103            
104             %T Angels and Visitations
105             %A Neil Gaiman
106             %D 1993
107              
108             The lines separating the records must be I<completely blank>;
109             that is, they cannot contain anything but a single newline.
110              
111             See refer(1) or grefer(1) for more information on "refer" files.
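As a rough illustration of the layout described above, the following standalone sketch splits the three example entries into records and fields. It does not use Text::Refer itself: the `parse_refer` helper is hypothetical, ignores all parser options, and only assumes that records are separated by completely blank lines and that a field starts with `%` plus a one-character name.

```perl
use strict;
use warnings;

# Hypothetical helper: split "refer" text into records (hashes of arrayrefs).
sub parse_refer {
    my ($text) = @_;
    my @records;
    for my $chunk (split /\n{2,}/, $text) {      # completely blank lines separate records
        my (%rec, $field);
        for my $line (split /\n/, $chunk) {
            if ($line =~ /^%(.)\s?(.*)$/) {      # "%X value" starts a new field
                $field = $1;
                push @{ $rec{$field} }, $2;
            }
            elsif (defined $field) {             # anything else continues the last field
                $rec{$field}[-1] .= "\n$line";
            }
        }
        push @records, \%rec if %rec;
    }
    return @records;
}

my $text = <<'EOF';
%T Cyberiad
%A Stanislaw Lem
%K robot fable
%I Harcourt/Brace/Jovanovich

%T Invisible Cities
%A Italo Calvino
%K city fable philosophy
%X In this surreal series of fables, Marco Polo tells an
aged Kublai Khan of the many cities he has visited in
his lifetime.

%T Angels and Visitations
%A Neil Gaiman
%D 1993
EOF

my @records = parse_refer($text);
print scalar(@records), " records; second title: $records[1]{T}[0]\n";
```

Each record is kept as a hash of arrayrefs so that repeated fields such as `%A` keep their order.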
112              
113              
114             =head2 Syntax
115              
116             I<From the GNU manpage, C<grefer(1)>:>
117              
118             The bibliographic database is a text file consisting of
119             records separated by one or more blank lines. Within each
120             record fields start with a % at the beginning of a line.
121             Each field has a one character name that immediately follows
122             the %. It is best to use only upper and lower case
123             letters for the names of fields. The name of the field
124             should be followed by exactly one space, and then by the
125             contents of the field. Empty fields are ignored. The
126             conventional meaning of each field is as follows:
127              
128             =over 4
129              
130             =item A
131              
132             The name of an author. If the name contains a
133             title such as Jr. at the end, it should be separated
134             from the last name by a comma. There can be multiple
135             occurrences of the A field. The order is significant.
136             It is a good idea always to supply an A field or a Q field.
137              
138             =item B
139              
140             For an article that is part of a book, the title of the book.
141              
142             =item C
143              
144             The place (city) of publication.
145              
146             =item D
147              
148             The date of publication. The year should be specified in full.
149             If the month is specified, the name rather than the number of
150             the month should be used, but only the first three letters are required.
151             It is a good idea always to supply a D field; if the date is unknown,
152             a value such as "in press" or "unknown" can be used.
153              
154             =item E
155              
156             For an article that is part of a book, the name of an editor of the book.
157             Where the work has editors and no authors, the names of the editors should
158             be given as A fields, and ", (ed)" or ", (eds)" should be
159             appended to the last author.
160              
161             =item G
162              
163             US Government ordering number.
164              
165             =item I
166              
167             The publisher (issuer).
168              
169             =item J
170              
171             For an article in a journal, the name of the journal.
172              
173             =item K
174              
175             Keywords to be used for searching.
176              
177             =item L
178              
179             Label.
180              
181             B<Note:> Uniquely identifies the entry. For example, "Able94".
182              
183             =item N
184              
185             Journal issue number.
186              
187             =item O
188              
189             Other information. This is usually printed at the end of the reference.
190              
191             =item P
192              
193             Page number. A range of pages can be specified as m-n.
194              
195             =item Q
196              
197             The name of the author, if the author is not a person.
198             This will only be used if there are no A fields. There can only be one
199             Q field.
200              
201             B<Note:> Thanks to Mike Zimmerman for clarifying this for me:
202             it means a "corporate" author: when the "author" is listed
203             as an organization such as the UN, or RAND Corporation, or whatever.
204              
205              
206             =item R
207              
208             Technical report number.
209              
210             =item S
211              
212             Series name.
213              
214             =item T
215              
216             Title. For an article in a book or journal, this should be the title
217             of the article.
218              
219             =item V
220              
221             Volume number of the journal or book.
222              
223             =item X
224              
225             Annotation.
226              
227             B<Note:> Basically, a brief abstract or description.
228              
229             =back
230              
231             For all fields except A and E, if there is more than one occurrence
232             of a particular field in a record, only the last such field will be used.
233              
234             If accent strings are used, they should follow the character
235             to be accented. This means that the AM macro must be
236             used with the -ms macros. Accent strings should not be
237             quoted: use one \ rather than two.
238              
239              
240             =head2 Parsing records from "refer" files
241              
242             You will nearly always use the C<input()> constructor to create
243             new instances, and nearly always as shown in the L<"SYNOPSIS">.
244              
245             Internally, the records are parsed by a parser object; if you
246             invoke the class method C<input()>, a special default parser
247             is used, and this will be good enough for most tasks. However, for
248             more complex tasks, feel free to use L<"CLASS Text::Refer::Parser">
249             to build (and use) your own fine-tuned parser, and C<input()> from
250             that instead.
251              
252              
253              
254             =head1 CLASS Text::Refer
255              
256             Each instance of this class represents a single record in a "refer" file.
257              
258             =cut
259              
260 1     1   3951 use strict;
  1         2  
  1         37  
261 1     1   5 use vars (qw($VERSION $QUIET $GroffFields));
  1         2  
  1         2295  
262              
263              
264             #------------------------------
265             #
266             # GLOBALS
267             #
268             #------------------------------
269              
270             # The package version, both in 1.23 style *and* usable by MakeMaker:
271             $VERSION = substr q$Revision: 1.106 $, 10;
272              
273             # Suppress warnings?
274             $QUIET = 0;
275              
276             # Legal fields for different situations:
277             $GroffFields = '[A-EGI-LN-TVX]'; # groff
278              
279             # The default parser:
280             my $Parser = new Text::Refer::Parser;
281              
282              
283              
284              
285             #==============================
286              
287             =head2 Construction and input
288              
289             =over 4
290              
291             =cut
292              
293             #------------------------------------------------------------
294              
295             =item new
296              
297             I<Class method, constructor.>
298             Build an empty "refer" record.
299              
300             =cut
301              
302             sub new {
303 3     3 1 5 my $type = shift;
304 3         8 bless {}, $type;
305             }
306              
307             #------------------------------------------------------------
308              
309             =item input FILEHANDLE
310              
311             I<Class method.>
312             Input a new "refer" record from a filehandle. The default parser
313             is used:
314              
315             while ($ref = input Text::Refer \*STDIN) {
316             # ...do stuff with $ref...
317             }
318              
319             Do I<not> use this as an instance method; it will not re-init the object
320             you give it.
321              
322             =cut
323              
324             sub input {
325 0     0 1 0 shift;
326 0         0 $Parser->input(@_);
327             }
328              
329             =back
330              
331             =cut
332              
333              
334              
335              
336             #==============================
337              
338             =head2 Getting/setting attributes
339              
340             =over 4
341              
342             =cut
343              
344             #------------------------------------------------------------
345              
346             =item attr ATTR, [VALUE]
347              
348             I<Instance method.>
349             Get/set the attribute by its one-character name, ATTR.
350             The VALUE is optional, and may be given in a number of ways:
351              
352             =over 4
353              
354             =item *
355              
356             B<If VALUE is undefined>, the attribute will be deleted:
357              
358             $ref->attr('X', undef); # delete the abstract
359              
360             =item *
361              
362             B<If VALUE is a defined scalar,> it is used to
363             replace the existing values for the attribute with that I<single> value:
364              
365             $ref->attr('T', "The Police State Rears Its Ugly Head");
366             $ref->attr('D', 1997);
367              
368             =item *
369              
370             B<If VALUE is an arrayref,> it is used to replace the existing values
371             for the attribute with I<all elements of that array:>
372              
373             $ref->attr('A', ["S. Trurl", "C. Klapaucius"]);
374              
375             We use an arrayref since an empty array would be impossible to distinguish
376             from the next two cases, where the goal is to "get" instead of "set"...
377              
378             =back
379              
380              
381             This method returns the current (or new) value of the given attribute,
382             just as C<get()> does:
383              
384             =over 4
385              
386             =item *
387              
388             B<In a I<scalar> context,> the method will return the
389             I<last> value (this is to mimic the behavior of I<groff>). Hence,
390             given the above, the code:
391              
392             $author = $ref->attr('A');
393              
394             will set C<$author> to C<"C. Klapaucius">.
395              
396             =item *
397              
398             B<In an I<array> context,> the method will return the list
399             of I<all> values, in order. Hence, given the above, the code:
400              
401             @authors = $ref->attr('A');
402              
403             will set C<@authors> to C<("S. Trurl", "C. Klapaucius")>.
404              
405             =back
406              
407              
408             I<Note:> this method is used as the basis of all "named" access
409             methods; hence, the following are equivalent in every way:
410              
411             $ref->attr(T => $title) <=> $ref->title($title);
412             $ref->attr(A => \@authors) <=> $ref->author(\@authors);
413             $ref->attr(D => undef) <=> $ref->date(undef);
414             $auth = $ref->attr('A') <=> $auth = $ref->author;
415             @auths = $ref->attr('A') <=> @auths = $ref->author;
416              
417             =cut
418              
419             sub attr {
420 14     14 1 56 my ($self, $attr, $values) = @_;
421 14 100       29 if (@_ > 2) {
422             # set the "values"...
423             # undef => empty array
424             # non-arrayref => array of one element
425             # arrayref => that array
426 3 50       7 $values = defined($values) ? $values : [];
427 3 100       11 $self->set($attr, (ref($values) ? @$values : ($values)));
428             }
429 14         24 $self->get($attr);
430             }
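The three ways of passing VALUE can be summarised in a small standalone sketch. The `values_from_arg` helper is hypothetical and only mirrors the documented behaviour; unlike the code above, which treats any reference as a list of values, it checks specifically for an array reference.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring how attr() interprets its optional VALUE.
sub values_from_arg {
    my ($value) = @_;
    return ()        unless defined $value;      # undef    => no values (delete)
    return @$value   if ref $value eq 'ARRAY';   # arrayref => all of its elements
    return ($value);                             # scalar   => exactly one value
}

my @none = values_from_arg(undef);
my @one  = values_from_arg("Cyberiad");
my @two  = values_from_arg(["S. Trurl", "C. Klapaucius"]);
printf "%d %d %d\n", scalar(@none), scalar(@one), scalar(@two);
```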
431              
432             #------------------------------------------------------------
433              
434             =item author, book, city, ... [VALUE]
435              
436             I<Instance methods.>
437             For every one of the standard fields in a "refer" record, this
438             module has designated a high-level attribute name:
439              
440             A  author     G  govt_no     N  number        S  series
441             B  book       I  publisher   O  other_info    T  title
442             C  city       J  journal     P  page          V  volume
443             D  date       K  keywords    Q  corp_author   X  abstract
444             E  editor     L  label       R  report_no
445              
446             Then, for each field I<F> with high-level attribute name I<FIELDNAME>,
447             the method C<FIELDNAME()> works as follows:
448              
449             $ref->attr('F', @args) <=> $ref->FIELDNAME(@args)
450              
451             Which means:
452              
453             $ref->attr(T => $title) <=> $ref->title($title);
454             $ref->attr(A => \@authors) <=> $ref->author(\@authors);
455             $ref->attr(D => undef) <=> $ref->date(undef);
456             $auth = $ref->attr('A') <=> $auth = $ref->author;
457             @auths = $ref->attr('A') <=> @auths = $ref->author;
458              
459             See the documentation of C<attr()> for the argument list.
460              
461             =cut
462              
463 8     8 1 91 sub author { shift->attr('A',@_) }
464 0     0 1 0 sub book { shift->attr('B',@_) }
465 0     0 1 0 sub city { shift->attr('C',@_) }
466 0     0 0 0 sub date { shift->attr('D',@_) }
467 0     0 0 0 sub editor { shift->attr('E',@_) }
468 0     0 0 0 sub govt_no { shift->attr('G',@_) }
469 0     0 0 0 sub publisher { shift->attr('I',@_) }
470 0     0 0 0 sub journal { shift->attr('J',@_) }
471 0     0 0 0 sub keywords { shift->attr('K',@_) }
472 3     3 0 20 sub label { shift->attr('L',@_) }
473 0     0 0 0 sub number { shift->attr('N',@_) }
474 0     0 0 0 sub other_info { shift->attr('O',@_) }
475 0     0 0 0 sub page { shift->attr('P',@_) }
476 0     0 0 0 sub corp_author { shift->attr('Q',@_) }
477 0     0 0 0 sub report_no { shift->attr('R',@_) }
478 0     0 0 0 sub series { shift->attr('S',@_) }
479 0     0 0 0 sub title { shift->attr('T',@_) }
480 0     0 0 0 sub volume { shift->attr('V',@_) }
481 0     0 0 0 sub abstract { shift->attr('X',@_) }
482              
483             #------------------------------------------------------------
484              
485             =item get ATTR
486              
487             I<Instance method.>
488             Get an attribute, by its one-character name.
489             In an array context, it returns all values (empty if none):
490              
491             @authors = $ref->get('A'); # returns list of all authors
492              
493             In a scalar context, it returns the I<last> value (undefined if none):
494              
495             $author = $ref->get('A'); # returns the last author
496              
497             =cut
498              
499             sub get {
500 16     16 1 50 my ($self, $attr) = @_;
501 16   50     37 my $vals = $self->{$attr} || [];
502 16 100       69 (wantarray ? @$vals : $vals->[-1]);
503             }
504              
505             #------------------------------------------------------------
506              
507             =item set ATTR, VALUES...
508              
509             I<Instance method.>
510             Set an attribute, by its one-character name.
511              
512             $ref->set('A', "S. Trurl", "C. Klapaucius");
513              
514             An empty array of VALUES deletes the attribute:
515              
516             $ref->set('A'); # deletes all authors
517              
518             No useful return value is currently defined.
519              
520             =cut
521              
522             sub set {
523 4     4 1 15 my $self = shift;
524 4         5 my $attr = shift;
525 4 50       26 if (@_) { $self->{$attr} = [@_] }
  4         11  
526 0         0 else { delete $self->{$attr} }
527 4         8 1;
528             }
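The context-sensitive behaviour of `get` and the delete-on-empty behaviour of `set` can be tried in isolation. The sketch below uses a hypothetical `My::Record` class, not Text::Refer, and assumes only what the documentation above states: values live in an arrayref per field, list context returns all of them, and scalar context returns the last.

```perl
use strict;
use warnings;

package My::Record;    # hypothetical stand-in for a "refer" record

sub new { return bless {}, shift }

sub set {
    my ($self, $attr, @values) = @_;
    if (@values) { $self->{$attr} = [@values] }   # replace all values
    else         { delete $self->{$attr} }        # no values => delete
    return 1;
}

sub get {
    my ($self, $attr) = @_;
    my $vals = $self->{$attr} || [];
    return wantarray ? @$vals : $vals->[-1];      # list: all; scalar: last
}

package main;

my $rec = My::Record->new;
$rec->set('A', "S. Trurl", "C. Klapaucius");
my @authors = $rec->get('A');     # list context
my $last    = $rec->get('A');     # scalar context
$rec->set('A');                   # delete all authors
my $gone    = $rec->get('A');     # undefined now
print scalar(@authors), " authors; last is $last\n";
```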
529              
530             =back
531              
532             =cut
533              
534              
535             #==============================
536              
537             =head2 Output
538              
539             =over 4
540              
541             =cut
542              
543             #------------------------------------------------------------
544             #
545             # _wrap STRING
546             #
547             # Split string into lines not exceeding 76 chars in length.
548              
549             my $SMIN = 50; # don't split at nonwords before this position
550             my $SMAX = 75; # max line length
551              
552             sub _wrap {
553 10     10   13 pos($_[0]) = 0;
554 10         66 $_[0] =~ s{\G ( # from current position...
555             (.{1,$SMAX})(?:\n|\Z) # next line (if of legal length), plus EOL
556             | # or,
557             (.{$SMIN,$SMAX}\W) # longest prefix of MIN-MAX chars ending in nonword
558             | # or,
559             (.{$SMAX}) # the first MAX chars
560             )
561             }{
562 10 50       34 (defined($2) ? $2 : $1) . "\n" # replace with text followed by \n
563             }gexo;
564 10 50       23 chop $_[0] if (substr($_[0], -1, 1) eq "\n"); # get rid of final \n
565 10         10 1;
566             }
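The wrapping policy (keep lines that fit, otherwise break after a nonword character between positions 50 and 75, otherwise hard-split at 75 characters) can also be written as an explicit loop. This is a hypothetical standalone sketch of that policy, not the regular expression used above, and `wrap_text` is an assumed name.

```perl
use strict;
use warnings;

my ($min, $max) = (50, 75);    # same thresholds as $SMIN and $SMAX

# Hypothetical loop-based take on the wrapping policy.
sub wrap_text {
    my ($text) = @_;
    my @out;
    for my $line (split /\n/, $text, -1) {
        my $was_split = 0;
        while (length($line) > $max) {
            $was_split = 1;
            if ($line =~ /\A(.{$min,$max}\W)/) {     # prefer a break after a nonword char
                push @out, $1;
                $line = substr($line, length($1));
            }
            else {                                   # otherwise hard-split at $max chars
                push @out, substr($line, 0, $max, '');
            }
        }
        # Keep the remainder, but do not add an empty line after a split.
        push @out, $line if length($line) || !$was_split;
    }
    return join "\n", @out;
}

my $long  = join ' ', ('word') x 40;                 # one 199-character line
my @lines = split /\n/, wrap_text($long);
print join(',', map { length($_) } @lines), "\n";
```

Because the break character stays at the end of the line it closes, a wrapped line can be one character longer than the 75-character threshold.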
567              
568             #------------------------------------------------------------
569              
570             =item as_string [OPTSHASH]
571              
572             I<Instance method.>
573             Return the "refer" record as a string, usually for printing:
574              
575             print $ref->as_string;
576              
577             The options are:
578              
579             =over 4
580              
581             =item Quick
582              
583             If true, do it quickly, but unsafely.
584             Field values are neither wrapped nor escaped: they are output as-is.
585             That means if you used parser options which destroyed any of the
586             formatting whitespace (e.g., C<Newline=TOSPACE> with C<LeadWhite=KILLALL>),
587             there is a risk that the output will be an invalid "refer" record.
588              
589             =back
590              
591             The fields are output with %L first (if it exists), and then the
592             remaining fields in alphabetical order. The following "safety measures"
593             are normally taken:
594              
595             =over 4
596              
597             =item *
598              
599             Lines longer than 76 characters are wrapped (if possible, at a non-word
600             character a reasonable length in, but there is a chance that they will
601             simply be "split" if no such character is available).
602              
603             =item *
604              
605             Any occurrences of '%' immediately after a newline are preceded by a
606             single space.
607              
608             =back
609              
610             These safety measures are slightly time-consuming, and are silly if you
611             are merely outputting a "refer" object which you have read in verbatim
612             (i.e., using the default parser-options) from a valid "refer" file.
613             In these cases, you may want to use the B<Quick> option.
614            
615             =cut
616              
617             sub as_string {
618 2     2 1 180 my ($self, %opts) = @_;
619 2         3 my ($key, $val);
620              
621             # Figure out the keys to use, and put them in order:
622 2 100       7 my @keys = sort grep {(length == 1) && ($_ ne 'L')} (keys %$self);
  18         64  
623 2 50       9 defined($self->{'L'}) && unshift(@keys, 'L');
624              
625             # Output:
626 2         2 my @lines;
627 2         3 foreach $key (@keys) {
628 16         21 foreach $val (@{$self->{$key}}) {
  16         21  
629 20 100       30 unless ($opts{Quick}) {
630             ### print "UNWRAPPED = [$val]\n";
631 10         17 _wrap($val); # make sure no line exceeds 76 chars
632             ### print "WRAPPED = [$val]\n";
633 10         10 $val =~ s/\n%/\n %/g; # newlines must NOT be followed by %
634 10         10 $val =~ s/\n+\Z//; # strip trailing newlines
635             }
636 20         51 push @lines, join('', '%', $key, ' ', $val, "\n");
637             }
638             }
639 2         17 join '', @lines;
640             }
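The output ordering (label first, then the remaining one-character fields alphabetically) and the `%` escaping can be sketched on a plain hash. The `record_as_string` function and the sample data are hypothetical; the sketch skips line wrapping and, unlike the code above, works on a copy of each value.

```perl
use strict;
use warnings;

# Hypothetical formatter: $rec maps one-character field names to arrayrefs.
sub record_as_string {
    my ($rec) = @_;
    my @keys = sort grep { length($_) == 1 && $_ ne 'L' } keys %$rec;
    unshift @keys, 'L' if defined $rec->{L};          # label goes first
    my @lines;
    for my $key (@keys) {
        for my $val (@{ $rec->{$key} }) {
            (my $safe = $val) =~ s/\n%/\n %/g;        # keep "%" off the start of a line
            $safe =~ s/\n+\z//;                       # strip trailing newlines
            push @lines, "%$key $safe\n";
        }
    }
    return join '', @lines;
}

my $out = record_as_string({
    T => ["Cyberiad"],
    A => ["S. Trurl", "C. Klapaucius"],
    L => ["Lem74"],
    X => ["100\n% pure fable"],
});
print $out;
```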
641              
642             =back
643              
644             =cut
645              
646              
647              
648              
649              
650             #==============================
651             #
652             package Text::Refer::Parser;
653             #
654             #==============================
655              
656             =head1 CLASS Text::Refer::Parser
657              
658             Instances of this class do the actual parsing.
659              
660              
661             =head2 Parser options
662              
663             The options you may give to C<new()> are as follows:
664              
665             =over 4
666              
667             =item ForgiveEOF
668              
669             Normally, the last record in a file must end with a blank line, or
670             else this module will suspect it of being incomplete and return an
671             error. However, if you give this option as true, it will allow
672             the last record to be terminated by an EOF.
673              
674             =item GoodFields
675              
676             By default, the parser accepts any (one-character) field name that is
677             a printable ASCII character (no whitespace). Formally, this is:
678              
679             [\041-\176]
680              
681             However, when compiling parser options, you can supply your own regular
682             expression for validating (one-character) field names.
683             (I<Note:> you must supply the square brackets; they are there to remind
684             you that you should give a well-formed single-character expression).
685             One standard expression is provided for you:
686              
687             $Text::Refer::GroffFields = '[A-EGI-LN-TVX]'; # legal groff fields
688              
689             Illegal fields which are encountered during parsing result in a syntax error.
690              
691             B<Note:> You really shouldn't use this unless you absolutely need to.
692             The added regular expression test slows down the parser.
693              
694              
695             =item LeadWhite
696              
697             In many "refer" files, continuation lines (the 2nd, 3rd, etc. lines of a
698             field) are written with leading whitespace, like this:
699              
700             %T Incontrovertible Proof that Pi Equals Three
701             (for Large Values of Three)
702             %A S. Trurl
703             %X The author shows how anyone can use various common household
704             objects to obtain successively less-accurate estimations of
705             pi, until finally arriving at a desired integer approximation,
706             which nearly always is three.
707              
708             This leading whitespace serves two purposes: (1) it makes it impossible
709             to mistake a continuation line for a field, since % can no longer be the
710             first character, and (2) it makes the entries easier to read.
711             The C<LeadWhite> option controls what is done with this whitespace:
712              
713             KEEP - default; the whitespace is untouched
714             KILLONE - exactly one character of leading whitespace is removed
715             KILLALL - all leading whitespace is removed
716              
717             See "Notes on the parser options" below for hints and warnings.
718              
719              
720             =item Newline
721              
722             The C<Newline> option controls what is done with the newlines that
723             separate adjacent lines in the same field:
724              
725             KEEP - default; the newlines are kept in the field value
726             TOSPACE - convert each newline to a single space
727             KILL - the newlines are removed
728              
729             See "Notes on the parser options" below for hints and warnings.
730              
731              
732             =back
733              
734             Default values will be used for any options which are left unspecified.
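To see how the two options interact, here is a standalone sketch that joins the lines of one field according to the documented meanings of LeadWhite and Newline. The `join_field_lines` helper is hypothetical and is not the parser's own code; the three-space indent on the continuation line is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: join the lines of one field under the given options.
sub join_field_lines {
    my ($lines, %opt) = @_;
    my $lead = $opt{LeadWhite} || 'KEEP';
    my $nl   = $opt{Newline}   || 'KEEP';
    my $eol  = $nl eq 'KILL' ? '' : $nl eq 'TOSPACE' ? ' ' : "\n";

    my ($first, @rest) = @$lines;
    for my $line (@rest) {                       # only continuation lines are touched
        if    ($lead eq 'KILLONE') { $line =~ s/^\s// }
        elsif ($lead eq 'KILLALL') { $line =~ s/^\s+// }
    }
    return join $eol, $first, @rest;
}

my @title = ("Incontrovertible Proof that Pi Equals Three",
             "   (for Large Values of Three)");

my $kept    = join_field_lines(\@title);
my $wrapped = join_field_lines(\@title, Newline => 'TOSPACE', LeadWhite => 'KILLALL');
my $fused   = join_field_lines(\@title, Newline => 'KILL',    LeadWhite => 'KILLALL');
print "$wrapped\n$fused\n";
```

The last call shows the hazard of killing both the newline and the leading whitespace: "Three" and "(for" end up with nothing between them.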
735              
736              
737             =head2 Notes on the parser options
738              
739             The default values for C<Newline> and C<LeadWhite> will preserve the
740             input text exactly.
741              
742             The C<Newline=TOSPACE> option, when used in conjunction with the
743             C<LeadWhite=KILLALL> option, effectively "word-wraps" the text of
744             each field into a single line.
745              
746             B<Be careful!> If you use the C<Newline=KILL> option with
747             either the C<LeadWhite=KILLONE> or the C<LeadWhite=KILLALL> option,
748             you could end up eliminating all whitespace that separates the word
749             at the end of one line from the word at the beginning of the next line.
750              
751              
752             =head2 Public interface
753              
754             =over 4
755              
756             =cut
757              
758 1     1   9 use strict;
  1         6  
  1         27  
759 1     1   6 use Carp;
  1         2  
  1         1304  
760              
761             #------------------------------------------------------------
762              
763             sub error {
764 0     0   0 my $self = shift;
765 0 0       0 warn "refer: l.$.: ".join('',@_)."\n" unless $Text::Refer::QUIET;
766 0 0       0 return (wantarray ? () : undef);
767             }
768              
769             #------------------------------------------------------------
770              
771             =item new PARAMHASH
772              
773             I<Class method.>
774             Create and return a new parser. See above for the L<"parser options">
775             which you may give in the PARAMHASH.
776              
777             =cut
778              
779             sub new {
780 2     2   21 my ($class, %params) = @_;
781 2         4 my $self = \%params;
782 2   50     14 $self->{Class} ||= 'Text::Refer';
783 2   50     10 $self->{Newline} ||= 'KEEP';
784 2   50     8 $self->{LeadWhite} ||= 'KEEP';
785 2   50     9 $self->{GoodFields} ||= '[\041-\176]';
786              
787             # Compile allowed fields:
788 2         4 my $gf = substr($self->{GoodFields}, 1);
789 2         7 ($self->{Fields} = join('', map {chr($_)} 0..255)) =~ s{[^$gf}{}g;
  512         1121  
790              
791             # The EOL character:
792 2 50       44 if ($self->{Newline} eq 'KILL') { $self->{EOL} = "" }
  0 50       0  
793 0         0 elsif ($self->{Newline} eq 'TOSPACE') { $self->{EOL} = " " }
794 2         5 else { $self->{EOL} = "\n" }
795            
796 2         8 bless $self, $class;
797             }
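The "compile allowed fields" step turns a bracketed character class into the literal string of characters it permits, by negating the class and deleting every match from a string of all 256 byte values. A standalone sketch of that idea follows; `allowed_chars` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a bracketed character class into the string
# of characters it allows.
sub allowed_chars {
    my ($good) = @_;                             # e.g. '[A-EGI-LN-TVX]'
    my $negated = '[^' . substr($good, 1);       # '[^A-EGI-LN-TVX]'
    my $all = join '', map { chr($_) } 0 .. 255;
    $all =~ s/$negated//g;                       # delete everything not allowed
    return $all;
}

print allowed_chars('[A-EGI-LN-TVX]'), "\n";
```

This only works if the option value really starts with `[` and ends with `]`, which is why the documentation insists on the square brackets.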
798              
799             #------------------------------------------------------------
800              
801             =item create [CLASS]
802              
803             I<Instance method.>
804             What class of objects to create.
805             The default is C<Text::Refer>.
806              
807             =cut
808              
809             sub create {
810 3     3   4 my ($self, $class) = @_;
811 3 50       5 $self->{Class} = $class if $class;
812 3         13 $self->{Class};
813             }
814              
815             #------------------------------------------------------------
816              
817             =item input FH
818              
819             I<Instance method.>
820             Create a new object from the next record in a "refer" stream.
821             The actual class of the object is given by the C<create()> method.
822              
823             Returns the object on success, '0' on I<expected> end-of-file,
824             and undefined on error.
825              
826             Having two false values makes parsing very simple: just C<input()>
827             records until the result is false, then check to see if that last result
828             was 0 (end of file) or undef (failure).
829              
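The loop described above can be sketched as follows. This is a minimal stand-in, not the module's own code: C<next_record> and C<read_all> are hypothetical helpers that only mimic the return convention (object, 0, or undef), and the stand-in ignores continuation lines and forgives EOF inside a record.

```perl
use strict;
use warnings;

# Hypothetical stand-in for input(): returns a hashref for a record,
# 0 at a clean end-of-file, and undef on a malformed line.
sub next_record {
    my ($fh) = @_;
    my $line;

    # Skip blank lines; hitting EOF here is the expected kind.
    while (1) {
        defined($line = <$fh>) or return 0;
        chomp $line;
        last if length $line;
    }

    my %rec;
    while (1) {
        # Simplified: every line must start a field (no continuation lines).
        $line =~ /^%(.)\s?(.*)$/ or return undef;
        push @{ $rec{$1} }, $2;
        defined($line = <$fh>) or last;   # assumption: forgive EOF mid-record
        chomp $line;
        last if $line eq '';              # blank line ends the record
    }
    return \%rec;
}

# Read every record, then tell "done" (0) apart from "failed" (undef).
sub read_all {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
    my (@records, $rec);
    while ($rec = next_record($fh)) {
        push @records, $rec;
    }
    return (defined($rec) ? 'eof' : 'error', scalar @records);
}

my ($status, $count) = read_all("%T Cyberiad\n%A S. Trurl\n\n%T Solaris\n");
print "$status after $count record(s)\n";
```

The point of the sketch is the final C<defined($rec)> test: the loop ends on either false value, and only then does the caller need to ask which one it was.
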
830             =cut
831              
832             sub input {
833 4     4   53 my ($self, $fh) = @_;
834 4         4 my $line; # the next line
835             my $field; # last key read in, or undef
836 4         12 local($/) = "\n"; # in case our caller has been naughty
837              
838              
839             # Get options into scalars for faster usage:
840 4         11 my $LeadWhite = $self->{LeadWhite};
841 4         5 my $EOL = $self->{EOL};
842              
843             # Skip blank lines until (legal) EOF or record:
844 4         5 while (1) {
845 12 100       58 defined($_ = <$fh>) or return 0;
846 11         11 chomp;
847 11 100       27 last if length($_); # break if we hit a nonblank line
848             }
849              
850             # Start new object:
851 3         8 my $ref = $self->create->new;
852 3         16 $ref->{LineNo} = $.;
853            
854             # Read record lines until (unexpected) EOF or done:
855 3         4 while (1) {
856 30 100       85 if (/^%(.)\s?(.*)$/) { # start new field...
    50          
857 29 50       79 (index($self->{Fields}, ($field = $1)) >= 0) or
858             return $self->error("bad record field '$field' in <$_>");
859 29   100     28 push @{$ref->{$field} ||= []}, $2;
  29         159  
860             }
861             elsif (defined($field)) { # add line to previous field...
862              
863             # Muck about with leading whitespace (implicit else is KEEP):
864 1 50       4 if ($LeadWhite eq 'KILLONE') { # kill first leading white:
    50          
865 0         0 s/^\s//;
866             } elsif ($LeadWhite eq 'KILLALL') { # kill all leading white:
867 0         0 s/^\s+//;
868             }
869              
870             # Add separator and new line to existing value:
871 1         5 $ref->{$field}[-1] .= ($EOL . $_);
872             }
873             else { # yow! line not inside record!
874 0         0 return $self->error("line outside record: <$_>");
875             }
876             } continue {
877 30 50       71 defined($_ = <$fh>) or do { # unexpected EOF... forgive it?
878 0 0       0 $self->{ForgiveEOF}? last : return $self->error("unexpected EOF")};
879 30         27 chomp;
880 30 100       61 last if ($_ eq ''); # blank line means end of record
881             }
882              
883             # Done!
884 3         13 $ref;
885             }
886              
887             =back
888              
889             =cut
890              
891              
892             #------------------------------------------------------------
893              
894             =head1 NOTES
895              
896             =head2 Under the hood
897              
898             Each "refer" object has instance variables corresponding to the actual
899             field names (C<'T'>, C<'A'>, etc.). Each of these is a reference to
900             an array of the actual values.
901              
902             Notice that, for maximum flexibility and consistency (but at the cost of
903             some space and access-efficiency), the semantics of "refer" records do
904             not come into play at this time: since everything resides in an array,
905             you can have as many %K, %D, etc. records as you like, and give them
906             entirely different semantics.
907              
908             For example, the Library Of Boring Stuff That Everyone Reads (LOBSTER) uses
909             the unused %Y as a "year" field. The parser accommodates this
910             case by politely not choking on LOBSTER .bibs (although why you would
911             want to eat a lobster bib instead of the lobster is beyond me...).
912              
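As a rough illustration of that layout, here is a plain hash shaped like a parsed record. The data is made up for the example, and the %Y "year" usage is the hypothetical one from the paragraph above; a real object is blessed and also carries bookkeeping such as C<LineNo>.

```perl
use strict;
use warnings;

# Illustrative data only: one arrayref of values per field letter.
my $ref = {
    T => ['Cyberiad'],
    A => ['S. Trurl', 'C. Klapaucius'],
    K => ['robots', 'constructors'],   # repeated %K lines become extra elements
    Y => ['1965'],                     # hypothetical: %Y reused as a year field
};

my $title      = $ref->{T}[0];         # first (only) title
my @authors    = @{ $ref->{A} };       # every author, in input order
my $lastAuthor = $ref->{A}[-1];        # last value of a repeated field

print "$title by ", join(' and ', @authors), " ($ref->{Y}[0])\n";
```

Because every field is an array, nothing in the structure distinguishes a "standard" field from a site-specific one; that is left to whatever code reads the record.
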
913              
914             =head2 Performance
915              
916             Tolerable. On my 90MHz/32 MB RAM/I586 box running Linux 1.2.13 and Perl5.002,
917             it parses a typical 500 KB "refer" file (of 1600 records) as follows:
918              
919             8 seconds of user time for input and no output
920             10 seconds of user time for input and "quick" output
921             16 seconds of user time for input and "safe" output
922              
923             So, figure the individual speeds are:
924              
925             input: 200 records ( 60 KB) per second.
926             "quick" output: 800 records (240 KB) per second.
927             "safe" output: 200 records ( 60 KB) per second.
928              
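The per-phase figures follow from subtracting the input-only time from each combined timing. A small sketch of that arithmetic, using only the numbers quoted above (the variable names are made up for the example):

```perl
use strict;
use warnings;

my ($records, $kb) = (1600, 500);      # size of the test file

# User seconds attributable to each phase on its own.
my %user_secs = (
    input => 8,
    quick => 10 - 8,                   # "quick" output minus input time
    safe  => 16 - 8,                   # "safe" output minus input time
);

my %rate;                              # records per second, per phase
for my $phase (sort keys %user_secs) {
    $rate{$phase} = $records / $user_secs{$phase};
    printf "%-6s %4d records (%3d KB) per second\n",
        $phase, $rate{$phase}, $kb / $user_secs{$phase};
}
```

The record rates match the list exactly; the KB rates computed this way come out slightly above the quoted 60 and 240, which look like rounded values.
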
929             By contrast, a C program which does the same work is about 8 times as fast.
930             But of course, the C code is 8 times as large, and 8 times as ugly... C<:-)>
931              
932              
933             =head2 Note to serious bib-file users
934              
935             I actually do not use "refer" files for *roffing... I used them as a
936             quick-and-dirty database for WebLib, and that's where this code comes
937             from. If you're a serious user of "refer" files, and this module doesn't
938             do what you need it to, please contact me: I'll add the functionality
939             in.
940              
941              
942             =head1 BUGS
943              
944             Some combinations of parser-options are silly.
945              
946              
947              
948             =head1 CHANGE LOG
949              
950             $Id: Refer.pm,v 1.106 1997/04/22 18:41:41 eryq Exp $
951              
952             =over 4
953              
954             =item Version 1.101
955              
956             Initial release. Adapted from Text::Bib.
957              
958             =back
959              
960              
961             =head1 AUTHOR
962              
963             Copyright (C) 1997 by Eryq,
964             F,
965             F.
966              
967              
968             =head1 NO WARRANTY
969              
970             This program is free software; you can redistribute it and/or modify
971             it under the terms of the GNU General Public License as published by
972             the Free Software Foundation; either version 2 of the License, or
973             (at your option) any later version.
974              
975             This program is distributed in the hope that it will be useful,
976             but WITHOUT ANY WARRANTY; without even the implied warranty of
977             MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
978             GNU General Public License for more details.
979              
980             For a copy of the GNU General Public License, write to the Free Software
981             Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
982              
983             =cut
984              
985             1;
986