File Coverage

blib/lib/Text/Refer.pm

Criterion	Covered	Total	%
statement	86	115	74.7
branch	31	50	62.0
condition	7	12	58.3
subroutine	15	34	44.1
pod	9	25	36.0
total	148	236	62.7

line	stmt	bran	cond	sub	pod	time	code
1							package Text::Refer;
2
3							=head1 NAME
4
5							Text::Refer - parse Unix "refer" files
6
7							I<This is Alpha code, and may be subject to changes in its public
8							interface. It will stabilize by June 1997, at which point this
9							notice will be removed. Until then, if you have any feedback,
10							please let me know!>
11
12
13							=head1 SYNOPSIS
14
15							Pull in the module:
16
17							use Text::Refer;
18
19							Parse a refer stream from a filehandle:
20
21							while ($ref = input Text::Refer \*FH) {
22							# ...do stuff with $ref...
23							}
24							defined($ref) or die "error parsing input";
25
26							Same, but using a parser object for more control:
27
28							# Create a new parser:
29							$parser = new Text::Refer::Parser LeadWhite=>'KEEP';
30
31							# Parse:
32							while ($ref = $parser->input(\*FH)) {
33							# ...do stuff with $ref...
34							}
35							defined($ref) or die "error parsing input";
36
37							Manipulating reference objects, using high-level methods:
38
39							# Get the title, author, etc.:
40							$title = $ref->title;
41							@authors = $ref->author; # list context
42							$lastAuthor = $ref->author; # scalar context
43
44							# Set the title and authors:
45							$ref->title("Cyberiad");
46							$ref->author(["S. Trurl", "C. Klapaucius"]); # arrayref for >1 value!
47
48							# Delete the abstract:
49							$ref->abstract(undef);
50
51							Same, using low-level methods:
52
53							# Get the title, author, etc.:
54							$title = $ref->get('T');
55							@authors = $ref->get('A'); # list context
56							$lastAuthor = $ref->get('A'); # scalar context
57
58							# Set the title and authors:
59							$ref->set('T', "Cyberiad");
60							$ref->set('A', "S. Trurl", "C. Klapaucius");
61
62							# Delete the abstract:
63							$ref->set('X'); # sets to empty array of values
64
65							Output:
66
67							print $ref->as_string;
68
69
70							=head1 DESCRIPTION
71
72							I
73
74							This module provides routines for parsing in the contents of
75							"refer"-format bibliographic databases: these are simple text files
76							which contain one or more bibliography records. They are usually found
77							lurking on Unix-like operating systems, with the extension F<.bib>.
78
79							Each record in a "refer" file describes a single paper, book, or article.
80							Users of nroff/troff often employ such databases when typesetting papers.
81
82							Even if you don't use *roff, this simple, easily-parsed parameter-value
83							format is still useful for recording/exchanging bibliographic
84							information. With this module, you can easily post-process
85							"refer" files: search them, convert them into LaTeX, whatever.
86
87
88							=head2 Example
89
90							Here's a possible "refer" file with three entries:
91
92							%T Cyberiad
93							%A Stanislaw Lem
94							%K robot fable
95							%I Harcourt/Brace/Jovanovich
96
97							%T Invisible Cities
98							%A Italo Calvino
99							%K city fable philosophy
100							%X In this surreal series of fables, Marco Polo tells an
101							aged Kublai Khan of the many cities he has visited in
102							his lifetime.
103
104							%T Angels and Visitations
105							%A Neil Gaiman
106							%D 1993
107
108							The lines separating the records must be I<completely blank>;
109							that is, they cannot contain anything but a single newline.
110
111							See refer(1) or grefer(1) for more information on "refer" files.
112
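A minimal, self-contained sketch of how the three-entry example above could be split into records. It does not use Text::Refer itself: `parse_refer` is a hypothetical helper, and the record layout (a one-character field name mapped to an arrayref of values) is an assumption made for illustration.

```perl
use strict;
use warnings;

# Sample data copied from the example above.
my $text = <<'END';
%T Cyberiad
%A Stanislaw Lem
%K robot fable
%I Harcourt/Brace/Jovanovich

%T Invisible Cities
%A Italo Calvino
%K city fable philosophy
%X In this surreal series of fables, Marco Polo tells an
aged Kublai Khan of the many cities he has visited in
his lifetime.

%T Angels and Visitations
%A Neil Gaiman
%D 1993

END

# parse_refer (hypothetical helper): returns a list of hashrefs,
# each mapping a one-character field name to an arrayref of values.
sub parse_refer {
    my ($input) = @_;
    my (@records, $rec, $field);
    for my $line (split /\n/, $input, -1) {
        if ($line eq '') {                      # blank line ends the record
            push @records, $rec if $rec;
            ($rec, $field) = (undef, undef);
        }
        elsif ($line =~ /^%(.)\s?(.*)$/) {      # start of a new field
            $field = $1;
            push @{ $rec->{$field} }, $2;
        }
        elsif (defined $field) {                # continuation line
            $rec->{$field}[-1] .= "\n$line";
        }
        else {
            die "line outside record: <$line>\n";
        }
    }
    push @records, $rec if $rec;                # tolerate a missing final blank line
    return @records;
}

my @records = parse_refer($text);
printf "%d records; first title: %s\n", scalar(@records), $records[0]{T}[0];
```

Unlike the module's default behaviour, this sketch tolerates a final record that is not followed by a blank line.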
113
114							=head2 Syntax
115
116							I<From the GNU manpage, C<grefer(1)>:>
117
118							The bibliographic database is a text file consisting of
119							records separated by one or more blank lines. Within each
120							record fields start with a % at the beginning of a line.
121							Each field has a one character name that immediately follows
122							the %. It is best to use only upper and lower case
123							letters for the names of fields. The name of the field
124							should be followed by exactly one space, and then by the
125							contents of the field. Empty fields are ignored. The
126							conventional meaning of each field is as follows:
127
128							=over 4
129
130							=item A
131
132							The name of an author. If the name contains a
133							title such as Jr. at the end, it should be separated
134							from the last name by a comma. There can be multiple
135							occurrences of the A field. The order is significant.
136							It is a good idea always to supply an A field or a Q field.
137
138							=item B
139
140							For an article that is part of a book, the title of the book
141
142							=item C
143
144							The place (city) of publication.
145
146							=item D
147
148							The date of publication. The year should be specified in full.
149							If the month is specified, the name rather than the number of
150							the month should be used, but only the first three letters are required.
151							It is a good idea always to supply a D field; if the date is unknown,
152							a value such as "in press" or "unknown" can be used.
153
154							=item E
155
156							For an article that is part of a book, the name of an editor of the book.
157							Where the work has editors and no authors, the names of the editors should
158							be given as A fields and , (ed) or , (eds) should be
159							appended to the last author.
160
161							=item G
162
163							US Government ordering number.
164
165							=item I
166
167							The publisher (issuer).
168
169							=item J
170
171							For an article in a journal, the name of the journal.
172
173							=item K
174
175							Keywords to be used for searching.
176
177							=item L
178
179							Label.
180
181							B<Note:> Uniquely identifies the entry. For example, "Able94".
182
183							=item N
184
185							Journal issue number.
186
187							=item O
188
189							Other information. This is usually printed at the end of the reference.
190
191							=item P
192
193							Page number. A range of pages can be specified as m-n.
194
195							=item Q
196
197							The name of the author, if the author is not a person.
198							This will only be used if there are no A fields. There can only be one
199							Q field.
200
201							B<Note:> Thanks to Mike Zimmerman for clarifying this for me:
202							it means a "corporate" author: when the "author" is listed
203							as an organization such as the UN, or RAND Corporation, or whatever.
204
205
206							=item R
207
208							Technical report number.
209
210							=item S
211
212							Series name.
213
214							=item T
215
216							Title. For an article in a book or journal, this should be the title
217							of the article.
218
219							=item V
220
221							Volume number of the journal or book.
222
223							=item X
224
225							Annotation.
226
227							B<Note:> Basically, a brief abstract or description.
228
229							=back
230
231							For all fields except A and E, if there is more than one occurrence
232							of a particular field in a record, only the last such field will be used.
233
234							If accent strings are used, they should follow the character
235							to be accented. This means that the AM macro must be
236							used with the -ms macros. Accent strings should not be
237							quoted: use one \ rather than two.
238
239
240							=head2 Parsing records from "refer" files
241
242							You will nearly always use the C<input> constructor to create
243							new instances, and nearly always as shown in the L<"SYNOPSIS">.
244
245							Internally, the records are parsed by a parser object; if you
246							invoke the class method C<input>, a special default parser
247							is used, and this will be good enough for most tasks. However, for
248							more complex tasks, feel free to use L<"class Text::Refer::Parser">
249							to build (and use) your own fine-tuned parser, and C<input> from
250							that instead.
251
252
253
254							=head1 CLASS Text::Refer
255
256							Each instance of this class represents a single record in a "refer" file.
257
258							=cut
259
260	1			1		3951	use strict;
	1					2
	1					37
261	1			1		5	use vars (qw($VERSION $QUIET $GroffFields));
	1					2
	1					2295
262
263
264							#------------------------------
265							#
266							# GLOBALS
267							#
268							#------------------------------
269
270							# The package version, both in 1.23 style and usable by MakeMaker:
271							$VERSION = substr q$Revision: 1.106 $, 10;
272
273							# Suppress warnings?
274							$QUIET = 0;
275
276							# Legal fields for different situations:
277							$GroffFields = '[A-EGI-LN-TVX]'; # groff
278
279							# The default parser:
280							my $Parser = new Text::Refer::Parser;
281
282
283
284
285							#==============================
286
287							=head2 Construction and input
288
289							=over 4
290
291							=cut
292
293							#------------------------------------------------------------
294
295							=item new
296
297							I
298							Build an empty "refer" record.
299
300							=cut
301
302							sub new {
303	3			3	1	5	my $type = shift;
304	3					8	bless {}, $type;
305							}
306
307							#------------------------------------------------------------
308
309							=item input FILEHANDLE
310
311							I
312							Input a new "refer" record from a filehandle. The default parser
313							is used:
314
315							while ($ref = input Text::Refer \*STDIN) {
316							# ...do stuff with $ref...
317							}
318
319							Do I<not> use this as an instance method; it will not re-init the object
320							you give it.
321
322							=cut
323
324							sub input {
325	0			0	1	0	shift;
326	0					0	$Parser->input(@_);
327							}
328
329							=back
330
331							=cut
332
333
334
335
336							#==============================
337
338							=head2 Getting/setting attributes
339
340							=over 4
341
342							=cut
343
344							#------------------------------------------------------------
345
346							=item attr ATTR, [VALUE]
347
348							I
349							Get/set the attribute by its one-character name, ATTR.
350							The VALUE is optional, and may be given in a number of ways:
351
352							=over 4
353
354							=item *
355
356							B<If the VALUE is undefined,> the attribute will be deleted:
357
358							$ref->attr('X', undef); # delete the abstract
359
360							=item *
361
362							B<If the VALUE is a scalar that is not a reference,> it is used to
363							replace the existing values for the attribute with that I<single> value:
364
365							$ref->attr('T', "The Police State Rears Its Ugly Head");
366							$ref->attr('D', 1997);
367
368							=item *
369
370							B<If the VALUE is an arrayref,> it is used to replace the existing values
371							for the attribute with I<all elements of that array:>
372
373							$ref->attr('A', ["S. Trurl", "C. Klapaucius"]);
374
375							We use an arrayref since an empty array would be impossible to distinguish
376							from the next two cases, where the goal is to "get" instead of "set"...
377
378							=back
379
380
381							This method returns the current (or new) value of the given attribute,
382							just as C<get()> does:
383
384							=over 4
385
386							=item *
387
388							B<In a scalar context,> the method will return the
389							I<last> value (this is to mimic the behavior of I<groff>). Hence,
390							given the above, the code:
391
392							$author = $ref->attr('A');
393
394							will set C<$author> to C<"C. Klapaucius">.
395
396							=item *
397
398							B<In a list context,> the method will return the list
399							of I<all> values, in order. Hence, given the above, the code:
400
401							@authors = $ref->attr('A');
402
403							will set C<@authors> to C<("S. Trurl", "C. Klapaucius")>.
404
405							=back
406
407
408							I<Note:> this method is used as the basis of all "named" access
409							methods; hence, the following are equivalent in every way:
410
411							$ref->attr(T => $title) <=> $ref->title($title);
412							$ref->attr(A => \@authors) <=> $ref->author(\@authors);
413							$ref->attr(D => undef) <=> $ref->date(undef);
414							$auth = $ref->attr('A') <=> $auth = $ref->author;
415							@auths = $ref->attr('A') <=> @auths = $ref->author;
416
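The get/set rules described above can be sketched with a small stand-in class. `My::Record` is hypothetical and is not Text::Refer itself; it only mirrors the documented behaviour (undef deletes, a plain scalar stores one value, an arrayref stores all of its elements, and the return value depends on context). One deliberate difference, stated as an assumption: this sketch expands only ARRAY references.

```perl
use strict;
use warnings;

package My::Record;    # hypothetical stand-in, not Text::Refer itself

sub new { return bless {}, shift }

# get: all values in list context, the last value in scalar context.
sub get {
    my ($self, $attr) = @_;
    my $vals = $self->{$attr} || [];
    return wantarray ? @$vals : $vals->[-1];
}

# set: store the given values, or delete the attribute if none are given.
sub set {
    my ($self, $attr, @values) = @_;
    if (@values) { $self->{$attr} = [@values] }
    else         { delete $self->{$attr} }
    return 1;
}

# attr: acts as a setter when a VALUE argument is present (even undef),
# then always returns the current value via get().
sub attr {
    my ($self, $attr, $value) = @_;
    if (@_ > 2) {
        my @values = !defined($value)       ? ()
                   : ref($value) eq 'ARRAY' ? @$value
                   :                          ($value);
        $self->set($attr, @values);
    }
    return $self->get($attr);
}

package main;

my $ref = My::Record->new;
$ref->attr('A', ["S. Trurl", "C. Klapaucius"]);
my $last    = $ref->attr('A');     # scalar context: last value
my @authors = $ref->attr('A');     # list context: all values
$ref->attr('T', "Cyberiad");
$ref->attr('T', undef);            # passing undef deletes the attribute
my @titles  = $ref->attr('T');

print "last=$last count=", scalar(@authors), " titles=", scalar(@titles), "\n";
```

Checking `@_ > 2` rather than `defined($value)` is what lets an explicit undef mean "delete" while a missing argument means "get".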
417							=cut
418
419							sub attr {
420	14			14	1	56	my ($self, $attr, $values) = @_;
421	14	100				29	if (@_ > 2) {
422							# set the "values"...
423							# undef => empty array
424							# non-arrayref => array of one element
425							# arrayref => that array
426	3	50				7	$values = defined($values) ? $values : [];
427	3	100				11	$self->set($attr, (ref($values) ? @$values : ($values)));
428							}
429	14					24	$self->get($attr);
430							}
431
432							#------------------------------------------------------------
433
434							=item author, book, city, ... [VALUE]
435
436							I
437							For every one of the standard fields in a "refer" record, this
438							module has designated a high-level attribute name:
439
440							A author G govt_no N number S series
441							B book I publisher O other_info T title
442							C city J journal P page V volume
443							D date K keywords Q corp_author X abstract
444							E editor L label R report_no
445
446							Then, for each field I<F> with high-level attribute name I<FIELDNAME>,
447							the method C<FIELDNAME> works as follows:
448
449							$ref->attr('F', @args) <=> $ref->FIELDNAME(@args)
450
451							Which means:
452
453							$ref->attr(T => $title) <=> $ref->title($title);
454							$ref->attr(A => \@authors) <=> $ref->author(\@authors);
455							$ref->attr(D => undef) <=> $ref->date(undef);
456							$auth = $ref->attr('A') <=> $auth = $ref->author;
457							@auths = $ref->attr('A') <=> @auths = $ref->author;
458
459							See the documentation of C<attr()> for the argument list.
460
461							=cut
462
463	8			8	1	91	sub author { shift->attr('A',@_) }
464	0			0	1	0	sub book { shift->attr('B',@_) }
465	0			0	1	0	sub city { shift->attr('C',@_) }
466	0			0	0	0	sub date { shift->attr('D',@_) }
467	0			0	0	0	sub editor { shift->attr('E',@_) }
468	0			0	0	0	sub govt_no { shift->attr('G',@_) }
469	0			0	0	0	sub publisher { shift->attr('I',@_) }
470	0			0	0	0	sub journal { shift->attr('J',@_) }
471	0			0	0	0	sub keywords { shift->attr('K',@_) }
472	3			3	0	20	sub label { shift->attr('L',@_) }
473	0			0	0	0	sub number { shift->attr('N',@_) }
474	0			0	0	0	sub other_info { shift->attr('O',@_) }
475	0			0	0	0	sub page { shift->attr('P',@_) }
476	0			0	0	0	sub corp_author { shift->attr('Q',@_) }
477	0			0	0	0	sub report_no { shift->attr('R',@_) }
478	0			0	0	0	sub series { shift->attr('S',@_) }
479	0			0	0	0	sub title { shift->attr('T',@_) }
480	0			0	0	0	sub volume { shift->attr('V',@_) }
481	0			0	0	0	sub abstract { shift->attr('X',@_) }
482
483							#------------------------------------------------------------
484
485							=item get ATTR
486
487							I
488							Get an attribute, by its one-character name.
489							In an array context, it returns all values (empty if none):
490
491							@authors = $ref->get('A'); # returns list of all authors
492
493							In a scalar context, it returns the I<last> value (undefined if none):
494
495							$author = $ref->get('A'); # returns the last author
496
497							=cut
498
499							sub get {
500	16			16	1	50	my ($self, $attr) = @_;
501	16		50			37	my $vals = $self->{$attr} || [];
502	16	100				69	(wantarray ? @$vals : $vals->[-1]);
503							}
504
505							#------------------------------------------------------------
506
507							=item set ATTR, VALUES...
508
509							I
510							Set an attribute, by its one-character name.
511
512							$ref->set('A', "S. Trurl", "C. Klapaucius");
513
514							An empty array of VALUES deletes the attribute:
515
516							$ref->set('A'); # deletes all authors
517
518							No useful return value is currently defined.
519
520							=cut
521
522							sub set {
523	4			4	1	15	my $self = shift;
524	4					5	my $attr = shift;
525	4	50				26	if (@_) { $self->{$attr} = [@_] }
	4					11
526	0					0	else { delete $self->{$attr} }
527	4					8	1;
528							}
529
530							=back
531
532							=cut
533
534
535							#==============================
536
537							=head2 Output
538
539							=over 4
540
541							=cut
542
543							#------------------------------------------------------------
544							#
545							# _wrap STRING
546							#
547							# Split string into lines not exceeding 80 chars in length.
548
549							my $SMIN = 50; # don't split at nonwords before this position
550							my $SMAX = 75; # max line length
551
552							sub _wrap {
553	10			10		13	pos($_[0]) = 0;
554	10					66	$_[0] =~ s{\G ( # from current position...
555							(.{1,$SMAX})(?:\n|\Z) # next line (if of legal length), plus EOL
556							| # or,
557							(.{$SMIN,$SMAX}\W) # longest prefx of MIN-MAX chars endng in nonword
558							| # or,
559							(.{$SMAX}) # the first MAX chars
560							)
561							}{
562	10	50				34	(defined($2) ? $2 : $1) . "\n" # replace with text followed by \n
563							}gexo;
564	10	50				23	chop $_[0] if (substr($_[0], -1, 1) eq "\n"); # get rid of final \n
565	10					10	1;
566							}
567
568							#-----------------------------------------------------q-------
569
570							=item as_string [OPTSHASH]
571
572							I
573							Return the "refer" record as a string, usually for printing:
574
575							print $ref->as_string;
576
577							The options are:
578
579							=over 4
580
581							=item Quick
582
583							If true, do it quickly, but unsafely.
584							I<No wrapping or escaping is done on the field values:> they are output as-is.
585							That means if you used parser-options which destroyed any of the
586							formatting whitespace (e.g., C<Newline=KILL> with C<LeadWhite=KILLALL>),
587							there is a risk that the output object will be an invalid "refer" record.
588
589							=back
590
591							The fields are output with %L first (if it exists), and then the
592							remaining fields in alphabetical order. The following "safety measures"
593							are normally taken:
594
595							=over 4
596
597							=item *
598
599							Lines longer than 76 characters are wrapped (if possible, at a non-word
600							character a reasonable length in, but there is a chance that they will
601							simply be "split" if no such character is available).
602
603							=item *
604
605							Any occurrences of '%' immediately after a newline are preceded by a
606							single space.
607
608							=back
609
610							These safety measures are slightly time-consuming, and are silly if you
611							are merely outputting a "refer" object which you have read in verbatim
612							(i.e., using the default parser-options) from a valid "refer" file.
613							In these cases, you may want to use the B<Quick> option.
614
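The field ordering and the '%' safety measure described above can be sketched as follows. `format_record` is a hypothetical helper operating on a plain hashref (one-character field names mapped to arrayrefs of values), the label `Lem74` is invented sample data, and line wrapping is deliberately left out to keep the sketch short.

```perl
use strict;
use warnings;

# format_record (hypothetical helper): $rec maps one-character field
# names to arrayrefs of values.
sub format_record {
    my ($rec) = @_;

    # %L first (if present), then the remaining fields alphabetically.
    my @keys = sort grep { length($_) == 1 && $_ ne 'L' } keys %$rec;
    unshift @keys, 'L' if defined $rec->{L};

    my $out = '';
    for my $key (@keys) {
        for my $val (@{ $rec->{$key} }) {
            (my $safe = $val) =~ s/\n%/\n %/g;   # a continuation line must not start with %
            $safe =~ s/\n+\z//;                  # strip trailing newlines
            $out .= "%$key $safe\n";
        }
    }
    return $out;
}

my %rec = (
    T => ["Cyberiad"],
    L => ["Lem74"],
    X => ["Fables about robots;\n%100 recommended."],
);
print format_record(\%rec);
```

Without the substitution, the second line of the %X value would be read back as the start of a new field.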
615							=cut
616
617							sub as_string {
618	2			2	1	180	my ($self, %opts) = @_;
619	2					3	my ($key, $val);
620
621							# Figure out the keys to use, and put them in order:
622	2	100				7	my @keys = sort grep {(length == 1) && ($_ ne 'L')} (keys %$self);
	18					64
623	2	50				9	defined($self->{'L'}) && unshift(@keys, 'L');
624
625							# Output:
626	2					2	my @lines;
627	2					3	foreach $key (@keys) {
628	16					21	foreach $val (@{$self->{$key}}) {
	16					21
629	20	100				30	unless ($opts{Quick}) {
630							### print "UNWRAPPED = [$val]\n";
631	10					17	_wrap($val); # make sure no line exceeds 80 chars
632							### print "WRAPPED = [$val]\n";
633	10					10	$val =~ s/\n%/\n %/g; # newlines must NOT be followed by %
634	10					10	$val =~ s/\n+\Z//; # strip trailing newlines
635							}
636	20					51	push @lines, join('', '%', $key, ' ', $val, "\n");
637							}
638							}
639	2					17	join '', @lines;
640							}
641
642							=back
643
644							=cut
645
646
647
648
649
650							#==============================
651							#
652							package Text::Refer::Parser;
653							#
654							#==============================
655
656							=head1 CLASS Text::Refer::Parser
657
658							Instances of this class do the actual parsing.
659
660
661							=head2 Parser options
662
663							The options you may give to C<new()> are as follows:
664
665							=over 4
666
667							=item ForgiveEOF
668
669							Normally, the last record in a file must end with a blank line, or
670							else this module will suspect it of being incomplete and return an
671							error. However, if you give this option as true, it will allow
672							the last record to be terminated by an EOF.
673
674							=item GoodFields
675
676							By default, the parser accepts any (one-character) field name that is
677							a printable ASCII character (no whitespace). Formally, this is:
678
679							[\041-\176]
680
681							However, when compiling parser options, you can supply your own regular
682							expression for validating (one-character) field names.
683							(I<Note:> you must supply the square brackets; they are there to remind
684							you that you should give a well-formed single-character expression).
685							One standard expression is provided for you:
686
687							$Text::Refer::GroffFields = '[A-EGI-LN-TVX]'; # legal groff fields
688
689							Illegal fields which are encountered during parsing result in a syntax error.
690
691							B<Note:> You really shouldn't use this unless you absolutely need to.
692							The added regular expression test slows down the parser.
693
694
695							=item LeadWhite
696
697							In many "refer" files, continuation lines (the 2nd, 3rd, etc. lines of a
698							field) are written with leading whitespace, like this:
699
700							%T Incontrovertible Proof that Pi Equals Three
701							(for Large Values of Three)
702							%A S. Trurl
703							%X The author shows how anyone can use various common household
704							objects to obtain successively less-accurate estimations of
705							pi, until finally arriving at a desired integer approximation,
706							which nearly always is three.
707
708							This leading whitespace serves two purposes: (1) it makes it impossible
709							to mistake a continuation line for a field, since % can no longer be the
710							first character, and (2) it makes the entries easier to read.
711							The C<LeadWhite> option controls what is done with this whitespace:
712
713							KEEP - default; the whitespace is untouched
714							KILLONE - exactly one character of leading whitespace is removed
715							KILLALL - all leading whitespace is removed
716
717							See the section below on "using the parser options" for hints and warnings.
718
719
720							=item Newline
721
722							The C<Newline> option controls what is done with the newlines that
723							separate adjacent lines in the same field:
724
725							KEEP - default; the newlines are kept in the field value
726							TOSPACE - convert each newline to a single space
727							KILL - the newlines are removed
728
729							See the section below on "using the parser options" for hints and warnings.
730
731
732							=back
733
734							Default values will be used for any options which are left unspecified.
735
736
737							=head2 Notes on the parser options
738
739							The default values for C<Newline> and C<LeadWhite> will preserve the
740							input text exactly.
741
742							The C<Newline=TOSPACE> option, when used in conjunction with the
743							C<LeadWhite=KILLALL> option, effectively "word-wraps" the text of
744							each field into a single line.
745
746							B<Warning:> If you use the C<Newline=KILL> option with
747							either the C<LeadWhite=KILLONE> or the C<LeadWhite=KILLALL> option,
748							you could end up eliminating all whitespace that separates the word
749							at the end of one line from the word at the beginning of the next line.
750
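To make the interaction between the two options concrete, here is a small sketch. `join_lines` is a hypothetical helper, not part of the module; it joins the lines of one field value using the separator implied by a Newline setting (KEEP, TOSPACE, KILL) after applying a LeadWhite setting (KEEP, KILLONE, KILLALL) to each continuation line.

```perl
use strict;
use warnings;

# join_lines (hypothetical helper): combine the lines of one field value
# according to the Newline and LeadWhite settings.
sub join_lines {
    my ($newline, $leadwhite, $first, @rest) = @_;
    my $eol = $newline eq 'KILL'    ? ''
            : $newline eq 'TOSPACE' ? ' '
            :                         "\n";
    my $value = $first;
    for my $line (@rest) {          # @rest is a copy, so callers' data is untouched
        if    ($leadwhite eq 'KILLONE') { $line =~ s/^\s//  }
        elsif ($leadwhite eq 'KILLALL') { $line =~ s/^\s+// }
        $value .= $eol . $line;
    }
    return $value;
}

my @lines = ("Incontrovertible Proof that Pi Equals Three",
             "   (for Large Values of Three)");

print join_lines('KEEP',    'KEEP',    @lines), "\n";   # text preserved exactly
print join_lines('TOSPACE', 'KILLALL', @lines), "\n";   # one line, single space at the join
print join_lines('KILL',    'KILLALL', @lines), "\n";   # one line, words run together
```

The last call shows the hazard: with the newline and all leading whitespace removed, "Three" and "(for" are no longer separated.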
751
752							=head2 Public interface
753
754							=over 4
755
756							=cut
757
758	1			1		9	use strict;
	1					6
	1					27
759	1			1		6	use Carp;
	1					2
	1					1304
760
761							#------------------------------------------------------------
762
763							sub error {
764	0			0		0	my $self = shift;
765	0	0				0	warn "refer: l.$.: ".join('',@_)."\n" unless $Text::Refer::QUIET;
766	0	0				0	return (wantarray ? () : undef);
767							}
768
769							#------------------------------------------------------------
770
771							=item new PARAMHASH
772
773							I
774							Create and return a new parser. See above for the L<"parser options">
775							which you may give in the PARAMHASH.
776
777							=cut
778
779							sub new {
780	2			2		21	my ($class, %params) = @_;
781	2					4	my $self = \%params;
782	2		50			14	$self->{Class} ||= 'Text::Refer';
783	2		50			10	$self->{Newline} ||= 'KEEP';
784	2		50			8	$self->{LeadWhite} ||= 'KEEP';
785	2		50			9	$self->{GoodFields} ||= '[\041-\176]';
786
787							# Compile allowed fields:
788	2					4	my $gf = substr($self->{GoodFields}, 1);
789	2					7	($self->{Fields} = join('', map {chr($_)} 0..255)) =~ s{[^$gf}{}g;
	512					1121
790
791							# The EOL character:
792	2	50				44	if ($self->{Newline} eq 'KILL') { $self->{EOL} = "" }
	0	50				0
793	0					0	elsif ($self->{Newline} eq 'TOSPACE') { $self->{EOL} = " " }
794	2					5	else { $self->{EOL} = "\n" }
795
796	2					8	bless $self, $class;
797							}
798
799							#------------------------------------------------------------
800
801							=item create [CLASS]
802
803							I
804							What class of objects to create.
805							The default is C<Text::Refer>.
806
807							=cut
808
809							sub create {
810	3			3		4	my ($self, $class) = @_;
811	3	50				5	$self->{Class} = $class if $class;
812	3					13	$self->{Class};
813							}
814
815							#------------------------------------------------------------
816
817							=item input FH
818
819							I
820							Create a new object from the next record in a "refer" stream.
821							The actual class of the object is given by the C<create()> method.
822
823							Returns the object on success, '0' on I<expected> end-of-file,
824							and undefined on error.
825
826							Having two false values makes parsing very simple: just C<input()>
827							records until the result is false, then check to see if that last result
828							was 0 (end of file) or undef (failure).
829
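The "two false values" convention can be sketched without the module. `next_record` is a hypothetical stand-in for a parser's input method: it returns a hashref for each record, 0 at a clean end-of-file, and undef when the stream ends in the middle of a record. Continuation lines are ignored here for brevity, and the sample data is invented.

```perl
use strict;
use warnings;

# next_record (hypothetical stand-in): returns a hashref for a record,
# 0 at a clean end-of-file, or undef if the stream ends mid-record.
sub next_record {
    my ($fh) = @_;
    my $line;
    while (1) {                                  # skip leading blank lines
        defined($line = <$fh>) or return 0;
        chomp $line;
        last if length $line;
    }
    my %rec;
    while (1) {
        push @{ $rec{$1} }, $2 if $line =~ /^%(.)\s?(.*)$/;
        defined($line = <$fh>) or return undef;  # no terminating blank line
        chomp $line;
        last if $line eq '';
    }
    return \%rec;
}

# The second record is deliberately left without a terminating blank line.
my $data = "%T Cyberiad\n%A Stanislaw Lem\n\n%T Invisible Cities\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my $rec;
my $count = 0;
$count++ while $rec = next_record($fh);
print defined($rec) ? "clean EOF after $count records\n"
                    : "error after $count records\n";
```

Because both 0 and undef end the loop, a single `defined` test afterwards is enough to tell a finished stream from a truncated one.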
830							=cut
831
832							sub input {
833	4			4		53	my ($self, $fh) = @_;
834	4					4	my $line; # the next line
835							my $field; # last key read in, or undef
836	4					12	local($/) = "\n"; # in case our caller has been naughty
837
838
839							# Get options into scalars for faster usage:
840	4					11	my $LeadWhite = $self->{LeadWhite};
841	4					5	my $EOL = $self->{EOL};
842
843							# Skip blank lines until (legal) EOF or record:
844	4					5	while (1) {
845	12	100				58	defined($_ = <$fh>) or return 0;
846	11					11	chomp;
847	11	100				27	last if length($_); # break if we hit a nonblank line
848							}
849
850							# Start new object:
851	3					8	my $ref = $self->create->new;
852	3					16	$ref->{LineNo} = $.;
853
854							# Read record lines until (unexpected) EOF or done:
855	3					4	while (1) {
856	30	100				85	if (/^%(.)\s?(.*)$/) { # start new field...
		50
857	29	50				79	(index($self->{Fields}, ($field = $1)) >= 0) or
858							return $self->error("bad record field '$field' in <$_>");
859	29		100			28	push @{$ref->{$field} ||= []}, $2;
	29					159
860							}
861							elsif (defined($field)) { # add line to previous field...
862
863							# Muck about with leading whitespace (implicit else is KEEP):
864	1	50				4	if ($LeadWhite eq 'KILLONE') { # kill first leading white:
		50
865	0					0	s/^\s//;
866							} elsif ($LeadWhite eq 'KILLALL') { # kill all leading white:
867	0					0	s/^\s+//;
868							}
869
870							# Add separator and new line to existing value:
871	1					5	$ref->{$field}[-1] .= ($EOL . $_);
872							}
873							else { # yow! line not inside record!
874	0					0	return $self->error("line outside record: <$_>");
875							}
876							} continue {
877	30	50				71	defined($_ = <$fh>) or do { # unexpected EOF... forgive it?
878	0	0				0	$self->{ForgiveEOF}? last : return $self->error("unexpected EOF")};
879	30					27	chomp;
880	30	100				61	last if ($_ eq ''); # blank line means end of record
881							}
882
883							# Done!
884	3					13	$ref;
885							}
886
887							=back
888
889							=cut
890
891
892							#------------------------------------------------------------
893
894							=head1 NOTES
895
896							=head2 Under the hood
897
898							Each "refer" object has instance variables corresponding to the actual
899							field names (C<'T'>, C<'A'>, etc.). Each of these is a reference to
900							an array of the actual values.
901
902							Notice that, for maximum flexibility and consistency (but at the cost of
903							some space and access-efficiency), the semantics of "refer" records do
904							not come into play at this time: since everything resides in an array,
905							you can have as many %K, %D, etc. records as you like, and give them
906							entirely different semantics.
907
908							For example, the Library Of Boring Stuff That Everyone Reads (LOBSTER) uses
909							the unused %Y as a "year" field. The parser accommodates this
910							case by politely not choking on LOBSTER .bibs (although why you would
911							want to eat a lobster bib instead of the lobster is beyond me...).
912
913
914							=head2 Performance
915
916							Tolerable. On my 90MHz/32 MB RAM/I586 box running Linux 1.2.13 and Perl5.002,
917							it parses a typical 500 KB "refer" file (of 1600 records) as follows:
918
919							8 seconds of user time for input and no output
920							10 seconds of user time for input and "quick" output
921							16 seconds of user time for input and "safe" output
922
923							So, figure the individual speeds are:
924
925							input: 200 records ( 60 KB) per second.
926							"quick" output: 800 records (240 KB) per second.
927							"safe" output: 200 records ( 60 KB) per second.
928
929							By contrast, a C program which does the same work is about 8 times as fast.
930							But of course, the C code is 8 times as large, and 8 times as ugly... C<:-)>
931
932
933							=head2 Note to serious bib-file users
934
935							I actually do not use "refer" files for *roffing... I used them as a
936							quick-and-dirty database for WebLib, and that's where this code comes
937							from. If you're a serious user of "refer" files, and this module doesn't
938							do what you need it to, please contact me: I'll add the functionality
939							in.
940
941
942							=head1 BUGS
943
944							Some combinations of parser-options are silly.
945
946
947
948							=head1 CHANGE LOG
949
950							$Id: Refer.pm,v 1.106 1997/04/22 18:41:41 eryq Exp $
951
952							=over 4
953
954							=item Version 1.101
955
956							Initial release. Adapted from Text::Bib.
957
958							=back
959
960
961							=head1 AUTHOR
962
963							Copyright (C) 1997 by Eryq,
964							F,
965							F.
966
967
968							=head1 NO WARRANTY
969
970							This program is free software; you can redistribute it and/or modify
971							it under the terms of the GNU General Public License as published by
972							the Free Software Foundation; either version 2 of the License, or
973							(at your option) any later version.
974
975							This program is distributed in the hope that it will be useful,
976							but WITHOUT ANY WARRANTY; without even the implied warranty of
977							MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
978							GNU General Public License for more details.
979
980							For a copy of the GNU General Public License, write to the Free Software
981							Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
982
983							=cut
984
985							1;
986