File Coverage

blib/lib/Text/BibTeX/NameFormat.pm
Criterion    Covered  Total      %
statement         32     32  100.0
branch             4      6   66.6
condition          4     11   36.3
subroutine         8      8  100.0
pod                4      4  100.0
total             52     61   85.2


line stmt bran cond sub pod time code
1             # ----------------------------------------------------------------------
2             # NAME : BibTeX/NameFormat.pm
3             # CLASSES : Text::BibTeX::NameFormat
4             # RELATIONS :
5             # DESCRIPTION: Provides a way to format already-parsed BibTeX-style
6             # author names. (The parsing is done by the
7             # Text::BibTeX::Name class.)
8             # CREATED : Nov 1997, Greg Ward
9             # MODIFIED :
10             # VERSION : $Id$
11             # COPYRIGHT : Copyright (c) 1997-2000 by Gregory P. Ward. All rights
12             # reserved.
13             #
14             # This file is part of the Text::BibTeX library. This
15             # library is free software; you may redistribute it and/or
16             # modify it under the same terms as Perl itself.
17             # ----------------------------------------------------------------------
18              
19             package Text::BibTeX::NameFormat;
20              
21             require 5.004;
22              
23 13     13   90 use strict;
  13         27  
  13         386  
24 13     13   65 use Carp;
  13         32  
  13         700  
25 13     13   85 use vars qw'$VERSION';
  13         34  
  13         6456  
26             $VERSION = 0.87;
27              
28             =head1 NAME
29              
30             Text::BibTeX::NameFormat - format BibTeX-style author names
31              
32             =head1 SYNOPSIS
33              
34             use Text::BibTeX::NameFormat;
35              
36             $format = Text::BibTeX::NameFormat->new ($parts, $abbrev_first);
37              
38             $format->set_text ($part,
39             $pre_part, $post_part,
40             $pre_token, $post_token);
41              
42             $format->set_options ($part, $abbrev, $join_tokens, $join_part);
43              
44             ## Uses the encoding/binmode and normalization form stored in $name
45             $formatted_name = $format->apply ($name);
46              
47             =head1 DESCRIPTION
48              
49             After splitting a name into its component parts (represented as a
50             C<Text::BibTeX::Name> object), you often want to put it back together
51             again as a single string formatted in a consistent way.
52             C<Text::BibTeX::NameFormat> provides a very flexible way to do this,
53             generally in two stages: first, you create a "name format" which
54             describes how to put the tokens and parts of any name back together, and
55             then you apply the format to a particular name.
56              
57             The "name format" is encapsulated in a C<Text::BibTeX::NameFormat>
58             object. The constructor (C<new>) includes some clever behind-the-scenes
59             trickery that means you can usually get away with calling it alone, and
60             not need to do any customization of the format object. If you do need
61             to customize the format, though, the C<set_text> and C<set_options>
62             methods provide that capability.
63              
64             Note that C<Text::BibTeX::NameFormat> is a fairly direct translation of
65             the name-formatting C interface in the B<btparse> library. This manual
66             page is meant to provide enough information to use the Perl class, but
67             for more details and examples, consult L<bt_format_names>.
68              
69             =head1 CONSTANTS
70              
71             Two enumerated types for dealing with names and name formatting have
72             been brought from C into Perl. In the B<btparse> documentation, you'll
73             see references to C<bt_namepart> and C<bt_joinmethod>. The former lists
74             the four "parts" of a BibTeX name: first, von, last, and jr; its values
75             (in both C and Perl) are C<BTN_FIRST>, C<BTN_VON>, C<BTN_LAST>, and
76             C<BTN_JR>. The latter lists the ways in which C<bt_format_name()> (the
77             C function that corresponds to C<Text::BibTeX::NameFormat>'s C<apply>
78             method) can join adjacent tokens together: C<BTJ_MAYTIE>, C<BTJ_SPACE>,
79             C<BTJ_FORCETIE>, and C<BTJ_NOTHING>. Both sets of values may be
80             imported from the C<Text::BibTeX> module, using the import tags
81             C<nameparts> and C<joinmethods>. For instance:
82              
83             use Text::BibTeX qw(:nameparts :joinmethods);
84             use Text::BibTeX::Name;
85             use Text::BibTeX::NameFormat;
86              
87             The "name part" constants are used to specify surrounding text or
88             formatting options on a per-part basis: for instance, you can supply the
89             "pre-token" text, or the "abbreviate" flag, for a single part without
90             affecting other parts. The "join methods" are two of the three
91             formatting options that you can set for a part: you can control how to
92             join the individual tokens of a name (C<"JR Smith">, C<"J R Smith">,
93             or C<"J~R Smith">), and you can control how the final token of one part
94             is joined to the next part (C<"la Roche"> versus C<"la~Roche">).
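
To make the token-join choices concrete, here is a small pure-Perl sketch
that imitates three of the join methods on tokens that have already been
abbreviated. The separator table and the join_tokens helper are
illustrative assumptions, not part of Text::BibTeX; the real BTJ_*
constants are imported from Text::BibTeX and interpreted by the C library.
BTJ_MAYTIE is left out because its tie rules live in btparse.

```perl
use strict;
use warnings;

# Hypothetical lookup: separator strings standing in for BTJ_NOTHING,
# BTJ_SPACE and BTJ_FORCETIE (the real constants come from Text::BibTeX).
my %separator = (
    nothing  => '',
    space    => ' ',
    forcetie => '~',
);

# Join the tokens of one name part with the chosen separator.
sub join_tokens {
    my ($method, @tokens) = @_;
    die "unknown join method '$method'\n" unless exists $separator{$method};
    return join $separator{$method}, @tokens;
}

for my $method (qw(nothing space forcetie)) {
    print join_tokens($method, 'J', 'R'), " Smith\n";
}
# prints "JR Smith", "J R Smith" and "J~R Smith", one per line
```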
95              
96             =head1 METHODS
97              
98             =over 4
99              
100             =item new(PARTS, ABBREV_FIRST)
101              
102             Creates a new name format, with the two most common customizations: which
103             parts to include (and in what order), and whether to abbreviate the first
104             name. PARTS should be a string with at most four characters, one representing
105             each part that you want to occur in a formatted name (defaults to C<"fvlj">).
106             For example, C<"fvlj"> means to format names in "first von last jr" order,
107             while C<"vljf"> denotes "von last jr first." ABBREV_FIRST is just a boolean
108             value: false to print out the first name in full, and true to abbreviate it
109             with periods after each token and discretionary ties between tokens (defaults
110             to false). All intra- and inter-token punctuation and spacing is independently
111             controllable with the C<set_text> and C<set_options> methods, although these
112             will rarely be necessary---sensible defaults are chosen for everything, based
113             on the PARTS and ABBREV_FIRST values that you supply. See the description of
114             C<bt_create_name_format()> in L<bt_format_names> for full details of the
115             choices made.
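
As a rough illustration of how a PARTS string maps onto an ordering of
name parts, the sketch below expands it into part names. It reuses the
default ("fvlj") and the pattern check that appear in the constructor;
the expand_parts helper and the %part_name table are hypothetical names
introduced only for this example.

```perl
use strict;
use warnings;

# Letters accepted in PARTS and the name part each one stands for.
my %part_name = (f => 'first', v => 'von', l => 'last', j => 'jr');

# Expand a PARTS string into the ordered list of part names, applying the
# same default ("fvlj") and the same pattern check that new() uses.
sub expand_parts {
    my ($parts) = @_;
    $parts ||= 'fvlj';
    die "bad PARTS string '$parts'\n" unless $parts =~ /^[fvlj]{1,4}$/;
    return map { $part_name{$_} } split //, $parts;
}

print join(' ', expand_parts('vljf')), "\n";   # von last jr first
print join(' ', expand_parts()), "\n";         # first von last jr
```

Note that the pattern check on its own also admits repeated letters such
as "ff"; how the C layer treats such a string is not shown here.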
116              
117             =cut
118              
119             sub new
120             {
121 29     29 1 710 my ($class, $parts, $abbrev_first) = @_;
122              
123 29   50     107 $parts ||= "fvlj";
124 29 100       85 $abbrev_first = defined($abbrev_first)? $abbrev_first : 0;
125              
126 29 50       201 die unless $parts =~ /^[fvlj]{1,4}$/;
127              
128 29   33     94 $class = ref ($class) || $class;
129 29         68 my $self = bless {}, $class;
130 29         147 $self->{_cstruct} = create ($parts, $abbrev_first);
131 29         84 $self;
132             }
133              
134              
135             sub DESTROY
136             {
137 29     29   4291 my $self = shift;
138             free ($self->{'_cstruct'})
139 29 50       175 if defined $self->{'_cstruct'};
140             }
141              
142              
143             =item set_text (PART, PRE_PART, POST_PART, PRE_TOKEN, POST_TOKEN)
144              
145             Allows you to customize some or all of the surrounding text for a single
146             name part. Every name part has four possible chunks of text that go
147             around or within it: before/after the part as a whole, and before/after
148             each token in the part. For instance, if you are abbreviating first
149             names and wish to control the punctuation after each token in the first
150             name, you would set the "post token" text:
151              
152             $format->set_text ('first', undef, undef, undef, '');
153              
154             would set the post-token text to the empty string, resulting in names
155             like C<"J R Smith">. (Normally, abbreviated first names will have a
156             period after each token: C<"J. R. Smith">.) Note that supplying
157             C<undef> for the other three values leaves them unchanged.
158              
159             See L<bt_format_names> for full information on formatting names.
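
One way to picture the four chunks of text is the simplified model below.
It is a sketch under stated assumptions, not the btparse implementation:
render_part is a hypothetical helper, the tokens are assumed to be
abbreviated already, and the trailing space in post_part stands in for
the part-to-part join.

```perl
use strict;
use warnings;

# Simplified model of one name part: each token is wrapped in the pre/post
# token text, the tokens are joined, and the whole part is wrapped in the
# pre/post part text. Abbreviation and tie rules are deliberately left out.
sub render_part {
    my ($tokens, %text) = @_;
    my @wrapped = map { $text{pre_token} . $_ . $text{post_token} } @$tokens;
    return $text{pre_part} . join($text{join}, @wrapped) . $text{post_part};
}

my @first = ('J', 'R');    # first-name tokens, already abbreviated

# A period after each token, as abbreviated first names normally get.
print render_part(\@first, pre_part => '', post_part => ' ',
                  pre_token => '', post_token => '.', join => ' '),
      "Smith\n";           # J. R. Smith

# Post-token text set to the empty string, as in the set_text call above.
print render_part(\@first, pre_part => '', post_part => ' ',
                  pre_token => '', post_token => '', join => ' '),
      "Smith\n";           # J R Smith
```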
160              
161             =cut
162              
163             sub set_text
164             {
165 11     11 1 30 my ($self, $part, $pre_part, $post_part, $pre_token, $post_token) = @_;
166              
167             # Engage in a little conspiracy with the XS code (_set_text) and the
168             # underlying C function (bt_set_format_text) here. In particular,
169             # neither of those functions copies the strings we pass in here -- they
170             # just copy the C pointers. Ultimately, those refer back to the Perl
171             # strings that we're passing in now. Thus, if those Perl strings
172             # were to go away (ref count drop to zero), then the C code might
173             # have dangling pointers to free'd strings -- oops! The solution is
174             # to keep references to those Perl strings here, so that their ref
175             # count can never drop to zero without our assent. Every time
176             # set_text is called, the old references are overridden (ref count
177             # drops), and when the NameFormat object is destroyed, we destroy
178             # them (ref count drops). Other than that, there will always be some
179             # reference to the strings passed in to set_text.
180              
181             # XXX what if some of these are undef?
182              
183 11         51 $self->{'textrefs'} = [\$pre_part, \$post_part, \$pre_token, \$post_token];
184              
185 11         90 _set_text ($self->{'_cstruct'},
186             $part,
187             $pre_part,
188             $post_part,
189             $pre_token,
190             $post_token);
191 11         26 1;
192             }
193              
194              
195             =item set_options (PART, ABBREV, JOIN_TOKENS, JOIN_PART)
196              
197             Allows further customization of a name format: you can set the
198             abbreviation flag and the two token-join methods. Alas, there is no
199             mechanism for leaving a value unchanged; you must set everything with
200             C<set_options>.
201              
202             For example, let's say that just dropping periods from abbreviated
203             tokens in the first name isn't enough; you I<really> want to save
204             space by jamming the abbreviated tokens together: C<"JR Smith"> rather
205             than C<"J R Smith">. Assuming the two calls in the above example have
206             been done, the following will finish the job:
207              
208             $format->set_options (BTN_FIRST,
209             1, # keep same value for abbrev flag
210             BTJ_NOTHING, # jam tokens together
211             BTJ_SPACE); # space after final token of part
212              
213             Note that we unfortunately had to know (and supply) the current values
214             for the abbreviation flag and post-part join method, even though we were
215             only setting the intra-part join method.
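
If resupplying all three values becomes tedious, one possible workaround
is a thin wrapper that remembers what was last set for each part. The
sketch below is only an illustration: OptionMemo and FakeFormat are
hypothetical packages, the strings 'first', 'space' and 'nothing' are
placeholders for the real BTN_* and BTJ_* constants, and FakeFormat
merely records calls where a real format object would be used.

```perl
use strict;
use warnings;

# Hypothetical helper package: remembers the last options set for each part
# so that a caller can change one of them and resend all three.
package OptionMemo;

sub new {
    my ($class, $format) = @_;
    return bless { format => $format, options => {} }, $class;
}

# Record and apply a full set of options for a part.
sub set_all {
    my ($self, $part, $abbrev, $join_tokens, $join_part) = @_;
    $self->{options}{$part} = [$abbrev, $join_tokens, $join_part];
    return $self->{format}->set_options($part, $abbrev, $join_tokens, $join_part);
}

# Change only the token-join method, reusing the remembered values.
sub set_join_tokens {
    my ($self, $part, $join_tokens) = @_;
    my $current = $self->{options}{$part}
        or die "no options recorded yet for part '$part'\n";
    return $self->set_all($part, $current->[0], $join_tokens, $current->[2]);
}

# Stand-in for a format object, used only to show which calls are made.
package FakeFormat;

sub new         { return bless { calls => [] }, shift }
sub set_options { my $self = shift; push @{ $self->{calls} }, [@_]; return 1 }

package main;

my $fake = FakeFormat->new;
my $memo = OptionMemo->new($fake);
$memo->set_all('first', 1, 'space', 'space');
$memo->set_join_tokens('first', 'nothing');
print join(',', @{ $fake->{calls}[-1] }), "\n";   # first,1,nothing,space
```

The wrapper only knows about values that were set through it; the
defaults chosen by the constructor are not visible to it, so set_all has
to be called once per part before any single-option change.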
216              
217             =cut
218              
219             sub set_options
220             {
221 14     14 1 39 my ($self, $part, $abbrev, $join_tokens, $join_part) = @_;
222              
223 14         42 _set_options ($self->{'_cstruct'}, $part,
224             $abbrev, $join_tokens, $join_part);
225 14         30 1;
226             }
227              
228              
229             =item apply (NAME)
230              
231             Once a name format has been created and customized to your heart's
232             content, you can use it to format any number of names using the C<apply>
233             method. NAME must be a C<Text::BibTeX::Name> object (i.e., a pre-split
234             name); C<apply> returns a string containing the parts of the name
235             formatted according to the C<Text::BibTeX::NameFormat> structure it is
236             called on.
237              
238             =cut
239              
240             sub apply
241             {
242 47     47 1 119 my ($self, $name) = @_;
243              
244 47   33     116 my $name_struct = $name->{'_cstruct'} ||
245             croak "invalid Name object: no C structure";
246 47   33     95 my $format_struct = $self->{'_cstruct'} ||
247             croak "invalid NameFormat object: no C structure";
248            
249 47         249 my $ans = format_name ($name_struct, $format_struct);
250              
251 47         131 $ans = Text::BibTeX->_process_result($ans, $name->{binmode}, $name->{normalization});
252            
253 47         368 return $ans;
254             }
255              
256             =back
257              
258             =head1 EXAMPLES
259              
260             Although the process of splitting and formatting names may sound
261             complicated and convoluted from reading the above (along with
262             L<Text::BibTeX::Name>), it's actually quite simple. There are really
263             only three steps to worry about: split the name (create a
264             C<Text::BibTeX::Name> object), create and customize the format
265             (C<Text::BibTeX::NameFormat> object), and apply the format to the name.
266              
267             The first step is covered in L<Text::BibTeX::Name>; here's a brief
268             example:
269              
270             $orig_name = q{Charles Louis Xavier Joseph de la Vall{\'e}e Poussin};
271             $name = Text::BibTeX::Name->new($orig_name);
272              
273             The various parts of the name can now be accessed through
274             C<Text::BibTeX::Name> methods; for instance C<$name-E<gt>part('von')>
275             returns the list C<("de","la")>.
276              
277             Creating the name format is equally simple:
278              
279             $format = Text::BibTeX::NameFormat->new('vljf', 1);
280              
281             creates a format that will print the name in "von last jr first" order,
282             with the first name abbreviated. And for no extra charge, you get the
283             right punctuation at the right place: a comma before any `jr' or `first'
284             tokens, and periods after each `first' token.
285              
286             For instance, we can perform no further customization on this format,
287             and apply it immediately to C<$name>. There are in fact two ways to do
288             this, depending on whether you prefer to think of it in terms of
289             "applying the format to a name" or "formatting a name". The first is
290             done with C<Text::BibTeX::NameFormat>'s C<apply> method:
291              
292             $formatted_name = $format->apply ($name);
293              
294             while the second uses C<Text::BibTeX::Name>'s C<format> method:
295              
296             $formatted_name = $name->format ($format);
297              
298             which is just a wrapper around C<Text::BibTeX::NameFormat::apply>. In
299             either case, the result with the example name and format shown is
300              
301             de~la Vall{\'e}e~Poussin, C.~L. X.~J.
302              
303             Note the strategic insertion of TeX "ties" (non-breakable spaces) at
304             sensitive spots in the name. (The exact rules for insertion of
305             discretionary ties are given in L<bt_format_names>.)
306              
307             =head1 SEE ALSO
308              
309             L<Text::BibTeX::Entry>, L<Text::BibTeX::Name>, L<bt_format_names>.
310              
311             =head1 AUTHOR
312              
313             Greg Ward
314              
315             =head1 COPYRIGHT
316              
317             Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file
318             is part of the Text::BibTeX library. This library is free software; you
319             may redistribute it and/or modify it under the same terms as Perl itself.
320              
321             =cut
322              
323              
324             1;
325