File Coverage

blib/lib/HTML/EntityReference.pm
Criterion    Covered  Total      %
statement         85     88   96.5
branch            36     40   90.0
condition         12     16   75.0
subroutine        18     18  100.0
pod                7     10   70.0
total            158    172   91.8


line stmt bran cond sub pod time code
1 2     2   51766 use 5.10.1;
  2         8  
  2         75  
2 2     2   867 use utf8;
  2         13  
  2         15  
3 2     2   51 use strict;
  2         10  
  2         70  
4 2     2   8 use warnings;
  2         3  
  2         141  
5              
6             package HTML::EntityReference;
7              
8             =head1 NAME
9              
10             HTML::EntityReference - A minimal, abstract, and reusable list of HTML entities
11              
12             =head1 VERSION
13              
14             Version 0.011
15              
16             =cut
17              
18             our $VERSION = '0.011';
19              
20             =head1 SYNOPSIS
21              
22             This is a listing of HTML character entities. It is intended to be the last time such a list needs to be compiled into a module: the data is exposed so that it can be used in any situation. I found several modules that dealt with entities, but they either did not do what I needed or kept their lists for internal use.
23              
24             The essential characteristic of this data is that "entities exist".
25              
26             The entity is nothing more than a name for a Unicode character. Everything else having to do with it is attached to the character, and should be something I can find in the Unicode database and related Unicode Perl stuff. The most fundamental thing is a map of names to code point numbers. I mean the number itself (an integer), not some string representation of the number in hex or decimal or decorated with some other escape system. From the code point value, it is a single step to get the actual character, or the formatted numeric entity, or whatever.
27              
28             You can use the supplied hash directly. Or, this module provides some simple functions that abstract the way the data is actually stored and return the common cases.
29              
30             The function calls also provide an easy way to check multiple tables in one go. That is why non-standard entities, whether recognised by some browsers or only used historically, are documented here too: you can include them in a lookup when you want them.
31              
32             use HTML::EntityReference;
33             my $codepoint= HTML::EntityReference::ordinal('ldquo'); # the integer 8220
34             say "Character is known formally as ", charnames::viacode($codepoint), '.';
35             my $char= HTML::EntityReference::character('amp'); # the string '&'
36            
37             # can look up the other way too
38             my $entity= HTML::EntityReference::from_ordinal(0x2026);
39             say "You can use &$entity; on a web page."; # "hellip"
40            
41             # use non-standard definitions
42             $codepoint= HTML::EntityReference::ordinal($whatsit, ':all');
43              
44             =cut
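As a rough illustration of the "single step" from a code point to everything else, here is a self-contained sketch that does not load the module at all. It assumes a hypothetical three-entry excerpt of the name-to-code-point map (values copied from the table below) and derives the character, the decimal and hex numeric entities, and the formal Unicode name using core Perl only.

```perl
use strict;
use warnings;
use feature 'say';
use charnames ();

# Hypothetical excerpt of the name => code point map (values from the table).
my %entities = (hellip => 8230, ldquo => 8220, amp => 38);

my $cp      = $entities{hellip};             # the integer 8230
my $char    = chr($cp);                      # the character itself
my $decimal = sprintf('&#%d;', $cp);         # decimal numeric entity: "&#8230;"
my $hex     = sprintf('&#x%X;', $cp);        # hex numeric entity: "&#x2026;"
my $uname   = charnames::viacode($cp);       # formal Unicode name

binmode STDOUT, ':encoding(UTF-8)';
say join ' | ', $char, $decimal, $hex, $uname;
```

Nothing here depends on how the integer was obtained, which is the point of storing plain integers rather than formatted strings.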
45              
46 2     2   10 use constant INVERSE => '; INVERSE';
  2         3  
  2         96  
47 2     2   9 use Carp;
  2         3  
  2         2654  
48              
49             =head1 Data Tables
50              
51             =head2 %W3C_Entities
52              
53             The package variable C<%W3C_Entities> contains the standard HTML entities as keys, and the code point (integer) as the value. The source also contains comments copied from L.
54              
55             =cut
56              
57             # see
58             our %W3C_Entities= (
59             # %HTMLlat1; Latin 1
60             nbsp => 160, # no-break space = non-breaking space,
61             iexcl => 161, # inverted exclamation mark
62             cent => 162, # cent sign
63             pound => 163, # pound sign
64             curren => 164, # currency sign
65             yen => 165, # yen sign = yuan sign
66             brvbar => 166, # broken bar = broken vertical bar
67             sect => 167, # section sign
68             uml => 168, # diaeresis = spacing diaeresis
69             copy => 169, # copyright sign
70             ordf => 170, # feminine ordinal indicator
71             laquo => 171, # left-pointing double angle quotation mark = left pointing guillemet
72             not => 172, # not sign
73             shy => 173, # soft hyphen = discretionary hyphen
74             reg => 174, # registered sign = registered trade mark sign
75             macr => 175, # macron = spacing macron = overline = APL overbar
76             deg => 176, # degree sign
77             plusmn => 177, # plus-minus sign = plus-or-minus sign
78             sup2 => 178, # superscript two = superscript digit two
79             sup3 => 179, # superscript three = superscript digit three
80             acute => 180, # acute accent = spacing acute
81             micro => 181, # micro sign, U+00B5 ISOnum
82             para => 182, # pilcrow sign = paragraph sign
83             middot => 183, # middle dot = Georgian comma
84             cedil => 184, # cedilla = spacing cedilla
85             sup1 => 185, # superscript one = superscript digit one
86             ordm => 186, # masculine ordinal indicator
87             raquo => 187, # right-pointing double angle quotation mark = right pointing guillemet
88             frac14 => 188, # vulgar fraction one quarter = fraction one quarter
89             frac12 => 189, # vulgar fraction one half = fraction one half
90             frac34 => 190, # vulgar fraction three quarters = fraction three quarters
91             iquest => 191, # inverted question mark = turned question mark
92             Agrave => 192, # latin capital letter A with grave = latin capital letter A grave
93             Aacute => 193, # latin capital letter A with acute
94             Acirc => 194, # latin capital letter A with circumflex
95             Atilde => 195, # latin capital letter A with tilde
96             Auml => 196, # latin capital letter A with diaeresis
97             Aring => 197, # latin capital letter A with ring above = latin capital letter A ring
98             AElig => 198, # latin capital letter AE = latin capital ligature AE
99             Ccedil => 199, # latin capital letter C with cedilla
100             Egrave => 200, # latin capital letter E with grave
101             Eacute => 201, # latin capital letter E with acute
102             Ecirc => 202, # latin capital letter E with circumflex
103             Euml => 203, # latin capital letter E with diaeresis
104             Igrave => 204, # latin capital letter I with grave
105             Iacute => 205, # latin capital letter I with acute
106             Icirc => 206, # latin capital letter I with circumflex
107             Iuml => 207, # latin capital letter I with diaeresis
108             ETH => 208, # latin capital letter ETH
109             Ntilde => 209, # latin capital letter N with tilde
110             Ograve => 210, # latin capital letter O with grave
111             Oacute => 211, # latin capital letter O with acute
112             Ocirc => 212, # latin capital letter O with circumflex
113             Otilde => 213, # latin capital letter O with tilde
114             Ouml => 214, # latin capital letter O with diaeresis
115             times => 215, # multiplication sign
116             Oslash => 216, # latin capital letter O with stroke = latin capital letter O slash
117             Ugrave => 217, # latin capital letter U with grave
118             Uacute => 218, # latin capital letter U with acute
119             Ucirc => 219, # latin capital letter U with circumflex
120             Uuml => 220, # latin capital letter U with diaeresis
121             Yacute => 221, # latin capital letter Y with acute
122             THORN => 222, # latin capital letter THORN
123             szlig => 223, # latin small letter sharp s = ess-zed
124             agrave => 224, # latin small letter a with grave = latin small letter a grave
125             aacute => 225, # latin small letter a with acute
126             acirc => 226, # latin small letter a with circumflex
127             atilde => 227, # latin small letter a with tilde
128             auml => 228, # latin small letter a with diaeresis
129             aring => 229, # latin small letter a with ring above = latin small letter a ring
130             aelig => 230, # latin small letter ae = latin small ligature ae
131             ccedil => 231, # latin small letter c with cedilla
132             egrave => 232, # latin small letter e with grave
133             eacute => 233, # latin small letter e with acute
134             ecirc => 234, # latin small letter e with circumflex
135             euml => 235, # latin small letter e with diaeresis
136             igrave => 236, # latin small letter i with grave
137             iacute => 237, # latin small letter i with acute
138             icirc => 238, # latin small letter i with circumflex
139             iuml => 239, # latin small letter i with diaeresis
140             eth => 240, # latin small letter eth
141             ntilde => 241, # latin small letter n with tilde
142             ograve => 242, # latin small letter o with grave
143             oacute => 243, # latin small letter o with acute
144             ocirc => 244, # latin small letter o with circumflex
145             otilde => 245, # latin small letter o with tilde
146             ouml => 246, # latin small letter o with diaeresis
147             divide => 247, # division sign
148             oslash => 248, # latin small letter o with stroke = latin small letter o slash
149             ugrave => 249, # latin small letter u with grave
150             uacute => 250, # latin small letter u with acute
151             ucirc => 251, # latin small letter u with circumflex
152             uuml => 252, # latin small letter u with diaeresis
153             yacute => 253, # latin small letter y with acute
154             thorn => 254, # latin small letter thorn
155             yuml => 255, # latin small letter y with diaeresis
156              
157             # %HTMLsymbol; Mathematical, Greek and Symbolic characters
158             # Latin Extended-B
159             fnof => 402, # latin small f with hook = function = florin
160             # Greek
161             Alpha => 913, # greek capital letter alpha
162             Beta => 914, # greek capital letter beta
163             Gamma => 915, # greek capital letter gamma
164             Delta => 916, # greek capital letter delta
165             Epsilon => 917, # greek capital letter epsilon
166             Zeta => 918, # greek capital letter zeta
167             Eta => 919, # greek capital letter eta
168             Theta => 920, # greek capital letter theta
169             Iota => 921, # greek capital letter iota
170             Kappa => 922, # greek capital letter kappa
171             Lambda => 923, # greek capital letter lambda
172             Mu => 924, # greek capital letter mu
173             Nu => 925, # greek capital letter nu
174             Xi => 926, # greek capital letter xi
175             Omicron => 927, # greek capital letter omicron
176             Pi => 928, # greek capital letter pi
177             Rho => 929, # greek capital letter rho
178             # there is no Sigmaf, and no U+03A2 character either
179             Sigma => 931, # greek capital letter sigma
180             Tau => 932, # greek capital letter tau
181             Upsilon => 933, # greek capital letter upsilon
182             Phi => 934, # greek capital letter phi
183             Chi => 935, # greek capital letter chi
184             Psi => 936, # greek capital letter psi
185             Omega => 937, # greek capital letter omega
186             alpha => 945, # greek small letter alpha
187             beta => 946, # greek small letter beta
188             gamma => 947, # greek small letter gamma
189             delta => 948, # greek small letter delta
190             epsilon => 949, # greek small letter epsilon
191             zeta => 950, # greek small letter zeta
192             eta => 951, # greek small letter eta
193             theta => 952, # greek small letter theta
194             iota => 953, # greek small letter iota
195             kappa => 954, # greek small letter kappa
196             lambda => 955, # greek small letter lambda
197             mu => 956, # greek small letter mu
198             nu => 957, # greek small letter nu
199             xi => 958, # greek small letter xi
200             omicron => 959, # greek small letter omicron
201             pi => 960, # greek small letter pi
202             rho => 961, # greek small letter rho
203             sigmaf => 962, # greek small letter final sigma
204             sigma => 963, # greek small letter sigma
205             tau => 964, # greek small letter tau
206             upsilon => 965, # greek small letter upsilon
207             phi => 966, # greek small letter phi
208             chi => 967, # greek small letter chi
209             psi => 968, # greek small letter psi
210             omega => 969, # greek small letter omega
211             thetasym => 977, # greek small letter theta symbol
212             upsih => 978, # greek upsilon with hook symbol
213             piv => 982, # greek pi symbol
214             # General Punctuation
215             bull => 8226, # bullet = black small circle,
216             # bullet is NOT the same as bullet operator, U+2219
217             hellip => 8230, # horizontal ellipsis = three dot leader
218             prime => 8242, # prime = minutes = feet
219             Prime => 8243, # double prime = seconds = inches,
220             oline => 8254, # overline = spacing overscore
221             frasl => 8260, # fraction slash
222             # Letterlike Symbols
223             weierp => 8472, # script capital P = power set = Weierstrass p
224             image => 8465, # blackletter capital I = imaginary part
225             real => 8476, # blackletter capital R = real part symbol
226             trade => 8482, # trade mark sign
227             alefsym => 8501, # alef symbol = first transfinite cardinal
228             # alef symbol is NOT the same as hebrew letter alef, U+05D0 although the same glyph could be used to depict both characters
229             # Arrows
230             larr => 8592, # leftwards arrow
231             uarr => 8593, # upwards arrow
232             rarr => 8594, # rightwards arrow
233             darr => 8595, # downwards arrow
234             harr => 8596, # left right arrow
235             crarr => 8629, # downwards arrow with corner leftwards = carriage return
236             lArr => 8656, # leftwards double arrow
237             # ISO 10646 does not say that lArr is the same as the 'is implied by' arrow but also does not have any other character for that function. So lArr can be used for 'is implied by' as ISOtech suggests
238             uArr => 8657, # upwards double arrow
239             rArr => 8658, # rightwards double arrow
240             # ISO 10646 does not say this is the 'implies' character but does not have another character with this function so rArr can be used for 'implies' as ISOtech suggests
241             dArr => 8659, # downwards double arrow
242             hArr => 8660, # left right double arrow
243             # Mathematical Operators
244             forall => 8704, # for all
245             part => 8706, # partial differential
246             exist => 8707, # there exists
247             empty => 8709, # empty set = null set = diameter
248             nabla => 8711, # nabla = backward difference
249             isin => 8712, # element of
250             notin => 8713, # not an element of
251             ni => 8715, # contains as member
252             # should there be a more memorable name than 'ni'?
253             prod => 8719, # n-ary product = product sign
254             # prod is NOT the same character as U+03A0 'greek capital letter pi' though the same glyph might be used for both
255             sum => 8721, # n-ary summation
256             # sum is NOT the same character as U+03A3 'greek capital letter sigma' though the same glyph might be used for both
257             minus => 8722, # minus sign
258             lowast => 8727, # asterisk operator
259             radic => 8730, # square root = radical sign
260             prop => 8733, # proportional to
261             infin => 8734, # infinity
262             ang => 8736, # angle
263             and => 8743, # logical and = wedge
264             or => 8744, # logical or = vee
265             cap => 8745, # intersection = cap
266             cup => 8746, # union = cup
267             int => 8747, # integral
268             there4 => 8756, # therefore
269             sim => 8764, # tilde operator = varies with = similar to,
270             # tilde operator is NOT the same character as the tilde, U+007E, although the same glyph might be used to represent both
271             cong => 8773, # approximately equal to
272             asymp => 8776, # almost equal to = asymptotic to
273             ne => 8800, # not equal to
274             equiv => 8801, # identical to
275             le => 8804, # less-than or equal to
276             ge => 8805, # greater-than or equal to
277             sub => 8834, # subset of
278             sup => 8835, # superset of
279             # note that nsup, 'not a superset of', U+2285 is not covered by the Symbol font encoding and is not included. Should it be, for symmetry? It is in ISOamsn
280             nsub => 8836, # not a subset of
281             sube => 8838, # subset of or equal to
282             supe => 8839, # superset of or equal to
283             oplus => 8853, # circled plus = direct sum
284             otimes => 8855, # circled times = vector product
285             perp => 8869, # up tack = orthogonal to = perpendicular
286             sdot => 8901, # dot operator
287             # dot operator is NOT the same character as U+00B7 middle dot
288             # Miscellaneous Technical
289             lceil => 8968, # left ceiling = apl upstile
290             rceil => 8969, # right ceiling
291             lfloor => 8970, # left floor = apl downstile
292             rfloor => 8971, # right floor
293             lang => 9001, # left-pointing angle bracket = bra
294             # lang is NOT the same character as U+003C 'less than' or U+2039 'single left-pointing angle quotation mark'
295             rang => 9002, # right-pointing angle bracket = ket
296             # rang is NOT the same character as U+003E 'greater than' or U+203A 'single right-pointing angle quotation mark'
297             # Geometric Shapes
298             loz => 9674, # lozenge
299             # Miscellaneous Symbols
300             spades => 9824, # black spade suit
301             # black here seems to mean filled as opposed to hollow
302             clubs => 9827, # black club suit = shamrock
303             hearts => 9829, # black heart suit = valentine
304             diams => 9830, # black diamond suit
305              
306             # %HTMLspecial; markup-significant and internationalization characters
307             # C0 Controls and Basic Latin
308             quot => 34, # quotation mark
309             amp => 38, # ampersand
310             lt => 60, # less-than sign
311             gt => 62, # greater-than sign
312             # Latin Extended-A
313             OElig => 338, # latin capital ligature OE
314             oelig => 339, # latin small ligature oe
315             # ligature is a misnomer, this is a separate character in some languages
316             Scaron => 352, # latin capital letter S with caron
317             scaron => 353, # latin small letter s with caron
318             Yuml => 376, # latin capital letter Y with diaeresis
319             # Spacing Modifier Letters
320             circ => 710, # modifier letter circumflex accent
321             tilde => 732, # small tilde
322             # General Punctuation
323             ensp => 8194, # en space
324             emsp => 8195, # em space
325             thinsp => 8201, # thin space
326             zwnj => 8204, # zero width non-joiner
327             zwj => 8205, # zero width joiner
328             lrm => 8206, # left-to-right mark
329             rlm => 8207, # right-to-left mark
330             ndash => 8211, # en dash
331             mdash => 8212, # em dash
332             lsquo => 8216, # left single quotation mark
333             rsquo => 8217, # right single quotation mark
334             sbquo => 8218, # single low-9 quotation mark
335             ldquo => 8220, # left double quotation mark
336             rdquo => 8221, # right double quotation mark
337             bdquo => 8222, # double low-9 quotation mark
338             dagger => 8224, # dagger
339             Dagger => 8225, # double dagger
340             permil => 8240, # per mille sign
341             lsaquo => 8249, # single left-pointing angle quotation mark
342             # lsaquo is proposed but not yet ISO standardized
343             rsaquo => 8250, # single right-pointing angle quotation mark
344             # rsaquo is proposed but not yet ISO standardized
345             euro => 8364, # euro sign
346             );
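Since %W3C_Entities is an ordinary hash, it can be used directly. The sketch below assumes a small hand-copied excerpt in place of the real package variable, plus a hypothetical helper decode_named(), to show one way of replacing &name; references in a string while leaving unlisted names untouched. It only handles named entities, not numeric ones.

```perl
use strict;
use warnings;

# Hypothetical excerpt of %W3C_Entities; the real hash has every HTML 4 name.
my %W3C_Entities = (amp => 38, lt => 60, gt => 62, hellip => 8230, eacute => 233);

# Replace &name; with its character when the name is listed; unknown names
# are left as they are so the text is not silently corrupted.
sub decode_named {
    my ($text) = @_;
    $text =~ s{&([A-Za-z][A-Za-z0-9]*);}
              { exists $W3C_Entities{$1} ? chr($W3C_Entities{$1}) : "&$1;" }ge;
    return $text;
}

my $decoded = decode_named('caf&eacute; &lt;b&gt; &bogus; &hellip;');
```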
347              
348             our %HTML5_draft;
349              
350             =head2 %HTML5_draft
351              
352             The package variable C<%HTML5_draft> contains the entities defined as part of the HTML5 standard, a work in progress. These are taken from L. Because there are over two thousand of them, this table is loaded on demand. So if you want to use this hash directly, be sure to first call one of the functions specifying 'HTML5_draft', which loads it.
353              
354             Unlike the existing standard HTML Entity chart, this chart contains some entries that expand to more than one code point. The extra code points can be combining characters or variation selectors, and in a couple of cases they really are two separate characters.
355              
356             =head2 other charts
357              
358             Others will be added.
359              
360             =head2 custom charts
361              
362             You can pass your own chart data to the various functions, to be used instead of or in addition to the built-in charts. Do this by passing a reference to the hash as an element in the I or I list.
363              
364             In addition to adding your own custom entities, you can also duplicate existing entities in order to override what gets generated (e.g. precomposed vs decomposed form), or provide priority in inverse lookups.
365              
366             (This might work in this version, but it has not been tested yet.)
367              
368             =cut
369            
370             ## >> Other charts will go here.
371              
372              
373             my %arg_map= (
374             HTML4 => \%W3C_Entities,
375             HTML5_draft => [ \%HTML5_draft, "HTML/Entity-HTML5_draft.pl.inc" ],
376             ':all' => [qw/ HTML4 HTML5_draft /]
377             );
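%arg_map maps a chart name either to a hash reference or, for a chart loaded on demand, to a pair of table and file. The first lookup loads the file and replaces the pair with the table, so the check is never repeated. The following sketch imitates that pattern. To stay self-contained it assumes a hypothetical loader code reference in place of the require of the .pl.inc file, and a hypothetical resolve() standing in for the table-loading half of _next_arg.

```perl
use strict;
use warnings;

my %html4 = (amp => 38);
my %html5;                   # stays empty until first needed, like %HTML5_draft
my $loads = 0;               # how many times the loader actually ran

# Hypothetical loader standing in for: require "HTML/Entity-HTML5_draft.pl.inc"
my $load_html5 = sub { $loads++; %html5 = (amp => 38, nbump => [8782, 824]) };

my %arg_map = (
    HTML4       => \%html4,
    HTML5_draft => [ \%html5, $load_html5 ],   # delay-load entry: [table, loader]
);

# Hypothetical stand-in for the table-loading half of _next_arg.
sub resolve {
    my ($name) = @_;
    my $value = $arg_map{$name} // die "No such table $name.\n";
    if (ref $value eq 'ARRAY') {               # not loaded yet
        my ($table, $loader) = @$value;
        $loader->() unless %$table;
        $arg_map{$name} = $table;              # replace the entry: never check again
        $value = $table;
    }
    return $value;
}

resolve('HTML5_draft') for 1 .. 3;             # the loader runs only on the first call
```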
378              
379              
380             =head1 Functions
381              
382             The function calls also provide an easy way to check multiple tables in one go. They also abstract away how the data is actually stored, handle the simple cases, and take care of fiddly details you might not have thought of, such as multi-valued entities.
383              
384             =head2 (parameters)
385              
386             In general, the functions take the thing to be converted as the first parameter, and can take one or two additional optional arguments. Only the C function doesn't follow this pattern exactly, taking another parameter first.
387              
388             The second parameter specifies the chart or charts to use. This is commonly referred to as the C parameter. That's because the third parameter works the same way but specifies things to C.
389              
390             The C parameter may be a string or an array reference. The string is the name of a chart or the name of a bundle. The chart names available are C<"HTML4"> and C<"HTML5_draft">. The only bundle name available is C<":all">. Others will be added in later versions. If no parameter is given at all, it is the same as using C<"HTML4">.
391              
392             If you have more to say than just one string, you can use an array reference instead. Each element of the array can be a string as explained above. An item can also be a hash reference, which is a custom chart.
393              
394             If more than one item is given as the include parameter, they are checked in order until something is found or the list exhausted.
395              
396             The C parameter is not implemented yet.
397              
398             =cut
399              
400             sub _next_arg
401             {
402 21     21   25 my $arglist= shift;
403 21   100     59 my $arg= shift(@$arglist) // return ; # pop off next argument
404 20 100       54 return $arg if ref($arg); # user put table ref directly in list, not a name.
405 16 100       40 if ($arg =~ /^:/) {
406             # it is a name for more arguments
407 4   66     269 my $list= $arg_map{$arg} // croak "No such option $arg.";
408 3         10 unshift @$arglist, @$list;
409 3         5 $arg= shift(@$arglist);
410             }
411             # look up the argument, and load if necessary.
412 15   66     180 my $value= $arg_map{$arg} // croak "No such table $arg.";
413 14 100       35 if (ref $value eq 'ARRAY') { # as opposed to a hash
414             # it is a delay load entry
415 1         3 my ($table, $name)= @$value;
416 1 50       4474 require $name unless %$table;
417 1         11 $arg_map{$arg}= $table; # don't check again next time.
418 1         3 $value= $table;
419             }
420 14         171 return $value;
421             }
422              
423             =head2 ordinal
424              
425             Calling C<$n=HTML::EntityReference::ordinal($entity);> is simply the same as looking it up in the data hash: C<$n=$HTML::EntityReference::W3C_Entities{$entity};>. It will return the code point if the C<$entity> is listed, or C otherwise.
426              
427             The return value is normally a number, the integer value of the code point that the entity refers to. In the case of multi-valued entities, the return value is an array reference.
428              
429             =cut
430              
431             sub ordinal
432             {
433 13     13 1 2229 my ($entity, $include, $exclude)= @_;
434             # >> TODO: handle excludes
435 13 100       61 return $W3C_Entities{$entity} unless defined $include; # default meaning if no argument
436 9 100       32 $include= ref $include ? [ @$include ] : [ $include ]; # copy (or wrap a single name) so the caller's list is not consumed
437 9         17 while (my $table= _next_arg($include)) {
438 9         17 my $val= $$table{$entity};
439 9 100       50 return $val if defined $val;
440             }
441 0         0 return; # not found anywhere it looked.
442             }
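The include handling in ordinal (via _next_arg) boils down to: expand bundle names, accept hash references as custom charts, and return the first defined hit in list order. Below is a simplified, self-contained sketch under those assumptions, with hypothetical miniature charts and a hypothetical ordinal_sketch(); exclude is not handled, just as in the module. It works on a copy of the include list, so an array reference passed by the caller is left intact.

```perl
use strict;
use warnings;

# Hypothetical miniature charts standing in for HTML4 and HTML5_draft.
my %charts = (
    HTML4       => { amp => 38, hellip => 8230 },
    HTML5_draft => { amp => 38, nbump => [8782, 824] },
);
my %bundles = (':all' => [qw/HTML4 HTML5_draft/]);

# Look $entity up chart by chart; the first defined value wins.
sub ordinal_sketch {
    my ($entity, $include) = @_;
    $include //= 'HTML4';                                         # default chart
    my @todo = ref $include eq 'ARRAY' ? @$include : ($include);  # work on a copy
    while (defined(my $arg = shift @todo)) {
        if (!ref($arg) && exists $bundles{$arg}) {                # bundle: expand in place
            unshift @todo, @{ $bundles{$arg} };
            next;
        }
        my $table = ref($arg) ? $arg : ($charts{$arg} // die "No such table $arg.\n");
        my $val = $table->{$entity};
        return $val if defined $val;
    }
    return;                                                       # not found anywhere
}
```

Putting a custom hash reference first in the list, for example [ +{ amp => 65 }, 'HTML4' ], makes its answers take priority over the built-in chart.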
443              
444             =head2 character
445              
446             This is the same as calling the built-in chr on the result of ordinal, except that if the named entity was not listed it returns C. It also takes care of entities that expand into multiple code points. For multi-valued entities, it simply produces a string with more than one character in it.
447              
448             =cut
449              
450             sub character
451             {
452 3   100 3 1 9 my $ord= ordinal (@_) // return;
453 2 100       8 if (ref $ord) { # it is a list, not a number
454 1         3 return join('', map { chr($_) } @$ord);
  2         11  
455             }
456 1         7 return chr($ord);
457             }
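For a multi-valued entity, character simply joins chr of each code point. The conversion step on its own, as a sketch with a hypothetical helper to_character() that takes an ordinal-style result (an integer, an array reference, or undef) directly:

```perl
use strict;
use warnings;

# Turn an ordinal()-style result into a string; undef stays undef.
sub to_character {
    my ($ord) = @_;
    return unless defined $ord;
    return ref $ord ? join('', map { chr } @$ord) : chr $ord;
}

my $single = to_character(38);              # "&"
my $multi  = to_character([8782, 824]);     # two characters: U+224E then U+0338
```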
458              
459             =head2 hex
460              
461             This is the same as calling the C on the result of C, except that if the named entity was not listed it returns C. Note that this returns the hex digits I (zero-padded to at least 4; code points above U+FFFF give 5 or 6 digits), without any decorations or prefix. You can incorporate this into a hex notation or hex entity notation, as desired. However, that would be awkward for multi-value returns, so this function doesn't handle those. See the C function instead.
462              
463             =cut
464              
465             sub hex
466             {
467 1   50 1 1 4 my $ord= ordinal (@_) // return;
468 1 50       5 if (ref $ord) { carp "multi-value entities are not handled by hex. Use format instead"; return; }
469 1         8 return sprintf ("%04x", $ord);
470             }
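The "%04x" format sets a minimum width, not a fixed one, so code points above U+FFFF come out as five or six digits. A sketch of that behaviour, using a hypothetical hex_digits() that takes the code point directly and, for a multi-valued array reference, warns and returns nothing rather than formatting the reference itself:

```perl
use strict;
use warnings;

# Hypothetical: formats a code point the way hex does, without the lookup step.
sub hex_digits {
    my ($ord) = @_;
    return unless defined $ord;
    if (ref $ord) {    # multi-valued: refuse instead of formatting the reference address
        warn "multi-value entities are not handled; format each code point instead\n";
        return;
    }
    return sprintf('%04x', $ord);
}

my $amp    = hex_digits(38);        # "0026"
my $hellip = hex_digits(8230);      # "2026"
my $astral = hex_digits(0x1D504);   # "1d504": %04x is only a minimum width
```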
471              
472             =head2 format
473              
474             This takes a format string as its first argument, followed by the usual entity, include, and exclude parameters. The format string is used with C. For example, C will produce C<"≎ ̸"> in scalar context.
475              
476             For multi-value entities, it formats each code point separately. In scalar context, the results are returned as one string joined with single spaces; in list context, it returns the list of formatted strings.
477              
478             =cut
479            
480             sub format
481             {
482 2     2 1 4 my $fmt= shift;
483 2   50     7 my $ord= ordinal (@_) // return;
484 2 50       8 unless (ref $ord) {
485 0         0 return sprintf ($fmt, $ord);
486             }
487 2         4 my @results= map { sprintf ($fmt, $_) } @$ord;
  4         17  
488 2 100       10 return @results if wantarray;
489 1         11 return join (' ', @results);
490             }
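The scalar/list split comes from wantarray. A self-contained sketch of the same logic, where format_sketch() is hypothetical and takes the ordinal value directly, assuming the two-code-point value U+224E U+0338 as the multi-valued input:

```perl
use strict;
use warnings;

# Hypothetical mirror of format, minus the lookup step.
sub format_sketch {
    my ($fmt, $ord) = @_;
    return unless defined $ord;
    return sprintf($fmt, $ord) unless ref $ord;      # single code point
    my @results = map { sprintf($fmt, $_) } @$ord;    # one string per code point
    return wantarray ? @results : join(' ', @results);
}

my $scalar = format_sketch('&#x%04X;', [0x224E, 0x338]);   # "&#x224E; &#x0338;"
my @list   = format_sketch('&#x%04X;', [0x224E, 0x338]);   # ("&#x224E;", "&#x0338;")
```

Because the format string is handed straight to sprintf, the same function can produce decimal entities, U+ notation, or anything else sprintf can express.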
491            
492              
493             =head2 valid
494              
495             This returns a truth value indicating whether the specified entity name is listed.
496              
497             =cut
498              
499             sub valid
500             {
501 4     4 1 9 my ($entity, $include, $exclude)= @_;
502             # >> TODO: handle excludes
503 4 100       29 return exists $W3C_Entities{$entity} unless defined $include; # default meaning if no argument
504 1 50       5 $include= ref $include ? [ @$include ] : [ $include ]; # copy (or wrap a single name) so the caller's list is not consumed
505 1         3 while (my $table= _next_arg($include)) {
506 2 100       12 return 1 if exists $$table{$entity};
507             }
508 0         0 return; # not found anywhere it looked.
509             }
510              
511             # be sure this is performed in a consistent manner between building and looking up
512             sub array_key
513             {
514 96     96 0 135 return join (' ', map { 0+$_ } @_ ); # numify each code point; a bare unary + is a no-op
  191         453  
515             }
516              
517             sub invert_table
518             {
519 2     2 0 3 my $tab= shift;
520 2         4 my %result;
521 2         14 while (my ($key, $value)= each %$tab) {
522 2377 100       3726 my $x= ref($value) ? array_key (@$value) : +$value;
523 2377         8845 $result{$x}= $key;
524             }
525 2         10 return \%result;
526             }
527              
528             sub get_reverse
529             {
530 7     7 0 9 my $table= shift;
531 7         14 my $inverse= $$table{+INVERSE};
532 7 100       28 $inverse= $$table{+INVERSE}= invert_table ($table) unless defined $inverse;
533 7         12 return $inverse;
534             }
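invert_table and get_reverse build the inverse map once per chart and cache it inside the chart itself under the '; INVERSE' key, with multi-valued entries keyed by their space-joined code points. A condensed, self-contained sketch of that caching; note that it additionally skips keys beginning with ';', an assumption not present in the listing.

```perl
use strict;
use warnings;
use constant INVERSE => '; INVERSE';   # ';' never occurs in an entity name

# Space-joined, numified code points: the key for a multi-valued entry.
sub array_key { return join ' ', map { 0 + $_ } @_ }

sub get_reverse {
    my ($table) = @_;
    my $inverse = $table->{+INVERSE};
    return $inverse if defined $inverse;            # already built: reuse it
    my %inv;
    while (my ($name, $value) = each %$table) {
        next if $name =~ /^;/;                      # skip internal keys
        my $key = ref $value ? array_key(@$value) : 0 + $value;
        $inv{$key} = $name;
    }
    return $table->{+INVERSE} = \%inv;              # cache inside the chart itself
}

my %chart = (hellip => 8230, nbump => [8782, 824]);
my $inv = get_reverse(\%chart);
```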
535              
536              
537             =head2 from_... Inverse Functions
538              
539             Since Perl doesn't provide for overloading in the C++ sense, we need to clearly distinguish whether you are passing in a code point integer, or the character itself, or whatever other forms might be available. So the inverse functions match the names of the primary functions with the addition of C in front.
540              
541             The inverse lookup table for a chart is not created until it is needed, the first time one of these functions consults that chart. It is then stored inside the main table, under a key whose name begins with a "C<;>" character. Because entities are normally parsed out as terminating with a semicolon, you won't have an entity with a semicolon I the name! So keys beginning with a semicolon are reserved for internal use: if you access the charts directly (or supply your own custom charts), ignore them. This also means a custom chart passed to these functions gets that extra key added to it.
542              
543             =head2 from_ordinal
544              
545             You can pass an integer or an array ref containing integers to this function. If the argument contains more than one code point, it will try to match a multi-code-point entity exactly. It will not match prefixes, change normalization, or do any other fuzzy matching.
546              
547             If more than one entity maps to the same code point(s), it simply returns one of them, essentially at random; there is no way for it to know which one is "better" for your purpose. However, it does check the tables in the order specified by the second argument, so you can put first a custom table that contains the answers you specifically want.
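The lookup order and the exact-match rule can be sketched with two small, already-inverted tables. first_match is a hypothetical helper that mimics the documented behavior: the first table with a hit wins, and a sequence of code points must match a whole key. It is not the module's from_ordinal, and the toy tables are far smaller than the real charts.

```perl
use strict;
use warnings;

# Toy inverse tables (key => entity name). Keys for multi-code-point
# entities are the code points joined by single spaces.
my %custom   = (160 => 'nbsp');
my %standard = (160 => 'NonBreakingSpace', 60 => 'lt', '60 8402' => 'nvlt');

# Hypothetical helper: try each table in order and return the first
# hit, or undef if none of the tables matches.
sub first_match {
    my ($codepoint, @tables) = @_;
    my $key = ref $codepoint ? join(' ', map { 0 + $_ } @$codepoint) : 0 + $codepoint;
    for my $inverse (@tables) {
        return $inverse->{$key} if defined $inverse->{$key};
    }
    return undef;
}

print first_match(160, \%custom, \%standard), "\n";   # custom table is searched first
print first_match(160, \%standard), "\n";
print first_match([60, 8402], \%standard), "\n";      # exact two-code-point match
my $none = first_match([60, 8402, 65], \%standard);   # longer sequence: no prefix match
print defined $none ? "hit\n" : "none\n";
```

Putting the custom table first is what makes the preferred name win. Neither table alone can express that preference.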
548              
549             =cut
550              
551             sub from_ordinal
552             {
553 6     6 1 13 my ($codepoint, $include, $exclude)= @_;
554 2     2   1654 use integer;
  2         21  
  2         10  
555 6 100       23 my $key= ref($codepoint) ? array_key(@$codepoint) : 0+$codepoint;
556             # >> TODO: handle exclude option
557 6   100     34 $include //= [ \%W3C_Entities ];
558 6 100       18 $include= [ $include ] unless ref $include;
559 6         18 while (my $table= _next_arg($include)) {
560 7         15 $table= get_reverse ($table);
561 7         16 my $result= $$table{$key};
562 7 100       60 return $result if defined $result;
563             }
564             }
565              
566              
567             =head2 from_character
568              
569             This is the inverse of C<character>. It will return undef if no entity matches the argument. See the notes on C<from_ordinal>.
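from_character only converts its argument and then delegates to from_ordinal. A single character becomes a plain integer, and a longer string becomes an array ref of code points. The sketch below reproduces that conversion rule in a hypothetical helper, to_codepoints, and feeds the result to a toy inverse table. The table is an assumption made for illustration only.

```perl
use strict;
use warnings;

# Same rule as from_character: one character becomes a plain integer,
# several characters become an array ref of their code points.
sub to_codepoints {
    my ($string) = @_;
    return length($string) == 1 ? ord($string) : [ map { ord } split //, $string ];
}

my %inverse = (160 => 'nbsp', '60 8402' => 'nvlt');   # toy inverse table

for my $s ("\x{A0}", "<\x{20D2}") {
    my $cp  = to_codepoints($s);
    my $key = ref $cp ? join(' ', @$cp) : $cp;
    printf "%-10s -> %s\n", $key, $inverse{$key} // 'undef';
}
```

A string that is not exactly an entity's character sequence produces a key that is absent from the table, which is why no partial matches are returned.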
570              
571             =cut
572            
573             sub from_character
574             {
575 3     3 1 7 my $char= shift;
576 3 100       17 my $ord= (length($char) == 1) ? ord($char) : [ map{ord($_)}(split('',$char)) ];
  2         8  
577 3         9 return from_ordinal ($ord, @_);
578             }
579            
580             return 1; # module loaded OK.
581              
582              
583             =head1 AUTHOR
584              
585             John M. Dlugosz, C<< >>
586              
587             =head1 BUGS
588              
589             Please report any bugs or feature requests to C, or through
590             the web interface at L. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
591              
592              
593              
594              
595             =head1 SUPPORT
596              
597             You can find documentation for this module with the perldoc command.
598              
599             perldoc HTML::EntityReference
600              
601              
602             You can also look for information at:
603              
604             =over 4
605              
606             =item * RT: CPAN's request tracker (report bugs here)
607              
608             L
609              
610             =item * AnnoCPAN: Annotated CPAN documentation
611              
612             L
613              
614             =item * CPAN Ratings
615              
616             L
617              
618             =item * Search CPAN
619              
620             L
621              
622             =back
623              
624              
625             =head1 ACKNOWLEDGEMENTS
626              
627             Thanks to Zsbán Ambrus for suggesting the handling of multiple charts. That pretty much made the module what it became.
628              
629             Thanks to those on PerlMonks who chatted with me regarding the specifications and ideas.
630              
631             =head1 LICENSE AND COPYRIGHT
632              
633             Copyright 2011 John M. Dlugosz.
634              
635             This program is free software; you can redistribute it and/or modify it
636             under the terms of either: the GNU General Public License as published
637             by the Free Software Foundation; or the Artistic License.
638              
639             See http://dev.perl.org/licenses/ for more information.
640              
641              
642             =cut
643              
644             1; # End of HTML::EntityReference