File Coverage

blib/lib/Roman/Unicode.pm

Criterion	Covered	Total	%
statement	94	102	92.1
branch	28	30	93.3
condition	9	9	100.0
subroutine	21	22	95.4
pod	8	8	100.0
total	160	171	93.5

line	stmt	bran	cond	sub	pod	time	code
1	5			5		111077	use utf8;
	5					36
	5					26
2	5			5		208	use 5.014;
	5					18
3
4							=encoding utf8
5
6							=head1 NAME
7
8							Roman::Unicode - Make roman numerals, using the Unicode characters for them
9
10							=head1 SYNOPSIS
11
12							use Roman::Unicode qw( to_roman is_roman to_perl );
13
14							my $perl_number = to_perl( $roman ) if is_roman( $roman );
15							my $roman_number = to_roman( $arabic );
16
17							=head1 DESCRIPTION
18
19							I made this module as a way to demonstrate various Unicode things without
20							mixing up natural language stuff. Surprisingly, roman numerals can do quite
21							a bit with that. You'll have to read the source to see it in action.
22
23							There are many fancy characters in this documentation, so you need a good
24							font that has the right glyphs. The Symbola font is a good one:
25							http://users.teilar.gr/~g1951d/
26
27							=head2 Functions
28
29							=over 4
30
31							=item is_roman( STRING )
32
33							Returns true if the string looks like a valid roman numeral. This
34							works with either the ASCII version or the ones using the characters
35							in the U+2160 to U+2188 range. You cannot mix the uppercase and lowercase
36							numerals.
37
38							=item to_perl( ROMAN )
39
40							If the argument is a valid roman numeral, C returns the Perl
41							number. Otherwise, it returns nothing.
42
43							=item to_roman( PERL_NUMBER )
44
45							If the argument is a valid Perl number, even if it is a string,
46							C returns the roman numeral representation. This uses the
47							characters in the U+2160 to U+2188 range.
48
49							If the number cannot be represented as roman numerals, this returns
50							nothing. Note that 0 doesn't have a roman numeral representation.
51
52							If you want the lowercase version, you can use C on the result.
53							However, some of the roman numerals don't have lowercase versions.
54
55							=item to_ascii( ROMAN )
56
57							If the argument is a valid roman numeral, it returns an ASCII
58							representation of it. Most of the numeral code points have compatible
59							decompositions, so the first step uses NFKD decomposition. For other
60							characters, it uses ASCII art representations:
61
62							Roman ASCII art
63							------ ----------
64							ↁ \|)
65							ↂ ((\|))
66							ↈ (((\|)))
67							ↇ \|))
68
69							=back
70
71							=head2 Case mapping
72
73							As a demonstration of case mapping, I supply one function that uses
74							L. You can lexically override the case-mapping functions
75							as described in that module's documentation.
76
77							=over 4
78
79							=item to_roman_lower
80
81							A subroutine you can use with C. It's a bit more special
82							because it turns the higher magnitude characters into ASCII versions. That
83							means that the return value might not be a valid according to C. It
84							returns nothing if the input isn't a valid Roman numeral string.
85
86							You can also use this as a stand-alone function instead of C. That's the
87							smart way to do it, but then you don't get to play with C.
88
89							=back
90
91							=head2 User-defined properties
92
93							Perl lets you define your own properties, as documented in L. This
94							module defines several.
95
96							=over 4
97
98							=item IsRoman
99
100							The C property is a combination of C and
101							C.
102
103							=item IsUppercaseRoman
104
105							The C property matches these code points:
106
107							Ⅰ U+2160 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ
108							Ⅴ U+2164 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ
109							Ⅹ U+2169 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴇɴ
110							Ⅼ U+216C ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪꜰᴛʏ
111							Ⅽ U+216D ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ʜᴜɴᴅʀᴇᴅ
112							Ⅾ U+216E ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ ʜᴜɴᴅʀᴇᴅ
113							Ⅿ U+216F ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ᴛʜᴏᴜsᴀɴᴅ
114							ↁ U+2181 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ ᴛʜᴏᴜsᴀɴᴅ
115							ↂ U+2182 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴇɴ ᴛʜᴏᴜsᴀɴᴅ
116							ↇ U+2187 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪꜰᴛʏ ᴛʜᴏᴜsᴀɴᴅ
117							ↈ U+2188 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ʜᴜɴᴅʀᴇᴅ ᴛʜᴏᴜsᴀɴᴅ
118
119							This excludes the other Roman numeral code points, such as Ⅻ (U+216B, ʀᴏᴍᴀɴ
120							ɴᴜᴍᴇʀᴀʟ ᴛᴡᴇʟᴠᴇ) since they are not designed to be part of larger strings of
121							Roman numerals.
122
123							=item IsLowercaseRoman
124
125							The C is the set of lowercase code points derived from the
126							set of code points in C. It checks each code point in
127							C and checks the Unicode Character Database (UCD) through
128							L to see if it has a lowercase mapping. If there is a lowercase
129							mapping, it makes it part of this property.
130
131							=back
132
133							=head1 LIMITATIONS
134
135							By using just the defined roman numerals characters in the Unicode Character
136							Set, you're limited to numbers less than 400,000 (although you could make
137							ↈↈↈↈ if you wanted, since that's not unheard of).
138
139							=head1 AUTHOR
140
141							brian d foy C<< >>
142
143							This module started with the Roman module, credited to:
144
145							OZAWA Sakuro C<< >> 1995-1997
146
147							Alexandr Ciornii, C<< >> 2007
148
149							=head1 COPYRIGHT
150
151							Copyright © 2011-2021, brian d foy .
152
153							You can use this module under the terms of Artistic License 2.0.
154
155							=cut
156
157							package Roman::Unicode {
158	5			5		28	use feature qw(unicode_strings);
	5					9
	5					556
159
160	5			5		33	use strict;
	5					18
	5					105
161	5			5		33	use warnings;
	5					10
	5					178
162	5			5		509	use open IO => ':utf8';
	5					1251
	5					43
163
164	5			5		475	use Exporter 'import';
	5					10
	5					345
165							our @EXPORT_OK = qw( is_roman to_perl to_roman to_ascii );
166							our $VERSION = '1.032_02';
167
168	5			5		5073	use Unicode::UCD;
	5					235320
	5					315
169	5			5		42	use Unicode::Normalize qw(NFKD);
	5					12
	5					1210
170
171							# I'm specifically not using the characters for the other roman numerals
172							# because those are meant to stand alone, as they might in a clock face
173							our %valid_roman = map { $_, 1 } (
174							# the capitals U+2160 to U+216F, U+2180 to U+2182, U+2187 to U+2188
175							qw(Ⅰ Ⅴ Ⅹ Ⅼ Ⅽ Ⅾ Ⅿ ↁ ↂ ↇ ↈ ),
176							# the lowercase U+2170 to U+217f
177							qw(ⅰ ⅴ ⅹ ⅼ ⅽ ⅾ ⅿ),
178							# the ASCII
179							qw(I V X L C D M),
180							qw(i v x l c d m),
181
182							);
183
184							our %roman2arabic = qw(
185							Ⅰ 1 Ⅴ 5 Ⅹ 10
186							Ⅼ 50 Ⅽ 100 Ⅾ 500 Ⅿ 1000 ↁ 5000 ↂ 10000 ↇ 50000 ↈ 100000
187
188							ⅰ 1 ⅴ 5 ⅹ 10
189							ⅼ 50 ⅽ 100 ⅾ 500 ⅿ 1000
190							);
191
192	28			28		170	sub _get_chars { my @chars = $_[0] =~ /(\X)/ug }
193
194	57			57		380	sub _highest_value { (sort { $a <=> $b } values %roman2arabic)[-1] }
	3249					3363
195
196							sub is_roman($) {
197	122	100		122	1	34516	$_[0] =~ / \A \p{IsUppercaseRoman}+ \z /x
198							or
199							$_[0] =~ / \A \p{IsLowercaseRoman}+ \z /x
200							}
201
202							sub to_perl($) { # Stolen from Roman.pm, mostly
203	34	100		34	1	69	is_roman $_[0] or return;
204	28					56	my($last_digit) = _highest_value();
205	28					37	my($arabic);
206
207	28					53	foreach my $char ( _get_chars( $_[0] ) ) {
208	71					106	my $digit = $roman2arabic{$char};
209	71	100				115	$arabic -= 2 * $last_digit if $last_digit < $digit;
210	71					99	$arabic += ($last_digit = $digit);
211							}
212
213	28					116	$arabic;
214							}
215
216	0					0	BEGIN {
217
218	5			5		46	my %roman_digits = qw(
219							1 ⅠⅤ
220							10 ⅩⅬ
221							100 ⅭⅮ
222							1000 Ⅿↁ
223							10000 ↂↇ
224							100000 ↈↈↈↈ
225							);
226
227	5					49	my @figure = reverse sort keys %roman_digits;
228	5					1697	$roman_digits{$_} = [split(//, $roman_digits{$_}, 2)] foreach @figure;
229
230							sub to_roman($) { # stolen from Roman.pm, mostly
231	34			34	1	59	my( $arg ) = @_;
232
233							{
234	5			5		2521	no warnings 'numeric';
	5					11
	5					1508
	34					43
235	34	100	100			134	0 < $arg and $arg < 4 * _highest_value() or return;
236							}
237
238	28					66	my($x, $roman) = ( '', '' );
239	28					43	foreach my $figure ( @figure ) {
240	168					221	my( $digit, $i, $v ) = (int( $arg/$figure ), @{$roman_digits{$figure}});
	168					345
241
242	168					196	$roman .= do {
243	168	100	100			567	if( 1 <= $digit and $digit <= 3 ) { $i x $digit }
	16	100	100			35
		100
		100
		100
244	5					10	elsif( $digit == 4 ) { "$i$v" }
245	6					11	elsif( $digit == 5 ) { $v }
246	3					9	elsif( 6 <= $digit and $digit <= 8 ) { $v . $i x ($digit - 5) }
247	10					20	elsif( $digit == 9 ) { "$i$x" }
248							};
249
250	168					215	$arg -= $digit * $figure;
251	168					210	$x = $i;
252							}
253
254	28					220	$roman;
255							}
256							}
257
258							sub to_ascii {
259	27			27	1	731	my( $roman ) = @_;
260	27	100				61	return unless is_roman( $roman );
261
262	22					123	$roman = Unicode::Normalize::NFKD( $roman );
263
264	22					61	$roman =~ s/ↁ/\|))/g;
265	22					47	$roman =~ s/ↂ/((\|))/g;
266	22					45	$roman =~ s/ↈ/(((\|)))/g;
267	22					38	$roman =~ s/ↇ/\|)))/g;
268
269	22					93	$roman;
270							}
271
272							sub IsRoman {
273	3			3	1	33826	IsUppercaseRoman() . IsLowercaseRoman()
274							}
275
276							sub IsUppercaseRoman {
277	11			11	1	2352	return <<"CODE_NUMBERS";
278							2160
279							2164
280							2169
281							216C 216F
282							2181 2182
283							2187 2188
284							CODE_NUMBERS
285							}
286
287							sub IsLowercaseRoman {
288	8			8	1	2125	state $string;
289	8	100				38	return $string if defined $string;
290
291	3					7	my @codes = ();
292
293	3					8	my $uppers = IsUppercaseRoman();
294	3			3		19	open my $string_fh, '<', \ $uppers;
	3					7
	3					18
	3					99
295	3					2360	while( my $line = <$string_fh> ) {
296	18					29	my @n = map { hex } map { m/(\p{HexDigit}+)/g } $line;
	27					69
	18					82
297	18	100				43	if( @n == 1 ) { push @codes, $n[0] }
	9					43
298	18	100				50	if( @n == 2 ) { push @codes, $n[0] .. $n[1] };
	9					39
299							}
300
301	21					39	my @lowers = map { hex } map {
302	3					6	my $char_info = Unicode::UCD::charinfo( $_ );
	33					95
303	33	100				766225	$char_info->{lower} ? $char_info->{lower} : ();
304							} @codes;
305
306							$string = join "\n", map {
307	3					11	sprintf( '%04X', $_ )
	21					53
308							} @lowers;
309
310	3					44	$string .= "\n";
311							}
312
313							# Use this with Unicode::Casing, or not
314							sub to_roman_lower {
315	0	0		0	1		return unless &is_roman;
316
317	0						my $lower = CORE::lc( $_[0] );
318
319	0						$lower =~ s/ↁ/\|)/g; # ↁ U+2181
320	0						$lower =~ s/ↂ/((\|))/g; # ↂ U+2182
321	0						$lower =~ s/ↇ/\|))/g; # ↇ U+2187
322	0						$lower =~ s/ↈ/(((\|)))/g; # ↈ U+2188
323
324	0						return $lower;
325							}
326							}
327
328							1;