File Coverage

blib/lib/Pod/Cats.pm

Criterion	Covered	Total	%
statement	104	140	74.2
branch	28	42	66.6
condition	13	23	56.5
subroutine	22	30	73.3
pod	10	10	100.0
total	177	245	72.2

line	stmt	bran	cond	sub	pod	time	code
1							package Pod::Cats;
2
3	5			5		96741	use warnings;
	5					8
	5					146
4	5			5		23	use strict;
	5					10
	5					94
5	5			5		89	use 5.010;
	5					19
6
7	5			5		2674	use Pod::Cats::Parser::MGC;
	5					17
	5					173
8	5			5		38	use List::Util qw(min max);
	5					9
	5					523
9	5			5		26	use Carp;
	5					9
	5					289
10	5			5		4128	use String::Util qw/crunch/;
	5					17846
	5					8244
11
12							=head1 NAME
13
14							Pod::Cats - The POD-like markup language written for podcats.in
15
16							=head1 VERSION
17
18							Version 0.05
19
20							=head1 DESCRIPTION
21
22							POD is an expressive markup language - like Perl is an expressive programming
23							language - and for a plain text file format there is little finer. Pod::Cats is
24							an extension of the POD semantics that adds more syntax and more flexibility to
25							the language.
26
27							Pod::Cats is designed to be extended and doesn't implement any default
28							commands or entities.
29
30							=head1 SYNTAX
31
32							Pod::Cats syntax borrows ideas from POD and adds its own.
33
34							A paragraph is any block of text delimited by blank lines (whitespace ignored).
35							This is the same as POD, and basically allows you to use hard word wrapping in
36							your markup without having to join them all together for output later.
37
38							There are three command paragraphs, which are defined by their first character.
39							This character must be in the first column; whitespace at the start of a
40							paragraph is syntactically relevant.
41
42							=over 4
43
44							=item C<=COMMAND CONTENT>
45							X
46
47							A line beginning with the C<=> symbol denotes a single I. Usually this
48							will be some sort of header, perhaps the equivalent of a C<< >>, something
49							like that. It is roughly equivalent to the self-closing tag in XML. B
50							is just text that may or may not be present. The relationship of B to
51							the B is for you to define, as is the meaning of B.
52
53							When a C<=COMMAND> block is completed, it is passed to L.
54
55							=item C<+NAME CONTENT>
56							X
57
58							A line beginning with C<+> opens a named block; its name is B. Similar to
59							C<=COMMAND>, the B is arbitrary, and its relationship to the B of
60							the block is up to you.
61
62							When this is encountered you are invited to L.
63
64							=item C<-NAME>
65							X
66
67							A line beginning with C<-> is the end of the named block previously started.
68							These must match in reverse order to the C<+> block with the matching B -
69							basically the same as XML's pairs. It is passed to L,
70							and unlike the other two command paragraphs it accepts no content.
71
72							=back
73
74							Then there are two types of text paragraph, for which the text is not
75							syntactically relevant but whitespace still is:
76
77							=over 4
78
79							=item Verbatim paragraphs
80
81							A line whose first character is whitespace is considered verbatim. No removal of
82							whitespace is done to the rest of the paragraph if the first character is
83							whitespace; all your text is repeated verbatim, hence the name
84
85							The verbatim paragraph continues until the first non-verbatim paragraph is
86							encountered. A blank line is no longer considered to end the paragraph.
87							Therefore, two verbatim paragraphs can only be separated by a non-verbatim
88							paragraph with non-whitespace content. The special formatting code C<< ZZ<><> >>
89							can be used on its own to separate them with zero-width content.
90
91							All lines in the verbatim paragraph will have their leading whitespace removed.
92							This is done intelligently: the I amount of leading whitespace found on
93							any line is removed from all lines. This allows you to indent other lines (even
94							the first one) relative to the syntactic whitespace that defines the verbatim
95							paragraph without your indentation being parsed out.
96
97							L are not parsed in verbatim paragraphs, as expected.
98
99							When a verbatim paragraph has been collated, it is passed to L.
100
101							=item Paragraphs
102
103							Everything that doesn't get caught by one of the above rules is deemed to be a
104							plain text paragraph. As with all paragraphs, a single line break is removed by
105							the parser and a blank line causes the paragraph to be processed. It is passed
106							to L.
107
108							=back
109
110							And finally the inline formatting markup, entities.
111
112							=over
113
114							=item C<< XZ<><> >>
115							X X
116
117							An entity is defined as a capital letter followed by a delimiter that is
118							repeated n times, then any amount of text up to a matching quantity of a
119							balanced delimiter.
120
121							In normal POD the only delimiter is C<< < >>, so entities have the format C<<
122							XZ<><> >>; except that the opening delimiter may be duplicated as long as the
123							closing delimiter matches, allowing you to put the delimiter itself inside the
124							entity: C<<< XZ<><<>> >>>; in Pod::Cats you can use any delimiter, removing the
125							requirement to duplicate it at all: C<< C[ XZ<><> ] >>.
126
127							Once an entity has begun, nested entities are only considered if the delimiters
128							are the same as those used for the outer entity: C<< B[ I[bold-italic] ] >>;
129							C<< B[IZ<>] >>.
130
131							Apart from the special entity C<< ZZ<><> >>, the letter used for the entity has
132							no inherent meaning to Pod::Cats. The parsed entity is provided to
133							L. C<< ZZ<><> >> retains its meaning from POD, which is to be a
134							zero-width 'divider' to break up things that would otherwise be considered
135							syntax. You are not given C<< ZZ<><> >> to handle, and C<< ZZ<><> >> itself will
136							produce undef if it is the only content to an element. A paragraph comprising solely
137							C<< ZZ<><> >> will never generate a parsed paragraph; it will be skipped.
138
139							=back
140
141							=head1 METHODS
142
143							=cut
144
145							our $VERSION = '0.07';
146
147							=head2 new
148
149							Create a new parser. Options are provided as a hashref, but there is currently
150							only one:
151
152							=over
153
154							=item delimiters
155
156							A string containing delimiters to use. Bracketed delimiters will be balanced;
157							other delimiters will simply be used as-is. This echoes the delimiter philosophy
158							of Perl syntax such as regexes and C. The string should be all the possible
159							delimiters, listed once each, and only the opening brackets of balanced pairs.
160
161							The default is C<< '<' >>, same as POD.
162
163							=back
164
165							=cut
166
167							sub new {
168	3			3	1	70	my $class = shift;
169	3		100			34	my $opts = shift \|\| {};
170	3					7	my $self = bless $opts, $class; # FIXME
171
172	3					10	return $self;
173							}
174
175							=head2 parse
176
177							Parses a string containing whatever Pod::Cats code you have.
178
179							=cut
180
181							sub parse {
182	0			0	1	0	my ($self, $string) = @_;
183
184	0					0	return $self->parse_lines(split /\n/, $string);
185							}
186
187							=head2 parse_file
188
189							Opens the file given by filename and reads it all in and then parses that.
190
191							=cut
192
193							sub parse_file {
194	0			0	1	0	my ($self, $filename) = @_;
195
196	0	0				0	carp "File not found: " . $filename unless -e $filename;
197
198	0					0	open my $fh, "<", $filename;
199	0					0	chomp(my @lines = <$fh>);
200	0					0	close $fh;
201
202	0					0	return $self->parse_lines(@lines);
203							}
204
205							=head2 parse_lines
206
207							L and L both come here, which just takes the markup text
208							as an array of lines and parses them. This is where the logic happens. It is
209							exposed publicly so you can parse an array of your own if you want.
210
211							=cut
212
213							sub parse_lines {
214	3			3	1	86	my ($self, @lines) = @_;
215
216	3					8	my $result = "";
217	3					28	$self->_new_buffer;
218
219							# Special lines are:
220							# - a blank line. An exception is between verbatim paragraphs, so we will
221							# simply re-merge verbatim paras later on
222							# - A line starting with =, + or -. Command paragraph. Process the previous
223							# buffer and start a new one with this.
224							# - Anything else continues the previous buffer, or starts a normal paragraph
225
226	3					6	my $line_num = @lines;
227	3					22	shift @lines while $lines[0] !~ /\S/; # shift off leading blank lines!
228
229							# Adjust for lines we removed.
230	3					5	$line_num = $line_num - @lines;
231
232	3		50	0		33	my $original_warn = $SIG{__WARN__} \|\| sub { warn $_[0] };
	0					0
233							local $SIG{__WARN__} = sub {
234	0			0		0	my $warning = shift;
235	0					0	$warning =~ s/at .+/at line $line_num/;
236	0					0	$original_warn->($warning);
237	3					22	};
238
239	3					10	for my $line (@lines) {
240	46					52	$line_num++;
241
242	46					75	for ($line) {
243	46	100				209	if (/^\s*$/) {
		50
		100
244	21					54	$self->_handle_blank_line;
245							}
246							elsif (/^([=+-])/) {
247	0					0	my $type = $1;
248							my $buftype = {
249							'+' => 'begin',
250							'-' => 'end',
251							'=' => 'command',
252	0					0	}->{$type};
253
254							# First word is command name; rest is buffer.
255	0					0	my ($command, $buffer) = $line =~ /^\Q$type\E(\w+)\s(.)$/;
256
257	0					0	$self->_handle_command_line($buftype, $command);
258	0					0	$self->_buffer($buffer);
259							}
260							elsif (/^\s+\S/) {
261	10					26	$self->_handle_verbatim_line($line);
262							}
263							else {
264	15					47	$self->_handle_normal_line($line);
265							}
266							}
267							}
268
269	3					22	$self->_end_of_line;
270							}
271
272							sub _buffer {
273	54			54		66	my $self = shift;
274	54					58	push @{ $self->{buffer} }, @_;
	54					159
275							}
276
277							sub _new_buffer {
278	38			38		60	my $self = shift;
279							# The buffer type goes in the first element, and its
280							# contents, if any, in the rest.
281	38					202	$self->{buffer} = [];
282							}
283
284							sub _end_previous_buffer {
285	10			10		13	my $self = shift;
286	10	100				18	return unless $self->_buftype;
287
288	5	50				11	if ($self->_buftype eq 'verbatim') {
289							warn "Verbatim paragraph ended without blank line"
290	5	50				16	if $self->{buffer}->[-1] =~ /\S/;
291
292	5					9	$self->_process_buffer;
293							return
294	5					10	}
295
296	0					0	warn $self->_buftype . " ended without blank line";
297	0					0	$self->_process_buffer;
298
299	0					0	return 0;
300							}
301
302							sub _handle_blank_line {
303	21			21		26	my $self = shift;
304	21	50				40	return unless $self->_buftype;
305
306							# Handle a verbatim buffer when a different buffer starts, or we run out of
307							# lines.
308	21	100				53	if ($self->_buftype eq 'verbatim') {
309	9					21	$self->_buffer('');
310	9					21	return;
311							}
312
313	12					44	$self->_process_buffer;
314							}
315
316							sub _handle_command_line {
317	0			0		0	my $self = shift;
318	0					0	my $type = shift;
319	0					0	my $command = shift;
320
321	0					0	$self->_end_previous_buffer;
322
323	0					0	$self->_buffer($type, $command);
324							}
325
326							# Verbatim lines just get buffered until something else happens.
327							sub _handle_verbatim_line {
328	10			10		12	my $self = shift;
329
330	10	100				19	if ($self->_buftype ne 'verbatim') {
331	5					12	$self->_end_previous_buffer;
332	5					11	$self->_buffer('verbatim');
333							}
334	10					25	$self->_buffer(@_);
335							}
336
337							sub _handle_normal_line {
338	15			15		21	my $self = shift;
339	15					19	my $line = shift;
340
341							# Nothing special, continue previous buffer or start a paragraph.
342	15	100				49	if ($self->_buftype eq 'verbatim') {
343	5					10	$self->_end_previous_buffer;
344							}
345
346	15	50				36	if (not $self->_buftype) {
347	15					46	$self->_buffer('paragraph');
348							}
349
350	15					36	$self->_buffer($line);
351							}
352
353							sub _buftype {
354	117			117		201	my $self = shift;
355	117	100				163	if (@{ $self->{buffer} }) {
	117					300
356	82					241	return $self->{buffer}->[0];
357							}
358	35					106	return '';
359							}
360
361							sub _end_of_line {
362	3			3		6	my $self = shift;
363	3					8	$self->_process_buffer;
364							}
365
366							sub _process_buffer {
367	20			20		28	my $self = shift;
368
369	20	50				39	return '' unless $self->_buftype;
370
371	20					31	my @buffer = @{$self->{buffer}};
	20					56
372
373	20					41	for (shift @buffer) {
374	20	100	0			45	if($_ eq 'paragraph') {
		50
		0
		0
375							# concatenate the lines and normalise whitespace.
376	15					69	my $para = crunch(join " ", @buffer);
377	15					239	$self->handle_paragraph($self->_process_entities($para));
378	15					5212	$self->_new_buffer;
379							}
380							elsif($_ eq 'verbatim') {
381							# There will have been a warning if this is not the empty string
382	5	50				14	pop @buffer if $buffer[-1] eq '';
383
384							# find the lowest level of indentation in this buffer and strip it
385							# Avoid zeroes from blank lines in between
386	5	100				9	my $indent_level = min map { /^(\s*)/; length $1 \|\| () } @buffer;
	14					26
	14					98
387	5					61	s/^\s{$indent_level}// for @buffer;
388
389	5					12	my $para = join "\n", @buffer;
390	5					15	$self->handle_verbatim($para);
391							}
392							elsif($_ eq 'command' \|\| $_ eq 'begin') {
393	0					0	my $type = shift @buffer;
394	0					0	my $content = crunch(join " ", @buffer);
395	0					0	$self->${\"handle_$_"}($type, $self->_process_entities($content));
	0					0
396							}
397							elsif($_ eq 'end') {
398	0					0	$self->handle_end($buffer[0]);
399							}
400							}
401
402	20					5224	$self->_new_buffer;
403							}
404
405							=head2 handle_verbatim
406
407							The verbatim paragraph as it was in the code, except with the minimum amount of
408							whitespace stripped from each line as described in L.
409							Passed in as a single string with line breaks preserved.
410
411							Do whatever you want. Default is to return the string straight back atcha.
412
413							=cut
414
415							sub handle_verbatim {
416	5			5	1	22	shift;
417	5					10	shift;
418							}
419
420							=head2 handle_entity
421
422							Passed the letter of the L as the first argument and its content
423							as the rest of @_. The content will alternate between plain text and the return
424							value of this function for any nested entities inside this one.
425
426							For this reason you should return a scalar from this method, be it text or a
427							ref. The default is to concatenate @_, thus replacing entities with their
428							contents.
429
430							Note that this method is the only one whose return value is of relevance to the
431							parser, since what you return from this will appear in another handler,
432							depending on what type of paragraph the entity is in.
433
434							You will never get the C<< ZZ<><> >> entity.
435
436							=cut
437
438							sub handle_entity {
439	5			5	1	6670	shift; shift;
	5					6
440	5		100			40	join ' ', map $_ // '', @_;
441							}
442
443							# preprocess paragraph before giving it to the user. handle_entity is called
444							# from the parser itself.
445							sub _process_entities {
446	15			15		26	my ($self, $para) = @_;
447
448							# 1. replace POD-like Z<...> with user-defined functions.
449							# Z itself is the only actual exception to that.
450							$self->{parser} \|\|= Pod::Cats::Parser::MGC->new(
451							object => $self,
452	15		100			80	delimiters => $self->{delimiters} // '<'
			66
453							);
454
455	15					55	my $parsed = $self->{parser}->from_string( $para );
456
457							# Single return of undef was Z<>
458	15	100	66			1189	return defined $parsed->[0] \|\| @$parsed > 1 ? @$parsed : ();
459							}
460
461							=head2 handle_paragraph
462
463							The paragraph is split into sections that alternate between plain text and the
464							return values of L as described above. These
465							sections are arrayed in @_. Note that the paragraph could start with an entity.
466
467							By default it returns @_ concatenated, since the default behaviour of
468							L is to remove the formatting but keep the
469							contents.
470
471							=cut
472
473							sub handle_paragraph {
474	15		100	15	1	40	shift; join ' ', map $_ // '', @_;
	15					88
475							}
476
477							=head2 handle_command
478
479							When a L is encountered it comes here. The first argument is
480							the B (from B<=COMMAND>); the rest of the arguments follow the rules of
481							L and alternate between plain text and parsed
482							entities.
483
484							By default it returns @_ concatenated, same as paragraphs.
485
486							=cut
487
488							sub handle_command {
489	0		0	0	1		shift; shift; join ' ', map $_ // '', @_;
	0
	0
490							}
491
492							=head2 handle_begin
493
494							This is handled the same as L, except it is
495							called when a L command is encountered. The same rules apply.
496
497							=cut
498
499							sub handle_begin {
500	0		0	0	1		shift; shift; join ' ', map $_ // '', @_;
	0
	0
501							}
502
503							=head2 handle_end
504
505							The counterpart to the begin handler. This is called when the L paragraph
506							is encountered. The parser will already have discovered whether your begins and
507							ends are not balanced so you don't need to worry about that.
508
509							Note that there is no content for an end paragraph so the only argument this
510							gets is the command name.
511
512							=cut
513
514				0	1		sub handle_end { }
515
516							=head1 TODO
517
518							=over
519
520							=item The document is parsed into DOM, then events are fired SAX-like.
521							Preferable to fire the events and build the DOM from that.
522
523							=item Currently the matching of begin/end commands is a bit naive.
524
525							=item Line numbers of errors are not yet reported.
526
527							=back
528
529							=head1 AUTHOR
530
531							Altreus, C<< >>
532
533							=head1 BUGS
534
535							Bug reports to github please: http://github.com/Altreus/Pod-Cats/issues
536
537							=head1 SUPPORT
538
539							You are reading the only documentation for this module.
540
541							For more help, give me a holler on irc.freenode.com #perl
542
543							=head1 ACKNOWLEDGEMENTS
544
545							Paul Evans (LeoNerd) basically wrote Parser::MGC because I was whining about not
546							being able to parse these entity delimiters with any of the token parsers I
547							could find; and then he wrote a POD example that I only had to tweak in order to
548							do so. So a lot of the credit should go to him!
549
550							=head1 LICENSE AND COPYRIGHT
551
552							Copyright 2013 Altreus.
553
554							This module is released under the MIT licence.
555
556							=cut
557
558							1;