File Coverage

blib/lib/Decl/Tok.pm
Criterion Covered Total %
statement 17 403 4.2
branch 0 172 0.0
condition 0 50 0.0
subroutine 6 18 33.3
pod 7 7 100.0
total 30 650 4.6


line stmt bran cond sub pod time code
1             package Decl::Tok;
2              
3 1     1   65925 use 5.006;
  1         3  
4 1     1   6 use strict;
  1         1  
  1         21  
5 1     1   6 use warnings;
  1         2  
  1         46  
6              
7 1     1   462 use parent 'Iterator::Records';
  1         275  
  1         4  
8 1     1   9963 use Iterator::Records::Lines;
  1         7336  
  1         34  
9 1     1   9 use Carp;
  1         2  
  1         3152  
10              
11             =head1 NAME
12              
13             Decl::Tok - Given a line iterator, returns a token stream that tokenizes the lines as first-pass Decl
14              
15             =head1 VERSION
16              
17             Version 0.01
18              
19             =cut
20              
21             our $VERSION = '0.01';
22              
23              
24             =head1 SYNOPSIS
25              
26             Decl can be parsed at different levels of detail. The tokenizer simply takes a line iterator and skims it to extract the bare minimum of the line shape in order
27             to facilitate indexing or support some other parsing, such as building an in-memory data structure. This token stream is pretty minimal; for instance, the contents
28             of brackets are not parsed at all, just identified and passed through as a kind of quoted string. The next stage in processing has to identify any internal structure
29             in bracketed line parameters.
30              
31             =head1 CREATING A TOKENIZER
32              
33             Essentially, C is currently the beginning and end of tokenization at the stream level. A parameter tokenizer could be added as a second level of processing,
34             but I currently don't plan to do that, instead breaking out bracketed data in the data structure parser.
35              
36             =head2 skim (source, type)
37              
38             Given either a line stream or a string, sets up a first-line token stream drawing from the source. If a line stream, it must have the fields ['type', 'lno', 'indent', 'len', 'text'],
39             and 'type' must be either 'line' or 'blank'.
40              
41             The output of this tokenizer is a stream with the same fields (which means that a line stream is actually a token stream); the types are many and varied.
42              
43             The C parameter to the skimmer determines the basic text type to be expected - must be one of text, para, block, tag, or textplus. The default is tag.
44              
45             =cut
46              
47             sub _make_debug_out {
48 0     0     my $d = shift;
49 0 0   0     return sub { } unless $d;
50 0 0         return $d if ref $d;
51 0           print STDERR "\n";
52             sub {
53 0     0     print STDERR shift . "\n";
54 0           };
55             }
56              
57             sub skim {
58 0     0 1   my $class = shift;
59              
60 0           my $self = bless ({}, $class);
61 0           $self->{input} = shift;
62            
63             # If the source is a string, define a line iterator on it. If it's code, then we're transmogrifying an existing token stream.
64             # If the source is already a line iterator, use it directly. Otherwise, croak.
65 0 0         if (not ref ($self->{input})) {
    0          
    0          
66 0           $self->{input_mode} = 'string';
67 0           $self->{source} = Iterator::Records::Lines->new ('string', $self->{input});
68             } elsif (ref ($self->{input}) eq 'CODE') {
69             # This is a transmogrification of a line iterator, so keep the class intact but initialize like a vanilla itrecs.
70             # Transmogrification creates a new itrecs of the parent's class because the parent might have custom transmogrifiers. (See thread on 2019-02-23.)
71 0           $self->{input_mode} = 'transmog';
72 0           $self->{gen} = $self->{input_type};
73 0           $self->{f} = shift;
74 0           $self->{id} = '*';
75 0           return $self;
76             } elsif ($self->{input}->can('iter')) {
77 0           $self->{input_mode} = 'lines';
78 0           $self->{source} = $self->{input};
79             } else {
80 0           croak "Can't use this input type";
81             }
82              
83 0   0       $self->{input_type} = shift || 'tag';
84            
85 0           my $debug_out = _make_debug_out(shift);
86            
87             # As we're a tokenizer, we know our fields.
88 0           $self->{f} = ['type', 'lno', 'indent', 'len', 'text'];
89 0           $self->{id} = '*';
90              
91             $self->{gen} = sub {
92             # Set up our state.
93             # The following are the settings for 'tag'; for other input types, see 'if's below.
94 0     0     my $stop_on_blank = 0;
95 0           my $check_barewords = 1;
96 0           my $check_quotes = 1;
97 0           my $check_brackets = 1;
98 0           my $check_sigils = 1;
99 0           my $check_plus = 0;
100 0           my $force_sigil_space = 1;
101 0           my $tokens_found = 0;
102 0           my $in_text_block = 0;
103            
104 0           my $qindent = 0;
105 0           my $qindmin = 0;
106 0           my $lastlno = 0;
107 0           my $blanks = 0;
108 0           my $closer = 0;
109 0           my $blanks_before = 0;
110 0           my $plus_quoted = 0;
111 0           my $plus_indent = 0;
112            
113 0           my $starting_quote = 0;
114 0           my $quoting = 0;
115 0           my $glom = 0;
116 0           my $glom_until = undef;
117 0           my $first_line = 1;
118 0           my $first = 0;
119            
120 0           my $line_continues = 0;
121 0           my $line_continued = 0;
122              
123 0           my @token_buffer = ();
124             my $yield_token = sub {
125 0           my $tok = shift;
126 0 0         return unless $tok;
127 0           unshift @token_buffer, $tok;
128 0           };
129            
130 0           my @indentation_stack = ();
131            
132             my $pop_frame = sub { # This weird style is because I'm porting from a Racket generator. This seems most natural as a translation; sorry for the accent.
133 0           my $correction = shift;
134             #$debug_out->("Asked to pop frame from stack of " . scalar (@indentation_stack));
135 0 0         return unless scalar @indentation_stack;
136 0           my $ret = ['end', $lastlno - $blanks_before - $correction, $qindmin, 0, '']; # This is the token that will be returned by the caller.
137 0           $qindmin = pop @indentation_stack;
138 0           $debug_out->("Indentation stack popped, now " . scalar (@indentation_stack) . " with q $qindmin");
139 0           $closer = 1;
140 0           $first_line = 1;
141 0 0 0       if (not @indentation_stack and $self->{input_type} eq 'textplus') {
142 0           $check_barewords = 0;
143 0           $check_quotes = 0;
144 0           $check_brackets = 0;
145 0           $check_plus = 1;
146 0           $plus_quoted = 0;
147 0           $first_line = 1;
148 0           $quoting = 0;
149 0           $starting_quote = 0;
150 0           $stop_on_blank = 1;
151             }
152 0           return $ret;
153 0           };
154             my $pop_frames_to_indent = sub {
155 0           my $indent = shift;
156             #print STDERR "Asked to pop stack of " . scalar(@indentation_stack) . " to indent $indent; q $qindmin\n" if $debug;
157 0   0       while (@indentation_stack and $indent < $qindmin) {
158             #print STDERR "Popping frame " . scalar(@indentation_stack) . " because i $indent > q $qindmin\n" if $debug;
159 0 0         $yield_token->($pop_frame->( $indent == -1 ? 0 : 1 ));
160             }
161 0           };
162            
163 0 0         if ($self->{input_type} eq 'text') {
164 0           $quoting = 1;
165 0           $glom = 1;
166             }
167 0 0         if ($self->{input_type} eq 'para') {
168 0           $stop_on_blank = 1;
169             }
170 0 0         if ($self->{input_type} eq 'block') {
171 0           $check_barewords = 0;
172 0           $check_quotes = 0;
173 0           $check_brackets = 0;
174 0           $force_sigil_space = 1;
175 0           $stop_on_blank = 1;
176             }
177 0 0         if ($self->{input_type} eq 'textplus') {
178 0           $check_barewords = 0;
179 0           $check_quotes = 0;
180 0           $check_brackets = 0;
181 0           $check_plus = 1;
182 0           $force_sigil_space = 1;
183 0           $stop_on_blank = 1;
184             }
185            
186 0           my $done = 0;
187            
188 0           my $line_iter = $self->{source}->iter;
189            
190             sub {
191             YIELD:
192             #print STDERR "Yield @ line $lastlno\n" if $debug;
193 0 0         if (@token_buffer) {
194             #print STDERR "Popping token buffer\n" if $debug;
195 0           my $tok = pop @token_buffer;
196 0           my ($t, $l, $n, $len, $tt) = @$tok;
197 0           $debug_out->( " --> ['$t', $l, $n, $len, '$tt']" );
198 0           return $tok;
199             }
200 0 0         if ($done) {
201             #print STDERR "Done and glom is $glom and bb $blanks\n" if $debug;
202 0 0 0       if ($glom and $blanks) {
203 0           while ($blanks) {
204 0           $lastlno += 1;
205 0           $yield_token->(['text', $lastlno, 0, 0, '']);
206 0           $blanks -= 1;
207             }
208             }
209 0 0         if (@indentation_stack) {
210             #print STDERR scalar @indentation_stack;
211             #print STDERR " Done, so popping stack\n";
212 0           $pop_frames_to_indent->(-1);
213 0           goto YIELD;
214             }
215 0           return undef;
216             }
217            
218             # Get a line and quit if there aren't any more.
219 0           my $line = $line_iter->();
220 0 0         if (not $line) {
221 0           $done = 1;
222 0           goto YIELD;
223             }
224            
225 0           $line_continued = $line_continues;
226 0           $line_continues = 0;
227            
228             # We have a line to process.
229 0           my ($type, $lno, $indent, $len, $text) = @$line;
230 0           $debug_out->( "LINE $lno: $type $indent $len $text" );
231             my $advance_cursor = sub {
232 0           my $tlen = shift; # Token length to advance by
233 0           $indent = $indent + $tlen;
234 0           $len = $len - $tlen;
235 0           $text = substr ($text, $tlen);
236 0           };
237            
238 0           $first = 1;
239 0           $tokens_found = 0;
240 0           my $non_name = 0;
241 0           my $textrest = 0;
242              
243 0 0         if ($line_continued) {
244 0           $first = 0;
245 0           $non_name = 1;
246             }
247            
248             my $push_frame = sub {
249 0           my $indmin = shift;
250             #print STDERR "Pushing $indmin on stack\n";
251 0           push @indentation_stack, $qindmin;
252 0           $qindmin = $indmin;
253 0           };
254             my $push_frame_on_plus = sub {
255 0           my $indent = shift;
256 0           $pop_frames_to_indent->($indent);
257 0           $push_frame->($indent + 1);
258 0           $yield_token->(['start', $lno, $indent, 0, '']);
259 0           };
260             my $push_frame_if_first = sub {
261 0           my $indent = shift;
262 0 0         if ($first) {
263 0           $first = 0;
264 0 0         if ($plus_quoted) { # The *first* tag quoted in the textplus + block shouldn't start a frame, but then that flag should turn off
    0          
265 0           $plus_quoted = 0;
266             } elsif (not $plus_quoted) {
267 0           $pop_frames_to_indent->($indent);
268 0           $push_frame->($indent + 1);
269 0           $yield_token->(['start', $lno, $indent, 0, '']);
270             }
271             }
272 0           };
273              
274 0 0         if ($type eq 'blank') { # Our line iterator just gave us a blank line.
275 0           $first_line = 1;
276 0 0 0       if ($stop_on_blank and not $quoting and not $plus_quoted) {
      0        
277 0           $blanks += 1;
278 0           $in_text_block = 0;
279 0           $debug_out->( "Popping frame at ll# $lastlno bb $blanks_before corr 0" );
280 0           $yield_token->($pop_frame->(0));
281             }
282 0 0 0       $blanks += 1 if ($glom or $self->{input_type} eq 'text');
283 0           goto YIELD;
284             }
285            
286 0           $blanks_before = $blanks;
287 0           $blanks = 0;
288 0           $lastlno = $lno;
289            
290 0 0         if ($self->{input_type} eq 'para') {
291 0 0         if ($first_line) {
292 0           $yield_token->(['tstart', $lno, 0, 0, '']);
293 0           $push_frame->(0);
294             }
295 0           $first_line = 0;
296 0           $yield_token->(['text', $lno, $indent, $len, $text]);
297 0           goto YIELD;
298             }
299            
300 0 0 0       if ($plus_quoted and $indent < $plus_indent) {
301 0           $pop_frames_to_indent->($indent);
302             }
303            
304             NEXT_TOKEN:
305 0           $debug_out->( "Advancing token $lno $indent $len" );
306 0           my $probe;
307            
308 0           $probe = match_white ($text);
309 0 0         if ($probe) {
310 0           my ($tlen, $ttxt) = @$probe;
311 0           $advance_cursor->($tlen);
312 0           goto NEXT_TOKEN;
313             }
314            
315 0 0         if ($textrest) {
316 0           $yield_token->(['text', $lno, $indent, $len, $text]);
317 0           goto YIELD;
318             }
319            
320 0 0         if ($starting_quote) {
321 0           $push_frame->($indent);
322 0           $yield_token->(['qstart', $lno, $indent, 0, '']);
323 0           $quoting = 1;
324 0           $starting_quote = 0;
325             }
326            
327 0 0 0       if ($quoting or $self->{input_type} eq 'text' or $self->{input_type} eq 'para') {
      0        
328 0           $debug_out->("Quoting or reading text");
329 0 0         if ($glom) {
330 0 0         if ($glom_until) {
331 0 0         if ($text eq $glom_until) {
332 0           $glom = 0;
333 0           $glom_until = undef;
334 0           while ($blanks_before) {
335 0           $yield_token->(['text', $lno - $blanks_before, 0, 0, '']);
336 0           $blanks_before -= 1;
337             }
338 0           $yield_token->(['closer', $lno, $indent, $len, $text]);
339 0           $yield_token->(['end', $lno, $indent, 0, '']);
340 0           $pop_frame->(1);
341 0           goto YIELD;
342             }
343             }
344 0           $debug_out->( "Glomming with $blanks_before blanks" );
345 0           while ($blanks_before) {
346 0           $yield_token->(['text', $lno - $blanks_before, 0, 0, '']);
347 0           $blanks_before -= 1;
348             }
349 0           $yield_token->(['text', $lno, $indent, $len, $text]);
350 0           goto YIELD;
351             }
352            
353 0           $debug_out->( "Not glomming, i $indent q $qindmin" );
354 0 0         if ($indent < $qindmin) { # !differs: wrong paren indentation in Racket
355 0           $debug_out->( "This line is too far back already; quoted text done" );
356 0 0         if ($closer) {
357 0 0         if ($closer eq $text) {
358 0           $yield_token->(['closer', $lno, $indent - $qindmin, $len, $text]); # The indentation of the closer is *negative*.
359 0           $pop_frames_to_indent->($indent);
360 0           $quoting = 0;
361 0           goto YIELD; # We used up this line with the closer.
362             }
363             }
364            
365 0           $pop_frames_to_indent->($indent); # We're done with the quoted text and have to buffer the appropriate tokens, but this line
366             # still needs to be processed.
367 0           $quoting = 0;
368             } else {
369 0           while ($blanks_before) {
370 0           $yield_token->(['text', $lno - $blanks_before, 0, 0, '']);
371 0           $blanks_before -= 1;
372             }
373 0           $yield_token->(['text', $lno, $indent - $qindmin, $len, $text]);
374 0           goto YIELD;
375             }
376             }
377            
378 0 0         if ($check_plus) {
379 0           $probe = match_plus ($text);
380 0 0 0       if ($probe and not $line_continued) {
381 0           my ($tlen, $ttxt) = @$probe;
382 0 0         if ($in_text_block) {
383 0           $debug_out->( "Encountered plus tag in text block; popping frame" );
384 0           $pop_frame->(1);
385 0           $yield_token->(['end', $lno, $indent, 0, '']);
386 0           $in_text_block = 0;
387             }
388            
389 0           $push_frame_on_plus->($indent);
390 0           $yield_token->(['plus', $lno, $indent, $tlen, $ttxt]);
391 0           $advance_cursor->($tlen);
392 0           $check_barewords = 1; # Simulate 'tag' mode until the stack pops off
393 0           $check_quotes = 1;
394 0           $check_brackets = 1;
395 0           $check_plus = 0;
396 0           $plus_quoted = 1;
397 0           $plus_indent = $indent;
398 0           $tokens_found = 1;
399 0           goto NEXT_TOKEN;
400             }
401             }
402            
403 0 0         if ($check_barewords) {
404 0           $probe = match_bareword ($text);
405 0 0         if ($probe) {
406 0           my ($tlen, $ttxt) = @$probe;
407 0 0         my $ttype = $first ? 'tag' :
    0          
408             $non_name ? 'word' :
409             'name';
410 0 0         $push_frame_if_first->($indent) unless $line_continued;
411 0           $yield_token->([$ttype, $lno, $indent, $tlen, $ttxt]);
412 0           $advance_cursor->($tlen);
413 0           $tokens_found = 1;
414 0           goto NEXT_TOKEN;
415             }
416             }
417            
418 0 0         if ($check_brackets) {
419 0           $probe = match_brackets ($text);
420 0 0         if ($probe) {
421 0           my ($tlen, $ttxt) = @$probe;
422 0 0         $push_frame_if_first->($indent) unless $line_continued;
423 0           $yield_token->(['bracket', $lno, $indent, $tlen, $ttxt]);
424 0           $advance_cursor->($tlen);
425 0           $non_name = 1;
426 0           $tokens_found = 1;
427 0           goto NEXT_TOKEN;
428             }
429             }
430            
431 0 0         if ($check_quotes) {
432 0           $probe = match_quoted ($text);
433 0 0         if ($probe) {
434 0           my ($tlen, $ttxt) = @$probe;
435 0 0         $push_frame_if_first->($indent) unless $line_continued;
436 0           $yield_token->(['quote', $lno, $indent, $tlen, $ttxt]);
437 0           $advance_cursor->($tlen);
438 0           $non_name = 1;
439 0           $tokens_found = 1;
440 0           goto NEXT_TOKEN;
441             }
442             }
443            
444 0 0         if ($check_sigils) {
445 0           $debug_out->( "Checking for sigil on $text" );
446 0           $probe = match_sigil ($text);
447 0 0         if ($probe) {
448 0           my ($tlen, $ttxt) = @$probe;
449 0           $debug_out->( "Sigil $ttxt" );
450 0           $tokens_found = 1;
451 0 0         if ($in_text_block) {
452 0           $debug_out->( "Encountered sigil in text block; popping frame" );
453 0           $pop_frame->(1);
454 0           $yield_token->(['end', $lno, $indent, 0, '']);
455 0           $in_text_block = 0;
456             }
457 0           my $line_cont = $ttxt eq '|';
458 0 0         my $ttype = $line_cont ? 'cont' : 'sigil';
459 0 0         $debug_out->( "f $first fss $force_sigil_space mw " . (match_white(substr($text, $tlen)) ? 1 : 0));
460 0 0 0       if (not $first or not $force_sigil_space or ($tlen eq $len) or match_white(substr($text, $tlen))) { # !differs: substr(...) instead of ttxt
      0        
      0        
461             # ^^^^^^^^^^^^^^^ - this feels arbitrary for dash-alone corner case
462 0 0         $push_frame_if_first->($indent) unless $line_continued;
463 0           $yield_token->([$ttype, $lno, $indent, $tlen, $ttxt]);
464 0           $advance_cursor->($tlen);
465 0           $non_name = 1;
466 0           $textrest = $line_cont;
467 0           $debug_out->( " - sigil is $ttxt" );
468 0           $closer = _closing_bracket ($ttxt);
469 0 0         $debug_out->( " - closer is " . ($closer ? $closer : '(none)') );
470            
471 0 0         if ($ttxt =~ /<<(.*)$/) { # Special <<< or <
472 0           $len = 0;
473 0           $glom = 1;
474 0 0         if ($1 eq '') {
    0          
475 0           $glom_until = 'EOF';
476             } elsif ($1 ne '<') {
477 0           $glom_until = $1;
478             } else {
479 0           $glom_until = undef; # This gloms to the end of the line iterator.
480             }
481             }
482            
483 0 0         if (not $line_cont) {
484 0           $debug_out->( "Starting quoted text next line" );
485 0           $starting_quote = 1;
486 0           $blanks_before = 0;
487 0 0         if (not $len) {
488 0           goto YIELD;
489             }
490             }
491 0 0         $line_continues = 1 if $line_cont;
492             #goto YIELD; # !differs: fall through instead of NEXT_TOKEN
493             }
494             }
495             }
496            
497             # We have a line, but it didn't get tokenized up until here because the matchers are disabled - meaning it's a text line in a text mode.
498             # So here we just pretend we're in 'para' mode, except we need to dump any starting whitespace.
499 0           $probe = match_white($text);
500 0 0         if ($probe) {
501 0           $advance_cursor->($probe->[0]);
502             }
503 0 0         if ($len) {
504 0 0         if ($tokens_found) {
505 0           $debug_out->( "There is extra text on this line." );
506 0 0         if ($first_line) {
507 0           $first_line = 0;
508 0           $yield_token->(['qstart', $lno, $indent, 0, '']); # !differs: 'tstart' in Racket
509 0           $push_frame->($indent);
510 0           $quoting = 1; # !differs
511 0           $starting_quote = 0; # !differs
512 0           $debug_out->( "qindmin now $qindmin" );
513             }
514             } else {
515 0           $debug_out->( "This line is text but not quoted." );
516 0 0         if ($first_line) {
517 0           $first_line = 0;
518 0           $yield_token->(['tstart', $lno, $indent, 0, '']);
519 0           $push_frame->($indent);
520 0           $in_text_block = 1;
521             }
522             }
523 0           $yield_token->(['text', $lno, $indent - $qindmin, $len, $text]); # Note: indent will always be 0
524             }
525            
526 0           goto YIELD;
527             }
528 0           };
  0            
529              
530 0           $self;
531             }
532              
533             sub _closing_bracket {
534 0     0     my $bracket = shift;
535 0 0         return '}' if $bracket eq '{';
536 0 0         return ')' if $bracket eq '(';
537 0 0         return '>' if $bracket eq '<';
538 0 0         return ']' if $bracket eq ']';
539             }
540              
541             =head1 MATCHING INDIVIDUAL TOKENS
542              
543             The different classes of tokens each have a matcher that can be called on a given string. If the string begins with the relevant type of token, the caller gets back the length
544             that matched and the matched token as an arrayref. If not, it gets an undef.
545              
546             =head2 match_white
547              
548             Matches whitespace.
549              
550             =cut
551              
552             sub match_white {
553 0 0   0 1   if ($_[0] =~ /^(\s+)/) {
554 0           return [length ($1), $1]
555             }
556 0           return undef;
557             }
558              
559             =head2 match_bareword
560              
561             Given a string, checks whether it starts with a Decl bareword. A bareword is pretty liberal in comparison with most languages; mostly we just have to make sure we don't
562             collide with sigils.
563              
564             =cut
565              
566             sub match_bareword {
567 0     0 1   my $string = shift;
568              
569 0 0         if ($string =~ /^([[:alnum:]_](:*[[:alnum:]!@#$%\^&*=\-+~_.,;\|\(\)\[\]\?<>{}])*)(.*)/) {
570 0           my ($bareword, $rest) = ($1, $2);
571             # The above regex grabs colon-punctuation sigils where it shouldn't, so let's split those off now.
572             # (I actually worked this out in, and brought the tests over from, Racket tok.rkt.)
573 0 0         if ($bareword =~ /^(.*):[[:punct:]]+$/) {
574 0           $bareword = $1;
575             }
576             # One slight weirdness that is best handled in a separate stage: a sigil of the form ...<
577 0 0         if ($bareword =~ /([[:punct:]]*<<([[:alnum:]\-]*))$/) {
578 0           $bareword = substr ($bareword, 0, length($bareword) - length($1));
579             }
580 0           return [length ($bareword), $bareword];
581             }
582 0           return undef;
583             }
584              
585             =head2 match_quoted
586              
587             =cut
588              
589             sub match_quoted {
590 0     0 1   my $string = shift;
591            
592             # Do we have a single-quoted string?
593 0 0         if ($string =~ /^'((?:\\.|[^'])*)'(.*)/) {
594 0           my ($content, $rest) = ($1, $2);
595 0           return [length($content)+2, substr($string, 0, length($content)+2)];
596             }
597              
598             # How about a double-quoted string?
599 0 0         if ($string =~ /^"((?:\\.|[^"])*)"(.*)/) {
600 0           my ($content, $rest) = ($1, $2);
601 0           return [length($content)+2, substr($string, 0, length($content)+2)];
602             }
603 0           return undef;
604             }
605              
606             =head2 match_brackets
607              
608             =cut
609              
610             sub match_brackets {
611 0     0 1   my $string = shift;
612 0           my $bracket = substr ($string, 0, 1);
613 0 0 0       return undef unless $bracket eq '[' or $bracket eq '{' or $bracket eq '(' or $bracket eq '<';
      0        
      0        
614 0           my $closer = $bracket;
615 0           $closer =~ tr/\(\[<{/)]>}/;
616            
617 0           my $copy = substr ($string, 1);
618            
619             # First, eliminate all quoted strings.
620 0           while ($copy =~ /('(?:\\.|[^'])*'|\"(?:\\.|[^\"])*\")/) {
621 0           my $match = $1;
622 0           my $rep = ' ' x length($match);
623 0           $copy =~ s/\Q$match\E/$rep/g;
624             }
625             # Now, eliminate all bracket pairs.
626 0           while ($copy =~ /(\Q$bracket\E(?:\\.|[^$bracket$closer])*\Q$closer\E)/) {
627 0           my $match = $1;
628 0           my $rep = ' ' x length($match);
629 0           $copy =~ s/\Q$match\E/$rep/g;
630             }
631             # Is there a closer on the line?
632 0 0         if ($copy =~ /^(.*)\Q$closer\E(.*)/) {
633 0           return [length($1) + 2, substr($string, 0, length($1) + 2)];
634             }
635 0           return undef;
636             }
637              
638             =head2 match_sigil
639              
640             =cut
641              
642             sub match_sigil {
643 0     0 1   my $string = shift;
644 0 0         if ($string =~ /^([[:punct:]]+)([[:alnum:]\-]*)/) {
645 0           my ($p, $tag) = ($1, $2);
646 0 0         $p .= $tag if $p =~ /<<$/;
647 0           return [length ($p), $p];
648             }
649 0           return undef;
650             }
651              
652             =head2 match_plus
653              
654             =cut
655              
656             sub match_plus {
657 0 0   0 1   if ($_[0] =~ /^\+/) {
658 0           return [1, '+']
659             }
660 0           return undef;
661             }
662              
663              
664             =head1 AUTHOR
665              
666             Michael Roberts, C<< >>
667              
668             =head1 BUGS
669              
670             Please report any bugs or feature requests to C, or through
671             the web interface at L. I will be notified, and then you'll
672             automatically be notified of progress on your bug as I make changes.
673              
674              
675              
676              
677             =head1 SUPPORT
678              
679             You can find documentation for this module with the perldoc command.
680              
681             perldoc Decl::Tok
682              
683              
684             You can also look for information at:
685              
686             =over 4
687              
688             =item * RT: CPAN's request tracker (report bugs here)
689              
690             L
691              
692             =item * AnnoCPAN: Annotated CPAN documentation
693              
694             L
695              
696             =item * CPAN Ratings
697              
698             L
699              
700             =item * Search CPAN
701              
702             L
703              
704             =back
705              
706              
707             =head1 ACKNOWLEDGEMENTS
708              
709              
710             =head1 LICENSE AND COPYRIGHT
711              
712             Copyright 2021 Michael Roberts.
713              
714             This program is free software; you can redistribute it and/or modify it
715             under the terms of the the Artistic License (2.0). You may obtain a
716             copy of the full license at:
717              
718             L
719              
720             Any use, modification, and distribution of the Standard or Modified
721             Versions is governed by this Artistic License. By using, modifying or
722             distributing the Package, you accept this license. Do not use, modify,
723             or distribute the Package, if you do not accept this license.
724              
725             If your Modified Version has been derived from a Modified Version made
726             by someone other than you, you are nevertheless required to ensure that
727             your Modified Version complies with the requirements of this license.
728              
729             This license does not grant you the right to use any trademark, service
730             mark, tradename, or logo of the Copyright Holder.
731              
732             This license includes the non-exclusive, worldwide, free-of-charge
733             patent license to make, have made, use, offer to sell, sell, import and
734             otherwise transfer the Package with respect to any patent claims
735             licensable by the Copyright Holder that are necessarily infringed by the
736             Package. If you institute patent litigation (including a cross-claim or
737             counterclaim) against any party alleging that the Package constitutes
738             direct or contributory patent infringement, then this Artistic License
739             to you shall terminate on the date that such litigation is filed.
740              
741             Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
742             AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
743             THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
744             PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
745             YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
746             CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
747             CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
748             EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
749              
750              
751             =cut
752              
753             1; # End of Decl::Tok