File Coverage

blib/lib/Text/RewriteRules.pm
Criterion Covered Total %
statement 254 312 81.4
branch 82 120 68.3
condition n/a
subroutine 12 13 92.3
pod n/a
total 348 445 78.2


line stmt bran cond sub pod time code
1             package Text::RewriteRules;
2              
3 7     7   208571 use Data::Dumper;
  7         82725  
  7         566  
4 7     7   7374 use Filter::Simple;
  7         257282  
  7         58  
5              
6 7     7   567 use warnings;
  7         19  
  7         178  
7 7     7   39 use strict;
  7         14  
  7         231  
8              
9 7     7   316 use 5.010000; # 5.10.0
  7         28  
  7         47561  
10              
11             =encoding UTF-8
12              
13             =head1 NAME
14              
15             Text::RewriteRules - A system to rewrite text using regexp-based rules
16              
17             =cut
18              
19             our $VERSION = '0.25';
20              
21             =head1 SYNOPSIS
22              
23             use Text::RewriteRules;
24              
25             RULES email
26             \.==> DOT
27             @==> AT
28             ENDRULES
29              
30             print email("ambs@cpan.org") # prints ambs AT cpan DOT org
31              
32             RULES/m inc
33             (\d+)=e=> $1+1
34             ENDRULES
35              
36             print inc("I saw 11 cats and 23 dogs") # prints I saw 12 cats and 24 dogs
37              
38             =head1 ABSTRACT
39              
40             This module uses a simplified syntax for regexp-based rules for
41             rewriting text. You define a set of rules, and the system applies them
42             until no more rule can be applied.
43              
44             Two variants are provided:
45              
46             =over 4
47              
48             =item 1
49              
50             traditional rewrite (RULES function):
51              
52             while it is possible do substitute
53             | apply first substitution rule
54              
55             =item 2
56              
57             cursor based rewrite (RULES/m function):
58              
59             add a cursor to the beginning of the string
60             while not reach end of string
61             | apply substitute just after cursor and advance cursor
62             | or advance cursor if no rule can be applied
63              
64             =back
65              
66             =head1 DESCRIPTION
67              
68             A lot of computer science problems can be solved using rewriting
69             rules.
70              
71             Rewriting rules consist of mainly two parts: a regexp (LHS: Left Hand
72             Side) that is matched with the text, and the string to use to
73             substitute the content matched with the regexp (RHS: Right Hand Side).
74              
75             Now, why don't use a simple substitute? Because we want to define a
76             set of rules and match them again and again, until no more regexp of
77             the LHS matches.
78              
79             A point of discussion is the syntax to define this system. A brief
80             discussion shown that some users would prefer a function to receive an
81             hash with the rules, some other, prefer some syntax sugar.
82              
83             The approach used is the last: we use C such that we
84             can add a specific non-perl syntax inside the Perl script. This
85             improves legibility of big rewriting rules systems.
86              
87             This documentation is divided in two parts: first we will see the
88             reference of the module. Kind of, what it does, with a brief
89             explanation. Follows a tutorial which will be growing through time and
90             releases.
91              
92             =head1 SYNTAX REFERENCE
93              
94             Note: most of the examples are very stupid, but that is the easiest
95             way to explain the basic syntax.
96              
97             The basic syntax for the rewrite rules is a block, started by the
98             keyword C and ended by the C. Everything between
99             them is handled by the module and interpreted as rules or comments.
100              
101             The C keyword can handle a set of flags (we will see that
102             later), and requires a name for the rule-set. This name will be used
103             to define a function for that rewriting system.
104              
105             RULES functioname
106             ...
107             ENDRULES
108              
109             The function is defined in the main namespace where the C
110             block appears.
111              
112             In this block, each line can be a comment (Perl style), an empty line
113             or a rule.
114              
115             =head2 Basic Rule
116              
117             A basic rule is a simple substitution:
118              
119             RULES foobar
120             foo==>bar
121             ENDRULES
122              
123             The arrow C<==E> is used as delimiter. At its left is the regexp
124             to match, at the right side, the substitution. So, the previous block
125             defines a C function that substitutes all C by
126             C.
127              
128             Although this can seems similar to a global substitution, it is
129             not. With a global substitution you can't do an endless loop. With
130             this module it is very simple. I know you will get the idea.
131              
132             You can use the syntax of Perl both on the left and right hand side of
133             the rule, including C<$1...>.
134              
135             =head2 Execution Rule
136              
137             If the Perl substitution supports execution, why not to support it,
138             also? So, you got the idea. Here is an example:
139              
140             RULES foo
141             (\d+)b=e=>'b' x $1
142             (\d+)a=eval=>'a' x ($1*2)
143             ENDRULES
144              
145             So, for any number followed by a C, we replace by that number of
146             C. For each number followed by an C, we replace them by twice
147             that number of C.
148              
149             Also, you mean evaluation using an C or C inside the arrow. I
150             should remind you can mix all these rules together in the same
151             rewriting system.
152              
153             =head2 Conditional Rule
154              
155             On some cases we want to perform a substitution if the pattern matches
156             B a set of conditions about that pattern (or not) are true.
157              
158             For that, we use a three part rule. We have the common rule plus the
159             condition part, separated from the rule by C. These conditional
160             rules can be applied both for basic and execution rules.
161              
162             RULES translate
163             ([[:alpha:]]+)=e=>$dic{$1}!! exists($dic{$1})
164             ENDRULES
165              
166             The previous example would translate all words that exist on the
167             dictionary.
168              
169             =head2 Begin Rule
170              
171             Sometimes it is useful to change something on the string before
172             starting to apply the rules. For that, there is a special rule named
173             C (or C for abbreviate) just with a RHS. This RHS is Perl
174             code. Any Perl code. If you want to modify the string, use C<$_>.
175              
176             RULES foo
177             =b=> $_.=" END"
178             ENDRULES
179              
180             =head2 Last Rule
181              
182             As you use C on Perl to skip the remaining code on a loop, you
183             can also call a C (or C) rule when a specific pattern
184             matches.
185              
186             Like the C rule with only a RHS, the C rule has only a
187             LHS:
188              
189             RULES foo
190             foobar=l=>
191             ENDRULES
192              
193             This way, the rules iterate until the string matches with C.
194              
195             You can also supply a condition in a last rule:
196              
197             RULES bar
198             f(o+)b(a+)r=l=> !! length($1) == 2 * length($2);
199              
200             =head2 Rules with /x mode
201              
202             It is possible to use the regular expressions /x mode in the rewrite rules.
203             In this case:
204              
205             =over 4
206              
207             =item 1
208              
209             there must be an empty line between rules
210              
211             =item 2
212              
213             you can insert space and line breaks into the regular expression:
214              
215             RULES/x f1
216             (\d+)
217             (\d{3})
218             (000)
219             ==>$1 milhao e $2 mil!! $1 == 1
220              
221             ENDRULES
222              
223             =back
224              
225             =head1 POWER EXPRESSIONS
226              
227             To facilitate matching complex languages Text::RewriteRules defines a
228             set of regular expressions that you can use (without defining them).
229              
230             =head2 Parenthesis
231              
232             There are three kind of usual parenthesis: the standard parenthesis,
233             brackets or curly braces. You can match a balanced string of
234             parenthesis using the power expressions C<[[:PB:]]>, C<[[:BB:]]> and
235             C<[[:CBB:]]> for these three kind of parenthesis.
236              
237             For instance, if you apply this rule:
238              
239             [[:BB:]]==>foo
240              
241             to this string
242              
243             something [ a [ b] c [d ]] and something more
244              
245             then, you will get
246              
247             something foo and something more
248              
249             Note that if you apply it to
250              
251             something [[ not ] balanced [ here
252              
253             then you will get
254              
255             something [foo balanced [ here
256              
257             =head2 XML tags
258              
259             The power expression C<[[:XML:]]> match a XML tag (with or without
260             children XML tags. Note that this expression matches only well formed
261             XML tags.
262              
263             As an example, the rule
264              
265             [[:XML:]]=>tag
266              
267             applied to the string
268              
269             and
270              
271             will result in
272              
273             and tag
274              
275              
276             =cut
277              
278             our $DEBUG = 0;
279             our $count = 0;
280             our $NL = qr/\r?\n\r?/;
281             my %pseudo_classes=(
282             TEXENV => 'TEXENV',
283             PB => 'PB',
284             BB => 'BB',
285             CBB => 'CBB',
286             XML => 'XMLtree',
287             'XML+1' => \&_tag_re,
288             );
289              
290             sub _regular_expressions {
291 13     13   134 return <<'EORE';
292              
293             our $__XMLattrs = qr/(?:
294             \s+[a-zA-Z0-9:-]+\s*
295             =
296             \s*(?: '[^']+' | "[^"]+" ))*/x;
297              
298             ### This (?\n) is a BIG hack!
299             our $__XMLempty = qr/<(?[a-zA-Z0-9:-]+)(?\b)$__XMLattrs\/>/x;
300              
301             our $__XMLtree2 = qr/$__XMLempty |
302             (?
303             <(?[a-zA-Z0-9:-]+)$__XMLattrs>
304             (?: $__XMLempty | [^<]++ | (?&XML) )*+
305             <\/\k>
306             )/x;
307             our $__XMLtree = qr/$__XMLempty |
308             (?
309             <(?[a-zA-Z0-9:-]+)$__XMLattrs>
310             (?(?: $__XMLempty | [^<]++ | $__XMLtree2 )*+)
311             <\/\k>
312             )/x;
313             our $__XMLinner = qr/(?: [^<]++ | $__XMLempty | $__XMLtree2 )*+/x;
314              
315             our $__CBB = qr{ (? \{ (?(?:[^\{\}]++|(?&cbb1))*+) \} ) }sx;
316             our $__BB = qr{ (? \[ (? (?:[^\[\]]++|(?&bb1) )*+) \] ) }sx;
317             our $__PB = qr{ (? \( (? (?:[^\(\)]++|(?&pb1) )*+) \) ) }sx;
318              
319             our $__TEXENV = qr{\\begin\{(\w+)\}(.*?)\\end\{\1\}}s; ## \begin{$1}$2\end
320             our $__TEXENV1 = qr{\\begin\{(\w+)\}($__BB?)($__CBB)(.*?)\\end\{\1\}}s; ## \begin{$1}[$2]{$3}$4\end
321              
322              
323             EORE
324             }
325              
326             sub _tag_re {
327 6     6   12 my $tagname = shift;
328 6         34 return "<$tagname\$__XMLattrs(?:\/>|>\$__XMLinner<\/$tagname>)";
329             }
330              
331              
332             sub _expand_pseudo_classes {
333 126     126   175 my $rules = shift;
334              
335 126 50       209 $rules =~ s/(\[\[:(\w+):\]\])/$pseudo_classes{$2}?"\$__$pseudo_classes{$2}":$1/ge;
  18         80  
336 126         186 $rules =~ s/\[\[:(\w+)\(([^,\(\)]+)\):\]\]/$pseudo_classes{"$1+1"}->($2)/ge;
  6         20  
337              
338 126         258 return $rules;
339             }
340              
341             sub _mrules {
342 18     18   50 my ($conf, $name, $rules) = @_;
343 18         26 ++$count;
344              
345 18         35 my $code = "sub $name {\n";
346 18         25 $code .= " my \$p = shift;\n";
347 18         25 $code .= " my \$_M = \"\\x01\";\n";
348 18         23 $code .= " for (\$p) {\n";
349 18         21 $code .= " my \$modified = 1;\n";
350 18         22 $code .= " \$_ = \$_M.\$_;\n";
351 18         30 $code .= " #__$count#\n";
352 18         27 $code .= " my \$iteration = 0;\n";
353 18         23 $code .= " MAIN: while (\$modified) {\n";
354 18         22 $code .= " \$iteration++;\n";
355              
356 18 50       41 if ($DEBUG) {
357 0         0 $code .= " print STDERR \" >\$_\\n\";\n"
358             }
359              
360 18         20 $code .= " \$modified = 0;\n";
361              
362 18 50       218 my $ICASE = exists($conf->{i})?"i":"";
363 18 100       37 my $DX = exists($conf->{x})?"x":"";
364 18 50       37 if (exists($conf->{d})) {
365 0         0 $code .= " print STDERR \"Iteration on $name: \$iteration\n\$p\n\";";
366             }
367              
368 18         21 my @rules;
369 18 100       40 if ($DX eq "x") {
370 2         61 @rules = split /$NL$NL/, $rules;
371             } else {
372 16         138 @rules = split /$NL/, $rules;
373             }
374              
375 18         40 for my $rule (@rules) {
376 34         196 $rule =~ s/$NL$//;
377            
378 34 100       492 if ($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/) {
    50          
    100          
    50          
    100          
    100          
    50          
379 2         12 my ($ant,$con,$cond) = ($1,$3,$4);
380 2 50       10 $ICASE = "i" if $2 =~ m!i!;
381 2         7 $ant = _expand_pseudo_classes($ant);
382              
383 2         8 $code .= " while (m{\${_M}(?:$ant)}g$ICASE) {\n";
384 2         5 $code .= " if ($cond) {\n";
385 2         6 $code .= " s{\${_M}(?:$ant)}{$con\${_M}}$ICASE;\n";
386 2         5 $code .= " pos = undef;\n";
387 2         3 $code .= " \$modified = 1;\n";
388 2         24 $code .= " next MAIN\n";
389 2         4 $code .= " }\n";
390 2         7 $code .= " }\n";
391              
392             } elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/) {
393 0         0 my ($ant,$con,$cond) = ($1,$3,$4);
394 0 0       0 $ICASE = "i" if $2 =~ m!i!;
395 0         0 $ant = _expand_pseudo_classes($ant);
396              
397 0         0 $code .= " while (m{\${_M}(?:$ant)}g$ICASE) {\n";
398 0         0 $code .= " if ($cond) {\n";
399 0         0 $code .= " s{\${_M}(?:$ant)}{eval{$con}.\${_M}}e$ICASE;\n";
400 0         0 $code .= " pos = undef;\n";
401 0         0 $code .= " \$modified = 1;\n";
402 0         0 $code .= " next MAIN\n";
403 0         0 $code .= " }\n";
404 0         0 $code .= " }\n";
405              
406             } elsif ($rule =~ m/(.*?)(=i?=>)(.*)/) {
407 22         53 my ($ant,$con) = ($1,$3);
408 22 100       64 $ICASE = "i" if $2 =~ m!i!;
409 22         41 $ant = _expand_pseudo_classes($ant);
410              
411 22         51 $code .= " if (m{\${_M}(?:$ant)}$ICASE) {\n";
412 22         177 $code .= " s{\${_M}(?:$ant)}{$con\${_M}}$ICASE;\n";
413 22         28 $code .= " \$modified = 1;\n";
414 22         27 $code .= " next\n";
415 22         52 $code .= " }\n";
416              
417             } elsif($rule =~ m/=b(?:egin)?=>(.*)/s) {
418              
419 0         0 my $ac = $1;
420 0         0 $code =~ s/(#__$count#\n)/$ac;\n$1/;
421              
422             } elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/) {
423 6         17 my ($ant,$con) = ($1,$3);
424 6 100       23 $ICASE = "i" if $2 =~ m!i!;
425 6         14 $ant = _expand_pseudo_classes($ant);
426              
427 6         17 $code .= " if (m{\${_M}(?:$ant)}$ICASE) {\n";
428 6         174 $code .= " s{\${_M}(?:$ant)}{eval{$con}.\"\$_M\"}e$ICASE;\n";
429 6         10 $code .= " \$modified = 1;\n";
430 6         7 $code .= " next\n";
431 6         18 $code .= " }\n";
432              
433             } elsif($rule =~ m/(.*?)(=(?:i=)?l(?:ast)?=>\s*!!(.*))/s) {
434 2         7 my ($ant,$cond) = ($1,$3);
435 2 50       11 $ICASE = "i" if $2 =~ m!i!;
436 2         6 $ant = _expand_pseudo_classes($ant);
437              
438 2         8 $code .= " if (m{\${_M}(?:$ant)}$ICASE$DX) {\n";
439 2         6 $code .= " if ($cond) {\n";
440 2         3 $code .= " s{\${_M}}{};\n";
441 2         3 $code .= " last\n";
442 2         4 $code .= " }\n";
443 2         6 $code .= " }\n";
444              
445             } elsif($rule =~ m/(.*?)(=(?:i=)?l(?:ast)?=>)/s) {
446 2         6 my ($ant) = ($1);
447 2 50       11 $ICASE = "i" if $2 =~ m!i!;
448 2         5 $ant = _expand_pseudo_classes($ant);
449              
450 2         9 $code .= " if (m{\${_M}(?:$ant)}$ICASE$DX) {\n";
451 2         4 $code .= " s{\${_M}}{};\n";
452 2         3 $code .= " last\n";
453 2         53 $code .= " }\n";
454              
455             } else {
456 0 0       0 warn "Unknown rule: $rule\n" unless $rule =~ m!^\s*(#|$)!;
457             }
458             }
459             ##---
460              
461             # Make it walk...
462 18         205 $code .= " if (m{\${_M}(.|\\n)}) {\n";
463 18         21 $code .= " s{\${_M}(.|\\n)}{\$1\${_M}};\n";
464 18         26 $code .= " \$modified = 1;\n";
465 18         18 $code .= " next\n";
466 18         20 $code .= " }\n";
467              
468 18         18 $code .= " }\n";
469 18         24 $code .= " s/\$_M//;\n";
470 18         19 $code .= " }\n";
471 18         19 $code .= " return \$p;\n";
472 18         20 $code .= "}\n";
473              
474 18         416 $code;
475             }
476              
477             sub _rules {
478 62     62   97 my ($conf, $name, $rules) = @_;
479 62         62 ++$count;
480              
481 62         302 my $code = "sub $name {\n";
482 62         80 $code .= " my \$p = shift;\n";
483 62         73 $code .= " for (\$p) {\n";
484 62         71 $code .= " my \$modified = 1;\n";
485 62         95 $code .= " #__$count#\n";
486 62         76 $code .= " my \$iteration = 0;\n";
487 62         78 $code .= " MAIN: while(\$modified) {\n";
488 62 50       243 $code .= " print STDERR \$_;\n" if $DEBUG > 1;
489 62         78 $code .= " \$modified = 0;\n";
490 62         179 $code .= " \$iteration++;\n";
491              
492             ##---
493              
494 62 100       124 my $DICASE = exists($conf->{i})?"i":"";
495 62 100       110 my $DX = exists($conf->{x})?"x":"";
496 62 50       119 if (exists($conf->{d})) {
497 0         0 $code .= " print STDERR \"Iteration on $name: \$iteration\n\$p\n\";";
498             }
499              
500 62         58 my @rules;
501 62 100       152 if ($DX eq "x") {
502 4         67 @rules = split /$NL$NL/, $rules;
503             } else {
504 58         427 @rules = split /$NL/, $rules;
505             }
506              
507 62         112 for my $rule (@rules) {
508 74         287 $rule =~ s/$NL$//;
509              
510 74         88 my $ICASE = $DICASE;
511              
512 74 100       1323 if($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/s) {
    100          
    100          
    100          
    100          
    100          
    50          
513 4         16 my ($ant,$con,$cond) = ($1,$3,$4);
514 4 100       15 $ICASE = "i" if $2 =~ m!i!;
515 4         8 $ant = _expand_pseudo_classes($ant);
516              
517 4         11 $code .= " while (m{$ant}g$ICASE$DX) {\n";
518 4         14 $code .= " if ($cond) {\n";
519 4         10 $code .= " s{$ant\\G}{$con}$ICASE$DX;\n";
520 4         6 $code .= " pos = undef;\n";
521 4         6 $code .= " \$modified = 1;\n";
522 4         5 $code .= " next MAIN\n";
523 4         6 $code .= " }\n";
524 4         12 $code .= " }\n";
525              
526             } elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/s) {
527 4         13 my ($ant,$con,$cond) = ($1,$3,$4);
528 4 100       17 $ICASE = "i" if $2 =~ m!i!;
529 4         8 $ant = _expand_pseudo_classes($ant);
530              
531 4         12 $code .= " while (m{$ant}g$ICASE$DX) {\n";
532 4         9 $code .= " if ($cond) {\n";
533 4         10 $code .= " s{$ant\\G}{$con}e${ICASE}${DX};\n";
534 4         6 $code .= " pos = undef;\n";
535 4         22 $code .= " \$modified = 1;\n";
536 4         5 $code .= " next MAIN\n";
537 4         6 $code .= " }\n";
538 4         12 $code .= " }\n";
539              
540             } elsif ($rule =~ m/(.*?)(=i?=>)(.*)/s) {
541 50         114 my ($ant,$con) = ($1,$3);
542 50 100       128 $ICASE = "i" if $2 =~ m!i!;
543 50         214 $ant = _expand_pseudo_classes($ant);
544              
545 50         119 $code .= " if (m{$ant}$ICASE$DX) {\n";
546 50         113 $code .= " s{$ant}{$con}$ICASE$DX;\n";
547 50         56 $code .= " \$modified = 1;\n";
548 50         62 $code .= " next\n";
549 50         133 $code .= " }\n";
550              
551             } elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/s) {
552 8         21 my ($ant,$con) = ($1,$3);
553 8 100       27 $ICASE = "i" if $2 =~ m!i!;
554 8         16 $ant = _expand_pseudo_classes($ant);
555              
556 8         24 $code .= " if (m{$ant}$ICASE$DX) {\n";
557 8         20 $code .= " s{$ant}{$con}e$ICASE$DX;\n";
558 8         11 $code .= " \$modified = 1;\n";
559 8         9 $code .= " next\n";
560 8         22 $code .= " }\n";
561              
562             } elsif($rule =~ m/=b(?:egin)?=>(.*)/s) {
563              
564 2         5 my $ac = $1;
565 2         39 $code =~ s/(#__$count#\n)/$ac;\n$1/;
566              
567             } elsif($rule =~ m/(.*?)(=(i=)?l(ast)?=>\s*!!(.*))/s) {
568 2         9 my ($ant,$cond) = ($1,$5);
569 2 50       10 $ICASE = "i" if $2 =~ m!i!;
570 2         6 $ant = _expand_pseudo_classes($ant);
571              
572 2         9 $code .= " if (m{$ant}$ICASE$DX) {\n";
573 2         4 $code .= " if ($cond) {\n";
574 2         5 $code .= " last\n";
575 2         3 $code .= " }\n";
576 2         6 $code .= " }\n";
577              
578             } elsif($rule =~ m/(.*?)(=(i=)?l(ast)?=>)/s) {
579 4         9 my ($ant) = ($1);
580 4 100       14 $ICASE = "i" if $2 =~ m!i!;
581 4         9 $ant = _expand_pseudo_classes($ant);
582              
583 4         22 $code .= " if (m{$ant}$ICASE$DX) {\n";
584 4         5 $code .= " last\n";
585 4         11 $code .= " }\n";
586              
587             } else {
588 0 0       0 warn "Unknown rule: $rule\n" unless $rule =~ m!^\s*(#|$)!;
589             }
590             }
591              
592             ##---
593              
594 62         89 $code .= " }\n";
595 62         68 $code .= " }\n";
596 62         62 $code .= " return \$p;\n";
597 62         131 $code .= "}\n";
598              
599 62         1544 $code;
600             }
601              
602             sub _lrules {
603 8     8   14 my ($conf, $name, $rules) = @_;
604 8         10 ++$count;
605              
606 8         11 my $code = "my \$${name}_input = \"\";\n";
607 8         17 $code .= "sub ${name}_init {\n";
608 8         17 $code .= " \$${name}_input = shift;\n";
609 8         9 $code .= " return 1;\n";
610 8         9 $code .= "}\n\n";
611              
612 8         12 $code .= "sub $name {\n";
613 8         16 $code .= " return undef if not defined \$${name}_input;\n";
614 8 50       94 $code .= " print STDERR \$_;\n" if $DEBUG > 1;
615 8         14 $code .= " for (\$${name}_input) {\n";
616              
617             ##---
618              
619 8 50       21 my $DICASE = exists($conf->{i})?"i":"";
620 8 100       20 my $DX = exists($conf->{x})?"x":"";
621              
622 8         7 my @rules;
623 8 100       17 if ($DX eq "x") {
624 2         52 @rules = split /$NL$NL/, $rules;
625             } else {
626 6         183 @rules = split /$NL/, $rules;
627             }
628              
629 8         19 for my $rule (@rules) {
630 22         90 $rule =~ s/$NL$//;
631              
632 22         29 my $ICASE = $DICASE;
633              
634 22 50       410 if ($rule =~ m/=EOF=>(.*)/s) {
    100          
    50          
    100          
    50          
    50          
    100          
    50          
635              
636 0         0 my $act = $1;
637 0         0 $code .= " if (m{^\$}) {\n";
638 0         0 $code .= " \$${name}_input = undef;\n";
639 0         0 $code .= " return \"$act\";\n";
640 0         0 $code .= " }\n";
641              
642             } elsif ($rule =~ m/=EOF=e=>(.*)/s) {
643              
644 2         5 my $act = $1;
645 2         4 $code .= " if (m{^\$}) {\n";
646 2         6 $code .= " \$${name}_input = undef;\n";
647 2         20 $code .= " return $act;\n";
648 2         8 $code .= " }\n";
649            
650             } elsif ($rule =~ m/(.*?)(=(?:i=)?ignore=>)(.*)!!(.*)/s) {
651 0         0 my ($ant,$cond) = ($1, $4);
652 0 0       0 $ICASE = "i" if $2 =~ m!i!;
653 0         0 $ant = _expand_pseudo_classes($ant);
654              
655 0         0 $code .= " if (m{^$ant}g$ICASE$DX) {\n";
656 0         0 $code .= " if ($cond) {\n";
657 0         0 $code .= " s{$ant}{}$ICASE$DX;\n";
658 0         0 $code .= " pos = undef;\n";
659 0         0 $code .= " return $name();\n";
660 0         0 $code .= " }\n";
661 0         0 $code .= " }\n";
662              
663             } elsif ($rule =~ m/(.*?)(=(?:i=)?ignore=>)(.*)/s) {
664 4         10 my ($ant) = ($1);
665 4 50       15 $ICASE = "i" if $2 =~ m!i!;
666 4         8 $ant = _expand_pseudo_classes($ant);
667              
668 4         12 $code .= " if (m{^$ant}g$ICASE$DX) {\n";
669 4         11 $code .= " s{$ant}{}$ICASE$DX;\n";
670 4         6 $code .= " pos = undef;\n";
671 4         8 $code .= " return $name();\n";
672 4         9 $code .= " }\n";
673              
674             } elsif($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/s) {
675 0         0 my ($ant,$con,$cond) = ($1,$3,$4);
676 0 0       0 $ICASE = "i" if $2 =~ m!i!;
677 0         0 $ant = _expand_pseudo_classes($ant);
678              
679 0         0 $code .= " if (m{^$ant}g$ICASE$DX) {\n";
680 0         0 $code .= " if ($cond) {\n";
681 0         0 $code .= " s{$ant}{}$ICASE$DX;\n";
682 0         0 $code .= " pos = undef;\n";
683 0         0 $code .= " return \"$con\"\n";
684 0         0 $code .= " }\n";
685 0         0 $code .= " }\n";
686              
687             } elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/s) {
688 0         0 my ($ant,$con,$cond) = ($1,$3,$4);
689 0 0       0 $ICASE = "i" if $2 =~ m!i!;
690 0         0 $ant = _expand_pseudo_classes($ant);
691              
692 0         0 $code .= " if (m{^$ant}g$ICASE$DX) {\n";
693 0         0 $code .= " if ($cond) {\n";
694 0         0 $code .= " s{$ant}{}${ICASE}${DX};\n";
695 0         0 $code .= " pos = undef;\n";
696 0         0 $code .= " return $con;\n";
697 0         0 $code .= " }\n";
698 0         0 $code .= " }\n";
699              
700             } elsif($rule =~ m/(.*?)(=i?=>)(.*)/s) {
701            
702 4         11 my ($ant,$con) = ($1,$3);
703 4 50       16 $ICASE = "i" if $2 =~ m!i!;
704 4         11 $ant = _expand_pseudo_classes($ant);
705              
706 4         13 $code .= " if (m{^$ant}g$ICASE$DX) {\n";
707 4         9 $code .= " s{$ant}{}$ICASE$DX;\n";
708 4         6 $code .= " pos = undef;\n";
709 4         10 $code .= " return \"$con\"\n";
710 4         11 $code .= " }\n";
711              
712             } elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/s) {
713 12         36 my ($ant,$con) = ($1,$3);
714 12 50       53 $ICASE = "i" if $2 =~ m!i!;
715 12         32 $ant = _expand_pseudo_classes($ant);
716              
717 12         34 $code .= " if (m{^$ant}g$ICASE$DX) {\n";
718 12         24 $code .= " s{$ant}{}${ICASE}${DX};\n";
719 12         14 $code .= " pos = undef;\n";
720 12         16 $code .= " return $con;\n";
721 12         30 $code .= " }\n";
722              
723             } else {
724 0 0       0 warn "Unknown rule in lexer mode: $rule\n" unless $rule =~ m!^\s*(#|$)!;
725             }
726             }
727              
728             ##---
729              
730 8         15 $code .= " }\n";
731 8         10 $code .= " return undef;\n";
732 8         10 $code .= "}\n";
733              
734 8         113 $code;
735             }
736              
737              
738             FILTER {
739             return if m!^(\s|\n)*$!;
740              
741             s!^!_regular_expressions()!e;
742              
743             print STDERR "BEFORE>>>>\n$_\n<<<<\n" if $DEBUG;
744              
745             s!^MRULES +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES!_mrules({}, $1,$2)!gem;
746              
747             s!^LRULES +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES!_lrules({}, $1,$2)!gem;
748              
749             s{^RULES((?:\/\w+)?) +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES}{
750             my ($a,$b,$c) = ($1,$2,$3);
751             my $conf = {map {($_=>$_)} split //,$a};
752             if (exists($conf->{'l'})) {
753             _lrules($conf, $b, $c)
754             } elsif (exists($conf->{'m'})) {
755             _mrules($conf,$b,$c)
756             } else {
757             _rules($conf,$b,$c)
758             }
759             }gem;
760              
761              
762              
763             print STDERR "AFTER>>>>\n$_\n<<<<\n" if $DEBUG;
764              
765             $_
766             };
767              
768             sub _compiler{
769              
770 0     0   0 local $/ = undef;
771 0         0 $_ = <>;
772              
773 0         0 print __compiler($_);
774             }
775              
776             sub __compiler {
777 6     6   2734 my $str = shift;
778              
779 6         14 for ($str) {
780              
781 6         35 s!use Text::RewriteRules;!_regular_expressions()!e;
  6         16  
782              
783 6         52 s!^MRULES +(\w+)\s*\n((?:.|\n)*?)^ENDRULES!_mrules({}, $1,$2)!gem;
  7         107  
784              
785 6         51 s!^LRULES +(\w+)\s*\n((?:.|\n)*?)^ENDRULES!_lrules({}, $1,$2)!gem;
  0         0  
786              
787 6         81 s{^RULES((?:\/\w+)?) +(\w+)\s*\n((?:.|\n)*?)^ENDRULES}{
788 37         108 my ($a,$b,$c) = ($1,$2,$3);
789              
790 37         77 my $conf = {map {($_=>$_)} split //,$a};
  22         68  
791 37 100       125 if (exists($conf->{'l'})) {
    100          
792 4         9 _lrules($conf,$b,$c)
793             } elsif (exists($conf->{'m'})) {
794 2         11 _mrules($conf,$b,$c)
795             } else {
796 31         66 _rules($conf,$b,$c)
797             }
798             }gem;
799             }
800              
801 6         231 return $str;
802             }
803              
804             =head1 TUTORIAL
805              
806             At the moment, just a set of commented examples.
807              
808             Example1 -- from number to portuguese words (using traditional rewriting)
809              
810             Example2 -- Naif translator (using cursor-based rewriting)
811              
812             =head1 Conversion between numbers and words
813              
814             Yes, you can use L and similar (for other
815             languages). Meanwhile, before it existed we needed to write such a
816             conversion tool.
817              
818             Here I present a subset of the rules (for numbers bellow 1000). The
819             generated text is Portuguese but I think you can get the idea. I'll
820             try to create a version for English very soon.
821              
822             You can check the full code on the samples directory (file
823             C).
824              
825             use Text::RewriteRules;
826              
827             RULES num2words
828             100==>cem
829             1(\d\d)==>cento e $1
830             0(\d\d)==>$1
831             200==>duzentos
832             300==>trezentos
833             400==>quatrocentos
834             500==>quinhentos
835             600==>seiscentos
836             700==>setecentos
837             800==>oitocentos
838             900==>novecentos
839             (\d)(\d\d)==>${1}00 e $2
840              
841             10==>dez
842             11==>onze
843             12==>doze
844             13==>treze
845             14==>catorze
846             15==>quinze
847             16==>dezasseis
848             17==>dezassete
849             18==>dezoito
850             19==>dezanove
851             20==>vinte
852             30==>trinta
853             40==>quarenta
854             50==>cinquenta
855             60==>sessenta
856             70==>setenta
857             80==>oitenta
858             90==>noventa
859             0(\d)==>$1
860             (\d)(\d)==>${1}0 e $2
861              
862             1==>um
863             2==>dois
864             3==>três
865             4==>quatro
866             5==>cinco
867             6==>seis
868             7==>sete
869             8==>oito
870             9==>nove
871             0$==>zero
872             0==>
873             ==>
874             ,==>,
875             ENDRULES
876              
877             num2words(123); # returns "cento e vinte e três"
878              
879             =head2 Naif translator (using cursor-based rewriting)
880              
881             use Text::RewriteRules;
882             %dict=(driver=>"motorista",
883             the=>"o",
884             of=>"de",
885             car=>"carro");
886              
887             $word='\b\w+\b';
888              
889             if( b(a("I see the Driver of the car")) eq "(I) (see) o Motorista do carro" )
890             {print "ok\n"}
891             else {print "ko\n"}
892              
893             RULES/m a
894             ($word)==>$dict{$1}!! defined($dict{$1})
895             ($word)=e=> ucfirst($dict{lc($1)}) !! defined($dict{lc($1)})
896             ($word)==>($1)
897             ENDRULES
898              
899             RULES/m b
900             \bde o\b==>do
901             ENDRULES
902              
903             =head1 AUTHOR
904              
905             Alberto Simões, C<< >>
906              
907             José João Almeida, C<< >>
908              
909             =head1 BUGS
910              
911             We know documentation is missing and you all want to use this module.
912             In fact we are using it a lot, what explains why we don't have the
913             time to write documentation.
914              
915             Please report any bugs or feature requests to
916             C, or through the web interface at
917             L. I will be notified, and then you'll automatically
918             be notified of progress on your bug as I make changes.
919              
920             =head1 ACKNOWLEDGEMENTS
921              
922             Damian Conway for Filter::Simple
923              
924             =head1 COPYRIGHT & LICENSE
925              
926             Copyright 2004-2012 Alberto Simões and José João Almeida, All Rights Reserved.
927              
928             This program is free software; you can redistribute it and/or modify it
929             under the same terms as Perl itself.
930              
931             =cut
932              
933             1; # End of Text::RewriteRules