File Coverage

blib/lib/Text/Trac2GFM.pm
Criterion    Covered   Total       %
statement        146     150    97.3
branch            61      68    89.7
condition         24      37    64.8
subroutine        10      10   100.0
pod                2       2   100.0
total            243     267    91.0


line stmt bran cond sub pod time code
1 16     16   203943 use strict;
  16         21  
  16         377  
2 16     16   45 use warnings;
  16         18  
  16         655  
3             package Text::Trac2GFM;
4             # ABSTRACT: Converts TracWiki formatted text to GitLab-flavored Markdown (GFM).
5             $Text::Trac2GFM::VERSION = '0.001';
6 16     16   7378 use String::Util ':all';
  16         62681  
  16         2917  
7              
8             use Exporter::Easy (
9 16         89 OK => [ 'trac2gfm', 'gfmtitle' ]
10 16     16   6671 );
  16         16511  
11              
12             =head1 NAME
13              
14             Text::Trac2GFM
15              
16             =head1 SYNOPSIS
17              
18             As a Perl library:
19              
20             use Text::Trac2GFM qw( trac2gfm gfmtitle );
21              
22             # GitLab Wiki compatible title: 'api-users-and-accounts'
23             my $gitlab_wiki_title = gfmtitle('API/Users & Accounts');
24              
25             my $gfm_page = trac2gfm($tracwiki_markup);
26              
27             Using the included C<trac2gfm> command line program:
28              
29             $ trac2gfm
30              
31             Or piped to C<STDIN>:
32              
33             $ cat | trac2gfm
34              
35             =head1 DESCRIPTION
36              
37             This module provides functions which ease the migration of TracWiki formatted
38             wikis (or any other content, such as ticket descriptions, which use TracWiki
39             markup) to GitLab projects using GitLab Flavored Markdown (GFM).
40              
41             For the most part, this module assumes that your input TracWiki text is fairly
42             well-formed and valid. Some concessions are made for whitespace in markup that
43             may not be optional in TracWiki, but which we can reliably treat as such.
44             However, blatant violations such as an opening C<{{{> for a pre-formatted code
45             block that is never followed by a closing C<}}}> will break your output.
46             Similar breakage can occur with horribly mis-nested emphasis markup, or wildly
47             malformed links.
48              
49             If your TracWiki markup renders properly on a Trac wiki, this module I<should>
50             convert it correctly (barring any special exceptions noted below). If it does
51             not, please file a bug (or better yet, submit a patch)!
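
Because an unclosed C<{{{> block breaks the output, it can be worth checking
the input before converting it. The following is only a minimal sketch using
core Perl; C<has_balanced_code_blocks> is a hypothetical helper and is not
part of this module.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not provided by Text::Trac2GFM): count the
# opening and closing preformatted-block markers and compare the totals.
sub has_balanced_code_blocks {
    my ($markup) = @_;

    # The "= () =" idiom forces list context so we get the match count.
    my $opens  = () = $markup =~ m/\{\{\{/g;
    my $closes = () = $markup =~ m/\}\}\}/g;

    return $opens == $closes;
}
```

A caller might skip or flag any page for which this returns false, then fix
the markup by hand before running the conversion.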
52              
53             =head1 EXPORTED FUNCTIONS
54              
55             This module does not export any functions by default. You must select the ones
56             you wish to use explicitly during module import. The following functions are
57             available for importing:
58              
59             =head2 trac2gfm ($markup, $options)
60              
61             Provided a scalar containing TracWiki markup, returns a scalar containing GFM
62             compliant markup. As many markup features as can be converted are, but please
63             note that GitLab-flavored Markdown does not support absolutely everything that
64             TracWiki does.
65              
66             An optional (though important) hash reference of options may be provided as the
67             second argument.
68              
69             =over
70              
71             =item * commits
72              
73             A hash containing the mappings for any repository changeset/commit references
74             in your wiki pages. This is crucial if you are migrating a project from Trac's
75             Subversion module to a GitLab project (which is, obviously, in Git). All of
76             your SVN changesets will have been converted to Git commits. For this option,
77             the keys are your original Subversion changeset numbers and the values are the
78             new Git commit IDs (you may use the full hashes or the shortened ones). These
79             mappings should be extracted from the output of the C command.
80              
81             =item * image_base
82              
83             A string with the base URL where any embedded or attached images are located.
84             For GitLab this will generally be https://<domain>/<namespace>/<project>/uploads/<hash>/
85             where the domain, namespace, and project should hopefully be self-explanatory,
86             and the hash is simply a randomized string. Note that this URL should map to
87             the appropriate uploads directory on your GitLab server where you have copied
88             the images/attachments.
89              
90             =back
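
The effect of the C<commits> option can be sketched on its own. The example
below is an illustration in core Perl, not the module's public API:
C<map_changesets> is a hypothetical name, and the commit IDs are made up. It
uses the same kind of named-capture substitution as the conversion code.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the changeset rewrite. $commits maps
# SVN changeset numbers to Git commit IDs, as in the 'commits' option.
sub map_changesets {
    my ($text, $commits) = @_;

    $text =~ s{
        (?: (?:r|changeset:)(?<num>\d+) | \[(?<num>\d+)\] )
    }{
        # Use the mapped commit ID when we have one, else the bare number.
        exists $commits->{ $+{num} } ? $commits->{ $+{num} } : $+{num}
    }gxe;

    return $text;
}

my %commits = (123 => 'abc1234', 456 => 'def5678');
my $converted = map_changesets('Fixed in r123 and [456], see changeset:789', \%commits);
```

Note that a changeset with no mapping is reduced to its bare number, so it
pays to supply a complete mapping.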
91              
92             These options are used both for markup conversion as well as any necessary
93             title rewriting, so in addition to the keys just mentioned, you will likely
94             also need to pass in the options documented for C below.
95              
96             Things that do get converted:
97              
98             =over
99              
100             =item * Paragraphs (should have gone without saying)
101              
102             =item * Headings
103              
104             =item * Emphasis (bold, italic, and underline; including nesting)
105              
106             =item * Lists (numbered, bulleted, and lettered; latter being converted to bulleted)
107              
108             =item * Pre-formatted text and code blocks
109              
110             =item * Blockquotes
111              
112             =item * Links
113              
114             =item * TracLinks
115              
116             =over
117              
118             =item * Issues/Tickets
119              
120             =item * Changesets (including mapping SVN changeset numbers to Git commit IDs)
121              
122             =back
123              
124             =item * Image macros (for images on the current wiki page only)
125              
126             =item * Tables
127              
128             =back
129              
130             Things that do I<not> convert (at least not yet):
131              
132             =over
133              
134             =item * Definition Lists
135              
136             =item * Images from anywhere other than the current wiki page
137              
138             =item * Macros
139              
140             =back
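
To give a feel for the conversions listed above, here is a reduced sketch of
two of them, headings and simple emphasis, in core Perl. C<convert_basics> is
a hypothetical name; the real function handles many more cases (including
combined bold-italic markup) and this sketch should not be read as its
implementation.

```perl
use strict;
use warnings;

# Hypothetical, reduced converter covering headings, bold, and italic only.
sub convert_basics {
    my ($text) = @_;

    # Headings: '== Foo ==' becomes '## Foo'.
    $text =~ s{^(=+)([^=\n]+)=*$}{
        my ($marks, $title) = ($1, $2);
        $title =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        ('#' x length($marks)) . ' ' . $title
    }gme;

    # Bold must be handled before italic, since ''' contains ''.
    $text =~ s{'''}{**}g;
    $text =~ s{''}{_}g;

    return $text;
}

my $gfm = convert_basics("== Setup ==\nSome '''bold''' and ''italic'' text.\n");
```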
141              
142             =cut
143              
144             sub trac2gfm {
145 38     38 1 123 my ($trac, $opts) = @_;
146              
147 38         126 my $end_with_nl = $trac =~ m{\n$}s;
148              
149             # To properly convert TracLinks using the same title conversions the caller
150             # may be supplying when using gfmtitle directly, we need to accept the same
151             # here and pass it along to any of our own invocations to that function.
152 38 100 66     126 $opts = {} unless defined $opts && ref($opts) eq 'HASH';
153              
154             # Additionally, we need some conversion mappings for ourselves - where wiki
155             # images will be living and any SVN changeset -> Git commit mappings.
156 38 100       127 $opts->{'image_base'} = '/' unless exists $opts->{'image_base'};
157 38 100 66     105 $opts->{'commits'} = {} unless exists $opts->{'commits'} && ref($opts->{'commits'}) eq 'HASH';
158              
159             # Enforce UNIX linebreaks and convert 0xa0 non breaking spaces to regular spaces
160 38         61 $trac =~ s{\r\n}{\n}gs;
161 38         41 $trac =~ s{\xa0}{ }g;
162              
163             # Headings ('=== Foo ===' -> '### Foo')
164 38         45 $trac =~ s{^(=+)([^=]+)=*$}{ ('#' x length($1)) . ' ' . crunch($2) }gme;
  6         50  
165              
166             # Paragraph spacing
167 38         82 $trac =~ s{\n{2,}}{\n\n}gs;
168              
169             # Blockquotes (opening line only - remaining multiline are handled later)
170 38         58 $trac =~ s{\n\n\s{2,}(\S[^\n]*)(\n|$)}{\n\n> $1$2}gs;
171              
172             # Numbered, lettered, and bulleted lists (preserving nesting/indentation)
173 38         99 $trac =~ s{^(\s*\d+)[.)\]]\s*}{$1. }gm;
174 38         81 $trac =~ s{^(\s*)[a-z]+[.)\]]\s*}{$1* }gm;
175 38         59 $trac =~ s{^(\s*)\*\s*([^\*]+)$}{$1* $2}gm;
176              
177             # Various forms of emphasis
178 38         55 $trac =~ s{__([^\n_]+|[^\n_]+_?[^\n_]+)__}{
    $1
}g;
179 38         27 my $edge = 0;
180 38 100       46 $trac =~ s{'''''}{ ++$edge % 2 == 1 ? '**_' : '_**' }ge;
  2         6  
181 38         45 $trac =~ s{'''}{**}g;
182 38         42 $trac =~ s{''}{_}g;
183              
184             # Preformatting blocks (including highlighter selection)
185 38         39 $trac =~ s|^\}\}\}$|```|gm;
186 38 100       47 $trac =~ s|^\{\{\{(?:#!(\w+))?| '```' . (defined $1 ? $1 : '') |gme;
  2         8  
187              
188             # In-line preformatting
189 38         75 $trac =~ s/(\{\{\{|\}\}\})/`/g;
190              
191             # CamelCase internal wiki links
192 38         360 $trac =~ s{
193             (^|\s) ( !? ([A-Z][a-z0-9]+){2,} ) \b
194             }{
195 2 100       16 substr($2, 0, 1) eq '!'
196             ? $1 . substr($2, 1)
197             : $1 . '[' . $2 . '](' . gfmtitle($2, $opts) . ')'
198             }gxe;
199              
200             # Explicit wiki links
201 38         51 $trac =~ s{
202             \[wiki: ([^\s]+) \s* ([^\]]+)? \]
203             }{
204 2         4 my $l_title = gfmtitle($1, $opts);
205 2 100 66     14 defined $2 && length($2) > 0
206             ? '[' . $2 . '](' . $l_title . ')'
207             : '[' . $l_title . '](' . $l_title . ')'
208             }gmex;
209              
210             # Named URLs
211 38         56 $trac =~ s{
212             \[ (\w+://[^\]\s]+) \s* ([^\]]+)? \]
213             }{
214 3 100 66     21 defined $2 && length($2) > 0
215             ? '[' . $2 . '](' . $1 . ')'
216             : $1
217             }gmex;
218              
219             ## Trac project links (issues, commits, users, etc.)
220             # Tickets
221 38         96 $trac =~ s{(?:#|ticket:|bug:)(\d+)}{#$1}g;
222              
223             # Changesets
224 38         291 $trac =~ s{
225             ( (r|changeset:)(?<num>\d+) | \[(?<num>\d+)\] )
226             }{
227 16     16   16655 exists $opts->{'commits'}{$+{'num'}}
  16         4635  
  16         20586  
228             ? $opts->{'commits'}{$+{'num'}}
229 6 100       54 : $+{'num'}
230             }gxe;
231              
232             # Image macros
233 38         57 $trac =~ s{
234             \[\[Image\( ([^\)]+) \)\]\]
235             }{
236 3         20 my @path = split('/', $1);
237 3         5 my $url = $opts->{'image_base'};
238 3 50       10 $url .= (substr($url, -1, 1) eq '/' ? '' : '/') . $path[-1];
239 3         17 sprintf('![%s](%s)', $path[-1], $url);
240             }gxe;
241              
242             # Manual linebreaks cleanup
243 38         45 $trac =~ s{\n?(\[\[BR\s*\]\])+}{ }gs;
244              
245             # Track contents of the current table for conversion as a whole
246 38         37 my @table;
247              
248 38         115 my @lines = split(/\n/, $trac);
249              
250             LINE:
251 38         99 for (my $i = 0; $i <= $#lines; $i++) {
252             # Table conversion
253 96 100       207 if ($lines[$i] =~ m{^\s*\|\|}s) {
    50          
254             # We need the entire table before we can convert its markup, so
255             # gather the lines into @table while also clearing the current $line
256 6         7 push(@table, $lines[$i]);
257 6         3 $lines[$i] = '';
258 6         11 next LINE;
259             } elsif (@table > 0) {
260             # We have table content, but just hit a line that is not part of the
261             # table, so we can now convert that markup and add it back in at
262             # the previous line (since the current one may require its own
263             # non-table-y processing).
264 0         0 $lines[$i-1] = _convert_table(@table);
265 0         0 @table = ();
266             }
267              
268 90 100       179 if ($lines[$i] =~ m{^\s*$}) {
269 11         19 next LINE;
270             }
271              
272             # Blockquote continuations.
273 79 100 100     278 if ($i > 0 && $lines[$i-1] =~ m{^>}) {
274 2 50       5 if ($lines[$i] =~ m{^\s+(\S.*)}) {
275 2         7 $lines[$i] = "> $1";
276             } else {
277             # Blockquote was terminated by outdenting, but without the
278             # customary blank line in between. Add that, close the block,
279             # and move to the next line.
280 0         0 $lines[$i] = "\n$lines[$i]";
281 0         0 next LINE;
282             }
283             }
284             }
285              
286             # If we still have table content, then we hit the end of the markup right
287             # on a table row. Go ahead and consume and convert the straggler.
288 38 100 66     85 push(@lines, _convert_table(@table)) if @table && @table > 0;
289              
290 38 100       57 if (@lines == 1) {
291 20         29 $trac = $lines[0];
292             } else {
293 18         43 $trac = join("\n", @lines);
294 18         28 $trac =~ s{\n{3,}}{\n\n}gs;
295             }
296              
297 38 100 100     184 $trac .= "\n" if $end_with_nl && $trac !~ m{\n$}s;
298              
299 38         197 return $trac;
300             }
301              
302             =head2 gfmtitle ($title_string, $options)
303              
304             Provided a single line string, C<$title_string>, returns a variant suitable for
305             use as the title of a GitLab Wiki page. Default mutations include replacement
306             of all whitespace and disallowed characters with dashes along with a reduction
307             to non-repeating kebab casing.
308              
309             Some common technical terms that would otherwise render strangely within the
310             restrictions of GFM titles are replaced with more verbose versions (e.g. 'C++'
311             becomes 'c-plus-plus' instead of 'c-' as it would without special handling).
312              
313             You may also pass in an optional hash reference containing the following
314             options to override some of the default behavior:
315              
316             =over
317              
318             =item * downcase
319              
320             Defaults to true. Providing any false-y value will cause C to retain
321             the case of your input string, instead of lower-casing it.
322              
323             =item * unslash
324              
325             Defaults to true. Providing any false-y value will cause slashes (C) to be
326             retained in the output, instead of converting them to dashes (C<->). Note that
327             this can cause problems if you are committing your converted wiki pages into a
328             local Git repository - special care will be needed to escape the retained
329             slashes so that they are treated as part of the filename itself instead of as a
330             directory separator.
331              
332             =item * terms
333              
334             Allows you to supply your own special term conversions, or override any default
335             ones provided by this module. This is helpful in the event that your wiki uses
336             words or phrases which are mangled in unfortunate ways. The keys of the hashref
337             should be the terms (case-insensitive) as they appear in your wiki titles and
338             the values should be the form to which they should be converted. For example,
339             to keep a sane version of 'C++' in your wiki titles for GitLab (where the plus
340             sign is not allowed), you might do:
341              
342             gfmtitle('Languages/C++', { terms => { 'c++' => 'c-plus-plus' } });
343              
344             =back
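
The title rules described above can be approximated in a few lines of core
Perl. This is a simplified sketch under stated assumptions: C<kebab_title> is
a hypothetical name, it takes its options as a flat list rather than a hash
reference, it always lower-cases, and it omits the CamelCase splitting that
the real function performs.

```perl
use strict;
use warnings;

# Hypothetical, simplified title mangler. Supports 'unslash' and 'terms' only.
sub kebab_title {
    my ($title, %opt) = @_;

    my %terms   = ('&' => '-and-', 'c++' => 'c-plus-plus', %{ $opt{terms} || {} });
    my $unslash = exists $opt{unslash} ? $opt{unslash} : 1;

    $title =~ s{/}{-}g if $unslash;

    # Terms are matched literally (\Q...\E) and case-insensitively.
    for my $term (sort keys %terms) {
        $title =~ s{\Q$term\E}{$terms{$term}}gi;
    }

    $title =~ s{[^a-zA-Z0-9/]+}{-}g;    # anything disallowed becomes a dash
    $title = lc $title;
    $title =~ s{-+}{-}g;                # collapse repeated dashes
    $title =~ s{^-|-$}{}g;              # trim leading/trailing dash

    return $title;
}

my $default_title = kebab_title('API/Users & Accounts');
my $slashed_title = kebab_title('Languages/C++', unslash => 0);
```

With C<unslash> disabled the slash survives, which is the case the warning
about directory separators refers to.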
345              
346             =cut
347              
348             sub gfmtitle {
349 13     13 1 27 my ($title, $opts) = @_;
350              
351 13         27 my $defaults = {
352             downcase => 1,
353             unslash => 1,
354             terms => {},
355             };
356              
357 13 50 33     59 return unless defined $title && length($title) > 0;
358              
359             # Special-case WikiStart, since TracWiki uses that as the homepage of a wiki
360             # and GitLab uses 'home'.
361 13 50       20 return 'home' if $title eq 'WikiStart';
362              
363             # Override our defaults if caller has provided anything.
364 13 100 66     32 if (defined $opts && ref($opts) eq 'HASH') {
365 6         6 foreach my $k (keys %{$opts}) {
  6         14  
366 9         15 $defaults->{$k} = $opts->{$k};
367             }
368             }
369              
370             # Not terrifically wonderful, but some developer/tech/etc. terms that would
371             # otherwise convert in very unfortunate ways. Keys are case-insensitive.
372             # Values are what we'll mutate them into for GitLab wikis. These are done
373             # before any other mangling, so the values don't necessarily have to be
374             # perfect "GitLab" identifiers.
375 13         54 my %special_terms = (
376             '&' => '-and-',
377             '@' => '-at-',
378             'c++ ' => 'C-Plus-Plus',
379             'a#' => 'A-Sharp',
380             'c#' => 'C-Sharp',
381             'f#' => 'F-Sharp',
382             'j#' => 'J-Sharp',
383             '.net' => '-Dot-Net',
384             );
385              
386             # Add any user-supplied replacement terms.
387 13 50 33     49 if (exists $defaults->{'terms'} && ref($defaults->{'terms'}) eq 'HASH') {
388 13         7 $special_terms{$_} = $defaults->{'terms'}{$_} for keys %{$defaults->{'terms'}};
  13         32  
389             }
390              
391             # GitLab wiki titles are restricted to (roughly) [a-zA-Z0-9_-/].
392             # Additionally, they encourage kebab-casing in their examples.
393 13 100       33 $title =~ s{/}{-}g if $defaults->{'unslash'};
394 13         40 $title =~ s{(^\s+|\s+$)}{}gs;
395 13         408 $title =~ s{$_}{ $special_terms{$_} }ige for keys %special_terms;
  3         34  
396 13         37 $title =~ s{[^a-zA-Z0-9/]+}{-}gs;
397              
398 13 100       24 if ($defaults->{'downcase'}) {
399 12 100       68 $title =~ s{([A-Z][a-z])}{-$1}g if $title =~ m{\b([A-Z][a-z0-9]+){2,}\b}s;
400 12         36 $title = lc($title);
401             }
402              
403 13         32 $title =~ s{-+}{-}g;
404 13         38 $title =~ s{(^-+|-+$)}{}gs;
405              
406 13         70 return $title;
407             }
408              
409             sub _convert_table {
410 3     3   3 my ($header, @rows) = @_;
411              
412 3         4 my @headers = _split_table_line($header);
413 3 100       4 my @aligns = map { $_ =~ m{^\S.*\s+$}s ? 'l' : $_ =~ m{^\s+.*\S$} ? 'r' : 'c' } @headers;
  8 100       25  
414 3         3 my @widths = map { length(crunch($_)) } @headers;
  8         48  
415              
416 3         22 my ($i, $j);
417              
418 3         5 for ($i = 0; $i <= $#rows; $i++) {
419 3         5 $rows[$i] = [map { crunch($_) } _split_table_line($rows[$i])];
  8         36  
420 3         21 for ($j = 0; $j <= $#{$rows[$i]}; $j++) {
  11         20  
421 8 100 66     24 $widths[$j] = length($rows[$i][$j])
422             unless defined $widths[$j]
423             && $widths[$j] > length($rows[$i][$j]);
424             }
425             }
426              
427             # GFM requires the header marker row to be at least three dashes. We add two
428             # so there's room for aligning marks.
429 3 100       4 @widths = map { $_ >= 5 ? $_ : 5 } @widths;
  8         12  
430              
431             # Ensure that we have an alignment for every column (in case there were
432             # more columns in a row under the headers). Default is centering.
433 3         5 push(@aligns, ('c') x ($#widths - $#aligns));
434              
435 3         2 my @table;
436              
437 3         5 for ($i = 0; $i <= $#aligns; $i++) {
438 8 50       18 $headers[$i] = crunch($headers[$i]) if defined $headers[$i];
439 8   50     85 $headers[$i] = _align_cell($headers[$i] // '', $aligns[$i], $widths[$i]);
440             }
441 3         7 push(@table, join(' | ', @headers));
442              
443 3         2 my @marks;
444 3         6 for ($i = 0; $i <= $#aligns; $i++) {
445 8         8 my $bar = '-' x $widths[$i];
446 8 100       14 if ($aligns[$i] eq 'l') {
    100          
447 2         5 $bar = ':' . substr($bar, 1);
448             } elsif ($aligns[$i] eq 'r') {
449 1         1 $bar = substr($bar, 0, -1) . ':';
450             } else {
451 5         7 $bar = ':' . substr($bar, 1, -1) . ':';
452             }
453 8         13 push(@marks, $bar);
454             }
455 3         5 push(@table, join(' | ', @marks));
456              
457 3         4 foreach my $row (@rows) {
458 3         4 for ($i = 0; $i <= $#aligns; $i++) {
459 8   50     13 $row->[$i] = _align_cell($row->[$i] // '', $aligns[$i], $widths[$i]);
460             }
461 3         3 push(@table, join(' | ', @{$row}));
  3         6  
462             }
463              
464 3         8 my $gfm_table = '| ' . join(" |\n| ", @table) . " |\n";
465 3         9 return $gfm_table;
466             }
467              
468             sub _split_table_line {
469 6     6   5 my ($line) = @_;
470              
471 6         20 chomp($line);
472              
473 6         34 $line =~ s{(^\s*\|\||\|\|\s*$)}{}gs;
474              
475 6         12 return map { $_ =~ s{(^=|=$)}{}gs; $_ } split(/\|\|/, $line);
  16         37  
  16         17  
476             }
477              
478             sub _align_cell {
479 16     16   15 my ($text, $align, $width) = @_;
480              
481 16 100       23 if ($align eq 'l') {
    100          
482 4         12 $text = sprintf('%-' . $width . 's', $text);
483             } elsif ($align eq 'r') {
484 2         4 $text = sprintf('%' . $width . 's', $text);
485             } else {
486 10         29 $text = sprintf('%-' . $width . 's', (' ' x int(($width - length($text)) / 2)) . $text);
487             }
488              
489 16         37 return $text;
490             }
491              
492             =head1 LIMITATIONS
493              
494             This module makes a few concessions to sloppiness (and tolerated, though not
495             official, markup), but for the most part it assumes your source content in the
496             TracWiki markup is generally well-formed and valid.
497              
498             =head2 Tables
499              
500             Tables, specifically, will face known limitations in their conversion. GFM
501             tables do not support row or column spanning, and cannot handle multi-line
502             contents in the markup (the newline will terminate the current cell's content).
503             As a result, complicated table markup from TracWiki pages will likely need to
504             be hand-wrangled after the conversion.
505              
506             In addition to the lack of spanning in GFM, this converter will base the cell
507             alignment on the contents of the first row. While TracWiki markup allows each
508             cell to have its own independent alignment, GFM tables set the alignment on a
509             per-column basis using markup in the headers.
510              
511             Headers are also mandatory in GFM tables, whereas they are optional in TracWiki.
512             The first row of every TracWiki table will be used as the header in the GFM
513             table, regardless of whether it included the C<||=Foo=||> markup.
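
The header rule can be illustrated with a reduced sketch in core Perl. The
names C<split_row> and C<table_to_gfm> are hypothetical, and this version
deliberately skips the alignment and column-width handling that the module
performs; it only shows the first row becoming the GFM header.

```perl
use strict;
use warnings;

# Hypothetical helper: split one TracWiki table row into trimmed cells,
# dropping the optional '=' header markers.
sub split_row {
    my ($line) = @_;
    $line =~ s{^\s*\|\|}{};
    $line =~ s{\|\|\s*$}{};
    return map {
        my $cell = $_;
        $cell =~ s{^=|=$}{}g;
        $cell =~ s{^\s+|\s+$}{}g;
        $cell;
    } split /\|\|/, $line;
}

# Hypothetical converter: the first row is always treated as the header.
sub table_to_gfm {
    my ($header, @rows) = @_;

    my @head = split_row($header);
    my @out  = (
        '| ' . join(' | ', @head) . ' |',
        '| ' . join(' | ', ('---') x @head) . ' |',
    );

    for my $row (@rows) {
        my @cells = split_row($row);
        push @cells, '' while @cells < @head;    # pad short rows
        push @out, '| ' . join(' | ', @cells[0 .. $#head]) . ' |';
    }

    return join("\n", @out) . "\n";
}

my $table = table_to_gfm('||=Name=||=Age=||', '||Ann||30||');
```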
514              
515             =head1 BUGS
516              
517             There are no known bugs at the time of this release. There may well be some
518             misfeatures, though.
519              
520             Please report any bugs or deficiencies you may discover to the module's GitHub
521             Issues page:
522              
523             L
524              
525             Pull requests are welcome.
526              
527             =head1 AUTHORS
528              
529             Jon Sime
530              
531             =head1 LICENSE AND COPYRIGHT
532              
533             This software is copyright (c) 2016 by Jon Sime.
534              
535             This module is free software; you can redistribute it and/or
536             modify it under the same terms as Perl itself. See L.
537              
538             This program is distributed in the hope that it will be useful,
539             but WITHOUT ANY WARRANTY; without even the implied warranty of
540             MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
541              
542             =cut
543              
544             1;