File Coverage

blib/lib/HTML/ParagraphSplit.pm
Criterion   Covered  Total      %
statement        98    103   95.1
branch           25     28   89.2
condition         5      8   62.5
subroutine       12     12  100.0
pod               2      2  100.0
total           142    153   92.8


line stmt bran cond sub pod time code
1             package HTML::ParagraphSplit;
2              
3 9     9   245354 use strict;
  9         23  
  9         761  
4 9     9   51 use warnings;
  9         18  
  9         897  
5              
6             our $VERSION = '1.05';
7              
8             require Exporter;
9              
10             our @ISA = qw( Exporter );
11              
12             our @EXPORT_OK = qw( split_paragraphs split_paragraphs_to_text );
13              
14 9     9   9807 use HTML::Entities;
  9         66328  
  9         933  
15 9     9   12493 use HTML::TreeBuilder;
  9         329774  
  9         158  
16 9     9   435 use HTML::Tagset;
  9         19  
  9         250  
17 9     9   52 use Scalar::Util qw/ blessed /;
  9         19  
  9         1163  
18              
19 9     9   56 use vars qw( %p_content );
  9         18  
  9         24411  
20             *p_content = *HTML::Tagset::is_Possible_Strict_P_Content;
21              
22              
23             =head1 NAME
24              
25             HTML::ParagraphSplit - Change text containing HTML into a formatted HTML fragment
26              
27             =head1 SYNOPSIS
28              
29             use HTML::ParagraphSplit qw( split_paragraphs_to_text split_paragraphs );
30              
31             # Read in from a file handle, output text
32             print split_paragraphs_to_text(\*ARGV);
33              
34             # Convert text to nicely split text
35             print split_paragraphs_to_text(<<END_OF_MARKUP);
36             This is one paragraph.
37              
38             This is another paragraph.
39             END_OF_MARKUP
40              
41             # Convert to an HTML::Element object instead
42             my $tree = split_paragraphs($html_input);
43             print $tree->as_HTML;
44              
45             # Create your own HTML::Element object and split it
46             my $tree = HTML::TreeBuilder->new;
47             $tree->parse($text);
48             $tree->eof;
49              
50             split_paragraphs($tree);
51              
52             my $html_fragment = $tree->guts->as_HTML;
53             $tree->delete;
54              
55             =head1 DESCRIPTION
56              
57             The purpose of this library is to provide methods for converting double line-breaks in text to HTML paragraphs (i.e., wrapping the text in C<E<lt>PE<gt>E<lt>/PE<gt>> tags). It can also convert single line breaks into C<E<lt>BRE<gt>> tags. Markup can be mixed in with the text, and this library will DoTheRightThing(tm). A number of additional options modify how the paragraph splits are performed.
58              
59             For example, given this input (the initial text was generated by DadaDodo L, btw):
60              
61             I see over the noise but I don't understand sometimes.
62              
63            
  1. One
  2. Two
  3. Three
64              
65             Fortunately, we've traded the club you can't skimp on the do because This
66             week! Presented by code Lounge: except, for controlling Knox video cameras
67             Linux well that the reason, the runlevel to run some reason number of coming
68             back next server; sees you Control display a steep
69             and I tagged with specifications of six feet, moving to Code, flyer main room
70             motel balcony,

and airflow in which define the ability to run a common. We

71             need to current in a manner
than six months and that already gotten a 
72             webcast is roughly long and bulk: and up the src page: and updates on a:
73             user will probably does this.
74              
75             This would be converted into the following:
76              
77            

I see over the noise but I don't understand sometimes.

78              
79            
  1. One
  2. Two
  3. Three
80              
81            

Fortunately, we've traded the club you can't skimp on the do because This

82             week! Presented by code Lounge: except, for controlling Knox video cameras
83             Linux well that the reason, the runlevel to run some reason number of coming
84             back next server; sees you Control display a steep
85             and I tagged with specifications of six feet, moving to Code, flyer main room
86             motel balcony,

87            

and airflow in which define the ability to run a common. We need to

88             current in a manner

89            
than six months and that already gotten a 
90             webcast
91            

is roughly long and bulk: and up the src page: and updates on a: user will

92             probably does this.

93              
94             This allows authors to use some HTML markup without having to cope with getting their paragraph tags right.
95              
96             This library depends upon L<HTML::TreeBuilder> and L<HTML::Tagset>. You may wish to see the documentation for those libraries for additional details.
97              
98             =head1 METHODS
99              
100             The primary method of this library is C<split_paragraphs>. An additional method, C<split_paragraphs_to_text>, is provided to simplify the task of generating output without having to fuss with L<HTML::Element>.
101              
102             =head2 split_paragraphs
103              
104             =over
105              
106             =item $element = split_paragraphs($handle, \%options)
107              
108             =item $element = split_paragraphs($text, \%options)
109              
110             =item $element = split_paragraphs($element, \%options)
111              
112             =back
113              
114             This method has three forms, which vary only in the input they receive. If the first argument is a file handle, C<$handle>, then that handle will be read, parsed, and split. If the first argument is a scalar, C<$text>, then that text will be parsed and split. If the first argument is a subclass of L<HTML::Element>, C<$element>, then the tree represented by the node will be traversed and split.
115              
116             If you use the third form, your tree will be modified in place and the same tree will be returned. You will want to clone the tree ahead of time if you need to preserve the old tree.
117              
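As a minimal sketch of the advice above, the tree can be cloned before splitting so that the original stays untouched. The input string is made up for illustration, and calling C<no_space_compacting> on the builder is an assumption about the sensible setup for hand-parsed trees rather than something this passage spells out.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::ParagraphSplit qw( split_paragraphs );

# Parse a small, made-up input ourselves (the third calling form).
my $tree = HTML::TreeBuilder->new;
$tree->no_space_compacting(1);    # assumption: keep double line breaks intact
$tree->parse("One paragraph.\n\nAnother paragraph.\n");
$tree->eof;

my $before = $tree->as_HTML;

# split_paragraphs modifies its argument in place, so work on a deep copy.
my $copy = $tree->clone;
split_paragraphs($copy);

my $after = $tree->as_HTML;       # the original tree should be unchanged

print "original: $after\n";
print "split:    ", $copy->as_HTML, "\n";

# HTML::Element trees should be deleted explicitly when no longer needed.
$_->delete for $tree, $copy;
```

Cloning costs a full copy of the tree, so it is only worth doing when the unsplit version is still needed afterwards.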
118             All forms take an optional second parameter, C<\%options>, which is a reference to a hash of options which modify the default behavior. See below for details.
119              
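The sketch below shows how the options hash might be passed, using the C<single_line_breaks_to_br> option described further down. The sample text is invented, and no particular output is claimed here.

```perl
use strict;
use warnings;
use HTML::ParagraphSplit qw( split_paragraphs );

# Made-up input: one single line break and one double line break.
my $text = "First line\nsecond line\n\nA new paragraph.\n";

# Default behavior: no options hash.
my $default = split_paragraphs($text);
print $default->as_HTML, "\n";

# Same input, with an options hash reference as the second argument.
my $with_br = split_paragraphs($text, { single_line_breaks_to_br => 1 });
print $with_br->as_HTML, "\n";

# Clean up the returned trees.
$_->delete for $default, $with_br;
```

Comparing the two printed fragments is a quick way to see what a given option changes.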
120             The first two forms perform an extra step, but are handled essentially the same after the input is parsed into an L<HTML::Element> using L<HTML::TreeBuilder>. This is done using the defaults, except that C<no_space_compacting> is set to a true value (otherwise, we lose any double returns that were in the original text). If you parse your own trees, you'll probably want to do the same.
121              
122             This method will search down the element tree, find the first node with non-implicit child nodes, and use that as the root of operations.
123              
124             The C<split_paragraphs> method then walks the tree and wraps any undecorated text node in a paragraph. Any double line break discovered will result in multiple paragraphs. Any paragraph content elements (as defined by C<%is_Possible_Strict_P_Content> of L<HTML::Tagset>) will be inserted into the paragraph elements as if they were text. Any block level tags (i.e., not in C<%is_Possible_Strict_P_Content>) cause a paragraph break immediately before and after such elements.
125              
126             Any text found within a block-level node may also be paragraphified. Those blocks of text will not be wrapped in paragraphs unless they contain a double-line break (that way we're not inserting C<E<lt>PE<gt>>-tags without an explicit need for them).
127              
128             Note also that this will insert C<E<lt>PE<gt>>-tags conservatively. If more than two line-breaks are present, even if they are mixed with other whitespace, all of that whitespace will be treated as the same paragraph break. No empty C<E<lt>PE<gt>>-tags or C<E<lt>PE<gt>>-tags containing only whitespace will be inserted (mostly). The only exception is when the whitespace is created by whitespace entities, such as C<&nbsp;>.
129              
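The whitespace rule above can be illustrated on plain text with a small regular expression. This is only a sketch of the idea, not the module's implementation: C<split_on_blank_runs> is a hypothetical helper, and it ignores markup and entities entirely.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::ParagraphSplit: split plain text
# into paragraph chunks wherever a run of whitespace contains two or more
# line breaks. The whole run counts as a single paragraph break.
sub split_on_blank_runs {
    my ($text) = @_;

    # Trim leading and trailing whitespace so no empty chunk is produced.
    $text =~ s/^\s+|\s+$//g;

    # A whitespace run that includes at least two newlines.
    my @chunks = split /\s*\n\s*\n\s*/, $text;

    # Never emit empty or whitespace-only paragraphs.
    return grep { /\S/ } @chunks;
}

my @paras = split_on_blank_runs(
    "First.\n\n \n\t\nSecond line one\nline two.\n\n\n\nThird.\n"
);
print "<p>$_</p>\n" for @paras;
```

Single line breaks inside a chunk are left alone here, which mirrors the default behavior described above.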
130             All of that is the default behavior. That behavior may be modified by the second parameter, which is used to specify options that modify that behavior.
131              
132             Here's the list of options and what they do:
133              
134             =over
135              
136             =item p_on_breaks_only =E<gt> 1
137              
138             If this option is used, then paragraphs will not be added to your text unless there is at least one double-line break. This option is used internally to make sure nested elements do not have extra C<E<lt>PE<gt>>-tags unnecessarily.
139              
140             =item single_line_breaks_to_br =E<gt> 1
141              
142             If this option is given, then single line breaks will also be converted to C<E<lt>BRE<gt>>-tags.
143              
144             =item br_only_if_can_tighten =E<gt> 1
145              
146             This option modifies the C<single_line_breaks_to_br> option by specifying that C<E<lt>BRE<gt>>-tags are not added within blocks that cannot be tightened (i.e., aren't set in C<%canTighten> of L<HTML::Tagset>). This can be useful for preventing double-line breaks from appearing inside C
-tags or C