File Coverage

blib/lib/XS/Parse/Keyword.pm
Criterion Covered Total %
statement 5 5 100.0
branch n/a
condition n/a
subroutine 2 2 100.0
pod n/a
total 7 7 100.0


line stmt bran cond sub pod time code
1             # You may distribute under the terms of either the GNU General Public License
2             # or the Artistic License (the same terms as Perl itself)
3             #
4             # (C) Paul Evans, 2021-2022 -- leonerd@leonerd.org.uk
5              
6             package XS::Parse::Keyword 0.29;
7              
8 22     22   1495784 use v5.14;
  22         352  
9 22     22   123 use warnings;
  22         50  
  22         3593  
10              
11             require XSLoader;
12             XSLoader::load( __PACKAGE__, our $VERSION );
13              
14             =head1 NAME
15              
16             C<XS::Parse::Keyword> - XS functions to assist in parsing keyword syntax
17              
18             =head1 DESCRIPTION
19              
20             This module provides some XS functions to assist in writing syntax modules
21             that provide new perl-visible syntax, primarily for authors of keyword plugins
22             using the C<PL_keyword_plugin> hook mechanism. It is unlikely to be of much
23             use to anyone else; and highly unlikely to be any use when writing perl code
24             using these. Unless you are writing a keyword plugin using XS, this module is
25             not for you.
26              
27             This module is also currently experimental, and the design is still evolving
28             and subject to change. Later versions may break ABI compatibility, requiring
29             changes or at least a rebuild of any module that depends on it.
30              
31             =cut
32              
33             =head1 XS FUNCTIONS
34              
35             =head2 boot_xs_parse_keyword
36              
37             void boot_xs_parse_keyword(double ver);
38              
39             Call this function from your C<BOOT> section in order to initialise the module
40             and parsing hooks.
41              
42             I<ver> should either be 0 or a decimal number for the module version
43             requirement; e.g.
44              
45             boot_xs_parse_keyword(0.14);
46              
47             =head2 register_xs_parse_keyword
48              
49             void register_xs_parse_keyword(const char *keyword,
50             const struct XSParseKeywordHooks *hooks, void *hookdata);
51              
52             This function installs a set of parsing hooks to be associated with the given
53             keyword. Such a keyword will then be handled automatically by a keyword parser
54             installed by C<XS::Parse::Keyword> itself.
55              
56             =cut
57              
58             =head1 PARSE HOOKS
59              
60             The C<XSParseKeywordHooks> structure provides the following hook stages, which
61             are invoked in the given order.
62              
63             =head2 flags
64              
65             The following flags are defined:
66              
67             =over 4
68              
69             =item C<XPK_FLAG_EXPR>
70              
71             The parse or build function is expected to return C<KEYWORD_PLUGIN_EXPR>.
72              
73             =item C<XPK_FLAG_STMT>
74              
75             The parse or build function is expected to return C<KEYWORD_PLUGIN_STMT>.
76              
77             These two flags are largely for the benefit of giving static information at
78             registration time to assist static parsing or other related tasks to know what
79             kind of grammatical element this keyword will produce.
80              
81             =item C<XPK_FLAG_AUTOSEMI>
82              
83             The syntax forms a complete statement, which should be followed by a statement
84             separator semicolon (C<;>). This semicolon is optional at the end of a block.
85              
86             The semicolon, if present, will be consumed automatically.
87              
88             =back
89              
90             =head2 The C<permit> Stage
91              
92             const char *permit_hintkey;
93             bool (*permit) (pTHX_ void *hookdata);
94              
95             Called by the installed keyword parser hook which is used to handle keywords
96             registered by L</register_xs_parse_keyword>.
97              
98             As a shortcut for the common case, the C<permit_hintkey> may point to a string
99             to look up from the hints hash. If the given key name is not found in the
100             hints hash then the keyword is not permitted. If the key is present then the
101             C<permit> function is invoked as normal.
102              
103             If not rejected by a hint key that was not found in the hints hash, the
104             function part of the stage is called next and should inspect whether the
105             keyword is permitted at this time perhaps by inspecting other lexical clues,
106             and return true only if the keyword is permitted.
107              
108             Both the string and the function are optional. Either or both may be present.
109             If neither is present then the keyword is always permitted - which is likely
110             not what you wanted to do.
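
The decision described above can be sketched in standalone C. This is only an
illustration under stated assumptions: C<mock_hooks>, C<hint_present> and
C<keyword_permitted> are hypothetical names, the hints hash is modelled as a
NULL-terminated array of key strings, and the real C<permit> function also
receives C<pTHX_>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified mirror of the two permit fields. */
struct mock_hooks {
    const char *permit_hintkey;
    bool (*permit)(void *hookdata);
};

/* Stand-in for a hints hash lookup: hints is a NULL-terminated key list. */
static bool hint_present(const char *const *hints, const char *key)
{
    for (size_t i = 0; hints[i] != NULL; i++)
        if (strcmp(hints[i], key) == 0)
            return true;
    return false;
}

/* Returns true only if neither the hint key nor the function rejects. */
static bool keyword_permitted(const struct mock_hooks *hooks,
                              const char *const *hints, void *hookdata)
{
    assert(hooks != NULL && hints != NULL);

    /* A hint key that is set but missing from the hints rejects at once. */
    if (hooks->permit_hintkey && !hint_present(hints, hooks->permit_hintkey))
        return false;

    /* Otherwise the function, if any, has the final say. */
    if (hooks->permit)
        return hooks->permit(hookdata);

    /* Neither present: the keyword is always permitted. */
    return true;
}

/* Example permit function: allows the keyword when *hookdata is non-zero. */
static bool permit_if_flag_set(void *hookdata)
{
    return hookdata != NULL && *(const int *)hookdata != 0;
}
```

With both fields set, the keyword is permitted only when the key is found and
the function returns true; with neither set, it is always permitted.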
111              
112             =head2 The C<check> Stage
113              
114             void (*check)(pTHX_ void *hookdata);
115              
116             Invoked once the keyword has been permitted. If present, this hook function
117             can check the surrounding lexical context, state, or other information and
118             throw an exception if it is unhappy that the keyword should apply in this
119             position.
120              
121             =head2 The C<parse> Stage
122              
123             This stage is invoked once the keyword has been checked, and actually
124             parses the incoming text into an optree. It is implemented by calling the
125             B<first> of the following function pointers which is not NULL. The invoked
126             function may optionally build an optree to represent the parsed syntax, and
127             place it into the variable addressed by C<out>. If it does not, then a simple
128             C<OP_NULL> will be constructed in its place.
129              
130             C<lex_read_space()> is called both before and after this stage is invoked, so
131             in many simple cases the hook function itself does not need to bother with it.
132              
133             int (*parse)(pTHX_ OP **out, void *hookdata);
134              
135             If present, this should consume text from the parser buffer by invoking
136             C<lex_*> or C<parse_*> functions and eventually return a C<KEYWORD_PLUGIN_*>
137             result value.
138              
139             This is the most generic and powerful of the options, but requires the most
140             implementation work.
141              
142             int (*build)(pTHX_ OP **out, XSParseKeywordPiece *args[], size_t nargs, void *hookdata);
143              
144             If C<parse> is not present, this is called instead after parsing a sequence of
145             arguments, of types given by the I<pieces> field, which should be a
146             zero-terminated array of piece types.
147              
148             This alternative is somewhat less generic and powerful than providing C<parse>
149             yourself, but involves much less parsing work and is shorter and easier to
150             implement.
151              
152             int (*build1)(pTHX_ OP **out, XSParseKeywordPiece *arg0, void *hookdata);
153              
154             If neither C<parse> nor C<build> is present, this is called as a simpler
155             variant of C<build> when only a single argument is required. It takes its type
156             from the C<piece1> field instead.
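
The "first non-NULL pointer wins" rule for this stage can be sketched as
follows. The names C<mock_parse_hooks>, C<mock_choice> and
C<select_parse_hook> are hypothetical, and the function pointers are reduced
to a single simplified signature; the real hooks take the arguments shown in
the prototypes above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified hook signature used only by this sketch. */
typedef int (*mock_hook_fn)(void);

struct mock_parse_hooks {
    mock_hook_fn parse;
    mock_hook_fn build;
    mock_hook_fn build1;
};

enum mock_choice { MOCK_NONE, MOCK_PARSE, MOCK_BUILD, MOCK_BUILD1 };

/* Picks the hook in priority order: parse, then build, then build1. */
static enum mock_choice select_parse_hook(const struct mock_parse_hooks *hooks)
{
    assert(hooks != NULL);
    if (hooks->parse)
        return MOCK_PARSE;
    if (hooks->build)
        return MOCK_BUILD;
    if (hooks->build1)
        return MOCK_BUILD1;
    return MOCK_NONE;
}

/* Placeholder hook body so that the pointers above can be non-NULL. */
static int mock_stub(void)
{
    return 0;
}
```

A table that sets both C<parse> and C<build> would therefore never have its
C<build> function called in this model.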
157              
158             =cut
159              
160             =head1 PIECES AND PIECE TYPES
161              
162             When using the C<build> or C<build1> alternatives for the C<parse> phase, the
163             actual syntax is parsed automatically by this module, according to the
164             specification given by the I<pieces> or I<piece1> field. The result of that
165             parsing step is placed into the I<args> or I<arg0> parameter to the invoked
166             function, using a C<struct> type consisting of the following fields:
167              
168             typedef struct {
169             union {
170             OP *op;
171             CV *cv;
172             SV *sv;
173             int i;
174             struct {
175             SV *name;
176             SV *value;
177             } attr;
178             PADOFFSET padix;
179             struct XSParseInfixInfo *infix;
180             };
181             int line;
182             } XSParseKeywordPiece;
183              
184             Which field of the anonymous union is set depends on the type of the piece.
185             The I<line> field contains the line number of the source file where parsing of
186             that piece began.
187              
188             Some piece types are "atomic", whose definition is self-contained. Others are
189             structural, defined in terms of inner pieces. Together these form an entire
190             tree-shaped definition of the syntax that the keyword expects to find.
191              
192             Atomic types generally provide exactly one argument into the list of I<args>
193             (with the exception of literal matches, which do not provide anything).
194             Structural types may provide an initial argument themselves, followed by a
195             list of the values of each sub-piece they contained inside them. Thus, while
196             the data structure defining the syntax shape is a tree, the argument values it
197             parses into are passed as a flat array to the C<build> function.
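
As an illustration of walking such a flat array, consider a syntax shape of
C<XPK_IDENT, XPK_REPEATED(XPK_LITERAL("and"), XPK_IDENT)>. The sketch below is
standalone C under stated assumptions: C<MockPiece> is a hypothetical
stand-in that keeps only an C<i>, a string in place of the real C<SV *>, and
C<line>, and C<collect_names> is a hypothetical helper, not part of this
module.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the piece struct: only what this sketch needs. */
typedef struct {
    int i;           /* count emitted by the structural piece */
    const char *sv;  /* a plain C string here; the real field is an SV * */
    int line;
} MockPiece;

/*
 * Walks the flat arguments for the shape
 *   XPK_IDENT, XPK_REPEATED(XPK_LITERAL("and"), XPK_IDENT)
 * and stores each name into names[]. Returns how many names were stored.
 */
static size_t collect_names(const MockPiece *args, size_t nargs,
                            const char **names, size_t max_names)
{
    assert(args != NULL && names != NULL);
    if (nargs < 2)
        return 0;

    size_t argi = 0;
    size_t count = 0;

    /* XPK_IDENT: one argument carrying the first name. */
    if (count < max_names)
        names[count++] = args[argi].sv;
    argi++;

    /* XPK_REPEATED: one argument carrying the repeat count... */
    int repeats = args[argi++].i;

    /* ...then one argument per repeat, as the literal emits nothing. */
    for (int r = 0; r < repeats && argi < nargs; r++, argi++) {
        if (count < max_names)
            names[count++] = args[argi].sv;
    }
    return count;
}
```

For input such as C<alpha and beta and gamma>, this model sees four arguments:
the first name, a count of 2, and then the two repeated names.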
198              
199             Some structural types need to be able to determine whether or not syntax
200             relating to some optional part of them is present in the incoming source text. In
201             this case, the pieces relating to those optional parts must support "probing".
202             This ability is also noted below.
203              
204             The type of each piece should be one of the following macro values.
205              
206             =head2 XPK_BLOCK
207              
208             I
209              
210             XPK_BLOCK
211              
212             A brace-delimited block of code is expected, passed as an optree in the I<op>
213             field. This will be parsed as a block within the current function scope.
214              
215             This can be probed by checking for the presence of an open-brace (C<{>)
216             character.
217              
218             Be careful defining grammars with this because an open-brace is also a valid
219             character to start a term expression, for example. Given a choice between
220             C<XPK_BLOCK> and C<XPK_TERMEXPR>, either of them could try to consume such
221             code as
222              
223             { 123, 456 }
224              
225             =head2 XPK_BLOCK_VOIDCTX, XPK_BLOCK_SCALARCTX, XPK_BLOCK_LISTCTX
226              
227             Variants of C<XPK_BLOCK> which wrap a void, scalar or list-context scope
228             around the block.
229              
230             =head2 XPK_PREFIXED_BLOCK
231              
232             I
233              
234             XPK_PREFIXED_BLOCK(pieces ...)
235              
236             Some pieces are expected, followed by a brace-delimited block of code, which
237             is passed as an optree in the I<op> field. The prefix pieces are parsed first,
238             and their results are passed before the block itself.
239              
240             The entire sequence, including the prefix items, is contained within a pair of
241             C<block_start()> / C<block_end()> calls. This permits the prefix pieces to
242             introduce new items into the lexical scope of the block - for example by the
243             use of C<XPK_LEXVAR_MY>.
244              
245             A call to C<intro_my()> is automatically made at the end of the prefix pieces,
246             before the block itself is parsed, ensuring any new lexical variables are now
247             visible.
248              
249             In addition, the following extra piece types are recognised here:
250              
251             =over 4
252              
253             =item XPK_SETUP
254              
255             void setup(pTHX_ void *hookdata);
256              
257             XPK_SETUP(&setup)
258              
259             I
260              
261             This piece type runs a function given by pointer. Typically this function may
262             be used to introduce new lexical state into the parser, or in some other way
263             have some side-effect on the parsing context of the block to be parsed.
264              
265             =back
266              
267             =head2 XPK_PREFIXED_BLOCK_ENTERLEAVE
268              
269             A variant of C<XPK_PREFIXED_BLOCK> which additionally wraps the entire parsing
270             operation, including the C<block_start()>, C<block_end()> and any calls to
271             C<XPK_SETUP> functions, within a C<ENTER>/C<LEAVE> pair.
272              
273             This should not make a difference to the standard parser pieces provided here,
274             but may be useful behaviour for the code in the setup function, especially if
275             it wishes to modify parser state and use the savestack to ensure it is
276             restored again when parsing has finished.
277              
278             =head2 XPK_ANONSUB
279              
280             I
281              
282             A brace-delimited block of code is expected, and assembled into the body of a
283             new anonymous subroutine. This will be passed as a protosub CV in the I<cv>
284             field.
285              
286             =head2 XPK_ARITHEXPR
287              
288             I
289              
290             XPK_ARITHEXPR
291              
292             An arithmetic expression is expected, parsed using C<parse_arithexpr()>, and
293             passed as an optree in the I<op> field.
294              
295             =head2 XPK_ARITHEXPR_VOIDCTX, XPK_ARITHEXPR_SCALARCTX
296              
297             Variants of C<XPK_ARITHEXPR> which put the expression in void or scalar context.
298              
299             =head2 XPK_TERMEXPR
300              
301             I
302              
303             XPK_TERMEXPR
304              
305             A term expression is expected, parsed using C<parse_termexpr()>, and passed as
306             an optree in the I<op> field.
307              
308             =head2 XPK_TERMEXPR_VOIDCTX, XPK_TERMEXPR_SCALARCTX
309              
310             Variants of C<XPK_TERMEXPR> which put the expression in void or scalar context.
311              
312             =head2 XPK_PREFIXED_TERMEXPR_ENTERLEAVE
313              
314             XPK_PREFIXED_TERMEXPR_ENTERLEAVE(pieces ...)
315              
316             A variant of C<XPK_TERMEXPR> which expects a sequence of pieces first before it
317             parses a term expression, similar to how C<XPK_PREFIXED_BLOCK_ENTERLEAVE>
318             works. The entire operation is wrapped in an C<ENTER>/C<LEAVE> pair.
319              
320             This is intended just for use of C<XPK_SETUP> pieces as prefixes. Any other
321             pieces which actually parse real input are likely to cause overly-complex,
322             subtle, or outright ambiguous grammars, and should be avoided.
323              
324             =head2 XPK_LISTEXPR
325              
326             I
327              
328             XPK_LISTEXPR
329              
330             A list expression is expected, parsed using C<parse_listexpr()>, and passed as
331             an optree in the I<op> field.
332              
333             =head2 XPK_LISTEXPR_LISTCTX
334              
335             Variant of C<XPK_LISTEXPR> which puts the expression in list context.
336              
337             =head2 XPK_IDENT, XPK_IDENT_OPT
338              
339             I
340              
341             A bareword identifier name is expected, and passed as an SV containing a PV
342             in the I<sv> field. An identifier is not permitted to contain a double colon
343             (C<::>).
344              
345             The C<_OPT>-suffixed version is optional; if no identifier is found then I<sv>
346             is set to C<NULL>.
347              
348             =head2 XPK_PACKAGENAME, XPK_PACKAGENAME_OPT
349              
350             I
351              
352             A bareword package name is expected, and passed as an SV containing a PV in
353             the I<sv> field. A package name is similar to an identifier, except it permits
354             double colons in the middle.
355              
356             The C<_OPT>-suffixed version is optional; if no package name is found then
357             I<sv> is set to C<NULL>.
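
The difference between an identifier and a package name can be sketched as
two small checks. This is a rough ASCII-only approximation written for
illustration: C<ident_length>, C<is_identifier> and C<is_package_name> are
hypothetical helpers, and the exact character rules applied by the real
parser (for example under C<use utf8>) are not reproduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the leading ASCII identifier in s, or 0 if there is none. */
static size_t ident_length(const char *s)
{
    size_t n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* An identifier is a single name with no double colon anywhere. */
static bool is_identifier(const char *s)
{
    assert(s != NULL);
    size_t n = ident_length(s);
    return n > 0 && s[n] == '\0';
}

/* A package name is one or more identifiers joined by "::" in the middle. */
static bool is_package_name(const char *s)
{
    assert(s != NULL);
    for (;;) {
        size_t n = ident_length(s);
        if (n == 0)
            return false;   /* empty segment: leading or trailing "::" */
        s += n;
        if (*s == '\0')
            return true;
        if (s[0] != ':' || s[1] != ':')
            return false;
        s += 2;
    }
}
```

Under this approximation C<Foo::Bar> is accepted only as a package name,
while C<Foo> is accepted by both checks.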
358              
359             =head2 XPK_LEXVARNAME
360              
361             I
362              
363             XPK_LEXVARNAME(kind)
364              
365             A lexical variable name is expected, and passed as an SV containing a PV in
366             the I<sv> field. The C<kind> argument specifies what kinds of variable are
367             permitted, and should be a bitmask of one or more bits from
368             C<XPK_LEXVAR_SCALAR>, C<XPK_LEXVAR_ARRAY> and C<XPK_LEXVAR_HASH>. A convenient
369             shortcut C<XPK_LEXVAR_ANY> permits all three.
370              
371             =head2 XPK_ATTRIBUTES
372              
373             I
374              
375             A list of C<:>-prefixed attributes is expected, in the same format as sub or
376             variable attributes. An optional leading C<:> indicates the presence of
377             attributes, then one or more of them are parsed. Attributes may be optionally
378             separated by additional C<:>s, but this is not required.
379              
380             Each attribute is expected to be an identifier name, followed by an optional
381             value wrapped in parentheses. Whitespace is B<NOT> permitted between the name
382             and value, as per standard Perl parsing rules.
383              
384             :attrname
385             :attrname(value)
386              
387             The I<i> field indicates how many attributes were found. That number of
388             additional arguments are then passed, each containing two SVs in the
389             I<attr.name> and I<attr.value> fields. This number may be zero.
390              
391             It is not an error for there to be no attributes present, or for the optional
392             colon to be missing. In this case I<i> will be set to zero.
393              
394             =head2 XPK_VSTRING, XPK_VSTRING_OPT
395              
396             I
397              
398             A version string is expected, of the form C<v1.234> including the leading C<v>
399             character. It is passed as a L<version> SV object in the I<sv> field.
400              
401             The C<_OPT>-suffixed version is optional; if no version string is found then
402             I<sv> is set to C<NULL>.
403              
404             =head2 XPK_LEXVAR_MY
405              
406             I
407              
408             XPK_LEXVAR_MY(kind)
409              
410             A lexical variable name is expected, added to the current pad as if specified
411             in a C<my> expression, and passed as the pad index in the I<padix> field.
412              
413             The C<kind> argument specifies what kinds of variable are permitted, as per
414             C<XPK_LEXVARNAME>.
415              
416             =head2 XPK_COMMA, XPK_COLON, XPK_EQUALS
417              
418             I
419              
420             A literal character (C<,>, C<:> or C<=>) is expected. No argument value is passed.
421              
422             =head2 XPK_AUTOSEMI
423              
424             I
425              
426             A literal semicolon (C<;>) as a statement terminator is optionally expected.
427             If the next token is a closing brace to indicate the end of a block, then a
428             semicolon is not required. If anything else is encountered an error will be
429             raised.
430              
431             This piece type is the same as specifying the C<XPK_FLAG_AUTOSEMI> flag. It is
432             useful to put at the end of a sequence that forms part of a choice of syntax,
433             where some forms indicate a statement ending in a semicolon, whereas others
434             may end in a full block that does not need one.
435              
436             =head2 XPK_INFIX_*
437              
438             I
439              
440             An infix operator as recognised by L<XS::Parse::Infix>. The returned pointer
441             points to a structure allocated by C<XS::Parse::Infix> describing the
442             operator.
443              
444             Various versions of the macro are provided, each using a different selection
445             filter to choose certain available infix operators:
446              
447             XPK_INFIX_RELATION # any relational operator
448             XPK_INFIX_EQUALITY # an equality operator like `==` or `eq`
449             XPK_INFIX_MATCH_NOSMART # any sort of "match"-like operator, except smartmatch
450             XPK_INFIX_MATCH_SMART # XPK_INFIX_MATCH_NOSMART plus smartmatch
451              
452             =head2 XPK_LITERAL
453              
454             I
455              
456             XPK_LITERAL("literal")
457              
458             A literal string match is expected. No argument value is passed.
459              
460             This form should generally be avoided if at all possible, because it is very
461             easy to abuse to make syntaxes which confuse humans and code tools alike.
462             Generally it is best reserved just for the first component of a
463             C<XPK_OPTIONAL> or C<XPK_REPEATED> sequence, to provide a "secondary keyword"
464             that such a repeated item can look out for.
465              
466             =head2 XPK_KEYWORD
467              
468             I
469              
470             XPK_KEYWORD("keyword")
471              
472             A literal string match is expected. No argument value is passed.
473              
474             This is similar to C<XPK_LITERAL> except that it additionally checks that the
475             following character is not an identifier character. This ensures that the
476             expected keyword-like behaviour is preserved. For example, given the input
477             C<"keyword">, the piece C<XPK_LITERAL("key")> would match it, whereas
478             C<XPK_KEYWORD("key")> would not because of the subsequent C<"w"> character.
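
The distinction between the two matching rules can be sketched in standalone
C. The helpers C<match_literal> and C<match_keyword> are hypothetical and
operate on plain C strings rather than the parser buffer; "identifier
character" is approximated here as an ASCII letter, digit or underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Literal match: the input merely has to begin with the given string. */
static bool match_literal(const char *input, const char *lit)
{
    assert(input != NULL && lit != NULL);
    return strncmp(input, lit, strlen(lit)) == 0;
}

/* Keyword match: as above, and the next character must end the word. */
static bool match_keyword(const char *input, const char *kw)
{
    if (!match_literal(input, kw))
        return false;

    /* Safe to index: the prefix matched, so this is at most the NUL. */
    unsigned char next = (unsigned char)input[strlen(kw)];
    return !(isalnum(next) || next == '_');
}
```

Given the input C<"keyword">, the literal rule accepts C<"key"> while the
keyword rule rejects it; given C<"key word">, both accept.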
479              
480             =head2 XPK_SEQUENCE
481              
482             I
483              
484             XPK_SEQUENCE(pieces ...)
485              
486             A structural type which contains a number of pieces. This is normally
487             equivalent to simply placing the pieces in sequence inside their own
488             container, but it is useful inside C<XPK_CHOICE> or C<XPK_TAGGEDCHOICE>.
489              
490             An C<XPK_SEQUENCE> supports probe if its first contained piece does; i.e. it
491             is transparent to probing.
492              
493             =head2 XPK_OPTIONAL
494              
495             I
496              
497             XPK_OPTIONAL(pieces ...)
498              
499             A structural type which may expect to find its contained pieces, or is happy
500             not to. This will pass an argument whose I<i> field contains either 1 or 0,
501             depending whether the contents were found. The first piece type within must
502             support probe.
503              
504             =head2 XPK_REPEATED
505              
506             I
507              
508             XPK_REPEATED(pieces ...)
509              
510             A structural type which expects to find zero or more repeats of its contained
511             pieces. This will pass an argument whose I<i> field contains the count of the
512             number of repeats it found. The first piece type within must support probe.
513              
514             =head2 XPK_CHOICE
515              
516             I
517              
518             XPK_CHOICE(options ...)
519              
520             A structural type which expects to find one of a number of alternative
521             options. An ordered list of types is provided, all of which must support
522             probe. This will pass an argument whose I<i> field gives the index of the
523             first choice that was accepted. The first option takes the value 0.
524              
525             As each of the options is interpreted as an alternative, not a sequence, you
526             should use C<XPK_SEQUENCE> if a sequence of multiple items should be
527             considered as a single alternative.
528              
529             It is not an error if no choice matches. At that point, the I<i> field will be
530             set to -1.
531              
532             If you require a failure message in this case, set the final choice to be of
533             type C<XPK_FAILURE>. This will cause an error message to be printed instead.
534              
535             XPK_FAILURE("message string")
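
The index rule for a choice can be sketched as a loop over probe functions.
This is a standalone model only: C<mock_probe_fn>, C<first_matching_choice>
and the two example probes are hypothetical, and each probe looks at a plain
C string instead of the parser buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: reports whether an option could start at input. */
typedef bool (*mock_probe_fn)(const char *input);

/*
 * Tries each option in order and returns the index of the first one whose
 * probe accepts the input, or -1 when none of them does.
 */
static int first_matching_choice(const mock_probe_fn *options,
                                 size_t noptions, const char *input)
{
    assert(options != NULL && input != NULL);
    for (size_t idx = 0; idx < noptions; idx++) {
        if (options[idx](input))
            return (int)idx;
    }
    return -1;
}

/* Example probes: a brace-delimited form and a parenthesised form. */
static bool probe_brace(const char *input)
{
    return input[0] == '{';
}

static bool probe_paren(const char *input)
{
    return input[0] == '(';
}
```

Because options are tried in order, an earlier option that accepts the input
hides any later option that would also have accepted it.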
536              
537             =head2 XPK_TAGGEDCHOICE
538              
539             I
540              
541             XPK_TAGGEDCHOICE(choice, tag, ...)
542              
543             A structural type similar to C<XPK_CHOICE>, except that each choice type is
544             followed by an element of type C<XPK_TAG> which gives an integer. It is that
545             integer value, rather than the positional index of the choice within the list,
546             which is passed in the I<i> field.
547              
548             XPK_TAG(value)
549              
550             As each of the options is interpreted as an alternative, not a sequence, you
551             should use C<XPK_SEQUENCE> if a sequence of multiple items should be
552             considered as a single alternative.
553              
554             =head2 XPK_COMMALIST
555              
556             I
557              
558             XPK_COMMALIST(pieces ...)
559              
560             A structural type which expects to find one or more repeats of its contained
561             pieces, separated by literal comma (C<,>) characters. This is somewhat similar
562             to C<XPK_REPEATED>, except that it needs at least one copy, needs commas
563             between its items, but does not require that the first contained piece support
564             probe (the comma itself is sufficient to indicate a repeat).
565              
566             An C<XPK_COMMALIST> supports probe if its first contained piece does; i.e. it
567             is transparent to probing.
568              
569             =head2 XPK_PARENSCOPE
570              
571             I
572              
573             XPK_PARENSCOPE(pieces ...)
574              
575             A structural type which expects to find a sequence of pieces, all contained in
576             parentheses as C<( ... )>. This will pass no extra arguments.
577              
578             =head2 XPK_ARGSCOPE
579              
580             I
581              
582             XPK_ARGSCOPE(pieces ...)
583              
584             A structural type similar to C<XPK_PARENSCOPE>, except that the parentheses
585             themselves are optional; much like Perl's parsing of calls to known functions.
586              
587             If parentheses are encountered in the input, they will be consumed by this
588             piece and it will behave identically to C<XPK_PARENSCOPE>. If there is no open
589             parenthesis, this piece will behave like C<XPK_SEQUENCE> and consume all the
590             pieces inside it, without expecting a closing parenthesis.
591              
592             =head2 XPK_BRACKETSCOPE
593              
594             I
595              
596             XPK_BRACKETSCOPE(pieces ...)
597              
598             A structural type which expects to find a sequence of pieces, all contained in
599             square brackets as C<[ ... ]>. This will pass no extra arguments.
600              
601             =head2 XPK_BRACESCOPE
602              
603             I
604              
605             XPK_BRACESCOPE(pieces ...)
606              
607             A structural type which expects to find a sequence of pieces, all contained in
608             braces as C<{ ... }>. This will pass no extra arguments.
609              
610             Note that this is not necessary to use with C<XPK_BLOCK> or C<XPK_ANONSUB>;
611             those will already consume a set of braces. This is intended for special
612             constrained syntax that should not just accept an arbitrary block.
613              
614             =head2 XPK_CHEVRONSCOPE
615              
616             I
617              
618             XPK_CHEVRONSCOPE(pieces ...)
619              
620             A structural type which expects to find a sequence of pieces, all contained in
621             angle brackets as C<< < ... > >>. This will pass no extra arguments.
622              
623             Remember that expressions like C<< a > b >> are valid term expressions, so the
624             contents of this scope shouldn't allow arbitrary expressions or the closing
625             bracket will be ambiguous.
626              
627             =head2 XPK_PARENSCOPE_OPT, XPK_BRACKETSCOPE_OPT, XPK_BRACESCOPE_OPT, XPK_CHEVRONSCOPE_OPT
628              
629             I
630              
631             XPK_PARENSCOPE_OPT(pieces ...)
632             XPK_BRACKETSCOPE_OPT(pieces ...)
633             XPK_BRACESCOPE_OPT(pieces ...)
634             XPK_CHEVRONSCOPE_OPT(pieces ...)
635              
636             Each of the four C<XPK_...SCOPE> macros above has an optional variant, whose
637             name is suffixed by C<_OPT>. These pass an argument whose I<i> field is either
638             true or false, indicating whether the scope was found, followed by the values
639             from the scope itself.
640              
641             This is a convenient shortcut to nesting the scope within a C<XPK_OPTIONAL>
642             macro.
643              
644             =cut
645              
646             =head1 AUTHOR
647              
648             Paul Evans <leonerd@leonerd.org.uk>
649              
650             =cut
651              
652             0x55AA;