author    | Stuart Henderson <sthen@cvs.openbsd.org> | 2013-03-25 20:09:46 +0000
committer | Stuart Henderson <sthen@cvs.openbsd.org> | 2013-03-25 20:09:46 +0000
commit    | 6eda9412f8dc717200dec22c57afdd2a46edfa8e (patch)
tree      | c835d11c3e3f90de6a8561320ec54f077bfd358f /gnu/usr.bin/perl/pod/perlinterp.pod
parent    | 19adbb7d6dbd1516758cad91978165088a19e917 (diff)
import perl 5.16.3 from CPAN - worked on by Andrew Fresh and myself
Diffstat (limited to 'gnu/usr.bin/perl/pod/perlinterp.pod')
-rw-r--r-- | gnu/usr.bin/perl/pod/perlinterp.pod | 742
1 file changed, 742 insertions, 0 deletions
diff --git a/gnu/usr.bin/perl/pod/perlinterp.pod b/gnu/usr.bin/perl/pod/perlinterp.pod new file mode 100644 index 00000000000..c7f21209de5 --- /dev/null +++ b/gnu/usr.bin/perl/pod/perlinterp.pod @@ -0,0 +1,742 @@ +=encoding utf8 + +=for comment +Consistent formatting of this file is achieved with: + perl ./Porting/podtidy pod/perlinterp.pod + +=head1 NAME + +perlinterp - An overview of the Perl interpreter + +=head1 DESCRIPTION + +This document provides an overview of how the Perl interpreter works at +the level of C code, along with pointers to the relevant C source code +files. + +=head1 ELEMENTS OF THE INTERPRETER + +The work of the interpreter has two main stages: compiling the code +into the internal representation, or bytecode, and then executing it. +L<perlguts/Compiled code> explains exactly how the compilation stage +happens. + +Here is a short breakdown of perl's operation: + +=head2 Startup + +The action begins in F<perlmain.c>. (or F<miniperlmain.c> for miniperl) +This is very high-level code, enough to fit on a single screen, and it +resembles the code found in L<perlembed>; most of the real action takes +place in F<perl.c> + +F<perlmain.c> is generated by C<ExtUtils::Miniperl> from +F<miniperlmain.c> at make time, so you should make perl to follow this +along. + +First, F<perlmain.c> allocates some memory and constructs a Perl +interpreter, along these lines: + + 1 PERL_SYS_INIT3(&argc,&argv,&env); + 2 + 3 if (!PL_do_undump) { + 4 my_perl = perl_alloc(); + 5 if (!my_perl) + 6 exit(1); + 7 perl_construct(my_perl); + 8 PL_perl_destruct_level = 0; + 9 } + +Line 1 is a macro, and its definition is dependent on your operating +system. Line 3 references C<PL_do_undump>, a global variable - all +global variables in Perl start with C<PL_>. This tells you whether the +current running program was created with the C<-u> flag to perl and +then F<undump>, which means it's going to be false in any sane context. + +Line 4 calls a function in F<perl.c> to allocate memory for a Perl +interpreter. It's quite a simple function, and the guts of it looks +like this: + + my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter)); + +Here you see an example of Perl's system abstraction, which we'll see +later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's +own C<malloc> as defined in F<malloc.c> if you selected that option at +configure time. + +Next, in line 7, we construct the interpreter using perl_construct, +also in F<perl.c>; this sets up all the special variables that Perl +needs, the stacks, and so on. + +Now we pass Perl the command line options, and tell it to go: + + exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL); + if (!exitstatus) + perl_run(my_perl); + + exitstatus = perl_destruct(my_perl); + + perl_free(my_perl); + +C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined +in F<perl.c>, which processes the command line options, sets up any +statically linked XS modules, opens the program and calls C<yyparse> to +parse it. + +=head2 Parsing + +The aim of this stage is to take the Perl source, and turn it into an +op tree. We'll see what one of those looks like later. Strictly +speaking, there's three things going on here. + +C<yyparse>, the parser, lives in F<perly.c>, although you're better off +reading the original YACC input in F<perly.y>. (Yes, Virginia, there +B<is> a YACC grammar for Perl!) 
The job of the parser is to take your +code and "understand" it, splitting it into sentences, deciding which +operands go with which operators and so on. + +The parser is nobly assisted by the lexer, which chunks up your input +into tokens, and decides what type of thing each token is: a variable +name, an operator, a bareword, a subroutine, a core function, and so +on. The main point of entry to the lexer is C<yylex>, and that and its +associated routines can be found in F<toke.c>. Perl isn't much like +other computer languages; it's highly context sensitive at times, it +can be tricky to work out what sort of token something is, or where a +token ends. As such, there's a lot of interplay between the tokeniser +and the parser, which can get pretty frightening if you're not used to +it. + +As the parser understands a Perl program, it builds up a tree of +operations for the interpreter to perform during execution. The +routines which construct and link together the various operations are +to be found in F<op.c>, and will be examined later. + +=head2 Optimization + +Now the parsing stage is complete, and the finished tree represents the +operations that the Perl interpreter needs to perform to execute our +program. Next, Perl does a dry run over the tree looking for +optimisations: constant expressions such as C<3 + 4> will be computed +now, and the optimizer will also see if any multiple operations can be +replaced with a single one. For instance, to fetch the variable +C<$foo>, instead of grabbing the glob C<*foo> and looking at the scalar +component, the optimizer fiddles the op tree to use a function which +directly looks up the scalar in question. The main optimizer is C<peep> +in F<op.c>, and many ops have their own optimizing functions. + +=head2 Running + +Now we're finally ready to go: we have compiled Perl byte code, and all +that's left to do is run it. The actual execution is done by the +C<runops_standard> function in F<run.c>; more specifically, it's done +by these three innocent looking lines: + + while ((PL_op = PL_op->op_ppaddr(aTHX))) { + PERL_ASYNC_CHECK(); + } + +You may be more comfortable with the Perl version of that: + + PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}}; + +Well, maybe not. Anyway, each op contains a function pointer, which +stipulates the function which will actually carry out the operation. +This function will return the next op in the sequence - this allows for +things like C<if> which choose the next op dynamically at run time. The +C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt +execution if required. + +The actual functions called are known as PP code, and they're spread +between four files: F<pp_hot.c> contains the "hot" code, which is most +often used and highly optimized, F<pp_sys.c> contains all the +system-specific functions, F<pp_ctl.c> contains the functions which +implement control structures (C<if>, C<while> and the like) and F<pp.c> +contains everything else. These are, if you like, the C code for Perl's +built-in functions and operators. + +Note that each C<pp_> function is expected to return a pointer to the +next op. Calls to perl subs (and eval blocks) are handled within the +same runops loop, and do not consume extra space on the C stack. For +example, C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or +C<CxEVAL> block struct onto the context stack which contain the address +of the op following the sub call or eval. 
They then return the first op +of that sub or eval block, and so execution continues of that sub or +block. Later, a C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> +or C<CxEVAL>, retrieves the return op from it, and returns it. + +=head2 Exception handing + +Perl's exception handing (i.e. C<die> etc.) is built on top of the +low-level C<setjmp()>/C<longjmp()> C-library functions. These basically +provide a way to capture the current PC and SP registers and later +restore them; i.e. a C<longjmp()> continues at the point in code where +a previous C<setjmp()> was done, with anything further up on the C +stack being lost. This is why code should always save values using +C<SAVE_FOO> rather than in auto variables. + +The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and +C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and +C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while +C<die> within C<eval> does a C<JMPENV_JUMP(3)>. + +At entry points to perl, such as C<perl_parse()>, C<perl_run()> and +C<call_sv(cv, G_EVAL)> each does a C<JMPENV_PUSH>, then enter a runops +loop or whatever, and handle possible exception returns. For a 2 +return, final cleanup is performed, such as popping stacks and calling +C<CHECK> or C<END> blocks. Amongst other things, this is how scope +cleanup still occurs during an C<exit>. + +If a C<die> can find a C<CxEVAL> block on the context stack, then the +stack is popped to that level and the return op in that block is +assigned to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. +This normally passes control back to the guard. In the case of +C<perl_run> and C<call_sv>, a non-null C<PL_restartop> triggers +re-entry to the runops loop. The is the normal way that C<die> or +C<croak> is handled within an C<eval>. + +Sometimes ops are executed within an inner runops loop, such as tie, +sort or overload code. In this case, something like + + sub FETCH { eval { die } } + +would cause a longjmp right back to the guard in C<perl_run>, popping +both runops loops, which is clearly incorrect. One way to avoid this is +for the tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in +the inner runops loop, but for efficiency reasons, perl in fact just +sets a flag, using C<CATCH_SET(TRUE)>. The C<pp_require>, +C<pp_entereval> and C<pp_entertry> ops check this flag, and if true, +they call C<docatch>, which does a C<JMPENV_PUSH> and starts a new +runops level to execute the code, rather than doing it on the current +loop. + +As a further optimisation, on exit from the eval block in the C<FETCH>, +execution of the code following the block is still carried on in the +inner loop. When an exception is raised, C<docatch> compares the +C<JMPENV> level of the C<CxEVAL> with C<PL_top_env> and if they differ, +just re-throws the exception. In this way any inner loops get popped. + +Here's an example. + + 1: eval { tie @a, 'A' }; + 2: sub A::TIEARRAY { + 3: eval { die }; + 4: die; + 5: } + +To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> +then enters a runops loop. This loop executes the eval and tie ops on +line 1, with the eval pushing a C<CxEVAL> onto the context stack. + +The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops +loop to execute the body of C<TIEARRAY>. When it executes the entertry +op on line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> +which does a C<JMPENV_PUSH> and starts a third runops loop, which then +executes the die op. 
At this point the C call stack looks like this: + + Perl_pp_die + Perl_runops # third loop + S_docatch_body + S_docatch + Perl_pp_entertry + Perl_runops # second loop + S_call_body + Perl_call_sv + Perl_pp_tie + Perl_runops # first loop + S_run_body + perl_run + main + +and the context and data stacks, as shown by C<-Dstv>, look like: + + STACK 0: MAIN + CX 0: BLOCK => + CX 1: EVAL => AV() PV("A"\0) + retop=leave + STACK 1: MAGIC + CX 0: SUB => + retop=(null) + CX 1: EVAL => * + retop=nextstate + +The die pops the first C<CxEVAL> off the context stack, sets +C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns +to the top C<docatch>. This then starts another third-level runops +level, which executes the nextstate, pushmark and die ops on line 4. At +the point that the second C<pp_die> is called, the C call stack looks +exactly like that above, even though we are no longer within an inner +eval; this is because of the optimization mentioned earlier. However, +the context stack now looks like this, ie with the top CxEVAL popped: + + STACK 0: MAIN + CX 0: BLOCK => + CX 1: EVAL => AV() PV("A"\0) + retop=leave + STACK 1: MAGIC + CX 0: SUB => + retop=(null) + +The die on line 4 pops the context stack back down to the CxEVAL, +leaving it as: + + STACK 0: MAIN + CX 0: BLOCK => + +As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a +C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch: + + S_docatch + Perl_pp_entertry + Perl_runops # second loop + S_call_body + Perl_call_sv + Perl_pp_tie + Perl_runops # first loop + S_run_body + perl_run + main + +In this case, because the C<JMPENV> level recorded in the C<CxEVAL> +differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)> +and the C stack unwinds to: + + perl_run + main + +Because C<PL_restartop> is non-null, C<run_body> starts a new runops +loop and execution continues. + +=head2 INTERNAL VARIABLE TYPES + +You should by now have had a look at L<perlguts>, which tells you about +Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do +that now. + +These variables are used not only to represent Perl-space variables, +but also any constants in the code, as well as some structures +completely internal to Perl. The symbol table, for instance, is an +ordinary Perl hash. Your code is represented by an SV as it's read into +the parser; any program files you call are opened via ordinary Perl +filehandles, and so on. + +The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a +Perl program. Let's see, for instance, how Perl treats the constant +C<"hello">. + + % perl -MDevel::Peek -e 'Dump("hello")' + 1 SV = PV(0xa041450) at 0xa04ecbc + 2 REFCNT = 1 + 3 FLAGS = (POK,READONLY,pPOK) + 4 PV = 0xa0484e0 "hello"\0 + 5 CUR = 5 + 6 LEN = 6 + +Reading C<Devel::Peek> output takes a bit of practise, so let's go +through it line by line. + +Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in +memory. SVs themselves are very simple structures, but they contain a +pointer to a more complex structure. In this case, it's a PV, a +structure which holds a string value, at location C<0xa041450>. Line 2 +is the reference count; there are no other references to this data, so +it's 1. + +Line 3 are the flags for this SV - it's OK to use it as a PV, it's a +read-only SV (because it's a constant) and the data is a PV internally. +Next we've got the contents of the string, starting at location +C<0xa0484e0>. 
+ +Line 5 gives us the current length of the string - note that this does +B<not> include the null terminator. Line 6 is not the length of the +string, but the length of the currently allocated buffer; as the string +grows, Perl automatically extends the available storage via a routine +called C<SvGROW>. + +You can get at any of these quantities from C very easily; just add +C<Sv> to the name of the field shown in the snippet, and you've got a +macro which will return the value: C<SvCUR(sv)> returns the current +length of the string, C<SvREFCOUNT(sv)> returns the reference count, +C<SvPV(sv, len)> returns the string itself with its length, and so on. +More macros to manipulate these properties can be found in L<perlguts>. + +Let's take an example of manipulating a PV, from C<sv_catpvn>, in +F<sv.c> + + 1 void + 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len) + 3 { + 4 STRLEN tlen; + 5 char *junk; + + 6 junk = SvPV_force(sv, tlen); + 7 SvGROW(sv, tlen + len + 1); + 8 if (ptr == junk) + 9 ptr = SvPVX(sv); + 10 Move(ptr,SvPVX(sv)+tlen,len,char); + 11 SvCUR(sv) += len; + 12 *SvEND(sv) = '\0'; + 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */ + 14 SvTAINT(sv); + 15 } + +This is a function which adds a string, C<ptr>, of length C<len> onto +the end of the PV stored in C<sv>. The first thing we do in line 6 is +make sure that the SV B<has> a valid PV, by calling the C<SvPV_force> +macro to force a PV. As a side effect, C<tlen> gets set to the current +value of the PV, and the PV itself is returned to C<junk>. + +In line 7, we make sure that the SV will have enough room to +accommodate the old string, the new string and the null terminator. If +C<LEN> isn't big enough, C<SvGROW> will reallocate space for us. + +Now, if C<junk> is the same as the string we're trying to add, we can +grab the string directly from the SV; C<SvPVX> is the address of the PV +in the SV. + +Line 10 does the actual catenation: the C<Move> macro moves a chunk of +memory around: we move the string C<ptr> to the end of the PV - that's +the start of the PV plus its current length. We're moving C<len> bytes +of type C<char>. After doing so, we need to tell Perl we've extended +the string, by altering C<CUR> to reflect the new length. C<SvEND> is a +macro which gives us the end of the string, so that needs to be a +C<"\0">. + +Line 13 manipulates the flags; since we've changed the PV, any IV or NV +values will no longer be valid: if we have C<$a=10; $a.="6";> we don't +want to use the old IV of 10. C<SvPOK_only_utf8> is a special +UTF-8-aware version of C<SvPOK_only>, a macro which turns off the IOK +and NOK flags and turns on POK. The final C<SvTAINT> is a macro which +launders tainted data if taint mode is turned on. + +AVs and HVs are more complicated, but SVs are by far the most common +variable type being thrown around. Having seen something of how we +manipulate these, let's go on and look at how the op tree is +constructed. + +=head1 OP TREES + +First, what is the op tree, anyway? The op tree is the parsed +representation of your program, as we saw in our section on parsing, +and it's the sequence of operations that Perl goes through to execute +your program, as we saw in L</Running>. + +An op is a fundamental operation that Perl can perform: all the +built-in functions and operators are ops, and there are a series of ops +which deal with concepts the interpreter needs internally - entering +and leaving a block, ending a statement, fetching a variable, and so +on. 
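You can also poke at these ops from Perl space using the core C<B> module, which exposes the interpreter's internal structures as Perl objects. Here is a minimal sketch - illustrative only, relying on nothing beyond the documented C<B> interface - which starts at C<PL_main_start> and follows each op's C<op_next> pointer (described below), printing every op's class and name:

    use B ();

    # Walk the main program's ops in execution order: start at
    # PL_main_start and keep following the op_next chain.
    my $op = B::main_start();
    while ($$op) {      # a B op object holds the op's address; 0 means "no op"
        printf "%-8s %s\n", B::class($op), $op->name;
        $op = $op->next;
    }

Run as a complete program, this prints the ops of the program itself - including those of the loop doing the printing - in execution order, which is the same chain that the C<exec> option to C<B::Terse> displays later in this document.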
+ +The op tree is connected in two ways: you can imagine that there are +two "routes" through it, two orders in which you can traverse the tree. +First, parse order reflects how the parser understood the code, and +secondly, execution order tells perl what order to perform the +operations in. + +The easiest way to examine the op tree is to stop Perl after it has +finished parsing, and get it to dump out the tree. This is exactly what +the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise> +and L<B::Debug|B::Debug> do. + +Let's have a look at how Perl sees C<$a = $b + $c>: + + % perl -MO=Terse -e '$a=$b+$c' + 1 LISTOP (0x8179888) leave + 2 OP (0x81798b0) enter + 3 COP (0x8179850) nextstate + 4 BINOP (0x8179828) sassign + 5 BINOP (0x8179800) add [1] + 6 UNOP (0x81796e0) null [15] + 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b + 8 UNOP (0x81797e0) null [15] + 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c + 10 UNOP (0x816b4f0) null [15] + 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a + +Let's start in the middle, at line 4. This is a BINOP, a binary +operator, which is at location C<0x8179828>. The specific operator in +question is C<sassign> - scalar assignment - and you can find the code +which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a +binary operator, it has two children: the add operator, providing the +result of C<$b+$c>, is uppermost on line 5, and the left hand side is +on line 10. + +Line 10 is the null op: this does exactly nothing. What is that doing +there? If you see the null op, it's a sign that something has been +optimized away after parsing. As we mentioned in L</Optimization>, the +optimization stage sometimes converts two operations into one, for +example when fetching a scalar variable. When this happens, instead of +rewriting the op tree and cleaning up the dangling pointers, it's +easier just to replace the redundant operation with the null op. +Originally, the tree would have looked like this: + + 10 SVOP (0x816b4f0) rv2sv [15] + 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a + +That is, fetch the C<a> entry from the main symbol table, and then look +at the scalar component of it: C<gvsv> (C<pp_gvsv> into F<pp_hot.c>) +happens to do both these things. + +The right hand side, starting at line 5 is similar to what we've just +seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add +together two C<gvsv>s. + +Now, what's this about? + + 1 LISTOP (0x8179888) leave + 2 OP (0x81798b0) enter + 3 COP (0x8179850) nextstate + +C<enter> and C<leave> are scoping ops, and their job is to perform any +housekeeping every time you enter and leave a block: lexical variables +are tidied up, unreferenced variables are destroyed, and so on. Every +program will have those first three lines: C<leave> is a list, and its +children are all the statements in the block. Statements are delimited +by C<nextstate>, so a block is a collection of C<nextstate> ops, with +the ops to be performed for each statement being the children of +C<nextstate>. C<enter> is a single op which functions as a marker. + +That's how Perl parsed the program, from top to bottom: + + Program + | + Statement + | + = + / \ + / \ + $a + + / \ + $b $c + +However, it's impossible to B<perform> the operations in this order: +you have to find the values of C<$b> and C<$c> before you add them +together, for instance. 
So, the other thread that runs through the op +tree is the execution order: each op has a field C<op_next> which +points to the next op to be run, so following these pointers tells us +how perl executes the code. We can traverse the tree in this order +using the C<exec> option to C<B::Terse>: + + % perl -MO=Terse,exec -e '$a=$b+$c' + 1 OP (0x8179928) enter + 2 COP (0x81798c8) nextstate + 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b + 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c + 5 BINOP (0x8179878) add [1] + 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a + 7 BINOP (0x81798a0) sassign + 8 LISTOP (0x8179900) leave + +This probably makes more sense for a human: enter a block, start a +statement. Get the values of C<$b> and C<$c>, and add them together. +Find C<$a>, and assign one to the other. Then leave. + +The way Perl builds up these op trees in the parsing process can be +unravelled by examining F<perly.y>, the YACC grammar. Let's take the +piece we need to construct the tree for C<$a = $b + $c> + + 1 term : term ASSIGNOP term + 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); } + 3 | term ADDOP term + 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } + +If you're not used to reading BNF grammars, this is how it works: +You're fed certain things by the tokeniser, which generally end up in +upper case. Here, C<ADDOP>, is provided when the tokeniser sees C<+> in +your code. C<ASSIGNOP> is provided when C<=> is used for assigning. +These are "terminal symbols", because you can't get any simpler than +them. + +The grammar, lines one and three of the snippet above, tells you how to +build up more complex forms. These complex forms, "non-terminal +symbols" are generally placed in lower case. C<term> here is a +non-terminal symbol, representing a single expression. + +The grammar gives you the following rule: you can make the thing on the +left of the colon if you see all the things on the right in sequence. +This is called a "reduction", and the aim of parsing is to completely +reduce the input. There are several different ways you can perform a +reduction, separated by vertical bars: so, C<term> followed by C<=> +followed by C<term> makes a C<term>, and C<term> followed by C<+> +followed by C<term> can also make a C<term>. + +So, if you see two terms with an C<=> or C<+>, between them, you can +turn them into a single expression. When you do this, you execute the +code in the block on the next line: if you see C<=>, you'll do the code +in line 2. If you see C<+>, you'll do the code in line 4. It's this +code which contributes to the op tree. + + | term ADDOP term + { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } + +What this does is creates a new binary op, and feeds it a number of +variables. The variables refer to the tokens: C<$1> is the first token +in the input, C<$2> the second, and so on - think regular expression +backreferences. C<$$> is the op returned from this reduction. So, we +call C<newBINOP> to create a new binary operator. The first parameter +to C<newBINOP>, a function in F<op.c>, is the op type. It's an addition +operator, so we want the type to be C<ADDOP>. We could specify this +directly, but it's right there as the second token in the input, so we +use C<$2>. The second parameter is the op's flags: 0 means "nothing +special". Then the things to add: the left and right hand side of our +expression, in scalar context. + +=head1 STACKS + +When perl executes something like C<addop>, how does it pass on its +results to the next op? The answer is, through the use of stacks. 
Perl +has a number of stacks to store things it's currently working on, and +we'll look at the three most important ones here. + +=head2 Argument stack + +Arguments are passed to PP code and returned from PP code using the +argument stack, C<ST>. The typical way to handle arguments is to pop +them off the stack, deal with them how you wish, and then push the +result back onto the stack. This is how, for instance, the cosine +operator works: + + NV value; + value = POPn; + value = Perl_cos(value); + XPUSHn(value); + +We'll see a more tricky example of this when we consider Perl's macros +below. C<POPn> gives you the NV (floating point value) of the top SV on +the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and +push the result back as an NV. The C<X> in C<XPUSHn> means that the +stack should be extended if necessary - it can't be necessary here, +because we know there's room for one more item on the stack, since +we've just removed one! The C<XPUSH*> macros at least guarantee safety. + +Alternatively, you can fiddle with the stack directly: C<SP> gives you +the first element in your portion of the stack, and C<TOP*> gives you +the top SV/IV/NV/etc. on the stack. So, for instance, to do unary +negation of an integer: + + SETi(-TOPi); + +Just set the integer value of the top stack entry to its negation. + +Argument stack manipulation in the core is exactly the same as it is in +XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer +description of the macros used in stack manipulation. + +=head2 Mark stack + +I say "your portion of the stack" above because PP code doesn't +necessarily get the whole stack to itself: if your function calls +another function, you'll only want to expose the arguments aimed for +the called function, and not (necessarily) let it get at your own data. +The way we do this is to have a "virtual" bottom-of-stack, exposed to +each function. The mark stack keeps bookmarks to locations in the +argument stack usable by each function. For instance, when dealing with +a tied variable, (internally, something with "P" magic) Perl has to +call methods for accesses to the tied variables. However, we need to +separate the arguments exposed to the method to the argument exposed to +the original function - the store or fetch or whatever it may be. +Here's roughly how the tied C<push> is implemented; see C<av_push> in +F<av.c>: + + 1 PUSHMARK(SP); + 2 EXTEND(SP,2); + 3 PUSHs(SvTIED_obj((SV*)av, mg)); + 4 PUSHs(val); + 5 PUTBACK; + 6 ENTER; + 7 call_method("PUSH", G_SCALAR|G_DISCARD); + 8 LEAVE; + +Let's examine the whole implementation, for practice: + + 1 PUSHMARK(SP); + +Push the current state of the stack pointer onto the mark stack. This +is so that when we've finished adding items to the argument stack, Perl +knows how many things we've added recently. + + 2 EXTEND(SP,2); + 3 PUSHs(SvTIED_obj((SV*)av, mg)); + 4 PUSHs(val); + +We're going to add two more items onto the argument stack: when you +have a tied array, the C<PUSH> subroutine receives the object and the +value to be pushed, and that's exactly what we have here - the tied +object, retrieved with C<SvTIED_obj>, and the value, the SV C<val>. + + 5 PUTBACK; + +Next we tell Perl to update the global stack pointer from our internal +variable: C<dSP> only gave us a local copy, not a reference to the +global. 
+ + 6 ENTER; + 7 call_method("PUSH", G_SCALAR|G_DISCARD); + 8 LEAVE; + +C<ENTER> and C<LEAVE> localise a block of code - they make sure that +all variables are tidied up, everything that has been localised gets +its previous value returned, and so on. Think of them as the C<{> and +C<}> of a Perl block. + +To actually do the magic method call, we have to call a subroutine in +Perl space: C<call_method> takes care of that, and it's described in +L<perlcall>. We call the C<PUSH> method in scalar context, and we're +going to discard its return value. The call_method() function removes +the top element of the mark stack, so there is nothing for the caller +to clean up. + +=head2 Save stack + +C doesn't have a concept of local scope, so perl provides one. We've +seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save +stack implements the C equivalent of, for example: + + { + local $foo = 42; + ... + } + +See L<perlguts/"Localizing changes"> for how to use the save stack. + +=head1 MILLIONS OF MACROS + +One thing you'll notice about the Perl source is that it's full of +macros. Some have called the pervasive use of macros the hardest thing +to understand, others find it adds to clarity. Let's take an example, +the code which implements the addition operator: + + 1 PP(pp_add) + 2 { + 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); + 4 { + 5 dPOPTOPnnrl_ul; + 6 SETn( left + right ); + 7 RETURN; + 8 } + 9 } + +Every line here (apart from the braces, of course) contains a macro. +The first line sets up the function declaration as Perl expects for PP +code; line 3 sets up variable declarations for the argument stack and +the target, the return value of the operation. Finally, it tries to see +if the addition operation is overloaded; if so, the appropriate +subroutine is called. + +Line 5 is another variable declaration - all variable declarations +start with C<d> - which pops from the top of the argument stack two NVs +(hence C<nn>) and puts them into the variables C<right> and C<left>, +hence the C<rl>. These are the two operands to the addition operator. +Next, we call C<SETn> to set the NV of the return value to the result +of adding the two values. This done, we return - the C<RETURN> macro +makes sure that our return value is properly handled, and we pass the +next operator to run back to the main run loop. + +Most of these macros are explained in L<perlapi>, and some of the more +important ones are explained in L<perlxs> as well. Pay special +attention to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for +information on the C<[pad]THX_?> macros. + +=head1 FURTHER READING + +For more information on the Perl internals, please see the documents +listed at L<perl/Internals and C Language Interface>. |