author | afresh1 <afresh1@cvs.openbsd.org> | 2014-03-24 14:59:14 +0000 |
---|---|---|
committer | afresh1 <afresh1@cvs.openbsd.org> | 2014-03-24 14:59:14 +0000 |
commit | c080cf55b5ad88c4056e6e9a4f858e0dfbf642b1 (patch) | |
tree | 90e52b9a68c9bf2fe8cd12484950cdc93821c2c4 /gnu/usr.bin/perl/pod | |
parent | 2fae50d18aceff793a4705626eb1156e0070870a (diff) |
Import perl-5.18.2
OK espie@ sthen@ deraadt@
Diffstat (limited to 'gnu/usr.bin/perl/pod')
43 files changed, 7132 insertions, 1889 deletions
diff --git a/gnu/usr.bin/perl/pod/perl5005delta.pod b/gnu/usr.bin/perl/pod/perl5005delta.pod index 62661254a21..e73bcebc429 100644 --- a/gnu/usr.bin/perl/pod/perl5005delta.pod +++ b/gnu/usr.bin/perl/pod/perl5005delta.pod @@ -509,7 +509,7 @@ as L<perldos> on some systems). MiNT is now supported. See F<README.mint>. -MPE/iX is now supported. See F<README.mpeix>. +MPE/iX is now supported. See README.mpeix. MVS (aka OS390, aka Open Edition) is now supported. See F<README.os390> (installed as L<perlos390> on some systems). diff --git a/gnu/usr.bin/perl/pod/perl5120delta.pod b/gnu/usr.bin/perl/pod/perl5120delta.pod index f8a1810c861..6cbfb7adf36 100644 --- a/gnu/usr.bin/perl/pod/perl5120delta.pod +++ b/gnu/usr.bin/perl/pod/perl5120delta.pod @@ -116,7 +116,7 @@ it is interpolated into a regexp. See L<overload>. Extension modules can now cleanly hook into the Perl parser to define new kinds of keyword-headed expression and compound statement. The syntax following the keyword is defined entirely by the extension. This -allow a completely non-Perl sublanguage to be parsed inline, with the +allows a completely non-Perl sublanguage to be parsed inline, with the correct ops cleanly generated. See L<perlapi/PL_keyword_plugin> for the mechanism. The Perl core diff --git a/gnu/usr.bin/perl/pod/perl5125delta.pod b/gnu/usr.bin/perl/pod/perl5125delta.pod new file mode 100644 index 00000000000..90cb04a8329 --- /dev/null +++ b/gnu/usr.bin/perl/pod/perl5125delta.pod @@ -0,0 +1,241 @@ +=encoding utf8 + +=head1 NAME + +perl5125delta - what is new for perl v5.12.5 + +=head1 DESCRIPTION + +This document describes differences between the 5.12.4 release and +the 5.12.5 release. + +If you are upgrading from an earlier release such as 5.12.3, first read +L<perl5124delta>, which describes differences between 5.12.3 and +5.12.4. + +=head1 Security + +=head2 C<Encode> decode_xs n-byte heap-overflow (CVE-2011-2939) + +A bug in C<Encode> could, on certain inputs, cause the heap to overflow. 
+This problem has been corrected. Bug reported by Robert Zacek. + +=head2 C<File::Glob::bsd_glob()> memory error with GLOB_ALTDIRFUNC (CVE-2011-2728). + +Calling C<File::Glob::bsd_glob> with the unsupported flag GLOB_ALTDIRFUNC would +cause an access violation / segfault. A Perl program that accepts a flags value from +an external source could expose itself to denial of service or arbitrary code +execution attacks. There are no known exploits in the wild. The problem has been +corrected by explicitly disabling all unsupported flags and setting unused function +pointers to null. Bug reported by Clément Lecigne. + +=head2 Heap buffer overrun in 'x' string repeat operator (CVE-2012-5195) + +Poorly written perl code that allows an attacker to specify the count to +perl's 'x' string repeat operator can already cause a memory exhaustion +denial-of-service attack. A flaw in versions of perl before 5.15.5 can +escalate that into a heap buffer overrun; coupled with versions of glibc +before 2.16, it possibly allows the execution of arbitrary code. + +This problem has been fixed. + +=head1 Incompatible Changes + +There are no changes intentionally incompatible with 5.12.4. If any +exist, they are bugs and reports are welcome. + +=head1 Modules and Pragmata + +=head2 Updated Modules + +=head3 L<B::Concise> + +L<B::Concise> no longer produces mangled output with the B<-tree> option +[perl #80632]. + +=head3 L<charnames> + +A regression introduced in Perl 5.8.8 has been fixed, that caused +C<charnames::viacode(0)> to return C<undef> instead of the string "NULL" +[perl #72624]. + +=head3 L<Encode> has been upgraded from version 2.39 to version 2.39_01. + +See L</Security>. + +=head3 L<File::Glob> has been upgraded from version 1.07 to version 1.07_01. + +See L</Security>. + +=head3 L<Unicode::UCD> + +The documentation for the C<upper> function now actually says "upper", not +"lower". 
+ +=head3 L<Module::CoreList> + +L<Module::CoreList> has been updated to version 2.50_02 to add data for +this release. + +=head1 Changes to Existing Documentation + +=head2 L<perlebcdic> + +The L<perlebcdic> document contains a helpful table to use in C<tr///> to +convert between EBCDIC and Latin1/ASCII. Unfortunately, the table was the +inverse of the one it describes. This has been corrected. + +=head2 L<perlunicode> + +The section on +L<User-Defined Case Mappings|perlunicode/User-Defined Case Mappings> had +some bad markup and unclear sentences, making parts of it unreadable. This +has been rectified. + +=head2 L<perluniprops> + +This document has been corrected to take non-ASCII platforms into account. + +=head1 Installation and Configuration Improvements + +=head2 Platform Specific Changes + +=over 4 + +=item Mac OS X + +There have been configuration and test fixes to make Perl build cleanly on +Lion and Mountain Lion. + +=item NetBSD + +The NetBSD hints file was corrected to be compatible with NetBSD 6.* + +=back + +=head1 Selected Bug Fixes + +=over 4 + +=item * + +C<chop> now correctly handles characters above "\x{7fffffff}" +[perl #73246]. + +=item * + +C<< ($<,$>) = (...) >> stopped working properly in 5.12.0. It is supposed +to make a single C<setreuid()> call, rather than calling C<setruid()> and +C<seteuid()> separately. Consequently it did not work properly. This has +been fixed [perl #75212]. + +=item * + +Fixed a regression of kill() when a match variable is used for the +process ID to kill [perl #75812]. + +=item * + +C<UNIVERSAL::VERSION> no longer leaks memory. It started leaking in Perl +5.10.0. + +=item * + +The C-level C<my_strftime> functions no longer leaks memory. This fixes a +memory leak in C<POSIX::strftime> [perl #73520]. + +=item * + +C<caller> no longer leaks memory when called from the DB package if +C<@DB::args> was assigned to after the first call to C<caller>. L<Carp> +was triggering this bug [perl #97010]. 
+ +=item * + +Passing to C<index> an offset beyond the end of the string when the string +is encoded internally in UTF8 no longer causes panics [perl #75898]. + +=item * + +Syntax errors in C<< (?{...}) >> blocks in regular expressions no longer +cause panic messages [perl #2353]. + +=item * + +Perl 5.10.0 introduced some faulty logic that made "U*" in the middle of +a pack template equivalent to "U0" if the input string was empty. This has +been fixed [perl #90160]. + +=back + +=head1 Errata + +=head2 split() and C<@_> + +split() no longer modifies C<@_> when called in scalar or void context. +In void context it now produces a "Useless use of split" warning. +This is actually a change introduced in perl 5.12.0, but it was missed from +that release's L<perl5120delta>. + +=head1 Acknowledgements + +Perl 5.12.5 represents approximately 17 months of development since Perl 5.12.4 +and contains approximately 1,900 lines of changes across 64 files from 18 +authors. + +Perl continues to flourish into its third decade thanks to a vibrant community +of users and developers. The following people are known to have contributed the +improvements that became Perl 5.12.5: + +Andy Dougherty, Chris 'BinGOs' Williams, Craig A. Berry, David Mitchell, +Dominic Hargreaves, Father Chrysostomos, Florian Ragwitz, George Greer, Goro +Fuji, Jesse Vincent, Karl Williamson, Leon Brocard, Nicholas Clark, Rafael +Garcia-Suarez, Reini Urban, Ricardo Signes, Steve Hay, Tony Cook. + +The list above is almost certainly incomplete as it is automatically generated +from version control history. In particular, it does not include the names of +the (very much appreciated) contributors who reported issues to the Perl bug +tracker. + +Many of the changes included in this version originated in the CPAN modules +included in Perl's core. We're grateful to the entire CPAN community for +helping Perl to flourish. 
+ +For a more complete list of all of Perl's historical contributors, please see +the F<AUTHORS> file in the Perl source distribution. + +=head1 Reporting Bugs + +If you find what you think is a bug, you might check the articles +recently posted to the comp.lang.perl.misc newsgroup and the perl +bug database at http://rt.perl.org/perlbug/ . There may also be +information at http://www.perl.org/ , the Perl Home Page. + +If you believe you have an unreported bug, please run the B<perlbug> +program included with your release. Be sure to trim your bug down +to a tiny but sufficient test case. Your bug report, along with the +output of C<perl -V>, will be sent off to perlbug@perl.org to be +analysed by the Perl porting team. + +If the bug you are reporting has security implications, which make it +inappropriate to send to a publicly archived mailing list, then please send +it to perl5-security-report@perl.org. This points to a closed subscription +unarchived mailing list, which includes all the core committers, who be able +to help assess the impact of issues, figure out a resolution, and help +co-ordinate the release of patches to mitigate or fix the problem across all +platforms on which Perl is supported. Please only use this address for +security issues in the Perl core, not for modules independently +distributed on CPAN. + +=head1 SEE ALSO + +The F<Changes> file for an explanation of how to view exhaustive details +on what changed. + +The F<INSTALL> file for how to build Perl. + +The F<README> file for general stuff. + +The F<Artistic> and F<Copying> files for copyright information. + +=cut diff --git a/gnu/usr.bin/perl/pod/perl5140delta.pod b/gnu/usr.bin/perl/pod/perl5140delta.pod index 74c82a8e141..26df41c6520 100644 --- a/gnu/usr.bin/perl/pod/perl5140delta.pod +++ b/gnu/usr.bin/perl/pod/perl5140delta.pod @@ -1210,7 +1210,7 @@ generation task. L<CPAN::Meta> version 2.110440 has been added as a dual-life module. 
It provides a standard library to read, interpret and write CPAN distribution -metadata files (like F<META.json> and F<META.yml)> that describe a +metadata files (like F<META.json> and F<META.yml>) that describe a distribution, its contents, and the requirements for building it and installing it. The latest CPAN distribution metadata specification is included as L<CPAN::Meta::Spec> and notes on changes in the specification diff --git a/gnu/usr.bin/perl/pod/perl5144delta.pod b/gnu/usr.bin/perl/pod/perl5144delta.pod new file mode 100644 index 00000000000..23deecb4f2a --- /dev/null +++ b/gnu/usr.bin/perl/pod/perl5144delta.pod @@ -0,0 +1,240 @@ +=encoding utf8 + +=head1 NAME + +perl5144delta - what is new for perl v5.14.4 + +=head1 DESCRIPTION + +This document describes differences between the 5.14.3 release and +the 5.14.4 release. + +If you are upgrading from an earlier release such as 5.12.0, first read +L<perl5140delta>, which describes differences between 5.12.0 and +5.14.0. + +=head1 Core Enhancements + +No changes since 5.14.0. + +=head1 Security + +This release contains one major, and medium, and a number of minor +security fixes. The latter are included mainly to allow the test suite to +pass cleanly with the clang compiler's address sanitizer facility. + +=head2 CVE-2013-1667: memory exhaustion with arbitrary hash keys + +With a carefully crafted set of hash keys (for example arguments on a +URL), it is possible to cause a hash to consume a large amount of memory +and CPU, and thus possibly to achieve a Denial-of-Service. + +This problem has been fixed. + +=head2 memory leak in Encode + +The UTF-8 encoding implementation in Encode.xs had a memory leak which has been +fixed. + +=head2 [perl #111594] Socket::unpack_sockaddr_un heap-buffer-overflow + +A read buffer overflow could occur when copying C<sockaddr> buffers. +Fairly harmless. + +This problem has been fixed. 
+ +=head2 [perl #111586] SDBM_File: fix off-by-one access to global ".dir" + +An extra byte was being copied for some string literals. Fairly harmless. + +This problem has been fixed. + +=head2 off-by-two error in List::Util + +A string literal was being used that included two bytes beyond the +end of the string. Fairly harmless. + +This problem has been fixed. + +=head2 [perl #115994] fix segv in regcomp.c:S_join_exact() + +Under debugging builds, while marking optimised-out regex nodes as type +C<OPTIMIZED>, it could treat blocks of exact text as if they were nodes, +and thus SEGV. Fairly harmless. + +This problem has been fixed. + +=head2 [perl #115992] PL_eval_start use-after-free + +The statement C<local $[;>, when preceded by an C<eval>, and when not part +of an assignment, could crash. Fairly harmless. + +This problem has been fixed. + +=head2 wrap-around with IO on long strings + +Reading or writing strings greater than 2**31 bytes in size could segfault +due to integer wraparound. + +This problem has been fixed. + +=head1 Incompatible Changes + +There are no changes intentionally incompatible with 5.14.0. If any +exist, they are bugs and reports are welcome. + +=head1 Deprecations + +There have been no deprecations since 5.14.0. + +=head1 Modules and Pragmata + +=head2 New Modules and Pragmata + +None + +=head2 Updated Modules and Pragmata + +The following modules have just the minor code fixes as listed above in +L</Security> (version numbers have not changed): + +=over 4 + +=item Socket + +=item SDBM_File + +=item List::Util + +=back + +L<Encode> has been upgraded from version 2.42_01 to version 2.42_02. + +L<Module::CoreList> has been updated to version 2.49_06 to add data for +this release. + +=head2 Removed Modules and Pragmata + +None. + +=head1 Documentation + +=head2 New Documentation + +None. + +=head2 Changes to Existing Documentation + +None. + +=head1 Diagnostics + +No new or changed diagnostics. 
+ +=head1 Utility Changes + +None + +=head1 Configuration and Compilation + +No changes. + +=head1 Platform Support + +=head2 New Platforms + +None. + +=head2 Discontinued Platforms + +None. + +=head2 Platform-Specific Notes + +=over 4 + +=item VMS + +5.14.3 failed to compile on VMS due to incomplete application of a patch +series that allowed C<userelocatableinc> and C<usesitecustomize> to be +used simultaneously. Other platforms were not affected and the problem +has now been corrected. + +=back + +=head1 Selected Bug Fixes + +=over 4 + +=item * + +In Perl 5.14.0, C<$tainted ~~ @array> stopped working properly. Sometimes +it would erroneously fail (when C<$tainted> contained a string that occurs +in the array I<after> the first element) or erroneously succeed (when +C<undef> occurred after the first element) [perl #93590]. + +=back + +=head1 Known Problems + +None. + +=head1 Acknowledgements + +Perl 5.14.4 represents approximately 5 months of development since Perl 5.14.3 +and contains approximately 1,700 lines of changes across 49 files from 12 +authors. + +Perl continues to flourish into its third decade thanks to a vibrant community +of users and developers. The following people are known to have contributed the +improvements that became Perl 5.14.4: + +Andy Dougherty, Chris 'BinGOs' Williams, Christian Hansen, Craig A. Berry, +Dave Rolsky, David Mitchell, Dominic Hargreaves, Father Chrysostomos, +Florian Ragwitz, Reini Urban, Ricardo Signes, Yves Orton. + + +The list above is almost certainly incomplete as it is automatically generated +from version control history. In particular, it does not include the names of +the (very much appreciated) contributors who reported issues to the Perl bug +tracker. + +For a more complete list of all of Perl's historical contributors, please see +the F<AUTHORS> file in the Perl source distribution. 
+ + +=head1 Reporting Bugs + +If you find what you think is a bug, you might check the articles +recently posted to the comp.lang.perl.misc newsgroup and the perl +bug database at http://rt.perl.org/perlbug/ . There may also be +information at http://www.perl.org/ , the Perl Home Page. + +If you believe you have an unreported bug, please run the L<perlbug> +program included with your release. Be sure to trim your bug down +to a tiny but sufficient test case. Your bug report, along with the +output of C<perl -V>, will be sent off to perlbug@perl.org to be +analysed by the Perl porting team. + +If the bug you are reporting has security implications, which make it +inappropriate to send to a publicly archived mailing list, then please send +it to perl5-security-report@perl.org. This points to a closed subscription +unarchived mailing list, which includes all the core committers, who be able +to help assess the impact of issues, figure out a resolution, and help +co-ordinate the release of patches to mitigate or fix the problem across all +platforms on which Perl is supported. Please only use this address for +security issues in the Perl core, not for modules independently +distributed on CPAN. + +=head1 SEE ALSO + +The F<Changes> file for an explanation of how to view exhaustive details +on what changed. + +The F<INSTALL> file for how to build Perl. + +The F<README> file for general stuff. + +The F<Artistic> and F<Copying> files for copyright information. 
+ +=cut diff --git a/gnu/usr.bin/perl/pod/perl5160delta.pod b/gnu/usr.bin/perl/pod/perl5160delta.pod index 9b67d17a243..ad29389806a 100644 --- a/gnu/usr.bin/perl/pod/perl5160delta.pod +++ b/gnu/usr.bin/perl/pod/perl5160delta.pod @@ -603,7 +603,7 @@ Thread.pm =back -=head2 Platforms with no supporting programmers: +=head2 Platforms with no supporting programmers These platforms will probably have their special build support removed during the diff --git a/gnu/usr.bin/perl/pod/perl5163delta.pod b/gnu/usr.bin/perl/pod/perl5163delta.pod new file mode 100644 index 00000000000..c97f154837c --- /dev/null +++ b/gnu/usr.bin/perl/pod/perl5163delta.pod @@ -0,0 +1,133 @@ +=encoding utf8 + +=head1 NAME + +perl5163delta - what is new for perl v5.16.3 + +=head1 DESCRIPTION + +This document describes differences between the 5.16.2 release and +the 5.16.3 release. + +If you are upgrading from an earlier release such as 5.16.1, first read +L<perl5162delta>, which describes differences between 5.16.1 and +5.16.2. + +=head1 Core Enhancements + +No changes since 5.16.0. + +=head1 Security + +This release contains one major and a number of minor security fixes. +These latter are included mainly to allow the test suite to pass cleanly +with the clang compiler's address sanitizer facility. + +=head2 CVE-2013-1667: memory exhaustion with arbitrary hash keys + +With a carefully crafted set of hash keys (for example arguments on a +URL), it is possible to cause a hash to consume a large amount of memory +and CPU, and thus possibly to achieve a Denial-of-Service. + +This problem has been fixed. + +=head2 wrap-around with IO on long strings + +Reading or writing strings greater than 2**31 bytes in size could segfault +due to integer wraparound. + +This problem has been fixed. + +=head2 memory leak in Encode + +The UTF-8 encoding implementation in Encode.xs had a memory leak which has been +fixed. + +=head1 Incompatible Changes + +There are no changes intentionally incompatible with 5.16.0. 
If any +exist, they are bugs and reports are welcome. + +=head1 Deprecations + +There have been no deprecations since 5.16.0. + +=head1 Modules and Pragmata + +=head2 Updated Modules and Pragmata + +=over 4 + +=item * + +L<Encode> has been upgraded from version 2.44 to version 2.44_01. + +=item * + +L<Module::CoreList> has been upgraded from version 2.76 to version 2.76_02. + +=item * + +L<XS::APItest> has been upgraded from version 0.38 to version 0.39. + +=back + +=head1 Known Problems + +None. + +=head1 Acknowledgements + +Perl 5.16.3 represents approximately 4 months of development since Perl 5.16.2 +and contains approximately 870 lines of changes across 39 files from 7 authors. + +Perl continues to flourish into its third decade thanks to a vibrant community +of users and developers. The following people are known to have contributed the +improvements that became Perl 5.16.3: + +Andy Dougherty, Chris 'BinGOs' Williams, Dave Rolsky, David Mitchell, Michael +Schroeder, Ricardo Signes, Yves Orton. + +The list above is almost certainly incomplete as it is automatically generated +from version control history. In particular, it does not include the names of +the (very much appreciated) contributors who reported issues to the Perl bug +tracker. + +For a more complete list of all of Perl's historical contributors, please see +the F<AUTHORS> file in the Perl source distribution. + +=head1 Reporting Bugs + +If you find what you think is a bug, you might check the articles +recently posted to the comp.lang.perl.misc newsgroup and the perl +bug database at http://rt.perl.org/perlbug/ . There may also be +information at http://www.perl.org/ , the Perl Home Page. + +If you believe you have an unreported bug, please run the L<perlbug> +program included with your release. Be sure to trim your bug down +to a tiny but sufficient test case. Your bug report, along with the +output of C<perl -V>, will be sent off to perlbug@perl.org to be +analysed by the Perl porting team. 
+ +If the bug you are reporting has security implications, which make it +inappropriate to send to a publicly archived mailing list, then please +send it to perl5-security-report@perl.org. This points to a closed +subscription unarchived mailing list, which includes all the core +committers, who will be able to help assess the impact of issues, figure +out a resolution, and help co-ordinate the release of patches to +mitigate or fix the problem across all platforms on which Perl is +supported. Please only use this address for security issues in the Perl +core, not for modules independently distributed on CPAN. + +=head1 SEE ALSO + +The F<Changes> file for an explanation of how to view exhaustive details +on what changed. + +The F<INSTALL> file for how to build Perl. + +The F<README> file for general stuff. + +The F<Artistic> and F<Copying> files for copyright information. + +=cut diff --git a/gnu/usr.bin/perl/pod/perl5180delta.pod b/gnu/usr.bin/perl/pod/perl5180delta.pod new file mode 100644 index 00000000000..c60abf71863 --- /dev/null +++ b/gnu/usr.bin/perl/pod/perl5180delta.pod @@ -0,0 +1,3786 @@ +=encoding utf8 + +=head1 NAME + +perl5180delta - what is new for perl v5.18.0 + +=head1 DESCRIPTION + +This document describes differences between the v5.16.0 release and the v5.18.0 +release. + +If you are upgrading from an earlier release such as v5.14.0, first read +L<perl5160delta>, which describes differences between v5.14.0 and v5.16.0. + +=head1 Core Enhancements + +=head2 New mechanism for experimental features + +Newly-added experimental features will now require this incantation: + + no warnings "experimental::feature_name"; + use feature "feature_name"; # would warn without the prev line + +There is a new warnings category, called "experimental", containing +warnings that the L<feature> pragma emits when enabling experimental +features. 
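A minimal sketch of the incantation described above, using the experimental lexical_subs feature as the concrete "feature_name". The helper sum_of_squares and its inner sub square are hypothetical names chosen for illustration; they are not part of the imported POD.

```perl
use strict;
use warnings;

# Silence the per-feature experimental warning first, then enable the feature.
# "lexical_subs" stands in for "feature_name" in the text above.
no warnings "experimental::lexical_subs";
use feature "lexical_subs";

# Hypothetical helper: the inner sub is lexically scoped and is not
# visible outside sum_of_squares.
sub sum_of_squares {
    my sub square { return $_[0] ** 2 }
    my $total = 0;
    $total += square($_) for @_;
    return $total;
}

print sum_of_squares(1, 2, 3), "\n";
```

The sketch assumes perl v5.18 or later; earlier perls do not know the warning category or the feature and will refuse to compile it.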
+ +Newly-added experimental features will also be given special warning IDs, +which consist of "experimental::" followed by the name of the feature. (The +plan is to extend this mechanism eventually to all warnings, to allow them +to be enabled or disabled individually, and not just by category.) + +By saying + + no warnings "experimental::feature_name"; + +you are taking responsibility for any breakage that future changes to, or +removal of, the feature may cause. + +Since some features (like C<~~> or C<my $_>) now emit experimental warnings, +and you may want to disable them in code that is also run on perls that do not +recognize these warning categories, consider using the C<if> pragma like this: + + no if $] >= 5.018, warnings => "experimental::feature_name"; + +Existing experimental features may begin emitting these warnings, too. Please +consult L<perlexperiment> for information on which features are considered +experimental. + +=head2 Hash overhaul + +Changes to the implementation of hashes in perl v5.18.0 will be one of the most +visible changes to the behavior of existing code. + +By default, two distinct hash variables with identical keys and values may now +provide their contents in a different order where it was previously identical. + +When encountering these changes, the key to cleaning up from them is to accept +that B<hashes are unordered collections> and to act accordingly. + +=head3 Hash randomization + +The seed used by Perl's hash function is now random. This means that the +order which keys/values will be returned from functions like C<keys()>, +C<values()>, and C<each()> will differ from run to run. + +This change was introduced to make Perl's hashes more robust to algorithmic +complexity attacks, and also because we discovered that it exposes hash +ordering dependency bugs and makes them easier to track down. + +Toolchain maintainers might want to invest in additional infrastructure to +test for things like this. 
Running tests several times in a row and then +comparing results will make it easier to spot hash order dependencies in +code. Authors are strongly encouraged not to expose the key order of +Perl's hashes to insecure audiences. + +Further, every hash has its own iteration order, which should make it much +more difficult to determine what the current hash seed is. + +=head3 New hash functions + +Perl v5.18 includes support for multiple hash functions, and changed +the default (to ONE_AT_A_TIME_HARD), you can choose a different +algorithm by defining a symbol at compile time. For a current list, +consult the F<INSTALL> document. Note that as of Perl v5.18 we can +only recommend use of the default or SIPHASH. All the others are +known to have security issues and are for research purposes only. + +=head3 PERL_HASH_SEED environment variable now takes a hex value + +C<PERL_HASH_SEED> no longer accepts an integer as a parameter; +instead the value is expected to be a binary value encoded in a hex +string, such as "0xf5867c55039dc724". This is to make the +infrastructure support hash seeds of arbitrary lengths, which might +exceed that of an integer. (SipHash uses a 16 byte seed.) + +=head3 PERL_PERTURB_KEYS environment variable added + +The C<PERL_PERTURB_KEYS> environment variable allows one to control the level of +randomization applied to C<keys> and friends. + +When C<PERL_PERTURB_KEYS> is 0, perl will not randomize the key order at all. The +chance that C<keys> changes due to an insert will be the same as in previous +perls, basically only when the bucket size is changed. + +When C<PERL_PERTURB_KEYS> is 1, perl will randomize keys in a non-repeatable +way. The chance that C<keys> changes due to an insert will be very high. This +is the most secure and default mode. + +When C<PERL_PERTURB_KEYS> is 2, perl will randomize keys in a repeatable way. +Repeated runs of the same program should produce the same output every time. 
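Whatever PERL_PERTURB_KEYS level is in effect, code that needs reproducible output can impose its own order instead of relying on the order keys() happens to return. The following is a small sketch of that approach; stable_pairs is a hypothetical helper name, not something defined by Perl or by this commit.

```perl
use strict;
use warnings;

# Hypothetical helper: render a hash as "key=value" pairs in sorted key
# order, so the result does not depend on the hash seed or on
# PERL_PERTURB_KEYS.
sub stable_pairs {
    my ($hash) = @_;
    return join ",", map { "$_=$hash->{$_}" } sort keys %$hash;
}

print stable_pairs({ b => 2, a => 1, c => 3 }), "\n";
```

Sorting costs O(n log n) per call, which is usually acceptable for test output and diagnostics, the places where hash order dependencies tend to surface.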
+ +C<PERL_HASH_SEED> implies a non-default C<PERL_PERTURB_KEYS> setting. Setting +C<PERL_HASH_SEED=0> (exactly one 0) implies C<PERL_PERTURB_KEYS=0> (hash key +randomization disabled); setting C<PERL_HASH_SEED> to any other value implies +C<PERL_PERTURB_KEYS=2> (deterministic and repeatable hash key randomization). +Specifying C<PERL_PERTURB_KEYS> explicitly to a different level overrides this +behavior. + +=head3 Hash::Util::hash_seed() now returns a string + +Hash::Util::hash_seed() now returns a string instead of an integer. This +is to make the infrastructure support hash seeds of arbitrary lengths +which might exceed that of an integer. (SipHash uses a 16 byte seed.) + +=head3 Output of PERL_HASH_SEED_DEBUG has been changed + +The environment variable PERL_HASH_SEED_DEBUG now makes perl show both the +hash function perl was built with, I<and> the seed, in hex, in use for that +process. Code parsing this output, should it exist, must change to accommodate +the new format. Example of the new format: + + $ PERL_HASH_SEED_DEBUG=1 ./perl -e1 + HASH_FUNCTION = MURMUR3 HASH_SEED = 0x1476bb9f + +=head2 Upgrade to Unicode 6.2 + +Perl now supports Unicode 6.2. A list of changes from Unicode +6.1 is at L<http://www.unicode.org/versions/Unicode6.2.0>. + +=head2 Character name aliases may now include non-Latin1-range characters + +It is possible to define your own names for characters for use in +C<\N{...}>, C<charnames::vianame()>, etc. These names can now be +comprised of characters from the whole Unicode range. This allows for +names to be in your native language, and not just English. Certain +restrictions apply to the characters that may be used (you can't define +a name that has punctuation in it, for example). See L<charnames/CUSTOM +ALIASES>. 
+ +=head2 New DTrace probes + +The following new DTrace probes have been added: + +=over 4 + +=item * + +C<op-entry> + +=item * + +C<loading-file> + +=item * + +C<loaded-file> + +=back + +=head2 C<${^LAST_FH}> + +This new variable provides access to the filehandle that was last read. +This is the handle used by C<$.> and by C<tell> and C<eof> without +arguments. + +=head2 Regular Expression Set Operations + +This is an B<experimental> feature to allow matching against the union, +intersection, etc., of sets of code points, similar to +L<Unicode::Regex::Set>. It can also be used to extend C</x> processing +to [bracketed] character classes, and as a replacement of user-defined +properties, allowing more complex expressions than they do. See +L<perlrecharclass/Extended Bracketed Character Classes>. + +=head2 Lexical subroutines + +This new feature is still considered B<experimental>. To enable it: + + use 5.018; + no warnings "experimental::lexical_subs"; + use feature "lexical_subs"; + +You can now declare subroutines with C<state sub foo>, C<my sub foo>, and +C<our sub foo>. (C<state sub> requires that the "state" feature be +enabled, unless you write it as C<CORE::state sub foo>.) + +C<state sub> creates a subroutine visible within the lexical scope in which +it is declared. The subroutine is shared between calls to the outer sub. + +C<my sub> declares a lexical subroutine that is created each time the +enclosing block is entered. C<state sub> is generally slightly faster than +C<my sub>. + +C<our sub> declares a lexical alias to the package subroutine of the same +name. + +For more information, see L<perlsub/Lexical Subroutines>. + +=head2 Computed Labels + +The loop controls C<next>, C<last> and C<redo>, and the special C<dump> +operator, now allow arbitrary expressions to be used to compute labels at run +time. Previously, any argument that was not a constant was treated as the +empty string. 
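To illustrate the computed labels described above, the sketch below passes the label name in as a string and lets "last" resolve it at run time. The sub count_until_negative and the labels OUTER and INNER are hypothetical names used only for this example, and the code assumes perl v5.18 or later (older perls treat the non-constant argument as the empty string).

```perl
use 5.018;
use warnings;

# Hypothetical helper: walk a list of rows and count the elements visited.
# On the first negative value, leave the loop whose label name is held in
# $label ("OUTER" or "INNER"), computed at run time.
sub count_until_negative {
    my ($rows, $label) = @_;
    my $visited = 0;
    OUTER: for my $row (@$rows) {
        INNER: for my $value (@$row) {
            $visited++;
            last $label if $value < 0;
        }
    }
    return $visited;
}

my $rows = [ [ 1, -1, 2 ], [ 3, 4 ] ];
say count_until_negative($rows, "OUTER");   # leaves both loops: 2
say count_until_negative($rows, "INNER");   # leaves only the inner loop: 4
```

Passing the label as data keeps the loop body identical for both behaviours; whether that is clearer than two explicit "last LABEL" statements is a matter of taste.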
+ +=head2 More CORE:: subs + +Several more built-in functions have been added as subroutines to the +CORE:: namespace - namely, those non-overridable keywords that can be +implemented without custom parsers: C<defined>, C<delete>, C<exists>, +C<glob>, C<pos>, C<prototype>, C<scalar>, C<split>, C<study>, and C<undef>. + +As some of these have prototypes, C<prototype('CORE::...')> has been +changed to not make a distinction between overridable and non-overridable +keywords. This is to make C<prototype('CORE::pos')> consistent with +C<prototype(&CORE::pos)>. + +=head2 C<kill> with negative signal names + +C<kill> has always allowed a negative signal number, which kills the +process group instead of a single process. It has also allowed signal +names. But it did not behave consistently, because negative signal names +were treated as 0. Now negative signal names like C<-INT> are supported +and treated the same way as -2 [perl #112990]. + +=head1 Security + +=head2 See also: hash overhaul + +Some of the changes in the L<hash overhaul|/"Hash overhaul"> were made to +enhance security. Please read that section. + +=head2 C<Storable> security warning in documentation + +The documentation for C<Storable> now includes a section which warns readers +of the danger of accepting Storable documents from untrusted sources. The +short version is that deserializing certain types of data can lead to loading +modules and other code execution. This is documented behavior and wanted +behavior, but this opens an attack vector for malicious entities. + +=head2 C<Locale::Maketext> allowed code injection via a malicious template + +If users could provide a translation string to Locale::Maketext, this could be +used to invoke arbitrary Perl subroutines available in the current process. + +This has been fixed, but it is still possible to invoke any method provided by +C<Locale::Maketext> itself or a subclass that you are using. 
One of these +methods in turn will invoke the Perl core's C<sprintf> subroutine. + +In summary, allowing users to provide translation strings without auditing +them is a bad idea. + +This vulnerability is documented in CVE-2012-6329. + +=head2 Avoid calling memset with a negative count + +Poorly written perl code that allows an attacker to specify the count to perl's +C<x> string repeat operator can already cause a memory exhaustion +denial-of-service attack. A flaw in versions of perl before v5.15.5 can escalate +that into a heap buffer overrun; coupled with versions of glibc before 2.16, it +possibly allows the execution of arbitrary code. + +The flaw addressed by this commit has been assigned identifier CVE-2012-5195 +and was researched by Tim Brown. + +=head1 Incompatible Changes + +=head2 See also: hash overhaul + +Some of the changes in the L<hash overhaul|/"Hash overhaul"> are not fully +compatible with previous versions of perl. Please read that section. + +=head2 An unknown character name in C<\N{...}> is now a syntax error + +Previously, it warned, and the Unicode REPLACEMENT CHARACTER was +substituted. Unicode now recommends that this situation be a syntax +error. Also, the previous behavior led to some confusing warnings and +behaviors, and since the REPLACEMENT CHARACTER has no use other than as +a stand-in for some unknown character, any code that has this problem is +buggy. + +=head2 Formerly deprecated characters in C<\N{}> character name aliases are now errors. + +Since v5.12.0, it has been deprecated to use certain characters in +user-defined C<\N{...}> character names. These now cause a syntax +error. For example, it is now an error to begin a name with a digit, +such as in + + my $undraftable = "\N{4F}"; # Syntax error! + +or to have commas anywhere in the name. See L<charnames/CUSTOM ALIASES>.
+ +=head2 C<\N{BELL}> now refers to U+1F514 instead of U+0007 + +Unicode 6.0 reused the name "BELL" for a different code point than it +traditionally had meant. Since Perl v5.14, use of this name still +referred to U+0007, but would raise a deprecation warning. Now, "BELL" +refers to U+1F514, and the name for U+0007 is "ALERT". All the +functions in L<charnames> have been correspondingly updated. + +=head2 New Restrictions in Multi-Character Case-Insensitive Matching in Regular Expression Bracketed Character Classes + +Unicode has now withdrawn their previous recommendation for regular +expressions to automatically handle cases where a single character can +match multiple characters case-insensitively, for example, the letter +LATIN SMALL LETTER SHARP S and the sequence C<ss>. This is because +it turns out to be impracticable to do this correctly in all +circumstances. Because Perl has tried to do this as best it can, it +will continue to do so. (We are considering an option to turn it off.) +However, a new restriction is being added on such matches when they +occur in [bracketed] character classes. People were specifying +things such as C</[\0-\xff]/i>, and being surprised that it matches the +two character sequence C<ss> (since LATIN SMALL LETTER SHARP S occurs in +this range). This behavior is also inconsistent with using a +property instead of a range: C<\p{Block=Latin1}> also includes LATIN +SMALL LETTER SHARP S, but C</[\p{Block=Latin1}]/i> does not match C<ss>. +The new rule is that for there to be a multi-character case-insensitive +match within a bracketed character class, the character must be +explicitly listed, and not as an end point of a range. This more +closely obeys the Principle of Least Astonishment. See +L<perlrecharclass/Bracketed Character Classes>. Note that a bug [perl +#89774], now fixed as part of this change, prevented the previous +behavior from working fully. 
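The new bracketed-class rule described above can be illustrated with a short sketch. It assumes an ASCII platform (where C<\xdf> is LATIN SMALL LETTER SHARP S) and uses the C</u> modifier so that Unicode folding rules apply; the variable names are hypothetical.

```perl
use strict;
use warnings;

# SHARP S is only implied by the range, so the multi-character fold
# to "ss" is not attempted.
my $via_range = ("ss" =~ /\A[\0-\xff]\z/iu) ? 1 : 0;

# SHARP S is listed explicitly, so the fold to "ss" is allowed.
my $explicit = ("ss" =~ /\A[\xdf]\z/iu) ? 1 : 0;

print "range: $via_range, explicit: $explicit\n";    # range: 0, explicit: 1
```

Code that relied on a range silently matching C<ss> would need to name the character explicitly to keep the old behaviour.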
+ +=head2 Explicit rules for variable names and identifiers + +Due to an oversight, single character variable names in v5.16 were +completely unrestricted. This opened the door to several kinds of +insanity. As of v5.18, these now follow the rules of other identifiers, +in addition to accepting characters that match the C<\p{POSIX_Punct}> +property. + +There is no longer any difference in the parsing of identifiers +specified by using braces versus without braces. For instance, perl +used to allow C<${foo:bar}> (with a single colon) but not C<$foo:bar>. +Now that both are handled by a single code path, they are both treated +the same way: both are forbidden. Note that this change is about the +range of permissible literal identifiers, not other expressions. + +=head2 Vertical tabs are now whitespace + +No one could recall why C<\s> didn't match C<\cK>, the vertical tab. +Now it does. Given the extreme rarity of that character, very little +breakage is expected. That said, here's what it means: + +C<\s> in a regex now matches a vertical tab in all circumstances. + +Literal vertical tabs in a regex literal are ignored when the C</x> +modifier is used. + +Leading vertical tabs, alone or mixed with other whitespace, are now +ignored when interpreting a string as a number. For example: + + $dec = " \cK \t 123"; + $hex = " \cK \t 0xF"; + + say 0 + $dec; # was 0 with warning, now 123 + say int $dec; # was 0, now 123 + say oct $hex; # was 0, now 15 + +=head2 C</(?{})/> and C</(??{})/> have been heavily reworked + +The implementation of this feature has been almost completely rewritten. +Although its main intent is to fix bugs, some behaviors, especially +related to the scope of lexical variables, will have changed. This is +described more fully in the L</Selected Bug Fixes> section. 
+ +=head2 Stricter parsing of substitution replacement + +It is no longer possible to abuse the way the parser parses C<s///e> like +this: + + %_=(_,"Just another "); + $_="Perl hacker,\n"; + s//_}->{_/e;print + +=head2 C<given> now aliases the global C<$_> + +Instead of assigning to an implicit lexical C<$_>, C<given> now makes the +global C<$_> an alias for its argument, just like C<foreach>. However, it +still uses lexical C<$_> if there is lexical C<$_> in scope (again, just like +C<foreach>) [perl #114020]. + +=head2 The smartmatch family of features are now experimental + +Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been +a regular point of complaint. Although there are a number of ways in which +it is useful, it has also proven problematic and confusing for both users and +implementors of Perl. There have been a number of proposals on how to best +address the problem. It is clear that smartmatch is almost certainly either +going to change or go away in the future. Relying on its current behavior +is not recommended. + +Warnings will now be issued when the parser sees C<~~>, C<given>, or C<when>. +To disable these warnings, you can add this line to the appropriate scope: + + no if $] >= 5.018, warnings => "experimental::smartmatch"; + +Consider, though, replacing the use of these features, as they may change +behavior again before becoming stable. + +=head2 Lexical C<$_> is now experimental + +Since it was introduced in Perl v5.10, it has caused much confusion with no +obvious solution: + +=over + +=item * + +Various modules (e.g., List::Util) expect callback routines to use the +global C<$_>. C<use List::Util 'first'; my $_; first { $_ == 1 } @list> +does not work as one would expect. + +=item * + +A C<my $_> declaration earlier in the same file can cause confusing closure +warnings. + +=item * + +The "_" subroutine prototype character allows called subroutines to access +your lexical C<$_>, so it is not really private after all. 
+ +=item * + +Nevertheless, subroutines with a "(@)" prototype and methods cannot access +the caller's lexical C<$_>, unless they are written in XS. + +=item * + +But even XS routines cannot access a lexical C<$_> declared, not in the +calling subroutine, but in an outer scope, iff that subroutine happened not +to mention C<$_> or use any operators that default to C<$_>. + +=back + +It is our hope that lexical C<$_> can be rehabilitated, but this may +cause changes in its behavior. Please use it with caution until it +becomes stable. + +=head2 readline() with C<$/ = \N> now reads N characters, not N bytes + +Previously, when reading from a stream with I/O layers such as +C<encoding>, the readline() function, otherwise known as the C<< <> >> +operator, would read I<N> bytes from the top-most layer. [perl #79960] + +Now, I<N> characters are read instead. + +There is no change in behaviour when reading from streams with no +extra layers, since bytes map exactly to characters. + +=head2 Overridden C<glob> is now passed one argument + +C<glob> overrides used to be passed a magical undocumented second argument +that identified the caller. Nothing on CPAN was using this, and it got in +the way of a bug fix, so it was removed. If you really need to identify +the caller, see L<Devel::Callsite> on CPAN. + +=head2 Here doc parsing + +The body of a here document inside a quote-like operator now always begins +on the line after the "<<foo" marker. Previously, it was documented to +begin on the line following the containing quote-like operator, but that +was only sometimes the case [perl #114040]. + +=head2 Alphanumeric operators must now be separated from the closing +delimiter of regular expressions + +You may no longer write something like: + + m/a/and 1 + +Instead you must write + + m/a/ and 1 + +with whitespace separating the operator from the closing delimiter of +the regular expression. Not having whitespace has resulted in a +deprecation warning since Perl v5.14.0. 
+ +=head2 qw(...) can no longer be used as parentheses + +C<qw> lists used to fool the parser into thinking they were always +surrounded by parentheses. This permitted some surprising constructions +such as C<foreach $x qw(a b c) {...}>, which should really be written +C<foreach $x (qw(a b c)) {...}>. These would sometimes get the lexer into +the wrong state, so they didn't fully work, and the similar C<foreach qw(a +b c) {...}> that one might expect to be permitted never worked at all. + +This side effect of C<qw> has now been abolished. It has been deprecated +since Perl v5.13.11. It is now necessary to use real parentheses +everywhere that the grammar calls for them. + +=head2 Interaction of lexical and default warnings + +Turning on any lexical warnings used first to disable all default warnings +if lexical warnings were not already enabled: + + $*; # deprecation warning + use warnings "void"; + $#; # void warning; no deprecation warning + +Now, the C<debugging>, C<deprecated>, C<glob>, C<inplace> and C<malloc> warnings +categories are left on when turning on lexical warnings (unless they are +turned off by C<no warnings>, of course). + +This may cause deprecation warnings to occur in code that used to be free +of warnings. + +Those are the only categories consisting only of default warnings. Default +warnings in other categories are still disabled by C<< use warnings "category" >>, +as we do not yet have the infrastructure for controlling +individual warnings. + +=head2 C<state sub> and C<our sub> + +Due to an accident of history, C<state sub> and C<our sub> were equivalent +to a plain C<sub>, so one could even create an anonymous sub with +C<our sub { ... }>. These are now disallowed outside of the "lexical_subs" +feature. Under the "lexical_subs" feature they have new meanings described +in L<perlsub/Lexical Subroutines>. 
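As a rough sketch of what the "lexical_subs" feature provides, the following declares a C<my sub> and shows that no package subroutine of the same name is created. It assumes Perl v5.18 or later and uses the enabling lines quoted earlier in these notes; the names C<double>, C<$result> and C<$leaked> are hypothetical.

```perl
use 5.018;
no warnings "experimental::lexical_subs";
use feature "lexical_subs";

my $result;
{
    my sub double { return 2 * $_[0] }    # visible only inside this block
    $result = double(21);
}

# A lexical sub does not install anything in the package symbol table.
my $leaked = defined(&main::double) ? 1 : 0;

say "result=$result leaked=$leaked";    # result=42 leaked=0
```

Writing C<our sub double> instead would alias the package subroutine C<main::double> rather than create a private one.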
+ +=head2 Defined values stored in environment are forced to byte strings + +A value stored in an environment variable has always been stringified. In this +release, it is converted to be only a byte string. First, it is forced to be +only a string. Then if the string is utf8 and the equivalent of +C<utf8::downgrade()> works, that result is used; otherwise, the equivalent of +C<utf8::encode()> is used, and a warning is issued about wide characters +(L</Diagnostics>). + +=head2 C<require> dies for unreadable files + +When C<require> encounters an unreadable file, it now dies. It used to +ignore the file and continue searching the directories in C<@INC> +[perl #113422]. + +=head2 C<gv_fetchmeth_*> and SUPER + +The various C<gv_fetchmeth_*> XS functions used to treat a package whose +name ended with C<::SUPER> specially. A method lookup on the C<Foo::SUPER> +package would be treated as a C<SUPER> method lookup on the C<Foo> package. This +is no longer the case. To do a C<SUPER> lookup, pass the C<Foo> stash and the +C<GV_SUPER> flag. + +=head2 C<split>'s first argument is more consistently interpreted + +After some changes earlier in v5.17, C<split>'s behavior has been +simplified: if the PATTERN argument evaluates to a string +containing one space, it is treated the way that a I<literal> string +containing one space once was. + +=head1 Deprecations + +=head2 Module removals + +The following modules will be removed from the core distribution in a future +release, and will at that time need to be installed from CPAN. Distributions +on CPAN which require these modules will need to list them as prerequisites. + +The core versions of these modules will now issue C<"deprecated">-category +warnings to alert you to this fact. To silence these deprecation warnings, +install the modules in question from CPAN. + +Note that these are (with rare exceptions) fine modules that you are encouraged +to continue to use.
Their disinclusion from core primarily hinges on their +necessity to bootstrapping a fully functional, CPAN-capable Perl installation, +not usually on concerns over their design. + +=over + +=item L<encoding> + +The use of this pragma is now strongly discouraged. It conflates the encoding +of source text with the encoding of I/O data, reinterprets escape sequences in +source text (a questionable choice), and introduces the UTF-8 bug to all runtime +handling of character strings. It is broken as designed and beyond repair. + +For using non-ASCII literal characters in source text, please refer to L<utf8>. +For dealing with textual I/O data, please refer to L<Encode> and L<open>. + +=item L<Archive::Extract> + +=item L<B::Lint> + +=item L<B::Lint::Debug> + +=item L<CPANPLUS> and all included C<CPANPLUS::*> modules + +=item L<Devel::InnerPackage> + +=item L<Log::Message> + +=item L<Log::Message::Config> + +=item L<Log::Message::Handlers> + +=item L<Log::Message::Item> + +=item L<Log::Message::Simple> + +=item L<Module::Pluggable> + +=item L<Module::Pluggable::Object> + +=item L<Object::Accessor> + +=item L<Pod::LaTeX> + +=item L<Term::UI> + +=item L<Term::UI::History> + +=back + +=head2 Deprecated Utilities + +The following utilities will be removed from the core distribution in a +future release as their associated modules have been deprecated. They +will remain available with the applicable CPAN distribution. + +=over + +=item L<cpanp> + +=item C<cpanp-run-perl> + +=item L<cpan2dist> + +These items are part of the C<CPANPLUS> distribution. + +=item L<pod2latex> + +This item is part of the C<Pod::LaTeX> distribution. + +=back + +=head2 PL_sv_objcount + +This interpreter-global variable used to track the total number of +Perl objects in the interpreter. It is no longer maintained and will +be removed altogether in Perl v5.20. 
+ +=head2 Five additional characters should be escaped in patterns with C</x> + +When a regular expression pattern is compiled with C</x>, Perl treats 6 +characters as white space to ignore, such as SPACE and TAB. However, +Unicode recommends 11 characters be treated thusly. We will conform +with this in a future Perl version. In the meantime, use of any of the +missing characters will raise a deprecation warning, unless turned off. +The five characters are: + + U+0085 NEXT LINE + U+200E LEFT-TO-RIGHT MARK + U+200F RIGHT-TO-LEFT MARK + U+2028 LINE SEPARATOR + U+2029 PARAGRAPH SEPARATOR + +=head2 User-defined charnames with surprising whitespace + +A user-defined character name with trailing or multiple spaces in a row is +likely a typo. This now generates a warning when defined, on the assumption +that uses of it will be unlikely to include the excess whitespace. + +=head2 Various XS-callable functions are now deprecated + +All the functions used to classify characters will be removed from a +future version of Perl, and should not be used. With participating C +compilers (e.g., gcc), compiling any file that uses any of these will +generate a warning. These were not intended for public use; there are +equivalent, faster, macros for most of them. + +See L<perlapi/Character classes>. 
The complete list is: + +C<is_uni_alnum>, C<is_uni_alnumc>, C<is_uni_alnumc_lc>, +C<is_uni_alnum_lc>, C<is_uni_alpha>, C<is_uni_alpha_lc>, +C<is_uni_ascii>, C<is_uni_ascii_lc>, C<is_uni_blank>, +C<is_uni_blank_lc>, C<is_uni_cntrl>, C<is_uni_cntrl_lc>, +C<is_uni_digit>, C<is_uni_digit_lc>, C<is_uni_graph>, +C<is_uni_graph_lc>, C<is_uni_idfirst>, C<is_uni_idfirst_lc>, +C<is_uni_lower>, C<is_uni_lower_lc>, C<is_uni_print>, +C<is_uni_print_lc>, C<is_uni_punct>, C<is_uni_punct_lc>, +C<is_uni_space>, C<is_uni_space_lc>, C<is_uni_upper>, +C<is_uni_upper_lc>, C<is_uni_xdigit>, C<is_uni_xdigit_lc>, +C<is_utf8_alnum>, C<is_utf8_alnumc>, C<is_utf8_alpha>, +C<is_utf8_ascii>, C<is_utf8_blank>, C<is_utf8_char>, +C<is_utf8_cntrl>, C<is_utf8_digit>, C<is_utf8_graph>, +C<is_utf8_idcont>, C<is_utf8_idfirst>, C<is_utf8_lower>, +C<is_utf8_mark>, C<is_utf8_perl_space>, C<is_utf8_perl_word>, +C<is_utf8_posix_digit>, C<is_utf8_print>, C<is_utf8_punct>, +C<is_utf8_space>, C<is_utf8_upper>, C<is_utf8_xdigit>, +C<is_utf8_xidcont>, C<is_utf8_xidfirst>. + +In addition these three functions that have never worked properly are +deprecated: +C<to_uni_lower_lc>, C<to_uni_title_lc>, and C<to_uni_upper_lc>. + +=head2 Certain rare uses of backslashes within regexes are now deprecated + +There are three pairs of characters that Perl recognizes as +metacharacters in regular expression patterns: C<{}>, C<[]>, and C<()>. +These can be used as well to delimit patterns, as in: + + m{foo} + s(foo)(bar) + +Since they are metacharacters, they have special meaning to regular +expression patterns, and it turns out that you can't turn off that +special meaning by the normal means of preceding them with a backslash, +if you use them, paired, within a pattern delimited by them. For +example, in + + m{foo\{1,3\}} + +the backslashes do not change the behavior, and this matches +S<C<"f o">> followed by one to three more occurrences of C<"o">. 
+ +Usages like this, where they are interpreted as metacharacters, are +exceedingly rare; we think there are none, for example, in all of CPAN. +Hence, this deprecation should affect very little code. It does give +notice, however, that any such code needs to change, which will in turn +allow us to change the behavior in future Perl versions so that the +backslashes do have an effect, and without fear that we are silently +breaking any existing code. + +=head2 Splitting the tokens C<(?> and C<(*> in regular expressions + +A deprecation warning is now raised if the C<(> and C<?> are separated +by white space or comments in C<(?...)> regular expression constructs. +Similarly, if the C<(> and C<*> are separated in C<(*VERB...)> +constructs. + +=head2 Pre-PerlIO IO implementations + +In theory, you can currently build perl without PerlIO. Instead, you'd use a +wrapper around stdio or sfio. In practice, this isn't very useful. It's not +well tested, and without any support for IO layers or (thus) Unicode, it's not +much of a perl. Building without PerlIO will most likely be removed in the +next version of perl. + +PerlIO supports a C<stdio> layer if stdio use is desired. Similarly a +sfio layer could be produced in the future, if needed. + +=head1 Future Deprecations + +=over + +=item * + +Platforms without support infrastructure + +Both Windows CE and z/OS have been historically under-maintained, and are +currently neither successfully building nor regularly being smoke tested. +Efforts are underway to change this situation, but it should not be taken for +granted that the platforms are safe and supported. If they do not become +buildable and regularly smoked, support for them may be actively removed in +future releases. If you have an interest in these platforms and you can lend +your time, expertise, or hardware to help support these platforms, please let +the perl development effort know by emailing C<perl5-porters@perl.org>. 
+ +Some platforms that appear otherwise entirely dead are also on the short list +for removal between now and v5.20.0: + +=over + +=item DG/UX + +=item NeXT + +=back + +We also think it likely that current versions of Perl will no longer +build on AmigaOS, DJGPP, NetWare (natively), OS/2 and Plan 9. If you +are using Perl on such a platform and have an interest in ensuring +Perl's future on them, please contact us. + +We believe that Perl has long been unable to build on mixed endian +architectures (such as PDP-11s), and intend to remove any remaining +support code. Similarly, code supporting the long unmaintained GNU +dld will be removed soon if no-one makes themselves known as an +active user. + +=item * + +Swapping of $< and $> + +Perl has supported the idiom of swapping $< and $> (and likewise $( and +$)) to temporarily drop permissions since 5.0, like this: + + ($<, $>) = ($>, $<); + +However, this idiom modifies the real user/group id, which can have +undesirable side-effects, is no longer useful on any platform perl +supports and complicates the implementation of these variables and list +assignment in general. + +As an alternative, assignment only to C<< $> >> is recommended: + + local $> = $<; + +See also: L<Setuid Demystified|http://www.cs.berkeley.edu/~daw/papers/setuid-usenix02.pdf>. + +=item * + +C<microperl>, long broken and of unclear present purpose, will be removed. + +=item * + +Revamping C<< "\Q" >> semantics in double-quotish strings when combined with +other escapes. + +There are several bugs and inconsistencies involving combinations +of C<\Q> and escapes like C<\x>, C<\L>, etc., within a C<\Q...\E> pair. +These need to be fixed, and doing so will necessarily change current +behavior. The changes have not yet been settled. + +=item * + +Use of C<$x>, where C<x> stands for any actual (non-printing) C0 control +character will be disallowed in a future Perl version.
Use C<${x}> +instead (where again C<x> stands for a control character), +or better, C<$^A> , where C<^> is a caret (CIRCUMFLEX ACCENT), +and C<A> stands for any of the characters listed at the end of +L<perlebcdic/OPERATOR DIFFERENCES>. + +=back + +=head1 Performance Enhancements + +=over 4 + +=item * + +Lists of lexical variable declarations (C<my($x, $y)>) are now optimised +down to a single op and are hence faster than before. + +=item * + +A new C preprocessor define C<NO_TAINT_SUPPORT> was added that, if set, +disables Perl's taint support altogether. Using the -T or -t command +line flags will cause a fatal error. Beware that both core tests as +well as many a CPAN distribution's tests will fail with this change. On +the upside, it provides a small performance benefit due to reduced +branching. + +B<Do not enable this unless you know exactly what you are getting yourself +into.> + +=item * + +C<pack> with constant arguments is now constant folded in most cases +[perl #113470]. + +=item * + +Speed up in regular expression matching against Unicode properties. The +largest gain is for C<\X>, the Unicode "extended grapheme cluster." The +gain for it is about 35% - 40%. Bracketed character classes, e.g., +C<[0-9\x{100}]> containing code points above 255 are also now faster. + +=item * + +On platforms supporting it, several former macros are now implemented as static +inline functions. This should speed things up slightly on non-GCC platforms. + +=item * + +The optimisation of hashes in boolean context has been extended to +affect C<scalar(%hash)>, C<%hash ? ... : ...>, and C<sub { %hash || ... }>. + +=item * + +Filetest operators manage the stack in a fractionally more efficient manner. + +=item * + +Globs used in a numeric context are now numified directly in most cases, +rather than being numified via stringification. 
+ +=item * + +The C<x> repetition operator is now folded to a single constant at compile +time if called in scalar context with constant operands and no parentheses +around the left operand. + +=back + +=head1 Modules and Pragmata + +=head2 New Modules and Pragmata + +=over 4 + +=item * + +L<Config::Perl::V> version 0.16 has been added as a dual-lifed module. +It provides structured data retrieval of C<perl -V> output including +information only known to the C<perl> binary and not available via L<Config>. + +=back + +=head2 Updated Modules and Pragmata + +For a complete list of updates, run: + + $ corelist --diff 5.16.0 5.18.0 + +You can substitute your favorite version in place of C<5.16.0>, too. + +=over + +=item * + +L<Archive::Extract> has been upgraded to 0.68. + +Work around an edge case on Linux with Busybox's unzip. + +=item * + +L<Archive::Tar> has been upgraded to 1.90. + +ptar now supports the -T option as well as dashless options +[rt.cpan.org #75473], [rt.cpan.org #75475]. + +Auto-encode filenames marked as UTF-8 [rt.cpan.org #75474]. + +Don't use C<tell> on L<IO::Zlib> handles [rt.cpan.org #64339]. + +Don't try to C<chown> on symlinks. + +=item * + +L<autodie> has been upgraded to 2.13. + +C<autodie> now plays nicely with the 'open' pragma. + +=item * + +L<B> has been upgraded to 1.42. + +The C<stashoff> method of COPs has been added. This provides access to an +internal field added in perl 5.16 under threaded builds [perl #113034]. + +C<B::COP::stashpv> now supports UTF-8 package names and embedded NULs. + +All C<CVf_*> and C<GVf_*> +and more SV-related flag values are now provided as constants in the C<B::> +namespace and available for export. The default export list has not changed. + +This makes the module work with the new pad API. + +=item * + +L<B::Concise> has been upgraded to 0.95. + +The C<-nobanner> option has been fixed, and C<format>s can now be dumped. 
+When passed a sub name to dump, it will check also to see whether it +is the name of a format. If a sub and a format share the same name, +it will dump both. + +This adds support for the new C<OPpMAYBE_TRUEBOOL> and C<OPpTRUEBOOL> flags. + +=item * + +L<B::Debug> has been upgraded to 1.18. + +This adds support (experimentally) for C<B::PADLIST>, which was +added in Perl 5.17.4. + +=item * + +L<B::Deparse> has been upgraded to 1.20. + +Avoid warning when run under C<perl -w>. + +It now deparses +loop controls with the correct precedence, and multiple statements in a +C<format> line are also now deparsed correctly. + +This release suppresses trailing semicolons in formats. + +This release adds stub deparsing for lexical subroutines. + +It no longer dies when deparsing C<sort> without arguments. It now +correctly omits the comma for C<system $prog @args> and C<exec $prog +@args>. + +=item * + +L<bignum>, L<bigint> and L<bigrat> have been upgraded to 0.33. + +The overrides for C<hex> and C<oct> have been rewritten, eliminating +several problems, and making one incompatible change: + +=over + +=item * + +Formerly, whichever of C<use bigint> or C<use bigrat> was compiled later +would take precedence over the other, causing C<hex> and C<oct> not to +respect the other pragma when in scope. + +=item * + +Using any of these three pragmata would cause C<hex> and C<oct> anywhere +else in the program to evaluate their arguments in list context and prevent +them from inferring $_ when called without arguments. + +=item * + +Using any of these three pragmata would make C<oct("1234")> return 1234 +(for any number not beginning with 0) anywhere in the program. Now "1234" +is translated from octal to decimal, whether within the pragma's scope or +not.
+ +=item * + +The global overrides that facilitate lexical use of C<hex> and C<oct> now +respect any existing overrides that were in place before the new overrides +were installed, falling back to them outside of the scope of C<use bignum>. + +=item * + +C<use bignum "hex">, C<use bignum "oct"> and similar invocations for bigint +and bigrat now export a C<hex> or C<oct> function, instead of providing a +global override. + +=back + +=item * + +L<Carp> has been upgraded to 1.29. + +Carp is no longer confused when C<caller> returns undef for a package that +has been deleted. + +The C<longmess()> and C<shortmess()> functions are now documented. + +=item * + +L<CGI> has been upgraded to 3.63. + +Unrecognized HTML escape sequences are now handled better, problematic +trailing newlines are no longer inserted after E<lt>formE<gt> tags +by C<startform()> or C<start_form()>, and bogus "Insecure Dependency" +warnings appearing with some versions of perl are now worked around. + +=item * + +L<Class::Struct> has been upgraded to 0.64. + +The constructor now respects overridden accessor methods [perl #29230]. + +=item * + +L<Compress::Raw::Bzip2> has been upgraded to 2.060. + +The misuse of Perl's "magic" API has been fixed. + +=item * + +L<Compress::Raw::Zlib> has been upgraded to 2.060. + +Upgrade bundled zlib to version 1.2.7. + +Fix build failures on Irix, Solaris, and Win32, and also when building as C++ +[rt.cpan.org #69985], [rt.cpan.org #77030], [rt.cpan.org #75222]. + +The misuse of Perl's "magic" API has been fixed. + +C<compress()>, C<uncompress()>, C<memGzip()> and C<memGunzip()> have +been speeded up by making parameter validation more efficient. + +=item * + +L<CPAN::Meta::Requirements> has been upgraded to 2.122. + +Treat undef requirements to C<from_string_hash> as 0 (with a warning). + +Added C<requirements_for_module> method. + +=item * + +L<CPANPLUS> has been upgraded to 0.9135. + +Allow adding F<blib/script> to PATH. 
+ +Save the history between invocations of the shell. + +Handle multiple C<makemakerargs> and C<makeflags> arguments better. + +This resolves issues with the SQLite source engine. + +=item * + +L<Data::Dumper> has been upgraded to 2.145. + +It has been optimized to only build a seen-scalar hash as necessary, +thereby speeding up serialization drastically. + +Additional tests were added in order to improve statement, branch, condition +and subroutine coverage. On the basis of the coverage analysis, some of the +internals of Dumper.pm were refactored. Almost all methods are now +documented. + +=item * + +L<DB_File> has been upgraded to 1.827. + +The main Perl module no longer uses the C<"@_"> construct. + +=item * + +L<Devel::Peek> has been upgraded to 1.11. + +This fixes compilation with C++ compilers and makes the module work with +the new pad API. + +=item * + +L<Digest::MD5> has been upgraded to 2.52. + +Fix C<Digest::Perl::MD5> OO fallback [rt.cpan.org #66634]. + +=item * + +L<Digest::SHA> has been upgraded to 5.84. + +This fixes a double-free bug, which might have caused vulnerabilities +in some cases. + +=item * + +L<DynaLoader> has been upgraded to 1.18. + +This is due to a minor code change in the XS for the VMS implementation. + +This fixes warnings about using C<CODE> sections without an C<OUTPUT> +section. + +=item * + +L<Encode> has been upgraded to 2.49. + +The Mac alias x-mac-ce has been added, and various bugs have been fixed +in Encode::Unicode, Encode::UTF7 and Encode::GSM0338. + +=item * + +L<Env> has been upgraded to 1.04. + +Its SPLICE implementation no longer misbehaves in list context. + +=item * + +L<ExtUtils::CBuilder> has been upgraded to 0.280210. + +Manifest files are now correctly embedded for those versions of VC++ which +make use of them. [perl #111782, #111798]. + +A list of symbols to export can now be passed to C<link()> when on +Windows, as on other OSes [perl #115100]. + +=item * + +L<ExtUtils::ParseXS> has been upgraded to 3.18. 
+ +The generated C code now avoids unnecessarily incrementing +C<PL_amagic_generation> on Perl versions where it's done automatically +(or on current Perl where the variable no longer exists). + +This avoids a bogus warning for initialised XSUB non-parameters [perl +#112776]. + +=item * + +L<File::Copy> has been upgraded to 2.26. + +C<copy()> no longer zeros files when copying into the same directory, +and also now fails (as it has long been documented to do) when attempting +to copy a file over itself. + +=item * + +L<File::DosGlob> has been upgraded to 1.10. + +The internal cache of file names that it keeps for each caller is now +freed when that caller is freed. This means +C<< use File::DosGlob 'glob'; eval 'scalar <*>' >> no longer leaks memory. + +=item * + +L<File::Fetch> has been upgraded to 0.38. + +Added the 'file_default' option for URLs that do not have a file +component. + +Use C<File::HomeDir> when available, and provide C<PERL5_CPANPLUS_HOME> to +override the autodetection. + +Always re-fetch F<CHECKSUMS> if C<fetchdir> is set. + +=item * + +L<File::Find> has been upgraded to 1.23. + +This fixes inconsistent unixy path handling on VMS. + +Individual files may now appear in list of directories to be searched +[perl #59750]. + +=item * + +L<File::Glob> has been upgraded to 1.20. + +File::Glob has had exactly the same fix as File::DosGlob. Since it is +what Perl's own C<glob> operator itself uses (except on VMS), this means +C<< eval 'scalar <*>' >> no longer leaks. + +A space-separated list of patterns that return long lists of results no longer +results in memory corruption or crashes. This bug was introduced in +Perl 5.16.0. [perl #114984] + +=item * + +L<File::Spec::Unix> has been upgraded to 3.40. + +C<abs2rel> could produce incorrect results when given two relative paths or +the root directory twice [perl #111510]. + +=item * + +L<File::stat> has been upgraded to 1.07.
+ +C<File::stat> ignores the L<filetest> pragma, and warns when used in +combination therewith. But it was not warning for C<-r>. This has been +fixed [perl #111640]. + +C<-p> now works, and does not return false for pipes [perl #111638]. + +Previously C<File::stat>'s overloaded C<-x> and C<-X> operators did not give +the correct results for directories or executable files when running as +root. They had been treating executable permissions for root just like for +any other user, performing group membership tests I<etc> for files not owned +by root. They now follow the correct Unix behaviour - for a directory they +are always true, and for a file if any of the three execute permission bits +are set then they report that root can execute the file. Perl's builtin +C<-x> and C<-X> operators have always been correct. + +=item * + +L<File::Temp> has been upgraded to 0.23. + +Fixes various bugs involving directory removal. Defers unlinking tempfiles if +the initial unlink fails, which fixes problems on NFS. + +=item * + +L<GDBM_File> has been upgraded to 1.15. + +The undocumented optional fifth parameter to C<TIEHASH> has been +removed. This was intended to provide control of the callback used by +C<gdbm*> functions in case of fatal errors (such as filesystem problems), +but did not work (and could never have worked). No code on CPAN even +attempted to use it. The callback is now always the previous default, +C<croak>. Problems on some platforms with how the C<C> C<croak> function +is called have also been resolved. + +=item * + +L<Hash::Util> has been upgraded to 0.15. + +C<hash_unlocked> and C<hashref_unlocked> now return true if the hash is +unlocked, instead of always returning false [perl #112126]. + +C<hash_unlocked>, C<hashref_unlocked>, C<lock_hash_recurse> and +C<unlock_hash_recurse> are now exportable [perl #112126]. + +Two new functions, C<hash_locked> and C<hashref_locked>, have been added.
+Oddly enough, these two functions were already exported, even though they +did not exist [perl #112126]. + +=item * + +L<HTTP::Tiny> has been upgraded to 0.025. + +Add SSL verification features [github #6], [github #9]. + +Include the final URL in the response hashref. + +Add C<local_address> option. + +This improves SSL support. + +=item * + +L<IO> has been upgraded to 1.28. + +C<sync()> can now be called on read-only file handles [perl #64772]. + +L<IO::Socket> tries harder to cache or otherwise fetch socket +information. + +=item * + +L<IPC::Cmd> has been upgraded to 0.80. + +Use C<POSIX::_exit> instead of C<exit> in C<run_forked> [rt.cpan.org #76901]. + +=item * + +L<IPC::Open3> has been upgraded to 1.13. + +The C<open3()> function no longer uses C<POSIX::close()> to close file +descriptors since that breaks the ref-counting of file descriptors done by +PerlIO in cases where the file descriptors are shared by PerlIO streams, +leading to attempts to close the file descriptors a second time when +any such PerlIO streams are closed later on. + +=item * + +L<Locale::Codes> has been upgraded to 3.25. + +It includes some new codes. + +=item * + +L<Memoize> has been upgraded to 1.03. + +Fix the C<MERGE> cache option. + +=item * + +L<Module::Build> has been upgraded to 0.4003. + +Fixed bug where modules without C<$VERSION> might have a version of '0' listed +in 'provides' metadata, which will be rejected by PAUSE. + +Fixed bug in PodParser to allow numerals in module names. + +Fixed bug where giving arguments twice led to them becoming arrays, resulting +in install paths like F<ARRAY(0xdeadbeef)/lib/Foo.pm>. + +A minor bug fix allows markup to be used around the leading "Name" in +a POD "abstract" line, and some documentation improvements have been made. + +=item * + +L<Module::CoreList> has been upgraded to 2.90 + +Version information is now stored as a delta, which greatly reduces the +size of the F<CoreList.pm> file. 
+ +This restores compatibility with older versions of perl and cleans up +the corelist data for various modules. + +=item * + +L<Module::Load::Conditional> has been upgraded to 0.54. + +Fix use of C<requires> on perls installed to a path with spaces. + +Various enhancements include the new use of Module::Metadata. + +=item * + +L<Module::Metadata> has been upgraded to 1.000011. + +The creation of a Module::Metadata object for a typical module file has +been sped up by about 40%, and some spurious warnings about C<$VERSION>s +have been suppressed. + +=item * + +L<Module::Pluggable> has been upgraded to 4.7. + +Amongst other changes, triggers are now allowed on events, which gives +a powerful way to modify behaviour. + +=item * + +L<Net::Ping> has been upgraded to 2.41. + +This fixes some test failures on Windows. + +=item * + +L<Opcode> has been upgraded to 1.25. + +Reflect the removal of the boolkeys opcode and the addition of the +clonecv, introcv and padcv opcodes. + +=item * + +L<overload> has been upgraded to 1.22. + +C<no overload> now warns for invalid arguments, just like C<use overload>. + +=item * + +L<PerlIO::encoding> has been upgraded to 0.16. + +This is the module implementing the ":encoding(...)" I/O layer. It no +longer corrupts memory or crashes when the encoding back-end reallocates +the buffer or gives it a typeglob or shared hash key scalar. + +=item * + +L<PerlIO::scalar> has been upgraded to 0.16. + +The buffer scalar supplied may now only contain code points 0xFF or +lower. [perl #109828] + +=item * + +L<Perl::OSType> has been upgraded to 1.003. + +This fixes a bug detecting the VOS operating system. + +=item * + +L<Pod::Html> has been upgraded to 1.18. + +The option C<--libpods> has been reinstated. It is deprecated, and its use +does nothing other than issue a warning that it is no longer supported. + +Since the HTML files generated by pod2html claim to have a UTF-8 charset, +actually write the files out using UTF-8 [perl #111446].
+ +=item * + +L<Pod::Simple> has been upgraded to 3.28. + +Numerous improvements have been made, mostly to Pod::Simple::XHTML, +which also has a compatibility change: the C<codes_in_verbatim> option +is now disabled by default. See F<cpan/Pod-Simple/ChangeLog> for the +full details. + +=item * + +L<re> has been upgraded to 0.23 + +Single character [class]es like C</[s]/> or C</[s]/i> are now optimized +as if they did not have the brackets, i.e. C</s/> or C</s/i>. + +See note about C<op_comp> in the L</Internal Changes> section below. + +=item * + +L<Safe> has been upgraded to 2.35. + +Fix interactions with C<Devel::Cover>. + +Don't eval code under C<no strict>. + +=item * + +L<Scalar::Util> has been upgraded to version 1.27. + +Fix an overloading issue with C<sum>. + +C<first> and C<reduce> now check the callback first (so C<&first(1)> is +disallowed). + +Fix C<tainted> on magical values [rt.cpan.org #55763]. + +Fix C<sum> on previously magical values [rt.cpan.org #61118]. + +Fix reading past the end of a fixed buffer [rt.cpan.org #72700]. + +=item * + +L<Search::Dict> has been upgraded to 1.07. + +No longer require C<stat> on filehandles. + +Use C<fc> for casefolding. + +=item * + +L<Socket> has been upgraded to 2.009. + +Constants and functions required for IP multicast source group membership +have been added. + +C<unpack_sockaddr_in()> and C<unpack_sockaddr_in6()> now return just the IP +address in scalar context, and C<inet_ntop()> now guards against incorrect +length scalars being passed in. + +This fixes an uninitialized memory read. + +=item * + +L<Storable> has been upgraded to 2.41. + +Modifying C<$_[0]> within C<STORABLE_freeze> no longer results in crashes +[perl #112358]. + +An object whose class implements C<STORABLE_attach> is now thawed only once +when there are multiple references to it in the structure being thawed +[perl #111918]. + +Restricted hashes were not always thawed correctly [perl #73972]. 
+ +Storable would croak when freezing a blessed REF object with a +C<STORABLE_freeze()> method [perl #113880]. + +It can now freeze and thaw vstrings correctly. This causes a slightly +incompatible change in the storage format, so the format version has +increased to 2.9. + +This contains various bugfixes, including compatibility fixes for older +versions of Perl and vstring handling. + +=item * + +L<Sys::Syslog> has been upgraded to 0.32. + +This contains several bug fixes relating to C<getservbyname()>, +C<setlogsock()> and log levels in C<syslog()>, together with fixes for +Windows, Haiku-OS and GNU/kFreeBSD. See F<cpan/Sys-Syslog/Changes> +for the full details. + +=item * + +L<Term::ANSIColor> has been upgraded to 4.02. + +Add support for italics. + +Improve error handling. + +=item * + +L<Term::ReadLine> has been upgraded to 1.10. This fixes the +use of the B<cpan> and B<cpanp> shells on Windows in the event that the current +drive happens to contain a F<\dev\tty> file. + +=item * + +L<Test::Harness> has been upgraded to 3.26. + +Fix glob semantics on Win32 [rt.cpan.org #49732]. + +Don't use C<Win32::GetShortPathName> when calling perl [rt.cpan.org #47890]. + +Ignore -T when reading shebang [rt.cpan.org #64404]. + +Handle the case where we don't know the wait status of the test more +gracefully. + +Make the test summary 'ok' line overridable so that it can be changed to a +plugin to make the output of prove idempotent. + +Don't run world-writable files. + +=item * + +L<Text::Tabs> and L<Text::Wrap> have been upgraded to +2012.0818. Support for Unicode combining characters has been added to them +both. + +=item * + +L<threads::shared> has been upgraded to 1.31. + +This adds the option to warn about or ignore attempts to clone structures +that can't be cloned, as opposed to just unconditionally dying in +that case. + +This adds support for dual-valued values as created by +L<Scalar::Util::dualvar|Scalar::Util/"dualvar NUM, STRING">.
+ +=item * + +L<Tie::StdHandle> has been upgraded to 4.3. + +C<READ> now respects the offset argument to C<read> [perl #112826]. + +=item * + +L<Time::Local> has been upgraded to 1.2300. + +Seconds values greater than 59 but less than 60 no longer cause +C<timegm()> and C<timelocal()> to croak. + +=item * + +L<Unicode::UCD> has been upgraded to 0.53. + +This adds a function L<all_casefolds()|Unicode::UCD/all_casefolds()> +that returns all the casefolds. + +=item * + +L<Win32> has been upgraded to 0.47. + +New APIs have been added for getting and setting the current code page. + +=back + + +=head2 Removed Modules and Pragmata + +=over + +=item * + +L<Version::Requirements> has been removed from the core distribution. It is +available under a different name: L<CPAN::Meta::Requirements>. + +=back + +=head1 Documentation + +=head2 Changes to Existing Documentation + +=head3 L<perlcheat> + +=over 4 + +=item * + +L<perlcheat> has been reorganized, and a few new sections were added. + +=back + +=head3 L<perldata> + +=over 4 + +=item * + +Now explicitly documents the behaviour of hash initializer lists that +contain duplicate keys. + +=back + +=head3 L<perldiag> + +=over 4 + +=item * + +The explanation of symbolic references being prevented by "strict refs" +now doesn't assume that the reader knows what symbolic references are. + +=back + +=head3 L<perlfaq> + +=over 4 + +=item * + +L<perlfaq> has been synchronized with version 5.0150040 from CPAN. + +=back + +=head3 L<perlfunc> + +=over 4 + +=item * + +The return value of C<pipe> is now documented. + +=item * + +Clarified documentation of C<our>. + +=back + +=head3 L<perlop> + +=over 4 + +=item * + +Loop control verbs (C<dump>, C<goto>, C<next>, C<last> and C<redo>) have always +had the same precedence as assignment operators, but this was not documented +until now. + +=back + +=head3 Diagnostics + +The following additions or changes have been made to diagnostic output, +including warnings and fatal error messages. 
For the complete list of +diagnostic messages, see L<perldiag>. + +=head2 New Diagnostics + +=head3 New Errors + +=over 4 + +=item * + +L<Unterminated delimiter for here document|perldiag/"Unterminated delimiter for here document"> + +This message now occurs when a here document label has an initial quotation +mark but the final quotation mark is missing. + +This replaces a bogus and misleading error message about not finding the label +itself [perl #114104]. + +=item * + +L<panic: child pseudo-process was never scheduled|perldiag/"panic: child pseudo-process was never scheduled"> + +This error is thrown when a child pseudo-process in the ithreads implementation +on Windows was not scheduled within the time period allowed and therefore was +not able to initialize properly [perl #88840]. + +=item * + +L<Group name must start with a non-digit word character in regex; marked by <-- HERE in mE<sol>%sE<sol>|perldiag/"Group name must start with a non-digit word character in regex; marked by <-- HERE in m/%s/"> + +This error has been added for C<(?&0)>, which is invalid. It used to +produce an incomprehensible error message [perl #101666]. + +=item * + +L<Can't use an undefined value as a subroutine reference|perldiag/"Can't use an undefined value as %s reference"> + +Calling an undefined value as a subroutine now produces this error message. +It used to, but was accidentally disabled, first in Perl 5.004 for +non-magical variables, and then in Perl v5.14 for magical (e.g., tied) +variables. It has now been restored. In the mean time, undef was treated +as an empty string [perl #113576]. + +=item * + +L<Experimental "%s" subs not enabled|perldiag/"Experimental "%s" subs not enabled"> + +To use lexical subs, you must first enable them: + + no warnings 'experimental::lexical_subs'; + use feature 'lexical_subs'; + my sub foo { ... 
} + +=back + +=head3 New Warnings + +=over 4 + +=item * + +L<'Strings with code points over 0xFF may not be mapped into in-memory file handles'|perldiag/"Strings with code points over 0xFF may not be mapped into in-memory file handles"> + +=item * + +L<'%s' resolved to '\o{%s}%d'|perldiag/"'%s' resolved to '\o{%s}%d'"> + +=item * + +L<'Trailing white-space in a charnames alias definition is deprecated'|perldiag/"Trailing white-space in a charnames alias definition is deprecated"> + +=item * + +L<'A sequence of multiple spaces in a charnames alias definition is deprecated'|perldiag/"A sequence of multiple spaces in a charnames alias definition is deprecated"> + +=item * + +L<'Passing malformed UTF-8 to "%s" is deprecated'|perldiag/"Passing malformed UTF-8 to "%s" is deprecated"> + +=item * + +L<Subroutine "&%s" is not available|perldiag/"Subroutine "&%s" is not available"> + +(W closure) During compilation, an inner named subroutine or eval is +attempting to capture an outer lexical subroutine that is not currently +available. This can happen for one of two reasons. First, the lexical +subroutine may be declared in an outer anonymous subroutine that has not +yet been created. (Remember that named subs are created at compile time, +while anonymous subs are created at run-time.) For example, + + sub { my sub a {...} sub f { \&a } } + +At the time that f is created, it can't capture the current "a" sub, +since the anonymous subroutine hasn't been created yet. Conversely, the +following won't give a warning since the anonymous subroutine has by now +been created and is live: + + sub { my sub a {...} eval 'sub f { \&a }' }->(); + +The second situation is caused by an eval accessing a variable that has +gone out of scope, for example, + + sub f { + my sub a {...} + sub { eval '\&a' } + } + f()->(); + +Here, when the '\&a' in the eval is being compiled, f() is not currently +being executed, so its &a is not available for capture.
+ +=item * + +L<"%s" subroutine &%s masks earlier declaration in same %s|perldiag/"%s" subroutine &%s masks earlier declaration in same %s> + +(W misc) A "my" or "state" subroutine has been redeclared in the +current scope or statement, effectively eliminating all access to +the previous instance. This is almost always a typographical error. +Note that the earlier subroutine will still exist until the end of +the scope or until all closure references to it are destroyed. + +=item * + +L<The %s feature is experimental|perldiag/"The %s feature is experimental"> + +(S experimental) This warning is emitted if you enable an experimental +feature via C<use feature>. Simply suppress the warning if you want +to use the feature, but know that in doing so you are taking the risk +of using an experimental feature which may change or be removed in a +future Perl version: + + no warnings "experimental::lexical_subs"; + use feature "lexical_subs"; + +=item * + +L<sleep(%u) too large|perldiag/"sleep(%u) too large"> + +(W overflow) You called C<sleep> with a number that was larger than it can +reliably handle and C<sleep> probably slept for less time than requested. + +=item * + +L<Wide character in setenv|perldiag/"Wide character in %s"> + +Attempts to put wide characters into environment variables via C<%ENV> now +provoke this warning. + +=item * + +"L<Invalid negative number (%s) in chr|perldiag/"Invalid negative number (%s) in chr">" + +C<chr()> now warns when passed a negative value [perl #83048]. + +=item * + +"L<Integer overflow in srand|perldiag/"Integer overflow in srand">" + +C<srand()> now warns when passed a value that doesn't fit in a C<UV> (since the +value will be truncated rather than overflowing) [perl #40605]. 
+ +=item * + +"L<-i used with no filenames on the command line, reading from STDIN|perldiag/"-i used with no filenames on the command line, reading from STDIN">" + +Running perl with the C<-i> flag now warns if no input files are provided on +the command line [perl #113410]. + +=back + +=head2 Changes to Existing Diagnostics + +=over 4 + +=item * + +L<$* is no longer supported|perldiag/"$* is no longer supported"> + +The warning that use of C<$*> and C<$#> is no longer supported is now +generated for every location that references them. Previously it would fail +to be generated if another variable using the same typeglob was seen first +(e.g. C<@*> before C<$*>), and would not be generated for the second and +subsequent uses. (It's hard to fix the failure to generate warnings at all +without also generating them every time, and warning every time is +consistent with the warnings that C<$[> used to generate.) + +=item * + +The warnings for C<\b{> and C<\B{> were added. They are a deprecation +warning which should be turned off by that category. One should not +have to turn off regular regexp warnings as well to get rid of these. + +=item * + +L<Constant(%s): Call to &{$^H{%s}} did not return a defined value|perldiag/Constant(%s): Call to &{$^H{%s}} did not return a defined value> + +Constant overloading that returns C<undef> results in this error message. +For numeric constants, it used to say "Constant(undef)". "undef" has been +replaced with the number itself. + +=item * + +The error produced when a module cannot be loaded now includes a hint that +the module may need to be installed: "Can't locate hopping.pm in @INC (you +may need to install the hopping module) (@INC contains: ...)" + +=item * + +L<vector argument not supported with alpha versions|perldiag/vector argument not supported with alpha versions> + +This warning was not suppressible, even with C<no warnings>.
Now it is +suppressible, and has been moved from the "internal" category to the +"printf" category. + +=item * + +C<< Can't do {n,m} with n > m in regex; marked by <-- HERE in m/%s/ >> + +This fatal error has been turned into a warning that reads: + +L<< Quantifier {n,m} with n > m can't match in regex | perldiag/Quantifier {n,m} with n > m can't match in regex >> + +(W regexp) Minima should be less than or equal to maxima. If you really want +your regexp to match something 0 times, just put {0}. + +=item * + +The "Runaway prototype" warning that occurs in bizarre cases has been +removed as being unhelpful and inconsistent. + +=item * + +The "Not a format reference" error has been removed, as the only case in +which it could be triggered was a bug. + +=item * + +The "Unable to create sub named %s" error has been removed for the same +reason. + +=item * + +The 'Can't use "my %s" in sort comparison' error has been downgraded to a +warning, '"my %s" used in sort comparison' (with 'state' instead of 'my' +for state variables). In addition, the heuristics for guessing whether +lexical $a or $b has been misused have been improved to generate fewer +false positives. Lexical $a and $b are no longer disallowed if they are +outside the sort block. Also, a named unary or list operator inside the +sort block no longer causes the $a or $b to be ignored [perl #86136]. + +=back + +=head1 Utility Changes + +=head3 L<h2xs> + +=over 4 + +=item * + +F<h2xs> no longer produces invalid code for empty defines. [perl #20636] + +=back + +=head1 Configuration and Compilation + +=over 4 + +=item * + +Added C<useversionedarchname> option to Configure + +When set, it includes 'api_versionstring' in 'archname'. E.g. +x86_64-linux-5.13.6-thread-multi. It is unset by default. + +This feature was requested by Tim Bunce, who observed that +C<INSTALL_BASE> creates a library structure that does not +differentiate by perl version. 
Instead, it places architecture +specific files in "$install_base/lib/perl5/$archname". This makes +it difficult to use a common C<INSTALL_BASE> library path with +multiple versions of perl. + +By setting C<-Duseversionedarchname>, the $archname will be +distinct for architecture I<and> API version, allowing mixed use of +C<INSTALL_BASE>. + +=item * + +Add a C<PERL_NO_INLINE_FUNCTIONS> option + +If C<PERL_NO_INLINE_FUNCTIONS> is defined, don't include "inline.h" + +This permits test code to include the perl headers for definitions without +creating a link dependency on the perl library (which may not exist yet). + +=item * + +Configure will honour the external C<MAILDOMAIN> environment variable, if set. + +=item * + +C<installman> no longer ignores the silent option + +=item * + +Both C<META.yml> and C<META.json> files are now included in the distribution. + +=item * + +F<Configure> will now correctly detect C<isblank()> when compiling with a C++ +compiler. + +=item * + +The pager detection in F<Configure> has been improved to allow responses which +specify options after the program name, e.g. B</usr/bin/less -R>, if the user +accepts the default value. This helps B<perldoc> when handling ANSI escapes +[perl #72156]. + +=back + +=head1 Testing + +=over 4 + +=item * + +The test suite now has a section for tests that require very large amounts +of memory. These tests won't run by default; they can be enabled by +setting the C<PERL_TEST_MEMORY> environment variable to the number of +gibibytes of memory that may be safely used. + +=back + +=head1 Platform Support + +=head2 Discontinued Platforms + +=over 4 + +=item BeOS + +BeOS was an operating system for personal computers developed by Be Inc, +initially for their BeBox hardware. The OS Haiku was written as an open +source replacement for/continuation of BeOS, and its perl port is current and +actively maintained. + +=item UTS Global + +Support code relating to UTS global has been removed. 
UTS was a mainframe +version of System V created by Amdahl, subsequently sold to UTS Global. The +port has not been touched since before Perl v5.8.0, and UTS Global is now +defunct. + +=item VM/ESA + +Support for VM/ESA has been removed. The port was tested on 2.3.0, which +IBM ended service on in March 2002. 2.4.0 ended service in June 2003, and +was superseded by Z/VM. The current version of Z/VM is V6.2.0, and scheduled +for end of service on 2015/04/30. + +=item MPE/IX + +Support for MPE/IX has been removed. + +=item EPOC + +Support code relating to EPOC has been removed. EPOC was a family of +operating systems developed by Psion for mobile devices. It was the +predecessor of Symbian. The port was last updated in April 2002. + +=item Rhapsody + +Support for Rhapsody has been removed. + +=back + +=head2 Platform-Specific Notes + +=head3 AIX + +Configure now always adds C<-qlanglvl=extc99> to the CC flags on AIX when +using xlC. This will make it easier to compile a number of XS-based modules +that assume C99 [perl #113778]. + +=head3 clang++ + +There is now a workaround for a compiler bug that prevented compiling +with clang++ since Perl v5.15.7 [perl #112786]. + +=head3 C++ + +When compiling the Perl core as C++ (which is only semi-supported), the +mathom functions are now compiled as C<extern "C">, to ensure proper +binary compatibility. (However, binary compatibility isn't generally +guaranteed anyway in the situations where this would matter.) + +=head3 Darwin + +Stop hardcoding an alignment on 8 byte boundaries to fix builds using +-Dusemorebits. + +=head3 Haiku + +Perl should now work out of the box on Haiku R1 Alpha 4. + +=head3 MidnightBSD + +C<libc_r> was removed from recent versions of MidnightBSD and older versions +work better with C<pthread>. Threading is now enabled using C<pthread> which +corrects build errors with threading enabled on 0.4-CURRENT. + +=head3 Solaris + +In Configure, avoid running sed commands with flags not supported on Solaris. 
+ +=head3 VMS + +=over + +=item * + +Where possible, the case of filenames and command-line arguments is now +preserved by enabling the CRTL features C<DECC$EFS_CASE_PRESERVE> and +C<DECC$ARGV_PARSE_STYLE> at start-up time. The latter only takes effect +when extended parse is enabled in the process from which Perl is run. + +=item * + +The character set for Extended Filename Syntax (EFS) is now enabled by default +on VMS. Among other things, this provides better handling of dots in directory +names, multiple dots in filenames, and spaces in filenames. To obtain the old +behavior, set the logical name C<DECC$EFS_CHARSET> to C<DISABLE>. + +=item * + +Fixed linking on builds configured with C<-Dusemymalloc=y>. + +=item * + +Experimental support for building Perl with the HP C++ compiler is available +by configuring with C<-Dusecxx>. + +=item * + +All C header files from the top-level directory of the distribution are now +installed on VMS, providing consistency with a long-standing practice on other +platforms. Previously only a subset were installed, which broke non-core +extension builds for extensions that depended on the missing include files. + +=item * + +Quotes are now removed from the command verb (but not the parameters) for +commands spawned via C<system>, backticks, or a piped C<open>. Previously, +quotes on the verb were passed through to DCL, which would fail to recognize +the command. Also, if the verb is actually a path to an image or command +procedure on an ODS-5 volume, quoting it now allows the path to contain spaces. + +=item * + +The B<a2p> build has been fixed for the HP C++ compiler on OpenVMS. + +=back + +=head3 Win32 + +=over + +=item * + +Perl can now be built using Microsoft's Visual C++ 2012 compiler by specifying +CCTYPE=MSVC110 (or MSVC110FREE if you are using the free Express edition for +Windows Desktop) in F<win32/Makefile>. + +=item * + +The option to build without C<USE_SOCKETS_AS_HANDLES> has been removed. 
+ +=item * + +Fixed a problem where perl could crash while cleaning up threads (including the +main thread) in threaded debugging builds on Win32 and possibly other platforms +[perl #114496]. + +=item * + +A rare race condition that would lead to L<sleep|perlfunc/sleep> taking more +time than requested, and possibly even hanging, has been fixed [perl #33096]. + +=item * + +C<link> on Win32 now attempts to set C<$!> to more appropriate values +based on the Win32 API error code. [perl #112272] + +Perl no longer mangles the environment block, e.g. when launching a new +sub-process, when the environment contains non-ASCII characters. Known +problems still remain, however, when the environment contains characters +outside of the current ANSI codepage (e.g. see the item about Unicode in +C<%ENV> in L<http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod>). +[perl #113536] + +=item * + +Building perl with some Windows compilers used to fail due to a problem +with miniperl's C<glob> operator (which uses the C<perlglob> program) +deleting the PATH environment variable [perl #113798]. + +=item * + +A new makefile option, C<USE_64_BIT_INT>, has been added to the Windows +makefiles. Set this to "define" when building a 32-bit perl if you want +it to use 64-bit integers. + +Machine code size reductions, already made to the DLLs of XS modules in +Perl v5.17.2, have now been extended to the perl DLL itself. + +Building with VC++ 6.0 was inadvertently broken in Perl v5.17.2 but has +now been fixed again. + +=back + +=head3 WinCE + +Building on WinCE is now possible once again, although more work is required +to fully restore a clean build. + +=head1 Internal Changes + +=over + +=item * + +Synonyms for the misleadingly named C<av_len()> have been created: +C<av_top_index()> and C<av_tindex>. All three of these return the +number of the highest index in the array, not the number of elements it +contains. + +=item * + +SvUPGRADE() is no longer an expression. 
Originally this macro (and its +underlying function, sv_upgrade()) were documented as boolean, although +in reality they always croaked on error and never returned false. In 2005 +the documentation was updated to specify a void return value, but +SvUPGRADE() was left always returning 1 for backwards compatibility. This +has now been removed, and SvUPGRADE() is now a statement with no return +value. + +So this is now a syntax error: + + if (!SvUPGRADE(sv)) { croak(...); } + +If you have code like that, simply replace it with + + SvUPGRADE(sv); + +or to avoid compiler warnings with older perls, possibly + + (void)SvUPGRADE(sv); + +=item * + +Perl has a new copy-on-write mechanism that allows any SvPOK scalar to be +upgraded to a copy-on-write scalar. A reference count on the string buffer +is stored in the string buffer itself. This feature is B<not enabled by +default>. + +It can be enabled in a perl build by running F<Configure> with +B<-Accflags=-DPERL_NEW_COPY_ON_WRITE>, and we would encourage XS authors +to try their code with such an enabled perl, and provide feedback. +Unfortunately, there is not yet a good guide to updating XS code to cope +with COW. Until such a document is available, consult the perl5-porters +mailing list. + +It breaks a few XS modules by allowing copy-on-write scalars to go +through code paths that never encountered them before. + +=item * + +Copy-on-write no longer uses the SvFAKE and SvREADONLY flags. Hence, +SvREADONLY indicates a true read-only SV. + +Use the SvIsCOW macro (as before) to identify a copy-on-write scalar. + +=item * + +C<PL_glob_index> is gone. + +=item * + +The private Perl_croak_no_modify has had its context parameter removed. It +now has a void prototype. Users of the public API croak_no_modify remain +unaffected. + +=item * + +Copy-on-write (shared hash key) scalars are no longer marked read-only. +C<SvREADONLY> returns false on such an SV, but C<SvIsCOW> still returns +true.
+ +=item * + +A new op type, C<OP_PADRANGE>, has been introduced. The perl peephole +optimiser will, where possible, substitute a single padrange op for a +pushmark followed by one or more pad ops, and possibly also skipping list +and nextstate ops. In addition, the op can carry out the tasks associated +with the RHS of a C<< my(...) = @_ >> assignment, so those ops may be optimised +away too. + +=item * + +Case-insensitive matching inside a [bracketed] character class with a +multi-character fold no longer excludes one of the possibilities in the +circumstances that it used to. [perl #89774]. + +=item * + +C<PL_formfeed> has been removed. + +=item * + +The regular expression engine no longer reads one byte past the end of the +target string. While for all internally well-formed scalars this should +never have been a problem, this change facilitates clever tricks with +string buffers in CPAN modules. [perl #73542] + +=item * + +Inside a BEGIN block, C<PL_compcv> now points to the currently-compiling +subroutine, rather than the BEGIN block itself. + +=item * + +C<mg_length> has been deprecated. + +=item * + +C<sv_len> now always returns a byte count and C<sv_len_utf8> a character +count. Previously, C<sv_len> and C<sv_len_utf8> were both buggy and would +sometimes return bytes and sometimes characters. C<sv_len_utf8> no longer +assumes that its argument is in UTF-8. Neither of these creates UTF-8 caches +for tied or overloaded values or for non-PVs any more. + +=item * + +C<sv_mortalcopy> now copies string buffers of shared hash key scalars when +called from XS modules [perl #79824]. + +=item * + +C<RXf_SPLIT> and C<RXf_SKIPWHITE> are no longer used. They are now +#defined as 0. + +=item * + +The new C<RXf_MODIFIES_VARS> flag can be set by custom regular expression +engines to indicate that the execution of the regular expression may cause +variables to be modified. This lets C<s///> know to skip certain +optimisations.
Perl's own regular expression engine sets this flag for the +special backtracking verbs that set $REGMARK and $REGERROR. + +=item * + +The APIs for accessing lexical pads have changed considerably. + +C<PADLIST>s are no longer C<AV>s, but their own type instead. +C<PADLIST>s now contain a C<PAD> and a C<PADNAMELIST> of C<PADNAME>s, +rather than C<AV>s for the pad and the list of pad names. C<PAD>s, +C<PADNAMELIST>s, and C<PADNAME>s are to be accessed as such through the +newly added pad API instead of the plain C<AV> and C<SV> APIs. See +L<perlapi> for details. + +=item * + +In the regex API, the numbered capture callbacks are passed an index +indicating what match variable is being accessed. There are special +index values for the C<$`, $&, $'> variables. Previously the same three +values were used to retrieve C<${^PREMATCH}, ${^MATCH}, ${^POSTMATCH}> +too, but these have now been assigned three separate values. See +L<perlreapi/Numbered capture callbacks>. + +=item * + +C<PL_sawampersand> was previously a boolean indicating that any of +C<$`, $&, $'> had been seen; it now contains three one-bit flags +indicating the presence of each of the variables individually. + +=item * + +The C<CV *> typemap entry now supports C<&{}> overloading and typeglobs, +just like C<&{...}> [perl #96872]. + +=item * + +The C<SVf_AMAGIC> flag to indicate overloading is now on the stash, not the +object. It is now set automatically whenever a method or @ISA changes, so +its meaning has changed, too. It now means "potentially overloaded". When +the overload table is calculated, the flag is automatically turned off if +there is no overloading, so there should be no noticeable slowdown. + +The staleness of the overload tables is now checked when overload methods +are invoked, rather than during C<bless>. + +"A" magic is gone. The changes to the handling of the C<SVf_AMAGIC> flag +eliminate the need for it. + +C<PL_amagic_generation> has been removed as no longer necessary.
For XS +modules, it is now a macro alias to C<PL_na>. + +The fallback overload setting is now stored in a stash entry separate from +overloadedness itself. + +=item * + +The character-processing code has been cleaned up in places. The changes +should be operationally invisible. + +=item * + +The C<study> function was made a no-op in v5.16. It was simply disabled via +a C<return> statement; the code was left in place. Now the code supporting +what C<study> used to do has been removed. + +=item * + +Under threaded perls, there is no longer a separate PV allocated for every +COP to store its package name (C<< cop->stashpv >>). Instead, there is an +offset (C<< cop->stashoff >>) into the new C<PL_stashpad> array, which +holds stash pointers. + +=item * + +In the pluggable regex API, the C<regexp_engine> struct has acquired a new +field C<op_comp>, which is currently just for perl's internal use, and +should be initialized to NULL by other regex plugin modules. + +=item * + +A new function C<alloccopstash> has been added to the API, but is considered +experimental. See L<perlapi>. + +=item * + +Perl used to implement get magic in a way that would sometimes hide bugs in +code that could call mg_get() too many times on magical values. This hiding of +errors no longer occurs, so long-standing bugs may become visible now. If +you see magic-related errors in XS code, check to make sure it, together +with the Perl API functions it uses, calls mg_get() only once on SvGMAGICAL() +values. + +=item * + +OP allocation for CVs now uses a slab allocator. This simplifies +memory management for OPs allocated to a CV, so cleaning up after a +compilation error is simpler and safer [perl #111462][perl #112312]. + +=item * + +C<PERL_DEBUG_READONLY_OPS> has been rewritten to work with the new slab +allocator, allowing it to catch more violations than before. 
+ +=item * + +The old slab allocator for ops, which was only enabled for C<PERL_IMPLICIT_SYS> +and C<PERL_DEBUG_READONLY_OPS>, has been retired. + +=back + +=head1 Selected Bug Fixes + +=over 4 + +=item * + +Here document terminators no longer require a terminating newline character when +they occur at the end of a file. This was already the case at the end of a +string eval [perl #65838]. + +=item * + +C<-DPERL_GLOBAL_STRUCT> builds now free the global struct B<after> +they've finished using it. + +=item * + +A trailing '/' on a path in @INC will no longer have an additional '/' +appended. + +=item * + +The C<:crlf> layer now works when unread data doesn't fit into its own +buffer. [perl #112244]. + +=item * + +C<ungetc()> now handles UTF-8 encoded data. [perl #116322]. + +=item * + +A bug in the core typemap caused any C types that map to the T_BOOL core +typemap entry to not be set, updated, or modified when the T_BOOL variable was +used in an OUTPUT: section with an exception for RETVAL. T_BOOL in an INPUT: +section was not affected. Using a T_BOOL return type for an XSUB (RETVAL) +was not affected. A side effect of fixing this bug is, if a T_BOOL is specified +in the OUTPUT: section (which previously did nothing to the SV), and a read only +SV (literal) is passed to the XSUB, croaks like "Modification of a read-only +value attempted" will happen. [perl #115796] + +=item * + +On many platforms, providing a directory name as the script name caused perl +to do nothing and report success. It should now universally report an error +and exit nonzero. [perl #61362] + +=item * + +C<sort {undef} ...> under fatal warnings no longer crashes. It had +begun crashing in Perl v5.16. + +=item * + +Stashes blessed into each other +(C<bless \%Foo::, 'Bar'; bless \%Bar::, 'Foo'>) no longer result in double +frees. This bug started happening in Perl v5.16. + +=item * + +Numerous memory leaks have been fixed, mostly involving fatal warnings and +syntax errors.
+ +=item * + +Some failed regular expression matches such as C<'f' =~ /../g> were not +resetting C<pos>. Also, "match-once" patterns (C<m?...?g>) failed to reset +it, too, when invoked a second time [perl #23180]. + +=item * + +Several bugs involving C<local *ISA> and C<local *Foo::> causing stale +MRO caches have been fixed. + +=item * + +Defining a subroutine when its typeglob has been aliased no longer results +in stale method caches. This bug was introduced in Perl v5.10. + +=item * + +Localising a typeglob containing a subroutine when the typeglob's package +has been deleted from its parent stash no longer produces an error. This +bug was introduced in Perl v5.14. + +=item * + +Under some circumstances, C<local *method=...> would fail to reset method +caches upon scope exit. + +=item * + +C</[.foo.]/> is no longer an error, but produces a warning (as before) and +is treated as C</[.fo]/> [perl #115818]. + +=item * + +C<goto $tied_var> now calls FETCH before deciding what type of goto +(subroutine or label) this is. + +=item * + +Renaming packages through glob assignment +(C<*Foo:: = *Bar::; *Bar:: = *Baz::>) in combination with C<m?...?> and +C<reset> no longer makes threaded builds crash. + +=item * + +A number of bugs related to assigning a list to hash have been fixed. Many of +these involve lists with repeated keys like C<(1, 1, 1, 1)>. + +=over 4 + +=item * + +The expression C<scalar(%h = (1, 1, 1, 1))> now returns C<4>, not C<2>. + +=item * + +The return value of C<%h = (1, 1, 1)> in list context was wrong. Previously +this would return C<(1, undef, 1)>, now it returns C<(1, undef)>. + +=item * + +Perl now issues the same warning on C<($s, %h) = (1, {})> as it does for +C<(%h) = ({})>, "Reference found where even-sized list expected". + +=item * + +A number of additional edge cases in list assignment to hashes were +corrected. For more details see commit 23b7025ebc. + +=back + +=item * + +Attributes applied to lexical variables no longer leak memory. 
+[perl #114764] + +=item * + +C<dump>, C<goto>, C<last>, C<next>, C<redo> or C<require> followed by a +bareword (or version) and then an infix operator is no longer a syntax +error. It used to be for those infix operators (like C<+>) that have a +different meaning where a term is expected. [perl #105924] + +=item * + +C<require a::b . 1> and C<require a::b + 1> no longer produce erroneous +ambiguity warnings. [perl #107002] + +=item * + +Class method calls are now allowed on any string, and not just strings +beginning with an alphanumeric character. [perl #105922] + +=item * + +An empty pattern created with C<qr//> used in C<m///> no longer triggers +the "empty pattern reuses last pattern" behaviour. [perl #96230] + +=item * + +Tying a hash during iteration no longer results in a memory leak. + +=item * + +Freeing a tied hash during iteration no longer results in a memory leak. + +=item * + +List assignment to a tied array or hash that dies on STORE no longer +results in a memory leak. + +=item * + +If the hint hash (C<%^H>) is tied, compile-time scope entry (which copies +the hint hash) no longer leaks memory if FETCH dies. [perl #107000] + +=item * + +Constant folding no longer inappropriately triggers the special +C<split " "> behaviour. [perl #94490] + +=item * + +C<defined scalar(@array)>, C<defined do { &foo }>, and similar constructs +now treat the argument to C<defined> as a simple scalar. [perl #97466] + +=item * + +Running a custom debugger that defines no C<*DB::DB> glob or provides a +subroutine stub for C<&DB::DB> no longer results in a crash, but an error +instead. [perl #114990] + +=item * + +C<reset ""> now matches its documentation. C<reset> only resets C<m?...?> +patterns when called with no argument. An empty string for an argument now +does nothing. (It used to be treated as no argument.) [perl #97958] + +=item * + +C<printf> with an argument returning an empty list no longer reads past the +end of the stack, resulting in erratic behaviour.
[perl #77094] + +=item * + +C<--subname> no longer produces erroneous ambiguity warnings. +[perl #77240] + +=item * + +C<v10> is now allowed as a label or package name. This was inadvertently +broken when v-strings were added in Perl v5.6. [perl #56880] + +=item * + +C<length>, C<pos>, C<substr> and C<sprintf> could be confused by ties, +overloading, references and typeglobs if the stringification of such +changed the internal representation to or from UTF-8. [perl #114410] + +=item * + +utf8::encode now calls FETCH and STORE on tied variables. utf8::decode now +calls STORE (it was already calling FETCH). + +=item * + +C<$tied =~ s/$non_utf8/$utf8/> no longer loops infinitely if the tied +variable returns a Latin-1 string, shared hash key scalar, or reference or +typeglob that stringifies as ASCII or Latin-1. This was a regression from +v5.12. + +=item * + +C<s///> without /e is now better at detecting when it needs to forego +certain optimisations, fixing some buggy cases: + +=over + +=item * + +Match variables in certain constructs (C<&&>, C<||>, C<..> and others) in +the replacement part; e.g., C<s/(.)/$l{$a||$1}/g>. [perl #26986] + +=item * + +Aliases to match variables in the replacement. + +=item * + +C<$REGERROR> or C<$REGMARK> in the replacement. [perl #49190] + +=item * + +An empty pattern (C<s//$foo/>) that causes the last-successful pattern to +be used, when that pattern contains code blocks that modify the variables +in the replacement. + +=back + +=item * + +The taintedness of the replacement string no longer affects the taintedness +of the return value of C<s///e>. + +=item * + +The C<$|> autoflush variable is created on-the-fly when needed. If this +happened (e.g., if it was mentioned in a module or eval) when the +currently-selected filehandle was a typeglob with an empty IO slot, it used +to crash. [perl #115206] + +=item * + +Line numbers at the end of a string eval are no longer off by one. 
+[perl #114658] + +=item * + +@INC filters (subroutines returned by subroutines in @INC) that set $_ to a +copy-on-write scalar no longer cause the parser to modify that string +buffer in place. + +=item * + +C<length($object)> no longer returns the undefined value if the object has +string overloading that returns undef. [perl #115260] + +=item * + +The use of C<PL_stashcache>, the stash name lookup cache for method calls, has +been restored. + +Commit da6b625f78f5f133 in August 2011 inadvertently broke the code that looks +up values in C<PL_stashcache>. As it's only a cache, quite correctly everything +carried on working without it. + +=item * + +The error "Can't localize through a reference" had disappeared in v5.16.0 +when C<local %$ref> appeared on the last line of an lvalue subroutine. +This error disappeared for C<\local %$ref> in perl v5.8.1. It has now +been restored. + +=item * + +The parsing of here-docs has been improved significantly, fixing several +parsing bugs and crashes and one memory leak, and correcting wrong +subsequent line numbers under certain conditions. + +=item * + +Inside an eval, the error message for an unterminated here-doc no longer +has a newline in the middle of it [perl #70836]. + +=item * + +A substitution inside a substitution pattern (C<s/${s|||}//>) no longer +confuses the parser. + +=item * + +It may be an odd place to allow comments, but C<s//"" # hello/e> has +always worked, I<unless> there happens to be a null character before the +first #. Now it works even in the presence of nulls. + +=item * + +An invalid range in C<tr///> or C<y///> no longer results in a memory leak. + +=item * + +String eval no longer treats a semicolon-delimited quote-like operator at +the very end (C<eval 'q;;'>) as a syntax error. + +=item * + +C<< warn {$_ => 1} + 1 >> is no longer a syntax error.
The parser used to +get confused with certain list operators followed by an anonymous hash and +then an infix operator that shares its form with a unary operator. + +=item * + +C<(caller $n)[6]> (which gives the text of the eval) used to return the +actual parser buffer. Modifying it could result in crashes. Now it always +returns a copy. The string returned no longer has "\n;" tacked on to the +end. The returned text also includes here-doc bodies, which used to be +omitted. + +=item * + +The UTF-8 position cache is now reset when accessing magical variables, to +avoid the string buffer and the UTF-8 position cache getting out of sync +[perl #114410]. + +=item * + +Various cases of get magic being called twice for magical UTF-8 +strings have been fixed. + +=item * + +This code (when not in the presence of C<$&> etc) + + $_ = 'x' x 1_000_000; + 1 while /(.)/; + +used to skip the buffer copy for performance reasons, but suffered from C<$1> +etc changing if the original string changed. That's now been fixed. + +=item * + +Perl doesn't use PerlIO anymore to report out of memory messages, as PerlIO +might attempt to allocate more memory. + +=item * + +In a regular expression, if something is quantified with C<{n,m}> where +C<S<n E<gt> m>>, it can't possibly match. Previously this was a fatal +error, but now is merely a warning (and that something won't match). +[perl #82954]. + +=item * + +It used to be possible for formats defined in subroutines that have +subsequently been undefined and redefined to close over variables in the +wrong pad (the newly-defined enclosing sub), resulting in crashes or +"Bizarre copy" errors. + +=item * + +Redefinition of XSUBs at run time could produce warnings with the wrong +line number. + +=item * + +The %vd sprintf format does not support version objects for alpha versions. +It used to output the format itself (%vd) when passed an alpha version, and +also emit an "Invalid conversion in printf" warning. 
It no longer does, +but produces the empty string in the output. It also no longer leaks +memory in this case. + +=item * + +C<< $obj->SUPER::method >> calls in the main package could fail if the +SUPER package had already been accessed by other means. + +=item * + +Stash aliasing (C<< *foo:: = *bar:: >>) no longer causes SUPER calls to ignore +changes to methods or @ISA or use the wrong package. + +=item * + +Method calls on packages whose names end in ::SUPER are no longer treated +as SUPER method calls, resulting in failure to find the method. +Furthermore, defining subroutines in such packages no longer causes them to +be found by SUPER method calls on the containing package [perl #114924]. + +=item * + +C<\w> now matches the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D +(ZERO WIDTH JOINER). C<\W> no longer matches these. This change is because +Unicode corrected their definition of what C<\w> should match. + +=item * + +C<dump LABEL> no longer leaks its label. + +=item * + +Constant folding no longer changes the behaviour of functions like C<stat()> +and C<truncate()> that can take either filenames or handles. +C<stat 1 ? foo : bar> now treats its argument as a file name (since it is an +arbitrary expression), rather than the handle "foo". + +=item * + +C<truncate FOO, $len> no longer falls back to treating "FOO" as a file name if +the filehandle has been deleted. This was broken in Perl v5.16.0. + +=item * + +Subroutine redefinitions after sub-to-glob and glob-to-glob assignments no +longer cause double frees or panic messages. + +=item * + +C<s///> now turns vstrings into plain strings when performing a substitution, +even if the resulting string is the same (C<s/a/a/>). + +=item * + +Prototype mismatch warnings no longer erroneously treat constant subs as having +no prototype when they actually have "". + +=item * + +Constant subroutines and forward declarations no longer prevent prototype +mismatch warnings from omitting the sub name.
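
The C<\w> entry in this list (U+200C and U+200D now count as word characters) can be checked with a short script. This is only a sketch, assuming a perl of at least v5.18 built with its default Unicode tables; the variable names are illustrative and do not come from the release notes.

```perl
use v5.18;
use warnings;

# U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER
my @joiners = (0x200C, 0x200D);

# Keep the code points whose character is matched by \w
my @matched = grep { chr($_) =~ /\w/ } @joiners;

# Keep the code points whose character is (still) matched by \W
my @non_word = grep { chr($_) =~ /\W/ } @joiners;

printf "U+%04X matches \\w\n", $_ for @matched;
printf "U+%04X matches \\W\n", $_ for @non_word;
```

On an older perl the same script should report both code points under C<\W> instead, which makes it a cheap way to see which behaviour a given build has.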
+ +=item * + +C<undef> on a subroutine now clears call checkers. + +=item * + +The C<ref> operator started leaking memory on blessed objects in Perl v5.16.0. +This has been fixed [perl #114340]. + +=item * + +C<use> no longer tries to parse its arguments as a statement, making +C<use constant { () };> a syntax error [perl #114222]. + +=item * + +On debugging builds, "uninitialized" warnings inside formats no longer cause +assertion failures. + +=item * + +On debugging builds, subroutines nested inside formats no longer cause +assertion failures [perl #78550]. + +=item * + +Formats and C<use> statements are now permitted inside formats. + +=item * + +C<print $x> and C<sub { print $x }-E<gt>()> now always produce the same output. +It was possible for the latter to refuse to close over $x if the variable was +not active; e.g., if it was defined outside a currently-running named +subroutine. + +=item * + +Similarly, C<print $x> and C<print eval '$x'> now produce the same output. +This also allows "my $x if 0" variables to be seen in the debugger [perl +#114018]. + +=item * + +Formats called recursively no longer stomp on their own lexical variables, but +each recursive call has its own set of lexicals. + +=item * + +Attempting to free an active format or the handle associated with it no longer +results in a crash. + +=item * + +Format parsing no longer gets confused by braces, semicolons and low-precedence +operators. It used to be possible to use braces as format delimiters (instead +of C<=> and C<.>), but only sometimes. Semicolons and low-precedence operators +in format argument lines no longer confuse the parser into ignoring the line's +return value. In format argument lines, braces can now be used for anonymous +hashes, instead of being treated always as C<do> blocks. + +=item * + +Formats can now be nested inside code blocks in regular expressions and other +quoted constructs (C</(?{...})/> and C<qq/${...}/>) [perl #114040]. 
+ +=item * + +Formats are no longer created after compilation errors. + +=item * + +Under debugging builds, the B<-DA> command line option started crashing in Perl +v5.16.0. It has been fixed [perl #114368]. + +=item * + +A potential deadlock scenario involving the premature termination of a pseudo- +forked child in a Windows build with ithreads enabled has been fixed. This +resolves the common problem of the F<t/op/fork.t> test hanging on Windows [perl +#88840]. + +=item * + +The code which generates errors from C<require()> could potentially read one or +two bytes before the start of the filename for filenames less than three bytes +long and ending C</\.p?\z/>. This has now been fixed. Note that it could +never have happened with module names given to C<use()> or C<require()> anyway. + +=item * + +The handling of pathnames of modules given to C<require()> has been made +thread-safe on VMS. + +=item * + +Non-blocking sockets have been fixed on VMS. + +=item * + +Pod can now be nested in code inside a quoted construct outside of a string +eval. This used to work only within string evals [perl #114040]. + +=item * + +C<goto ''> now looks for an empty label, producing the "goto must have +label" error message, instead of exiting the program [perl #111794]. + +=item * + +C<goto "\0"> now dies with "Can't find label" instead of "goto must have +label". + +=item * + +The C function C<hv_store> used to result in crashes when used on C<%^H> +[perl #111000]. + +=item * + +A call checker attached to a closure prototype via C<cv_set_call_checker> +is now copied to closures cloned from it. So C<cv_set_call_checker> now +works inside an attribute handler for a closure. + +=item * + +Writing to C<$^N> used to have no effect. Now it croaks with "Modification +of a read-only value" by default, but that can be overridden by a custom +regular expression engine, as with C<$1> [perl #112184]. 
+ +=item * + +C<undef> on a control character glob (C<undef *^H>) no longer emits an +erroneous warning about ambiguity [perl #112456]. + +=item * + +For efficiency's sake, many operators and built-in functions return the +same scalar each time. Lvalue subroutines and subroutines in the CORE:: +namespace were allowing this implementation detail to leak through. +C<print &CORE::uc("a"), &CORE::uc("b")> used to print "BB". The same thing +would happen with an lvalue subroutine returning the return value of C<uc>. +Now the value is copied in such cases. + +=item * + +C<method {}> syntax with an empty block or a block returning an empty list +used to crash or use some random value left on the stack as its invocant. +Now it produces an error. + +=item * + +C<vec> now works with extremely large offsets (E<gt>2 GB) [perl #111730]. + +=item * + +Changes to overload settings now take effect immediately, as do changes to +inheritance that affect overloading. They used to take effect only after +C<bless>. + +Objects that were created before a class had any overloading used to remain +non-overloaded even if the class gained overloading through C<use overload> +or @ISA changes, and even after C<bless>. This has been fixed +[perl #112708]. + +=item * + +Classes with overloading can now inherit fallback values. + +=item * + +Overloading was not respecting a fallback value of 0 if there were +overloaded objects on both sides of an assignment operator like C<+=> +[perl #111856]. + +=item * + +C<pos> now croaks with hash and array arguments, instead of producing +erroneous warnings. + +=item * + +C<while(each %h)> now implies C<while(defined($_ = each %h))>, like +C<readline> and C<readdir>. + +=item * + +Subs in the CORE:: namespace no longer crash after C<undef *_> when called +with no argument list (C<&CORE::time> with no parentheses). 
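
The C<while(each %h)> entry above is easiest to see with a key that is false in boolean context. The following is a minimal sketch, assuming perl v5.18 or later; the hash contents and the C<@seen> array are made up for illustration.

```perl
use v5.18;
use warnings;

# A single key that is false in boolean context.
my %h = (0 => 'zero');

my @seen;
while (each %h) {      # bare each: assigns the key to $_ and tests definedness
    push @seen, $_;
}

say "keys seen: @seen";
```

Because the loop condition tests definedness rather than truth, the key C<0> does not end the loop early. Note that the bare form assigns to the global C<$_>, so code that cares about C<$_> may prefer an explicit C<< while (defined(my $k = each %h)) >>.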
+ +=item * + +C<unpack> no longer produces the "'/' must follow a numeric type in unpack" +error when it is the data that are at fault [perl #60204]. + +=item * + +C<join> and C<"@array"> now call FETCH only once on a tied C<$"> +[perl #8931]. + +=item * + +Some subroutine calls generated by compiling core ops affected by a +C<CORE::GLOBAL> override had op checking performed twice. The checking +is always idempotent for pure Perl code, but the double checking can +matter when custom call checkers are involved. + +=item * + +A race condition used to exist around fork that could cause a signal sent to +the parent to be handled by both parent and child. Signals are now blocked +briefly around fork to prevent this from happening [perl #82580]. + +=item * + +The implementation of code blocks in regular expressions, such as C<(?{})> +and C<(??{})>, has been heavily reworked to eliminate a whole slew of bugs. +The main user-visible changes are: + +=over 4 + +=item * + +Code blocks within patterns are now parsed in the same pass as the +surrounding code; in particular it is no longer necessary to have balanced +braces: this now works: + + /(?{ $x='{' })/ + +This means that this error message is no longer generated: + + Sequence (?{...}) not terminated or not {}-balanced in regex + +but a new error may be seen: + + Sequence (?{...}) not terminated with ')' + +In addition, literal code blocks within run-time patterns are only +compiled once, at perl compile-time: + + for my $p (...) { + # this 'FOO' block of code is compiled once, + # at the same time as the surrounding 'for' loop + /$p{(?{FOO;})/; + } + +=item * + +Lexical variables are now sane as regards scope, recursion and closure +behavior. In particular, C</A(?{B})C/> behaves (from a closure viewpoint) +exactly like C</A/ && do { B } && /C/>, while C<qr/A(?{B})C/> is like +C<sub {/A/ && do { B } && /C/}>. 
So this code now works how you might +expect, creating three regexes that match 0, 1, and 2: + + for my $i (0..2) { + push @r, qr/^(??{$i})$/; + } + "1" =~ $r[1]; # matches + +=item * + +The C<use re 'eval'> pragma is now only required for code blocks defined +at runtime; in particular in the following, the text of the C<$r> pattern is +still interpolated into the new pattern and recompiled, but the individual +compiled code-blocks within C<$r> are reused rather than being recompiled, +and C<use re 'eval'> isn't needed any more: + + my $r = qr/abc(?{....})def/; + /xyz$r/; + +=item * + +Flow control operators no longer crash. Each code block runs in a new +dynamic scope, so C<next> etc. will not see +any enclosing loops. C<return> returns a value +from the code block, not from any enclosing subroutine. + +=item * + +Perl normally caches the compilation of run-time patterns, and doesn't +recompile if the pattern hasn't changed, but this is now disabled if +required for the correct behavior of closures. For example: + + my $code = '(??{$x})'; + for my $x (1..3) { + # recompile to see fresh value of $x each time + $x =~ /$code/; + } + +=item * + +The C</msix> and C<(?msix)> etc. flags are now propagated into the return +value from C<(??{})>; this now works: + + "AB" =~ /a(??{'b'})/i; + +=item * + +Warnings and errors will appear to come from the surrounding code (or for +run-time code blocks, from an eval) rather than from an C<re_eval>: + + use re 'eval'; $c = '(?{ warn "foo" })'; /$c/; + /(?{ warn "foo" })/; + +formerly gave: + + foo at (re_eval 1) line 1. + foo at (re_eval 2) line 1. + +and now gives: + + foo at (eval 1) line 1. + foo at /some/prog line 2. + +=back + +=item * + +Perl now can be recompiled to use any Unicode version. In v5.16, it +worked on Unicodes 6.0 and 6.1, but there were various bugs if earlier +releases were used; the older the release the more problems. 
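
Two of the regex code-block changes listed above (closure behaviour of C<qr//> with C<(??{})>, and propagation of the C</i> flag into the returned pattern) can be exercised together. This is a sketch under the assumption of perl v5.18 or later; it reuses the patterns quoted in the entries, while C<@hits> and C<$ci> are illustrative names.

```perl
use v5.18;
use warnings;

# Each qr// closes over its own $i, much as an anonymous sub would.
my @r;
for my $i (0 .. 2) {
    push @r, qr/^(??{$i})$/;
}

# Indices of the compiled regexes that match the string "1".
my @hits = grep { "1" =~ $r[$_] } 0 .. $#r;

# The /i flag is propagated into the pattern returned by (??{}).
my $ci = ("AB" =~ /a(??{'b'})/i) ? 1 : 0;

say "regexes matching '1': @hits";
say "case-insensitive postponed match: $ci";
```

No C<use re 'eval'> is needed here because every code block is literal source text rather than being interpolated from a string at run time.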
+ +=item * + +C<vec> no longer produces "uninitialized" warnings in lvalue context +[perl #9423]. + +=item * + +An optimization involving fixed strings in regular expressions could cause +a severe performance penalty in edge cases. This has been fixed +[perl #76546]. + +=item * + +In certain cases, including empty subpatterns within a regular expression (such +as C<(?:)> or C<(?:|)>) could disable some optimizations. This has been fixed. + +=item * + +The "Can't find an opnumber" message that C<prototype> produces when passed +a string like "CORE::nonexistent_keyword" now passes UTF-8 and embedded +NULs through unchanged [perl #97478]. + +=item * + +C<prototype> now treats magical variables like C<$1> the same way as +non-magical variables when checking for the CORE:: prefix, instead of +treating them as subroutine names. + +=item * + +Under threaded perls, a runtime code block in a regular expression could +corrupt the package name stored in the op tree, resulting in bad reads +in C<caller>, and possibly crashes [perl #113060]. + +=item * + +Referencing a closure prototype (C<\&{$_[1]}> in an attribute handler for a +closure) no longer results in a copy of the subroutine (or assertion +failures on debugging builds). + +=item * + +C<eval '__PACKAGE__'> now returns the right answer on threaded builds if +the current package has been assigned over (as in +C<*ThisPackage:: = *ThatPackage::>) [perl #78742]. + +=item * + +If a package is deleted by code that it calls, it is possible for C<caller> +to see a stack frame belonging to that deleted package. C<caller> could +crash if the stash's memory address was reused for a scalar and a +substitution was performed on the same scalar [perl #113486]. + +=item * + +C<UNIVERSAL::can> no longer treats its first argument differently +depending on whether it is a string or number internally. 
+ +=item * + +C<open> with C<< <& >> for the mode checks to see whether the third argument is +a number, in determining whether to treat it as a file descriptor or a handle +name. Magical variables like C<$1> were always failing the numeric check and +being treated as handle names. + +=item * + +C<warn>'s handling of magical variables (C<$1>, ties) has undergone several +fixes. C<FETCH> is only called once now on a tied argument or a tied C<$@> +[perl #97480]. Tied variables returning objects that stringify as "" are +no longer ignored. A tied C<$@> that happened to return a reference the +I<previous> time it was used is no longer ignored. + +=item * + +C<warn ""> now treats C<$@> with a number in it the same way, regardless of +whether it happened via C<$@=3> or C<$@="3">. It used to ignore the +former. Now it appends "\t...caught", as it has always done with +C<$@="3">. + +=item * + +Numeric operators on magical variables (e.g., S<C<$1 + 1>>) used to use +floating point operations even where integer operations were more appropriate, +resulting in loss of accuracy on 64-bit platforms [perl #109542]. + +=item * + +Unary negation no longer treats a string as a number if the string happened +to be used as a number at some point. So, if C<$x> contains the string "dogs", +C<-$x> returns "-dogs" even if C<$y=0+$x> has happened at some point. + +=item * + +In Perl v5.14, C<-'-10'> was fixed to return "10", not "+10". But magical +variables (C<$1>, ties) were not fixed till now [perl #57706]. + +=item * + +Unary negation now treats strings consistently, regardless of the internal +C<UTF8> flag. + +=item * + +A regression introduced in Perl v5.16.0 involving +C<tr/I<SEARCHLIST>/I<REPLACEMENTLIST>/> has been fixed. Only the first +instance is supposed to be meaningful if a character appears more than +once in C<I<SEARCHLIST>>. Under some circumstances, the final instance +was overriding all earlier ones. 
[perl #113584] + +=item * + +Regular expressions like C<qr/\87/> previously silently inserted a NUL +character, thus matching as if it had been written C<qr/\00087/>. Now it +matches as if it had been written as C<qr/87/>, with a message that the +sequence C<"\8"> is unrecognized. + +=item * + +C<__SUB__> now works in special blocks (C<BEGIN>, C<END>, etc.). + +=item * + +Thread creation on Windows could theoretically result in a crash if done +inside a C<BEGIN> block. It still does not work properly, but it no longer +crashes [perl #111610]. + +=item * + +C<\&{''}> (with the empty string) now autovivifies a stub like any other +sub name, and no longer produces the "Unable to create sub" error +[perl #94476]. + +=item * + +A regression introduced in v5.14.0 has been fixed, in which some calls +to the C<re> module would clobber C<$_> [perl #113750]. + +=item * + +C<do FILE> now always either sets or clears C<$@>, even when the file can't be +read. This ensures that testing C<$@> first (as recommended by the +documentation) always returns the correct result. + +=item * + +The array iterator used for the C<each @array> construct is now correctly +reset when C<@array> is cleared [perl #75596]. This happens, for example, when +the array is globally assigned to, as in C<@array = (...)>, but not when its +B<values> are assigned to. In terms of the XS API, it means that C<av_clear()> +will now reset the iterator. + +This mirrors the behaviour of the hash iterator when the hash is cleared. + +=item * + +C<< $class->can >>, C<< $class->isa >>, and C<< $class->DOES >> now return +correct results, regardless of whether that package referred to by C<$class> +exists [perl #47113]. + +=item * + +Arriving signals no longer clear C<$@> [perl #45173]. + +=item * + +Allow C<my ()> declarations with an empty variable list [perl #113554]. + +=item * + +During parsing, subs declared after errors no longer leave stubs +[perl #113712]. 
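
The C<each @array> iterator entry above can be illustrated as follows. This is a sketch assuming perl v5.18 or later; the array contents and variable names are invented for the example.

```perl
use v5.18;
use warnings;

my @array = qw(a b c);

# Advance the array iterator past index 0.
my ($i1, $v1) = each @array;

# Assigning to the whole array clears it first, which resets the iterator.
@array = qw(x y z);

# Iteration starts from index 0 again rather than continuing at index 1.
my ($i2, $v2) = each @array;

say "first pair: $i1 => $v1";
say "after reassignment: $i2 => $v2";
```

Assigning to individual elements (for example C<$array[0] = 'q'>) does not clear the array, so according to the entry it leaves the iterator where it was.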
+ +=item * + +Closures containing no string evals no longer hang on to their containing +subroutines, allowing variables closed over by outer subroutines to be +freed when the outer sub is freed, even if the inner sub still exists +[perl #89544]. + +=item * + +Duplication of in-memory filehandles by opening with a "<&=" or ">&=" mode +stopped working properly in v5.16.0. It was causing the new handle to +reference a different scalar variable. This has been fixed [perl #113764]. + +=item * + +C<qr//> expressions no longer crash with custom regular expression engines +that do not set C<offs> at regular expression compilation time +[perl #112962]. + +=item * + +C<delete local> no longer crashes with certain magical arrays and hashes +[perl #112966]. + +=item * + +C<local> on elements of certain magical arrays and hashes used not to +arrange to have the element deleted on scope exit, even if the element did +not exist before C<local>. + +=item * + +C<scalar(write)> no longer returns multiple items [perl #73690]. + +=item * + +String to floating point conversions no longer misparse certain strings under +C<use locale> [perl #109318]. + +=item * + +C<@INC> filters that die no longer leak memory [perl #92252]. + +=item * + +The implementations of overloaded operations are now called in the correct +context. This allows, among other things, being able to properly override +C<< <> >> [perl #47119]. + +=item * + +Specifying only the C<fallback> key when calling C<use overload> now behaves +properly [perl #113010]. + +=item * + +C<< sub foo { my $a = 0; while ($a) { ... } } >> and +C<< sub foo { while (0) { ... } } >> now return the same thing [perl #73618]. + +=item * + +String negation now behaves the same under C<use integer;> as it does +without [perl #113012]. + +=item * + +C<chr> now returns the Unicode replacement character (U+FFFD) for -1, +regardless of the internal representation. -1 used to wrap if the argument +was tied or a string internally. 
+ +=item * + +Using a C<format> after its enclosing sub was freed could crash as of +perl v5.12.0, if the format referenced lexical variables from the outer sub. + +=item * + +Using a C<format> after its enclosing sub was undefined could crash as of +perl v5.10.0, if the format referenced lexical variables from the outer sub. + +=item * + +Using a C<format> defined inside a closure, which format references +lexical variables from outside, never really worked unless the C<write> +call was directly inside the closure. In v5.10.0 it even started crashing. +Now the copy of that closure nearest the top of the call stack is used to +find those variables. + +=item * + +Formats that close over variables in special blocks no longer crash if a +stub exists with the same name as the special block before the special +block is compiled. + +=item * + +The parser no longer gets confused, treating C<eval foo ()> as a syntax +error if preceded by C<print;> [perl #16249]. + +=item * + +The return value of C<syscall> is no longer truncated on 64-bit platforms +[perl #113980]. + +=item * + +Constant folding no longer causes C<print 1 ? FOO : BAR> to print to the +FOO handle [perl #78064]. + +=item * + +C<do subname> now calls the named subroutine and uses the file name it +returns, instead of opening a file named "subname". + +=item * + +Subroutines looked up by rv2cv check hooks (registered by XS modules) are +now taken into consideration when determining whether C<foo bar> should be +the sub call C<foo(bar)> or the method call C<< "bar"->foo >>. + +=item * + +C<CORE::foo::bar> is no longer treated specially, allowing global overrides +to be called directly via C<CORE::GLOBAL::uc(...)> [perl #113016]. + +=item * + +Calling an undefined sub whose typeglob has been undefined now produces the +customary "Undefined subroutine called" error, instead of "Not a CODE +reference". + +=item * + +Two bugs involving @ISA have been fixed. 
C<*ISA = *glob_without_array> and +C<undef *ISA; @{*ISA}> would prevent future modifications to @ISA from +updating the internal caches used to look up methods. The +*glob_without_array case was a regression from Perl v5.12. + +=item * + +Regular expression optimisations sometimes caused C<$> with C</m> to +produce failed or incorrect matches [perl #114068]. + +=item * + +C<__SUB__> now works in a C<sort> block when the enclosing subroutine is +predeclared with C<sub foo;> syntax [perl #113710]. + +=item * + +Unicode properties only apply to Unicode code points, which leads to +some subtleties when regular expressions are matched against +above-Unicode code points. There is a warning generated to draw your +attention to this. However, this warning was being generated +inappropriately in some cases, such as when a program was being parsed. +Non-Unicode matches such as C<\w> and C<[:word:]> should not generate the +warning, as their definitions don't limit them to apply to only Unicode +code points. Now the message is only generated when matching against +C<\p{}> and C<\P{}>. There remains a bug, [perl #114148], for the very +few properties in Unicode that match just a single code point. The +warning is not generated if they are matched against an above-Unicode +code point. + +=item * + +Uninitialized warnings mentioning hash elements would only mention the +element name if it was not in the first bucket of the hash, due to an +off-by-one error. + +=item * + +A regular expression optimizer bug could cause multiline "^" to behave +incorrectly in the presence of line breaks, such that +C<"/\n\n" =~ m#\A(?:^/$)#im> would not match [perl #115242]. + +=item * + +Failed C<fork> in list context no longer corrupts the stack. +C<@a = (1, 2, fork, 3)> used to gobble up the 2 and assign C<(1, undef, 3)> +if the C<fork> call failed. 
+ +=item * + +Numerous memory leaks have been fixed, mostly involving tied variables that +die, regular expression character classes and code blocks, and syntax +errors. + +=item * + +Assigning a regular expression (C<${qr//}>) to a variable that happens to +hold a floating point number no longer causes assertion failures on +debugging builds. + +=item * + +Assigning a regular expression to a scalar containing a number no longer +causes subsequent numification to produce random numbers. + +=item * + +Assigning a regular expression to a magic variable no longer wipes away the +magic. This was a regression from v5.10. + +=item * + +Assigning a regular expression to a blessed scalar no longer results in +crashes. This was also a regression from v5.10. + +=item * + +Regular expression can now be assigned to tied hash and array elements with +flattening into strings. + +=item * + +Numifying a regular expression no longer results in an uninitialized +warning. + +=item * + +Negative array indices no longer cause EXISTS methods of tied variables to +be ignored. This was a regression from v5.12. + +=item * + +Negative array indices no longer result in crashes on arrays tied to +non-objects. + +=item * + +C<$byte_overload .= $utf8> no longer results in doubly-encoded UTF-8 if the +left-hand scalar happened to have produced a UTF-8 string the last time +overloading was invoked. + +=item * + +C<goto &sub> now uses the current value of @_, instead of using the array +the subroutine was originally called with. This means +C<local @_ = (...); goto &sub> now works [perl #43077]. + +=item * + +If a debugger is invoked recursively, it no longer stomps on its own +lexical variables. Formerly under recursion all calls would share the same +set of lexical variables [perl #115742]. + +=item * + +C<*_{ARRAY}> returned from a subroutine no longer spontaneously +becomes empty. 
+ +=item * + +When using C<say> to print to a tied filehandle, the value of C<$\> is +correctly localized, even if it was previously undef. [perl #119927] + +=back + +=head1 Known Problems + +=over 4 + +=item * + +UTF8-flagged strings in C<%ENV> on HP-UX 11.00 are buggy + +The interaction of UTF8-flagged strings and C<%ENV> on HP-UX 11.00 is +currently dodgy in some not-yet-fully-diagnosed way. Expect test +failures in F<t/op/magic.t>, followed by unknown behavior when storing +wide characters in the environment. + +=back + +=head1 Obituary + +Hojung Yoon (AMORETTE), 24, of Seoul, South Korea, went to his long rest +on May 8, 2013 with llama figurine and autographed TIMTOADY card. He +was a brilliant young Perl 5 & 6 hacker and a devoted member of +Seoul.pm. He programmed Perl, talked Perl, ate Perl, and loved Perl. We +believe that he is still programming in Perl with his broken IBM laptop +somewhere. He will be missed. + +=head1 Acknowledgements + +Perl v5.18.0 represents approximately 12 months of development since +Perl v5.16.0 and contains approximately 400,000 lines of changes across +2,100 files from 113 authors. + +Perl continues to flourish into its third decade thanks to a vibrant +community of users and developers. The following people are known to +have contributed the improvements that became Perl v5.18.0: + +Aaron Crane, Aaron Trevena, Abhijit Menon-Sen, Adrian M. Enache, Alan +Haggai Alavi, Alexandr Ciornii, Andrew Tam, Andy Dougherty, Anton Nikishaev, +Aristotle Pagaltzis, Augustina Blair, Bob Ernst, Brad Gilbert, Breno G. de +Oliveira, Brian Carlson, Brian Fraser, Charlie Gonzalez, Chip Salzenberg, Chris +'BinGOs' Williams, Christian Hansen, Colin Kuskie, Craig A. Berry, Dagfinn +Ilmari Mannsåker, Daniel Dragan, Daniel Perrett, Darin McBride, Dave Rolsky, +David Golden, David Leadbeater, David Mitchell, David Nicol, Dominic +Hargreaves, E. 
Choroba, Eric Brine, Evan Miller, Father Chrysostomos, Florian +Ragwitz, François Perrad, George Greer, Goro Fuji, H.Merijn Brand, Herbert +Breunung, Hugo van der Sanden, Igor Zaytsev, James E Keenan, Jan Dubois, +Jasmine Ahuja, Jerry D. Hedden, Jess Robinson, Jesse Luehrs, Joaquin Ferrero, +Joel Berger, John Goodyear, John Peacock, Karen Etheridge, Karl Williamson, +Karthik Rajagopalan, Kent Fredric, Leon Timmermans, Lucas Holt, Lukas Mai, +Marcus Holland-Moritz, Markus Jansen, Martin Hasch, Matthew Horsfall, Max +Maischein, Michael G Schwern, Michael Schroeder, Moritz Lenz, Nicholas Clark, +Niko Tyni, Oleg Nesterov, Patrik Hägglund, Paul Green, Paul Johnson, Paul +Marquess, Peter Martini, Rafael Garcia-Suarez, Reini Urban, Renee Baecker, +Rhesa Rozendaal, Ricardo Signes, Robin Barker, Ronald J. Kimball, Ruslan +Zakirov, Salvador Fandiño, Sawyer X, Scott Lanning, Sergey Alekseev, Shawn M +Moore, Shirakata Kentaro, Shlomi Fish, Sisyphus, Smylers, Steffen Müller, +Steve Hay, Steve Peters, Steven Schubiger, Sullivan Beck, Sven Strickroth, +Sébastien Aperghis-Tramoni, Thomas Sibley, Tobias Leich, Tom Wyant, Tony Cook, +Vadim Konovalov, Vincent Pit, Volker Schatz, Walt Mankowski, Yves Orton, +Zefram. + +The list above is almost certainly incomplete as it is automatically generated +from version control history. In particular, it does not include the names of +the (very much appreciated) contributors who reported issues to the Perl bug +tracker. + +Many of the changes included in this version originated in the CPAN modules +included in Perl's core. We're grateful to the entire CPAN community for +helping Perl to flourish. + +For a more complete list of all of Perl's historical contributors, please see +the F<AUTHORS> file in the Perl source distribution. + +=head1 Reporting Bugs + +If you find what you think is a bug, you might check the articles recently +posted to the comp.lang.perl.misc newsgroup and the perl bug database at +http://rt.perl.org/perlbug/ . 
There may also be information at +http://www.perl.org/ , the Perl Home Page. + +If you believe you have an unreported bug, please run the L<perlbug> program +included with your release. Be sure to trim your bug down to a tiny but +sufficient test case. Your bug report, along with the output of C<perl -V>, +will be sent off to perlbug@perl.org to be analysed by the Perl porting team. + +If the bug you are reporting has security implications, which make it +inappropriate to send to a publicly archived mailing list, then please send it +to perl5-security-report@perl.org. This points to a closed subscription +unarchived mailing list, which includes all the core committers, who will be +able to help assess the impact of issues, figure out a resolution, and help +co-ordinate the release of patches to mitigate or fix the problem across all +platforms on which Perl is supported. Please only use this address for +security issues in the Perl core, not for modules independently distributed on +CPAN. + +=head1 SEE ALSO + +The F<Changes> file for an explanation of how to view exhaustive details on +what changed. + +The F<INSTALL> file for how to build Perl. + +The F<README> file for general stuff. + +The F<Artistic> and F<Copying> files for copyright information. + +=cut diff --git a/gnu/usr.bin/perl/pod/perl5181delta.pod b/gnu/usr.bin/perl/pod/perl5181delta.pod new file mode 100644 index 00000000000..93fb251991f --- /dev/null +++ b/gnu/usr.bin/perl/pod/perl5181delta.pod @@ -0,0 +1,217 @@ +=encoding utf8 + +=head1 NAME + +perl5181delta - what is new for perl v5.18.1 + +=head1 DESCRIPTION + +This document describes differences between the 5.18.0 release and the 5.18.1 +release. + +If you are upgrading from an earlier release such as 5.16.0, first read +L<perl5180delta>, which describes differences between 5.16.0 and 5.18.0. 
+ +=head1 Incompatible Changes + +There are no changes intentionally incompatible with 5.18.0 +If any exist, they are bugs, and we request that you submit a +report. See L</Reporting Bugs> below. + +=head1 Modules and Pragmata + +=head2 Updated Modules and Pragmata + +=over 4 + +=item * + +B has been upgraded from 1.42 to 1.42_01, fixing bugs related to lexical +subroutines. + +=item * + +Digest::SHA has been upgraded from 5.84 to 5.84_01, fixing a crashing bug. +[RT #118649] + +=item * + +Module::CoreList has been upgraded from 2.89 to 2.96. + +=back + +=head1 Platform Support + +=head2 Platform-Specific Notes + +=over 4 + +=item AIX + +A rarely-encounted configuration bug in the AIX hints file has been corrected. + +=item MidnightBSD + +After a patch to the relevant hints file, perl should now build correctly on +MidnightBSD 0.4-RELEASE. + +=back + +=head1 Selected Bug Fixes + +=over 4 + +=item * + +Starting in v5.18.0, a construct like C</[#](?{})/x> would have its C<#> +incorrectly interpreted as a comment. The code block would be skipped, +unparsed. This has been corrected. + +=item * + +A number of memory leaks related to the new, experimental regexp bracketed +character class feature have been plugged. + +=item * + +The OP allocation code now returns correctly aligned memory in all cases +for C<struct pmop>. Previously it could return memory only aligned to a +4-byte boundary, which is not correct for an ithreads build with 64 bit IVs +on some 32 bit platforms. Notably, this caused the build to fail completely +on sparc GNU/Linux. [RT #118055] + +=item * + +The debugger's C<man> command been fixed. It was broken in the v5.18.0 +release. The C<man> command is aliased to the names C<doc> and C<perldoc> - +all now work again. + +=item * + +C<@_> is now correctly visible in the debugger, fixing a regression +introduced in v5.18.0's debugger. 
[RT #118169] + +=item * + +Fixed a small number of regexp constructions that could either fail to +match or crash perl when the string being matched against was +allocated above the 2GB line on 32-bit systems. [RT #118175] + +=item * + +Perl v5.16 inadvertently introduced a bug whereby calls to XSUBs that were +not visible at compile time were treated as lvalues and could be assigned +to, even when the subroutine was not an lvalue sub. This has been fixed. +[perl #117947] + +=item * + +Perl v5.18 inadvertently introduced a bug whereby dual-vars (i.e. +variables with both string and numeric values, such as C<$!> ) where the +truthness of the variable was determined by the numeric value rather than +the string value. [RT #118159] + +=item * + +Perl v5.18 inadvertently introduced a bug whereby interpolating mixed up- +and down-graded UTF-8 strings in a regex could result in malformed UTF-8 +in the pattern: specifically if a downgraded character in the range +C<\x80..\xff> followed a UTF-8 string, e.g. + + utf8::upgrade( my $u = "\x{e5}"); + utf8::downgrade(my $d = "\x{e5}"); + /$u$d/ + +[perl #118297]. + +=item * + +Lexical constants (C<my sub a() { 42 }>) no longer crash when inlined. + +=item * + +Parameter prototypes attached to lexical subroutines are now respected when +compiling sub calls without parentheses. Previously, the prototypes were +honoured only for calls I<with> parentheses. [RT #116735] + +=item * + +Syntax errors in lexical subroutines in combination with calls to the same +subroutines no longer cause crashes at compile time. + +=item * + +The dtrace sub-entry probe now works with lexical subs, instead of +crashing [perl #118305]. + +=item * + +Undefining an inlinable lexical subroutine (C<my sub foo() { 42 } undef +&foo>) would result in a crash if warnings were turned on. + +=item * + +Deep recursion warnings no longer crash lexical subroutines. 
[RT #118521] + +=back + +=head1 Acknowledgements + +Perl 5.18.1 represents approximately 2 months of development since Perl 5.18.0 +and contains approximately 8,400 lines of changes across 60 files from 12 +authors. + +Perl continues to flourish into its third decade thanks to a vibrant community +of users and developers. The following people are known to have contributed the +improvements that became Perl 5.18.1: + +Chris 'BinGOs' Williams, Craig A. Berry, Dagfinn Ilmari Mannsåker, David +Mitchell, Father Chrysostomos, Karl Williamson, Lukas Mai, Nicholas Clark, +Peter Martini, Ricardo Signes, Shlomi Fish, Tony Cook. + +The list above is almost certainly incomplete as it is automatically generated +from version control history. In particular, it does not include the names of +the (very much appreciated) contributors who reported issues to the Perl bug +tracker. + +Many of the changes included in this version originated in the CPAN modules +included in Perl's core. We're grateful to the entire CPAN community for +helping Perl to flourish. + +For a more complete list of all of Perl's historical contributors, please see +the F<AUTHORS> file in the Perl source distribution. + +=head1 Reporting Bugs + +If you find what you think is a bug, you might check the articles recently +posted to the comp.lang.perl.misc newsgroup and the perl bug database at +http://rt.perl.org/perlbug/ . There may also be information at +http://www.perl.org/ , the Perl Home Page. + +If you believe you have an unreported bug, please run the L<perlbug> program +included with your release. Be sure to trim your bug down to a tiny but +sufficient test case. Your bug report, along with the output of C<perl -V>, +will be sent off to perlbug@perl.org to be analysed by the Perl porting team. + +If the bug you are reporting has security implications, which make it +inappropriate to send to a publicly archived mailing list, then please send it +to perl5-security-report@perl.org. 
This points to a closed subscription +unarchived mailing list, which includes all the core committers, who will be +able to help assess the impact of issues, figure out a resolution, and help +co-ordinate the release of patches to mitigate or fix the problem across all +platforms on which Perl is supported. Please only use this address for +security issues in the Perl core, not for modules independently distributed on +CPAN. + +=head1 SEE ALSO + +The F<Changes> file for an explanation of how to view exhaustive details on +what changed. + +The F<INSTALL> file for how to build Perl. + +The F<README> file for general stuff. + +The F<Artistic> and F<Copying> files for copyright information. + +=cut diff --git a/gnu/usr.bin/perl/pod/perlcheat.pod b/gnu/usr.bin/perl/pod/perlcheat.pod index deee2fecdfb..f288692a874 100644 --- a/gnu/usr.bin/perl/pod/perlcheat.pod +++ b/gnu/usr.bin/perl/pod/perlcheat.pod @@ -23,42 +23,43 @@ already be overwhelming. [] anon. arrayref ${$$foo[1]}[2] aka $foo->[1]->[2] {} anon. hashref ${$$foo[1]}[2] aka $foo->[1][2] \() list of refs - NUMBERS vs STRINGS LINKS - OPERATOR PRECEDENCE = = perldoc.perl.org - -> + . search.cpan.org - ++ -- == != eq ne cpan.org - ** < > <= >= lt gt le ge pm.org - ! ~ \ u+ u- <=> cmp p3rl.org - =~ !~ perlmonks.org - * / % x SYNTAX - + - . foreach (LIST) { } for (a;b;c) { } - << >> while (e) { } until (e) { } - named uops if (e) { } elsif (e) { } else { } - < > <= >= lt gt le ge unless (e) { } elsif (e) { } else { } - == != <=> eq ne cmp ~~ given (e) { when (e) {} default {} } + SYNTAX + OPERATOR PRECEDENCE foreach (LIST) { } for (a;b;c) { } + -> while (e) { } until (e) { } + ++ -- if (e) { } elsif (e) { } else { } + ** unless (e) { } elsif (e) { } else { } + ! ~ \ u+ u- given (e) { when (e) {} default {} } + =~ !~ + * / % x NUMBERS vs STRINGS FALSE vs TRUE + + - . = = undef, "", 0, "0" + << >> + . 
anything else + named uops == != eq ne + < > <= >= lt gt le ge < > <= >= lt gt le ge + == != <=> eq ne cmp ~~ <=> cmp & - | ^ REGEX METACHARS REGEX MODIFIERS - && ^ string begin /i case insensitive - || // $ str end (bfr \n) /m line based ^$ - .. ... + one or more /s . includes \n - ?: * zero or more /x ignore wh.space - = += -= *= etc ? zero or one /p preserve - , => {3,7} repeat in range /a ASCII /aa safe - list ops | alternation /l locale /d dual - not [] character class /u Unicode - and \b word boundary /e evaluate /ee rpts - or xor \z string end /g global - () capture /o compile pat once - DEBUG (?:p) no capture - -MO=Deparse (?#t) comment REGEX CHARCLASSES - -MO=Terse (?=p) ZW pos ahead . [^\n] - -D## (?!p) ZW neg ahead \s whitespace - -d:Trace (?<=p) ZW pos behind \K \w word chars - (?<!p) ZW neg behind \d digits - CONFIGURATION (?>p) no backtrack \pP named property - perl -V:ivsize (?|p|p)branch reset \h horiz.wh.space - (?&NM) cap to name \R linebreak - \S \W \D \H negate + | ^ REGEX MODIFIERS REGEX METACHARS + && /i case insensitive ^ string begin + || // /m line based ^$ $ str end (bfr \n) + .. ... /s . includes \n + one or more + ?: /x ignore wh.space * zero or more + = += last goto /p preserve ? zero or one + , => /a ASCII /aa safe {3,7} repeat in range + list ops /l locale /d dual | alternation + not /u Unicode [] character class + and /e evaluate /ee rpts \b word boundary + or xor /g global \z string end + /o compile pat once () capture + DEBUG (?:p) no capture + -MO=Deparse REGEX CHARCLASSES (?#t) comment + -MO=Terse . 
[^\n] (?=p) ZW pos ahead + -D## \s whitespace (?!p) ZW neg ahead + -d:Trace \w word chars (?<=p) ZW pos behind \K + \d digits (?<!p) ZW neg behind + CONFIGURATION \pP named property (?>p) no backtrack + perl -V:ivsize \h horiz.wh.space (?|p|p)branch reset + \R linebreak (?<n>p)named capture + \S \W \D \H negate \g{n} ref to named cap + \K keep left part FUNCTION RETURN LISTS stat localtime caller SPECIAL VARIABLES 0 dev 0 second 0 package $_ default variable diff --git a/gnu/usr.bin/perl/pod/perlclib.pod b/gnu/usr.bin/perl/pod/perlclib.pod index 0785577dace..ef0b6b02344 100644 --- a/gnu/usr.bin/perl/pod/perlclib.pod +++ b/gnu/usr.bin/perl/pod/perlclib.pod @@ -99,8 +99,8 @@ There is no equivalent to C<fgets>; one should use C<sv_gets> instead: Instead Of: Use: - t* p = malloc(n) Newx(id, p, n, t) - t* p = calloc(n, s) Newxz(id, p, n, t) + t* p = malloc(n) Newx(p, n, t) + t* p = calloc(n, s) Newxz(p, n, t) p = realloc(p, n) Renew(p, n, t) memcpy(dst, src, n) Copy(src, dst, n, t) memmove(dst, src, n) Move(src, dst, n, t) diff --git a/gnu/usr.bin/perl/pod/perlcommunity.pod b/gnu/usr.bin/perl/pod/perlcommunity.pod index 96c7b85486e..2acb0e2399b 100644 --- a/gnu/usr.bin/perl/pod/perlcommunity.pod +++ b/gnu/usr.bin/perl/pod/perlcommunity.pod @@ -65,11 +65,17 @@ Run by O'Reilly Media (the publisher of L<the Camel Book|perlbook>, among other Perl-related literature), perl.com provides current Perl news, articles, and resources for Perl developers as well as a directory of other useful websites. +=item L<http://blogs.perl.org/> + +Many members of the community have a Perl-related blog on this site. If +you'd like to join them, you can sign up for free. + =item L<http://use.perl.org/> -use Perl; provides a slashdot-style Perl news website covering all things Perl, -from minutes of the meetings of the Perl 6 Design team to conference -announcements with (ir)relevant discussion. 
+use Perl; used to provide a slashdot-style news/blog website covering all +things Perl, from minutes of the meetings of the Perl 6 Design team to +conference announcements with (ir)relevant discussion. It no longer accepts +updates, but you can still use the site to read old entries and comments. =back @@ -83,6 +89,12 @@ PerlMonks is one of the largest Perl forums, and describes itself as "A place for individuals to polish, improve, and showcase their Perl skills." and "A community which allows everyone to grow and learn from each other." +=item L<http://stackoverflow.com/> + +Stack Overflow is a free question-and-answer site for programmers. It's not +focussed solely on Perl, but it does have an active group of users who do +their best to help people with their Perl programming questions. + =back =head2 User Groups diff --git a/gnu/usr.bin/perl/pod/perldebguts.pod b/gnu/usr.bin/perl/pod/perldebguts.pod index 8ae6e7baa96..a17a6b4aa35 100644 --- a/gnu/usr.bin/perl/pod/perldebguts.pod +++ b/gnu/usr.bin/perl/pod/perldebguts.pod @@ -38,7 +38,6 @@ Each array C<@{"_<$filename"}> holds the lines of $filename for a file compiled by Perl. The same is also true for C<eval>ed strings that contain subroutines, or which are currently being executed. The $filename for C<eval>ed strings looks like C<(eval 34)>. -Code assertions in regexes look like C<(re_eval 19)>. Values in this array are magical in numeric context: they compare equal to zero only if the line is not breakable. @@ -53,14 +52,14 @@ C<"$break_condition\0$action">. The same holds for evaluated strings that contain subroutines, or which are currently being executed. The $filename for C<eval>ed strings -looks like C<(eval 34)> or C<(re_eval 19)>. +looks like C<(eval 34)>. =item * Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is also the case for evaluated strings that contain subroutines, or which are currently being executed. 
The $filename for C<eval>ed -strings looks like C<(eval 34)> or C<(re_eval 19)>. +strings looks like C<(eval 34)>. =item * @@ -81,7 +80,7 @@ also exists. A hash C<%DB::sub> is maintained, whose keys are subroutine names and whose values have the form C<filename:startline-endline>. C<filename> has the form C<(eval 34)> for subroutines defined inside -C<eval>s, or C<(re_eval 19)> for those within regex code assertions. +C<eval>s. =item * @@ -536,236 +535,258 @@ C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) Here are the possible types, with short descriptions: +=for comment +This table is generated by regen/regcomp.pl. Any changes made here +will be lost. + +=for regcomp.pl begin + # TYPE arg-description [num-args] [longjump-len] DESCRIPTION # Exit points - END no End of program. - SUCCEED no Return from a subroutine, basically. + + END no End of program. + SUCCEED no Return from a subroutine, basically. # Anchors: - BOL no Match "" at beginning of line. - MBOL no Same, assuming multiline. - SBOL no Same, assuming singleline. - EOS no Match "" at end of string. - EOL no Match "" at end of line. - MEOL no Same, assuming multiline. - SEOL no Same, assuming singleline. - BOUND no Match "" at any word boundary using native charset - semantics for non-utf8 - BOUNDL no Match "" at any locale word boundary - BOUNDU no Match "" at any word boundary using Unicode semantics - BOUNDA no Match "" at any word boundary using ASCII semantics - NBOUND no Match "" at any word non-boundary using native charset - semantics for non-utf8 - NBOUNDL no Match "" at any locale word non-boundary - NBOUNDU no Match "" at any word non-boundary using Unicode semantics - NBOUNDA no Match "" at any word non-boundary using ASCII semantics - GPOS no Matches where last m//g left off. + BOL no Match "" at beginning of line. + MBOL no Same, assuming multiline. + SBOL no Same, assuming singleline. + EOS no Match "" at end of string. + EOL no Match "" at end of line. + MEOL no Same, assuming multiline. 
+ SEOL no Same, assuming singleline. + BOUND no Match "" at any word boundary using + native charset semantics for non-utf8 + BOUNDL no Match "" at any locale word boundary + BOUNDU no Match "" at any word boundary using + Unicode semantics + BOUNDA no Match "" at any word boundary using ASCII + semantics + NBOUND no Match "" at any word non-boundary using + native charset semantics for non-utf8 + NBOUNDL no Match "" at any locale word non-boundary + NBOUNDU no Match "" at any word non-boundary using + Unicode semantics + NBOUNDA no Match "" at any word non-boundary using + ASCII semantics + GPOS no Matches where last m//g left off. # [Special] alternatives: - REG_ANY no Match any one character (except newline). - SANY no Match any one character. - CANY no Match any one byte. - ANYOF sv Match character in (or not in) this class, single char - match only - ANYOFV sv Match character in (or not in) this class, can - match-multiple chars - ALNUM no Match any alphanumeric character using native charset - semantics for non-utf8 - ALNUML no Match any alphanumeric char in locale - ALNUMU no Match any alphanumeric char using Unicode semantics - ALNUMA no Match [A-Za-z_0-9] - NALNUM no Match any non-alphanumeric character using native charset - semantics for non-utf8 - NALNUML no Match any non-alphanumeric char in locale - NALNUMU no Match any non-alphanumeric char using Unicode semantics - NALNUMA no Match [^A-Za-z_0-9] - SPACE no Match any whitespace character using native charset - semantics for non-utf8 - SPACEL no Match any whitespace char in locale - SPACEU no Match any whitespace char using Unicode semantics - SPACEA no Match [ \t\n\f\r] - NSPACE no Match any non-whitespace character using native charset - semantics for non-utf8 - NSPACEL no Match any non-whitespace char in locale - NSPACEU no Match any non-whitespace char using Unicode semantics - NSPACEA no Match [^ \t\n\f\r] - DIGIT no Match any numeric character using native charset semantics - for non-utf8 - 
DIGITL no Match any numeric character in locale - DIGITA no Match [0-9] - NDIGIT no Match any non-numeric character using native charset - i semantics for non-utf8 - NDIGITL no Match any non-numeric character in locale - NDIGITA no Match [^0-9] - CLUMP no Match any extended grapheme cluster sequence + REG_ANY no Match any one character (except newline). + SANY no Match any one character. + CANY no Match any one byte. + ANYOF sv Match character in (or not in) this + class, single char match only + ANYOF_WARN_SUPER sv Match character in (or not in) this + class, warn (if enabled) upon matching a + char above Unicode max; + ANYOF_SYNTHETIC sv Synthetic start class + + POSIXD none Some [[:class:]] under /d; the FLAGS + field gives which one + POSIXL none Some [[:class:]] under /l; the FLAGS + field gives which one + POSIXU none Some [[:class:]] under /u; the FLAGS + field gives which one + POSIXA none Some [[:class:]] under /a; the FLAGS + field gives which one + NPOSIXD none complement of POSIXD, [[:^class:]] + NPOSIXL none complement of POSIXL, [[:^class:]] + NPOSIXU none complement of POSIXU, [[:^class:]] + NPOSIXA none complement of POSIXA, [[:^class:]] + + CLUMP no Match any extended grapheme cluster + sequence # Alternation - # BRANCH The set of branches constituting a single choice are hooked - # together with their "next" pointers, since precedence prevents - # anything being concatenated to any individual branch. The - # "next" pointer of the last BRANCH in a choice points to the - # thing following the whole choice. This is also where the - # final "next" pointer of each individual branch points; each - # branch starts with the operand node of a BRANCH node. + # BRANCH The set of branches constituting a single choice are + # hooked together with their "next" pointers, since + # precedence prevents anything being concatenated to + # any individual branch. The "next" pointer of the last + # BRANCH in a choice points to the thing following the + # whole choice. 
This is also where the final "next" + # pointer of each individual branch points; each branch + # starts with the operand node of a BRANCH node. # - BRANCH node Match this alternative, or the next... + BRANCH node Match this alternative, or the next... # Back pointer - # BACK Normal "next" pointers all implicitly point forward; BACK - # exists to make loop structures possible. + # BACK Normal "next" pointers all implicitly point forward; + # BACK exists to make loop structures possible. # not used - BACK no Match "", "next" ptr points backward. + BACK no Match "", "next" ptr points backward. # Literals - EXACT str Match this string (preceded by length). - EXACTF str Match this string, folded, native charset semantics for - non-utf8 (prec. by length). - EXACTFL str Match this string, folded in locale (w/len). - EXACTFU str Match this string, folded, Unicode semantics for non-utf8 - (prec. by length). - EXACTFA str Match this string, folded, Unicode semantics for non-utf8, - but no ASCII-range character matches outside ASCII (prec. - by length),. + EXACT str Match this string (preceded by length). + EXACTF str Match this non-UTF-8 string (not + guaranteed to be folded) using /id rules + (w/len). + EXACTFL str Match this string (not guaranteed to be + folded) using /il rules (w/len). + EXACTFU str Match this string (folded iff in UTF-8, + length in folding doesn't change if not + in UTF-8) using /iu rules (w/len). + EXACTFA str Match this string (not guaranteed to be + folded) using /iaa rules (w/len). + EXACTFU_SS str Match this string (folded iff in UTF-8, + length in folding may change even if not + in UTF-8) using /iu rules (w/len). + EXACTFU_TRICKYFOLD str Match this folded UTF-8 string using /iu + rules # Do nothing types - NOTHING no Match empty string. + NOTHING no Match empty string. # A variant of above which delimits a group, thus stops optimizations - TAIL no Match empty string. Can jump here from outside. + TAIL no Match empty string. 
Can jump here from + outside. # Loops - # STAR,PLUS '?', and complex '*' and '+', are implemented as circular - # BRANCH structures using BACK. Simple cases (one character - # per match) are implemented with STAR and PLUS for speed - # and to minimize recursive plunges. + # STAR,PLUS '?', and complex '*' and '+', are implemented as + # circular BRANCH structures using BACK. Simple cases + # (one character per match) are implemented with STAR + # and PLUS for speed and to minimize recursive plunges. # - STAR node Match this (simple) thing 0 or more times. - PLUS node Match this (simple) thing 1 or more times. + STAR node Match this (simple) thing 0 or more + times. + PLUS node Match this (simple) thing 1 or more + times. - CURLY sv 2 Match this simple thing {n,m} times. - CURLYN no 2 Capture next-after-this simple thing - CURLYM no 2 Capture this medium-complex thing {n,m} times. - CURLYX sv 2 Match this complex thing {n,m} times. + CURLY sv 2 Match this simple thing {n,m} times. + CURLYN no 2 Capture next-after-this simple thing + CURLYM no 2 Capture this medium-complex thing {n,m} + times. + CURLYX sv 2 Match this complex thing {n,m} times. # This terminator creates a loop structure for CURLYX - WHILEM no Do curly processing and see if rest matches. + WHILEM no Do curly processing and see if rest + matches. # Buffer related # OPEN,CLOSE,GROUPP ...are numbered at compile time. - OPEN num 1 Mark this point in input as start of #n. - CLOSE num 1 Analogous to OPEN. - - REF num 1 Match some already matched string - REFF num 1 Match already matched string, folded using native charset - semantics for non-utf8 - REFFL num 1 Match already matched string, folded in loc. - REFFU num 1 Match already matched string, folded using unicode - semantics for non-utf8 - REFFA num 1 Match already matched string, folded using unicode - semantics for non-utf8, no mixing ASCII, non-ASCII - - # Named references. 
Code in regcomp.c assumes that these all are after the - # numbered references - NREF no-sv 1 Match some already matched string - NREFF no-sv 1 Match already matched string, folded using native charset - semantics for non-utf8 - NREFFL no-sv 1 Match already matched string, folded in loc. - NREFFU num 1 Match already matched string, folded using unicode - semantics for non-utf8 - NREFFA num 1 Match already matched string, folded using unicode - semantics for non-utf8, no mixing ASCII, non-ASCII - - IFMATCH off 1 2 Succeeds if the following matches. - UNLESSM off 1 2 Fails if the following matches. - SUSPEND off 1 1 "Independent" sub-RE. - IFTHEN off 1 1 Switch, should be preceded by switcher. - GROUPP num 1 Whether the group matched. + OPEN num 1 Mark this point in input as start of #n. + CLOSE num 1 Analogous to OPEN. + + REF num 1 Match some already matched string + REFF num 1 Match already matched string, folded + using native charset semantics for non- + utf8 + REFFL num 1 Match already matched string, folded in + loc. + REFFU num 1 Match already matched string, folded + using unicode semantics for non-utf8 + REFFA num 1 Match already matched string, folded + using unicode semantics for non-utf8, no + mixing ASCII, non-ASCII + + # Named references. Code in regcomp.c assumes that these all are after + # the numbered references + NREF no-sv 1 Match some already matched string + NREFF no-sv 1 Match already matched string, folded + using native charset semantics for non- + utf8 + NREFFL no-sv 1 Match already matched string, folded in + loc. + NREFFU num 1 Match already matched string, folded + using unicode semantics for non-utf8 + NREFFA num 1 Match already matched string, folded + using unicode semantics for non-utf8, no + mixing ASCII, non-ASCII + + IFMATCH off 1 2 Succeeds if the following matches. + UNLESSM off 1 2 Fails if the following matches. + SUSPEND off 1 1 "Independent" sub-RE. + IFTHEN off 1 1 Switch, should be preceded by switcher. 
+ GROUPP num 1 Whether the group matched. # Support for long RE - LONGJMP off 1 1 Jump far away. - BRANCHJ off 1 1 BRANCH with long offset. + LONGJMP off 1 1 Jump far away. + BRANCHJ off 1 1 BRANCH with long offset. # The heavy worker - EVAL evl 1 Execute some Perl code. + EVAL evl 1 Execute some Perl code. # Modifiers - MINMOD no Next operator is not greedy. - LOGICAL no Next opcode should set the flag only. + MINMOD no Next operator is not greedy. + LOGICAL no Next opcode should set the flag only. # This is not used yet - RENUM off 1 1 Group with independently numbered parens. + RENUM off 1 1 Group with independently numbered parens. # Trie Related - # Behave the same as A|LIST|OF|WORDS would. The '..C' variants have - # inline charclass data (ascii only), the 'C' store it in the structure. - # NOTE: the relative order of the TRIE-like regops is significant + # Behave the same as A|LIST|OF|WORDS would. The '..C' variants + # have inline charclass data (ascii only), the 'C' store it in the + # structure. - TRIE trie 1 Match many EXACT(F[ALU]?)? at once. flags==type - TRIEC charclass Same as TRIE, but with embedded charclass data + TRIE trie 1 Match many EXACT(F[ALU]?)? at once. + flags==type + TRIEC trie Same as TRIE, but with embedded charclass + charclass data - # For start classes, contains an added fail table. - AHOCORASICK trie 1 Aho Corasick stclass. flags==type - AHOCORASICKC charclass Same as AHOCORASICK, but with embedded charclass data + AHOCORASICK trie 1 Aho Corasick stclass. flags==type + AHOCORASICKC trie Same as AHOCORASICK, but with embedded + charclass charclass data # Regex Subroutines - GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2 - GOSTART no recurse to start of pattern + GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs + arg2 + GOSTART no recurse to start of pattern # Special conditionals - NGROUPP no-sv 1 Whether the group matched. - INSUBP num 1 Whether we are in a specific recurse. 
- DEFINEP none 1 Never execute directly. + NGROUPP no-sv 1 Whether the group matched. + INSUBP num 1 Whether we are in a specific recurse. + DEFINEP none 1 Never execute directly. # Backtracking Verbs - ENDLIKE none Used only for the type field of verbs - OPFAIL none Same as (?!) - ACCEPT parno 1 Accepts the current matched string. - + ENDLIKE none Used only for the type field of verbs + OPFAIL none Same as (?!) + ACCEPT parno 1 Accepts the current matched string. # Verbs With Arguments - VERB no-sv 1 Used only for the type field of verbs - PRUNE no-sv 1 Pattern fails at this startpoint if no-backtracking through this - MARKPOINT no-sv 1 Push the current location for rollback by cut. - SKIP no-sv 1 On failure skip forward (to the mark) before retrying - COMMIT no-sv 1 Pattern fails outright if backtracking through this - CUTGROUP no-sv 1 On failure go to the next alternation in the group + VERB no-sv 1 Used only for the type field of verbs + PRUNE no-sv 1 Pattern fails at this startpoint if no- + backtracking through this + MARKPOINT no-sv 1 Push the current location for rollback by + cut. + SKIP no-sv 1 On failure skip forward (to the mark) + before retrying + COMMIT no-sv 1 Pattern fails outright if backtracking + through this + CUTGROUP no-sv 1 On failure go to the next alternation in + the group # Control what to keep in $&. - KEEPS no $& begins here. + KEEPS no $& begins here. # New charclass like patterns - LNBREAK none generic newline pattern - VERTWS none vertical whitespace (Perl 6) - NVERTWS none not vertical whitespace (Perl 6) - HORIZWS none horizontal whitespace (Perl 6) - NHORIZWS none not horizontal whitespace (Perl 6) - - FOLDCHAR codepoint 1 codepoint with tricky case folding properties. + LNBREAK none generic newline pattern # SPECIAL REGOPS - # This is not really a node, but an optimized away piece of a "long" node. - # To simplify debugging output, we mark it as if it were a node - OPTIMIZED off Placeholder for dump. 
+ # This is not really a node, but an optimized away piece of a "long" + # node. To simplify debugging output, we mark it as if it were a node + OPTIMIZED off Placeholder for dump. # Special opcode with the property that no opcode in a compiled program # will ever be of this type. Thus it can be used as a flag value that # no other opcode has been seen. END is used similarly, in that an END - # node cant be optimized. So END implies "unoptimizable" and PSEUDO mean - # "not seen anything to optimize yet". - PSEUDO off Pseudo opcode for internal use. + # node cant be optimized. So END implies "unoptimizable" and PSEUDO + # mean "not seen anything to optimize yet". + PSEUDO off Pseudo opcode for internal use. + +=for regcomp.pl end =for unprinted-credits Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421 diff --git a/gnu/usr.bin/perl/pod/perldtrace.pod b/gnu/usr.bin/perl/pod/perldtrace.pod index 39551e17490..2cec25935bd 100644 --- a/gnu/usr.bin/perl/pod/perldtrace.pod +++ b/gnu/usr.bin/perl/pod/perldtrace.pod @@ -55,6 +55,10 @@ package name of the function. The C<phase-change> probe was added. +=item 5.18.0 + +The C<op-entry>, C<loading-file>, and C<loaded-file> probes were added. + =back =head1 PROBES @@ -70,7 +74,7 @@ I<caller> from a DTrace action. :*perl*::sub-entry { printf("%s::%s entered at %s line %d\n", - copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg0); + copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg2); } =item sub-return(SUBNAME, FILE, LINE, PACKAGE) @@ -82,7 +86,7 @@ from a DTrace action. :*perl*::sub-return { printf("%s::%s returned at %s line %d\n", - copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg0); + copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg2); } =item phase-change(NEWPHASE, OLDPHASE) @@ -97,6 +101,40 @@ C<${^GLOBAL_PHASE}> reports. copyinstr(arg1), copyinstr(arg0)); } +=item op-entry(OPNAME) + +Traces the execution of each opcode in the Perl runloop. This probe +is fired before the opcode is executed. 
When the Perl debugger is
+enabled, the DTrace probe is fired I<after> the debugger hooks (but
+still before the opcode itself is executed).
+
+    :*perl*::op-entry {
+        printf("About to execute opcode %s\n", copyinstr(arg0));
+    }
+
+=item loading-file(FILENAME)
+
+Fires when Perl is about to load an individual file, whether from
+C<use>, C<require>, or C<do>. This probe fires before the file is
+read from disk. The filename argument is converted to local filesystem
+paths instead of providing C<Module::Name>-style names.
+
+    :*perl*:loading-file {
+        printf("About to load %s\n", copyinstr(arg0));
+    }
+
+=item loaded-file(FILENAME)
+
+Fires when Perl has successfully loaded an individual file, whether
+from C<use>, C<require>, or C<do>. This probe fires after the file
+is read from disk and its contents evaluated. The filename argument
+is converted to local filesystem paths instead of providing
+C<Module::Name>-style names.
+
+    :*perl*:loaded-file {
+        printf("Successfully loaded %s\n", copyinstr(arg0));
+    }
+
 =back
 
 =head1 EXAMPLES
@@ -156,15 +194,23 @@ C<${^GLOBAL_PHASE}> reports.
     read 374
     stat64 1056
 
+=item Perl functions that execute the most opcodes
+
+    # dtrace -qZn 'sub-entry { self->fqn = strjoin(copyinstr(arg3), strjoin("::", copyinstr(arg0))) } op-entry /self->fqn != ""/ { @[self->fqn] = count() } END { trunc(@, 3) }'
+
+    warnings::unimport 4589
+    Exporter::Heavy::_rebuild_cache 5039
+    Exporter::import 14578
+
 =back
 
 =head1 REFERENCES
 
 =over 4
 
-=item DTrace User Guide
+=item DTrace Dynamic Tracing Guide
 
-L<http://download.oracle.com/docs/cd/E19082-01/819-3620/index.html>
+L<http://dtrace.org/guide/preface.html>
 
 =item DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD
 
@@ -172,6 +218,17 @@ L<http://www.amazon.com/DTrace-Dynamic-Tracing-Solaris-FreeBSD/dp/0132091518/>
 
 =back
 
+=head1 SEE ALSO
+
+=over 4
+
+=item L<Devel::DTrace::Provider>
+
+This CPAN module lets you create application-level DTrace probes written in
+Perl.
+ +=back + =head1 AUTHORS Shawn M Moore C<sartak@gmail.com> diff --git a/gnu/usr.bin/perl/pod/perlebcdic.pod b/gnu/usr.bin/perl/pod/perlebcdic.pod index ecd0676415f..2256fb1ef60 100644 --- a/gnu/usr.bin/perl/pod/perlebcdic.pod +++ b/gnu/usr.bin/perl/pod/perlebcdic.pod @@ -7,7 +7,7 @@ perlebcdic - Considerations for running Perl on EBCDIC platforms =head1 DESCRIPTION An exploration of some of the issues facing Perl programmers -on EBCDIC based computers. We do not cover localization, +on EBCDIC based computers. We do not cover localization, internationalization, or multi-byte character set issues other than some discussion of UTF-8 and UTF-EBCDIC. @@ -23,16 +23,16 @@ by sending mail to perlbug@perl.org The American Standard Code for Information Interchange (ASCII or US-ASCII) is a set of -integers running from 0 to 127 (decimal) that imply character -interpretation by the display and other systems of computers. -The range 0..127 can be covered by setting the bits in a 7-bit binary -digit, hence the set is sometimes referred to as "7-bit ASCII". -ASCII was described by the American National Standards Institute -document ANSI X3.4-1986. It was also described by ISO 646:1991 -(with localization for currency symbols). The full ASCII set is -given in the table below as the first 128 elements. Languages that -can be written adequately with the characters in ASCII include -English, Hawaiian, Indonesian, Swahili and some Native American +integers running from 0 to 127 (decimal) that imply character +interpretation by the display and other systems of computers. +The range 0..127 can be covered by setting the bits in a 7-bit binary +digit, hence the set is sometimes referred to as "7-bit ASCII". +ASCII was described by the American National Standards Institute +document ANSI X3.4-1986. It was also described by ISO 646:1991 +(with localization for currency symbols). The full ASCII set is +given in the table below as the first 128 elements. 
Languages that +can be written adequately with the characters in ASCII include +English, Hawaiian, Indonesian, Swahili and some Native American languages. There are many character sets that extend the range of integers @@ -41,28 +41,28 @@ One common one is the ISO 8859-1 character set. =head2 ISO 8859 -The ISO 8859-$n are a collection of character code sets from the -International Organization for Standardization (ISO) each of which -adds characters to the ASCII set that are typically found in European -languages many of which are based on the Roman, or Latin, alphabet. +The ISO 8859-$n are a collection of character code sets from the +International Organization for Standardization (ISO), each of which +adds characters to the ASCII set that are typically found in European +languages, many of which are based on the Roman, or Latin, alphabet. =head2 Latin 1 (ISO 8859-1) -A particular 8-bit extension to ASCII that includes grave and acute -accented Latin characters. Languages that can employ ISO 8859-1 -include all the languages covered by ASCII as well as Afrikaans, -Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian, -Portuguese, Spanish, and Swedish. Dutch is covered albeit without -the ij ligature. French is covered too but without the oe ligature. +A particular 8-bit extension to ASCII that includes grave and acute +accented Latin characters. Languages that can employ ISO 8859-1 +include all the languages covered by ASCII as well as Afrikaans, +Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian, +Portuguese, Spanish, and Swedish. Dutch is covered albeit without +the ij ligature. French is covered too but without the oe ligature. German can use ISO 8859-1 but must do so without German-style -quotation marks. This set is based on Western European extensions +quotation marks. This set is based on Western European extensions to ASCII and is commonly encountered in world wide web work. 
In IBM character code set identification terminology ISO 8859-1 is also known as CCSID 819 (or sometimes 0819 or even 00819). =head2 EBCDIC -The Extended Binary Coded Decimal Interchange Code refers to a +The Extended Binary Coded Decimal Interchange Code refers to a large collection of single- and multi-byte coded character sets that are different from ASCII or ISO 8859-1 and are all slightly different from each other; they typically run on host computers. The EBCDIC encodings derive from @@ -71,19 +71,19 @@ cards was such that high bits were set for the upper and lower case alphabet characters [a-z] and [A-Z], but there were gaps within each Latin alphabet range. -Some IBM EBCDIC character sets may be known by character code set +Some IBM EBCDIC character sets may be known by character code set identification numbers (CCSID numbers) or code page numbers. Perl can be compiled on platforms that run any of three commonly used EBCDIC character sets, listed below. -=head2 The 13 variant characters +=head3 The 13 variant characters Among IBM EBCDIC character code sets there are 13 characters that are often mapped to different integer values. Those characters are known as the 13 "variant" characters and are: - \ [ ] { } ^ ~ ! # | $ @ ` + \ [ ] { } ^ ~ ! # | $ @ ` When Perl is compiled for a platform, it looks at some of these characters to guess which EBCDIC character set the platform uses, and adapts itself @@ -92,26 +92,30 @@ one of the three Perl knows about, Perl will either fail to compile, or mistakenly and silently choose one of the three. They are: -=head2 0037 +=over -Character code set ID 0037 is a mapping of the ASCII plus Latin-1 -characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used -in North American English locales on the OS/400 operating system -that runs on AS/400 computers. CCSID 0037 differs from ISO 8859-1 +=item B<0037> + +Character code set ID 0037 is a mapping of the ASCII plus Latin-1 +characters (i.e. ISO 8859-1) to an EBCDIC set. 
0037 is used +in North American English locales on the OS/400 operating system +that runs on AS/400 computers. CCSID 0037 differs from ISO 8859-1 in 237 places, in other words they agree on only 19 code point values. -=head2 1047 +=item B<1047> -Character code set ID 1047 is also a mapping of the ASCII plus -Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is -used under Unix System Services for OS/390 or z/OS, and OpenEdition +Character code set ID 1047 is also a mapping of the ASCII plus +Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is +used under Unix System Services for OS/390 or z/OS, and OpenEdition for VM/ESA. CCSID 1047 differs from CCSID 0037 in eight places. -=head2 POSIX-BC +=item B<POSIX-BC> The EBCDIC code page in use on Siemens' BS2000 system is distinct from 1047 and 0037. It is identified below as the POSIX-BC set. +=back + =head2 Unicode code points versus EBCDIC code points In Unicode terminology a I<code point> is the number assigned to a @@ -218,7 +222,7 @@ you to use different encodings per IO channel. For example you may use to get four files containing "Hello World!\n" in ASCII, CP 0037 EBCDIC, ISO 8859-1 (Latin-1) (in this example identical to ASCII since only ASCII -characters were printed), and +characters were printed), and UTF-EBCDIC (in this example identical to normal EBCDIC since only characters that don't differ between EBCDIC and UTF-EBCDIC were printed). See the documentation of Encode::PerlIO for details. @@ -230,21 +234,18 @@ ignores things like the type of your filesystem (ASCII or EBCDIC). The following tables list the ASCII and Latin 1 ordered sets including the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete (7f), -C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff). 
In the
-table non-printing control character names as well as the Latin 1
-extensions to ASCII have been labelled with character names roughly
-corresponding to I<The Unicode Standard, Version 3.0> albeit with
-substitutions such as s/LATIN// and s/VULGAR// in all cases,
-s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/
-in some other cases. The "names" of the controls listed here are
-the Unicode Version 1 names, except for the few that don't have names, in which
-case the names in the Wikipedia article were used
-(L<http://en.wikipedia.org/wiki/C0_and_C1_control_codes>).
-The differences between the 0037 and 1047 sets are
-flagged with ***. The differences between the 1047 and POSIX-BC sets
-are flagged with ###. All ord() numbers listed are decimal. If you
-would rather see this table listing octal values then run the table
-(that is, the pod version of this document since this recipe may not
+C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff). In the
+table names of the Latin 1
+extensions to ASCII have been labelled with character names roughly
+corresponding to I<The Unicode Standard, Version 6.1> albeit with
+substitutions such as s/LATIN// and s/VULGAR// in all cases, s/CAPITAL
+LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/ in some other
+cases. Controls are listed using their Unicode 6.1 abbreviations.
+The differences between the 0037 and 1047 sets are
+flagged with **. The differences between the 1047 and POSIX-BC sets
+are flagged with ##. All ord() numbers listed are decimal.
If you +would rather see this table listing octal values, then run the table +(that is, the pod source text of this document, since this recipe may not work with a pod2_other_format translation) through: =over 4 @@ -253,8 +254,8 @@ work with a pod2_other_format translation) through: =back - perl -ne 'if(/(.{43})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \ - -e '{printf("%s%-9.03o%-9.03o%-9.03o%.03o\n",$1,$2,$3,$4,$5)}' \ + perl -ne 'if(/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \ + -e '{printf("%s%-5.03o%-5.03o%-5.03o%.03o\n",$1,$2,$3,$4,$5)}' \ perlebcdic.pod If you want to retain the UTF-x code points then in script form you @@ -268,19 +269,19 @@ might want to write: open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!"; while (<FH>) { - if (/(.{43})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/) + if (/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/) { if ($7 ne '' && $9 ne '') { printf( - "%s%-9.03o%-9.03o%-9.03o%-9.03o%-3o.%-5o%-3o.%.03o\n", + "%s%-5.03o%-5.03o%-5.03o%-5.03o%-3o.%-5o%-3o.%.03o\n", $1,$2,$3,$4,$5,$6,$7,$8,$9); } elsif ($7 ne '') { - printf("%s%-9.03o%-9.03o%-9.03o%-9.03o%-3o.%-5o%.03o\n", + printf("%s%-5.03o%-5.03o%-5.03o%-5.03o%-3o.%-5o%.03o\n", $1,$2,$3,$4,$5,$6,$7,$8); } else { - printf("%s%-9.03o%-9.03o%-9.03o%-9.03o%-9.03o%.03o\n", + printf("%s%-5.03o%-5.03o%-5.03o%-5.03o%-5.03o%.03o\n", $1,$2,$3,$4,$5,$6,$8); } } @@ -295,8 +296,8 @@ run the table through: =back - perl -ne 'if(/(.{43})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \ - -e '{printf("%s%-9.02X%-9.02X%-9.02X%.02X\n",$1,$2,$3,$4,$5)}' \ + perl -ne 'if(/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \ + -e '{printf("%s%-5.02X%-5.02X%-5.02X%.02X\n",$1,$2,$3,$4,$5)}' \ perlebcdic.pod Or, in order to retain the UTF-x code points in hexadecimal: @@ -309,284 +310,286 @@ Or, in order to retain the UTF-x code points in hexadecimal: open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!"; while (<FH>) { - if 
(/(.{43})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/)
+ if (/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/)
 { if ($7 ne '' && $9 ne '') { printf(
- "%s%-9.02X%-9.02X%-9.02X%-9.02X%-2X.%-6.02X%02X.%02X\n",
+ "%s%-5.02X%-5.02X%-5.02X%-5.02X%-2X.%-6.02X%02X.%02X\n",
 $1,$2,$3,$4,$5,$6,$7,$8,$9); } elsif ($7 ne '') {
- printf("%s%-9.02X%-9.02X%-9.02X%-9.02X%-2X.%-6.02X%02X\n",
+ printf("%s%-5.02X%-5.02X%-5.02X%-5.02X%-2X.%-6.02X%02X\n",
 $1,$2,$3,$4,$5,$6,$7,$8); } else {
- printf("%s%-9.02X%-9.02X%-9.02X%-9.02X%-9.02X%02X\n",
+ printf("%s%-5.02X%-5.02X%-5.02X%-5.02X%-5.02X%02X\n",
 $1,$2,$3,$4,$5,$6,$8); } } }
- ISO 8859-1 CCSID CCSID CCSID 1047
- chr CCSID 0819 0037 1047 POSIX-BC UTF-8 UTF-EBCDIC
- ----------------------------------------------------------------------------------------------
- <NULL> 0 0 0 0 0 0
- <START OF HEADING> 1 1 1 1 1 1
- <START OF TEXT> 2 2 2 2 2 2
- <END OF TEXT> 3 3 3 3 3 3
- <END OF TRANSMISSION> 4 55 55 55 4 55
- <ENQUIRY> 5 45 45 45 5 45
- <ACKNOWLEDGE> 6 46 46 46 6 46
- <BELL> 7 47 47 47 7 47
- <BACKSPACE> 8 22 22 22 8 22
- <HORIZONTAL TABULATION> 9 5 5 5 9 5
- <LINE FEED> 10 37 21 21 10 21 ***
- <VERTICAL TABULATION> 11 11 11 11 11 11
- <FORM FEED> 12 12 12 12 12 12
- <CARRIAGE RETURN> 13 13 13 13 13 13
- <SHIFT OUT> 14 14 14 14 14 14
- <SHIFT IN> 15 15 15 15 15 15
- <DATA LINK ESCAPE> 16 16 16 16 16 16
- <DEVICE CONTROL ONE> 17 17 17 17 17 17
- <DEVICE CONTROL TWO> 18 18 18 18 18 18
- <DEVICE CONTROL THREE> 19 19 19 19 19 19
- <DEVICE CONTROL FOUR> 20 60 60 60 20 60
- <NEGATIVE ACKNOWLEDGE> 21 61 61 61 21 61
- <SYNCHRONOUS IDLE> 22 50 50 50 22 50
- <END OF TRANSMISSION BLOCK> 23 38 38 38 23 38
- <CANCEL> 24 24 24 24 24 24
- <END OF MEDIUM> 25 25 25 25 25 25
- <SUBSTITUTE> 26 63 63 63 26 63
- <ESCAPE> 27 39 39 39 27 39
- <FILE SEPARATOR> 28 28 28 28 28 28
- <GROUP SEPARATOR> 29 29 29 29 29 29
- <RECORD SEPARATOR> 30 30 30 30 30 30
- <UNIT SEPARATOR> 31 31 31 31 31 31
- <SPACE> 32 64 64 64 32 64
- ! 33 90 90 90 33 90
- " 34 127 127 127 34 127
- # 35 123 123 123 35 123
- $ 36 91 91 91 36 91
- % 37 108 108 108 37 108
- & 38 80 80 80 38 80
- ' 39 125 125 125 39 125
- ( 40 77 77 77 40 77
- ) 41 93 93 93 41 93
- * 42 92 92 92 42 92
- + 43 78 78 78 43 78
- , 44 107 107 107 44 107
- - 45 96 96 96 45 96
- . 46 75 75 75 46 75
- / 47 97 97 97 47 97
- 0 48 240 240 240 48 240
- 1 49 241 241 241 49 241
- 2 50 242 242 242 50 242
- 3 51 243 243 243 51 243
- 4 52 244 244 244 52 244
- 5 53 245 245 245 53 245
- 6 54 246 246 246 54 246
- 7 55 247 247 247 55 247
- 8 56 248 248 248 56 248
- 9 57 249 249 249 57 249
- : 58 122 122 122 58 122
- ; 59 94 94 94 59 94
- < 60 76 76 76 60 76
- = 61 126 126 126 61 126
- > 62 110 110 110 62 110
- ? 63 111 111 111 63 111
- @ 64 124 124 124 64 124
- A 65 193 193 193 65 193
- B 66 194 194 194 66 194
- C 67 195 195 195 67 195
- D 68 196 196 196 68 196
- E 69 197 197 197 69 197
- F 70 198 198 198 70 198
- G 71 199 199 199 71 199
- H 72 200 200 200 72 200
- I 73 201 201 201 73 201
- J 74 209 209 209 74 209
- K 75 210 210 210 75 210
- L 76 211 211 211 76 211
- M 77 212 212 212 77 212
- N 78 213 213 213 78 213
- O 79 214 214 214 79 214
- P 80 215 215 215 80 215
- Q 81 216 216 216 81 216
- R 82 217 217 217 82 217
- S 83 226 226 226 83 226
- T 84 227 227 227 84 227
- U 85 228 228 228 85 228
- V 86 229 229 229 86 229
- W 87 230 230 230 87 230
- X 88 231 231 231 88 231
- Y 89 232 232 232 89 232
- Z 90 233 233 233 90 233
- [ 91 186 173 187 91 173 *** ###
- \ 92 224 224 188 92 224 ###
- ] 93 187 189 189 93 189 ***
- ^ 94 176 95 106 94 95 *** ###
- _ 95 109 109 109 95 109
- ` 96 121 121 74 96 121 ###
- a 97 129 129 129 97 129
- b 98 130 130 130 98 130
- c 99 131 131 131 99 131
- d 100 132 132 132 100 132
- e 101 133 133 133 101 133
- f 102 134 134 134 102 134
- g 103 135 135 135 103 135
- h 104 136 136 136 104 136
- i 105 137 137 137 105 137
- j 106 145 145 145 106 145
- k 107 146 146 146 107 146
- l 108 147 147 147 108 147
- m 109 148 148 148 109 148
- n 110 149 149 149 110 149
- o 111 150 150 150 111 150
- p 112 151 151 151 112 151
- q 113 152 152 152 113 152
- r 114 153 153 153 114 153
- s 115 162 162 162 115 162
- t 116 163 163 163 116 163
- u 117 164 164 164 117 164
- v 118 165 165 165 118 165
- w 119 166 166 166 119 166
- x 120 167 167 167 120 167
- y 121 168 168 168 121 168
- z 122 169 169 169 122 169
- { 123 192 192 251 123 192 ###
- | 124 79 79 79 124 79
- } 125 208 208 253 125 208 ###
- ~ 126 161 161 255 126 161 ###
- <DELETE> 127 7 7 7 127 7
- <PADDING CHARACTER> 128 32 32 32 194.128 32
- <HIGH OCTET PRESET> 129 33 33 33 194.129 33
- <BREAK PERMITTED HERE> 130 34 34 34 194.130 34
- <NO BREAK HERE> 131 35 35 35 194.131 35
- <INDEX> 132 36 36 36 194.132 36
- <NEXT LINE> 133 21 37 37 194.133 37 ***
- <START OF SELECTED AREA> 134 6 6 6 194.134 6
- <END OF SELECTED AREA> 135 23 23 23 194.135 23
- <CHARACTER TABULATION SET> 136 40 40 40 194.136 40
- <CHARACTER TABULATION WITH JUSTIFICATION> 137 41 41 41 194.137 41
- <LINE TABULATION SET> 138 42 42 42 194.138 42
- <PARTIAL LINE FORWARD> 139 43 43 43 194.139 43
- <PARTIAL LINE BACKWARD> 140 44 44 44 194.140 44
- <REVERSE LINE FEED> 141 9 9 9 194.141 9
- <SINGLE SHIFT TWO> 142 10 10 10 194.142 10
- <SINGLE SHIFT THREE> 143 27 27 27 194.143 27
- <DEVICE CONTROL STRING> 144 48 48 48 194.144 48
- <PRIVATE USE ONE> 145 49 49 49 194.145 49
- <PRIVATE USE TWO> 146 26 26 26 194.146 26
- <SET TRANSMIT STATE> 147 51 51 51 194.147 51
- <CANCEL CHARACTER> 148 52 52 52 194.148 52
- <MESSAGE WAITING> 149 53 53 53 194.149 53
- <START OF GUARDED AREA> 150 54 54 54 194.150 54
- <END OF GUARDED AREA> 151 8 8 8 194.151 8
- <START OF STRING> 152 56 56 56 194.152 56
- <SINGLE GRAPHIC CHARACTER INTRODUCER> 153 57 57 57 194.153 57
- <SINGLE CHARACTER INTRODUCER> 154 58 58 58 194.154 58
- <CONTROL SEQUENCE INTRODUCER> 155 59 59 59 194.155 59
- <STRING TERMINATOR> 156 4 4 4 194.156 4
- <OPERATING SYSTEM COMMAND> 157 20 20 20 194.157 20
- <PRIVACY MESSAGE> 158 62 62 62 194.158 62
- <APPLICATION PROGRAM COMMAND> 159 255 255 95 194.159 255 ###
- <NON-BREAKING SPACE> 160 65 65 65 194.160 128.65
- <INVERTED EXCLAMATION MARK> 161 170 170 170 194.161 128.66
- <CENT SIGN> 162 74 74 176 194.162 128.67 ###
- <POUND SIGN> 163 177 177 177 194.163 128.68
- <CURRENCY SIGN> 164 159 159 159 194.164 128.69
- <YEN SIGN> 165 178 178 178 194.165 128.70
- <BROKEN BAR> 166 106 106 208 194.166 128.71 ###
- <SECTION SIGN> 167 181 181 181 194.167 128.72
- <DIAERESIS> 168 189 187 121 194.168 128.73 *** ###
- <COPYRIGHT SIGN> 169 180 180 180 194.169 128.74
- <FEMININE ORDINAL INDICATOR> 170 154 154 154 194.170 128.81
- <LEFT POINTING GUILLEMET> 171 138 138 138 194.171 128.82
- <NOT SIGN> 172 95 176 186 194.172 128.83 *** ###
- <SOFT HYPHEN> 173 202 202 202 194.173 128.84
- <REGISTERED TRADE MARK SIGN> 174 175 175 175 194.174 128.85
- <MACRON> 175 188 188 161 194.175 128.86 ###
- <DEGREE SIGN> 176 144 144 144 194.176 128.87
- <PLUS-OR-MINUS SIGN> 177 143 143 143 194.177 128.88
- <SUPERSCRIPT TWO> 178 234 234 234 194.178 128.89
- <SUPERSCRIPT THREE> 179 250 250 250 194.179 128.98
- <ACUTE ACCENT> 180 190 190 190 194.180 128.99
- <MICRO SIGN> 181 160 160 160 194.181 128.100
- <PARAGRAPH SIGN> 182 182 182 182 194.182 128.101
- <MIDDLE DOT> 183 179 179 179 194.183 128.102
- <CEDILLA> 184 157 157 157 194.184 128.103
- <SUPERSCRIPT ONE> 185 218 218 218 194.185 128.104
- <MASC. ORDINAL INDICATOR> 186 155 155 155 194.186 128.105
- <RIGHT POINTING GUILLEMET> 187 139 139 139 194.187 128.106
- <FRACTION ONE QUARTER> 188 183 183 183 194.188 128.112
- <FRACTION ONE HALF> 189 184 184 184 194.189 128.113
- <FRACTION THREE QUARTERS> 190 185 185 185 194.190 128.114
- <INVERTED QUESTION MARK> 191 171 171 171 194.191 128.115
- <A WITH GRAVE> 192 100 100 100 195.128 138.65
- <A WITH ACUTE> 193 101 101 101 195.129 138.66
- <A WITH CIRCUMFLEX> 194 98 98 98 195.130 138.67
- <A WITH TILDE> 195 102 102 102 195.131 138.68
- <A WITH DIAERESIS> 196 99 99 99 195.132 138.69
- <A WITH RING ABOVE> 197 103 103 103 195.133 138.70
- <CAPITAL LIGATURE AE> 198 158 158 158 195.134 138.71
- <C WITH CEDILLA> 199 104 104 104 195.135 138.72
- <E WITH GRAVE> 200 116 116 116 195.136 138.73
- <E WITH ACUTE> 201 113 113 113 195.137 138.74
- <E WITH CIRCUMFLEX> 202 114 114 114 195.138 138.81
- <E WITH DIAERESIS> 203 115 115 115 195.139 138.82
- <I WITH GRAVE> 204 120 120 120 195.140 138.83
- <I WITH ACUTE> 205 117 117 117 195.141 138.84
- <I WITH CIRCUMFLEX> 206 118 118 118 195.142 138.85
- <I WITH DIAERESIS> 207 119 119 119 195.143 138.86
- <CAPITAL LETTER ETH> 208 172 172 172 195.144 138.87
- <N WITH TILDE> 209 105 105 105 195.145 138.88
- <O WITH GRAVE> 210 237 237 237 195.146 138.89
- <O WITH ACUTE> 211 238 238 238 195.147 138.98
- <O WITH CIRCUMFLEX> 212 235 235 235 195.148 138.99
- <O WITH TILDE> 213 239 239 239 195.149 138.100
- <O WITH DIAERESIS> 214 236 236 236 195.150 138.101
- <MULTIPLICATION SIGN> 215 191 191 191 195.151 138.102
- <O WITH STROKE> 216 128 128 128 195.152 138.103
- <U WITH GRAVE> 217 253 253 224 195.153 138.104 ###
- <U WITH ACUTE> 218 254 254 254 195.154 138.105
- <U WITH CIRCUMFLEX> 219 251 251 221 195.155 138.106 ###
- <U WITH DIAERESIS> 220 252 252 252 195.156 138.112
- <Y WITH ACUTE> 221 173 186 173 195.157 138.113 *** ###
- <CAPITAL LETTER THORN> 222 174 174 174 195.158 138.114
- <SMALL LETTER SHARP S> 223 89 89 89 195.159 138.115
- <a WITH GRAVE> 224 68 68 68 195.160 139.65
- <a WITH ACUTE> 225 69 69 69 195.161 139.66
- <a WITH CIRCUMFLEX> 226 66 66 66 195.162 139.67
- <a WITH TILDE> 227 70 70 70 195.163 139.68
- <a WITH DIAERESIS> 228 67 67 67 195.164 139.69
- <a WITH RING ABOVE> 229 71 71 71 195.165 139.70
- <SMALL LIGATURE ae> 230 156 156 156 195.166 139.71
- <c WITH CEDILLA> 231 72 72 72 195.167 139.72
- <e WITH GRAVE> 232 84 84 84 195.168 139.73
- <e WITH ACUTE> 233 81 81 81 195.169 139.74
- <e WITH CIRCUMFLEX> 234 82 82 82 195.170 139.81
- <e WITH DIAERESIS> 235 83 83 83 195.171 139.82
- <i WITH GRAVE> 236 88 88 88 195.172 139.83
- <i WITH ACUTE> 237 85 85 85 195.173 139.84
- <i WITH CIRCUMFLEX> 238 86 86 86 195.174 139.85
- <i WITH DIAERESIS> 239 87 87 87 195.175 139.86
- <SMALL LETTER eth> 240 140 140 140 195.176 139.87
- <n WITH TILDE> 241 73 73 73 195.177 139.88
- <o WITH GRAVE> 242 205 205 205 195.178 139.89
- <o WITH ACUTE> 243 206 206 206 195.179 139.98
- <o WITH CIRCUMFLEX> 244 203 203 203 195.180 139.99
- <o WITH TILDE> 245 207 207 207 195.181 139.100
- <o WITH DIAERESIS> 246 204 204 204 195.182 139.101
- <DIVISION SIGN> 247 225 225 225 195.183 139.102
- <o WITH STROKE> 248 112 112 112 195.184 139.103
- <u WITH GRAVE> 249 221 221 192 195.185 139.104 ###
- <u WITH ACUTE> 250 222 222 222 195.186 139.105
- <u WITH CIRCUMFLEX> 251 219 219 219 195.187 139.106
- <u WITH DIAERESIS> 252 220 220 220 195.188 139.112
- <y WITH ACUTE> 253 141 141 141 195.189 139.113
- <SMALL LETTER thorn> 254 142 142 142 195.190 139.114
- <y WITH DIAERESIS> 255 223 223 223 195.191 139.115
+ ISO
+ 8859-1 POS-
+ CCSID CCSID CCSID IX-
+ chr 0819 0037 1047 BC UTF-8 UTF-EBCDIC
+ ---------------------------------------------------------------------
+ <NUL> 0 0 0 0 0 0
+ <SOH> 1 1 1 1 1 1
+ <STX> 2 2 2 2 2 2
+ <ETX> 3 3 3 3 3 3
+ <EOT> 4 55 55 55 4 55
+ <ENQ> 5 45 45 45 5 45
+ <ACK> 6 46 46 46 6 46
+ <BEL> 7 47 47 47 7 47
+ <BS> 8 22 22 22 8 22
+ <HT> 9 5 5 5 9 5
+ <LF> 10 37 21 21 10 21 **
+ <VT> 11 11 11 11 11 11
+
<FF> 12 12 12 12 12 12 + <CR> 13 13 13 13 13 13 + <SO> 14 14 14 14 14 14 + <SI> 15 15 15 15 15 15 + <DLE> 16 16 16 16 16 16 + <DC1> 17 17 17 17 17 17 + <DC2> 18 18 18 18 18 18 + <DC3> 19 19 19 19 19 19 + <DC4> 20 60 60 60 20 60 + <NAK> 21 61 61 61 21 61 + <SYN> 22 50 50 50 22 50 + <ETB> 23 38 38 38 23 38 + <CAN> 24 24 24 24 24 24 + <EOM> 25 25 25 25 25 25 + <SUB> 26 63 63 63 26 63 + <ESC> 27 39 39 39 27 39 + <FS> 28 28 28 28 28 28 + <GS> 29 29 29 29 29 29 + <RS> 30 30 30 30 30 30 + <US> 31 31 31 31 31 31 + <SPACE> 32 64 64 64 32 64 + ! 33 90 90 90 33 90 + " 34 127 127 127 34 127 + # 35 123 123 123 35 123 + $ 36 91 91 91 36 91 + % 37 108 108 108 37 108 + & 38 80 80 80 38 80 + ' 39 125 125 125 39 125 + ( 40 77 77 77 40 77 + ) 41 93 93 93 41 93 + * 42 92 92 92 42 92 + + 43 78 78 78 43 78 + , 44 107 107 107 44 107 + - 45 96 96 96 45 96 + . 46 75 75 75 46 75 + / 47 97 97 97 47 97 + 0 48 240 240 240 48 240 + 1 49 241 241 241 49 241 + 2 50 242 242 242 50 242 + 3 51 243 243 243 51 243 + 4 52 244 244 244 52 244 + 5 53 245 245 245 53 245 + 6 54 246 246 246 54 246 + 7 55 247 247 247 55 247 + 8 56 248 248 248 56 248 + 9 57 249 249 249 57 249 + : 58 122 122 122 58 122 + ; 59 94 94 94 59 94 + < 60 76 76 76 60 76 + = 61 126 126 126 61 126 + > 62 110 110 110 62 110 + ? 
63 111 111 111 63 111 + @ 64 124 124 124 64 124 + A 65 193 193 193 65 193 + B 66 194 194 194 66 194 + C 67 195 195 195 67 195 + D 68 196 196 196 68 196 + E 69 197 197 197 69 197 + F 70 198 198 198 70 198 + G 71 199 199 199 71 199 + H 72 200 200 200 72 200 + I 73 201 201 201 73 201 + J 74 209 209 209 74 209 + K 75 210 210 210 75 210 + L 76 211 211 211 76 211 + M 77 212 212 212 77 212 + N 78 213 213 213 78 213 + O 79 214 214 214 79 214 + P 80 215 215 215 80 215 + Q 81 216 216 216 81 216 + R 82 217 217 217 82 217 + S 83 226 226 226 83 226 + T 84 227 227 227 84 227 + U 85 228 228 228 85 228 + V 86 229 229 229 86 229 + W 87 230 230 230 87 230 + X 88 231 231 231 88 231 + Y 89 232 232 232 89 232 + Z 90 233 233 233 90 233 + [ 91 186 173 187 91 173 ** ## + \ 92 224 224 188 92 224 ## + ] 93 187 189 189 93 189 ** + ^ 94 176 95 106 94 95 ** ## + _ 95 109 109 109 95 109 + ` 96 121 121 74 96 121 ## + a 97 129 129 129 97 129 + b 98 130 130 130 98 130 + c 99 131 131 131 99 131 + d 100 132 132 132 100 132 + e 101 133 133 133 101 133 + f 102 134 134 134 102 134 + g 103 135 135 135 103 135 + h 104 136 136 136 104 136 + i 105 137 137 137 105 137 + j 106 145 145 145 106 145 + k 107 146 146 146 107 146 + l 108 147 147 147 108 147 + m 109 148 148 148 109 148 + n 110 149 149 149 110 149 + o 111 150 150 150 111 150 + p 112 151 151 151 112 151 + q 113 152 152 152 113 152 + r 114 153 153 153 114 153 + s 115 162 162 162 115 162 + t 116 163 163 163 116 163 + u 117 164 164 164 117 164 + v 118 165 165 165 118 165 + w 119 166 166 166 119 166 + x 120 167 167 167 120 167 + y 121 168 168 168 121 168 + z 122 169 169 169 122 169 + { 123 192 192 251 123 192 ## + | 124 79 79 79 124 79 + } 125 208 208 253 125 208 ## + ~ 126 161 161 255 126 161 ## + <DEL> 127 7 7 7 127 7 + <PAD> 128 32 32 32 194.128 32 + <HOP> 129 33 33 33 194.129 33 + <BPH> 130 34 34 34 194.130 34 + <NBH> 131 35 35 35 194.131 35 + <IND> 132 36 36 36 194.132 36 + <NEL> 133 21 37 37 194.133 37 ** + <SSA> 134 6 6 6 194.134 6 + <ESA> 135 23 
23 23 194.135 23 + <HTS> 136 40 40 40 194.136 40 + <HTJ> 137 41 41 41 194.137 41 + <VTS> 138 42 42 42 194.138 42 + <PLD> 139 43 43 43 194.139 43 + <PLU> 140 44 44 44 194.140 44 + <RI> 141 9 9 9 194.141 9 + <SS2> 142 10 10 10 194.142 10 + <SS3> 143 27 27 27 194.143 27 + <DCS> 144 48 48 48 194.144 48 + <PU1> 145 49 49 49 194.145 49 + <PU2> 146 26 26 26 194.146 26 + <STS> 147 51 51 51 194.147 51 + <CCH> 148 52 52 52 194.148 52 + <MW> 149 53 53 53 194.149 53 + <SPA> 150 54 54 54 194.150 54 + <EPA> 151 8 8 8 194.151 8 + <SOS> 152 56 56 56 194.152 56 + <SGC> 153 57 57 57 194.153 57 + <SCI> 154 58 58 58 194.154 58 + <CSI> 155 59 59 59 194.155 59 + <ST> 156 4 4 4 194.156 4 + <OSC> 157 20 20 20 194.157 20 + <PM> 158 62 62 62 194.158 62 + <APC> 159 255 255 95 194.159 255 ## + <NON-BREAKING SPACE> 160 65 65 65 194.160 128.65 + <INVERTED "!" > 161 170 170 170 194.161 128.66 + <CENT SIGN> 162 74 74 176 194.162 128.67 ## + <POUND SIGN> 163 177 177 177 194.163 128.68 + <CURRENCY SIGN> 164 159 159 159 194.164 128.69 + <YEN SIGN> 165 178 178 178 194.165 128.70 + <BROKEN BAR> 166 106 106 208 194.166 128.71 ## + <SECTION SIGN> 167 181 181 181 194.167 128.72 + <DIAERESIS> 168 189 187 121 194.168 128.73 ** ## + <COPYRIGHT SIGN> 169 180 180 180 194.169 128.74 + <FEMININE ORDINAL> 170 154 154 154 194.170 128.81 + <LEFT POINTING GUILLEMET> 171 138 138 138 194.171 128.82 + <NOT SIGN> 172 95 176 186 194.172 128.83 ** ## + <SOFT HYPHEN> 173 202 202 202 194.173 128.84 + <REGISTERED TRADE MARK> 174 175 175 175 194.174 128.85 + <MACRON> 175 188 188 161 194.175 128.86 ## + <DEGREE SIGN> 176 144 144 144 194.176 128.87 + <PLUS-OR-MINUS SIGN> 177 143 143 143 194.177 128.88 + <SUPERSCRIPT TWO> 178 234 234 234 194.178 128.89 + <SUPERSCRIPT THREE> 179 250 250 250 194.179 128.98 + <ACUTE ACCENT> 180 190 190 190 194.180 128.99 + <MICRO SIGN> 181 160 160 160 194.181 128.100 + <PARAGRAPH SIGN> 182 182 182 182 194.182 128.101 + <MIDDLE DOT> 183 179 179 179 194.183 128.102 + <CEDILLA> 184 157 157 157 
194.184 128.103 + <SUPERSCRIPT ONE> 185 218 218 218 194.185 128.104 + <MASC. ORDINAL INDICATOR> 186 155 155 155 194.186 128.105 + <RIGHT POINTING GUILLEMET> 187 139 139 139 194.187 128.106 + <FRACTION ONE QUARTER> 188 183 183 183 194.188 128.112 + <FRACTION ONE HALF> 189 184 184 184 194.189 128.113 + <FRACTION THREE QUARTERS> 190 185 185 185 194.190 128.114 + <INVERTED QUESTION MARK> 191 171 171 171 194.191 128.115 + <A WITH GRAVE> 192 100 100 100 195.128 138.65 + <A WITH ACUTE> 193 101 101 101 195.129 138.66 + <A WITH CIRCUMFLEX> 194 98 98 98 195.130 138.67 + <A WITH TILDE> 195 102 102 102 195.131 138.68 + <A WITH DIAERESIS> 196 99 99 99 195.132 138.69 + <A WITH RING ABOVE> 197 103 103 103 195.133 138.70 + <CAPITAL LIGATURE AE> 198 158 158 158 195.134 138.71 + <C WITH CEDILLA> 199 104 104 104 195.135 138.72 + <E WITH GRAVE> 200 116 116 116 195.136 138.73 + <E WITH ACUTE> 201 113 113 113 195.137 138.74 + <E WITH CIRCUMFLEX> 202 114 114 114 195.138 138.81 + <E WITH DIAERESIS> 203 115 115 115 195.139 138.82 + <I WITH GRAVE> 204 120 120 120 195.140 138.83 + <I WITH ACUTE> 205 117 117 117 195.141 138.84 + <I WITH CIRCUMFLEX> 206 118 118 118 195.142 138.85 + <I WITH DIAERESIS> 207 119 119 119 195.143 138.86 + <CAPITAL LETTER ETH> 208 172 172 172 195.144 138.87 + <N WITH TILDE> 209 105 105 105 195.145 138.88 + <O WITH GRAVE> 210 237 237 237 195.146 138.89 + <O WITH ACUTE> 211 238 238 238 195.147 138.98 + <O WITH CIRCUMFLEX> 212 235 235 235 195.148 138.99 + <O WITH TILDE> 213 239 239 239 195.149 138.100 + <O WITH DIAERESIS> 214 236 236 236 195.150 138.101 + <MULTIPLICATION SIGN> 215 191 191 191 195.151 138.102 + <O WITH STROKE> 216 128 128 128 195.152 138.103 + <U WITH GRAVE> 217 253 253 224 195.153 138.104 ## + <U WITH ACUTE> 218 254 254 254 195.154 138.105 + <U WITH CIRCUMFLEX> 219 251 251 221 195.155 138.106 ## + <U WITH DIAERESIS> 220 252 252 252 195.156 138.112 + <Y WITH ACUTE> 221 173 186 173 195.157 138.113 ** ## + <CAPITAL LETTER THORN> 222 174 174 174 195.158 
138.114 + <SMALL LETTER SHARP S> 223 89 89 89 195.159 138.115 + <a WITH GRAVE> 224 68 68 68 195.160 139.65 + <a WITH ACUTE> 225 69 69 69 195.161 139.66 + <a WITH CIRCUMFLEX> 226 66 66 66 195.162 139.67 + <a WITH TILDE> 227 70 70 70 195.163 139.68 + <a WITH DIAERESIS> 228 67 67 67 195.164 139.69 + <a WITH RING ABOVE> 229 71 71 71 195.165 139.70 + <SMALL LIGATURE ae> 230 156 156 156 195.166 139.71 + <c WITH CEDILLA> 231 72 72 72 195.167 139.72 + <e WITH GRAVE> 232 84 84 84 195.168 139.73 + <e WITH ACUTE> 233 81 81 81 195.169 139.74 + <e WITH CIRCUMFLEX> 234 82 82 82 195.170 139.81 + <e WITH DIAERESIS> 235 83 83 83 195.171 139.82 + <i WITH GRAVE> 236 88 88 88 195.172 139.83 + <i WITH ACUTE> 237 85 85 85 195.173 139.84 + <i WITH CIRCUMFLEX> 238 86 86 86 195.174 139.85 + <i WITH DIAERESIS> 239 87 87 87 195.175 139.86 + <SMALL LETTER eth> 240 140 140 140 195.176 139.87 + <n WITH TILDE> 241 73 73 73 195.177 139.88 + <o WITH GRAVE> 242 205 205 205 195.178 139.89 + <o WITH ACUTE> 243 206 206 206 195.179 139.98 + <o WITH CIRCUMFLEX> 244 203 203 203 195.180 139.99 + <o WITH TILDE> 245 207 207 207 195.181 139.100 + <o WITH DIAERESIS> 246 204 204 204 195.182 139.101 + <DIVISION SIGN> 247 225 225 225 195.183 139.102 + <o WITH STROKE> 248 112 112 112 195.184 139.103 + <u WITH GRAVE> 249 221 221 192 195.185 139.104 ## + <u WITH ACUTE> 250 222 222 222 195.186 139.105 + <u WITH CIRCUMFLEX> 251 219 219 219 195.187 139.106 + <u WITH DIAERESIS> 252 220 220 220 195.188 139.112 + <y WITH ACUTE> 253 141 141 141 195.189 139.113 + <SMALL LETTER thorn> 254 142 142 142 195.190 139.114 + <y WITH DIAERESIS> 255 223 223 223 195.191 139.115 If you would rather see the above table in CCSID 0037 order rather than ASCII + Latin-1 order then run the table through: @@ -598,14 +601,14 @@ ASCII + Latin-1 order then run the table through: =back perl \ - -ne 'if(/.{43}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\ + -ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\ -e 
'{push(@l,$_)}' \ -e 'END{print map{$_->[0]}' \ -e ' sort{$a->[1] <=> $b->[1]}' \ - -e ' map{[$_,substr($_,52,3)]}@l;}' perlebcdic.pod + -e ' map{[$_,substr($_,34,3)]}@l;}' perlebcdic.pod If you would rather see it in CCSID 1047 order then change the number -52 in the last line to 61, like this: +34 in the last line to 39, like this: =over 4 @@ -614,14 +617,14 @@ If you would rather see it in CCSID 1047 order then change the number =back perl \ - -ne 'if(/.{43}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\ + -ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\ -e '{push(@l,$_)}' \ -e 'END{print map{$_->[0]}' \ -e ' sort{$a->[1] <=> $b->[1]}' \ - -e ' map{[$_,substr($_,61,3)]}@l;}' perlebcdic.pod + -e ' map{[$_,substr($_,39,3)]}@l;}' perlebcdic.pod If you would rather see it in POSIX-BC order then change the number -61 in the last line to 70, like this: +39 in the last line to 44, like this: =over 4 @@ -630,17 +633,17 @@ If you would rather see it in POSIX-BC order then change the number =back perl \ - -ne 'if(/.{43}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\ + -ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\ -e '{push(@l,$_)}' \ -e 'END{print map{$_->[0]}' \ -e ' sort{$a->[1] <=> $b->[1]}' \ - -e ' map{[$_,substr($_,70,3)]}@l;}' perlebcdic.pod + -e ' map{[$_,substr($_,44,3)]}@l;}' perlebcdic.pod =head1 IDENTIFYING CHARACTER CODE SETS -To determine the character set you are running under from perl one -could use the return value of ord() or chr() to test one or more +To determine the character set you are running under from perl one +could use the return value of ord() or chr() to test one or more character values. 
For example: $is_ascii = "A" eq chr(65); @@ -671,12 +674,12 @@ However, it would be unwise to write tests such as: $is_ascii = "\n" ne chr(10); # ILL ADVISED Obviously the first of these will fail to distinguish most ASCII platforms -from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC platform since "\r" eq -chr(13) under all of those coded character sets. But note too that -because "\n" is chr(13) and "\r" is chr(10) on the Macintosh (which is an +from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC platform since "\r" eq +chr(13) under all of those coded character sets. But note too that +because "\n" is chr(13) and "\r" is chr(10) on the Macintosh (which is an ASCII platform) the second C<$is_ascii> test will lead to trouble there. -To determine whether or not perl was built under an EBCDIC +To determine whether or not perl was built under an EBCDIC code page you can use the Config module like so: use Config; @@ -684,20 +687,25 @@ code page you can use the Config module like so: =head1 CONVERSIONS +=head2 C<utf8::unicode_to_native()> and C<utf8::native_to_unicode()> + +These functions take an input numeric code point in one encoding and +return what its equivalent value is in the other. + =head2 tr/// -In order to convert a string of characters from one character set to +In order to convert a string of characters from one character set to another a simple list of numbers, such as in the right columns in the -above table, along with perl's tr/// operator is all that is needed. +above table, along with perl's tr/// operator is all that is needed. The data in the table are in ASCII/Latin1 order, hence the EBCDIC columns -provide easy-to-use ASCII/Latin1 to EBCDIC operations that are also easily +provide easy-to-use ASCII/Latin1 to EBCDIC operations that are also easily reversed. 
For example, to convert ASCII/Latin1 to code page 037 take the output of the second numbers column from the output of recipe 2 (modified to add '\' -characters) and use it in tr/// like so: +characters), and use it in tr/// like so: - $cp_037 = + $cp_037 = '\x00\x01\x02\x03\x37\x2D\x2E\x2F\x16\x05\x25\x0B\x0C\x0D\x0E\x0F' . '\x10\x11\x12\x13\x3C\x3D\x32\x26\x18\x19\x3F\x27\x1C\x1D\x1E\x1F' . '\x40\x5A\x7F\x7B\x5B\x6C\x50\x7D\x4D\x5D\x5C\x4E\x6B\x60\x4B\x61' . @@ -739,7 +747,7 @@ XPG operability often implies the presence of an I<iconv> utility available from the shell or from the C library. Consult your system's documentation for information on iconv. -On OS/390 or z/OS see the iconv(1) manpage. One way to invoke the iconv +On OS/390 or z/OS see the iconv(1) manpage. One way to invoke the iconv shell utility from within perl would be to: # OS/390 or z/OS example @@ -758,7 +766,7 @@ The OS/390 and z/OS C run-time libraries provide _atoe() and _etoa() functions. =head1 OPERATOR DIFFERENCES -The C<..> range operator treats certain character ranges with +The C<..> range operator treats certain character ranges with care on EBCDIC platforms. For example the following array will have twenty six elements on either an EBCDIC platform or an ASCII platform: @@ -766,13 +774,13 @@ or an ASCII platform: @alphabet = ('A'..'Z'); # $#alphabet == 25 The bitwise operators such as & ^ | may return different results -when operating on string or character data in a perl program running +when operating on string or character data in a perl program running on an EBCDIC platform than when run on an ASCII platform. 
Here is an example adapted from the one in L<perlop>: # EBCDIC-based examples print "j p \n" ^ " a h"; # prints "JAPH\n" - print "JA" | " ph\n"; # prints "japh\n" + print "JA" | " ph\n"; # prints "japh\n" print "JAPH\nJunk" & "\277\277\277\277\277"; # prints "japh\n"; print 'p N$' ^ " E<H\n"; # prints "Perl\n"; @@ -784,45 +792,45 @@ ported to take C<\c@> to chr(0) and C<\cA> to chr(1), etc. as well, but the thirty three characters that result depend on which code page you are using. The table below uses the standard acronyms for the controls. The POSIX-BC and 1047 sets are -identical throughout this range and differ from the 0037 set at only +identical throughout this range and differ from the 0037 set at only one spot (21 decimal). Note that the C<LINE FEED> character -may be generated by C<\cJ> on ASCII platforms but by C<\cU> on 1047 or POSIX-BC -platforms and cannot be generated as a C<"\c.letter."> control character on +may be generated by C<\cJ> on ASCII platforms but by C<\cU> on 1047 or POSIX-BC +platforms and cannot be generated as a C<"\c.letter."> control character on 0037 platforms. Note also that C<\c\> cannot be the final element in a string or regex, as it will absorb the terminator. But C<\c\I<X>> is a C<FILE SEPARATOR> concatenated with I<X> for all I<X>. - chr ord 8859-1 0037 1047 && POSIX-BC + chr ord 8859-1 0037 1047 && POSIX-BC ----------------------------------------------------------------------- - \c? 127 <DEL> " " + \c? 
127 <DEL> " " \c@ 0 <NUL> <NUL> <NUL> - \cA 1 <SOH> <SOH> <SOH> + \cA 1 <SOH> <SOH> <SOH> \cB 2 <STX> <STX> <STX> \cC 3 <ETX> <ETX> <ETX> - \cD 4 <EOT> <ST> <ST> - \cE 5 <ENQ> <HT> <HT> - \cF 6 <ACK> <SSA> <SSA> - \cG 7 <BEL> <DEL> <DEL> - \cH 8 <BS> <EPA> <EPA> - \cI 9 <HT> <RI> <RI> - \cJ 10 <LF> <SS2> <SS2> + \cD 4 <EOT> <ST> <ST> + \cE 5 <ENQ> <HT> <HT> + \cF 6 <ACK> <SSA> <SSA> + \cG 7 <BEL> <DEL> <DEL> + \cH 8 <BS> <EPA> <EPA> + \cI 9 <HT> <RI> <RI> + \cJ 10 <LF> <SS2> <SS2> \cK 11 <VT> <VT> <VT> - \cL 12 <FF> <FF> <FF> - \cM 13 <CR> <CR> <CR> + \cL 12 <FF> <FF> <FF> + \cM 13 <CR> <CR> <CR> \cN 14 <SO> <SO> <SO> \cO 15 <SI> <SI> <SI> - \cP 16 <DLE> <DLE> <DLE> + \cP 16 <DLE> <DLE> <DLE> \cQ 17 <DC1> <DC1> <DC1> \cR 18 <DC2> <DC2> <DC2> - \cS 19 <DC3> <DC3> <DC3> - \cT 20 <DC4> <OSC> <OSC> - \cU 21 <NAK> <NEL> <LF> *** + \cS 19 <DC3> <DC3> <DC3> + \cT 20 <DC4> <OSC> <OSC> + \cU 21 <NAK> <NEL> <LF> ** \cV 22 <SYN> <BS> <BS> - \cW 23 <ETB> <ESA> <ESA> + \cW 23 <ETB> <ESA> <ESA> \cX 24 <CAN> <CAN> <CAN> \cY 25 <EOM> <EOM> <EOM> - \cZ 26 <SUB> <PU2> <PU2> - \c[ 27 <ESC> <SS3> <SS3> + \cZ 26 <SUB> <PU2> <PU2> + \c[ 27 <ESC> <SS3> <SS3> \c\X 28 <FS>X <FS>X <FS>X \c] 29 <GS> <GS> <GS> \c^ 30 <RS> <RS> <RS> @@ -834,7 +842,7 @@ SEPARATOR> concatenated with I<X> for all I<X>. =item chr() -chr() must be given an EBCDIC code number argument to yield a desired +chr() must be given an EBCDIC code number argument to yield a desired character return value on an EBCDIC platform. For example: $CAPITAL_LETTER_A = chr(193); @@ -848,7 +856,7 @@ For example: =item pack() -The c and C templates for pack() are dependent upon character set +The c and C templates for pack() are dependent upon character set encoding. Examples of usage on EBCDIC include: $foo = pack("CCCC",193,194,195,196); @@ -864,20 +872,20 @@ encoding. Examples of usage on EBCDIC include: One must be careful with scalars and strings that are passed to print that contain ASCII encodings. 
One common place for this to occur is in the output of the MIME type header for -CGI script writing. For example, many perl programming guides +CGI script writing. For example, many perl programming guides recommend something similar to: - print "Content-type:\ttext/html\015\012\015\012"; + print "Content-type:\ttext/html\015\012\015\012"; # this may be wrong on EBCDIC -Under the IBM OS/390 USS Web Server or WebSphere on z/OS for example +Under the IBM OS/390 USS Web Server or WebSphere on z/OS for example you should instead write that as: print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et al That is because the translation from EBCDIC to ASCII is done by the web server in this case (such code will not be appropriate for -the Macintosh however). Consult your web server's documentation for +the Macintosh however). Consult your web server's documentation for further details. =item printf() @@ -890,7 +898,7 @@ on an EBCDIC platform. Examples include: =item sort() -EBCDIC sort results may differ from ASCII sort results especially for +EBCDIC sort results may differ from ASCII sort results especially for mixed case strings. This is discussed in more detail below. =item sprintf() @@ -908,19 +916,19 @@ See the discussion of pack() above. =head1 REGULAR EXPRESSION DIFFERENCES -As of perl 5.005_03 the letter range regular expressions such as -[A-Z] and [a-z] have been especially coded to not pick up gap -characters. For example, characters such as E<ocirc> C<o WITH CIRCUMFLEX> -that lie between I and J would not be matched by the +As of perl 5.005_03 the letter range regular expressions such as +[A-Z] and [a-z] have been especially coded to not pick up gap +characters. For example, characters such as E<ocirc> C<o WITH CIRCUMFLEX> +that lie between I and J would not be matched by the regular expression range C</[H-K]/>. 
This works in the other direction, too, if either of the range end points is explicitly numeric: C<[\x89-\x91]> will match C<\x8e>, even though C<\x89> is C<i> and C<\x91 > is C<j>, and C<\x8e> is a gap character from the alphabetic viewpoint. -If you do want to match the alphabet gap characters in a single octet -regular expression try matching the hex or octal code such -as C</\313/> on EBCDIC or C</\364/> on ASCII platforms to +If you do want to match the alphabet gap characters in a single octet +regular expression try matching the hex or octal code such +as C</\313/> on EBCDIC or C</\364/> on ASCII platforms to have your regular expression match C<o WITH CIRCUMFLEX>. Another construct to be wary of is the inappropriate use of hex or @@ -953,8 +961,8 @@ set of subs: } The above would be adequate if the concern was only with numeric code points. -However, the concern may be with characters rather than code points -and on an EBCDIC platform it may be desirable for constructs such as +However, the concern may be with characters rather than code points +and on an EBCDIC platform it may be desirable for constructs such as C<if (is_print_ascii("A")) {print "A is a printable character\n";}> to print out the expected message. 
One way to represent the above collection of character classification subs that is capable of working across the @@ -964,7 +972,7 @@ four coded character sets discussed in this document is as follows: my $char = substr(shift,0,1); if (ord('^')==94) { # ascii return $char =~ /[\000-\037]/; - } + } if (ord('^')==176) { # 0037 return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/; } @@ -1000,7 +1008,7 @@ four coded character sets discussed in this document is as follows: return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/; } if (ord('^')==106) { # posix-bc - return $char =~ + return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\137]/; } } @@ -1011,21 +1019,21 @@ four coded character sets discussed in this document is as follows: return $char =~ /[\240-\377]/; } if (ord('^')==176) { # 0037 - return $char =~ + return $char =~ /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/; } if (ord('^')==95) { # 1047 return $char =~ - /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/; + 
/[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/; } if (ord('^')==106) { # posix-bc - return $char =~ + return $char =~ /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/; } } -Note however that only the C<Is_ascii_print()> sub is really independent -of coded character set. Another way to write C<Is_latin_1()> would be +Note however that only the C<Is_ascii_print()> sub is really independent +of coded character set. Another way to write C<Is_latin_1()> would be to use the characters in the range explicitly: sub Is_latin_1 { @@ -1033,7 +1041,7 @@ to use the characters in the range explicitly: $char =~ /[ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/; } -Although that form may run into trouble in network transit (due to the +Although that form may run into trouble in network transit (due to the presence of 8 bit characters) or on non ISO-Latin character sets. =head1 SOCKETS @@ -1057,12 +1065,12 @@ abbreviation for drive; that is: The property of lowercase before uppercase letters in EBCDIC is even carried to the Latin 1 EBCDIC pages such as 0037 and 1047. -An example would be that E<Euml> C<E WITH DIAERESIS> (203) comes -before E<euml> C<e WITH DIAERESIS> (235) on an ASCII platform, but -the latter (83) comes before the former (115) on an EBCDIC platform. 
-(Astute readers will note that the uppercase version of E<szlig> -C<SMALL LETTER SHARP S> is simply "SS" and that the upper case version of -E<yuml> C<y WITH DIAERESIS> is not in the 0..255 range but it is +An example would be that E<Euml> C<E WITH DIAERESIS> (203) comes +before E<euml> C<e WITH DIAERESIS> (235) on an ASCII platform, but +the latter (83) comes before the former (115) on an EBCDIC platform. +(Astute readers will note that the uppercase version of E<szlig> +C<SMALL LETTER SHARP S> is simply "SS" and that the upper case version of +E<yuml> C<y WITH DIAERESIS> is not in the 0..255 range but it is at U+x0178 in Unicode, or C<"\x{178}"> in a Unicode enabled Perl). The sort order will cause differences between results obtained on @@ -1081,21 +1089,21 @@ C<tr///> towards the character set case most employed within the data. If the data are primarily UPPERCASE non Latin 1 then apply tr/[a-z]/[A-Z]/ then sort(). If the data are primarily lowercase non Latin 1 then apply tr/[A-Z]/[a-z]/ before sorting. If the data are primarily UPPERCASE -and include Latin-1 characters then apply: +and include Latin-1 characters then apply: - tr/[a-z]/[A-Z]/; - tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ/; - s/ß/SS/g; + tr/[a-z]/[A-Z]/; + tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ/; + s/ß/SS/g; -then sort(). Do note however that such Latin-1 manipulation does not -address the E<yuml> C<y WITH DIAERESIS> character that will remain at -code point 255 on ASCII platforms, but 223 on most EBCDIC platforms -where it will sort to a place less than the EBCDIC numerals. With a +then sort(). Do note however that such Latin-1 manipulation does not +address the E<yuml> C<y WITH DIAERESIS> character that will remain at +code point 255 on ASCII platforms, but 223 on most EBCDIC platforms +where it will sort to a place less than the EBCDIC numerals. 
With a Unicode-enabled Perl you might try: tr/^?/\x{178}/; -The strategy of mono casing data before sorting does not preserve the case +The strategy of mono casing data before sorting does not preserve the case of the data and may not be acceptable for that reason. =head2 Convert, sort data, then re convert. @@ -1110,15 +1118,15 @@ it would be computationally expensive. =head1 TRANSFORMATION FORMATS -There are a variety of ways of transforming data with an intra character set -mapping that serve a variety of purposes. Sorting was discussed in the -previous section and a few of the other more popular mapping techniques are +There are a variety of ways of transforming data with an intra character set +mapping that serve a variety of purposes. Sorting was discussed in the +previous section and a few of the other more popular mapping techniques are discussed next. =head2 URL decoding and encoding Note that some URLs have hexadecimal ASCII code points in them in an -attempt to overcome character or protocol limitation issues. For example +attempt to overcome character or protocol limitation issues. For example the tilde character is not on every keyboard hence a URL of the form: http://www.pvhp.com/~pvhp/ @@ -1154,7 +1162,7 @@ of decoding such a URL under CCSID 1047: ); $url =~ s/%([0-9a-fA-F]{2})/pack("c",$a2e_1047[hex($1)])/ge; -Conversely, here is a partial solution for the task of encoding such +Conversely, here is a partial solution for the task of encoding such a URL under the 1047 code page: $url = 'http://www.pvhp.com/~pvhp/'; @@ -1177,11 +1185,11 @@ a URL under the 1047 code page: 92,247, 83, 84, 85, 86, 87, 88, 89, 90,178,212,214,210,211,213, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,179,219,220,217,218,159 ); - # The following regular expression does not address the - # mappings for: ('.' => '%2E', '/' => '%2F', ':' => '%3A') + # The following regular expression does not address the + # mappings for: ('.' 
=> '%2E', '/' => '%2F', ':' => '%3A') $url =~ s/([\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~])/sprintf("%%%02X",$e2a_1047[ord($1)])/ge; -where a more complete solution would split the URL into components +where a more complete solution would split the URL into components and apply a full s/// substitution only to the appropriate parts. In the remaining examples a @e2a or @a2e array may be employed @@ -1190,8 +1198,8 @@ you could use the @a2e_1047 or @e2a_1047 arrays just shown. =head2 uu encoding and decoding -The C<u> template to pack() or unpack() will render EBCDIC data in EBCDIC -characters equivalent to their ASCII counterparts. For example, the +The C<u> template to pack() or unpack() will render EBCDIC data in EBCDIC +characters equivalent to their ASCII counterparts. For example, the following will print "Yes indeed\n" on either an ASCII or EBCDIC computer: $all_byte_chrs = ''; @@ -1240,8 +1248,8 @@ the printable set using: # This QP encoder works on ASCII only $qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge; -Whereas a QP encoder that works on both ASCII and EBCDIC platforms -would look somewhat like the following (where the EBCDIC branch @e2a +Whereas a QP encoder that works on both ASCII and EBCDIC platforms +would look somewhat like the following (where the EBCDIC branch @e2a array is omitted for brevity): if (ord('A') == 65) { # ASCII @@ -1256,7 +1264,7 @@ array is omitted for brevity): s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge; (although in production code the substitutions might be done -in the EBCDIC branch with the @e2a array and separately in the +in the EBCDIC branch with the @e2a array and separately in the ASCII branch without the expense of the identity map). 
Such QP strings can be decoded with: @@ -1265,7 +1273,7 @@ Such QP strings can be decoded with: $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge; $string =~ s/=[\n\r]+$//; -Whereas a QP decoder that works on both ASCII and EBCDIC platforms +Whereas a QP decoder that works on both ASCII and EBCDIC platforms would look somewhat like the following (where the @a2e array is omitted for brevity): @@ -1276,13 +1284,13 @@ omitted for brevity): The practice of shifting an alphabet one or more characters for encipherment dates back thousands of years and was explicitly detailed by Gaius Julius -Caesar in his B<Gallic Wars> text. A single alphabet shift is sometimes +Caesar in his B<Gallic Wars> text. A single alphabet shift is sometimes referred to as a rotation and the shift amount is given as a number $n after -the string 'rot' or "rot$n". Rot0 and rot26 would designate identity maps -on the 26-letter English version of the Latin alphabet. Rot13 has the -interesting property that alternate subsequent invocations are identity maps -(thus rot13 is its own non-trivial inverse in the group of 26 alphabet -rotations). Hence the following is a rot13 encoder and decoder that will +the string 'rot' or "rot$n". Rot0 and rot26 would designate identity maps +on the 26-letter English version of the Latin alphabet. Rot13 has the +interesting property that alternate subsequent invocations are identity maps +(thus rot13 is its own non-trivial inverse in the group of 26 alphabet +rotations). Hence the following is a rot13 encoder and decoder that will work on ASCII and EBCDIC platforms: #!/usr/local/bin/perl @@ -1299,28 +1307,28 @@ In one-liner form: =head1 Hashing order and checksums -To the extent that it is possible to write code that depends on +To the extent that it is possible to write code that depends on hashing order there may be differences between hashes as stored on an ASCII-based platform and hashes stored on an EBCDIC-based platform. 
XXX =head1 I18N AND L10N -Internationalization (I18N) and localization (L10N) are supported at least -in principle even on EBCDIC platforms. The details are system-dependent +Internationalization (I18N) and localization (L10N) are supported at least +in principle even on EBCDIC platforms. The details are system-dependent and discussed under the L<perlebcdic/OS ISSUES> section below. =head1 MULTI-OCTET CHARACTER SETS -Perl may work with an internal UTF-EBCDIC encoding form for wide characters -on EBCDIC platforms in a manner analogous to the way that it works with +Perl may work with an internal UTF-EBCDIC encoding form for wide characters +on EBCDIC platforms in a manner analogous to the way that it works with the UTF-8 internal encoding form on ASCII based platforms. Legacy multi byte EBCDIC code pages XXX. =head1 OS ISSUES -There may be a few system-dependent issues +There may be a few system-dependent issues of concern to EBCDIC Perl programmers. =head2 OS/400 @@ -1347,7 +1355,7 @@ Perl runs under Unix Systems Services or USS. =item chcp -B<chcp> is supported as a shell utility for displaying and changing +B<chcp> is supported as a shell utility for displaying and changing one's code page. See also L<chcp(1)>. =item dataset access @@ -1375,26 +1383,22 @@ or z/OS. =back -=head2 VM/ESA? - -XXX. - =head2 POSIX-BC? XXX. =head1 BUGS -This pod document contains literal Latin 1 characters and may encounter -translation difficulties. In particular one popular nroff implementation -was known to strip accented characters to their unaccented counterparts -while attempting to view this document through the B<pod2man> program -(for example, you may see a plain C<y> rather than one with a diaeresis +This pod document contains literal Latin 1 characters and may encounter +translation difficulties. 
In particular one popular nroff implementation +was known to strip accented characters to their unaccented counterparts +while attempting to view this document through the B<pod2man> program +(for example, you may see a plain C<y> rather than one with a diaeresis as in E<yuml>). Another nroff truncated the resultant manpage at the first occurrence of 8 bit characters. Not all shells will allow multiple C<-e> string arguments to perl to -be concatenated together properly as recipes 0, 2, 4, 5, and 6 might +be concatenated together properly as recipes 0, 2, 4, 5, and 6 might seem to imply. =head1 SEE ALSO @@ -1413,13 +1417,13 @@ L<http://www.wps.com/projects/codes/> B<ASCII: American Standard Code for Information Infiltration> Tom Jennings, September 1999. -B<The Unicode Standard, Version 3.0> The Unicode Consortium, Lisa Moore ed., -ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000. +B<The Unicode Standard, Version 3.0> The Unicode Consortium, Lisa Moore ed., +ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000. -B<CDRA: IBM - Character Data Representation Architecture - -Reference and Registry>, IBM SC09-2190-00, December 1996. +B<CDRA: IBM - Character Data Representation Architecture - +Reference and Registry>, IBM SC09-2190-00, December 1996. -"Demystifying Character Sets", Andrea Vine, Multilingual Computing +"Demystifying Character Sets", Andrea Vine, Multilingual Computing & Technology, B<#26 Vol. 10 Issue 4>, August/September 1999; ISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA. @@ -1436,11 +1440,11 @@ B<IBM - EBCDIC and the P-bit; The biggest Computer Goof Ever> Robert Bemer. 
=head1 AUTHOR -Peter Prymmer pvhp@best.com wrote this in 1999 and 2000 -with CCSID 0819 and 0037 help from Chris Leach and -AndrE<eacute> Pirard A.Pirard@ulg.ac.be as well as POSIX-BC +Peter Prymmer pvhp@best.com wrote this in 1999 and 2000 +with CCSID 0819 and 0037 help from Chris Leach and +AndrE<eacute> Pirard A.Pirard@ulg.ac.be as well as POSIX-BC help from Thomas Dorner Thomas.Dorner@start.de. -Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and -Joe Smith. Trademarks, registered trademarks, service marks and -registered service marks used in this document are the property of +Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and +Joe Smith. Trademarks, registered trademarks, service marks and +registered service marks used in this document are the property of their respective owners. diff --git a/gnu/usr.bin/perl/pod/perlexperiment.pod b/gnu/usr.bin/perl/pod/perlexperiment.pod index f304120bc66..946e8ffd6bd 100644 --- a/gnu/usr.bin/perl/pod/perlexperiment.pod +++ b/gnu/usr.bin/perl/pod/perlexperiment.pod @@ -9,48 +9,14 @@ core. Although all of these are documented with their appropriate topics, this succinct listing gives you an overview and basic facts about their status. -So far I've merely tried to find and list the experimental features and infer +So far we've merely tried to find and list the experimental features and infer their inception, versions, etc. There's a lot of speculation here. 
=head2 Current experiments =over 8 -=item fork() emulation - -Introduced in Perl 5.6.1 - -See also L<perlfork> - -=item Weak references - -Introduced in Perl 5.6.0 - -=item Internal file glob - -Introduced in Perl 5.6.0 - -Accepted in XXX - -=item 64-bit support - -Introduced in Perl 5.005 - -Accepted in XXX - -=item die accepts a reference - -Introduced in Perl 5.005 - -Accepted in Perl XXX - -=item Unicode support - -Introduced in Perl 5.6.0 - -Accepted in Perl 5.8.0 XXX - -=item -Dusemultiplicity -Dusethreads +=item -Dusemultiplicity -Duseithreads Introduced in Perl 5.6.0 @@ -58,42 +24,12 @@ Introduced in Perl 5.6.0 Introduced in Perl 5.7.0 -=item GetOpt::Long Options can now take multiple values at once (experimental) - -C<Getopt::Long> upgraded to version 2.35 - -Removed in Perl 5.8.8 - -=item 5.005-style threading - -Introduced in Perl 5.005 - -Removed in Perl 5.10 XXX - -=item Test::Harness::Straps - -Removed in Perl 5.10.1 - -=item perlcc - -Introduced in Perl 5.005 - -Removed in Perl 5.9.0 - =item C<our> can now have an experimental optional attribute C<unique> Introduced in Perl 5.8.0 Deprecated in Perl 5.10.0 -=item Assertions - -The C<-A> command line switch - -Introduced in Perl 5.9.0 - -Removed in Perl 5.9.5 - =item Linux abstract Unix domain sockets Introduced in Perl 5.9.2 @@ -104,8 +40,6 @@ See also L<Socket> =item L<Pod::PXML|Pod::PXML> -=item threads - =item The <:pop> IO pseudolayer See also L<perlrun> @@ -126,20 +60,32 @@ See also L<perlguts> Introduced in Perl 5.13.7 -=item internal API for C<%H> +=item internal API for C<%^H> Introduced in Perl 5.13.7 See also C<cophh_> in L<perlapi>. 
+=item alloccopstash + +Introduced in Perl 5.18.0 + =item av_create_and_push =item av_create_and_unshift_one =item av_create_and_unshift_one +=item cop_store_label + +Introduced in Perl 5.16.0 + =item PL_keyword_plugin +=item gv_fetchmethod_*_flags + +Introduced in Perl 5.16.0 + =item hv_iternext_flags =item lex_bufutf8 @@ -168,6 +114,10 @@ See also C<cophh_> in L<perlapi>. =item lex_unstuff +=item op_scope + +=item op_lvalue + =item parse_fullstmt =item parse_stmtseq @@ -194,16 +144,6 @@ See also C<cophh_> in L<perlapi>. =item utf8_to_bytes -=item DB module - -Introduced in Perl 5.6.0 - -See also L<perldebug>, L<perldebtut> - -=item The pseudo-hash data type - -Introduced in Perl 5.6.0 - =item Lvalue subroutines Introduced in Perl 5.6.0 @@ -222,6 +162,16 @@ See also L<perlre> See also L<perlre> +=item Smart match (C<~~>) + +Introduced in Perl 5.10.0 + +Modified in Perl 5.10.1, 5.12.0 + +=item Lexical C<$_> + +Introduced in Perl 5.10.0 + =item Backtracking control verbs C<(*ACCEPT)> @@ -232,15 +182,6 @@ See also: L<perlre/"Special Backtracking Control Verbs"> =item Code expressions, conditional expressions, and independent expressions in regexes -=item The C<\N> regex character class - -The C<\N> character class, not to be confused with the named character -sequence C<\N{NAME}>, denotes any non-newline character in a regular -expression. - -Introduced in: Perl 5.12 - -See also: =item gv_try_downgrade @@ -256,6 +197,22 @@ See L<perlapi/PL_keyword_plugin> for the mechanism. Introduced in: Perl 5.11.2 +=item Array and hash container functions accept references + +Introduced in Perl 5.14.0 + +=item Lexical subroutines + +Introduced in: Perl 5.18 + +See also: L<perlsub/Lexical Subroutines> + +=item Regular Expression Set Operations + +Introduced in: Perl 5.18 + +See also: L<perlrecharclass/Extended Bracketed Character Classes> + =back =head2 Accepted features @@ -267,7 +224,41 @@ They are also awarded +5 Stability and +3 Charisma. 
=over 8 -=item (none yet identified) +=item The C<\N> regex character class + +The C<\N> character class, not to be confused with the named character +sequence C<\N{NAME}>, denotes any non-newline character in a regular +expression. + +Introduced in: Perl 5.12 + +=item fork() emulation + +Introduced in Perl 5.6.1 + +See also L<perlfork> + +=item DB module + +Introduced in Perl 5.6.0 + +See also L<perldebug>, L<perldebtut> + +=item Weak references + +Introduced in Perl 5.6.0 + +=item Internal file glob + +Introduced in Perl 5.6.0 + +=item die accepts a reference + +Introduced in Perl 5.005 + +=item 64-bit support + +Introduced in Perl 5.005 =back @@ -287,12 +278,50 @@ Introduced in: 5.11.2 Removed in: 5.11.3 +=item Assertions + +The C<-A> command line switch + +Introduced in Perl 5.9.0 + +Removed in Perl 5.9.5 + +=item Test::Harness::Straps + +Moved from Perl 5.10.1 to CPAN + +=item GetOpt::Long Options can now take multiple values at once (experimental) + +C<Getopt::Long> upgraded to version 2.35 + +Removed in Perl 5.8.8 + +=item The pseudo-hash data type + +Introduced in Perl 5.6.0 + +Removed in Perl 5.9.0 + +=item 5.005-style threading + +Introduced in Perl 5.005 + +Removed in Perl 5.10 + +=item perlcc + +Introduced in Perl 5.005 + +Moved from Perl 5.9.0 to CPAN + =back =head1 AUTHORS brian d foy C<< <brian.d.foy@gmail.com> >> +SE<eacute>bastien Aperghis-Tramoni C<< <saper@cpan.org> >> + =head1 COPYRIGHT Copyright 2010, brian d foy C<< <brian.d.foy@gmail.com> >> diff --git a/gnu/usr.bin/perl/pod/perlgit.pod b/gnu/usr.bin/perl/pod/perlgit.pod index 1d2df2ed5e9..65dde7c104a 100644 --- a/gnu/usr.bin/perl/pod/perlgit.pod +++ b/gnu/usr.bin/perl/pod/perlgit.pod @@ -481,7 +481,7 @@ the "first commit where the bug is solved". C<git help bisect> has much more information on how you can tweak your binary searches. 
-=head1 Topic branches and rewriting history +=head2 Topic branches and rewriting history Individual committers should create topic branches under B<yourname>/B<some_descriptive_name>. Other committers should check @@ -586,7 +586,7 @@ this once globally in their F<~/.gitconfig> by doing something like: % git config --global user.name "Ævar Arnfjörð Bjarmason" % git config --global user.email avarab@gmail.com -However if you'd like to override that just for perl then execute then +However, if you'd like to override that just for perl, execute something like the following in F<perl>: % git config user.email avar@cpan.org @@ -606,7 +606,7 @@ to push your changes back with the C<camel> remote: The C<fetch> command just updates the C<camel> refs, as the objects themselves should have been fetched when pulling from C<origin>. -=head1 Accepting a patch +=head2 Accepting a patch If you have received a patch file generated using the above section, you should try out the patch. @@ -656,7 +656,7 @@ then merge it into blead then push it out to the main repository: % git checkout blead % git merge experimental - % git push + % git push origin blead If you want to delete your temporary branch, you may do so with: @@ -708,7 +708,7 @@ because it runs a subset of tests under miniperl rather than perl. =back -=head3 On merging and rebasing +=head2 On merging and rebasing Simple, one-off commits pushed to the 'blead' branch should be simple commits that apply cleanly. In other words, you should make sure your @@ -824,7 +824,77 @@ Or you could just merge the whole branch if you like it all: And then push back to the repository: - % git push + % git push origin blead + +=head2 Using a smoke-me branch to test changes + +Sometimes a change affects code paths which you cannot test on the OSes +which are directly available to you and it would be wise to have users +on other OSes test the change before you commit it to blead. 
+ +Fortunately, there is a way to get your change smoke-tested on various +OSes: push it to a "smoke-me" branch and wait for certain automated +smoke-testers to report the results from their OSes. + +The procedure for doing this is roughly as follows (using the example of +of tonyc's smoke-me branch called win32stat): + +First, make a local branch and switch to it: + + % git checkout -b win32stat + +Make some changes, build perl and test your changes, then commit them to +your local branch. Then push your local branch to a remote smoke-me +branch: + + % git push origin win32stat:smoke-me/tonyc/win32stat + +Now you can switch back to blead locally: + + % git checkout blead + +and continue working on other things while you wait a day or two, +keeping an eye on the results reported for your smoke-me branch at +L<http://perl.develop-help.com/?b=smoke-me/tonyc/win32state>. + +If all is well then update your blead branch: + + % git pull + +then checkout your smoke-me branch once more and rebase it on blead: + + % git rebase blead win32stat + +Now switch back to blead and merge your smoke-me branch into it: + + % git checkout blead + % git merge win32stat + +As described earlier, if there are many changes on your smoke-me branch +then you should prepare a merge commit in which to give an overview of +those changes by using the following command instead of the last +command above: + + % git merge win32stat --no-ff --no-commit + +You should now build perl and test your (merged) changes one last time +(ideally run the whole test suite, but failing that at least run the +F<t/porting/*.t> tests) before pushing your changes as usual: + + % git push origin blead + +Finally, you should then delete the remote smoke-me branch: + + % git push origin :smoke-me/tonyc/win32stat + +(which is likely to produce a warning like this, which can be ignored: + + remote: fatal: ambiguous argument 'refs/heads/smoke-me/tonyc/win32stat': unknown revision or path not in the working tree. 
+ remote: Use '--' to separate paths from revisions + +) and then delete your local branch: + + % git branch -d win32stat =head2 A note on camel and dromedary diff --git a/gnu/usr.bin/perl/pod/perlgpl.pod b/gnu/usr.bin/perl/pod/perlgpl.pod index 82a8f5a9dd1..cd8a1d64346 100644 --- a/gnu/usr.bin/perl/pod/perlgpl.pod +++ b/gnu/usr.bin/perl/pod/perlgpl.pod @@ -35,15 +35,15 @@ For the Perl Artistic License, see L<perlartistic>. GNU GENERAL PUBLIC LICENSE Version 1, February 1989 - + Copyright (C) 1989 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA - + Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. - + Preamble - + The license agreements of most software companies try to keep users at the mercy of those companies. By contrast, our General Public License is intended to guarantee your freedom to share and change free @@ -51,41 +51,41 @@ For the Perl Artistic License, see L<perlartistic>. General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. - + When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. - + To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. - + For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. 
You must make sure that they, too, receive or can get the source code. And you must tell them their rights. - + We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. - + Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. - + The precise terms and conditions for copying, distribution and modification follow. - + GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION - + 0. This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The @@ -93,7 +93,7 @@ For the Perl Artistic License, see L<perlartistic>. on the Program" means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you". - + 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and @@ -102,21 +102,21 @@ For the Perl Artistic License, see L<perlartistic>. other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. - + 2. 
You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph 1 above, provided that you also do the following: - + a) cause the modified files to carry prominent notices stating that you changed the files and the date of any change; and - + b) cause the whole of any work that you distribute or publish, that in whole or in part contains the Program or any part thereof, either with or without modifications, to be licensed at no charge to all third parties under the terms of this General Public License (except that you may choose to grant warranty protection to some or all third parties, at your option). - + c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the simplest and most usual way, to print or display an @@ -125,34 +125,34 @@ For the Perl Artistic License, see L<perlartistic>. warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this General Public License. - + d) You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. - + Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. - + 3. 
You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: - + a) accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Paragraphs 1 and 2 above; or, - + b) accompany it with a written offer, valid for at least three years, to give any third party free (except for a nominal charge for the cost of distribution) a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Paragraphs 1 and 2 above; or, - + c) accompany it with the information you received as to where the corresponding source code may be obtained. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form alone.) - + Source code for a work means the preferred form of the work for making modifications to it. For an executable file, complete source code means all the source code for all modules it contains; but, as a special @@ -160,7 +160,7 @@ For the Perl Artistic License, see L<perlartistic>. libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. - + 4. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer @@ -169,22 +169,22 @@ For the Perl Artistic License, see L<perlartistic>. copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. - + 5. 
By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. - + 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. - + 7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. - + Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and "any later version", you have the option of following the terms and conditions @@ -192,7 +192,7 @@ For the Perl Artistic License, see L<perlartistic>. Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation. - + 8. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free @@ -200,9 +200,9 @@ For the Perl Artistic License, see L<perlartistic>. make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. - + NO WARRANTY - + 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES @@ -212,7 +212,7 @@ For the Perl Artistic License, see L<perlartistic>. TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. - + 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, @@ -222,67 +222,67 @@ For the Perl Artistic License, see L<perlartistic>. YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. - + END OF TERMS AND CONDITIONS - + Appendix: How to Apply These Terms to Your New Programs - + If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. - + To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. - + <one line to give the program's name and a brief idea of what it does.> Copyright (C) 19yy <name of author> - + This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version. - + This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
- + You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301 USA - - + + Also add information on how to contact you by electronic and paper mail. - + If the program is interactive, make it output a short notice like this when it starts in an interactive mode: - + Gnomovision version 69, Copyright (C) 19xx name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details. - + The hypothetical commands 'show w' and 'show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than 'show w' and 'show c'; they could even be mouse-clicks or menu items--whatever suits your program. - + You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here a sample; alter the names: - + Yoyodyne, Inc., hereby disclaims all copyright interest in the program 'Gnomovision' (a program to direct compilers to make passes at assemblers) written by James Hacker. - + <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice - + That's all there is to it! =cut diff --git a/gnu/usr.bin/perl/pod/perlhack.pod b/gnu/usr.bin/perl/pod/perlhack.pod index 63df5d5dfc3..a32199a9dfb 100644 --- a/gnu/usr.bin/perl/pod/perlhack.pod +++ b/gnu/usr.bin/perl/pod/perlhack.pod @@ -10,7 +10,7 @@ perlhack - How to hack on Perl =head1 DESCRIPTION -This document explains how Perl development works. It includes details +This document explains how Perl development works. It includes details about the Perl 5 Porters email list, the Perl repository, the Perlbug bug tracker, patch guidelines, and commentary on Perl development philosophy. 
@@ -24,11 +24,18 @@ for a bug, comment fixes, etc., it's easy! Here's how: =item * Check out the source repository -The perl source is in a git repository. You can clone the repository +The perl source is in a git repository. You can clone the repository with the following command: % git clone git://perl5.git.perl.org/perl.git perl +=item * Ensure you're following the latest advice + +In case the advice in this guide has been updated recently, read the +latest version directly from the perl source: + + % perldoc pod/perlhack.pod + =item * Make your change Hack, hack, hack. @@ -49,57 +56,78 @@ Committing your work will save the change I<on your local system>: % git commit -a -m 'Commit message goes here' Make sure the commit message describes your change in a single -sentence. For example, "Fixed spelling errors in perlhack.pod". +sentence. For example, "Fixed spelling errors in perlhack.pod". =item * Send your change to perlbug The next step is to submit your patch to the Perl core ticket system via email. -Assuming your patch consists of a single git commit, the following -writes the file as a MIME attachment, and sends it with a meaningful +If your changes are in a single git commit, run the following commands +to write the file as a MIME attachment and send it with a meaningful subject: % git format-patch -1 --attach - % perlbug -s "[PATCH] $(git log -1 --oneline HEAD)" -f 0001-*.patch + % ./perl -Ilib utils/perlbug -s "[PATCH] $( + git log -1 --oneline HEAD)" -f 0001-*.patch The perlbug program will ask you a few questions about your email -address and the patch you're submitting. Once you've answered them it +address and the patch you're submitting. Once you've answered them it will submit your patch via email. 
+If your changes are in multiple commits, generate a patch file +containing them all, and attach that: + + % git format-patch origin/blead --attach --stdout > patches + % ./perl -Ilib utils/perlbug -f patches + +When prompted, pick a subject that summarizes your changes overall and +has "[PATCH]" at the beginning. + =item * Thank you The porters appreciate the time you spent helping to make Perl better. Thank you! +=item * Next time + +The next time you wish to make a patch, you need to start from the +latest perl in a pristine state. Check you don't have any local changes +or added files in your perl check-out which you wish to keep, then run +these commands: + + % git pull + % git reset --hard origin/blead + % git clean -dxf + =back =head1 BUG REPORTING If you want to report a bug in Perl, you must use the F<perlbug> -command line tool. This tool will ensure that your bug report includes +command line tool. This tool will ensure that your bug report includes all the relevant system and configuration information. To browse existing Perl bugs and patches, you can use the web interface at L<http://rt.perl.org/>. Please check the archive of the perl5-porters list (see below) and/or -the bug tracking system before submitting a bug report. Often, you'll +the bug tracking system before submitting a bug report. Often, you'll find that the bug has been reported already. You can log in to the bug tracking system and comment on existing bug -reports. If you have additional information regarding an existing bug, -please add it. This will help the porters fix the bug. +reports. If you have additional information regarding an existing bug, +please add it. This will help the porters fix the bug. =head1 PERL 5 PORTERS The perl5-porters (p5p) mailing list is where the Perl standard -distribution is maintained and developed. The people who maintain Perl +distribution is maintained and developed. 
The people who maintain Perl are also referred to as the "Perl 5 Porters", "p5p" or just the "porters". A searchable archive of the list is available at -L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/>. There is +L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/>. There is also another archive at L<http://archive.develooper.com/perl5-porters@perl.org/>. @@ -107,7 +135,7 @@ L<http://archive.develooper.com/perl5-porters@perl.org/>. The perl5-changes mailing list receives a copy of each patch that gets submitted to the maintenance and development branches of the perl -repository. See L<http://lists.perl.org/list/perl5-changes.html> for +repository. See L<http://lists.perl.org/list/perl5-changes.html> for subscription and archive information. =head2 #p5p on IRC @@ -119,8 +147,8 @@ Perl core. =head1 GETTING THE PERL SOURCE All of Perl's source code is kept centrally in a Git repository at -I<perl5.git.perl.org>. The repository contains many Perl revisions from -Perl 1 onwards and all the revisions from Perforce, the previous +I<perl5.git.perl.org>. The repository contains many Perl revisions +from Perl 1 onwards and all the revisions from Perforce, the previous version control system. For much more detail on using git with the Perl repository, please see @@ -128,7 +156,7 @@ L<perlgit>. =head2 Read access via Git -You will need a copy of Git for your computer. You can fetch a copy of +You will need a copy of Git for your computer. You can fetch a copy of the repository using the git protocol: % git clone git://perl5.git.perl.org/perl.git perl @@ -143,10 +171,10 @@ clone via http, though this is much slower: =head2 Read access via the web -You may access the repository over the web. This allows you to browse +You may access the repository over the web. This allows you to browse the tree, see recent commits, subscribe to RSS feeds for the changes, -search for particular commits and more. You may access it at -L<http://perl5.git.perl.org/perl.git>. 
A mirror of the repository is +search for particular commits and more. You may access it at +L<http://perl5.git.perl.org/perl.git>. A mirror of the repository is found at L<http://github.com/mirrors/perl>. =head2 Read access via rsync @@ -154,18 +182,18 @@ found at L<http://github.com/mirrors/perl>. You can also choose to use rsync to get a copy of the current source tree for the bleadperl branch and all maintenance branches: - % rsync -avz rsync://perl5.git.perl.org/perl-current . - % rsync -avz rsync://perl5.git.perl.org/perl-5.12.x . - % rsync -avz rsync://perl5.git.perl.org/perl-5.10.x . - % rsync -avz rsync://perl5.git.perl.org/perl-5.8.x . - % rsync -avz rsync://perl5.git.perl.org/perl-5.6.x . - % rsync -avz rsync://perl5.git.perl.org/perl-5.005xx . + % rsync -avz rsync://perl5.git.perl.org/perl-current . + % rsync -avz rsync://perl5.git.perl.org/perl-5.12.x . + % rsync -avz rsync://perl5.git.perl.org/perl-5.10.x . + % rsync -avz rsync://perl5.git.perl.org/perl-5.8.x . + % rsync -avz rsync://perl5.git.perl.org/perl-5.6.x . + % rsync -avz rsync://perl5.git.perl.org/perl-5.005xx . (Add the C<--delete> option to remove leftover files.) To get a full list of the available sync points: - % rsync perl5.git.perl.org:: + % rsync perl5.git.perl.org:: =head2 Write access via git @@ -175,40 +203,41 @@ using git. =head1 PATCHING PERL If you're planning to do more extensive work than a single small fix, -we encourage you to read the documentation below. This will help you +we encourage you to read the documentation below. This will help you focus your work and make your patches easier to incorporate into the Perl source. =head2 Submitting patches -If you have a small patch to submit, please submit it via perlbug. You -can also send email directly to perlbug@perl.org. Please note that +If you have a small patch to submit, please submit it via perlbug. You +can also send email directly to perlbug@perl.org. 
Please note that messages sent to perlbug may be held in a moderation queue, so you won't receive a response immediately. You'll know your submission has been processed when you receive an -email from our ticket tracking system. This email will give you a -ticket number. Once your patch has made it to the ticket tracking +email from our ticket tracking system. This email will give you a +ticket number. Once your patch has made it to the ticket tracking system, it will also be sent to the perl5-porters@perl.org list. -Patches are reviewed and discussed on the p5p list. Simple, +Patches are reviewed and discussed on the p5p list. Simple, uncontroversial patches will usually be applied without any discussion. When the patch is applied, the ticket will be updated and you will -receive email. In addition, an email will be sent to the p5p list. +receive email. In addition, an email will be sent to the p5p list. -In other cases, the patch will need more work or discussion. That will +In other cases, the patch will need more work or discussion. That will happen on the p5p list. You are encouraged to participate in the discussion and advocate for -your patch. Sometimes your patch may get lost in the shuffle. It's +your patch. Sometimes your patch may get lost in the shuffle. It's appropriate to send a reminder email to p5p if no action has been taken -in a month. Please remember that the Perl 5 developers are all +in a month. Please remember that the Perl 5 developers are all volunteers, and be polite. Changes are always applied directly to the main development branch, -called "blead". Some patches may be backported to a maintenance branch. -If you think your patch is appropriate for the maintenance branch, -please explain why when you submit it. +called "blead". Some patches may be backported to a maintenance +branch. If you think your patch is appropriate for the maintenance +branch (see L<perlpolicy/MAINTENANCE BRANCHES>), please explain why +when you submit it. 
=head2 Getting your patch accepted @@ -218,25 +247,25 @@ can do to help the Perl 5 Porters accept your patch. =head3 Patch style If you used git to check out the Perl source, then using C<git -format-patch> will produce a patch in a style suitable for Perl. The +format-patch> will produce a patch in a style suitable for Perl. The C<format-patch> command produces one patch file for each commit you -made. If you prefer to send a single patch for all commits, you can use -C<git diff>. +made. If you prefer to send a single patch for all commits, you can +use C<git diff>. % git checkout blead % git pull % git diff blead my-branch-name This produces a patch based on the difference between blead and your -current branch. It's important to make sure that blead is up to date +current branch. It's important to make sure that blead is up to date before producing the diff, that's why we call C<git pull> first. -We strongly recommend that you use git if possible. It will make your +We strongly recommend that you use git if possible. It will make your life easier, and ours as well. However, if you're not using git, you can still produce a suitable -patch. You'll need a pristine copy of the Perl source to diff against. -The porters prefer unified diffs. Using GNU C<diff>, you can produce a +patch. You'll need a pristine copy of the Perl source to diff against. +The porters prefer unified diffs. Using GNU C<diff>, you can produce a diff like this: % diff -Npurd perl.pristine perl.mine @@ -247,11 +276,11 @@ build artifacts, or you may get a confusing result. =head3 Commit message As you craft each patch you intend to submit to the Perl core, it's -important to write a good commit message. This is especially important +important to write a good commit message. This is especially important if your submission will consist of a series of commits. The first line of the commit message should be a short description -without a period. 
It should be no longer than the subject line of an +without a period. It should be no longer than the subject line of an email, 50 characters being a good rule of thumb. A lot of Git tools (Gitweb, GitHub, git log --pretty=oneline, ...) will @@ -271,11 +300,11 @@ to Perl. =item * Why Your commit message should describe why the change you are making is -important. When someone looks at your change in six months or six +important. When someone looks at your change in six months or six years, your intent should be clear. If you're deprecating a feature with the intent of later simplifying -another bit of code, say so. If you're fixing a performance problem or +another bit of code, say so. If you're fixing a performance problem or adding a new feature to support some other bit of the core, mention that. @@ -294,23 +323,23 @@ month or next year. =back A commit message isn't intended to take the place of comments in your -code. Commit messages should describe the change you made, while code +code. Commit messages should describe the change you made, while code comments should describe the current state of the code. If you've just implemented a new feature, complete with doc, tests and -well-commented code, a brief commit message will often suffice. If, +well-commented code, a brief commit message will often suffice. If, however, you've just changed a single character deep in the parser or lexer, you might need to write a small novel to ensure that future readers understand what you did and why you did it. =head3 Comments, Comments, Comments -Be sure to adequately comment your code. While commenting every line is -unnecessary, anything that takes advantage of side effects of +Be sure to adequately comment your code. While commenting every line +is unnecessary, anything that takes advantage of side effects of operators, that creates changes that will be felt outside of the function being patched, or that others may find confusing should be -documented. 
If you are going to err, it is better to err on the side of -adding too many comments than too few. +documented. If you are going to err, it is better to err on the side +of adding too many comments than too few. The best comments explain I<why> the code does what it does, not I<what it does>. @@ -381,13 +410,28 @@ extra paren, e.g. "if (a && (b = c)) ..." "if (!foo) ..." rather than "if (foo == FALSE) ..." etc. +=item * + +Do not declare variables using "register". It may be counterproductive +with modern compilers, and is deprecated in C++, under which the Perl +source is regularly compiled. + +=item * + +In-line functions that are in headers that are accessible to XS code +need to be able to compile without warnings with commonly used extra +compilation flags, such as gcc's C<-Wswitch-default> which warns +whenever a switch statement does not have a "default" case. The use of +these extra flags is to catch potential problems in legal C code, and +is often used by Perl aggregators, such as Linux distributors. + =back =head3 Test suite If your patch changes code (rather than just changing documentation), you should also include one or more test cases which illustrate the bug -you're fixing or validate the new functionality you're adding. In +you're fixing or validate the new functionality you're adding. In general, you should update an existing test file rather than create a new one. @@ -398,7 +442,7 @@ Your test suite additions should generally follow these guidelines =item * -Know what you're testing. Read the docs, and the source. +Know what you're testing. Read the docs, and the source. =item * @@ -428,7 +472,7 @@ Give meaningful error messages when a test fails. =item * -Avoid using qx// and system() unless you are testing for them. If you +Avoid using qx// and system() unless you are testing for them. If you do use them, make sure that you cover _all_ perl platforms. =item * @@ -450,7 +494,7 @@ Add comments to the code explaining what you are testing for. 
=item * -Make updating the '1..42' string unnecessary. Or make sure that you +Make updating the '1..42' string unnecessary. Or make sure that you update it. =item * @@ -473,7 +517,7 @@ This works just like patching anything else, with one extra consideration. Modules in the F<cpan/> directory of the source tree are maintained -outside of the Perl core. When the author updates the module, the +outside of the Perl core. When the author updates the module, the updates are simply copied into the core. See that module's documentation or its listing on L<http://search.cpan.org/> for more information on reporting bugs and submitting patches. @@ -493,8 +537,8 @@ core. For changes significant enough to warrant a F<pod/perldelta.pod> entry, the porters will greatly appreciate it if you submit a delta entry -along with your actual change. Significant changes include, but are not -limited to: +along with your actual change. Significant changes include, but are +not limited to: =over 4 @@ -534,13 +578,13 @@ Important platform-specific changes =back Please make sure you add the perldelta entry to the right section -within F<pod/perldelta.pod>. More information on how to write good +within F<pod/perldelta.pod>. More information on how to write good perldelta entries is available in the C<Style> section of F<Porting/how_to_write_a_perldelta.pod>. =head2 What makes for a good patch? -New features and extensions to the language can be contentious. There +New features and extensions to the language can be contentious. There is no specific set of criteria which determine what features get added, but here are some questions to consider when developing a patch: @@ -574,28 +618,28 @@ Either assimilate new technologies, or build bridges to them. =head3 Where is the implementation? -All the talk in the world is useless without an implementation. In +All the talk in the world is useless without an implementation. 
In almost every case, the person or people who argue for a new feature -will be expected to be the ones who implement it. Porters capable of +will be expected to be the ones who implement it. Porters capable of coding new features have their own agendas, and are not available to implement your (possibly good) idea. =head3 Backwards compatibility -It's a cardinal sin to break existing Perl programs. New warnings can +It's a cardinal sin to break existing Perl programs. New warnings can be contentious--some say that a program that emits warnings is not -broken, while others say it is. Adding keywords has the potential to +broken, while others say it is. Adding keywords has the potential to break programs, changing the meaning of existing token sequences or functions might break programs. The Perl 5 core includes mechanisms to help porters make backwards incompatible changes more compatible such as the L<feature> and -L<deprecate> modules. Please use them when appropriate. +L<deprecate> modules. Please use them when appropriate. =head3 Could it be a module instead? Perl 5 has extension mechanisms, modules and XS, specifically to avoid -the need to keep changing the Perl interpreter. You can write modules +the need to keep changing the Perl interpreter. You can write modules that export functions, you can give those functions prototypes so they can be called like built-in functions, you can even write XS code to mess with the runtime data structures of the Perl interpreter if you @@ -618,26 +662,26 @@ potential to introduce new bugs. =head3 How big is it? -The smaller and more localized the change, the better. Similarly, a +The smaller and more localized the change, the better. Similarly, a series of small patches is greatly preferred over a single large patch. =head3 Does it preclude other desirable features? A patch is likely to be rejected if it closes off future avenues of -development. For instance, a patch that placed a true and final +development. 
For instance, a patch that placed a true and final interpretation on prototypes is likely to be rejected because there are still options for the future of prototypes that haven't been addressed. =head3 Is the implementation robust? Good patches (tight code, complete, correct) stand more chance of going -in. Sloppy or incorrect patches might be placed on the back burner +in. Sloppy or incorrect patches might be placed on the back burner until the pumpking has time to fix, or might be discarded altogether without further notice. =head3 Is the implementation generic enough to be portable? -The worst patches make use of system-specific features. It's highly +The worst patches make use of system-specific features. It's highly unlikely that non-portable additions to the Perl language will be accepted. @@ -656,14 +700,14 @@ patch won't be accidentally thrown away by someone in the future? =head3 Is there enough documentation? Patches without documentation are probably ill-thought out or -incomplete. No features can be added or changed without documentation, +incomplete. No features can be added or changed without documentation, so submitting a patch for the appropriate pod docs as well as the source code is important. =head3 Is there another way to do it? Larry said "Although the Perl Slogan is I<There's More Than One Way to -Do It>, I hesitate to make 10 ways to do something". This is a tricky +Do It>, I hesitate to make 10 ways to do something". This is a tricky heuristic to navigate, though--one man's essential addition is another man's pointless cruft. @@ -674,12 +718,12 @@ authors, ... Perl is supposed to be easy. =head3 Patches speak louder than words -Working code is always preferred to pie-in-the-sky ideas. A patch to +Working code is always preferred to pie-in-the-sky ideas. A patch to add a feature stands a much higher chance of making it to the language than does a random feature request, no matter how fervently argued the -request might be. 
This ties into "Will it be useful?", as the fact that -someone took the time to make the patch demonstrates a strong desire -for the feature. +request might be. This ties into "Will it be useful?", as the fact +that someone took the time to make the patch demonstrates a strong +desire for the feature. =head1 TESTING @@ -687,11 +731,12 @@ The core uses the same testing style as the rest of Perl, a simple "ok/not ok" run through Test::Harness, but there are a few special considerations. -There are three ways to write a test in the core. L<Test::More>, -F<t/test.pl> and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The -decision of which to use depends on what part of the test suite you're -working on. This is a measure to prevent a high-level failure (such as -Config.pm breaking) from causing basic functionality tests to fail. +There are three ways to write a test in the core: L<Test::More>, +F<t/test.pl> and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. +The decision of which to use depends on what part of the test suite +you're working on. This is a measure to prevent a high-level failure +(such as Config.pm breaking) from causing basic functionality tests to +fail. The F<t/test.pl> library provides some of the features of L<Test::More>, but avoids loading most modules and uses as few core @@ -702,11 +747,13 @@ Protocol|http://testanything.org>. =over 4 -=item * F<t/base> and F<t/comp> +=item * F<t/base>, F<t/comp> and F<t/opbasic> Since we don't know if require works, or even subroutines, use ad hoc -tests for these two. Step carefully to avoid using the feature being -tested. +tests for these three. Step carefully to avoid using the feature being +tested. Tests in F<t/opbasic>, for instance, have been placed there +rather than in F<t/op> because they test functionality which +F<t/test.pl> presumes has already been demonstrated to work. 
=item * F<t/cmd>, F<t/run>, F<t/io> and F<t/op> @@ -719,25 +766,25 @@ sure to skip the test gracefully if it's not there. =item * Everything else Now that the core of Perl is tested, L<Test::More> can and should be -used. You can also use the full suite of core modules in the tests. +used. You can also use the full suite of core modules in the tests. =back When you say "make test", Perl uses the F<t/TEST> program to run the -test suite (except under Win32 where it uses F<t/harness> instead). All -tests are run from the F<t/> directory, B<not> the directory which -contains the test. This causes some problems with the tests in F<lib/>, -so here's some opportunity for some patching. +test suite (except under Win32 where it uses F<t/harness> instead). +All tests are run from the F<t/> directory, B<not> the directory which +contains the test. This causes some problems with the tests in +F<lib/>, so here's some opportunity for some patching. -You must be triply conscious of cross-platform concerns. This usually +You must be triply conscious of cross-platform concerns. This usually boils down to using L<File::Spec> and avoiding things like C<fork()> and C<system()> unless absolutely necessary. =head2 Special C<make test> targets There are various special make targets that can be used to test Perl -slightly differently than the standard "test" target. Not all them are -expected to give a 100% success rate. Many of them have several +slightly differently than the standard "test" target. Not all them are +expected to give a 100% success rate. Many of them have several aliases, and many of them are not available on certain operating systems. @@ -748,75 +795,32 @@ systems. This runs some basic sanity tests on the source tree and helps catch basic errors before you submit a patch. -=item * coretest - -Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests). - -(Not available on Win32) - -=item * test.deparse - -Run all the tests through L<B::Deparse>. 
Not all tests will succeed. - -(Not available on Win32) - -=item * test.taintwarn - -Run all tests with the B<-t> command-line switch. Not all tests are -expected to succeed (until they're specifically fixed, of course). - -(Not available on Win32) - =item * minitest Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>, F<t/op>, F<t/uni> and F<t/mro> tests. -=item * test.valgrind check.valgrind utest.valgrind ucheck.valgrind +=item * test.valgrind check.valgrind (Only in Linux) Run all the tests using the memory leak + naughty -memory access tool "valgrind". The log files will be named +memory access tool "valgrind". The log files will be named F<testname.valgrind>. -=item * test.torture torturetest - -Run all the usual tests and some extra tests. As of Perl 5.8.0, the -only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>. - -You can also run the torture test with F<t/harness> by giving -C<-torture> argument to F<t/harness>. - -=item * utest ucheck test.utf8 check.utf8 - -Run all the tests with -Mutf8. Not all tests will succeed. - -(Not available on Win32) - -=item * minitest.utf16 test.utf16 - -Runs the tests with UTF-16 encoded scripts, encoded with different -versions of this encoding. - -C<make utest.utf16> runs the test suite with a combination of C<-utf8> -and C<-utf16> arguments to F<t/TEST>. - -(Not available on Win32) - =item * test_harness Run the test suite with the F<t/harness> controlling program, instead -of F<t/TEST>. F<t/harness> is more sophisticated, and uses the +of F<t/TEST>. F<t/harness> is more sophisticated, and uses the L<Test::Harness> module, thus using this test target supposes that perl -mostly works. The main advantage for our purposes is that it prints a -detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it -doesn't redirect stderr to stdout. +mostly works. The main advantage for our purposes is that it prints a +detailed summary of failed tests at the end. 
Also, unlike F<t/TEST>, +it doesn't redirect stderr to stdout. Note that under Win32 F<t/harness> is always used instead of F<t/TEST>, so there is no special "test_harness" target. Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES environment variables to control the behaviour of -F<t/harness>. This means you can say +F<t/harness>. This means you can say nmake test TEST_FILES="op/*.t" nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t" @@ -830,9 +834,9 @@ Sets PERL_SKIP_TTY_TEST to true before running normal test. =head2 Parallel tests The core distribution can now run its regression tests in parallel on -Unix-like platforms. Instead of running C<make test>, set C<TEST_JOBS> +Unix-like platforms. Instead of running C<make test>, set C<TEST_JOBS> in your environment to the number of tests to run in parallel, and run -C<make test_harness>. On a Bourne-like shell, this can be done as +C<make test_harness>. On a Bourne-like shell, this can be done as TEST_JOBS=3 make test_harness # Run 3 tests in parallel @@ -842,8 +846,8 @@ non-conflicting test scripts itself, and there is no standard interface to C<make> utilities to interact with their job schedulers. Note that currently some test scripts may fail when run in parallel -(most notably F<ext/IO/t/io_dir.t>). If necessary, run just the failing -scripts again sequentially and see if the failures go away. +(most notably F<ext/IO/t/io_dir.t>). If necessary, run just the +failing scripts again sequentially and see if the failures go away. =head2 Running tests by hand @@ -861,14 +865,14 @@ or =head2 Using F<t/harness> for testing If you use C<harness> for testing, you have several command line -options available to you. The arguments are as follows, and are in the +options available to you. The arguments are as follows, and are in the order that they must appear if used together. 
harness -v -torture -re=pattern LIST OF FILES TO TEST harness -v -torture -re LIST OF PATTERNS TO MATCH If C<LIST OF FILES TO TEST> is omitted, the file list is obtained from -the manifest. The file list may include shell wildcards which will be +the manifest. The file list may include shell wildcards which will be expanded out. =over 4 @@ -884,14 +888,14 @@ Run the torture tests as well as the normal set. =item * -re=PATTERN -Filter the file list so that all the test files run match PATTERN. Note -that this form is distinct from the B<-re LIST OF PATTERNS> form below -in that it allows the file list to be provided as well. +Filter the file list so that all the test files run match PATTERN. +Note that this form is distinct from the B<-re LIST OF PATTERNS> form +below in that it allows the file list to be provided as well. =item * -re LIST OF PATTERNS Filter the file list so that all the test files run match -/(LIST|OF|PATTERNS)/. Note that with this form the patterns are joined +/(LIST|OF|PATTERNS)/. Note that with this form the patterns are joined by '|' and you cannot supply a list of files, instead the test files are obtained from the MANIFEST. @@ -909,7 +913,7 @@ affect the execution of the test: =item * PERL_CORE=1 indicates that we're running this test as part of the perl core test -suite. This is useful for modules that have a dual life on CPAN. +suite. This is useful for modules that have a dual life on CPAN. =item * PERL_DESTRUCT_LEVEL=2 @@ -924,9 +928,9 @@ F<./perl>). =item * PERL_SKIP_TTY_TEST -if set, tells to skip the tests that need a terminal. It's actually set -automatically by the Makefile, but can also be forced artificially by -running 'make test_notty'. +if set, tells to skip the tests that need a terminal. It's actually +set automatically by the Makefile, but can also be forced artificially +by running 'make test_notty'. =back @@ -937,7 +941,7 @@ running 'make test_notty'. 
=item * PERL_TEST_Net_Ping Setting this variable runs all the Net::Ping modules tests, otherwise -some tests that interact with the outside world are skipped. See +some tests that interact with the outside world are skipped. See L<perl58delta>. =item * PERL_TEST_NOVREXX @@ -948,6 +952,13 @@ Setting this variable skips the vrexx.t tests for OS2::REXX. This sets a variable in op/numconvert.t. +=item * PERL_TEST_MEMORY + +Setting this variable includes the tests in F<t/bigmem/>. This should +be set to the number of gigabytes of memory available for testing, eg. +C<PERL_TEST_MEMORY=4> indicates that tests that require 4GiB of +available memory can be run safely. + =back See also the documentation for the Test and Test::Harness modules, for @@ -961,7 +972,7 @@ To hack on the Perl guts, you'll need to read the following things: =item * L<perlsource> -An overview of the Perl source tree. This will help you find the files +An overview of the Perl source tree. This will help you find the files you're looking for. =item * L<perlinterp> @@ -972,12 +983,12 @@ Perl does what it does. =item * L<perlhacktut> This document walks through the creation of a small patch to Perl's C -code. If you're just getting started with Perl core hacking, this will +code. If you're just getting started with Perl core hacking, this will help you understand how it works. =item * L<perlhacktips> -More details on hacking the Perl core. This document focuses on lower +More details on hacking the Perl core. This document focuses on lower level details such as how to write tests, compilation issues, portability, debugging, etc. @@ -986,7 +997,7 @@ If you plan on doing serious C hacking, make sure to read this. =item * L<perlguts> This is of paramount importance, since it's the documentation of what -goes where in the Perl source. Read it over a couple of times and it +goes where in the Perl source. 
Read it over a couple of times and it might start to make sense - don't worry if it doesn't yet, because the best way to study it is to read it in conjunction with poking at Perl source, and we'll do that later on. @@ -1000,7 +1011,7 @@ L<http://search.cpan.org/dist/illguts/> A working knowledge of XSUB programming is incredibly useful for core hacking; XSUBs use techniques drawn from the PP code, the portion of -the guts that actually executes a Perl program. It's a lot gentler to +the guts that actually executes a Perl program. It's a lot gentler to learn those techniques from simple examples and explanation than from the core itself. @@ -1015,13 +1026,6 @@ This is a collection of words of wisdom for a Perl porter; some of it is only useful to the pumpkin holder, but most of it applies to anyone wanting to go about Perl development. -=item * The perl5-porters FAQ - -This should be available from -http://dev.perl.org/perl5/docs/p5p-faq.html . It contains hints on -reading perl5-porters, information on how perl5-porters works and how -Perl development in general works. - =back =head1 CPAN TESTERS AND PERL SMOKERS @@ -1034,9 +1038,9 @@ http://www.nntp.perl.org/group/perl.daily-build.reports/ ) automatically test Perl source releases on platforms with various configurations. -Both efforts welcome volunteers. In order to get involved in smoke +Both efforts welcome volunteers. In order to get involved in smoke testing of the perl itself visit -L<http://search.cpan.org/dist/Test-Smoke/>. In order to start smoke +L<http://search.cpan.org/dist/Test-Smoke/>. In order to start smoke testing CPAN modules visit L<http://search.cpan.org/dist/CPANPLUS-YACSmoke/> or L<http://search.cpan.org/dist/minismokebox/> or @@ -1060,14 +1064,14 @@ who knows, you may unearth a bug in the patch... =item * Do read the README associated with your operating system, e.g. -README.aix on the IBM AIX OS. Don't hesitate to supply patches to that +README.aix on the IBM AIX OS. 
Don't hesitate to supply patches to that README if you find anything missing or changed over a new OS release. =item * Find an area of Perl that seems interesting to you, and see if you can -work out how it works. Scan through the source, and step over it in the -debugger. Play, poke, investigate, fiddle! You'll probably get to +work out how it works. Scan through the source, and step over it in +the debugger. Play, poke, investigate, fiddle! You'll probably get to understand not just your chosen area but a much wider range of F<perl>'s activity as well, and probably sooner than you'd think. @@ -1076,7 +1080,7 @@ F<perl>'s activity as well, and probably sooner than you'd think. =head2 "The Road goes ever on and on, down from the door where it began." If you can do these things, you've started on the long road to Perl -porting. Thanks for wanting to help make Perl better - and happy +porting. Thanks for wanting to help make Perl better - and happy hacking! =head2 Metaphoric Quotations @@ -1084,7 +1088,7 @@ hacking! If you recognized the quote about the Road above, you're in luck. Most software projects begin each file with a literal description of -each file's purpose. Perl instead begins each with a literary allusion +each file's purpose. Perl instead begins each with a literary allusion to that file's purpose. Like chapters in many books, all top-level Perl source files (along @@ -1093,20 +1097,20 @@ inscription that alludes, indirectly and metaphorically, to the material you're about to read. Quotations are taken from writings of J.R.R. Tolkien pertaining to his -Legendarium, almost always from I<The Lord of the Rings>. Chapters and +Legendarium, almost always from I<The Lord of the Rings>. Chapters and page numbers are given using the following editions: =over 4 =item * -I<The Hobbit>, by J.R.R. Tolkien. The hardcover, 70th-anniversary +I<The Hobbit>, by J.R.R. Tolkien. 
The hardcover, 70th-anniversary edition of 2007 was used, published in the UK by Harper Collins Publishers and in the US by the Houghton Mifflin Company. =item * -I<The Lord of the Rings>, by J.R.R. Tolkien. The hardcover, +I<The Lord of the Rings>, by J.R.R. Tolkien. The hardcover, 50th-anniversary edition of 2004 was used, published in the UK by Harper Collins Publishers and in the US by the Houghton Mifflin Company. @@ -1115,7 +1119,7 @@ Company. I<The Lays of Beleriand>, by J.R.R. Tolkien and published posthumously by his son and literary executor, C.J.R. Tolkien, being the 3rd of the -12 volumes in Christopher's mammoth I<History of Middle Earth>. Page +12 volumes in Christopher's mammoth I<History of Middle Earth>. Page numbers derive from the hardcover edition, first published in 1983 by George Allen & Unwin; no page numbers changed for the special 3-volume omnibus edition of 2002 or the various trade-paper editions, all again @@ -1126,7 +1130,7 @@ now by Harper Collins or Houghton Mifflin. Other JRRT books fair game for quotes would thus include I<The Adventures of Tom Bombadil>, I<The Silmarillion>, I<Unfinished Tales>, and I<The Tale of the Children of Hurin>, all but the first -posthumously assembled by CJRT. But I<The Lord of the Rings> itself is +posthumously assembled by CJRT. But I<The Lord of the Rings> itself is perfectly fine and probably best to quote from, provided you can find a suitable quote there. @@ -1134,7 +1138,7 @@ So if you were to supply a new, complete, top-level source file to add to Perl, you should conform to this peculiar practice by yourself selecting an appropriate quotation from Tolkien, retaining the original spelling and punctuation and using the same format the rest of the -quotes are in. Indirect and oblique is just fine; remember, it's a +quotes are in. Indirect and oblique is just fine; remember, it's a metaphor, so being meta is, after all, what it's for. 
=head1 AUTHOR diff --git a/gnu/usr.bin/perl/pod/perlhacktips.pod b/gnu/usr.bin/perl/pod/perlhacktips.pod index bb995f33005..324ed1a8425 100644 --- a/gnu/usr.bin/perl/pod/perlhacktips.pod +++ b/gnu/usr.bin/perl/pod/perlhacktips.pod @@ -277,11 +277,9 @@ This is transparent for the most part, but because the character sets differ, you shouldn't use numeric (decimal, octal, nor hex) constants to refer to characters. You can safely say 'A', but not 0x41. You can safely say '\n', but not \012. If a character doesn't have a trivial -input form, you can create a #define for it in both C<utfebcdic.h> and -C<utf8.h>, so that it resolves to different values depending on the -character set being used. (There are three different EBCDIC character -sets defined in C<utfebcdic.h>, so it might be best to insert the -#define three times in that file.) +input form, you should add it to the list in +F<regen/unicode_constants.pl>, and have Perl create #defines for you, +based on the current platform. Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to @@ -886,7 +884,8 @@ parse also C and C++. Download the pmd-bin-X.Y.zip () from the SourceForge site, extract the pmd-X.Y.jar from it, and then run that on source code thusly: - java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /some/where/src --language c > cpd.txt + java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \ + --minimum-tokens 100 --files /some/where/src --language c > cpd.txt You may run into memory limits, in which case you should use the -Xmx option: @@ -967,14 +966,16 @@ C<-std1> mode on. =head1 MEMORY DEBUGGERS -B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or -Third Degree greatly slows down the execution: seconds become minutes, -minutes become hours. 
For example as of Perl 5.8.1, the +B<NOTE 1>: Running under older memory debuggers such as Purify, +valgrind or Third Degree greatly slows down the execution: seconds +become minutes, minutes become hours. For example as of Perl 5.8.1, the ext/Encode/t/Unicode.t takes extraordinarily long to complete under e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more than six hours, even on a snappy computer. The said test must be doing something that is quite unfriendly for memory debuggers. If you don't feel like waiting, that you can simply kill away the perl process. +Roughly valgrind slows down execution by factor 10, AddressSanitizer by +factor 2. B<NOTE 2>: To minimize the number of memory leak false alarms (see L</PERL_DESTRUCT_LEVEL> for more information), you have to set the @@ -1137,12 +1138,12 @@ finally report any memory problems. =head2 valgrind -The excellent valgrind tool can be used to find out both memory leaks -and illegal memory accesses. As of version 3.3.0, Valgrind only -supports Linux on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and -x86-64). The special "test.valgrind" target can be used to run the -tests under valgrind. Found errors and memory leaks are logged in -files named F<testfile.valgrind>. +The valgrind tool can be used to find out both memory leaks and illegal +heap memory accesses. As of version 3.3.0, Valgrind only supports Linux +on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64). The +special "test.valgrind" target can be used to run the tests under +valgrind. Found errors and memory leaks are logged in files named +F<testfile.valgrind>. Valgrind also provides a cachegrind tool, invoked on perl as: @@ -1157,6 +1158,52 @@ To get valgrind and for more information see http://valgrind.org/ +=head2 AddressSanitizer + +AddressSanitizer is a clang extension, included in clang since v3.1. 
It +checks illegal heap pointers, global pointers, stack pointers and use +after free errors, and is fast enough that you can easily compile your +debugging or optimized perl with it. It does not check memory leaks +though. AddressSanitizer is available for linux, Mac OS X and soon on +Windows. + +To build perl with AddressSanitizer, your Configure invocation should +look like: + + sh Configure -des -Dcc=clang \ + -Accflags=-faddress-sanitizer -Aldflags=-faddress-sanitizer \ + -Alddlflags=-shared\ -faddress-sanitizer + +where these arguments mean: + +=over 4 + +=item * -Dcc=clang + +This should be replaced by the full path to your clang executable if it +is not in your path. + +=item * -Accflags=-faddress-sanitizer + +Compile perl and extensions sources with AddressSanitizer. + +=item * -Aldflags=-faddress-sanitizer + +Link the perl executable with AddressSanitizer. + +=item * -Alddlflags=-shared\ -faddress-sanitizer + +Link dynamic extensions with AddressSanitizer. You must manually +specify C<-shared> because using C<-Alddlflags=-shared> will prevent +Configure from setting a default value for C<lddlflags>, which usually +contains C<-shared> (at least on linux). + +=back + +See also +L<http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer>. + + =head1 PROFILING Depending on your platform there are various ways of profiling Perl. @@ -1411,41 +1458,17 @@ L<perlclib>. Under ithreads the optree is read only. If you want to enforce this, to check for write accesses from buggy code, compile with -C<-DPL_OP_SLAB_ALLOC> to enable the OP slab allocator and C<-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op memory -via C<mmap>, and sets it read-only at run time. Any write access to an -op results in a C<SIGBUS> and abort. +via C<mmap>, and sets it read-only when it is attached to a subroutine. Any +write access to an op results in a C<SIGBUS> and abort. This code is intended for development only, and may not be portable even to all Unix variants. 
Also, it is an 80% solution, in that it -isn't able to make all ops read only. Specifically it - -=over - -=item * 1 - -Only sets read-only on all slabs of ops at C<CHECK> time, hence ops -allocated later via C<require> or C<eval> will be re-write - -=item * 2 +isn't able to make all ops read only. Specifically it does not apply to op +slabs belonging to C<BEGIN> blocks. -Turns an entire slab of ops read-write if the refcount of any op in the -slab needs to be decreased. - -=item * 3 - -Turns an entire slab of ops read-write if any op from the slab is -freed. - -=back - -It's not possible to turn the slabs to read-only after an action -requiring read-write access, as either can happen during op tree -building time, so there may still be legitimate write access. - -However, as an 80% solution it is still effective, as currently it -catches a write access during the generation of F<Config.pm>, which -means that we can't yet build F<perl> with this enabled. +However, as an 80% solution it is still effective, as it has caught bugs in +the past. =head2 The .i Targets @@ -1453,7 +1476,8 @@ You can expand the macros in a F<foo.c> file by saying make foo.i -which will expand the macros using cpp. Don't be scared by the results. +which will expand the macros using cpp. Don't be scared by the +results. =head1 AUTHOR diff --git a/gnu/usr.bin/perl/pod/perlhacktut.pod b/gnu/usr.bin/perl/pod/perlhacktut.pod index 33a9ef23e8d..fc0833649be 100644 --- a/gnu/usr.bin/perl/pod/perlhacktut.pod +++ b/gnu/usr.bin/perl/pod/perlhacktut.pod @@ -53,19 +53,19 @@ test whether we're still at the start of the string. 
So, here's where C<pat> is set up: STRLEN fromlen; - register char *pat = SvPVx(*++MARK, fromlen); - register char *patend = pat + fromlen; - register I32 len; + char *pat = SvPVx(*++MARK, fromlen); + char *patend = pat + fromlen; + I32 len; I32 datumtype; SV *fromstr; We'll have another string pointer in there: STRLEN fromlen; - register char *pat = SvPVx(*++MARK, fromlen); - register char *patend = pat + fromlen; + char *pat = SvPVx(*++MARK, fromlen); + char *patend = pat + fromlen; + char *patcopy; - register I32 len; + I32 len; I32 datumtype; SV *fromstr; diff --git a/gnu/usr.bin/perl/pod/perlinterp.pod b/gnu/usr.bin/perl/pod/perlinterp.pod index c7f21209de5..bb559ba02b9 100644 --- a/gnu/usr.bin/perl/pod/perlinterp.pod +++ b/gnu/usr.bin/perl/pod/perlinterp.pod @@ -363,7 +363,7 @@ Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c> 1 void - 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len) + 2 Perl_sv_catpvn(pTHX_ SV *sv, const char *ptr, STRLEN len) 3 { 4 STRLEN tlen; 5 char *junk; diff --git a/gnu/usr.bin/perl/pod/perlintro.pod b/gnu/usr.bin/perl/pod/perlintro.pod index afce360a2ac..77465a123fe 100644 --- a/gnu/usr.bin/perl/pod/perlintro.pod +++ b/gnu/usr.bin/perl/pod/perlintro.pod @@ -64,13 +64,13 @@ worth writing about. To run a Perl program from the Unix command line: - perl progname.pl + perl progname.pl Alternatively, put this as the first line of your script: - #!/usr/bin/env perl + #!/usr/bin/env perl -... and run the script as C</path/to/script.pl>. Of course, it'll need +... and run the script as F</path/to/script.pl>. Of course, it'll need to be executable first, so C<chmod 755 script.pl> (under Unix). (This start line assumes you have the B<env> program. You can also put @@ -84,9 +84,9 @@ Windows and Mac OS, read L<perlrun>. Perl by default is very forgiving. 
In order to make it more robust it is recommended to start every program with the following lines: - #!/usr/bin/perl - use strict; - use warnings; + #!/usr/bin/perl + use strict; + use warnings; The two additional lines request from perl to catch various common problems in your code. They check different things so you need both. A @@ -105,45 +105,45 @@ that kind. Perl statements end in a semi-colon: - print "Hello, world"; + print "Hello, world"; Comments start with a hash symbol and run to the end of the line - # This is a comment + # This is a comment Whitespace is irrelevant: - print - "Hello, world" - ; + print + "Hello, world" + ; ... except inside quoted strings: - # this would print with a linebreak in the middle - print "Hello - world"; + # this would print with a linebreak in the middle + print "Hello + world"; Double quotes or single quotes may be used around literal strings: - print "Hello, world"; - print 'Hello, world'; + print "Hello, world"; + print 'Hello, world'; However, only double quotes "interpolate" variables and special characters such as newlines (C<\n>): - print "Hello, $name\n"; # works fine - print 'Hello, $name\n'; # prints $name\n literally + print "Hello, $name\n"; # works fine + print 'Hello, $name\n'; # prints $name\n literally Numbers don't need quotes around them: - print 42; + print 42; You can use parentheses for functions' arguments or omit them according to your personal taste. They are only required occasionally to clarify issues of precedence. - print("Hello, world\n"); - print "Hello, world\n"; + print("Hello, world\n"); + print "Hello, world\n"; More detailed information about Perl syntax can be found in L<perlsyn>. @@ -157,8 +157,8 @@ Perl has three main variable types: scalars, arrays, and hashes. 
A scalar represents a single value: - my $animal = "camel"; - my $answer = 42; + my $animal = "camel"; + my $answer = 42; Scalar values can be strings, integers or floating point numbers, and Perl will automatically convert between them as required. There is no need @@ -168,9 +168,9 @@ requirements of C<use strict;>.) Scalar values can be used in various ways: - print $animal; - print "The animal is $animal\n"; - print "The square of $answer is ", $answer * $answer, "\n"; + print $animal; + print "The animal is $animal\n"; + print "The square of $answer is ", $answer * $answer, "\n"; There are a number of "magic" scalars with names that look like punctuation or line noise. These special variables are used for all @@ -179,32 +179,32 @@ need to know about for now is C<$_> which is the "default variable". It's used as the default argument to a number of functions in Perl, and it's set implicitly by certain looping constructs. - print; # prints contents of $_ by default + print; # prints contents of $_ by default =item Arrays An array represents a list of values: - my @animals = ("camel", "llama", "owl"); - my @numbers = (23, 42, 69); - my @mixed = ("camel", 42, 1.23); + my @animals = ("camel", "llama", "owl"); + my @numbers = (23, 42, 69); + my @mixed = ("camel", 42, 1.23); Arrays are zero-indexed. Here's how you get at elements in an array: - print $animals[0]; # prints "camel" - print $animals[1]; # prints "llama" + print $animals[0]; # prints "camel" + print $animals[1]; # prints "llama" The special variable C<$#array> tells you the index of the last element of an array: - print $mixed[$#mixed]; # last element, prints 1.23 + print $mixed[$#mixed]; # last element, prints 1.23 You might be tempted to use C<$#array + 1> to tell you how many items there are in an array. Don't bother. As it happens, using C<@array> where Perl expects to find a scalar value ("in scalar context") will give you the number of elements in the array: - if (@animals < 5) { ... 
} + if (@animals < 5) { ... } The elements we're getting from the array start with a C<$> because we're getting just a single value out of the array; you ask for a scalar, @@ -212,16 +212,16 @@ you get a scalar. To get multiple values from an array: - @animals[0,1]; # gives ("camel", "llama"); - @animals[0..2]; # gives ("camel", "llama", "owl"); - @animals[1..$#animals]; # gives all except the first element + @animals[0,1]; # gives ("camel", "llama"); + @animals[0..2]; # gives ("camel", "llama", "owl"); + @animals[1..$#animals]; # gives all except the first element This is called an "array slice". You can do various useful things to lists: - my @sorted = sort @animals; - my @backwards = reverse @numbers; + my @sorted = sort @animals; + my @backwards = reverse @numbers; There are a couple of special arrays too, such as C<@ARGV> (the command line arguments to your script) and C<@_> (the arguments passed to a @@ -231,25 +231,25 @@ subroutine). These are documented in L<perlvar>. A hash represents a set of key/value pairs: - my %fruit_color = ("apple", "red", "banana", "yellow"); + my %fruit_color = ("apple", "red", "banana", "yellow"); You can use whitespace and the C<< => >> operator to lay them out more nicely: - my %fruit_color = ( - apple => "red", - banana => "yellow", - ); + my %fruit_color = ( + apple => "red", + banana => "yellow", + ); To get at hash elements: - $fruit_color{"apple"}; # gives "red" + $fruit_color{"apple"}; # gives "red" You can get at lists of keys and values with C<keys()> and C<values()>. - my @fruits = keys %fruit_colors; - my @colors = values %fruit_colors; + my @fruits = keys %fruit_colors; + my @colors = values %fruit_colors; Hashes have no particular internal order, though you can sort the keys and loop through them. @@ -272,22 +272,22 @@ element, you can easily create lists and hashes within lists and hashes. The following example shows a 2 level hash of hash structure using anonymous hash references. 
- my $variables = { - scalar => { - description => "single item", - sigil => '$', - }, - array => { - description => "ordered list of items", - sigil => '@', - }, - hash => { - description => "key/value pairs", - sigil => '%', - }, - }; - - print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n"; + my $variables = { + scalar => { + description => "single item", + sigil => '$', + }, + array => { + description => "ordered list of items", + sigil => '@', + }, + hash => { + description => "key/value pairs", + sigil => '%', + }, + }; + + print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n"; Exhaustive information on the topic of references can be found in L<perlreftut>, L<perllol>, L<perlref> and L<perldsc>. @@ -296,11 +296,11 @@ L<perlreftut>, L<perllol>, L<perlref> and L<perldsc>. Throughout the previous section all the examples have used the syntax: - my $var = "value"; + my $var = "value"; The C<my> is actually not required; you could just use: - $var = "value"; + $var = "value"; However, the above usage will create global variables throughout your program, which is bad programming practice. C<my> creates lexically @@ -308,15 +308,15 @@ scoped variables instead. The variables are scoped to the block (i.e. a bunch of statements surrounded by curly-braces) in which they are defined. - my $x = "foo"; - my $some_condition = 1; - if ($some_condition) { - my $y = "bar"; - print $x; # prints "foo" - print $y; # prints "bar" - } - print $x; # prints "foo" - print $y; # prints nothing; $y has fallen out of scope + my $x = "foo"; + my $some_condition = 1; + if ($some_condition) { + my $y = "bar"; + print $x; # prints "foo" + print $y; # prints "bar" + } + print $x; # prints "foo" + print $y; # prints nothing; $y has fallen out of scope Using C<my> in combination with a C<use strict;> at the top of your Perl scripts means that the interpreter will pick up certain common @@ -338,19 +338,19 @@ which are commonly used in conditional statements. 
=item if - if ( condition ) { - ... - } elsif ( other condition ) { - ... - } else { - ... - } + if ( condition ) { + ... + } elsif ( other condition ) { + ... + } else { + ... + } There's also a negated version of it: - unless ( condition ) { - ... - } + unless ( condition ) { + ... + } This is provided as a more readable version of C<if (!I<condition>)>. @@ -358,54 +358,54 @@ Note that the braces are required in Perl, even if you've only got one line in the block. However, there is a clever way of making your one-line conditional blocks more English like: - # the traditional way - if ($zippy) { - print "Yow!"; - } + # the traditional way + if ($zippy) { + print "Yow!"; + } - # the Perlish post-condition way - print "Yow!" if $zippy; - print "We have no bananas" unless $bananas; + # the Perlish post-condition way + print "Yow!" if $zippy; + print "We have no bananas" unless $bananas; =item while - while ( condition ) { - ... - } + while ( condition ) { + ... + } There's also a negated version, for the same reason we have C<unless>: - until ( condition ) { - ... - } + until ( condition ) { + ... + } You can also use C<while> in a post-condition: - print "LA LA LA\n" while 1; # loops forever + print "LA LA LA\n" while 1; # loops forever =item for Exactly like C: - for ($i = 0; $i <= $max; $i++) { - ... - } + for ($i = 0; $i <= $max; $i++) { + ... + } The C style for loop is rarely needed in Perl since Perl provides the more friendly list scanning C<foreach> loop. =item foreach - foreach (@array) { - print "This element is $_\n"; - } + foreach (@array) { + print "This element is $_\n"; + } - print $list[$_] foreach 0 .. $max; + print $list[$_] foreach 0 .. $max; - # you don't have to use the default $_ either... - foreach my $key (keys %hash) { - print "The value of $key is $hash{$key}\n"; - } + # you don't have to use the default $_ either... 
+ foreach my $key (keys %hash) { + print "The value of $key is $hash{$key}\n"; + } The C<foreach> keyword is actually a synonym for the C<for> keyword. See C<L<perlsyn/"Foreach Loops">>. @@ -429,28 +429,28 @@ of the most common ones: =item Arithmetic - + addition - - subtraction - * multiplication - / division + + addition + - subtraction + * multiplication + / division =item Numeric comparison - == equality - != inequality - < less than - > greater than - <= less than or equal - >= greater than or equal + == equality + != inequality + < less than + > greater than + <= less than or equal + >= greater than or equal =item String comparison - eq equality - ne inequality - lt less than - gt greater than - le less than or equal - ge greater than or equal + eq equality + ne inequality + lt less than + gt greater than + le less than or equal + ge greater than or equal (Why do we have separate numeric and string comparisons? Because we don't have special variable types, and Perl needs to know whether to sort @@ -459,9 +459,9 @@ before 99). =item Boolean logic - && and - || or - ! not + && and + || or + ! not (C<and>, C<or> and C<not> aren't just in the above table as descriptions of the operators. They're also supported as operators in their own @@ -471,18 +471,18 @@ detail.) =item Miscellaneous - = assignment - . string concatenation - x string multiplication - .. range operator (creates a list of numbers) + = assignment + . string concatenation + x string multiplication + .. range operator (creates a list of numbers) =back Many operators can be combined with a C<=> as follows: - $a += 1; # same as $a = $a + 1 - $a -= 1; # same as $a = $a - 1 - $a .= "\n"; # same as $a = $a . "\n"; + $a += 1; # same as $a = $a + 1 + $a -= 1; # same as $a = $a - 1 + $a .= "\n"; # same as $a = $a . "\n"; =head2 Files and I/O @@ -490,17 +490,17 @@ You can open a file for input or output using the C<open()> function. 
It's documented in extravagant detail in L<perlfunc> and L<perlopentut>, but in short: - open(my $in, "<", "input.txt") or die "Can't open input.txt: $!"; - open(my $out, ">", "output.txt") or die "Can't open output.txt: $!"; - open(my $log, ">>", "my.log") or die "Can't open my.log: $!"; + open(my $in, "<", "input.txt") or die "Can't open input.txt: $!"; + open(my $out, ">", "output.txt") or die "Can't open output.txt: $!"; + open(my $log, ">>", "my.log") or die "Can't open my.log: $!"; You can read from an open filehandle using the C<< <> >> operator. In scalar context it reads a single line from the filehandle, and in list context it reads the whole file in, assigning each line to an element of the list: - my $line = <$in>; - my @lines = <$in>; + my $line = <$in>; + my @lines = <$in>; Reading in the whole file at one time is called slurping. It can be useful but it may be a memory hog. Most text file processing @@ -508,22 +508,22 @@ can be done a line at a time with Perl's looping constructs. The C<< <> >> operator is most often seen in a C<while> loop: - while (<$in>) { # assigns each line in turn to $_ - print "Just read in this line: $_"; - } + while (<$in>) { # assigns each line in turn to $_ + print "Just read in this line: $_"; + } We've already seen how to print to standard output using C<print()>. However, C<print()> can also take an optional first argument specifying which filehandle to print to: - print STDERR "This is your final warning.\n"; - print $out $record; - print $log $logmessage; + print STDERR "This is your final warning.\n"; + print $out $record; + print $log $logmessage; When you're done with your filehandles, you should C<close()> them (though to be honest, Perl will clean up after you if you forget): - close $in or die "$in: $!"; + close $in or die "$in: $!"; =head2 Regular expressions @@ -535,8 +535,8 @@ elsewhere. However, in short: =item Simple matching - if (/foo/) { ... } # true if $_ contains "foo" - if ($a =~ /foo/) { ... 
} # true if $a contains "foo" + if (/foo/) { ... } # true if $_ contains "foo" + if ($a =~ /foo/) { ... } # true if $a contains "foo" The C<//> matching operator is documented in L<perlop>. It operates on C<$_> by default, or can be bound to another variable using the C<=~> @@ -544,9 +544,10 @@ binding operator (also documented in L<perlop>). =item Simple substitution - s/foo/bar/; # replaces foo with bar in $_ - $a =~ s/foo/bar/; # replaces foo with bar in $a - $a =~ s/foo/bar/g; # replaces ALL INSTANCES of foo with bar in $a + s/foo/bar/; # replaces foo with bar in $_ + $a =~ s/foo/bar/; # replaces foo with bar in $a + $a =~ s/foo/bar/g; # replaces ALL INSTANCES of foo with bar + # in $a The C<s///> substitution operator is documented in L<perlop>. @@ -557,46 +558,49 @@ on just about anything you could dream of by using more complex regular expressions. These are documented at great length in L<perlre>, but for the meantime, here's a quick cheat sheet: - . a single character - \s a whitespace character (space, tab, newline, ...) - \S non-whitespace character - \d a digit (0-9) - \D a non-digit - \w a word character (a-z, A-Z, 0-9, _) - \W a non-word character - [aeiou] matches a single character in the given set - [^aeiou] matches a single character outside the given set - (foo|bar|baz) matches any of the alternatives specified - - ^ start of string - $ end of string + . a single character + \s a whitespace character (space, tab, newline, + ...) 
+ \S non-whitespace character + \d a digit (0-9) + \D a non-digit + \w a word character (a-z, A-Z, 0-9, _) + \W a non-word character + [aeiou] matches a single character in the given set + [^aeiou] matches a single character outside the given + set + (foo|bar|baz) matches any of the alternatives specified + + ^ start of string + $ end of string Quantifiers can be used to specify how many of the previous thing you want to match on, where "thing" means either a literal character, one of the metacharacters listed above, or a group of characters or metacharacters in parentheses. - * zero or more of the previous thing - + one or more of the previous thing - ? zero or one of the previous thing - {3} matches exactly 3 of the previous thing - {3,6} matches between 3 and 6 of the previous thing - {3,} matches 3 or more of the previous thing + * zero or more of the previous thing + + one or more of the previous thing + ? zero or one of the previous thing + {3} matches exactly 3 of the previous thing + {3,6} matches between 3 and 6 of the previous thing + {3,} matches 3 or more of the previous thing Some brief examples: - /^\d+/ string starts with one or more digits - /^$/ nothing in the string (start and end are adjacent) - /(\d\s){3}/ a three digits, each followed by a whitespace - character (eg "3 4 5 ") - /(a.)+/ matches a string in which every odd-numbered letter - is a (eg "abacadaf") + /^\d+/ string starts with one or more digits + /^$/ nothing in the string (start and end are + adjacent) + /(\d\s){3}/ three digits, each followed by a whitespace + character (eg "3 4 5 ") + /(a.)+/ matches a string in which every odd-numbered + letter is a (eg "abacadaf") - # This loop reads from STDIN, and prints non-blank lines: - while (<>) { - next if /^$/; - print; - } + # This loop reads from STDIN, and prints non-blank lines: + while (<>) { + next if /^$/; + print; + } =item Parentheses for capturing @@ -604,12 +608,12 @@ As well as grouping, parentheses serve a second purpose. 
They can be used to capture the results of parts of the regexp match for later use. The results end up in C<$1>, C<$2> and so on. - # a cheap and nasty way to break an email address up into parts + # a cheap and nasty way to break an email address up into parts - if ($email =~ /([^@]+)@(.+)/) { - print "Username is $1\n"; - print "Hostname is $2\n"; - } + if ($email =~ /([^@]+)@(.+)/) { + print "Username is $1\n"; + print "Hostname is $2\n"; + } =item Other regexp features @@ -623,15 +627,15 @@ L<perlretut>, and L<perlre>. Writing subroutines is easy: - sub logger { - my $logmessage = shift; - open my $logfile, ">>", "my.log" or die "Could not open my.log: $!"; - print $logfile $logmessage; - } + sub logger { + my $logmessage = shift; + open my $logfile, ">>", "my.log" or die "Could not open my.log: $!"; + print $logfile $logmessage; + } Now we can use the subroutine just as any other built-in function: - logger("We have a logger subroutine!"); + logger("We have a logger subroutine!"); What's that C<shift>? Well, the arguments to a subroutine are available to us as a special array called C<@_> (see L<perlvar> for more on that). @@ -641,20 +645,20 @@ arguments and assigns it to C<$logmessage>. We can manipulate C<@_> in other ways too: - my ($logmessage, $priority) = @_; # common - my $logmessage = $_[0]; # uncommon, and ugly + my ($logmessage, $priority) = @_; # common + my $logmessage = $_[0]; # uncommon, and ugly Subroutines can also return values: - sub square { - my $num = shift; - my $result = $num * $num; - return $result; - } + sub square { + my $num = shift; + my $result = $num * $num; + return $result; + } Then use it like: - $sq = square(8); + $sq = square(8); For more information on writing subroutines, see L<perlsub>. 
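Note on the perlintro.pod subroutine hunks above: as far as the visible text shows, they only re-indent the examples. As a reading aid, here is a minimal sketch of how the two argument-handling styles they show (C<shift> and list assignment from C<@_>) fit together in one runnable script. C<square> is taken from the patched text. C<format_entry> and its default priority of "info" are assumptions made for illustration and appear neither in perlintro nor in this commit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): unpacks two arguments
# from @_ using the list-assignment style the hunk labels "common".
sub format_entry {
    my ($logmessage, $priority) = @_;
    $priority = "info" unless defined $priority;    # assumed default
    return "[$priority] $logmessage\n";
}

# square() as it appears in perlintro: shift removes and returns the
# first element of @_ when called without arguments inside a sub.
sub square {
    my $num    = shift;
    my $result = $num * $num;
    return $result;
}

my $sq = square(8);
print format_entry("8 squared is $sq");            # [info] 8 squared is 64
print format_entry("disk almost full", "warn");    # [warn] disk almost full
```

Returning the formatted string instead of appending to F<my.log>, as the original C<logger> does, keeps the sketch free of side effects so it can be run anywhere.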
diff --git a/gnu/usr.bin/perl/pod/perliol.pod b/gnu/usr.bin/perl/pod/perliol.pod index 767fabdd7ff..a1ac2f0f331 100644 --- a/gnu/usr.bin/perl/pod/perliol.pod +++ b/gnu/usr.bin/perl/pod/perliol.pod @@ -685,7 +685,7 @@ Returns 0 on end-of-file, 1 if not end-of-file, -1 on error. Return error indicator. C<PerlIOBase_error()> is normally sufficient. -Returns 1 if there is an error (usually when C<PERLIO_F_ERROR> is set, +Returns 1 if there is an error (usually when C<PERLIO_F_ERROR> is set), 0 otherwise. =item Clearerr diff --git a/gnu/usr.bin/perl/pod/perllexwarn.pod b/gnu/usr.bin/perl/pod/perllexwarn.pod index e63135915b9..c6494dbbb73 100644 --- a/gnu/usr.bin/perl/pod/perllexwarn.pod +++ b/gnu/usr.bin/perl/pod/perllexwarn.pod @@ -212,107 +212,111 @@ to be enabled/disabled in isolation. The current hierarchy is: - all -+ - | - +- closure - | - +- deprecated - | - +- exiting - | - +- glob - | - +- io -----------+ - | | - | +- closed - | | - | +- exec - | | - | +- layer - | | - | +- newline - | | - | +- pipe - | | - | +- unopened - | - +- imprecision - | - +- misc - | - +- numeric - | - +- once - | - +- overflow - | - +- pack - | - +- portable - | - +- recursion - | - +- redefine - | - +- regexp - | - +- severe -------+ - | | - | +- debugging - | | - | +- inplace - | | - | +- internal - | | - | +- malloc - | - +- signal - | - +- substr - | - +- syntax -------+ - | | - | +- ambiguous - | | - | +- bareword - | | - | +- digit - | | - | +- illegalproto - | | - | +- parenthesis - | | - | +- precedence - | | - | +- printf - | | - | +- prototype - | | - | +- qw - | | - | +- reserved - | | - | +- semicolon - | - +- taint - | - +- threads - | - +- uninitialized - | - +- unpack - | - +- untie - | - +- utf8----------+ - | | - | +- surrogate - | | - | +- non_unicode - | | - | +- nonchar - | - +- void + all -+ + | + +- closure + | + +- deprecated + | + +- exiting + | + +- experimental --+ + | | + | +- experimental::lexical_subs + | + +- glob + | + +- imprecision + | + +- io 
------------+ + | | + | +- closed + | | + | +- exec + | | + | +- layer + | | + | +- newline + | | + | +- pipe + | | + | +- unopened + | + +- misc + | + +- numeric + | + +- once + | + +- overflow + | + +- pack + | + +- portable + | + +- recursion + | + +- redefine + | + +- regexp + | + +- severe --------+ + | | + | +- debugging + | | + | +- inplace + | | + | +- internal + | | + | +- malloc + | + +- signal + | + +- substr + | + +- syntax --------+ + | | + | +- ambiguous + | | + | +- bareword + | | + | +- digit + | | + | +- illegalproto + | | + | +- parenthesis + | | + | +- precedence + | | + | +- printf + | | + | +- prototype + | | + | +- qw + | | + | +- reserved + | | + | +- semicolon + | + +- taint + | + +- threads + | + +- uninitialized + | + +- unpack + | + +- untie + | + +- utf8 ----------+ + | | + | +- non_unicode + | | + | +- nonchar + | | + | +- surrogate + | + +- void Just like the "strict" pragma any of these categories can be combined @@ -335,7 +339,6 @@ Note: In Perl 5.6.1, the lexical warnings category "deprecated" was a sub-category of the "syntax" category. It is now a top-level category in its own right. - =head2 Fatal Warnings X<warning, fatal> diff --git a/gnu/usr.bin/perl/pod/perlootut.pod b/gnu/usr.bin/perl/pod/perlootut.pod index b2e3500b358..e494f2314e3 100644 --- a/gnu/usr.bin/perl/pod/perlootut.pod +++ b/gnu/usr.bin/perl/pod/perlootut.pod @@ -10,7 +10,13 @@ perlootut - Object-Oriented Programming in Perl Tutorial =head1 DATE -This document was created in February, 2011. +This document was created in February, 2011, and the last major +revision was in February, 2013. + +If you are reading this in the future then it's possible that the state +of the art has changed. We recommend you start by reading the perlootut +document in the latest stable release of Perl, rather than this +version. =head1 DESCRIPTION @@ -218,8 +224,8 @@ Polymorphism is one of the key concepts of object-oriented design. 
=head2 Inheritance B<Inheritance> lets you create a specialized version of an existing -class. Inheritance lets the new class to reuse the methods and -attributes of another class. +class. Inheritance lets the new class reuse the methods and attributes +of another class. For example, we could create an C<File::MP3> class which B<inherits> from C<File>. An C<File::MP3> B<is-a> I<more specific> type of C<File>. @@ -576,27 +582,30 @@ compiler. If you need to install your software on a system without a compiler, or if having I<any> dependencies is a problem, then C<Moose> may not be right for you. -=head3 Mouse +=head3 Moo If you try C<Moose> and find that one of these issues is preventing you -from using C<Moose>, we encourage you to consider L<Mouse> next. -C<Mouse> implements a subset of C<Moose>'s functionality in a simpler -package. For all features that it does implement, the end-user API is -I<identical> to C<Moose>, meaning you can switch from C<Mouse> to +from using C<Moose>, we encourage you to consider L<Moo> next. C<Moo> +implements a subset of C<Moose>'s functionality in a simpler package. +For most features that it does implement, the end-user API is +I<identical> to C<Moose>, meaning you can switch from C<Moo> to C<Moose> quite easily. -C<Mouse> does not implement most of C<Moose>'s introspection API, so -it's often faster when loading your modules. Additionally, all of its -I<required> dependencies ship with the Perl core, and it can run -without a compiler. If you do have a compiler, C<Mouse> will use it to -compile some of its code for a speed boost. +C<Moo> does not implement most of C<Moose>'s introspection API, so it's +often faster when loading your modules. Additionally, none of its +dependencies require XS, so it can be installed on machines without a +compiler. + +One of C<Moo>'s most compelling features is its interoperability with +C<Moose>. 
When someone tries to use C<Moose>'s introspection API on a +C<Moo> class or role, it is transparently inflated into a C<Moose> +class or role. This makes it easier to incorporate C<Moo>-using code +into a C<Moose> code base and vice versa. -Finally, it ships with a C<Mouse::Tiny> module that takes most of -C<Mouse>'s features and bundles them up in a single module file. You -can copy this module file into your application's library directory for -easy bundling. +For example, a C<Moose> class can subclass a C<Moo> class using +C<extends> or consume a C<Moo> role using C<with>. -The C<Moose> authors hope that one day C<Mouse> can be made obsolete by +The C<Moose> authors hope that one day C<Moo> can be made obsolete by improving C<Moose> enough, but for now it provides a worthwhile alternative to C<Moose>. @@ -683,8 +692,8 @@ Here's a brief recap of the options we covered: =item * L<Moose> C<Moose> is the maximal option. It has a lot of features, a big -ecosystem, and a thriving user base. We also covered L<Mouse> briefly. -C<Mouse> is C<Moose> lite, and a reasonable alternative when Moose +ecosystem, and a thriving user base. We also covered L<Moo> briefly. +C<Moo> is C<Moose> lite, and a reasonable alternative when Moose doesn't work for your application. =item * L<Class::Accessor> diff --git a/gnu/usr.bin/perl/pod/perlpacktut.pod b/gnu/usr.bin/perl/pod/perlpacktut.pod index 2ce56622b75..b0b5bdfd7f4 100644 --- a/gnu/usr.bin/perl/pod/perlpacktut.pod +++ b/gnu/usr.bin/perl/pod/perlpacktut.pod @@ -85,7 +85,7 @@ with ASCII character coding, it will print C<0123456789>. Let's suppose you've got to read in a data file like this: Date |Description | Income|Expenditure - 01/24/2001 Ahmed's Camel Emporium 1147.99 + 01/24/2001 Zed's Camel Emporium 1147.99 01/28/2001 Flea spray 24.99 01/29/2001 Camel rides to tourists 235.00 @@ -200,7 +200,7 @@ how much we've spent: Oh, hmm. That didn't quite work. 
Let's see what happened: - 01/24/2001 Ahmed's Camel Emporium 1147.99 + 01/24/2001 Zed's Camel Emporium 1147.99 01/28/2001 Flea spray 24.99 01/29/2001 Camel rides to tourists 1235.00 03/23/2001Totals 1235.001172.98 @@ -225,7 +225,7 @@ additional spaces to line up our fields, like this: but they don't translate to spaces in the output.) Here's what we got this time: - 01/24/2001 Ahmed's Camel Emporium 1147.99 + 01/24/2001 Zed's Camel Emporium 1147.99 01/28/2001 Flea spray 24.99 01/29/2001 Camel rides to tourists 1235.00 03/23/2001 Totals 1235.00 1172.98 diff --git a/gnu/usr.bin/perl/pod/perlpodstyle.pod b/gnu/usr.bin/perl/pod/perlpodstyle.pod index 850f38dc8d9..6c4cfa04afc 100644 --- a/gnu/usr.bin/perl/pod/perlpodstyle.pod +++ b/gnu/usr.bin/perl/pod/perlpodstyle.pod @@ -5,70 +5,37 @@ perlpodstyle - Perl POD style guide =head1 DESCRIPTION These are general guidelines for how to write POD documentation for Perl -scripts and modules, based on general guidelines for writing good Unix man +scripts and modules, based on general guidelines for writing good UNIX man pages. All of these guidelines are, of course, optional, but following them will make your documentation more consistent with other documentation on the system. -Here are some simple guidelines for markup; see L<perlpod> for details. - -=over - -=item bold (BE<lt>E<gt>) - -B<NOTE: Use extremely rarely.> Do I<not> use bold for emphasis; that's -what italics are for. Restrict bold for notices like B<NOTE:> and -B<WARNING:>. However, program arguments and options--but I<not> their -names!--are written in bold (using BE<lt>E<gt>) to distinguish the B<-f> -command-line option from the C<-f> filetest operator. - -=item italic (IE<lt>E<gt>) - -Use italic to emphasize text, like I<this>. Function names are +The name of the program being documented is conventionally written in bold +(using BE<lt>E<gt>) wherever it occurs, as are all program options. +Arguments should be written in italics (IE<lt>E<gt>). 
Function names are traditionally written in italics; if you write a function as function(), -Pod::Man will take care of this for you. Names of programs, including the -name of the program being documented, are conventionally written in italics -(using IE<lt>E<gt>) wherever they occur in normal roman text. - -=item code (CE<lt>E<gt>) - -Literal code should be in CE<lt>E<gt>. However metasyntactic placeholders -should furthermore be nested in "italics" (actually, oblique) like -CE<lt>IE<lt>E<gt>E<gt>. That way -CE<lt>accept(IE<lt>NEWSOCKETE<gt>, E<lt>GENERICSOCKETE<gt>)E<gt> -renders as C<accept(I<NEWSOCKET>, I<GENERICSOCKET>)>. - -=item files (FE<lt>E<gt>) - -Filenames, whether absolute or relative, are specified with the FE<lt>E<gt> -markup. This will render as italics, but has other semantic connotations. - -=back - -References to other man pages should be in the form "manpage(section)" or -"C<LE<lt>manpage(section)E<gt>>", and Pod::Man will automatically format -those appropriately. Both will render as I<manpage>(section). The second -form, with LE<lt>E<gt>, is used to request that a POD formatter make a link -to the man page if possible. As an exception, one normally omits the -section when referring to module documentation because not all systems -place it in section 3, although that is the default. You may use -C<LE<lt>Module::NameE<gt>> for module references instead, but this is -optional because the translators are supposed to recognize module -references in pod, just as they do variable references like $foo and such. +Pod::Man will take care of this for you. Literal code or commands should +be in CE<lt>E<gt>. References to other man pages should be in the form +C<manpage(section)> or C<LE<lt>manpage(section)E<gt>>, and Pod::Man will +automatically format those appropriately. The second form, with +LE<lt>E<gt>, is used to request that a POD formatter make a link to the +man page if possible. 
As an exception, one normally omits the section +when referring to module documentation since it's not clear what section +module documentation will be in; use C<LE<lt>Module::NameE<gt>> for module +references instead. References to other programs or functions are normally in the form of man page references so that cross-referencing tools can provide the user with links and the like. It's possible to overdo this, though, so be careful not to clutter your documentation with too much markup. References to other programs that are not given as man page references should be enclosed in -italics via IE<lt>E<gt>. +BE<lt>E<gt>. -Major headers should be set out using a C<=head1> directive, and are -historically written in the rather startling ALL UPPER CASE format; this is -not mandatory, but it's strongly recommended so that sections have -consistent naming across different software packages. The translators are -supposed to translate all caps into small caps. Minor headers may be -included using C<=head2>, and are typically in mixed case. +The major headers should be set out using a C<=head1> directive, and are +historically written in the rather startling ALL UPPER CASE format; this +is not mandatory, but it's strongly recommended so that sections have +consistent naming across different software packages. Minor headers may +be included using C<=head2>, and are typically in mixed case. The standard sections of a manual page are: @@ -87,7 +54,7 @@ function documented by this POD page should be listed, separated by a comma and a space. For a Perl module, just give the module name. A single dash, and only a single dash, should separate the list of programs or functions from the description. Do not use any markup such as -CE<lt>E<gt> or IE<lt>E<gt> anywhere in this line. Functions should not be +CE<lt>E<gt> or BE<lt>E<gt> anywhere in this line. Functions should not be qualified with C<()> or the like. 
The description should ideally fit on a single line, even if a man program replaces the dash with a few tabs. @@ -229,7 +196,7 @@ Miscellaneous commentary. =item AUTHOR Who wrote it (use AUTHORS for multiple people). It's a good idea to -include your current email address (or some email address to which bug +include your current e-mail address (or some e-mail address to which bug reports should be sent) or some other contact information so that users have a way of contacting you. Remember that program documentation tends to roam the wild for far longer than you expect and pick a contact method @@ -294,22 +261,12 @@ handlers. These headings are primarily useful when documenting parts of a C library. Finally, as a general note, try not to use an excessive amount of markup. -As documented here and in L<Pod::Man>, you can safely leave Perl variables, -module names, function names, man page references, and the like unadorned -by markup, and the POD translators will figure it all out for you. This -makes it much easier to later edit the documentation. Note that many -existing translators will do the wrong thing with email addresses when -wrapped in LE<lt>E<gt>, so don't do that. - -You can check whether your documentation looks right by running - - % pod2text -o something.pod | less - -If you have I<groff> installed, you can get an even better look this way: - - % pod2man something.pod | groff -Tps -mandoc > something.ps - -Now view the resulting Postscript file to see whether everything checks out. +As documented here and in L<Pod::Man>, you can safely leave Perl +variables, function names, man page references, and the like unadorned by +markup and the POD translators will figure it out for you. This makes it +much easier to later edit the documentation. Note that many existing +translators will do the wrong thing with e-mail addresses when wrapped in +LE<lt>E<gt>, so don't do that. 
=head1 SEE ALSO diff --git a/gnu/usr.bin/perl/pod/perlpolicy.pod b/gnu/usr.bin/perl/pod/perlpolicy.pod index 7e713b4920a..ed02fca8852 100644 --- a/gnu/usr.bin/perl/pod/perlpolicy.pod +++ b/gnu/usr.bin/perl/pod/perlpolicy.pod @@ -207,11 +207,15 @@ an experimental feature useful and want to help shape its future. If something in the Perl core is marked as B<deprecated>, we may remove it from the core in the next stable release series, though we may not. As of Perl 5.12, deprecated features and modules warn the user as they're used. -If you use a deprecated feature and believe that its removal from the Perl -core would be a mistake, please contact the perl5-porters mailinglist and -plead your case. We don't deprecate things without a good reason, but -sometimes there's a counterargument we haven't considered. Historically, -we did not distinguish between "deprecated" and "discouraged" features. +When a module is deprecated, it will also be made available on CPAN. +Installing it from CPAN will silence deprecation warnings for that module. + +If you use a deprecated feature or module and believe that its removal from +the Perl core would be a mistake, please contact the perl5-porters +mailinglist and plead your case. We don't deprecate things without a good +reason, but sometimes there's a counterargument we haven't considered. +Historically, we did not distinguish between "deprecated" and "discouraged" +features. =item discouraged @@ -225,7 +229,8 @@ significant improvement to the Perl core. Once a feature, construct or module has been marked as deprecated for a stable release cycle, we may remove it from the Perl core. Unsurprisingly, -we say we've B<removed> these things. +we say we've B<removed> these things. When a module is removed, it will +no longer ship with Perl, but will continue to be available on CPAN. =back @@ -306,12 +311,12 @@ talk to a pumpking.) 
=head2 Getting changes into a maint branch Historically, only the pumpking cherry-picked changes from bleadperl -into maintperl. This has...scaling problems. At the same time, +into maintperl. This has scaling problems. At the same time, maintenance branches of stable versions of Perl need to be treated with -great care. To that end, we're going to try out a new process for -maint-5.12. +great care. To that end, as of Perl 5.12, we have a new process for +maint branches. -Any committer may cherry-pick any commit from blead to maint-5.12 if +Any committer may cherry-pick any commit from blead to a maint branch if they send mail to perl5-porters announcing their intent to cherry-pick a specific commit along with a rationale for doing so and at least two other committers respond to the list giving their assent. (This policy diff --git a/gnu/usr.bin/perl/pod/perlport.pod b/gnu/usr.bin/perl/pod/perlport.pod index 867b66e2915..cdde52db3cb 100644 --- a/gnu/usr.bin/perl/pod/perlport.pod +++ b/gnu/usr.bin/perl/pod/perlport.pod @@ -212,7 +212,7 @@ them in big-endian mode. To avoid this problem in network (socket) connections use the C<pack> and C<unpack> formats C<n> and C<N>, the "network" orders. These are guaranteed to be portable. -As of perl 5.9.2, you can also use the C<E<gt>> and C<E<lt>> modifiers +As of perl 5.10.0, you can also use the C<E<gt>> and C<E<lt>> modifiers to force big- or little-endian byte-order. This is useful if you want to store signed integers or 64-bit integers, for example. @@ -236,9 +236,9 @@ transferring or storing raw binary numbers. One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw -binary, or else consider using modules like Data::Dumper (included in -the standard distribution as of Perl 5.005) and Storable (included as -of perl 5.8). Keeping all data as text significantly simplifies matters. 
+binary, or else consider using modules like Data::Dumper and Storable +(included as of perl 5.8). Keeping all data as text significantly +simplifies matters. The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's how far EBCDIC, or more precisely UTF-EBCDIC will go. @@ -679,9 +679,8 @@ ISO 8859-1 bytes beyond 0x7f into your strings might cause trouble later. If the bytes are native 8-bit bytes, you can use the C<bytes> pragma. If the bytes are in a string (regular expression being a curious string), you can often also use the C<\xHH> notation instead -of embedding the bytes as-is. (If you want to write your code in UTF-8, -you can use the C<utf8>.) The C<bytes> and C<utf8> pragmata are -available since Perl 5.6.0. +of embedding the bytes as-is. If you want to write your code in UTF-8, +you can use the C<utf8>. =head2 System Resources @@ -777,8 +776,8 @@ Testing results: L<http://www.cpantesters.org/> =head1 PLATFORMS -As of version 5.002, Perl is built with a C<$^O> variable that -indicates the operating system it was built on. This was implemented +Perl is built with a C<$^O> variable that indicates the operating +system it was built on. This was implemented to help speed up code that would otherwise have to C<use Config> and use the value of C<$Config{osname}>. Of course, to get more detailed information about the system, looking into C<%Config> is @@ -1196,12 +1195,12 @@ trailing apostrophe. Although an extended file name is limited to 255 characters, a path name is still limited to 256 characters. -The value of C<$^O> on VOS is "VOS". To determine the +The value of C<$^O> on VOS is "vos". 
To determine the architecture that you are running on without resorting to loading all of C<%Config> you can examine the content of the @INC array like so: - if ($^O =~ /VOS/) { + if ($^O =~ /vos/) { print "I'm on a Stratus box!\n"; } else { print "I'm not on a Stratus box!\n"; @@ -1220,14 +1219,18 @@ F<README.vos> (installed as L<perlvos>) The VOS mailing list. -There is no specific mailing list for Perl on VOS. You can post -comments to the comp.sys.stratus newsgroup, or use the contact -information located in the distribution files on the Stratus -Anonymous FTP site. +There is no specific mailing list for Perl on VOS. You can contact +the Stratus Technologies Customer Assistance Center (CAC) for your +region, or you can use the contact information located in the +distribution files on the Stratus Anonymous FTP site. =item * -VOS Perl on the web at L<http://ftp.stratus.com/pub/vos/posix/posix.html> +Stratus Technologies on the web at L<http://www.stratus.com> + +=item * + +VOS Open-Source Software on the web at L<http://ftp.stratus.com/pub/vos/vos.html> =back @@ -1241,7 +1244,7 @@ systems). On the mainframe perl currently works under the "Unix system services for OS/390" (formerly known as OpenEdition), VM/ESA OpenEdition, or the BS200 POSIX-BC system (BS2000 is supported in perl 5.6 and greater). See L<perlos390> for details. Note that for OS/400 there is also a port of -Perl 5.8.1/5.9.0 or later to the PASE which is ASCII-based (as opposed to +Perl 5.8.1/5.10.0 or later to the PASE which is ASCII-based (as opposed to ILE which is EBCDIC-based), see L<perlos400>. 
As of R2.5 of USS for OS/390 and Version 2.3 of VM/ESA these Unix @@ -1280,7 +1283,7 @@ and C<|>, not to mention dealing with socket interfaces to ASCII computers Fortunately, most web servers for the mainframe will correctly translate the C<\n> in the following statement to its ASCII equivalent -(C<\r> is the same under both Unix and OS/390 & VM/ESA): +(C<\r> is the same under both Unix and OS/390): print "Content-type: text/html\r\n\r\n"; @@ -1291,7 +1294,6 @@ The values of C<$^O> on some of these platforms includes: OS/390 os390 os390 OS400 os400 os400 POSIX-BC posix-bc BS2000-posix-bc - VM/ESA vmesa vmesa Some simple tricks for determining if you are running on an EBCDIC platform could include any of the following (perhaps all): @@ -1313,8 +1315,7 @@ Also see: =item * -L<perlos390>, F<README.os390>, F<perlbs2000>, F<README.vmesa>, -L<perlebcdic>. +L<perlos390>, F<README.os390>, F<perlbs2000>, L<perlebcdic>. =item * @@ -1436,7 +1437,7 @@ in C<$^O> is "riscos" (because we don't like shouting). =head2 Other perls Perl has been ported to many platforms that do not fit into any of -the categories listed above. Some, such as AmigaOS, BeOS, HP MPE/iX, +the categories listed above. Some, such as AmigaOS, QNX, Plan 9, and VOS, have been well-integrated into the standard Perl source code kit. You may need to see the F<ports/> directory on CPAN for information, and possibly binaries, for the likes of: @@ -1450,8 +1451,6 @@ in the "OTHER" category include: OS $^O $Config{'archname'} ------------------------------------------ Amiga DOS amigaos m68k-amigos - BeOS beos - MPE/iX mpeix PA-RISC1.1 See also: @@ -1463,15 +1462,6 @@ Amiga, F<README.amiga> (installed as L<perlamiga>). 
=item * -Be OS, F<README.beos> - -=item * - -HP 300 MPE/iX, F<README.mpeix> and Mark Bixby's web page -L<http://www.bixby.org/mark/porting.html> - -=item * - A free perl5-based PERL.NLM for Novell Netware is available in precompiled binary and source code form from L<http://www.novell.com/> as well as from CPAN. @@ -1586,7 +1576,7 @@ A little funky, because VOS's notion of ownership is a little funky (VOS). =item chroot -Not implemented. (Win32, VMS, S<Plan 9>, S<RISC OS>, VOS, VM/ESA) +Not implemented. (Win32, VMS, S<Plan 9>, S<RISC OS>, VOS) =item crypt @@ -1611,8 +1601,6 @@ Invokes VMS debugger. (VMS) =item exec -Implemented via Spawn. (VM/ESA) - Does not automatically flush output handles on some platforms. (SunOS, Solaris, HP-UX) @@ -1651,7 +1639,7 @@ Not implemented (VMS, S<RISC OS>, VOS). =item fork -Not implemented. (AmigaOS, S<RISC OS>, VM/ESA, VMS) +Not implemented. (AmigaOS, S<RISC OS>, VMS) Emulated using multiple interpreters. See L<perlfork>. (Win32) @@ -1672,7 +1660,7 @@ Not implemented. (Win32, S<RISC OS>) =item getpriority -Not implemented. (Win32, VMS, S<RISC OS>, VOS, VM/ESA) +Not implemented. (Win32, VMS, S<RISC OS>, VOS) =item getpwnam @@ -1708,11 +1696,11 @@ Not implemented. (Win32, S<Plan 9>) =item getpwent -Not implemented. (Win32, VM/ESA) +Not implemented. (Win32) =item getgrent -Not implemented. (Win32, VMS, VM/ESA) +Not implemented. (Win32, VMS) =item gethostbyname @@ -1753,11 +1741,11 @@ Not implemented. (S<Plan 9>, Win32, S<RISC OS>) =item endpwent -Not implemented. (MPE/iX, VM/ESA, Win32) +Not implemented. (Win32) =item endgrent -Not implemented. (MPE/iX, S<RISC OS>, VM/ESA, VMS, Win32) +Not implemented. (S<RISC OS>, VMS, Win32) =item endhostent @@ -1824,7 +1812,7 @@ numbers. (VMS) =item link -Not implemented. (MPE/iX, S<RISC OS>, VOS) +Not implemented. (S<RISC OS>, VOS) Link count not updated because hard links are not quite that hard (They are sort of half-way between hard and soft links). 
(AmigaOS) @@ -1898,7 +1886,7 @@ Not implemented. (Win32, VMS, S<RISC OS>) =item setgrent -Not implemented. (MPE/iX, VMS, Win32, S<RISC OS>) +Not implemented. (VMS, Win32, S<RISC OS>) =item setpgrp @@ -1910,7 +1898,7 @@ Not implemented. (Win32, VMS, S<RISC OS>, VOS) =item setpwent -Not implemented. (MPE/iX, Win32, S<RISC OS>) +Not implemented. (Win32, S<RISC OS>) =item setsockopt @@ -1924,7 +1912,13 @@ Not implemented. (S<Plan 9>) =item shmwrite -Not implemented. (Win32, VMS, S<RISC OS>, VOS) +Not implemented. (Win32, VMS, S<RISC OS>) + +=item sleep + +Emulated using synchronization functions such that it can be +interrupted by alarm(), and limited to a maximum of 4294967 seconds, +approximately 49 days. (Win32) =item sockatmark @@ -1933,9 +1927,7 @@ be implemented even in Unix platforms. =item socketpair -Not implemented. (S<RISC OS>, VM/ESA) - -Available on OpenVOS Release 17.0 or later. (VOS) +Not implemented. (S<RISC OS>) Available on 64 bit OpenVMS 8.2 and later. (VMS) @@ -1976,14 +1968,14 @@ syntax if it is intended to resolve to a valid path. =item syscall -Not implemented. (Win32, VMS, S<RISC OS>, VOS, VM/ESA) +Not implemented. (Win32, VMS, S<RISC OS>, VOS) =item sysopen The traditional "0", "1", and "2" MODEs are implemented with different numeric values on some systems. The flags exported by C<Fcntl> (O_RDONLY, O_WRONLY, O_RDWR) should work everywhere though. (S<Mac -OS>, OS/390, VM/ESA) +OS>, OS/390) =item system @@ -2038,14 +2030,14 @@ should not be held open elsewhere. (Win32) =item umask -Returns undef where unavailable, as of version 5.005. +Returns undef where unavailable. C<umask> works but the correct permissions are set only when the file is finally closed. (AmigaOS) =item utime -Only the modification time is updated. (S<BeOS>, VMS, S<RISC OS>) +Only the modification time is updated. (VMS, S<RISC OS>) May not behave as expected. 
Behavior depends on the C runtime library's implementation of utime(), and the filesystem being @@ -2127,10 +2119,14 @@ at L<http://www.cpan.org/src> =item Dragonfly BSD +=item Midnight BSD + =item QNX Neutrino RTOS (6.5.0) =item MirOS BSD +=item Stratus OpenVOS (17.0 or later) + Caveats: =over @@ -2221,7 +2217,7 @@ available at L<http://www.cpan.org/src/> UNICOS UNICOS/mk UTS - VOS + VOS / OpenVOS Win95/98/ME/2K/XP 2) WinCE z/OS (formerly OS/390) @@ -2312,14 +2308,13 @@ L<http://www.cpan.org/ports/index.html> for binary distributions. =head1 SEE ALSO -L<perlaix>, L<perlamiga>, L<perlbeos>, L<perlbs2000>, -L<perlce>, L<perlcygwin>, L<perldgux>, L<perldos>, L<perlepoc>, +L<perlaix>, L<perlamiga>, L<perlbs2000>, +L<perlce>, L<perlcygwin>, L<perldgux>, L<perldos>, L<perlebcdic>, L<perlfreebsd>, L<perlhurd>, L<perlhpux>, L<perlirix>, -L<perlmacos>, L<perlmacosx>, L<perlmpeix>, +L<perlmacos>, L<perlmacosx>, L<perlnetware>, L<perlos2>, L<perlos390>, L<perlos400>, L<perlplan9>, L<perlqnx>, L<perlsolaris>, L<perltru64>, -L<perlunicode>, L<perlvmesa>, L<perlvms>, L<perlvos>, -L<perlwin32>, and L<Win32>. +L<perlunicode>, L<perlvms>, L<perlvos>, L<perlwin32>, and L<Win32>. 
=head1 AUTHORS / CONTRIBUTORS diff --git a/gnu/usr.bin/perl/pod/perlpragma.pod b/gnu/usr.bin/perl/pod/perlpragma.pod index 604387d9f97..78dacbf1741 100644 --- a/gnu/usr.bin/perl/pod/perlpragma.pod +++ b/gnu/usr.bin/perl/pod/perlpragma.pod @@ -16,22 +16,22 @@ mathematical operators, and would like to provide your own pragma that functions much like C<use integer;> You'd like this code use MyMaths; - + my $l = MyMaths->new(1.2); my $r = MyMaths->new(3.4); - + print "A: ", $l + $r, "\n"; - + use myint; print "B: ", $l + $r, "\n"; - + { no myint; print "C: ", $l + $r, "\n"; } - + print "D: ", $l + $r, "\n"; - + no myint; print "E: ", $l + $r, "\n"; @@ -63,12 +63,12 @@ this: $$l + $$r; } }; - + sub new { my ($class, $value) = @_; bless \$value, $class; } - + 1; Note how we load the user pragma C<myint> with an empty list C<()> to @@ -77,24 +77,24 @@ prevent its C<import> being called. The interaction with the Perl compilation happens inside package C<myint>: package myint; - + use strict; use warnings; - + sub import { $^H{"myint/in_effect"} = 1; } - + sub unimport { $^H{"myint/in_effect"} = 0; } - + sub in_effect { my $level = shift // 0; my $hinthash = (caller($level))[10]; return $hinthash->{"myint/in_effect"}; } - + 1; As pragmata are implemented as modules, like any other module, C<use myint;> diff --git a/gnu/usr.bin/perl/pod/perlreapi.pod b/gnu/usr.bin/perl/pod/perlreapi.pod index 5e456208684..eaaa1790d56 100644 --- a/gnu/usr.bin/perl/pod/perlreapi.pod +++ b/gnu/usr.bin/perl/pod/perlreapi.pod @@ -1,39 +1,57 @@ =head1 NAME -perlreapi - perl regular expression plugin interface +perlreapi - Perl regular expression plugin interface =head1 DESCRIPTION -As of Perl 5.9.5 there is a new interface for plugging and using other -regular expression engines than the default one. +As of Perl 5.9.5 there is a new interface for plugging and using +regular expression engines other than the default one. 
 Each engine is supposed to provide access to a constant structure of the
 following format:

     typedef struct regexp_engine {
-        REGEXP* (*comp) (pTHX_ const SV * const pattern, const U32 flags);
-        I32     (*exec) (pTHX_ REGEXP * const rx, char* stringarg, char* strend,
-                         char* strbeg, I32 minend, SV* screamer,
+        REGEXP* (*comp) (pTHX_
+                         const SV * const pattern, const U32 flags);
+        I32     (*exec) (pTHX_
+                         REGEXP * const rx,
+                         char* stringarg,
+                         char* strend, char* strbeg,
+                         I32 minend, SV* screamer,
                          void* data, U32 flags);
-        char*   (*intuit) (pTHX_ REGEXP * const rx, SV *sv, char *strpos,
-                           char *strend, U32 flags,
+        char*   (*intuit) (pTHX_
+                           REGEXP * const rx, SV *sv,
+                           char *strpos, char *strend, U32 flags,
                            struct re_scream_pos_data_s *data);
         SV*     (*checkstr) (pTHX_ REGEXP * const rx);
         void    (*free) (pTHX_ REGEXP * const rx);
-        void    (*numbered_buff_FETCH) (pTHX_ REGEXP * const rx, const I32 paren,
-                                        SV * const sv);
-        void    (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
-                                        SV const * const value);
-        I32     (*numbered_buff_LENGTH) (pTHX_ REGEXP * const rx, const SV * const sv,
-                                         const I32 paren);
-        SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
-                               SV * const value, U32 flags);
-        SV*     (*named_buff_iter) (pTHX_ REGEXP * const rx, const SV * const lastkey,
+        void    (*numbered_buff_FETCH) (pTHX_
+                                        REGEXP * const rx,
+                                        const I32 paren,
+                                        SV * const sv);
+        void    (*numbered_buff_STORE) (pTHX_
+                                        REGEXP * const rx,
+                                        const I32 paren,
+                                        SV const * const value);
+        I32     (*numbered_buff_LENGTH) (pTHX_
+                                         REGEXP * const rx,
+                                         const SV * const sv,
+                                         const I32 paren);
+        SV*     (*named_buff) (pTHX_
+                               REGEXP * const rx,
+                               SV * const key,
+                               SV * const value,
+                               U32 flags);
+        SV*     (*named_buff_iter) (pTHX_
+                                    REGEXP * const rx,
+                                    const SV * const lastkey,
                                     const U32 flags);
         SV*     (*qr_package)(pTHX_ REGEXP * const rx);
     #ifdef USE_ITHREADS
         void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
     #endif
+        REGEXP* (*op_comp) (...);
+

 When a regexp is compiled, its C<engine> field is then set to point at
 the appropriate structure, so that when it needs to be used Perl can find
@@ -41,11 +59,11 @@ the right routines to do so.

 In order to install a new regexp handler, C<$^H{regcomp}> is set
 to an integer which (when casted appropriately) resolves to one of these
-structures. When compiling, the C<comp> method is executed, and the
-resulting regexp structure's engine field is expected to point back at
+structures.  When compiling, the C<comp> method is executed, and the
+resulting C<regexp> structure's engine field is expected to point back at
 the same structure.

-The pTHX_ symbol in the definition is a macro used by perl under threading
+The pTHX_ symbol in the definition is a macro used by Perl under threading
 to provide an extra argument to the routine holding a pointer back to
 the interpreter that is executing the regexp. So under threading all
 routines get an extra argument.
@@ -58,43 +76,43 @@ routines get an extra argument.

 Compile the pattern stored in C<pattern> using the given C<flags> and
 return a pointer to a prepared C<REGEXP> structure that can perform
-the match. See L</The REGEXP structure> below for an explanation of
+the match.  See L</The REGEXP structure> below for an explanation of
 the individual fields in the REGEXP struct.

 The C<pattern> parameter is the scalar that was used as the
-pattern. previous versions of perl would pass two C<char*> indicating
-the start and end of the stringified pattern, the following snippet can
+pattern.  Previous versions of Perl would pass two C<char*> indicating
+the start and end of the stringified pattern; the following snippet can
 be used to get the old parameters:

     STRLEN plen;
     char* exp = SvPV(pattern, plen);
     char* xend = exp + plen;

-Since any scalar can be passed as a pattern it's possible to implement
+Since any scalar can be passed as a pattern, it's possible to implement
 an engine that does something with an array (C<< "ook" =~ [ qw/ eek
 hlagh / ] >>) or with the non-stringified form of a compiled regular
-expression (C<< "ook" =~ qr/eek/ >>). perl's own engine will always
-stringify everything using the snippet above but that doesn't mean
+expression (C<< "ook" =~ qr/eek/ >>).  Perl's own engine will always
+stringify everything using the snippet above, but that doesn't mean
 other engines have to.

 The C<flags> parameter is a bitfield which indicates which of the
-C<msixp> flags the regex was compiled with. It also contains
-additional info such as whether C<use locale> is in effect.
+C<msixp> flags the regex was compiled with.  It also contains
+additional info, such as if C<use locale> is in effect.

 The C<eogc> flags are stripped out before being passed to the comp
-routine. The regex engine does not need to know whether any of these
-are set as those flags should only affect what perl does with the
+routine.  The regex engine does not need to know if any of these
+are set, as those flags should only affect what Perl does with the
 pattern and its match variables, not how it gets compiled and
 executed.

 By the time the comp callback is called, some of these flags have
-already had effect (noted below where applicable). However most of
-their effect occurs after the comp callback has run in routines that
+already had effect (noted below where applicable).  However most of
+their effect occurs after the comp callback has run, in routines that
 read the C<< rx->extflags >> field which it populates.
 
 In general the flags should be preserved in C<< rx->extflags >> after
 compilation, although the regex engine might want to add or delete
-some of them to invoke or disable some special behavior in perl. The
+some of them to invoke or disable some special behavior in Perl.  The
 flags along with any special behavior they cause are documented below:

 The pattern modifiers:
@@ -113,7 +131,7 @@ as a multi-line string.

 =item C</x> - RXf_PMf_EXTENDED

-If present on a regex C<#> comments will be handled differently by the
+If present on a regex, C<"#"> comments will be handled differently by the
 tokenizer in some cases.

 TODO: Document those cases.
@@ -131,7 +149,7 @@ C<get_regex_charset(const U32 flags)>. The only currently documented
 value returned from it is REGEX_LOCALE_CHARSET, which is set if
 C<use locale> is in effect. If present in C<< rx->extflags >>,
 C<split> will use the locale dependent definition of whitespace
-when RXf_SKIPWHITE or RXf_WHITE is in effect. ASCII whitespace
+when RXf_SKIPWHITE or RXf_WHITE is in effect.  ASCII whitespace
 is defined as per L<isSPACE|perlapi/isSPACE>, and by the internal
 macros C<is_utf8_space> under UTF-8, and C<isSPACE_LC> under C<use
 locale>.
@@ -142,21 +160,16 @@ Additional flags:

 =over 4

-=item RXf_UTF8
-
-Set if the pattern is L<SvUTF8()|perlapi/SvUTF8>, set by Perl_pmruntime.
-
-A regex engine may want to set or disable this flag during
-compilation. The perl engine for instance may upgrade non-UTF-8
-strings to UTF-8 if the pattern includes constructs such as C<\x{...}>
-that can only match Unicode values.
-
 =item RXf_SPLIT

+This flag was removed in perl 5.18.0.  C<split ' '> is now special-cased
+solely in the parser.  RXf_SPLIT is still #defined, so you can test for it.
+This is how it used to work:
+
 If C<split> is invoked as C<split ' '> or with no arguments (which
-really means C<split(' ', $_)>, see L<split|perlfunc/split>), perl will
-set this flag. The regex engine can then check for it and set the
-SKIPWHITE and WHITE extflags. To do this the perl engine does:
+really means C<split(' ', $_)>, see L<split|perlfunc/split>), Perl will
+set this flag.  The regex engine can then check for it and set the
+SKIPWHITE and WHITE extflags.  To do this, the Perl engine does:

     if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
         r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);
@@ -170,13 +183,16 @@ the C<split> operator.

 =item RXf_SKIPWHITE

+This flag was removed in perl 5.18.0.  It is still #defined, so you can
+set it, but doing so will have no effect.  This is how it used to work:
+
 If the flag is present in C<< rx->extflags >> C<split> will delete
 whitespace from the start of the subject string before it's operated
-on. What is considered whitespace depends on whether the subject is a
-UTF-8 string and whether the C<RXf_PMf_LOCALE> flag is set.
+on.  What is considered whitespace depends on if the subject is a
+UTF-8 string and if the C<RXf_PMf_LOCALE> flag is set.

-If RXf_WHITE is set in addition to this flag C<split> will behave like
-C<split " "> under the perl engine.
+If RXf_WHITE is set in addition to this flag, C<split> will behave like
+C<split " "> under the Perl engine.

 =item RXf_START_ONLY

@@ -184,29 +200,37 @@ Tells the split operator to split the target string on newlines
 (C<\n>) without invoking the regex engine.

 Perl's engine sets this if the pattern is C</^/> (C<plen == 1 && *exp
-== '^'>), even under C</^/s>, see L<split|perlfunc>. Of course a
+== '^'>), even under C</^/s>; see L<split|perlfunc>.  Of course a
 different regex engine might want to use the same optimizations
 with a different syntax.

 =item RXf_WHITE

 Tells the split operator to split the target string on whitespace
-without invoking the regex engine. The definition of whitespace varies
-depending on whether the target string is a UTF-8 string and on
-whether RXf_PMf_LOCALE is set.
+without invoking the regex engine.  The definition of whitespace varies
+depending on if the target string is a UTF-8 string and on
+if RXf_PMf_LOCALE is set.

 Perl's engine sets this flag if the pattern is C<\s+>.

 =item RXf_NULL

 Tells the split operator to split the target string on
-characters. The definition of character varies depending on whether
+characters.  The definition of character varies depending on if
 the target string is a UTF-8 string.

 Perl's engine sets this flag on empty patterns, this optimization
-makes C<split //> much faster than it would otherwise be. It's even
+makes C<split //> much faster than it would otherwise be.  It's even
 faster than C<unpack>.

+=item RXf_NO_INPLACE_SUBST
+
+Added in perl 5.18.0, this flag indicates that a regular expression might
+perform an operation that would interfere with inplace substituion.  For
+instance it might contain lookbehind, or assign to non-magical variables
+(such as $REGMARK and $REGERROR) during matching.  C<s///> will skip
+certain optimisations when this is set.
+
 =back

 =head2 exec
@@ -216,7 +240,49 @@ faster than C<unpack>.
                  I32 minend, SV* screamer,
                  void* data, U32 flags);

-Execute a regexp.
+Execute a regexp.  The arguments are
+
+=over 4
+
+=item rx
+
+The regular expression to execute.
+
+=item screamer
+
+This strangely-named arg is the SV to be matched against.  Note that the
+actual char array to be matched against is supplied by the arguments
+described below; the SV is just used to determine UTF8ness, C<pos()> etc.
+
+=item strbeg
+
+Pointer to the physical start of the string.
+
+=item strend
+
+Pointer to the character following the physical end of the string (i.e.
+the C<\0>).
+
+=item stringarg
+
+Pointer to the position in the string where matching should start; it might
+not be equal to C<strbeg> (for example in a later iteration of C</.../g>).
+
+=item minend
+
+Minimum length of string (measured in bytes from C<stringarg>) that must
+match; if the engine reaches the end of the match but hasn't reached this
+position in the string, it should fail.
+
+=item data
+
+Optimisation data; subject to change.
+
+=item flags
+
+Optimisation flags; subject to change.
+
+=back

 =head2 intuit

@@ -225,9 +291,9 @@ Execute a regexp.
                     const U32 flags, struct re_scream_pos_data_s *data);

 Find the start position where a regex match should be attempted,
-or possibly whether the regex engine should not be run because the
-pattern can't match. This is called as appropriate by the core
-depending on the values of the extflags member of the regexp
+or possibly if the regex engine should not be run because the
+pattern can't match.  This is called, as appropriate, by the core,
+depending on the values of the C<extflags> member of the C<regexp>
 structure.

 =head2 checkstr
@@ -241,10 +307,10 @@ by C<split> for optimising matches.

     void free(pTHX_ REGEXP * const rx);

-Called by perl when it is freeing a regexp pattern so that the engine
+Called by Perl when it is freeing a regexp pattern so that the engine
 can release any resources pointed to by the C<pprivate> member of the
-regexp structure. This is only responsible for freeing private data;
-perl will handle releasing anything else contained in the regexp structure.
+C<regexp> structure.  This is only responsible for freeing private data;
+Perl will handle releasing anything else contained in the C<regexp> structure.

 =head2 Numbered capture callbacks

@@ -252,11 +318,22 @@ Called to get/set the value of C<$`>, C<$'>, C<$&> and their named
 equivalents, ${^PREMATCH}, ${^POSTMATCH} and $^{MATCH}, as well as the
 numbered capture groups (C<$1>, C<$2>, ...).

-The C<paren> parameter will be C<-2> for C<$`>, C<-1> for C<$'>, C<0>
-for C<$&>, C<1> for C<$1> and so forth.
+The C<paren> parameter will be C<1> for C<$1>, C<2> for C<$2> and so
+forth, and have these symbolic values for the special variables:
+
+    ${^PREMATCH}  RX_BUFF_IDX_CARET_PREMATCH
+    ${^POSTMATCH} RX_BUFF_IDX_CARET_POSTMATCH
+    ${^MATCH}     RX_BUFF_IDX_CARET_FULLMATCH
+    $`            RX_BUFF_IDX_PREMATCH
+    $'            RX_BUFF_IDX_POSTMATCH
+    $&            RX_BUFF_IDX_FULLMATCH
+
+Note that in Perl 5.17.3 and earlier, the last three constants were also
+used for the caret variants of the variables.
+

 The names have been chosen by analogy with L<Tie::Scalar> methods
-names with an additional B<LENGTH> callback for efficiency. However
+names with an additional B<LENGTH> callback for efficiency.  However
 named capture variables are currently not tied internally but
 implemented via magic.

@@ -265,25 +342,27 @@ implemented via magic.
     void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
                              SV * const sv);

-Fetch a specified numbered capture. C<sv> should be set to the scalar
+Fetch a specified numbered capture.  C<sv> should be set to the scalar
 to return, the scalar is passed as an argument rather than being
-returned from the function because when it's called perl already has a
+returned from the function because when it's called Perl already has a
 scalar to store the value, creating another one would be
-redundant. The scalar can be set with C<sv_setsv>, C<sv_setpvn> and
+redundant.  The scalar can be set with C<sv_setsv>, C<sv_setpvn> and
 friends, see L<perlapi>.

-This callback is where perl untaints its own capture variables under
-taint mode (see L<perlsec>). See the C<Perl_reg_numbered_buff_fetch>
+This callback is where Perl untaints its own capture variables under
+taint mode (see L<perlsec>).  See the C<Perl_reg_numbered_buff_fetch>
 function in F<regcomp.c> for how to untaint capture variables if
 that's something you'd like your engine to do as well.
 
 =head3 numbered_buff_STORE

-    void (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
+    void (*numbered_buff_STORE) (pTHX_
+                                 REGEXP * const rx,
+                                 const I32 paren,
                                  SV const * const value);

-Set the value of a numbered capture variable. C<value> is the scalar
-that is to be used as the new value. It's up to the engine to make
+Set the value of a numbered capture variable.  C<value> is the scalar
+that is to be used as the new value.  It's up to the engine to make
 sure this is used as the new value (or reject it).

 Example:
@@ -298,8 +377,10 @@ variables, to do this in another engine use the following callback
 (copied from C<Perl_reg_numbered_buff_store>):

     void
-    Example_reg_numbered_buff_store(pTHX_ REGEXP * const rx, const I32 paren,
-                                    SV const * const value)
+    Example_reg_numbered_buff_store(pTHX_
+                                    REGEXP * const rx,
+                                    const I32 paren,
+                                    SV const * const value)
     {
         PERL_UNUSED_ARG(rx);
         PERL_UNUSED_ARG(paren);
@@ -309,10 +390,10 @@ variables, to do this in another engine use the following callback
             Perl_croak(aTHX_ PL_no_modify);
     }

-Actually perl will not I<always> croak in a statement that looks
-like it would modify a numbered capture variable. This is because the
-STORE callback will not be called if perl can determine that it
-doesn't have to modify the value. This is exactly how tied variables
+Actually Perl will not I<always> croak in a statement that looks
+like it would modify a numbered capture variable.  This is because the
+STORE callback will not be called if Perl can determine that it
+doesn't have to modify the value.  This is exactly how tied variables
 behave in the same situation:

     package CaptureVar;
@@ -327,21 +408,23 @@ behave in the same situation:
     tie my $sv => "CaptureVar";
     $sv =~ y/a/b/;

-Because C<$sv> is C<undef> when the C<y///> operator is applied to it
+Because C<$sv> is C<undef> when the C<y///> operator is applied to it,
 the transliteration won't actually execute and the program won't
-C<die>. This is different to how 5.8 and earlier versions behaved
-since the capture variables were READONLY variables then, now they'll
+C<die>.  This is different to how 5.8 and earlier versions behaved
+since the capture variables were READONLY variables then; now they'll
 just die when assigned to in the default engine.

 =head3 numbered_buff_LENGTH

-    I32 numbered_buff_LENGTH (pTHX_ REGEXP * const rx, const SV * const sv,
+    I32 numbered_buff_LENGTH (pTHX_
+                              REGEXP * const rx,
+                              const SV * const sv,
                               const I32 paren);

-Get the C<length> of a capture variable. There's a special callback
-for this so that perl doesn't have to do a FETCH and run C<length> on
-the result, since the length is (in perl's case) known from an offset
-stored in C<< rx->offs >> this is much more efficient:
+Get the C<length> of a capture variable.  There's a special callback
+for this so that Perl doesn't have to do a FETCH and run C<length> on
+the result, since the length is (in Perl's case) known from an offset
+stored in C<< rx->offs >>, this is much more efficient:

     I32 s1 = rx->offs[paren].start;
     I32 s2 = rx->offs[paren].end;
@@ -353,7 +436,7 @@ L<is_utf8_string_loclen|perlapi/is_utf8_string_loclen>.

 =head2 Named capture callbacks

-Called to get/set the value of C<%+> and C<%-> as well as by some
+Called to get/set the value of C<%+> and C<%->, as well as by some
 utility functions in L<re>.

 There are two callbacks, C<named_buff> is called in all the cases the
@@ -362,7 +445,7 @@ would be on changes to C<%+> and C<%-> and C<named_buff_iter> in the
 same cases as FIRSTKEY and NEXTKEY.

 The C<flags> parameter can be used to determine which of these
-operations the callbacks should respond to, the following flags are
+operations the callbacks should respond to.  The following flags are
 currently defined:

 Which L<Tie::Hash> operation is being performed from the Perl level on
@@ -377,13 +460,13 @@ C<%+> or C<%+>, if any:
     RXapif_FIRSTKEY
     RXapif_NEXTKEY

-Whether C<%+> or C<%-> is being operated on, if any.
+If C<%+> or C<%-> is being operated on, if any.

     RXapif_ONE /* %+ */
     RXapif_ALL /* %- */

-Whether this is being called as C<re::regname>, C<re::regnames> or
-C<re::regnames_count>, if any. The first two will be combined with
+If this is being called as C<re::regname>, C<re::regnames> or
+C<re::regnames_count>, if any.  The first two will be combined with
 C<RXapif_ONE> or C<RXapif_ALL>.

     RXapif_REGNAME
@@ -391,10 +474,10 @@ C<RXapif_ONE> or C<RXapif_ALL>.
     RXapif_REGNAMES_COUNT

 Internally C<%+> and C<%-> are implemented with a real tied interface
-via L<Tie::Hash::NamedCapture>. The methods in that package will call
-back into these functions. However the usage of
+via L<Tie::Hash::NamedCapture>.  The methods in that package will call
+back into these functions.  However the usage of
 L<Tie::Hash::NamedCapture> for this purpose might change in future
-releases. For instance this might be implemented by magic instead
+releases.  For instance this might be implemented by magic instead
 (would need an extension to mgvtbl).

 =head3 named_buff
@@ -404,7 +487,9 @@ releases. For instance this might be implemented by magic instead

 =head3 named_buff_iter

-    SV* (*named_buff_iter) (pTHX_ REGEXP * const rx, const SV * const lastkey,
+    SV* (*named_buff_iter) (pTHX_
+                            REGEXP * const rx,
+                            const SV * const lastkey,
                             const U32 flags);

 =head2 qr_package
@@ -412,12 +497,12 @@ releases. For instance this might be implemented by magic instead
     SV* qr_package(pTHX_ REGEXP * const rx);

 The package the qr// magic object is blessed into (as seen by C<ref
-qr//>). It is recommended that engines change this to their package
-name for identification regardless of whether they implement methods
+qr//>).  It is recommended that engines change this to their package
+name for identification regardless of if they implement methods
 on the object.

 The package this method returns should also have the internal
-C<Regexp> package in its C<@ISA>. C<< qr//->isa("Regexp") >> should always
+C<Regexp> package in its C<@ISA>.  C<< qr//->isa("Regexp") >> should always
 be true regardless of what engine is being used.

 Example implementation might be:
@@ -449,12 +534,12 @@ Functions>.
     void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);

 On threaded builds a regexp may need to be duplicated so that the pattern
-can be used by multiple threads. This routine is expected to handle the
+can be used by multiple threads.  This routine is expected to handle the
 duplication of any private data pointed to by the C<pprivate> member of
-the regexp structure. It will be called with the preconstructed new
-regexp structure as an argument, the C<pprivate> member will point at
+the C<regexp> structure.  It will be called with the preconstructed new
+C<regexp> structure as an argument, the C<pprivate> member will point at
 the B<old> private structure, and it is this routine's responsibility to
-construct a copy and return a pointer to it (which perl will then use to
+construct a copy and return a pointer to it (which Perl will then use to
 overwrite the field as passed to this routine.)

 This allows the engine to dupe its private data but also if necessary
@@ -462,24 +547,30 @@ modify the final structure if it really must.

 On unthreaded builds this field doesn't exist.

+=head2 op_comp
+
+This is private to the Perl core and subject to change.  Should be left
+null.
+
 =head1 The REGEXP structure

-The REGEXP struct is defined in F<regexp.h>. All regex engines must be able to
+The REGEXP struct is defined in F<regexp.h>.
+All regex engines must be able to
 correctly build such a structure in their L</comp> routine.

-The REGEXP structure contains all the data that perl needs to be aware of
-to properly work with the regular expression. It includes data about
-optimisations that perl can use to determine if the regex engine should
+The REGEXP structure contains all the data that Perl needs to be aware of
+to properly work with the regular expression.  It includes data about
+optimisations that Perl can use to determine if the regex engine should
 really be used, and various other control info that is needed to properly
-execute patterns in various contexts such as is the pattern anchored in
-some way, or what flags were used during the compile, or whether the
-program contains special constructs that perl needs to be aware of.
+execute patterns in various contexts, such as if the pattern anchored in
+some way, or what flags were used during the compile, or if the
+program contains special constructs that Perl needs to be aware of.

 In addition it contains two fields that are intended for the private
-use of the regex engine that compiled the pattern. These are the
-C<intflags> and C<pprivate> members. C<pprivate> is a void pointer to
-an arbitrary structure whose use and management is the responsibility
-of the compiling engine. perl will never modify either of these
+use of the regex engine that compiled the pattern.  These are the
+C<intflags> and C<pprivate> members.  C<pprivate> is a void pointer to
+an arbitrary structure, whose use and management is the responsibility
+of the compiling engine.  Perl will never modify either of these
 values.

     typedef struct regexp {
@@ -489,10 +580,12 @@ values.
         /* what re is this a lightweight copy of? */
         struct regexp* mother_re;

-        /* Information about the match that the perl core uses to manage things */
+        /* Information about the match that the Perl core uses to manage
+         * things */
         U32 extflags;   /* Flags used both externally and internally */
-        I32 minlen;     /* mininum possible length of string to match */
-        I32 minlenret;  /* mininum possible length of $& */
+        I32 minlen;     /* mininum possible number of chars in */
+                           string to match */
+        I32 minlenret;  /* mininum possible number of chars in $& */
         U32 gofs;       /* chars left of pos that we search from */

         /* substring data about strings that must appear
@@ -506,15 +599,21 @@ values.
         void *pprivate; /* Data private to the regex engine which
                            created this object. */

-        /* Data about the last/current match. These are modified during matching*/
-        U32 lastparen;            /* last open paren matched */
-        U32 lastcloseparen;       /* last close paren matched */
+        /* Data about the last/current match. These are modified during
+         * matching*/
+        U32 lastparen;            /* highest close paren matched ($+) */
+        U32 lastcloseparen;       /* last close paren matched ($^N) */
         regexp_paren_pair *swap;  /* Swap copy of *offs */
-        regexp_paren_pair *offs;  /* Array of offsets for (@-) and (@+) */
+        regexp_paren_pair *offs;  /* Array of offsets for (@-) and
+                                     (@+) */

-        char *subbeg;  /* saved or original string so \digit works forever. */
+        char *subbeg;  /* saved or original string so \digit works
+                          forever. */
         SV_SAVED_COPY  /* If non-NULL, SV which is COW from original */
         I32 sublen;    /* Length of string pointed by subbeg */
+        I32 suboffset;  /* byte offset of subbeg from logical start of
+                           str */
+        I32 subcoffset; /* suboffset equiv, but in chars (for @-/@+) */

         /* Information about the match that isn't often used */
         I32 prelen;           /* length of precomp */
@@ -523,7 +622,8 @@ values.
         char *wrapped;  /* wrapped version of the pattern */
         I32 wraplen;    /* length of wrapped */

-        I32 seen_evals;   /* number of eval groups in the pattern - for security checks */
+        I32 seen_evals;   /* number of eval groups in the pattern - for
+                             security checks */
         HV *paren_names;  /* Optional hash of paren names */

         /* Refcount of this regexp */
@@ -534,13 +634,13 @@ The fields are discussed in more detail below:

 =head2 C<engine>

-This field points at a regexp_engine structure which contains pointers
-to the subroutines that are to be used for performing a match. It
+This field points at a C<regexp_engine> structure which contains pointers
+to the subroutines that are to be used for performing a match.  It
 is the compiling routine's responsibility to populate this field before
 returning the regexp object.

 Internally this is set to C<NULL> unless a custom engine is specified in
-C<$^H{regcomp}>, perl's own set of callbacks can be accessed in the struct
+C<$^H{regcomp}>, Perl's own set of callbacks can be accessed in the struct
 pointed to by C<RE_ENGINE_PTR>.

 =head2 C<mother_re>
@@ -549,21 +649,22 @@ TODO, see L<http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html>

 =head2 C<extflags>

-This will be used by perl to see what flags the regexp was compiled
+This will be used by Perl to see what flags the regexp was compiled
 with, this will normally be set to the value of the flags parameter by
-the L<comp|/comp> callback. See the L<comp|/comp> documentation for
+the L<comp|/comp> callback.  See the L<comp|/comp> documentation for
 valid flags.

 =head2 C<minlen> C<minlenret>

-The minimum string length required for the pattern to match. This is used to
+The minimum string length (in characters) required for the pattern to match.
+This is used to
 prune the search space by not bothering to match any closer to the end of a
-string than would allow a match. For instance there is no point in even
+string than would allow a match.  For instance there is no point in even
 starting the regex engine if the minlen is 10 but the string is only 5
-characters long. There is no way that the pattern can match.
+characters long.  There is no way that the pattern can match.

-C<minlenret> is the minimum length of the string that would be found
-in $& after a match.
+C<minlenret> is the minimum length (in characters) of the string that would
+be found in $& after a match.

 The difference between C<minlen> and C<minlenret> can be seen in the
 following pattern:
@@ -571,10 +672,11 @@ following pattern:
     /ns(?=\d)/

 where the C<minlen> would be 3 but C<minlenret> would only be 2 as the \d is
-required to match but is not actually included in the matched content. This
+required to match but is not actually
+included in the matched content.  This
 distinction is particularly important as the substitution logic uses the
-C<minlenret> to tell whether it can do in-place substitution which can result in
-considerable speedup.
+C<minlenret> to tell if it can do in-place substitutions (these can
+result in considerable speed-up).

 =head2 C<gofs>

@@ -582,8 +684,8 @@ Left offset from pos() to start match at.

 =head2 C<substrs>

-Substring data about strings that must appear in the final match. This
-is currently only used internally by perl's engine for but might be
+Substring data about strings that must appear in the final match.  This
+is currently only used internally by Perl's engine, but might be
 used in the future for all engines for optimisations.

 =head2 C<nparens>, C<lastparen>, and C<lastcloseparen>
@@ -599,13 +701,14 @@ this is the same as C<extflags> unless the engine chose to modify one of them.

 =head2 C<pprivate>

-A void* pointing to an engine-defined data structure. The perl engine uses the
+A void* pointing to an engine-defined
+data structure.  The Perl engine uses the
 C<regexp_internal> structure (see L<perlreguts/Base Structures>) but a custom
 engine should use something else.

 =head2 C<swap>

-Unused. Left in for compatibility with perl 5.10.0.
+Unused.  Left in for compatibility with Perl 5.10.0.

 =head2 C<offs>

@@ -619,16 +722,17 @@ C<regexp_paren_pair> struct is defined as follows:
     } regexp_paren_pair;

 If C<< ->offs[num].start >> or C<< ->offs[num].end >> is C<-1> then that
-capture group did not match. C<< ->offs[0].start/end >> represents C<$&> (or
-C<${^MATCH> under C<//p>) and C<< ->offs[paren].end >> matches C<$$paren> where
+capture group did not match.
+C<< ->offs[0].start/end >> represents C<$&> (or
+C<${^MATCH}> under C<//p>) and C<< ->offs[paren].end >> matches C<$$paren> where
 C<$paren >= 1>.

 =head2 C<precomp> C<prelen>

-Used for optimisations. C<precomp> holds a copy of the pattern that
-was compiled and C<prelen> its length. When a new pattern is to be
+Used for optimisations.  C<precomp> holds a copy of the pattern that
+was compiled and C<prelen> its length.  When a new pattern is to be
 compiled (such as inside a loop) the internal C<regcomp> operator
-checks whether the last compiled C<REGEXP>'s C<precomp> and C<prelen>
+checks if the last compiled C<REGEXP>'s C<precomp> and C<prelen>
 are equivalent to the new one, and if so uses the old pattern instead
 of compiling a new one.

@@ -641,7 +745,7 @@ The relevant snippet from C<Perl_pp_regcomp>:
 =head2 C<paren_names>

 This is a hash used internally to track named capture groups and their
-offsets. The keys are the names of the buffers the values are dualvars,
+offsets.  The keys are the names of the buffers the values are dualvars,
 with the IV slot holding the number of buffers with the given name and the
 pv being an embedded array of I32. The values may also be contained
 independently in the data array in cases where named backreferences are
@@ -651,17 +755,31 @@ used.

 Holds information on the longest string that must occur at a fixed
 offset from the start of the pattern, and the longest string that must
-occur at a floating offset from the start of the pattern. Used to do
+occur at a floating offset from the start of the pattern.  Used to do
 Fast-Boyer-Moore searches on the string to find out if its worth using
 the regex engine at all, and if so where in the string to search.

-=head2 C<subbeg> C<sublen> C<saved_copy>
+=head2 C<subbeg> C<sublen> C<saved_copy> C<suboffset> C<subcoffset>
+
+Used during the execution phase for managing search and replace patterns,
+and for providing the text for C<$&>, C<$1> etc.  C<subbeg> points to a
+buffer (either the original string, or a copy in the case of
+C<RX_MATCH_COPIED(rx)>), and C<sublen> is the length of the buffer.  The
+C<RX_OFFS> start and end indices index into this buffer.

-Used during execution phase for managing search and replace patterns.
+In the presence of the C<REXEC_COPY_STR> flag, but with the addition of
+the C<REXEC_COPY_SKIP_PRE> or C<REXEC_COPY_SKIP_POST> flags, an engine
+can choose not to copy the full buffer (although it must still do so in
+the presence of C<RXf_PMf_KEEPCOPY> or the relevant bits being set in
+C<PL_sawampersand>).  In this case, it may set C<suboffset> to indicate the
+number of bytes from the logical start of the buffer to the physical start
+(i.e. C<subbeg>).  It should also set C<subcoffset>, the number of
+characters in the offset.  The latter is needed to support C<@-> and C<@+>
+which work in characters, not bytes.

 =head2 C<wrapped> C<wraplen>

-Stores the string C<qr//> stringifies to. The perl engine for example
+Stores the string C<qr//> stringifies to.  The Perl engine for example
 stores C<(?^:eek)> in the case of C<qr/eek/>.

 When using a custom engine that doesn't support the C<(?:)> construct
@@ -678,13 +796,15 @@ engine understand a construct like C<(?:)>.

 =head2 C<seen_evals>

-This stores the number of eval groups in the pattern. This is used for security
+This stores the number of eval groups in
+the pattern.  This is used for security
 purposes when embedding compiled regexes into larger patterns with C<qr//>.
 
 =head2 C<refcnt>

-The number of times the structure is referenced. When this falls to 0 the
-regexp is automatically freed by a call to pregfree. This should be set to 1 in
+The number of times the structure is referenced.  When
+this falls to 0, the regexp is automatically freed
+by a call to pregfree.  This should be set to 1 in
 each engine's L</comp> routine.

 =head1 HISTORY
diff --git a/gnu/usr.bin/perl/pod/perlrebackslash.pod b/gnu/usr.bin/perl/pod/perlrebackslash.pod
index f81af0c6dd7..44b0e7db06e 100644
--- a/gnu/usr.bin/perl/pod/perlrebackslash.pod
+++ b/gnu/usr.bin/perl/pod/perlrebackslash.pod
@@ -68,7 +68,7 @@ as C<Not in [].>
  \A         Beginning of string. Not in [].
  \b         Word/non-word boundary. (Backspace in []).
  \B         Not a word/non-word boundary. Not in [].
- \cX        Control-X
+ \cX        Control-X.
  \C         Single octet, even under UTF-8. Not in [].
  \d         Character class for digits.
  \D         Character class for non-digits.
@@ -76,7 +76,8 @@ as C<Not in [].>
  \E         Turn off \Q, \L and \U processing. Not in [].
  \f         Form feed.
  \F         Foldcase till \E. Not in [].
- \g{}, \g1  Named, absolute or relative backreference. Not in []
+ \g{}, \g1  Named, absolute or relative backreference.
+            Not in [].
  \G         Pos assertion. Not in [].
  \h         Character class for horizontal whitespace.
  \H         Character class for non horizontal whitespace.
@@ -85,7 +86,7 @@ as C<Not in [].>
  \l         Lowercase next character. Not in [].
  \L         Lowercase till \E. Not in [].
  \n         (Logical) newline character.
- \N         Any character but newline. Experimental. Not in [].
+ \N         Any character but newline. Not in [].
  \N{}       Named or numbered (Unicode) character or sequence.
  \o{}       Octal escape sequence.
  \p{}, \pP  Character with the given Unicode property.
@@ -246,16 +247,17 @@ Mnemonic: I<0>ctal or I<o>ctal.
  $str = "Perl";
  $str =~ /\o{120}/;  # Match, "\120" is "P".
  $str =~ /\120/;     # Same.
- $str =~ /\o{120}+/; # Match, "\120" is "P", it's repeated at least once
+ $str =~ /\o{120}+/; # Match, "\120" is "P",
+                     # it's repeated at least once.
  $str =~ /\120+/;    # Same.
$str =~ /P\053/; # No match, "\053" is "+" and taken literally. /\o{23073}/ # Black foreground, white background smiling face. - /\o{4801234567}/ # Raises a warning, and yields chr(4) + /\o{4801234567}/ # Raises a warning, and yields chr(4). =head4 Disambiguation rules between old-style octal escapes and backreferences Octal escapes of the C<\000> form outside of bracketed character classes -potentially clash with old-style backreferences. (see L</Absolute referencing> +potentially clash with old-style backreferences (see L</Absolute referencing> below). They both consist of a backslash followed by numbers. So Perl has to use heuristics to determine whether it is a backreference or an octal escape. Perl uses the following rules to disambiguate: @@ -282,7 +284,7 @@ takes only the first three for the octal escape; the rest are matched as is. $pat .= ")" x 999; /^($pat)\1000$/; # Matches 'aa'; there are 1000 capture groups. /^$pat\1000$/; # Matches 'a@0'; there are 999 capture groups - # and \1000 is seen as \100 (a '@') and a '0' + # and \1000 is seen as \100 (a '@') and a '0'. =back @@ -430,7 +432,7 @@ Mnemonic: I<g>roup. =head4 Examples /(\w+) \g1/; # Finds a duplicated word, (e.g. "cat cat"). - /(\w+) \1/; # Same thing; written old-style + /(\w+) \1/; # Same thing; written old-style. /(.)(.)\g2\g1/; # Match a four letter palindrome (e.g. "ABBA"). @@ -575,7 +577,7 @@ categories above. These are: C<\C> always matches a single octet, even if the source string is encoded in UTF-8 format, and the character to be matched is a multi-octet character. -C<\C> was introduced in perl 5.6. This is very dangerous, because it violates +This is very dangerous, because it violates the logical character abstraction and can cause UTF-8 sequences to become malformed. Mnemonic: oI<C>tet. @@ -591,7 +593,7 @@ Mnemonic: I<K>eep. =item \N -This is an experimental feature new to perl 5.12.0. 
It matches any character +This feature, available starting in v5.12, matches any character that is B<not> a newline. It is a short-hand for writing C<[^\n]>, and is identical to the C<.> metasymbol, except under the C</s> flag, which changes the meaning of C<.>, but not C<\N>. @@ -647,7 +649,8 @@ Mnemonic: eI<X>tended Unicode character. =head4 Examples - "\x{256}" =~ /^\C\C$/; # Match as chr (0x256) takes 2 octets in UTF-8. + "\x{256}" =~ /^\C\C$/; # Match as chr (0x256) takes + # 2 octets in UTF-8. $str =~ s/foo\Kbar/baz/g; # Change any 'bar' following a 'foo' to 'baz' $str =~ s/(.)\K\g1//g; # Delete duplicated characters. diff --git a/gnu/usr.bin/perl/pod/perlrecharclass.pod b/gnu/usr.bin/perl/pod/perlrecharclass.pod index 06d206b2f8b..eb41ab9eec3 100644 --- a/gnu/usr.bin/perl/pod/perlrecharclass.pod +++ b/gnu/usr.bin/perl/pod/perlrecharclass.pod @@ -29,7 +29,7 @@ the most well-known character class. By default, a dot matches any character, except for the newline. That default can be changed to add matching the newline by using the I<single line> modifier: either for the entire regular expression with the C</s> modifier, or -locally with C<(?s)>. (The experimental C<\N> backslash sequence, described +locally with C<(?s)>. (The C<\N> backslash sequence, described below, matches any character except newline without regard to the I<single line> modifier.) @@ -68,13 +68,13 @@ character classes, see L<perlrebackslash>.) \H Match a character that isn't horizontal whitespace. \v Match a vertical whitespace character. \V Match a character that isn't vertical whitespace. - \N Match a character that isn't a newline. Experimental. + \N Match a character that isn't a newline. \pP, \p{Prop} Match a character that has the given Unicode property. \PP, \P{Prop} Match a character that doesn't have the Unicode property =head3 \N -C<\N> is new in 5.12, and is experimental. 
It, like the dot, matches any +C<\N>, available starting in v5.12, like the dot, matches any character that is not a newline. The difference is that C<\N> is not influenced by the I<single line> regular expression modifier (see L</The dot> above). Note that the form C<\N{...}> may mean something completely different. When the @@ -140,11 +140,12 @@ Any character not matched by C<\d> is matched by C<\D>. =head3 Word characters A C<\w> matches a single alphanumeric character (an alphabetic character, or a -decimal digit) or a connecting punctuation character, such as an -underscore ("_"). It does not match a whole word. To match a whole -word, use C<\w+>. This isn't the same thing as matching an English word, but -in the ASCII range it is the same as a string of Perl-identifier -characters. +decimal digit); or a connecting punctuation character, such as an +underscore ("_"); or a "mark" character (like some sort of accent) that +attaches to one of those. It does not match a whole word. To match a +whole word, use C<\w+>. This isn't the same thing as matching an +English word, but in the ASCII range it is the same as a string of +Perl-identifier characters. =over @@ -173,7 +174,7 @@ are generally used to add auxiliary markings to letters. C<\w> matches the platform's native underscore character plus whatever the locale considers to be alphanumeric. -=item if Unicode rules are in effect or if on an EBCDIC platform ... +=item if Unicode rules are in effect ... C<\w> matches exactly what C<\p{Word}> matches. @@ -208,9 +209,11 @@ C<\s> matches any single character considered whitespace. =item If the C</a> modifier is in effect ... -C<\s> matches the 5 characters [\t\n\f\r ]; that is, the horizontal tab, -the newline, the form feed, the carriage return, and the space. (Note -that it doesn't match the vertical tab, C<\cK> on ASCII platforms.) 
+In all Perl versions, C<\s> matches the 5 characters [\t\n\f\r ]; that +is, the horizontal tab, +the newline, the form feed, the carriage return, and the space. +Starting in Perl v5.18, experimentally, it also matches the vertical tab, C<\cK>. +See note C<[1]> below for a discussion of this. =item otherwise ... @@ -227,18 +230,18 @@ in the table below. =item if locale rules are in effect ... -C<\s> matches whatever the locale considers to be whitespace. Note that -this is likely to include the vertical space, unlike non-locale C<\s> -matching. +C<\s> matches whatever the locale considers to be whitespace. -=item if Unicode rules are in effect or if on an EBCDIC platform ... +=item if Unicode rules are in effect ... C<\s> matches exactly the characters shown with an "s" column in the table below. =item otherwise ... -C<\s> matches [\t\n\f\r ]. +C<\s> matches [\t\n\f\r\cK ] and, starting, experimentally in Perl +v5.18, the vertical tab, C<\cK>. +(See note C<[1]> below for a discussion of this.) Note that this list doesn't include the non-breaking space. =back @@ -277,26 +280,26 @@ Note that unlike C<\s> (and C<\d> and C<\w>), C<\h> and C<\v> always match the same characters, without regard to other factors, such as the active locale or whether the source string is in UTF-8 format. -One might think that C<\s> is equivalent to C<[\h\v]>. This is not true. -The difference is that the vertical tab (C<"\x0b">) is not matched by -C<\s>; it is however considered vertical whitespace. +One might think that C<\s> is equivalent to C<[\h\v]>. This is indeed true +starting in Perl v5.18, but prior to that, the sole difference was that the +vertical tab (C<"\cK">) was not matched by C<\s>. The following table is a complete listing of characters matched by C<\s>, C<\h> and C<\v> as of Unicode 6.0. The first column gives the Unicode code point of the character (in hex format), the second column gives the (Unicode) name. 
The third column indicates -by which class(es) the character is matched (assuming no locale or EBCDIC code -page is in effect that changes the C<\s> matching). +by which class(es) the character is matched (assuming no locale is in +effect that changes the C<\s> matching). 0x0009 CHARACTER TABULATION h s 0x000a LINE FEED (LF) vs - 0x000b LINE TABULATION v + 0x000b LINE TABULATION vs [1] 0x000c FORM FEED (FF) vs 0x000d CARRIAGE RETURN (CR) vs 0x0020 SPACE h s - 0x0085 NEXT LINE (NEL) vs [1] - 0x00a0 NO-BREAK SPACE h s [1] + 0x0085 NEXT LINE (NEL) vs [2] + 0x00a0 NO-BREAK SPACE h s [2] 0x1680 OGHAM SPACE MARK h s 0x180e MONGOLIAN VOWEL SEPARATOR h s 0x2000 EN QUAD h s @@ -320,6 +323,16 @@ page is in effect that changes the C<\s> matching). =item [1] +Prior to Perl v5.18, C<\s> did not match the vertical tab. The change +in v5.18 is considered an experiment, which means it could be backed out +in v5.20 or v5.22 if experience indicates that it breaks too much +existing code. If this change adversely affects you, send email to +C<perlbug@perl.org>; if it affects you positively, email +C<perlthanks@perl.org>. In the meantime, C<[^\S\cK]> (obscurely) +matches what C<\s> traditionally did. + +=item [2] + NEXT LINE and NO-BREAK SPACE may or may not match C<\s> depending on the rules in effect. See L<the beginning of this section|/Whitespace>. @@ -345,9 +358,9 @@ C</\pLl/> is valid, but means something different. It matches a two character string: a letter (Unicode property C<\pL>), followed by a lowercase C<l>. -If neither the C</a> modifier nor locale rules are in effect, the use of +If locale rules are not in effect, the use of a Unicode property will force the regular expression into using Unicode -rules. +rules, if it isn't already. Note that almost all properties are immune to case-insensitive matching. 
That is, adding a C</i> regular expression modifier does not change what @@ -441,7 +454,8 @@ Examples: * There is an exception to a bracketed character class matching a single character only. When the class is to match caselessly under C</i> -matching rules, and a character inside the class matches a +matching rules, and a character that is explicitly mentioned inside the +class matches a multiple-character sequence caselessly under Unicode rules, the class (when not L<inverted|/Negation>) will also match that sequence. For example, Unicode says that the letter C<LATIN SMALL LETTER SHARP S> @@ -450,6 +464,18 @@ should match the sequence C<ss> under C</i> rules. Thus, 'ss' =~ /\A\N{LATIN SMALL LETTER SHARP S}\z/i # Matches 'ss' =~ /\A[aeioust\N{LATIN SMALL LETTER SHARP S}]\z/i # Matches +For this to happen, the character must be explicitly specified, and not +be part of a multi-character range (not even as one of its endpoints). +(L</Character Ranges> will be explained shortly.) Therefore, + + 'ss' =~ /\A[\0-\x{ff}]\z/i # Doesn't match + 'ss' =~ /\A[\0-\N{LATIN SMALL LETTER SHARP S}]\z/i # No match + 'ss' =~ /\A[\xDF-\xDF]\z/i # Matches on ASCII platforms, since \xDF + # is LATIN SMALL LETTER SHARP S, and the + # range is just a single element + +Note that it isn't a good idea to specify these types of ranges anyway. + =head3 Special Characters Inside a Bracketed Character Class Most characters that are meta characters in regular expressions (that @@ -508,7 +534,7 @@ escaping. Examples: "+" =~ /[+?*]/ # Match, "+" in a character class is not special. - "\cH" =~ /[\b]/ # Match, \b inside in a character class + "\cH" =~ /[\b]/ # Match, \b inside a character class # is equivalent to a backspace. "]" =~ /[][]/ # Match, as the character class contains. # both [ and ]. @@ -643,7 +669,7 @@ is valid and matches '0', '1', any alphabetic character, and the percent sign. Perl recognizes the following POSIX character classes: alpha Any alphabetical character ("[A-Za-z]"). 
- alnum Any alphanumeric character. ("[A-Za-z0-9]") + alnum Any alphanumeric character ("[A-Za-z0-9]"). ascii Any character in the ASCII character set. blank A GNU extension, equal to a space or a horizontal tab ("\t"). cntrl Any control character. See Note [2] below. @@ -652,7 +678,8 @@ Perl recognizes the following POSIX character classes: lower Any lowercase character ("[a-z]"). print Any printable character, including a space. See Note [4] below. punct Any graphical character excluding "word" characters. Note [5]. - space Any whitespace character. "\s" plus the vertical tab ("\cK"). + space Any whitespace character. "\s" including the vertical tab + ("\cK"). upper Any uppercase character ("[A-Z]"). word A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w". xdigit Any hexadecimal digit ("[0-9a-fA-F]"). @@ -705,10 +732,6 @@ the terminal somehow: for example, newline and backspace are control characters. In the ASCII range, characters whose code points are between 0 and 31 inclusive, plus 127 (C<DEL>) are control characters. -On EBCDIC platforms, it is likely that the code page will define C<[[:cntrl:]]> -to be the EBCDIC equivalents of the ASCII controls, plus the controls -that in Unicode have code pointss from 128 through 159. - =item [3] Any character that is I<graphical>, that is, visible. This class consists @@ -743,9 +766,10 @@ Unicode considers symbols. =item [6] -C<\p{SpacePerl}> and C<\p{Space}> differ only in that in non-locale -matching, C<\p{Space}> additionally -matches the vertical tab, C<\cK>. Same for the two ASCII-only range forms. +C<\p{SpacePerl}> and C<\p{Space}> match identically starting with Perl +v5.18. In earlier versions, these differ only in that in non-locale +matching, C<\p{SpacePerl}> does not match the vertical tab, C<\cK>. +Same for the two ASCII-only range forms. 
=back @@ -787,7 +811,7 @@ The POSIX class matches according to the locale, except that C<word> uses the platform's native underscore character, no matter what the locale is. -=item if Unicode rules are in effect or if on an EBCDIC platform ... +=item if Unicode rules are in effect ... The POSIX class matches the same as the Full-range counterpart. @@ -806,7 +830,7 @@ L<perlre/Which character set modifier is in effect?>. It is proposed to change this behavior in a future release of Perl so that whether or not Unicode rules are in effect would not change the -behavior: Outside of locale or an EBCDIC code page, the POSIX classes +behavior: Outside of locale, the POSIX classes would behave like their ASCII-range counterparts. If you wish to comment on this proposal, send email to C<perl5-porters@perl.org>. @@ -840,11 +864,218 @@ either construct raises an exception. /[01[:lower:]]/ # Matches a character that is either a # lowercase letter, or '0' or '1'. /[[:digit:][:^xdigit:]]/ # Matches a character that can be anything - # except the letters 'a' to 'f'. This is - # because the main character class is composed - # of two POSIX character classes that are ORed - # together, one that matches any digit, and - # the other that matches anything that isn't a - # hex digit. The result matches all - # characters except the letters 'a' to 'f' and - # 'A' to 'F'. + # except the letters 'a' to 'f' and 'A' to + # 'F'. This is because the main character + # class is composed of two POSIX character + # classes that are ORed together, one that + # matches any digit, and the other that + # matches anything that isn't a hex digit. + # The OR adds the digits, leaving only the + # letters 'a' to 'f' and 'A' to 'F' excluded. + +=head3 Extended Bracketed Character Classes +X<character class> +X<set operations> + +This is a fancy bracketed character class that can be used for more +readable and less error-prone classes, and to perform set operations, +such as intersection. 
An example is + + /(?[ \p{Thai} & \p{Digit} ])/ + +This will match all the digit characters that are in the Thai script. + +This is an experimental feature available starting in 5.18, and is +subject to change as we gain field experience with it. Any attempt to +use it will raise a warning, unless disabled via + + no warnings "experimental::regex_sets"; + +Comments on this feature are welcome; send email to +C<perl5-porters@perl.org>. + +We can extend the example above: + + /(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])/ + +This matches digits that are in either the Thai or Laotian scripts. + +Notice the white space in these examples. This construct always has +the C<E<sol>x> modifier turned on. + +The available binary operators are: + + & intersection + + union + | another name for '+', hence means union + - subtraction (the result matches the set consisting of those + code points matched by the first operand, excluding any that + are also matched by the second operand) + ^ symmetric difference (the union minus the intersection). This + is like an exclusive or, in that the result is the set of code + points that are matched by either, but not both, of the + operands. + +There is one unary operator: + + ! complement + +All the binary operators left associate, and are of equal precedence. +The unary operator right associates, and has higher precedence. Use +parentheses to override the default associations. Some feedback we've +received indicates a desire for intersection to have higher precedence +than union. This is something that feedback from the field may cause us +to change in future releases; you may want to parenthesize copiously to +avoid such changes affecting your code, until this feature is no longer +considered experimental. + +The main restriction is that everything is a metacharacter. Thus, +you cannot refer to single characters by doing something like this: + + /(?[ a + b ])/ # Syntax error! 
+ +The easiest way to specify an individual typable character is to enclose +it in brackets: + + /(?[ [a] + [b] ])/ + +(This is the same thing as C<[ab]>.) You could also have said the +equivalent: + + /(?[[ a b ]])/ + +(You can, of course, specify single characters by using, C<\x{ }>, +C<\N{ }>, etc.) + +This last example shows the use of this construct to specify an ordinary +bracketed character class without additional set operations. Note the +white space within it; C<E<sol>x> is turned on even within bracketed +character classes, except you can't have comments inside them. Hence, + + (?[ [#] ]) + +matches the literal character "#". To specify a literal white space character, +you can escape it with a backslash, like: + + /(?[ [ a e i o u \ ] ])/ + +This matches the English vowels plus the SPACE character. +All the other escapes accepted by normal bracketed character classes are +accepted here as well; but unrecognized escapes that generate warnings +in normal classes are fatal errors here. + +All warnings from these class elements are fatal, as well as some +practices that don't currently warn. For example you cannot say + + /(?[ [ \xF ] ])/ # Syntax error! + +You have to have two hex digits after a braceless C<\x> (use a leading +zero to make two). These restrictions are to lower the incidence of +typos causing the class to not match what you thought it would. + +The final difference between regular bracketed character classes and +these, is that it is not possible to get these to match a +multi-character fold. Thus, + + /(?[ [\xDF] ])/iu + +does not match the string C<ss>. + +You don't have to enclose POSIX class names inside double brackets, +hence both of the following work: + + /(?[ [:word:] - [:lower:] ])/ + /(?[ [[:word:]] - [[:lower:]] ])/ + +Any contained POSIX character classes, including things like C<\w> and C<\D> +respect the C<E<sol>a> (and C<E<sol>aa>) modifiers. + +C<< (?[ ]) >> is a regex-compile-time construct. 
Any attempt to use +something which isn't knowable at the time the containing regular +expression is compiled is a fatal error. In practice, this means +just three limitations: + +=over 4 + +=item 1 + +This construct cannot be used within the scope of +C<use locale> (or the C<E<sol>l> regex modifier). + +=item 2 + +Any +L<user-defined property|perlunicode/"User-Defined Character Properties"> +used must be already defined by the time the regular expression is +compiled (but note that this construct can be used instead of such +properties). + +=item 3 + +A regular expression that otherwise would compile +using C<E<sol>d> rules, and which uses this construct will instead +use C<E<sol>u>. Thus this construct tells Perl that you don't want +C<E<sol>d> rules for the entire regular expression containing it. + +=back + +The C<E<sol>x> processing within this class is an extended form. +Besides the characters that are considered white space in normal C</x> +processing, there are 5 others, recommended by the Unicode standard: + + U+0085 NEXT LINE + U+200E LEFT-TO-RIGHT MARK + U+200F RIGHT-TO-LEFT MARK + U+2028 LINE SEPARATOR + U+2029 PARAGRAPH SEPARATOR + +Note that skipping white space applies only to the interior of this +construct. There must not be any space between any of the characters +that form the initial C<(?[>. Nor may there be space between the +closing C<])> characters. + +Just as in all regular expressions, the pattern can be built up by +including variables that are interpolated at regex compilation time. +Care must be taken to ensure that you are getting what you expect. For +example: + + my $thai_or_lao = '\p{Thai} + \p{Lao}'; + ... + qr/(?[ \p{Digit} & $thai_or_lao ])/; + +compiles to + + qr/(?[ \p{Digit} & \p{Thai} + \p{Lao} ])/; + +But this does not have the effect that someone reading the code would +likely expect, as the intersection applies just to C<\p{Thai}>, +excluding the Laotian. 
Pitfalls like this can be avoided by +parenthesizing the component pieces: + + my $thai_or_lao = '( \p{Thai} + \p{Lao} )'; + +But any modifiers will still apply to all the components: + + my $lower = '\p{Lower} + \p{Digit}'; + qr/(?[ \p{Greek} & $lower ])/i; + +matches upper case things. You can avoid surprises by making the +components into instances of this construct by compiling them: + + my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/; + my $lower = qr/(?[ \p{Lower} + \p{Digit} ])/; + +When these are embedded in another pattern, what they match does not +change, regardless of parenthesization or what modifiers are in effect +in that outer pattern. + +Due to the way that Perl parses things, your parentheses and brackets +may need to be balanced, even including comments. If you run into any +examples, please send them to C<perlbug@perl.org>, so that we can have a +concrete example for this man page. + +We may change it so that things that remain legal uses in normal bracketed +character classes might become illegal within this experimental +construct. One proposal, for example, is to forbid adjacent uses of the +same character, as in C<(?[ [aa] ])>. The motivation for such a change +is that this usage is likely a typo, as the second "a" adds nothing. diff --git a/gnu/usr.bin/perl/pod/perlreftut.pod b/gnu/usr.bin/perl/pod/perlreftut.pod index 9565562711d..bd888eb5a02 100644 --- a/gnu/usr.bin/perl/pod/perlreftut.pod +++ b/gnu/usr.bin/perl/pod/perlreftut.pod @@ -18,9 +18,9 @@ Fortunately, you only need to know 10% of what's in the main page to get =head1 Who Needs Complicated Data Structures? -One problem that came up all the time in Perl 4 was how to represent a -hash whose values were lists. Perl 4 had hashes, of course, but the -values had to be scalars; they couldn't be lists. +One problem that comes up all the time is needing a hash whose values are +lists. Perl has hashes, of course, but the values have to be scalars; +they can't be lists. 
Why would you want a hash of lists? Let's take a simple example: You have a file of city and country names, like this: @@ -47,8 +47,7 @@ country, and append the new city to the list. When you're done reading the input, iterate over the hash as usual, sorting each list of cities before you print it out. -If hash values can't be lists, you lose. In Perl 4, hash values can't -be lists; they can only be strings. You lose. You'd probably have to +If hash values couldn't be lists, you lose. You'd probably have to combine all the cities into a single string somehow, and then when time came to write the output, you'd have to break the string into a list, sort the list, and turn it back into a string. This is messy @@ -403,7 +402,7 @@ to push C<Athens> onto an array that doesn't exist, so it helpfully makes a new, empty, anonymous array for you, installs it into C<%table>, and then pushes C<Athens> onto it. This is called 'autovivification'--bringing things to life automatically. Perl saw -that they key wasn't in the hash, so it created a new hash entry +that the key wasn't in the hash, so it created a new hash entry automatically. Perl saw that you wanted to use the hash value as an array, so it created a new empty array and installed a reference to it in the hash automatically. And as usual, Perl made the array one diff --git a/gnu/usr.bin/perl/pod/perlreguts.pod b/gnu/usr.bin/perl/pod/perlreguts.pod index ec1c243f8a9..bb7f372c664 100644 --- a/gnu/usr.bin/perl/pod/perlreguts.pod +++ b/gnu/usr.bin/perl/pod/perlreguts.pod @@ -182,9 +182,9 @@ POSIX char classes called C<regnode_charclass_class> which has an additional 4-byte (32-bit) bitmap indicating which POSIX char classes have been included. - regnode_charclass_class U32 arg1; - char bitmap[ANYOF_BITMAP_SIZE]; - char classflags[ANYOF_CLASSBITMAP_SIZE]; + regnode_charclass_class U32 arg1; + char bitmap[ANYOF_BITMAP_SIZE]; + char classflags[ANYOF_CLASSBITMAP_SIZE]; =back @@ -354,20 +354,23 @@ simpler form. 
The call graph looks like this: - reg() # parse a top level regex, or inside of parens - regbranch() # parse a single branch of an alternation - regpiece() # parse a pattern followed by a quantifier - regatom() # parse a simple pattern - regclass() # used to handle a class - reg() # used to handle a parenthesised subpattern - .... - ... - regtail() # finish off the branch - ... - regtail() # finish off the branch sequence. Tie each - # branch's tail to the tail of the sequence - # (NEW) In Debug mode this is - # regtail_study(). + reg() # parse a top level regex, or inside of + # parens + regbranch() # parse a single branch of an alternation + regpiece() # parse a pattern followed by a quantifier + regatom() # parse a simple pattern + regclass() # used to handle a class + reg() # used to handle a parenthesised + # subpattern + .... + ... + regtail() # finish off the branch + ... + regtail() # finish off the branch sequence. Tie each + # branch's tail to the tail of the + # sequence + # (NEW) In Debug mode this is + # regtail_study(). A grammar form might be something like this: @@ -383,6 +386,52 @@ A grammar form might be something like this: piece : _piece | _piece quant +=head3 Parsing complications + +The implication of the above description is that a pattern containing nested +parentheses will result in a call graph which cycles through C<reg()>, +C<regbranch()>, C<regpiece()>, C<regatom()>, C<reg()>, C<regbranch()> I<etc> +multiple times, until the deepest level of nesting is reached. All the above +routines return a pointer to a C<regnode>, which is usually the last regnode +added to the program. However, one complication is that reg() returns NULL +for parsing C<(?:)> syntax for embedded modifiers, setting the flag +C<TRYAGAIN>. The C<TRYAGAIN> propagates upwards until it is captured, in +some cases by C<regatom()>, but otherwise unconditionally by +C<regbranch()>. Hence it will never be returned by C<regbranch()> to +C<reg()>. 
This flag permits patterns such as C<(?i)+> to be detected as +errors (I<Quantifier follows nothing in regex; marked by <-- HERE in m/(?i)+ +<-- HERE />). + +Another complication is that the representation used for the program differs +if it needs to store Unicode, but it's not always possible to know for sure +whether it does until midway through parsing. The Unicode representation for +the program is larger, and cannot be matched as efficiently. (See L</Unicode +and Localisation Support> below for more details as to why.) If the pattern +contains literal Unicode, it's obvious that the program needs to store +Unicode. Otherwise, the parser optimistically assumes that the more +efficient representation can be used, and starts sizing on this basis. +However, if it then encounters something in the pattern which must be stored +as Unicode, such as an C<\x{...}> escape sequence representing a character +literal, then this means that all previously calculated sizes need to be +redone, using values appropriate for the Unicode representation. Currently, +all regular expression constructions which can trigger this are parsed by code +in C<regatom()>. + +To avoid wasted work when a restart is needed, the sizing pass is abandoned +- C<regatom()> immediately returns NULL, setting the flag C<RESTART_UTF8>. +(This action is encapsulated using the macro C<REQUIRE_UTF8>.) This restart +request is propagated up the call chain in a similar fashion, until it is +"caught" in C<Perl_re_op_compile()>, which marks the pattern as containing +Unicode, and restarts the sizing pass. It is also possible for constructions +within run-time code blocks to turn out to need Unicode representation, +which is signalled by C<S_compile_runtime_code()> returning false to +C<Perl_re_op_compile()>. 
+ +The restart was previously implemented using a C<longjmp> in C<regatom()> +back to a C<setjmp> in C<Perl_re_op_compile()>, but this proved to be +problematic as the latter is a large function containing many automatic +variables, which interact badly with the emergent control flow of C<setjmp>. + =head3 Debug Output In the 5.9.x development version of perl you can C<< use re Debug => 'PARSE' >> @@ -489,11 +538,11 @@ Now for something much more complex: C</x(?:foo*|b[a][rR])(foo|bar)$/> atom >)$< 34 tail~ BRANCH (28) 36 tsdy~ BRANCH (END) (31) - ~ attach to CLOSE1 (34) offset to 3 + ~ attach to CLOSE1 (34) offset to 3 tsdy~ EXACT <foo> (EXACT) (29) - ~ attach to CLOSE1 (34) offset to 5 + ~ attach to CLOSE1 (34) offset to 5 tsdy~ EXACT <bar> (EXACT) (32) - ~ attach to CLOSE1 (34) offset to 2 + ~ attach to CLOSE1 (34) offset to 2 >$< tail~ BRANCH (3) ~ BRANCH (9) ~ TAIL (25) @@ -765,7 +814,7 @@ implement things such as the stringification of C<qr//>. The other structure is pointed to be the C<regexp> struct's C<pprivate> and is in addition to C<intflags> in the same struct considered to be the property of the regex engine which compiled the -regular expression; +regular expression; The regexp structure contains all the data that perl needs to be aware of to properly work with the regular expression. It includes data about @@ -792,31 +841,24 @@ The following structure is used as the C<pprivate> struct by perl's regex engine. Since it is specific to perl it is only of curiosity value to other engine implementations. - typedef struct regexp_internal { - regexp_paren_ofs *swap; /* Swap copy of *startp / *endp */ - U32 *offsets; /* offset annotations 20001228 MJD - data about mapping the program to the - string*/ - regnode *regstclass; /* Optional startclass as identified or constructed - by the optimiser */ - struct reg_data *data; /* Additional miscellaneous data used by the program. - Used to make it easier to clone and free arbitrary - data that the regops need. 
Often the ARG field of - a regop is an index into this structure */ - regnode program[1]; /* Unwarranted chumminess with compiler. */ - } regexp_internal; + typedef struct regexp_internal { + U32 *offsets; /* offset annotations 20001228 MJD + * data about mapping the program to + * the string*/ + regnode *regstclass; /* Optional startclass as identified or + * constructed by the optimiser */ + struct reg_data *data; /* Additional miscellaneous data used + * by the program. Used to make it + * easier to clone and free arbitrary + * data that the regops need. Often the + * ARG field of a regop is an index + * into this structure */ + regnode program[1]; /* Unwarranted chumminess with + * compiler. */ + } regexp_internal; =over 5 -=item C<swap> - -C<swap> formerly was an extra set of startp/endp stored in a -C<regexp_paren_ofs> struct. This was used when the last successful match -was from the same pattern as the current pattern, so that a partial -match didn't overwrite the previous match's results, but it caused a -problem with re-entrant code such as trying to build the UTF-8 swashes. -Currently unused and left for backward compatibility with 5.10.0. - =item C<offsets> Offsets holds a mapping of offset in the C<program> diff --git a/gnu/usr.bin/perl/pod/perlreref.pod b/gnu/usr.bin/perl/pod/perlreref.pod index 954a423759c..d76b407f901 100644 --- a/gnu/usr.bin/perl/pod/perlreref.pod +++ b/gnu/usr.bin/perl/pod/perlreref.pod @@ -136,7 +136,7 @@ and L<perlunicode> for details. \S A non-whitespace character \h An horizontal whitespace \H A non horizontal whitespace - \N A non newline (when not followed by '{NAME}'; experimental; + \N A non newline (when not followed by '{NAME}'; not valid in a character class; equivalent to [^\n]; it's like '.' 
without /s modifier) \v A vertical whitespace diff --git a/gnu/usr.bin/perl/pod/perlretut.pod b/gnu/usr.bin/perl/pod/perlretut.pod index a3ff6ad28c4..bf4ab3bc296 100644 --- a/gnu/usr.bin/perl/pod/perlretut.pod +++ b/gnu/usr.bin/perl/pod/perlretut.pod @@ -869,7 +869,7 @@ with one higher than the maximum reached across all the alternatives. =head2 Position information -In addition to what was matched, Perl (since 5.6.0) also provides the +In addition to what was matched, Perl also provides the positions of what was matched as contents of the C<@-> and C<@+> arrays. C<$-[0]> is the position of the start of the entire match and C<$+[0]> is the position of the end. Similarly, C<$-[n]> is the @@ -1874,8 +1874,8 @@ work if they appear in a regular expression embedded directly in a program, but not when contained in a string that is interpolated in a pattern. -With the advent of 5.6.0, Perl regexps can handle more than just the -standard ASCII character set. Perl now supports I<Unicode>, a standard +Perl regexps can handle more than just the +standard ASCII character set. Perl supports I<Unicode>, a standard for representing the alphabets from virtually all of the world's written languages, and a host of symbols. Perl's text strings are Unicode strings, so they can contain characters with a value (codepoint or character number) higher @@ -1926,13 +1926,13 @@ Consortium, L<http://www.unicode.org/charts/charindex.html>; explanatory material with links to other resources at L<http://www.unicode.org/standard/where>. -The answer to requirement 2) is, as of 5.6.0, that a regexp (mostly) -uses Unicode characters. (The "mostly" is for messy backward +The answer to requirement 2) is that a regexp (mostly) +uses Unicode characters. 
The "mostly" is for messy backward compatibility reasons, but starting in Perl 5.14, any regex compiled in the scope of a C<use feature 'unicode_strings'> (which is automatically turned on within the scope of a C<use 5.012> or higher) will turn that "mostly" into "always". If you want to handle Unicode properly, you -should ensure that C<'unicode_strings'> is turned on.) +should ensure that C<'unicode_strings'> is turned on. Internally, this is encoded to bytes using either UTF-8 or a native 8 bit encoding, depending on the history of the string, but conceptually it is a sequence of characters, not bytes. See L<perlunitut> for a @@ -2618,23 +2618,23 @@ C<(?((?{...}))yes-regexp|no-regexp)>. In other words, in the case of a code expression, we don't need the extra parentheses around the conditional. -If you try to use code expressions with interpolating variables, Perl -may surprise you: +If you try to use code expressions where the code text is contained within +an interpolated variable, rather than appearing literally in the pattern, +Perl may surprise you: $bar = 5; $pat = '(?{ 1 })'; /foo(?{ $bar })bar/; # compiles ok, $bar not interpolated - /foo(?{ 1 })$bar/; # compile error! + /foo(?{ 1 })$bar/; # compiles ok, $bar interpolated /foo${pat}bar/; # compile error! $pat = qr/(?{ $foo = 1 })/; # precompile code regexp /foo${pat}bar/; # compiles ok -If a regexp has (1) code expressions and interpolating variables, or -(2) a variable that interpolates a code expression, Perl treats the -regexp as an error. If the code expression is precompiled into a -variable, however, interpolating is ok. The question is, why is this -an error? +If a regexp has a variable that interpolates a code expression, Perl +treats the regexp as an error. If the code expression is precompiled into +a variable, however, interpolating is ok. The question is, why is this an +error? The reason is that variable interpolation and code expressions together pose a security risk. 
The combination is dangerous because @@ -2657,7 +2657,6 @@ security check by invoking S<C<use re 'eval'>>: use re 'eval'; # throw caution out the door $bar = 5; $pat = '(?{ 1 })'; - /foo(?{ 1 })$bar/; # compiles ok /foo${pat}bar/; # compiles ok Another form of code expression is the I<pattern code expression>. @@ -2698,8 +2697,9 @@ Ha! Try that with your garden variety regexp package... Note that the variables C<$z0> and C<$z1> are not substituted when the regexp is compiled, as happens for ordinary variables outside a code -expression. Rather, the code expressions are evaluated when Perl -encounters them during the search for a match. +expression. Rather, the whole code block is parsed as perl code at the +same time as perl is compiling the code containing the literal regexp +pattern. The regexp without the C<//x> modifier is diff --git a/gnu/usr.bin/perl/pod/perlsource.pod b/gnu/usr.bin/perl/pod/perlsource.pod index 16252eb3f07..81e3e94160f 100644 --- a/gnu/usr.bin/perl/pod/perlsource.pod +++ b/gnu/usr.bin/perl/pod/perlsource.pod @@ -116,6 +116,13 @@ Tests for perl's method resolution order implementations (see L<mro>). Tests for perl's built in functions that don't fit into any of the other directories. +=item * F<t/opbasic/> + +Tests for perl's built in functions which, like those in F<t/op/>, do not fit +into any of the other directories, but which, in addition, cannot use +F<t/test.pl>,as that program depends on functionality which the +test file itself is testing. + =item * F<t/re/> Tests for regex related functions or behaviour. (These used to live in diff --git a/gnu/usr.bin/perl/pod/perlunicode.pod b/gnu/usr.bin/perl/pod/perlunicode.pod index 77daca34a7d..7a98285acc7 100644 --- a/gnu/usr.bin/perl/pod/perlunicode.pod +++ b/gnu/usr.bin/perl/pod/perlunicode.pod @@ -28,8 +28,10 @@ C<use feature 'unicode_strings'> is specified. (This is automatically selected if you use C<use 5.012> or higher.) Failure to do this can trigger unexpected surprises. 
See L</The "Unicode Bug"> below. -This pragma doesn't affect I/O, and there are still several places -where Unicode isn't fully supported, such as in filenames. +This pragma doesn't affect I/O. Nor does it change the internal +representation of strings, only their interpretation. There are still +several places where Unicode isn't fully supported, such as in +filenames. =item Input and Output Layers @@ -72,8 +74,7 @@ See L</"Byte and Character Semantics"> for more details. =head2 Byte and Character Semantics -Beginning with version 5.6, Perl uses logically-wide characters to -represent strings internally. +Perl uses logically-wide characters to represent strings internally. Starting in Perl 5.14, Perl-level operations work with characters rather than bytes within the scope of a @@ -97,13 +98,8 @@ while C<use locale ':not_characters'> effectively also selects C<use feature 'unicode_strings'> in its scope; see L<perllocale>.) Otherwise, Perl uses the platform's native byte semantics for characters whose code points are less than 256, and -Unicode semantics for those greater than 255. On EBCDIC platforms, this -is almost seamless, as the EBCDIC code pages that Perl handles are -equivalent to Unicode's first 256 code points. (The exception is that -EBCDIC regular expression case-insensitive matching rules are not as -as robust as Unicode's.) But on ASCII platforms, Perl uses US-ASCII -(or Basic Latin in Unicode terminology) byte semantics, meaning that characters -whose ordinal numbers are in the range 128 - 255 are undefined except for their +Unicode semantics for those greater than 255. That means that non-ASCII +characters are undefined except for their ordinal numbers. This means that none have case (upper and lower), nor are any a member of character classes, like C<[:alpha:]> or C<\w>. (But all do belong to the C<\W> class or the Perl regular expression extension C<[:^alpha:]>.) 
@@ -720,7 +716,8 @@ This is a synonym for C<\p{Present_In=*}> =item B<C<\p{PerlSpace}>> -This is the same as C<\s>, restricted to ASCII, namely C<S<[ \f\n\r\t]>>. +This is the same as C<\s>, restricted to ASCII, namely C<S<[ \f\n\r\t]>> +and starting in Perl v5.18, experimentally, a vertical tab. Mnemonic: Perl's (original) space @@ -807,7 +804,9 @@ L<perlrecharclass/POSIX Character Classes>. =head2 User-Defined Character Properties You can define your own binary character properties by defining subroutines -whose names begin with "In" or "Is". The subroutines can be defined in any +whose names begin with "In" or "Is". (The experimental feature +L<perlre/(?[ ])> provides an alternative which allows more complex +definitions.) The subroutines can be defined in any package. The user-defined properties can be used in the regular expression C<\p> and C<\P> constructs; if you are using a user-defined property from a package other than the one you are in, you must specify its package in the @@ -978,62 +977,93 @@ Level 1 - Basic Unicode Support RL1.1 Hex Notation - done [1] RL1.2 Properties - done [2][3] RL1.2a Compatibility Properties - done [4] - RL1.3 Subtraction and Intersection - MISSING [5] + RL1.3 Subtraction and Intersection - experimental [5] RL1.4 Simple Word Boundaries - done [6] RL1.5 Simple Loose Matches - done [7] RL1.6 Line Boundaries - MISSING [8][9] RL1.7 Supplementary Code Points - done [10] - [1] \x{...} - [2] \p{...} \P{...} - [3] supports not only minimal list, but all Unicode character - properties (see Unicode Character Properties above) - [4] \d \D \s \S \w \W \X [:prop:] [:^prop:] - [5] can use regular expression look-ahead [a] or - user-defined character properties [b] to emulate set - operations - [6] \b \B - [7] note that Perl does Full case-folding in matching (but with - bugs), not Simple: for example U+1F88 is equivalent to - U+1F00 U+03B9, instead of just U+1F80. 
This difference - matters mainly for certain Greek capital letters with certain - modifiers: the Full case-folding decomposes the letter, - while the Simple case-folding would map it to a single - character. - [8] should do ^ and $ also on U+000B (\v in C), FF (\f), CR - (\r), CRLF (\r\n), NEL (U+0085), LS (U+2028), and PS - (U+2029); should also affect <>, $., and script line - numbers; should not split lines within CRLF [c] (i.e. there - is no empty line between \r and \n) - [9] Linebreaking conformant with UAX#14 "Unicode Line Breaking - Algorithm" is available through the Unicode::LineBreaking - module. - [10] UTF-8/UTF-EBDDIC used in Perl allows not only U+10000 to - U+10FFFF but also beyond U+10FFFF - -[a] You can mimic class subtraction using lookahead. +=over 4 + +=item [1] + +\x{...} + +=item [2] + +\p{...} \P{...} + +=item [3] + +supports not only minimal list, but all Unicode character properties (see Unicode Character Properties above) + +=item [4] + +\d \D \s \S \w \W \X [:prop:] [:^prop:] + +=item [5] + +The experimental feature in v5.18 "(?[...])" accomplishes this. See +L<perlre/(?[ ])>. If you don't want to use an experimental feature, +you can use one of the following: + +=over 4 + +=item * Regular expression look-ahead + +You can mimic class subtraction using lookahead. For example, what UTS#18 might write as - [{Greek}-[{UNASSIGNED}]] + [{Block=Greek}-[{UNASSIGNED}]] in Perl can be written as: - (?!\p{Unassigned})\p{InGreekAndCoptic} - (?=\p{Assigned})\p{InGreekAndCoptic} + (?!\p{Unassigned})\p{Block=Greek} + (?=\p{Assigned})\p{Block=Greek} But in this particular example, you probably really want - \p{GreekAndCoptic} + \p{Greek} which will match assigned characters known to be part of the Greek script. -Also see the L<Unicode::Regex::Set> module; it does implement the full -UTS#18 grouping, intersection, union, and removal (subtraction) syntax. 
+=item * CPAN module L<Unicode::Regex::Set> -[b] '+' for union, '-' for removal (set-difference), '&' for intersection -(see L</"User-Defined Character Properties">) +It does implement the full UTS#18 grouping, intersection, union, and +removal (subtraction) syntax. -[c] Try the C<:crlf> layer (see L<PerlIO>). +=item * L</"User-Defined Character Properties"> + +'+' for union, '-' for removal (set-difference), '&' for intersection + +=back + +=item [6] + +\b \B + +=item [7] + +Note that Perl does Full case-folding in matching (but with bugs), not Simple: for example U+1F88 is equivalent to U+1F00 U+03B9, instead of just U+1F80. This difference matters mainly for certain Greek capital letters with certain modifiers: the Full case-folding decomposes the letter, while the Simple case-folding would map it to a single character. + +=item [8] + +Should do ^ and $ also on U+000B (\v in C), FF (\f), CR (\r), CRLF +(\r\n), NEL (U+0085), LS (U+2028), and PS (U+2029); should also affect +<>, $., and script line numbers; should not split lines within CRLF +(i.e. there is no empty line between \r and \n). For CRLF, try the +C<:crlf> layer (see L<PerlIO>). + +=item [9] + +Linebreaking conformant with UAX#14 "Unicode Line Breaking Algorithm" is available through the Unicode::LineBreaking module. + +=item [10] + +UTF-8/UTF-EBDDIC used in Perl allows not only U+10000 to +U+10FFFF but also beyond U+10FFFF + +=back =item * @@ -1330,7 +1360,7 @@ results, or both, but it is not. The following are such interfaces. Also, see L</The "Unicode Bug">. For all of these interfaces Perl -currently (as of 5.8.3) simply assumes byte strings both as arguments +currently (as of v5.16.0) simply assumes byte strings both as arguments and results, or UTF-8 strings if the (problematic) C<encoding> pragma has been used. One reason that Perl does not attempt to resolve the role of Unicode in @@ -1544,9 +1574,8 @@ are valid UTF-8. 
=item * -C<is_utf8_char(s)> returns true if the pointer points to a valid UTF-8 -character. However, this function should not be used because of -security concerns. Instead, use C<is_utf8_string()>. +C<is_utf8_char_buf(buf, buf_end)> returns true if the pointer points to +a valid UTF-8 character. =item * @@ -1722,7 +1751,7 @@ to work under 5.6, so you should be safe to try them out. A filehandle that should read or write UTF-8 - if ($] > 5.007) { + if ($] > 5.008) { binmode $fh, ":encoding(utf8)"; } @@ -1733,10 +1762,10 @@ A scalar that is going to be passed to some extension Be it Compress::Zlib, Apache::Request or any extension that has no mention of Unicode in the manpage, you need to make sure that the UTF8 flag is stripped off. Note that at the time of this writing -(October 2002) the mentioned modules are not UTF-8-aware. Please +(January 2012) the mentioned modules are not UTF-8-aware. Please check the documentation to verify if this is still true. - if ($] > 5.007) { + if ($] > 5.008) { require Encode; $val = Encode::encode_utf8($val); # make octets } @@ -1748,7 +1777,7 @@ A scalar we got back from an extension If you believe the scalar comes back as UTF-8, you will most likely want the UTF8 flag restored: - if ($] > 5.007) { + if ($] > 5.008) { require Encode; $val = Encode::decode_utf8($val); } @@ -1757,7 +1786,7 @@ want the UTF8 flag restored: Same thing, if you are really sure it is UTF-8 - if ($] > 5.007) { + if ($] > 5.008) { require Encode; Encode::_utf8_on($val); } @@ -1770,14 +1799,14 @@ When the database contains only UTF-8, a wrapper function or method is a convenient way to replace all your fetchrow_array and fetchrow_hashref calls. A wrapper function will also make it easier to adapt to future enhancements in your database driver. Note that at the -time of this writing (October 2002), the DBI has no standardized way +time of this writing (January 2012), the DBI has no standardized way to deal with UTF-8 data. 
Please check the documentation to verify if that is still true. sub fetchrow { # $what is one of fetchrow_{array,hashref} my($self, $sth, $what) = @_; - if ($] < 5.007) { + if ($] < 5.008) { return $sth->$what; } else { require Encode; @@ -1813,7 +1842,7 @@ Scalars that contain only ASCII and are marked as UTF-8 are sometimes a drag to your program. If you recognize such a situation, just remove the UTF8 flag: - utf8::downgrade($val) if $] > 5.007; + utf8::downgrade($val) if $] > 5.008; =back diff --git a/gnu/usr.bin/perl/pod/perlunifaq.pod b/gnu/usr.bin/perl/pod/perlunifaq.pod index 9bd103c9ac2..f952d1a3f91 100644 --- a/gnu/usr.bin/perl/pod/perlunifaq.pod +++ b/gnu/usr.bin/perl/pod/perlunifaq.pod @@ -141,16 +141,16 @@ concern, and you can just C<eval> dumped data as always. Starting in Perl 5.14 (and partially in Perl 5.12), just put a C<use feature 'unicode_strings'> near the beginning of your program. Within its lexical scope you shouldn't have this problem. It also is -automatically enabled under C<use feature ':5.12'> or using C<-E> on the -command line for Perl 5.12 or higher. +automatically enabled under C<use feature ':5.12'> or C<use v5.12> or +using C<-E> on the command line for Perl 5.12 or higher. The rationale for requiring this is to not break older programs that rely on the way things worked before Unicode came along. Those older programs knew only about the ASCII character set, and so may not work properly for additional characters. When a string is encoded in UTF-8, Perl assumes that the program is prepared to deal with Unicode, but when -the string isn't, Perl assumes that only ASCII (unless it is an EBCDIC -platform) is wanted, and so those characters that are not ASCII +the string isn't, Perl assumes that only ASCII +is wanted, and so those characters that are not ASCII characters aren't recognized as to what they would be in Unicode. 
C<use feature 'unicode_strings'> tells Perl to treat all characters as Unicode, whether the string is encoded in UTF-8 or not, thus avoiding diff --git a/gnu/usr.bin/perl/pod/perluniintro.pod b/gnu/usr.bin/perl/pod/perluniintro.pod index 8ce4b7b4464..c0cca15194d 100644 --- a/gnu/usr.bin/perl/pod/perluniintro.pod +++ b/gnu/usr.bin/perl/pod/perluniintro.pod @@ -137,7 +137,7 @@ forms>, of which I<UTF-8> is perhaps the most popular. UTF-8 is a variable length encoding that encodes Unicode characters as 1 to 6 bytes. Other encodings include UTF-16 and UTF-32 and their big- and little-endian variants -(UTF-8 is byte-order independent) The ISO/IEC 10646 defines the UCS-2 +(UTF-8 is byte-order independent). The ISO/IEC 10646 defines the UCS-2 and UCS-4 encoding forms. For more information about encodings--for instance, to learn what @@ -145,12 +145,12 @@ I<surrogates> and I<byte order marks> (BOMs) are--see L<perlunicode>. =head2 Perl's Unicode Support -Starting from Perl 5.6.0, Perl has had the capacity to handle Unicode -natively. Perl 5.8.0, however, is the first recommended release for +Starting from Perl v5.6.0, Perl has had the capacity to handle Unicode +natively. Perl v5.8.0, however, is the first recommended release for serious Unicode work. The maintenance release 5.6.1 fixed many of the problems of the initial Unicode implementation, but for example regular expressions still do not work with Unicode in 5.6.1. -Perl 5.14.0 is the first release where Unicode support is +Perl v5.14.0 is the first release where Unicode support is (almost) seamlessly integrable without some gotchas (the exception being some differences in L<quotemeta|perlfunc/quotemeta>, which is fixed starting in Perl 5.16.0). To enable this @@ -159,12 +159,12 @@ automatically selected if you C<use 5.012> or higher). See L<feature>. (5.14 also fixes a number of bugs and departures from the Unicode standard.) 
-Before Perl 5.8.0, the use of C<use utf8> was used to declare +Before Perl v5.8.0, the use of C<use utf8> was used to declare that operations in the current block or file would be Unicode-aware. This model was found to be wrong, or at least clumsy: the "Unicodeness" is now carried with the data, instead of being attached to the operations. -Starting with Perl 5.8.0, only one case remains where an explicit C<use +Starting with Perl v5.8.0, only one case remains where an explicit C<use utf8> is needed: if your Perl script itself is encoded in UTF-8, you can use UTF-8 in your identifier names, and in string and regular expression literals, by saying C<use utf8>. This is not the default because @@ -176,7 +176,7 @@ Perl supports both pre-5.6 strings of eight-bit native bytes, and strings of Unicode characters. The general principle is that Perl tries to keep its data as eight-bit bytes for as long as possible, but as soon as Unicodeness cannot be avoided, the data is transparently upgraded -to Unicode. Prior to Perl 5.14, the upgrade was not completely +to Unicode. Prior to Perl v5.14.0, the upgrade was not completely transparent (see L<perlunicode/The "Unicode Bug">), and for backwards compatibility, full transparency is not gained unless C<use feature 'unicode_strings'> (see L<feature>) or C<use 5.012> (or higher) is @@ -415,7 +415,7 @@ streams, use explicit layers directly in the C<open()> call. You can switch encodings on an already opened stream by using C<binmode()>; see L<perlfunc/binmode>. -The C<:locale> does not currently (as of Perl 5.8.0) work with +The C<:locale> does not currently work with C<open()> and C<binmode()>, only with the C<open> pragma. The C<:utf8> and C<:encoding(...)> methods do work with all of C<open()>, C<binmode()>, and the C<open> pragma. 
diff --git a/gnu/usr.bin/perl/pod/perlutil.pod b/gnu/usr.bin/perl/pod/perlutil.pod index 040f51d5f65..3f53ad0fa51 100644 --- a/gnu/usr.bin/perl/pod/perlutil.pod +++ b/gnu/usr.bin/perl/pod/perlutil.pod @@ -232,7 +232,7 @@ came along modules included in the perl distribution. B<piconv> is a Perl version of B<iconv>, a character encoding converter widely available for various Unixen today. This script was primarily a -technology demonstrator for Perl 5.8.0, but you can use piconv in the +technology demonstrator for Perl v5.8.0, but you can use piconv in the place of iconv for virtually any case. =item L<ptar> |