It would be naive to assume that A-Za-z defines all the letters.
Perl understands language-specific data through the standardized (ISO C, XPG4, POSIX 1.c) method called ``the locale system''. The locale system is controlled per application with one function call and several environment variables.
At run time you can switch locales with POSIX::setlocale().
The first argument of setlocale() is called the category and the second argument the locale. The category tells in which aspect of data processing language-specific rules should apply; the locale tells which language-country/territory-codeset to use. Read on for how locales are named, though: not all systems name them the same way.
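As a minimal sketch (assuming a POSIX-conforming system; the helper name with_ctype_locale and the locale name fr_CA.ISO8859-1 are illustrations only), the following saves the current LC_CTYPE, switches to another locale, and restores the old setting afterwards. Called with one argument, setlocale() only queries the current locale; called with two, it sets the locale and returns its name, or undef if that locale is not installed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: run $code with LC_CTYPE set to $locale, then put the
# previous LC_CTYPE back. Returns 0 if $locale is not installed.
sub with_ctype_locale {
    my ($locale, $code) = @_;
    my $old = setlocale(LC_CTYPE);                 # one argument: query only
    return 0 unless defined setlocale(LC_CTYPE, $locale);
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    setlocale(LC_CTYPE, $old);                     # restore even if $code died
    die $err unless $ok;
    return 1;
}

# "fr_CA.ISO8859-1" (French, Canada, ISO 8859-1) is only an example name.
with_ctype_locale("fr_CA.ISO8859-1", sub {
    print "LC_CTYPE is now ", setlocale(LC_CTYPE), "\n";
}) or warn "fr_CA.ISO8859-1 is not installed here\n";
```

Restoring inside the helper keeps the locale change local to one piece of code, which matters because the locale is process-wide state.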
For further information about the categories, consult your setlocale(3) manual. To find the locales available on your system, also consult the setlocale(3) manual and see whether it leads you to a list of the available locales (look in its SEE ALSO section). If that fails, try the following commands at the command line:
Unfortunately, even though the calling interface has been standardized, the names of the locales have not. Locale names usually take the form language-country/territory-codeset, but the latter parts may be missing. Two locales deserve special mention:
``C'' and ``POSIX''
Currently these are effectively the same locale: the difference is mainly that the first is defined by the C standard and the second by the POSIX standard. Together they define the default locale in which every program starts. Its language is (American) English and its character codeset is ASCII.
NOTE: not all systems have the ``POSIX'' locale (not all systems are POSIX): use the ``C'' locale when you need the default locale.
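Because locale names are not standardized, a portable program may have to probe several spellings of the same locale. The sketch below is hypothetical (the helper first_available_locale and the candidate names are illustrations, not names your system is guaranteed to have): it tries each name with setlocale(), restores the previous setting after a successful probe, and ends the list with ``C'' rather than ``POSIX'', following the note above.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Hypothetical helper: return the first name that setlocale() accepts.
# setlocale() returns undef when a locale is not installed.
sub first_available_locale {
    my @candidates = @_;
    my $saved = setlocale(LC_ALL);        # query the current setting
    for my $name (@candidates) {
        next unless defined setlocale(LC_ALL, $name);
        setlocale(LC_ALL, $saved);        # we were only probing
        return $name;
    }
    return;                               # none of the names is installed
}

# Hypothetical spellings of a German locale, with "C" (not "POSIX")
# as the portable last resort.
my $loc = first_available_locale(
    "de_DE.ISO8859-1", "de_DE", "de", "german", "german.iso88591", "C");
print "selected: $loc\n";
```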
The \w escape in regular expressions stands for alphanumeric characters, that is, alphabetic and numeric characters, plus the underscore (please consult the perlre manpage for more information about regular expressions). Thanks to LC_CTYPE, depending on your locale settings, characters like Æ, É, ß, and ø can be understood as \w characters.
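The following sketch reports whether each of the letters mentioned above matches \w. It assumes a current perl, where \w honours LC_CTYPE only within the scope of the use locale pragma, and it builds the characters from their ISO 8859-1 code points. The output depends on the locale the environment selects and on your perl version; in the ``C'' locale they should not match. is_word_char is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;   # on current perls, \w consults LC_CTYPE only under this pragma

setlocale(LC_CTYPE, "");   # LC_CTYPE from LC_ALL, LC_CTYPE or LANG

# Hypothetical helper: 1 if the character matches \w in the current locale.
sub is_word_char {
    my ($c) = @_;
    return $c =~ /\w/ ? 1 : 0;
}

# ISO 8859-1 code points of Æ, É, ß and ø.
for my $code (198, 201, 223, 248) {
    printf "chr(%d) %s \\w\n", $code,
        is_word_char(chr $code) ? "matches" : "does not match";
}
```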
In most Latin alphabets B follows A, but where do Á and Ä belong?
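One way to put this question to the current locale is POSIX::strcoll(), which compares two strings by the LC_COLLATE rules and returns a negative, zero, or positive number. A minimal sketch, assuming an ISO 8859-1 locale (Á and Ä are built from code points 193 and 196); collate_cmp is a hypothetical helper. A result of 1 against A and -1 against B means the letter sorts between them. In the ``C'' locale bytes are compared numerically, so both letters land after B.

```perl
use strict;
use warnings;
use POSIX qw(setlocale strcoll LC_COLLATE);

setlocale(LC_COLLATE, "");   # collation rules from LC_ALL, LC_COLLATE or LANG

# Hypothetical helper: compare two strings by the current LC_COLLATE rules
# and normalize strcoll()'s result to -1, 0 or 1.
sub collate_cmp {
    my ($x, $y) = @_;
    return strcoll($x, $y) <=> 0;
}

my $a_acute = chr 193;       # Á in ISO 8859-1
my $a_uml   = chr 196;       # Ä in ISO 8859-1
for my $ch ($a_acute, $a_uml) {
    printf "code point %d: vs A = %2d, vs B = %2d\n",
        ord($ch), collate_cmp($ch, "A"), collate_cmp($ch, "B");
}
```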
Here is a code snippet that lists the alphanumeric characters of the current locale, in locale order:
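A minimal sketch of such a snippet for a current perl, assuming the use locale pragma (which makes both \w and sort follow the locale); alnum_in_locale_order is a hypothetical name. It examines only the 256 single-byte characters, so it is meaningful mainly for single-byte locales such as ISO 8859-1. Remember that \w also matches the underscore.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use locale;   # \w and sort follow LC_CTYPE and LC_COLLATE from here on

setlocale(LC_ALL, "");   # locale from LC_ALL / LC_CTYPE / LC_COLLATE / LANG

# Hypothetical helper: every single-byte character that matches \w,
# sorted by the current locale's collation order.
sub alnum_in_locale_order {
    my @word_chars = grep { /\w/ } map { chr($_) } 0 .. 255;
    return sort @word_chars;
}

print join("", alnum_in_locale_order()), "\n";
```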
As noted above, this will work only for Perl versions 5.003_06 and up.
NOTE: in pre-5.003_06 Perl releases, per-locale collation was possible with the I18N::Collate library module. This module is now mildly obsolete and should be avoided. The LC_COLLATE functionality is integrated into the Perl core language, and scalar data can be used completely normally: there is no need to juggle with the scalar references of I18N::Collate.
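With LC_COLLATE in the core, ordinary string operators do what I18N::Collate objects used to do. In the sketch below (helper names hypothetical; assumes a perl that has the use locale pragma), lt follows LC_COLLATE inside the scope of use locale and compares character codes under no locale, so both behaviours can coexist in one program.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

setlocale(LC_COLLATE, "");   # collation from the environment

# Hypothetical helpers: plain scalars and plain operators, no wrapper objects.
sub locale_lt {
    use locale;              # lt/cmp/sort follow LC_COLLATE in this block
    my ($x, $y) = @_;
    return $x lt $y;
}

sub codepoint_lt {
    no locale;               # lt compares character codes in this block
    my ($x, $y) = @_;
    return $x lt $y;
}

for my $pair (["apple", "Banana"], ["Zebra", "apple"]) {
    my ($x, $y) = @$pair;
    printf "%-6s lt %-6s  locale: %-3s  codepoint: %s\n", $x, $y,
        (locale_lt($x, $y) ? "yes" : "no"), (codepoint_lt($x, $y) ? "yes" : "no");
}
```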
If LC_CTYPE is unset and LC_ALL is set, LC_ALL is used as the LC_CTYPE. If both LC_CTYPE and LC_ALL are unset but LANG is set, LANG is used as the LC_CTYPE. If none of these three is set, the default locale ``C'' is used as the LC_CTYPE.
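These fallback rules fit in a small pure function. The helper below is hypothetical, not part of Perl. It also assumes the POSIX rule that a non-empty LC_ALL takes precedence even when the category variable is set, and it treats empty values as unset. Passing "LC_COLLATE" instead of "LC_CTYPE" applies the same rules to collation.

```perl
use strict;
use warnings;

# Hypothetical helper: the locale name the environment selects for one
# category ("LC_CTYPE", "LC_COLLATE", ...). Order: LC_ALL, the category
# variable, LANG, then the default "C". Empty values count as unset.
sub locale_from_env {
    my ($category, $env) = @_;
    $env //= \%ENV;
    for my $var ("LC_ALL", $category, "LANG") {
        my $v = $env->{$var};
        return $v if defined $v && length $v;
    }
    return "C";              # nothing set: the default locale
}

print "LC_CTYPE would be: ", locale_from_env("LC_CTYPE"), "\n";
```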
If LC_COLLATE is unset and LC_ALL is set, LC_ALL is used as the LC_COLLATE. If both LC_COLLATE and LC_ALL are unset but LANG is set, LANG is used as the LC_COLLATE. If none of these three is set, the default locale ``C'' is used as the LC_COLLATE.
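To see which collation locale the environment actually selects, one can call setlocale(LC_COLLATE, "") and look at the returned name. A minimal sketch (collate_from_env is a hypothetical name; delete local needs perl 5.12 or later): note that it changes the program's LC_COLLATE as a side effect, and that setlocale() returns undef if the environment names a locale that is not installed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: re-read LC_COLLATE from the environment and return
# the selected locale name (undef if it is not installed).
sub collate_from_env {
    return setlocale(LC_COLLATE, "");
}

{
    delete local $ENV{LC_ALL};       # make sure LC_ALL does not override
    local $ENV{LC_COLLATE} = "C";    # request the default collation
    my $got = collate_from_env();
    print "LC_COLLATE is now ", (defined $got ? $got : "(not installed)"), "\n";
}
```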
LANG is used only as a last resort, when neither LC_ALL nor the category-specific LC_... variables are set.
Further locale-controlling environment variables exist (LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME), but Perl does not currently obey them.