You can download perl 5.8.0 from Hobbes.
There are a couple of config.sys statements you may need.
Depending on how perl was built, it might have compiled the drive letter
of the install drive, as well as the path, into its internal scripts. Such
as e:/usr/lib/perl/lib
- if the builder is really smart he will have built
it as /usr/lib/perl/lib
. In the second case you can move the whole
directory tree to whatever drive you like and it will work. In the
first case you need the following environment variable, which changes all
occurrences of where it was built to where it is now installed on
the fly.
SET PERLLIB_PREFIX=e:/usr/lib/perl/lib;X:\usr\lib\perl\lib
Where X: is, as usual, the drive you have /usr/lib/perl/lib installed on.
Perl will also be looking for a *nix type shell to run things such as
system calls in, so the second config.sys statement you may need is
SET PERL_SH_DIR=X:\BIN
Which tells perl where to find a shell executable.
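If you are not sure whether these settings have taken effect, a few lines of perl will tell you. This is only a sketch: envcheck.pl and describe_setting are names made up here for illustration, and all the script does is report the two variables and print @INC, the list of directories perl searches for its libraries.

```perl
#!/usr/local/bin/perl
# envcheck.pl - hypothetical helper, not part of the perl distribution
use strict;
use warnings;

# Return a one-line description of an environment variable.
sub describe_setting {
    my ($name) = @_;
    return defined $ENV{$name}
        ? "$name=$ENV{$name}"
        : "$name is not set";
}

print describe_setting($_), "\n" foreach qw(PERLLIB_PREFIX PERL_SH_DIR);

# @INC is the list of directories perl searches for library files.
print "Library search path:\n";
print "  $_\n" foreach @INC;
```

If the directories listed still show the drive perl was built on rather than the one it now lives on, PERLLIB_PREFIX is the first thing to check.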
Perl is yet another scripting language, like REXX, python etc. Like REXX, it was invented by one man, in this case Larry Wall, and his book "Programming Perl" is a very good introduction. Known by all perl addicts as "The camel book", it is published by O'Reilly, who, incidentally, put a lot of support into perl. They also publish some very handy "Pocket References" on various subjects including perl and HTML. Not all bookstores stock them though.
Officially perl is the "Practical Extraction and Report Language", unofficially it has been referred to as the "Pathologically Eclectic Rubbish Lister". The perl motto is TMTOWTDI - There's More Than One Way To Do It - as, given more than one perl programmer and a task, they will all come up with different code. All of which works and is "correct".
Those of you who know REXX will know that the OS/2 command interpreter will run
as a batch file any file ending with .cmd - If that file starts with a /* */
type comment line then it will pass the rest of the script to the REXX
interpreter. However, it does not know about perl. There are two ways around
this.
You can have perl.exe in your path so that you can type
perl <name of perl script>
(Note perl scripts are normally
suffixed .pl). The other is to suffix the script with .cmd and put
extproc perl
as the first line of the script. With this method
the OS/2 command processor loads perl and passes it the rest of the script.
The main disadvantage of the extproc approach is the script will no longer run
on a *nix box and intelligent editors will not know they are editing perl,
which they do from the .pl suffix.
Because I write and run scripts on both OS/2 and unix, and like my editor
to know I am working with perl and not REXX, which most default to if they see
.cmd, all the following examples are *nix style. They will work in OS/2 if you
cut and paste them and invoke with:
perl <name of perl script>
So helloworld.cmd looks like:

    extproc perl
    print "Hello World\n";

and is run by just typing helloworld.
helloworld.pl on the other hand looks like:
print "Hello World\n";
and is run by typing perl helloworld.pl.
Both produce the same
result - the string "Hello World" followed by a line feed. (That's the \n for
you non C types.)
Before we go any further, I must mention the shebang line. Shebang is hallowed
unix speak for the first line of a script that starts #!. What
follows is the program to execute the rest of the script. So perl scripts on
unix tend to start
#!/usr/local/bin/perl -w
Or whatever the path to the perl executable is.
The -w above is a switch to tell perl to turn on warning messages.
Putting it in the script makes sure we don't forget any switches.
This works on OS/2 in
so far as the switches are obeyed. Like all good unix programmes perl has a
lot of command line switches. The only one you may need on the command
line itself is -T, which turns on "taint mode" - more of that later.
Note that you can have a shebang line to set switches and use extproc to run the script. They are not mutually exclusive.
Now you know how to run perl scripts, let's take a look at the syntax. Perl
statements are terminated by a semicolon (;) and because of this they can
cross physical lines.
Unless inside quoted strings, whitespace is generally not significant.
Anything after a hash (#) sign is taken as comments to the end of the
current input line. How long should a line be? Keep it readable.
Traditionalists will use 72 character lines harking back to 80 column punch
cards where cc73-80 were used to sequence number the deck - they often used
to get dropped!
So you can't have a block of comments REXX style as:

    /* some comments
       and more comments
       ending here */

You have to code them like this:

    # some comments
    # and more comments
    # ending here
    #
Unlike REXX, perl is a "Data Typed" language. This means the name of each variable tells perl what type of variable it is. For this article we will only consider scalar, array and hash types. Scalars start with a $ sign, arrays with an @ sign and hashes with a % sign. A scalar can hold any type of data - string, number etc. This is also true of array elements. You can have an array where element 0 (perl starts indexing from 0 BTW) is a number, element 1 is a string and element 2 a reference to something else. We are not going into references in any depth in this article, for now just take it as read that an array element can contain just about anything - including, by way of a reference, another array. As indeed can scalars.
Let's look at some code:
    #!/usr/local/bin/perl -w
    # scalar1.pl
    $thing = 0;
    # single quotes stop substitution
    print 'thing = $thing\n';
    print "\n";
    # double quotes allow substitution
    print "thing = $thing\n";
    {
        $somethingelse = 1;
        $thing++; # this is a quick way of incrementing by one.
        print "thing = $thing somethingelse = $somethingelse\n";
    }
    print "thing = $thing somethingelse = $somethingelse\n";
Running this as perl scalar1.pl
gives:
    thing = $thing\n
    thing = 0
    thing = 1 somethingelse = 1
    thing = 1 somethingelse = 1
Two points here - putting something in single quotes stops substitution unlike
REXX where what you quote with does not matter as long as they match. If you
need to put the quote character inside the quoted string, escape it with
a backslash (\)
"She said \"Oh dear\"."
The second point is the wiggly braces.
These denote a "code block" and are usually found after ifs, whiles etc.
The point here is that a variable inside the {} is not always the same as the same named variable outside the {}. Technically this is known as the "scope" of the variable. Let's hack the code around and see what happens.
    #!/usr/local/bin/perl -w
    # scalar2.pl
    $thing = 0;
    print "\n";
    print "thing = $thing\n";
    $somethingelse = 0;
    {
        $somethingelse = 1;
        $thing++; # this is a quick way of incrementing by one.
        print "thing = $thing somethingelse = $somethingelse\n";
    }
    print "thing = $thing somethingelse = $somethingelse\n";
Running this as perl scalar2.pl
gives:
    thing = 0
    thing = 1 somethingelse = 1
    thing = 1 somethingelse = 1
The variable $somethingelse is set before the code block so exists inside the block and afterwards. Now let's make it local inside the block. We do that by prefixing it with "my".
    #!/usr/local/bin/perl
    # scalar3.pl
    #use strict;
    #use warnings;
    $thing = 0;
    print "\n";
    print "thing = $thing\n";
    $somethingelse = 0;
    # lots and lots of code
    $somethingelse = 1;
    # this code block could be an if or a loop
    {
        my $somethingelse = 2;
        $thing++; # this is a quick way of incrementing by one.
        print "thing = $thing somethingelse = $somethingelse\n";
    }
    print "thing = $thing somethingelse = $somethingelse\n";
Running this as perl scalar3.pl
gives:
    thing = 0
    thing = 1 somethingelse = 2
    thing = 1 somethingelse = 1
Now see what happens? There are now two variables called $somethingelse, one
inside the {} and one outside. Obviously here is scope for great confusion, so
there is a bit of magic perl can help us with: use strict;
If we put that at the start of our script then every variable will need
to be declared with "my", and an error message will be issued for any
variable that has not been declared, so it is always clear which copy of a
variable is meant.
I advise you to always use it.
Cut and paste the above into an editor and then try it.
Uncomment the use statements, then try removing the lines that set
$somethingelse outside the code block, and try removing the "my" inside the code block.
Try combinations of these.
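For comparison, here is a sketch of how the same experiment might look once everything is declared with "my" so that it runs cleanly under use strict. The file name scope.pl and the @seen array are illustrative additions, not part of the examples above.

```perl
#!/usr/local/bin/perl
# scope.pl - hypothetical strict-clean version of the scalar examples
use strict;
use warnings;

my $thing = 0;
my $somethingelse = 1;
my @seen;                      # collects the lines we are going to print

{
    my $somethingelse = 2;     # a new variable, visible only in this block
    $thing++;                  # this is the outer $thing
    push @seen, "inside: thing = $thing somethingelse = $somethingelse";
}
push @seen, "outside: thing = $thing somethingelse = $somethingelse";

print "$_\n" foreach @seen;
```

The inner "my" creates a second $somethingelse that disappears at the closing brace, so the outer one still holds 1 afterwards, while $thing, which was not redeclared, keeps its incremented value.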
Perl uses different operators for testing numbers and strings.
It is easy to remember which as numbers use symbols and strings use letters.
Operation | On Numbers | On Strings |
---|---|---|
Less than | < | lt |
Less than or equal | <= | le |
Equal | == | eq |
Not equal | != | ne |
Greater than or equal | >= | ge |
Greater than | > | gt |
Compare * | <=> | cmp |
* <=> is known as the "spaceship operator", both types of compare return -1, 0, +1, for less than, equal to & greater than. Note that you might get away with using the wrong type of operator without getting an error, but the result will certainly not be what you want or expect.
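A small sketch of why the choice of operator matters. The values are made up for illustration; the point is only that the same data gives different answers depending on whether it is compared as numbers or as strings.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100);

# < compares numeric values, lt compares character by character
my $numeric_less = (10 < 9)      ? 'true' : 'false';
my $string_less  = ('10' lt '9') ? 'true' : 'false';

# the same split applies to sorting: <=> for numbers, cmp for strings
my @by_number = sort { $a <=> $b } @numbers;
my @by_string = sort { $a cmp $b } @numbers;

print "10 < 9 is $numeric_less\n";
print "'10' lt '9' is $string_less\n";
print "numeric sort: @by_number\n";
print "string sort:  @by_string\n";
```

As strings, "10" sorts before "9" because the character 1 comes before the character 9, which is rarely what you want for numbers.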
Now let's have a look at arrays. Array names start with an @ sign. They are 0
indexed and individual elements are referenced by $arrayname[element #].
Arrays are really lists and operate in "list context"; however, if they are
referenced in scalar context they return the size of the array:
    #!/usr/local/bin/perl
    # array.pl
    use strict;
    use warnings;
    my @array;
    $array[0] = 1;
    $array[1] = 'thing';
    $array[2] = 3;
    print "@array \n";
    print $#array."\n";   # $#array is the index of the last element
    my $i = @array;       # scalar context gives the number of elements
    print "$i \n";
Results in

    1 thing 3
    2
    3
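The difference between list and scalar context is easier to see side by side. This is a minimal sketch with illustrative variable names; it also shows push and negative indexes, which are not covered above.

```perl
use strict;
use warnings;

my @array = (1, 'thing', 3);

my $count   = @array;      # scalar context: the number of elements
my $last    = $#array;     # index of the last element
my ($first) = @array;      # list context: the brackets make it a list
                           # assignment, so $first gets element 0

push @array, 'new';        # add an element to the end
my $final = $array[-1];    # negative indexes count back from the end

print "count = $count, last index = $last, first = $first, final = $final\n";
```

The only difference between the $count and $first assignments is the pair of brackets on the left, which is all it takes to switch from scalar to list context.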
Hashes on the other hand are much more fun. They start with a % sign and
individual elements are referenced by $hashname{element_name}.
The element name is more correctly known as the "key".
Note that hashes are not stored in any particular order.
They can be sorted however, either by key or by value. The following example shows this.
    #!/usr/local/bin/perl -w
    # hash4.pl
    use strict;
    my %hash;
    my $thing;
    my $key;
    # define a hash with keys in jumbled order
    #
    $hash{'a string'} = 'zzzzzzzz';
    $hash{'b string'} = 'string';
    $hash{'c string'} = 'string';
    $hash{'12345678'} = 'another string';
    # the hash is stored as a list
    # of key/value pairs
    # this iterates over the list
    foreach $thing (%hash) {
        print "$thing\n";
    }
    # the above printed keys and values on separate lines.
    # to get key/value pairs we must tell it to only process the keys thus
    print "\n\n";
    foreach $key (keys %hash) {
        print "Key: \"$key\" Value: \"$hash{$key}\"\n";
    }
    print "\n\n";
    # they still print in a random order though so we sort the keys
    foreach $key (sort keys %hash) {
        print "Key: \"$key\" Value: \"$hash{$key}\"\n";
    }
    # and here we sort by value. hash_by_value is a subroutine invoked by
    # the sort process - see below for more details
    #
    print "\n\n";
    foreach $key (sort hash_by_value keys %hash) {
        print "$key $hash{$key}\n";
    }
    exit;

    sub hash_by_value {
        # sort calls this routine with two values $a and $b
        # the code has to tell sort which order they should be in
        # it does this by returning:
        # -1 if $a is before $b
        #  0 if they are equal
        # +1 if $b is before $a
        # if the values we are sorting on are numeric we use the <=> operator
        # if the values are strings we use the cmp operator
        # the || syntax runs if the left hand result is 0
        # so here we are sorting on the string value of the hash and if the
        # values are equal we sort on the key
        $hash{$a} cmp $hash{$b} || $a cmp $b;
        # we don't need to return anything as
        # 1) perl supplies a return by default
        # 2) if a return value is not specified perl returns the
        #    value of the last expression
    }
Results in:
    b string
    string
    12345678
    another string
    c string
    string
    a string
    zzzzzzzz


    Key: "b string" Value: "string"
    Key: "12345678" Value: "another string"
    Key: "c string" Value: "string"
    Key: "a string" Value: "zzzzzzzz"


    Key: "12345678" Value: "another string"
    Key: "a string" Value: "zzzzzzzz"
    Key: "b string" Value: "string"
    Key: "c string" Value: "string"


    12345678 another string
    b string string
    c string string
    a string zzzzzzzz
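Sorting by value is just as useful when the values are numbers. The following is a sketch only, with made-up data, that counts how often each name appears and lists the names most frequent first. It also uses exists and delete, two hash operations not shown above.

```perl
use strict;
use warnings;

my %count;
# ++ on a key that does not exist yet creates it and sets it to 1
$count{$_}++ foreach qw(wilma fred betty fred barny fred wilma);

# highest count first (note $b before $a), ties broken by key
my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
print "$_ $count{$_}\n" foreach @ranked;

# exists tests for a key without creating it
my $has_dino = exists $count{dino} ? 'yes' : 'no';

# delete removes a key and its value
delete $count{barny};
my $remaining = keys %count;    # scalar context: the number of keys left

print "dino present: $has_dino, keys left: $remaining\n";
```

Here the comparison is written inline as a block rather than as a named subroutine; for a one-line comparison that is usually easier to read.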
Perl has a slightly odd definition of true and false. The number 0, the
strings "0" and "", the empty list and the undefined value are all false;
everything else is true.
An undefined value is a variable that has no value - perl does not default
values when you declare the variable as some other languages do. REXX for
example defaults the value of a variable to its name in upper case.
So given the following code snippet
my $var1 = 3; my $var2;
Then $var1 has the value 3 and $var2 has no
value and is undefined. Note that undefined is not zero - it is no
value. Variables may be made undefined. Why would you want to? Well many modules
return undefined if there is an error or no data to return.
You can test for definition thus:
if ( defined $var ) # test if $var has a value
Or conversely
if ( ! defined $var ) # test if $var has no value
Undefined is similar in concept, and as confusing, as NULL in database speak.
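To see how true, false and undefined relate, here is a minimal sketch. The describe subroutine is a hypothetical helper written for this illustration; it reports whether a value is undefined, and otherwise whether perl treats it as true or false.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a value as undefined, true or false.
sub describe {
    my ($value) = @_;
    return 'undefined' unless defined $value;
    return $value ? 'true' : 'false';
}

my $unset;    # declared but never given a value

my %result = (
    'number 0'     => describe(0),
    'string 0'     => describe('0'),
    'empty string' => describe(''),
    'string 0.0'   => describe('0.0'),   # not the same string as '0'
    'a space'      => describe(' '),
    'unset'        => describe($unset),
);

print "$_: $result{$_}\n" foreach sort keys %result;
```

The catch is that the test is on the string, not the number: '0.0' and ' ' are both true even though they might look like nothing much.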
Perl implements powerful regular expressions and here are a couple of examples.
I had to parse some XML consisting of employee records.

    <data>
    <EmployeeProfile>
    <name>Fred Flintstone</name>
    <town>Bedrock</town>
    <spouse>Wilma</spouse>
    </EmployeeProfile>
    <EmployeeProfile>
    <name>Barny Rubble</name>
    <town>Bedrock</town>
    <spouse>Betty</spouse>
    </EmployeeProfile>
    </data>

The data was around 120,000 records with many more key/value pairs. However, there was no deeper nesting of keys and no attributes to worry about. Also, the API I was going to use to pass the data to another application that did not understand XML took a key/value hash as one of its parameters. Handy :-) Now the code:

    use strict;
    use warnings;
    my %inputkeyvalue;
    open EMPS, "<emp.xml" or die "Can't open input data $!\n";
    $/ = "<\/EmployeeProfile>"; # change line ending
    while (<EMPS>) {
        (%inputkeyvalue) = m/<(\w+)>(.*)<\/\1>/g;
        foreach my $key (sort keys %inputkeyvalue) {
            print "$key $inputkeyvalue{$key}\n"
        }
        print "\n";
    }

So how does it work? First I change what perl thinks of as line end. This lives in a special variable "$/", so by setting
$/ = "<\/EmployeeProfile>";
each read from EMPS returns one complete employee record rather than one line. Then
(%inputkeyvalue) = m/<(\w+)>(.*)<\/\1>/g;
matches every <key>value</key> pair in the record and, because the match is in list context with the g modifier, returns the keys and values as one list that loads straight into the hash. The output is:

    name Fred Flintstone
    spouse Wilma
    town Bedrock

    name Barny Rubble
    spouse Betty
    town Bedrock
Now that was all I needed, but just in case someone asks "How would you sort that?" We need to get into "references". I don't want to get into that here in any detail, but simply put a reference can be thought of as a pointer to something rather than the something itself.
    use strict;
    use warnings;
    my %inputkeyvalue;
    my @emps; # an array to hold hashes
    open EMPS, "<emp.xml" or die "Can't open input data $!\n";
    $/ = "<\/EmployeeProfile>"; # change line ending
    while (<EMPS>) {
        push @emps , {}; # put pointer to an anonymous hash on end of array
        (%{$emps[-1]}) = m/<(\w+)>(.*)<\/\1>/g; # fill that hash
        delete $emps[-1] if ! keys %{$emps[-1]}; # drop array element if empty. ie last
    }
    foreach $_ (sort by_name @emps) {
        foreach my $key ( sort keys %{$_}) {
            print "$key ${$_}{$key}\n";
        }
        print "\n";
    }
    exit 0;

    sub by_name {
        ${$a}{'name'} cmp ${$b}{'name'};
    }

Giving:

    name Barny Rubble
    spouse Betty
    town Bedrock

    name Fred Flintstone
    spouse Wilma
    town Bedrock
The second example was from my web site that contains a searchable archive of documents. After it went live some bright spark said it would be really cool if, having opened a document the search had turned up, the searched for text was highlighted. I suddenly realised that if the search results screen held a few extra pieces of information and if I used a button to view the document rather than a standard link I could do it. The essential bits of the perl cgi script that pressing the button invoked are as follows, where $ss contains the search string and NEWSLETTER is an open file handle pointing to the document.
    undef $/;           # This undefines the line ending so it will read the
                        # whole of the document in one go.
    $_ = <NEWSLETTER>;  # read the lot - the <> operator is i/o on handle
    s/($ss)/\<font color="#ff0000"\>$1\<\/font\>/gis;
Taking it apart, s/....../ is the search and replace operator. If you don't tell it what to operate on it defaults to $_ (so does match BTW). So it finds the search string $ss and replaces that with itself, $1, surrounded with font and end-font tags (Thanks, Brian!) to set the colour to red. It does this globally (g), case insensitively (i) and treats the whole string as one line (s). So in one pass we have highlighted in red every occurrence of the search string. Pretty neat huh?
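One thing to watch: a search string typed in by a user may contain characters such as . or * that mean something special inside a regular expression. The sketch below wraps the same substitution in a subroutine and uses \Q...\E to make perl treat the search string literally. The highlight subroutine and the sample text are illustrative assumptions, not the code from the web site.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub highlight {
    my ($text, $ss) = @_;
    # \Q...\E quotes any regex metacharacters in $ss so that
    # the search string is matched exactly as typed
    $text =~ s/(\Q$ss\E)/<font color="#ff0000">$1<\/font>/gis;
    return $text;
}

my $doc = "Perl 5.8 is here.\nperl 5.8 runs on OS/2.\n";
print highlight($doc, 'perl 5.8');
```

Without the \Q...\E the dot in "5.8" would match any character, so "5x8" would be highlighted too.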
When you use perl for cgi scripts someone will try and hack your web server by
breaking the scripts with bad data. Perl has a taint mode that is set by the -T
command line switch and perl will then whinge about any unsafe use of data
that has been obtained from the web source, amongst other places. You have to
validate and "clean" such values using regular expressions before using them.
An example might be an email address. Now it would be reasonable to assume that
the email address be used in a reply. This could be done by running a
system command like
`mail email-address message`
Perl would normally run this in a
shell as the userid the web server was running as. Now consider if the
"email address" consisted of the command line command separator character and
then some nasty command like "rm -R /*". Ouch in spades. Now perhaps you see why
taint mode is a Good Thing (tm).
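The usual way to "clean" a value is to match it against a pattern describing what you are prepared to accept and then use only the captured part, since text captured by a regular expression match is what taint mode regards as clean. The following is a sketch only: clean_email is a hypothetical helper and its pattern is a deliberately narrow assumption, not a complete definition of a valid email address.

```perl
use strict;
use warnings;

# Hypothetical validator: returns the cleaned address, or undef
# if the input does not look like a plain email address.
sub clean_email {
    my ($raw) = @_;
    return unless defined $raw;
    # \A and \z anchor the match to the whole string, so nothing
    # can be smuggled in before or after the address
    if ($raw =~ /\A([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)\z/) {
        return $1;    # use the captured value, never the raw input
    }
    return;
}

foreach my $input ('fred@bedrock.example', 'fred@bedrock.example; rm -R /*') {
    my $address = clean_email($input);
    print defined $address ? "accepted: $address\n" : "rejected: $input\n";
}
```

The point is to list what is allowed rather than what is forbidden: anything containing a semicolon, a space or a slash simply fails to match and never gets near the shell.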
And finally, how can anyone not like a language that boasts an
unless statement?