|| Perl Tutorial, Part 2 ||
By Ch4r/Niels | nielsosky@gmail.com
http://binaryuniverse.net | irc.binaryuniverse.net #binaryuniverse
http://anomalous-security.org
http://st0rage.org
http://brain-hack.org
| Copy Info |
This tutorial may be redistributed under the conditions that it is not
modified in any way and full credit is given to the original author,
Ch4r.
| Introduction |
This tutorial seeks to build upon the basic knowledge introduced in my
first Perl tutorial. Before beginning this tutorial you should know
what the following are and how to use them:
- the print function
- variables
- arrays
- the if and if/else control structures
- the while and for/foreach loops
- the <STDIN> file handle used for receiving input from the user
If the meaning of any of those terms or how they are used is unclear,
I'd recommend you take a moment to read through my first Perl tutorial,
as the information presented in this article is useless without
knowledge of what was covered there.
As usual, please feel free to contact me if you have any feedback
related to the tutorial or if you spot errors. I hope you enjoy part two
of my series of Perl tutorials!
| Context |
I've decided to kick off my second Perl tutorial with a brief
discussion of context. 'Context' may sound intimidating, but it's
actually a fairly easy concept to grasp and doesn't require learning
any new functions, operators, or anything else directly implemented in
your code (of course, context does affect your code, or it wouldn't be
covered here; it just isn't any part of the code itself). Context is
simply the idea that Perl evaluates an expression differently depending
on what the code around it expects. If the surrounding code expects a
single value (scalar data), such as when the result is assigned to a
scalar variable or used in arithmetic, the expression is said to be in
scalar context. If the surrounding code expects a list of values (list
data, such as the contents of an array), for instance when the result
is assigned to an array, the expression is said to be in list context.
One function or operator may yield different results depending on which
of the two contexts it is used in.
An example of an operator that behaves differently in scalar context
than in list context is <STDIN>. The <STDIN> file handle reads input
from the user, as we've seen. So far we've only used it in scalar
context: in the last tutorial we used it to store input entered by the
user in a scalar variable, and in scalar context it reads one line.
What would it do in list context, though, such as when the input is
assigned to an array? The answer is that it keeps reading input from
the user, and each line (lines being separated with the Enter key)
becomes a separate element of the array. The user terminates all input
by entering the EOF (End Of File) character, which is usually Ctrl-D on
*nix and Ctrl-Z (followed by Enter) on Windows.
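For instance, suppose the user enters the three lines "line one", "line
two" and "line three" while <STDIN> is being used to assign the input
to the array @lines (as in @lines = <STDIN>;). Assuming the user pressed
Enter after each line, $lines[0] will be "line one\n", $lines[1] will be
"line two\n", and $lines[2] will be "line three\n": each element keeps
its trailing newline, which chomp(@lines) removes from every element,
just as chomp removes it from a single scalar.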
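To try this without typing at the keyboard, here is a minimal sketch
that assumes an in-memory filehandle (opened on a reference to a
string, a standard Perl feature since 5.8) as a stand-in for <STDIN>;
the names $typed and $input are made up for the example. Reading it in
list context behaves the same way: one element per line.
-----
use strict;
use warnings;

# Stand-in for the keyboard: the three lines the user would have typed.
my $typed = "line one\nline two\nline three\n";
open(my $input, '<', \$typed) or die "Cannot open in-memory input: $!";

# Assigning to an array puts <$input> in list context,
# so every line up to end-of-file becomes one element.
my @lines = <$input>;
close($input);

print "Before chomp: '$lines[0]'\n";   # the newline is still attached
chomp(@lines);                          # remove the newline from every element
print "After chomp:  '$lines[0]'\n";   # After chomp:  'line one'
-----
With the real <STDIN>, the assignment simply waits until the user types
the EOF character.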
Now suppose that, leaving @lines as it is, we were to add it to the
number 3 and assign the result to $result ($result = @lines + 3;). The
addition operator expects single values, so @lines is evaluated in
scalar context, even though it is an array. The resulting value stored
in $result is 6. Why? Because an array evaluated in scalar context
yields the number of elements it contains. In this case, @lines has 3
elements, so the expression @lines + 3 is another way of saying 3 + 3.
Thus, $result is assigned the value 6.
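A short sketch of the same calculation, with @lines filled in directly
rather than from <STDIN>. It also uses scalar(), a built-in not covered
in this tutorial, which asks for scalar context explicitly.
-----
use strict;
use warnings;

my @lines = ("line one", "line two", "line three");

# + expects single values, so @lines is evaluated in scalar context,
# where an array yields how many elements it holds.
my $result = @lines + 3;

# Assigning an array to a scalar variable is also scalar context.
my $count = @lines;

print "$result $count ", scalar(@lines), "\n";   # 6 3 3
-----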
| Hashes |
Hashes are like arrays in that they are used for storing multiple
values in one variable. They differ, however, in the fact that they
don't use index numbers for identifying separate values stored within
them. Rather, each value is identified with a string. The following is
a diagram of how hashes work compared to arrays:
@array:
    0 -> "a value"
    1 -> "another value"
    2 -> "this \nis \nthe \nthird \nelement"
    3 -> 78541
    4 -> "final element!"
%hash:
    "string"         -> "this is a value of the hash\n"
    "another string" -> "omg! Guess what?! Another value!"
    "blah"           -> 2498
    "endz0r"         -> "this is the end.\n\a"
There are a couple of things that should be noted from this diagram.
The first is that while array names are prefixed with @, the name of a
hash (also referred to as a hashed array) is prefixed with % (e.g.
%hash, %thing, %stuff, %etc). The second is that while arrays are
indexed with the predictable pattern of 0, 1, 2, 3, etc., hashes are
indexed with strings supplied by the coder, and their key/value pairs
are not kept in any predictable order. This limits the use of hashed
arrays in some ways (you can't rely on the order of the elements), but
also increases their flexibility in other areas (a value can be looked
up by a meaningful name).
One more thing to note about hashes is that while an element of an
array is referred to as $array[0], a value of a hash is referred to as
$hash{"key"}. Note that hashes use curly braces as opposed to square
brackets, and that, just as with arrays, a single element is prefixed
with $ because it is a single scalar value. Also note that the
equivalent of an index number in an array is referred to as a key when
used with a hash. For instance, in $hash{"thing"}, "thing" is the key.
There are two methods we can use to assign key/value pairs to a hash.
The first is the following:
-----
%hash = ("key 1", "value 1", "key 2", "value 2", "key 3", "value 3",
"etc", "etc..." );
-----
This assigns %hash the keys "key 1", "key 2", "key 3", and "etc" with
their values being "value 1", "value 2", "value 3", "etc..."
respectively. Thus, the first possible syntax for assigning keys and
values to a hash is to list a key and its value separated by a comma,
then the next key and value, and so on, with every key and value
separated by commas. Assigning a list to the hash this way sets the
whole hash at once (of course it can be modified later), overwriting
any pre-existing data.
However, this syntax is not easily readable by humans. To make it
clearer, the comma between each key and its value can be replaced with
"=>". This doesn't change anything other than how easily the code is
read by humans:
-----
%hash = ("key 1" => "value 1",
         "key 2" => "value 2",
         "key 3" => "value 3",
         "etc"   => "etc...");
-----
In this case we've also added some extra whitespace (namely, newlines),
but it produces exactly the same hash as the previous example did.
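If you'd like to check that claim, the sketch below builds the hash
both ways (under the made-up names %plain and %arrows) and compares
them key by key. It uses exists, a built-in not covered here, which
reports whether a key is present in a hash.
-----
use strict;
use warnings;

# Commas only: key, value, key, value, ...
my %plain = ("key 1", "value 1", "key 2", "value 2", "key 3", "value 3",
             "etc", "etc...");

# The same pairs written with =>.
my %arrows = ("key 1" => "value 1",
              "key 2" => "value 2",
              "key 3" => "value 3",
              "etc"   => "etc...");

# Same number of keys, and every key maps to the same value?
my $same = (keys(%plain) == keys(%arrows)) ? 1 : 0;
foreach my $key (keys %plain) {
    $same = 0 unless exists($arrows{$key}) && $arrows{$key} eq $plain{$key};
}
my $verdict = $same ? "identical" : "different";
print "$verdict\n";   # identical
-----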
The second way to assign a key/value pair to a hash is to assign a
single key/value pair directly, as in the following:
-----
$hash{"key"} = "the value!\n";
-----
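%hash now contains the string "the value!\n" with its key being simply
"key". Unlike the first method, this leaves any other key/value pairs
already in %hash alone; if "key" already existed, only its value is
replaced.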
This notation is also how individual values within hashes are referred
to in general. Assigning a value to a specific key is nowhere near the
only thing that can be done with it: the value could be printed, used
as part of a mathematical expression, assigned to another variable,
etc. The possibilities are the same as with scalar variables or array
elements.
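A minimal sketch of single-pair assignment and of using hash values the
way you'd use scalars; the hash %prices and its contents are invented
for the example.
-----
use strict;
use warnings;

my %prices;
$prices{"apple"}  = 0.5;     # add one key/value pair...
$prices{"banana"} = 0.25;    # ...and another; "apple" is untouched
$prices{"apple"}  = 0.75;    # same key again: only its value is replaced

my $total = $prices{"apple"} + $prices{"banana"};   # values in arithmetic
my $copy  = $prices{"banana"};                        # value copied to a scalar
print "apple: $prices{apple}, total: $total\n";       # apple: 0.75, total: 1
-----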
There are several functions which will come in handy when working with
hashes. If you're wondering what a function is, it is code written by
other people (well, not always by other people, but that definition
does the job until this tutorial introduces coding your own
subroutines) that performs a specific set of instructions on its
parameters. If you don't quite get that, it'll become clear as we work
with Perl's built-in functions.
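The first function I will illustrate in this paper is the delete
function. It deletes a key/value pair from a hash. For instance, the
following line deletes the key/value pair whose key is 'the_key' from
the hash %the_hash. (A simple key like this may be written without
quotes inside the braces; $the_hash{the_key} is the same as
$the_hash{"the_key"}.)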
-----
delete $the_hash{the_key};
-----
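A small sketch showing delete in action on a made-up %the_hash with two
pairs. delete also hands back the value it removed, which is stored in
$removed here.
-----
use strict;
use warnings;

my %the_hash = ("the_key" => "goes away", "other_key" => "stays");

# Remove the pair whose key is "the_key"; delete returns the old value.
my $removed = delete $the_hash{the_key};

print "Removed: $removed\n";          # Removed: goes away
foreach (keys %the_hash) {
    print "Left: $_\n";               # Left: other_key
}
-----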
Two more frequently used functions, which are similar to each other,
are the keys and values functions. They return a list of the keys of a
given hash and a list of that hash's values, respectively. For
instance, the following script prints all of the keys contained in
%hash, followed by all of its values:
----
#!/usr/bin/env perl
%hash = (
    "key_thing" => "this seems to be a string that is a value in a hash",
    "pi"        => 3.14,
    "last"      => "omg, guess what?! This is the last element!"
);
@keyz0rz = keys %hash;
@valuez0rz = values %hash;
print "Keys are:\n";
foreach (@keyz0rz) {
    print;
    print "\n";
}
print "\nThe values are:\n";
foreach (@valuez0rz) {
    print;
    print "\n";
}
----
The first line of this script, #!/usr/bin/env perl, has the same effect
as #!/usr/bin/perl. The difference is that env looks perl up in the
user's PATH, so no matter where the Perl interpreter is installed on
the user's system, it will be found as long as it is in the user's
path.
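The next few lines assign keys and values to %hash. Following the
assignment, the keys and values existing within %hash are assigned to
the appropriate arrays with the keys and values functions. Finally,
both arrays are printed to standard output within foreach loops. The
keys come out in no particular order, but keys and values return them
in the same order as each other (as long as the hash isn't modified in
between), so the Nth value printed belongs to the Nth key printed.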
Note that we take advantage of the default argument of the print
function here to shorten our code. If the print function is called
without any arguments (usually the argument is a string to be printed),
it simply prints the variable $_, whatever its contents. In this
example, that works out perfectly, as $_ holds the element of the array
used in the current iteration of the foreach loop.
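Because the script above prints the keys and the values in two separate
lists, here is a sketch that prints each key next to its own value
instead, by looking the value up with $hash{$key}. It also uses sort, a
built-in not covered in this tutorial, to put the keys in alphabetical
order so the output is predictable; $report is just a name chosen for
the example.
-----
use strict;
use warnings;

my %hash = (
    "key_thing" => "this seems to be a string that is a value in a hash",
    "pi"        => 3.14,
    "last"      => "omg, guess what?! This is the last element!",
);

# Build one line per pair, looking each value up by its key.
my $report = "";
foreach my $key (sort keys %hash) {
    $report .= "$key => $hash{$key}\n";
}
print $report;
-----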
| Subroutines |
Subroutines are an important concept that you will use more and more as
you write longer and more complex programs, and they are an excellent
method of organizing and reusing code. If you've ever worked with
defining your own functions in another language, such as C,
subroutines in Perl are the same concept with some syntax differences.
Functions written by other coders have already been introduced and used
frequently throughout this tutorial (as well as part 1). Examples of
these are print, which prints text supplied by the programmer to
standard output, and the delete function, which deletes an element from
a hash.
Subroutines work just like those functions, with the exception that
they are defined by the programmer writing the script they're used in.
So what exactly are subroutines? They are simply blocks of code with
names attached to them, and the code they consist of can be run simply
by using the name attached to them. Although it may not seem like it at
first, they are very useful.
For instance, suppose you wanted to code a simple script that prompted
the user for two integers, added them, and displayed the output. It
would consist of a few print()s, an addition operation, and some
assignments to variables. Suppose, however, that you wanted to edit the
script to repeat that procedure five times. Writing the same code over
and over quickly becomes tedious and boring, and when it is a long
segment of code it can be very time-consuming. For this reason, Perl
includes a feature that allows coders to define their own subroutines.
Defining subroutines is actually quite simple. The process consists of
typing the 'sub' keyword, followed by the name of the subroutine, and
then the body of the subroutine (the code that is executed when the
subroutine is called) enclosed in braces. For instance, the following
is a subroutine named hello_there that simply prints 'Hello, world!' to
standard output when it is called:
-----
sub hello_there {
    print "Hello, world!\n";
}
-----
Calling the subroutine to execute its body is equally simple.
Subroutines are called simply by inserting the name of the subroutine
into the script where it is to be used, prefixed with an ampersand (&).
(The ampersand can usually be left off if you put parentheses after the
name, as in hello_there();, but this tutorial sticks with the
ampersand.) As an example, the following bit of code calls the
subroutine defined earlier:
-----
print "The next line will call hello_there.\n";
&hello_there;
print "Subroutine 'hello_there' was called on the previous line\n";
-----
If the previous two examples are combined into one script, the output
would look like this:
The next line will call hello_there.
Hello, world!
Subroutine 'hello_there' was called on the previous line
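To connect this back to the earlier point about repeating code, a
minimal sketch: the body of hello_there is written once, and a foreach
loop over the range 1 .. 5 (a range chosen just for this example) runs
it five times.
-----
use strict;
use warnings;

sub hello_there {
    print "Hello, world!\n";
}

# Written once, called as many times as needed.
foreach (1 .. 5) {
    &hello_there;
}
-----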
A very useful feature of subroutines is the ability to pass arguments,
or parameters, to them when they are called. Arguments are simply extra
pieces of data passed to a function when it is called. Take, for
example, the print function. It usually has one argument passed to it:
one string which is to be printed to standard output. However, more
parameters can be passed to it by separating them with commas, in
which case each of the strings passed to print is printed to standard
output in turn. Similarly, it is possible to pass arguments to
user-defined subroutines using the same syntax as with print, namely
separating each parameter from the next with commas. When a subroutine
is called with the ampersand, the arguments must be enclosed in
parentheses. Without the ampersand, the parentheses are optional in
Perl (unlike in some other languages, such as C), but only if the
subroutine has already been defined (or declared) earlier in the
script than the call.
The following segment of code lists two different ways of calling the
subroutine 'afunc' with three parameters: "hello!!", 45, and "blah".
(The first form only works if afunc has been defined above this point
in the script.)
-----
afunc "hello!!", 45, "blah";
&afunc("hello!!", 45, "blah");
-----
By this point you are probably wondering how to access and use
arguments passed to a function. The answer is very simple: each
argument passed to a subroutine is placed into the array @_. So, in the
previous example, @_ contains three elements: $_[0], which is the
string "hello!!"; $_[1], which is the numerical value 45; and $_[2],
which is the string "blah". Parameters stored as elements of @_ may be
assigned to variables, passed as parameters to other subroutines, and
used in the same style that regular array elements may be used. Take
care, however, when modifying elements of @_: each element is an alias
for the corresponding argument, so changing $_[0] directly changes the
caller's original variable, and its old value will be lost.
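Here is a sketch of what the inside of afunc might look like; the text
never shows its body, so this version, which just copies the three
arguments out of @_ and prints them, is an assumption. The second,
equally hypothetical subroutine, double_it, demonstrates the warning
above: it changes $_[0], and that changes the caller's variable.
-----
use strict;
use warnings;

# Hypothetical afunc: copy each argument out of @_ into a named variable.
sub afunc {
    my $greeting = $_[0];
    my $number   = $_[1];
    my $word     = $_[2];
    print "$greeting / $number / $word\n";
}

# Hypothetical double_it: $_[0] is an alias for the caller's variable,
# so assigning to it changes that variable directly.
sub double_it {
    $_[0] = $_[0] * 2;
}

&afunc("hello!!", 45, "blah");   # hello!! / 45 / blah

my $n = 21;
&double_it($n);
print "$n\n";                    # 42
-----
Passing a literal instead, as in &double_it(21), makes Perl stop with a
"Modification of a read-only value attempted" error, since there is no
variable for $_[0] to change.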
Another important concept to take note of is that of scope. In many
languages, such as C, a variable declared inside one function belongs
to that function alone: modifications made to it do not affect a
variable of the same name in another function or in the main() section
of the program. For instance, if variable 'a_var' is declared and
assigned a value of 13 in the function a_func(), main() cannot use
a_var at all unless it declares its own a_var as well, and any
modifications made to a_var in a_func() will not affect the copy of
a_var stored in main(). This concept is referred to as scope (if you
don't know C or don't understand this example made in C, don't worry;
it should become clear later on).
Perl does not have this restriction by default, however: a variable
that isn't declared with the my keyword (explained below) can be seen
by the whole script. If a subroutine assigns $a_var the string value
"Hello, world!", $a_var could be included as a parameter to the print()
function in the main part of your script and the text "Hello, world!"
would be displayed. E.g.:
-----
&assignvariable;
print $hellooovariable;
sub assignvariable {
    $hellooovariable = "Hello, world!\n";
}
-----
This code will work fine, outputting the text "Hello, world!\n" (with
\n being a newline, of course). However, for several different reasons,
the programmer may want to make the variable $hellooovariable used in
the subroutine local to that subroutine only. This means that the
remaining sections of the script that are not part of the subroutine
cannot access or modify the copy of $hellooovariable stored in that
particular subroutine. It is as if the variable does not exist as far
as the rest of the script is concerned. This also means that a separate
variable called $hellooovariable could be assigned the numeric value 27
outside of the subroutine, and $hellooovariable would still hold the
value "Hello, world!\n" when used inside the subroutine it is local to,
whilst it would hold the value 27 when used in the main body of the
script. The concept of scope can simply be thought of as each
subroutine having its own copy of a specific variable.
So how, you may be asking, can variables be made local to the function
they are declared in? The answer is simply by declaring them with the
'my' keyword. Take a look at the following sample:
-----
my $teh_variable = 27;
&change_teh_variable("Let's change \$teh_variable to this string!\n");
print $teh_variable;
sub change_teh_variable {
    my $teh_variable = $_[0];
}
-----
The first line of this script declares $teh_variable as a local
variable using the my keyword, and assigns it the value of 27. The next
line calls the subroutine change_teh_variable with one parameter: the
string "Let's change \$teh_variable to this string!\n". Control is now
handed over to the function change_teh_variable().
change_teh_variable() should now assign $teh_variable the string
that was passed as the first parameter (remember, $_[0] holds the first
argument passed to a subroutine). Then, once the body of the function
is finished, execution jumps back to the print statement, which prints
27.
Huh? 27? But $teh_variable was assigned a multi-word string in the
subroutine called immediately before the print statement. The reason
that 27 was printed is that the subroutine that was called assigns the
string passed to it to its own local variable $teh_variable, a separate
variable that merely shares the name. If we were to include a print
statement inside change_teh_variable() to print $teh_variable, we'd see
that in that case it did indeed hold the string passed as the first
argument (try it if you don't believe me... ;-)), but as the variable
is local to change_teh_variable(), the copy that the main segment of
the script uses is not modified.
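Taking up that suggestion, the sketch below is the same script with a
print added inside the subroutine (the "Inside:" and "Outside:" labels
are added just for clarity). The inner print shows the string, while
the print in the main part of the script still shows 27.
-----
use strict;
use warnings;

my $teh_variable = 27;
&change_teh_variable("Let's change \$teh_variable to this string!\n");
print "Outside: $teh_variable\n";

sub change_teh_variable {
    my $teh_variable = $_[0];        # a new variable, private to this subroutine
    print "Inside:  $teh_variable";  # shows the string that was passed in
}
-----
The output is "Inside:  Let's change $teh_variable to this string!"
followed by "Outside: 27".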
The final basic concept having to do with subroutines that I introduce
here is the idea that a function can return a value. As an example of
what a return value is, take the keys function, which returns a list of
the keys that the hash specified as a parameter consists of. A
statement consisting of nothing but keys(%hash) is completely useless,
because keys simply hands its result back to the code that called it,
and here nothing is there to receive it. This means that we have to use
the function as part of a larger statement. The keys function returns
a list of the keys in the hash, and for the function to be of any use
at all, something must be done with that return value; often it is
assigned to an array.
As another illustration of what a return value is, take a subroutine
that is used to add two numbers. It could handle the result in a few
possible ways. It may print the result, assign it to a variable, or
simply return it. In the case that it returns a value, it must be used
as part of a larger expression to be made useful. Assuming the function
addtwo() returns the sum of its two parameters, the following is an
example of how the result would be assigned to $sum_var:
-----
$sum_var = &addtwo(3, 4); # $sum_var now holds 7
-----
If we were simply to use the subroutine addtwo() by itself, it would be
a waste of computing power (not much though, mind you, as it's a pretty
simple function). It would add the numbers and return the value, but
since nothing is specified to be done with the return value, Perl would
simply discard it and move on. If the concept of return values still
seems confusing, you can think of it this way: when a subroutine with a
return value is executed, the body is executed as with any other
subroutine, but the use of the subroutine itself inside your code
evaluates to whatever the return value is, just like a variable
evaluates to what its value is.
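Since the text only assumes addtwo exists, here is one way it could be
written; this definition is a sketch, and it relies on return, which
the next paragraph explains. The second call shows a subroutine call
used inside a larger expression.
-----
use strict;
use warnings;

# Hypothetical addtwo: hands the sum of its two arguments back to the caller.
sub addtwo {
    return $_[0] + $_[1];
}

my $sum_var = &addtwo(3, 4);          # the call evaluates to 7
my $doubled = &addtwo(10, 20) * 2;    # 30 * 2, so 60
print "$sum_var $doubled\n";          # 7 60
-----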
Returning values from a subroutine in Perl is not very complex at all,
although the idea may seem confusing at first. It is simply done with
the return function, which accepts one parameter: the value to return.
For instance, the following subroutine returns the string "Less than
fifty" if the number entered by the user via STDIN (standard input)
is less than fifty, otherwise it returns "Not less than fifty":
-----
sub isltfifty {
    print "Enter a number por favor: ";
    chomp(my $num = <STDIN>);
    if ($num < 50) {
        return "Less than fifty";
    }
    else {
        return "Not less than fifty";
    }
}
-----
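To show how a caller uses such a return value without needing keyboard
input, here is a sketch of a variant, given the made-up name
compare_to_fifty, that takes the number as its first argument instead
of reading it from <STDIN>.
-----
use strict;
use warnings;

# Hypothetical variant of isltfifty: the number arrives as an argument.
sub compare_to_fifty {
    my $num = $_[0];
    if ($num < 50) {
        return "Less than fifty";
    }
    else {
        return "Not less than fifty";
    }
}

# Each call evaluates to whichever string was returned.
my $answer = &compare_to_fifty(42);
print "42: $answer\n";                        # 42: Less than fifty
print "50: ", &compare_to_fifty(50), "\n";    # 50: Not less than fifty
-----
The interactive original is used the same way; for example,
print &isltfifty(), "\n"; prompts for a number and then prints
whichever string isltfifty returned.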
| Regular Expressions |
Regular expressions make Perl a very powerful, flexible, and suitable
language for dealing with text manipulation. They give the programmer
the ability to tell whether a string matches a certain pattern.
Although this may not sound like much, it is. With regular expressions,
you can tell precisely what pattern(s) a string matches, remember what
parts of the patterns matched the string, etc.
As a simple example of how to use regular expressions, let's say that
we want to see if the variable $string contains the text 'in'. The code
to do this is as follows:
-----
$string =~ /in/;
-----
This statement by itself will do nothing useful. It is, however, useful
as part of a boolean expression in conditional statements such as if.
The binding operator, =~, is used to see whether the string on the left
matches the regular expression on the right. So far, we know that
whatever text is assigned to $string is being tested with the regular
expression /in/. The expression returns the value 1 (true) if the match
succeeds and a false value, the empty string "", otherwise. In Perl,
the empty string, the number 0, the string "0" and undef are all false.
Undef, short for undefined, is a special value in Perl. When used as a
string, it is a null string (""); when used as a number, it is 0; and
when used in boolean expressions, it is false. It is the value that a
variable holds before it has been assigned a value.
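A minimal sketch putting the /in/ test inside an if statement, using
the sample strings "Binary Universe" and "bananas" plus "BINARY" (added
to show that matching is case-sensitive). The last lines store the raw
result of a match so you can see what a successful and a failed match
actually return.
-----
use strict;
use warnings;

foreach my $string ("Binary Universe", "bananas", "BINARY") {
    if ($string =~ /in/) {
        print "\"$string\" contains 'in'\n";
    }
    else {
        print "\"$string\" does not contain 'in'\n";
    }
}

my $hit  = ("Binary Universe" =~ /in/);   # 1
my $miss = ("bananas" =~ /in/);           # "" (the empty string, which is false)
print "hit=[$hit] miss=[$miss]\n";        # hit=[1] miss=[]
-----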
The regular expression placed on the right side of the =~ operator will
be the primary focus of this section of the tutorial, as this is the
part that allows the programmer to specify what pattern is to be
matched, and it is what provides the power and flexibility associated
with regular expressions. Regular expressions in Perl consist of the
pattern to be matched between two forward slashes (//) as delimiters.
In the previous example, the regular expression is /in/, which matches
all strings that contain the sequence of characters 'in' anywhere
within them, regardless of the rest of the string. Thus, if $string had
contained the text "Binary Universe" the pattern match would have
returned true, but if it had been simply the string "bananas" it would
not have done so.
Regular expressions also allow the use of wildcards. The . (yes, a
single dot) wildcard matches any one character with the exception of a
newline. Thus, the pattern /bl.h/ matches the strings 'blah', 'bl4h',
'bl@h', 'bl.h', 'bl2h', or any other string that contains a b, then an
l, then any single character, then an h, but does not match strings
such as 'binary universe' that do not. To match a period literally
within a pattern, if you wanted to match the string
'www.binaryuniverse.net' for instance, it should be prefixed with a
backslash. E.g.: /www\.binaryuniverse\.net/.
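A sketch trying /bl.h/ against a few arbitrary strings, and showing why
the backslashes matter: without them, a made-up string with X in place
of the dots still matches the address pattern.
-----
use strict;
use warnings;

# . stands for exactly one character (anything except a newline).
foreach my $word ("blah", "bl4h", "bl.h", "blh", "bleeh") {
    my $verdict = ($word =~ /bl.h/) ? "matches" : "does not match";
    print "$word $verdict /bl.h/\n";
}
# "blh" has nothing between the l and the h; "bleeh" has two characters there.

# Without backslashes, each . would accept any character at all.
my $fake  = "wwwXbinaryuniverseXnet";
my $loose = ($fake =~ /www.binaryuniverse.net/)   ? "yes" : "no";   # yes
my $exact = ($fake =~ /www\.binaryuniverse\.net/) ? "yes" : "no";   # no
print "loose: $loose, exact: $exact\n";
-----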
A useful feature that is similar to the wildcard character . is the
ability to use character classes. Character classes are used in the
same way as the dot character, but they do not match just any
character: they match any one of the characters, specified by the
programmer, that are included between [] brackets. As an example, the
following regular expression matches any string containing 'cat' or
'car':
/ca[tr]/
Ranges of characters are specified by including the first character of
the range, a -, then the last character of the range. For instance,
/bl[a-z]h/ matches bl, any lowercase alphabetic character, then the
character h. There are even a few shortcuts for classes of characters
that are commonly used. These shortcuts consist of a backslash followed
by a letter. For instance, \d matches any digit. The following table
lists the most common shortcuts and the character classes they
represent:
\d - Any digit. [0-9]
\w - Any 'word' character, including any alphanumeric character or an
underscore. [a-zA-Z0-9_]
\s - Any whitespace character (tab, space, etc).
\D - Any non-digit.
\W - Any non-'word' character.
\S - Any non-whitespace character.
A ^ placed immediately after the opening [ negates a character class,
making it match any one character that is not listed. That is to say,
[^0-5] matches any character that is not a digit between 0 and 5. Thus,
[^0-9] is the same as \D, which matches anything that is not a digit.
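A sketch that runs each kind of class from this part of the text
against a short made-up string; each result is stored as "yes" or "no"
so the outcomes can be printed on one line.
-----
use strict;
use warnings;

my $cat   = ("cat"  =~ /ca[tr]/)    ? "yes" : "no";   # t is one of [tr]
my $cab   = ("cab"  =~ /ca[tr]/)    ? "yes" : "no";   # b is not
my $blah  = ("blah" =~ /bl[a-z]h/)  ? "yes" : "no";   # a is in the range a-z
my $bl4h  = ("bl4h" =~ /bl[a-z]h/)  ? "yes" : "no";   # 4 is not a lowercase letter
my $digit = ("bl4h" =~ /bl\dh/)     ? "yes" : "no";   # \d is any digit
my $not05 = ("bl7h" =~ /bl[^0-5]h/) ? "yes" : "no";   # 7 is outside 0-5
my $is05  = ("bl3h" =~ /bl[^0-5]h/) ? "yes" : "no";   # 3 is inside 0-5
print "$cat $cab $blah $bl4h $digit $not05 $is05\n";  # yes no yes no yes yes no
-----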
The pipe (|) character is used to represent 'or' in regular
expressions, similar to the way in which two pipes (||) represent
logical or in Perl. Thus, the following regular expression matches any
string containing 'w00t', 'woot', or 'wewt':
/w00t|woot|wewt/
It should be noted, however, that | applies to everything on either
side of it unless parentheses are used to group the alternatives. Thus,
the previous regular expression could be rewritten to be shorter, as in
the following example:
/w(00|oo|ew)t/
This matches a w, then either 00, oo, or ew, then a t.
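A sketch checking that the long and the grouped forms agree on a
handful of arbitrary test words. The last lines show what happens
without the parentheses: /w00|oo|ewt/ means 'w00' or 'oo' or 'ewt', so
even 'zoo' matches.
-----
use strict;
use warnings;

foreach my $word ("w00t", "woot", "wewt", "wot", "waat") {
    my $long  = ($word =~ /w00t|woot|wewt/) ? "yes" : "no";
    my $short = ($word =~ /w(00|oo|ew)t/)   ? "yes" : "no";
    print "$word: $long $short\n";   # the two answers always agree
}

# Without grouping, | splits the whole pattern into three alternatives.
my $oops = ("zoo" =~ /w00|oo|ewt/) ? "yes" : "no";
print "zoo: $oops\n";                # zoo: yes
-----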
Another important concept to grasp where regular expressions are
concerned is that of quantifiers. As an example of how a quantifier
works, take the commonly used * quantifier. When the * quantifier is
used, it tells Perl that the previous part of the regular expression
(the previous character, unless several characters are grouped with
parentheses) may appear any number of times (5, 9, 276, even 0) and the
regular expression as a whole should still match. For instance, the
following regular expression matches the strings 'perl',
'pppppppppppppppppppperl' and 'erl': any number of p characters
followed immediately by the text 'erl'. (Strictly speaking, because a
pattern may match anywhere within a string, /p*erl/ matches any string
that contains 'erl'; the p* only describes how the matched part may
begin.)
-----
/p*erl/
-----
And, as an example of how grouping with parentheses works with the *
quantifier, the following matches 'perperperl', 'perl' and 'l': the
string 'per' any number of times (including, must I remind you, 0)
followed by the character l. (By the same reasoning as above, it
matches any string that contains an l.)
-----
/(per)*l/
-----
With this knowledge, we can conclude that if we want a specific part of
a string to match ANY character ANY number of times (including, of
course, 0), we can simply use the dot metacharacter followed by the
asterisk quantifier. Thus, the following regular expression matches ANY
string:
-----
/.*/
-----
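A sketch trying the three quantifier examples on a few arbitrary
strings. Note the 'xyzerl' case: since the pattern can match anywhere
in the string, /p*erl/ succeeds on it by matching zero p's followed by
'erl'.
-----
use strict;
use warnings;

# /p*erl/: any number of p's (even none), then "erl"
my $many_p = ("pppppperl" =~ /p*erl/) ? "yes" : "no";    # yes
my $no_p   = ("erl"       =~ /p*erl/) ? "yes" : "no";    # yes: zero p's is fine
my $xyzerl = ("xyzerl"    =~ /p*erl/) ? "yes" : "no";    # yes: match starts anywhere
my $pel    = ("pel"       =~ /p*erl/) ? "yes" : "no";    # no: no "erl" at all

# /(per)*l/: the group "per" any number of times, then an l
my $perper = ("perperperl" =~ /(per)*l/) ? "yes" : "no";  # yes
my $pet    = ("pet"        =~ /(per)*l/) ? "yes" : "no";  # no: there is no l

# /.*/: any character, any number of times; even "" matches
my $empty  = ("" =~ /.*/) ? "yes" : "no";                 # yes

print "$many_p $no_p $xyzerl $pel $perper $pet $empty\n"; # yes yes yes no yes no yes
-----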
This segment of my Perl tutorial has introduced only the basics of what
is a very wide and powerful feature of the Perl scripting language:
regular expressions. Expect a more in-depth look at good ol' regexps in
part 3, as well as an introduction to more topics that Perl has to
offer.
| Wrapping It Up |
As I did in my last Perl tutorial, I will attempt to give an example of
a script that implements most of the main topics covered in this
tutorial. This time, I've chosen a very pointless example: it's a
script that maps IP addresses to the names of the owners of the
computers they correspond to (pointless for many reasons, including the
fact that most people have dynamic IPs :P). It has three commands that
can be entered: 'add', which is used to add an IP address to the
database of IPs (the database is actually only temporary and is erased
once execution of the script stops, which is another reason this is a
very ineffective script; it could be done otherwise, but file I/O and
Perl DBMs have not been covered yet); 'list', which lists the IPs
entered in the database; and 'delete', which deletes an IP from the
database. Here it is:
-----
#!/usr/bin/env perl
# %hash maps each IP address (the key) to the name of its owner (the value).
while (1) {
    print "Please enter a command: ";
    chomp(my $cmd = <STDIN>);
    if ($cmd =~ /add/) {
        &add_ip_to_hash;
    }
    elsif ($cmd =~ /delete/) {
        &delete_ip_from_hash;
    }
    elsif ($cmd =~ /list/) {
        &list_ips;
    }
    elsif ($cmd =~ /exit/) {
        exit 0;
    }
    else {
        print "Invalid command!\n";
    }
}
sub add_ip_to_hash {
    print "Enter ip address: ";
    chomp(my $ip = <STDIN>);
    print "Enter owner of computer that $ip corresponds to: ";
    chomp(my $o = <STDIN>);
    $hash{$ip} = $o;
}
sub delete_ip_from_hash {
    print "What IP would you like to delete from the database? ";
    chomp(my $ip = <STDIN>);
    delete $hash{$ip};
}
sub list_ips {
    my @keyz0rz = keys(%hash);
    foreach (@keyz0rz) {
        print $_, " --> ", $hash{$_}, "\n";
    }
}
-----
Ok. I lied when I said the only commands were 'add', 'delete', and
'list'. There's one other command. As a challenge, I'm going to let you
figure out a) what that command is, b) how it works the way it does,
and c) how the script as a whole works.
If you have any questions regarding the content of this tutorial,
please contact me via IRC or forums. I'm usually around BU, ASO,
st0rage, DI-Sec, and some other places.