|| Perl Tutorial, Part 1 ||
By Ch4r/Niels | nielsosky@gmail.com
http://binaryuniverse.net | irc.binaryuniverse.net #binaryuniverse
http://anomalous-security.org
http://st0rage.org
http://brain-hack.org
| Copy Info |
This tutorial may be redistributed as long as it remains completely
unchanged and full credit is given to me, Ch4r/Niels.
| Shouts |
Shouts to mu, dlab, Cryptic, Oropix, deep, CreepyNodque, Sintigan,
ScM, Tele, Ic3D4ne, Ee77,
ponyboy, Inviz, and everyone that I forgot.
| Introduction |
If you bothered to read the title, you'd know that this is a Perl
tutorial. This tutorial doesn't aim to teach you every function in the
Perl scripting language - it's meant to be a simple introduction to the
language that teaches you the basics and should provide you with enough
to take your first step down the path of Perl coding. If you're looking
for some more material on Perl after reading this, I'd recommend
O'Reilly's Perl books, Google,
http://programmingtutorials.com (look in the Perl section), and
http://freewebs.com/okidan (Okidan's site, which has several tutorials
and ebooks related to Perl as well as Cryptography, Security, etc).
So what is Perl? Perl, often expanded as Practical Extraction and Report
Language (the expansion was coined after the name), is a scripting
language that was originally created by Larry Wall for working with
text files. Unlike C/C++ or Java, there is no separate compile step
that you run yourself; Perl is usually described as an interpreted
language, meaning that you simply tell the interpreter to run the file
containing the Perl script. Behind the scenes the interpreter first
parses and compiles the whole script into an internal form and then
executes it, which is why a syntax error anywhere in the file stops the
script before any of it runs. This makes development quicker in the
sense that there is no compile time to wait through, but it also means
that the programs may run a bit slower than the equivalent in a
compiled language.
If you have any comments on this tutorial, please feel welcome to
contact me in whatever way you please with your feedback. Also note
that I'm far from mastering Perl,
so if you see something that looks messed up please contact me. Enjoy...
| What You Need |
You will need a few things to get started with Perl programming. First,
you will need an interpreter to interpret and run the scripts you
write. If you're using some form of *nix (Linux, BSD, Mac OS, Solaris,
etc) you probably already have a Perl interpreter installed - you can
check by opening a shell and entering the command 'perl -v', which
prints the version of Perl that is installed. If you get a 'command not
found' error, you probably don't have Perl installed.
If you're running Windows (yuck) or for some other reason don't have
Perl installed, you can download Perl from http://perl.org. While we're
on the topic of
operating systems, I should also say this: I use Linux in this
tutorial, but this should also work on
other operating systems.
| Hello, World! |
Similar to many other programming texts, we're going to begin by
creating a simple program that
prints "Hello, World!" to standard output. Open up a text editor such
as pico, vim, Notepad, etc
and enter the following code (the ----- is just to show where the code
is, don't enter that):
-----
#!/usr/bin/perl
print "Hello, world!\n";
-----
Save the file as hello.pl (or anything else with a .pl extension) and
then open a command line interface. Change directories to the directory
you saved the hello.pl
script in and type 'perl hello.pl'. The output should be "Hello,
world!":
bash-2.05$ perl hello.pl
Hello, world!
bash-2.05$
Let's go over the source... the first line, "#!/usr/bin/perl", tells
the operating system where the Perl interpreter is installed, so that
the script can be executed directly (for example as './hello.pl' once
it has been made executable). If you use Windows, or if you always run
your scripts as 'perl hello.pl' like we did above, you don't really
need this line, but on *nix it is a good habit to have it as the first
line of your script, replacing /usr/bin/perl with wherever Perl is
installed. If you're having problems running a script directly on *nix
that could be why, but note that Perl is usually installed to
/usr/bin/perl. The line that reads 'print "Hello, world!\n";' tells the
interpreter to print the text 'Hello, world!' followed by a newline.
The print function prints the text between the quotes to standard
output, and the '\n' sequence tells Perl to print a newline. A
semicolon is required at the end of each statement in Perl, similar to
several other languages.
| Strings |
Strings are simply several characters put together, usually forming
something meaningful. For example, the word "perl" is a string as is
the sentence "Binary
Universe pwns all" or the progression of random characters
"vre896q7cdrnhfqy7r8_@09". In Perl,
strings are printed to standard output (usually the screen) using the
print function. For
example, we saw the line
print "Hello, world!\n";
in the hello.pl script. As we already saw, this prints the string
"Hello, world!" followed by a newline. The \n (newline) character is
what is called an escape
sequence. There are many escape sequences that perform various tasks
such as \t, which prints a tab,
and \a, which is a beep. Note that \n is by far the most commonly used
escape sequence.
So what if we wanted to print the progression of characters "\" and "n"
instead of printing a newline? In that case, we have two options. The
first is to replace the double quoted string "blah blah blah\n" with
'blah blah blah\n'. The single quotes tell the interpreter that the
string should be interpreted literally, meaning that whatever is typed
will be printed whether or not it is an escape sequence. Replacing the
print statement in the hello.pl script with a single quoted string
would print "Hello, world!\n" instead of "Hello, world!" and a newline.
The second option is to replace "\n" with "\\n". This works because \\
in a double quoted string means that the character \ is printed,
leaving the character "n" unaffected by the previous backslash. Thus,
the "n" is printed as well, making the output "Hello, world!\n".
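To see both options side by side, here's a small sketch you can save and run (the file name quotes.pl is just something I made up for this example). The extra print "\n" lines are only there so each result ends up on its own line:

```perl
#!/usr/bin/perl
# quotes.pl - double quotes, single quotes, and an escaped backslash
print "Double quotes: Hello, world!\n";       # \n becomes a real newline
print 'Single quotes: Hello, world!\n';       # prints a backslash and an n
print "\n";                                   # end the line ourselves
print "Escaped backslash: Hello, world!\\n";  # also prints a backslash and an n
print "\n";
```

Only the first line ends with a real newline of its own; the other two show the characters \ and n at the end of the text.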
Now let's make things a tad more complex - look at the following
example:
print "Hello, " . "world" . "!\n";
If you substitute the print statement in the hello.pl script with the
above one, the same result
is achieved. This is because the dot is used for concatenation, meaning
that it is used for
joining several strings together. In the above example it joins the
strings "Hello, ", "world",
and "!\n" together to be printed.
The same effect could be achieved by using commas instead of dots:
print "Hello, ", "world", "!\n";
This works the way it does because the print function takes a list of
strings to print separated by commas (although normally only one string
is printed and there is no
need for commas).
| Numbers and Numeric Expressions |
Mathematical expressions (eg 2+2, 6-5, 2+2+6-5, etc) can be evaluated
in Perl. The following are some commonly used operators:
+ - addition
- - subtraction
* - multiplication
/ - division
% - modulus/remainder
eg:
print 20+3; #Prints 23
print 20-3; #Prints 17
print 20*3; #Prints 60
print 20/3; #Prints 6.66666666666667
print 20%3; #Prints 2
This should be fairly straightforward with the exception of one thing -
wtf does the "#" mean? In Perl, # denotes a comment, meaning that
everything following the # until
the end of the line is to be ignored by the interpreter. So when we
enter "#Prints 23" the
interpreter doesn't care because it simply ignores that part of the
line seeing as it is commented out.
One very important concept to remember when writing code with
mathematical statements in it is operator precedence. For example,
5+5*5 returns 30 because the interpreter first evaluates 5*5, which
yields 25, and then adds five to it. The same concept holds true in
basic arithmetic. If we wanted the interpreter to evaluate 5+5 first
and then multiply the result by five, we would simply enclose the
statement 5+5 in parentheses: (5+5)*5. This returns 50, not 30.
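Here's a short sketch of precedence in action (precedence.pl is a made-up file name). One thing to watch out for: if you write print (5+5)*5; Perl treats the parentheses as belonging to print, so it prints 10 and throws the *5 away. Wrapping everything in an outer pair of parentheses, as the second line below does, avoids that:

```perl
#!/usr/bin/perl
# precedence.pl - operator precedence and parentheses
print 5 + 5 * 5, "\n";       # multiplication first: prints 30
print((5 + 5) * 5, "\n");    # parentheses force the addition first: prints 50
print 20 - 3 - 2, "\n";      # same precedence, so left to right: prints 15
print 2 + 20 % 3, "\n";      # % binds as tightly as * and /: prints 4
```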
| Variables |
A variable is simply a series of characters used to represent a string
or number. For example, if I assign the variable "$variable" the string
value "I am a variable", wherever we use $variable we are actually
using the string "I am a variable" (with a few exceptions such as if it
is used in a single quoted string). So if we print "$variable" we are
actually printing "I am a variable".
Scalar variables in Perl always begin with the $ character followed by
the name of the variable (a scalar variable is one variable that is
assigned one value; we'll see examples of non-scalar variables later).
In addition to starting with the $ character, the character following
the $ character must be either an underscore or a letter and all the
characters after that may be underscores, letters, or numbers.
Following are what would be valid variable names:
$w00t
$_t3h_variablez0r
$var
$abc123
And the following are not valid scalar variable names:
var
$123abc
$abc^%$
$abc$123
t3h_variablez0r
Values are assigned to variables using the = operator and can be
printed using the print statement. Here's another modification to the
hello.pl program:
-----
#!/usr/bin/perl
#hello.pl - using variables
$hello_var = "Hello, world!\n";
print $hello_var;
-----
In this example the string "Hello, world!\n" (with \n being a newline)
is first assigned to the variable $hello_var, and then $hello_var is
printed to standard output. Since $hello_var is a variable containing
the string "Hello, world!\n", "Hello, world!\n" is printed. Variables
can also be printed as part of a double quoted string, or concatenated
together with other strings in the print statement.
To print the string "$hello_var" to standard output literally instead
of the contents of the variable $hello_var we have the same
possibilities that arose when we wanted to print the string "\n"
literally instead of a newline: using a single quoted string instead,
or placing a backslash before the value we want to print literally:
print "\$hello_var is $hello_var";
print '$hello_var is' . " $hello_var";
The first prints "$hello_var is Hello, world!" and then a newline (we
didn't need to add the \n escape sequence at the end because the
variable $hello_var already contained a newline) and the second example
prints the same thing using a single quoted string concatenated with a
double quoted string.
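The sketch below puts those cases next to each other. The file name vars.pl and the variable names $name, $double, $single, and $joined are all made up for the example:

```perl
#!/usr/bin/perl
# vars.pl - one variable used in different kinds of strings
$name   = "Perl";
$double = "I like $name";       # double quotes: $name is replaced by its value
$single = 'I like $name';       # single quotes: the text $name is kept as typed
$joined = "I like " . $name;    # concatenation gives the same result as $double

print "$double\n";    # I like Perl
print "$single\n";    # I like $name
print "$joined\n";    # I like Perl
```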
| Boolean Tests & Control Structures |
Most useful programs execute different code based on whether a given
test returns true or false. The concept of having a script act
differently based on whether something is true or false is a very
important one in your future as a coder, as you will find yourself
implementing boolean tests into your programs frequently if you pursue
much of a future in coding at all.
One of the most commonly used features of the Perl language (and many
other programming languages for that matter) is the concept of the 'if'
control structure. The if control structure receives an expression, and
if the expression is true it executes the block of code enclosed in {}
braces directly after it. If the previous sentence seems confusing,
don't worry -- after seeing some examples you'll find the if control
structure easy to understand. Here is the basic syntax of the if
control structure:
-----
if (boolean_test) {
    statements to execute
    if boolean test is true
    go here
}
-----
For example, the following code tests to see if 5 > 3 (five is
greater than three) and if it is prints the message "It's true...5 is
greater than 3!", and then a newline:
-----
if (5 > 3) {
    print "It's true...5 is greater than 3!\n";
}
-----
If we switched the 5 and the 3 so that it read "if (3>5) {...}", the
print statement would _not_ be executed because 3 is _not_ greater than
5. The if control structure can be read in English simply as "if
[test_here] then [code_to_execute_if_test_returns_true]". In this case,
"if five is greater than three then print 'It's true...5 is greater
than 3!'".
We aren't limited to using > in an if conditional. We can test whether
a number is less than, equal to, or not equal to another as well as
performing other tests. In addition, we can specify multiple tests that
must return true for the body of the control structure to execute. Some
of the operators we could use are as follows:
== - equal to (numeric)
eq - equal to (strings)
> - greater than
< - less than
>= - greater than or equal to
<= - less than or equal to
!= - not equal to (numeric)
ne - not equal to (strings)
|| - logical or
&& - logical and
These are fairly straightforward and can be implemented in the same
fashion as > was in our example with the exception of the last two,
which require an explanation.
||, logical or, is used when multiple tests are implemented in the if
control structure. If any one of the tests joined together with the ||
operator(s) is true, then the expression as a whole returns true and
the body of the if structure is executed. For example, the following
executes the print statement if any of the tests 5 < 3, 4 < 7, or
10 < 9 is true:
-----
if (5 < 3 || 4 < 7 || 10 < 9) {
    print "One of the conditions was true\n";
}
-----
&&, logical and, works the same way except all of the
conditions must be true for the expression as a whole to return true.
If you substitute all of the || operators in the previous example with
&& operators, the print function is _not_ executed as not all
of the conditions were true (namely, 5 is not less than three and 10 is
not less than 9). However, if we use the following the print statement
_is_ executed:
-----
if (3 < 5 && 4 != 9 && 6 >= 5 && 8 == 8) {
    print "All of the conditions were true\n";
}
-----
A useful variation of the if control structure is the if-else control
structure. The if-else control structure functions in the same way as
the if control structure but in addition to executing a block of code
if the given condition is true, it executes an alternate block of code
if the given condition is false. Example:
-----
if (3 > 5) {
    print "Three seems to be greater than five.\n";
}
else {
    print "Well, three is not greater than five.\n";
}
-----
In this example, the Perl interpreter first checks whether 3 > 5. As
three is not greater than five, the interpreter skips over the rest of
the if control structure. It then arrives at the else structure and
seeing as 'else' specifies an alternate block of code to execute when
the condition is false, the print statement declaring that three is not
greater than five is executed.
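The if-else structure is also a handy way to see the difference between the numeric == and the string eq from the operator list. This is just a sketch; compare.pl, $first, and $second are names I made up for it:

```perl
#!/usr/bin/perl
# compare.pl - == compares numbers, eq compares strings
$first  = "10";
$second = "10.0";

if ($first == $second) {
    print "== says they are equal (as numbers, 10 is 10.0)\n";
}
else {
    print "== says they are different\n";
}

if ($first eq $second) {
    print "eq says they are equal\n";
}
else {
    print "eq says they are different (as text, \"10\" is not \"10.0\")\n";
}
```

The first test takes the if branch and the second takes the else branch, so pick == for numbers and eq for text.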
So far we've seen two conditional control structures, if and if-else.
There is, however, another widely used type of control structure that
is used as a method of repetition - a loop. The first type of loop
covered here is the while loop. This loop is given a test, similar to
the if control structure, and while the test is true it continues to
execute the block of code enclosed in braces. Here's an example:
-----
$i = 0;
while ($i <= 10) {
    print "$i is not more than 10\n";
    $i = $i + 1;
}
-----
This loop prints the following to standard output:
0 is not more than 10
1 is not more than 10
2 is not more than 10
3 is not more than 10
4 is not more than 10
5 is not more than 10
6 is not more than 10
7 is not more than 10
8 is not more than 10
9 is not more than 10
10 is not more than 10
How does this work? The while loop is given a condition -- $i <= 10.
$i is set to 0 and zero is less than or equal to ten, so the body of
the loop is executed. The body of the loop consists of a print
statement followed by a statement that assigns the variable $i a value
of itself plus one. $i is now 1. As one is less than or equal to ten,
the process repeats. This continues until the final iteration of the
loop, when $i is 10. Ten is less than or equal to ten, so the block is
again executed. Now $i is again assigned the value of itself plus one,
which equals 11. As eleven is not less than or equal to ten, the body
of the while loop is not executed and execution of the script continues
past the while loop.
Note that Perl offers us a couple of commonly used shortcuts to rewrite
the expression "$i = $i + 1". The first of these allows us to replace
"$i = $i + n" with "$i += n" (where n is a number). This is not simply
limited to adding a given amount to a variable though -- the same
notation can be implemented for subtracting, multiplying, or dividing.
The following chart lists some expressions that can be rewritten with
this shorter notation, and then shows the equivalent using Perl's
+=/-=/*=//= shortcut.
Expression    | Shorter Equivalent |
$i = $i + 17  | $i += 17           |
$j = $j * 12  | $j *= 12           |
$a = $a / 27  | $a /= 27           |
$k = $k + 3   | $k += 3            |
$v = $v - 7   | $v -= 7            |
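If you want to watch the shorter notation work, here's a tiny sketch (the file name shortcuts.pl and the starting value are made up). Each comment shows the value of $i after that line runs:

```perl
#!/usr/bin/perl
# shortcuts.pl - the +=, -=, *=, and /= shortcuts
$i = 10;
$i += 17;    # same as $i = $i + 17, so $i is now 27
$i -= 7;     # same as $i = $i - 7,  so $i is now 20
$i *= 12;    # same as $i = $i * 12, so $i is now 240
$i /= 24;    # same as $i = $i / 24, so $i is now 10
print "$i\n";
```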
Perl also offers a second shortcut that is used to add one or subtract
one from a specific variable. The syntax for this shortcut is simply:
$variable++; #increments $variable
$variable--; #decrements $variable
Thus, we could rewrite the while loop used previously with "$i++"
instead of "$i = $i + 1" and achieve the same result:
-----
$i = 0;
while ($i <= 10) {
    print "$i is not more than 10\n";
    $i++; #we could also use $i += 1
}
-----
The for/foreach loop is a bit trickier to understand than the while
loop. The following is the same as our previous while loop that prints
"$i is not more than 10\n" and then adds one to $i, but it is
implemented with a for loop instead:
-----
for ($i = 0; $i <= 10; $i++) {
    print "$i is not more than 10\n";
}
-----
The most confusing area is the part in parentheses directly after the
keyword "for", which in the while, if, and if/else control structures
held a test that needed to return true for the body of the structure to
be executed. In the for (and foreach, as we'll discuss in a moment)
control structure, this area is broken down into three sections which
are separated by semicolons.
The first section is where the counter variable is assigned a value. As
the for loop is used primarily to repeat something a specific number of
times, it usually uses a variable to keep track of how many times the
body of the loop has been executed. In the while loop, we executed the
body of the loop 11 times (0 -- 10) and used the $i variable to keep
track of how many times we had iterated (gone through) the loop. The
variable used for this purpose is referred to as the counter, as it
counts the number of times we have iterated through the loop. In this
case, the counter is $i and here it is assigned a value of 0:
for ($i = 0; $i <= 10; $i++) {
The second section of the above line, which begins immediately after
the first semicolon and is terminated with the second semicolon,
supplies the condition that must be met for the body of the loop to be
executed. In this case, the variable $i must be less than or equal to
ten ($i <= 10) for another iteration of the loop to take place.
The last of the three sections between the parentheses, which begins
directly after the second (and final) semicolon, is the action that
must be performed on the counter variable at the end of each iteration
of the loop. In this case, 1 is added to the current value of $i ($i++).
Note that the variables used in the three different sections of the
first line do not have to be the same; we could have used completely
different variables such as:
for ($i = 0; $j <= 10; $k++)
However, this doesn't make much sense and defeats the purpose of the
for loop, which is to have a cleaner and more organized way of
iterating through a loop a specific number of times.
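The three sections don't have to count up by one from zero, either. As a rough sketch (countdown.pl, $total, and $loops are names I made up), here's a loop that starts at 10 and counts down by two:

```perl
#!/usr/bin/perl
# countdown.pl - a for loop that counts down by twos
$total = 0;    # running sum of every value $i takes
$loops = 0;    # how many times the body has run

# start at 10; keep going while $i >= 0; subtract 2 after each pass
for ($i = 10; $i >= 0; $i -= 2) {
    print "$i\n";
    $total += $i;
    $loops++;
}
print "Looped $loops times, total is $total\n";
```

This prints 10, 8, 6, 4, 2, and 0 on separate lines, followed by "Looped 6 times, total is 30".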
Also note that in Perl the foreach keyword is simply a synonym for the
for keyword, so the foreach loop works exactly the same way as the for
loop. The following accomplishes the same thing as the for and while
loops we used before:
-----
foreach ($i = 0; $i <= 10; $i++) {
    print "$i is not more than 10\n";
}
-----
There are alternate uses for the for/foreach loop and we will cover
them in upcoming sections.
Note that these are not the only control structures that Perl provides.
You may also hear about or see the until and unless control structures.
The until control structure is the exact opposite of a while loop: it
executes its body as long as the condition it is given is false. The
unless control structure is the opposite of the if control structure:
it executes its body if the condition it is given is false. Finally,
we'll also cover the if/elsif/else control structure later in this
tutorial.
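Since until and unless only got a description, here's a minimal sketch of both (until_unless.pl is a made-up file name). The comments show the while/if version each one is equivalent to:

```perl
#!/usr/bin/perl
# until_unless.pl - the "opposite" versions of while and if
$i = 0;
until ($i > 3) {             # same as: while ($i <= 3)
    print "$i is not more than 3\n";
    $i++;
}

unless ($i == 3) {           # same as: if ($i != 3)
    print "The loop stopped with \$i at $i, not 3\n";
}
```

The until loop runs for $i equal to 0, 1, 2, and 3 and leaves $i at 4, so the body of the unless structure runs as well.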
| Arrays |
We discussed scalar variables earlier -- scalar variables were one
variable assigned one value. Now we'll discuss arrays, which are one
variable assigned multiple values. Arrays will prove quite useful for
organizing data, and although the idea of one variable with several
values may sound like a confusing idea as well as one that isn't
necessary, you'll soon see that it is actually quite easy to understand
and quite useful. Following is a diagram that charts scalar variables
and array variables, and how they are organized:
Scalar:
$variable -> | "value" |
name      -> | value   |

Array:
@array -> | "value 1" | 27 | "another value" | 34565 |
index  -> | 0         | 1  | 2               | 3     |
The first diagram shows the anatomy of a scalar variable. A variable,
in this case one named $variable, is assigned a value, in this case the
string "value". Simple enough; we've been doing that since almost the
beginning of this tutorial.
The second diagram shows the anatomy of an array. A variable, in this
case one named @array, is assigned multiple values, in this case "value
1", 27, "another value", and 34565. If you take a look at the bottom
row, you'll notice that each value has an index number: the first value
has an index of 0, the second value has an index of 1, the third an
index of 2, and the fourth an index of 3. As an array variable can hold
many values, we need a way to define which value we are referring to
when we use the name of the array in our program. The index numbers
assist us in this -- when we want to refer to an element of an array
variable, we use the name of the array and the index number of the
element. For example, we could refer to the string "another value",
which is an element of the array @array, as "@array, index 2" (as that
string has an index number of 2 in @array).
You may have noticed by now that the variable name is "@array" and not
"$array". Array variables are prefixed with the @ symbol, while scalar
variables are prefixed with the $ symbol. To confuse things even more,
when we refer to an individual element of an array, we prefix the array
name with $ and not @. This is because an individual element of an
array is scalar data by itself, and when a variable holds one piece of
data, scalar data, it is prefixed with $. When we are referring to the
array as a whole, however, it holds several pieces of data, so we
prefix it with @.
All this discussion is useless if we do not know how to implement it
within our Perl scripts though. The following code is used to declare
an array that holds the values "value one", "another value - value
two", 39, and 908.
@new_array = ("value one", "another value - value two", 39, 908);
It was previously mentioned that a specific element of an array is
accessed by using the name of the array and the index number of the
element. How is that implemented in our Perl code though? Take a look
at the following example:
-----
print $new_array[0] . "\n";
-----
This line prints the element at index 0 in @new_array concatenated with
a newline (note that we used $new_array as opposed to @new_array, as a
specific element of an array is by itself scalar data -- one piece of
data). Thus, the above is the same as the line:
-----
print "value one\n";
-----
If you study the first example you'll be able to tell that to access an
individual element of an array, we use the name of the array followed
by the index number, which is enclosed in square brackets. Once we've
specified its location, we can treat it similar to other scalar
variables (if you're saying 'huh? But it's in an array, so it isn't a
scalar variable!', the answer is that while it's in an array, it is a
scalar variable by itself. This is an important concept to grasp, which
is why I've repeated it several times :P)...perform mathematical
expressions on it, assign it a value, print it, and perform a variety
of other operations on it.
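As a quick sketch of treating elements like ordinary scalar variables (elements.pl is a made-up file name; the array is the same @new_array as before):

```perl
#!/usr/bin/perl
# elements.pl - array elements behave like scalar variables
@new_array = ("value one", "another value - value two", 39, 908);

$new_array[0] = "changed";             # assign a new value to element 0
$new_array[2] = $new_array[2] + 1;     # 39 + 1 = 40
$new_array[3] /= 2;                    # 908 / 2 = 454

# elements can be used inside a double quoted string too
print "$new_array[0], $new_array[2], $new_array[3]\n";
```

This prints "changed, 40, 454". Element 1 is never touched, so it still holds "another value - value two".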
I mentioned earlier in this tutorial that the for/foreach loop had
other uses and that I would cover them later on. "Later on" has
arrived. It turns out that the for/foreach loop can be used to iterate
through the elements of an array. The foreach loop is in this way close
to English in that it says "foreach @array", which translates to
"foreach element of @array". As an example, the following code is used
to print each element of the array we used previously, @new_array:
-----
foreach (@new_array) {
    print $_;
}
-----
There is one question concerning the above code that will probably
arise, and it is worth an explanation -- you, the reader, are probably
sitting there going "Wtf does '$_' mean?!" The answer is that in the
above example, the current element of @new_array for each iteration is
stored in $_.
If we wanted to change the variable name that this value was stored in,
we would simply add the variable to store the value in immediately
before the part of the first line that reads "(@new_array)". For
example, the following code does the same thing as the last bit of code
used to demonstrate the foreach loop but it stores the current element
of the array in $a_var as opposed to $_:
-----
foreach $a_var (@new_array) {
    print $a_var;
}
-----
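Note that printing each element on its own runs all of the values together on one line, since none of them end in a newline. Here's a slightly friendlier sketch that prints one element per line along with its index (walk_array.pl and $index are names I made up):

```perl
#!/usr/bin/perl
# walk_array.pl - print every element of an array with its index
@new_array = ("value one", "another value - value two", 39, 908);

$index = 0;    # our own counter; foreach doesn't keep one for us
foreach $a_var (@new_array) {
    print "Element $index is: $a_var\n";
    $index++;
}
```

When the loop finishes, $index is 4, which is the number of elements in the array.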
| Input |
So far we've seen how to send data to standard output, but receiving
input from users is often a requirement for a useful script. This is
actually a fairly easy task in Perl. The following example receives
some input from a user and stores it in the variable $teh_inputz0r:
-----
chomp($teh_inputz0r = <STDIN>);
-----
<STDIN> is the main component here. "STDIN" is the name of a file
handle (in this case, standard input; yes, standard input is
represented as a file), and wrapping a file handle in the less than and
greater than symbols tells Perl to read one line from it. An
understanding of how file handles work is not needed to understand
standard input though, and file handles will not be covered in this
tutorial. In other words, <STDIN> reads a line from standard input,
which is usually input typed at the keyboard. In this case, we're
assigning the input to a variable, $teh_inputz0r
($teh_inputz0r = <STDIN>).
The chomp function simply removes the ending newline of a string. When
the user enters text that is assigned to $teh_inputz0r, it is
terminated with a newline. The chomp function removes that trailing
newline from the string.
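To see what chomp actually does without having to type anything, here's a sketch that fakes a line of input by putting the newline in the string ourselves (chomp_demo.pl and $line are made-up names). The square brackets are only there to make the newline visible:

```perl
#!/usr/bin/perl
# chomp_demo.pl - what chomp does to a line of input
$line = "some input\n";    # roughly what <STDIN> gives us after the user hits Enter

print "Before chomp: [$line]\n";    # the closing ] lands on the next line
chomp($line);
print "After chomp:  [$line]\n";    # now the ] sits right after the text
```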
| Wrapping It Up |
I've decided to release my Perl tutorial in several parts. This is part
one, and part two, along with a possible part three, will introduce
more concepts such as hashes, functions, regular expressions, sockets,
and more. This tutorial should have provided a very basic, although not
quite complete, introduction to Perl and my future tutorials will build
upon that. Expect them to be out in not too long.
I've decided to add an extra feature to this tutorial to demonstrate
some very basic ways the information presented in this tutorial can be
used. This is a simple script that covers most of the topics introduced
in this text. It receives several numbers as input from the user and
finds the average of them. Note that before it does this, it asks the
user to enter how many numbers they will be entering. The only new
thing in it is the push function, which adds a value to the end of an
array. This is a (somewhat simple) example of what arrays can do that
simple scalar variables can not. Here's the script:
-----
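#!/usr/bin/perl
# Finds the average (mean) of numbers entered by the user
print "This script allows you to enter however many numbers you choose\n";
print "and then finds the mean of those numbers.\n";
print "How many numbers will you be entering? ";
chomp ($count = <STDIN>);
# Read $count numbers, adding each one to the end of @num_array
for ($i = 0; $i < $count; $i++) {
    print "Enter number: ";
    chomp ($num = <STDIN>);
    push @num_array, $num;
}
# Add up all of the numbers...
foreach (@num_array) {
    $average += $_;
}
# ...and divide by how many there were (skipped if none were entered,
# since dividing by zero is an error)
if ($count > 0) {
    $average /= $count;
    print "The average is $average.\n";
}
# The end.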
-----
I hope you enjoyed this tutorial and learned something from it. Won't
be long until part two's here! :)
-Ch4r