Introduction to Perl for Bioinformatics

About Perl


From the book 'Learning Perl':

1.1 History of Perl

Perl is short for "Practical Extraction and Report Language," although it has also been called a "Pathologically Eclectic Rubbish Lister." There's no point in arguing which one is more correct, because both are endorsed by Larry Wall, Perl's creator and chief architect, implementor, and maintainer. He created Perl when he was trying to produce some reports from a Usenet-news-like hierarchy of files for a bug-reporting system, and awk ran out of steam. Larry, being the lazy programmer that he is, decided to over-kill the problem with a general-purpose tool that he could use in at least one other place. The result was the first version of Perl.

After playing with this version of Perl a bit, adding stuff here and there, Larry released it to the community of Usenet readers, commonly known as "the Net." The users on this ragtag fugitive fleet of systems around the world (tens of thousands of them) gave him feedback, asking for ways to do this, that, or the other, many of which Larry had never envisioned his little Perl handling.

But as a result, Perl grew, and grew, and grew, at about the same rate as the UNIX operating system. (For you newcomers, the entire UNIX kernel used to fit in 32K! And now we're lucky if we can get it in under a few meg.) It grew in features. It grew in portability. What was once a little language now had over a thousand pages of documentation split across dozens of different manpages, a 600-page Nutshell reference book, a handful of Usenet newsgroups with 200,000 subscribers, and now this gentle introduction.

Larry is no longer the sole maintainer of Perl, but retains his executive title of chief architect. And Perl is still growing.

1.2 Purpose of Perl

Perl is designed to assist the programmer with common tasks that are probably too heavy or too portability-sensitive for the shell, and yet too weird or short-lived or complicated to code in C or some other UNIX glue language.

Once you become familiar with Perl, you may find yourself spending less time trying to get shell quoting (or C declarations) right, and more time reading Usenet news and downhill snowboarding, because Perl is a great tool for leverage. Perl's powerful constructs allow you to create (with minimal fuss) some very cool one-up solutions or general tools. Also, you can drag those tools along to your next job, because Perl is highly portable and readily available, so you'll have even more time there to read Usenet news and annoy your friends at karaoke bars.

Like any language, Perl can be "write-only"; it's possible to write programs that are impossible to read. But with proper care, you can avoid this common accusation. Yes, sometimes Perl looks like line noise to the uninitiated, but to the seasoned Perl programmer, it looks like checksummed line noise with a mission in life. If you follow the guidelines of this book, your programs should be easy to read and easy to maintain, but they probably won't win any obfuscated Perl contests.

1.3 Availability

If you get
perl: not found

when you try to invoke Perl from the shell, your system administrator hasn't caught the fever yet. But even if it's not on your system, you can get it for free (or nearly so).

Perl is distributed under the GNU Public License,[1] which says something like, "you can distribute binaries of Perl only if you make the source code available at no cost, and if you modify Perl, you have to distribute the source to your modifications as well." And that's essentially free. You can get the source to Perl for the cost of a blank tape or a few megabytes over a wire. And no one can lock Perl up and sell you just binaries for their particular idea of "supported hardware configurations."

[1] Or the slightly more liberal Artistic License, found in the distribution sources.

In fact, it's not only free, but it runs rather nicely on nearly everything that calls itself UNIX or UNIX-like and has a C compiler. This is because the package comes with an arcane configuration script called Configure that pokes and prods the system directories looking for things it requires, and adjusts the include files and defined symbols accordingly, turning to you for verification of its findings.

Besides UNIX or UNIX-like systems, people have also been addicted enough to Perl to port it to the Amiga, the Atari ST, the Macintosh family, VMS, OS/2, even MS/DOS and Windows NT and Windows 95 - and probably even more by the time you read this. The sources for Perl (and many precompiled binaries for non-UNIX architectures) are available from the Comprehensive Perl Archive Network (the CPAN). If you are web-savvy, visit http://www.perl.com/CPAN for one of the many mirrors. If you're absolutely stumped, write bookquestions@oreilly.com and say "Where can I get Perl?!?!"

1.4 Using Perl

Perl is a scripting language. Unlike programs in languages such as C, Perl programs are not compiled by the user in a separate step; they are compiled on the fly each time they are executed. This does have a performance overhead, but Perl is not about performance; it is about utility and ease of use. The other advantage is that your Perl programs can be run by anybody on any computer with a standard Perl interpreter.
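
As a rough illustration of that portability, the sketch below asks the interpreter to describe itself, so the same file should report different values on different machines without being changed. The subroutine name interpreter_info is made up for this example; $^V, $^O and $^X are Perl's built-in special variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# interpreter_info is a hypothetical helper written for this example.
# It uses three of Perl's built-in special variables:
#   $^V  version of the Perl interpreter running the script
#   $^O  name of the operating system Perl was built for
#   $^X  name used to start the interpreter
sub interpreter_info {
    return sprintf "Perl version: v%vd\nOperating system: %s\nInterpreter: %s\n",
        $^V, $^O, $^X;
}

print interpreter_info();
```

The exact values printed depend on the machine and on how Perl was installed there.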

1.5 Overview of a Script

A script is written in a plain-text file, usually given a name that describes what the program does, such as myfile.pl. The '.pl' suffix is not required, but it tells other users on the system that the file is a Perl script.

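The first line of a Perl script (the '#!' line) basically tells the machine: I AM A PERL SCRIPT, pass me to the Perl interpreter, which is installed HERE. The path on that line must match the place where Perl is installed on your system; /usr/bin/perl is a common location. The rest of the script is Perl code, which is passed to the interpreter.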

Here is your first Perl script.
#!/usr/bin/perl
print "Hello World!\n";
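
A slightly longer script follows the same layout: the '#!' line first, then Perl code. The version below is only a sketch with a bioinformatics flavour; the subroutine name reverse_complement and the DNA sequence are invented for illustration, and the 'use strict' and 'use warnings' lines are a common habit rather than a requirement.

```perl
#!/usr/bin/perl
# The first line names the interpreter; everything after it is Perl code.
use strict;      # require variables to be declared before use
use warnings;    # report likely mistakes while the script runs

# reverse_complement is a hypothetical helper written for this example.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;           # reverse the order of the bases
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its complement
    return $rc;
}

my $sequence = "ATGCGTTA";           # made-up sequence for illustration
print "Sequence:           $sequence\n";
print "Reverse complement: ", reverse_complement($sequence), "\n";
```

For the sequence ATGCGTTA, the script reports TAACGCAT as the reverse complement.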

1.6 Running a Script

In UNIX, a file must be marked as 'executable' before it can be run directly as a program; otherwise it is treated as an ordinary text file.

Once you've typed your program into a file called 'hello.pl', you need to set it as executable:
chmod +x ./hello.pl
Then, to run the script, type:
./hello.pl
On Windows machines you will not need to do this. Mac OS X, however, is based on a version of UNIX called BSD, so on a Mac you do need to run the chmod command, from a terminal.
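
The 'executable' setting can also be inspected from inside Perl, which may help make the idea concrete. The sketch below assumes a UNIX-like system; make_executable is a hypothetical helper written for this example, while -x, stat and chmod are Perl built-ins. It does roughly what 'chmod a+x' does from the shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_executable is a hypothetical helper: it adds the execute bits to a
# file's existing permissions and returns true if the change succeeded.
sub make_executable {
    my ($file) = @_;
    my @st = stat($file) or return 0;     # file missing or unreadable
    my $mode = $st[2] & 07777;            # current permission bits
    return chmod $mode | 0111, $file;     # number of files changed
}

my $script = shift @ARGV;                 # file name from the command line
if (defined $script) {
    if (-x $script) {
        print "$script is already executable\n";
    } elsif (make_executable($script)) {
        print "$script is now executable\n";
    } else {
        warn "Could not make $script executable: $!\n";
    }
}
```

If this were saved under a name such as setexec.pl (the name is arbitrary), it could be started with 'perl setexec.pl hello.pl'. Handing the file to the perl command in this way works whether or not the script itself has been marked executable.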