CGI Scripting - An Introduction

This article is about CGI scripting in Perl. Of course, scripting can take place anywhere, not just in the context of the web. And many languages can be used to write CGI scripts.

Perl has been ported to about 70 operating systems. The most recent I know of (February 2001) is Windows CE. This makes Perl widely available.

In this article I will make a lot of simplifications and generalizations. Here's the first...

There are 2 types of CGI scripts: those which run in the absence of form data, and those which process the data submitted from a form.

Invariably, this second type, having processed the data, will output an HTML page or form to let the user continue, or at least know what happened, or it will do a CGI redirect to another script which outputs something.

I am splitting scripts into 2 types just to emphasize that there are differences between processing forms and processing in the absence of forms.

Terminology

There are Web Servers, and Web Clients. Some web clients are browsers.

There are programs and scripts. Once upon a time, programs were compiled and scripts were interpreted. Hence the 2 names. But today, this is 'a distinction without a difference'. My attitude is that the 2 words, program and script, are interchangeable.

Program and process, however, are different. Program means a program on disk. Process means a program which has been loaded by the operating system into memory, and is being executed. This means a single program on disk can be loaded and run several times simultaneously, in which case it is 1 program and several processes.

Web servers have names like Apache, Zeus, MS IIS and TinyHTTPd. Apache and TinyHTTPd (Tiny HyperText Transfer Protocol Daemon) are Open Source. Zeus and MS IIS (Internet Information Server) are commercial products. The feeble security of IIS makes it unusable in a commercial environment. My examples will use Apache as the web server.

Web clients which are browsers have names like Opera, Netscape, IE. Of course, you can roll your own non-browser web client. We'll do this below.
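The difference between a program and a process can be seen from Perl itself. The following is a minimal sketch, not part of the original article: one program on disk starts several child processes with fork, each with its own process ID. The subroutine name spawn_children is hypothetical, and the sketch assumes a platform where Perl's fork is available.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start $count child processes from this one program and return their IDs.
# The name spawn_children is an assumption made for this sketch.
sub spawn_children {
    my ($count) = @_;
    my @pids;
    for (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: a separate process running the same program.
            exit 0;
        }
        push @pids, $pid;    # Parent: remember the child's process ID.
    }
    waitpid($_, 0) for @pids;    # Parent waits for every child to finish.
    return @pids;
}

my @pids = spawn_children(3);
print "Program $0 ran as parent process $$ plus children: @pids\n";
```

Each run prints different process IDs, but the program name stays the same: 1 program, several processes.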
URI = URL + URN

You'll notice the 3 letters I, L and N are in alphabetical order. That's the way to remember this formula.

Web Server Start Up

When a web server starts running, these are the basic steps taken:
It doesn't matter which web server you are using, and it doesn't matter if the web server is running under Unix or Windows or any other OS. These principles will apply.

Web Server Request Loop

The web server request loop, simplified (as always), has several steps:
Pictorially, we have an infinity symbol, ie a figure-of-8 on its side:

    +------+   1 -Request--->  +------+  2 -Action-->  +------+
    | Web  |  (URI or Submit)  | Web  |  (script.pl)   | Perl |
    |Client|                   |Server|                |Script|
    +------+   <--Response- 4  +------+  <---HTML-- 3  +------+
             (Header and HTML)         (Plain page or CGI form)

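Steps 3 and 4 in the picture can be made concrete with a tiny script. This is a minimal sketch, not code from the original article: it assumes the script only has to emit a Content-Type header, a blank line, and then the HTML. The subroutine name build_response is hypothetical, and the title and body are assumed to be safe text needing no HTML escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI response: header, blank line, then the HTML.
# build_response is a hypothetical name used only in this sketch.
sub build_response {
    my ($title, $body) = @_;
    my $html = "<html><head><title>$title</title></head>"
             . "<body><p>$body</p></body></html>\n";
    # The blank line after the header tells the web server where the HTML starts.
    return "Content-Type: text/html\n\n" . $html;
}

print build_response('Hello', 'This page was generated by a Perl script.');
```

The web server reads what the script prints, and passes the header and HTML back to the web client as the response.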
Things to note:
Web Server Directory Structure

But how does the web server know which page to return or which script to run? To answer this we next look at the directory structure on the web server's machine.

Below, Monash and Rusden are the names of university campuses. monash.edu and rusden.edu will be listed under the 'Virtual Hosts' part of httpd.conf, or, if you are running MS Windows NT/2k, they can be named in the file C:\WinNT\System32\Drivers\Etc\Hosts. Under other versions of MS Windows, the hosts file will be C:\Windows\Hosts.

A warning about the NT version of this file: Windows Explorer will lie to you about its attributes. You will have to log off, whichever user you are, and log on as the administrator to be able to save edits into this file. See http://savage.net.au/Perl/html/configure-apache.html for details.

Assume this directory structure:

    - D:\
        - www\
            - cgi-bin\
                - x.pl
            - conf\
                - httpd.conf
            - public\
                - index.html
                - monash\
                    - index.html
                - monash\staff
                    - mug-shots.html
                - rusden\
                    - index.html
                - rusden\staff
                    - courses.html

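The idea of turning a host name and a request path into a file on disk can be sketched in a few lines of Perl. This is only an illustration of the principle, not how Apache is implemented. The document roots below are assumptions based on the directory names listed above, the forward slashes are a convenience Perl accepts under Windows, and the names %document_root, $script_dir and resolve_request are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical document roots, one per virtual host (assumed layout).
my %document_root = (
    'monash.edu' => 'D:/www/public/monash',
    'rusden.edu' => 'D:/www/public/rusden',
);

# One script directory, shared by every virtual host.
my $script_dir = 'D:/www/cgi-bin';

# Turn a host name and a request path into a file name on disk.
sub resolve_request {
    my ($host, $path) = @_;
    my $root = $document_root{ lc($host) };
    return unless defined $root;          # Unknown virtual host.
    return if $path =~ m{\.\.};           # Refuse paths which climb out of the tree.
    if ($path =~ m{^/cgi-bin/(.+)$}) {
        return "$script_dir/$1";          # A script: same file for all hosts.
    }
    $path .= 'index.html' if $path =~ m{/$};    # Default page for a directory.
    return $root . $path;
}

print resolve_request('monash.edu', '/staff/mug-shots.html'), "\n";
print resolve_request('rusden.edu', '/cgi-bin/x.pl'), "\n";
```

Pages come from a different tree for each host, while a request for /cgi-bin/x.pl ends up at the same script whichever host was named.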
Note:
Web Server Configuration

Now, the web server can be told, via its configuration file httpd.conf, that:
Did you notice that both virtual hosts use D:\www\cgi-bin?

    ==============================================================
    These 2 hosts have their own document trees, but share scripts
    ==============================================================

We can service any number of virtual hosts with only one copy of each script. This is a huge maintenance saving.

This is the information available to the web server when a request comes in from a web client. So, now let's look at the client side of things.

A Perl Web Client

Here is a real, live, complete, Perl Web Client, which is obviously not a browser:

    #!/usr/bin/perl
    use LWP::Simple;
    print get('http://savage.net.au/index.html');

Yes folks, that's it. The work is managed by the Perl module 'LWP::Simple', and is available thru the command 'get', which that module exports, ie makes public so it can be used in scripts like this one. LWP stands for Library for WWW in Perl.

This code runs identically, from the command line, under Windows and Linux. The output is 'print'ed to the screen, but not formatted according to the HTML.

It's time now to step thru the web server-web client interaction.

Web Client Requests

When you type something like 'rusden.edu' into the browser's address field, or pass that string to a web client, and hit Go, here's sort of what happens:
In reality, processing the request and manufacturing the response can be quite complex procedures.

Web Pages

There are 2 types of web pages sent out to web clients:
Action = Script

If you view the source of such a form, you will always find text like: <form method='POST' action='http://some.domain.net.au/script.pl' enctype='application/x-www-form-urlencoded'>.

The 'action' part tells the web server, when the form is submitted, which script to run to process the form's data. The web server asks the operating system to load and run the script, and then it (the web server) passes the data (from the form) to the script. The script processes the data and outputs a response (which would normally be another form).

Warning

I've used './script.pl' to indicate the script is in the 'current' directory, but be warned, the CGI protocol does not specify what the current directory is at any time. In fact, it does not even specify that any current directory exists. Your scripts must, at all times, know exactly where they are and what they are doing.

Remember, this 'action' is taking place inside (ie from the point of view of) the web server.

Web Page Content

Web pages usually contain data in a combination of languages:
Yes, scripts can output scripts! Specifically, scripts can output web pages containing JavaScript, etc. There's even a Perl interface to Macromedia's Flash. Where I used to work, some salesmen were obsessed with Flash, because it was all they understood of the software we wrote :-(. In Flash's defense, you'd have to say it's too trivial to have pretensions.

JavaScript

As a Perl aficionado, you may be tempted to look down on JavaScript, but you shouldn't. It really does have its uses. When a page contains JavaScript to validate form input, this means quite a saving for the web client. Without the JavaScript here's what would happen (call this 'overhead'):
All of this takes time. When the JavaScript runs, it runs inside the web client, eg browser, so the web client gets a response much faster. Of course, complex validation often requires access to databases and so on, so sometimes there is no escape from the overhead just listed.

For example, where I work we noticed some pages were appearing very slowly, and I tracked it down to 3.6 MB (yes!!!) of JavaScript in some pages, which was being used to stop inputting of duplicate data. Naturally this JavaScript was being created by a Perl program :-).

Digression: HTML 'v' XML

As an aside, here's how HTML compares to XML. HTML is a rendering language. It indicates how the data is to be displayed. XML is a meta-language. It indicates the meaning of the data. Examples:

HTML: '<h1>25</h1>' tells you how 25 should look, but not what it is. In other words, '<h1>' is a command, telling a web client how to display what follows.

HTML: '<th>Temperature</th><td>25</td>' tells you how to align the 25, but not what it is.

XML: '<temperature>25</temperature>' tells you what 25 is. '<temperature>' is not a command.

XML: '<street_number>25</street_number>' tells you what 25 is.

Hmmm. This would make a marvellous exam question.

Re Action: A Tale of 2 Scripts

So, what happens when a web client requests that a web server run a script? To answer this, let's look at a web client request for a script-generated form, and how that request is processed. In fact, the web client is saying to the web server: 'Pretty please, run _your_ script on _my_ data'. Let's step thru the procedure:
You can see the problem. How does script #2 know what 'state' script #1 got up to?

Maintaining State

Maintaining state is a big problem. Chapter 5 in 'Writing Apache Modules in Perl and C' is called 'Maintaining State', and is dedicated to this problem. See 'Resources', below.

A few alternatives, and a very simple discussion of possible drawbacks:
In each case, you either abandon that alternative, or add complexity to overcome the drawbacks. There is no one, perfect, solution which fits all cases. You must study the alternatives, study your situation, and choose a course of action.

Combining Perl and HTML

There are 3 basic ways to do this:
A Detour - SDF

If you head on over to SDF - Simple Document Format, at http://www.mincom.com/mtr/sdf/ you'll see an example of the 3rd way. SDF is, of course, a Perl-based Open Source answer to PDF. SDF is also available from CPAN: http://theoryx5.uwinnipeg.ca/CPAN/cpan-search.html.

SDF converts text files into various specific formats. SDF can output, directly or via other software, into these formats: HTML, PostScript, PDF, man pages, POD, LaTeX, SGML, MIMS HTX and F6 help, MIF, RTF, Windows help and plain text.

Inside a Script: Who's Calling?

A script can ask the web server the URI used to fire off the script. The web server puts this information into the environment of the script, under the name HTTP_REFERER (yes, mis-spelling included for free). Strictly speaking, HTTP_REFERER holds the URI of the page the request came from, as reported by the web client, so it can be missing or wrong. So, as a script, I can say I was called by one of:

Now, either 'monash.edu' or 'rusden.edu' is just the value of a string in the script, and so the script can use this string as a key into a database. In fact, this part of the URI is also in the environment, separately, under the name HTTP_HOST.

From a database table, or any number of tables, the script can retrieve data specific to the host. This in turn means the script can change its behaviour depending on the URI used to run it.

Data per URI - Page Design

The Open Source database MySQL has a reserved table called 'hosts', so I'll start using the word 'domain'. Given a domain, I can turn that into a number which can be used as an index into a database table.

Here is a sample 'domain' table:

    +=============+=======+
    |             |  URI  |
    | domain_name | index |
    +=============+=======+
    | monash.edu  |   4   |
    +=============+=======+
    | rusden.edu  |   6   |
    +=============+=======+

And here is a sample web page 'design' table:

    +=======+
    +  URI  +===============+===========+===================+
    | index | template_name | bkg_color | location_of_links |...
    +=======+===============+===========+===================+
    |   4   | dark blue     | cream     | down the left     |...
    +=======+===============+===========+===================+
    |   6   | pale green    | an image  | across the bottom |...
    +=======+===============+===========+===================+

Data per URI - Page Content

Here is a sample web page 'content' table:

    +=======+
    +  URI  +================+================+
    | index | News headlines | Weather        |...
    +=======+================+================+
    |   4   | -              | www.bom.gov.au |...
    +=======+================+================+
    |   6   | www.f2.com.au  | www.bom.gov.au |...
    +=======+================+================+

f2 = Fairfax, the publisher of 'The Age' newspaper. bom = Bureau of Meteorology.

Data per URI - Page Content Revisited

Let me give a more commercial example. Here we chain tables:

ProductMap table:

    +=======+
    +  URI  +==============+============+
    | index | Products     | product_id |
    +=======+==============+============+
    |   4   | Motherboards |     1      |
    +=======+==============+============+
    |   4   | Printers     |     2      |
    +=======+==============+============+
    |   4   | CD Writers   |     3      |
    +=======+==============+============+
    |   6   | CD Writers   |     4      |
    +=======+==============+============+
    |   6   | Zip Drives   |     5      |
    +=======+==============+============+

Product table:

    +============+=============+
    | product_id | Brands      |
    +============+=============+
    |     1      | Gigabyte X1 |
    +============+=============+
    |     1      | Gigabyte X2 |
    +============+=============+
    |     1      | Intel A     |
    +============+=============+
    |     1      | Intel B     |
    +============+=============+
    |     :      | :           |
    +============+=============+
    |     5      | Sony        |
    +============+=============+

Hence a list of Products for a given URI, ie a given shop, can be turned into an HTML table and inserted into the outgoing web page.

Perl's Sub-languages

Perl has a number of languages built into it.
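Regular expressions are perhaps the most familiar of these built-in languages. The following is a minimal sketch, not code from the original article, which uses one to pull the domain out of a URI and then looks up that domain's index. It assumes the sample 'domain' table is held in a hash rather than a database; the names %domain_index, host_from_uri and index_for_uri are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory copy of the sample 'domain' table.
my %domain_index = (
    'monash.edu' => 4,
    'rusden.edu' => 6,
);

# Extract the host part of a URI with a regular expression,
# dropping any leading 'www.' and any port number.
sub host_from_uri {
    my ($uri) = @_;
    return unless defined $uri;
    if ($uri =~ m{^https?://(?:www\.)?([^/:]+)}i) {
        return lc($1);
    }
    return;
}

# Look up the index for a URI; returns undef for an unknown domain.
sub index_for_uri {
    my ($uri) = @_;
    my $host = host_from_uri($uri);
    return defined $host ? $domain_index{$host} : undef;
}

my $uri   = 'http://rusden.edu/staff/courses.html';
my $index = index_for_uri($uri);
print defined $index ? "URI index: $index\n" : "Unknown domain\n";
```

In a real script the URI would come from the environment, for example from $ENV{HTTP_HOST}, and the lookup would be a database query rather than a hash access.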
Lastly, you can insert Python, C and C++ source code in a Perl program, and have Perl call the appropriate compiler at run time, to build your program on the fly. For example, if you write a subroutine in C++, you can compile it and then call it from Perl, all in one go!

Writing Documentation

Most Perl module (library) authors write their documentation in POD, for various reasons:
Resources
Author

Ron Savage. Home page: http://savage.net.au/index.html
Licence

Australian Copyright © 2002 Ron Savage. All rights reserved. All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html