This is the 1st in a series about how I use Perl.
REST is short for Representational state transfer.
Here I describe how the REST-style path info component of an HTTP client's request to a server is transformed, by an algorithm, so as to select a specific module of code to run, and a specific method to call within that module.
I'm using the word 'method' here in its Object-Oriented Programming sense.
Hence the transformation takes a string as input, and produces a directive specifying which code path to execute within the application, to service the given url + path info.
There won't actually be much Perl in this article, so readers who are not familiar with Perl should still gain some benefit by following along.
Firstly:
You could start with http://en.wikipedia.org/wiki/Representational_State_Transfer for that.
I'll add something about that one day, perhaps.
So, getting away from the negativities, what is it?
Well, I have 2 motives for this article. Firstly, I somehow wasn't satisfied by the various articles I read on REST, and secondly, I am currently re-working a large, soon-to-be-released, module - CGI::Office::Contacts - to use REST, and this article is helping me clarify my ideas.
That last bit is the key - this article is a record of my attempt to apply REST - for the first time - to a specific Perl module.
BTW: The abstract for CGI::Office::Contacts is 'A web-based, group and private, contacts manager'.
Well, I hope that by the time you've finished this article, you can answer that question for yourselves.
But for me, the answer is Yes! REST is not just another fad, but is definitely the way to go.
It gives us an extremely neat way of organizing path infos, and that leads immediately and directly to a correspondingly neat way of organizing the set of modules (in any language) which make up a modern and typically complex CGI-based application.
The module CGI::Office::Contacts ships with 2 scripts which run the same code:
The latter uses the Perl module FCGI::ProcManager.
In what follows, I'll rarely refer to these scripts, but here's how they are used, where the 'N' just represents some person's unique id.
So, the part after the script's name, '/person/delete/N', is common, no matter which type of script we are running.
This common part is called 'path info', and we'll focus on it from here on.
Here are some path info samples, used by CGI::Office::Contacts:
101  /person/add
105  /person/update/:id
102  /person/delete/:id
103  /person_donation/add
     /person_donation/delete/:ids
104  /person_note/add
     /person_note/delete/:ids
106  /person_site/add
     /person_site/delete/:ids
201  /organization/add
205  /organization/update/:id
202  /organization/delete/:id
203  /organization_donation/add
     /organization_donation/delete/:ids
204  /organization_note/add
     /organization_note/delete/:ids
206  /organization_site/add
     /organization_site/delete/:ids
The numbers in the first column are values I used for the CGI form field 'action', before switching to REST. More on that big decision later.
The :id syntax is used by CGI::Application::Dispatch. This is the module which implements the algorithm mentioned above, and is used to transform those path infos into code paths.
The :ids indicate that the value is actually a set of ids, separated by e.g. '.' chars. The reason for having multiple ids is that the first one identifies the entity (organization, person), and all others identify donations or notes or sites belonging to that entity.
At run time, the :id would of course be the real id of an entity, or a string of ids as just mentioned. When setting up CGI::Application::Dispatch though, the :id syntax means that the value supplied by the client is stored under the name id, and your code uses that name to retrieve the value.
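To make the :ids idea concrete, here is a minimal sketch of pulling such a value apart. The helper split_ids and the sample value '7.21.22' are my own inventions for illustration, and the '.' separator is just the example mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI::Office::Contacts): split an ':ids'
# value into the owning entity's id and the ids of the items belonging to it.
sub split_ids
{
    my($ids) = @_;

    # The first id identifies the entity; the rest identify its items.
    my($entity_id, @item_ids) = split /\./, $ids;

    return ($entity_id, \@item_ids);
}

my($person_id, $note_ids) = split_ids('7.21.22');

print "Person $person_id, notes @$note_ids\n";
```

Any separator would do, as long as it cannot appear inside an id.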
If we consider the path info in (English) grammatical terms, or in coding terms, we can rewrite all those separate path info samples into one generic grammatical form:
/noun/verb/id
Or, in programming terms:
/object/action/id
In either case, we first specify the thing to be processed, and then the action to be performed on that thing.
If it's a brand new thing, then /person/add is sufficient (i.e. the id is not yet known), while actions on pre-existing things always require the thing's id to be specified, as in /person/delete/id.
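As a rough sketch of that generic form, the following splits a path info into its object, action and optional id. The helper parse_path_info and the id 42 are hypothetical; this is only an illustration, not how CGI::Application::Dispatch does its matching.

```perl
use strict;
use warnings;

# Hypothetical helper: split a path info of the form /object/action/id
# into its three parts. The id is undef when it is not supplied.
sub parse_path_info
{
    my($path_info) = @_;

    # Drop the leading '/', then split on the remaining '/' separators.
    (my $trimmed = $path_info) =~ s{^/}{};
    my($object, $action, $id) = split m{/}, $trimmed;

    return {object => $object, action => $action, id => $id};
}

my $add    = parse_path_info('/person/add');
my $delete = parse_path_info('/person/delete/42');

print "$add->{object} $add->{action}\n";
print "$delete->{object} $delete->{action} $delete->{id}\n";
```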
So far, so good. But I want to talk more about that CGI form field, 'action', which I mentioned.
The purpose of 'action', and its values, is to give the application the information it needs to select one code path amongst many. That is, the value of 'action' is what causes one specific execution path to be the one which is active during each instantiation of the application.
If we change our viewpoint from that data value to the code making the decision, we notice something interesting.
When using a CGI form field's value to switch, the code which does the switching is actually, and necessarily, inside our application.
For instance, we might use an 'if' statement (very crude), or we might assign that value to a variable within the application, and then use that to switch.
An example of the latter case is when using the Perl module CGI::Application, which uses what it calls the run mode, to do the switching.
In the case of an application whose parent class is CGI::Application, the switch is inside the parent. Our application would just supply a mapping function, which maps values to the names of subroutines.
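To illustrate switching from inside the application, here is a minimal sketch of a dispatch table keyed on the value of 'action'. The handler subs and run_action are hypothetical stand-ins; 101 and 102 are the values listed earlier for adding and deleting a person. This is not how CGI::Application implements run modes, just the general shape of the idea.

```perl
use strict;
use warnings;

# Hypothetical handlers standing in for real application code.
sub add_person    { return 'adding a person' }
sub delete_person { return 'deleting a person' }

# Map each value of the 'action' form field to a handler.
my %handler_for =
(
    101 => \&add_person,
    102 => \&delete_person,
);

# The switch itself lives inside the application.
sub run_action
{
    my($action) = @_;
    my $handler = $handler_for{$action}
        or die "Unknown action: $action\n";

    return $handler->();
}

print run_action(101), "\n";
```

The point to notice is that both the table and the switch live in our own code, and that the numbers carry no meaning of their own.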
REST allows us to answer the following question...
When using a stand-alone module such as CGI::Application::Dispatch, some code somewhere must still do the switching, but now the algorithm is implemented in code completely outside our application.
This turns out to be a fascinatingly different way of doing things.
Not only that, but also CGI::Application::Dispatch implements a generic path info transformation algorithm, and can be reused endlessly, by any number of unrelated projects.
There are, of course, alternatives to CGI::Application::Dispatch.
In this list, the first 2 are presumably (I didn't try them) intended as stand-alone path transformers, while Catalyst is a major framework with the transformation logic built in.
There may be others, both in Perl and other languages.
In each case, they solve the same problem: Transforming the path info string into a code path.
There may be arguments in favour of, and against, all of these modules, but I won't go into those here, except to say that CGI::Application::Dispatch is elegant and succinct. By that I mean the set of rules specifying the transformation algorithm are short and side-by-side (there's an example just below), whereas the other modules scatter the rules throughout the code, as a side-effect of how they implement their logic.
But back to what I know best.
We are starting with a generic path info such as /object/action/id, and wish to use that to specify code.
The following code fragment, used in both scripts mentioned above, is the starting point.
Note: The syntax ':x' means /x is extracted from the path info, and the value of x is made available to the code. Without the ':', the /x would literally mean x, and would not be interpreted as the name of a variable.
    CGI::Application::Dispatch -> dispatch
    (
        args_to_new => {QUERY => $cgi},
        debug       => 0,
        prefix      => 'CGI::Office::Contacts::Controller',
        table       =>
        [
            ''              => {app => 'Initialize', rm => 'display'},
            ':app'          => {rm => 'display'},
            ':app/:rm/:id?' => {}, # The '?' says the id is optional.
        ],
    );
Here's how it works. It is saying (as per the docs for CGI::Application::Dispatch):
If the path info is empty (the '' rule):
Then, construct the (default) class name CGI::Office::Contacts::Controller::Initialize, load an instance of that class, and call the (default) display method.
The class name defaults because it is not specified in the path info.
The method name defaults for the same reason.
If the path info is /object (the ':app' rule):
Then, construct the class name CGI::Office::Contacts::Controller::Object, load an instance of that class, and call the (default) display method.
The class name comes from the path info.
The method name defaults because it is not specified in the path info.
If the path info is /object/action or /object/action/id (the ':app/:rm/:id?' rule):
Then, construct the class name CGI::Office::Contacts::Controller::Object, load an instance of that class, and call the (explicit) action method.
Here, both the class name and the method name are taken from the path info.
Samples:
Use class CGI::Office::Contacts::Controller::Person, and call method add.
Use class CGI::Office::Contacts::Controller::Organization::Notes, and call method delete, with an object id of 99.
It's CGI::Application::Dispatch which converts organization_notes into 2 parts of the final class name, Organization::Notes.
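The whole transformation can be sketched in a few lines of plain Perl. This is a simplified stand-in of my own, not the code inside CGI::Application::Dispatch: the function resolve is hypothetical, it hard-codes the defaults from the dispatch table above, and the 3 path infos are assumed samples consistent with the classes just mentioned.

```perl
use strict;
use warnings;

# Simplified, hypothetical stand-in for the transformation described in the
# text. It is not the implementation used by CGI::Application::Dispatch.
my $prefix = 'CGI::Office::Contacts::Controller';

sub resolve
{
    my($path_info) = @_;

    (my $trimmed = $path_info) =~ s{^/}{};
    my($app, $rm, $id) = split m{/}, $trimmed;

    # Apply the defaults from the dispatch table.
    $app = 'initialize' if (! defined($app) || $app eq '');
    $rm  = 'display'    if (! defined($rm) );

    # organization_notes becomes Organization::Notes.
    my $class = join '::', $prefix, map { ucfirst($_) } split /_/, $app;

    return ($class, $rm, $id);
}

for my $path_info ('', '/person/add', '/organization_notes/delete/99')
{
    my($class, $rm, $id) = resolve($path_info);
    my $id_text          = defined($id) ? $id : '-';

    print "$class $rm $id_text\n";
}
```

The real module does far more (rule matching, loading the class, error handling), but the string-to-code-path step is essentially this small.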
CGI::Application::Dispatch makes the id's value available (inside an application based on CGI::Application) as $self -> param('id'), since I used ':id' in the code above (where you see CGI::Application::Dispatch -> dispatch...).
Hence retrieval of the id's value, in the currently-executing method (run mode), is trivial.
For completeness I should say that all CGI form field data is also passed to the code being called.
Well, to start with:
Clearly, the prefix key in the hash allows us to use abbreviated class names in what follows.
When the user (client) does not specify a path info, e.g. upon first hitting our web site's url, we must specify both a class name and a method name, for the logic within CGI::Application::Dispatch to, errr, dispatch to.
It becomes the name of a sub-class of our prefix.
We must still specify a method, simply because the user didn't.
We call it. Or, to be pedantic, we configure CGI::Application::Dispatch to call it.
I've chosen to default the action to display, although other articles I've read often use list for that. I see it as a personal decision which way to jump. But jump you must.
From those samples, and the big list of samples way back, we have this class structure:
This is clearly saying that our code base is everywhere compartmentalized, following the philosophy of dedicating one class to one job.
This is the base class for all controllers.
This is the class which handles the initialization phase of our application.
For me, this means it is the one and only module which outputs a web page and its associated CSS and Javascript. Yep, the one and only.
All other activity is done via Ajax calls from the client to my app.
And hence all other controllers output just tiny responses to those Ajax calls.
This is the base class for all organization-related controllers.
Likewise, this is the base class for all person-related controllers.
Up till now I haven't said anything about the rest of the application - that is, the modules which implement the Model and View components.
Under the model component, we'll have:
This is code common to organizations and people.
Also, there is another, corresponding, family of modules under CGI::Office::Contacts::View::*.
It should be clear that if there is work to be done on, say, the Person part of the code, then the work should be restricted to these modules, or their parents:
This is a result of the code structure adopted, and this structure is in turn a direct consequence of mapping the path info to the modules' namespace.
You might be thinking that we're ending up with too many modules, but this is never a problem.
In database design, for example, it's a classic beginner's mistake to try to minimize the number of tables used, and that is never a good design policy.
It's the same with code. Each module is small and neat, targeted to one part of the application, and the namespace orients you immediately.
In Perl, it should be mentioned for the non-Perl readers, the structure of module names (i.e. the language) does not force you to inherit modules from parents just because the parents' names are prefixes of the module in question.
However, I always name my modules such that the inheritance tree matches the structure of the modules' names.
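For non-Perl readers, here is a small sketch of that point. All 3 packages are hypothetical and live in one file so the example runs on its own. Sharing a name prefix gives no inheritance; only the explicit parent declaration does.

```perl
use strict;
use warnings;

# Hypothetical packages, defined in one file so the sketch is self-contained.
package My::Controller;

sub new     { my($class) = @_; return bless {}, $class }
sub display { return 'display from My::Controller' }

package My::Controller::Person;

# The name prefix alone gives no inheritance; this line does.
use parent -norequire, 'My::Controller';

package My::Controller::Orphan;

# Same prefix, but no parent declared, so nothing is inherited.
sub new { my($class) = @_; return bless {}, $class }

package main;

my $person = My::Controller::Person -> new;
my $text   = $person -> display;

print "$text\n";

my $verdict = My::Controller::Orphan -> can('display') ? 'inherits' : 'does not inherit';

print "Orphan $verdict display\n";
```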
To be released separately are modules such as CGI::Office::Contacts::Import::vCards.
Clearly, this means the controller will be called vCards.pm, and its parent will be Import.pm, and its parent will be Controller.pm.
So the controller component will be CGI::Office::Contacts::Controller::Import::vCards.
Also, the outcome so far indicates precisely what we must do to create handlers for new objects in the future.
For instance, if we allow our entities to have multiple sites (geographic addresses) just as we gave them donations and notes, we must write a controller called CGI::Office::Contacts::Controller::Organization::Sites, whose parent is obvious, and whose place in the scheme of things is also obvious.
I haven't written any export code for the new database, but exporting vCards from my email client, and importing them into this package uses XML.
That's not actually a problem, but the above design means that the basic system has no XML-module requirements at all. This is another artifact of the compartmentalized design.
So, the user only needs to install XML-based modules if they choose to import via vCards.
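One way to keep such a dependency optional is to load it at run time, only when the feature is requested. The sketch below is an assumption about how that could be done, not a description of the vCards importer. The helper load_optional is hypothetical, and List::Util and No::Such::Module are placeholders for whichever modules are really involved.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time, and report whether
# that worked, instead of demanding the module at compile time with 'use'.
sub load_optional
{
    my($module) = @_;

    # Convert Some::Module into Some/Module.pm for require.
    (my $file = "$module.pm") =~ s{::}{/}g;

    return eval { require $file; 1 } ? 1 : 0;
}

# Placeholder module names, chosen only so the sketch runs anywhere.
for my $module ('List::Util', 'No::Such::Module')
{
    my $status = load_optional($module) ? 'available' : 'not installed';

    print "$module: $status\n";
}
```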
In the same way we want our programs to exit without error, well-written articles have to exit with a nice conclusion.
To me, the analysis of path info and module structure, and the way these 2 tie so neatly together, means I have gained considerable benefit in feeling assured that the resultant code is as reliable as I can make it.
And that fits in with my personal mantra when operating within the computer industry: Reliability comes first, middle and last.