Starting2rest

Perl@Work#1 - REST, the path info, and code paths

This is the 1st in a series about how I use Perl.

REST is short for Representational state transfer.

Here I describe how the REST-style path info component (of an HTTP client's request to a server) is transformed - by an algorithm - in such a way as to select a specific module of code to run, and to select a specific method to call within that module.

I'm using the word 'method' here in its Object-Oriented Programming sense.

Hence the transformation takes a string as input, and produces a directive specifying which code path to execute within the application, to service the given url + path info.

There won't actually be much Perl in this article, so readers who are not familiar with Perl should still gain some benefit by following along.

What this article is, and is not

Firstly:

- I am not a REST expert
- This is not an encyclopedic survey of the material
  You could start with http://en.wikipedia.org/wiki/Representational_State_Transfer for that.
- This is not a reference manual
- This does not discuss the HTTP method names (get, post, etc)
  I'll add something about that one day, perhaps.
- This does not discuss REST in the absence of HTTP

So, getting away from the negativities, what is it?

Well, I have 2 motives for this article. Firstly, I somehow wasn't satisfied by the various articles I read on REST, and secondly, I am currently re-working a large, soon-to-be-released, module - CGI::Office::Contacts - to use REST, and this article is helping me clarify my ideas.

That last bit is the key - this article is a record of my attempt to apply REST - for the first time - to a specific Perl module.

BTW: The abstract for CGI::Office::Contacts is 'A web-based, group and private, contacts manager'.

Should REST be adopted or not?

Well, I hope that by the time you've finished this article, you can answer that question for yourselves.

But for me, the answer is Yes! REST is not just another fad, but is definitely the way to go.

It is an extremely neat way of organizing path infos, and that leads immediately and directly to a correspondingly neat way of organizing the set of modules (in any language) which make up a modern and typically complex CGI-based application.

The URL

The module CGI::Office::Contacts ships with 2 scripts which run the same code:

A classic CGI script /cgi-bin/office/contacts.cgi
A fancy FCGI script /office/contacts
The latter uses the Perl module FCGI::ProcManager.

In what follows, I'll rarely refer to these scripts, but here's how they are used, where the 'N' just represents some person's unique id.

/cgi-bin/office/contacts.cgi/person/delete/N
/office/contacts/person/delete/N

So, the part after the script's name, '/person/delete/N', is common, no matter which type of script we are running.

This common part is called 'path info', and we'll focus on it from here on.
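
As a rough illustration of that split, the sketch below strips a known script name from the front of a request path and returns what is left. The name path_info_of is hypothetical, and the id 42 is just a sample value. In a real CGI script the web server normally does this work, and supplies the result in the PATH_INFO environment variable.

```perl
use strict;
use warnings;

# Hypothetical helper: Remove the script's name from the front of the
# request path, leaving the path info. Returns nothing if the path
# does not start with the script's name.
sub path_info_of
{
	my($script_name, $request_path) = @_;

	return if (index($request_path, $script_name) != 0);

	return substr($request_path, length $script_name);
}

for my $pair
(
	['/cgi-bin/office/contacts.cgi', '/cgi-bin/office/contacts.cgi/person/delete/42'],
	['/office/contacts',             '/office/contacts/person/delete/42'],
)
{
	print path_info_of(@$pair), "\n";
}
```

Both calls yield the same path info, which is why the rest of the application need not care which script is running.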

The Path Info

Here are some path info samples, used by CGI::Office::Contacts:

	101 /person/add
	105 /person/update/:id
	102 /person/delete/:id
	103 /person_donation/add
	    /person_donation/delete/:ids
	104 /person_note/add
	    /person_note/delete/:ids
	106 /person_site/add
	    /person_site/delete/:ids

	201 /organization/add
	205 /organization/update/:id
	202 /organization/delete/:id
	203 /organization_donation/add
	    /organization_donation/delete/:ids
	204 /organization_note/add
	    /organization_note/delete/:ids
	206 /organization_site/add
	    /organization_site/delete/:ids

The numbers in the first column are values I used for the CGI form field 'action', before switching to REST. More on that big decision later.

The :id syntax is used by CGI::Application::Dispatch. This is the module which implements the algorithm mentioned above, and is used to transform those path infos into code paths.

The :ids indicate that the value is actually a set of ids, separated by e.g. '.' chars. The reason for having multiple ids is that the first one identifies the entity (organization, person), and all others identify donations or notes or sites belonging to that entity.

At run time, the :id would of course be the real id of an entity, or a string as just mentioned. When setting up CGI::Application::Dispatch though, the :id syntax means that the value supplied by the client is stored in such a way that a variable called id is used by your code to retrieve that value.
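
To make the :ids convention concrete, here is a minimal sketch of how such a value might be taken apart once it has been retrieved. The name split_ids and the sample value '7.21.22' are assumptions for illustration only, and the '.' separator follows the example given above.

```perl
use strict;
use warnings;

# Split a ':ids' value such as '7.21.22' into the id of the entity and
# the ids of the items (donations, notes or sites) belonging to it.
sub split_ids
{
	my($ids) = @_;
	my($entity_id, @item_ids) = split(/\./, $ids);

	return ($entity_id, \@item_ids);
}

my($person_id, $note_ids) = split_ids('7.21.22');

print "Person: $person_id. Notes: @$note_ids\n";
```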

Path info as grammar and pseudo-code

If we consider the path info in (English) grammatical terms, or in coding terms, we can rewrite all those separate path info samples into one generic grammatical form:

	/noun/verb/id

Or, in programming terms:

	/object/action/id

In either case, we first specify the thing to be processed, and then the action to be perpetrated on that thing.

If it's a brand new thing, then /person/add is sufficient (i.e. the id is not yet known), while actions on pre-existing things always require the thing's id to be specified, as in /person/delete/id.
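
The generic form can be sketched as a tiny parser, which breaks a path info into its named parts. This is only an illustration of the grammar, using the hypothetical name parse_path_info. It is not how any particular dispatcher is implemented.

```perl
use strict;
use warnings;

# Break a path info of the form /object/action/id into named parts.
# The action and the id are optional, as in '/person' or '/person/add'.
sub parse_path_info
{
	my($path_info) = @_;
	my($object, $action, $id) = grep { $_ ne '' } split(m|/|, $path_info);

	return {object => $object, action => $action, id => $id};
}

my $parts = parse_path_info('/person/delete/42');

print "Do '$$parts{action}' to the '$$parts{object}' with id $$parts{id}\n";
```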

The Bad Old Days

So far, so good. But I want to talk more about the CGI form field 'action' I mentioned.

CGI form data as a selector, or switcher

The purpose of 'action', and its values, is to give the application the information it needs to select one code path amongst many. That is, the value of 'action' is what causes one specific execution path to be the one which is active during each instantiation of the application.

But who is the selector?

If we change our viewpoint from that data value to the code making the decision, we notice something interesting.

When using a CGI form field's value to switch, the code which does the switching is actually, and necessarily, inside our application.

For instance, we might use an 'if' statement (very crude), or we might assign that value to a variable within the application, and then use that to switch.

An example of the latter case is when using the Perl module CGI::Application, which uses what it calls the run mode, to do the switching.

In the case of an application whose parent class is CGI::Application, the switch is inside the parent. Our application would just supply a mapping function, which maps values to the names of subroutines.
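
A minimal sketch of this style of switching, in plain Perl, might look like the following. The subroutine names are hypothetical, and the code only mimics the idea of mapping values to subroutines (CGI::Application offers its run_modes method for that). It is not CGI::Application's own code.

```perl
use strict;
use warnings;

# Stand-ins for the real work. The names are hypothetical.
sub add_person    { return 'Adding a person' }
sub delete_person { return 'Deleting a person' }

# The mapping from values of the form field 'action' to subroutines.
# The codes 101 and 102 come from the first column of the list of
# path info samples.
my %handler_for =
(
	101 => \&add_person,
	102 => \&delete_person,
);

# The switch itself. Note that it lives inside the application.
sub run
{
	my($action) = @_;
	my $handler = $handler_for{$action};

	return $handler ? $handler -> () : "Unknown action: $action";
}

print run(102), "\n";
```

The point to notice is that both the mapping and the switch are part of the application's own code base.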

The Good New Days

REST allows us to answer the following question...

What if the selector were outside our application?

When using a stand-alone module such as CGI::Application::Dispatch, some code somewhere must still do the switching, but now the algorithm is implemented in code completely outside our application.

This turns out to be a fascinatingly different way of doing things.

Not only that, but also CGI::Application::Dispatch implements a generic path info transformation algorithm, and can be reused endlessly, by any number of unrelated projects.

A Digression

There are, of course, alternatives to CGI::Application::Dispatch.

In this list, the first 2 are presumably (I didn't try them) intended as stand-alone path transformers, while Catalyst is a major framework with the transformation logic built in.

HTTP::Router
Path::Router
Catalyst

There may be others, both in Perl and other languages.

In each case, they solve the same problem: Transforming the path info string into a code path.

There may be arguments in favour of, and against, all of these modules, but I won't go into those here, except to say that CGI::Application::Dispatch is elegant and succinct. By that I mean the set of rules specifying the transformation algorithm is short, and the rules sit side-by-side (there's an example just below), whereas the other modules scatter the rules throughout the code, as a side-effect of how they implement their logic.

But back to what I know best.

The Transformation Algorithm

We are starting with a generic path info such as /object/action/id, and wish to use that to specify code.

The following code fragment, used in both scripts mentioned above, is the starting point.

Note: The syntax ':x' means that the component at that position in the path info is extracted, and its value is made available to the code under the name x. Without the ':', the x would be matched literally, and would not be interpreted as the name of a variable.

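	use CGI::Application::Dispatch;

	# $cgi is assumed to be the query object created earlier in the script.

	CGI::Application::Dispatch -> dispatch
	(
	 args_to_new => {QUERY => $cgi}, # Passed to new() of the class selected.
	 debug       => 0,
	 prefix      => 'CGI::Office::Contacts::Controller', # Start of each class name.
	 table       =>
	 [
	  ''              => {app => 'Initialize', rm => 'display'}, # Blank path info.
	  ':app'          => {rm => 'display'},                      # Class name only.
	  ':app/:rm/:id?' => {}, # The '?' says the id is optional.
	 ],
	);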

Here's how it works. It is saying (as per the docs for CGI::Application::Dispatch):

When the path info is blank
Then, construct the (default) class name CGI::Office::Contacts::Controller::Initialize, load an instance of that class, and call the (default) display method.

The class name defaults because it is not specified in the path info.

The method name defaults for the same reason.

When the path info is /:app, i.e. /object
Then, construct the class name CGI::Office::Contacts::Controller::Object, load an instance of that class, and call the (default) display method.

The class name comes from the path info.

The method name defaults because it is not specified in the path info.

When the path info is /:app/:rm/:id, i.e. /object/action/id
Then, construct the class name CGI::Office::Contacts::Controller::Object, load an instance of that class, and call the (explicit) action method.

Here, both the class name and the method name are taken from the path info.

Samples:

/person/add
Use class CGI::Office::Contacts::Controller::Person, and call method add.

/organization_note/delete/99
Use class CGI::Office::Contacts::Controller::Organization::Note, and call method delete, with an object id of 99.

It's CGI::Application::Dispatch which converts organization_note into 2 parts of the final class name, Organization::Note.

The id of 99
CGI::Application::Dispatch makes the value available (inside an application based on CGI::Application) as $self -> param('id'), since I used ':id' in the code above (where you see CGI::Application::Dispatch -> dispatch...).

Hence retrieval of the id's value, in the currently-executing method (run mode), is trivial.

For completeness I should say that all CGI form field data is also passed to the code being called.
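
The 3 rules in the dispatch table can be sketched in plain Perl as follows. This is a deliberately simplified illustration, with the hypothetical names module_name and resolve. It is not the implementation inside CGI::Application::Dispatch, which handles many more cases. The conversion of underscores into '::' follows the behaviour described in the text.

```perl
use strict;
use warnings;

my $prefix = 'CGI::Office::Contacts::Controller';

# Turn 'organization_note' into 'Organization::Note'.
sub module_name
{
	my($app) = @_;

	return join('::', map { ucfirst($_) } split(/_/, $app) );
}

# Map a path info to a class name, a run mode (method) and an optional id.
sub resolve
{
	my($path_info) = @_;
	my($app, $rm, $id) = grep { $_ ne '' } split(m|/|, $path_info);

	$app = 'initialize' if (! defined $app); # Rule 1: Blank path info.
	$rm  = 'display'    if (! defined $rm);  # Rules 1 and 2: Default run mode.

	return {class => $prefix . '::' . module_name($app), rm => $rm, id => $id};
}

for my $path_info ('', '/person', '/organization_note/delete/99')
{
	my $target = resolve($path_info);

	printf("%-30s => %s (run mode %s)\n", "<$path_info>", $$target{class}, $$target{rm});
}
```

Note how the whole mapping fits in a few lines, and how none of it needs to know anything about people, organizations or notes.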

Observations and Deductions

So, what can we make of all this?

Well, to start with:

The prefix
Clearly, the prefix key in the hash allows us to use abbreviated class names in what follows.

Sherlock Holmes and the little-known case of the null path info
When the user (client) does not specify a path info, e.g. upon first hitting our web site's url, we must specify both a class name and a method name, for the logic within CGI::Application::Dispatch to, errr, dispatch to.

When an object is specified...
It becomes the name of a sub-class of our prefix.

We must still specify a method, simply because the user didn't.

And, when an action is specified...
We call it. Or, to be pedantic, we configure CGI::Application::Dispatch to call it.

The default action
I've chosen to default the action to display, although other articles I've read often use list for that. I see it as a personal decision which way to jump. But jump you must.

But wait, there's more!

Observe the class structure
From those samples, and the big list of samples way back, we have this class structure:
- CGI::Office::Contacts::Controller
- CGI::Office::Contacts::Controller::Initialize
- CGI::Office::Contacts::Controller::Organization
- CGI::Office::Contacts::Controller::Organization::Donation
- CGI::Office::Contacts::Controller::Organization::Note
- CGI::Office::Contacts::Controller::Person
- CGI::Office::Contacts::Controller::Person::Donation
- CGI::Office::Contacts::Controller::Person::Note
- Etc
This is clearly saying that our code base is everywhere compartmentalized, following the philosophy of dedicating one class to one job.
Observe the Perl modules
- Controller.pm
  This is the base class for all controllers.
- Initialize.pm
  This is the class which handles the initialization phase of our application.
  
  For me, this means it is the one and only module which outputs a web page and its associated CSS and JavaScript. Yep, the one and only.
  
  All other activity is done via Ajax calls from the client to my app.
  
  And hence all other controllers output just tiny responses to those Ajax calls.
- Organization.pm
  This is the base class for all organization-related controllers.
- Person.pm
  Likewise, this is the base class for all person-related controllers.
- And so on

MVC - Model-View-Controller

Up till now I haven't said anything about the rest of the application - that is, the modules which implement the Model and View components.

Model and View

Under the model component, we'll have:

CGI::Office::Contacts::Database
CGI::Office::Contacts::Database::EmailAddress
CGI::Office::Contacts::Database::Entity
This is code common to organizations and people.
CGI::Office::Contacts::Database::Donation
CGI::Office::Contacts::Database::Note
CGI::Office::Contacts::Database::Occupation
CGI::Office::Contacts::Database::Organization
CGI::Office::Contacts::Database::Person
CGI::Office::Contacts::Database::PhoneNumber
CGI::Office::Contacts::Database::Util

Also, there is another, corresponding, family of modules under CGI::Office::Contacts::View::*.

Coding and Debugging

It should be clear that if there is work to be done on, say, the Person part of the code, then the work should be restricted to these modules, or their parents:

CGI::Office::Contacts::Controller::Person
CGI::Office::Contacts::Database::Person
CGI::Office::Contacts::View::Person

This is a result of the code structure adopted, and this structure is in turn a direct consequence of mapping the path info to the modules' namespace.
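
Because the names are so regular, the set of modules affected by a change can be generated mechanically. Here is a small sketch, assuming only the 3 component names used in this article. The name modules_for is hypothetical.

```perl
use strict;
use warnings;

# List the modules touched by work on one object, e.g. 'Person'.
sub modules_for
{
	my($object) = @_;

	return map { join('::', 'CGI::Office::Contacts', $_, $object) } qw(Controller Database View);
}

print "$_\n" for modules_for('Person');
```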

You might be thinking that we're ending up with too many modules, but this is never a problem.

In database design, for example, it's a classic beginner's mistake to try to minimize the number of tables used, but this is never a good design policy.

It's the same with code. Each module is small and neat, targeted to one part of the application, and the namespace orients you immediately.

For the non-Perl readers, I should mention that the Perl language does not force a module to inherit from another module just because the latter's name is a prefix of the former's.

However, I always name my modules such that the inheritance tree matches the structure of the modules' names.
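
The following sketch shows both points. The class names are hypothetical, shortened versions of those in this article, and all packages are put into one file just to keep the example self-contained. In real code each package would live in its own .pm file.

```perl
use strict;
use warnings;

package Controller;

sub new { my($class) = @_; return bless({}, $class) }

package Controller::Person;

# Inheritance must be declared explicitly.
use parent -norequire, 'Controller';

package Controller::Person::Note;

use parent -norequire, 'Controller::Person';

# The name suggests a child of Controller::Person, but without an explicit
# declaration of the parent, Perl sees no relationship at all.
package Controller::Person::Orphan;

sub new { my($class) = @_; return bless({}, $class) }

package main;

print 'Note isa Person:   ', (Controller::Person::Note -> new -> isa('Controller::Person') ? 'Yes' : 'No'), "\n";
print 'Orphan isa Person: ', (Controller::Person::Orphan -> new -> isa('Controller::Person') ? 'Yes' : 'No'), "\n";
```

Here, Note inherits new() from Controller via Person, while Orphan has to supply its own.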

Other modules in the suite

Importing

To be released separately are modules such as CGI::Office::Contacts::Import::vCards.

Clearly, this means the controller will be called vCards.pm, and its parent will be Import.pm, and its parent will be Controller.pm.

So the controller component will be CGI::Office::Contacts::Controller::Import::vCards.

Also, the outcome so far indicates precisely what we must do to create handlers for new objects in the future.

For instance, if we allow our entities to have multiple sites (geographic addresses) just as we gave them donations and notes, we must write a controller called CGI::Office::Contacts::Controller::Organization::Site, whose parent is obvious, and whose place in the scheme of things is also obvious.

Exporting

I haven't written any export code for the new database. However, exporting vCards from my email client, and importing them into this package, uses XML.

That's not actually a problem, but the above design means that the basic system has no XML-module requirements at all. This is another artifact of the compartmentalized design.

So, the user only needs to install XML-based modules if they choose to import via vCards.
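
One common way to keep such a dependency optional is to load the module at run time, only in the code which needs it. The sketch below shows the general technique. It is not code from CGI::Office::Contacts. The name load_optional is hypothetical, and 'No::Such::XML::Module' is a made-up name standing in for whatever XML module an importer might use.

```perl
use strict;
use warnings;

# Try to load a module at run time. Returns 1 on success and 0 on failure.
sub load_optional
{
	my($module) = @_;
	(my $file = "$module.pm") =~ s|::|/|g;

	return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('File::Spec', 'No::Such::XML::Module')
{
	print "$module: ", (load_optional($module) ? 'available' : 'not installed'), "\n";
}
```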

Exiting Cleanly

In the same way we want our programs to exit without error, well-written articles have to exit with a nice conclusion.

To me, the analysis of path info and module structure, and the way these 2 tie so neatly together, means I have gained considerable benefit in feeling assured that the resultant code is as reliable as I can make it.

And that fits in with my personal mantra when operating within the computer industry: Reliability comes first, middle and last.

Author

Perl@Work#1 was written by Ron Savage in 2009.

Home page: http://savage.net.au/index.html

Copyright

	All Programs of mine are 'OSI Certified Open Source Software';
	you can redistribute them and/or modify them under the terms of
	The Artistic License, a copy of which is available at:
	http://www.opensource.org/licenses/index.html

Date

Written 2009-10-09.
