Images in Files or in Databases

Table of contents

Images in Files or in Databases
Referencing a File
Caching and Benchmarking
Caching and mod_perl
Images and their Metadata
Backup
Author
Licence

Images in Files or in Databases

These are my thoughts on a small part of this question.

Firstly, my policy is: I put data in databases and files in directories.

And I put images in files because I think of images as self-contained objects.

The question of metadata relating to the images is discussed below.

All my work happens to be in the CGI environment, so that's what I'll use to illustrate my examples. YMMV.

It should be obvious that someone starting with different assumptions and different requirements could well come to a different solution.

And if this document provokes someone to write an article supporting the contrary argument, good!

Either way, I see the primary purpose of this document to be a way of encouraging people to work through the design process first, rather than mindlessly dictating that one particular course of action 'must' necessarily and always be the best solution in all cases.

That is, I'm really addressing myself to the attitude people take to arriving at a solution, rather than the solution itself. So, that makes this a psychological perspective rather than a software perspective.

Referencing a File

When creating HTML, an 'img' tag will contain something like:

        img src = "/images/flower.png"

(I'm avoiding angle brackets in this document for simplicity.)

The URL /images/flower.png undergoes translation by the web server and is assumed in this discussion to refer directly to a file.

Now, as I see it, there are no software agents between the web server and the file system (except for the ever-present OS, which I ignore).

That is, the web server hits the file system to retrieve the image.

If I were to store the image in a database, retrieving it would require something like:

        img src = "/cgi-bin/get-pix.cgi?name=flower.png"

Now the OS has to hit the file system to load and run get-pix.cgi, which is a database client.

Then get-pix.cgi messages the database server which in turn hits the database, i.e. the file system, to retrieve the image.

The database server sends the image to the database client, get-pix.cgi, which in turn sends the image to the web server.

So, the latter mechanism has inserted 2 software agents between the web server and the image's data: The database client and the database server.

My claim, then, is that in this specific situation, the former mechanism for referring to files is the better solution.
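The two retrieval paths can be put side by side in a minimal sketch. It is written in Python rather than as a CGI script, and it assumes SQLite as a stand-in for the database. The table name images, the function names, and the dummy image bytes are all hypothetical. Note that SQLite runs in-process, so this sketch understates the second path: a real client/server database adds the messaging between get-pix.cgi and the server.

```python
import os
import sqlite3
import tempfile

# Hypothetical image bytes; any byte string will do for the illustration.
IMAGE_BYTES = b"\x89PNG\r\n\x1a\n" + b"\x00" * 1024


def read_from_file(path):
    """Path 1: the URL maps to a file, and the file is simply read."""
    with open(path, "rb") as handle:
        return handle.read()


def read_from_database(conn, name):
    """Path 2: a client program asks the database for the image."""
    row = conn.execute(
        "SELECT data FROM images WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])


def demo():
    """Store the same image both ways and fetch it back via each path."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "flower.png")
        with open(path, "wb") as handle:
            handle.write(IMAGE_BYTES)

        # Assumption: an in-memory SQLite database plays the database server.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, data BLOB)")
        conn.execute(
            "INSERT INTO images VALUES (?, ?)", ("flower.png", IMAGE_BYTES)
        )

        from_file = read_from_file(path)
        from_db = read_from_database(conn, "flower.png")
        conn.close()
        return from_file, from_db
```

Both paths return the same bytes. The difference is in how many pieces of software have to cooperate to deliver them.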

Caching and Benchmarking

You might like to throw the word caching into the discussion, but caching can be used in many situations, and to some extent confuses the issue.

So try benchmarking. This is better: the decision on which design to adopt will then be (or should be) based on measurements, rather than on someone raving about how one mechanism 'must' always be better than the alternatives.
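A minimal benchmarking sketch follows, written in Python and assuming SQLite as the database. The function name benchmark, its parameters, and the table layout are hypothetical. It times only the raw read in each case, leaving out the web server and the start-up cost of a CGI program, so treat the numbers as a starting point for measuring your own setup rather than as a verdict.

```python
import os
import sqlite3
import tempfile
import timeit


def benchmark(image_size=50_000, repeats=200):
    """Time `repeats` reads of one image via each path; return seconds."""
    payload = os.urandom(image_size)  # random bytes stand in for an image
    with tempfile.TemporaryDirectory() as directory:
        file_path = os.path.join(directory, "flower.png")
        db_path = os.path.join(directory, "images.db")
        with open(file_path, "wb") as handle:
            handle.write(payload)

        # Assumption: an on-disk SQLite file plays the role of the database.
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, data BLOB)")
        conn.execute(
            "INSERT INTO images VALUES (?, ?)", ("flower.png", payload)
        )
        conn.commit()

        def via_file():
            with open(file_path, "rb") as handle:
                return handle.read()

        def via_database():
            return conn.execute(
                "SELECT data FROM images WHERE name = ?", ("flower.png",)
            ).fetchone()[0]

        # Check that both paths deliver identical bytes before timing them.
        assert via_file() == via_database() == payload
        results = {
            "file": timeit.timeit(via_file, number=repeats),
            "database": timeit.timeit(via_database, number=repeats),
        }
        conn.close()
    return results


if __name__ == "__main__":
    for label, seconds in benchmark().items():
        print(f"{label}: {seconds:.4f} s")
```

The results depend on image size, the operating system's file cache, and the database in use, which is exactly why measuring beats asserting.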

Caching and mod_perl

You might think that using mod_perl to cache get-pix.cgi will help. Nope:

mod_perl. Ha ha ha!

Using mod_perl says you're in a CGI (HTML-generating) environment, in which case I will argue that the first alternative above:

        img src = "/images/flower.png"

is the fastest way to go.

mod_perl. Ha ha ha!

Why have you introduced mod_perl? To overcome the complexity arising from the presence of a database client and a database server.

Yep, you're trying to add complexity to solve the problem of complexity.

Images and their Metadata

I acknowledge the system may well have image metadata which needs to be maintained in parallel with the images.

And I would recommend keeping such metadata in a database, rather than in, say, text files. After all, data goes in databases - even I know that!

But I'm just not convinced that the images themselves have to be in a database. Can anyone explain why that would be better, as distinct from just asserting that it 'must' be better?

There is an argument that it is easier to keep the images and their metadata together by keeping the information all in a database. Well, arguable I guess.

And yet, losing the images because they are in the file system and not in the database along with their metadata sure looks like incompetence to me, and this point alone is not enough to convince me that putting the images in the database is always best.
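The split between metadata in a database and images in files can be sketched as below, together with a check that catches images which have gone missing. This is Python with SQLite as an assumed database. The table image_meta, its columns, and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def create_metadata_table(conn):
    """Metadata lives in the database; the image itself is only referenced."""
    conn.execute(
        "CREATE TABLE image_meta ("
        " path TEXT PRIMARY KEY,"   # path relative to the image directory
        " caption TEXT NOT NULL)"
    )


def missing_files(conn, image_root):
    """Return metadata paths with no matching file under image_root."""
    rows = conn.execute("SELECT path FROM image_meta ORDER BY path")
    return [
        path for (path,) in rows
        if not os.path.isfile(os.path.join(image_root, path))
    ]


def demo():
    """Two metadata rows, one image on disk: report the one that is lost."""
    with tempfile.TemporaryDirectory() as image_root:
        with open(os.path.join(image_root, "flower.png"), "wb") as handle:
            handle.write(b"not really a PNG")

        conn = sqlite3.connect(":memory:")
        create_metadata_table(conn)
        conn.executemany(
            "INSERT INTO image_meta VALUES (?, ?)",
            [("flower.png", "A flower"), ("tree.png", "A tree")],
        )
        result = missing_files(conn, image_root)
        conn.close()
        return result
```

Here demo() reports tree.png, the one row whose file does not exist. Running a check like this regularly is one way to notice lost images early without moving them into the database.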

Backup

The complexity of all this is highlighted when we consider backup.

My preference for images in files in directories, rather than images in databases, means that backing up a consistent set of images in files and metadata in databases is more awkward than if both images and their metadata were in the same database.

Even this doesn't convince me, though of course it does mean more effort in designing the backup regime.
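One possible shape for that extra effort is sketched below in Python, assuming SQLite holds the metadata. The backup function, the image_meta table, and the directory layout are hypothetical. The sketch copies the image directory, dumps the metadata as SQL next to it, and returns a checksum per image so that a restored set can be verified. It does not guard against images changing while the backup runs; a real regime would need to handle that.

```python
import hashlib
import os
import shutil
import sqlite3
import tempfile


def backup(conn, image_root, backup_dir):
    """Copy images and a metadata dump into backup_dir; return a manifest."""
    images_copy = os.path.join(backup_dir, "images")
    shutil.copytree(image_root, images_copy)

    # Dump the metadata as SQL text alongside the copied images.
    dump_path = os.path.join(backup_dir, "metadata.sql")
    with open(dump_path, "w", encoding="utf-8") as out:
        for statement in conn.iterdump():
            out.write(statement + "\n")

    # Record a SHA-256 checksum per copied image for later verification.
    manifest = {}
    for name in sorted(os.listdir(images_copy)):
        with open(os.path.join(images_copy, name), "rb") as handle:
            manifest[name] = hashlib.sha256(handle.read()).hexdigest()
    return manifest


def demo():
    """Back up one image plus its metadata row; return manifest and dump."""
    with tempfile.TemporaryDirectory() as work:
        image_root = os.path.join(work, "images")
        backup_dir = os.path.join(work, "backup")
        os.makedirs(image_root)
        os.makedirs(backup_dir)
        with open(os.path.join(image_root, "flower.png"), "wb") as handle:
            handle.write(b"not really a PNG")

        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE image_meta (path TEXT PRIMARY KEY, caption TEXT)"
        )
        conn.execute("INSERT INTO image_meta VALUES ('flower.png', 'A flower')")
        conn.commit()

        manifest = backup(conn, image_root, backup_dir)
        conn.close()
        dump_file = os.path.join(backup_dir, "metadata.sql")
        with open(dump_file, encoding="utf-8") as dump:
            return manifest, dump.read()
```

Keeping the dump and the image copy in one backup directory means the two halves travel together, which is the part that a single database would otherwise give you for free.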

Author

Ron Savage.

Home page: http://savage.net.au/index.html

Version: 1.01 01-Jun-2006

This version disguises my email address.

Version: 1.00 12-Apr-2005

Original version.

Licence

Australian Copyright © 2002 Ron Savage. All rights reserved.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html