Naming database objects

Table of Contents

Naming Database Tables, Columns and Indexes
Table names
Column names
Column names used in searches
Index names
Handling Yes/No type columns
Sample Code
Q and A
Naming Names
Resources
Author
Licence

Naming database objects

Naming Database Tables, Columns and Indexes

This article discusses my ideal naming scheme for these items.

By ideal I mean that it is:

  • Simple
  • Easy to understand
  • Well structured

    This means that I can write a program to perform database administration, and that such a program can use tables and columns named in a systematic method to automatically determine that certain columns are foreign keys, and so on.

    This avoids using database vendor-dependent methods to determine which columns are foreign keys. And yes, there are other methods for determining foreign keys.

    Of course, the assumption is that you can choose these names...

Table names

Table names obey the following rules:

  • They are composed of letters, digits and underscores

    E.g.: survey_1.

    To use spaces and '-'s in table names often requires quoting the table name, and database interfaces usually have a vendor-specific quote character for this, but it all adds complexity for little gain. The problem with '-' is a-b looks like an arithmetic expression, similar to 1-2, whereas a_b is unambiguously a single token.

  • They are short

    E.g.: city not metropolis.

    To be pedantic, we should use locality since a city has a legislated boundary, at least here in Australia, whereas locality covers city, suburb and any other area. Also this helps us avoid any city-centric bias in our design.

    Note: postcodes (called zip codes in America), can cover several localities, and large localities can have several postcodes. Because of this, Australia Post provides lists of localities and postcodes, and always uses the word locality in these contexts.

    E.g.: service_class but not class_of_service.

    In this case, service_class is not just shorter, but service is a meaningful prefix for related tables, e.g. service_extension_type, service_handset_type, service_number_type, etc.

  • They are single words

    E.g.: entity not organizational_entity.

    E.g.: service not phone_service.

    Of course, sometimes there are exceptions: fees_due, fees_paid. There is nothing to be gained by using subs (an abbreviation of 'membership subscriptions') here in order to satisfy the shortness rule above. This point is not to blindly follow rules, but to use sensible choices guided by rules.

  • They are singular words

    E.g.: person not people.

    Of course the table as a whole holds data for people, while each record is for a person.

    So how do we choose between these possibilities? I use person because it ties in with my convention for naming columns. This is discussed below, but I'll indicate here what I'm thinking.

    Each column name has the table name as a prefix. So, columns in the person table are called person_surname, person_email, etc. Now the column name can be automatically converted into a prompt for use during on-screen data input. Thus person_surname becomes 'Surname'. Likewise 'Email'.

    This - 'person' - is not a special case. Using the singular word as the table name applies to any table.

    Now, as for this data input context: I've written a Perl package which automatically turns table names and column names into CGI forms, i.e. the code fabricates a generic data entry package for any given database schema, and when you are entering data for 1 record it is for 1 person, not for 1 people.

    The package is (provisionally) called CGI::DBI. It's an unpublished Perl module. It will be released when I have taken out the authentication code, which belongs at the web server level, not inside a specific module.

    Hence my interest in the auto-generation of CGI forms based on table and column names.

  • They are lower case

    E.g.: person.

    This is because, as explained below for column names, English nouns start with a capital letter, so a list of table names in lower case can easily be converted into nice-looking nouns by upping the case of the first letter of each, and displaying them in a pop-up menu.

    The end result of such a pop-up menu is not really very fancy. Nevertheless, it can serve as a simple mechanism for use by us when we have to wear our database administrator hats.

Column names

Table names obey the following rules:

  • They are composed of the 2 tokens: 'table name'_'column name'
  • Primary keys are called 'table name'_id

    That is, the table name is used as a prefix for all column names, and the 2 components in the name are separated by '_'.

    For a start, this means every column has a unique name, which makes it very easy to manipulate column names. For instance, there is never any ambiguity about which table a given column belongs to.

    In particular, it covers the case where the primary key for the address table, say, will be called address_id. From this it follows that the foreign key in the person table, pointing to the address table, will be called person_address_id.

    Also, auto-generation of select clauses is very simple, since 'select x' always refers to a unique x.

    Further, the auto-generation of joins is very simple, since something like 'where address_id = person_address_id' always refers to two unique column names.

    This rule also means column names are constructed in such a way that all foreign keys can be automatically determined.

    This is done by chopping the table name off the front of the column name. If the remaining string is a the name of a column in another table, then the (original) string was the name of a foreign key in the current table.

  • The column_name part obeys the same sorts of rules as for table names

    E.g.:

    • site_id is in the site table
    • site_name is in the site table
    • building_id is in the building table
    • building_name is in the building table
    • building_site_id is in the building table

      Further, it is a foreign key into the site table.

    The package mentioned above - CGI::DBI - generates prompts from column names for CGI forms thus:

    • Strip off leading table name and '_'
    • Chop off any trailing '_id'

      Thus person_address_id becomes 'Address'.

      Clearly, cases like address_id, the primary key for the address table, are treated specially.

    • Convert the first letter to upper case

      This is because English proper nouns start with capital letters.

    • Convert '_' to space

    Here are some sample prompts generated via this algorithm:

    • person_surname becomes ‘Surname’
    • person_given_names becomes ‘Given names’

Also, when using rules like these to name columns, graphing database schema becomes easier.

Why? Because to draw lines between boxes which represent tables, the code must be able to report which tables contain foreign keys. This can be done in various ways, but as always, the simpler the better.

Column names used in searches

When a column is to be used as the subject of a search, I create 2 parallel columns, one containing the real data, and one containing a lower case version of the data.

The two columns have the same name, but with the latter column's name also having the suffix _key.

Hence, in the person table I'd have person_name and person_name_key.

Note: some databases, eg MySQL, by default perform case-insensitive matches on text columns.

What I'm suggesting here is a policy, or strategy if you will, on what to do when MySQL is not the platform at hand.

Index names

I've adopted the mindlessly simple method of using the column name as the prefix of the index name, only where (of course) the index is just on one column.

There is some sample code below, but to repeat it here:

Column name: service_class_name_key

Index name: service_class_name_key_index

The component _key is explained in the previous section.

Handling Yes/No type columns

Nothing special is required here.

Let's say we have a company table. Then to record the fact some companies might wish to opt out of a survey, we would have a column called company_opt_out.

We can make several observations about this column name:

  • Two words, opt out, have been chosen to make things clear

    They are short and simple, and no obvious alterative presents itself.

  • The Yes case corresponds to the naming of the column

    Hence the SQL "company_opt_out = 'Yes'" is perfectly clear.

  • The corresponding Perl code is short and simple

    This assumes we've read the data into a hash:

    	if ($$data{'company_opt_out') eq 'Yes')
    	{
    		:
    	}
    	else
    	{
    		:
    	}

    Notice too how the Yes case, the special case, naturally comes at the start of the code.

Let's take another example: Private (silent) phone numbers. Actually they do ring, so they are not silent....

The column in the service table would be called service_is_silent, and the Perl code is:

	if ($$data{'service_is_silent'} eq 'Yes')
	{
		:
	}

Sample Code

Since TMTOWTDI, here's another way:

In our class's constructor, we create an instance of DBIx::SQLEngine:

	$$self{'_sql_engine'} = DBIx::SQLEngine -> new
	(
		'DBI:mysql:db_name:127.0.0.1', 'db_user', 'db_pass',
		{
			AutoCommit         => 1,
			PrintError         => 0,
			RaiseError         => 1,
			ShowErrorStatement => 1,
		}
	);

Here are some comments from the DBIx::SQLEngine docs:

  • The public interface ... is shared by all SQLEngine subclasses. In general, these methods aim to produce generic, database-independent queries, using standard SQL syntax.

  • This release includes subclasses supporting the MySQL, Pg, AnyData, and CSV.

Interestingly, in /perl/site/lib/DBIx/SQLEngine/ you'll find SQLite.pm.

Now we create a table:

	my($table_name) = 'service_class';
	my($column)     =
	[
		{
			name => 'service_class_id',
			type => 'sequential',
		},
		{
			name => 'service_class_code',
			type => 'char(5)',
		},
		{ # This holds lc service_class_code.
			name => 'service_class_code_key',
			type => 'char(5)',
		},
		{
			name => 'service_class_name',
			type => 'text',
		},
		{ # This holds lc service_class_name.
			name => 'service_class_name_key',
			type => 'text',
		},
	];
	$self -> do_drop_table($table_name);
	$self -> do_create_table($table_name, $column);
	$self -> do_create_index($table_name, 'service_class_code_key');
	$self -> do_create_index($table_name, 'service_class_name_key', 'unique');

Here's what the last 3 methods look like:

	sub do_drop_table
	{
		my($self, $table_name) = @_;
		my(@column)            = $$self{'_sql_engine'} -> detect_table($table_name, 1);
		$$self{'_sql_engine'} -> do_drop_table($table_name) if ($#column >= 0);
	}	# End of do_drop_table.
	sub do_create_table
	{
		my($self, $table_name, $column) = @_;
		$$self{'_sql_engine'} -> do_create_table($table_name, $column);
	}	# End of do_create_table.
	sub do_create_index
	{
		my($self, $table_name, $index_column, $unique) = @_;
		$unique  ||= '';
		my($sql) = "create $unique index ${index_column}_index " .
				"on $table_name ($index_column)";
		$$self{'_sql_engine'} -> do_sql($sql);
	}	# End of do_create_index.

We can make a few observation on this code:

  • We're writing Perl, not SQL

    In other words, DBIx::SQLEngine is one of those modules designed and written well, so as to let us users code in the Perl way, rather than having to contort themselves somehow.

    I'm specifically referring here to $column, a classic Perl data structure of an array ref of hash refs.

  • We effortlessly avoid vendor-specific SQL

    In particular, the drop table command does not know, and we don't need to know, about MySQL's 'if exists' extension to SQL.

  • We use delegation as and when it suits us

    In fact, sub do_create_table() adds nothing of its own to the code. We could just as easily have called DBIx::SQLEngine's do_create_table() directly.

    But table creation is not that frequent that we need worry about such optimizations. It just makes the calling code a bit neater.

    The alternative was to call like this:

    	$self -> do_drop_table($table_name);
    	$$self{'_sql_engine'} -> do_create_table($table_name, $column);
    	$self -> do_create_index($table_name, 'service_class_code_key');
    	$self -> do_create_index($table_name, 'service_class_name_key', 'unique');

    I prefer the look of the former version, using a series of $selfs, rather than having that hash element in one case. And it's easier to remember, than having to worry about which particular call must use $$self{'_sql_engine'}.

Q and A

Q: So why put the table name on the column name in the first place?

A: Because site_id and building_id are unique, whereas to call them both id adds complexity for little gain, and having id in every table means every SQL statement containing a join has to have dis-ambiguation code added.

Q: But doesn't this convention mean extra typing?

A: Yes, but that's why I wrote the CGI form generator :-).

E.g.: building_site_id (in the building table).

The column name is building_site_id, and site_id is readily found to be the name of a column, hence building_site_id is necessarily a foreign key.

My package, like others (using perhaps different methods), can then automatically determine database joins by this consistent naming of foreign keys.

This in turn means CGI forms containing drop-down menus can be automatically constructed from foreign keys.

E.g.: If we have a table called person_title, with a primary key person_title_id, then in the person table, the foreign key into the person_title table will be called person_person_title_id.

This may look odd at first, but the convenience of always knowing exactly what's where rapidy becomes a habit.

The algorithm given above for converting a column name into a prompt means person_person_title_id is displayed neatly as 'Person title', the table name prefix person_ and the _id suffix having been dropped.

Not only that, but when displaying data for one person, or when displaying an empty input screen, we can automatically populate the title field with a pop-up menu taken directly from the person_title table.

Naming Names

Wherever plausible, I have a column in each table called 'table name'_name, to be used by display programs.

In the case of a table called person, there would columns like person_given_names, person_initials and person_surname, so person_name would be populated whenever a new record was added, by person_given_names + '1 space' + person_surname, or if there was a column called person_preferred_name, then by person_preferred_name + '1 space' + person_surname.

By default, the auto-generation of pop-up menus just mentioned would use such a column for display, and use the corresponding value of the primary key as the value to associate to the visible string.

Lastly, I say 'Wherever plausible', since in cases like a table of phone services, 'name' is just not appropriate.

Resources

An image of such a schema as described above: http://savage.net.au/assets/images/web-site/aussi.png.

The process of installing the software to create this image is documented here: http://savage.net.au/Ron/html/graphing-database-schema.html.

Other database drawing tools: http://savage.net.au/Ron/html/drawing-tools.html.

Author

Ron Savage.

Home page: http://savage.net.au/index.html

This POD was converted to HTML by /Perl.html#fancy-pom2.pl

  • Version: 1.01 01-Jun-2006

    This version disguises my email address.

  • Version: 1.00 18-Feb-2002

    Original version.

Licence

Australian Copyright © 2002 Ron Savage. All rights reserved.

	All Programs of mine are 'OSI Certified Open Source Software';
	you can redistribute them and/or modify them under the terms of
	The Artistic License, a copy of which is available at:
	http://www.opensource.org/licenses/index.html
 
Top of page