How to Write Proper, Well Structured mySQL Database Schemas

When developing a software project one of the most important, foundational and intrinsic aspects is a properly structured database schema. It's the equivalent of when building a house you need to ensure the foundation is properly laid, otherwise the chances of building a quality house are drastically reduced. Surprisingly easier than one would think, let's learn the various facets used to write a well architect ed database schema.

CREATE TABLE Syntax

To begin, open your favourite text editor to a new file as a database schema is nothing more than a plain text file. A database consists of multiple tables each of which consist of columns, and the CREATE TABLE syntax is used to create a single table. Here's a basic example:


CREATE TABLE users (
    id INT NOT NULL, 
    is_active TINY INT NOT NULL, 
    full_name VAR CHAR(100) NOT NULL, 
    email VARCHAR(100) NOT NULL
);

As you can see this will create a database table named which consists of four columns . This should be a fairly straight forward SQL statement beginning with , followed by the name of the database tables, then within parentheses the columns of the table separated by a comma.

Use Correct Column Types

As shown above, the columns the table will consist of are separated by commas. Each column definition is comprised of the three same parts:

<cb:COL_NAME     TYPE     [OPTIONS]

The name of the column, followed by the column type, then any optional parameters. We'll get into the optional parameters later, but concentrating on the column type, below lists the most commonly used column types available:

Type Description

INT
Integer, supports values up to (+/-) 2.14 billion.  Most communly used integer type, but the following with respective ranges are also available:<ul><li>TINYINT - 128.  Great for booleans (1 or 0).</li><li>SMALLINT - 32k</li><li>MEDIUMINT - 3.8 million</li><li>BIGINT - 9.3 quintillion.</li></ul>

VARCHAR(xxx)
Variable length string that supports virtually all non-binary data.  The <c:xxx> within parentheses is the maximum length the column can hold.

DECIMAL(x,y)
Stores decimal / float values, such as prices or any numeric values that aren't whole numbers.  The numbers within the parentheses of <c:(x,y)> define the maximum length of the column, and the number of decimal points to store.  For example, <c:DECIMAL(8,2)> would allow numbers to be maximum six digits in length plus formatted to two decimal points.

DATETIME / TIMESTAMP
Both hold the date and time in YYY-MM-DD HH:II:SS format.  You should use TIMESTAMP for all row meta data (ie. created at, lst updated, etc.) and DATETIME for all other dates (eg. date of birth, etc.).

DATE
Similar to DATETIME except it only stores the date in YYY-MM-DD format, and does not store the time.

TEXT
Large blocks of text, can store up to 65k characters.  The following are also available with their respective ranges:<ul><li>MEDIUMTEXT - 16.7 million characters.</li><li>LONGTEXT - 4.2 billion characters.</li></ul>

BLOB
Used to store binary data such as images.  Supports maximum size of 64kb, and the following with respective size limits are also supported:<ul><li>TINYBLOG - 255 bytes</li><li>MEDIUMBLOB - 16MB</li><li>LONGBLOG - 4GB</li></ul>

ENUM(opt1, opt2, opt3...)
Only allows the value to be one of the pre-defined values specified within the parentheses.  Good for things such as a status column (eg. active, inactive, pending).

For all intents, the above column types are all that you need to write well constructed mySQL database schemas.

Define Column Options

When defining columns there are also various options you may specify. Below is another example of the statement:


CREATE TABLE users (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
    username VARCHAR(100) NOT NULL UNIQUE, 
    status ENUM('active','inactive') NOT NULL DEFAULT 'active', 
    balance DECIMAL(8,2) NOT NULL DEFAULT 0, 
    date_of_birth DATETIME, 
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

The above may look a little daunting, but fret not, it's quite simple. Broken down, here's what is happening in the above statement:

  • You should always use NOT NULL on all columns possible to help with speed and performance of the table. This simply specifies the column can not be left empty / null when a row is inserted.
  • Always try to keep the column size as small as realistically possible, as it helps improve speed and performance.
  • The column is an integer, is also the primary key of the table meaning it's unique, and will increment by one each time a record is inserted. This should generally be used on all tables you create so you can easily reference any single row within the table.
  • The column is an ENUM and must either have a value of "active" or "inactive". If no value is specified, a new row will begin with the status of "active".
  • The column starts at 0 for every new row, and is an amount that is formatted two two decimal points.
  • The column is simply a DATE but also allows for a null value as the date of birth may not be known upon creation.
  • Last, the column is a TIMESTAMP as it's current row meta data, and defaults to the current time when the row was inserted.

The above is an example of a nicely structured database table, and should be used as an example going forward.

Link Tables Together With Foreign Key Constraints

One of the greatest advantages of using relational databases such as mySQL is its excellent support for foreign key constraints and cascading. This is when you link two tables together by a column, forming a parent child relationship, so when the parent row is deleted the necessary child rows are also automatically deleted. Here's an example:


CREATE TABLE users (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
    username VARCHAR(100) NOT NULL UNIQUE, 
    full_name VARCHAR(100) NOT NULL, 
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) engine=InnoDB;

CREATE TABLE orders (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
    userid INT NOT NULL, 
    amount DECIMAL(8,2) NOT NULL, 
    product_name VARCHAR(200) NOT NULL, 
    FOREIGN KEY (userid) REFERENCES users (id) ON DELETE CASCADE
) engine=InnoDB;

You will notice the FOREIGN KEY claue s the last line. This line simply states this table contains child rows that are linked by the column to their parent row, which is the column of the table. What this means is, any time a row is deleted from the table, mySQL will automatically delete all corresponding rows from the table helping ensure structural integrity within your database.

Also note the <c:engine=InnoDB> at the end of the above statement. Although InnoDB is now the default mySQL table type, it wasn't always, so this should be added just to stay on the safe side. Cascading only works with InnoDB tables, and not the other popular table type of myIsam.

Design With Confidence

You're now well on your way to architecting solid, well structred mySQL database schemas. Using the above knowledge you can now write well organized schemas that provide both, performance and structural integrity.