Lately I have been working with a set of customers on a longer term basis which has given me time to explore new tools using their environments. One tool that I am finding very helpful is called SchemaSpy.
SchemaSpy is a Java-based tool (requires Java 5 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.
One of the things that I like about ERD tools is that I can quickly generate a graphic representation of the database to see which tables are referenced the most, and which are candidates for dropping (why keep old data if you don’t need to?). I get up to speed a lot faster and can contribute to the Development process a lot more effectively when I know the relationships between tables versus observing queries only. An additional benefit is that SchemaSpy is command-line driven and builds the html after each run, so I find it convenient to set it up on crontab so that schema changes are automatically picked up. Nice, eh?
Usage of this tool is very straightforward and it does a lot of the work for you. Basically if your database has Foreign Keys then you’re laughing, if not don’t despair, you just have a bit more work ahead of you. Most of the time SchemaSpy “does the right thing” but sometimes you need to give it a little help in the form of metadata files.
Installation of SchemaSpy and Dependencies
- Download the latest copy of SchemaSpy jar file (5.0.0 at time of writing)
- Make sure you have latest copy of Java JRE for your platform
- You will need a Java driver for your database — I’m using the MySQL Connector/J
- Install the graphiz package
Creating a mysql.properties file
While not explicitly required, I prefer to create this once so that the command line is shorter and neater. In my case here is the properties file in use for my Percona Server 5.6.10 sandbox:
description=MySQL driver=com.mysql.jdbc.Driver connectionSpec=jdbc:mysql://127.0.0.1:5610/schemaspy driverPath=/usr/share/java/mysql-connector-java.jar
CREATE TABLE `parent` ( `parent_id` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`parent_id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `child_A` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `parent_id` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `child_B` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `parent_id` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id`), KEY `parent_id` (`parent_id`), CONSTRAINT `child_B_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`parent_id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Notice that in the case of child_A there is an implicit FK relationship to parent via parent_id, however in child_B it is explicit. This has an impact on how SchemaSpy identifies the relationship, and whether you need to provide a hint or not.
Running SchemaSpy (with Foreign Keys)
When you run schemaSpy without a metadata file, you will see that the relationship between parent and child_B is properly displayed:
Table child_A can be made to display if you tag the Implied relationships box, but this generally only works if you have used the exact same name (in this case, parent_id) in the child and parent tables. Your environment may be like a lot of shops that we see, where the parent table is known as id, and on the child tables they reference it as parent_id — this might make sense to Developers but it doesn’t help SchemaSpy at all. This is where Metadata files come in to play
Creating a Metadata File
The format of the metadata file is XML-based. I have included the metadata file I used to properly link child_A to parent table:
<?xml version="1.0" encoding="UTF-8"?> <schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.sourceforge.net/xmlschema/2011/02/05/schemaspy.meta.xsd"> <comments> Main Production Database. Percona, Michael Coburn </comments> <tables> <table name="parent" comments="Parent table"> <column name="parent_id" primaryKey="true"> </column> </table> <table name="child_A" comments="Non-FK relationship, implicit relationship to parent"> <column name="id" primaryKey="true"> </column> <column name="parent_id"> <foreignKey table="parent" column="parent_id"/> </column> </table> </tables> </schemaMeta>
If you’re looking for other examples you can find a more comprehensive example here.
The syntax below if specific for MySQL, but note that SchemaSpy works for pretty much all the popular RDBMS out there.
java -jar schemaSpy_5.0.0.jar -t mysql -u msandbox_ro -p msandbox -meta schemaspy.meta.xml -o /var/www/schemaspy/
The -o directive tells SchemaSpy where to write the output to, make sure this is a directory that can then be accessed by a web server. Also note that since I created the mysql.properties file in the same directory as the jar file, I don’t need to pass any host:port or schema name information.
Viewing SchemaSpy output
A completed Relationships view of these three tables should look like this:
While I find the Relationships tab the most useful component of SchemaSpy, I don’t want to leave you with the impression that this is the only component of the tool. There are additional tabs:
- Tables – Names, number of children, parents, count of columns, row counts, and comments — a great way for a high level overview of the table sizes and an easy way to use the search feature of your browser to zero in on a particular table
- Constraints – lists the explicit Foreign Key constraints in the database (this does not include constraints identified via metadata files!)
- Anomalies – Identifies possible relationships between columns/tables based on names, tables without indexes, columns flagged ‘nullable’ and ‘must be unique’ (woops!), single column tables, incrementing column names in tables, and tables with the string NULL instead of the actual SQL NULL value. This is basically a quick sanity check of your schema for any significant errors or items requiring review.
- Columns – a listing of all columns in the schema, really handy to sort by name to see if you have any implicit constraints that you might have missed and can then write into your metadata file.
- Donate – This is free software and John Currier asks for donations so he can justify the time spent maintaining SchemaSpy to his wife
Finally don’t forget to automate SchemaSpy via crontab once you’re done.
What are some ERD tools you use and how do they compare with SchemaSpy? Feel free to answer via the comments. Thanks for reading!