Have you heard about graph databases? Instead of the tables used by a traditional relational database, they store data in a way that puts the relationships between things or people at the center. Each relationship can be given a label and a level of importance, and existing relationships can be followed to discover other relationships your items might have. This introduction gives you an overview of what graph databases are about and what to look out for. Since there are several ways to use graph databases in application or site development, it also offers a jumping-off point for learning how they can make your design better.
What is a Graph Database?
Unlike a traditional relational database, a graph database does not store data in the rows and columns of tables, so there is no fixed table schema or normalization to worry about. Each entity is stored in an object-like form called a “node”. A node can represent a person, thing, place, or other object, and each relationship between two nodes is called an “edge”. A node can hold as much information as we would like to store, so a person node could have a name, age, address, and so on. Unlike in the relational model, each relationship between nodes can also be given an identifying label (such as WORKS_AT or LIKES) as well as an importance, or weight. This lets you ask questions such as “Who works at X, likes Y, and has also been to location Z?” by following edges instead of struggling through multiple join statements.
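To make the node-and-edge model more concrete, here is a minimal in-memory sketch in Python. It is not how any particular engine stores data: the classes, the relationship types WORKS_AT, LIKES, and VISITED, and the sample people are all hypothetical. Each edge carries a label and a weight standing in for its importance, and the “works at X, likes Y, has been to Z” question becomes an intersection of three sets of incoming edges rather than a chain of joins.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity (person, company, place, ...) with any number of properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship; `weight` stands in for its importance."""
    source: str
    target: str
    rel_type: str
    weight: float = 1.0


class PropertyGraph:
    """Hypothetical toy graph that indexes edges by the nodes they touch."""

    def __init__(self):
        self.nodes = {}
        self.out_edges = defaultdict(list)  # node_id -> edges leaving it
        self.in_edges = defaultdict(list)   # node_id -> edges arriving at it

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, rel_type, weight=1.0):
        edge = Edge(source, target, rel_type, weight)
        self.out_edges[source].append(edge)
        self.in_edges[target].append(edge)

    def sources_of(self, rel_type, target):
        """IDs of nodes with an edge of `rel_type` pointing at `target`."""
        return {e.source for e in self.in_edges[target] if e.rel_type == rel_type}


def works_likes_visited(graph, company, thing, place):
    """Who works at `company`, likes `thing`, and has been to `place`?"""
    return (graph.sources_of("WORKS_AT", company)
            & graph.sources_of("LIKES", thing)
            & graph.sources_of("VISITED", place))


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice", age=34)
g.add_node("bob", "Person", name="Bob", age=41)
g.add_node("carol", "Person", name="Carol", age=29)
g.add_node("acme", "Company", name="Acme")
g.add_node("jazz", "Genre", name="Jazz")
g.add_node("paris", "City", name="Paris")

g.add_edge("alice", "acme", "WORKS_AT")
g.add_edge("bob", "acme", "WORKS_AT")
g.add_edge("alice", "jazz", "LIKES", weight=0.9)
g.add_edge("bob", "jazz", "LIKES", weight=0.4)
g.add_edge("carol", "jazz", "LIKES", weight=0.7)
g.add_edge("alice", "paris", "VISITED")
g.add_edge("carol", "paris", "VISITED")

print(works_likes_visited(g, "acme", "jazz", "paris"))  # {'alice'}
```

Bob works at Acme and likes jazz but has no VISITED edge to Paris, and Carol has been to Paris but does not work at Acme, so only Alice matches all three relationships.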
Any node can be connected to any other, so there is a clear path of relationships to follow throughout the entire data structure. Relationships can have properties too, so not only can we record that X knows Y, we can also record that X has known Y for a certain number of years. Traversals, which walk the graph from node to node along its edges, can find every relationship of a certain type between nodes or find all the nodes that like the same thing.
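Relationship properties and traversals can be sketched with a plain adjacency list. This is a toy under stated assumptions, not any engine's API: each node maps to a list of (neighbor, relationship type, properties) tuples, the node names and the `years` property are hypothetical, and the traversal is an ordinary breadth-first search that follows only edges of one relationship type.

```python
from collections import deque

# Hypothetical adjacency list: node -> [(neighbor, relationship type, properties)]
graph = {
    "X": [("Y", "KNOWS", {"years": 5}), ("Jazz", "LIKES", {})],
    "Y": [("Z", "KNOWS", {"years": 2}), ("Jazz", "LIKES", {})],
    "Z": [("W", "KNOWS", {"years": 1})],
    "W": [],
    "Jazz": [],
}


def relationship_property(graph, source, target, rel_type, key):
    """Read a property stored on the edge source -[rel_type]-> target, if any."""
    for neighbor, kind, props in graph.get(source, []):
        if neighbor == target and kind == rel_type:
            return props.get(key)
    return None


def traverse(graph, start, rel_type):
    """Breadth-first traversal that follows only edges of `rel_type`.

    Returns {node: number of hops from start} for every reachable node.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor, kind, _props in graph.get(current, []):
            if kind == rel_type and neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


def nodes_with_edge_to(graph, rel_type, target):
    """All nodes with a `rel_type` edge to `target` (scans every node in this toy)."""
    return sorted(
        node
        for node, edges in graph.items()
        if any(kind == rel_type and neighbor == target for neighbor, kind, _ in edges)
    )


print(relationship_property(graph, "X", "Y", "KNOWS", "years"))  # 5
print(traverse(graph, "X", "KNOWS"))                             # {'Y': 1, 'Z': 2, 'W': 3}
print(nodes_with_edge_to(graph, "LIKES", "Jazz"))                # ['X', 'Y']
```

Following KNOWS edges from X reaches Y in one hop, Z in two, and W in three, while the LIKES edge to Jazz is skipped; the `years` property on the X to Y edge records how long X has known Y.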
There are many advantages to using a graph database:
- Disk-based – Many graph databases store their data in a native binary format on disk, so they can run without the complex software and server architecture an enterprise database system often requires. Many storage engines are also designed to take advantage of solid-state drives for faster access.
- Transactional – Many of the transaction features we are used to in relational databases have been built into graph databases, so you get recovery, deadlock detection, etc.
- Scalable – Some engines can store several billion nodes, relationships, and properties on a single machine.
- Robust – Some companies that adopted graph databases early have been running them 24/7 in production for almost 10 years.
When Should I Use a Graph Database?
In the past, we mostly used databases to store tabular data such as reports, sales numbers, and inventory. Relational databases were good for this, since you could associate records such as transactions with items in inventory. However, as the internet has changed, so have our data storage needs. As our data has become more connected, the traditional relational model has become an increasingly awkward fit for it.
Sometimes a graph database is the most natural way to model how data is inter-related. For example, social networking sites such as Facebook need to keep track of people who are connected to other people, places, companies, or products. While using Facebook, you often see friend recommendations based on the number of mutual friends, where you went to school, or places that you like. You may also see product recommendations based on something you have liked in the past or something a friend recommends. Since this data naturally falls into a pattern of relationships, it is easy to see why a graph database would be a natural fit.
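A mutual-friend recommendation like this can be sketched as a two-hop traversal that counts shared friends. This is a simplified illustration, not how Facebook actually ranks its suggestions: the names and the `friends` adjacency sets are hypothetical, and real systems weigh many more signals.

```python
from collections import Counter

# Hypothetical undirected FRIENDS_WITH relationships, stored as symmetric adjacency sets.
friends = {
    "ann": {"ben", "cat", "dan"},
    "ben": {"ann", "cat", "eve"},
    "cat": {"ann", "ben", "eve", "fay"},
    "dan": {"ann", "fay"},
    "eve": {"ben", "cat"},
    "fay": {"cat", "dan"},
}


def recommend_friends(friends, person, limit=3):
    """Suggest people two hops away, ranked by number of mutual friends."""
    direct = friends.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the person themselves and anyone they are already friends with.
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for deterministic ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(recommend_friends(friends, "ann"))  # [('eve', 2), ('fay', 2)]
print(recommend_friends(friends, "eve"))  # [('ann', 2), ('fay', 1)]
```

Eve and Fay each share two friends with Ann, so both are suggested; ties are broken alphabetically only to keep the output stable.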
Where do I Start?
There are several graph database engines to choose from. Some require installation on your own system, and others are available as a cloud service. Many of them offer tools to help you connect to, set up, and populate your database, along with documentation and code tutorials.
- Freebase is an open, publicly accessible graph database that you can use as a back end for your applications, and it comes with a nice suite of tools to help you get started. http://www.freebase.com/
- Neo4j is a high-performance, mature, and robust Java-based NoSQL graph database that stores data as an object-oriented network of nodes and relationships. http://neo4j.org/
- Trinity is Microsoft’s graph database and computing platform. It has not been officially released yet and is still a research project, but you can check it out at http://research.microsoft.com/en-us/projects/trinity/
- HyperGraphDB is an extensible, portable, distributed, embeddable, open-source system designed for artificial intelligence and semantic web projects. http://www.kobrix.com/index.jsp
- InfiniteGraph is a distributed graph database with a lot of great tools and an easy-to-use interface to help you get started. http://www.infinitegraph.com/
Graph databases have been around for a while, but they are only now picking up steam as we begin to see that the traditional relational model does not fit every kind of data. Since our information is becoming more connected, we need a way to store it that doesn’t feel like forcing a round peg into a square hole. Graph databases let us build and grow our data storage around the relationships in the data itself. I hope this article has encouraged you to learn more about graph databases and to evaluate whether they may be a good fit for your application.