A database is a structured set of data held in a computer or server. Structured is the key word here. Structured data is organised in ways that computers (and hopefully humans) can understand.
For example, nothing will happen if I write “5 5 =SUM(A1:B1)” in a word processing engine. My system is unable to parse my data (or intentions). Alternately, if I use spreadsheet software and write “5” in cells A1 and B1 and “=SUM(A1:B1)” in cell C1, my system can parse my data and intentions. The formula in cell C1 will calculate the sum of 5 and 5, and return an answer of 10. Success!
Now that we’re database experts, let’s drill down into the types of databases. We’ll see that databases can get much more complex than storing data in cells, but they are always used to store and organise data.
Are spreadsheets databases?
Spreadsheets are a type of database wherein data is contained by workbooks of one or more worksheets. Within worksheets, data is organised into columns and rows of cells. This is organisation. You can compare databases vs spreadsheets, but spreadsheets are databases.
They are many other applications for storing/sorting data called Database Management Systems (DBMS).
Database Management System (DBMS) power rankings
A DBMS is software for creating and managing databases. DB-Engines lists over 300 systems representing 11 models of organising data.
Let’s explore these models in order of popularity, along with common systems and use cases for each.
Tier 1: ubiquitous, used everywhere and by everyone
Relational DBMS (RDBMS)
Example systems: Microsoft Access and MySQL
Example users: Codebots (and 99% of apps and websites)
A RDBMS is like spreadsheet software. Data is organised into workbooks of tables containing columns and rows. The terminology for a RDBMS is databases of entities characterised by attributes and records thereof.
The defining characteristic of relational databases is schemas. Schemas define relationships between entities and formatting within attribute records. For example, a schema may specify stores have sales and sales have dates, which are expressed yyyymmdd.
Since its introduction in 1979, RDBMS has largely superseded navigational databases and dominated databasing in many sectors and industries. RDBMS has weathered successive challenges posed by supposed disruptors, partly by accommodating changing needs. For example, some systems of RDBMS have been expanded with non-traditional features, such as user-defined data types, non-atomic attributes, inheritance, and hierarchies.
The rise of NoSQL databases may pose a unique challenge for RDBMS as an answer for nearly all data analytics requirements.
Tier 2: common
Example systems: MongoDB
Example users: Scratchpad by Expedia
Also known as document-oriented database systems, document stores (like spreadsheets) have no schema. This means records can be non-uniform and/or have a nested structure and values can be stored as arrays.
Document-stores are a leading example of NoSQL database. This type of DBMS is growing in popularity as solutions to specific data management challenges arise.
Document stores share some common elements with graph databases, and can be categorized as a subclass of key-value stores. Native XML databases can likewise be categorized as a subclass of document stores. Merging these types of databases, however, yields no real advantage. Each has unique strengths and weaknesses and advantageous use cases.
This situation is endemic of there only being X number of ways computers can store and interrogate data. Similarities between methodologies are bound to arise. For example, Elasticsearch (nominally a search engine database) provides enough of the core operations on documents to be categorized as a document-oriented database.
Document stores can use internal notations (such as JSON), which are processed directly. JSON documents can be stored as a string in a key-value store or relational database. Again, blurring the lines of database categorization.
Basically, these databases are designed to efficiently store and update documents.
Example systems: Redis
Example users: Stack Exchange
In key-value stores, keys (or ids) and values are stored in pairs. Each key has one and only one value. A key-value store is like a RDBMS with one id attribute and one other attribute.
Some systems allow range and ordering queries. Generally, however, key-value stores are simple. The trade-off is complexity for speed.
These simple systems are normally inadequate for complex functionality. They can, however, be applied in embedded systems or as high performance in-process databases. Key-value stories are commonly used alongside document stores and wide column stores.
Example systems: Elasticsearch
Example users: Codebots Library and Vimeo
Search engines are NoSQL database management systems designed for searching (rather than editing) data.
Some features of search engines include:
- Support for complex search expressions
- Full text search
- Stemming (reducing inflected words to their stem)
- Ranking and grouping of search results
- Geospatial search
- Distributed search for high scalability
Wide column stores
Example systems: Cassandra
Example users: Netflix
Also known as extensible record stores, wide column stores store data in records (rows) of billions of dynamic columns. (Because the focus in on records, wide column stores are not strictly column stores). This is like how a RDBMS stores data. Unlike a relational database, however, wide column stores have no schema. They are a type of NoSQL database.
Tier 3: big fish
Example systems: Neo4j and Microsoft Azure Cosmos DB
Example users: Ebay
Also known as graph-oriented DBMS or graph database, a graph DBMS is based around structures of nodes and edges. Edges define relationships between nodes. These databases are designed to efficiently render visual representations of data.
One Ebay spokesperson said, “Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible.”
Nearly all graph DBMS are multi-model. Microsoft Azure Cosmos DBMS, for example, combines document store, key-value store, wide column store, and, of course, graph DBMS.
Multi-model systems are a method of compensating for limited range in utility by any one DBMS. In conjunction, with document store, key-value store, wide column store capability, for example, a graph-oriented DBMS has serious firepower.
Time series DBMS
Example systems: InfluxDB
Example users: PipelineFX
Time series DBMS is optimised for handling time series data. For starters, each entry is associated with a timestamp.
These databases are designed to efficiently collect, store and query various time series with high transaction volumes. Although time series data can be managed with other categories of DBMS (from key-value stores to relational systems), specialised systems, which have sacrificed broad utility for speed in specific functions, have advantages given appropriate contexts.
Time series databases have struggled to gain much use.
Native XML DBMS
Example systems: BaseX
Also known as NXD, a native XML DBMS is a database management system whose internal data model corresponds to XML documents. It’s a subclass of document stores.
In contrast with an XML-enabled DBMS, which is only capable of storing data as XML documents, a NXD has all the capabilities of XML. For example, an NXD can represent hierarchical data, understand embedded PCDATA declarations in XML elements, and utilize XML-specific query languages â€‹â€‹such as XPath, XQuery or XSLT.
Native XML DBMS do not necessarily store data as XML documents, they can store other formats when doing so provides better efficiency.
Example systems: Jena
The Resource Description Framework (RDF) is a methodology for describing information. Originally its main utility was describing metadata, now, however, it’s used much more generally, often in connection with the sematic web, but also in other applications.
The RDF model represents information as triples in the form of subject-predicate-object.
Database management systems, which are able to store and process such triples, are called RDF stores or triple stores.
RDF stores offer specific methods going beyond those of graph DBMS. For example, SPARQL, a SQL-like query language for RDF data, is supported by many RDF stores.
Tier 4: minnows
Example systems: Db4o
Also known as object databases, object-oriented DBMS were developed in the 1980s.
An object-oriented DBMS follows an object-oriented data model with classes, properties, and methods. An object is always managed in its entirety. This means, for example, creating an object, which in a relational system would probably be stored across multiple tables, will be performed automatically, as one atomic transaction, and without any action by the application program.
Object-oriented DBMS was once conceived as disruptor to RDBMS.
In recent years, however, the classic relational database management systems have been extended with some object-oriented features, such as user-defined data types and structured attributes. Some of these extensions were even standardized within SQL.
This, plus other features, tools and architectures enabling storage of objects within relational databases (such as Hibernate or JPA), hinder the widespread use of object-oriented systems.
Example systems: Adabas
Multivalue DBMS organise data as entities with attributes and records. Unlike a spreadsheet or RDBMS, however, multivalue databases can assign more than one value within an record’s attribute.
Although some RDBMS systems allow multivalue records, it should be used sparingly in RDBMSs. In contrast, multivalue forms the basis for data modeling in a multivalue DBMS.
Example systems: Jackrabbit
Also known as content repositories, content stores are database management systems specialized in managing digital content, such as text, pictures, videos, and metadata thereof.
As with other types of DBMS, content stores sacrifices broad utility for speed in specific functions.
Example systems: Event Store
Event stores are database management systems implementing the concept of event sourcing. These databases record all state changing events for an object and log these with a timestamp. In doing so, event stores create a time series.
The premise behind event stores is recording history rather than state. Instead, users to infer state by replaying events. This is in contrast with other types of DBMS, which only store state (unless explicitly modeled to do otherwise).
Supported operations on event stores include recording new events and querying event time series for objects. Modifying and deleting events is unsupported. This simplifies maintaining consistency in distributed systems. However, many event stores have performance issues necessitating snapshots. This partially defeats the purpose of organising data by history instead of state.
Tier 5: dying
Example systems: IMS and IDMS
The term navigational DBMS describes a class of database management systems wherein data is only accessible via linked records.
This model was developed in the 1960s, and was first DBMS able to manage large amounts of data. Depending on the flexibility of linking, systems can be sub-grouped into hierarchical DBMS and network DBMS. Both of these subclasses have been largely superseded by RDBMS.
Which database or DBMS is best?
The above power rankings are in terms of popularity, so if your definition of best is most popular, RDBMS is a clear winner.
DBMS power rankings:
- Document stores
- Key-value stores
- Search engines
- Wide column stores
- Graph DBMS
- Time series DBMS
- Native XML DBMS
- RDF stores
- Object-oriented DBMS
- Multivalue stores
- Content stores
- Event stores
- Navigational DBMS
No, really, which DBMS is best?
None. Each database model and system has strengths and weaknesses. For some, such as event stores, use cases are limited. Arguably, event stores are unbeatable within their narrow range of utility.This is in contrast with RDBMS, which are widely used and can nearly do everything. The downside with relational databases, such as MySQL, is they are slower than specialized systems in specialized tasks.
This is good news for business managers, who can more easily weigh the pros and cons of each model based on high-level considerations.
Is the desired outcome visual representation of geospatial data? Perhaps a graph DBMS is right for you. If, however, you’re building a web app, and can only afford a single database (no multi-modelling), a RDBMS such as MySQL will rarely be the wrong choice.
There is no best database in a generalized sense, but there can certainly be more appropriate types of database system in particular circumstances.