In this post, you will learn about the main types of NoSQL databases and the basic features of each type. NoSQL databases can be broadly grouped into four categories:
- Key-Value databases
- Document Databases
- Column family NoSQL Database
- Graph Databases
NoSQL Database Types Introduction
Let’s go through a short introduction to each of these NoSQL database types and understand its features. NoSQL databases are widely used in Big Data applications and provide operational intelligence to users.
Key-Value databases
A key-value database is essentially a big hash table of keys and values, distributed across a cluster of commodity servers. Key-value databases typically guarantee Availability and Partition Tolerance, the "A" and "P" of the CAP theorem.
In exchange, they trade off data consistency in order to improve write time.
The key, which can be synthetic or auto-generated, uniquely identifies a single record in the database. The values can be strings, JSON, BLOBs, and so on.
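The idea of a hash table spread over several servers can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `TinyKeyValueCluster`, the three in-memory "nodes", and the hash-modulo placement rule are all hypothetical and are not how any particular product listed in this post is implemented.

```python
import hashlib
import uuid


class TinyKeyValueCluster:
    """Toy key-value store: each node is a plain dict (a hash table)."""

    def __init__(self, node_count=3):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key to pick the owning node deterministically.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, value, key=None):
        # If the caller supplies no key, auto-generate a unique one.
        if key is None:
            key = uuid.uuid4().hex
        self._node_for(key)[key] = value
        return key

    def get(self, key):
        # Returns None when the key is unknown.
        return self._node_for(key).get(key)


cluster = TinyKeyValueCluster()
cluster.put("Alice", key="user:1")            # string value, caller-chosen key
cluster.put('{"name": "Bob"}', key="user:2")  # JSON text as the value
blob_key = cluster.put(b"\x00\x01\x02")       # binary value, auto-generated key
```

Note that the store never looks inside a value; it only maps a key to an opaque payload, which is what keeps reads and writes simple.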
Among the most popular key-value databases are Amazon DynamoDB, Oracle NoSQL Database, Riak, Berkeley DB, Aerospike, Project Voldemort, and IBM Informix C-ISAM.
Document Databases
The main concept behind document databases is the document, which can be JSON, BSON, XML, and so on. A document database stores and retrieves documents.
The data structure inside a document is hierarchical: a field can hold a scalar value, a map, or a collection. A document database is similar to a key-value database; the main difference is that it stores the data in the form of a document that embeds attribute metadata describing the stored content.
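To make the hierarchical structure concrete, here is a minimal sketch of a document and a lookup that reaches inside it. The order document, the in-memory `collection`, and the helper `find_by_path` are hypothetical names invented for illustration; real document databases expose their own query languages and APIs.

```python
import json

# A hypothetical order document: scalars, a nested map, and a collection.
order_doc = {
    "_id": "order-1001",
    "status": "shipped",
    "customer": {"name": "Alice", "city": "Pune"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Toy "collection": documents stored by id, serialized as JSON text.
collection = {order_doc["_id"]: json.dumps(order_doc)}


def find_by_path(collection, path, expected):
    """Return ids of documents whose value at a dotted path equals expected."""
    matches = []
    for doc_id, raw in collection.items():
        value = json.loads(raw)
        for part in path.split("."):
            # Walk one level down; stop descending if we leave the maps.
            value = value.get(part) if isinstance(value, dict) else None
        if value == expected:
            matches.append(doc_id)
    return matches
```

For example, `find_by_path(collection, "customer.city", "Pune")` matches the order above because the attribute names travel with the document itself.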
Every document database uses its own file structure to store data. For example, Apache CouchDB uses JSON to store data, JavaScript as its query language, and HTTP for its API.
Among the most popular document databases are MongoDB, Informix, DocumentDB, CouchDB, and BaseX.
Column family NoSQL Database
A column family NoSQL database is another aggregate-oriented database.
In a column family database, each row is identified by a single key, known as the row key. Within that row we can store multiple column families, where each column family is a group of columns that fit together. The column family as a whole is effectively the aggregate. We use the row key and the column family name to address a column family.
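The addressing scheme described above (row key, then column family, then column) can be sketched with nested dictionaries. The table contents and the helper names `get_column_family` and `get_column` are hypothetical and only illustrate the data model, not the storage format of any real database.

```python
# row key -> column family name -> {column name: value}
table = {
    "customer:42": {
        "profile": {"name": "Alice", "email": "alice@example.com"},
        "orders": {"order-1001": "shipped", "order-1002": "pending"},
    },
}


def get_column_family(table, row_key, family):
    """Address one column family by row key and family name."""
    return table.get(row_key, {}).get(family, {})


def get_column(table, row_key, family, column):
    """Read a single column inside a column family."""
    return get_column_family(table, row_key, family).get(column)
```

Here `get_column_family(table, "customer:42", "orders")` returns the whole "orders" aggregate in one lookup, while `get_column` drills down to a single value.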
It is one of the more complicated aggregate-oriented models, but the gain is in the retrieval time of aggregate rows. Instead of spreading an aggregate across a lot of individual records, we store the whole thing in the database, and bring it into memory, in one go.
The database is designed so that it clearly knows where the aggregate boundaries are. This is very useful when we run the database on a cluster.
Because an aggregate binds its data together, different aggregates can be spread across different nodes in the cluster.
Therefore, if somebody wants to retrieve the data for, say, a particular order, the request goes to one node in the cluster instead of hitting many nodes to pick up different rows and assemble them.
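A rough sketch of that routing idea follows. It assumes a hypothetical four-node cluster and a simple checksum-modulo placement rule; real systems use their own partitioning schemes, so treat the names `node_index`, `save_order`, and `load_order` as illustrative only.

```python
import zlib

NODE_COUNT = 4  # hypothetical cluster size
nodes = [{} for _ in range(NODE_COUNT)]


def node_index(row_key):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(row_key.encode("utf-8")) % NODE_COUNT


def save_order(order_id, order):
    # The whole aggregate (order plus its line items) lands on one node.
    nodes[node_index(order_id)][order_id] = order


def load_order(order_id):
    # Only the owning node is consulted; the other nodes are not touched.
    index = node_index(order_id)
    return index, nodes[index].get(order_id)


save_order("order-1001", {"customer": "Alice", "lines": [("A1", 2), ("B7", 1)]})
owner, order = load_order("order-1001")
```

Because the order and its line items are stored together under one row key, reading the order is a single lookup on the node numbered `owner`.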
Among the most popular column family NoSQL databases are Apache HBase and Cassandra.
Graph Databases
Graph databases store data in the form of a graph.
Let us first understand what a graph is. A graph is a mathematical model made up of nodes (objects) and edges, where each edge establishes a relation between two objects.
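As a small illustration of this definition, the sketch below models a graph as an adjacency list in plain Python. The node names, the relationship labels, and the helpers `relate`, `neighbors`, and `friends_of_friends` are hypothetical; this is not a graph database API, just the underlying idea.

```python
from collections import defaultdict

# Adjacency list: node -> list of (relationship, neighbor) pairs.
graph = defaultdict(list)


def relate(source, relationship, target):
    """Add a directed, labeled edge from source to target."""
    graph[source].append((relationship, target))


relate("Alice", "FRIEND_OF", "Bob")
relate("Bob", "FRIEND_OF", "Carol")
relate("Alice", "WORKS_AT", "Acme")


def neighbors(node, relationship):
    """Follow edges of one relationship type out of a node."""
    return [t for r, t in graph.get(node, []) if r == relationship]


def friends_of_friends(node):
    """Two-hop traversal along FRIEND_OF edges."""
    return [fof
            for friend in neighbors(node, "FRIEND_OF")
            for fof in neighbors(friend, "FRIEND_OF")]
```

Traversals such as `friends_of_friends("Alice")` simply follow edges from node to node, which is the kind of query a graph model is built around.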
We will discuss the whole concept of graph databases using Neo4j as the reference database.
Neo4j is an open source NoSQL graph database implemented in Java and Scala. Its source code is available on GitHub, and it is used by companies such as Walmart, eBay, and LinkedIn.