What's "New" in NewSQL?

Series: system March 13, 2019

NewSQL has become the new sexy in the DBMS area. Many startups tout their products as NewSQL databases. However, is NewSQL really new? How does NewSQL come about and what does the term mean? I will share some facts and thoughts below.

This article is mostly inspired by “What’s Really New with NewSQL?” (by Andrew Pavlo and Matthew Aslett, SIGMOD Record, June 2016), which is amazing to read.

A Little About The History

The first DBMSs came out in the mid-1960s, e.g., IBM’s IMS. It separates an application’s code from data that it operates on, and frees developers to write applications without concerns about data manipulation performance. IMS was later followed by the pioneering work in the early 1970s on the first relational DBMSs, IBM’s System R and INGRES from the University of California. The late 1980s and early 1990s saw the rise and fall of object-oriented DBMSs, many of whose ideas are incorporated into document-oriented NoSQL systems over two decades later. Moreover, two all-star open source databases, MySQL and PostgreSQL, started during the 1990s.

Radical changes took place with the arrival of the 2000s. Internet applications sprouted around the world, and scalability was the new dimension. Systems need to scale to support massive concurrent users and provide online services with high availability. Thus, custom middlewares were developed to shard single-node DBMSs over a cluster of less expensive machines. With the help of such middlewares, the system presents a single logical database to users, while data and queries are stored and executed in the whole cluster. Two notable examples are eBay’s Oracle-based cluster[1] and Google’s MySQL-based cluster[2].

Early sharding middlewares faced challenges when it comes to joining tables or updating more than one record in a transaction. At that time, many companies, especially internet companies, re-thought their needs for consistency and correctness at the expense of availability and performance. Many of them turned away from traditional DBMSs and developed their own database systems. Likewise, the relational model was also under heated debates whether it is the best way to represent an application’s data. And many agreed that using SQL may be overkill for simple lookup queries. (This was the time NoSQL really means “No SQL!”. This movement of saying no to SQL does not succeed as it was expected from today’s view, and NoSQL becomes Not-Only SQL.)

These rethinks add impetus to the NoSQL movement in the mid of late 2000s[3]. The key to this movement is to develop new database systems that forgo strong transactional guarantees and the relational model of traditional DBMSs. Eventual consistency and K-V/Graph/Document models become the new cool types of equipment. The two well-known pioneering systems are Google’s BigTable and Amazon’s Dynamo. Based on them, many similar systems, including Facebook’s Cassandra (based on BigTable and Dynamo) and PowerSet’s HBase (based on BigTable), were born to follow the fashion. Besides, among startups, the most well-known one is MongoDB, which carries the flag of NoSQL to the end of today.

NoSQL led the fashion, and at the same time created more chaos. By the end of the 2000s, many applications found themselves at awkward situations that they cannot give up strong transactional and consistency requirements as they thought. Also, it was painful for developers to spend too much energy writing code to handle inconsistent data. Thus, it is in this environment that brought about NewSQL systems, which aims to provide the same scalable performance of NoSQL for OLTP read-write workloads while still maintaining ACID guarantees for transactions.

The State of The Art

Memory Is The Main Battlefield

Traditional database systems are disk-oriented and optimized for block-addressable storage. Devices, like an SSD or HDD, provide large capacities with low prices compared with DRAM chips. However, data accesses on disks invoke high latencies (million seconds) and result in poor performance. For OLTP databases, it is affordable now for us to store all but the most massive databases entirely in memory. Moreover, NVM devices provide promising opportunities to remove disks entirely from the storage hierarchy. The benefit of this approach is that DBMS no longer has to assume that a transaction can access data outside of memory and stalls the whole execution. Also, NVM enables new optimizations to reduce transaction overheads from many aspects previously related to disks, such as WAL[4].

Memory is now the main battlefield. There are several NewSQL DBMSs that are based on a main memory storage architecture, including H-Store, Hyper, VoltDB, MemSQL, SAP HANA, etc.

  • H-Store. H-Store is an experimental main-memory, row-based distributed DBMS optimized for OLTP applications. It is an academic project collaborated between MIT, Brown Univ., CMU., Yale Univ. and Intel. The final release is around 2016, and a more state-of-the-art DBMS based on H-Store’s architecture is VoltDB.
  • Hyper. Hyper is a main-memory-based relational DBMS for HTAP workloads. The project starts at TUM and acquired by Tableau Software.

Concurrency Control, Nothing Significantly New

Distributed systems are categorized into two classes depending on whether the system uses a centralized or decentralized transaction coordination protocol. In a centralized coordinated system, all transactions go through the coordinator, and the coordinator makes decisions whether to proceed with the transaction or not. While in a decentralized system, each node maintains their transaction states and requires the clocks in the cluster are highly synchronized to guarantee a global ordering of transactions.

Two-phase locking (2PL) is commonly used in distributed DBMSs since the 1970-80s. It can be used in a cluster of shared-nothing nodes with a centralized coordinator, such as SDD-1. Also, it can be used in a decentralized cluster with centralized deadlock detection, such as IBM’s R* and the distributed version of INGRES.

2PL is complicated due to the complexity of deadlock detections. Concurrency control based on timestamp ordering(TO), such as decentralized MVCC, is now widely used in almost all of NewSQL systems, like MemSQL, HyPer, HANA, and CockroachDB. Maintaining multiple versions potentially allows transactions to still complete even if another transaction updates the same tuples. It also allows for long-running, read-only transactions to not block on a writer. 2PL and MVCC view the world of transactions from two opposite aspects: the former is pessimistic about transactions that they are easily running into conflictions, while the latter is optimistic that few transactions will end into conflictions. Instead of interleaving transactions as in MVCC, VoltDB uses another TO concurrency control scheme, which schedules transactions to execute one-at-a-time (in each partition).

Apart from above, other systems use a combination of 2PL and MVCC together. Modifying the database still requires transactions to get a lock as 2PL specifies, but when a transaction modifies a record, it creates a new version of the record as MVCC does. Thus, read-only queries avoid requiring locks and blocking on writing transactions. The most famous systems adopt this scheme are MySQL’s InnoDB, Google’s Spanner, NuoDB and Clustrix.

Replication and Crash Recovery The best way to ensure HA and data durability is to replicate data across multiple machines (HDFS is a good example of this fashion). However, this introduces other design challenges. The first is how the DBMS enforces data consistency across nodes. Strongly consistent strategy requires a transaction’s writes must be acknowledged at all replicas before the transaction is considered committed. This ensures that replicas can serve read-only queries and still be consistent. Nearly all of the NewSQL systems support strongly consistent replication. The second challenge is crash recovery. Unlike traditional DBMSs, NewSQL systems must ensure that no updates are lost, and at the same time manimize downtime. In a distribtued DBMSs with replicas, when the master node crashes, the system need promote one of the slaves to become the new master. When the previous master node comes back on-line, except for loading in its last checkpoint and returning its WAL, it needs to get the updates from the new master that is missed when it was down. Paxos and Raft are used as fundamental infrastructures in NewSQL systems. While, all of these are standard procedures and technologies that can trace back to 1990s.

[1]R. Shoup and D. Pritchett. The eBay architecture. SD Forum, November 2006.
[2]J. Shute, R. Vingralek, B. Samwel, B. Handy, C. Whipkey, E. Rollins, M. Oancea, K. Littlefield, D. Menestrina, S. Ellner, J. Cieslewicz, I. Rae, T. Stancescu, and H. Apte. F1: A distributed sql database that scales. Proc. VLDB Endow., 6(11):1068–1079, Aug. 2013.
[3]R. Cattell. Scalable sql and nosql data stores. SIGMOD Rec., 39:12–27, 2011.
[4]Arulraj J, Perron M, Pavlo A. Write-behind logging. Proceedings of the VLDB Endowment. 2016 Nov 1;10(4):337-48.

built with , Jekyll, and GitHub Pages — read the fine print