This was written in 2014; I wonder how relevant it is today (2023). Many of its points are theory, which should not have changed. 😀
Barry Morris, CEO of NuoDB, has written a series of articles about the “Holy Grail”, which he published at the Cloud Computing Journal and somewhere within the NuoDB site.
The most important contribution that Morris makes, in my mind, is that there are four models of scale-out RDBMS: Shared Disk, Shared Nothing, Synchronous Commit and their own Durable Distributed Cache, invented (or maybe substantiated) by Jim Starkey.
Unsurprisingly, Morris’ third article, extolling the superiority of what he has to sell, does not, as far as I can see, describe how the consistency property is met. I need to re-read the MVCC part of the article. MVCC is based on a file/item append model. MVCC obviates locks (how?) and thus removes a massive part of the seriality of a DBMS, which is good because we have not only Brewer’s theorem to deal with but also Amdahl’s Law. The unanswered question for me is: how does the relevant cache partition ensure that the page copy it gets from a remote node is the most recent and does not need to be locked for update? He states that the relationships between nodes are asynchronous, so we are back to eventually consistent, it would seem.
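To make the "append model" and the "how?" a little more concrete, here is a minimal single-node sketch of MVCC in Python. It is a generic illustration, not NuoDB's implementation: the `MVCCStore` class, its logical `clock` and the method names are all my own assumptions. The idea it shows is that a write appends a new version rather than overwriting, so a reader working from an older snapshot never needs a lock. It says nothing about the harder question above, which is how a remote node knows its copy is the most recent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class MVCCStore:
    """Toy multi-version store (hypothetical): writes append, reads never block."""
    # key -> list of (commit_timestamp, value), appended in commit order
    versions: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)
    clock: int = 0  # logical commit counter; an assumption for this sketch

    def begin(self) -> int:
        """Start a transaction; its snapshot is the current commit counter."""
        return self.clock

    def write(self, key: str, value: Any) -> int:
        """Commit a new version by appending; older versions stay readable."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key: str, snapshot: int) -> Optional[Any]:
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
reader_snapshot = store.begin()   # a reader starts here
store.write("balance", 250)       # a later writer appends a new version
old_view = store.read("balance", reader_snapshot)   # reader still sees 100
new_view = store.read("balance", store.begin())     # a new reader sees 250
```

The reader's view stays stable without blocking the writer, which is where the reduction in seriality comes from. Write-write conflicts and garbage collection of old versions are deliberately left out.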
From Morris’ article we learn that NuoDB (like MarkLogic?, and in fact like MySQL, where Starkey worked for a while) consists of a Transaction Engine and a Storage Manager entity.
Morris mentions Google F1, which is used to support their ad keywords database. It is based on Google’s Spanner, which seems pretty much their answer to the CAP theorem. We’ll have to see what the latency cost is like, but being Google it may not be released as open source.
Morris’ article does not reference Brewer’s CAP theorem.
I originally posted some links via Delicious but they went away a long time ago. I found this, https://dzone.com/articles/understanding-the-cap-theorem, later. At some stage I found the proof that the CAP conjecture was in fact a theorem.
Can we break Brewer’s theorem?
I personally need an accessible definition of Consistent, Available and Partition Tolerant (the first two are easy), although the Wikipedia entry, CAP Theorem, has a pretty good set of definitions:
In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:[1][2]
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
I suppose it’s likely that we might engineer a system so that the failing condition is so trivial it can be ignored.
The commonest compromise is between availability and consistency, although eventual consistency is a relatively modern construction.
Shared disk clusters engineered for HA on a fail-fast-and-recover algorithm are a solution that fails the Availability requirement of the CAP theorem, although they have a zero RPO (recovery point objective) and can have relatively short RTOs (recovery time objectives).
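The compromise between availability and consistency can be sketched with a toy two-replica cluster in Python. Everything here (`Replica`, `Cluster`, the `prefer_consistency` flag) is hypothetical and only illustrates the trade-off; it is not how any of the products mentioned actually behave. With the flag set, a write during a partition is refused, which keeps the replicas identical at the cost of availability. With it cleared, the write is accepted on one side and the replicas diverge until something reconciles them later.

```python
from typing import Dict, List, Optional


class Replica:
    """One node holding a full copy of the data (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Cluster:
    """Two-replica toy cluster with a switchable policy under partition."""

    def __init__(self, prefer_consistency: bool) -> None:
        self.replicas: List[Replica] = [Replica("a"), Replica("b")]
        self.partitioned = False
        self.prefer_consistency = prefer_consistency

    def write(self, node: int, key: str, value: str) -> bool:
        """Write via one node; return False if the request is refused."""
        if not self.partitioned:
            # Healthy network: every replica gets the update.
            for replica in self.replicas:
                replica.data[key] = value
            return True
        if self.prefer_consistency:
            # Consistency first: refuse rather than let copies diverge.
            return False
        # Availability first: accept locally, the other replica goes stale.
        self.replicas[node].data[key] = value
        return True

    def consistent(self, key: str) -> bool:
        """True when every replica holds the same value for the key."""
        values: set[Optional[str]] = {r.data.get(key) for r in self.replicas}
        return len(values) == 1


cp = Cluster(prefer_consistency=True)
cp.write(0, "k", "v1")
cp.partitioned = True
cp_ok = cp.write(0, "k", "v2")   # refused: unavailable but still consistent

ap = Cluster(prefer_consistency=False)
ap.write(0, "k", "v1")
ap.partitioned = True
ap_ok = ap.write(0, "k", "v2")   # accepted: available but replicas now differ
```

The first cluster behaves roughly like the fail-fast shared disk case, and the second like an eventually consistent store before reconciliation.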
Here’s the sponsored Bloor paper on NuoDB.
The Jim Starkey Wikipedia article references a 2012 patent for “A multi-user, elastic, on-demand, distributed relational database management system.” At some time the site hosting the patent went away, maybe because the patent had expired, but I found this, which is probably a record of the patent. We’ll see. These are probably the patents that protect the NuoDB products.
ooOOOoo
The NHS have decided to replace Oracle with Riak for the “spine”. Riak claims partition tolerance and availability.
http://www.aerospike.com/ is another high-performance, scale-out database.
When considering XML/RDF optimised databases, I have been pointed at Virtuoso, which has a Wikipedia page here and a white papers page here.
Minor amendments made to this page, including noting how old it is.