Recently, Google Cloud Platform announced the availability of an additional database option for our customers: Google Cloud Bigtable. Cloud Platform developers now have another Google-managed solution for storing their data. So, when do you use which tool?
Are you using a relational database with tables, views, and indices? Does your application rely on stored procedures, custom views, or joins? Do you need the certainty of transactions and ACID compliance? Is your workload a mix of reads and writes, not necessarily in equal amounts but not heavily skewed toward one or the other? If you answered “yes” to most of these, you probably want to investigate or stick with Cloud SQL, which, as its name suggests, is an implementation of MySQL hosted by Google.
Cloud SQL does, however, share all of the limitations of MySQL. Certain types of applications don’t require the complexity of normalized data with no duplication. Other cases require scaling in a manner that a SQL database cannot handle without additional complexity, reduced performance, or higher cost. Going into the full pros and cons of SQL vs NoSQL is beyond the scope of this post, but rest assured there are reasons that both of them exist, and both are valid choices depending on circumstances.
NoSQL: Datastore or Bigtable
If you are leaning towards a NoSQL solution, you now have two Google-managed NoSQL choices on our platform. What differentiates Cloud Datastore from Cloud Bigtable? They are both NoSQL solutions. Both are described as massively scalable. Both leave you with little to no management (or No Ops for those of you playing buzzword bingo). The answer lies in four areas:
Bigtable is optimized for mind-bogglingly huge sets of data. Seriously, it is most cost effective when dealing with datasets that start at 1 terabyte. Datastore can handle large data sets too, but it is also performance and cost optimized for smaller ones. Have a few GBs of data? Datastore would be the better call. Have data that might start out small but grow to a terabyte in time? Still Datastore. Have data that starts at a terabyte and will keep expanding? Then you’ve started down a path that might make Bigtable interesting. But size isn’t the only factor.
Bigtable stores data in a big honkin’ table. Yeah, the name is a little on the nose, but it’s true. There are rows and columns, somewhat like those of a relational database system but not exactly. Still, it has a schema and a predefined structure.
Cloud Datastore, on the other hand, is better suited to ad hoc storage of structured data representing objects. Basically you define an object and then push it into Datastore. You don’t define a schema, create tables, or set up any other sort of structure before storing a record.
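To make the "define an object and push it" idea concrete, here is a minimal in-memory sketch. It is only a toy model of schemaless storage, not the Datastore API: the `ToyEntityStore` class, its `put` and `get` methods, and the `Person` kind are all hypothetical names made up for illustration.

```python
import itertools


class ToyEntityStore:
    """In-memory illustration of schemaless entity storage; NOT the Datastore API."""

    def __init__(self):
        self._entities = {}              # (kind, id) -> property dict
        self._ids = itertools.count(1)   # simple auto-assigned ids

    def put(self, kind, **properties):
        """Store an entity of the given kind and return its key."""
        key = (kind, next(self._ids))
        self._entities[key] = dict(properties)
        return key

    def get(self, key):
        """Return the stored properties, or None if the key is unknown."""
        return self._entities.get(key)


store = ToyEntityStore()
# No schema or table was declared up front; two entities of the
# same kind are free to carry different properties.
alice = store.put("Person", name="Alice", age=34)
bob = store.put("Person", name="Bob", email="bob@example.com")
```

The point of the sketch is what is missing: there is no step where a table or column list gets declared before the first record is stored.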
Do you need to analyze the data in massive aggregate scale while the database is still online and taking requests? Do you want to run MapReduce on your production data without copying it somewhere for study? Do you want to hook it up to various Big Data analysis toolkits? If this sounds like what you want to do, Bigtable makes more sense.
If you are coming to Google Cloud Platform from other technologies, and are working with HBase, Bigtable is for you. Bigtable is accessible through extensions to the HBase 1.0 API and is therefore compatible with a lot of the Hadoop ecosystem as well as other Big Data tools.
On the other hand, there are also a few limitations. You cannot join. There is no SQL interface. The API lets you Put, Get, and Delete individual records, or run Scan operations.
Datastore does not have SQL either, but it has a query language called GQL that, while not exactly the same, abstracts querying objects in a way that most SQL developers should be able to understand quickly.
Finally, the product page has a great explanation of Bigtable’s relation to other Google Cloud Platform offerings:
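If you have not worked with this style of API before, the following toy model may help. It is a rough in-memory sketch of put/get/delete/scan over sorted row keys, not the Bigtable or HBase client: the `ToyTable` class, the row keys, and the column names are all invented for illustration.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a row-keyed table; NOT the Bigtable or HBase client."""

    def __init__(self):
        self._keys = []   # row keys, kept sorted so a scan is a range read
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def delete(self, row_key):
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.remove(row_key)

    def scan(self, start_key, stop_key):
        """Yield (row_key, row) for start_key <= row_key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = ToyTable()
table.put("user#alice", "profile:city", "Seattle")
table.put("user#bob", "profile:city", "Austin")
table.put("order#1001", "detail:total", "42.50")

# "$" sorts immediately after "#", so this range covers every "user#..." key.
users = [key for key, _ in table.scan("user#", "user$")]
```

Notice that there is no way to ask for "users together with their orders" in one call. With only these four operations, anything resembling a join has to be done in application code.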
Cloud Bigtable and other storage options
Cloud Bigtable is not a relational database; it does not support SQL queries or joins, nor does it support multi-row transactions. Also, it is not a good solution for small amounts of data (< 1 TB).
If you need full SQL support for an online transaction processing (OLTP) system, consider Google Cloud SQL.
If you need interactive querying in an online analytical processing (OLAP) system, consider Google BigQuery.
If you need to store immutable blobs larger than 10 MB, such as large images or movies, consider Google Cloud Storage.
If you need to store highly structured objects, or if you require support for ACID transactions and SQL-like queries, consider Cloud Datastore.
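The guidance above can be condensed into a rough decision helper. Treat it as a sketch of the rules of thumb in this post rather than an official selection tool: the `Workload` fields, the `suggest_storage` function, and the order in which the checks run are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an application's storage needs."""
    needs_full_sql: bool = False          # OLTP with full SQL support
    needs_interactive_olap: bool = False  # interactive analytical queries
    stores_large_blobs: bool = False      # immutable blobs larger than 10 MB
    needs_acid_or_sql_like: bool = False  # structured objects, ACID, SQL-like queries
    data_size_tb: float = 0.0             # expected data size in terabytes


def suggest_storage(w: Workload) -> str:
    """Return a starting point for investigation, not a final answer."""
    if w.needs_full_sql:
        return "Cloud SQL"
    if w.needs_interactive_olap:
        return "BigQuery"
    if w.stores_large_blobs:
        return "Cloud Storage"
    # Assumption: below roughly 1 TB, Datastore is the better call.
    if w.needs_acid_or_sql_like or w.data_size_tb < 1.0:
        return "Cloud Datastore"
    return "Cloud Bigtable"


small_app = suggest_storage(Workload(data_size_tb=0.01))
huge_feed = suggest_storage(Workload(data_size_tb=5.0))
```

Real workloads rarely fit a single branch, so use the result as a prompt for which documentation to read first.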
In short, there is a lot of awesome stuff about Cloud Bigtable, but that doesn’t mean it is right in all cases. It’s a NoOps, NoSQL, Big Data analysis tool, meant to be used at massive scale in conjunction with other Big Data tools. I recommend that you check out the documentation for Bigtable, as there is much more to be found there. And let me know if you need more clarification on anything.