As I’ve mentioned before, one of the great things (at least in terms of how it relates to software development) about the Internet in general, and the blogosphere in particular, is that it offers a tremendous opportunity for someone to ‘fast-forward’ their skills if they know where to look (being vaguely intelligent and able to read quickly also helps).
This definitely applies when it comes to NoSQL. Beyond the mindless advocacy of some folks who think NoSQL applies everywhere, and before I have to create a system that needs to deal with the scalability issues of Amazon, I want to know how it applies to, well, the sorts of systems that I’ve dealt with, where an RDBMS has played a central and crucial role, and played it quite nicely.
(Digression: well, except for all of the ways in which RDBMSs suck. I think the whole object/relational impedance mismatch thing is largely a load of crap (until, of course, I have to deal with developers who don’t understand basic concepts like indexes and can’t figure out why their query takes two minutes to return, but I digress), but there is a lot about traditional relational databases that is time-consuming and annoying. And don’t get me started on programming in .NET with ‘raw’ ADO.NET. But I digress (again).)
How, for instance, would you handle a typical order processing system using NoSQL instead of SQL Server? When it comes to ‘just’ storing data, I get that (I think), and I get why it scales to the heavens and whatnot. But what about the day-to-day things that a typical DBA (or non-clueless developer) has to deal with? How do you do those things?
And maybe it’s because I’m older or have grown even more lazy/stupid than ever, or maybe it’s because the NHL playoffs are ongoing, but I also don’t want to have to learn it all ‘from scratch’ right now. I don’t have the time. I need my sleep. Reading about other people’s experiences doesn’t replace actually having gone through them, but it helps. (Even better would be sample applications, but I digress. Again.) I’m spiking out code to make sure that I am familiar with the APIs involved with various implementations, but that only goes so far.
So one thing I’ve been looking for is discussions of NoSQL that get into the details. What isn’t good about it? What problems are you going to face? Since I’m going to be building some systems using it, what exactly am I getting myself into? Because of my Philosophy background (Ph.D., University of Miami, “Hi Jeremy!”), I can do the whole theoretical debate thing till the cows come home (where were the cows that they needed to come home?), but what I really want to know is how it’s going to bite me tomorrow (and not in a good way).
Here are some of the things I’ve found:
The Dark Side of NoSQL – I really like this article because it asks hard questions and gives a nice description of how I sometimes feel when reading NoSQL advocacy:
“There is a dark side to most of the current NoSQL databases. People rarely talk about it. They talk about performance, about how easy schemaless databases are to use. About nice APIs. They are mostly developers and not operation and system administrators. No-one asks those. But it’s there where rubber hits the road.”
It then goes on to talk about some of the issues with NoSQL implementations: ad hoc data fixing, ad hoc reporting, and data export. There are so many different NoSQL ‘platforms’ (for lack of a better term) out there that these issues are undoubtedly more or less problematic depending on which one you pick, but it is a theme I’ve now read in a couple of places. As someone who has spent a lot of time doing ad hoc data fixing and ad hoc reporting (exporting is usually something I deal with in larger ETL-type projects), especially in an operational role, I find the idea that you can’t easily do some equivalent of “join this table to that table to that table with this group by order by blah blah blah” worrisome (although, I realize, that’s part of the point). Depending on the client and the situation, this is something you typically need to do *all the time*. Users expect it. I know that in the ‘typical’ (there is no such thing, of course) situation, you might have your NoSQL event store and your SQL query store (to use CQRS terminology), but what about when you need to find out why the transform from the event store to the query store didn’t do what you expected? In production? And you need to fix it now?
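To make the ad hoc reporting worry concrete: when a store gives you no join or group by, the “join this to that with this group by order by” query ends up written by hand in application code. Here is a minimal sketch in Python, assuming two hypothetical collections (`customers` and `orders`) already loaded as lists of dicts. It doesn’t use any particular NoSQL product’s API, and real stores differ (some offer map/reduce-style views or their own query languages), so treat it as an illustration of the shape of the work, not how any given platform does it.

```python
from collections import defaultdict

# Hypothetical documents standing in for two "collections" in a document store.
customers = [
    {"_id": "c1", "name": "Acme", "region": "East"},
    {"_id": "c2", "name": "Globex", "region": "West"},
    {"_id": "c3", "name": "Initech", "region": "East"},
]
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 120.0},
    {"_id": "o2", "customer_id": "c2", "total": 75.5},
    {"_id": "o3", "customer_id": "c1", "total": 30.0},
    {"_id": "o4", "customer_id": "c3", "total": 200.0},
]


def revenue_by_region(customers, orders):
    """Hand-rolled equivalent of:
    SELECT c.region, SUM(o.total) AS revenue
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region ORDER BY revenue DESC
    """
    # "JOIN": build an in-memory lookup from customer id to region.
    region_of = {c["_id"]: c["region"] for c in customers}

    # "GROUP BY" + "SUM": accumulate order totals per region.
    revenue = defaultdict(float)
    for o in orders:
        region = region_of.get(o["customer_id"])
        if region is None:
            continue  # inner-join semantics: skip orders with no matching customer
        revenue[region] += o["total"]

    # "ORDER BY revenue DESC"
    return sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)


print(revenue_by_region(customers, orders))
# → [('East', 350.0), ('West', 75.5)]
```

Every SQL clause turns into a few lines somebody has to write, test, and get right, which is a lot to ask of the BA who just wants the numbers (or of you, in production, when you need to fix it now).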
NoSQL, meh – written by someone with prior experience with object-oriented databases, who offers a cautionary tale. It also includes the following funny comic strip, which others have linked to:
![fault-tolerance comic](/images/blogcoward_com/WindowsLiveWriter/NoSQLlinksofinterest_10C52/fault-tolerance%5B1%5D_3.png)
Apparently, ad hoc querying is more difficult than I thought.
Anyway, it raises a point that deserves elaboration. As a SQL ‘snob’ (see the earlier reference in this post to developers who don’t know anything about indexes), I know that most people really kind of suck at SQL. And yet even your generic BA these days knows enough T-SQL (or whatever your flavor is) to get the data they need to do their jobs, without having to wait on a DBA to do it for them. Sure, they might create a Cartesian product every once in a while, but what the hell, there are tradeoffs everywhere in life and in business and in software development. SQL is now familiar enough to enough people that they rely on it.
A NoSQL implementation that prevents this sort of (really, really) ad hoc (and probably inefficient, but who cares) querying would hurt a lot of people.
NoSQL: If Only It Was That Easy – this post lists a lot of the more popular NoSQL technologies and how (in the blogger’s estimation) they scale in comparison with an RDBMS (well, in comparison with MySQL, which is a toy… ZING!!!!!). What I really like about it is that it is based on real-world research done with an eye towards producing a real-world solution for a real-world project.
NoSQL Déjà Vu – the blogger talks about his previous experience working with object databases and how he thinks it relates to the current NoSQL movement. A lot of what he talks about relates to ‘political’ issues, which, as a developer, you don’t really want to think about but eventually have to. I’ve never had to deal with a dick DBA (other than myself, obviously), so I don’t normally think about those sorts of issues when thinking about how to implement technical solutions, but they are important. I like this point:
“The geeky programmer in me (that loved working on that CRM project) is rooting for NoSQL databases. The recovering DBA in me cringes at the thought of battling data corruption with inferior, unfamiliar tools.”
I feel the same way.
In the end, a few things are clear.
NoSQL is proven technology, especially when it comes to the ‘insanely scalable’ systems of companies like Amazon and Google and Facebook, etc., etc., etc. Theoretical arguments are fine and dandy, but the stuff works.
When it comes to ad hoc querying and tooling support, NoSQL (in general) seems to lag behind what you can do with any common RDBMS. In certain instances this is a show-stopper, but I think it will be rectified over time. Why? Because ad hoc querying and tooling support are business requirements, not just ‘nice to haves’, and so the market will produce what is needed.
I just hope it happens sooner rather than later, especially since I’d rather not have to deal with the growing pains myself, though I expect I’ll have to suffer through them in the meantime. And it’s just weird to give up things like lookup tables and normalization.
Be sure to read the comments on all of these posts, as there is great information there as well.