NoSQL is the new Black

No one really talks about alt.net anymore.  In my mind, that’s a good thing.  Though some people apparently had different goals for all of that (setting up a foundation or some other silly thing), for other people, it was about taking the things that a small subset of .NET developers were doing and making them more mainstream (whatever that means).  Though (again) it is hardly a definitive sign of anything, the fact that people like Jeremy and Ayende have been publishing articles in MSDN (and no one thinks this is odd) is a good thing.  That ASP.NET MVC has unit test support built-in is a good thing.  And so on and so forth.

Anyhoo, certain topics in certain circles tend to get discussed around the same time, which isn’t that surprising.  Alt.NET has always been an echo chamber (which is both good and bad) and so the fact that like-minded people who read each other’s blogs and follow similar paths hit the same topics/roadblocks/whatever isn’t surprising.  For instance, some 12-18 months ago, one couldn’t swing one’s dead cat without hitting a blog post about SOLID principles (much like a lot of Alt.NET, the posts were hit and miss…some people never made it past the “S” or “O”).  And to be clear, there’s nothing bad about that.

One current topic that gets a lot of buzz is NoSQL.  The advantage of having multiple people (even within an echo chamber) talking about a topic is that it increases the odds that the really good bloggers will share a lot of hands-on experience and information.  This is especially useful for people like me who don’t have the bandwidth or the smarts or the gumption (or all three) to work through the experience, and instead want to learn from other people’s mistakes, as it is so much easier than making those mistakes myself.

digression:  a few years back, myself and others pointed out what seemed to be obvious flaws with ‘strict’ TDD.  Unless you are building a framework, strict TDD is bad.  Bad, bad, bad, bad, bad.  I mean, code without TDD is also bad, but TDD code is bad on many special levels.  People who got the TDD religion liked to talk about code coverage, and percentages of said code coverage, and other bad things.  Fast forward a bit, and you find a lot of good information on the Net about how strict TDD is bad, and that you should instead focus on the business scenarios you are trying to support in code, and testing along those scenarios.  Thank you, we knew that already.

So, after reading so much silly stuff about NoSQL, I was particularly glad to see that Ayende was creating a blog series about NoSQL.  He’s not always right, far from it, but as someone who understands the concept of JFHCI, I know that he will cover the topic from a sane, intelligent, practiced perspective.

Having said that, I immediately find myself in agreement with some comments from Frans Bouma from the initial post of Ayende’s series:

“About relational databases, that they don't scale: I don't really know what you've been using but that's utter nonsense….

I've been using relational databases for many many years, and I've yet to see a relational database that big that was too slow to keep up. Mind you: millions and millions of new rows per day isn't too much.

For the vast majority of the people using databases in their applications, relational databases are just fine, only for the very few who write the new amazon.com, or the google competitor might need different databases, but how many are those? a handful. “

Having worked on fairly heavily used eCom sites, I agree with this.  You can handle a *lot* of traffic using standard relational databases just fine.

digression: to be clear, from a programming perspective, Ayende can kick my ass.  It’s not like he’s going to say “Gosh, jdn, I’ve never considered that you could do that.”  He knows that.  I think it’s as much a rhetorical thing to say that relational databases can’t scale.  Also, I’ve met Ayende, and from a physical perspective, I’m pretty certain Ayende can kick my ass.  But I digress.

On the flip side to the pro-NoSQL silliness, there’s a more negative take on it from another post:

“…by replacing MySQL or Postgres with a different, new data store, you have traded a well-enumerated list of limitations and warts for a newer, poorly understood list of limitations and warts, and that is a huge business risk….The sooner your company admits this, the sooner you can get down to some real work.  Developing the app for Google-sized scale is a waste of your time, plus, there is no way you will get it right. Absolutely none. It's not that you're not smart enough, it's that you do not have the experience to know what problems you will see at scale.”

This ties into another blog post by karl from CodeBetter and gets to what I really want to talk about.  At one point, he says:

“A lot of developers don't feel that object-relational impedance mismatch is really a problem or even exists. That's only because, as the only solution, you've been dealing with it for so long - possibly your entire programming career (like me), that you don't think about it. You've been trained and desensitized to the problem, accepting it as part of programming the same way you've accepted if statements.”

Now, to be fair, he does preface that with ‘a lot of developers’ but I have a completely different take on the whole impedance thing.  My experience has been that the vast majority of developers that I’ve worked with don’t understand relational theory, and that’s where the ‘impedance’ comes in.  Many developers seem to think that if they’ve learned how to do joins correctly, they understand relational theory.  They don’t understand indexes or statistics or the other sorts of things that you need to understand to program against relational databases, and so when there’s a problem, it’s this grand ‘impedance’ thing.
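
To make the indexes-and-statistics point concrete, here is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module rather than any particular production database.  The orders table, the idx_orders_customer index, and the query_plan helper are all made up for illustration.  It asks the engine how it plans to run the same query before and after an index exists:

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN description for a statement (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10000)],
)
conn.commit()

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the engine has to read the whole table.
plan_before = query_plan(conn, sql, (42,))

# Add an index and refresh the optimizer's statistics.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should describe a scan of the table and the second a search that uses the index.  Other engines expose the same idea through their own tooling; nothing here is specific to NoSQL versus SQL, it is just the homework that tends to get skipped.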

Bullshit.  From this perspective, the drive seems to be to get rid of decades of experience of how relational databases work, but at what cost?  Reducing developer friction?  That’s nice, but only if it has greater value elsewhere.  When an entity changes shape over time, when working with a relational database, there’s a known series of steps of how to deal with that.  Shoving it all into a NoSQL alternative sounds good, but how does it work long-term from an operational perspective, for instance?
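
As a rough sketch of what that known series of steps can look like, assuming SQLite via Python's sqlite3 module and a made-up customer table (the add_country_column function, the country column, and the 'unknown' placeholder are all hypothetical), the usual pattern is: add the new column as nullable, backfill existing rows, then verify nothing was missed:

```python
import sqlite3

# Hypothetical starting schema: the entity as it looked in version 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customer (name) VALUES (?)",
    [("Ada Lovelace",), ("Alan Turing",)],
)
conn.commit()

def add_country_column(conn):
    """Evolve the customer entity: add a column, backfill it, verify the result."""
    # `with conn` commits on success; pending changes are rolled back
    # if an exception is raised inside the block.
    with conn:
        # Step 1: add the new attribute as a nullable column.
        conn.execute("ALTER TABLE customer ADD COLUMN country TEXT")
        # Step 2: backfill existing rows with an explicit placeholder.
        conn.execute("UPDATE customer SET country = 'unknown' WHERE country IS NULL")
    # Step 3: verify that no row was left without a value.
    return conn.execute(
        "SELECT COUNT(*) FROM customer WHERE country IS NULL"
    ).fetchone()[0]

missing = add_country_column(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns, "rows missing country:", missing)
```

A real migration on a busy system involves more than this (locking, deployment ordering, rollback scripts), but every step is well-trodden ground, which is the point.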

“…it seems pretty clear to me at an implementation level that document-oriented databases (as well as object oriented database, like db40, I'd assume) are relatively close to the OO model used in code, and as such provide the greatest value to programmers.”

Yeah, and so what?  The greatest value to programmers is supposed to trump what, the ability of an end user to run a report off of a relational database, an ability that almost any decent business end user knows how to do at a basic level?  I don’t see it.

To be clear, I think that NoSQL solutions have a place, and a growing one.  But, in my mind, that place is precisely in those situations where clearly understood discussions of the limitations of relational databases take place, not where random ‘RDBMS can’t scale’ statements are made.

In particular, I am looking forward to Ayende’s post(s) (not originally scheduled, but mentioned in a comment) about how to use a NoSQL source with an OLTP destination, especially if he posts full code about it.  *That* will be awesome.

posted on Tuesday, March 30, 2010 10:19 PM
Comments
# re: NoSQL is the new Black
Jimmy Bogard
3/31/2010 7:27 AM
So I think it really, really really depends on what you're doing for RDBMS to "not scale". If data is heavily denormalized and queries take a bazillion joins, that tends to not scale. If you need aggregate or calculated information, that tends to not scale. But there are other solutions that allow you to stay in the RDBMS realm that help out there.
# re: NoSQL is the new Black
jdn
3/31/2010 7:31 PM


I totally agree with you.

I understand from a theoretical perspective the notion that RDBMS's can't scale, but I think that for most projects/applications, they do well enough.

For instance, someone at a current gig had a query that took longer than a minute to execute. Once I took a look at it, I was able to change it so that it executed in two seconds.

The response I got was that I was a 'SQL guru' but to be honest, it was TSQL 102 stuff (so not exactly 101 level, but nothing I would think of as really magical). Ayende (obviously) doesn't fall into this category, but I can imagine the developer having an issue would think that this was proof that SQL Server can't scale.
