The title of this post, even for me, is pretty obnoxious, but I also feel pretty strongly about this, so here goes.
Ayende posted about code that “should never hit production” in one of his usual code challenge blog posts. I know that some people complain when he does this, but a) it’s his blog, so he can post whatever he wants and you don’t have to read it, b) even if these challenge posts don’t touch on something you happen to be facing yourself, they tend to be pretty interesting, and c) if, like me, you don’t always get what the challenge is, it’s a “teaching moment,” so you should take it as such.
Anyway, in this post, he asked “what is wrong with this code” (I’m not going to reproduce it, hit the link to see it) and the answer was “this code doesn’t have any paging available”. Furthermore, he stated “this is not optional.” From the title of this post, you can tell that I don’t agree with this.
The “solution” that he posted (again, hit the link to see it) basically made the method that returned the potential result set break if it got above a preset page size limit. Though it is slightly (though not really) unfair, my analogy is something like this:
Suppose you queried a database with the query “Select * from TableA”. Ayende’s solution is to “cripple” the select statement so that it only returns some pre-defined page limit (say, the first 1000 rows). You can see the history of our debate about this here, where he cripples RavenDB similarly. Since RavenDB is his product, he can obviously do whatever he wants to do, and, to the extent that I can stretch my mind to figure out why he has done this, I kind of, sort of, in a semi-intelligent way, get why he has done it.
Let me further explain why I think this is hideously stupid (technical term).
Code should do what you tell it to do
If you are in Query Analyzer or Management Studio (as it relates to SQL Server), if you issue a “select * from TableA” command, it should do just that. The end. The issuer of the command should know what they are doing when issuing such a command, and know any negative ramifications in doing so, but if I want to return a million rows from a table that has a million rows in it, then a command to return a million rows should do just that. Crippling a select to return only a subset is simply idiotic.
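To make the analogy concrete, here is a minimal Python sketch using an in-memory SQLite table. Everything in it is an assumption for illustration: the table name TableA comes from the example above, but the 5,000-row size, the 1,000-row cap, and the helper names `make_table`, `select_all`, and `select_capped` are made up, and none of this is how RavenDB or SQL Server actually implement anything.

```python
import sqlite3

HIDDEN_CAP = 1000  # hypothetical built-in limit, chosen only for illustration


def make_table(row_count):
    """Create an in-memory database with a TableA holding row_count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO TableA (id, payload) VALUES (?, ?)",
        ((i, f"row {i}") for i in range(1, row_count + 1)),
    )
    conn.commit()
    return conn


def select_all(conn):
    """Does what it was told: returns every row in TableA."""
    return conn.execute("SELECT * FROM TableA ORDER BY id").fetchall()


def select_capped(conn, cap=HIDDEN_CAP):
    """Silently returns at most cap rows, whatever the caller asked for."""
    return conn.execute(
        "SELECT * FROM TableA ORDER BY id LIMIT ?", (cap,)
    ).fetchall()


conn = make_table(5000)
print(len(select_all(conn)))     # every row in the table
print(len(select_capped(conn)))  # only the first HIDDEN_CAP rows
```

The caller of `select_capped` gets no error and no warning, just fewer rows than the table holds, which is the part I object to.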
Obviously, Ayende doesn’t think it is idiotic, and he explains his rationale (somewhat) through what he calls “safe by default,” which he explains in detail here as it relates to RavenDB.
As far as I can tell from our conversations, and how he has explained it elsewhere, the rationale is something like this: even smart developers make mistakes. A very typical mistake that even smart developers make is writing code that produces an unbounded result set, which ends up causing production failures. To prevent these mistakes, “safe by default” limits result sets.
Furthermore, as the owner/producer/grand poobah of RavenDB, he doesn’t want RavenDB to get a bad reputation of bad performance from producing unbounded result sets. Thus, “safe by default.” You can see this in his response to one of my comments:
Jdn,
I have seen too many systems where unbounded result sets brought the system to its knees.
Not on my watch
Don’t hinder competent developers due to what incompetent developers tend to do
For whatever reason, I have a reputation as someone who knows a bit about databases, especially (well, only) as it relates to SQL Server. It is definitely an unfortunate fact that the vast majority of developers that I have worked with don’t really get databases as much as maybe they should (this is one of the reasons why they tend to like NoSQL solutions so much), and so they sometimes do things like issue database commands that return unbounded result sets from large tables when maybe they shouldn’t. Concepts like “READ UNCOMMITTED” seem beyond them sometimes, even when the discovery of the concept of a LEFT JOIN makes them think they are database gurus.
Ayende’s “safe by default” concept basically preaches that they shouldn’t learn database fundamentals. Let’s just “safe by default” their ignorance. Instead of teaching people who don’t know database fundamentals some, well, database fundamentals, let’s go ahead and cripple the software, unless the competent developers know what built-in hidden crippling limits are set.
Paging might not be optional if you are building blog software
If you are building some sort of simple software that has a UI that lists a list of, well, stuff, then paging is pretty important. If I want to see the last 50 comments posted to my blog, then I want to build a UI that only lists the last 50, and doesn’t do so by returning all 57,000 (or whatever) comments and then paging off of that. We all know applications that do that, and we all know that those applications, well, suck. And the developers that build those applications should know better.
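Here is a rough sketch of the difference, again in Python with an in-memory SQLite database. The `Comments` table, its columns, and the function names are hypothetical, and I am assuming that a higher `id` means a more recent comment.

```python
import sqlite3


def make_comments(count):
    """Create an in-memory Comments table with count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Comments (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO Comments (id, body) VALUES (?, ?)",
        ((i, f"comment {i}") for i in range(1, count + 1)),
    )
    conn.commit()
    return conn


def last_comments_bad(conn, n=50):
    """Pulls every comment out of the database, then keeps only n of them."""
    rows = conn.execute(
        "SELECT id, body FROM Comments ORDER BY id DESC"
    ).fetchall()
    return rows[:n]


def last_comments_good(conn, n=50):
    """Asks the database for only the n newest comments."""
    return conn.execute(
        "SELECT id, body FROM Comments ORDER BY id DESC LIMIT ?", (n,)
    ).fetchall()


conn = make_comments(57000)
page = last_comments_good(conn)
print(len(page), page[0][0])  # number of rows on the page, id of the newest
```

Both functions return the same 50 rows. The difference is that the second one makes the paging decision in the application's own query, where the developer can see it.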
RavenDB, and all similar software, should be “Enterprise by Default” not “Safe By Default”
Perhaps I’ve completely misunderstood the potential of RavenDB, but, in my mind, it is a product/technology that is suitable to the enterprise, not just for blog software. As such, I think it should be something that you could use in developing, for instance, trading applications, where you might have a huge number of orders that you need to query. You shouldn’t have to know “well, this software is designed to protect incompetent developers, so I have to do something different” in order to write code that does what you think it should do, because Papa Ayende crippled it.
Software for Children
Paging is an option. If you are building simplistic software with simplistic UI requirements, then you do need to take paging into consideration, but at the application level. In any event, if you are pulling data from any sort of data source (database, flat files, etc.), you should probably spend a few minutes or so thinking about the amount of data you are pulling, and whether it will affect your application.
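As one example of what “thinking about the amount of data” can look like when you really do need every row, here is a sketch that reads a large result set in batches instead of loading it all at once. The `Orders` table, the `iter_orders` helper, and the batch size are assumptions of mine; `fetchmany` is the standard Python DB-API cursor call. The application chooses the batch size, and nothing is dropped.

```python
import sqlite3


def make_orders(count):
    """Create an in-memory Orders table with count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO Orders (id, quantity) VALUES (?, ?)",
        ((i, i % 7) for i in range(1, count + 1)),
    )
    conn.commit()
    return conn


def iter_orders(conn, batch_size=1000):
    """Yield every order, fetching batch_size rows at a time."""
    cursor = conn.execute("SELECT id, quantity FROM Orders ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break  # result set exhausted, every row has been yielded
        yield from batch


conn = make_orders(10000)
row_count = 0
total_quantity = 0
for _order_id, quantity in iter_orders(conn):
    row_count += 1
    total_quantity += quantity
print(row_count, total_quantity)
```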
Or you can use RavenDB and have production outages because no one knows why you aren’t getting all of the data you need to do your job, because Ayende says “Not on my watch.”