March 2010 Blog Posts
NoSQL is the new Black

No one really talks about alt.net anymore.  In my mind, that’s a good thing.  Though some people apparently had different goals for all of that (setting up a foundation or some other silly thing), for other people, it was about taking the things that a small subset of .NET developers were doing and making them more mainstream (whatever that means).  Though (again) it is hardly a definitive sign of anything, the fact that people like Jeremy and Ayende have been publishing articles in MSDN (and no one thinks this is odd) is a good thing.  That ASP.NET MVC has unit test support built-in is a good thing.  And so on and so forth.

Anyhoo, certain topics in certain circles tend to get discussed around the same time, which isn’t that surprising.  Alt.NET has always been an echo chamber (which is both good and bad) and so the fact that like-minded people who read each other’s blogs and follow similar paths hit the same topics/roadblocks/whatever isn’t surprising.  For instance, some 12-18 months ago, one couldn’t swing one’s dead cat without hitting a blog post about SOLID principles (much like a lot of Alt.NET, the posts were hit and miss…some people never made it past the “S” or “O”).  And to be clear, there’s nothing bad about that.

One current topic that gets a lot of buzz is NoSQL.  The advantage of having multiple people (even within an echo chamber) talking about a topic is that it increases the odds that the really good blog posters will provide a lot of hands-on experience and information.  This is especially useful for people like myself who don’t have the bandwidth or the smarts or the gumption or all three to work through the experience, and instead want to learn from other people’s mistakes, as it is so much easier than making those mistakes myself.

digression:  a few years back, myself and others pointed out what seemed to be obvious flaws with ‘strict’ TDD.  Unless you are building a framework, strict TDD is bad.  Bad, bad, bad, bad, bad.  I mean, code without any tests is also bad, but strict TDD code is bad on many special levels.  People who got the TDD religion liked to talk about code coverage, and percentages of said code coverage, and other bad things.  Fast forward a bit, and you find a lot of good information on the Net about how strict TDD is bad, and that you should instead focus on the business scenarios you are trying to support in code, and test along those scenarios.  Thank you, we knew that already.

So, after reading so much silly stuff about NoSQL, I was particularly glad to see that Ayende was creating a blog series about NoSQL.  He’s not always right, far from it, but as someone who understands the concept of JFHCI, I know that he will cover the topic from a sane, intelligent, practiced perspective.

Having said that, I immediately find myself in agreement with some comments from Frans Bouma on the initial post of Ayende’s series:

“About relational databases, that they don't scale: I don't really know what you've been using but that's utter nonsense….

I've been using relational databases for many many years, and I've yet to see a relational database that big that was too slow to keep up. Mind you: millions and millions of new rows per day isn't too much.

For the vast majority of the people using databases in their applications, relational databases are just fine, only for the very few who write the new amazon.com, or the google competitor might need different databases, but how many are those? a handful.”

Having worked on fairly heavily used eCom sites, I agree with this.  You can handle a *lot* of traffic using standard relational databases just fine.

digression: to be clear, from a programming perspective, Ayende can kick my ass.  It’s not like he’s going to say “Gosh, jdn, I’ve never considered that you could do that.”  He knows that.  I think it’s as much a rhetorical thing to say that relational databases can’t scale.  Also, I’ve met Ayende, and from a physical perspective, I’m pretty certain Ayende can kick my ass.  But I digress.

On the flip side to the pro-NoSQL silliness, there’s a more negative take on it from another post:

“…by replacing MySQL or Postgres with a different, new data store, you have traded a well-enumerated list of limitations and warts for a newer, poorly understood list of limitations and warts, and that is a huge business risk….The sooner your company admits this, the sooner you can get down to some real work.  Developing the app for Google-sized scale is a waste of your time, plus, there is no way you will get it right. Absolutely none. It's not that you're not smart enough, it's that you do not have the experience to know what problems you will see at scale.”

This ties into another blog post by Karl from CodeBetter and gets to what I really want to talk about.  At one point, he says:

“A lot of developers don't feel that object-relational impedance mismatch is really a problem or even exists. That's only because, as the only solution, you've been dealing with it for so long - possibly your entire programming career (like me), that you don't think about it. You've been trained and desensitized to the problem, accepting it as part of programming the same way you've accepted if statements.”

Now, to be fair, he does preface that with ‘a lot of developers’ but I have a completely different take on the whole impedance thing.  My experience has been that the vast majority of developers that I’ve worked with don’t understand relational theory, and that’s where the ‘impedance’ comes in.  Many developers seem to think that if they’ve learned how to do joins correctly, they understand relational theory.  They don’t understand indexes or statistics or the other sorts of things that you need to understand to program against relational databases, and so when there’s a problem, it’s this grand ‘impedance’ thing.

Bullshit.  From this perspective, the drive seems to be to get rid of decades of experience of how relational databases work, but at what cost?  Reducing developer friction?  That’s nice, but only if it has greater value elsewhere.  When an entity changes shape over time, when working with a relational database, there’s a known series of steps of how to deal with that.  Shoving it all into a NoSQL alternative sounds good, but how does it work long-term from an operational perspective, for instance?

“…it seems pretty clear to me at an implementation level that document-oriented databases (as well as object oriented database, like db40, I'd assume) are relatively close to the OO model used in code, and as such provide the greatest value to programmers.”

Yeah, and so what?  The greatest value to programmers is supposed to trump what, the ability of an end user to run a report off of a relational database, an ability that almost any decent business end user knows how to do at a basic level?  I don’t see it.

To be clear, I think that NoSQL solutions have a place, and a growing place at that.  But, in my mind, they have a place precisely in those situations where clearly understood discussions of the limitations of relational databases take place, not where random ‘RDBMS can’t scale’ statements are made.

In particular, I am looking forward to Ayende’s post(s) (not originally scheduled, but mentioned in a comment) about how to use a NoSQL source with an OLTP destination, especially if he posts full code about it.  *That* will be awesome.

posted @ Tuesday, March 30, 2010 10:19 PM | Feedback (2)
Loquat – Swingset Chain (live)

A really beautiful song I’ve mentioned before about the loss of friendship and other stuff (like alcoholism if you listen to the entire lyric…LOL).  It’s also misnamed.

You’re a dandelion seed
That flies through the air
And lands randomly
Then disappears

Check it out.

posted @ Thursday, March 25, 2010 5:56 PM | Feedback (0)
Windows XP Mode No Longer Requires Hardware Virtualization

For those of you running Windows 7 but without hardware virtualization support (either wrong CPU or no option to enable in BIOS), you can now run Windows XP mode, as they’ve removed that requirement.

Check out the download here.  I was prompted to download three files, the main 400+ MB file and then two updates that needed to be run.  It was a little confusing because there was no indication that the hardware requirement was removed (and other places on the site still say that it is required), but I’ve been able to set it up successfully and run it.

posted @ Thursday, March 18, 2010 4:20 PM | Feedback (1)
cqrs for dummies – 2.1 of n – command layer notifications

And so of course, I managed to leave out one important consideration.

You are on the UI and you, e.g., click a button that creates a command.  What happens now?

I didn’t really talk about either the command bus or the command handler, because to a certain extent they are implementation details.  You could, e.g., use NServiceBus and have an actual bus, or you could connect through regular method calls, or you could do a host of different things.  When it comes to the command handler, you could have some generic handler class or, better I think, have a one-to-one pairing between a command and a handler (quick note below).
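As a rough sketch of what that one-to-one pairing might look like (no bus framework assumed; all of these type names are hypothetical, invented for illustration):

```csharp
using System;

// A minimal sketch of the command/handler pairing; no particular bus
// framework assumed, and all names here are made up for illustration.
public interface ICommand { }

public interface ICommandHandler<TCommand> where TCommand : ICommand
{
    void Handle(TCommand command);
}

public class UpdateCustomerAddress : ICommand
{
    public Guid CustomerId { get; set; }
    public string NewAddress { get; set; }
}

// One handler per command: this is the unit you can later host,
// deploy, or scale independently of the other handlers.
public class UpdateCustomerAddressHandler : ICommandHandler<UpdateCustomerAddress>
{
    public void Handle(UpdateCustomerAddress command)
    {
        // load the Customer from the domain, apply the change, persist
    }
}
```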

More important is the question, what do you do if the command fails?  With CQRS, even more important to answer is the question, what do you do if the command succeeds?

Command Notification

One set of validation failures can be handled at the client before a command is sent, and the ways of accomplishing this are as varied as they are when CQRS isn’t involved (in fact, CQRS doesn’t change a thing here).  Field length validations, regex validations, required field validations, etc. are all things that can be handled by javascript, view model annotations, or a host of other solutions.  I won’t go into any more detail here, other than that you would notify the UI when this occurred.

Another set of failures are ones that happen due to technical problems with the command bus and/or command handler.  These are almost all implementation specific, and so I don’t really have a lot to say here that would be useful (again, you would notify the UI per usual here).

The next set of failures are due to a command passing all local validation and then being passed through the bus/handler and into the domain, and then failing for whatever reason (could be technical problems, could be failing domain-side validation, doesn’t matter).  How do you handle this?

It depends a little bit on whether you handle your commands in an async manner or not.  Normally, I wouldn’t use async methods here, and would instead either make your command passing methods return true or false, or throw a specific exception explaining why the command failed to be processed inside the domain (this is in my mind preferable, so that you know what to do in various situations).

digression: in case there is any confusion, when I talk about a command failing to be processed in the domain, this shouldn’t be thought of as ‘exceptional’ in the sense of being rare.  I know there is a lot of ‘religious’ debate around exceptions, and in other contexts, I would argue that throwing an exception for a scenario that isn’t rare is arguably bad design.  In this situation, if a command fails to be processed inside the domain, I want to know why.  Did it fail because of some specific validation rule within the domain?  If so, which one?  If not, was it a technical problem?  Methods that return true/false don’t give you this level of detail.  I suppose that one could return an ISpecificCommandSentToDomainMethodResponse message back to the UI.  This might actually be a better option.  But I digress.
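To make that response-message option a bit more concrete, here is a sketch of what it might look like (the type and its members are invented; ISpecificCommandSentToDomainMethodResponse above was just a placeholder name):

```csharp
// Sketch of a response message that tells the UI *why* a command
// failed, instead of a bare true/false; all names are invented.
public enum CommandFailureReason
{
    None,
    DomainValidationFailed,
    TechnicalFailure
}

public class CommandResponse
{
    public bool Succeeded { get; private set; }
    public CommandFailureReason Reason { get; private set; }
    public string BrokenRule { get; private set; } // which domain rule failed, if any

    public static CommandResponse Success()
    {
        return new CommandResponse { Succeeded = true, Reason = CommandFailureReason.None };
    }

    public static CommandResponse DomainFailure(string brokenRule)
    {
        return new CommandResponse
        {
            Succeeded = false,
            Reason = CommandFailureReason.DomainValidationFailed,
            BrokenRule = brokenRule
        };
    }
}
```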

The really interesting case is when you send a command that succeeds, in the scenario where you are using Eventual Consistency.

Keep in mind that you can have a separate Event Store and a separate Query Store, and have all communications between them take place with full transactional support.  Though my guess is most implementations of CQRS in the wild will have some amount of Eventual Consistency built in, it isn’t, strictly speaking, required.

But what if you are using Eventual Consistency?  This raises interesting scenarios.  I’m the customer updating my Address through the UI and click the update button.  If I’m using Eventual Consistency, the UI screen will return after the command is sent and doesn’t fail (since we don’t know how long it will take till the query store gets updated, we don’t want to block once the command makes it through the domain).  What do we show the user?

There are a couple of options, and a lot of it depends on context and what the end user expects.  An obvious option for commands that might reasonably be expected to take more than a few milliseconds is to simply tell the user that the request has been submitted, and to check back later.  Though I don’t have any personal experience in the area, I’ve heard anecdotal reports from many people who use online banking and expect this sort of behavior. 

In other cases, you can, well… ‘lie’ to your user.  The end user sent the information that they wanted to update, so it is available on the client (cached or in session or whatever), and you can redraw your UI to include this info.  By the time the end user gets into a scenario where they actually have to re-query the Query Store, the info will be there by then.

Mark and Udi have suggested techniques such as these, and while I admire the cleverness of them, I don’t like them.  If an end user expects some end result and is shown a ‘fake’ and then, for whatever reason, that end result doesn’t end up in the query store….that seems bad to me.  I suppose, like everything, it depends on context.  If one is using an internal system and it happens once in a blue moon, and there is a well understood set of actions an end user can take, maybe that’s all right.

But, in general, I lean toward the ‘you sent a command, check back later’ model.  It’s honest, and publicly understood.
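In ASP.NET MVC terms, the ‘check back later’ model might look something like this sketch (the controller, the ICommandBus abstraction, and the view name are all hypothetical, not from any real codebase):

```csharp
using System.Web.Mvc;

// Hypothetical command and bus abstraction, for the sketch only.
public class CustomerMovedUpdateAddress { /* address fields as in the other sketches */ }
public interface ICommandBus { void Send(object command); }

// Sketch: send the command and return a "request submitted" view right
// away, rather than blocking until the query store catches up.
public class CustomerController : Controller
{
    private readonly ICommandBus bus; // hypothetical bus abstraction

    public CustomerController(ICommandBus bus)
    {
        this.bus = bus;
    }

    [HttpPost]
    public ActionResult UpdateAddress(CustomerMovedUpdateAddress command)
    {
        bus.Send(command);
        // Honest and publicly understood: "Your request has been
        // submitted, check back later."
        return View("UpdateSubmitted");
    }
}
```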

Command Handlers

Why would it be better to have a specific command handler per command, especially if command handlers are implementation specific anyway?

Potential scalability.  If I have a separate piece of code, say, CommandXHandler, to handle all instances of CommandX, then there is the potential that I can take that piece of code (CommandXHandler) and scale it separately if needed.  For instance, suppose my command bus is MSMQ, with different handlers reading from different queues.  Suppose it turns out that CommandX is more prevalent than other commands.  If needed, I could then dedicate separate hardware to handle those commands apart from the rest.
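As a rough sketch of the one-queue-per-command idea using System.Messaging (the queue path and the command type are made-up examples):

```csharp
using System.Messaging;

public class CommandX { public int Payload { get; set; } }

// Sketch: each command type gets its own queue, so the handler for a
// "hot" command can later be moved to dedicated hardware on its own.
public class MsmqCommandBus
{
    public void Send(CommandX command)
    {
        // One queue per command type; the path here is a made-up example.
        using (var queue = new MessageQueue(@".\private$\CommandX"))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(CommandX) });
            queue.Send(command);
        }
    }
}
```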

posted @ Monday, March 15, 2010 11:38 PM | Feedback (2)
cqrs for dummies – 2 of n – the command layer
Series link here.

Update: added an 'addendum' here

As a reminder of what I’m talking about, here’s the picture from Mark:

[Mark’s CQRS diagram (DDDDivision)]

What I’m going to be talking about is section #2 from Mark’s picture, and in particular, I’m going to be going over a number of concepts including:

  • The differences and advantages of using Commands over DTOs
  • Validation of commands, and how/when this can occur outside of the domain
  • Why domain objects shouldn’t have getters and setters and how commands help here
  • Why domain objects should never be invalid and how commands help here
  • UI Implications of using Commands, bye-bye Excel screens!
  • The command store – the attentive reader will notice that in section #3 of Mark’s picture, there is something called the Event Store.  An improvement, IMO, is to add something called the Command Store, which would have lines coming from the Command Bus to it.  At some point, I will produce my own picture/diagram, but for now, just mentally put it in.

A lot of what I’m going to be talking about applies to what I’ve called elsewhere the ‘strict’ version of cqrs, that is, when you simply separate your read and write services, using queries on the reads and commands on the writes.  Let’s get to it.

Why commands are better than DTOs

That’s a blanket statement, and so, of course, that means that someone can come up with scenarios where it isn’t true.  But I’m more and more convinced that those scenarios will be fewer than scenarios where the statement is true.  Let’s consider why this might be the case.

We are on a UI page that allows the user to update something.  Let’s suppose it is a Customer Address.  A typical way of doing this without cqrs is to create something like a Customer DTO with the updated Address information and pass it into the domain, using some method like CustomerDomainObject.UpdateCustomer(CustomerDTO dto).  This works, BTW.

Imagine for a moment that it doesn’t work, and you are trying to figure out why.  If it doesn’t work in the sense of throwing an exception, you at least have some idea of where in your code it failed.  That’s good.

Suppose that the reason it doesn’t work is that some other user has already updated the Customer’s credit rating, and so there is a concurrency conflict (if you are using DTOs, you need to be checking for these).  Depending on how sophisticated your auditing processes are, you may be able to trace down that this is, in fact, what happened.  More likely than not, it won’t be easy, but let’s suppose you can.  That’s good.

But, let’s step back and think about this basic scenario.  One user was trying to update a Customer Address.  The other user was trying to update the same Customer’s credit rating.  What business concurrency conflict occurred here?  None (leaving aside the possibility that changing a Customer’s Address affected their credit rating).  There is no business reason why there should have been a conflict here.

So, why was there a conflict?  A generic DTO, passed into a generic UpdateCustomer(CustomerDTO dto) method, has no understanding of context.  When done right, DTOs use a timestamp value to prevent concurrency conflicts, but here you have two different and non-conflicting business requests that fail due to a technical restriction.  There is no reason why both of these requests couldn’t have been honored.

Commands can help here.  Instead of sending a generic DTO to a generic UpdateCustomer method, send a Command that specifies exactly what it is trying to update.  Send an UpdateCustomerAddress command to your domain, along with an UpdateCustomerCreditRating command to your domain.  Since they are trying to do two separate things, you don’t have to fail one just due to concurrency issues.

More importantly, don’t send rather generic UpdateCustomerAddress and UpdateCustomerCreditRating commands.  Instead, send more specific CustomerMovedChangeAddress and ReduceCustomerCreditRatingDueToLatePayment commands (or whatever) to your domain.  Specify *exactly* what you are trying to achieve.

Why?  Because then you can isolate whether those commands should succeed or fail due to considerations that relate specifically to the context in question.
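The contrast, in code, might look something like this (a sketch; the property lists are illustrative, not a definitive design):

```csharp
using System;

// Before: one generic DTO for every kind of customer update; the
// domain can't tell what the caller actually intended.
public class CustomerDTO
{
    public Guid Id { get; set; }
    public string Address { get; set; }
    public int CreditRating { get; set; }
    public byte[] Timestamp { get; set; } // the optimistic concurrency check
}

// After: each command carries exactly one business intention, so two
// unrelated changes to the same Customer need not conflict.
public class CustomerMovedChangeAddress
{
    public Guid CustomerId { get; set; }
    public string NewAddress { get; set; }
}

public class ReduceCustomerCreditRatingDueToLatePayment
{
    public Guid CustomerId { get; set; }
    public DateTime PaymentDueDate { get; set; }
}
```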

Now, if you’ve used DTOs before, you might be appalled at the implications of this.  Having a single DTO seems like good code reuse.  Using commands instead of DTOs seems to mean that you end up writing more code.  The short answer is, yes, you do have to write more code.

But keep in mind how your code is used, past the point of it being developed, and when your operations team is dealing with it.  Which is easier, figuring out why some generic UpdateCustomer(CustomerDTO dto) method failed, or why some ReduceCustomerCreditRatingDueToLatePayment command failed?

More on this below.

There’s validation and then there’s validation

When determining whether a command is valid or not, there are different levels of validation that you need to consider.  A lot of validation can be done at the UI layer before a command ever makes it to your domain.  At the command level, you can handle the sort of validation of user inputs that any typical application needs.  Is this string of the right length, and of the right pattern, blah blah blah.  The command handler of each command can handle these sorts of validations, and provide a quick response to the UI of any issues, before you ever hit the domain.

Validation against business rules and context happens at your domain level, which is section #3 of Mark’s picture.  Think of the difference between a validation rule that a customer address must be less than 150 characters in length versus a validation rule that says you cannot degrade a customer’s credit rating, even due to late payment, if they process more than $100k in business a month.
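A sketch of those two levels, reusing the command types from the earlier sketch (the validator, the rule details, and the Customer internals are all invented for illustration):

```csharp
using System;

// Level 1: input validation at the command level, before the domain.
public class ChangeAddressValidator
{
    public bool IsValid(CustomerMovedChangeAddress command)
    {
        return !string.IsNullOrEmpty(command.NewAddress)
            && command.NewAddress.Length < 150;
    }
}

// Level 2: business-rule validation inside the domain; this needs
// context (the customer's monthly volume) that the command layer
// doesn't have.
public class Customer
{
    private decimal monthlyVolume;
    private int creditRating;

    public void ReduceCreditRating(ReduceCustomerCreditRatingDueToLatePayment command)
    {
        if (monthlyVolume > 100000m)
            throw new InvalidOperationException(
                "Cannot degrade the credit rating of a customer doing more than $100k a month.");

        creditRating--;
    }
}
```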

Domain objects shouldn’t have getters or setters

For the longest time, I heard about this, and didn’t get it.  I came from a background of using DTOs and managing what I considered to be domain objects, and thought this idea was mystifying, even insane.  How could I possibly manage something as central as a Customer domain object, if I couldn’t use getters and setters?

The use of a query layer explains how to rid domain objects of getters, since you don’t hit your domain objects at all.  You hit your query layer, which hits your reporting store, which returns screen/task specific objects that give you all of the information you need.   Your domain doesn’t play any role in it.

But what about setters?  How could you update a Customer object to have the correct Address for shipping product if you didn’t allow setters?  Crazy.

But once you start to think about the difference between using Commands and DTOs, it starts to make sense.  A command that says CustomerMovedUpdateAddress gets passed (eventually) into the domain, which then processes that command internally.  You don’t need public get/set on Address1, get/set City, get/set Country, etc. on each property of your domain object.  You just need a public method that can handle the command, which tells you the set of changes that you want to process.  Which leads to….

Domain objects should never be invalid

Suppose you passed in a CustomerMovedUpdateAddress command that had an inconsistent City and State combination.  If you handle this sort of validation at the command level, this would never occur, but let’s suppose it did.  Your domain object should have the logic internally to accept or reject all of those changes as a batch.

If you allow changes to the internal state of your domain object on a property by property basis, then you have a heck of a lot of validation logic you have to write to ensure that your domain object is never in an invalid state.  If you encapsulate all of those changes in a command object, you can accept or reject them as a whole.  Whether it is changing the Address or the CreditRating, or whatever, the use of commands allows you to accept or reject a set of changes as a batch, and thus your domain object is never put into an invalid state.
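Put together, a domain object along these lines might look like the following sketch (no public getters or setters on the state; the City/State check is an invented stand-in for whatever real validation applies):

```csharp
using System;

public class CustomerMovedUpdateAddress
{
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

// Sketch: one public method accepts or rejects the whole set of
// changes at once; the object is never left half-updated.
public class Customer
{
    private string street;
    private string city;
    private string state;

    public void ChangeAddress(CustomerMovedUpdateAddress command)
    {
        // Validate the batch as a whole, *before* touching any state...
        if (!StateContainsCity(command.State, command.City))
            throw new InvalidOperationException(
                "Inconsistent City/State combination; the whole change is rejected.");

        // ...so the object moves from one valid state to another.
        street = command.Street;
        city = command.City;
        state = command.State;
    }

    private static bool StateContainsCity(string state, string city)
    {
        // hypothetical lookup; the details don't matter for the sketch
        return true;
    }
}
```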

No more Excel-like screens

A lot of Microsoft demos show you a big grid of values where you can select a row and make some changes and then click an update button.  This works, BTW, but has a lot of problems, which I’ve been trying to describe here.  Implicitly or explicitly, they work under the covers like the common DTO pattern I’ve described.  Batch a whole set of changes without context, pass them to wherever they need to be processed, and hope you don’t get a concurrency error.

There’s no easy solution here, except to try to make your UI better, more attuned to the specific tasks you want to achieve.  This is really hard.  The Excel-like grid screen is commonplace, easy to produce, and hard to get away from.   But it is important to do so if you want to produce a scalable application that works across multiple users.

The Command Store

Few discussions of cqrs talk about having a separate command store, but I think it is important.

As I’ve talked about in other places, code exists long after it is written, and most importantly, it exists in situations where it has to be maintained.  One of the reasons why I think it is important to choose commands over DTOs is that you have a record of specific actions taking place.  An UpdateCustomer(CustomerDTO dto) method tells you nothing, but a record showing that you sent a ReduceCustomerCreditRatingDueToLatePayment command tells you something.

If you have a store that saves every command that is sent into the system, you have a store that gives you an automatic audit trail of what the users have tried to do.  Constructed properly, you have a trail of everything that has attempted to affect your system.

With this record, you can troubleshoot your production system.  With this record, in theory, you can replay it against your UAT system, or against a new DEV system.

Anyone who has worked in an Operations environment should be able to think about ways in which this works really well.
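A minimal sketch of what such a store might record per command (the schema and all names here are invented, not a definitive design):

```csharp
using System;
using System.Collections.Generic;

// Sketch: every command is appended, with who/what/when, before it is
// dispatched; this becomes the audit and replay trail.
public class StoredCommand
{
    public Guid Id { get; set; }
    public DateTime SentAtUtc { get; set; }
    public string UserName { get; set; }
    public string CommandType { get; set; }    // e.g. "ReduceCustomerCreditRatingDueToLatePayment"
    public string SerializedBody { get; set; } // the command itself, serialized
}

public interface ICommandStore
{
    void Append(StoredCommand command);

    // Replay support: pull everything back out, in order, e.g. to
    // reconstruct behavior against a UAT or DEV system.
    IEnumerable<StoredCommand> GetAllInOrder();
}
```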

Next steps

A lot of the fun stuff comes in discussing section #3 of Mark’s picture, which I hope to talk about soon.

posted @ Sunday, March 14, 2010 10:34 PM | Feedback (3)
cqrs for dummies – an interlude – is cqrs the ‘shiny new thing’?

The next substantive post on the command layer should be up this week, but I thought I would comment on this.

For various reasons, a question has arisen of whether cqrs is some shiny new thing.  Define what that means however you wish.

The short answer is, yes, but no.

As Papa Greg has mentioned/stated before, cqrs is ‘only’ separating your codebase in such a way that commands and queries use different services (or classes, if, like me, you are allergic to services).  Which is absolutely correct.  Typically, however, mention of cqrs tends to involve a lot more than that.  Having a separate event store from the reporting store, maybe using Event Sourcing, maybe involving Eventual Consistency, yada yada yada.

There is no doubt (“Don’t speak, I don’t want to hear it…..”) that when an even remotely possibly good idea pops up, it can be misapplied or abused.  Since cqrs, either in its strict (“it ain’t a f^&king application design”) form or in its relaxed (“yeah, fine, but it can lead to kickass application design”) form, is a remotely possibly good idea, it can be misapplied or abused.  The whole “I have a hammer” yada yada yada issue is almost inevitable. 

But, that’s okay.  From a statistical perspective, e.g., TDD is probably rarely practiced, and TDD really sucks, so it is misapplied and abused as well.  The advantage, if you can call it that, is that ‘leading’ developers will try TDD way before most other people, and find out the ways that it can be misapplied and abused, and they will probably blog about it, making it easier for everyone else to get along. 

Same goes with cqrs.  Because of its inherent intuitive plausibility, people will try to implement it.  And, like most development efforts, most implementation efforts will have problems.  But, THAT’S OKAY.  Sorry for yelling, but it is okay.

If you are someone interested in cqrs in either its strict or relaxed form, keep in mind that it is no different from DI or IOC or blah blah blah.  There are places where it works and makes sense and other places…well, maybe not. 

I lean towards the cqrs junkie side if only because the relaxed form of it suggests doing things in code that I’d already been doing anyway, but even so, I am aware that it is no more a panacea than CORBA, or whatever.

You still have to be smart about it.

posted @ Tuesday, March 09, 2010 7:20 PM | Feedback (0)