December 2009 Blog Posts
cqrs for dummies – 0 of N - Overview

disclaimer: if you’ve ever read a ‘for dummies’ book, you know that they are designed to give a pretty decent overview of a topic from experts.  Given that, this series is probably mis-named.  It should be something more like “cqrs from an idiot” or something, but that’s bad marketing.  Anyway, I’ll be pointing out the people who really know their stuff, with links and whatnot, but I hope to be able to give an explanation of CQRS at a high level that will be of use.

update: adding missing overview paragraph at the end, and uploaded larger pictures.

Goals for the series

Anyone who has looked at my DDD-Lite series might wonder about the relationship between CQRS and DDD.  Glad you wondered.  Put crudely, DDD fits inside of CQRS.  That is, the latter is an architectural style that uses a Domain Model, but goes beyond it.  You can do DDD without doing CQRS.  It is technically possible to do CQRS without DDD, but they fit together well.

What I want to try to do is to lay out what I take to be the basic tenets of CQRS in a way that might be easier to understand in some circumstances.  A full-blown implementation of CQRS involves a lot of concepts that have a lot of implications, some of which, in my opinion, are not logically required.  The concepts that I want to focus on are ones that I think that any experienced developer/architect should be able to understand, and which can provide a very robust and maintainable way of developing software.

“Okay, last week it was DDD, now it is CQRS, what’s next, XYZ?”  Possibly, yes.  I think that CQRS is proven by the main practitioners who have used it in high-demand, highly scalable environments (think real-time stock trading), and as such, is usable in many different situations.  Are you going to want to implement it for every possible software application?  No, of course not.  But, if you have an application that is similar to, say, your everyday eCommerce application in terms of complexity, I think it is something to look at.  It preaches techniques and practices that are pretty widely applicable.

And, as for whether there will be some XYZ that further expands on CQRS….sure, it’s possible.  It might even be desirable.  When done right, real world experience leads to refinements of accepted practices.  It’s called learning.

What is CQS?

CQS is the acronym for  “Command Query Separation” and was presented by Bertrand Meyer (via Wikipedia) as follows:

“every method should either be a command that performs an action, or a query that returns data to the caller, but not both. In other words, asking a question should not change the answer. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects.”

In my own words, a query should ask a system about its state without affecting it, whereas a command affects the state of a system.  Moreover, these things should be separate in software.  That is, a query should return something (more on what this something is in a bit) that details the state of the system you care about, while a command should be a void method call (or maybe returns an int or bool to indicate the result of issuing the command) that changes the state of the system.

Trivial examples run the risk of being, well, trivial, but think of a basic eCommerce system.  If I want to display a list of pending orders, I want to issue a query that gives me the list of pending orders, without changing the state of any of them.  If I want to change the state of an order from ‘Pending’ to ‘Fulfilled’, I want to issue a command that does just that.
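To make the order example concrete, here is a minimal sketch in Python (chosen only for brevity).  The Order and OrderService names are made up for illustration, and the in-memory dictionary stands in for whatever real storage you have.  The point is just that the query returns data and changes nothing, while the command changes state and returns nothing.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    status: str  # 'Pending' or 'Fulfilled'


class OrderService:
    """Hypothetical in-memory order store, used only to illustrate CQS."""

    def __init__(self, orders):
        self._orders = {o.order_id: o for o in orders}

    def pending_orders(self):
        """Query: returns data and changes nothing."""
        return [o for o in self._orders.values() if o.status == "Pending"]

    def fulfill_order(self, order_id):
        """Command: changes state and returns nothing."""
        self._orders[order_id].status = "Fulfilled"


service = OrderService([Order(1, "Pending"), Order(2, "Pending")])
before = [o.order_id for o in service.pending_orders()]
service.fulfill_order(1)
after = [o.order_id for o in service.pending_orders()]
print(before, after)  # [1, 2] [2]
```

Calling pending_orders() any number of times gives the same answer until some command runs, which is the "asking a question should not change the answer" part.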

If you’ve ever worked with a system that mixes queries and commands, you know how difficult that system is to work with.  Does this method call change things or not?  How do I know?  Is there some global variable that I need to track?

What is CQRS?

CQRS is the acronym for “Command Query Responsibility Segregation”, which I believe was coined by Greg Young, in this post.   In any event, the important point is this:

“Meyer:
   Separate command methods that change state from query methods that read state.

Current:
   Separate command messages that change state from query messages that read state.”

On the face of it, it doesn’t seem like that big of a deal, but the architectural implications are pretty important.  This will become clearer shortly.
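As a rough illustration of the move from methods to messages, here is a small Python sketch.  The two message names are borrowed from examples used later in this post; the customers dictionary, the handler functions, and the send dispatcher are all assumptions made up for the sketch, not anything taken from Greg's post or from a real framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeCustomerAddressCommand:
    customer_id: int
    new_address: str


@dataclass(frozen=True)
class GetCustomerDetailQuery:
    customer_id: int


# Hypothetical stand-in for whatever store the real system would use.
customers = {7: {"name": "Ada", "address": "1 Old Road"}}


def handle_change_address(command):
    """Command handler: changes state, returns nothing."""
    customers[command.customer_id]["address"] = command.new_address


def handle_get_customer_detail(query):
    """Query handler: reads state, returns a copy so callers cannot change it."""
    return dict(customers[query.customer_id])


# A tiny dispatcher: message type -> handler. A real system might use a bus.
HANDLERS = {
    ChangeCustomerAddressCommand: handle_change_address,
    GetCustomerDetailQuery: handle_get_customer_detail,
}


def send(message):
    return HANDLERS[type(message)](message)


send(ChangeCustomerAddressCommand(7, "2 New Street"))
detail = send(GetCustomerDetailQuery(7))
print(detail["address"])  # 2 New Street
```

Once commands and queries are plain objects rather than method calls, they can be routed, queued, logged, or handled by entirely separate parts of the system, which is where the architectural implications start.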

CQRS experts

The people I’ve learned about CQRS from include the following:

Greg Young: you can learn a lot by reading his posts here.  He doesn’t post much these days, but he did coin the acronym.

Udi Dahan: in particular, he talks about his current thinking about CQRS here.

Mark Nijhof: he has a reference implementation available here and gives a good overview here.  As you will see, I’m using a couple of his images, right about…….now.

Picture One from Mark

With permission, here’s an image of a potential CQRS implementation:

[Image: DDDOverview_big]

With permission, here’s the same image split into sections:

[Image: DDDDivision_big]

The four sections that Mark separates are these:

  1. Queries
  2. Commands
  3. Internal Events
  4. External Events

I will comment on all four sections, but just to highlight section #1, think of the following:

Suppose you need to display to a user the order information that is tied to a particular customer, a typical requirement for some screen (windows or web).

If you are using a Domain Model, you might have a Customer object.  Tied to that Customer object is the list of the orders associated with it.  A typical way of handling this in software is to load the Customer object from a database, and then load the associated orders.  Then, you translate that Customer object to a DTO that contains all of the relevant data that is needed in the order information screen, and that DTO is passed to your UI.

This is pretty typical.  If you code the way I do, you have some sort of mapper class that maps the domain object to the DTO.  And this does work.

But….why create the domain object and then map it to a DTO?  Why not just create the DTO directly?  The DTO won’t need all of the possible data that the domain object contains, so that is wasted communication with your data source.  Why spend time mapping at all?
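Here is one way "just create the DTO directly" might look, sketched in Python against an in-memory SQLite database.  The table layout, the column names, and the CustomerOrderSummaryDto class are all invented for the example; the only idea being shown is that the query selects exactly the columns the screen needs and never builds a Customer domain object.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerOrderSummaryDto:
    customer_name: str
    order_id: int
    status: str


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            credit_limit REAL, notes TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         status TEXT, internal_memo TEXT);
    INSERT INTO customers VALUES (1, 'Ada', 5000.0, 'long internal notes');
    INSERT INTO orders VALUES (10, 1, 'Pending', 'memo'),
                              (11, 1, 'Fulfilled', 'memo');
""")


def get_customer_orders(conn, customer_id):
    """Select only the columns the screen shows and build DTOs directly."""
    rows = conn.execute(
        "SELECT c.name, o.id, o.status "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE c.id = ? ORDER BY o.id",
        (customer_id,),
    )
    return [CustomerOrderSummaryDto(*row) for row in rows]


for dto in get_customer_orders(conn, 1):
    print(dto)
```

The credit_limit, notes, and internal_memo columns never leave the database, and there is no mapper class anywhere in sight.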

Overview of the 4 sections

  1. Queries: as noted above, queries produce DTOs that come directly from a facade layer that doesn’t interact with the Domain Model at all.  The thin data layer noted in the picture could very well be SQL views that directly map to your ViewModel.  These then can easily bind to your UI.   So, imagine that when a screen is loaded, it sends a GetCustomerDetailQuery object with the ID of the customer to the data store, and a simple DTO is returned.
  2. Commands: these are produced by interacting with the UI, perhaps editing the customer information on a screen and pressing the update button.  This will produce something like a ChangeCustomerAddressCommand object which is then sent into the system to be handled (validated and then sent into the Domain, or rejected, or, etc. etc. etc.).  Mark’s pictures denote that a bus is specifically used, but it doesn’t have to be done this way (though you probably should).
  3. Internal Events: here is where the commands interact with the Domain Model, and where the concepts of a repository and an Event Store come into play.  In my mind, understanding the Event Store and how it works is the most complicated part of CQRS.  A couple of things to note: the Event Store is append-only (events are added, never updated or deleted) and it doesn’t have to be a database (in fact, if I understand it correctly, it usually isn’t).  It could be a database though, but we’ll cover this in detail later.  The notion of a Compensating Action will also come into play.
  4. External Events: once the Domain has been updated by the internal events, these events are published so that any subscriber can be made aware of them.  These could be external vendors, but in the diagram, you see that the read/write database that is the source for the Query objects is also a subscriber.  This can be confusing to understand properly.  Doesn’t this mean that your database can be out of sync with what is happening in your Domain?  Yes, it can, but as you will see, this isn’t really a problem, because it already happens in your systems today.  As importantly, you will find that architecting a system this way greatly enhances scalability, and the notion of Eventual Consistency will come into play.
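The four sections can be strung together in a very small Python sketch.  Everything here is an assumption made for illustration: the CustomerAddressChanged event name, the plain list used as an event store, and the dictionary used as the query-side store.  It is not Mark's reference implementation, and it publishes events synchronously, whereas a real system would typically do so asynchronously (which is where Eventual Consistency comes from).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeCustomerAddressCommand:
    customer_id: int
    new_address: str


@dataclass(frozen=True)
class CustomerAddressChanged:  # hypothetical event name
    customer_id: int
    new_address: str


event_store = []   # list of recorded events, only ever appended to (section 3)
subscribers = []   # callables notified of each published event (section 4)
read_model = {}    # denormalized data the query side reads from (section 1)


def publish(event):
    for subscriber in subscribers:
        subscriber(event)


def handle_command(command):
    """Section 2: validate the command, then record and publish the event."""
    if not command.new_address.strip():
        raise ValueError("address must not be empty")
    event = CustomerAddressChanged(command.customer_id, command.new_address)
    event_store.append(event)
    publish(event)


def update_read_model(event):
    """The query-side store is just another subscriber."""
    read_model[event.customer_id] = {"address": event.new_address}


def get_customer_detail(customer_id):
    """Section 1: a query answered from the read model only."""
    return read_model.get(customer_id)


subscribers.append(update_read_model)
handle_command(ChangeCustomerAddressCommand(7, "2 New Street"))
print(get_customer_detail(7))  # {'address': '2 New Street'}
print(len(event_store))        # 1
```

Note that get_customer_detail never touches the event store or any domain object; it only sees what the subscriber has written so far.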

What’s to come

What I want to do next is to discuss the four sections in more detail, and how they might affect the architecture of a typical software application.

posted @ Thursday, December 31, 2009 8:07 PM | Feedback (2)
Kanban and Scrum Together

There's a very good eBook by Henrik Kniberg and Mattias Skarin entitled Kanban and Scrum - making the most of both that, well, talks about Kanban and Scrum and how to make the most of both.

You can get it from here, though unfortunately, to get it, you have to register.  Hate that.  You can also pay to buy a physical copy of it.

It's....interesting to read the forewords by Mary Poppendieck and David Anderson and notice the stylistic differences.  Also, note that David's foreword is from before his (in my opinion, proper) decision to remove the word 'waste' from his presentations.

posted @ Thursday, December 31, 2009 10:22 AM | Feedback (0)
There’s nothing wrong with ORM

Rob Conery, apparently bored again, decided to post a rant/whine/something about ORMs, apparently because of something Ayende posted.  Apparently.  It looks like this is a topic that he’s been thinking about for a while, so maybe Ayende’s semi-flamebait post was just the trigger.  I don’t know.  Anyway, Rob posted an initial commentary, and then, because he’s a wuss sensitive soul, he deleted it, and posted an edited commentary.  I have the initial commentary since my RSS reader grabbed it, and I’m not sure exactly why he deleted it, as opposed to just posting a follow up.  I didn’t think it was that obnoxious, but that could be because I use myself as a reference point for what is obnoxious, and, like at least 87% of humanity, Rob is less obnoxious than I am.  I just thought it was, well, kind of dumb.

Anyway, out of laziness and to avoid that evil paraphrasing thing, a few quotations from el Robbo, and my comments:

“Developers are so fixated on data persistence that it’s utterly mind-numbing. I’ll go even further – this fixation on the basics is rusting and corroding the whole .NET coding existence.”

Note the embedded link, it leads to this gem:

“I think, in general, the .NET crowd overthinks and over-engineers just about everything”

It’s hard to know exactly what to say about this, but that’s never stopped me before.  One of the things I prefer about .NET is that it is, in general, depending on what you are looking at, a hell of a lot less over-engineered than, say, Java.  I know there are a lot of lightweight frameworks around now, but seriously, there’s some major suckage there.

Also, the reason why developers think about data persistence is because, oh, I don’t know, maybe because it is often a major business and/or functional requirement.  The non-tech types understand the idea that data sits in tables and you can get to it, read it, and (maybe) do stuff to it.  ORMs exist so that developers can get the data in tables the easiest way possible, without entirely screwing up their day.

Rob knows this, of course, so obviously there is something else on his mind.  Maybe this is it (emphasis in original):

“right now there really aren’t any terribly viable solutions that will offer developers the level of familiarity and security that a relational system will – because we haven’t asked for one.”

This, of course, is just nutty.  We do, via ASP.NET MVC, have evidence that there are at least some people at Microsoft who care about what developers think, but in general, software vendors care a lot less about what developers want, and a lot more about what customers want (devs can be customers, of course, but think Venn Diagram and small intersection).  If customers were asking for non-relational systems, Oracle and Microsoft would build them (more on this in a second).

A bit more:

“Using Object storage mechanisms such as a document database or Object database is a great alternative to the ever-present impedance mismatch issue but it clearly needs some proof and testing.”

I’m going to go beyond Rob here, and just state something in the clear:  the ‘ever-present impedance mismatch issue’ is basically a lot of shit.  Is a relational model different from an object model?  Yep.  If you are using an object model and a relational model, will you need to figure out a way to map between the two?  Yep.  Will dumb people f%^k this up?  Yep.  Will smart people sometimes f%&k this up?  If they are ignorant, sure. 

But, at this point, is this really an issue?  Nope.  It’s a solved ‘issue.’  And object databases will never take off unless and until Microsoft and Oracle provide their own solutions.  You can fake it anyway.  Create a table with an ID column, and maybe a version column, then add a column of xml datatype.  Persist your objects to that column.  Done.
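For what it’s worth, the "fake it" approach looks roughly like this.  The sketch is in Python against SQLite, so a TEXT column stands in for the xml datatype you would use in SQL Server; the table name, the Customer class, and the to_xml/from_xml helpers are all made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    name: str
    city: str


def to_xml(obj):
    """Serialize a flat dataclass to a small XML string."""
    root = ET.Element(type(obj).__name__)
    for field, value in asdict(obj).items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild a Customer from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return Customer(**{child.tag: child.text for child in root})


conn = sqlite3.connect(":memory:")
# ID column, version column, and one column holding the serialized object.
conn.execute(
    "CREATE TABLE objects ("
    "id INTEGER PRIMARY KEY, version INTEGER NOT NULL, body TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO objects (id, version, body) VALUES (?, ?, ?)",
    (1, 1, to_xml(Customer("Ada", "London"))),
)

(body,) = conn.execute("SELECT body FROM objects WHERE id = ?", (1,)).fetchone()
print(body)
print(from_xml(body))
```

It only handles flat objects with string fields, which is exactly as much effort as "fake it" deserves.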

Developer friction is, in almost all circumstances, the least important problem.  I have to continue the ‘poor man’s kanban’ series, but if you do a value stream mapping of the entire ‘production line’ involved in your software processes, developer friction isn’t the bottleneck.  Focusing on that is time wasted that could be spent on other things.

This isn’t to say you shouldn’t try to reduce developer friction, or ignore CouchDB, db4o, blah blah blah.  Rock on with all of that.  And Rob will probably end up doing things that are helpful in this area.  He’s good at that.

But, really, ORMs aren’t a problem.  There’s nothing to worry about here.

posted @ Tuesday, December 29, 2009 7:20 PM | Feedback (8)
Remote connection to a SQL Server 2005/2008 instance

If you’re like me, you might have had an occasion or two where you needed to connect to an instance of SQL Server from a workstation or server that wasn’t where the SQL Server was actually located.  Because of the gosh darn important security measures that have been implemented in more recent editions of Windows Server and SQL Server itself, this might not work well by default (not to be critical of gosh darn important security measures, as they are gosh darn important).

As a note to myself, if you need to do this, there are a couple of things to check:

a) Check the protocols that are enabled for your SQL Server instance, using the SQL Server Configuration Manager.  Depending on your situation, you will need to enable TCP/IP and/or Named Pipes.

b) Open up Windows Firewall.  Yeah, I know, dangerous.  But not really.  Be sure to allow the SQL Server program within the Windows Firewall interface (usually found through Control Panel).  The executable is typically found at something like: C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\sqlservr.exe, though that will vary depending on which version you have installed.

c) If you do use TCP/IP, check to see if it is using a dynamic port.  Even once you open up Windows Firewall, you will need to make sure the client is hitting the correct port.
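Before blaming SQL Server itself, it can help to confirm from the client machine that the port is reachable at all.  Here is a minimal sketch in Python; the can_reach function is just a helper I made up, and it only proves that something accepts TCP connections on that port, not that it is SQL Server or that your login will work.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

You would call it with your own server name and port, for example can_reach("your-sql-server", 1433).  1433 is the default port for a default instance; if the instance is using a dynamic port, use whatever port SQL Server Configuration Manager shows for it.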

Once you know/remember what to check, it is usually easy to set this up.  Obviously, if the SQL Server is in a production environment, you have to be careful about what you are doing (well, you should be careful, regardless), but I find I typically have to do this when I need to connect a workstation to a Dev or QA SQL instance that is not local.

posted @ Monday, December 14, 2009 9:28 PM | Feedback (0)
Like I was saying, Operations is important

If you go take a gander at this, you’ll see that some relatively well known (and some relatively not well known….. http://www.fakeplasticrock.com ???….I dare you to read that quickly without doing a double-take, BTW) sites went down over the weekend due to a hard-drive failure on a dedicated managed server that was hosting various virtual machines.

Now, it would be easy to gloat and flame about this, and really, people should know better.  Same goes for some popular blogger sites that lose all their images when a server fails.  However, I won’t, for a number of reasons:

a) I’ve done Ops, and I think I was damn good at it, thank you very much.  It still was hard.  With so much to manage, there is a lot to forget or overlook.  Sure, you should have not only a backup plan, but also a recovery plan.  Everyone knows that.  If you tested your recovery plan last week, you are light years ahead of most people.  Have you tested it this week?  How do you know nothing has changed?  Do you have the resources to test your recovery plan every day, both in terms of people and hardware and time?  No, of course you don’t. 

b) Have you ever saved anything important to CD?  Yeah, me too.  You do know that CDs degrade over time, right?  So those important pictures you saved to CD in 2000 might be unreadable now.  You’ve checked this, right?  Sure you have.

c) The last time I had to deal with hardware failure personally, I did not have complete backups of everything.  I was lucky enough that the failing machine would stay up after a reboot for five minutes.  So, I had increments of 5 minutes over (something like) two days to copy off anything important.  That was fun.

d) The computer gods admire hubris, but they also punish it.  Vigorously.  When I ran an Ops department, I used to test the production SAN by yanking a hard drive out of a slot, just for the hell of it, just to make sure it worked.  The computer gods admired my testing, and punished me by making me accidentally run the batch script that turned off credit card processing on the entire web farm (except for one server) a few weeks later.  If I rag Haack too hard, my apartment will catch fire and burn to the ground along with all my hardware.  Or something.

In any event, I did think it important to re-iterate that “Operations”, my global catch-all term for all that non-programming stuff, is just as important as, if not more important than, all the nifty whiz-bang programming stuff.  An environment with a recovery plan and no separation of concerns trumps the opposite (and if you disagree, you are wrong, sorry…well, okay, usually.  There are exceptions).

Oddly enough, I ran into ‘Ops’ considerations last Friday.  I was asked to merge in some code from one environment to another one.  On a Friday.  The weekend before a very important series of events was occurring in the new environment.  But it was tested, right?  Sure, it was tested in a different environment, and the only differences are environmental variables, no problem, right?  Right.  Wrong.  After merging in all the code, and getting ready to leave for the weekend, it occurred to me that it wouldn’t hurt to run a small piece of the code, just to verify.  No problem.

FAIL.  So, spend 15 minutes or so to fix that and rerun it.  Okay, code is launching, looks like it might take a while to run.  For the heck of it, let’s just run one other piece of code, just to verify.  No problem.

FAIL.  Finally, after these figurative kicks to the nether regions, I thought about what I was actually attempting to do.  Merge in code on a Friday, in an environment that had passed important tests run the previous week, the weekend before this environment was going to go through an important series of events.  Code that I could make run without error.  Hell, that’s easy enough.  I generally can make failing code run without an error, even if I don’t know what the code does.   Read the error message, use experience combined with brain power, rinse and repeat, till code completes….Slight problem.  What if it actually matters what the code is doing, this Friday before the weekend before a week of an important series of events?

I made the executive decision, and rolled back the code merge.  Let them yell at me on Monday (which they didn’t….in retrospect, they agreed it was dumb to try).

Having said all of that, make sure your really important kool kidz stuff can survive a hard drive failure.  Seriously.

posted @ Monday, December 14, 2009 8:20 PM | Feedback (0)