May 2011 Blog Posts
Software Development Troubleshooting Rule #1: Read the damn error message

I did it again.  I know better.

Keeping in mind that my ego can be seen from space, one thing I am proud of is that I am as good at troubleshooting software development errors, whether in production or during development, as anyone I know.  I have met a few people who are my equal, but I’ve never met or read of anyone I would say was better (whereas I can easily say that of people when it comes to coding skill, SQL skills, networking skills, and so on and so on and so on).

Yet I did it again.  I just spent a couple of hours struggling to figure out why I kept getting an exception in my code, and I did that because I violated rule #1: read the damn error message.

Don’t just think you’ve read the error message.  Don’t skim it, and think it says something other than what it says.  Absolutely don’t assume that since you got an error message in an area of code that produced a previous error at some point, it must be the same error with the same resolution.  Read the damn message.

Sometimes error messages are very obscure (COM error messages, I mean you), and sometimes they might be misleading, but more often than not, the error message will tell you exactly what you did wrong; you only need to read it and pay attention to what it is telling you.

Even after you read the error message, you sometimes need skill to interpret it.  I can’t readily explain exactly what this skill is or how to develop it.  I think my Philosophy background helps a lot: a common mistake people make when reading an error message is to leap to conclusions about all sorts of possible causes, instead of focusing on the very little they actually know (the content of the error message) and working methodically from there.  Even then, there is a bit of an art to it that’s hard to articulate.

Back in the day, a co-worker who had taken over my previous position as head of operations asked me for help in troubleshooting an error (I don’t remember exactly what it involved; Microsoft Exchange Server, I think).  He worked very diligently at working through problems without asking for assistance, in part because he was replacing the guy (me) who had always been the ‘hero’ in solving operational issues and so (rightly) wanted to prove himself, but after 12-24 hours, he was stymied.

We were aided by the fact that a direct (but obscure) error message was being logged in Event Viewer.  As I have mentioned many times before, Google is often our guide, so I googled the message.  I quickly determined that the 4th listing in the search results looked promising and, within 10 minutes or so, the problem was solved.

My co-worker came to me later and asked me how I was able to determine that it was the 4th listing that was the right one (and had quickly dismissed the first three listings), as he’d also used Google in a similar fashion.  I couldn’t really give him an answer, since I couldn’t really explain it.  It just seemed like the right one to investigate.

Nevertheless, no matter your skill level or experience level, unless you are lucky, you are going to waste a lot of time if you don’t follow rule #1: read the damn error message.

posted @ Saturday, May 28, 2011 11:36 PM | Feedback (0)
Lucinda Williams – Seeing Black

A song spawned in reaction to the suicide of Vic Chesnutt, harsh and straightforward.

The ending guitar solo is provided by…Elvis Costello.  Yes, that Elvis Costello, and no, I didn’t know he could play guitar like that either.  Technique-wise, he’s no Steve Vai (obviously), but he rips it out in the style of Neil Young (in fact, if I hadn’t read that it was Elvis Costello, I would have bet at least $124 it was Neil Young).

Did you feel your act was a final truth
The dramatic ending of a misspent youth
Did you really feel you had all the proof
Did you feel your act was a final truth

Was it hard to finally pull the plug
Was it hard to receive that final hug
Did evil triumph over love
Was it hard to finally pull the plug

When did you start seeing black
Was it too much good you felt you lacked
Was it too much weight riding on your back
When did you start seeing black


Enjoy.

posted @ Thursday, May 26, 2011 7:51 PM | Feedback (0)
cqrs for dummies – 4 of N – the event publishing layer

Series link here.

As a reminder of what I’m talking about, here’s the picture from Mark:

[Image: Mark’s CQRS architecture diagram (DDDDivision_big)]

What I’m going to talk about is section #4 from Mark’s picture, and in particular I’m going to go over a number of concepts, including:

  • Eventual Consistency
  • Event Handler per view model
  • Persisting internal events and publishing external events inside a transaction
  • Publishing architecture

Eventual Consistency

In a previous post, I talked in detail about what Eventual Consistency is all about.  I will briefly recap here.

At the heart of cqrs in this architecture is that you issue commands with a void return.  The commands are sent through command handlers, which call into the relevant aggregate root to fulfill the command and which emit internal events as a result.  These events are persisted in the event store as well as published for external consumption, which is how the read-only store used by queries gets updated.  Subsequent queries can then see those updates.
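To make that flow concrete, here is a minimal sketch of the command side in TypeScript.  All of the names (DeactivateItemCommand, InventoryItem, and so on) are hypothetical, and the persist-and-publish step is stubbed out behind a callback:

```typescript
interface DomainEvent {
  readonly aggregateId: string;
}

class ItemDeactivated implements DomainEvent {
  constructor(public readonly aggregateId: string) {}
}

class DeactivateItemCommand {
  constructor(public readonly itemId: string) {}
}

// The aggregate root fulfills the command and emits an internal event.
class InventoryItem {
  private uncommitted: DomainEvent[] = [];
  constructor(public readonly id: string) {}

  deactivate(): void {
    this.uncommitted.push(new ItemDeactivated(this.id));
  }

  pendingEvents(): ReadonlyArray<DomainEvent> {
    return this.uncommitted;
  }
}

// The handler returns void: the caller learns nothing directly, and the
// results only become visible later, via the eventually consistent read store.
class DeactivateItemHandler {
  constructor(
    private load: (id: string) => InventoryItem,
    private commit: (events: ReadonlyArray<DomainEvent>) => void, // persist + publish
  ) {}

  handle(cmd: DeactivateItemCommand): void {
    const item = this.load(cmd.itemId);
    item.deactivate();
    this.commit(item.pendingEvents());
  }
}
```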

What’s important to note here is that there is no single transaction that wraps this process end to end.  Though there is a transaction involved, it does not extend from the issuance of the command to the update of the read-only store.  There’s a temporal gap here: a command is issued and even when it is successful, there is no guarantee that the result is immediately available for querying.

From a certain traditional perspective, this seems to be a flaw.  How do you know from the querying side when the update is available to be read?  The answer is that, strictly speaking, you don’t.  In a well-built architecture, it will probably only be milliseconds later, but the key word here is “probably.”

However, what appears to be a flaw is in fact, when looked at from another perspective, par for the course for most applications, and is already acceptable to the business.  As long as the read-only store is eventually consistent with the results of successful commands, this temporal gap is perfectly fine.

Why eventual consistency is acceptable

There are many examples that can explain why eventual consistency is acceptable; let me describe two of them, both centered on a standard e-Commerce store.  And, to highlight the important points, let us assume that it isn’t using cqrs, but instead uses a common standard: the main database handles all transactions, and then replicates them to a reporting database using stock replication features found in major OLTP systems, such as Replication within Microsoft SQL Server.

Suppose the marketing department has created a new email campaign and wants to get a ‘real-time’ report of how well that campaign is working.  The head of the marketing department generates the ‘real-time’ report and prints it out to carry into a meeting.

The first thing to note is that replication has a built-in temporal lag.  Depending on how robust the replication infrastructure is, this lag might only be seconds behind the main database.  In some instances, under heavy traffic, the lag could be longer.  Regardless, there is already a lag built into the system.

Suppose for a moment that the report is generated off of the main database instead of a reporting database, so that there is no replication lag.  The moment after the report is generated, it is, theoretically, out of date.  It doesn’t capture the orders that are created and persisted after the report is generated.  From the time the report is generated to the time it is discussed and analyzed in the meeting, it is out of date.  But this is okay, as the business can operate successfully regardless.  It doesn’t actually need an up-to-the-millisecond-accurate report.  It just needs one that is more or less up to date within a reasonable time frame.

Now let’s look at it from the perspective of a user browsing the e-Commerce store.  A typical scenario is that only product with available inventory is viewable on the site (to prevent sales of products that can’t be back-ordered, for instance).  The user browses to the category of interest and picks a product they are interested in.  The web site generates a product detail page to give the user full information on that product, so that they can examine it and decide whether or not they wish to purchase it.

The moment after the product detail page is generated and displayed to the user, it is, theoretically, out of date.  For popular items, there is absolutely no guarantee that the product won’t have all of its inventory gone by the time the user gets around to attempting to add the item to their shopping basket.  Even if they can successfully add it to their shopping basket, there is absolutely no guarantee that it will still be available by the time they initiate the checkout process to purchase it.

What this highlights is that eventual consistency is a fact of life/business that exists whether or not you build a system architecturally designed around it.

Why you might want to architecturally design with eventual consistency in mind

In a word, “scalability.” 

When I was working with high volume e-Commerce stores involving properties like NASCAR and the NBA, it was a common theme that we wanted to limit the usage of our main database to taking people’s credit card numbers.  This is why we used replication.  We didn’t want to limit the scalability of our main database because it needed to run reports or because we needed to generate product detail pages off of it.  We did those off of the replica (actually, it was a little more complicated than that, but you should get the idea).  We obviously wanted our reports to be accurate, and we obviously wanted to only display product detail pages that had product we could sell, but at the end of the day, given the choice between limiting the number of orders we could process and limiting the number of times we displayed an ‘out of date’ product detail page, we chose the latter.

What’s important to note here is that, no matter what, you always have to make this choice.  It isn’t as if adopting cqrs changes this.  What cqrs can do is allow you to break down the process flow of your application in such a way as to optimize that flow (more on this below).

cqrs still insists on an important transactional component: creating internal events and publishing them externally

If your aggregate roots act on the commands they receive and produce internal events that are then persisted in the event store, you want to ensure those events are available externally for consumption.  If a tree falls in a forest, no one cares whether it makes a sound, but if an aggregate root produces an internal event, you want to make sure it makes a sound, and so a failure to publish it externally should fail the operation and throw an exception.
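Here is a sketch of what that can look like, assuming a store and a bus with roughly these shapes (both interfaces are placeholders, not any particular library’s API):

```typescript
interface DomainEvent {
  readonly aggregateId: string;
}

// Placeholder interfaces; a real system would back these with an actual
// event store and something like MSMQ.
interface EventStore {
  begin(): void;
  append(e: DomainEvent): void;
  commit(): void;
  rollback(): void;
}

interface ExternalBus {
  publish(e: DomainEvent): void;
}

function persistAndPublish(
  store: EventStore,
  bus: ExternalBus,
  events: DomainEvent[],
): void {
  store.begin();
  try {
    for (const e of events) {
      store.append(e); // persist the internal event
      bus.publish(e);  // make sure it makes a sound externally
    }
    store.commit();
  } catch (err) {
    // A publish failure aborts the whole transaction and surfaces as an
    // exception to the caller, rather than being silently swallowed.
    store.rollback();
    throw err;
  }
}
```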

event handler per view model

Replication typically involves a (more or less) straight one-for-one update from your main database to your replica.  The schemas are typically (more or less) identical.  This can cause performance problems when querying the replica, as you have to join across multiple tables to get the data your views need to display.

A well designed cqrs system will allow you to have one published event update one or more read-only store tables, so that your reporting queries or your UI queries are optimized to deliver exactly the information needed exactly when it is needed.
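As a sketch, here is one published event feeding two differently shaped view models.  The event, the ReadStore interface, and the table names are all hypothetical:

```typescript
interface OrderPlaced {
  orderId: string;
  customerId: string;
  total: number;
  placedAt: Date;
}

// Hypothetical key/value-style read store standing in for denormalized tables.
interface ReadStore {
  get(table: string, key: string): Record<string, unknown> | undefined;
  upsert(table: string, key: string, row: Record<string, unknown>): void;
}

// Feeds the "recent orders" grid: exactly the columns the view displays,
// so the UI query needs no joins.
class RecentOrdersViewHandler {
  constructor(private store: ReadStore) {}

  handle(e: OrderPlaced): void {
    this.store.upsert("recent_orders_view", e.orderId, {
      orderId: e.orderId,
      customerId: e.customerId,
      total: e.total,
      placedAt: e.placedAt.toISOString(),
    });
  }
}

// Feeds a daily sales report off of the very same event.
class DailySalesViewHandler {
  constructor(private store: ReadStore) {}

  handle(e: OrderPlaced): void {
    const day = e.placedAt.toISOString().slice(0, 10);
    const row =
      this.store.get("daily_sales_view", day) ?? { day, orders: 0, revenue: 0 };
    this.store.upsert("daily_sales_view", day, {
      day,
      orders: (row.orders as number) + 1,
      revenue: (row.revenue as number) + e.total,
    });
  }
}
```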

cqrs publishing architecture

I don’t think you can describe an ‘ultimate’ architecture without understanding the needs of the application in question.  There is no one size fits all solution.  However, there are some guidelines that I think are useful to keep in mind.

As mentioned above, your read-only store should be optimized to provide the exact information needed when it is needed, and so it should probably be significantly denormalized.

Just as SQL Server Replication can fail but be restarted, either from scratch or from the moment of failure, a cqrs architecture should allow for the same.  Since you are storing all of the events that are produced internally, you should have a set of mechanisms that allows you to replay all of those events, either from scratch or from the moment of failure.  Because you don’t want to reprocess previously placed orders (and, for example, re-trigger credit card processing), you should have some mechanism in place that identifies an externally republished event as a replay.
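One way to sketch that is with an envelope carrying an isReplay flag that the replay mechanism sets; the envelope shape here is an assumption, not a standard:

```typescript
interface OrderPlaced {
  orderId: string;
  amountCents: number;
}

// Hypothetical envelope: the replay mechanism sets isReplay to true when it
// republishes events out of the event store.
interface EventEnvelope<T> {
  event: T;
  isReplay: boolean;
}

class OrderProjectionHandler {
  // Safe to run on replay: it only rebuilds query-side state.
  handle(env: EventEnvelope<OrderPlaced>): void {
    console.log(`projecting order ${env.event.orderId} into the read store`);
  }
}

class CreditCardChargeHandler {
  // Not safe to run on replay: charging a card twice is exactly what we
  // are trying to avoid.
  handle(env: EventEnvelope<OrderPlaced>): void {
    if (env.isReplay) return; // skip the side effect during a replay
    console.log(`charging ${env.event.amountCents} cents for order ${env.event.orderId}`);
  }
}
```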

summary

cqrs may be new in name, but it is definitely not new in terms of the concepts that underlie it.  The flow of such a system, generically, can be described thusly:

command triggered –> command handled –> aggregate root acts on the command it receives –> aggregate root publishes internally the event that results –> internal event is persisted in the event store –> internal event is then published externally –> external event is consumed by the read-only store –> read-only store is queried in an eventually consistent manner

Each of these pieces of the flow can be scaled separately.  If you are using something like MSMQ, you can have totally separate queues along the way for different command handlers or event handlers, so that the highest-traffic ones can be on different sets of hardware, for instance; a toy sketch of that routing idea follows.
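The queue names and message shape here are made up, and a real implementation would sit on top of MSMQ or a similar queueing system:

```typescript
interface Message {
  kind: string;
  payload: unknown;
}

class QueueRouter {
  // kind -> queue name; the highest-traffic kinds get dedicated queues
  // that can live on their own hardware.
  private routes = new Map<string, string>([
    ["PlaceOrder", "queue.orders"],
    ["OrderPlaced", "queue.readmodel"],
  ]);

  constructor(private send: (queue: string, msg: Message) => void) {}

  route(msg: Message): void {
    this.send(this.routes.get(msg.kind) ?? "queue.default", msg);
  }
}
```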

going forward

The flow that I just described makes the most sense (in my mind) when you have an application that applies DDD principles.  The ways in which this can be explored are readily available through Google.  Where I find it most interesting is when you think of cqrs in its purest/strictest form, where commands and queries are separate objects, but in applications that don’t necessarily involve aggregate roots.  ETL scenarios are an obvious candidate, but there are many others.

I hope that I have been able to lay out the very basic details of cqrs for dummies, from a dummy.  I will continue to update this series based on what I learn in the future.

posted @ Wednesday, May 25, 2011 11:43 PM | Feedback (0)
Repost: Unblocking files you get from the internet

Every once in a while, I need to do this (usually when I get a new machine), and I can never remember exactly where Sergio’s post is, so I’m re-posting it here so I have a 3% better chance of finding it next time.

This gets rid of the need to click the unblock button when you download zips or whatnot from the InterWEB thing.

Run gpedit.msc, go to User Configuration/Administrative Templates/Windows Components/Attachment Manager, and enable “Do not preserve zone information in file attachments.”  Then, from a command prompt, run Gpupdate /force.

posted @ Sunday, May 22, 2011 12:47 PM | Feedback (0)
Concrete Blonde – Side of the Road

A song that they never played live (at least at any of their concerts I attended), a great song about sadness and loss.  Or something like that.  Can’t personally relate, of course.

i can remember
us laughing in bed
hung over
happy
and holding our heads
we didn't care about what people said
it's hard recognizing a dream that's gone dead
feeling my liquor
feeling alone
nowhere to go
so I guess I'll go home
you were the first and the only one
by the side of the road

Enjoy.

posted @ Monday, May 16, 2011 7:48 PM | Feedback (0)
cqrs for dummies – example – when returning a value from a command makes sense

In a previous post, I talked about why I have broken from tradition and created command handlers that return a value.  I want to give a brief example of where that makes sense.

ETL is a legitimate software development project

There are many times when I develop software that involves heavy UI usage and/or something like a domain in the sense of DDD.

However, there are many other times when I’m doing something else, and that is when I am developing ETL software.  ETL (Extract, Transform, Load) is definitely less ‘sexy’ but is often a crucial part of many software projects.  You need to take data from somewhere (often from multiple somewheres), alter it, and load it somewhere else.

Most of these projects are procedural and boring.  Do step 1, then step 2, then blah blah blah, till step x.

CQRS fits well here.  You have commands and you have queries.  You read some data, then issue commands based on that data.

Here’s the thing.  In an ETL sort of situation, you want to know that your commands succeed before you go to the next step, because you need the results of those commands, one way or another.  You aren’t in an eventual consistency situation, and you aren’t in a scalability situation where you need to issue huge numbers of commands in parallel.  You simply need to know that the command you issued succeeded (and possibly get some information back).

A simple situation: you issue a command to go and get a file from an external source.  As a result of that command, the file might end up in some particular location.  Regardless, the command has to succeed before the next step occurs, as the file has to have been retrieved.

Sure, you could set up an infrastructure where you send a void command that gets handled and then produces events that are published to some mechanism that records them in some read-only query result that then tells you that you can move on to the next step.

The obvious question is, why in the hell would you want to do that?  Why not just let the command return that it was successful, along with whatever data you need to move on to the next step?
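Here is a minimal sketch of that, assuming a hypothetical FetchFileCommand; the point is simply that the handler’s return value carries both success and the data the next step needs:

```typescript
class FetchFileCommand {
  constructor(
    public readonly sourceUrl: string,
    public readonly targetDir: string,
  ) {}
}

interface FetchFileResult {
  succeeded: boolean;
  localPath?: string; // where the file landed; the next step needs this
  error?: string;
}

class FetchFileHandler {
  handle(cmd: FetchFileCommand): FetchFileResult {
    try {
      // ...download cmd.sourceUrl into cmd.targetDir here...
      return { succeeded: true, localPath: `${cmd.targetDir}/extract.csv` };
    } catch (err) {
      return { succeeded: false, error: String(err) };
    }
  }
}

// The procedural ETL flow: the transform step simply does not start until
// the extract step's result says the file is actually there.
const result = new FetchFileHandler().handle(
  new FetchFileCommand("https://example.com/data.csv", "/tmp/etl"),
);
if (!result.succeeded) throw new Error(result.error);
// ...proceed to the transform step using result.localPath...
```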

It isn’t CQRS if you let commands return a value

You can argue semantics all you want.  A command that returns a value isn’t the style of CQRS that involves sending commands into a domain that produces events which are then published and subscribed to in a way that allows a query to read them off of a read-only data store.  No doubt about that.  But, IMO, none of that is inherent in any and every CQRS situation.

YMMV. 

posted @ Saturday, May 14, 2011 11:36 PM | Feedback (0)
How to get a series on the Food Network

Step one – Pick a contestant skill level: incompetent, novice, ‘foodie’, professional

Step two – Pick a theme: could be a particular food type (e.g., cake), could be a particular food technique (e.g., grilling), could be just a general competition in order to become the next Food Network Star.

Step three – Pick a judging panel: should be (though isn’t always the case) relevant to what you picked in steps one and two.  If possible, include at least two of the following: generally effeminate guy, guy with bad hair, fairly smoking hot woman.  This helps satisfy the widest range of demographics.

Step four – Have them eliminated in a tension-filled scene in front of the judges.  Ideally, string it out over multiple weeks or months.

Voila.  You have a hit TV show. 

I have a write-up treatment featuring novice pickling contestants entitled “Strut Your Kraut” that’s bound to be a hit.  Please don’t steal it.

posted @ Friday, May 13, 2011 8:02 PM | Feedback (0)
Job posting snippet that probably should have been run through the marketing department

Must….resist….obvious….joke

“I need an Informatica Developer w/ BO for my direct client “

posted @ Wednesday, May 11, 2011 7:03 PM | Feedback (0)
Tekpub improvements after AWS outage

Since I brought it up, I thought I’d provide closure by pointing to how Rob (as expected) is improving the infrastructure of Tekpub:

“That AWS outage cost us a lot and I will not let it happen again. The main issue was my “freezing” when it came to a move - I was too afraid that I’d mess up our orders data and while I generally am OK with messing up - I am absolutely NOT OK when it comes to $$$.

To that end I’m looking to move our commerce bits over to Shopify. Their level of integration is amazing and I’ve been working with their team on how to offload our stuff over to them, so I can focus on building out the video bits.

Moving to Shopify means that if Tekpub goes offline - I just need to move the video portal and profile bits over - which is a rather quick thing to do. No order information will be lost, and the upkeep is reduced.

As an existing user - this won’t matter to you at all. Nothing will change with respect to your account and your ongoing memberships etc. As a new user - you’ll have a bit more of a “comfortable” checkout process with improved PCI compliance.”

Now, I have no idea how great (or not) Shopify is, but it looks good.  Plus, the dude’s smart (even if overly sensitive), so now that he’s lived through that great hell of a site outage, I would imagine he did his due diligence.

I really wish Rob would put some content on Tekpub about infrastructure stuff.  How did he choose Shopify?  How do they provide extra benefits in terms of safety and backup?  How does it improve PCI compliance (I’m really glad to see he looked at this)?

Way too many people think that the actual writing of code is the most important and/or difficult part of software development, when it oftentimes isn’t.  The operational stuff that I talk about is admittedly boring, but oftentimes more important.  This is one of the many reasons why I’m suspicious/dismissive of Software Craftsmanship, but I digress.

posted @ Monday, May 02, 2011 7:11 PM | Feedback (1)
Words of Wisdom – My Younger Sister

In discussing the music of Epica:

“Their death grunting even makes sense live, though it usually seems rather silly to me.”

If I’ve said it once, I’ve said it a thousand times.  You can never have too much death grunting.

Death grunting?

posted @ Monday, May 02, 2011 6:23 PM | Feedback (0)
Job posting snippet sign of the Apocalypse?

Hi,

Position: Project Manager

Location: NYC, NY

Rate: Open

Duration: 18 Months

Required:
* Waterfall.  Thats the only absolute requirement. 

posted @ Sunday, May 01, 2011 5:44 PM | Feedback (0)
Peter Wolf – Nothing but the wheel

A great “driving because my heart is broken” song.  Not that I know anything about that personally, of course.

I’ve been trying to drive you off my mind
And maybe that way baby, I can leave it all behind
And the only thing I know for sure
Is you don’t want me anymore
And I’m holding on to nothing but the wheel

Great backing vocal by Mick Jagger.

Enjoy.


posted @ Sunday, May 01, 2011 12:36 AM | Feedback (0)