July 2010 Blog Posts
A Fundamental Rule of Troubleshooting Software Bugs

In a previous post, I talked about a really annoying bug:

“Quantity 127546.00 for asset Blah in System A does not match Quantity 127546 for asset Blah in System B”

I didn’t want to go through and list out all of the different log messages and whatnot, so I paraphrased them into that sentence.

And even though I know better, I thus violated a fundamental rule of troubleshooting.  That wasn’t, obviously, the bug.

When you’re confronting a bug and have log messages of any sort related to it, read exactly what the messages are saying, and only what the messages are saying, and start from there.  Don’t immediately try to infer what they mean.  Pay attention.

I mean, it is *possible* that you could have a generally sophisticated validation system that couldn’t tell that 127546.00 and 127546 were identical values, but I think you’d have to try *really* hard to accomplish it.

Once I stopped and reflected on this, I re-read the available messages.  I then started with the basics: where exactly in the system were the individual log entries created, and what exactly were the situations that would cause them?  Dive deeper, rinse and repeat.

The log messages were saying that I had an asset Blah in System A with quantity 127546.00 that didn’t match an asset Blah in System B with quantity 127546.  They were not saying they didn’t match because of the quantities.  But since that was what caught my eye, I wasted time checking a part of the code that I just couldn’t imagine was failing (of course, if better analysis had led me to that part of the code, it wouldn’t have been a waste of time).

Once I actually focused on reading exactly what the messages said, it was pretty easy to determine that the messages weren’t logging the vital information of what wasn’t matching (for at least a vaguely defensible reason, though having incomplete log entries is really annoying).  And that led to the fix.

And I really do know better.  Back when I was on the hook every day for fixing live production bugs, following this fundamental rule was second nature.  I’m obviously out of practice.

Just because you think the log messages are saying something doesn’t mean they actually are. 

posted @ Wednesday, July 28, 2010 9:02 PM | Feedback (0)
The sort of bug that makes your teeth hurt

Scenario: generally sophisticated validation logic that compares the positions an account holds across two different sources to ensure ‘normalization’ across systems, well designed and functioning properly.  Suddenly, odd validation failures occur.

Generally sophisticated validation logic failing on the following case:

“Quantity 127546.00 for asset Blah in System A does not match Quantity 127546 for asset Blah in System B”

Eek.

posted @ Friday, July 23, 2010 6:02 PM | Feedback (0)
Jefferson Starship – Save Your Love (live)

Let’s get the ‘apologies’ out of the way:

  • Yes, I am posting a song from Jefferson Starship.
  • It was the eighties.  That’s how people dressed and wore their hair.
  • Calling the lyrics cheesy does them too much justice.  “Go on out and gain the world/ But don’t lose yourself while your trying / Your truth is changing everyday / But your heart will let you know when you’re lying”.  Right, let me grab a pencil and write that down.
  • The guitarist, Craig Chaquico, does look like he might have been separated at birth from Dana Carvey
  • Chaquico is not generally considered in the ‘league’ of guitarists like Steve Vai, Joe Satriani, Eddie Van Halen, etc. etc. etc.

Having said all that, the outro solo is a very well done piece of work.  It’s shorter and tighter on the official recording (though I’m guessing most people won’t want to rush out and buy an 80s album from Jefferson Starship, here’s the Amazon page for it), but given the cheesy, straightforward ‘rock ballad’ format, I think it still plays really well (I’ve actually tried to find if anyone ever created a tablature for it so I could take a shot at it myself, to no avail).

Ignore the hair and give it a chance.

posted @ Wednesday, July 21, 2010 7:40 PM | Feedback (0)
The Top Idea in Your Mind

Normally I read Daring Fireball just to see what the Apple Apologists Dedicated Users have to say about things going on, but there’s also some other good things that come through, including a post by Paul Graham entitled “The Top Idea in Your Mind”.  Teaser paragraph:

“I realized recently that what one thinks about in the shower in the morning is more important than I'd thought. I knew it was a good time to have ideas. Now I'd go further: now I'd say it's hard to do a really good job on anything you don't think about in the shower.”

YMMV, definitely, on how much you agree with the particulars, but it’s an interesting read.  Check it out.

posted @ Wednesday, July 21, 2010 7:16 PM | Feedback (0)
cqrs for dummies – an interlude – one way of implementing queries and commands

This may sound cynical, though I don’t mean it that way (for once), but there’s more than one way to skin a cat, and so I try to avoid advocating strict ways of writing certain types of code.  Having said that, I’ve found that writing queries and commands with a certain basic pattern has worked all right for me, at least recently.

Queries can be implemented using the following interfaces:

public interface IQuery<TReturnValue>
{
    TReturnValue Execute();
}

public interface IQuery<TParameterDefinition, TReturnValue>
    where TParameterDefinition : IQueryParameterDefinition
{
    TReturnValue Execute(TParameterDefinition d);
}

public interface IQueryParameterDefinition {}

Queries that don’t require any input parameters can be used with the first interface, while queries that do require input parameters use the second one.

The IQueryParameterDefinition interface is a marker that allows you to do things like:

public class MyQueryDefinition : IQueryParameterDefinition
{
    public int ParameterID { get; set; }
}

And then pass it into something like:

public class GetMyQuery : IQuery<MyQueryDefinition, string>
{
    public string Execute(MyQueryDefinition d)
    {
        // myService is whatever injected dependency actually runs the query
        return myService.GetQuery(d.ParameterID);
    }
}

Nothing fancy, but it gives you type-safe blah blah blah.  Obviously, you can create the QueryDefinition to have as many different parameters as you like.
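To show the shape of the whole thing end to end, here’s a minimal, self-contained sketch (the interfaces are repeated so it compiles on its own, and the ‘service’ is a fake standing in for whatever repository or data access the real query would hit — all concrete names here are mine, just for illustration):

```csharp
using System;
using System.Collections.Generic;

public interface IQueryParameterDefinition {}

public interface IQuery<TParameterDefinition, TReturnValue>
    where TParameterDefinition : IQueryParameterDefinition
{
    TReturnValue Execute(TParameterDefinition d);
}

public class MyQueryDefinition : IQueryParameterDefinition
{
    public int ParameterID { get; set; }
}

// Fake 'service' standing in for the real repository/data-access layer.
public class FakeService
{
    private readonly Dictionary<int, string> data =
        new Dictionary<int, string> { { 42, "Blah" } };

    public string GetQuery(int id) => data[id];
}

public class GetMyQuery : IQuery<MyQueryDefinition, string>
{
    private readonly FakeService myService = new FakeService();

    public string Execute(MyQueryDefinition d)
    {
        return myService.GetQuery(d.ParameterID);
    }
}

public static class Program
{
    public static void Main()
    {
        var query = new GetMyQuery();
        var result = query.Execute(new MyQueryDefinition { ParameterID = 42 });
        Console.WriteLine(result); // prints "Blah"
    }
}
```

In practice you’d resolve the query from your container and inject the service, but the calling code stays the same: build a definition, hand it to Execute, get a typed result back.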

When it comes to commands, one question that I’ve always ‘struggled’ with (I mean, it doesn’t keep me up at night or anything) is whether a command should return a value (note that this is a separate issue of whether they should execute in an async vs. sync manner).  After looking at various things, what I’ve settled with for now is just as basic:

public interface ICommand
{
    void Execute();
}

public interface ICommand<TReturnValue>
{
    TReturnValue Execute();
}

I can think of a whole host of philosophical reasons why either or both of these approaches violate some consideration or another, but so far, they’ve worked for me in the situations I’ve needed them.  YMMV.
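For what it’s worth, a minimal sketch of a command using the value-returning interface might look like the following (the command and its ‘store’ are invented for illustration — a real command would wrap whatever side effect you actually need):

```csharp
using System;
using System.Collections.Generic;

public interface ICommand<TReturnValue>
{
    TReturnValue Execute();
}

// Hypothetical example: add an item to a store, return the new item's index.
public class AddItemCommand : ICommand<int>
{
    private readonly List<string> store;
    private readonly string item;

    public AddItemCommand(List<string> store, string item)
    {
        this.store = store;
        this.item = item;
    }

    public int Execute()
    {
        store.Add(item);
        return store.Count - 1; // index of the newly added item
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new List<string>();
        var id = new AddItemCommand(store, "Blah").Execute();
        Console.WriteLine(id); // prints 0
    }
}
```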

posted @ Monday, July 19, 2010 9:30 PM | Feedback (0)
More Reasons Why Integration Tests Can Be More Important Than Unit Tests

Over at CodeBetter, Patrick Smacchia (the NDepend dude) has recently blogged a couple of posts about “Tests Driven Development” (not sure if the extra ‘s’ is supposed to signify something important or if that’s just what he calls it). 

I’ve written at other times about why I’m not a big fan of TDD, so I won’t go through all of that blah blah blah here, but some more events at clients have reiterated to me why integration tests are often much more important than unit tests.

Patrick talks about using code contracts as integral, and I agree with this.  He also talks about some variations of the ‘80/20’ rule:

So now we have another side of the 80/20 law. 80% of the effort is spent writing tests to cover the 20% of the code remained uncovered, but 80% of the issues and bugs are found during this effort.

I’ve always found the idea of trying to achieve 100% code coverage to be, well, kind of nutty, but in all honesty, this is, in my mind, as well-thought out a defense of the idea as I’ve seen in a while.  And succinct too.  I doubt this defense is provable in any way, shape or form, but that’s true about many things.

And yet, it still seems to miss a fundamental point.

At around the same time, it turns out that someone over at LosTechies has posted about an ‘anti-pattern’ which is neatly summed up in the title, “Too much of your application is about interacting with external resources.”  Well, neatly summed up except that I think it would be better described as “if you can’t test your application at all without connecting to live external resources, you’re kinda f%cked.”  The items that he lists are things I agree with, such as:

The majority of ‘unit tests’ require a database, or web and application server to be up and running.

This is very true, and yet, it still seems to miss a fundamental point.

It’s all about the data and your external systems

I’m going to go ahead and make a claim that is just as provable as Patrick’s, which is that I think most bugs in software do not come from bugs within the software itself, but bugs that arise because of the data that is entered into the software from external sources.

Which is to say that you can unit test your brains out to your heart’s content (mixed metaphor anyone?) but it won’t actually prove that your software works.  Why?  Because oftentimes the ‘bugs’ that appear in actual production operations are due to the nature of the data that you get from your external systems and/or how your software interacts with the nature of the data you send to your external systems.

What everyone who emphasizes ‘unit testing’ or TDD gets right is that, *of course*, given assumptions about the data you receive, your software should behave in predictable and testable ways, and you should be able to, well, test this in predictable ways.  That’s why interfaces are so neato keen.  You build your system to use interfaces so that you can have a healthy set of tests where you can dictate the input data without having to connect to external systems to do so, and can do so in an automated, predictable, reproducible fashion.  Since *so* many systems can’t do this as they are built, emphasizing unit testing, TDD, or whatever gets you on the right track to fixing the sort of issues that arise.  I’m totally on-board with that.

But, building your systems to get you to this level solves only a subset of the issues that software development faces.  What really matters, in the end, is finding out what will happen when you actually connect to external systems and run your software, and integration tests are the only way to really figure this out.

I’ll give some more specific examples in a second, but imagine an incredibly vague, abstract scenario like this:

1) Some data comes in from an external system.

2) Your ‘receiving application’ does stuff with that data.

3) Your ‘sending application’ sends the data-with-stuff-done-to-it to an external system (often different from the system in step 1).

4) The external system does stuff with the data-with-stuff-done-to-it and then sends it back to you.

5) Another ‘receiving application’ takes that data and does some other stuff.

Rinse and repeat various steps.

You can unit test step 2 by using mocks/stubs/whatever to dictate the shape of the data that you assume is going to come in via step 1.  You can unit test step 5 in a similar fashion, and you can unit test step 3 (more or less).  You *cannot* reliably and predictably unit test steps 1 or 4, because you cannot reliably and predictably test what external systems will *actually* send you, by definition.
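To make that contrast concrete, here’s a hedged sketch (every name here is invented) of unit testing step 2: you put the external feed behind an interface and use a stub to dictate the shape of the incoming data.  This proves your logic works *given* that data — it cannot prove the live system will actually send data of that shape.

```csharp
using System;

// Invented interface: whatever step 1's external system sends comes through here.
public interface IExternalFeed
{
    decimal GetQuantity(string asset);
}

// Step 2: the 'receiving application' does stuff with the incoming data.
public class Receiver
{
    private readonly IExternalFeed feed;

    public Receiver(IExternalFeed feed) { this.feed = feed; }

    // The 'stuff' here is trivially doubling the quantity,
    // just to have some logic worth asserting on.
    public decimal ProcessedQuantity(string asset) => feed.GetQuantity(asset) * 2;
}

// A stub that dictates step 1's data shape -- the part you CAN control in a unit test.
public class StubFeed : IExternalFeed
{
    public decimal GetQuantity(string asset) => 127546.00m;
}

public static class Program
{
    public static void Main()
    {
        var receiver = new Receiver(new StubFeed());
        // 127546.00m * 2 == 255092.00m
        Console.WriteLine(receiver.ProcessedQuantity("Blah"));
    }
}
```

The integration version of this test would replace StubFeed with the real feed client and assert on what actually comes back — which is exactly where the surprises live.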

And yet, steps 1 and 4 (which are multiplied when you are dealing with multiple external systems) are what actually determine if your software works in Production, which is really all that anyone cares about.

Let’s talk about some examples.

External systems are finicky mean things

Over the last several years, the types of clients I’ve worked with have involved either e-Commerce systems or financial systems.  ‘Financial systems’ is a bit vague, so for the sake of the discussion, let’s say that they involve supporting trading operations (but *not* real-time “gotta get this automated trade out to the market in microsecond” operations, which is another ball of wax).

When dealing with e-Commerce systems, you have to deal with product feeds, tax feeds, inventory feeds, changing APIs etc. etc. etc.  When dealing with financial systems, you have to deal with company information, holding information, asset information, changing APIs etc. etc. etc.

Let’s talk at a high level about some of these things:

1) In an ideal world, when an external system changes the data that they send you, either in terms of format or schema, you will know this ahead of time.  This doesn’t actually happen as often as you would hope.  In an e-Commerce system that I’ve worked with recently, the external source of tax information added a column which required a change in the internal logic of the ‘receiving application’ to act upon it.  We discovered this, of course, when the external source actually started sending the new data.

2) In an ideal world, when an external system claims to support a new API, you should be able to use the written documentation of the new API to change how your ‘internal’ applications work and send new data based on the new API.  Then you discover that they support the new API but only in their UAT environment, but not PROD.  Or only in PROD, but not UAT.

3) You’ve been sourcing data for domestic asset information from certain columns sent to you by the external system, and things work swimmingly.  And then some international asset information starts coming in, and it turns out that you really should have been sourcing the data from totally different (but related) columns, but you didn’t know it at the time.

These are just a few of the scenarios I’ve faced recently; there are many others.  The (I hope) obvious point is that you have no way of knowing from unit tests that your ‘internal’ software will actually work in PROD or UAT until you actually interact with the external systems.  And yet these are the common causes of ‘bugs’ that cause the production issues that you have to deal with, oftentimes at 3 in the morning.

How to set up your Specs

What I typically do when designing software is to start with Specs.  ‘Specs’ is just another word for ‘well crafted tests’, where they are well-crafted from agreed upon design requirements, hopefully designed by the business users.  I create separate code to deal with ‘Unit’ and ‘Integration’ specs. 

‘Unit’ specs test your internal business logic (whatever there is of it) and also test that you have wired up things properly.  Since I’m a cqrs fanboi, I tend to create code that follows very predictable patterns, such as, query hits service hits repository hits dataAccess, blah blah blah.  My ‘Unit’ specs define the path of the code and that the right methods are called in the right places.  Where I have intensive domain logic, I create specific specs around that logic, and this is where you do all of that mocking/stubbing/whatever stuff that we all know and love to know that given scenario A, we get output B, or whatever.

‘Integration’ specs actually hit your external resources.  Given that I send external resource data A, data B should be returned, in the format I expect, or whatever.  These are the tests that actually matter.  The actual interaction between your ‘internal’ software and the external resources is what determines if your software actually works. 

Summary

Unit tests are good things.  Software systems that are built around predictable inputs that pass unit tests are good systems.

But at the end of the day, integration tests that actually hit your external systems are more valuable.  You can never predict for certain how your external systems will behave until you actually hit them, and to a certain extent, there’s no integration test that will protect you completely.  Vendors do all sorts of wild and crazy things.  But integration tests are what tell you the current state of your software.

posted @ Friday, July 16, 2010 9:47 PM | Feedback (0)
A forthcoming comparison between SpecFlow and StoryTeller

Jeremy Miller has finally released the 1.0 version of StoryTeller, and the timing for me turns out to be fortuitous (I think that means ‘pretty good’…). 

From Jeremy’s own description:

  • StoryTeller is a tool for creating and using “Executable Specifications” against .Net systems.  StoryTeller could be called a Behavior Driven Development tool depending on which of the billion definitions of BDD you subscribe to, but is very much optimized for customer facing tests.  I meant StoryTeller for the older ideas of Acceptance Test Driven Development that predate BDD. 
SpecFlow is another tool that I’ve been looking at that sounds similar themes:

    SpecFlow aims at bridging the communication gap between domain experts and developers by binding business readable behavior specifications to the underlying implementation.

    Our mission is to provide a pragmatic and frictionless approach to Acceptance Test Driven Development and Behavior Driven Development for .NET projects today.

I specifically have not said that I’m making a comparison of SpecFlow vs. StoryTeller, as I think it would be a mistake to think that they both target exactly the same audience.  That said, I’m sure an interesting Venn diagram could be created that showed a significant intersection of functionality.

As it happens, I have started a new project that is perfectly suited (or perfectly enough) to make use of both of these, and to see which tool works best under which scenarios.  I’ll be happy if it turns out that one is clearly more suitable for my needs than the other, or, more likely, that they suit different needs well.

It should be interesting.

posted @ Wednesday, July 07, 2010 10:47 PM | Feedback (0)
AT&T Samsung Epix upgradeable to Windows Mobile 6.5

This post has the details.

One ‘interesting’ thing is that the upgrade paths only explicitly detail how to do it if you connect your phone to a computer running Vista or Windows XP.  There is no Windows 7 path; in fact, the fine print explicitly says that you can’t use either of the download tools with Windows 7 (my guess is that it required too much work to create a Windows 7-specific download tool so close to when Windows 7 phones will be available anyway).

I wonder what would happen if you set the Vista download tool to run under Windows 7 in Vista compatibility mode and tried to upgrade.  An interesting rhetorical question.  It would be foolish to actually try this, of course, since you risk bricking your phone.  Yep, interesting rhetorical question.

posted @ Wednesday, July 07, 2010 10:25 PM | Feedback (0)
RethinkDB : Another challenge to NoSql Orthodoxy

Take a look at RethinkDB, a drop-in replacement for MySql.

A central theme of NoSql orthodoxy is that ACID can’t scale.  The fact that companies like Amazon and Google (you may have heard of them) have ‘abandoned’ ACID is, in my mind, all the proof anyone needs that NoSql is a valid option for anyone who needs to consider data storage options (what an ugly sentence….but I digress).

What I like about things like VoltDB and RethinkDB is that they are attempting to rise to the challenge.  Can you have your ACID cake and eat it too?

I still think that if NoSql alternatives really take off, Microsoft and Oracle will rise to the challenge as well.  Eventually.  But, I think it is good to see open-source alternatives here.

Ultimately, I think that what the market wants is something that has ACID, has easy ad hoc query capabilities, is operationally manageable, and doesn’t succumb to scaling limitations.  I have no idea how feasible this is (unless memristors become a reality), but I think that’s what people would really like.  No one wants to throw out their SQL querying skills unless they have to.

But, that’s a guess.  I’d like to think it is an educated guess, but a guess nonetheless.

posted @ Thursday, July 01, 2010 6:36 PM | Feedback (0)