The ‘red-green-refactor’ crew often seems to denigrate the importance of so-called ‘integration’ tests. This isn’t always deliberate, but the usual complaint is that integration tests are ‘slow.’ And that’s often true: integration tests do take longer, because they have to hit real databases and browsers and whatnot, and that takes time.
But those databases and browsers *are* the actual application. The application is not its suite of unit tests, and it can never be fully covered by them. Your end users care only about how your application behaves in the real world. This is why you *have* to have tests that cover the actual user experience if you expect your tests to be relevant.
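To make the distinction concrete, here is a minimal sketch in Python (every name in it, `get_user`, `FakeDb`, and so on, is hypothetical rather than drawn from any particular project): the unit test stubs out the database and runs fast, but it would keep passing even if the SQL were broken, while the integration test runs the real query against a real, if in-memory, database.

```python
import sqlite3

def get_user(db, user_id):
    """Hypothetical data-access function under test."""
    row = db.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1]} if row else None

class FakeDb:
    """Unit-test stand-in: returns a canned row and never parses any SQL."""
    class _Cursor:
        def fetchone(self):
            return (1, "Ada")

    def execute(self, query, params):
        return self._Cursor()

def test_get_user_unit():
    # Fast and isolated, but it passes even when the SQL is wrong,
    # because the fake never executes the query.
    assert get_user(FakeDb(), 1) == {"id": 1, "name": "Ada"}

def test_get_user_integration():
    # Slower, but the real query runs against a real database, so a
    # schema mismatch or a typo in the SQL actually fails here.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users VALUES (1, 'Ada')")
    assert get_user(db, 1) == {"id": 1, "name": "Ada"}
```

Rename the `users` table and the unit test stays green; only the integration test notices.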
Here’s an actual example from Apollo 11:
> The PGNC System malfunctioned during the first live lunar descent, with the AGC showing a 1201 alarm ("Executive overflow - no vacant areas") and a 1202 alarm ("Executive overflow - no core sets").[6] In both cases these errors were caused by spurious data from the rendezvous radar, which had been left on during the descent. When the separate landing radar acquired the lunar surface and the AGC began processing this data too, these overflow errors automatically aborted the computer's current task, but the frequency of radar data still meant the abort signals were being sent at too great a rate for the CPU to cope.[7]
>
> Happily for Apollo 11, the AGC software executed a fail-safe routine and shed its low-priority tasks. The critical inertial guidance tasks continued to operate reliably. The degree of overload was minimal because the software had been limited so as to leave very nearly 15% available spare time which, wholly by luck, nearly matched the 6400 bit/s pulse trains from the needless, rendezvous-radar induced Pincs, wasting exactly 15% of the AGC's time. On the instructions of Steve Bales and Jack Garman these errors were ignored and the mission was a success.
>
> The problem was caused by neither a programming error in the AGC nor by pilot error. It was a procedural (protocol) and simulation error. In the simulator, the astronauts had been trained to set the rendezvous radar switch to its auto position. However, there was no connection to a live radar in the simulator and the problem was never seen until the procedure was carried out on Apollo 11's lunar descent when the switch was connected to a real AGC, the landing radar began sending data and the onboard computer was suddenly and very unexpectedly tasked with processing data from two real radars.[8]
Thankfully, the real-world experience turned out to be okay. The important point, from my perspective, is that the real-world behavior didn’t match the expected one.
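That mismatch has a direct software analogue. Here is a hypothetical sketch (none of these names or numbers describe the real AGC; the 6400 is borrowed from the quote above purely for flavor): a handler that has only ever been exercised against a polite simulated source can still drown the moment a realistic source delivers data at production rates.

```python
def process_readings(source, budget=100):
    """Hypothetical handler with a fixed per-cycle work budget.

    It overflows when the source delivers more readings than one cycle
    can absorb: the kind of load the Apollo simulator never produced,
    because its radar switch was never wired to a live radar.
    """
    readings = source.poll()
    if len(readings) > budget:
        raise OverflowError(
            f"{len(readings)} readings exceed the budget of {budget}"
        )
    return [r * 2 for r in readings]  # stand-in for the real processing

class SimulatedRadar:
    """Test double: always returns one tidy reading."""
    def poll(self):
        return [42]

class LiveRadar:
    """Production-like source: left switched on, it floods every cycle."""
    def poll(self):
        return list(range(6400))  # far beyond the handler's budget

def test_handler_against_simulator():
    # Passes, and proves very little about production behavior.
    assert process_readings(SimulatedRadar()) == [84]

def test_handler_against_live_source():
    # Only a test wired to a realistic source surfaces the overload.
    try:
        process_readings(LiveRadar())
        assert False, "expected an overload"
    except OverflowError:
        pass
```

The simulator-only test is the software equivalent of training with a radar switch that isn’t connected to anything.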
Obviously, nothing I’ve ever worked on has approached the importance of Apollo 11, but it has been a common experience on my projects that production bugs occur precisely because there was no way to test what would happen in production: no testing environment actually matched production.
I don’t want to suggest that it isn’t important to test developer code. It is. But, in the end, how the code behaves in production matters more, and covering that behavior requires relevant integration tests.