Larry has a great post about the myth of zero defects. As a release manager (or rather, bean counter), the issues discussed there are part of my daily life.
One interesting aspect of bug management that may not be immediately obvious has to do with automation. Years ago, product teams at Microsoft didn't have much of it. Many of the tools were in their infancy and we still did a lot of manual testing. Since then, automation has become an increasingly important part of the testing diet, which is great for many reasons.
On paper, it seems reasonable: with more automation you'll find bugs faster and verify fixes faster, saving testers the time they used to spend doing those things manually. You'd therefore hit that zero bug bounce sooner and get more new code written in the same time frame.
But there are many complexities - the two foremost on my mind are:
1. You have to schedule time to write the automation, of course, and that time replaces a chunk of the time the test team used to spend testing manually. But you also have to account for time to fix the bugs in the automation, just as you account for time to fix the bugs in the product code. This is something we didn't have to deal with years ago, so we've been learning how to adjust our planning to handle it. This is the kind of thing I deal with all the time as a release manager.

2. Automation is code. Code has bugs.
The increase in automation is a fantastic thing for us - we don't just release a product, we release a test product. That test product verifies the product operates as expected as the environment changes (OS service packs, etc.) and as the product itself changes with service packs. But overall it's been an eye-opening experience for me to see how the improvements in engineering excellence have impacted our scheduling (in both directions).
> 2. Automation is code. Code has bugs.
And for some reason, engineers who produce clear, well-structured shipping code let all that go when creating automated tests and write the most horrible, unintelligible spaghetti code imaginable. And then they're surprised when defects are traced to their test code. I've never understood why test code isn't treated the same as shipping code. I love reading The Braidy Tester's blog http://blogs.msdn.com/micahel/ for his rants against bad test code.
The other big issue with automated tests, in my mind at least, is that they test the exact same thing every time. Clearly automation is a huge productivity benefit, but manual testing will always find additional defects that automation will not. Ideally you'd combat this by constantly writing new tests even for old functionality; however, there never seems to be time for that :-(
The technique the OpenBSD team uses: when they find a bug (especially one with security implications), it's usually traceable to a library call that's easy to use incorrectly, so they scan the entire source tree for uses of that call and either replace it with something harder to misuse or check for common errors. They've even switched all uses of select(2) to poll(2), because select(2) FD_SETs can theoretically overflow.
This is precisely why I only write 100% bug free code.