Recently, I interviewed for a new role within Microsoft. As is the norm here, the last 15 minutes of the interview allows the interviewee to ask questions of the interviewer. I usually ask the same two questions of each interviewer:
1. What do you think makes a good <insert position name here>?
2. Why did you choose to work for this group and what are your most and least favorite things about the group?
I was having a conversation about how my interviews were going with a co-worker and they asked me what my answers to those two questions would be. I thought it might make an interesting blog post as I am regularly asked by the high school and college students I speak to at Microsoft events about what they can do to "get into Microsoft" or the skills required to do my job. Many people think you need a ton of fancy degrees and certifications or loads of external work experience, and all of those definitely help.
As I thought about how I would answer the two questions, though, it became clear to me that question number one has an easy answer, but one that's becoming increasingly hard to find in the potential candidates I speak with. So here is my answer to question one above. Keep in mind this is MY opinion and doesn't have any bearing on the way groups inside of Microsoft hire.
What do you think makes a good support engineer?
For me, this is relatively simple: it comes down to two traits, and I think one trait begets the other. You must have a passion for technology first and foremost. I'm not talking about someone who "likes" technology and buys a lot of gadgets, although that's not a bad start. I'm speaking of someone who lives and breathes technology. Someone who looks for interesting ways to make technology more a part of their lives, or who even solves problems in their lives with technological answers. Does this mean that the person doesn't interact with humans? Hardly. You still need to be able to speak to people (respectfully, I might add), but I want you to be as comfortable speaking to a machine as to a person. And I would prefer that you're comfortable doing that with any machine and not just a computer. As an example, we recently got new phones here at work and I spent the better part of my lunch hour just trying to reprogram the ringtones (harder than it sounds) so that it would be annoying to the co-worker who sits across from me - sorry Scott. It should have been as easy as adding that dog whistle ringtone to the list, but once it wasn't, I dug into the phone and its software and tried to modify the WAV file to match the bitrate that the phone application was looking for in an attempt to get it to work. It's a small, childish example, but many people are prone to just get their phone, plug it in and make calls on it. I want someone who wants to play with the device to see how it works, even if for nefarious means.
The second trait is the ability to troubleshoot. As I said, I think an interest in technology begets an interest in troubleshooting and vice versa. However, this is becoming increasingly rare to find in candidates that I speak to. There are plenty of brilliant people out there, most much more so than me, but if you can't troubleshoot, you can't do my job no matter how intellectually gifted you are. Microsoft, like many tech companies, is fairly notorious for asking questions about manhole covers, light bulbs and moving Mt. Fuji. While we don't really ask questions like that any longer (it's a shame if you ask me), we do still test your ability to troubleshoot. I recently did a round of interviews of college candidates and the skill I am usually assigned to interview on is "problem solving". A question I love for this is an oldie but goody: "How do you troubleshoot a keyboard?". I love this question because of the possibilities it opens up, and it does give you information on how much a particular person has "played" with computers throughout their lives. Do they even think about PS/2? Do they try other devices, computers or combinations? How do they test that the keyboard they have in the scenario is good by itself? I get all sorts of answers, but rarely does someone give me more than 3-4 attempts before saying it must be a hardware problem and to buy a new keyboard. While I agree that's a fine answer, I really want someone who is going to exhaust every possibility they can (and maybe the aforementioned candidates are), especially if it's to the point where I can't think of anything else they could have done. Now that's a great candidate for the work I do.
Sadly, that's a rare and dying breed. If you're interested in critical thinking type questions, I highly recommend the book "How would you move Mt. Fuji?". It's a classic from 2003/2004: http://www.amazon.com/How-Would-Move-Mount-Fuji/dp/0316778494/ref=sr_1_3?ie=UTF8&qid=1336390776&sr=8-3
Although I haven't read all of it, I hear that the author's book on Google hiring is equally interesting: http://www.amazon.com/Are-Smart-Enough-Work-Google/dp/031609997X/ref=sr_1_2?ie=UTF8&qid=1336390776&sr=8-2
Great post! I think I'd agree: the easier devices get, the less we seem to feel the need to investigate why they do what they do and, when they break, whether there is a way we can make them go again. And honestly, any answer that doesn't include using a multimeter as part of a keyboard troubleshooting question should really be tossed ;-)
So what happened to your interview.. Did you take up the new role you set out for?
If yes, I hope you continue blogging and the ramblings of a support engineer turned servicing guy continue :)
Thanks for asking Rahul. I actually did accept a new role here at Microsoft but don't worry, the blog will not die. Right now I've just been quiet while we finish Windows 8. I'll have some more posts once we release about the "new hotness" in servicing as well as possibly some other topics.
Great!! So Come August and we will see you write more..
BTW.. I've been on W8 for quite some time now.. DP, CP and RP.. and it's interesting.. Everyone I've shown my RP laptop to did not know quite how to use it. I too did not know how things work. HATS OFF for making it so different.. Today, I am easily able to do the things that I do on W7, and quite often I start to hunt for the charms bar to turn off my Win 7 machine :)
Look forward to your posts...!!
If you have specific topics you're interested in, let me know and I'll see what I can do on them.
Can you share anything, on TechNet or via your blog, on how and what has been changed in Servicing and Clustering.. Any gotchas that we need to know about Cluster Aware Updates? I am keen to know what would happen when I have set up CAU and one of the nodes gets stuck.. somewhere... how will things revert.. how will things continue?
Sure thing, I'll probably do something along the lines of the changes in servicing as my first blog post after release. I do discuss some of the changes in the UTG for Servicing that was published back at beta.
"How do you troubleshoot a keyboard?".
As you suggest there are "well known" methods like swapping in "known good" replacements or trying the part on another machine. Then there are system level checks like looking in Device Manager or CMOS Setup. Other than these (good) options, I'd say my personal strategy is to first visualize the component in a hierarchy, or dependency chain. Where does the component "fit in", and what is it dependent on that could be the root cause of the problem? People in technology should understand the technical meaning of 'stack', and why this means that components cannot be regarded in isolation. Also, greater repair efficiency might sometimes be obtained by replacing an entire stack of components, rather than actually tracing a fault to its root. This would often be true in the case of software - because most stacks are software, and replacing bits is cheap, or at least it should be.
For example, I have a Vista machine that occasionally seems to lose its wireless configuration. There is nothing wrong with the Wi-Fi card, and the problem does not occur with Win7 on the same machine. The solution I've found is to uninstall the card device, either in Device Manager, or if I have an elevated cmd.exe open:
> devcon remove =net pci\*
> shutdown -r -t 0
Removing the card also resets any TCP/IP config and wireless config (profiles) related to the card, because this config is tied to (not just associated with) the card. So removing the device also removes and resets this data. After rebooting, the system PnPs the device, I type in my passphrase and reconnect to the local network. Of course it's not fun having to do this, but it always works. Also, I never really find out what the cause of the problem is, but should I care?
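As a rough illustration of the dependency-chain idea described above, here is a minimal Python sketch. It assumes the chain can be written as an ordered list of layers, lowest level first, each with a yes/no health check. The layer names and the observations are hypothetical, chosen only to echo the keyboard question; a real check would query the actual hardware or OS.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Layer:
    """One link in a dependency chain, listed lowest level first."""
    name: str
    check: Callable[[], bool]  # returns True when this layer looks healthy


def first_failing_layer(chain: List[Layer]) -> Optional[str]:
    """Walk the chain bottom-up and return the first layer whose check fails."""
    for layer in chain:
        if not layer.check():
            return layer.name
    return None  # every layer passed


# Hypothetical observations: the keyboard is plugged in and visible to the
# firmware, but the OS does not list it. Everything above that layer also
# fails, which is why the lowest failing layer is the one worth examining.
observations = {
    "cable and port": True,
    "firmware (CMOS Setup) sees keyboard": True,
    "OS lists device (Device Manager)": False,
    "driver loaded": False,
    "application receives keystrokes": False,
}

# Bind each observation as a default argument so every lambda keeps its own value.
chain = [Layer(name, lambda ok=ok: ok) for name, ok in observations.items()]

print(first_failing_layer(chain))
```

With these made-up observations the walk stops at the Device Manager layer, so that is where either tracing the root cause or replacing the stack from that layer upward would begin.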
Interesting take on the question Drew and I like the answer.
Thanks Joseph. So that gives me the confidence to generalize further. :-)
Most technical faults are going to have some sort of tension between repair and replace. If we were talking about the human body, the equivalent tension might be between 'operate on x' and 'transplant x'. In the case of IT, replacement can involve the entire sub-system that the apparently faulty component is part of, as this can avoid having to trace the fault to its root cause - at least immediately. There is also the intermediate option (in some cases) of combining repair and replace - that is, replace onsite and repair offsite. This is often going to be a good option where the aim is to get the end-user/customer back to 100% functional capacity ASAP, and repairing onsite is impractical because it lacks workshop conditions. There is possibly a fourth option in some circumstances: retire and add capacity. My understanding of very large datacenters, such as those of the major search engines, is that failed equipment is never repaired nor replaced *specifically*, but is instead simply retired. Continuously added capacity more than compensates for the loss. But this is getting a long way from a keyboard!
What causes the tension between repair and replace? Probably the two primary issues in the case of IT would be:
1. Economics. The relative costs plus time issues. There is also the specialization that is implied by regarding 'restore functionality' and 'find root cause' as discrete problems. This could have efficiency dividends.
2. Threat of or known loss of system state and/or business/end-user data. This would be at least partially dependent on how well the system or relevant sub-system implements the principle of separation of code and data.
Arguably, the following is *not* a good example of separation of code and data:
Of course, this on its own ignores a lot of complications and unintended consequences.
So for this, I would say you're overthinking it a little bit. I see where you're going but you were more on track with the first way of thinking. Understanding HOW something works is the most important key to troubleshooting it. From there, you check the fail points of that system. In this case, you understand how a keyboard works so it's all a matter of finding where that breaks down.
One of the things I usually do, mostly for the new college grads, is to intentionally plug a PS/2 keyboard into the mouse port. Most students don't know about PS/2 and assume everything is USB or Bluetooth. This quickly puts people into two camps: they run out of ideas and say "I don't know" or they start making up other reasons. For me, the first response is the right response.
So a year later the blog dies :(
Congratulations for your new role yet again.. All the best.. will look forward to your blogs on askcore when you post...