At a customer site, I was asked the following question:
“In our VDI environment, how do we capture trace information? How can we set up a network capture and an xperf trace that the user can start (or that a logon script starts), and then give them a link to click when a random, non-reproducible problem occurs?”
So what data do we want? A trace that Netmon 3.4 can read, and an xperf trace that includes stack walking.
So I want to use the following commands:
netsh trace start scenario=LAN,RPC capture=yes report=yes tracefile=<path to file\netmon.etl> maxSize=512 fileMode=circular overwrite=yes
(This collects a network trace in ETL format, which Netmon 3.4 can read, and generates an HTML report. The trace is written to a circular ETL file in the directory you specify and cannot grow larger than 512 MB. If a log file already exists at that path, it is overwritten.)
xperf -on disk_io+dispatcher+latency -f <path to file\xperftrace.etl> -MinBuffers 1024 -MaxBuffers 1024 -MaxFile 1024 -FileMode Circular -stackwalk cswitch+readythread+threadcreate+profile
(This collects an xperf trace to the path specified, using a fixed pool of 1024 in-memory buffers and limiting the trace file to 1 GB. Again it is a circular log, with stack walking enabled for context switches, ready-thread events, thread creation, and profile samples.)
We place these two commands in a batch file and have that batch file run as administrator. The user can be educated to double-click it at logon, we can place it in the logon script, and so on. The delivery method doesn’t really matter as long as the traces start before the user begins working.
When/if the problem occurs, we run a second batch file, also elevated, that contains the following commands:
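As a rough sketch, the start batch file could look something like the following. The C:\Traces folder is a hypothetical placeholder for the path to your trace files, the "net session" call is just one common trick for detecting an elevated prompt (it assumes the Server service is running), and xperf is assumed to be on the PATH. Adjust all of these to your environment.

```batch
@echo off
rem Hypothetical output folder; replace with your own trace path.
set "TRACEDIR=C:\Traces"

rem "net session" fails when the prompt is not elevated (assumption:
rem the Server service is running). Both traces need elevation.
net session >nul 2>&1
if errorlevel 1 (
    echo Please run this script as administrator.
    exit /b 1
)

rem Create the output folder if it does not exist yet.
if not exist "%TRACEDIR%" mkdir "%TRACEDIR%"

rem Start the circular network trace (512 MB cap).
netsh trace start scenario=LAN,RPC capture=yes report=yes tracefile="%TRACEDIR%\netmon.etl" maxSize=512 fileMode=circular overwrite=yes

rem Start the circular xperf kernel trace (1 GB cap) with stack walking.
xperf -on disk_io+dispatcher+latency -f "%TRACEDIR%\xperftrace.etl" -MinBuffers 1024 -MaxBuffers 1024 -MaxFile 1024 -FileMode Circular -stackwalk cswitch+readythread+threadcreate+profile
```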
netsh trace stop
xperf -d <path to file\merged.etl>
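A minimal sketch of that second batch file, under the same assumptions (C:\Traces is a hypothetical placeholder for your trace path, and xperf is on the PATH), might be:

```batch
@echo off
rem Hypothetical output folder; use the same path as the start script.
set "TRACEDIR=C:\Traces"

rem Stop the network trace; netsh finalizes the ETL file and the report.
netsh trace stop

rem Stop the xperf kernel trace and merge it into a single ETL file.
xperf -d "%TRACEDIR%\merged.etl"
```

Both commands can take a while to finish, so the user should leave the window open until the prompt returns.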
Note: There is a caveat. If you are tracing (stack walking in particular) on 64-bit Windows, you must set “DisablePagingExecutive” in the registry to 1, and the machine must be rebooted for the change to take effect. This command sets the value:
reg add "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" /v DisablePagingExecutive /d 0x1 /t REG_DWORD /f
Then we can have the user notify us when the issue has occurred, collect the files that were logged, and analyze them.