Useful tools and techniques to monitor system performance on a Windows computer:
1- Configure Perfmon to capture data in .blg (binary log) format, using the logman utility and Task Scheduler
2- Use the perfmon flowchart (VSBS document)
3- Create a report using the VSBS PowerPoint template
4- Alternatively, also use the Sysinternals tools and Server Performance Advisor
Which Tools?
Xperf, XperfView (Windows 7 and later): available from the Windows ADK
Perfmon (NT up to Windows Server 2003): record the Performance Monitor output in .blg format, then load this log file into the newer Windows 7 Perfmon for analysis

Note:

For more information on mmc.exe /32, refer to http://support.microsoft.com/kb/891238/en-us.

Use a sample rate of 5 minutes. Use a Perfmon alert to notify IT admins, for example when Available MBytes drops to 10 MB or below.

Windows Reliability and Performance Monitor (the evolution of Perfmon) for Vista or later (also called WRPM):
 perfmon /sys ; starts only the Performance Monitor, formerly System Monitor
 perfmon /res ; starts only the Resource Monitor
 perfmon /report ; runs only the diagnostics report, collecting for 60 seconds, and displays the results
 perfmon /rel ; starts only the Reliability Monitor
  

Reliability Monitor helps track the history of software installations and uninstallations, and of miscellaneous failures, over time.

For more information on how to use Reliability Monitor to track multiple systems, refer to the following links:

– Using Reliability Monitor: http://technet.microsoft.com/en-us/library/cc722107.aspx

– Start Reliability Monitor: http://technet.microsoft.com/en-us/library/cc748864.aspx

– View Reliability Monitor on a Remote Computer: http://technet.microsoft.com/en-us/library/cc722052.aspx

– Enable Data Collection for Reliability Monitor: http://technet.microsoft.com/en-us/library/cc766393.aspx

– Understanding the System Stability Index: http://technet.microsoft.com/en-us/library/cc749032.aspx

How to rebuild perfmon counters?

– For XP up to Windows Server 2003, use the first method or a ‘new’ dedicated tool called the Performance Counter Rebuild Wizard (PCRW)

Note:

For more information on KB300956, visit http://support.microsoft.com/default.aspx?scid=kb;EN-US;300956.

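– For Vista or later, use only the lodctr command-line tool (for example, lodctr /R rebuilds the performance counter settings from the system backup store)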

Sysinternals tools (www.microsoft.com/sysinternals): procexp (Process Explorer) and procmon (Process Monitor) are the two most important tools
typeperf: writes perfmon counter data to a text file (used in conjunction with logman). Note: For more information about Typeperf, visit http://technet.microsoft.com/en-us/library/cc753182.aspx.
logman: the command-line utility for perfmon
tracerpt: exports .etl trace files to CSV

  Example: logman create counter BlackBox -v mmddhhmm -cf counters.txt -si 05:00 -f bincirc -o "c:\Perflogs\Blackbox_%computername%" -max 250
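The counters.txt file passed to -cf is a plain-text list of counter paths, one per line. As a minimal sketch, the Python script below generates such a file from the counters discussed later in this document and assembles the same logman command line as an argument list. The counter selection and the helper names (counter_file_text, build_logman_args) are illustrative assumptions; the script does not run logman itself.

```python
import os
from pathlib import Path

# Counter paths taken from the guidelines in this document; the selection
# itself is an illustrative assumption, not an official list.
COUNTERS = [
    r"\LogicalDisk(*)\% Idle Time",
    r"\LogicalDisk(*)\Avg. Disk sec/Read",
    r"\LogicalDisk(*)\Avg. Disk sec/Write",
    r"\PhysicalDisk(*)\Avg. Disk Queue Length",
    r"\Memory\Available MBytes",
    r"\Memory\Pages/sec",
    r"\Memory\Pool Nonpaged Bytes",
    r"\Memory\Pool Paged Bytes",
    r"\Memory\Free System Page Table Entries",
    r"\Processor(*)\% Processor Time",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\Network Interface(*)\Current Bandwidth",
    r"\Network Interface(*)\Output Queue Length",
    r"\Process(*)\Private Bytes",
    r"\Process(*)\Handle Count",
    r"\Process(*)\Thread Count",
    r"\Process(*)\Working Set",
]


def counter_file_text(counters: list[str]) -> str:
    """Return the -cf file contents: one counter path per line."""
    return "\n".join(counters) + "\n"


def build_logman_args(name: str, counter_file: str, output: str) -> list[str]:
    """Mirror the BlackBox example: 5-minute interval, circular binary log, 250 MB cap."""
    return [
        "logman", "create", "counter", name,
        "-v", "mmddhhmm",     # append a month/day/hour/minute version suffix
        "-cf", counter_file,  # file listing the counters to collect
        "-si", "05:00",       # sample interval: 5 minutes
        "-f", "bincirc",      # circular binary (.blg) log
        "-o", output,         # output path
        "-max", "250",        # maximum log size in MB
    ]


if __name__ == "__main__":
    Path("counters.txt").write_text(counter_file_text(COUNTERS), encoding="utf-8")
    # cmd.exe would expand %computername%; without a shell, substitute it here.
    host = os.environ.get("COMPUTERNAME", "localhost")
    args = build_logman_args("BlackBox", "counters.txt", rf"c:\Perflogs\Blackbox_{host}")
    print(" ".join(args))
    # On Windows (typically from an elevated prompt), subprocess.run(args, check=True)
    # would then create the data collector.
```

Building an argument list rather than one shell string avoids quoting problems with paths that contain spaces.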

relog: command-line tool to resample a perfmon log file or extract a portion of it (blg …)

 Example: relog SQL:<DSN-name>!<LogSetName> -f bin -o <output.blg>. See this blog for further explanation: http://blogs.technet.com/b/richard_macdonald/archive/2008/04/08/3032386.aspx

Performance Analysis of Logs (PAL) v2, a tool for analyzing perfmon log files: http://pal.codeplex.com/releases/view/51623
PAL requires:
   .NET Framework 2.x
Server Performance Advisor (SPA), for Windows Server 2003 SP2 or R2: complementary to Perfmon. It helps diagnose problems in the following areas:
Responsiveness
• Response times
• Failing requests
• Hung application
Resource Usage
• Rogue clients
• Bad scripts
• Out of resources
Tuning and configuration
• Incorrect cache size
• Password expiration policy
• Not enough dynamic ports

Note:

SPA is built into Windows Vista and Windows Server 2008 and does not need to be installed (it appears as a new data collector set in the Reliability and Performance Monitor). Windows 2000 and Windows XP are not supported.

taskmgr (Task Manager)
Debugging Tools for Windows and symbol configuration (WinDbg), or procexp: http://www.microsoft.com/whdc/devtools/debugging/default.mspx

 

How to script perfmon?

The following links provide additional information about script deployment methods for perfmon:

– http://technet2.microsoft.com/WindowsServer/f/?en/Library/46938289-edb5-468a-b03f-4e5985bf8fca1033.mspx

– http://technet2.microsoft.com/WindowsServer/f/?en/Library/e7b81ac6-23d3-434f-b33a-e940caf5c1a81033.mspx

– http://blogs.technet.com/richard_macdonald/archive/2008/04/08/3032386.aspx

How to use the Microsoft Symbol Server?

1. Make sure you have installed the latest version of Debugging Tools for Windows.
2. Start a debugging session.
3. Decide where to store the downloaded symbols (the “downstream store”). This can be a local drive or a UNC path.
4. Set the debugger symbol path as follows, substituting your downstream store path for DownstreamStore:
SRV*DownstreamStore*http://msdl.microsoft.com/download/symbols

For example, to download symbols to c:\websymbols, you would add the following to your symbol path:
SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
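Because the symbol path is just a string, it is easy to generate. A minimal Python sketch, assuming you want to hand it to the debugger through the _NT_SYMBOL_PATH environment variable (which WinDbg and the other Debugging Tools for Windows read) instead of typing it into the debugging session; the helper names symbol_path and debugger_env are hypothetical:

```python
import os

# Public Microsoft symbol server URL, as given in the steps above.
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(downstream_store: str, server: str = MS_SYMBOL_SERVER) -> str:
    """Build an SRV*<downstream store>*<symbol server> symbol path entry."""
    return f"SRV*{downstream_store}*{server}"


def debugger_env(path: str) -> dict[str, str]:
    """Copy of the current environment with _NT_SYMBOL_PATH set, for launching a debugger."""
    env = dict(os.environ)
    env["_NT_SYMBOL_PATH"] = path
    return env


if __name__ == "__main__":
    print(symbol_path(r"c:\websymbols"))
    # prints: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
```

Passing the returned environment to a child process (for example via subprocess's env parameter) sets the symbol path for that debugger only, without changing the system-wide variable.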

Key OS Performance counters?
Examples:

1- Exchange servers that exhaust kernel paged pool memory due to token use, and are slow or even hang at the OS level.

2- SQL servers with slow performance due to disk latencies of 50 ms or higher (.050, since the disk latency counters report seconds) during the reported problem interval.

3- File/print servers that perform unacceptably slowly, usually because the Server service exhausts kernel nonpaged pool, or because disk response times are worse than 25 ms (.025).

Key OS Performance Metrics Counter Guidelines (sustained, or during the captured problem interval)
Logical Disk/Physical Disk

Both are to be captured and monitored because of today's virtualized disk environments. % Idle Time is a reasonable indicator of disk interface pressure.

Note: On very large SANs, a LUN spanning hundreds of drives can show 0% idle and still be fine; but if we see normal levels and then 0% at the precise time of the reported problems, treat it as a valid issue.

% Idle Time

– 100% idle to 50% idle = Healthy

– 49% idle to 20% idle = Warning or Monitor

– 19% idle to 0% idle = Critical or Out of Spec

Avg. Disk sec/Read or Write

– 1ms to 15ms = Healthy (10ms for Exch/AD, 15ms for SQL)

– 15ms to 25ms = Warning or Monitor (25ms in general)

– 26ms or greater = Critical or Out of Spec

Avg or Current Disk Queue Length

– 2 or under = Healthy

– 3 to 31 = Warning or Monitor: a possible issue; check read or write latency to confirm.

– 32 or higher = a likely issue; check read or write latency to confirm.
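The disk guidelines above can be applied mechanically to sampled values. A minimal Python sketch, assuming you already have the counter values as numbers (for example exported with typeperf or relog); the function names are hypothetical. Note that perfmon reports Avg. Disk sec/Read and Avg. Disk sec/Write in seconds, so 0.050 means 50 ms.

```python
def classify_idle(pct_idle: float) -> str:
    """% Idle Time: 100-50 healthy, 49-20 warning, 19-0 critical."""
    if pct_idle >= 50:
        return "Healthy"
    if pct_idle >= 20:
        return "Warning or Monitor"
    return "Critical or Out of Spec"


def classify_latency(avg_disk_sec: float, healthy_ms: float = 15.0) -> str:
    """Avg. Disk sec/Read or Write, in seconds as perfmon reports it.

    healthy_ms: 15 for SQL, 10 for Exchange/AD, per the guideline above.
    """
    ms = avg_disk_sec * 1000  # seconds -> milliseconds
    if ms <= healthy_ms:
        return "Healthy"
    if ms <= 25:
        return "Warning or Monitor"
    return "Critical or Out of Spec"


def classify_queue(queue_length: float) -> str:
    """Avg. or Current Disk Queue Length; confirm any finding with latency."""
    if queue_length <= 2:
        return "Healthy"
    if queue_length < 32:
        return "Warning or Monitor: possible issue"
    return "Likely issue"


if __name__ == "__main__":
    # The SQL example from this document: 50 ms average latency.
    print(classify_latency(0.050))  # Critical or Out of Spec
    # 12 ms is healthy for SQL but above the 10 ms Exchange/AD limit.
    print(classify_latency(0.012, healthy_ms=10))  # Warning or Monitor
```

Queue length alone is only an indicator, which is why classify_queue says "possible" or "likely" rather than "critical"; the latency result is what confirms a disk problem.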

Memory

* = Windows cannot report maximum pool values without a debugger attached. See the appendix chart document to get an approximate maximum pool size for a given amount of physical memory and the boot.ini switches used.

Note: Hot-plug memory, the Special Pool debug flag, or having 6 GB or more of RAM but using the /MAXMEM boot.ini switch to force 4 GB of recognized memory (usually on Exchange servers) can reduce the maximum Pool Paged Bytes by as much as 100 MB. The appendix chart document is therefore a good estimate, but these maximums can vary slightly depending on the server's hardware configuration and the boot.ini switches chosen.

Free System Page Table Entries

– Greater than 10,000 free = Healthy

– 9,999 to 5,000 free = Monitor

– 4,999 or below = Critical or Out of Spec

Pool Nonpaged Bytes*

– Less than 60% of pool consumed = Healthy

– 61% to 80% of pool consumed = Warning or Monitor

– Greater than 80% of pool consumed = Critical or Out of Spec

Pool Paged Bytes*

– Less than 60% of pool consumed = Healthy

– 61% to 80% of pool consumed = Warning or Monitor

– Greater than 80% of pool consumed = Critical or Out of Spec

Available MBytes

– 50% or more of memory free = Healthy

– 25% of memory free = Monitor

– 10% of memory free = Warning

– Less than 100 MB or 5% of memory free = Critical or Out of Spec

Pages per Second (Pages/sec; 4 KB per page, so 1,000 pages/sec = about 4 MB/sec)

– Less than 1,000 pages/sec sustained = Healthy

– 1,000 to 2,500 pages/sec sustained = Caution or Monitor

– Greater than 2,500 pages/sec (10.24 MB/sec) sustained = Warning

– Greater than 5,000 pages/sec peak = Warning to Critical, and should be investigated
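The memory guidelines can be sketched in Python the same way. This assumes you supply the estimated maximum pool size yourself (from the appendix chart or a debugger, since Windows does not report it) and that "% of free memory" means Available MBytes relative to installed physical memory. The guideline gives spot values for Available MBytes (50%, 25%, 10%, 5%), so the bands between them in classify_available are an assumption. Function names are hypothetical.

```python
from typing import Optional

PAGE_SIZE_BYTES = 4096  # 4 KB pages, as assumed in the Pages/sec guideline


def classify_pool(pool_bytes: float, max_pool_bytes: float) -> str:
    """Pool Paged/Nonpaged Bytes as a share of the estimated maximum pool size."""
    pct = pool_bytes / max_pool_bytes * 100
    if pct < 60:
        return "Healthy"
    if pct <= 80:
        return "Warning or Monitor"
    return "Critical or Out of Spec"


def classify_available(available_mb: float, total_mb: float) -> str:
    """Available MBytes; bands between the guideline's spot values are assumed."""
    pct = available_mb / total_mb * 100
    if available_mb < 100 or pct < 5:
        return "Critical or Out of Spec"
    if pct <= 10:
        return "Warning"
    if pct < 50:
        return "Monitor"
    return "Healthy"


def pages_to_mb_per_sec(pages_per_sec: float) -> float:
    """Convert Pages/sec to MB/sec (decimal MB): 2,500 pages/sec -> 10.24 MB/sec."""
    return pages_per_sec * PAGE_SIZE_BYTES / 1_000_000


def classify_pages(sustained: float, peak: Optional[float] = None) -> str:
    """Pages/sec: sustained average, plus an optional peak value."""
    if peak is not None and peak > 5000:
        return "Warning to Critical: investigate"
    if sustained < 1000:
        return "Healthy"
    if sustained <= 2500:
        return "Caution or Monitor"
    return "Warning"


if __name__ == "__main__":
    print(pages_to_mb_per_sec(1000))  # 4.096
    print(classify_available(300, 4096))  # Warning (about 7% free)
```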

Note regarding Hyper-V and performance: for more information, refer to the Measuring Performance on Hyper-V article at http://msdn.microsoft.com/en-us/library/cc768535.aspx.
Processor

At this point it becomes important to identify the process, service, or driver causing the workload.

% Processor Time (all instances)

– Less than 60% consumed = Healthy

– 51% to 90% consumed = Monitor or Caution

– 91% to 100% consumed = Critical or Out of Spec

% Processor Time + % Idle Time = 100%

% Processor Time = % User Time + % Privileged Time, where % User Time is the time spent running application (user-mode) code and % Privileged Time is the time spent in kernel (system) mode.

Important:

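For each process, % Processor Time = % User Time + % Privileged Time, so the sum of % Processor Time across Process instances includes kernel-mode time spent on behalf of those processes, not only user time.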
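A minimal Python sketch of the processor relationships and thresholds above, plus a helper for the "identify the process" step. The per-process input is assumed to be a dictionary of Process(*) % Processor Time samples keyed by instance name; the _Total and Idle instances are skipped because they do not point to a culprit. The guideline's bands overlap (less than 60% healthy, 51% to 90% monitor), so this sketch assumes below 60% is healthy. Function names are hypothetical.

```python
def processor_time(user_pct: float, privileged_pct: float) -> float:
    """% Processor Time = % User Time + % Privileged Time."""
    return user_pct + privileged_pct


def classify_cpu(pct_processor_time: float) -> str:
    """% Processor Time (all instances); assumes < 60% is healthy."""
    if pct_processor_time < 60:
        return "Healthy"
    if pct_processor_time <= 90:
        return "Monitor or Caution"
    return "Critical or Out of Spec"


def top_processes(cpu_by_process: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Rank per-process % Processor Time samples, skipping the _Total and Idle instances."""
    busy = {name: pct for name, pct in cpu_by_process.items() if name not in ("_Total", "Idle")}
    return sorted(busy.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    total = processor_time(user_pct=35, privileged_pct=20)
    # % Idle Time is whatever is left of 100%.
    print(total, 100 - total, classify_cpu(total))  # 55 45 Healthy
```

On multi-processor servers a single Process instance can exceed 100%, so rank processes by their share of the total rather than comparing them directly against the per-CPU thresholds.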

Network Interface

Due to electrical signaling limitations, we do not expect to exceed 80% throughput on any bus. So if we see 80% of the interface consumed on either receive or send, we expect the link to be saturated. Using the rule of thumb that we do not want to operate at more than 80% of planned capacity, a maximum threshold of ~64% (80% * 80%) is the guideline for receive and send, evaluating each independently.

Current Bandwidth (all instances)

– Note the bandwidth for the calculation (10 Mb, 100 Mb, 1000 Mb, etc.).

– Remember that Ethernet is only about 80% usable due to collisions, etc., so the usable ceiling is up to 80% of the interface as a typical guideline, since we cannot guarantee a pure switched environment for all customers in all scenarios, all of the time. See Note1.

Bytes Total/sec

– Less than 40% of the interface consumed = Healthy

– 41% to 64% of the interface consumed = Monitor or Caution

– 65-100% of the interface consumed = Critical or Out of Spec

Output Queue Length

– 0 = Healthy

– 1-2 = Monitor or Caution.

– Greater than 2 = Critical or Out of Spec
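As a sketch, the interface guidelines can be checked in Python, evaluating receive and send independently and reporting the worse of the two. The inputs are assumed to be percentages already computed from the byte counters and Current Bandwidth (bytes/sec * 8 / Current Bandwidth * 100, since Current Bandwidth is in bits per second). Function names are hypothetical.

```python
# ~64% = 80% usable bus/Ethernet capacity * 80% planning headroom.
PLANNING_CEILING_PCT = 64

SEVERITY = ["Healthy", "Monitor or Caution", "Critical or Out of Spec"]


def classify_interface(pct_consumed: float) -> str:
    """Share of the interface consumed in one direction."""
    if pct_consumed < 40:
        return "Healthy"
    if pct_consumed <= PLANNING_CEILING_PCT:
        return "Monitor or Caution"
    return "Critical or Out of Spec"


def link_status(received_pct: float, sent_pct: float) -> str:
    """Evaluate each direction independently and report the worse of the two."""
    return max(classify_interface(received_pct), classify_interface(sent_pct), key=SEVERITY.index)


def classify_output_queue(length: int) -> str:
    """Output Queue Length: 0 healthy, 1-2 caution, more than 2 critical."""
    if length == 0:
        return "Healthy"
    if length <= 2:
        return "Monitor or Caution"
    return "Critical or Out of Spec"


if __name__ == "__main__":
    print(link_status(received_pct=30, sent_pct=70))  # Critical or Out of Spec
```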

Process

This is to detect possible leaks by applications.

<process> Handle Count

– If this process instance has more than 500 handles, it should be examined over time to see whether this is legitimate allocation and de-allocation or a leak pattern.

Private Bytes guideline rationale

The goal is to catch many of the services with a smaller memory footprint (WinMgmt, SVCHost, or third-party applications) before they consume too many resources and begin to cause serious performance or stability issues. LSASS on an Active Directory DC, Exchange Server's Store.exe, and SQL Server's Sqlservr.exe are expected to reach values of 1+ GB and will need their own rule. For servers with BackOffice or large-memory applications, it may be better to create two Perfmon alert, MOM, or NetIQ rules for Private Bytes: the first rule examines all processes except the very large memory applications, and the second rule is adjusted to the observed levels for applications averaging more than 250 MB.

<process> Thread Count

– If this process instance has more than 500 threads, it should be examined over time to see whether this is legitimate allocation and de-allocation or a leak pattern.

<process> Private Bytes

– If this process instance uses more than 250 MB, it should be examined over time to see whether this is legitimate allocation and de-allocation or a leak pattern.

Note: Private Bytes are not directly related to pool bytes, but code paths within an application that leak private bytes very commonly leak pool bytes as well, so this counter is key when looking for the source of pool leaks.

Private Bytes is used instead of Working Set because a Private Bytes leak can be difficult to detect with the Working Set counter: leaked private bytes can be paged out, etc.

<process>Working Set

– If this process instance uses more than 250 MB, it should be examined over time to see whether this is legitimate allocation and de-allocation or a leak pattern.
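"Examined over time" can be partly automated. A minimal Python sketch, assuming you have a series of samples for one process counter (Handle Count, Thread Count, Private Bytes, or Working Set) taken at a fixed interval. The leak heuristic is an assumption, not part of the guideline: legitimate allocation and de-allocation tends to rise and fall, while a leak keeps trending upward, so a counter is flagged when it is over its threshold, has a positive least-squares slope, and ends at its peak. Names are hypothetical, and 250 MB is taken as 250 * 1024 * 1024 bytes.

```python
# Thresholds from the Process guidelines above.
HANDLE_THRESHOLD = 500
THREAD_THRESHOLD = 500
BYTES_THRESHOLD = 250 * 1024 * 1024  # 250 MB for Private Bytes and Working Set


def trend_slope(samples: list[float]) -> float:
    """Least-squares slope of the samples against their index (units per interval)."""
    n = len(samples)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def assess(samples: list[float], threshold: float) -> str:
    """Flag a counter that is over its threshold and still climbing (heuristic)."""
    if max(samples) <= threshold:
        return "OK"
    if trend_slope(samples) > 0 and samples[-1] == max(samples):
        return "Possible leak: keep monitoring"
    return "Over threshold: examine over time"


if __name__ == "__main__":
    print(assess([600, 610, 620, 640], HANDLE_THRESHOLD))  # Possible leak: keep monitoring
```

A steady climb across a long capture (days rather than minutes) is far more convincing than a short rising window, so use the longest log available.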

 

Note1:
Threshold for switched networks and latency-tolerant applications:
• < 30 percent: Low utilization
• 30 to 60 percent: Significant utilization
• > 60 percent: High utilization
Threshold for shared networks and latency-sensitive applications:
• < 30 percent: Normal utilization
• > 30 percent: High utilization
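The two Note1 threshold sets can be expressed as one Python function. The network type is an input you must know about the environment (perfmon does not report whether a segment is switched or shared), and treating exactly 30% as high on shared networks is an assumption. The function name is hypothetical.

```python
def classify_network_utilization(pct: float, shared_or_latency_sensitive: bool) -> str:
    """Note1 thresholds for switched/latency-tolerant vs. shared/latency-sensitive cases."""
    if shared_or_latency_sensitive:
        return "Normal utilization" if pct < 30 else "High utilization"
    if pct < 30:
        return "Low utilization"
    if pct <= 60:
        return "Significant utilization"
    return "High utilization"


if __name__ == "__main__":
    # The same 45% load reads differently depending on the network type.
    print(classify_network_utilization(45, shared_or_latency_sensitive=False))  # Significant utilization
    print(classify_network_utilization(45, shared_or_latency_sensitive=True))   # High utilization
```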

The performance counters available to measure the Network Interface object are expressed in a mix of bits and bytes. To convert this value into a utilization percentage, you can use the following formula:

( ( "Bytes Total/sec" * 8 ) / "Current Bandwidth" ) * 100
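The formula converts bytes to bits (multiply by 8) before dividing by Current Bandwidth, which perfmon reports in bits per second. A hypothetical Python counterpart of the formula is sketched below; it is not the workshop's Get-NICUtilPercent function, whose PowerShell code is in the module appendix and is not reproduced here.

```python
def nic_util_percent(bytes_total_per_sec: float, current_bandwidth_bps: float) -> float:
    """((Bytes Total/sec * 8) / Current Bandwidth) * 100.

    Bytes Total/sec is in bytes; Current Bandwidth is in bits per second,
    so multiply by 8 before dividing.
    """
    if current_bandwidth_bps <= 0:
        raise ValueError("Current Bandwidth must be positive")
    return bytes_total_per_sec * 8 / current_bandwidth_bps * 100


if __name__ == "__main__":
    # 6,250,000 bytes/sec on a 100 Mbps link
    print(f"{nic_util_percent(6_250_000, 100_000_000):.0f} percent")  # 50 percent
```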

For the purpose of this workshop, a Windows PowerShell function called Get-NICUtilPercent has been prepared to aid in this calculation. To use this function, copy and paste it (from the appendix of this module) into a PowerShell command prompt. Then run Get-NICUtilPercent followed by the value for Bytes Total/sec and the value for Current Bandwidth, as shown below:

Get-NICUtilPercent -bytesTotal 625000 -bandwidth 11000000

You can also shorten the command shown above as follows:

Get-NICUtilPercent 625000 11000000

In either case, the command will return a percentage string, such as the one shown below:

45 percent

Because many NICs run at common speeds, the Get-NICUtilPercent function can accept