Wednesday, 5 August 2015

How to check your CPU cooler using software

Is your heatsink correctly seated onto your CPU? Is the cooling system up to the job?

How would you check that it is correctly fitted and is working correctly?

Many people fall into the trap of measuring only the CPU temperature to check that their heatsink and heatsink fan are working correctly. There are many software utilities that will check the internal CPU temperature, such as CPU-Z or RealTemp. For Intel CPUs there is also the Intel Extreme Tuning Utility, but this needs to be installed (.NET and drivers) and the system must be restarted.

The Intel Extreme Tuning Utility will run a stress test and show you whether thermal throttling is occurring (in my case this happened once 100 deg C was reached).

Testing with RealTemp

If you want to test your CPU cooling, you need to monitor the CPU temperature and stress the CPU cores. To do this you can download and extract the RealTemp and Prime95 files to a new folder (I use a folder on my E2B USB drive so that I can use it on other systems). Both of these are portable apps.

I then run RealTemp and start a Prime95.exe 'Torture Test' using the Prime95 'In-place large FFTs' test, which gives maximum power consumption and therefore maximum heat.

The 'Thermal Status' box shows that two cores reached TJMax during this short test.

RealTemp shows the 'Distance to TJMax' for each core. TJMax is the maximum junction temperature that the core is specified to work at. If this temperature is exceeded, then the CPU will usually automatically throttle back or even shut down.
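The 'Distance to TJMax' figure is simple arithmetic. Here is a minimal Python sketch of it, assuming the core temperatures have already been read by some monitoring tool. The function name, the default TJMax of 100 deg C and the sample readings are made up for illustration and do not come from RealTemp itself.

```python
def distance_to_tjmax(core_temp_c, tjmax_c=100.0):
    """Degrees C left before the core reaches TJMax (0 or less means the limit was hit)."""
    return tjmax_c - core_temp_c


# Hypothetical per-core readings in deg C (not taken from a real system)
core_temps = [62.0, 71.5, 100.0, 99.0]

for core, temp in enumerate(core_temps):
    distance = distance_to_tjmax(temp)
    status = "AT TJMAX" if distance <= 0 else "ok"
    print(f"Core {core}: {temp:.1f} C, distance to TJMax {distance:.1f} C ({status})")
```

TJMax varies between CPU models, so the default of 100 should be replaced with the value for your own CPU.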

WARNING: If you have overclocked your CPU, increased the core voltages or have a faulty cooling system, there is a small possibility that this test could permanently damage your CPU.

Under normal (non-stressed) circumstances, TJMax should not be reached (i.e. the 'Distance to TJMax' should never reach 0 or go negative!).

Why is just looking at the CPU temperature a trap?

Just because the CPU temperature looks to be well below TJMax does not mean you don't have a problem!

Let us think about what happens when your CPU starts to work harder and starts to get hotter...

Because the temperature of the CPU is monitored, as it starts to get hotter, the CPU heatsink fan spin-speed will start to increase and it will try to reduce the temperature of the CPU.

The net result is that the CPU temperature actually may not change very much, but the heatsink fan spin-speed increases dramatically!
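To see why the temperature hides the problem, here is a toy simulation in Python. It is only a sketch under invented assumptions: the thermal resistance values, the 80 W load, the 70 deg C target and the very simple controller are all hypothetical, and a real mainboard fan controller will behave differently. It compares a well-seated heatsink with a poorly seated one.

```python
def simulate(r_full_speed, power_w=80.0, ambient_c=25.0, target_c=70.0, steps=200):
    """Toy fan controller: nudge the fan duty cycle until the CPU sits at target_c.

    r_full_speed is the heatsink thermal resistance (deg C per watt) with the
    fan at 100%. The toy model assumes it scales as 1/duty at lower fan speeds.
    Returns (cpu_temperature_c, fan_duty) after the loop has settled.
    """
    duty = 0.2  # start at the minimum duty cycle (20%)
    temp = ambient_c
    for _ in range(steps):
        # CPU temperature for the current fan speed
        temp = ambient_c + power_w * r_full_speed / duty
        # Speed the fan up if the CPU is too hot, slow it down if it is cool
        duty += 0.005 * (temp - target_c)
        # The fan cannot go below its minimum or above 100%
        duty = min(1.0, max(0.2, duty))
    return temp, duty


good_temp, good_duty = simulate(r_full_speed=0.3)  # well-seated heatsink (assumed value)
bad_temp, bad_duty = simulate(r_full_speed=0.5)    # poorly seated heatsink (assumed value)

print(f"Good fit: {good_temp:.1f} C at {good_duty:.0%} fan duty")
print(f"Bad fit:  {bad_temp:.1f} C at {bad_duty:.0%} fan duty")
```

In this model both systems settle at almost the same CPU temperature, and only the fan duty cycle gives the badly fitted heatsink away.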

Whilst at RM, I wrote and introduced factory test software which logged various internal parameters of the new PC systems and notebooks that were being built on the production line. CPU temperature and fan speed were amongst the parameters that were recorded (as well as others such as internal system temperature, case fan speed, hard disk temperatures, PSU and mainboard VRM voltages, HDD SMART data, measured CPU core speed, etc.).

We often produced over 1000 systems a day and all systems automatically ran this new test software during their 'burn-in' tests. Obvious faults such as no fan being detected or the CPU(s) approaching TJMax, caused immediate test fails.

There were MANY failures during the first few weeks and all the production leaders were swearing at me, until I proved to them that they were actually building the systems incorrectly! The most common fault was where the assembly-line worker had connected the case fan to the wrong mainboard header (or not at all!). On the other hand, PSU and VRM voltage failures were rare. Some systems also failed as they approached the TJMax of whatever CPU was fitted - this was usually due to assembly errors, such as not removing the protective plastic film when fitting the heatsink to the CPU.

I also analysed the data (imported into an Excel spreadsheet) each day to look for patterns, and I found that a small percentage of systems had much higher CPU fan speeds than the others - this did not seem to be related to the particular mainboard, case or model of CPU.

When these systems were disassembled and investigated (before they were shipped to customers), it turned out that although the CPU temperatures never reached the TJMax limit, the CPU heatsink fans were all spinning their heads off because the heatsink was not clipped down and seated correctly onto the CPU thermal plate or because of too much or too little thermal paste (if used).

Even though these systems passed the stress test and CPU TJMax was never reached even when running Prime95, there was a good chance that the heatsink assembly might fall off during shipping or the thermal paste would later dry out. At the very least, the customer might complain that it was 'noisy' due to the maxed-out CPU fan!

The test software was 'tweaked' to look for these anomalies of a faster than expected CPU fan speed when under stress (taking into account PC case, ambient temperature, CPU model, mainboard model, number of case fans fitted, number of hard disks, graphics card type, etc.) and the Dead-on-Arrival (DOA) rate, which was already very low, fell by 50%!
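The original factory software is not reproduced here, but the general idea of spotting a 'faster than expected' fan can be sketched in Python. This example is an assumption on my part rather than the method that was actually used: it takes the stressed CPU fan speeds of a batch of identically configured systems and flags any that sit far above the batch median, using the median absolute deviation. The serial numbers, RPM values and threshold are hypothetical.

```python
from statistics import median


def flag_fast_fans(fan_rpm_by_serial, threshold=3.5):
    """Return serials whose stressed CPU fan speed is unusually high for the batch."""
    speeds = list(fan_rpm_by_serial.values())
    mid = median(speeds)
    # Median absolute deviation: a spread measure that outliers barely affect
    mad = median([abs(rpm - mid) for rpm in speeds])
    if mad == 0:
        # All typical fans read the same speed, so anything faster stands out
        return sorted(s for s, rpm in fan_rpm_by_serial.items() if rpm > mid)
    # Modified z-score, checked on the high side only (slow fans are a different fault)
    return sorted(
        s for s, rpm in fan_rpm_by_serial.items()
        if 0.6745 * (rpm - mid) / mad > threshold
    )


# Hypothetical burn-in log: same case, mainboard and CPU model for every system
batch = {"A001": 2100, "A002": 2150, "A003": 2080,
         "A004": 3900, "A005": 2120, "A006": 2200}

suspects = flag_fast_fans(batch)
print("Systems to pull for inspection:", suspects)
```

Comparing only like with like matters here, because a different case, CPU model or number of fitted fans changes what a 'normal' fan speed is.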

So how should I check my system?

To check your CPU cooling system, you need to run a 'Torture test' such as Prime95 and monitor the CPU heatsink fan(s) as well as the CPU core temperature.

One way to do this is to also run a utility such as SpeedFan or OpenHardwareMonitor at the same time as running Prime95. If you have an Intel CPU, the Intel Extreme Tuning Utility is useful because it shows you when thermal throttling occurs (which will happen once TJMax is reached).

We can see that the CPU Fan (Fan #2) is working at 100% and the CPU core is at 87.5 deg Centigrade (TJMax=100). The heatsink+fan fitted to this system was not faulty, but it shows it is barely adequate for high-stress scenarios.

If you are not sure which fan is which, stop it gently with your finger for a second or so (but no more!) and watch which fan speed reading drops. If the fan is spinning at its maximum speed, then it is obviously struggling to provide adequate cooling.

We also need to keep an eye on the CPU Core frequency (4007MHz) and CPU voltage. The onboard circuitry or the CPU itself could 'throttle-back' these automatically if the CPU gets too hot.
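The readings to watch can be combined into one rough verdict. The Python sketch below is only an illustration: the function name, the verdict wording, the 5% frequency tolerance and the expected core speed of 4000MHz are my assumptions, and the values passed in would have to be read by hand from your monitoring utility.

```python
def assess_cooling(core_temp_c, tjmax_c, fan_duty_pct, core_mhz, expected_mhz):
    """Give a rough verdict on a reading taken while the stress test is running."""
    # At TJMax, or running clearly slower than expected: the CPU is protecting itself
    if core_temp_c >= tjmax_c or core_mhz < 0.95 * expected_mhz:
        return "throttling"
    # Temperature is under control, but only because the fan has nothing left to give
    if fan_duty_pct >= 100:
        return "barely adequate"
    return "ok"


# Readings similar to the example system described above
print(assess_cooling(core_temp_c=87.5, tjmax_c=100, fan_duty_pct=100,
                     core_mhz=4007, expected_mhz=4000))
```

A drop in core frequency can have causes other than heat (power limits, for example), so treat a 'throttling' verdict as a prompt to investigate rather than as proof.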

If you are a gamer or you stress your CPU a lot, you may want to check your CPU fan speed and duty cycle using these techniques.

Note that the tests I describe do not take into account the ambient temperature of the room or the temperature inside the case, which can be raised by making other components such as the graphics adaptor and hard disks work harder. If the internal case temperature is raised by 10 degrees by these factors, and the CPU heatsink fan is already running at 100%, then the CPU is going to get 10 degrees hotter too!

For a real stress test, you therefore need to stress your hard disks, memory and graphics adaptor whilst running the Prime95 test in a 35 deg C room (a PC sitting in direct sunlight on a hot summer day can get quite hot)!
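If you cannot test in a hot room, you can at least estimate the effect with the rule of thumb above. This Python sketch assumes the fan is already at 100% during the test, so every extra degree of ambient temperature is passed straight on to the CPU. The function name and the 25 deg C test-room temperature are assumptions for illustration.

```python
def projected_core_temp(measured_c, ambient_during_test_c, worst_case_ambient_c):
    """Estimate the core temperature in a hotter room, assuming the fan was already at 100%."""
    return measured_c + (worst_case_ambient_c - ambient_during_test_c)


tjmax_c = 100.0
# 87.5 deg C measured under stress, assumed 25 deg C room, projected to a 35 deg C room
hot_day_c = projected_core_temp(87.5, 25.0, 35.0)
print(f"Projected core temperature: {hot_day_c:.1f} C "
      f"(headroom {tjmax_c - hot_day_c:.1f} C before TJMax)")
```

If the fan was not yet at full speed during your test, this estimate will be pessimistic, because the fan can still speed up to absorb some of the extra heat.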