Friday 17 July 2020

re. Fixing faulty computers

This is just a quick blog post to say 'Hi'. I have been busy doing other things recently (like playing with my new IODD Mini SSD, which is performing very well), so I have not been spending much time on E2B.

In my few moments of free time, I have been enjoying the Adamant IT repair-shop videos on YouTube, which are quite entertaining. He has 'live' repair and 'live' PC build videos, as well as reviews, etc.

Although I have now retired from repairing, building and developing PCs and notebooks, I thought I would go through what I tended to do to diagnose and fix them (unless PCs have changed a lot in the last 6 years or so, the same approach should still apply).



1. History
Learn all you can about the previous history of the computer. Symptoms, recent changes, etc.
Get any usernames and passwords (BIOS and OS).
Get the customer's contact details.
If the user has replaced the CPU, VGA card or RAM recently, ask for the old parts too if possible.
If it is a laptop, ask for the power brick.

2. Customer expectations
What does the customer want you to do, timescales, maximum spend, etc.
Anything important on the computer?
Do they want me to make a backup?
Any upgrades performed or parts swapped by user?
Virus removal (disinfection) required?
etc.

3. Record details
If the computer powers up, I make notes of the current BIOS settings and Windows settings (e.g. drive partitions, drive letters, contents). If a hard disk is replaced, there may be links and shortcuts in the current OS which reference the existing drive letters, so it is best to preserve the drive-letter assignments for any new or added drive.
Look for clues as to whether the system boots via MBR (legacy BIOS), UEFI or UEFI Secure Boot, etc.
Take photos before and during disassembly.
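If the system will still boot to an OS, some of these details can be captured with a small script. The sketch below is a hypothetical helper (the function names are my own, not from any tool), assuming Python 3 is available: on Linux it reports UEFI if /sys/firmware/efi exists, and on Windows it calls the kernel32 GetFirmwareType function (Windows 8 and later) and lists the drive letters currently in use. It does not detect Secure Boot, so still note that from the BIOS Setup menu.

```python
import os
import platform
import string


def windows_drive_letters():
    """Drive letters that currently exist (Windows only; empty list elsewhere)."""
    if platform.system() != "Windows":
        return []
    return [f"{letter}:" for letter in string.ascii_uppercase
            if os.path.exists(f"{letter}:\\")]


def linux_boot_mode(efi_dir="/sys/firmware/efi"):
    """On Linux, this directory exists only if the kernel was booted via UEFI."""
    return "UEFI" if os.path.isdir(efi_dir) else "Legacy BIOS (MBR)"


def windows_boot_mode():
    """Ask Windows (8 or later) which firmware type it booted from."""
    import ctypes  # ctypes.windll only exists on Windows, so use it lazily
    firmware_type = ctypes.c_uint(0)
    if not ctypes.windll.kernel32.GetFirmwareType(ctypes.byref(firmware_type)):
        return "unknown"
    # FIRMWARE_TYPE enum: 1 = FirmwareTypeBios, 2 = FirmwareTypeUefi
    return {1: "Legacy BIOS (MBR)", 2: "UEFI"}.get(firmware_type.value, "unknown")


def snapshot():
    """Collect the details worth writing down before any parts are changed."""
    system = platform.system()
    info = {"os": system, "drives": windows_drive_letters()}
    if system == "Linux":
        info["boot_mode"] = linux_boot_mode()
    elif system == "Windows":
        info["boot_mode"] = windows_boot_mode()
    return info


if __name__ == "__main__":
    for key, value in snapshot().items():
        print(f"{key}: {value}")
```

Print or save the output along with your photos, so the drive letters and boot mode can be restored after the repair.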

Check that the main components are correct: that the CPU and DIMMs fitted are actually supported by that mainboard/notebook. Keep in mind that the installed BIOS version may not support the CPU currently fitted.

4. Diagnose the fault(s)

Most of the Adamant IT videos show him diagnosing computers by semi-random part replacement. Many of the computers will not 'POST' or will not boot to the OS.

However, it is useful to understand how a BIOS works in order to arrive at a quicker diagnosis...

Power On Self Test (POST)

Apart from a few very basic tests, the first task of the BIOS POST is to identify the type of memory fitted, then program the memory controller with the correct bank arrangement and timing parameters for that memory, and then perform a brief test of the memory to ensure that it is working.

The BIOS needs working RAM before it can do anything else, so no matter whether the BIOS is in a PC, notebook or mobile phone, this will always need to be done.
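To illustrate the 'brief test of the memory' idea, here is a Python simulation of a simple write/read-back pattern test. Real BIOS code does this in assembly or C against physical addresses; the SimulatedRAM class, its stuck-bit fault and the choice of patterns are all hypothetical, purely to show how a single faulty bit gets caught (and how 'no memory at all' is a distinct result).

```python
class SimulatedRAM:
    """A tiny byte-addressable memory, optionally with one data bit stuck at 0."""

    def __init__(self, size, stuck_bit=None):
        self.cells = bytearray(size)
        self.stuck_bit = stuck_bit  # hypothetical fault: (address, bit_number)

    def __len__(self):
        return len(self.cells)

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if self.stuck_bit is not None and self.stuck_bit[0] == addr:
            value &= ~(1 << self.stuck_bit[1]) & 0xFF  # faulty bit always reads 0
        return value


def quick_memory_test(ram, patterns=(0x55, 0xAA)):
    """Write each pattern to every cell, then read it all back and compare.

    0x55 (01010101) and 0xAA (10101010) between them drive every bit to both 0 and 1.
    """
    if len(ram) == 0:
        return "no memory detected"
    for pattern in patterns:
        for addr in range(len(ram)):
            ram.write(addr, pattern)
        for addr in range(len(ram)):
            got = ram.read(addr)
            if got != pattern:
                return f"fault at address {addr:#06x} (wrote {pattern:#04x}, read {got:#04x})"
    return "ok"
```

For example, quick_memory_test(SimulatedRAM(256, stuck_bit=(0x10, 0))) reports a fault at address 0x0010, while a zero-sized memory gives "no memory detected", the software equivalent of pulling all the DIMMs.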

Different BIOSes do this in different ways. Some can auto-detect the type of memory fitted and automatically program the chipset control registers correctly, whilst others may simply use the BIOS settings that have been set by the user (usually held in EEPROM these days rather than in the battery-backed 'CMOS' memory).
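Auto-detecting BIOSes typically identify the memory by reading the small Serial Presence Detect (SPD) EEPROM on each DIMM. As a sketch, assuming you already have a raw SPD dump as bytes (how you obtain it is outside this example), byte 2 of the JEDEC SPD data gives the fundamental memory type. Only a few common type codes are listed here; anything else is reported as unknown rather than guessed.

```python
# JEDEC SPD byte 2 (fundamental memory type) for some common module types
SPD_MEMORY_TYPES = {
    0x04: "SDRAM",
    0x07: "DDR SDRAM",
    0x08: "DDR2 SDRAM",
    0x0B: "DDR3 SDRAM",
    0x0C: "DDR4 SDRAM",
}


def memory_type_from_spd(spd: bytes) -> str:
    """Decode the memory type from a raw SPD dump (bytes 0, 1, 2, ...)."""
    if len(spd) < 3:
        raise ValueError("SPD dump is too short to contain byte 2")
    code = spd[2]
    return SPD_MEMORY_TYPES.get(code, f"unknown memory type {code:#04x}")


if __name__ == "__main__":
    # first three bytes of a hypothetical DDR3 SPD dump
    print(memory_type_from_spd(bytes([0x92, 0x10, 0x0B])))
```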

If the information stored in the CMOS/EEPROM is incorrect or corrupt, the consequences can be disastrous: the computer may report an error (e.g. beep), crash, or simply turn itself off (to prevent damage).
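One common way a BIOS notices corrupt settings is a checksum. On the classic IBM AT CMOS layout, the bytes at offsets 10h to 2Dh are summed and the 16-bit result is stored at 2Eh (high byte) and 2Fh (low byte); a mismatch gives the familiar 'CMOS checksum error' on many older BIOSes. Modern firmware that keeps its settings in flash uses its own vendor-specific scheme, so treat this as a sketch of the classic layout only, assuming you have a 128-byte CMOS dump:

```python
def at_cmos_checksum(cmos: bytes) -> int:
    """16-bit sum of CMOS bytes 10h..2Dh (classic IBM AT layout)."""
    return sum(cmos[0x10:0x2E]) & 0xFFFF


def cmos_checksum_ok(cmos: bytes) -> bool:
    """Compare the computed sum with the value stored at 2Eh (high) and 2Fh (low)."""
    stored = (cmos[0x2E] << 8) | cmos[0x2F]
    return stored == at_cmos_checksum(cmos)


def with_valid_checksum(cmos: bytes) -> bytes:
    """Return a copy of the CMOS image with a freshly calculated checksum."""
    image = bytearray(cmos)
    checksum = at_cmos_checksum(image)
    image[0x2E] = checksum >> 8
    image[0x2F] = checksum & 0xFF
    return bytes(image)
```

Flipping any single bit in the 10h-2Dh range of a checksummed image changes the sum, so cmos_checksum_ok() then returns False.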

Most BIOSes signal faults to the user in three ways:

1. POST codes - these codes are written to an I/O port (usually I/O port 80h). As the BIOS code performs different tasks, different numbers are written to this port during POST.
Most professionals use a POST LED board. At its simplest, you just plug it in and turn on the computer. If you see numbers on the display then the CPU is alive. The numbers change as the POST code performs different tests; if they stop changing, the CPU has hung at that point. However, it is often impossible to know what the numbers mean, since most manufacturers do not publish this information.

2. BEEP codes - during startup or on finding an error, all BIOSes should sound the beeper/speaker, which may or may not already be fitted. Some notebooks will not have a beeper fitted, or their internal speakers will not be connected to the I/O port used to generate the 'beeps' - in this case you will need to use other clues to diagnose the problem!


3. DISPLAY - if you connect a monitor, you may see a display of some sort if POST gets past the initial memory and other diagnostic tests. Error messages can be displayed (provided you connect the monitor to the correct video connector!).

Some computers have integrated graphics which use part of the main memory, so before anything can be displayed on the screen, the memory system (DIMMs, etc.) must be working.
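If you jot down the POST codes as they flash past on the LED board, the useful information is the order in which they appear and where they stop. The sketch below (the function names and the example codes are hypothetical; the codes mean nothing in particular) collapses repeated readings and reports the last code seen, which is the one to look up in whatever documentation the manufacturer provides:

```python
def collapse_repeats(codes):
    """Drop consecutive duplicates: 19 19 2B 2B -> 19 2B."""
    result = []
    for code in codes:
        if not result or result[-1] != code:
            result.append(code)
    return result


def describe_post_trace(codes):
    """Summarise POST codes noted down from a port-80h display, in the order seen."""
    sequence = collapse_repeats(codes)
    if not sequence:
        return "No POST codes seen - the CPU may not be running."
    path = " -> ".join(f"{code:02X}" for code in sequence)
    return f"Codes seen: {path}. POST stopped at {sequence[-1]:02X}h."


if __name__ == "__main__":
    print(describe_post_trace([0x19, 0x19, 0x2B, 0x2B, 0x2B]))
    # Codes seen: 19 -> 2B. POST stopped at 2Bh.
```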

No display? - simple test - remove the DIMMs!

If the computer is not producing any type of display when switched on, I find it useful to try to determine if the CPU is actually running. The easiest way to do this is to fit a 'beeper/speaker' to the mainboard (if not already fitted) and then remove all the DIMMs.

Because the first thing a computer will do is set up and test the RAM (until the CPU has working RAM, it can't even call a subroutine, because it needs the stack to store the return address!), it will find there is no memory and you should hear the speaker beep. Note that only a 'beeper' will beep - you will not hear any sound through your PC's audio output speakers. Check the mainboard manual to confirm that the beep code you hear is the one for 'no RAM'.
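When timing beeps by ear (or from a phone recording), it helps to write them down as a pattern of long (L) and short (S) beeps and compare it with the table in the mainboard manual. In this sketch the 500 ms long/short threshold and the 'L-L-L' pattern are assumptions, not real codes for any particular BIOS; some BIOSes use counted groups of short beeps instead, which this simple version does not handle.

```python
def beep_pattern(durations_ms, long_threshold_ms=500):
    """Turn measured beep lengths into a pattern such as 'L-S-S'.

    long_threshold_ms is an assumed cut-off between short and long beeps.
    """
    return "-".join("L" if d >= long_threshold_ms else "S" for d in durations_ms)


def matches_manual(durations_ms, pattern_from_manual, long_threshold_ms=500):
    """True if what you heard matches a pattern copied from the mainboard manual."""
    return beep_pattern(durations_ms, long_threshold_ms) == pattern_from_manual


if __name__ == "__main__":
    heard = [900, 850, 910]  # hypothetical timings with the DIMMs removed
    no_ram = "L-L-L"         # placeholder: use the 'no RAM' code from YOUR manual
    print(beep_pattern(heard), matches_manual(heard, no_ram))
```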

If I don't hear any beeps when I remove the DIMMs then I know that something is very wrong!

If I do not hear any beeps, I remove the CPU heatsink and feel the CPU. If it is warm, it has at least got power; if it is cold, there may be a power issue. If the CPU fan (or any fan or LED) shows signs of life, then we do have power, at least for a short time.

Before proceeding any further, I will reset the BIOS settings to their defaults. You should consult the mainboard manual for how to do this. Note that some mainboards require you to jumper two pins and then switch on the power - then switch it off - then remove the jumper. Other mainboards require you to briefly place a jumper on two pins and then remove it WITHOUT APPLYING POWER. Some mainboards are badly designed and you can cause damage if you apply power whilst the 'BIOS RESET' pins are shorted - so READ THE MANUAL!

The other thing you should do is remove the small CMOS coin-cell battery and wait 5 minutes for the internal capacitors to discharge. If you have a notebook, first remove the main lithium battery and disconnect the notebook from the mains, then press the power button to drain any residual charge from the internal circuitry. Some notebooks have a separate internal battery for the RTC/CMOS - you will have to disassemble the notebook to unplug this!

Tip: An easy way to check for a flat CMOS Real Time Clock (RTC) battery is to set the correct time in the BIOS Setup menu, then unplug the power cable, wait a few minutes, plug the power cable back in and check the time in the BIOS Setup menu again. The CMOS contents may survive, but the clock needs more power and will not tick if the battery is flat - if the clock has lost the few minutes for which the system was unplugged, you know the battery needs replacing.
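The arithmetic behind this tip is simple: once the power is back on, the clock runs again, so a stopped clock stays behind by roughly the time it was unplugged. The sketch below assumes you note the real time when reading the BIOS clock; the 30-second tolerance (for reading clocks by hand) and the 'weak battery' middle case are my own assumptions:

```python
from datetime import datetime, timedelta


def check_rtc(real_now, bios_now, unplugged_for, tolerance=timedelta(seconds=30)):
    """Compare the BIOS clock with the real time after an unplug test.

    tolerance is an assumed allowance for reading the two clocks by hand.
    """
    lag = real_now - bios_now  # how far the BIOS clock is behind
    if lag <= tolerance:
        return "clock kept time: RTC battery looks OK"
    if lag >= unplugged_for - tolerance:
        return "clock stopped while unplugged: replace the RTC battery"
    return "clock lost some time while unplugged: suspect a weak RTC battery"


if __name__ == "__main__":
    # unplugged for 3 minutes; BIOS shows 10:02 when the real time is 10:05
    print(check_rtc(datetime(2020, 7, 17, 10, 5), datetime(2020, 7, 17, 10, 2),
                    timedelta(minutes=3)))
```

In the usage example the clock is 3 minutes behind after being unplugged for 3 minutes, so it stopped completely and the battery should be replaced.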

If no beeps are heard, my next step is to disconnect as many leads as possible from the mainboard. All peripherals, PCI cards, USB headers, connectors, etc. are unplugged, so basically I just have PSU + mainboard + CPU. This is because one of the disconnected items may be shorting out something on the mainboard.

It is not a problem to run the CPU for a short time without a heatsink or fan - just don't let it get hot!

If I still get no beeps, then I will try another PSU. Tip: if none of the fans move/twitch when you press the power button - suspect a duff PSU and maybe get out your multimeter and check the 5V Standby power rail (purple wire)!
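When checking the purple +5V standby wire with a multimeter (measured against a black ground wire, with the PSU connected to the mains), the ATX power supply design guide allows the 5VSB rail a tolerance of ±5%, i.e. 4.75 V to 5.25 V. Here is a trivial helper for that check; the 0.5 V 'no standby power' threshold is my own assumption:

```python
def check_5vsb(measured_v, nominal_v=5.0, tolerance=0.05):
    """Classify a multimeter reading taken on the purple 5VSB wire."""
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    if measured_v < 0.5:  # assumed threshold: essentially no standby power
        return "no standby power: suspect the PSU (or its mains supply)"
    if low <= measured_v <= high:
        return "5VSB within tolerance"
    return f"5VSB out of tolerance ({measured_v:.2f} V, expected {low:.2f}-{high:.2f} V)"


if __name__ == "__main__":
    for reading in (5.02, 4.2, 0.0):
        print(reading, "->", check_5vsb(reading))
```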

If I still get no beeps, then I will remove the mainboard and test the PSU+CPU+mainboard combination on my desk. This is because sometimes a metal pillar or some other metal object may have been underneath the mainboard when in the chassis and it was shorting out something!

Again, I feel the CPU with my fingers to see if it is warm and now I only have two things to test by substitution - the mainboard and the CPU (I am using a known good PSU).

Once I have a working 'core' of PSU + CPU + mainboard, I can add the CPU heatsink/fan and RAM.

Another common problem is that the DIMMs are fitted into the wrong slots, are the wrong type or are faulty.

Once you have working RAM, you can now add a graphics card (if required) and see if you have a display. Now you can transfer the working 'core' to the PC case and start connecting up the other cables and peripherals.
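The 'working core, then add one part at a time' approach is essentially a search for the first addition that breaks things. Here it is as a sketch, where posts_with is a hypothetical stand-in for you powering up and checking with exactly those parts fitted. For the bare core, 'passing' means hearing the 'no RAM' beep code; once RAM is fitted, it means a normal POST beep or a display.

```python
CORE = ("PSU", "CPU", "mainboard")


def first_failing_addition(additions, posts_with, core=CORE):
    """Add parts to a known-working core one at a time.

    posts_with(parts) is a hypothetical stand-in for powering up with exactly
    `parts` fitted and checking for the expected beep code or display.
    Returns "core" if the core itself fails, the first part whose addition
    breaks things, or None if everything works together.
    """
    fitted = list(core)
    if not posts_with(list(fitted)):
        return "core"
    for part in additions:
        fitted.append(part)
        if not posts_with(list(fitted)):
            return part
    return None


if __name__ == "__main__":
    parts = ["CPU heatsink/fan", "RAM", "graphics card", "drives", "front-panel USB"]
    # pretend the graphics card is the culprit
    print(first_failing_addition(parts, lambda fitted: "graphics card" not in fitted))
```

A part flagged this way is a suspect rather than proof: the fault may be how it is fitted (e.g. DIMMs in the wrong slots) rather than the part itself.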

Remember to set up the BIOS settings correctly (especially the MBR/UEFI boot options and the SATA mode: IDE/AHCI/RAID!).

P.S. As Glen says in the comments, on old systems, check the CMOS coin cell battery voltage!

Good luck!

4 comments:

  1. Thanks for the article!
    Note that the link for the POST LED board leads to the YouTube channel :)

  2. An excellent article. I have followed the basic process you described for many years with much success, but you added some good points for me to add to my process. Thanks!

    I'll offer one added step that has saved me many hours of frustration when attempting to solve stubborn diagnostic problems. Many times, when a system just isn’t “right” but I can’t nail down anything specifically wrong, this simple step will solve the mystery:

    Check the CMOS battery.

    I have found that a CMOS battery that is below the normal 3 volt level, but not yet dead, will often cause erratic or obscure errors that defy logic or reason. The lower the voltage the more likely there will be problems.

    Note, checking the CMOS voltage in BIOS (for those systems that report it) does not reflect the actual battery voltage. Remove the battery and test it with a good meter. It does not need to be load tested because, for all practical purposes, there is no “load” on a CMOS battery. That is why they last so long.

    I have had techs give me “defective” system boards that they were ready to throw away as non-useable and the only problem was the CMOS battery. After changing the battery, the system ran flawlessly. Fast, easy, and cheap fix.

    I hope this tip will help many others along the way…


    1. Yes, good point. I also check that the date is correct in the BIOS menu and that the clock is ticking. A date of Jan 1 1980 also indicates a flat battery. Some OSes will not boot if the date is 1980!
      Another tip is not to get grease from your fingers on the battery terminals, because it is slightly conductive; the same applies when fitting watch batteries!
