"Fatal PCI Express Device Error" on HP ProLiant Server

Wednesday, December 15, 2010

Issue: An HP ProLiant server may fail to boot the operating system and display the message "Fatal PCI Express Device Error" after an AC power cycle.
The Integrated Management Log (IML) in iLO records the error message "Uncorrectable PCI Express Error (Embedded Device, Bus 0, Device 0, Function 0, Error Status 0x00000000)".
This only occurs if all of the following criteria are met:

    * Hard drive temperature monitoring is active on the system.
    * The QPI speed is changed from 6.4 GT/s to 4.8 GT/s.
    * The system memory size is reduced below 4 GB on a system that previously ran with more than 4 GB of memory.
    * The system is powered OFF and then immediately powered ON.
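
Since the error only shows up when all four conditions hold together, it can help to write them down as a single check. The following is only a rough sketch to illustrate the logic: the SystemState class, its field names and the matches_failure_criteria function are made up for this example and are not part of any HP tool or API.

```python
import math
from dataclasses import dataclass

GB = 1024 ** 3  # bytes in one gigabyte (binary)


@dataclass
class SystemState:
    """Hypothetical snapshot of the settings named in the list above."""
    drive_temp_monitoring_active: bool
    previous_qpi_gts: float      # QPI speed before the change, in GT/s
    current_qpi_gts: float       # QPI speed after the change, in GT/s
    previous_memory_bytes: int   # installed memory before the change
    current_memory_bytes: int    # installed memory after the change
    immediate_power_cycle: bool  # powered OFF and immediately back ON


def matches_failure_criteria(state: SystemState) -> bool:
    """Return True only if every one of the four listed conditions is met."""
    qpi_lowered = (
        math.isclose(state.previous_qpi_gts, 6.4)
        and math.isclose(state.current_qpi_gts, 4.8)
    )
    memory_reduced = (
        state.previous_memory_bytes > 4 * GB
        and state.current_memory_bytes < 4 * GB
    )
    return (
        state.drive_temp_monitoring_active
        and qpi_lowered
        and memory_reduced
        and state.immediate_power_cycle
    )


if __name__ == "__main__":
    state = SystemState(True, 6.4, 4.8, 8 * GB, 2 * GB, True)
    print(matches_failure_criteria(state))
```

If any single field is changed so that its condition no longer holds (for example, the memory stays above 4 GB), the check returns False, which matches the "all of the following" wording above.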

The hard drive temperature monitoring mechanism uses system memory to share drive health information. The mechanism generates the message when the system memory size is reduced to less than 4 GB after the system has been running with more than 4 GB of system memory.

Solution: The iLO firmware needs to be upgraded to 2.00. As of now, there is no solution available for this.

As a workaround, we can remove the memory above 4 GB, and then install the hot-swap memory after the server has completed POST.

Also check the HP Advisory:
