Another ACPI anecdote, plus footnotes

Bill Sconce sconce at in-spec-inc.com
Sat Jan 22 18:45:01 EST 2005


We've had threads here previously about ACPI. [1] [2] [3]

I had an experience yesterday which I'll share.  Maybe it can save
time for someone.

    Problem:        hang during boot
    Workaround:     "acpi=off"

I was installing Debian (Ubuntu), kernel 2.6.8.1, on 16 brand-new
Dell GX280s. [4]

All went well initially.  Keyboard selection, partitioning, etc.,
no problem.  Until the reboot.  Then, after the message which says
"* Setting the System Clock using the Hardware Clock as reference..."
...nothing.  

Hung.  

One of those frustrating situations where there's no
error message.

I tried safe mode, disconnecting the network, etc.  To no avail.
I resorted to Googling [5] for "GX280" + the above message, and
found that evidently there's a bug in the way certain Dell
motherboards handle the real-time clock. [6] [7]

I added "acpi=off" to grub's kernel line and the problem went
away.  The rest of the installation ran smoothly and everything 
on the new system seems to work.  (I know there may be other
problems later! :)  But the hang is fixed.
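
For anyone repeating this on several machines, the edit can be
scripted.  What follows is only a minimal Python sketch, assuming GRUB
legacy's menu.lst format, where each boot entry has a line beginning
with "kernel".  The function name add_kernel_option and the sample
stanza (paths, root device) are made up for illustration; they are not
copied from the GX280s.

```python
def add_kernel_option(menu_text: str, option: str = "acpi=off") -> str:
    """Append `option` to every GRUB 'kernel' line that lacks it."""
    out = []
    for line in menu_text.splitlines():
        words = line.split()
        # Only touch active kernel lines; commented lines start with '#'.
        if words and words[0] == "kernel" and option not in words:
            line = line.rstrip() + " " + option
        out.append(line)
    result = "\n".join(out)
    # splitlines() drops a trailing newline, so put it back if there was one.
    if menu_text.endswith("\n"):
        result += "\n"
    return result


# Hypothetical menu.lst stanza, for illustration only.
sample = """title  Ubuntu, kernel 2.6.8.1
root   (hd0,0)
kernel /boot/vmlinuz-2.6.8.1 root=/dev/hda1 ro quiet
initrd /boot/initrd.img-2.6.8.1"""

print(add_kernel_option(sample))
```

Running it twice on the same text changes nothing the second time, so
it should be safe to re-run.  Back up the real file before writing
anything to it.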

Moral:

    "acpi=off" is your friend.

-Bill

P.S. What does Intel have to say?  That could be the topic for
another thread. See [8].



_____________________
[1] ACPI. From: Benjamin Scott  1 Sep 2004 00:26:24 -0400

    However, as Mark Komarinski notes, these days, there is
    plenty that happens in "firmware" that is not, technically
    speaking, part of the "BIOS".  Hardware initialization and
    setup code was tiny in the IBM-PC ROM, but is huge today.
    PCI enumeration.  PCI interrupt routing.  ACPI.  ISA PnP.  
    Microprocessor configuration.  Perhaps even FPGA or ASIC
    programming.  All of that is done by the firmware.

    Linux needs it to be done properly.  So if the firmware is
    buggy, Linux may break.

_____________________
[2] ACPI. From: Benjamin Scott  17 Jan 2005 21:58:56 -0500

    The make and model [...] matter because some (perhaps
    many) have different, non-standard, or just plain broken
    power management.
    
_____________________
[3] ACPI. From: Mark Komarinski  31 Aug 2004 09:06:56 -0400

    So I got myself a new IBM X40 from work a few weeks ago
    and under Linux suspend and resume would cause the display
    to just plain freak.  The various HOWTOs suggested that
    it was a result of ACPI not working properly, so I turned
    ACPI off and turned APM on.  No change.  Or rather, instead
    of a corrupted display, I got a REALLY corrupted display.
    One that restarting X would not cure - the only solution
    was to reboot.

_____________________
[4] GX280s. PC numbers stopped making sense long ago.  (Or is
    it just me?)
        Processor: 2.8 GHz
        Bus: 0.8 GHz
        RAM: 1GB
    For "ordinary" classroom workstations.  (!)

_____________________
[5] Googling.  Isn't it standard operating procedure to have
    to "Google around" at least one showstopper to get ANY new
    distribution to work?  :)
    
_____________________
[6] http://www.mail-archive.com/debian-boot@lists.debian.org/msg64063.html

    uname -a:  std debian 2.6.8 kernel [...]
    Machine: Dell Optiplex GX280
    Processor: PIV 2.8Ghz
    Memory: 512Mo
    Root Device: /dev/hda [...]

    Comments/Problems:
    
    The Dell optiplex GX280 are the new professional desktop 
    from Dell for the year to come. Those boxen are USB only 
    (no more ps2 ports) and the HD are sata drives with an IDE
    compatibility mode in BIOS.
    
    This installation was with the BIOS in native SATA mode
    and the linux26 flavour + USB keyboard.  The installation
    went OK until the first reboot where I get stuck in 
    rcS.d/S18hwclockfirst.sh. At this point I didn't have
    any keyboard (module loaded later) either. Pretty messy.
    It seems hwclock needs the rtc module to be loaded to
    do whatever it is expected to do. 

_____________________
[7] http://www.mail-archive.com/debian-boot@lists.debian.org/msg64398.html

    This exact problem is occurring when I install debian to
    my Dell PowerEdge SC420 which like that Optiplex GX280 
    is a new product from dell, only released about a month
    ago. Lucky for me the PowerEdge SC420 has a old fashion
    PS/2 keyboard so I could bypass the hwclock at boot
    by means of the ctrl+c.

    After the install I investigated and found that running
    'hwclock --show' for example would make that command stop,
    which means that it is unable to even read from the hardware
    clock through the rtc module. However when running
    'hwclock --show --directisa' hwclock functions properly.

    So the conclusion is thus that the rtc module doesn't
    support whatever dell has done on their new motherboards,
    right?
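
The fallback described in [7] could be scripted roughly as follows.
This is a sketch, not a tested fix: it assumes hwclock is on the PATH,
that the script runs as root, and that a stalled hwclock can still be
killed when the timeout expires.  The function name read_hwclock and
the 5-second timeout are arbitrary choices; only --show and --directisa
come from the report above.

```python
import subprocess


def read_hwclock(timeout_s: float = 5.0, run=subprocess.run) -> str:
    """Return hwclock's output, retrying with --directisa if the
    first attempt stalls or fails."""
    attempts = (
        ["hwclock", "--show"],                 # normal path, via the rtc driver
        ["hwclock", "--show", "--directisa"],  # direct ISA port access
    )
    for cmd in attempts:
        try:
            done = run(cmd, capture_output=True, text=True,
                       timeout=timeout_s, check=True)
            return done.stdout.strip()
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            continue  # this method stalled or failed; try the next one
    raise RuntimeError("hwclock did not answer by either method")
```

On an affected box, read_hwclock() should come back with the reading
from the second method; on a healthy one the first attempt answers and
--directisa is never tried.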

_____________________
[8] What Intel has to say.

    Found by Googling for "8208CA", and then reading Google's
    "cached" copy.  (This stuff isn't easy to find - the original
    has evidently been taken down:
    
    sunsite.rediris.es/sites/download.intel.nl/design/chipsets/specupdt/290739.htm
                    ^^ Estonia??)
    
    This document may or may not explain the 2.6.8.1 real-time clock
    problem.  But for those of you with a hardware manufacturing
    background, can you IMAGINE shipping a piece of equipment with
    design flaws such as these and requiring BIOS manufacturers
    to develop and test band-aids for them?  And then requiring OS
    developers to do the same work over again?

    Software timers in BIOS?  In the KERNEL??  Arrrgh.
    
    A chip with an arbitration deadlock which can lock up a bus,
    which they're not going to fix?
    
    And this stuff is used in ... avionics!  (At least by one
    manufacturer.  Not certified for the airlines, fortunately.)
    
    Quotes from Intel's document follow.
    
   "Problem:
    Under certain conditions, a CPU generated I/O read to RTC
    (Real Time Clock) registers 0-9 may return an incorrect
    value. The issue occurs on the read path from the RTC
    registers and the RTC value in the registers is not impacted.
    Should the certain conditions occur, one or more of the bits
    read from the RTC registers may be incorrect. The issue has
    only been found using a synthetic test and has not been seen
    using commercially available software.  [ <= N.B. ]
    
   "Implication:
    An operating system or software applications which 
    synchronizes the time/date value with the RTC registers may
    get an incorrect value.
    
   "Workaround:
    BIOS workaround is available in the Intel ICH3-S BIOS
    Specification Update Rev 2.01. The workaround does take into
    account multiple CPUs and Hyper Threading. The workaround uses
    timers to zero-in on the window where invalid RTC I/O read
    data could be returned. Using the software SMI, HPET timers 
    and port 71 traps, the BIOS will ensure that there are no
    accesses to the RTC during this timing window.

   "Status:
    There are no plans to fix this erratum."
    
    
   "Problem:
    A setup timing issue exists in the ICH3's impedance compensation
    circuit which can cause the USB buffers to shut off, leading to
    missed USB transmit packets.
    
   "Implication:
    This is likely to result in data loss or may cause data corruption.
    
   "Workaround:
    A BIOS workaround has been validated.
    
   "Status:
    There are no plans to fix this erratum."
    
    
   "Problem:
    An arbitration deadlock in the ICH3 may occur if IDE traffic
    is combined with heavy graphic traffic and internal/external
    PCI Bus Master traffic to memory.
    
   "Implication:
    This issue may lockup the IDE bus master causing a system hang.
    This issue was found during ongoing internal validation using 
    a synthetic test environment and there have been no failures
    reported by customers.
    
   "Workaround:
    BIOS needs to set configuration register (Device 31, Function
    1 0, offset FCh, bit 23) to prevent the arbitration deadlock.
    
   "Status:
    There are no plans to fix this erratum."
    
    
Best of all may be this notice, which the document carries on
page 1:
    
   "Notice: The Intel I/O Controller Hub 3 (ICH3-S) product
    may contain design defects or errors known as errata
    which may cause the product to deviate from published
    specifications. Current characterized errata are
    documented in this specification update."
    
Yeah, that's the ticket.  Call a design defect an "erratum"
and amend the specification to match the bugs.

Kernel hackers really are gods - how can they write a driver when the
hardware specs are this rubbery?  How does ANYONE test a driver for
correct handling of a "setup timing issue" in impedance compensation?


