Another ACPI anecdote, plus footnotes
Bill Sconce
sconce at in-spec-inc.com
Sat Jan 22 18:45:01 EST 2005
We've had threads here previously about ACPI. [1] [2] [3]
I had an experience yesterday which I'll share. Maybe it can save
time for someone.
Problem: hang during boot
Workaround: "acpi=off"
I was installing Ubuntu (Debian-based, kernel 2.6.8.1) on 16 brand-new
Dell GX280s. [4]
All went well initially. Keyboard selection, partitioning, etc.,
no problem. Until the reboot. Then, after the message which says
"* Setting the System Clock using the Hardware Clock as reference..."
...nothing.
Hung.
One of those frustrating situations where there's no
error message.
I tried safe mode, disconnecting the network, etc. To no avail.
I resorted to Googling [5] for "GX280" + the above message, and
found that evidently there's a bug in the way certain Dell
motherboards handle the real-time clock. [6] [7]
I added "acpi=off" to grub's kernel line and the problem went
away. The rest of the installation ran smoothly and everything
on the new system seems to work. (I know there may be other
problems later! :) But the hang is fixed.
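For anyone who has to repeat this on a roomful of machines, here is a
rough sketch of the same edit done by script rather than by hand. It
assumes a GRUB-legacy style menu.lst in which boot entries have a line
beginning with "kernel". The function name add_kernel_param and the
sample stanza (paths, root device) are made up for illustration, not
copied from the GX280s.

```python
def add_kernel_param(menu_lst: str, param: str = "acpi=off") -> str:
    """Append `param` to every GRUB-legacy 'kernel' line that lacks it."""
    out = []
    for line in menu_lst.splitlines():
        words = line.split()
        # Only touch active 'kernel' entries; commented lines start with '#'.
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out)


# Hypothetical menu.lst stanza; the paths and root device are made up.
sample = """title  Ubuntu, kernel 2.6.8.1
root   (hd0,0)
kernel /boot/vmlinuz-2.6.8.1 root=/dev/hda1 ro quiet
initrd /boot/initrd.img-2.6.8.1"""

print(add_kernel_param(sample))
```

The function is safe to run twice: a kernel line that already carries
the parameter is left alone.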
Moral:
"acpi=off" is your friend.
-Bill
P.S. What does Intel have to say? That could be the topic for
another thread. See [8].
_____________________
[1] ACPI. From: Benjamin Scott 1 Sep 2004 00:26:24 -0400
However, as Mark Komarinski notes, these days, there is
plenty that happens in "firmware" that is not, technically
speaking, part of the "BIOS". Hardware initialization and
setup code was tiny in the IBM-PC ROM, but is huge today.
PCI enumeration. PCI interrupt routing. ACPI. ISA PnP.
Microprocessor configuration. Perhaps even FPGA or ASIC
programming. All of that is done by the firmware.
Linux needs it to be done properly. So if the firmware is
buggy, Linux may break.
_____________________
[2] ACPI. From: Benjamin Scott 17 Jan 2005 21:58:56 -0500
The make and model [...] matter because some (perhaps
many) have different, non-standard, or just plain broken
power management.
_____________________
[3] ACPI. From: Mark Komarinski 31 Aug 2004 09:06:56 -0400
So I got myself a new IBM X40 from work a few weeks ago
and under Linux suspend and resume would cause the display
to just plain freak. The various HOWTOs suggested that
it was a result of ACPI not working properly, so I turned
ACPI off and turned APM on. No change. Or rather, instead
of a corrupted display, I got a REALLY corrupted display.
One that restarting X would not cure - the only solution
was to reboot.
_____________________
[4] GX280s. PC numbers stopped making sense long ago. (Or is
it just me?)
Processor: 2.8 GHz
Bus: 0.8 GHz
RAM: 1GB
For "ordinary" classroom workstations. (!)
_____________________
[5] Googling. Isn't it standard operating procedure to have
to "Google around" at least one showstopper to get ANY new
distribution to work? :)
_____________________
[6] http://www.mail-archive.com/debian-boot@lists.debian.org/msg64063.html
uname -a: std debian 2.6.8 kernel [...]
Machine: Dell Optiplex GX280
Processor: PIV 2.8Ghz
Memory: 512Mo
Root Device: /dev/hda [...]
Comments/Problems:
The Dell optiplex GX280 are the new professional desktop
from Dell for the year to come. Those boxen are USB only
(no more ps2 ports) and the HD are sata drives with an IDE
compatibility mode in BIOS.
This installation was with the BIOS in native SATA mode
and the linux26 flavour + USB keyboard. The installation
went OK until the first reboot where I get stuck in
rcS.d/S18hwclockfirst.sh. At this point I didn't have
any keyboard (module loaded later) either. Pretty messy.
It seems hwclock needs the rtc module to be loaded to
do whatever it is expected to do.
_____________________
[7] http://www.mail-archive.com/debian-boot@lists.debian.org/msg64398.html
This exact problem is occurring when I install debian to
my Dell PowerEdge SC420 which like that Optiplex GX280
is a new product from dell, only released about a month
ago. Lucky for me the PowerEdge SC420 has a old fashion
PS/2 keyboard so I could bypass the hwclock at boot
by means of the ctrl+c.
After the install I investigated and found that running
'hwclock --show' for example would make that command stop,
which means that it is unable to even read from the hardware
clock through the rtc module. However when running
'hwclock --show --directisa' hwclock functions properly.
So the conclusion is thus that the rtc module doesn't
support whatever dell has done on their new motherboards,
right?
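(A sketch of the check described in this message, not part of the
quoted post: try 'hwclock --show' with a time limit, and if it stalls
or fails, retry with '--directisa'. The function name read_hwclock, the
5-second limit, and the injectable "run" argument are illustrative
choices. It assumes hwclock is on the PATH, that it is run with enough
privilege to read the clock, and that a stalled hwclock can still be
killed, as the ctrl+c report above suggests.)

```python
import subprocess


def read_hwclock(timeout_s=5.0, run=subprocess.run):
    """Return hwclock's output, retrying with --directisa if the first try hangs or fails."""
    for extra in ([], ["--directisa"]):
        cmd = ["hwclock", "--show"] + extra
        try:
            result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Treated as the hang described above; try the next variant.
            continue
        if result.returncode == 0:
            return result.stdout.strip()
    # Neither variant produced a reading.
    return None
```

If the plain call times out but the --directisa call returns a time,
that matches the symptom reported for the rtc module.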
_____________________
[8] What Intel has to say.
Found by Googling for "8208CA", and then reading Google's
"cached" copy. (This stuff isn't easy to find - the original
has evidently been taken down:
sunsite.rediris.es/sites/download.intel.nl/design/chipsets/specupdt/290739.htm
^^ Spain?? (.es))
This document may or may not explain the 2.6.8.1 real-time clock
problem. But for those of you with a hardware manufacturing
background, can you IMAGINE shipping a piece of equipment with
design flaws such as these and requiring BIOS manufacturers
to develop and test band-aids for them? And then requiring OS
developers to do the same work over again?
Software timers in BIOS? In the KERNEL?? Arrrgh.
A chip with an arbitration deadlock which can lock up a bus,
which they're not going to fix?
And this stuff is used in ... avionics! (At least by one
manufacturer. Not certified for the airlines, fortunately.)
Quotes from Intel's document follow.
"Problem:
Under certain conditions, a CPU generated I/O read to RTC
(Real Time Clock) registers 0-9 may return an incorrect
value. The issue occurs on the read path from the RTC
registers and the RTC value in the registers is not impacted.
Should the certain conditions occur, one or more of the bits
read from the RTC registers may be incorrect. The issue has
only been found using a synthetic test and has not been seen
using commercially available software. [ <= N.B. ]
"Implication:
An operating system or software applications which
synchronizes the time/date value with the RTC registers may
get an incorrect value.
"Workaround:
BIOS workaround is available in the Intel ICH3-S BIOS
Specification Update Rev 2.01. The workaround does take into
account multiple CPUs and Hyper Threading. The workaround uses
timers to zero-in on the window where invalid RTC I/O read
data could be returned. Using the software SMI, HPET timers
and port 71 traps, the BIOS will ensure that there are no
accesses to the RTC during this timing window.
"Status:
There are no plans to fix this erratum."
"Problem:
A setup timing issue exists in the ICH3's impedance compensation
circuit which can cause the USB buffers to shut off, leading to
missed USB transmit packets.
"Implication:
This is likely to result in data loss or may cause data corruption.
"Workaround:
A BIOS workaround has been validated.
"Status:
There are no plans to fix this erratum."
"Problem:
An arbitration deadlock in the ICH3 may occur if IDE traffic
is combined with heavy graphic traffic and internal/external
PCI Bus Master traffic to memory.
"Implication:
This issue may lockup the IDE bus master causing a system hang.
This issue was found during ongoing internal validation using
a synthetic test environment and there have been no failures
reported by customers.
"Workaround:
BIOS needs to set configuration register (Device 31, Function
1 0, offset FCh, bit 23) to prevent the arbitration deadlock.
"Status:
There are no plans to fix this erratum."
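To make that last workaround concrete: "set bit 23" is a
read-modify-write on whatever value the register at offset FCh
currently holds. The sketch below shows only the arithmetic. It does
not touch PCI configuration space, it assumes a 32-bit register (the
quote does not state the width), and the names are illustrative.

```python
DEADLOCK_FIX_BIT = 23  # bit number quoted in the erratum


def set_bit(reg_value: int, bit: int = DEADLOCK_FIX_BIT) -> int:
    """Return reg_value with the given bit set, leaving the other bits alone."""
    # Mask to 32 bits; the register width is an assumption, not from the quote.
    return (reg_value | (1 << bit)) & 0xFFFFFFFF


print(hex(set_bit(0x00000000)))  # 0x800000
```

A real BIOS would read the current value, OR in the bit, and write the
result back, so the other bits of the register are preserved.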
The document contains this notice, which may be the best part,
on Page 1.
"Notice: The Intel I/O Controller Hub 3 (ICH3-S) product
may contain design defects or errors known as errata
which may cause the product to deviate from published
specifications. Current characterized errata are
documented in this specification update."
Yeah, that's the ticket. Call a design defect an "erratum"
and amend the specification to match the bugs.
Kernel hackers really are gods - how can they write a driver when the
hardware specs are this rubbery? How does ANYONE test a driver for
correct handling of a "setup timing issue" in impedance compensation?