[Bugme-new] [Bug 13616] New: IOAPIC -> kernel: BUG: soft lockup - CPU#1 stuck for 61s!

bugzilla-daemon at bugzilla.kernel.org
Wed Jun 24 21:36:00 PDT 2009


http://bugzilla.kernel.org/show_bug.cgi?id=13616

           Summary: IOAPIC -> kernel: BUG: soft lockup - CPU#1 stuck for
                    61s!
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 2.6.27.24-170.2.68.fc10.x86_64
          Platform: All
        OS/Version: Linux
              Tree: Fedora
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: io_other at kernel-bugs.osdl.org
        ReportedBy: faxguy at howardsilvan.com
        Regression: No


This Fedora 10 system (mostly operating as a mail server) had been running in
production without any problem for possibly two weeks when it locked up with
repeated messages of "BUG: soft lockup - CPU#1 stuck for 61s!"  The full
message of the first instance is attached in "messages.txt".

As I couldn't reboot remotely, I had to drive to the datacenter and reset the
system.  It then ran fine for another two days until today, when it locked up
again.  I reset it again and this time updated the kernel to
2.6.27.25-170.2.72.fc10.x86_64.

No more than 20 minutes later it locked up again and I had to return to the
datacenter, and then once more about 20 minutes after that.  Each time I had
to reset the system as the console was unresponsive.

Ultimately I added "noapic" to the kernel boot parameters, and I haven't had an
issue now for three hours.  (I'm still running 2.6.27.25-170.2.72.fc10.x86_64.)
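
To confirm that the running kernel actually booted with "noapic", a small
read-only check along these lines could be used.  This is only a sketch: it
assumes /proc/cmdline is readable on the system, and the helper names
has_boot_param and running_with_param are hypothetical, not part of any
existing tool.

```python
from pathlib import Path


def has_boot_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token in a kernel command line."""
    # Kernel parameters are whitespace-separated.  Matching whole tokens
    # avoids mistaking a longer parameter that merely starts with "noapic"
    # for "noapic" itself.
    return param in cmdline.split()


def running_with_param(param: str, path: str = "/proc/cmdline") -> bool:
    """Check the command line the running kernel was booted with (read-only)."""
    # Assumption: `path` is the standard procfs location of the boot command line.
    return has_boot_param(Path(path).read_text(), param)


# Example use on the affected machine (not executed here):
#     print("noapic active:", running_with_param("noapic"))
```

The check only reads a procfs file, so it should be safe to run on a
production system.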

It would be purely speculative for me to guess as to why it ran fine for two
weeks and then for two days but ultimately could not last an hour.  I suppose
it's possible that our mail traffic has increased (it probably has, as it gets
closer to the end of the month).  I suppose it's possible that changes in
2.6.27.25-170.2.72.fc10.x86_64 aggravated the problem.  And, I suppose that
only three hours of uptime isn't completely conclusive about the "noapic"
resolution.

That said, based on other reports I've seen (which led me to test "noapic"), it
genuinely "feels" like this is an IOAPIC problem.

What information can I get you from the system?  Since the system is in
production use, I would prefer the least intrusive way of obtaining it.

Thanks.


