Discussion:
[PATCH] PCI: Add Broadcom 4331 reset quirk to prevent IRQ storm
Lukas Wunner
2016-03-29 17:41:30 UTC
Permalink
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.

The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.

Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Tested-by: Lukas Wunner <***@wunner.de> [MacBookPro9,1]
Signed-off-by: Lukas Wunner <***@wunner.de>
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 27 +++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index eda0909..f642c42 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>

-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..d4fb5ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"

@@ -3282,6 +3284,31 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
quirk_apple_wait_for_thunderbolt);
#endif

+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_BROADCOM, 0x4331, quirk_apple_b43_reset);
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
diff --git a/include/linux/bcma/bcma.h b/include/linux/bcma/bcma.h
index 0367c63..5c37b58 100644
--- a/include/linux/bcma/bcma.h
+++ b/include/linux/bcma/bcma.h
@@ -158,6 +158,7 @@ struct bcma_host_ops {
#define BCMA_CORE_DEFAULT 0xFFF

#define BCMA_MAX_NR_CORES 16
+#define BCMA_CORE_SIZE 0x1000

/* Chip IDs of PCIe devices */
#define BCMA_CHIP_ID_BCM4313 0x4313
--
2.8.0.rc3
Lukas Wunner
2016-03-29 17:46:50 UTC
Permalink
Dear bcma maintainers,

I'm moving BCMA_CORE_SIZE from drivers/bcma/bcma_private.h to
include/linux/bcma/bcma.h in the patch included below to be
able to use it in drivers/pci/quirks.c.

If you are okay with this change and have no other objections then
please kindly ack the patch for merging via the pci tree.

If you object against this change then I could alternatively redefine
BCMA_CORE_SIZE in drivers/pci/quirks.c. Other pci quirks are redefining
constants there as well. However it's obviously nicer to have a single
definition in one place.

Thanks,

Lukas
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 27 +++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index eda0909..f642c42 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>
-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..d4fb5ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"
@@ -3282,6 +3284,31 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
quirk_apple_wait_for_thunderbolt);
#endif
+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_BROADCOM, 0x4331, quirk_apple_b43_reset);
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
diff --git a/include/linux/bcma/bcma.h b/include/linux/bcma/bcma.h
index 0367c63..5c37b58 100644
--- a/include/linux/bcma/bcma.h
+++ b/include/linux/bcma/bcma.h
@@ -158,6 +158,7 @@ struct bcma_host_ops {
#define BCMA_CORE_DEFAULT 0xFFF
#define BCMA_MAX_NR_CORES 16
+#define BCMA_CORE_SIZE 0x1000
/* Chip IDs of PCIe devices */
#define BCMA_CHIP_ID_BCM4313 0x4313
--
2.8.0.rc3
Rafał Miłecki
2016-03-31 18:51:48 UTC
Permalink
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Acked-by: Rafał Miłecki <***@gmail.com>

For bcma part (I'm totally OK with moving BCMA_CORE_SIZE).
Michael Büsch
2016-03-31 19:09:36 UTC
Permalink
On Tue, 29 Mar 2016 19:41:30 +0200
Post by Lukas Wunner
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_BROADCOM, 0x4331, quirk_apple_b43_reset);
Looks good, as long as the wireless core is always guaranteed to be
the second core on that PCI device.
--
Michael
Chris Bainbridge
2016-03-31 23:13:46 UTC
Permalink
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Should also fix https://bugzilla.kernel.org/show_bug.cgi?id=111781 ?
Given that this is a serious bug that can corrupt filesystems it would
be good to see the fix in stable too.
Michael Büsch
2016-04-01 04:59:10 UTC
Permalink
On Fri, 1 Apr 2016 00:13:46 +0100
Post by Chris Bainbridge
Should also fix https://bugzilla.kernel.org/show_bug.cgi?id=111781 ?
Given that this is a serious bug that can corrupt filesystems it would
be good to see the fix in stable too.
Wow, this is horrible.

This patch should help a bit. But it's probably hard to workaround that
DMA issue completely. There still is a chance of corruption, as soon as
the kernel allocates and uses some memory that the DMA controller still
uses from boot-up before the fixup is applied.
--
Michael
Lukas Wunner
2016-04-01 22:46:46 UTC
Permalink
Hi Chris,
Post by Chris Bainbridge
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Should also fix https://bugzilla.kernel.org/show_bug.cgi?id=111781 ?
Given that this is a serious bug that can corrupt filesystems it would
be good to see the fix in stable too.
I cannot reproduce this particular issue on my MBP9,1 even though it
is architecturally very similar to your MBP10,2. I tested it with
"iommu=force intel_iommu=on", blacklisted b43 and stressed the machine
a bit with kernel compiles. No issues.

Have you tested my patch? As Michael Büsch has pointed out correctly,
it should mitigate that issue but will not solve it completely.
You write in the bugzilla entry that you've seen the issue as early
as 0.7 seconds into the boot. That's not a very precise method to
measure boot progress but on my machine the card is reset around 1.2
seconds, so likely too late. It's possible to execute the PCI quirk
slightly earlier (in the header fixup section rather than the late
fixup section) but that wouldn't make a significant difference.

If you use the method described in Matthew Garrett's blog post,
i.e. boot with "memmap=0xc0$0x1568a0", does the issue go away?
If so, we could probably add a quirk in arch/x86/kernel/e820.c
to mark that particular area reserved, or generally mark all
EFI boot services areas as reserved on Macs. This memory could
then be marked usable again once the card has been reset.

As to why the issue doesn't occur on my machine, my guess is that
the wireless card only writes packets to memory if it is associated
with a base station. Do you have an open WLAN around? Try moving
out of its reach and test if the issue goes away. Alternatively,
did you enter Wifi credentials in OS X? If so it has saved the
credentials to EFI variables. To verify if it has, use this on OS X:

/usr/sbin/nvram 36C28AB5-6566-4C50-9EBD-CBB920F83843:current-network
/usr/sbin/nvram 36C28AB5-6566-4C50-9EBD-CBB920F83843:preferred-networks
/usr/sbin/nvram 36C28AB5-6566-4C50-9EBD-CBB920F83843:preferred-count

On Linux, invoke "mount -t efivarfs none /sys/firmware/efi/efivars"
and delete the variables (use chattr -i first if you're on Linux 4.5+).
Test if the issue goes away.

I have neither an open WLAN nearby nor Wifi credentials in my EFI
variables, so the wireless card cannot associate with a base station
in the pre-OS phase and that's probably the reason why the issue doesn't
occur on my machine.

Best regards,

Lukas
Michael Büsch
2016-04-02 07:30:29 UTC
Permalink
On Sat, 2 Apr 2016 00:46:46 +0200
Post by Lukas Wunner
Hi Chris,
Post by Chris Bainbridge
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Should also fix https://bugzilla.kernel.org/show_bug.cgi?id=111781 ?
Given that this is a serious bug that can corrupt filesystems it would
be good to see the fix in stable too.
I cannot reproduce this particular issue on my MBP9,1 even though it
is architecturally very similar to your MBP10,2. I tested it with
"iommu=force intel_iommu=on", blacklisted b43 and stressed the machine
a bit with kernel compiles. No issues.
I think you will have to stress the wireless, not the kernel.
Enable iommu and let the wireless receive packets that go through the
hw filters. If you have a calm network and nobody sends data to you,
the card won't write anything to DMA. (This all depends on how the
firmware configured the filters).
--
Michael
Andrew Worsley
2016-04-02 11:40:41 UTC
Permalink
Perhaps it can be hardware other than the Broadcom that causes this problem?

I do see an irq 17 problem on my macbook, but I thought grub is
supposed to stop the boardcom wireless?
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Thank-you for drawing my attention to this issue - I didn't know about
this issue and that " IRQ 17 nobody cared" isue has been annoying a
lot.
I have a MacBook10,1 :
dmesg | grep MacBook
DMI: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B00.1205101839 05/10/2012

Investigating grub2 git://git.savannah.gnu.org/grub.git I see this
patch rev 9d34bb8 which says it disables Broadcom wireless hardware
on Apples:

* commit 9d34bb8
| Author: Matthew Garrett <***@redhat.com>
| Date: Thu May 3 17:26:55 2012 +0200
|
| Suspend broadcom cards in order to stop their DMA.
|
| * grub-core/Makefile.am (KERNEL_HEADER_FILES): Add pci.h on x86 EFI.
| * grub-core/Makefile.core.def (kernel): Add pci.c on x86 EFI.
| (pci): Don't build on x86 EFI.
| * grub-core/bus/pci.c (grub_pci_find_capability): New function.
| * grub-core/kern/efi/mm.c (stop_broadcom) [__i386__ || __x86_64__]:
| New function.
| (grub_efi_finish_boot_services) [__i386__ || __x86_64__]: Call
| stop_broadcom if running on EFI.
| * include/grub/pci.h (GRUB_PCI_CLASS_NETWORK): New enum value.
| (GRUB_PCI_CAP_POWER_MANAGEMENT): Likewise.
| (GRUB_PCI_VENDOR_BROADCOM): Likewise.
| (grub_pci_find_capability): New proto.
|
| Also-By: Vladimir Serbinenko <***@gmail.com>
|
| M ChangeLog
| M grub-core/Makefile.am
| M grub-core/Makefile.core.def
| M grub-core/bus/pci.c
| M grub-core/kern/efi/mm.c
| M include/grub/pci.h

But I run debian grub2-common 2.02~beta2-22+deb8u1 which has this fix
and I *still* get this irq issue

....
[ 608.242849] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88008a19df48
[ 608.242851] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88008a19df90
[ 608.254975] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 608.254979] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O
3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u4
[ 608.254981] Hardware name: Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B00.1205101839 05/10/2012
[ 608.254985] ffff88045a85bac4 ffffffff8150dcff ffff88045a85ba00
ffffffff810bd3ad
[ 608.254987] ffff88045a85ba00 0000000000000011 0000000000000000
ffffffff810bd8d1
[ 608.254989] 0000000000000000 0000000000000000 0000000000000011
0000000000000000
[ 608.254990] Call Trace:
[ 608.254999] <IRQ> [<ffffffff8150dcff>] ? dump_stack+0x41/0x51
[ 608.255006] [<ffffffff810bd3ad>] ? __report_bad_irq+0x2d/0xc0
[ 608.255010] [<ffffffff810bd8d1>] ? note_interrupt+0x241/0x290
[ 608.255013] [<ffffffff810bb0f1>] ? handle_irq_event_percpu+0xa1/0x190
[ 608.255017] [<ffffffff810bb218>] ? handle_irq_event+0x38/0x60
[ 608.255020] [<ffffffff810be413>] ? handle_fasteoi_irq+0x83/0x150
[ 608.255025] [<ffffffff810150cd>] ? handle_irq+0x1d/0x30
[ 608.255029] [<ffffffff81516cd9>] ? do_IRQ+0x49/0xe0
[ 608.255033] [<ffffffff81514b2d>] ? common_interrupt+0x6d/0x6d
[ 608.255037] <EOI> [<ffffffff8108aded>] ?
__hrtimer_start_range_ns+0x1cd/0x390
[ 608.255041] [<ffffffff813dfd42>] ? cpuidle_enter_state+0x52/0xc0
[ 608.255044] [<ffffffff813dfd38>] ? cpuidle_enter_state+0x48/0xc0
[ 608.255047] [<ffffffff810a80d8>] ? cpu_startup_entry+0x2f8/0x400
[ 608.255051] [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
[ 608.255054] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[ 608.255057] [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
[ 608.255060] [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
[ 608.255061] handlers:
[ 608.255088] [<ffffffffa0645730>] azx_interrupt [snd_hda_controller]
[ 608.255089] Disabling IRQ #17
[ 608.354706] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
...
Lukas Wunner
2016-04-03 11:49:12 UTC
Permalink
Hi Andrew,
Post by Andrew Worsley
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
I do see an irq 17 problem on my macbook, but I thought grub is
supposed to stop the boardcom wireless?
Investigating grub2 git://git.savannah.gnu.org/grub.git I see this
patch rev 9d34bb8 which says it disables Broadcom wireless hardware
Thanks for the pointer to the grub2 commit, I wasn't aware of that.

The commit puts the wireless card in power state D3hot but that doesn't
stop it from sending interrupts. I have just tested that. So it's
perfectly plausible that you're still seeing spurious interrupts
despite using grub. Please test the patch I've posted, the spurious
interrupts should disappear. If you "cat /proc/interrupts", you should
then only see a few hundred interrupts on IRQ 17. Without the patch it
should be in the 100000+ range.

Best regards,

Lukas
Post by Andrew Worsley
* commit 9d34bb8
| Date: Thu May 3 17:26:55 2012 +0200
|
| Suspend broadcom cards in order to stop their DMA.
|
| * grub-core/Makefile.am (KERNEL_HEADER_FILES): Add pci.h on x86 EFI.
| * grub-core/Makefile.core.def (kernel): Add pci.c on x86 EFI.
| (pci): Don't build on x86 EFI.
| * grub-core/bus/pci.c (grub_pci_find_capability): New function.
| New function.
| (grub_efi_finish_boot_services) [__i386__ || __x86_64__]: Call
| stop_broadcom if running on EFI.
| * include/grub/pci.h (GRUB_PCI_CLASS_NETWORK): New enum value.
| (GRUB_PCI_CAP_POWER_MANAGEMENT): Likewise.
| (GRUB_PCI_VENDOR_BROADCOM): Likewise.
| (grub_pci_find_capability): New proto.
|
|
| M ChangeLog
| M grub-core/Makefile.am
| M grub-core/Makefile.core.def
| M grub-core/bus/pci.c
| M grub-core/kern/efi/mm.c
| M include/grub/pci.h
But I run debian grub2-common 2.02~beta2-22+deb8u1 which has this fix
and I *still* get this irq issue
....
[ 608.242849] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88008a19df48
[ 608.242851] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88008a19df90
[ 608.254975] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 608.254979] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O
3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u4
[ 608.254981] Hardware name: Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B00.1205101839 05/10/2012
[ 608.254985] ffff88045a85bac4 ffffffff8150dcff ffff88045a85ba00
ffffffff810bd3ad
[ 608.254987] ffff88045a85ba00 0000000000000011 0000000000000000
ffffffff810bd8d1
[ 608.254989] 0000000000000000 0000000000000000 0000000000000011
0000000000000000
[ 608.254999] <IRQ> [<ffffffff8150dcff>] ? dump_stack+0x41/0x51
[ 608.255006] [<ffffffff810bd3ad>] ? __report_bad_irq+0x2d/0xc0
[ 608.255010] [<ffffffff810bd8d1>] ? note_interrupt+0x241/0x290
[ 608.255013] [<ffffffff810bb0f1>] ? handle_irq_event_percpu+0xa1/0x190
[ 608.255017] [<ffffffff810bb218>] ? handle_irq_event+0x38/0x60
[ 608.255020] [<ffffffff810be413>] ? handle_fasteoi_irq+0x83/0x150
[ 608.255025] [<ffffffff810150cd>] ? handle_irq+0x1d/0x30
[ 608.255029] [<ffffffff81516cd9>] ? do_IRQ+0x49/0xe0
[ 608.255033] [<ffffffff81514b2d>] ? common_interrupt+0x6d/0x6d
[ 608.255037] <EOI> [<ffffffff8108aded>] ?
__hrtimer_start_range_ns+0x1cd/0x390
[ 608.255041] [<ffffffff813dfd42>] ? cpuidle_enter_state+0x52/0xc0
[ 608.255044] [<ffffffff813dfd38>] ? cpuidle_enter_state+0x48/0xc0
[ 608.255047] [<ffffffff810a80d8>] ? cpu_startup_entry+0x2f8/0x400
[ 608.255051] [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
[ 608.255054] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[ 608.255057] [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
[ 608.255060] [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
[ 608.255088] [<ffffffffa0645730>] azx_interrupt [snd_hda_controller]
[ 608.255089] Disabling IRQ #17
[ 608.354706] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
...
Andrew Worsley
2016-04-07 12:04:05 UTC
Permalink
Sorry but testing the patch shows no difference.

I have just compiled debian jessie kernel 3.16.7-ckt25 and booted it
and hibernated it twice, then did the same with your patch applied.
There appeared to be no difference

On first boot I didn't get the nobody card disabling problem but
after each hibernate I got the problem. But I did get 51130 IRQ 17
interrupts on the first boot but after the hibernate restore each
time I got 100000 extra interrupts in /proc/interrupts and the irq 17:
nobody cared message.

I could not see any difference with or with out the patch.
I boot with grub-efi using the linux/initrd commands

So perhaps the hibernate-restore needs the fix?

Andrew
Post by Lukas Wunner
Hi Andrew,
Post by Andrew Worsley
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
I do see an irq 17 problem on my macbook, but I thought grub is
supposed to stop the boardcom wireless?
Investigating grub2 git://git.savannah.gnu.org/grub.git I see this
patch rev 9d34bb8 which says it disables Broadcom wireless hardware
Thanks for the pointer to the grub2 commit, I wasn't aware of that.
The commit puts the wireless card in power state D3hot but that doesn't
stop it from sending interrupts. I have just tested that. So it's
perfectly plausible that you're still seeing spurious interrupts
despite using grub. Please test the patch I've posted, the spurious
interrupts should disappear. If you "cat /proc/interrupts", you should
then only see a few hundred interrupts on IRQ 17. Without the patch it
should be in the 100000+ range.
Best regards,
Lukas
Post by Andrew Worsley
* commit 9d34bb8
| Date: Thu May 3 17:26:55 2012 +0200
|
| Suspend broadcom cards in order to stop their DMA.
|
| * grub-core/Makefile.am (KERNEL_HEADER_FILES): Add pci.h on x86 EFI.
| * grub-core/Makefile.core.def (kernel): Add pci.c on x86 EFI.
| (pci): Don't build on x86 EFI.
| * grub-core/bus/pci.c (grub_pci_find_capability): New function.
| New function.
| (grub_efi_finish_boot_services) [__i386__ || __x86_64__]: Call
| stop_broadcom if running on EFI.
| * include/grub/pci.h (GRUB_PCI_CLASS_NETWORK): New enum value.
| (GRUB_PCI_CAP_POWER_MANAGEMENT): Likewise.
| (GRUB_PCI_VENDOR_BROADCOM): Likewise.
| (grub_pci_find_capability): New proto.
|
|
| M ChangeLog
| M grub-core/Makefile.am
| M grub-core/Makefile.core.def
| M grub-core/bus/pci.c
| M grub-core/kern/efi/mm.c
| M include/grub/pci.h
But I run debian grub2-common 2.02~beta2-22+deb8u1 which has this fix
and I *still* get this irq issue
....
[ 608.242849] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88008a19df48
[ 608.242851] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88008a19df90
[ 608.254975] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 608.254979] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O
3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u4
[ 608.254981] Hardware name: Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B00.1205101839 05/10/2012
[ 608.254985] ffff88045a85bac4 ffffffff8150dcff ffff88045a85ba00
ffffffff810bd3ad
[ 608.254987] ffff88045a85ba00 0000000000000011 0000000000000000
ffffffff810bd8d1
[ 608.254989] 0000000000000000 0000000000000000 0000000000000011
0000000000000000
[ 608.254999] <IRQ> [<ffffffff8150dcff>] ? dump_stack+0x41/0x51
[ 608.255006] [<ffffffff810bd3ad>] ? __report_bad_irq+0x2d/0xc0
[ 608.255010] [<ffffffff810bd8d1>] ? note_interrupt+0x241/0x290
[ 608.255013] [<ffffffff810bb0f1>] ? handle_irq_event_percpu+0xa1/0x190
[ 608.255017] [<ffffffff810bb218>] ? handle_irq_event+0x38/0x60
[ 608.255020] [<ffffffff810be413>] ? handle_fasteoi_irq+0x83/0x150
[ 608.255025] [<ffffffff810150cd>] ? handle_irq+0x1d/0x30
[ 608.255029] [<ffffffff81516cd9>] ? do_IRQ+0x49/0xe0
[ 608.255033] [<ffffffff81514b2d>] ? common_interrupt+0x6d/0x6d
[ 608.255037] <EOI> [<ffffffff8108aded>] ?
__hrtimer_start_range_ns+0x1cd/0x390
[ 608.255041] [<ffffffff813dfd42>] ? cpuidle_enter_state+0x52/0xc0
[ 608.255044] [<ffffffff813dfd38>] ? cpuidle_enter_state+0x48/0xc0
[ 608.255047] [<ffffffff810a80d8>] ? cpu_startup_entry+0x2f8/0x400
[ 608.255051] [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
[ 608.255054] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[ 608.255057] [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
[ 608.255060] [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
[ 608.255088] [<ffffffffa0645730>] azx_interrupt [snd_hda_controller]
[ 608.255089] Disabling IRQ #17
[ 608.354706] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
...
Andrew Worsley
2016-04-10 10:09:29 UTC
Permalink
Further testing Broadcom 4331 reset quirk to prevent IRQ storm patch
testing reveals that:
1. quirk is run on initial boot up and this time appears to have
vastly reduced the interrupts (only 81 this time):
cat /proc/interrupts| grep 17
17: 81 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel

2. But it is apparently *NOT* run after a suspend/resume and we get
the problem:
17: 100084 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel

Rebooting a further nine times shows the low number (below 100) only
happens around 1/3 of the times:
boot #2
17: 38706 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot #3
17: 87 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
LOC: 2494 2031 2094 1831 1157 1171
1573 1271 Local timer interrupts
boot #4
17: 50616 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot#5
17: 26454 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot#6
17: 34440 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot#7
17: 79 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot#8
17: 84 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot#9
17: 37054 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
boot#10
17: 24648 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel

Is there an easy setpci command to stop this we can add to grub?

Presently I have a grub work around for black screen as described here:
http://askubuntu.com/questions/264247/proprietary-nvidia-drivers-with-efi-on-mac-to-prevent-overheating/613573#613573

which basically involves adding a grub scriptlet to enable PCI-E bus
mastering on graphics cards:

In /etc/grub.d/01_enable_vga.conf:

setpci -s "00:01.0" 3e.b=8
setpci -s "01:00.0" 04.b=7

Can we do some similar magic setpci commands to disable 04:00.0
which is my BCM4331

lspci | grep 4331
04:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)
Post by Andrew Worsley
Sorry but testing the patch shows no difference.
I have just compiled debian jessie kernel 3.16.7-ckt25 and booted it
and hibernated it twice, then did the same with your patch applied.
There appeared to be no difference
....

Thanks for any suggestions

Andrew
Lukas Wunner
2016-04-12 18:32:02 UTC
Permalink
Hi Andrew,

thank you for the extensive testing.
Post by Andrew Worsley
Further testing Broadcom 4331 reset quirk to prevent IRQ storm patch
1. quirk is run on initial boot up and this time appears to have
cat /proc/interrupts| grep 17
17: 81 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
Something in the ballpark of 81 interrupt requests is fine.

The kernel will print the error message about spurious interrupts and
switch to polling at 100000 requests. But even 20000 is way too much.
This just means that b43 loaded quickly enough to stop the interrupts
before the kernel limit of 100000 was reached, but the wireless card
wasn't reset early on as it should have been.

It looks like the patch didn't work at all on your machine for some
reason. Do you see a message "cannot iomap device, IRQ storm ahead"
in dmesg?

Included below is a slightly different patch which is based on the
3.16.7-ckt tree (so it should apply cleanly to your kernel version).
This version prints a message when the wireless card is reset. Please
verify that you see the message "b43 quirk: resetting controller" in
dmesg with this patch. This version of the patch also resets the card
a bit earlier in the boot process than the previous version did.
Please let me know if this improves the situation on your machine.

You might also want to try a newer kernel version, 3.16 is very old
and I'm not sure if there have been any changes since which might
negatively impact proper functioning of the patch. There are also
official 4.4 kernels available for Jessie. Canonical ends support
for the 3.16.y-ckt series anyway this month.
Post by Andrew Worsley
http://askubuntu.com/questions/264247/proprietary-nvidia-drivers-with-efi-on-mac-to-prevent-overheating/613573#613573
which basically involves adding a grub scriptlet to enable PCI-E bus
setpci -s "00:01.0" 3e.b=8
setpci -s "01:00.0" 04.b=7
I think you need to upgrade to a newer version of the Nvidia driver,
I have just verified that 352.63 will set the busmaster bit if it's
cleared.
Post by Andrew Worsley
Can we do some similar magic setpci commands to disable 04:00.0
which is my BCM4331
I think grub is also able to write to mmio space of a PCI device,
which is what's necessary to reset the wireless card, so theoretically
yes, but the right place to fix this is in the kernel since not
everyone is using grub.

Best regards,

Lukas

-- >8 --
Subject: [PATCH] PCI: Add Broadcom 4331 reset quirk to prevent IRQ storm

Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.

The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.

Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Cc: <***@vger.kernel.org>
Tested-by: Lukas Wunner <***@wunner.de> [MacBookPro9,1]
Signed-off-by: Lukas Wunner <***@wunner.de>
Acked-by: Rafał Miłecki <***@gmail.com>
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index 09b632a..7ded994 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>

-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1f5ea24..5a33924 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"

@@ -3082,6 +3084,32 @@ static void quirk_no_bus_reset(struct pci_dev *dev)
*/
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);

+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_SYS_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ dev_err(&dev->dev, "b43 quirk: cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ dev_info(&dev->dev, "b43 quirk: resetting controller\n");
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x4331, quirk_apple_b43_reset);
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
diff --git a/include/linux/bcma/bcma.h b/include/linux/bcma/bcma.h
index 0b3bb16..b19c7ed 100644
--- a/include/linux/bcma/bcma.h
+++ b/include/linux/bcma/bcma.h
@@ -153,6 +153,7 @@ struct bcma_host_ops {
#define BCMA_CORE_DEFAULT 0xFFF

#define BCMA_MAX_NR_CORES 16
+#define BCMA_CORE_SIZE 0x1000

/* Chip IDs of PCIe devices */
#define BCMA_CHIP_ID_BCM4313 0x4313
--
1.8.5.2 (Apple Git-48)
Andrew Worsley
2016-04-13 20:42:50 UTC
Permalink
Thank-you very much for your comments in your reply.

Actually the patch did work - I confirmed it was run and the iomap
call was successful by adding a pr_info() after the pci_iomap()
success branch.

The only time I am getting the IRQ 17 nobody cared message is on
suspend / resume. A fresh boot always had below the 100k interrupt
threshold level.

I tried your new patch and the number is even lower < 30,000 over two boots.

BUT on suspend resume again 126856.

Have you any insights on fixing suspend to disk / resume paths which
presumably face the same issue of being passed live hardware on boot
up?
Post by Lukas Wunner
Hi Andrew,
thank you for the extensive testing.
Post by Andrew Worsley
Further testing Broadcom 4331 reset quirk to prevent IRQ storm patch
1. quirk is run on initial boot up and this time appears to have
cat /proc/interrupts| grep 17
17: 81 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
Something in the ballpark of 81 interrupt requests is fine.
The kernel will print the error message about spurious interrupts and
switch to polling at 100000 requests. But even 20000 is way too much.
This just means that b43 loaded quickly enough to stop the interrupts
before the kernel limit of 100000 was reached, but the wireless card
wasn't reset early on as it should have been.
It looks like the patch didn't work at all on your machine for some
reason. Do you see a message "cannot iomap device, IRQ storm ahead"
in dmesg?
Result from two reboots with my 3.16 kernel and your new patch

Three full boots (all below 30k interrupts):
17: 23978 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
17: 30088 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
17: 26853 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel


dmesg output showing quirk running
dmesg | grep -C 5 quirk
[ 3.270315] pci 0000:00:1c.0: PCI bridge to [bus 03]
[ 3.270323] pci 0000:00:1c.0: bridge window [mem 0xc1a00000-0xc1afffff]
[ 3.270331] pci 0000:00:1c.0: bridge window [mem
0xc1800000-0xc18fffff 64bit pref]
[ 3.270463] pci 0000:04:00.0: [14e4:4331] type 00 class 0x028000
[ 3.270495] pci 0000:04:00.0: reg 0x10: [mem 0xc1900000-0xc1903fff 64bit]
[ 3.270574] pci 0000:04:00.0: b43 quirk: resetting controller
[ 3.270711] pci 0000:04:00.0: supports D1 D2
[ 3.270712] pci 0000:04:00.0: PME# supported from D0 D3hot D3cold
[ 3.270759] pci 0000:04:00.0: System wakeup disabled by ACPI
[ 3.278239] pci 0000:00:1c.1: PCI bridge to [bus 04]
[ 3.278251] pci 0000:00:1c.1: bridge window [mem 0xc1900000-0xc19fffff]

Output after resume. Note: Some times it looks it can happen on the
suspend to disk? But a new one is always present after the resume.

17: 126856 0 0 0 0 0
0 0 IO-APIC-fasteoi snd_hda_intel
[ 53.404157] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
with disabled ep ffff88045d495540
[ 53.468249] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 53.468253] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O
3.16.7-ckt25-3.16-bcm4331-patch2 #7
[ 53.468254] Hardware name: Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B00.1205101839 05/10/2012
[ 53.468259] 0000000000000000 ffffffff81520370 ffff88045a8a8c00
ffff88045a8a8cc4
[ 53.468262] ffffffff810bfe7d ffff88045a8a8c00 0000000000000000
0000000000000011
[ 53.468264] ffffffff810c022f 0000000000000000 0000000000000011
0000000000000000
[ 53.468265] Call Trace:
[ 53.468275] <IRQ> [<ffffffff81520370>] ? dump_stack+0x5d/0x78
[ 53.468282] [<ffffffff810bfe7d>] ? __report_bad_irq+0x2d/0xd0
[ 53.468286] [<ffffffff810c022f>] ? note_interrupt+0x25f/0x2b0
[ 53.468290] [<ffffffff810bd9c1>] ? handle_irq_event_percpu+0x121/0x190
[ 53.468294] [<ffffffff810bda68>] ? handle_irq_event+0x38/0x50
[ 53.468296] [<ffffffff810c0d5f>] ? handle_fasteoi_irq+0x7f/0x150
[ 53.468302] [<ffffffff810153ad>] ? handle_irq+0x1d/0x30
[ 53.468307] [<ffffffff81529118>] ? do_IRQ+0x48/0xe0
[ 53.468311] [<ffffffff81526f6d>] ? common_interrupt+0x6d/0x6d
[ 53.468317] <EOI> [<ffffffff813ef54c>] ? cpuidle_enter_state+0x4c/0xc0
[ 53.468320] [<ffffffff813ef542>] ? cpuidle_enter_state+0x42/0xc0
[ 53.468323] [<ffffffff810aaa6a>] ? cpu_startup_entry+0x33a/0x460
[ 53.468326] [<ffffffff81911f34>] ? start_kernel+0x473/0x47b
[ 53.468331] [<ffffffff81911120>] ? early_idt_handler_array+0x120/0x120
[ 53.468335] [<ffffffff81911608>] ? x86_64_start_kernel+0x14d/0x15c
[ 53.468336] handlers:
[ 53.468367] [<ffffffffa02ce740>] azx_interrupt [snd_hda_controller]
[ 53.468368] Disabling IRQ #17
[ 53.513740] usb 3-1: reset high-speed USB device number 2 using xhci_hcd
[ 53.633633] usb 1-1.1: reset high-speed USB device number 3 using ehci-pci
[ 53.633646] usb 2-1.8: reset high-speed USB device number 3 using ehci-pci

Sorry for the old kernel - I want to run debian stable rather than
hand buit kernels so my other packages.

I don't see any newer kernels when I do
apt-cache search "^linux-source"

so perhaps I have to add backports or testing into my repository list?
If you think it is worth it I can do that.

What other boot loaders do people use on a MacBook beside grub?

I think the setpci commands in grub might fix the problem for me for
suspend/resume as well as boot. Can you can easily point me to how to
translate the numbers from your patch:

Would it be:

setpci -s "04:00.0" 1800.l=1

Do you have another pointer to where to fix the suspend resume?

Thanks very much again

Andrew
Lukas Wunner
2016-04-24 17:04:23 UTC
Permalink
Hi Andrew,
Post by Andrew Worsley
The only time I am getting the IRQ 17 nobody cared message is on
suspend / resume. A fresh boot always had below the 100k interrupt
threshold level.
I tried your new patch and the number is even lower < 30,000 over two boots.
Hm, that's still way too much. You should have < 100 interrupts on IRQ 17
on *every* boot, anything else isn't satisfactory. I'm working on a patch
that runs before EFI ExitBootServices() but will need a little longer to
find an optimal solution.
Post by Andrew Worsley
Have you any insights on fixing suspend to disk / resume paths which
presumably face the same issue of being passed live hardware on boot
up?
Suspend to disk involves two kernels, the boot kernel and the image kernel.
Perhaps these two are not the same on your machine and you've only compiled
the patch into one of them?

Best regards,

Lukas
Lukas Wunner
2016-05-23 14:42:16 UTC
Permalink
Hi Andrew,
Post by Lukas Wunner
Post by Andrew Worsley
The only time I am getting the IRQ 17 nobody cared message is on
suspend / resume. A fresh boot always had below the 100k interrupt
threshold level.
I tried your new patch and the number is even lower < 30,000 over two boots.
Hm, that's still way too much. You should have < 100 interrupts on IRQ 17
on *every* boot, anything else isn't satisfactory.
The reason why my previous patches didn't work on your machine is
because you're using grub, and grub contains a patch by Matthew
Garrett which puts the wireless card into power state D3hot. The
card's mmio space isn't accessible in D3hot. Included below is a
new patch which transitions the card to D0 before resetting it.
Please let me know if this fixes the issue for you.

@Chris Bainbridge: Please test this as well, this is no longer a
plain vanilla PCI quirk but an early quirk, it should prevent any
kind of memory corruption by DMAed packets.

Best regards,

Lukas

-- >8 --
Subject: [PATCH] x86: Add early quirk to reset Apple AirPort card
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its
IRQ, causing spurious interrupts if the IRQ is shared. It also
corrupts memory by DMAing received packets.

The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.

The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to prevent memory corruption by DMAed packets:
Matthew Garrett found out in 2012 that the packets are written to
EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occuring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.

When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56

While this may prevent DMAed packets, it does not cure the spurious
interrupts emanating from the card. Unfortunately the card's mmio space
is inaccessible during D3hot, so to reset it, we have to undo the effect
of Matthew's patch and transition the card back to D0 if the system was
booted with grub.

Since commit 8659c406ade3 ("x86: only scan the root bus in early PCI
quirks"), early quirks can only be applied to devices on the root bus.
However the Broadcom 4331 card is located on a secondary bus behind a
PCIe root port. This commit therefore reintroduces scanning of secondary
buses. The primary motivation of 8659c406ade3 was to prevent application
of the nvidia_bugs() quirk on secondary buses. Amend the quirk to open
code this requirement.

A secondary motivation was to speed up PCI scanning. The algorithm used
prior to 8659c406ade3 was particularly time consuming because it scanned
buses 0 to 31 brute force. The recursive algorithm used by this commit
only scans buses that are actually reachable from the root bus and
should thus be a bit faster. If this algorithm is found to significantly
impact boot time, it would be possible to limit its recursion depth:
The Apple AirPort quirk applies at depth 1, all others at depth 0,
so the bus need not be scanned deeper than that for now. An alternative
approach would be to continue scanning only the root bus, and apply the
AirPort quirk to the root ports 8086:1c12, 8086:1e12 and 8086:1e16.
Apple always positioned the Broadcom 4331 behind one of these three
ports (see model list below). The quirk would then check presence of the
Broadcom 4331 in slot 0 below the root port and do its deed.

Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.

Michael Büsch and Bjorn Helgaas contributed feedback towards finding
the best solution to this problem.

The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]

For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff81374370>] pcie_isr
[<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
[<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
Disabling IRQ #17

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Cc: Chris Milsted <***@redhat.com>
Cc: Matthew Garrett <***@srcf.ucam.org>
Cc: Chris Bainbridge <***@gmail.com>
Cc: Andi Kleen <***@linux.intel.com>
Cc: Michael Buesch <***@bues.ch>
Cc: Bjorn Helgaas <***@google.com>
Cc: ***@vger.kernel.org
Tested-by: Lukas Wunner <***@wunner.de> [MacBookPro9,1]
Signed-off-by: Lukas Wunner <***@wunner.de>
Acked-by: Rafał Miłecki <***@gmail.com>
---
arch/x86/kernel/early-quirks.c | 88 +++++++++++++++++++++++++++++++++++-------
drivers/bcma/bcma_private.h | 2 -
include/linux/bcma/bcma.h | 1 +
3 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index db9a675..8aaab75 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -11,7 +11,11 @@

#include <linux/pci.h>
#include <linux/acpi.h>
+#include <linux/delay.h>
+#include <linux/dmi.h>
#include <linux/pci_ids.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <drm/i915_drm.h>
#include <asm/pci-direct.h>
#include <asm/dma.h>
@@ -21,6 +25,7 @@
#include <asm/iommu.h>
#include <asm/gart.h>
#include <asm/irq_remapping.h>
+#include <asm/early_ioremap.h>

static void __init fix_hypertransport_config(int num, int slot, int func)
{
@@ -76,6 +81,13 @@ static void __init nvidia_bugs(int num, int slot, int func)
#ifdef CONFIG_ACPI
#ifdef CONFIG_X86_IO_APIC
/*
+ * Only applies to Nvidia root ports (bus 0) and not to
+ * Nvidia graphics cards on secondary buses with PCI ports.
+ */
+ if (num)
+ return;
+
+ /*
* All timer overrides on Nvidia are
* wrong unless HPET is enabled.
* Unfortunately that's not true on many Asus boards.
@@ -589,6 +601,47 @@ static void __init force_disable_hpet(int num, int slot, int func)
#endif
}

+#define BCM4331_MMIO_SIZE 16384
+#define BCM4331_PM_CAP 0x40
+
+static void __init apple_airport_reset(int bus, int slot, int func)
+{
+ void __iomem *mmio;
+ u16 pmcsr;
+ u64 addr;
+
+ if (!dmi_match(DMI_SYS_VENDOR, "Apple Inc."))
+ return;
+
+ /* Card may have been put into PCI_D3hot by grub quirk */
+ pmcsr = read_pci_config_16(bus, slot, func,
+ BCM4331_PM_CAP + PCI_PM_CTRL);
+ if ((pmcsr & PCI_PM_CTRL_STATE_MASK) != PCI_D0) {
+ pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+ write_pci_config_16(bus, slot, func,
+ BCM4331_PM_CAP + PCI_PM_CTRL, pmcsr);
+ mdelay(10);
+ pmcsr = read_pci_config_16(bus, slot, func,
+ BCM4331_PM_CAP + PCI_PM_CTRL);
+ if ((pmcsr & PCI_PM_CTRL_STATE_MASK) != PCI_D0) {
+ printk(KERN_ERR "Cannot power up Apple AirPort card\n");
+ return;
+ }
+ }
+
+ addr = read_pci_config(bus, slot, func, PCI_BASE_ADDRESS_0);
+ addr |= (u64)read_pci_config(bus, slot, func, PCI_BASE_ADDRESS_1) << 32;
+ addr &= PCI_BASE_ADDRESS_MEM_MASK;
+ mmio = early_ioremap(addr, BCM4331_MMIO_SIZE);
+ if (!mmio) {
+ printk(KERN_ERR "Cannot iomap Apple AirPort card\n");
+ return;
+ }
+ printk(KERN_INFO "Resetting Apple AirPort card\n");
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ early_iounmap(mmio, BCM4331_MMIO_SIZE);
+}

#define QFLAG_APPLY_ONCE 0x1
#define QFLAG_APPLIED 0x2
@@ -602,12 +655,6 @@ struct chipset {
void (*f)(int num, int slot, int func);
};

-/*
- * Only works for devices on the root bus. If you add any devices
- * not on bus 0 readd another loop level in early_quirks(). But
- * be careful because at least the Nvidia quirk here relies on
- * only matching on bus 0.
- */
static struct chipset early_qrk[] __initdata = {
{ PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, QFLAG_APPLY_ONCE, nvidia_bugs },
@@ -637,9 +684,13 @@ static struct chipset early_qrk[] __initdata = {
*/
{ PCI_VENDOR_ID_INTEL, 0x0f00,
PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
+ { PCI_VENDOR_ID_BROADCOM, 0x4331,
+ PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
{}
};

+static void __init early_pci_scan_bus(int bus);
+
/**
* check_dev_quirk - apply early quirks to a given PCI device
* @num: bus number
@@ -648,7 +699,7 @@ static struct chipset early_qrk[] __initdata = {
*
* Check the vendor & device ID against the early quirks table.
*
- * If the device is single function, let early_quirks() know so we don't
+ * If the device is single function, let early_pci_scan_bus() know so we don't
* poke at this device again.
*/
static int __init check_dev_quirk(int num, int slot, int func)
@@ -657,6 +708,7 @@ static int __init check_dev_quirk(int num, int slot, int func)
u16 vendor;
u16 device;
u8 type;
+ u8 sec;
int i;

class = read_pci_config_16(num, slot, func, PCI_CLASS_DEVICE);
@@ -684,25 +736,35 @@ static int __init check_dev_quirk(int num, int slot, int func)

type = read_pci_config_byte(num, slot, func,
PCI_HEADER_TYPE);
+
+ if ((type & 0x7f) == PCI_HEADER_TYPE_BRIDGE) {
+ sec = read_pci_config_byte(num, slot, func, PCI_SECONDARY_BUS);
+ early_pci_scan_bus(sec);
+ }
+
if (!(type & 0x80))
return -1;

return 0;
}

-void __init early_quirks(void)
+static void __init early_pci_scan_bus(int bus)
{
int slot, func;

- if (!early_pci_allowed())
- return;
-
/* Poor man's PCI discovery */
- /* Only scan the root bus */
for (slot = 0; slot < 32; slot++)
for (func = 0; func < 8; func++) {
/* Only probe function 0 on single fn devices */
- if (check_dev_quirk(0, slot, func))
+ if (check_dev_quirk(bus, slot, func))
break;
}
}
+
+void __init early_quirks(void)
+{
+ if (!early_pci_allowed())
+ return;
+
+ early_pci_scan_bus(0);
+}
diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index 38f1567..71df8f2 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>

-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/include/linux/bcma/bcma.h b/include/linux/bcma/bcma.h
index 3feb1b2..14cd6f7 100644
--- a/include/linux/bcma/bcma.h
+++ b/include/linux/bcma/bcma.h
@@ -156,6 +156,7 @@ struct bcma_host_ops {
#define BCMA_CORE_DEFAULT 0xFFF

#define BCMA_MAX_NR_CORES 16
+#define BCMA_CORE_SIZE 0x1000

/* Chip IDs of PCIe devices */
#define BCMA_CHIP_ID_BCM4313 0x4313
--
2.8.1
Chris Bainbridge
2016-05-24 23:38:58 UTC
Permalink
Post by Lukas Wunner
The reason why my previous patches didn't work on your machine is
because you're using grub, and grub contains a patch by Matthew
Garrett which puts the wireless card into power state D3hot. The
card's mmio space isn't accessible in D3hot. Included below is a
new patch which transitions the card to D0 before resetting it.
Please let me know if this fixes the issue for you.
The grub patch is in Debian but didn't prevent the issue on my setup,
presumably because of the Ubuntu "linuxefi" fallback that Matthew
previously posted about.
Post by Lukas Wunner
@Chris Bainbridge: Please test this as well, this is no longer a
plain vanilla PCI quirk but an early quirk, it should prevent any
kind of memory corruption by DMAed packets.
I tested a "no b43" kernel for 10 boots with and without the patch.

Without the patch: 7 boots had potential memory corruption (caught by
the IOMMU) and all had IRQ 17 disabled.

With the patch: no memory corruption or disabled IRQ 17.

With b43 built in to the kernel wifi worked as expected.

So the patch looks good to me. Thanks for the fix.

Tested-by: Chris Bainbridge <***@gmail.com> [MacBookPro10,2]
Bjorn Helgaas
2016-04-05 19:40:15 UTC
Permalink
[+cc Matthew]

Hi Lukas,
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 27 +++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index eda0909..f642c42 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>
-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..d4fb5ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"
@@ -3282,6 +3284,31 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
quirk_apple_wait_for_thunderbolt);
#endif
+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
https://bugzilla.kernel.org/show_bug.cgi?id=111781 and
https://mjg59.dreamwidth.org/11235.html describe a sort of similar
issue, but with DMA. An interrupt from the device is probably to
signal a DMA completion, but these problem reports only mention the
"IRQ nobody cared" issue; I don't see anything about memory
corruption.

If this resets the device, I guess that should prevent future DMA as
well as future interrupts. Would pci_reset_function() do the same
thing in a more generic way?

I'm a little bit hesitant to put a quirk like this in Linux because
it's only a 90% solution. If the only problem is the interrupts, it's
probably OK since a few extra interrupts doesn't hurt anything. But
if there is also DMA that might corrupt something, a 90% solution just
makes it harder to debug for the remaining 10%.
Post by Lukas Wunner
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_BROADCOM, 0x4331, quirk_apple_b43_reset);
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
diff --git a/include/linux/bcma/bcma.h b/include/linux/bcma/bcma.h
index 0367c63..5c37b58 100644
--- a/include/linux/bcma/bcma.h
+++ b/include/linux/bcma/bcma.h
@@ -158,6 +158,7 @@ struct bcma_host_ops {
#define BCMA_CORE_DEFAULT 0xFFF
#define BCMA_MAX_NR_CORES 16
+#define BCMA_CORE_SIZE 0x1000
/* Chip IDs of PCIe devices */
#define BCMA_CHIP_ID_BCM4313 0x4313
--
2.8.0.rc3
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael Büsch
2016-04-05 19:49:45 UTC
Permalink
On Tue, 5 Apr 2016 14:40:15 -0500
Post by Bjorn Helgaas
[+cc Matthew]
Hi Lukas,
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 27 +++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index eda0909..f642c42 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>
-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..d4fb5ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"
@@ -3282,6 +3284,31 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
quirk_apple_wait_for_thunderbolt);
#endif
+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
https://bugzilla.kernel.org/show_bug.cgi?id=111781 and
https://mjg59.dreamwidth.org/11235.html describe a sort of similar
issue, but with DMA. An interrupt from the device is probably to
signal a DMA completion, but these problem reports only mention the
"IRQ nobody cared" issue; I don't see anything about memory
corruption.
If this resets the device, I guess that should prevent future DMA as
well as future interrupts. Would pci_reset_function() do the same
thing in a more generic way?
I'm a little bit hesitant to put a quirk like this in Linux because
it's only a 90% solution. If the only problem is the interrupts, it's
probably OK since a few extra interrupts doesn't hurt anything. But
if there is also DMA that might corrupt something, a 90% solution just
makes it harder to debug for the remaining 10%.
This was already discussed in this thread.

It is not a 90% solution. It fixes the IRQ issue. It is not supposed to
fix the DMA issue. And it can't. It does mitigate it a tiny bit though.
--
Michael
Bjorn Helgaas
2016-04-06 13:31:51 UTC
Permalink
Post by Michael Büsch
On Tue, 5 Apr 2016 14:40:15 -0500
Post by Bjorn Helgaas
[+cc Matthew]
Hi Lukas,
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 27 +++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index eda0909..f642c42 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>
-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..d4fb5ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"
@@ -3282,6 +3284,31 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
quirk_apple_wait_for_thunderbolt);
#endif
+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
https://bugzilla.kernel.org/show_bug.cgi?id=111781 and
https://mjg59.dreamwidth.org/11235.html describe a sort of similar
issue, but with DMA. An interrupt from the device is probably to
signal a DMA completion, but these problem reports only mention the
"IRQ nobody cared" issue; I don't see anything about memory
corruption.
If this resets the device, I guess that should prevent future DMA as
well as future interrupts. Would pci_reset_function() do the same
thing in a more generic way?
I'm a little bit hesitant to put a quirk like this in Linux because
it's only a 90% solution. If the only problem is the interrupts, it's
probably OK since a few extra interrupts doesn't hurt anything. But
if there is also DMA that might corrupt something, a 90% solution just
makes it harder to debug for the remaining 10%.
This was already discussed in this thread.
It is not a 90% solution. It fixes the IRQ issue. It is not supposed to
fix the DMA issue. And it can't. It does mitigate it a tiny bit though.
Even for interrupts, it's only a 90% solution because we do still get
a few interrupts before the quirk runs. There may not be enough to
trigger the "IRQ nobody cared" check, but that check is based on an
arbitrary number that happens to be more than the number of interrupts
we see, so it's basically still a race. But as I said, if interrupts
are the *only* problem, this is probably OK.

DMA is a bigger problem. If there is a DMA issue, I don't want to fix
part of it. If there is no DMA issue, I'd like to know how we *know*
there's no issue, and why we're still seeing interrupts with no DMA.
If there's a separate fix for the DMA issue, I'd like to know why we
can't fix the interrupt issue the same place we fixed the DMA issue.

Bjorn
Michael Büsch
2016-04-06 15:17:24 UTC
Permalink
On Wed, 6 Apr 2016 08:31:51 -0500
Post by Bjorn Helgaas
Even for interrupts, it's only a 90% solution because we do still get
a few interrupts before the quirk runs. There may not be enough to
trigger the "IRQ nobody cared" check,
If no driver requested the shared interrupt yet, it should be disabled
in the interrupt controller. So the interrupts do not reach the CPU.
The interrupt storm (on the CPU) starts as soon as some driver
that shares the interrupt with b43 requests and thus enables the
interrupt. And that always happens after the PCI fixup. Thus this is
safe.
--
Michael
Lukas Wunner
2016-04-06 21:36:08 UTC
Permalink
Hi,
Post by Michael Büsch
Post by Bjorn Helgaas
Even for interrupts, it's only a 90% solution because we do still get
a few interrupts before the quirk runs. There may not be enough to
trigger the "IRQ nobody cared" check,
If no driver requested the shared interrupt yet, it should be disabled
in the interrupt controller. So the interrupts do not reach the CPU.
The interrupt storm (on the CPU) starts as soon as some driver
that shares the interrupt with b43 requests and thus enables the
interrupt. And that always happens after the PCI fixup. Thus this is
safe.
Yes, that is correct. I only see 67 interrupts for IRQ 17 with the
PCI quirk compiled in, all of which seem to come from the initialization
of pciehp, mmc and the hda sound card (which share the IRQ with b43 on my
machine).

Thanks,

Lukas
Matthew Garrett
2016-04-05 19:59:46 UTC
Permalink
Post by Bjorn Helgaas
https://bugzilla.kernel.org/show_bug.cgi?id=111781 and
https://mjg59.dreamwidth.org/11235.html describe a sort of similar
issue, but with DMA. An interrupt from the device is probably to
signal a DMA completion, but these problem reports only mention the
"IRQ nobody cared" issue; I don't see anything about memory
corruption.
I "fixed" this with
https://github.com/mjg59/grub-fedora/commit/21fcd6d79b7601e4b20ad70c5408adff2dabbc1d
- doing the same in the kernel EFI stub would probably be the best way
to handle it. This way you're guaranteed to stop DMA before the kernel
reclaims boot services memory, which guarantees you won't have any
corruption.
--
Matthew Garrett | ***@srcf.ucam.org
Andrew Worsley
2016-04-06 11:28:27 UTC
Permalink
That patch appears to be the grub 1 equivalent to grub2
git://git.savannah.gnu.org/grub.git rev 9d34bb8 which puts the network
device into D3 power state.

I am running grub2 with that patch and it doesn't fix my irq 17 problem.

Can we not fix this in grub2 via something like Lukas original patch
or disable any DMA transfers before the kernel starts?

Andrew
Post by Matthew Garrett
Post by Bjorn Helgaas
https://bugzilla.kernel.org/show_bug.cgi?id=111781 and
https://mjg59.dreamwidth.org/11235.html describe a sort of similar
issue, but with DMA. An interrupt from the device is probably to
signal a DMA completion, but these problem reports only mention the
"IRQ nobody cared" issue; I don't see anything about memory
corruption.
I "fixed" this with
https://github.com/mjg59/grub-fedora/commit/21fcd6d79b7601e4b20ad70c5408adff2dabbc1d
- doing the same in the kernel EFI stub would probably be the best way
to handle it. This way you're guaranteed to stop DMA before the kernel
reclaims boot services memory, which guarantees you won't have any
corruption.
Lukas Wunner
2016-04-06 21:30:29 UTC
Permalink
[+cc Matt Fleming, linux-efi]

Hi Bjorn,
Post by Bjorn Helgaas
Post by Lukas Wunner
Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
on boot until they are reset, causing spurious interrupts if the IRQ is
shared. Apparently the EFI bootloader enables the device and does not
disable it before passing control to the OS. The bootloader contains a
driver for the wireless card which allows it to phone home to Cupertino.
This is used for Internet Recovery (download and install OS X images)
and probably also for Back to My Mac (remote access, RFC 6281) and to
discover stolen hardware.
The issue is most pronounced on 2011 and 2012 MacBook Pros where the IRQ
is shared with 3 other devices (Light Ridge Thunderbolt controller, SDXC
reader, HDA card on discrete GPU). As soon as an interrupt handler is
installed for one of these devices, the ensuing storm of spurious IRQs
causes the kernel to disable the IRQ and switch to polling. This lasts
until the b43 driver loads and resets the device.
Loading the b43 driver first is not always an option, in particular with
the Light Ridge Thunderbolt controller: The PCI hotplug IRQ handler gets
installed early on because it is built in, unlike b43 which is usually
a module.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632
---
drivers/bcma/bcma_private.h | 2 --
drivers/pci/quirks.c | 27 +++++++++++++++++++++++++++
include/linux/bcma/bcma.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/bcma/bcma_private.h b/drivers/bcma/bcma_private.h
index eda0909..f642c42 100644
--- a/drivers/bcma/bcma_private.h
+++ b/drivers/bcma/bcma_private.h
@@ -8,8 +8,6 @@
#include <linux/bcma/bcma.h>
#include <linux/delay.h>
-#define BCMA_CORE_SIZE 0x1000
-
#define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..d4fb5ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,8 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/bcma/bcma.h>
+#include <linux/bcma/bcma_regs.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"
@@ -3282,6 +3284,31 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_INTEL, 0x156d,
quirk_apple_wait_for_thunderbolt);
#endif
+/*
+ * Broadcom 4331 wireless cards built into Apple Macs unleash an IRQ storm
+ * on boot until they are reset, causing spurious interrupts if the IRQ is
+ * shared. Apparently the EFI bootloader enables the device to phone home
+ * to Cupertino and does not disable it before passing control to the OS.
+ */
+static void quirk_apple_b43_reset(struct pci_dev *dev)
+{
+ void __iomem *mmio;
+
+ if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc.") || !dev->bus->self ||
+ pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+ return;
+
+ mmio = pci_iomap(dev, 0, 0);
+ if (!mmio) {
+ pr_err("b43 quirk: Cannot iomap device, IRQ storm ahead\n");
+ return;
+ }
+ iowrite32(BCMA_RESET_CTL_RESET,
+ mmio + (1 * BCMA_CORE_SIZE) + BCMA_RESET_CTL);
+ pci_iounmap(dev, mmio);
https://bugzilla.kernel.org/show_bug.cgi?id=111781 and
https://mjg59.dreamwidth.org/11235.html describe a sort of similar
issue, but with DMA. An interrupt from the device is probably to
signal a DMA completion, but these problem reports only mention the
"IRQ nobody cared" issue; I don't see anything about memory
corruption.
If this resets the device, I guess that should prevent future DMA as
well as future interrupts. Would pci_reset_function() do the same
thing in a more generic way?
Matthew's grub patch puts the wireless card into D3hot, which does not
stop the interrupts. I have tested this. (It may stop the DMA, I cannot
verify that as it doesn't occur on my machine for lack of an access point.)

Calling pci_reset_function() does stop the interrupts. More specifically,
clearing the command register in pci_dev_save_and_disable() does. However
the b43/bcma driver subsequently locks up the machine on module load.
The closed source broadcom-sta driver loads fine. (I'm not even sure it's
worth fixing this anomalous use case in b43/bcma.)

It should be noted that clearing the command register merely cures the
symptom, but not the cause. The wireless core keeps generating interrupts
but they're not coming through because DisINTx is set. I think resetting
the wireless core, as I've done in the patch above, is preferable. Putting
on my tinfoil hat for a moment, this feels like EFI phoning home to Apple
and passing control to the OS with Cupertino still on the line. Seems like
a good idea to hang up that call firmly on boot.
Post by Bjorn Helgaas
I'm a little bit hesitant to put a quirk like this in Linux because
it's only a 90% solution. If the only problem is the interrupts, it's
probably OK since a few extra interrupts doesn't hurt anything. But
if there is also DMA that might corrupt something, a 90% solution just
makes it harder to debug for the remaining 10%.
Matthew mentions in his blog post that the DMA occurs in an EFI boot
services data area. According to the spec the OS is free to use that
memory after ExitBootServices(), but a lot of EFI implementations do
not adhere to that rule. Apple is not alone there. That's why the kernel
delays making that memory available to the page allocator, but it doesn't
delay it long enough, i.e. not until PCI header fixups are run, which is
the earliest point where we could reset the wireless core from the PCI
subsystem. (I.e, immediately after BAR scanning.)

More specifically, arch/x86/platform/efi/quirks.c:efi_free_boot_services()
makes EFI boot services memory available to the page allocator and this
is called near the end of init/main.c:start_kernel(). *Afterwards*, the
initcalls are executed via rest_init() -> kernel_init() ->
kernel_init_freeable() -> do_basic_setup() -> do_initcalls(). In particular,
PCI header fixups are executed in the subsys_initcall acpi_init().

Some of the possible solutions are:

(1) Delay efi_free_boot_services() further, e.g. until late_initcall.

(2) Find out exactly which EFI boot services area is used for DMA and
amend arch/x86/platform/efi/quirks.c:efi_apply_memmap_quirks() to
assign that area a different type if dmi_match(DMI_SYS_VENDOR,
"Apple Inc."). That way it's not freed by efi_free_boot_services().
Add a late_initcall which calls free_bootmem() for that area.

(3) In lieu of a PCI quirk, add an EFI quirk which resets the wireless
core using efi_pci_io_protocol.mem.write().

(4) It's conceivable that Apple has provided an EFI protocol for the
wireless card. If it exists, it might allow us to reset it or
disassociate from the access point, which should stop the DMA.

@Matt Fleming: Do you have a preference or recommendation for either of
these options?

I'm not sure how quickly I can come up with a patch for (3) or how long
it would take to reverse-engineer (4).

Thanks,

Lukas
Post by Bjorn Helgaas
Post by Lukas Wunner
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_BROADCOM, 0x4331, quirk_apple_b43_reset);
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
diff --git a/include/linux/bcma/bcma.h b/include/linux/bcma/bcma.h
index 0367c63..5c37b58 100644
--- a/include/linux/bcma/bcma.h
+++ b/include/linux/bcma/bcma.h
@@ -158,6 +158,7 @@ struct bcma_host_ops {
#define BCMA_CORE_DEFAULT 0xFFF
#define BCMA_MAX_NR_CORES 16
+#define BCMA_CORE_SIZE 0x1000
/* Chip IDs of PCIe devices */
#define BCMA_CHIP_ID_BCM4313 0x4313
--
2.8.0.rc3
Matthew Garrett
2016-04-06 22:19:43 UTC
Permalink
Post by Lukas Wunner
Matthew's grub patch puts the wireless card into D3hot, which does not
stop the interrupts. I have tested this. (It may stop the DMA, I cannot
verify that as it doesn't occur on my machine for lack of an access point.)
Ah, interesting. One thing to note (and which I've just noticed) is that
it looks like this fixup is only called when you boot with the "linux"
command and not with the "linuxefi" command. If you're using Ubuntu,
"linux" may fall back to "linuxefi" without executing this fixup. So we
need to add it to the kernel EFI stub anyway.
--
Matthew Garrett | ***@srcf.ucam.org
Matt Fleming
2016-04-09 12:00:55 UTC
Permalink
Post by Lukas Wunner
More specifically, arch/x86/platform/efi/quirks.c:efi_free_boot_services()
makes EFI boot services memory available to the page allocator and this
is called near the end of init/main.c:start_kernel(). *Afterwards*, the
initcalls are executed via rest_init() -> kernel_init() ->
kernel_init_freeable() -> do_basic_setup() -> do_initcalls(). In particular,
PCI header fixups are executed in the subsys_initcall acpi_init().
(1) Delay efi_free_boot_services() further, e.g. until late_initcall.
I'd prefer not to do this because boot services regions can account
for a large amount of memory (multiple gigabytes).
Post by Lukas Wunner
(2) Find out exactly which EFI boot services area is used for DMA and
amend arch/x86/platform/efi/quirks.c:efi_apply_memmap_quirks() to
assign that area a different type if dmi_match(DMI_SYS_VENDOR,
"Apple Inc."). That way it's not freed by efi_free_boot_services().
Add a late_initcall which calls free_bootmem() for that area.
If it is possible to find out which boot services region is the
problematic one, this would be the best solution. If there's any way
to tie it into the PCI quirk, that would be even better so we don't
need to maintain the blacklist in two places.
Lukas Wunner
2016-04-24 16:58:58 UTC
Permalink
Hi,
Post by Matt Fleming
Post by Lukas Wunner
(2) Find out exactly which EFI boot services area is used for DMA and
amend arch/x86/platform/efi/quirks.c:efi_apply_memmap_quirks() to
assign that area a different type if dmi_match(DMI_SYS_VENDOR,
"Apple Inc."). That way it's not freed by efi_free_boot_services().
Add a late_initcall which calls free_bootmem() for that area.
If it is possible to find out which boot services region is the
problematic one, this would be the best solution. If there's any way
to tie it into the PCI quirk, that would be even better so we don't
need to maintain the blacklist in two places.
I've since cooked up an experimental patch which resets the Broadcom 4331
wireless card from arch/x86/kernel/early-quirks.c, and another which does
the same from arch/x86/boot/compressed/eboot.c. Both approaches stopped
the spurious interrupts for me and should also stop the DMA'ing.

In the latter case, the machine locked up immediately after resetting the
card. Suspecting that the EFI driver doesn't digest the reset well,
I tried kicking it off the card with DisconnectController() first.
This worked, although it returns EFI_NOT_FOUND, which is strange as that
error isn't defined for DisconnectController() per the spec.

I then tested if DisconnectController() alone is already sufficient
(i.e. without resetting the card by writing to its mmio).
Guess what, it is. So the EFI driver seems to leave the card in a
sane state if it's unloaded.

I then realized that the Simple Network protocol is supported for both
the wireless card as well as the Ethernet card built into my MacBook Pro.
What I'll try next is to iterate over all handles that support that
protocol and just Stop() the interface if it's been brought up by EFI.

Perhaps OS X supports some kind of connection handover from EFI.
That would explain why they leave the card enabled.

Best regards,

Lukas
Loading...