1. 25 Feb, 2012: 3 commits
    • PCI: print out suggestion about using pci=realloc · eb572e7c
      Committed by Yinghai Lu
      Let the user know that they could try pci=realloc to see whether it helps.
      
      -v2: update suggestion text.
      Suggested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: prepare pci=realloc for multiple options · b55438fd
      Committed by Yinghai Lu
      Let the user enable or disable reallocation with pci=realloc=on or pci=realloc=off.
      
      Also
      1. move variables and functions near the places where they are used.
      2. change a macro to a function.
      3. make the related functions and variables static and __init.
      4. update the parameter description accordingly.
      
      This will let us add a config option to control default behavior, and
      still allow the user to turn off automatic reallocation if it fails on
      their platform until a permanent solution is found.
      
      -v2: still honor pci=realloc, and treat it as pci=realloc=on;
           also use enum instead of ...
      -v3: update kernel-parameters.txt as suggested by Jesse.
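      The option handling described above is illustrated by the minimal user-space sketch below. The enum and function names are hypothetical and do not come from drivers/pci. The sketch shows three behaviors: a bare pci=realloc maps to "on", realloc=off can override a default, and an unset option falls back to whatever default a future config option would choose.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical states for the realloc option (names are illustrative). */
enum realloc_mode {
    REALLOC_UNDEFINED,  /* option not given: use the built-in default */
    REALLOC_OFF,        /* pci=realloc=off */
    REALLOC_ON,         /* pci=realloc=on, or the bare pci=realloc */
};

/* Map one token of the pci= option string to a realloc mode. */
static enum realloc_mode parse_realloc(const char *tok)
{
    if (strcmp(tok, "realloc") == 0 || strcmp(tok, "realloc=on") == 0)
        return REALLOC_ON;      /* bare form is still honored as "on" */
    if (strcmp(tok, "realloc=off") == 0)
        return REALLOC_OFF;
    return REALLOC_UNDEFINED;   /* not a realloc token */
}

/* The user's explicit choice wins; otherwise use the default,
 * which a config option could control. */
static bool realloc_enabled(enum realloc_mode mode, bool default_on)
{
    if (mode == REALLOC_UNDEFINED)
        return default_on;
    return mode == REALLOC_ON;
}
```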
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Retry on IORESOURCE_IO type allocations · 0c5be0cb
      Committed by Yinghai Lu
      When PCI reallocation is enabled for a PCI bridge, we clear the small
      resource in the bridge and reassign it with the requested + optional size
      for the first several tries, but Ram mentioned that this could cause a
      problem in one case:
      	https://bugzilla.kernel.org/show_bug.cgi?id=15960
      
      After checking the boot log in
      	https://lkml.org/lkml/2010/4/19/44
      	[regression, bisected] Xonar DX invalid PCI I/O range since 977d17bb
      
      it is clear that we should not stop too early for I/O ports:
      	Apr 19 10:19:38 [kernel] pci 0000:04:00.0: BAR 7: can't assign io (size 0x4000)
      	Apr 19 10:19:38 [kernel] pci 0000:05:01.0: BAR 8: assigned [mem 0x80400000-0x805fffff]
      	Apr 19 10:19:38 [kernel] pci 0000:05:01.0: BAR 7: can't assign io (size 0x2000)
      	Apr 19 10:19:38 [kernel] pci 0000:05:02.0: BAR 7: can't assign io (size 0x1000)
      	Apr 19 10:19:38 [kernel] pci 0000:05:03.0: BAR 7: can't assign io (size 0x1000)
      	Apr 19 10:19:38 [kernel] pci 0000:08:00.0: BAR 7: can't assign io (size 0x1000)
      	Apr 19 10:19:38 [kernel] pci 0000:09:04.0: BAR 0: can't assign io (size 0x100)
      Instead, we should clear 00:1c.0 and retry.
      
      This patch removes the IORESOURCE_IO check and tries one more time. This
      gives us a chance to get an allocation for the 00:1c.0 I/O port range,
      because the range from 0x4000 to 0x8000 will be freed and can be used.
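      The change in control flow can be modelled with a toy retry loop. This is a hypothetical sketch, not the setup-bus.c code. try_assign() simply pretends that the I/O request succeeds once the parent bridge window has been released, which stands in for 0x4000-0x8000 becoming free after 00:1c.0 is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IORESOURCE_IO / IORESOURCE_MEM. */
enum res_type { RES_IO, RES_MEM };

/* Toy allocator: memory always fits; the I/O range only fits once the
 * parent bridge window has been released at least once. */
static bool try_assign(enum res_type type, int releases)
{
    return type == RES_MEM || releases > 0;
}

/* Retry loop. stop_on_io models the old behavior of giving up as soon as
 * an I/O resource failed. Returns the try that succeeded, or -1. */
static int assign_with_retry(enum res_type type, int max_tries, bool stop_on_io)
{
    int releases = 0;

    for (int tries = 1; tries <= max_tries; tries++) {
        if (try_assign(type, releases))
            return tries;
        if (stop_on_io && type == RES_IO)
            return -1;          /* old: stop too early for I/O ports */
        releases++;             /* release the bridge window, try again */
    }
    return -1;
}
```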
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
  2. 24 Feb, 2012: 7 commits
    • PCI: Add pcie_hp=nomsi to disable MSI/MSI-X for pciehp driver · 7570a333
      Committed by MUNEDA Takahiro
      Add a parameter to avoid using MSI/MSI-X for PCIe native hotplug; it's
      known to be buggy on some platforms.
      
      In my environment, the following stack trace is sometimes shown during
      shutdown.
      
        irq 16: nobody cared (try booting with the "irqpoll" option)
        Pid: 1081, comm: reboot Not tainted 3.2.0 #1
        Call Trace:
         <IRQ>  [<ffffffff810cec1d>] __report_bad_irq+0x3d/0xe0
         [<ffffffff810cee1c>] note_interrupt+0x15c/0x210
         [<ffffffff810cc485>] handle_irq_event_percpu+0xb5/0x210
         [<ffffffff810cc621>] handle_irq_event+0x41/0x70
         [<ffffffff810cf675>] handle_fasteoi_irq+0x55/0xc0
         [<ffffffff81015356>] handle_irq+0x46/0xb0
         [<ffffffff814fbe9d>] do_IRQ+0x5d/0xe0
         [<ffffffff814f146e>] common_interrupt+0x6e/0x6e
         [<ffffffff8106b040>] ? __do_softirq+0x60/0x210
         [<ffffffff8108aeb1>] ? hrtimer_interrupt+0x151/0x240
         [<ffffffff814fb5ec>] call_softirq+0x1c/0x30
         [<ffffffff810152d5>] do_softirq+0x65/0xa0
         [<ffffffff8106ae9d>] irq_exit+0xbd/0xe0
         [<ffffffff814fbf8e>] smp_apic_timer_interrupt+0x6e/0x99
         [<ffffffff814f9e5e>] apic_timer_interrupt+0x6e/0x80
         <EOI>  [<ffffffff814f0fb1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
         [<ffffffff812629fc>] pci_bus_write_config_word+0x6c/0x80
         [<ffffffff81266fc2>] pci_intx+0x52/0xa0
         [<ffffffff8127de3d>] pci_intx_for_msi+0x1d/0x30
         [<ffffffff8127e4fb>] pci_msi_shutdown+0x7b/0x110
         [<ffffffff81269d34>] pci_device_shutdown+0x34/0x50
         [<ffffffff81326c4f>] device_shutdown+0x2f/0x140
         [<ffffffff8107b981>] kernel_restart_prepare+0x31/0x40
         [<ffffffff8107b9e6>] kernel_restart+0x16/0x60
         [<ffffffff8107bbfd>] sys_reboot+0x1ad/0x220
         [<ffffffff814f4b90>] ? do_page_fault+0x1e0/0x460
         [<ffffffff811942d0>] ? __sync_filesystem+0x90/0x90
         [<ffffffff8105c9aa>] ? __cond_resched+0x2a/0x40
         [<ffffffff814ef090>] ? _cond_resched+0x30/0x40
         [<ffffffff81169e17>] ? iterate_supers+0xb7/0xd0
         [<ffffffff814f9382>] system_call_fastpath+0x16/0x1b
        handlers:
        [<ffffffff8138a0f0>] usb_hcd_irq
        [<ffffffff8138a0f0>] usb_hcd_irq
        [<ffffffff8138a0f0>] usb_hcd_irq
        Disabling IRQ #16
      
      An unwanted interrupt is generated when the PCI driver switches from
      MSI/MSI-X to INTx while shutting down the device.  The interrupt does
      not occur if MSI/MSI-X is not used on the device.
      I confirmed that this problem does not happen when pcie_hp=nomsi is
      specified, and that hotplug operations still work as usual.
      
      v2: Automatically disable MSI/MSI-X for the following device:
          PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02)
      v3: Based on the review comments, combine the if statements.
      v4: Removed the module parameter.
          Moved some code so that pciehp can be built as a module.
          Moved device-specific code to drivers/pci/quirks.c.
      v5: Dropped the device-specific code until we get a statement from the vendor.
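      Below is a minimal sketch of how such a boot option can steer the interrupt mode of the hotplug service. The function names are hypothetical. The fallback to INTx is an assumption about how a port service behaves without MSI/MSI-X. This is not the pciehp source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the value given to "pcie_hp=" asks to avoid MSI/MSI-X. */
static bool pcie_hp_nomsi(const char *val)
{
    return val != NULL && strcmp(val, "nomsi") == 0;
}

/* Pick the interrupt mode for the hotplug service on one port. */
static const char *pciehp_irq_mode(const char *pcie_hp_val, bool dev_has_msi)
{
    if (dev_has_msi && !pcie_hp_nomsi(pcie_hp_val))
        return "MSI/MSI-X";
    return "INTx";   /* assumed fallback when MSI/MSI-X is not used */
}
```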
      Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: move pci_find_saved_cap out of linux/pci.h · 34a4876e
      Committed by Yinghai Lu
      It has only one user, in drivers/pci/pci.c, so there is no need to put
      it in the global pci.h.
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: fix memleak for pci dev removing during hotplug · f796841e
      Committed by Yinghai Lu
      unreferenced object 0xffff880276d17700 (size 64):
        comm "swapper/0", pid 1, jiffies 4294897182 (age 3976.028s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 18 f9 de 76 02 88 ff ff  ...........v....
          10 00 00 00 0e 00 00 00 0f 28 40 00 00 00 00 00  .........(@.....
        backtrace:
          [<ffffffff81c8aede>] kmemleak_alloc+0x26/0x43
          [<ffffffff811385f0>] __kmalloc+0x121/0x183
          [<ffffffff813cf821>] pci_add_cap_save_buffer+0x35/0x7c
          [<ffffffff813d12b7>] pci_allocate_cap_save_buffers+0x1d/0x65
          [<ffffffff813cdb52>] pci_device_add+0x92/0xf1
          [<ffffffff81c8afe6>] pci_scan_single_device+0x9f/0xa1
          [<ffffffff813cdbd2>] pci_scan_slot.part.20+0x21/0x106
          [<ffffffff813cdce2>] pci_scan_slot+0x2b/0x35
          [<ffffffff81c8dae4>] __pci_scan_child_bus+0x51/0x107
          [<ffffffff81c8d75b>] pci_scan_bridge+0x376/0x6ae
          [<ffffffff81c8db60>] __pci_scan_child_bus+0xcd/0x107
          [<ffffffff81c8dbab>] pci_scan_child_bus+0x11/0x2a
          [<ffffffff81cca58c>] pci_acpi_scan_root+0x18b/0x21c
          [<ffffffff81c916be>] acpi_pci_root_add+0x1e1/0x42a
          [<ffffffff81406210>] acpi_device_probe+0x50/0x190
          [<ffffffff814a0227>] really_probe+0x99/0x126
      
      We need to free the saved buffers for capabilities.
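      The leak pattern and its fix can be sketched in user space. Every buffer allocated while the device is added must be released when the device goes away. The structure and function names below are hypothetical stand-ins, not the kernel's own data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one saved-capability buffer. */
struct cap_save_buffer {
    struct cap_save_buffer *next;
    unsigned char cap_id;
};

struct fake_pci_dev {
    struct cap_save_buffer *saved_caps;
};

/* Allocate a save buffer, as happens when the device is added. */
static int add_cap_save_buffer(struct fake_pci_dev *dev, unsigned char cap_id)
{
    struct cap_save_buffer *b = malloc(sizeof(*b));
    if (!b)
        return -1;
    b->cap_id = cap_id;
    b->next = dev->saved_caps;
    dev->saved_caps = b;
    return 0;
}

/* Release every save buffer; returns how many were freed. */
static int free_cap_save_buffers(struct fake_pci_dev *dev)
{
    int freed = 0;
    struct cap_save_buffer *b = dev->saved_caps;
    while (b) {
        struct cap_save_buffer *next = b->next;
        free(b);
        freed++;
        b = next;
    }
    dev->saved_caps = NULL;
    return freed;
}

/* Add a device with two saved capabilities, then remove it. */
static int demo_add_and_remove(void)
{
    struct fake_pci_dev dev = { NULL };
    int err = add_cap_save_buffer(&dev, 0x10) ||   /* PCI Express */
              add_cap_save_buffer(&dev, 0x07);     /* PCI-X */
    int freed = free_cap_save_buffers(&dev);       /* the missing step */
    return err ? -1 : freed;
}
```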
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Fix device class print out · 2dd8ba92
      Committed by Yinghai Lu
      The debug print of the device class is shifted:
      
      | pci 0000:f8:15.2: [8086:2b56] type 0 class 0x000600
      
      The code tries to print the class with 6 digits, but passes the shifted
      class, which has only 4 valid digits.
      
      Use the original dev->class directly instead.
      
      Also remove the unneeded calculation of the local variable class, because
      it will be updated after pci_fixup_device(pci_fixup_early...).
      
      Also unify the type printout when the class and header type do not match.
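      The arithmetic behind the bug works as follows. dev->class holds the 24-bit class code (base class, sub-class, programming interface), so a host bridge is 0x060000. Shifting it right by 8 leaves 0x0600. Printing that with a zero-padded 6-digit format produces the misleading 0x000600 seen above. The small sketch below assumes a "class %#08x" format string; that string is an assumption, not quoted from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a class code the way the debug line does (format assumed).
 * %#08x gives a "0x" prefix, zero-padded to 8 characters in total. */
static const char *class_string(unsigned int value)
{
    static char buf[32];
    snprintf(buf, sizeof(buf), "class %#08x", value);
    return buf;
}
```

      Passing dev->class >> 8 reproduces the buggy output, and passing dev->class gives the intended one.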
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Skip cardbus assigned resource reset during pci bus rescan · 3796f1e2
      Committed by Yinghai Lu
      Otherwise, when rescan is used for CardBus, the assigned resources get
      cleared.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Fix "cardbus bridge resources as optional" size handling · 11848934
      Committed by Yinghai Lu
      We should not set the requested size to -2; that will confuse the
      resource list sorting with align when SIZEALIGN is used.
      
      Change to STARTALIGN and pass the alignment in start; this is safe, just
      as it is for a regular PCI bridge.  In the long run, we should just treat
      CardBus like a regular PCI bridge.
      
      Also fix the case when realloc_head is not passed: we should keep the
      requested size.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Disable cardbus bridge MEM1 prefetchable bit · dcef0d06
      Committed by Yinghai Lu
      Some BIOSes enable prefetch on both MEM0 and MEM1.  But the cardbus code
      assumes MEM1 is non-pref...
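      Clearing the bit is a plain read-modify-write of the CardBus Bridge Control register. The sketch below works on a 16-bit value. The bit values mirror PCI_CB_BRIDGE_CTL_PREFETCH_MEM0/MEM1 from linux/pci_regs.h, and the config-space read and write around the operation are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as in linux/pci_regs.h (PCI_CB_BRIDGE_CTL_PREFETCH_MEM0/1). */
#define CB_CTL_PREFETCH_MEM0 0x100
#define CB_CTL_PREFETCH_MEM1 0x200

/* Clear MEM1 prefetch so the register matches the cardbus code's view
 * of MEM1 as non-prefetchable; MEM0 is left untouched. */
static uint16_t cb_clear_mem1_prefetch(uint16_t ctl)
{
    return (uint16_t)(ctl & ~CB_CTL_PREFETCH_MEM1);
}
```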
      
      The discussion can be found at:
      	https://lkml.org/lkml/2012/1/12/1
      	https://bugzilla.kernel.org/show_bug.cgi?id=41622#c23
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
  3. 18 Feb, 2012: 3 commits
  4. 15 Feb, 2012: 27 commits
    • PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs · f67fd55f
      Committed by Thomas Jarosch
      Some BIOS implementations leave the Intel GPU interrupts enabled,
      even though no one is handling them (e.g. the i915 driver is never loaded).
      Additionally, the interrupt destination is not set up properly
      and the interrupt ends up -somewhere-.
      
      These spurious interrupts are "sticky" and the kernel disables
      the (shared) interrupt line after 100,000+ generated interrupts.
      
      Fix it by disabling the still enabled interrupts.
      This resolves crashes often seen on monitor unplug.
      
      Tested on the following boards:
      - Intel DH61CR: Affected
      - Intel DH67BL: Affected
      - Intel S1200KP server board: Affected
      - Asus P8H61-M LE: Affected, but system does not crash.
        Probably the IRQ ends up somewhere unnoticed.
      
      According to reports on the net, the Intel DH61WW board is also affected.
      
      Many thanks to Jesse Barnes from Intel for helping
      with the register configuration and to Intel in general
      for providing public hardware documentation.
      Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
      Tested-by: Charlie Suffin <charlie.suffin@stratus.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Annotate PCI quirks in initcall_debug style · 3209874a
      Committed by Arjan van de Ven
      While diagnosing some boot time issues on a platform, all that I
      could see in the bootgraph/dmesg was that the system was spending
      a lot of time in applying one or more PCI quirks... which
      was virtually undebuggable.
      
      This patch adds printks in "initcall_debug" style to dmesg; they are
      emitted when the user passes the initcall_debug kernel command line
      option (the number one tool to use when debugging boot hangs or boot
      time issues).
      
      v2: add #includes so quirks can build on non-x86
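      initcall_debug-style annotation wraps each call with a timestamp before and after, then prints the function name with the elapsed time. The user-space sketch below uses clock() for portability rather than the kernel's time source. The message text is illustrative, not the exact printk format of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef void (*quirk_fn)(void);

/* An example quirk that does nothing; real quirks touch the device. */
static void example_quirk(void)
{
}

/* Run one quirk; when debug is set, report its name and duration.
 * Returns the elapsed processor time in microseconds. */
static long long run_quirk(const char *name, quirk_fn fn, int debug)
{
    clock_t start = clock();
    fn();
    clock_t end = clock();
    long long usecs = (long long)(end - start) * 1000000LL / CLOCKS_PER_SEC;
    if (debug)
        printf("calling %s\n%s returned after %lld usecs\n", name, name, usecs);
    return usecs;
}
```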
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI hotplug: cpcihp: fix debug module parameter to be bool · 309c6651
      Committed by Danny Kukawka
      Make the variable behind the debug module parameter a real bool, to fix
      'warning: return from incompatible pointer type'.
      Acked-by: Scott Murray <scott@spiteful.org>
      Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: check for pci bar restore completion and retry · 26f41062
      Committed by Kay, Allen M
      On some OEM systems, pci_restore_state() is called while FLR has not yet
      completed.  As a result, the PCI BAR register restore does not succeed.
      This fix reads back the restored value, compares it with the saved value,
      and retries up to 10 times before giving up.
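      The read-back-and-retry pattern can be sketched against a fake config space that ignores writes until a pretend FLR has finished. The names are hypothetical. The real patch operates on the device's saved config state and waits between attempts.

```c
#include <assert.h>
#include <stdint.h>

/* Fake config space: writes are ignored until 'ready_after' writes have
 * been attempted, modelling a device whose FLR has not completed. */
struct fake_cfg {
    uint32_t bar;
    int writes;
    int ready_after;
};

static void cfg_write(struct fake_cfg *c, uint32_t val)
{
    if (c->writes++ >= c->ready_after)
        c->bar = val;
}

static uint32_t cfg_read(const struct fake_cfg *c)
{
    return c->bar;
}

/* Write the saved value and read it back; retry up to 'retry' more times
 * if it did not stick. Returns 0 on success, -1 if we gave up. */
static int restore_bar(struct fake_cfg *c, uint32_t saved, int retry)
{
    for (;;) {
        cfg_write(c, saved);
        if (cfg_read(c) == saved)
            return 0;
        if (retry-- <= 0)
            return -1;
        /* a real implementation would wait briefly here */
    }
}

/* Restore one BAR with 10 retries on a device ready after N writes. */
static int demo_restore(int ready_after)
{
    struct fake_cfg cfg = { 0, 0, ready_after };
    return restore_bar(&cfg, 0xfe000000u, 10);
}
```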
      Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
      Signed-off-by: Eric Chanudet <eric.chanudet@citrix.com>
      Signed-off-by: Allen Kay <allen.m.kay@intel.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: pciehp: Disable/enable link during slot power off/on · 2debd928
      Committed by Yinghai Lu
      On a system with a repeater on the system board to support gen2 hotplug,
      we found that when an ExpressModule is removed from some slots,
      /var/log/messages will be full of "card present/not present" warnings.
      
      It turns out the root complex is continually trying to train the link to
      the repeater because the repeater has not been reset.
      
      This patch will disable the link at removal time to allow the repeater
      to be reset properly.  This also prevents a potential AER message at
      removal time.
      
      Also, when testing hotplug on a system under development, we found that
      if we boot the system without an EM installed and later hot-add an EM,
      it does not work with Linux, although another OS handles it fine.  The
      root cause is that the BIOS leaves the link disabled when the slot is
      empty at boot time, and the other OS modifies the Link Disable bit in
      the Link Control register during power on/off.
      
      So we should do the same and disable/enable the link during power off/on.
      
      -v2: check the Data Link Layer Link Active (DLLLA) bit instead of
           waiting 100ms.
           Separate the link disable/enable functions into another patch.
      Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: pciehp: Add Disable/enable link functions · 7f822999
      Committed by Yinghai Lu
      These will be used during slot power off/on.
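      Disabling the link comes down to setting the Link Disable bit (bit 4, PCI_EXP_LNKCTL_LD in linux/pci_regs.h) in the Link Control register. Enabling it again means clearing that bit. The sketch below covers only the bit manipulation. The helper name is hypothetical and the config-space access is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define LNKCTL_LD 0x0010  /* Link Disable, as PCI_EXP_LNKCTL_LD */

/* Return the new Link Control value with the link disabled or enabled;
 * all other bits are preserved. */
static uint16_t lnkctl_link_disable(uint16_t lnkctl, int disable)
{
    if (disable)
        return (uint16_t)(lnkctl | LNKCTL_LD);
    return (uint16_t)(lnkctl & ~LNKCTL_LD);
}
```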
      Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: pciehp: Add pcie_wait_link_not_active() · bffe4f72
      Committed by Yinghai Lu
      It will be used to check the link status after the link is disabled.
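      Waiting for the link to go down amounts to polling the Data Link Layer Link Active bit (PCI_EXP_LNKSTA_DLLLA, 0x2000) in the Link Status register until it reads as clear. In the sketch below, successive register reads are supplied as an array, and each loop iteration stands in for one short sleep. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LNKSTA_DLLLA 0x2000  /* Data Link Layer Link Active */

/* Report the link state as a bool, in the spirit of check_link_active(). */
static bool link_active(uint16_t lnksta)
{
    return (lnksta & LNKSTA_DLLLA) != 0;
}

/* Poll successive Link Status reads until the link state matches 'want'.
 * Returns the index of the matching read, or -1 if we ran out (timeout). */
static int wait_link_state(const uint16_t *reads, int n, bool want)
{
    for (int i = 0; i < n; i++)
        if (link_active(reads[i]) == want)
            return i;
    return -1;
}
```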
      Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: pciehp: make check_link_active more helpful · 4e2ce405
      Committed by Yinghai Lu
      A few changes:
        - remove the 'inline' and let the compiler decide
        - return a bool to indicate whether the link was active
        - add a debug message to indicate the link state when it becomes active
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: pciehp: replace unconditional sleep with config space access check · 2f5d8e4f
      Committed by Yinghai Lu
      While reviewing
      |	PCI: pciehp: wait 1000 ms before Link Training check
      Linus said:
      >...
      > That's a *long* time, and it's irritating to the user. It makes the
      > user think "the machine is slow".
      >...
      > And quite frankly, an unconditional one-second delay here seems bad.
      > Two seconds was unacceptable, one second is just bad.
      
      Instead, try to access the config space of the PCI device that is
      supposed to show up within 1s.  If we can read back a valid vendor/device
      ID, we can return early.
      
      The related discussion can be found at:
      	https://lkml.org/lkml/2011/12/6/339
      
      -v2: separate the code into pci_bus_read_dev_vendor_id() from pci_scan_device()
          and reuse it from the pciehp code. Suggested by Matthew Wilcox.
      -v3: As suggested by Kenji, don't use an array on the stack, don't wait too
          long for CRS, and return a failure status if the device is not found.
          Also split the pci_bus_read_dev_vendor_id() change into another patch.
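      The early-return idea can be sketched as a bounded poll that stops at the first valid vendor/device ID dword. The set of values treated as "no device" follows the common convention: all ones, all zeros, or one half all ones. Because the reads are supplied as an array and the timing is simulated, this is an illustration rather than the pciehp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Dword at config offset 0 is (device ID << 16) | vendor ID. These
 * values mean no function answered (or a broken board reported it). */
static bool id_present(uint32_t l)
{
    return !(l == 0xffffffffu || l == 0x00000000u ||
             l == 0x0000ffffu || l == 0xffff0000u);
}

/* Poll every step_ms (simulated) for at most timeout_ms and return as
 * soon as a valid ID is read: elapsed ms, or -1 on timeout. */
static int wait_for_device(const uint32_t *reads, int nreads,
                           int step_ms, int timeout_ms)
{
    int elapsed = 0;

    for (int i = 0; i < nreads && elapsed <= timeout_ms; i++) {
        if (id_present(reads[i]))
            return elapsed;
        elapsed += step_ms;   /* stands in for a short sleep */
    }
    return -1;
}
```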
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Separate pci_bus_read_dev_vendor_id from pci_scan_device · efdc87da
      Committed by Yinghai Lu
      We can reuse it for pciehp probing.
      
      -v2: as suggested by Kenji, fix the CRS timeout check, and export the
           function for later use when pciehp is compiled as a module.
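      A device that is still initializing may answer a config read with Configuration Request Retry Status (CRS), which shows up as a vendor ID of 0x0001. The sketch below is a CRS wait loop with an exponentially growing delay bounded by crs_timeout. It is loosely modelled on the behavior described here. It uses a fake device and records the sleep time instead of calling msleep().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CRS_ID 0xffff0001u   /* vendor ID 0x0001 signals CRS */

/* Fake function: answers with CRS for the first crs_reads reads. */
struct fake_fn {
    int crs_reads;
    uint32_t id;
    int slept_ms;             /* total time msleep() would have waited */
};

static uint32_t read_id(struct fake_fn *f)
{
    if (f->crs_reads > 0) {
        f->crs_reads--;
        return CRS_ID;
    }
    return f->id;
}

static bool read_vendor_id(struct fake_fn *f, uint32_t *l, int crs_timeout)
{
    int delay = 1;

    *l = read_id(f);
    if (*l == 0xffffffffu || *l == 0x00000000u ||
        *l == 0x0000ffffu || *l == 0xffff0000u)
        return false;             /* empty slot */

    while (*l == CRS_ID) {
        if (!crs_timeout)
            return false;         /* caller does not want to wait */
        f->slept_ms += delay;     /* stands in for msleep(delay) */
        delay *= 2;
        *l = read_id(f);
        if (*l != CRS_ID)
            break;                /* the function finished initializing */
        if (delay > crs_timeout)
            return false;         /* waited too long: treat as stuck */
    }
    return true;
}

/* Returns the simulated sleep time on success, or -1 on failure. */
static int demo_probe(int crs_reads, uint32_t id, int crs_timeout)
{
    struct fake_fn f = { crs_reads, id, 0 };
    uint32_t l;

    if (!read_vendor_id(&f, &l, crs_timeout))
        return -1;
    return l == id ? f.slept_ms : -1;
}
```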
      Suggested-by: Matthew Wilcox <matthew@wil.cx>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: make sriov work with hotplug remove · ac205b7b
      Committed by Yinghai Lu
      When hot-removing a PCI Express module that has a PCIe switch and supports
      SR-IOV, we got:
      
      [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
      [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
      [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
      [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
      [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
      [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
      [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
      [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
      [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
      [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
      [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
      [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
      [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
      [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
      [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
      [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
      [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
      [ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
      [ 5926.133377] PGD 0
      [ 5926.135402] Oops: 0000 [#1] SMP
      [ 5926.138659] CPU 2
      [ 5926.140499] Modules linked in:
      ...
      [ 5926.143754]
      [ 5926.275823] Call Trace:
      [ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
      [ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
      [ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
      [ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
      [ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
      [ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
      ...
      
       +-[0000:80]-+-00.0-[81-8f]--
       |           +-01.0-[90-9f]--
       |           +-02.0-[a0-af]--
       |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
       |           |                               |            +-00.1 Device
       |           |                               |            +-00.2 Device
       |           |                               |            \-00.3 Device
       |           |                               \-03.0-[b3]--+-00.0 Device
       |           |                                            +-00.1 Device
       |           |                                            +-00.2 Device
       |           |                                            \-00.3 Device
      
      root complex: 80:02.2
      PCI Express modules: have a PCIe switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
      end devices are b2:00.0 and b3:00.0.
      VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3
      
      Root cause: when pci_stop_bus_device() is called on the physical function
      (PF), it stops the virtual functions (VFs) and removes them, so
      	list_for_each_safe(l, n, &bus->devices)
      ends up referring to n after it has been freed, because n points to a VF entry.
      
      The solution is simply to replace list_for_each_safe() with
      list_for_each_prev_safe().  This ensures that n is a valid pointer to
      the PF instead of a freed VF pointer (because newly added devices are
      inserted at the tail of the bus->devices list).
      
      While reviewing the patch, Bjorn said:
      |   The PCI hot-remove path calls pci_stop_bus_devices() via
      |   pci_remove_bus_device().
      |
      |   pci_stop_bus_devices() traverses the bus->devices list (point A below),
      |   stopping each device in turn, which calls the driver remove() method.  When
      |   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
      |   also uses pci_remove_bus_device() to remove the VF devices from the
      |   bus->devices list (point B).
      |
      |       pci_remove_bus_device
      |         pci_stop_bus_device
      |           pci_stop_bus_devices(subordinate)
      |             list_for_each(bus->devices)             <-- A
      |               pci_stop_bus_device(PF)
      |                 ...
      |                   driver->remove
      |                     pci_disable_sriov
      |                       ...
      |                         pci_remove_bus_device(VF)
      |                             <remove from bus_list>  <-- B
      |
      |   At B, we're changing the same list we're iterating through at A, so when
      |   the driver remove() method returns, the pci_stop_bus_devices() iterator has
      |   a pointer to a list entry that has already been freed.
      
      The discussion threads can be found at:
      	https://lkml.org/lkml/2011/10/15/141
      	https://lkml.org/lkml/2012/1/23/360
      
      -v5: To make removal more robust, as suggested by Linus, change to
           list_for_each_prev_safe instead. That is more reasonable, because
           those devices were added to the tail of the list.
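      The failure mode can be reproduced in user space with a minimal circular doubly linked list in the style of <linux/list.h>. This is a hypothetical model. stop_device() on the PF unlinks the VFs that follow it. Instead of freeing them, it marks them removed, so the sketch can detect whether the walk would go on to touch a freed entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
}

/* Toy device; 'list' is the first member, so a list pointer can be cast back. */
struct dev {
    struct list_head list;
    bool is_pf;
    bool removed;   /* set instead of free() so misuse is detectable */
};

/* Stopping a PF disables SR-IOV, which removes the VFs after it. */
static void stop_device(struct dev *d, struct list_head *bus)
{
    struct list_head *p, *next;

    if (!d->is_pf)
        return;
    for (p = d->list.next; p != bus; p = next) {
        next = p->next;
        if (!((struct dev *)p)->is_pf) {
            list_del(p);
            ((struct dev *)p)->removed = true;
        }
    }
}

/* "Safe" walk: the neighbour is saved before stop_device() runs.
 * Returns true if the saved neighbour was removed meanwhile. */
static bool walk_hits_removed(struct list_head *bus, bool reverse)
{
    struct list_head *pos = reverse ? bus->prev : bus->next;

    while (pos != bus) {
        struct list_head *n = reverse ? pos->prev : pos->next;
        stop_device((struct dev *)pos, bus);
        if (n != bus && ((struct dev *)n)->removed)
            return true;   /* the kernel would now use a freed VF */
        pos = n;
    }
    return false;
}

/* Build bus->devices = [PF, VF1, VF2] and walk it once. */
static bool demo(bool reverse)
{
    struct list_head bus;
    struct dev pf = { .is_pf = true }, vf1 = { .is_pf = false }, vf2 = { .is_pf = false };

    list_init(&bus);
    list_add_tail(&pf.list, &bus);
    list_add_tail(&vf1.list, &bus);
    list_add_tail(&vf2.list, &bus);
    return walk_hits_removed(&bus, reverse);
}
```

      Walking forward, the saved neighbour of the PF is VF1, which the PF's removal has just unlinked. Walking backward, the PF is visited last and its saved neighbour is the list head, so nothing stale is touched.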
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: remove add_to_failed_list() · 67cc7e26
      Committed by Yinghai Lu
      Only one user; just use add_to_list instead.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: add debug print out for add_size · b592443d
      Committed by Yinghai Lu
      For use in debugging resource reallocation.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: make free_list() into a function · bffc56d4
      Committed by Yinghai Lu
      After merging struct pci_dev_resource_x and pci_dev_resource,
      we can now use a function instead of a macro.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Rename dev_res_x to add_res or fail_res · b9b0bba9
      Committed by Yinghai Lu
      Linus says not to use dev_res_x because it doesn't communicate anything
      about its usage.  Rename the variables to add_res, fail_res, etc.,
      according to context.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Merge pci_dev_resource_x and pci_dev_resource · 764242a0
      Committed by Yinghai Lu
      pci_dev_resource_x is a superset of pci_dev_resource and they're just
      temp structs used during resource reallocation.
      
      pci_dev_resource usage is quite limited.
      
      So just use pci_dev_resource_x, and rename it to the new pci_dev_resource.
      
      -v2: as suggested by Linus, separate the free_list change into another patch.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Replace resource_list with generic list · bdc4abec
      Committed by Yinghai Lu
      So we can use helper functions for generic list.  This makes the
      resource re-allocation code much more readable.
      
      -v2: use list_add_tail instead of adding list_insert_before, as pointed
           out by Linus.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Move struct resource_list to setup-bus.c · 2934a0de
      Committed by Yinghai Lu
      No user outside of setup-bus.c now.  Later patches will convert
      resource_list to a regular list.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Move pdev_sort_resources() to setup-bus.c · 78c3b329
      Committed by Yinghai Lu
      This allows us to move the definition of struct resource_list to
      setup-bus.c and later convert resource_list to a regular list.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: make re-allocation try harder by reassigning ranges higher in the hierarchy · 19aa7ee4
      Committed by Yinghai Lu
      On a system with SR-IOV capable devices connected through a PCIe switch
      to a PCIe root port:
      
       +-[0000:80]-+-00.0-[81-8f]--
       |           +-01.0-[90-9f]--
       |           +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0 Oracle Corporation Device 207a
       |           |                               \-03.0-[a3]--+-00.0 Oracle Corporation Device 207a
       |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Oracle Corporation Device 207a
       |           |                               \-03.0-[b3]--+-00.0 Oracle Corporation Device 207a
      
      When the BIOS does not assign resources for SR-IOV BARs, kernel PCI
      reallocation only goes up one bridge and then gives up, failing to get
      resources for all SR-IOV BARs, even though the range in the peer root bus
      is large enough.
      
      Specifically, only the bridge at the a1:02.0 level has its resources
      cleared and reallocated.  The kernel does not go up to clear the bridge
      at the 80:02.0 level.
      
      To make it go to upper levels, during retry, we need to treat "good to have"
      resources as "must have".
      
      Only on the last try will we treat good to have resources as optional.
      At that time, parent bridge resources will already have been released so
      we'll have a chance to get everything assigned with must_have plus
      good_to_have for all child devices.
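      The sizing rule can be summarised in a few lines. This is a hypothetical sketch, not the setup-bus.c code. Before the last try, the optional ("good to have") part is folded into the required size, so a failure propagates upward and the parent bridge is released as well. On the last try, only the mandatory part is required, and the optional part is added only if it fits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request: mandatory size plus optional extra (e.g. SR-IOV BARs). */
struct res_req {
    unsigned long must_have;
    unsigned long good_to_have;
};

/* Size that must be satisfied on try 'try_nr' of 'max_tries' (1-based). */
static unsigned long required_size(struct res_req r, int try_nr, int max_tries)
{
    bool last_try = (try_nr >= max_tries);

    if (last_try)
        return r.must_have;                 /* optional part tried separately */
    return r.must_have + r.good_to_have;    /* treat optional as mandatory */
}
```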
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Make pci_rescan_bus handle add_list · 9b03088f
      Committed by Yinghai Lu
      This allows us to allocate resources to hotplug bridges during
      remove/rescan.
      
      We need to move the function to setup-bus.c so it can use
      __pci_bus_size_bridges and __pci_bus_assign_resources directly to take
      the add_list resource tracking list.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Make rescan bus increase bridge resource size if needed · 2f320521
      Committed by Yinghai Lu
      Currently, rescan does not touch the bridge MMIO and I/O windows.
      
      Try to reuse pci_assign_unassigned_bridge_resources(bridge) to update bridge
      resources, if child devices need more resources.
      
      Only do that for bridges whose children are all removed already; i.e. don't
      release resources that could already be in use by drivers on child devices.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Use add_list in pcie hotplug path. · 8424d759
      Committed by Yinghai Lu
      We need add_size in the hotplug path when plugging in a hotplug chassis
      without cards.
      
      -v2: change descriptions. make it applicable after "pci: Check bridge
           resources after resource allocation."
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: try to assign required+option size first · 3e6e0d80
      Committed by Yinghai Lu
      We found that reassignment cannot find a range for one resource, even
      though the total available range is large enough.
      
      bridge b1:02.0 will need 2M+3M
      bridge b1:03.0 will need 2M+3M
      
      so bridge b0:00.0 will first be assigned 4M : [f8000000-f83fffff]
         and later reassigned to 10M : [f8000000-f89fffff]
      
      b1:02.0 is assigned to 2M : [f8000000-f81fffff]
      b1:03.0 is assigned to 2M : [f8200000-f83fffff]
      
      After that, b1:03.0 gets a chance to be reassigned to [f8200000-f86fffff],
      but b1:02.0 has no chance to expand, because b1:03.0 is using the range
      right after it.
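      The effect can be checked with simple arithmetic on the numbers above. The window is 10M at 0xf8000000. If the two child windows are placed back to back at 2M each, only the second one has room to grow to 5M. If each is placed at 2M+3M = 5M from the start, both fit exactly (5M + 5M = 10M). The sketch below checks this with a simplified model of two back-to-back children, not the kernel allocator.

```c
#include <assert.h>

#define MB        0x100000UL
#define WIN_START 0xf8000000UL   /* b0:00.0 window after reassignment */
#define WIN_SIZE  (10 * MB)      /* [f8000000-f89fffff] */

/* Two child windows (b1:02.0, then b1:03.0) are placed back to back with
 * size 'first'; afterwards each tries to grow in place to 'want'.
 * Returns how many children end up with 'want'. */
static int grown_children(unsigned long first, unsigned long want)
{
    unsigned long start0 = WIN_START;          /* b1:02.0 */
    unsigned long start1 = WIN_START + first;  /* b1:03.0, right after it */
    int grown = 0;

    /* b1:03.0 can grow up to the end of the bridge window */
    if (start1 + want <= WIN_START + WIN_SIZE)
        grown++;
    /* b1:02.0 can only grow up to where b1:03.0 starts */
    if (start0 + want <= start1)
        grown++;
    return grown;
}
```

      With first = 2M only one child reaches 5M, matching the failure in the log below. With first = 5M both do.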
      
      [  187.911401] pci 0000:b1:02.0: bridge window [mem 0x00100000-0x002fffff] to [bus b2-b2] add_size 300000
      [  187.920764] pci 0000:b1:03.0: bridge window [mem 0x00100000-0x002fffff] to [bus b3-b3] add_size 300000
      [  187.930129] pci 0000:b1:02.0: [mem 0x00100000-0x002fffff] get_res_add_size  add_size 300000
      [  187.938500] pci 0000:b1:03.0: [mem 0x00100000-0x002fffff] get_res_add_size  add_size 300000
      [  187.946857] pci 0000:b0:00.0: bridge window [mem 0x00100000-0x004fffff] to [bus b1-b3] add_size 600000
      [  187.956206] pci 0000:b0:00.0: BAR 14: assigned [mem 0xf8000000-0xf83fffff]
      [  187.963102] pci 0000:b0:00.0: BAR 15: assigned [mem 0xf5000000-0xf51fffff pref]
      [  187.970434] pci 0000:b0:00.0: BAR 14: reassigned [mem 0xf8000000-0xf89fffff]
      [  187.977497] pci 0000:b1:02.0: BAR 14: assigned [mem 0xf8000000-0xf81fffff]
      [  187.984383] pci 0000:b1:02.0: BAR 15: assigned [mem 0xf5000000-0xf50fffff pref]
      [  187.991695] pci 0000:b1:03.0: BAR 14: assigned [mem 0xf8200000-0xf83fffff]
      [  187.998576] pci 0000:b1:03.0: BAR 15: assigned [mem 0xf5100000-0xf51fffff pref]
      [  188.005888] pci 0000:b1:03.0: BAR 14: reassigned [mem 0xf8200000-0xf86fffff]
      [  188.012939] pci 0000:b1:02.0: BAR 14: can't assign mem (size 0x200000)
      [  188.019471] pci 0000:b1:02.0: failed to add 300000 to res=[mem 0xf8000000-0xf81fffff]
      [  188.027326] pci 0000:b2:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
      [  188.034071] pci 0000:b2:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
      [  188.040795] pci 0000:b2:00.0: BAR 2: assigned [mem 0xf8000000-0xf80fffff 64bit]
      [  188.048119] pci 0000:b2:00.0: BAR 2: set to [mem 0xf8000000-0xf80fffff 64bit] (PCI address [0xf8000000-0xf80fffff])
      [  188.058550] pci 0000:b2:00.0: BAR 6: assigned [mem 0xf5000000-0xf50fffff pref]
      [  188.065802] pci 0000:b2:00.0: BAR 0: assigned [mem 0xf8100000-0xf8103fff 64bit]
      [  188.073125] pci 0000:b2:00.0: BAR 0: set to [mem 0xf8100000-0xf8103fff 64bit] (PCI address [0xf8100000-0xf8103fff])
      [  188.083596] pci 0000:b2:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
      [  188.090310] pci 0000:b2:00.0: BAR 9: can't assign mem (size 0x300000)
      [  188.096773] pci 0000:b2:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
      [  188.103479] pci 0000:b2:00.0: BAR 7: assigned [mem 0xf8104000-0xf810ffff 64bit]
      [  188.110801] pci 0000:b2:00.0: BAR 7: set to [mem 0xf8104000-0xf810ffff 64bit] (PCI address [0xf8104000-0xf810ffff])
      [  188.121256] pci 0000:b1:02.0: PCI bridge to [bus b2-b2]
      [  188.126512] pci 0000:b1:02.0:   bridge window [mem 0xf8000000-0xf81fffff]
      [  188.133328] pci 0000:b1:02.0:   bridge window [mem 0xf5000000-0xf50fffff pref]
      [  188.140608] pci 0000:b3:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
      [  188.147341] pci 0000:b3:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
      [  188.154076] pci 0000:b3:00.0: BAR 2: assigned [mem 0xf8200000-0xf82fffff 64bit]
      [  188.161417] pci 0000:b3:00.0: BAR 2: set to [mem 0xf8200000-0xf82fffff 64bit] (PCI address [0xf8200000-0xf82fffff])
      [  188.171865] pci 0000:b3:00.0: BAR 6: assigned [mem 0xf5100000-0xf51fffff pref]
      [  188.179090] pci 0000:b3:00.0: BAR 0: assigned [mem 0xf8300000-0xf8303fff 64bit]
      [  188.186431] pci 0000:b3:00.0: BAR 0: set to [mem 0xf8300000-0xf8303fff 64bit] (PCI address [0xf8300000-0xf8303fff])
      [  188.196884] pci 0000:b3:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
      [  188.203591] pci 0000:b3:00.0: BAR 9: assigned [mem 0xf8400000-0xf86fffff 64bit]
      [  188.210909] pci 0000:b3:00.0: BAR 9: set to [mem 0xf8400000-0xf86fffff 64bit] (PCI address [0xf8400000-0xf86fffff])
      [  188.221379] pci 0000:b3:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
      [  188.228089] pci 0000:b3:00.0: BAR 7: assigned [mem 0xf8304000-0xf830ffff 64bit]
      [  188.235407] pci 0000:b3:00.0: BAR 7: set to [mem 0xf8304000-0xf830ffff 64bit] (PCI address [0xf8304000-0xf830ffff])
      [  188.245843] pci 0000:b1:03.0: PCI bridge to [bus b3-b3]
      [  188.251107] pci 0000:b1:03.0:   bridge window [mem 0xf8200000-0xf86fffff]
      [  188.257922] pci 0000:b1:03.0:   bridge window [mem 0xf5100000-0xf51fffff pref]
      [  188.265180] pci 0000:b0:00.0: PCI bridge to [bus b1-b3]
      [  188.270443] pci 0000:b0:00.0:   bridge window [mem 0xf8000000-0xf89fffff]
      [  188.277250] pci 0000:b0:00.0:   bridge window [mem 0xf5000000-0xf51fffff pref]
      [  188.284512] pcieport 0000:80:02.2: PCI bridge to [bus b0-bf]
      [  188.290184] pcieport 0000:80:02.2:   bridge window [io  0xa000-0xbfff]
      [  188.296735] pcieport 0000:80:02.2:   bridge window [mem 0xf8000000-0xf8ffffff]
      [  188.303963] pcieport 0000:80:02.2:   bridge window [mem 0xf5000000-0xf5ffffff 64bit pref]
      
      Thus b2:00.0 BAR 9 does not get assigned...
      
      Root cause:
      b1:02.0 cannot be given more range, because b1:03.0 is placed right
      after it; there is no space between the required ranges.
      
      Solution:
      Try to assign required + optional all together at first, and if that
      fails, try again with just the required resources.
      
      -v2: separate the add_to_list() change into another patch, per Jesse.
           separate moving get_res_add_size() into another patch, per Jesse.
           add a !realloc_head->next check to bail out early if the list is
           empty, per Jesse.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Move get_res_add_size() function · 1c372353
      Yinghai Lu committed
      We need to call it from __assign_resources_sorted() later and would like
      to avoid a forward declaration.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI: Make add_to_list() return status · ef62dfef
      Yinghai Lu committed
      This will be used for resource_list_x duplication when trying
      requested + optional sizes first.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • PCI : Calculate right add_size · a4ac9fea
      Yinghai Lu committed
      While debugging an SR-IOV-enabled hotplug device, we found that
      add_size is not passed properly.
      
      The device has devices behind two levels of bridges:
      
       +-[0000:80]-+-00.0-[81-8f]--
       |           +-01.0-[90-9f]--
       |           +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0  Oracle Corporation Device
       |           |                               \-03.0-[a3]--+-00.0  Oracle Corporation Device
      
      As a result, the parent bridge does not try to add a big enough range:
      
      [  557.455077] pci 0000:a0:00.0: BAR 14: assigned [mem 0xf9000000-0xf93fffff]
      [  557.461974] pci 0000:a0:00.0: BAR 15: assigned [mem 0xf6000000-0xf61fffff pref]
      [  557.469340] pci 0000:a1:02.0: BAR 14: assigned [mem 0xf9000000-0xf91fffff]
      [  557.476231] pci 0000:a1:02.0: BAR 15: assigned [mem 0xf6000000-0xf60fffff pref]
      [  557.483582] pci 0000:a1:03.0: BAR 14: assigned [mem 0xf9200000-0xf93fffff]
      [  557.490468] pci 0000:a1:03.0: BAR 15: assigned [mem 0xf6100000-0xf61fffff pref]
      [  557.497833] pci 0000:a1:03.0: BAR 14: can't assign mem (size 0x200000)
      [  557.504378] pci 0000:a1:03.0: failed to add optional resources res=[mem 0xf9200000-0xf93fffff]
      [  557.513026] pci 0000:a1:02.0: BAR 14: can't assign mem (size 0x200000)
      [  557.519578] pci 0000:a1:02.0: failed to add optional resources res=[mem 0xf9000000-0xf91fffff]
      
      It turns out we did not calculate size1 properly.
      
      static resource_size_t calculate_memsize(resource_size_t size,
                      resource_size_t min_size,
                      resource_size_t size1,
                      resource_size_t old_size,
                      resource_size_t align)
      {
              if (size < min_size)
                      size = min_size;
              if (old_size == 1 )
                      old_size = 0;
              if (size < old_size)
                      size = old_size;
              size = ALIGN(size + size1, align);
              return size;
      }
      
      We should not pass add_size as part of min_size to calculate_memsize(),
      because min_size only acts as a lower bound on size: once the required
      size exceeds it, add_size does not contribute to the final size at all.
      
      So pass add_size as size1 to calculate_memsize() instead.
      
      With this change, we may be able to remove the extra add-on in
      pci_reassign_resource().
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>