1. 28 8月, 2013 3 次提交
    • S
      raid5: offload stripe handle to workqueue · 851c30c9
      Shaohua Li 提交于
      This is another attempt to create multiple threads to handle raid5 stripes.
      This time I use workqueue.
      
      raid5 handles request (especially write) in stripe unit. A stripe is page size
      aligned/long and acrosses all disks. Writing to any disk sector, raid5 runs a
      state machine for the corresponding stripe, which includes reading some disks
      of the stripe, calculating parity, and writing some disks of the stripe. The
      state machine is running in raid5d thread currently. Since there is only one
      thread, it doesn't scale well for high speed storage. An obvious solution is
      multi-threading.
      
      To get better performance, we have some requirements:
      a. locality. stripe corresponding to request submitted from one cpu is better
      handled in thread in local cpu or local node. local cpu is preferred but some
      times could be a bottleneck, for example, parity calculation is too heavy.
      local node running has wide adaptability.
      b. configurablity. Different setup of raid5 array might need diffent
      configuration. Especially the thread number. More threads don't always mean
      better performance because of lock contentions.
      
      My original implementation is creating some kernel threads. There are
      interfaces to control which cpu's stripe each thread should handle. And
      userspace can set affinity of the threads. This provides biggest flexibility
      and configurability. But it's hard to use and apparently a new thread pool
      implementation is disfavor.
      
      Recent workqueue improvement is quite promising. unbound workqueue will be
      bound to numa node. If WQ_SYSFS is set in workqueue, there are sysfs option to
      do affinity setting. For example, we can only include one HT sibling in
      affinity. Since work is non-reentrant by default, and we can control running
      thread number by limiting dispatched work_struct number.
      
      In this patch, I created several stripe worker group. A group is a numa node.
      stripes from cpus of one node will be added to a group list. Workqueue thread
      of one node will only handle stripes of worker group of the node. In this way,
      stripe handling has numa node locality. And as I said, we can control thread
      number by limiting dispatched work_struct number.
      
      The work_struct callback function handles several stripes in one run. A typical
      work queue usage is to run one unit in each work_struct. In raid5 case, the
      unit is a stripe. But we can't do that:
      a. Though handling a stripe doesn't need lock because of reference accounting
      and stripe isn't in any list, queuing a work_struct for each stripe will make
      workqueue lock contended very heavily.
      b. blk_start_plug()/blk_finish_plug() should surround stripe handle, as we
      might dispatch request. If each work_struct only handles one stripe, such block
      plug is meaningless.
      
      This implementation can't do very fine grained configuration. But the numa
      binding is most popular usage model, should be enough for most workloads.
      
      Note: since we have only one stripe queue, switching to multi-thread might
      decrease request size dispatching down to low level layer. The impact depends
      on thread number, raid configuration and workload. So multi-thread raid5 might
      not be proper for all setups.
      
      Changes V1 -> V2:
      1. remove WQ_NON_REENTRANT
      2. disabling multi-threading by default
      3. Add more descriptions in changelog
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      851c30c9
    • S
      raid5: fix stripe release order · d265d9dc
      Shaohua Li 提交于
      patch "make release_stripe lockless" changes the order stripes are released.
      Originally I thought block layer can take care of request merge, but it appears
      there are still some requests not merged. It's easy to fix the order.
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      d265d9dc
    • S
      raid5: make release_stripe lockless · 773ca82f
      Shaohua Li 提交于
      release_stripe still has big lock contention. We just add the stripe to a llist
      without taking device_lock. We let the raid5d thread to do the real stripe
      release, which must hold device_lock anyway. In this way, release_stripe
      doesn't hold any locks.
      
      The side effect is the released stripes order is changed. But sounds not a big
      deal, stripes are never handled in order. And I thought block layer can already
      do nice request merge, which means order isn't that important.
      
      I kept the unplug release batch, which is unnecessary with this patch from lock
      contention avoid point of view, and actually if we delete it, the stripe_head
      release_list and lru can share storage. But the unplug release batch is also
      helpful for request merge. We probably can delay wakeup raid5d till unplug, but
      I'm still afraid of the case which raid5d is running.
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      773ca82f
  2. 27 8月, 2013 5 次提交
    • N
      md: avoid deadlock when dirty buffers during md_stop. · 260fa034
      NeilBrown 提交于
      When the last process closes /dev/mdX sync_blockdev will be called so
      that all buffers get flushed.
      So if it is then opened for the STOP_ARRAY ioctl to be sent there will
      be nothing to flush.
      
      However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
      moments before some other process which was writing closes their file
      descriptor, then there won't be a 'last close' and the buffers might
      not get flushed.
      
      So do_md_stop() calls sync_blockdev().  However at this point it is
      holding ->reconfig_mutex.  So if the array is currently 'clean' then
      the writes from sync_blockdev() will not complete until the array
      can be marked dirty and that won't happen until some other thread
      can get ->reconfig_mutex.  So we deadlock.
      
      We need to move the sync_blockdev() call to before we take
      ->reconfig_mutex.
      However then some other thread could open /dev/mdX and write to it
      after we call sync_blockdev() and before we actually stop the array.
      This can leave dirty data in the page cache which is awkward.
      
      So introduce new flag MD_STILL_CLOSED.  Set it before calling
      sync_blockdev(), clear it if anyone does open the file, and abort the
      STOP_ARRAY attempt if it gets set before we lock against further
      opens.
      
      It is still possible to get problems if you open /dev/mdX, write to
      it, then issue the STOP_ARRAY ioctl.  Just don't do that.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      260fa034
    • N
      md: Don't test all of mddev->flags at once. · 7a0a5355
      NeilBrown 提交于
      mddev->flags is mostly used to record if an update of the
      metadata is needed.  Sometimes the whole field is tested
      instead of just the important bits.  This makes it difficult
      to introduce more state bits.
      
      So replace all bare tests of mddev->flags with tests for the bits
      that actually need testing.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      7a0a5355
    • D
      md: Fix apparent cut-and-paste error in super_90_validate · c9ad020f
      Dave Jones 提交于
      Setting a variable to itself probably wasn't the intention here.
      Signed-off-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      c9ad020f
    • N
      md: fix safe_mode buglet. · 275c51c4
      NeilBrown 提交于
      Whe we set the safe_mode_timeout to a smaller value we trigger a timeout
      immediately - otherwise the small value might not be honoured.
      However if the previous timeout was 0 meaning "no timeout", we didn't.
      This would mean that no timeout happens until the next write completes,
      which could be a long time.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      275c51c4
    • N
      md: don't call md_allow_write in get_bitmap_file. · 60559da4
      NeilBrown 提交于
      There is no really need as GFP_NOIO is very likely sufficient,
      and failure is not catastrophic.
      
      Calling md_allow_write here will convert a read-auto array to
      read/write which could be confusing when you are just performing
      a read operation.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      60559da4
  3. 24 8月, 2013 4 次提交
    • A
      usb: phy: fix build breakage · 52d5b9ab
      Anatolij Gustschin 提交于
      Commit 94ae9843 (usb: phy: rename all phy drivers to phy-$name-usb.c)
      renamed drivers/usb/phy/otg_fsm.h to drivers/usb/phy/phy-fsm-usb.h
      but changed drivers/usb/phy/phy-fsm-usb.c to include not existing
      "phy-otg-fsm.h" instead of new "phy-fsm-usb.h". This breaks building:
        ...
        drivers/usb/phy/phy-fsm-usb.c:32:25: fatal error: phy-otg-fsm.h: No such file or directory
        compilation terminated.
        make[3]: *** [drivers/usb/phy/phy-fsm-usb.o] Error 1
      
      This commit also missed to modify drivers/usb/phy/phy-fsl-usb.h
      to include new "phy-fsm-usb.h" instead of "otg_fsm.h" resulting
      in another build breakage:
        ...
        In file included from drivers/usb/phy/phy-fsl-usb.c:46:0:
        drivers/usb/phy/phy-fsl-usb.h:18:21: fatal error: otg_fsm.h: No such file or directory
        compilation terminated.
        make[3]: *** [drivers/usb/phy/phy-fsl-usb.o] Error 1
      
      Fix both issues.
      Signed-off-by: NAnatolij Gustschin <agust@denx.de>
      Cc: stable <stable@vger.kernel.org> # 3.10+
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52d5b9ab
    • A
      USB: OHCI: add missing PCI PM callbacks to ohci-pci.c · 9a11899c
      Alan Stern 提交于
      Commit c1117afb (USB: OHCI: make ohci-pci a separate driver)
      neglected to preserve the entries for the pci_suspend and pci_resume
      driver callbacks.  As a result, OHCI controllers don't work properly
      during suspend and after hibernation.
      
      This patch adds the missing callbacks to the driver.
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: NSteve Cotton <steve@s.cotton.clara.co.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a11899c
    • I
      staging: comedi: bug-fix NULL pointer dereference on failed attach · 3955dfa8
      Ian Abbott 提交于
      Commit dcd7b8bd ("staging: comedi: put
      module _after_ detach" by myself) reversed a couple of calls in
      `comedi_device_attach()` when recovering from an error returned by the
      low-level driver's 'attach' handler.  Unfortunately, that introduced a
      NULL pointer dereference bug as `dev->driver` is NULL after the call to
      `comedi_device_detach()`.   We still have a pointer to the low-level
      comedi driver structure in the `driv` variable, so use that instead.
      Signed-off-by: NIan Abbott <abbotti@mev.co.uk>
      Cc: <stable@vger.kernel.org> # 3.10+
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3955dfa8
    • D
      drivers/platform/olpc/olpc-ec.c: initialise earlier · 93dbc1b3
      Daniel Drake 提交于
      Being a low-level component, various drivers (e.g.  olpc-battery) assume
      that it is ok to communicate with the OLPC Embedded Controller during
      probe.  Therefore the OLPC EC driver must be initialised before other
      drivers try to use it.  This was the case until it was recently moved
      out of arch/x86 and restructured around commits ac250415 ("Platform:
      OLPC: turn EC driver into a platform_driver") and 85f90cf6 ("x86:
      OLPC: switch over to using new EC driver on x86").
      
      Use arch_initcall so that olpc-ec is readied earlier, matching the
      previous behaviour.
      
      Fixes a regression introduced in Linux-3.6 where various drivers such as
      olpc-battery and olpc-xo1-sci failed to load due to an inability to
      communicate with the EC.  The user-visible effect was a lack of battery
      monitoring, missing ebook/lid switch input devices, etc.
      Signed-off-by: NDaniel Drake <dsd@laptop.org>
      Cc: Andres Salomon <dilinger@queued.net>
      Cc: Paul Fox <pgf@laptop.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93dbc1b3
  4. 23 8月, 2013 4 次提交
    • S
      be2net: fix disabling TX in be_close() · 6e1f9975
      Sathya Perla 提交于
      commit fba87559 ("disable TX in be_close()") disabled TX in be_close()
      to protect be_xmit() from touching freed up queues in the AER recovery
      flow.  But, TX must be disabled *before* cleaning up TX completions in
      the close() path, not after. This allows be_tx_compl_clean() to free up
      all TX-req skbs that were notified to the HW.
      Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e1f9975
    • R
      Revert "ACPI / video: Always call acpi_video_init_brightness() on init" · 168cf0ec
      Rafael J. Wysocki 提交于
      Revert commit c04c697c (ACPI / video: Always call acpi_video_init_brightness()
      on init), because it breaks eDP backlight at 1920x1080 on Acer Aspire S3
      for Trevor Bortins.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=68355Reported-and-bisected-by: NTrevor Bortins <enabfluw@gmail.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      168cf0ec
    • M
      [SCSI] zfcp: remove access control tables interface (keep sysfs files) · b5dc3c48
      Martin Peschke 提交于
      By popular demand, this patch brings back a couple of sysfs attributes
      removed by commit 663e0890
      "[SCSI] zfcp: remove access control tables interface".
      The content has been irrelevant for years, but the files must be
      there forever for whatever user space tools that may rely on them.
      
      Since these files always return a constant value, a new stripped
      down show-macro was required. Otherwise build warnings would have
      been introduced.
      Signed-off-by: NMartin Peschke <mpeschke@linux.vnet.ibm.com>
      Signed-off-by: NSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      b5dc3c48
    • M
      [SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops · 924dd584
      Martin Peschke 提交于
      BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
      in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
      CPU: 1 Not tainted 3.9.3+ #69
      Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
      <snip>
      Call Trace:
      ([<00000000001165de>] show_trace+0x106/0x154)
       [<00000000001166a0>] show_stack+0x74/0xf4
       [<00000000006ff646>] dump_stack+0xc6/0xd4
       [<000000000017f3a0>] __might_sleep+0x128/0x148
       [<000000000015ece8>] flush_work+0x54/0x1f8
       [<00000000001630de>] __cancel_work_timer+0xc6/0x128
       [<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
       [<0000000000161816>] execute_in_process_context+0x96/0xa8
       [<00000000004d33d8>] device_release+0x60/0xc0
       [<000000000048af48>] kobject_release+0xa8/0x1c4
       [<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
       [<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
       [<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
       [<000000000016b75a>] kthread+0xf2/0xfc
       [<000000000070c9de>] kernel_thread_starter+0x6/0xc
       [<000000000070c9d8>] kernel_thread_starter+0x0/0xc
      
      Apparently, the ref_count for some scsi_device drops down to zero,
      triggering device removal through execute_in_process_context(), while
      the lldd error recovery thread iterates through a scsi device list.
      Unfortunately, execute_in_process_context() decides to immediately
      execute that device removal function, instead of scheduling asynchronous
      execution, since it detects process context and thinks it is safe to do
      so. But almost all calls to shost_for_each_device() in our lldd are
      inside spin_lock_irq, even in thread context. Obviously, schedule()
      inside spin_lock_irq sections is a bad idea.
      
      Change the lldd to use the proper iterator function,
      __shost_for_each_device(), in combination with required locking.
      
      Occurences that need to be changed include all calls in zfcp_erp.c,
      since those might be executed in zfcp error recovery thread context
      with a lock held.
      
      Other occurences of shost_for_each_device() in zfcp_fsf.c do not
      need to be changed (no process context, no surrounding locking).
      
      The problem was introduced in Linux 2.6.37 by commit
      b62a8d9b
      "[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Peschke <mpeschke@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org #2.6.37+
      Signed-off-by: NSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      924dd584
  5. 22 8月, 2013 6 次提交
    • M
      [SCSI] zfcp: fix lock imbalance by reworking request queue locking · d79ff142
      Martin Peschke 提交于
      This patch adds wait_event_interruptible_lock_irq_timeout(), which is a
      straight-forward descendant of wait_event_interruptible_timeout() and
      wait_event_interruptible_lock_irq().
      
      The zfcp driver used to call wait_event_interruptible_timeout()
      in combination with some intricate and error-prone locking. Using
      wait_event_interruptible_lock_irq_timeout() as a replacement
      nicely cleans up that locking.
      
      This rework removes a situation that resulted in a locking imbalance
      in zfcp_qdio_sbal_get():
      
      BUG: workqueue leaked lock or atomic: events/1/0xffffff00/10
          last function: zfcp_fc_wka_port_offline+0x0/0xa0 [zfcp]
      
      It was introduced by commit c2af7545
      "[SCSI] zfcp: Do not wait for SBALs on stopped queue", which had a new
      code path related to ZFCP_STATUS_ADAPTER_QDIOUP that took an early exit
      without a required lock being held. The problem occured when a
      special, non-SCSI I/O request was being submitted in process context,
      when the adapter's queues had been torn down. In this case the bug
      surfaced when the Fibre Channel port connection for a well-known address
      was closed during a concurrent adapter shut-down procedure, which is a
      rare constellation.
      
      This patch also fixes these warnings from the sparse tool (make C=1):
      
      drivers/s390/scsi/zfcp_qdio.c:224:12: warning: context imbalance in
       'zfcp_qdio_sbal_check' - wrong count at exit
      drivers/s390/scsi/zfcp_qdio.c:244:5: warning: context imbalance in
       'zfcp_qdio_sbal_get' - unexpected unlock
      
      Last but not least, we get rid of that crappy lock-unlock-lock
      sequence at the beginning of the critical section.
      
      It is okay to call zfcp_erp_adapter_reopen() with req_q_lock held.
      Reported-by: NMikulas Patocka <mpatocka@redhat.com>
      Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Peschke <mpeschke@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org #2.6.35+
      Signed-off-by: NSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      d79ff142
    • D
      hso: Fix stack corruption on some architectures · e75dc677
      Daniel Gimpelevich 提交于
      As Sergei Shtylyov explained in the #mipslinux IRC channel:
      [Mon 2013-08-19 12:28:21 PM PDT] <headless> guys, are you sure it's not "DMA off stack" case?
      [Mon 2013-08-19 12:28:35 PM PDT] <headless> it's a known stack corruptor on non-coherent arches
      [Mon 2013-08-19 12:31:48 PM PDT] <DonkeyHotei> headless: for usb/ehci?
      [Mon 2013-08-19 12:34:11 PM PDT] <DonkeyHotei> headless: explain
      [Mon 2013-08-19 12:35:38 PM PDT] <headless> usb_control_msg() (or other such func) should not use buffer on stack. DMA from/to stack is prohibited
      [Mon 2013-08-19 12:35:58 PM PDT] <headless> and EHCI uses DMA on control xfers (as well as all the others)
      Signed-off-by: NDaniel Gimpelevich <daniel@gimpelevich.san-francisco.ca.us>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e75dc677
    • D
      hso: Earlier catch of error condition · 35e57e1b
      Daniel Gimpelevich 提交于
      There is no need to get an interface specification if we know it's the
      wrong one.
      Signed-off-by: NDaniel Gimpelevich <daniel@gimpelevich.san-francisco.ca.us>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35e57e1b
    • W
      of: fdt: fix memory initialization for expanded DT · 9e401275
      Wladislav Wiebe 提交于
      Already existing property flags are filled wrong for properties created from
      initial FDT. This could cause problems if this DYNAMIC device-tree functions
      are used later, i.e. properties are attached/detached/replaced. Simply dumping
      flags from the running system show, that some initial static (not allocated via
      kzmalloc()) nodes are marked as dynamic.
      
      I putted some debug extensions to property_proc_show(..) :
      ..
      +       if (OF_IS_DYNAMIC(pp))
      +               pr_err("DEBUG: xxx : OF_IS_DYNAMIC\n");
      +       if (OF_IS_DETACHED(pp))
      +               pr_err("DEBUG: xxx : OF_IS_DETACHED\n");
      
      when you operate on the nodes (e.g.: ~$ cat /proc/device-tree/*some_node*) you
      will see that those flags are filled wrong, basically in most cases it will dump
      a DYNAMIC or DETACHED status, which is in not true.
      (BTW. this OF_IS_DETACHED is a own define for debug purposes which which just
      make a test_bit(OF_DETACHED, &x->_flags)
      
      If nodes are dynamic kernel is allowed to kfree() them. But it will crash
      attempting to do so on the nodes from FDT -- they are not allocated via
      kzmalloc().
      Signed-off-by: NWladislav Wiebe <wladislav.kw@gmail.com>
      Acked-by: NAlexander Sverdlin <alexander.sverdlin@nsn.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NRob Herring <rob.herring@calxeda.com>
      9e401275
    • G
      gma500: Fix SDVO turning off randomly · 6f1e1204
      Guillaume Clement 提交于
      Some Poulsbo cards seem to incorrectly report SDVO_CMD_STATUS_TARGET_NOT_SPECIFIED instead of SDVO_CMD_STATUS_PENDING, which causes the display to be turned off.
      Signed-off-by: NGuillaume Clement <gclement@baobob.org>
      Acked-by: NPatrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      6f1e1204
    • A
      [SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on · f5944daa
      Anton Blanchard 提交于
      We want ppc64 to be able to select between optimised assembly
      checksum routines in big endian and the generic lib/checksum.c
      routines in little endian.
      
      The lpfc driver is forcing CONFIG_GENERIC_CSUM on which means
      we are unable to make the decision to enable it in the arch
      Kconfig. If the option exists it is always forced on.
      
      This got introduced in 3.10 via commit 6a7252fd ([SCSI] lpfc:
      fix up Kconfig dependencies). I spoke to Randy about it and
      the original issue was with CRC_T10DIF not being defined.
      
      As such, remove the select of CONFIG_GENERIC_CSUM.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: <stable@vger.kernel.org> # 3.10
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      f5944daa
  6. 21 8月, 2013 13 次提交
  7. 20 8月, 2013 5 次提交
    • D
      xen/events: mask events when changing their VCPU binding · 4704fe4f
      David Vrabel 提交于
      When a event is being bound to a VCPU there is a window between the
      EVTCHNOP_bind_vpcu call and the adjustment of the local per-cpu masks
      where an event may be lost.  The hypervisor upcalls the new VCPU but
      the kernel thinks that event is still bound to the old VCPU and
      ignores it.
      
      There is even a problem when the event is being bound to the same VCPU
      as there is a small window beween the clear_bit() and set_bit() calls
      in bind_evtchn_to_cpu().  When scanning for pending events, the kernel
      may read the bit when it is momentarily clear and ignore the event.
      
      Avoid this by masking the event during the whole bind operation.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NJan Beulich <jbeulich@suse.com>
      CC: stable@vger.kernel.org
      4704fe4f
    • D
      xen/events: initialize local per-cpu mask for all possible events · 84ca7a8e
      David Vrabel 提交于
      The sizeof() argument in init_evtchn_cpu_bindings() is incorrect
      resulting in only the first 64 (or 32 in 32-bit guests) ports having
      their bindings being initialized to VCPU 0.
      
      In most cases this does not cause a problem as request_irq() will set
      the irq affinity which will set the correct local per-cpu mask.
      However, if the request_irq() is called on a VCPU other than 0, there
      is a window between the unmasking of the event and the affinity being
      set were an event may be lost because it is not locally unmasked on
      any VCPU. If request_irq() is called on VCPU 0 then local irqs are
      disabled during the window and the race does not occur.
      
      Fix this by initializing all NR_EVENT_CHANNEL bits in the local
      per-cpu masks.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      84ca7a8e
    • A
      sata_fsl: save irqs while coalescing · 99bbdfa6
      Anthony Foiani 提交于
      Before this patch, I was seeing the following lockdep splat on my
      MPC8315 (PPC32) target:
      
        [    9.086051] =================================
        [    9.090393] [ INFO: inconsistent lock state ]
        [    9.094744] 3.9.7-ajf-gc39503d #1 Not tainted
        [    9.099087] ---------------------------------
        [    9.103432] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
        [    9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes:
        [    9.114642]  (&(&host->lock)->rlock){?.+...}, at: [<c02f4168>] sata_fsl_interrupt+0x50/0x250
        [    9.123137] {HARDIRQ-ON-W} state was registered at:
        [    9.128004]   [<c006cdb8>] lock_acquire+0x90/0xf4
        [    9.132737]   [<c043ef04>] _raw_spin_lock+0x34/0x4c
        [    9.137645]   [<c02f3560>] fsl_sata_set_irq_coalescing+0x68/0x100
        [    9.143750]   [<c02f36a0>] sata_fsl_init_controller+0xa8/0xc0
        [    9.149505]   [<c02f3f10>] sata_fsl_probe+0x17c/0x2e8
        [    9.154568]   [<c02acc90>] driver_probe_device+0x90/0x248
        [    9.159987]   [<c02acf0c>] __driver_attach+0xc4/0xc8
        [    9.164964]   [<c02aae74>] bus_for_each_dev+0x5c/0xa8
        [    9.170028]   [<c02ac218>] bus_add_driver+0x100/0x26c
        [    9.175091]   [<c02ad638>] driver_register+0x88/0x198
        [    9.180155]   [<c0003a24>] do_one_initcall+0x58/0x1b4
        [    9.185226]   [<c05aeeac>] kernel_init_freeable+0x118/0x1c0
        [    9.190823]   [<c0004110>] kernel_init+0x18/0x108
        [    9.195542]   [<c000f6b8>] ret_from_kernel_thread+0x64/0x6c
        [    9.201142] irq event stamp: 160
        [    9.204366] hardirqs last  enabled at (159): [<c043f778>] _raw_spin_unlock_irq+0x30/0x50
        [    9.212469] hardirqs last disabled at (160): [<c000f414>] reenable_mmu+0x30/0x88
        [    9.219867] softirqs last  enabled at (144): [<c002ae5c>] __do_softirq+0x168/0x218
        [    9.227435] softirqs last disabled at (137): [<c002b0d4>] irq_exit+0xa8/0xb4
        [    9.234481]
        [    9.234481] other info that might help us debug this:
        [    9.240995]  Possible unsafe locking scenario:
        [    9.240995]
        [    9.246898]        CPU0
        [    9.249337]        ----
        [    9.251776]   lock(&(&host->lock)->rlock);
        [    9.255878]   <Interrupt>
        [    9.258492]     lock(&(&host->lock)->rlock);
        [    9.262765]
        [    9.262765]  *** DEADLOCK ***
        [    9.262765]
        [    9.268684] no locks held by scsi_eh_1/39.
        [    9.272767]
        [    9.272767] stack backtrace:
        [    9.277117] Call Trace:
        [    9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable)
        [    9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c
        [    9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658
        [    9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840
        [    9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4
        [    9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c
        [    9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250
        [    9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254
        [    9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78
        [    9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104
        [    9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28
        [    9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8
        [    9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14
        [    9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50
        [    9.353983]     LR = _raw_spin_unlock_irq+0x30/0x50
        [    9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188
        [    9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0
        [    9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8
        [    9.382454] [cc713c20] [c02de4b8] ata_dev_read_id+0x158/0x594
        [    9.388205] [cc713ca0] [c02ec244] ata_eh_recover+0xd88/0x13d0
        [    9.393962] [cc713d20] [c02f2520] sata_pmp_error_handler+0xc0/0x8ac
        [    9.400234] [cc713dd0] [c02ecdc8] ata_scsi_port_error_handler+0x464/0x5e8
        [    9.407023] [cc713e10] [c02ecfd0] ata_scsi_error+0x84/0xb8
        [    9.412528] [cc713e40] [c02c4974] scsi_error_handler+0xd8/0x47c
        [    9.418457] [cc713eb0] [c004737c] kthread+0xa8/0xac
        [    9.423355] [cc713f40] [c000f6b8] ret_from_kernel_thread+0x64/0x6c
      
      This fix was suggested by Bhushan Bharat <R65777@freescale.com>, and
      was discussed in email at:
      
        http://linuxppc.10917.n7.nabble.com/MPC8315-reboot-failure-lockdep-splat-possibly-related-tp75162.html
      
      Same patch successfully tested with 3.9.7.  linux-next compiled but
      not tested on hardware.
      
      This patch is based off linux-next tag next-20130819
      (which is commit 66a01bae29d11916c09f9f5a937cafe7d402e4a5 )
      Signed-off-by: NAnthony Foiani <anthony.foiani@gmail.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      99bbdfa6
    • A
      bnx2x: set VF DMAE when first function has 0 supported VFs · 49baea88
      Ariel Elior 提交于
      There are possible HW configurations in which PFs will have SR-IOV capability
      but will have Max VFs set to 0 - this happens when there are Multi-Function
      devices where the VFs are allocated to only some of the PFs.
      
      DMAE is configured to support VFs only if the configuring PF has supported VFs.
      In case the first PF to be loaded will be one without supported VFs, it will
      not configure DMAE to the VF-supporting mode. When VFs of other PFs will be
      loaded later on, they will not be able to communicate with their PF.
      
      This changes the requirement for configuring DMAE for VF-supporting mode;
      If the device has SR-IOV capabilities there must be some PF that has
      max supported VFs > 0, thus it will configure the DMAE for supporting VFs.
      Signed-off-by: NAriel Elior <ariele@broadcom.com>
      Signed-off-by: NYuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49baea88
    • A
      bnx2x: Protect against VFs' ndos when SR-IOV is disabled · 5ae30d78
      Ariel Elior 提交于
      Since SR-IOV can be activated dynamically and iproute2 can be called
      asynchronously, the various callbacks need a robust sanity check before
      attempting to access the SR-IOV database and members since there are numerous
      states in which it can find the driver (e.g., PF is down, sriov was not enabled
      yet, VF is down, etc.).
      
      In many of the states the callback result will be null pointer dereference.
      Signed-off-by: NAriel Elior <ariele@broadcom.com>
      Signed-off-by: NYuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ae30d78