1. 09 Nov, 2013 1 commit
    • IB/cma: Use cached gids · 29f27e84
      Committed by Doug Ledford
      The cma_acquire_dev function was changed by commit 3c86aa70
      ("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
      because multiport devices might have either IB or IBoE formatted gids.
      The old function assumed that all ports on the same device used the
      same GID format.
      
      However, when it was changed to use find_gid_port(), we inadvertently
      lost usage of the GID cache.  This turned out to be a very costly
      change.  In our testing, each iteration through each index of the GID
      table takes roughly 35us.  When you have multiple devices in a system,
      and the GID you are looking for is on one of the later devices, the
      code loops through all of the GID indexes on all of the early devices
      before it finally succeeds on the target device.  This pathological
      search behavior, combined with 35us per GID table index retrieval,
      produces results such as the following from the cmtime application
      that's part of the latest librdmacm git repo:
      
      ib1:
      step              total ms     max ms     min us  us / conn
      create id    :       29.42       0.04       1.00       2.94
      bind addr    :   186705.66      19.00   18556.00   18670.57
      resolve addr :       41.93       9.68     619.00       4.19
      resolve route:      486.93       0.48     101.00      48.69
      create qp    :     4021.95       6.18     330.00     402.20
      connect      :    68350.39   68588.17   24632.00    6835.04
      disconnect   :     1460.43     252.65-1862269.00     146.04
      destroy      :       41.16       0.04       2.00       4.12
      
      ib0:
      step              total ms     max ms     min us  us / conn
      create id    :       28.61       0.68       1.00       2.86
      bind addr    :     2178.86       2.95     201.00     217.89
      resolve addr :       51.26      16.85     845.00       5.13
      resolve route:      620.08       0.43      92.00      62.01
      create qp    :     3344.40       6.36     273.00     334.44
      connect      :     6435.99    6368.53    7844.00     643.60
      disconnect   :     5095.38     321.90     757.00     509.54
      destroy      :       37.13       0.02       2.00       3.71
      
      Clearly, both the bind address and connect operations suffer
      a huge penalty for being anything other than the default
      GID on the first port in the system.
      
      After applying this patch, the numbers now look like this:
      
      ib1:
      step              total ms     max ms     min us  us / conn
      create id    :       30.15       0.03       1.00       3.01
      bind addr    :       80.27       0.04       7.00       8.03
      resolve addr :       43.02      13.53     589.00       4.30
      resolve route:      482.90       0.45     100.00      48.29
      create qp    :     3986.55       5.80     330.00     398.66
      connect      :     7141.53    7051.29    5005.00     714.15
      disconnect   :     5038.85     193.63     918.00     503.88
      destroy      :       37.02       0.04       2.00       3.70
      
      ib0:
      step              total ms     max ms     min us  us / conn
      create id    :       34.27       0.05       1.00       3.43
      bind addr    :       26.45       0.04       1.00       2.64
      resolve addr :       38.25      10.54     760.00       3.82
      resolve route:      604.79       0.43      97.00      60.48
      create qp    :     3314.95       6.34     273.00     331.49
      connect      :    12399.26   12351.10    8609.00    1239.93
      disconnect   :     5096.76     270.72    1015.00     509.68
      destroy      :       37.10       0.03       2.00       3.71
      
      It's worth noting that we still suffer a bit of a penalty on
      connect to the wrong device, but the penalty is much less than
      it used to be.  Follow-on patches deal with this penalty.
      
      Many thanks to Neil Horman for helping to track down the slow
      function, which led us to discover that the original patch
      mentioned above backed out cache usage, and to identify just
      how much that impacted the system.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
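The search cost described in this commit message can be illustrated with a small userspace sketch. It is not the kernel code: `struct fake_dev`, `count_uncached_queries()`, `estimate_uncached_us()`, and any table sizes passed in are hypothetical, and only the figure of roughly 35 us per index comes from the commit message. The sketch counts how many per-index GID table queries an uncached linear search issues before it reaches a GID that lives on a later device.

```c
#include <stddef.h>

/* Hypothetical stand-in for one RDMA device: only its GID table size matters here. */
struct fake_dev {
    size_t gid_table_len;
};

/*
 * Count the per-index queries an uncached search performs when the wanted
 * GID sits at index target_idx of device target_dev and every earlier
 * device is scanned in full first.
 */
size_t count_uncached_queries(const struct fake_dev *devs, size_t target_dev,
                              size_t target_idx)
{
    size_t queries = 0;

    for (size_t d = 0; d < target_dev; d++)
        queries += devs[d].gid_table_len;   /* every index of each earlier device */

    return queries + target_idx + 1;        /* indexes 0..target_idx on the target */
}

/* Estimated time in microseconds, assuming roughly 35 us per query as measured in the commit. */
size_t estimate_uncached_us(size_t queries)
{
    return queries * 35;
}
```

With two hypothetical devices of 128 entries each, a GID at index 0 of the second device costs 129 queries, or about 4.5 ms at 35 us each, while the same GID at index 0 of the first device costs a single query. A cache lookup avoids the per-index queries altogether.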
  2. 26 Oct, 2013 3 commits
    • vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter · 60a01f55
      Committed by Nicholas Bellinger
      This patch addresses a long-standing bug where the get_user_pages_fast()
      write parameter used for setting the underlying page table entry permission
      bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
      passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().
      
      However, this parameter is intended to signal WRITEs to pinned userspace
      PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
      for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.
      
      This bug would manifest itself as random process segmentation faults on
      KVM host after repeated vhost starts + stops and/or with lots of vhost
      endpoints + LUNs.
      
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Asias He <asias@redhat.com>
      Cc: <stable@vger.kernel.org> # 3.6+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
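The rule stated in this commit message can be sketched as a one-line mapping from data direction to the write flag. This is a userspace illustration only: `enum fake_dma_dir` is a local stand-in for the kernel's DMA direction enum, and `pages_need_write()` is a hypothetical helper, not a function from the vhost-scsi driver.

```c
#include <stdbool.h>

/* Local stand-in for the kernel's DMA direction enum; the values are illustrative. */
enum fake_dma_dir {
    FAKE_DMA_TO_DEVICE,    /* virtio-scsi WRITE payload: the host only reads the pinned pages */
    FAKE_DMA_FROM_DEVICE   /* virtio-scsi READ payload: the host writes into the pinned pages */
};

/* Pinned pages need write permission only when data flows from the device into memory. */
bool pages_need_write(enum fake_dma_dir dir)
{
    return dir == FAKE_DMA_FROM_DEVICE;
}
```

The bug described above corresponds to returning true for `FAKE_DMA_TO_DEVICE`.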
    • target/pscsi: fix return value check · 58932e96
      Committed by Wei Yongjun
      In case of error, the function scsi_host_lookup() returns a NULL
      pointer, not ERR_PTR(). The IS_ERR() test in the return value check
      should be replaced with a NULL test.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
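Why an IS_ERR() test misses a NULL return can be shown with a simplified userspace imitation. Everything here is an assumption made for illustration: `fake_is_err()` mimics the kernel macro's address-range test, and `fake_lookup()` is a hypothetical function that, like scsi_host_lookup() as described above, reports failure by returning NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_ERRNO 4095

/* Simplified imitation of IS_ERR(): true only for the top 4095 addresses. */
static bool fake_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-FAKE_MAX_ERRNO;
}

static int fake_host;

/* Hypothetical lookup: only host number 0 exists, anything else yields NULL. */
void *fake_lookup(int host_no)
{
    return host_no == 0 ? &fake_host : NULL;
}

/* Buggy check: NULL is not an error pointer, so the failure goes unnoticed. */
bool lookup_failed_buggy(int host_no)
{
    return fake_is_err(fake_lookup(host_no));
}

/* Fixed check: test for NULL, which is what the lookup actually returns. */
bool lookup_failed_fixed(int host_no)
{
    return fake_lookup(host_no) == NULL;
}
```

For a missing host, `lookup_failed_buggy()` reports success and the caller would go on to dereference NULL, while `lookup_failed_fixed()` reports the failure.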
    • mtd: gpmi: fix ECC regression · 031e2777
      Committed by David Woodhouse
      The "legacy" ECC layout used until 3.12-rc1 uses all the OOB area by
      computing the ECC strength and ECC step size ourselves.
      
      Commit 2febcdf8 ("mtd: gpmi: set the BCHs geometry with the ecc info")
      makes the driver use the ECC info (ECC strength and ECC step size)
      provided by the MTD code, and creates a different NAND ECC layout
      for the BCH, and uses the new ECC layout. This causes a regression:
      
         We cannot mount a UBIFS that was created with the old NAND ECC layout.
      
      This patch fixes this issue by reverting to the legacy ECC layout.
      
      We will probably introduce a new device-tree property to indicate that
      the new ECC layout can be used. For now though, for the imminent 3.12
      release, we just unconditionally revert to the 3.11 behaviour.
      
      This leaves a harmless cosmetic warning about an unused function. At
      this point in the cycle I really don't care.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Acked-by: Huang Shijie <b32955@freescale.com>
      Acked-by: Marek Vasut <marex@denx.de>
      Tested-by: Marek Vasut <marex@denx.de>
  3. 25 Oct, 2013 4 commits
  4. 24 Oct, 2013 8 commits
    • target: Fail XCOPY for non matching source + destination block_size · 48502ddb
      Committed by Nicholas Bellinger
      This patch adds an explicit check + failure for XCOPY I/O to source +
      destination devices with a non-matching block_size.
      
      This limitation is currently due to the fact that the scatterlist
      memory allocated for the XCOPY READ operation is passed zero-copy
      to the XCOPY WRITE operation.
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Reported-by: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: Generate failure for XCOPY I/O with non-zero scsi_status · 8a955d6d
      Committed by Nicholas Bellinger
      This patch adds the missing non-zero se_cmd->scsi_status check required
      for local XCOPY I/O within target_xcopy_issue_pt_cmd() to signal an
      exception case failure.
      
      This will trigger the generation of SAM_STAT_CHECK_CONDITION status
      from within target_xcopy_do_work() process context code.
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Reported-by: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: Add missing XCOPY I/O operation sense_buffer · 366bda19
      Committed by Nicholas Bellinger
      This patch adds the missing xcopy_pt_cmd->sense_buffer[] required for
      correctly handling CHECK_CONDITION exceptions within the locally
      generated XCOPY I/O path.
      
      Also update target_xcopy_read_source() + target_xcopy_setup_pt_cmd()
      to pass this buffer into transport_init_se_cmd() to correctly setup
      se_cmd->sense_buffer.
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Reported-by: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • iser-target: check device before dereferencing its variable · 0a66614b
      Committed by Vu Pham
      This patch changes isert_connect_release() to correctly check for
      the existence of struct isert_device *device before checking for
      isert_device->use_frwr.
      Signed-off-by: Vu Pham <vu@mellanox.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • raid5: avoid finding "discard" stripe · d47648fc
      Committed by Shaohua Li
      SCSI discard will damage the discard stripe's bio settings, e.g. some
      fields are changed. If the stripe is reused very soon, we have wrong bio
      settings. We remove the discard stripe from the hash list, so the next
      time the stripe will be fully initialized.
      
      Suitable for backport to 3.7+.
      
      Cc: <stable@vger.kernel.org> (3.7+)
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: set bio bi_vcnt 0 for discard request · 37c61ff3
      Committed by Shaohua Li
      The SCSI layer will add a new payload for a discard request. If two bios
      are merged into one, the second bio has bi_vcnt 1, which is set in raid5.
      This will confuse SCSI and cause an oops.
      
      Suitable for backport to 3.7+
      
      Cc: stable@vger.kernel.org (v3.7+)
      Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
    • md: avoid deadlock when md_set_badblocks. · 905b0297
      Committed by Bian Yu
      When operating on a hard disk and hitting errors, md_set_badblocks is
      called after scsi_restart_operations, which has already disabled IRQs.
      But md_set_badblocks calls write_sequnlock_irq, which enables IRQs, so a
      softirq can preempt the current thread and that may cause a deadlock. I
      think this situation should use write_seqlock_irqsave and
      write_sequnlock_irqrestore instead.
      
      I hit this situation; the call trace is below:
      [  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
      [  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
      [  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
      [  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
      [  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
      [  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
      [  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
      [  638.933884] Call Trace:
      [  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
      [  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
      [  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
      [  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
      [  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
      [  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
      [  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
      [  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
      [  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
      [  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
      [  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
      [  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
      [  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
      [  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
      [  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
      [  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
      [  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
      [  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
      [  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
      [  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
      [  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
      [  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
      [  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
      [  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
      [  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
      [  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
      [  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
      [  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
      [  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
      [  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
      [  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
      [  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
      [  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
      [  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
      [  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
      [  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
      [  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
      [  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
      [  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
      [  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      [  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
      [  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      
      This bug was introduced in commit 2e8ac303
      (the first time rdev_set_badblocks was called from interrupt context),
      so this patch is appropriate for 3.5 and subsequent kernels.
      
      Cc: <stable@vger.kernel.org> (3.5+)
      Signed-off-by: Bian Yu <bianyu@kedacom.com>
      Reviewed-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
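The difference between the two unlock styles named in this commit message can be modelled in userspace with a simulated interrupt flag. This is only a sketch: `fake_irqs_enabled` and all functions below are hypothetical stand-ins, not kernel APIs, and no real locking takes place.

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag; a userspace stand-in, not a kernel API. */
static bool fake_irqs_enabled = true;

bool fake_irqs_are_enabled(void) { return fake_irqs_enabled; }
void fake_irq_disable(void)      { fake_irqs_enabled = false; }
void fake_irq_enable(void)       { fake_irqs_enabled = true; }

/* Models the *_irq lock/unlock pair: unlock enables interrupts unconditionally. */
void critical_section_irq(void)
{
    fake_irqs_enabled = false;   /* lock: disable interrupts */
    /* ... protected work ... */
    fake_irqs_enabled = true;    /* unlock: always enable, even if the caller had them off */
}

/* Models the *_irqsave/irqrestore pair: unlock puts back the state that was found. */
void critical_section_irqsave(void)
{
    bool flags = fake_irqs_enabled;  /* save the caller's state */

    fake_irqs_enabled = false;       /* lock: disable interrupts */
    /* ... protected work ... */
    fake_irqs_enabled = flags;       /* restore: interrupts stay off if they were off */
}
```

If a caller has already disabled interrupts, `critical_section_irq()` returns with them enabled, which corresponds to the window in which the softirq in the trace above could run. `critical_section_irqsave()` leaves the caller's state unchanged.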
    • md: Fix skipping recovery for read-only arrays. · 61e4947c
      Committed by Lukasz Dorau
      Since:
              commit 7ceb17e8
              md: Allow devices to be re-added to a read-only array.
      
      spares are activated on a read-only array. For the raid1 and raid10
      personalities this causes not-in-sync devices to be marked in-sync
      without checking whether recovery has finished.
      
      If a read-only array is degraded and one of its devices is not in-sync
      (because the array has been only partially recovered) recovery will be skipped.
      
      This patch adds a check that recovery has finished before marking a device
      in-sync for the raid1 and raid10 personalities. For the raid5 personality
      such a condition is already present (at raid5.c:6029).
      
      Bug was introduced in 3.10 and causes data corruption.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
      Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
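The check described in this commit message can be sketched in userspace as follows. The structure, its fields, the `FAKE_RECOVERY_DONE` marker, and `fake_spare_active()` are all hypothetical names chosen for illustration; the sketch assumes recovery progress is tracked as an offset with a sentinel value meaning "finished", and it does not reproduce the md driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker meaning "recovery has covered the whole device". */
#define FAKE_RECOVERY_DONE UINT64_MAX

struct fake_rdev {
    uint64_t recovery_offset;  /* how far recovery has progressed */
    bool faulty;
    bool in_sync;
};

/*
 * Mark the device in-sync only if it is healthy and recovery has finished.
 * Returns true if the device was activated by this call.
 */
bool fake_spare_active(struct fake_rdev *rdev)
{
    if (rdev->faulty || rdev->in_sync)
        return false;
    if (rdev->recovery_offset != FAKE_RECOVERY_DONE)
        return false;          /* partially recovered: leave it out of sync */
    rdev->in_sync = true;
    return true;
}
```

Without the second test, a partially recovered device would be marked in-sync and the remaining recovery would be skipped, which is the data corruption scenario the commit describes.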
  5. 23 Oct, 2013 6 commits
  6. 22 Oct, 2013 16 commits
  7. 21 Oct, 2013 2 commits