  1. 12 Nov, 2013 1 commit
    • RDMA/ucma: Discard events for IDs not yet claimed by user space · c6b21824
      Sean Hefty committed
      Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:
      
          It looks like we are triggering a bug in RDMA CM/UCM interaction.
          The bug specifically hits when we have an incoming connection
          request and the connecting process dies BEFORE the passive end of
          the connection can process the request i.e. it does not call
          rdma_get_cm_event() to retrieve the initial connection event.  We
          were able to triage this further and have some additional
          information now.
      
          In the example below when P1 dies after issuing a connect request
          as the CM id is being destroyed all outstanding connects (to P2)
          are sent a reject message. We see this reject message being
          received on the passive end and the appropriate CM ID created for
          the initial connection message being retrieved in cm_match_req().
          The problem is in the ucma_event_handler() code when this reject
          message is delivered to it and the initial connect message itself
          HAS NOT been delivered to the client. In fact the client has not
          even called rdma_get_cm_event() at this stage so we haven't
          allocated a new ctx in ucma_get_event() and updated the new
          connection CM_ID to point to the new UCMA context.
      
          This results in the reject message not being dropped in
          ucma_event_handler() for the new connection request as the
          (if (!ctx->uid)) block is skipped since the ctx it refers to is
          the listen CM id context which does have a valid UID associated
          with it (I believe the new CMID for the connection initially
          uses the listen CMID -> context when it is created in
          cma_new_conn_id). Thus the assumption that new events for a
          connection can get dropped in ucma_event_handler() is incorrect
          IF the initial connect request has not been retrieved in the
          first case. We end up getting a CM Reject event on the listen CM
          ID and our upper layer code asserts (in fact this event does not
          even have the listen_id set as that only gets set up by librdmacm
          for connect requests).
      
      The solution is to verify that the cm_id being reported in the event
      is the same as the cm_id referenced by the ucma context.  A mismatch
      indicates that the ucma context corresponds to the listen.  This fix
      was validated by using a modified version of librdmacm that was able
      to verify the problem and see that the reject message was indeed
      dropped after this patch was applied.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
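The check described in this commit can be sketched in plain C as follows. This is a simplified model, not the kernel code: `struct cm_id`, `struct ucma_ctx`, the `is_connect_request` flag and `should_discard_event()` are hypothetical stand-ins, and treating connection requests as exempt from the check is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cm_id {
    int unused;
};

struct ucma_ctx {
    struct cm_id *cm_id;  /* the ID this user-space context owns */
    unsigned long uid;    /* non-zero once user space has claimed the context */
};

/*
 * Decide whether an event should be dropped.
 *
 * A new connection ID initially points at the listener's context, so a
 * valid uid alone does not prove that user space has claimed the ID.
 * Comparing the ID carried by the event with the ID owned by the context
 * catches that case: a mismatch means the context belongs to the listen.
 */
static bool should_discard_event(const struct ucma_ctx *ctx,
                                 const struct cm_id *event_id,
                                 bool is_connect_request)
{
    /* Assumption: connection requests are always queued for user space. */
    if (is_connect_request)
        return false;

    return ctx->uid == 0 || ctx->cm_id != event_id;
}
```

With a listen context that has a valid uid, a reject event arriving on a not-yet-claimed connection ID is dropped because the two IDs differ, while events on the listen ID itself still go through.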
  2. 09 Nov, 2013 2 commits
    • IB/cma: Check for GID on listening device first · be9130cc
      Doug Ledford committed
      As a simple optimization that should speed up the vast majority of
      connect attempts on IB devices, when we are searching for the GID of an
      incoming connection in the cached GID lists of devices, search the
      device that received the incoming connection request first.  If we
      don't find it there, then move on to other devices.
      
      This reduces the time to perform 10,000 connections considerably.
      Prior to this patch, a bad run of cmtime would look like this:
      
      connect      :    12399.26   12351.10    8609.00    1239.93
      
      With this patch, it looks more like this:
      
      connect      :     5864.86    5799.80    8876.00     586.49
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
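The search order described in this commit can be illustrated with a small, self-contained model. It is only a sketch under simplifying assumptions: GIDs are reduced to `unsigned long` values (real GIDs are 128 bits wide), and `struct dev`, `dev_has_gid()` and `find_dev()` are hypothetical names, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

#define GIDS_PER_DEV 4

/* Hypothetical device model holding a small cached GID table. */
struct dev {
    unsigned long gid_cache[GIDS_PER_DEV];
};

/* Linear scan of one device's cached GID table. */
static bool dev_has_gid(const struct dev *d, unsigned long gid)
{
    for (size_t i = 0; i < GIDS_PER_DEV; i++) {
        if (d->gid_cache[i] == gid)
            return true;
    }
    return false;
}

/*
 * Return the index of the device that holds gid, or -1 if none does.
 * The device that received the request (listen) is checked first; the
 * remaining devices are scanned only when that lookup misses.
 */
static int find_dev(const struct dev *devs, size_t ndevs,
                    size_t listen, unsigned long gid)
{
    if (listen < ndevs && dev_has_gid(&devs[listen], gid))
        return (int)listen;

    for (size_t i = 0; i < ndevs; i++) {
        if (i == listen)
            continue;  /* already checked above */
        if (dev_has_gid(&devs[i], gid))
            return (int)i;
    }
    return -1;
}
```

In the common case the GID lives on the receiving device, so the lookup ends after scanning a single table instead of walking every earlier device first.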
    • IB/cma: Use cached gids · 29f27e84
      Doug Ledford committed
      The cma_acquire_dev function was changed by commit 3c86aa70
      ("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
      because multiport devices might have either IB or IBoE formatted gids.
      The old function assumed that all ports on the same device used the
      same GID format.
      
      However, when it was changed to use find_gid_port(), we inadvertently
      lost usage of the GID cache.  This turned out to be a very costly
      change.  In our testing, each iteration through each index of the GID
      table takes roughly 35us.  When you have multiple devices in a system,
      and the GID you are looking for is on one of the later devices, the
      code loops through all of the GID indexes on all of the early devices
      before it finally succeeds on the target device.  This pathological
      search behavior combined with 35us per GID table index retrieval
      produces results such as the following from the cmtime application
      that's part of the latest librdmacm git repo:
      
      ib1:
      step              total ms     max ms     min us  us / conn
      create id    :       29.42       0.04       1.00       2.94
      bind addr    :   186705.66      19.00   18556.00   18670.57
      resolve addr :       41.93       9.68     619.00       4.19
      resolve route:      486.93       0.48     101.00      48.69
      create qp    :     4021.95       6.18     330.00     402.20
      connect      :    68350.39   68588.17   24632.00    6835.04
      disconnect   :     1460.43     252.65 -1862269.00     146.04
      destroy      :       41.16       0.04       2.00       4.12
      
      ib0:
      step              total ms     max ms     min us  us / conn
      create id    :       28.61       0.68       1.00       2.86
      bind addr    :     2178.86       2.95     201.00     217.89
      resolve addr :       51.26      16.85     845.00       5.13
      resolve route:      620.08       0.43      92.00      62.01
      create qp    :     3344.40       6.36     273.00     334.44
      connect      :     6435.99    6368.53    7844.00     643.60
      disconnect   :     5095.38     321.90     757.00     509.54
      destroy      :       37.13       0.02       2.00       3.71
      
      Clearly, both the bind address and connect operations suffer
      a huge penalty for being anything other than the default
      GID on the first port in the system.
      
      After applying this patch, the numbers now look like this:
      
      ib1:
      step              total ms     max ms     min us  us / conn
      create id    :       30.15       0.03       1.00       3.01
      bind addr    :       80.27       0.04       7.00       8.03
      resolve addr :       43.02      13.53     589.00       4.30
      resolve route:      482.90       0.45     100.00      48.29
      create qp    :     3986.55       5.80     330.00     398.66
      connect      :     7141.53    7051.29    5005.00     714.15
      disconnect   :     5038.85     193.63     918.00     503.88
      destroy      :       37.02       0.04       2.00       3.70
      
      ib0:
      step              total ms     max ms     min us  us / conn
      create id    :       34.27       0.05       1.00       3.43
      bind addr    :       26.45       0.04       1.00       2.64
      resolve addr :       38.25      10.54     760.00       3.82
      resolve route:      604.79       0.43      97.00      60.48
      create qp    :     3314.95       6.34     273.00     331.49
      connect      :    12399.26   12351.10    8609.00    1239.93
      disconnect   :     5096.76     270.72    1015.00     509.68
      destroy      :       37.10       0.03       2.00       3.71
      
      It's worth noting that we still suffer a bit of a penalty on
      connect to the wrong device, but the penalty is much less than
      it used to be.  Follow on patches deal with this penalty.
      
      Many thanks to Neil Horman for helping to track down the slow
      function, which led us to discover that the original patch
      mentioned above had dropped cache usage, and to measure just
      how much that impacted the system.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  3. 28 Oct, 2013 7 commits
    • Linux 3.12-rc7 · 959f5854
      Linus Torvalds committed
    • Merge branch 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · a2ff8206
      Linus Torvalds committed
      Pull parisc fix from Helge Deller:
       "This is a 2-line patch to save the CPU register which holds our task
        thread info pointer before calling a firmware function and then to
        restore it again afterwards.
      
        This is necessary because on some 64bit machines the high-order 32bits
        are being clobbered by the firmware call, and thus we failed to bring
        up secondary CPUs (and instead crashed the kernel) in some situations
        eg if we had more than 4GB RAM.  This patch fixes a bug which has been
        in the parisc linux kernel from the start and which prevented some
        people from using a 64bit kernel"
      
      * 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aff22d3f
      Linus Torvalds committed
      Pull timer fix from Ingo Molnar:
       "This tree contains a clockevents regression fix for certain ARM
        subarchitectures"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clockevents: Sanitize ticks to nsec conversion
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e2756f5e
      Linus Torvalds committed
      Pull perf fixes from Ingo Molnar:
       "The tree contains three fixes:
      
         - Two tooling fixes
      
         - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
           this merge window.  (Patches were proposed to fix it but it was all
           a bit late and we felt it's safer to just delay the ABI one more
           kernel release and do it right)"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf: Disable PERF_RECORD_MMAP2 support
        perf scripting perl: Fix build error on Fedora 12
        perf probe: Fix to initialize fname always before use it
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c99ca43
      Linus Torvalds committed
      Pull locking fix from Ingo Molnar:
       "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
        kernels built with GCC 3.x (there are still such distros)"
      
      Side note: it's not just a fix for old gcc versions, it's also removing
      an incredibly broken/subtle check that LLVM had issues with, and that
      made no sense.
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        mutex: Avoid gcc version dependent __builtin_constant_p() usage
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · acda24c4
      Linus Torvalds committed
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are the outstanding target pending fixes for v3.12-rc7.
      
        This includes a number of EXTENDED_COPY related fixes as a result of
        Thomas and Doug's continuing testing and feedback.
      
        Also included is an important vhost/scsi fix that addresses a long
        standing issue where the 'write' parameter for get_user_pages_fast()
        was incorrectly set for virtio-scsi WRITEs -> DMA_TO_DEVICE, and not
        for virtio-scsi READs -> DMA_FROM_DEVICE.
      
        This resulted in random userspace segfaults and other unpleasantness
        on KVM host, and unfortunately has been an issue since the initial
        merge of vhost/scsi in v3.6.  This patch is CC'ed to stable, along
        with two other less critical items"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter
        target/pscsi: fix return value check
        target: Fail XCOPY for non matching source + destination block_size
        target: Generate failure for XCOPY I/O with non-zero scsi_status
        target: Add missing XCOPY I/O operation sense_buffer
        iser-target: check device before dereferencing its variable
        target: Return an error for WRITE SAME with ANCHOR==1
        target: Fix assignment of LUN in tracepoints
        target: Reject EXTENDED_COPY when emulate_3pc is disabled
        target: Allow non zero ListID in EXTENDED_COPY parameter list
        target: Make target_do_xcopy failures return INVALID_PARAMETER_LIST
    • Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 63e65608
      Linus Torvalds committed
      Pull slave-dmaengine fixes from Vinod Koul:
       "Here is the late fixes pull request for dmaengine while you fly back
        from KS.
      
        We have a new dmaengine ML hosted by vger so a patch for that along
        with addition of Dave as driver maintainer for ioat.  Other fixes are
        memory leak fixes on edma driver, small fixes on rcar-hpbdma driver
        by Sergei"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: edma: fix another memory leak
        dma: edma: Fix memory leak
        MAINTAINERS: add to ioatdma maintainer list
        MAINTAINERS: add the new dmaengine mailing list
  4. 27 Oct, 2013 1 commit
    • parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM · 54e181e0
      Helge Deller committed
      Since the beginning of the parisc-linux port, 64bit SMP kernels were sometimes
      unable to bring up CPUs other than the monarch CPU and instead crashed the
      kernel.  The reason was unclear, esp. since it involved various machines (e.g.
      J5600, J6750 and SuperDome). Testing showed that those crashes didn't happen
      when less than 4GB were installed, or if a 32bit Linux kernel was booted.
      
      In the end, the fix for those SMP problems is trivial:
      During the early phase of the initialization of the CPUs, including the monarch
      CPU, the PDC_PSW firmware function to enable WIDE (=64bit) mode is called.
      It's documented that this firmware function may clobber various registers, and
      one of those possibly clobbered registers is %cr30 which holds the task
      thread info pointer.
      
      Now, if %cr30 had always been clobbered, then this bug would have been
      detected much earlier. But lots of testing finally showed that - at least for
      %cr30 - on some machines only the upper 32bits of the 64bit register suddenly
      turned zero after the firmware call.
      
      So, after finding the root cause, the explanation for the various crashes
      became clear:
      - On 32bit SMP Linux kernels all upper 32bit were zero, so we didn't face this
        problem.
      - Monarch CPUs in 64bit mode always booted successfully, because the initial task
        thread info pointer was below 4GB.
      - Secondary CPUs booted successfully on machines with less than 4GB RAM because
        the upper 32bit were zero anyway.
      - Secondary CPUs failed to boot if we had more than 4GB RAM and the task thread
        info pointer was located above the 4GB boundary.
      
      Finally, the patch to fix this problem is trivial: it saves the %cr30 register
      before the firmware call and restores it afterwards.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org> # 2.6.12+
      Signed-off-by: Helge Deller <deller@gmx.de>
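The failure mode and the save/restore fix can be modelled in portable C. The real patch works on the %cr30 control register in parisc assembly; in this sketch the global `cr30`, `firmware_call_model()` and `call_firmware_preserving_cr30()` are hypothetical stand-ins, and the firmware is assumed to zero exactly the upper 32 bits.

```c
#include <stdint.h>

/* Models the 64bit control register holding the task thread info pointer. */
static uint64_t cr30;

/*
 * Hypothetical model of the firmware call: on affected machines the
 * upper 32 bits of the register come back as zero.
 */
static void firmware_call_model(void)
{
    cr30 &= 0xffffffffULL;
}

/* The fix: save the register before the call and restore it afterwards. */
static void call_firmware_preserving_cr30(void)
{
    uint64_t saved = cr30;

    firmware_call_model();
    cr30 = saved;
}
```

A pointer below 4GB survives the unprotected call unchanged, which matches the observation that the monarch CPU and machines with less than 4GB RAM always booted; only pointers above the 4GB boundary are corrupted.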
  5. 26 Oct, 2013 6 commits
    • Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 20582e34
      Linus Torvalds committed
      Pull ACPI and power management fixes from
       "These fix two bugs in the intel_pstate driver, a hibernate bug leading
        to nasty resume failures sometimes and acpi-cpufreq initialization bug
        that causes problems to happen during module unload when intel_pstate
        is in use.
      
        Specifics:
      
         - Fix for rounding errors in intel_pstate causing CPU utilization to
           be underestimated from Brennan Shacklett.
      
         - intel_pstate fix to always use the correct max pstate value when
           computing the min pstate from Dirk Brandewie.
      
         - Hibernation fix for deadlocking resume in cases when the probing of
           the device containing the image is deferred from Russ Dill.
      
         - acpi-cpufreq fix to prevent the module from staying in memory when
           the driver cannot be registered and then attempting to unregister
           things that have never been registered on exit"
      
      * tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        acpi-cpufreq: Fail initialization if driver cannot be registered
        PM / hibernate: Move software_resume to late_initcall_sync
        intel_pstate: Correct calculation of min pstate value
        intel_pstate: Improve accuracy by not truncating until final result
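The intel_pstate rounding fix in this pull request is about truncating intermediate results too early. The effect is easy to reproduce with integer arithmetic. The two functions below are hypothetical illustrations, not the driver's code, and both assume `total` is non-zero.

```c
#include <stdint.h>

/* Truncates too early: the integer division runs before the scaling. */
static uint32_t busy_percent_truncated(uint32_t busy, uint32_t total)
{
    return (busy / total) * 100;
}

/* Scales first and divides last, so only the final result is truncated. */
static uint32_t busy_percent_late(uint32_t busy, uint32_t total)
{
    return (uint32_t)(((uint64_t)busy * 100) / total);
}
```

For 999 busy ticks out of 1000, the early-truncating version reports 0 percent while the late-truncating version reports 99, which is the kind of underestimate the pull message describes.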
    • Merge tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd · d255c59a
      Linus Torvalds committed
      Pull final mtd fixes from Brian Norris:
       "A few more last-minute regression fixes, prepared jointly by me and
        David Woodhouse:
      
         - Revert pxa3xx to its old name to avoid breaking existing
           'mtdparts=' boot strings.
      
         - Return GPMI NAND to its legacy ECC layout for backwards
           compatibility.  We will revisit this in 3.13.
      
        A note from David on the latter fix: 'This leaves a harmless cosmetic
        warning about an unused function.  At this point in the cycle I really
        don't care.'"
      
      * tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd:
        mtd: gpmi: fix ECC regression
        mtd: nand: pxa3xx: Fix registered MTD name
    • vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter · 60a01f55
      Nicholas Bellinger committed
      This patch addresses a long-standing bug where the get_user_pages_fast()
      write parameter used for setting the underlying page table entry permission
      bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
      passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().
      
      However, this parameter is intended to signal WRITEs to pinned userspace
      PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
      for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.
      
      This bug would manifest itself as random process segmentation faults on
      KVM host after repeated vhost starts + stops and/or with lots of vhost
      endpoints + LUNs.
      
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Asias He <asias@redhat.com>
      Cc: <stable@vger.kernel.org> # 3.6+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
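The rule this commit restores can be written as a one-line mapping from DMA direction to the `write` argument. The sketch below is a user-space model: the enum mirrors the kernel's `enum dma_data_direction` names and values but is declared locally, and `gup_write_flag()` is a hypothetical helper, not a function from the vhost/scsi driver.

```c
/* Local mirror of the kernel's DMA direction constants. */
enum dma_data_direction {
    DMA_BIDIRECTIONAL = 0,
    DMA_TO_DEVICE = 1,
    DMA_FROM_DEVICE = 2,
    DMA_NONE = 3,
};

/*
 * The pinned user pages are written only when data flows from the
 * device into guest memory (a virtio-scsi READ), so only that case
 * needs write access to the pages.
 */
static int gup_write_flag(enum dma_data_direction dir)
{
    return dir == DMA_FROM_DEVICE;
}
```

The bug amounted to the inverse mapping: write access was requested for DMA_TO_DEVICE (a virtio-scsi WRITE, where the pages are only read) and not for DMA_FROM_DEVICE.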
    • target/pscsi: fix return value check · 58932e96
      Wei Yongjun committed
      In case of error, the function scsi_host_lookup() returns a NULL
      pointer, not ERR_PTR(). The IS_ERR() test in the return value check
      should be replaced with a NULL test.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
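Why an IS_ERR() test misses a NULL return can be shown with a small user-space model. Everything here is a stand-in: `model_is_err()` imitates the kernel's convention that error pointers occupy the top 4095 addresses, and `lookup_model()`, `attach_buggy()` and `attach_fixed()` are hypothetical functions written for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Imitates IS_ERR(): true only for pointers in the top 4095 addresses. */
static bool model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct host {
    int no;
};

/* Hypothetical lookup that reports failure with NULL, never an error pointer. */
static struct host *lookup_model(int no)
{
    static struct host h0 = { 0 };

    return no == 0 ? &h0 : NULL;
}

/* Buggy check: NULL is not an error pointer, so the failure goes unnoticed. */
static int attach_buggy(int no)
{
    struct host *sh = lookup_model(no);

    if (model_is_err(sh))
        return -1;
    return 0;  /* real code would go on to dereference sh here */
}

/* Fixed check: test for NULL, matching what the lookup actually returns. */
static int attach_fixed(int no)
{
    struct host *sh = lookup_model(no);

    if (!sh)
        return -1;
    return 0;
}
```

For a host number that does not exist, the buggy variant reports success and would dereference NULL, while the fixed variant returns an error.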
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f55ac56d
      Linus Torvalds committed
      Pull vfs fixes (try two) from Al Viro:
       "nfsd performance regression fix + seq_file lseek(2) fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        seq_file: always update file->f_pos in seq_lseek()
        nfsd regression since delayed fput()
    • mtd: gpmi: fix ECC regression · 031e2777
      David Woodhouse committed
      The "legacy" ECC layout used until 3.12-rc1 uses all the OOB area by
      computing the ECC strength and ECC step size ourselves.
      
      Commit 2febcdf8 ("mtd: gpmi: set the BCHs geometry with the ecc info")
      makes the driver use the ECC info (ECC strength and ECC step size)
      provided by the MTD code, and creates a different NAND ECC layout
      for the BCH, and uses the new ECC layout. This causes a regression:
      
         We can not mount the ubifs which was created by the old NAND ECC layout.
      
      This patch fixes this issue by reverting to the legacy ECC layout.
      
      We will probably introduce a new device-tree property to indicate that
      the new ECC layout can be used. For now though, for the imminent 3.12
      release, we just unconditionally revert to the 3.11 behaviour.
      
      This leaves a harmless cosmetic warning about an unused function. At
      this point in the cycle I really don't care.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Acked-by: Huang Shijie <b32955@freescale.com>
      Acked-by: Marek Vasut <marex@denx.de>
      Tested-by: Marek Vasut <marex@denx.de>
  6. 25 Oct, 2013 9 commits
  7. 24 Oct, 2013 12 commits
    • target: Fail XCOPY for non matching source + destination block_size · 48502ddb
      Nicholas Bellinger committed
      This patch adds an explicit check + failure for XCOPY I/O to source +
      destination devices with a non-matching block_size.
      
      This limitation is currently due to the fact that the scatterlist
      memory allocated for the XCOPY READ operation is passed zero-copy
      to the XCOPY WRITE operation.
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Reported-by: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: Generate failure for XCOPY I/O with non-zero scsi_status · 8a955d6d
      Nicholas Bellinger committed
      This patch adds the missing non-zero se_cmd->scsi_status check required
      for local XCOPY I/O within target_xcopy_issue_pt_cmd() to signal an
      exception case failure.
      
      This will trigger the generation of SAM_STAT_CHECK_CONDITION status
      from within target_xcopy_do_work() process context code.
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Reported-by: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: Add missing XCOPY I/O operation sense_buffer · 366bda19
      Nicholas Bellinger committed
      This patch adds the missing xcopy_pt_cmd->sense_buffer[] required for
      correctly handling CHECK_CONDITION exceptions within the locally
      generated XCOPY I/O path.
      
      Also update target_xcopy_read_source() + target_xcopy_setup_pt_cmd()
      to pass this buffer into transport_init_se_cmd() to correctly setup
      se_cmd->sense_buffer.
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Reported-by: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • Merge tag 'md/3.12-fixes' of git://neil.brown.name/md · e6036c0b
      Linus Torvalds committed
      Pull md bugfixes from Neil Brown:
       "Assorted md bug-fixes for 3.12.
      
        All tagged for -stable releases too"
      
      * tag 'md/3.12-fixes' of git://neil.brown.name/md:
        raid5: avoid finding "discard" stripe
        raid5: set bio bi_vcnt 0 for discard request
        md: avoid deadlock when md_set_badblocks.
        md: Fix skipping recovery for read-only arrays.
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · be6e8c76
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "This is a set of two fixes which cause oopses (Buslogic, qla2xxx) and
       one fix which may cause a hang because of request miscounting (sd)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] sd: call blk_pm_runtime_init before add_disk
        [SCSI] qla2xxx: Fix request queue null dereference.
        [SCSI] BusLogic: Fix an oops when intializing multimaster adapter
    • iser-target: check device before dereferencing its variable · 0a66614b
      Vu Pham committed
      This patch changes isert_connect_release() to correctly check for
      the existence of struct isert_device *device before checking for
      isert_device->use_frwr.
      Signed-off-by: Vu Pham <vu@mellanox.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • raid5: avoid finding "discard" stripe · d47648fc
      Shaohua Li committed
      SCSI discard will damage discard stripe bio setting, eg, some fields are
      changed. If the stripe is reused very soon, we have wrong bios setting. We
      remove discard stripe from hash list, so next time the stripe will be fully
      initialized.
      
      Suitable for backport to 3.7+.
      
      Cc: <stable@vger.kernel.org> (3.7+)
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: set bio bi_vcnt 0 for discard request · 37c61ff3
      Shaohua Li committed
      SCSI layer will add new payload for discard request. If two bios are merged
      to one, the second bio has bi_vcnt 1 which is set in raid5. This will confuse
      SCSI and cause oops.
      
      Suitable for backport to 3.7+
      
      Cc: stable@vger.kernel.org (v3.7+)
      Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
    • md: avoid deadlock when md_set_badblocks. · 905b0297
      Bian Yu committed
      When a hard disk operation hits errors, md_set_badblocks is called after
      scsi_restart_operations, which has already disabled irqs. But md_set_badblocks
      calls write_sequnlock_irq, which enables irqs again, so a softirq can preempt
      the current thread and that may cause a deadlock. I think this situation should
      use write_seqlock_irqsave/write_sequnlock_irqrestore instead.
      
      I met the situation and the call trace is below:
      [  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
      [  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
      [  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
      [  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
      [  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
      [  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
      [  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
      [  638.933884] Call Trace:
      [  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
      [  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
      [  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
      [  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
      [  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
      [  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
      [  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
      [  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
      [  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
      [  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
      [  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
      [  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
      [  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
      [  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
      [  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
      [  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
      [  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
      [  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
      [  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
      [  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
      [  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
      [  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
      [  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
      [  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
      [  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
      [  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
      [  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
      [  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
      [  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
      [  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
      [  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
      [  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
      [  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
      [  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
      [  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
      [  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
      [  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
      [  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
      [  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
      [  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      [  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
      [  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      
      This bug was introduced in commit 2e8ac303
      (the first time rdev_set_badblocks was called from interrupt context),
      so this patch is appropriate for 3.5 and subsequent kernels.
      
      Cc: <stable@vger.kernel.org> (3.5+)
      Signed-off-by: Bian Yu <bianyu@kedacom.com>
      Reviewed-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
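The difference between the plain irq variants and the irqsave/irqrestore variants can be modelled with a single flag that stands for the CPU's interrupt-enable state. This is only a toy model: `irqs_enabled` and the four `*_model()` functions are hypothetical and do no real locking.

```c
#include <stdbool.h>

/* Models the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Plain variants: disable on lock, unconditionally enable on unlock. */
static void lock_irq_model(void)
{
    irqs_enabled = false;
}

static void unlock_irq_model(void)
{
    irqs_enabled = true;
}

/* Saving variants: remember the previous state and put it back. */
static bool lock_irqsave_model(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

static void unlock_irqrestore_model(bool flags)
{
    irqs_enabled = flags;
}
```

When the caller has already disabled interrupts, the plain unlock leaves them enabled on return, which is the window in which the softirq in the trace above ran. The saving pair returns with interrupts still disabled.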
    • md: Fix skipping recovery for read-only arrays. · 61e4947c
      Lukasz Dorau committed
      Since:
              commit 7ceb17e8
              md: Allow devices to be re-added to a read-only array.
      
      spares are activated on a read-only array. For the raid1 and raid10
      personalities this causes not-in-sync devices to be marked in-sync
      without checking if recovery has been finished.
      
      If a read-only array is degraded and one of its devices is not in-sync
      (because the array has been only partially recovered) recovery will be skipped.
      
      This patch adds checking if recovery has been finished before marking a device
      in-sync for raid1 and raid10 personalities. In case of raid5 personality
      such condition is already present (at raid5.c:6029).
      
      Bug was introduced in 3.10 and causes data corruption.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
      Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • MAINTAINERS: add to ioatdma maintainer list · 18ebd564
      Dave Jiang committed
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      [djbw: add dmaengine list]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
    • MAINTAINERS: add the new dmaengine mailing list · 17b59560
      Vinod Koul committed
      We have a new mailing list hosted by vger for dmaengine
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
  8. 23 Oct, 2013 2 commits
    • [SCSI] sd: call blk_pm_runtime_init before add_disk · 10c580e4
      Aaron Lu committed
      Sujit has found a race condition that would make q->nr_pending
      unbalanced, it occurs as Sujit explained:
      
      "
      sd_probe_async() ->
      	add_disk() ->
      		disk_add_event() ->
      			schedule(disk_events_workfn)
      	sd_revalidate_disk()
      	blk_pm_runtime_init()
      return;
      
      Let's say the disk_events_workfn() calls sd_check_events() which tries
      to send test_unit_ready() and because of sd_revalidate_disk() trying to
      send another commands the test_unit_ready() might be re-queued as the
      tagged command queuing is disabled.
      
      So the race condition is -
      
      Thread 1 			  |		Thread 2
      sd_revalidate_disk()		  |	sd_check_events()
      ...nr_pending = 0 as q->dev = NULL|	scsi_queue_insert()
      blk_pm_runtime_init()		  | 	blk_pm_requeue_request() ->
      				  |	nr_pending = -1 since
      				  |	q->dev != NULL
      "
      
      The problem is, the test_unit_ready request doesn't get counted the
      first time it is queued, so the later decrement of q->nr_pending in
      blk_pm_requeue_request makes it unbalanced.
      
      Fix this by calling blk_pm_runtime_init before add_disk so that all
      requests initiated there will all be counted.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
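The counter imbalance is easy to reproduce in a model where requests are counted only once runtime PM has been initialised for the queue. The names below (`struct queue_model`, `pm_init()`, `pm_add_request()`, `pm_requeue_request()`) are hypothetical; `pm_enabled` stands in for the `q->dev != NULL` condition quoted above.

```c
#include <stdbool.h>

/* Hypothetical request queue model. */
struct queue_model {
    bool pm_enabled;  /* stands in for q->dev != NULL */
    int nr_pending;
};

static void pm_init(struct queue_model *q)
{
    q->pm_enabled = true;
}

/* A request is counted only if runtime PM is already set up. */
static void pm_add_request(struct queue_model *q)
{
    if (q->pm_enabled)
        q->nr_pending++;
}

/* A requeue undoes the count, again only if runtime PM is set up. */
static void pm_requeue_request(struct queue_model *q)
{
    if (q->pm_enabled)
        q->nr_pending--;
}
```

Queueing a request before `pm_init()` and requeueing it afterwards leaves `nr_pending` at -1. Calling `pm_init()` first, as the fix does by moving blk_pm_runtime_init before add_disk, keeps the counter balanced at 0.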
    • [SCSI] qla2xxx: Fix request queue null dereference. · 36008cf1
      Chad Dupuis committed
      If an invalid IOCB is returned on the response queue then the index into the
      request queue map could be invalid and could return to us a bogus value. This
      could cause us to try to dereference an invalid pointer and cause an exception.
      
      If we encounter this condition, simply return as no context can be established
      for this response.
      Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
      Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
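The defensive pattern described in this commit, validating an index taken from untrusted response data before using it to look up a queue, might look like the sketch below. The table size, `struct req_que`, `req_q_map` and `lookup_req()` are hypothetical names chosen for illustration and do not reproduce the driver's code.

```c
#include <stddef.h>

#define MAX_QUEUES 8

/* Hypothetical request queue type and lookup table. */
struct req_que {
    int id;
};

static struct req_que *req_q_map[MAX_QUEUES];

/*
 * Translate a queue index taken from a response entry into a request
 * queue. Out-of-range indexes and empty slots both yield NULL, so the
 * caller can simply return without touching an invalid pointer.
 */
static struct req_que *lookup_req(unsigned int que)
{
    if (que >= MAX_QUEUES)
        return NULL;
    return req_q_map[que];
}
```

A caller that gets NULL back has no context to complete and can return immediately, which corresponds to the "simply return" behaviour the commit message describes.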