1. 01 Dec, 2016 5 commits
    •
      scsi: cxlflash: Allocate memory instead of using command pool for AFU sync · 350bb478
      Matthew R. Ochs committed
      As staging for the removal of the AFU command pool, remove the reliance
      upon the pool for the internal AFU sync command. Instead of obtaining an
      AFU command from the pool, dynamically allocate memory with the appropriate
      alignment requirements. Since the AFU sync service is only executed from
      the process environment, blocking is acceptable.
      Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
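The allocation described above can be illustrated with a minimal user-space sketch. It assumes an over-allocate-and-round-up approach; `struct sync_cmd`, `align_up()`, and `alloc_sync_cmd()` are hypothetical names for illustration, and `calloc()` stands in for a kernel allocator that is allowed to block. This is not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical command layout; the real AFU command structure differs. */
struct sync_cmd {
    uint8_t payload[64];
};

/* Round p up to the next multiple of align (align must be a power of two). */
static void *align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;

    return (void *)((addr + align - 1) & ~((uintptr_t)align - 1));
}

/*
 * Allocate a zeroed command aligned to `align` bytes.
 * The unaligned pointer is returned through *raw so the caller can free it.
 */
static struct sync_cmd *alloc_sync_cmd(size_t align, void **raw)
{
    void *buf;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate so an aligned address always fits inside the buffer. */
    buf = calloc(1, sizeof(struct sync_cmd) + align - 1);
    if (buf == NULL)
        return NULL;

    *raw = buf;
    return align_up(buf, align);
}
```

The caller keeps the raw pointer only for `free()`; all command accesses go through the aligned pointer.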
    •
      scsi: cxlflash: Remove unused buffer from AFU command · e7ab2d40
      Matthew R. Ochs committed
      The cxlflash driver originally required a per-command 4K buffer that
      hosted data passed to the AFU. When the routines that initiate AFU
      and internal SCSI commands were refactored to use scsi_execute(), this
      buffer became unnecessary. As it is no longer used, the buffer is
      removed.
      Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    •
      scsi: cxlflash: Avoid command room violation · 11f7b184
      Uma Krishnan committed
      During test, a command room violation interrupt is occasionally seen
      for the master context when the CXL flash devices are stressed.
      
      A review of the code revealed gaps in the way the command room
      value is cached in cxlflash. When the cached command room is zero,
      the thread attempting to send becomes burdened with updating the cached
      value with the actual value from the AFU. Today, this is handled with an
      atomic set operation of the raw value read. Following the atomic update,
      the thread proceeds to send.
      
      This behavior is incorrect on two counts:
      
         - The update fails to take into account the current thread and its
           consumption of one of the hardware commands.
      
         - The update does not take into account other threads also atomically
           updating. Per design, a worker thread updates the cached value when a
           send thread times out. By not protecting the update with a lock, the
           cached value can be incorrectly clobbered.
      
      To correct these issues, the update of the cached command room has been
      simplified and also protected using a spin lock which is held until the
      MMIO is complete. This ensures the command room is properly consumed by
      the same thread. The update of the cached value also takes into account
      the current thread consuming a hardware command.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
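The locking scheme described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the driver's code: `struct fake_afu`, `afu_init()`, and `send_cmd()` are hypothetical names, a C11 `atomic_flag` stands in for the kernel spin lock, and plain fields stand in for the MMIO registers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the AFU and the driver's cached state. */
struct fake_afu {
    uint64_t hw_room;      /* what an MMIO read of command room would return */
    uint64_t sent;         /* commands submitted to the hardware */
    uint64_t cached_room;  /* driver-side cached copy, protected by lock */
    atomic_flag lock;      /* minimal spin lock */
};

static void afu_init(struct fake_afu *afu, uint64_t hw_room)
{
    afu->hw_room = hw_room;
    afu->sent = 0;
    afu->cached_room = 0;
    atomic_flag_clear(&afu->lock);
}

static void afu_lock(struct fake_afu *afu)
{
    while (atomic_flag_test_and_set_explicit(&afu->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void afu_unlock(struct fake_afu *afu)
{
    atomic_flag_clear_explicit(&afu->lock, memory_order_release);
}

/* Returns 0 on success, -1 if the hardware has no room. */
static int send_cmd(struct fake_afu *afu)
{
    int rc = 0;

    afu_lock(afu);
    if (afu->cached_room > 0) {
        afu->cached_room--;
    } else if (afu->hw_room > 0) {
        /* Refresh from hardware, minus the command this thread consumes. */
        afu->cached_room = afu->hw_room - 1;
    } else {
        rc = -1;
        goto out;
    }

    /* Simulated submit MMIO, performed while the lock is still held. */
    afu->hw_room--;
    afu->sent++;
out:
    afu_unlock(afu);
    return rc;
}
```

Holding the lock across both the cache update and the submit means no other thread can observe a room value that the current thread is about to consume.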
    •
      scsi: cxlflash: Improve context_reset() logic · 3d2f617d
      Uma Krishnan committed
      Currently, the context reset routine waits for command room to
      be available before sending the reset request. Per review of the
      SISLite specification and clarifications from the CXL Flash AFU
      designers, this wait is unnecessary. The reset request can be
      sent anytime regardless of command room, so long as only a single
      reset request is active at any one point in time.
      
      This commit simplifies the reset routine by removing the wait for
      command room. Additionally, it adds a debug trace to help pinpoint
      hardware errors when a context reset does not complete.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    •
      scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONE · 68ab2d76
      Uma Krishnan committed
      The following Oops is encountered when blk_mq is enabled with the
      cxlflash driver:
      
      [ 2960.817172] Oops: Kernel access of bad area, sig: 11 [#5]
      [ 2960.817309] NIP  __blk_mq_run_hw_queue+0x278/0x4c0
      [ 2960.817313] LR __blk_mq_run_hw_queue+0x2bc/0x4c0
      [ 2960.817314] Call Trace:
      [ 2960.817320] __blk_mq_run_hw_queue+0x2bc/0x4c0 (unreliable)
      [ 2960.817324] blk_mq_run_hw_queue+0xd8/0x100
      [ 2960.817329] blk_mq_insert_requests+0x14c/0x1f0
      [ 2960.817333] blk_mq_flush_plug_list+0x150/0x190
      [ 2960.817338] blk_flush_plug_list+0x11c/0x2b0
      [ 2960.817344] blk_finish_plug+0x58/0x80
      [ 2960.817348] __do_page_cache_readahead+0x1c0/0x2e0
      [ 2960.817352] force_page_cache_readahead+0x68/0xd0
      [ 2960.817356] generic_file_read_iter+0x43c/0x6a0
      [ 2960.817359] blkdev_read_iter+0x68/0xa0
      [ 2960.817361] __vfs_read+0x11c/0x180
      [ 2960.817364] vfs_read+0xa4/0x1c0
      [ 2960.817366] SyS_read+0x6c/0x110
      [ 2960.817369] system_call+0x38/0xb4
      
      The SCSI blk_mq stack assumes that sg_tablesize is always a non-zero
      value with scsi_mq_setup_tags() allocating tags using sg_tablesize.
      The cxlflash driver currently uses SG_NONE (0) for the sg_tablesize
      as the devices it supports are not capable of scatter gather. This
      mismatch of values results in the Oops above.
      
      To resolve this issue, sg_tablesize for cxlflash can simply be set
      to 1, a value which satisfies the constraints in cxlflash and the
      lack of support of SG_NONE in SCSI blk_mq.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 15 Sep, 2016 3 commits
  3. 09 Sep, 2016 2 commits
    •
      scsi: cxlflash: Remove the device cleanly in the system shutdown path · babf985d
      Uma Krishnan committed
      Commit 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash
      cards") was recently introduced to notify the AFU when a system is going
      down. Due to the position of the cxlflash driver in the device stack,
      cxlflash devices are _always_ removed during a reboot/shutdown. This can
      lead to a crash if the cxlflash shutdown hook is invoked _after_ the
      shutdown hook for the owning virtual PHB. Furthermore, the current
      implementation of the shutdown/remove hooks for cxlflash does not
      tolerate being invoked when the device is not enabled. This can also lead
      to a crash in situations where the remove hook is invoked after the device
      has been removed via the vPHB's shutdown hook. An example of this
      scenario would be an EEH reset failure while a reboot/shutdown is in
      progress.
      
      To solve both problems, the shutdown hook for cxlflash is updated to
      simply remove the device. This path already includes the AFU
      notification and thus this solution will continue to perform the
      original intent. At the same time, the remove hook is updated to protect
      against being called when the device is not enabled.
      
      Fixes: 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash
      cards")
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    •
      scsi: cxlflash: Scan host only after the port is ready for I/O · bbbfae96
      Uma Krishnan committed
      When a port link is established, the AFU sends a 'link up' interrupt.
      After the link is up, corresponding initialization steps are performed
      on the card. Following that, when the card is ready for I/O, the AFU
      sends a 'login succeeded' interrupt. Today, cxlflash invokes
      scsi_scan_host() upon receipt of both interrupts.
      
      SCSI commands sent to the port prior to the 'login succeeded' interrupt
      will fail with a 'port not available' error. This is not desirable.
      Moreover, when async_scan is active for the host, subsequent scan calls
      are terminated with an error. Due to this, the scsi_scan_host() call
      performed after the 'login succeeded' interrupt could potentially return
      an error and the devices may not be scanned properly.
      
      To avoid this problem, scsi_scan_host() should be called only after the
      'login succeeded' interrupt.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
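The intended event handling can be sketched as a small state model. The names `port_event`, `host_state`, and `handle_port_event()` are hypothetical, and a counter stands in for the call to scsi_scan_host(); the real interrupt handler is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names for the two AFU interrupts described above. */
enum port_event { EVT_LINK_UP, EVT_LOGIN_SUCCEEDED };

struct host_state {
    bool link_up;         /* link established, port not yet ready for I/O */
    int scans_requested;  /* stand-in for calls to scsi_scan_host() */
};

static void handle_port_event(struct host_state *h, enum port_event evt)
{
    switch (evt) {
    case EVT_LINK_UP:
        /* Record the link state only; commands would still fail here. */
        h->link_up = true;
        break;
    case EVT_LOGIN_SUCCEEDED:
        /* The port is ready for I/O, so this is the only place to scan. */
        h->scans_requested++;
        break;
    }
}
```

With this shape, a 'link up' followed by a 'login succeeded' produces exactly one scan request.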
  4. 27 Jul, 2016 1 commit
  5. 13 Jul, 2016 3 commits
  6. 29 Mar, 2016 2 commits
    •
      cxlflash: Move to exponential back-off when cmd_room is not available · ea765431
      Manoj N. Kumar committed
      While profiling the cxlflash_queuecommand() path under a heavy load, it
      was found that the number of retries to find cmd_room was fairly high.
      
      There are two problems with the current back-off:
      a) It starts with a udelay of 0
      b) It backs-off linearly
      
      Several approaches were tried (a higher multiple 10*n, 100*n, as well as
      n^2, 2^n), and the exponential back-off (2^n) approach had the least
      overall cost, where cost is defined as the overall time spent waiting.
      
      The fix is to change the linear back-off to an exponential back-off.
      This solution also takes care of the problem with the initial
      delay (starts with 1 usec).
      Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
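The 2^n schedule can be sketched as below. The helper names and the cap `MAX_BACKOFF_SHIFT` are assumptions added for illustration (the commit message does not mention a cap); in the driver the computed value would be passed to udelay().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BACKOFF_SHIFT 10  /* hypothetical cap, not taken from the commit */

/* Delay in microseconds before retry number `retry` (0-based): 1, 2, 4, ... */
static uint32_t backoff_usec(unsigned int retry)
{
    if (retry > MAX_BACKOFF_SHIFT)
        retry = MAX_BACKOFF_SHIFT;
    return (uint32_t)1 << retry;
}

/* Total microseconds spent waiting across the first `retries` attempts. */
static uint64_t total_wait_usec(unsigned int retries)
{
    uint64_t total = 0;

    for (unsigned int n = 0; n < retries; n++)
        total += backoff_usec(n);
    return total;
}
```

Because the first delay is 2^0 = 1 usec, the schedule never starts with a zero-length wait, which addresses problem (a) as well as the linear growth in (b).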
    •
      cxlflash: Fix regression issue with re-ordering patch · 9526f360
      Manoj N. Kumar committed
      While running 'sg_reset -H' back to back the following exception was seen:
      
      [  735.115695] Faulting instruction address: 0xd0000000098c0864
      cpu 0x0: Vector: 300 (Data Access) at [c000000ffffafa80]
          pc: d0000000098c0864: cxlflash_async_err_irq+0x84/0x5c0 [cxlflash]
          lr: c00000000013aed0: handle_irq_event_percpu+0xa0/0x310
          sp: c000000ffffafd00
         msr: 9000000000009033
         dar: 2010000
       dsisr: 40000000
        current = 0xc000000001510880
        paca    = 0xc00000000fb80000   softe: 0        irq_happened: 0x01
          pid   = 0, comm = swapper/0
      
      Linux version 4.5.0-491-26f710d+
      
      enter ? for help
      [c000000ffffafe10] c00000000013aed0 handle_irq_event_percpu+0xa0/0x310
      [c000000ffffafed0] c00000000013b1a8 handle_irq_event+0x68/0xc0
      [c000000ffffaff00] c0000000001404ec handle_fasteoi_irq+0xec/0x2a0
      [c000000ffffaff30] c00000000013a084 generic_handle_irq+0x54/0x80
      [c000000ffffaff60] c000000000011130 __do_irq+0x80/0x1d0
      [c000000ffffaff90] c000000000024d40 call_do_irq+0x14/0x24
      [c000000001573a20] c000000000011318 do_IRQ+0x98/0x140
      [c000000001573a70] c000000000002594 hardware_interrupt_common+0x114/0x180
      
      This exception is being hit because the async_err interrupt path performs
      an MMIO to read the interrupt status register. The MMIO region in this
      case is not available.
      
      Commit 6ded8b3c ("cxlflash: Unmap problem state area before detaching
      master context") re-ordered the sequence in which term_mc() and stop_afu()
      are called. This introduces a window, which did not exist previously, for
      interrupts to arrive while the problem state area is unmapped.
      
      The fix is to move the disabling of all AFU interrupts into a distinct
      function, term_intr(), so that it is the first thing done in the
      tear down process.
      
      To keep the initialization process symmetric, separate the AFU interrupt
      setup also to a distinct function: init_intr().
      
      Fixes: 6ded8b3c ("cxlflash: Unmap problem state area before detaching master context")
      Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  7. 09 Mar, 2016 5 commits
  8. 07 Jan, 2016 5 commits
  9. 30 Oct, 2015 14 commits