1. 13 Jul, 2016 2 commits
    • cxlflash: Add device dependent flags · 96e1b660
      Uma Krishnan committed
      Device dependent flags are needed to support functions that are specific
      to a particular device.
      
      For example, some CXL Flash cards need to be notified of device
      shutdown, while other CXL devices do not yet benefit from this
      feature. The driver needs a way to identify such distinct features
      so that it can bypass or invoke the specific functionality.
      
      This patch adds a 'flags' member to the device dependent values.
      These flags will be used and expanded in the future to support
      various device-specific functions.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
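The idea of per-device feature flags can be illustrated with a minimal sketch. The structure layout, the `max_sectors` member, the `DEV_FLAG_NOTIFY_SHUTDOWN` bit, and the helper name are all assumptions made for illustration; the commit message only states that a member 'flags' was added to the device dependent values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit: this device type wants a shutdown notification. */
#define DEV_FLAG_NOTIFY_SHUTDOWN (UINT64_C(1) << 0)

/* Per-device-type values; 'flags' plays the role of the new member. */
struct dev_dependent_vals {
    uint64_t max_sectors; /* illustrative pre-existing value (assumed) */
    uint64_t flags;       /* device-specific feature bits */
};

/* Two hypothetical device types: one without the feature, one with it. */
static const struct dev_dependent_vals dev_plain  = { 4096, 0 };
static const struct dev_dependent_vals dev_notify = { 4096, DEV_FLAG_NOTIFY_SHUTDOWN };

/* Returns true when the shutdown path should notify the device. */
bool needs_shutdown_notify(const struct dev_dependent_vals *ddv)
{
    assert(ddv != NULL);
    return (ddv->flags & DEV_FLAG_NOTIFY_SHUTDOWN) != 0;
}
```

A shutdown path written this way would call `needs_shutdown_notify()` and skip the notification for device types whose flag bit is clear, so further device-specific features only need another bit.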
    • cxlflash: Fix to drain operations from previous reset · f411396d
      Manoj N. Kumar committed
      While running 'sg_reset -H' in a loop with a user-space application
      active, the following exception was hit:
      
      cpu 0x2: Vector: 300 (Data Access)
          pc: : afu_attach+0x50/0x240 [cxlflash]
          lr: : cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash]
          pid   = 20365, comm = run_block_fvt
      
      Linux version 4.5.0-491-26f710d+
      
      cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash]
      cxlflash_ioctl+0x5a8/0x6f0 [cxlflash]
      scsi_ioctl+0x3b0/0x4c0
      sd_ioctl+0x110/0x190
      blkdev_ioctl+0x28c/0xc20
      block_ioctl+0xa4/0xd0
      do_vfs_ioctl+0xd8/0x8c0
      SyS_ioctl+0xd4/0xf0
      system_call+0x38/0xb4
      
      The cause is that the problem space area (PSA) is unmapped while the
      application is still processing the DK_CXLFLASH_RECOVER_AFU ioctl.
      
      The observed order of events is:
      
      proc1				proc2
      1) sg_reset
      				2) ioctl(DK_CXLFLASH_RECOVER_AFU)
      3) sg_reset again
         causing a PSA unmap
      				4) continues RECOVER_AFU processing
      
      The fix is to have the reset handler drain all outstanding
      user-space-initiated ioctls before proceeding. It is safe to drain
      after the state has been changed to STATE_RESET. Because
      drain_ioctls() is static, its definition also had to move above
      cxlflash_eh_host_reset_handler().
      Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
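The drain-before-reset ordering described above can be sketched in user-space C. This is a simplified model, not the driver's code: it counts in-flight ioctls with C11 atomics and makes new ioctls back out during a reset, both of which are assumptions. The `struct host` layout and the `ioctl_enter()`, `ioctl_exit()`, and `host_reset()` names are hypothetical; only `drain_ioctls()` and STATE_RESET come from the commit message.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_NORMAL, STATE_RESET };

struct host {
    atomic_int state;            /* STATE_NORMAL or STATE_RESET */
    atomic_int ioctls_in_flight; /* user-initiated ioctls still running */
};

/* Called at ioctl entry. Returns false if a reset is in progress. */
bool ioctl_enter(struct host *h)
{
    /* Announce ourselves first, then check the state. */
    atomic_fetch_add(&h->ioctls_in_flight, 1);
    if (atomic_load(&h->state) != STATE_NORMAL) {
        atomic_fetch_sub(&h->ioctls_in_flight, 1);
        return false;
    }
    return true;
}

/* Called when an ioctl that entered successfully has finished. */
void ioctl_exit(struct host *h)
{
    int prev = atomic_fetch_sub(&h->ioctls_in_flight, 1);
    assert(prev > 0);
    (void)prev;
}

/* Wait until every ioctl that entered before the reset has left. */
void drain_ioctls(struct host *h)
{
    while (atomic_load(&h->ioctls_in_flight) > 0)
        sched_yield();
}

/* Reset handler: change state first, then drain, then touch the mapping. */
void host_reset(struct host *h)
{
    atomic_store(&h->state, STATE_RESET); /* new ioctls now back out */
    drain_ioctls(h);
    /* ... safe point: no ioctl is using the problem space area here ... */
    atomic_store(&h->state, STATE_NORMAL);
}
```

Setting the state before draining is what makes the drain terminate: an ioctl either sees the reset state and backs out, or it has already been counted and the reset waits for it.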
  2. 06 May, 2016 1 commit
    • cxlflash: Fix to resolve dead-lock during EEH recovery · 635f6b08
      Manoj N. Kumar committed
      When a cxlflash adapter goes into EEH recovery and multiple processes
      (each having established its own context) are active, the EEH recovery
      can hang if the processes attempt to recover in parallel. The symptom
      logged after a couple of minutes is:
      
      INFO: task eehd:48 blocked for more than 120 seconds.
      Not tainted 4.5.0-491-26f710d+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      eehd            0    48      2
      Call Trace:
      __switch_to+0x2f0/0x410
      __schedule+0x300/0x980
      schedule+0x48/0xc0
      rwsem_down_write_failed+0x294/0x410
      down_write+0x88/0xb0
      cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
      cxl_vphb_error_detected+0x88/0x110 [cxl]
      cxl_pci_error_detected+0xb0/0x1d0 [cxl]
      eeh_report_error+0xbc/0x130
      eeh_pe_dev_traverse+0x94/0x160
      eeh_handle_normal_event+0x17c/0x450
      eeh_handle_event+0x184/0x370
      eeh_event_handler+0x1c8/0x1d0
      kthread+0x110/0x130
      ret_from_kernel_thread+0x5c/0xa4
      
      INFO: task blockio:33215 blocked for more than 120 seconds.
      Not tainted 4.5.0-491-26f710d+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      blockio         0 33215  33213
      Call Trace:
      0x1 (unreliable)
      __switch_to+0x2f0/0x410
      __schedule+0x300/0x980
      schedule+0x48/0xc0
      rwsem_down_read_failed+0x124/0x1d0
      down_read+0x68/0x80
      cxlflash_ioctl+0x70/0x6f0 [cxlflash]
      scsi_ioctl+0x3b0/0x4c0
      sg_ioctl+0x960/0x1010
      do_vfs_ioctl+0xd8/0x8c0
      SyS_ioctl+0xd4/0xf0
      system_call+0x38/0xb4
      INFO: task eehd:48 blocked for more than 120 seconds.
      
      The hang is caused by a three-way deadlock:
      
      Process A holds the recovery mutex and waits for eehd to complete.
      Process B holds the semaphore and waits for the recovery mutex.
      eehd waits for the semaphore.
      
      The fix is to have Process B release the semaphore before attempting
      to acquire the recovery mutex. This allows eehd to proceed to
      completion.
      Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
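The three-way cycle and the fix can be stepped through with a small single-threaded model. This is only a sketch: the `struct rwsem` and `struct mutex` types below are toy stand-ins with try-style operations (so the scenario can run without blocking), and `recover_begin()` is a hypothetical name for Process B's recovery entry. None of it is the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer semaphore and mutex; "try" calls fail instead of blocking. */
struct rwsem { int readers; bool writer; };
struct mutex { bool held; };

bool try_down_read(struct rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

void up_read(struct rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

bool try_down_write(struct rwsem *s)
{
    if (s->writer || s->readers > 0)
        return false; /* a writer needs the semaphore exclusively */
    s->writer = true;
    return true;
}

void up_write(struct rwsem *s)
{
    assert(s->writer);
    s->writer = false;
}

bool try_mutex_lock(struct mutex *m)
{
    if (m->held)
        return false;
    m->held = true;
    return true;
}

void mutex_unlock(struct mutex *m)
{
    assert(m->held);
    m->held = false;
}

/*
 * Process B's recovery entry after the fix: drop the read side of the
 * semaphore first, then go for the recovery mutex. Returns false when
 * the mutex is busy (where the real code would wait).
 */
bool recover_begin(struct rwsem *ioctl_sem, struct mutex *recovery)
{
    up_read(ioctl_sem);
    return try_mutex_lock(recovery);
}
```

In this model, once Process B has called `recover_begin()` the writer (standing in for eehd) can take the semaphore even though the recovery mutex is still held by Process A, which is the step that breaks the cycle.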
  3. 29 Mar, 2016 2 commits
    • cxlflash: Move to exponential back-off when cmd_room is not available · ea765431
      Manoj N. Kumar committed
      While profiling the cxlflash_queuecommand() path under a heavy load,
      it was found that the number of retries to find cmd_room was fairly
      high.
      
      There are two problems with the current back-off:
      a) It starts with a udelay of 0
      b) It backs off linearly
      
      Several approaches were tried (a higher multiple 10*n, 100*n, as well
      as n^2 and 2^n), and the exponential back-off (2^n) had the least
      overall cost. Cost is defined here as the overall time spent waiting.
      
      The fix is to change the linear back-off to an exponential back-off.
      This solution also takes care of the problem with the initial
      delay (starts with 1 usec).
      Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
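The two delay schedules compared in this commit can be written as small functions. This is a sketch: the function names and the cap on the shift (`BACKOFF_MAX_SHIFT`) are assumptions, since the commit message only says that the old back-off started at 0 and grew linearly, and that the new one is 2^n starting at 1 usec.

```c
#include <assert.h>

/* Hypothetical cap so the delay cannot grow without bound (not from the commit). */
#define BACKOFF_MAX_SHIFT 10u

/* Old scheme: delay in usec for retry n is n, so the first retry waits 0. */
unsigned int linear_delay_us(unsigned int attempt)
{
    return attempt;
}

/* New scheme: delay in usec for retry n is 2^n, so the first retry waits 1. */
unsigned int exp_delay_us(unsigned int attempt)
{
    if (attempt > BACKOFF_MAX_SHIFT)
        attempt = BACKOFF_MAX_SHIFT;
    return 1u << attempt;
}

/* Total time spent waiting over the first 'retries' attempts of a schedule. */
unsigned long total_wait_us(unsigned int (*delay)(unsigned int),
                            unsigned int retries)
{
    unsigned long total = 0;

    assert(delay != 0);
    for (unsigned int n = 0; n < retries; n++)
        total += delay(n);
    return total;
}
```

For a fixed number of retries the exponential schedule waits longer per retry. The commit's measurement is about overall waiting time under load, where longer individual delays mean fewer retries are needed in the first place.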
    • cxlflash: Fix regression issue with re-ordering patch · 9526f360
      Manoj N. Kumar committed
      While running 'sg_reset -H' back to back the following exception was seen:
      
      [  735.115695] Faulting instruction address: 0xd0000000098c0864
      cpu 0x0: Vector: 300 (Data Access) at [c000000ffffafa80]
          pc: d0000000098c0864: cxlflash_async_err_irq+0x84/0x5c0 [cxlflash]
          lr: c00000000013aed0: handle_irq_event_percpu+0xa0/0x310
          sp: c000000ffffafd00
         msr: 9000000000009033
         dar: 2010000
       dsisr: 40000000
        current = 0xc000000001510880
        paca    = 0xc00000000fb80000   softe: 0        irq_happened: 0x01
          pid   = 0, comm = swapper/0
      
      Linux version 4.5.0-491-26f710d+
      
      enter ? for help
      [c000000ffffafe10] c00000000013aed0 handle_irq_event_percpu+0xa0/0x310
      [c000000ffffafed0] c00000000013b1a8 handle_irq_event+0x68/0xc0
      [c000000ffffaff00] c0000000001404ec handle_fasteoi_irq+0xec/0x2a0
      [c000000ffffaff30] c00000000013a084 generic_handle_irq+0x54/0x80
      [c000000ffffaff60] c000000000011130 __do_irq+0x80/0x1d0
      [c000000ffffaff90] c000000000024d40 call_do_irq+0x14/0x24
      [c000000001573a20] c000000000011318 do_IRQ+0x98/0x140
      [c000000001573a70] c000000000002594 hardware_interrupt_common+0x114/0x180
      
      This exception is being hit because the async_err interrupt path performs
      an MMIO to read the interrupt status register. The MMIO region in this
      case is not available.
      
      Commit 6ded8b3c ("cxlflash: Unmap problem state area before detaching
      master context") re-ordered the sequence in which term_mc() and
      stop_afu() are called. This introduced a window, which did not exist
      previously, in which interrupts can arrive while the problem space
      area is unmapped.
      
      The fix is to move the disabling of all AFU interrupts into a distinct
      function, term_intr(), so that it is the first step of the teardown
      process.
      
      To keep the initialization process symmetric, the AFU interrupt setup
      is also moved into a distinct function, init_intr().
      
      Fixes: 6ded8b3c ("cxlflash: Unmap problem state area before detaching master context")
      Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
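The ordering argument (disable interrupts first on teardown, enable them last on setup) can be sketched with a toy model. Only the names init_intr() and term_intr() come from the commit message. The `struct afu` fields, `init_afu()`, `term_afu()`, and the simulated `async_err_irq()` are assumptions for illustration and do not reflect the driver's real signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy AFU state: whether the MMIO region is mapped and interrupts are on. */
struct afu {
    bool psa_mapped;   /* problem space area available for MMIO */
    bool intr_enabled; /* AFU interrupts may be delivered */
    int bad_mmio;      /* handler runs that hit an unmapped region */
};

/* Interrupt setup as its own step; requires the MMIO region to exist. */
void init_intr(struct afu *a)
{
    assert(a->psa_mapped);
    a->intr_enabled = true;
}

/* Interrupt teardown as its own step. */
void term_intr(struct afu *a)
{
    a->intr_enabled = false;
}

/* Simulated async error interrupt. Returns true if the handler ran. */
bool async_err_irq(struct afu *a)
{
    if (!a->intr_enabled)
        return false;  /* interrupt is never delivered */
    if (!a->psa_mapped)
        a->bad_mmio++; /* the status register read would fault here */
    return true;
}

/* Setup: map first, enable interrupts last. */
void init_afu(struct afu *a)
{
    a->psa_mapped = true;
    init_intr(a);
}

/* Teardown mirrors setup: disable interrupts first, unmap afterwards. */
void term_afu(struct afu *a)
{
    term_intr(a);
    a->psa_mapped = false;
}
```

With this ordering there is no point at which interrupts are enabled while the region is unmapped, so `bad_mmio` stays at zero in the model.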
  4. 09 3月, 2016 8 次提交
  5. 07 1月, 2016 6 次提交
  6. 11 12月, 2015 2 次提交
  7. 30 10月, 2015 19 次提交