1. 18 May, 2018 1 commit
    • scsi: cxlflash: Yield to active send threads · e0f76ad1
      Uma Krishnan committed
      The following Oops may be encountered if the device is reset, i.e. EEH
      recovery, while there is heavy I/O traffic:
      
      59:mon> t
      [c000200db64bb680] c008000009264c40 cxlflash_queuecommand+0x3b8/0x500
      					[cxlflash]
      [c000200db64bb770] c00000000090d3b0 scsi_dispatch_cmd+0x130/0x2f0
      [c000200db64bb7f0] c00000000090fdd8 scsi_request_fn+0x3c8/0x8d0
      [c000200db64bb900] c00000000067f528 __blk_run_queue+0x68/0xb0
      [c000200db64bb930] c00000000067ab80 __elv_add_request+0x140/0x3c0
      [c000200db64bb9b0] c00000000068daac blk_execute_rq_nowait+0xec/0x1a0
      [c000200db64bba00] c00000000068dbb0 blk_execute_rq+0x50/0xe0
      [c000200db64bba50] c0000000006b2040 sg_io+0x1f0/0x520
      [c000200db64bbaf0] c0000000006b2e94 scsi_cmd_ioctl+0x534/0x610
      [c000200db64bbc20] c000000000926208 sd_ioctl+0x118/0x280
      [c000200db64bbcc0] c00000000069f7ac blkdev_ioctl+0x7fc/0xe30
      [c000200db64bbd20] c000000000439204 block_ioctl+0x84/0xa0
      [c000200db64bbd40] c0000000003f8514 do_vfs_ioctl+0xd4/0xa00
      [c000200db64bbde0] c0000000003f8f04 SyS_ioctl+0xc4/0x130
      [c000200db64bbe30] c00000000000b184 system_call+0x58/0x6c
      
      When there is no room to send the I/O request, the cached room is refreshed
      by reading the memory mapped command room value from the AFU. The AFU
      register mapping is refreshed during a reset, creating a race condition that
      can lead to the Oops above.
      
      During a device reset, the AFU should not be unmapped until all the active
      send threads quiesce. An atomic counter, cmds_active, is currently used to
      track internal AFU commands and quiesce during reset. This same counter can
      also be used for the active send threads.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
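The quiesce idea in this commit message can be sketched in user-space C. This is a minimal illustration, not the driver's code: only the counter name `cmds_active` comes from the commit message, while `struct afu_sketch`, the `resetting` flag, `room_reg`, and both function names are hypothetical, and C11 atomics stand in for the kernel's atomic API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AFU state; not the driver's real structure. */
struct afu_sketch {
    atomic_int cmds_active;      /* senders currently inside the send path */
    atomic_bool resetting;       /* assumed flag: set once a reset has begun */
    volatile uint64_t *room_reg; /* mapped command room register, NULL when unmapped */
};

/* Send path: count ourselves as active before touching the mapped register. */
bool send_refresh_room(struct afu_sketch *afu, uint64_t *room)
{
    bool ok = false;

    atomic_fetch_add(&afu->cmds_active, 1);
    if (!atomic_load(&afu->resetting) && afu->room_reg != NULL) {
        *room = *afu->room_reg; /* refresh the cached room value */
        ok = true;
    }
    atomic_fetch_sub(&afu->cmds_active, 1);
    return ok;
}

/* Reset path: block new senders, wait for active ones to drain, then unmap. */
void reset_unmap(struct afu_sketch *afu)
{
    atomic_store(&afu->resetting, true);
    while (atomic_load(&afu->cmds_active) != 0)
        ; /* a kernel driver would sleep or delay here instead of spinning */
    afu->room_reg = NULL;
}
```

The ordering is the point of the sketch: the sender increments the counter before reading the register, and the reset path only drops the mapping after the counter has returned to zero.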
  2. 19 Apr, 2018 8 commits
    • scsi: cxlflash: Handle spurious interrupts · d2d354a6
      Uma Krishnan committed
      The following Oops can occur when there is heavy I/O traffic and the host is
      reset by a tool such as sg_reset.
      
      [c000200fff3fbc90] c00800001690117c process_cmd_doneq+0x104/0x500
                                             [cxlflash] (unreliable)
      [c000200fff3fbd80] c008000016901648 cxlflash_rrq_irq+0xd0/0x150 [cxlflash]
      [c000200fff3fbde0] c000000000193130 __handle_irq_event_percpu+0xa0/0x310
      [c000200fff3fbea0] c0000000001933d8 handle_irq_event_percpu+0x38/0x90
      [c000200fff3fbee0] c000000000193494 handle_irq_event+0x64/0xb0
      [c000200fff3fbf10] c000000000198ea0 handle_fasteoi_irq+0xc0/0x230
      [c000200fff3fbf40] c00000000019182c generic_handle_irq+0x4c/0x70
      [c000200fff3fbf60] c00000000001794c __do_irq+0x7c/0x1c0
      [c000200fff3fbf90] c00000000002a390 call_do_irq+0x14/0x24
      [c000200e5828fab0] c000000000017b2c do_IRQ+0x9c/0x130
      [c000200e5828fb00] c000000000009b04 h_virt_irq_common+0x114/0x120
      
      When a context is reset, the pending commands are flushed and the AFU is
      notified. Before the AFU handles this request there could be command
      completion interrupts queued to PHB which are yet to be delivered to the
      context. In this scenario, a context could receive an interrupt for a command
      that has been flushed, leading to a possible crash when the memory for the
      flushed command is accessed.
      
      To resolve this problem, a boolean will indicate whether the hardware queue
      is ready to process interrupts. This can be evaluated in the interrupt
      handler before processing an interrupt.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
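A minimal sketch of the readiness check described above, written as plain C rather than kernel code. The structure, the `ready` field, the result enum, and the handler name are all hypothetical; the driver's actual field names and return values are not stated in the commit message.

```c
#include <stdbool.h>

/* Hypothetical hardware queue state for illustration only. */
struct hwq_sketch {
    bool ready;    /* assumed flag: cleared while the context is being reset */
    int processed; /* completions handled so far */
};

enum irq_result_sketch { IRQ_SKETCH_IGNORED, IRQ_SKETCH_PROCESSED };

/* Interrupt handler: ignore completions that arrive while the queue is not ready. */
enum irq_result_sketch rrq_irq_sketch(struct hwq_sketch *hwq)
{
    if (!hwq->ready)
        return IRQ_SKETCH_IGNORED; /* spurious: the command may already be flushed */

    hwq->processed++;              /* stand-in for walking the completion queue */
    return IRQ_SKETCH_PROCESSED;
}
```

In a real handler the flag would need to be read under the same lock that protects the queue, so that a reset cannot clear it between the check and the processing.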
    • scsi: cxlflash: Remove commmands from pending list on timeout · 9a597cd4
      Uma Krishnan committed
      The following Oops can occur if an internal command sent to the AFU does not
      complete within the timeout:
      
      [c000000ff101b810] c008000016020d94 term_mc+0xfc/0x1b0 [cxlflash]
      [c000000ff101b8a0] c008000016020fb0 term_afu+0x168/0x280 [cxlflash]
      [c000000ff101b930] c0080000160232ec cxlflash_pci_error_detected+0x184/0x230
                                             [cxlflash]
      [c000000ff101b9e0] c00800000d95d468 cxl_vphb_error_detected+0x90/0x150[cxl]
      [c000000ff101ba20] c00800000d95f27c cxl_pci_error_detected+0xa4/0x240 [cxl]
      [c000000ff101bac0] c00000000003eaf8 eeh_report_error+0xd8/0x1b0
      [c000000ff101bb20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
      [c000000ff101bbb0] c00000000003f438 eeh_handle_normal_event+0x198/0x580
      [c000000ff101bc60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
      [c000000ff101bd10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
      [c000000ff101bdc0] c00000000013da48 kthread+0x1a8/0x1b0
      [c000000ff101be30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4
      
      When an internal command times out, the command buffer is freed while it is
      still in the pending commands list of the context. This corrupts the list and
      when the context is cleaned up, a crash is encountered.
      
      To resolve this issue, when an AFU command or TMF command times out, the
      command should be deleted from the hardware queue pending command list before
      freeing the buffer.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
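The ordering fix described above (unlink first, then free) can be illustrated with a small circular doubly linked list in user-space C. Everything here is hypothetical scaffolding: the node type and helpers imitate the style of the kernel's list API but are not it, and `cmd_timeout_sketch` is an invented name.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct node_sketch {
    struct node_sketch *prev, *next;
};

/* Hypothetical command with an embedded list link. */
struct cmd_sketch {
    struct node_sketch link;
    int id;
};

void list_init_sketch(struct node_sketch *head)
{
    head->prev = head->next = head;
}

void list_add_tail_sketch(struct node_sketch *n, struct node_sketch *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

void list_del_sketch(struct node_sketch *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n; /* leave the node self-linked */
}

int list_len_sketch(const struct node_sketch *head)
{
    int len = 0;
    for (const struct node_sketch *p = head->next; p != head; p = p->next)
        len++;
    return len;
}

/* Timeout path: unlink from the pending list BEFORE freeing the buffer. */
void cmd_timeout_sketch(struct cmd_sketch *cmd)
{
    list_del_sketch(&cmd->link);
    free(cmd);
}
```

If the `free` happened without the unlink, the neighbours on the pending list would keep pointers into freed memory, which matches the corruption the commit message describes.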
    • scsi: cxlflash: Synchronize reset and remove ops · a3feb6ef
      Uma Krishnan committed
      The following Oops can be encountered if a device removal or system shutdown
      is initiated while an EEH recovery is in process:
      
      [c000000ff2f479c0] c008000015256f18 cxlflash_pci_slot_reset+0xa0/0x100
                                            [cxlflash]
      [c000000ff2f47a30] c00800000dae22e0 cxl_pci_slot_reset+0x168/0x290 [cxl]
      [c000000ff2f47ae0] c00000000003ef1c eeh_report_reset+0xec/0x170
      [c000000ff2f47b20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
      [c000000ff2f47bb0] c00000000003f80c eeh_handle_normal_event+0x56c/0x580
      [c000000ff2f47c60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
      [c000000ff2f47d10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
      [c000000ff2f47dc0] c00000000013da48 kthread+0x1a8/0x1b0
      [c000000ff2f47e30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4
      
      The remove handler frees AFU memory while the EEH recovery is in progress,
      leading to a race condition. This can result in a crash if the recovery thread
      tries to access this memory.
      
      To resolve this issue, the cxlflash remove handler will evaluate the device
      state and yield to any active reset or probing threads.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
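The "evaluate the device state and yield" behaviour can be sketched as a predicate plus a wait loop. This is an assumption-laden illustration: the state names, `struct cfg_sketch`, and both functions are hypothetical, and the busy-wait stands in for whatever sleeping primitive a kernel driver would use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device states; the driver's own enum may differ. */
enum state_sketch { SK_PROBING, SK_NORMAL, SK_RESET, SK_FAILTERM };

struct cfg_sketch {
    _Atomic int state; /* current device state */
    bool afu_freed;    /* stand-in for "AFU memory has been released" */
};

/* Removal must not proceed while a reset or probe is still running. */
bool remove_must_yield(int state)
{
    return state == SK_RESET || state == SK_PROBING;
}

/* Remove handler: wait out any active reset/probe, then tear down.
 * Returns how many times it had to poll before proceeding. */
unsigned remove_sketch(struct cfg_sketch *cfg)
{
    unsigned polls = 0;

    while (remove_must_yield(atomic_load(&cfg->state)))
        polls++; /* a kernel driver would sleep here instead of spinning */

    cfg->afu_freed = true;
    return polls;
}
```

With this shape, the recovery thread finishes (and moves the state out of reset) before the remove path frees anything it might still touch.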
    • scsi: cxlflash: Enable OCXL operations · 07d0c52f
      Uma Krishnan committed
      This commit enables the OCXL operations for the OCXL devices.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: cxlflash: Setup LISNs for master contexts · d44af4b0
      Uma Krishnan committed
      Similar to user contexts, master contexts also require that the per-context
      LISN registers be programmed for certain AFUs. The mapped trigger page is
      obtained from the underlying transport and registered with the AFU for each
      master context.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: cxlflash: Hardware AFU for OCXL · 48e077db
      Uma Krishnan committed
      When an adapter is initialized, transport specific configuration and MMIO
      mapping details need to be saved. For CXL, this data is managed by the
      underlying kernel module. To maintain a separation between the cxlflash core
      and underlying transports, introduce a new structure to store data specific to
      the OCXL AFU.
      
      Initially only the pointers to underlying PCI and generic devices are added to
      this new structure - it will be expanded further in future commits. Services
      to create and destroy this hardware AFU are added and integrated in the probe
      and exit paths of the driver.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: cxlflash: Avoid clobbering context control register value · 465891fe
      Matthew R. Ochs committed
      The SISLite specification originally defined the context control register with
      a single field of bits to represent the LISN and also stipulated that the
      register reset value be 0. The cxlflash driver took advantage of this when
      programming the LISN for the master contexts via an unconditional write - no
      other bits were preserved.
      
      When unmap support was added, SISLite was updated to define bit 0 of the
      context control register as a way for the AFU to notify the context owner that
      unmap operations were supported. Thus the assumptions under which the register
      is set up changed and the existing unconditional write is clobbering the unmap
      state for master contexts. This is presently not an issue due to the order in
      which the context control register is programmed in relation to the unmap bit
      being queried but should be addressed to avoid a future regression in the
      event this code is moved elsewhere.
      
      To remedy this issue, preserve the bits when programming the LISN field in the
      context control register. Since the LISN will now be programmed using a read
      value, assert that the initial state of the LISN field is as described in
      SISLite (0).
      Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
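The remedy above is a read-modify-write that leaves the non-LISN bits alone. A minimal sketch follows; the mask width and the unmap bit position are assumptions made for illustration (placing "bit 0" at the most significant bit follows IBM bit numbering, which is itself an assumption here), and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration only; consult SISLite for the real one. */
#define CTX_CTRL_LISN_MASK_SKETCH 0xFFULL      /* hypothetical LISN field */
#define CTX_CTRL_UNMAP_SKETCH     (1ULL << 63) /* hypothetical unmap-supported bit */

/* Compute the value to write back: keep every bit that was read and
 * fill in only the LISN field, which must still be in its reset state. */
uint64_t program_lisn_sketch(uint64_t read_value, uint64_t lisn)
{
    /* The commit asserts the LISN field starts out as 0, per SISLite. */
    assert((read_value & CTX_CTRL_LISN_MASK_SKETCH) == 0);

    return read_value | (lisn & CTX_CTRL_LISN_MASK_SKETCH);
}
```

Compared with an unconditional write of the LISN alone, this keeps the unmap indication intact regardless of where in the initialization sequence the register is programmed.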
    • scsi: cxlflash: Preserve number of interrupts for master contexts · e11e0ff8
      Uma Krishnan committed
      The number of interrupts requested for user contexts is stored in the
      context-specific structures and used to manage the interrupts. For the master
      contexts, this number is only used once and is therefore not saved.
      
      To prepare for future commits where the number of interrupts will be required
      in more than one place, preserve the value in the master context structure.
      
      [mkp: typo in comment]
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  3. 11 Jan, 2018 5 commits
  4. 01 Nov, 2017 1 commit
  5. 17 Oct, 2017 1 commit
  6. 26 Aug, 2017 1 commit
  7. 13 Jul, 2017 1 commit
  8. 02 Jul, 2017 3 commits
  9. 27 Jun, 2017 15 commits
  10. 14 Apr, 2017 4 commits