1. 09 9月, 2016 1 次提交
    • U
      scsi: cxlflash: Scan host only after the port is ready for I/O · bbbfae96
      Uma Krishnan 提交于
      When a port link is established, the AFU sends a 'link up' interrupt.
      After the link is up, corresponding initialization steps are performed
      on the card. Following that, when the card is ready for I/O, the AFU
      sends 'login succeeded' interrupt. Today, cxlflash invokes
      scsi_scan_host() upon receipt of both interrupts.
      
      SCSI commands sent to the port prior to the 'login succeeded' interrupt
      will fail with 'port not available' error. This is not desirable.
      Moreover, when async_scan is active for the host, subsequent scan calls
      are terminated with error. Due to this, the scsi_scan_host() call
      performed after 'login succeeded' interrupt could portentially return
      error and the devices may not be scanned properly.
      
      To avoid this problem, scsi_scan_host() should be called only after the
      'login succeeded' interrupt.
      Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      bbbfae96
  2. 24 8月, 2016 2 次提交
  3. 19 8月, 2016 3 次提交
  4. 27 7月, 2016 1 次提交
  5. 19 7月, 2016 1 次提交
  6. 13 7月, 2016 3 次提交
  7. 06 5月, 2016 1 次提交
    • M
      cxlflash: Fix to resolve dead-lock during EEH recovery · 635f6b08
      Manoj N. Kumar 提交于
      When a cxlflash adapter goes into EEH recovery and multiple processes
      (each having established its own context) are active, the EEH recovery
      can hang if the processes attempt to recover in parallel. The symptom
      logged after a couple of minutes is:
      
      INFO: task eehd:48 blocked for more than 120 seconds.
      Not tainted 4.5.0-491-26f710d+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      eehd            0    48      2
      Call Trace:
      __switch_to+0x2f0/0x410
      __schedule+0x300/0x980
      schedule+0x48/0xc0
      rwsem_down_write_failed+0x294/0x410
      down_write+0x88/0xb0
      cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
      cxl_vphb_error_detected+0x88/0x110 [cxl]
      cxl_pci_error_detected+0xb0/0x1d0 [cxl]
      eeh_report_error+0xbc/0x130
      eeh_pe_dev_traverse+0x94/0x160
      eeh_handle_normal_event+0x17c/0x450
      eeh_handle_event+0x184/0x370
      eeh_event_handler+0x1c8/0x1d0
      kthread+0x110/0x130
      ret_from_kernel_thread+0x5c/0xa4
      INFO: task blockio:33215 blocked for more than 120 seconds.
      
      Not tainted 4.5.0-491-26f710d+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      blockio         0 33215  33213
      Call Trace:
      0x1 (unreliable)
      __switch_to+0x2f0/0x410
      __schedule+0x300/0x980
      schedule+0x48/0xc0
      rwsem_down_read_failed+0x124/0x1d0
      down_read+0x68/0x80
      cxlflash_ioctl+0x70/0x6f0 [cxlflash]
      scsi_ioctl+0x3b0/0x4c0
      sg_ioctl+0x960/0x1010
      do_vfs_ioctl+0xd8/0x8c0
      SyS_ioctl+0xd4/0xf0
      system_call+0x38/0xb4
      INFO: task eehd:48 blocked for more than 120 seconds.
      
      The hang is because of a 3 way dead-lock:
      
      Process A holds the recovery mutex, and waits for eehd to complete.
      Process B holds the semaphore and waits for the recovery mutex.
      eehd waits for semaphore.
      
      The fix is to have Process B above release the semaphore before
      attempting to acquire the recovery mutex. This will allow
      eehd to proceed to completion.
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      635f6b08
  8. 29 3月, 2016 2 次提交
    • M
      cxlflash: Move to exponential back-off when cmd_room is not available · ea765431
      Manoj N. Kumar 提交于
      While profiling the cxlflash_queuecommand() path under a heavy load it
      was found that number of retries to find cmd_room was fairly high.
      
      There are two problems with the current back-off:
      a) It starts with a udelay of 0
      b) It backs-off linearly
      
      Tried several approaches (a higher multiple 10*n, 100*n, as well as n^2,
      2^n) and found that the exponential back-off(2^n) approach had the least
      overall cost. Cost as being defined as overall time spent waiting.
      
      The fix is to change the linear back-off to an exponential back-off.
      This solution also takes care of the problem with the initial
      delay (starts with 1 usec).
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      ea765431
    • M
      cxlflash: Fix regression issue with re-ordering patch · 9526f360
      Manoj N. Kumar 提交于
      While running 'sg_reset -H' back to back the following exception was seen:
      
      [  735.115695] Faulting instruction address: 0xd0000000098c0864
      cpu 0x0: Vector: 300 (Data Access) at [c000000ffffafa80]
          pc: d0000000098c0864: cxlflash_async_err_irq+0x84/0x5c0 [cxlflash]
          lr: c00000000013aed0: handle_irq_event_percpu+0xa0/0x310
          sp: c000000ffffafd00
         msr: 9000000000009033
         dar: 2010000
       dsisr: 40000000
        current = 0xc000000001510880
        paca    = 0xc00000000fb80000   softe: 0        irq_happened: 0x01
          pid   = 0, comm = swapper/0
      
      Linux version 4.5.0-491-26f710d+
      
      enter ? for help
      [c000000ffffafe10] c00000000013aed0 handle_irq_event_percpu+0xa0/0x310
      [c000000ffffafed0] c00000000013b1a8 handle_irq_event+0x68/0xc0
      [c000000ffffaff00] c0000000001404ec handle_fasteoi_irq+0xec/0x2a0
      [c000000ffffaff30] c00000000013a084 generic_handle_irq+0x54/0x80
      [c000000ffffaff60] c000000000011130 __do_irq+0x80/0x1d0
      [c000000ffffaff90] c000000000024d40 call_do_irq+0x14/0x24
      [c000000001573a20] c000000000011318 do_IRQ+0x98/0x140
      [c000000001573a70] c000000000002594 hardware_interrupt_common+0x114/0x180
      
      This exception is being hit because the async_err interrupt path performs
      an MMIO to read the interrupt status register. The MMIO region in this
      case is not available.
      
      Commit 6ded8b3c ("cxlflash: Unmap problem state area before detaching
      master context") re-ordered the sequence in which term_mc() and stop_afu()
      are called. This introduces a window for interrupts to come in with the
      problem space area unmapped, that did not exist previously.
      
      The fix is to separate the disabling of all AFU interrupts to a distinct
      function, term_intr() so that it is the first thing that is done in the
      tear down process.
      
      To keep the initialization process symmetric, separate the AFU interrupt
      setup also to a distinct function: init_intr().
      
      Fixes: 6ded8b3c ("cxlflash: Unmap problem state area before detaching master context")
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      9526f360
  9. 09 3月, 2016 8 次提交
  10. 07 1月, 2016 6 次提交
  11. 11 12月, 2015 2 次提交
  12. 30 10月, 2015 10 次提交
    • M
      cxlflash: Fix to avoid bypassing context cleanup · a82544c7
      Matthew R. Ochs 提交于
      Contexts may be skipped over for cleanup in situations where contention
      for the adapter's table-list mutex is experienced in the presence of a
      signal during the execution of the release handler.
      
      This can lead to two known issues:
      
       - A hang condition on remove as that path tries to wait for users to
         cleanup - something that will never complete should this scenario play
         out as the user has already cleaned up from their perspective.
      
       - An Oops in the unmap_mapping_range() call that is made as part of
         the user waiting mechanism that is invoked on remove when contexts
         are found to still exist.
      
      The root cause of this issue can be found in get_context() and how the
      table-list mutex is acquired. As this code path is shared by several
      different access points within the driver, a decision was made during
      the development cycle to acquire this mutex in this location using the
      interruptible version of the mutex locking service. In almost all of
      the use-cases and environmental scenarios this holds up, even when the
      mutex is contended. However, for critical system threads (such as the
      release handler), failing to acquire the mutex and bailing with the
      intention of the user being able to try again later is unacceptable.
      
      In such a scenario, the context _must_ be derived as it is on an
      irreversible path to being freed. Without being able to derive the
      context, the code mistakenly assumes that it has already been freed
      and proceeds to free up the underlying CXL context resources. From
      this point on, any usage of [the now stale] CXL context resources
      will result in undefined behavior. This is root cause of the Oops
      mentioned as the second known issue as the mapping passed to the
      unmap_mapping_range() service is owned by the CXL context.
      
      To fix this problem, acquisition of the table-list mutex within
      get_context() is simply changed to use the uninterruptible version
      of the mutex locking service. This is safe as the timing windows for
      holding this mutex are short and also protected against blocking.
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      a82544c7
    • M
      cxlflash: Fix to avoid lock instrumentation rejection · 0d73122c
      Matthew R. Ochs 提交于
      When running with lock instrumentation (e.g. lockdep), some of the
      instrumentation can become disabled at probe time for a cxlflash
      adapter. This is due to a missing lock registration for the tmf_slock.
      
      The fix is to call spin_lock_init() for the tmf_slock during probe.
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      0d73122c
    • M
      cxlflash: Fix to avoid corrupting port selection mask · 1a47401b
      Matthew R. Ochs 提交于
      The port selection mask of a LUN can be corrupted when the manage LUN
      ioctl (DK_CXLFLASH_MANAGE_LUN) is issued more than once for any device.
      
      This mask indicates to the AFU which port[s] can be used for a data
      transfer to/from a particular LUN. The mask is critical to ensuring the
      correct behavior when using the virtual LUN function of this adapter.
      When the mask is configured for both ports, an I/O may be sent to either
      port as the AFU assumes that each port has access to the same physical
      device (specified by LUN ID in the port LUN table).
      
      In a situation where the mask becomes incorrectly configured to reflect
      access to both ports when in fact there is only access through a single
      port, an I/O can be targeted to the wrong physical device. This can lead
      to data corruption among other ill effects (e.g. security leaks).
      
      The cause for this corruption is the assumption that the ioctl will only
      be called a second time for a LUN when it is being configured for access
      via a second port. A boolean 'newly_created' variable is used to
      differentiate between a LUN that was created (and subsequently configured
      for single port access) and one that is destined for access across both
      ports. While initially set to 'true', this sticky boolean is toggled to
      the 'false' state during a lookup on any next ioctl performed on a device
      with a matching WWN/WWID. The code fails to realize that the match could
      in fact be the same device calling in again. From here, an assumption is
      made that any LUN with 'newly_created' set to 'false' is configured for
      access over both ports and the port selection mask is set to reflect this.
      Any future attempts to use this LUN for hosting a virtual LUN will result
      in the port LUN table being incorrectly programmed.
      
      As a remedy, the 'newly_created' concept was removed entirely and replaced
      with code that always constructs the port selection mask based upon the
      SCSI channel of the LUN being accessed. The bits remain sticky, therefore
      allowing for a device to be accessed over both ports when that is in fact
      the correct physical configuration.
      
      Also included in this commit are a few minor related changes to enhance
      the fix and provide better debug information for port selection mask and
      port LUN table bugs in the future. These include renaming refresh_local()
      to lookup_local(), tracing the WWN/WWID as a big-endian entity, and
      tracing the port selection mask, SCSI channel, and LUN ID each time the
      port LUN table is programmed.
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      1a47401b
    • M
      cxlflash: Fix to escalate to LINK_RESET on login timeout · e6e6df3f
      Manoj Kumar 提交于
      A 'login timed out' asynchronous error interrupt is generated if no
      response is seen to a FLOGI within 2 seconds.  If the time out error
      is not escalated to a LINK_RESET the port will not be available for
      use. This fix provides the required escalation.
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: NBrian King <brking@linux.vnet.ibm.com>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      e6e6df3f
    • M
      cxlflash: Fix to avoid leaving dangling interrupt resources · ee3491ba
      Matthew R. Ochs 提交于
      When running with an unsupported AFU, the cxlflash driver fails
      the probe. When the driver is removed, the following Oops is
      encountered on a show_interrupts() thread:
      
      Call Trace:
      [c000001fba5a7a10] [0000000000000003] 0x3 (unreliable)
      [c000001fba5a7a60] [c00000000053dcf4] vsnprintf+0x204/0x4c0
      [c000001fba5a7ae0] [c00000000030045c] seq_vprintf+0x5c/0xd0
      [c000001fba5a7b20] [c00000000030051c] seq_printf+0x4c/0x60
      [c000001fba5a7b50] [c00000000013e140] show_interrupts+0x370/0x4f0
      [c000001fba5a7c10] [c0000000002ff898] seq_read+0xe8/0x530
      [c000001fba5a7ca0] [c00000000035d5c0] proc_reg_read+0xb0/0x110
      [c000001fba5a7cf0] [c0000000002ca74c] __vfs_read+0x6c/0x180
      [c000001fba5a7d90] [c0000000002cb464] vfs_read+0xa4/0x1c0
      [c000001fba5a7de0] [c0000000002cc51c] SyS_read+0x6c/0x110
      [c000001fba5a7e30] [c000000000009204] system_call+0x38/0xb4
      
      The Oops is due to not cleaning up correctly on the unsupported
      AFU error path, leaving various allocated and registered resources.
      In this case, interrupts are in a semi-allocated/registered state,
      which the show_interrupts() thread attempts to use.
      
      To fix, the cleanup logic in init_afu() is consolidated to error
      gates at the bottom of the function and the appropriate goto is
      added to each error path. As a mini side fix while refactoring
      in this routine, the else statement following the AFU version
      evaluation is eliminated as it is not needed.
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      ee3491ba
    • M
      cxlflash: Fix to avoid potential deadlock on EEH · aacb4ff6
      Matthew R. Ochs 提交于
      Ioctl threads that use scsi_execute() can run for an excessive amount
      of time due to the fact that they have lengthy timeouts and retry logic
      built in. Under normal operation this is not an issue. However, once EEH
      enters the picture, a long execution time coupled with the possibility
      that a timeout can trigger entry to the driver via registered reset
      callbacks becomes a liability.
      
      In particular, a deadlock can occur when an EEH event is encountered
      while in running in scsi_execute(). As part of the recovery, the EEH
      handler drains all currently running ioctls, waiting until they have
      completed before proceeding with a reset. As the scsi_execute()'s are
      situated on the ioctl path, the EEH handler will wait until they (and
      the remainder of the ioctl handler they're associated with) have
      completed. Normally this would not be much of an issue aside from the
      longer recovery period. Unfortunately, the scsi_execute() triggers a
      reset when it times out. The reset handler will see that the device is
      already being reset and wait until that reset completed. This creates
      a condition where the EEH handler becomes stuck, infinitely waiting for
      the ioctl thread to complete.
      
      To avoid this behavior, temporarily unmark the scsi_execute() threads
      as an ioctl thread by releasing the ioctl read semaphore. This allows
      the EEH handler to proceed with a recovery while the thread is still
      running. Once the scsi_execute() returns, the ioctl read semaphore is
      reacquired and the adapter state is rechecked in case it changed while
      inside of scsi_execute(). The state check will wait if the adapter is
      still being recovered or returns a failure if the recovery failed. In
      the event that the adapter reset failed, the failure is simply returned
      as the ioctl would be unable to continue.
      Reported-by: NBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NBrian King <brking@linux.vnet.ibm.com>
      Reviewed-by: NDaniel Axtens <dja@axtens.net>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      aacb4ff6
    • M
      cxlflash: Correct trace string · fa3f2c6e
      Matthew R. Ochs 提交于
      The trace following the failure of alloc_mem() incorrectly identifies
      which function failed. This can lead to misdiagnosing a failure.
      
      Fix the string to correctly indicate that alloc_mem() failed.
      Reported-by: NBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NBrian King <brking@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      fa3f2c6e
    • M
      cxlflash: Fix to avoid corrupting adapter fops · 17ead26f
      Matthew R. Ochs 提交于
      The fops owned by the adapter can be corrupted in certain scenarios,
      opening a window where certain fops are temporarily NULLed before being
      reset to their proper value. This can potentially lead software to make
      incorrect decisions, leaving the user with the inability to function as
      intended.
      
      An example of this behavior can be observed when there are a number of
      users with a high rate of turn around (attach to LUN, perform an I/O,
      detach from LUN, repeat). Every so often a user is given a valid
      context and adapter file descriptor, but the file associated with the
      descriptor lacks the correct read permission bit (FMODE_CAN_READ) and
      thus the read system call bails before calling the valid read fop.
      
      Background:
      
      The fops is stored in the adapter structure to provide the ability to
      lookup the adapter structure from within the fop handler. CXL services
      use the file's private_data and at present, the CXL context does not
      have a private section. In an effort to limit areas of the cxlflash
      driver with code specific the superpipe function, a design choice was
      made to keep the details of the fops situated away from the legacy
      portions of the driver. This drove the behavior that the adapter fops
      is set at the beginning of the disk attach ioctl handler when there
      are no users present.
      
      The corruption that this fix remedies is due to the fact that the fops
      is initially defaulted to values found within a static structure. When
      the fops is handed down to the CXL services later in the attach path,
      certain services are patched. The fops structure remains correct until
      the user count drops to 0 and the fops is reset, triggering the process
      to repeat again. The user counts are tightly coupled with the creation
      and deletion of the user context. If multiple users perform a disk
      attach at the same time, when the user count is currently 0, some users
      can be in the middle of obtaining a file descriptor and have not yet
      reached the context creation code that [in addition to creating the
      context] increments the user count. Subsequent users coming in to
      perform the attach see that the user count is still 0, and reinitialize
      the fops, temporarily removing the patched fops. The users that are in
      the middle obtaining their file descriptor may then receive an invalid
      descriptor.
      
      The fix simply removes the user count altogether and moves the fops
      initialization to probe time such that it is only performed one time
      for the life of the adapter. In the future, if the CXL services adopt
      a private member for their context, that could be used to store the
      adapter structure reference and cxlflash could revert to a model that
      does not require an embedded fops.
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NBrian King <brking@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NDaniel Axtens <dja@axtens.net>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      17ead26f
    • M
      cxlflash: Fix to double the delay each time · b22b4037
      Manoj Kumar 提交于
      The operator used to double the master context response delay
      is incorrect and does not result in delay doubling.
      
      To fix, use a left shift instead of the XOR operator.
      Reported-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NBrian King <brking@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      b22b4037
    • M
      cxlflash: Fix to prevent stale AFU RRQ · af10483e
      Matthew R. Ochs 提交于
      Following an adapter reset, the AFU RRQ that resides in host memory
      holds stale data. This can lead to a condition where the RRQ interrupt
      handler tries to process stale entries and/or endlessly loops due to an
      out of sync generation bit.
      
      To fix, the AFU RRQ in host memory needs to be cleared after each reset.
      Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NBrian King <brking@linux.vnet.ibm.com>
      Reviewed-by: NDaniel Axtens <dja@axtens.net>
      Reviewed-by: NTomas Henzl <thenzl@redhat.com>
      Signed-off-by: NJames Bottomley <JBottomley@Odin.com>
      af10483e