提交 · d4ace35166e55e73afe72a05d166342996063d35 · openanolis / cloud-kernel

01 12月, 2016 11 次提交

scsi: cxlflash: Cleanup send_tmf() · d4ace351

由 Matthew R. Ochs 提交于 11月 28, 2016

The send_tmf() routine includes some copy/paste cruft that can be
removed as well as the setting of an AFU command-specific while
holding the tmf_slock. While not a bug, it is out of place and
should be shifted down alongside the other command initialization
statements for clarity.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d4ace351

scsi: cxlflash: Remove AFU command lock · 9ba848ac

由 Matthew R. Ochs 提交于 11月 28, 2016

The original design of the cxlflash driver required AFU commands
to convey state information across multiple threads. The IOASA
"host use" byte was used to track if a command was done, errored,
or timed out. A per-command spin lock was used to serialize access
to this byte. As this is no longer required with the introduction
of completions and various refactoring over time, the spin lock,
state tracking, and associated code can be removed. To support the
simplification, the wait_resp() routine is refactored to return a
success or failure. Additionally, as the simplification to the
AFU internal command routine, explicit assignments of AFU command
fields to zero are removed as the memory is zeroed upon allocation.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9ba848ac

scsi: cxlflash: Wait for active AFU commands to timeout upon tear down · de01283b

由 Matthew R. Ochs 提交于 11月 28, 2016

With the removal of the static private command pool, the ability to
'complete' outstanding commands was lost. While not an issue for the
commands originating outside the driver, internal AFU commands are
synchronous and therefore have a timeout associated with them. To
avoid a stale memory access, the tear down sequence needs to ensure
that there are not any active commands before proceeding. As these
internal AFU commands are rare events, the simplest way to accomplish
this is detecting the activity and waiting for it to timeout.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

de01283b

scsi: cxlflash: Remove private command pool · 25bced2b

由 Matthew R. Ochs 提交于 11月 28, 2016

Clean up and remove the remaining private command pool infrastructure
that is no longer required.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

25bced2b

scsi: cxlflash: Use cmd_size for private commands · 5fbb96c8

由 Matthew R. Ochs 提交于 11月 28, 2016

Instead of using a private pool of AFU commands, use cmd_size to prime
the private pool of SCSI commands such that they are allocated with a
size large enough to contain an aligned AFU command. Use scsi_cmd_priv()
to derive the aligned/zeroed private command on queuecommand and TMF
paths. Remove cmd_checkout() as it is no longer required. The remaining
AFU private command infrastructure will be removed in a cleanup commit.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5fbb96c8

scsi: cxlflash: Allocate memory instead of using command pool for AFU sync · 350bb478

由 Matthew R. Ochs 提交于 11月 28, 2016

As staging for the removal of the AFU command pool, remove the reliance
upon the pool for the internal AFU sync command. Instead of obtaining an
AFU command from the pool, dynamically allocate memory with the appropriate
alignment requirements. Since the AFU sync service is only executed from
the process environment, blocking is acceptable.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

350bb478

scsi: cxlflash: Remove unused buffer from AFU command · e7ab2d40

由 Matthew R. Ochs 提交于 11月 28, 2016

The cxlflash driver originally required a per-command 4K buffer that
hosted data passed to the AFU. When the routines that initiate AFU
and internal SCSI commands were refactored to use scsi_execute(), the
need for this buffer became obsolete. As it is no longer necessary,
the buffer is removed.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e7ab2d40

scsi: cxlflash: Avoid command room violation · 11f7b184

由 Uma Krishnan 提交于 11月 28, 2016

During test, a command room violation interrupt is occasionally seen
for the master context when the CXL flash devices are stressed.

After studying the code, there could be gaps in the way command room
value is being cached in cxlflash. When the cached command room is zero
the thread attempting to send becomes burdened with updating the cached
value with the actual value from the AFU. Today, this is handled with an
atomic set operation of the raw value read. Following the atomic update,
the thread proceeds to send.

This behavior is incorrect on two counts:

   - The update fails to take into account the current thread and its
     consumption of one of the hardware commands.

   - The update does not take into account other threads also atomically
     updating. Per design, a worker thread updates the cached value when a
     send thread times out. By not protecting the update with a lock, the
     cached value can be incorrectly clobbered.

To correct these issues, the update of the cached command room has been
simplified and also protected using a spin lock which is held until the
MMIO is complete. This ensures the command room is properly consumed by
the same thread. Update of cached value also takes into account the
current thread consuming a hardware command.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

11f7b184

scsi: cxlflash: Improve context_reset() logic · 3d2f617d

由 Uma Krishnan 提交于 11月 28, 2016

Currently, the context reset routine waits for command room to
be available before sending the reset request. Per review of the
SISLite specification and clarifications from the CXL Flash AFU
designers, this wait is unnecessary. The reset request can be
sent anytime regardless of command room, so long as only a single
reset request is active at any one point in time.

This commit simplifies the reset routine by removing the wait for
command room. Additionally it adds a debug trace to help pinpoint
hardware errors when a context reset does not complete.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3d2f617d

scsi: cxlflash: Fix crash in cxlflash_restore_luntable() · 8a260543

由 Uma Krishnan 提交于 11月 28, 2016

During test, the following crash was observed:

[34538.981505] Faulting instruction address: 0xd000000007c9c870
cpu 0x9: Vector: 300 (Data Access) at [c0000007f1e8f590]
    pc: d000000007c9c870: cxlflash_restore_luntable+0x70/0x1d0 [cxlflash]
    lr: d000000007c9c84c: cxlflash_restore_luntable+0x4c/0x1d0 [cxlflash]
    sp: c0000007f1e8f810
   msr: 9000000100009033
   dar: c00000171d637438
 dsisr: 40000000
  current = 0xc0000007f1e43f90
  paca    = 0xc000000007b25100   softe: 0        irq_happened: 0x01
    pid   = 493, comm = eehd
enter ? for help
[c0000007f1e8f8a0] d000000007c940b0 init_afu+0xd60/0x1200 [cxlflash]
[c0000007f1e8f9a0] d000000007c945a8 cxlflash_pci_slot_reset+0x58/0xe0 [cxlflash]
[c0000007f1e8fa20] d00000000715f790 cxl_pci_slot_reset+0x230/0x340 [cxl]
[c0000007f1e8fae0] c000000000040dd4 eeh_report_reset+0x144/0x180
[c0000007f1e8fb20] c00000000003f708 eeh_pe_dev_traverse+0x98/0x170
[c0000007f1e8fbb0] c000000000041618 eeh_handle_normal_event+0x328/0x410
[c0000007f1e8fc30] c000000000041db8 eeh_handle_event+0x178/0x330
[c0000007f1e8fce0] c000000000042118 eeh_event_handler+0x1a8/0x1b0
[c0000007f1e8fd80] c00000000011420c kthread+0xec/0x100
[c0000007f1e8fe30] c00000000000a47c ret_from_kernel_thread+0x5c/0xe0

When superpipe mode is disabled for a LUN, the references for the
local lun are deleted but the LUN is still identified as being present
in the LUN table. This mismatched state can result in the above crash
when the LUN table is restored during an error recovery operation.

To fix this issue, the local LUN information structure is updated to
reflect the LUN is no longer in the LUN table once all references to
the LUN are gone.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8a260543

scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONE · 68ab2d76

由 Uma Krishnan 提交于 11月 28, 2016

The following Oops is encountered when blk_mq is enabled with the
cxlflash driver:

[ 2960.817172] Oops: Kernel access of bad area, sig: 11 [#5]
[ 2960.817309] NIP  __blk_mq_run_hw_queue+0x278/0x4c0
[ 2960.817313] LR __blk_mq_run_hw_queue+0x2bc/0x4c0
[ 2960.817314] Call Trace:
[ 2960.817320] __blk_mq_run_hw_queue+0x2bc/0x4c0 (unreliable)
[ 2960.817324] blk_mq_run_hw_queue+0xd8/0x100
[ 2960.817329] blk_mq_insert_requests+0x14c/0x1f0
[ 2960.817333] blk_mq_flush_plug_list+0x150/0x190
[ 2960.817338] blk_flush_plug_list+0x11c/0x2b0
[ 2960.817344] blk_finish_plug+0x58/0x80
[ 2960.817348] __do_page_cache_readahead+0x1c0/0x2e0
[ 2960.817352] force_page_cache_readahead+0x68/0xd0
[ 2960.817356] generic_file_read_iter+0x43c/0x6a0
[ 2960.817359] blkdev_read_iter+0x68/0xa0
[ 2960.817361] __vfs_read+0x11c/0x180
[ 2960.817364] vfs_read+0xa4/0x1c0
[ 2960.817366] SyS_read+0x6c/0x110
[ 2960.817369] system_call+0x38/0xb4

The SCSI blk_mq stack assumes that sg_tablesize is always a non-zero
value with scsi_mq_setup_tags() allocating tags using sg_tablesize.
The cxlflash driver currently uses SG_NONE (0) for the sg_tablesize
as the devices it supports are not capable of scatter gather. This
mismatch of values results in the Oops above.

To resolve this issue, sg_tablesize for cxlflash can simply be set
to 1, a value which satisfies the constraints in cxlflash and the
lack of support of SG_NONE in SCSI blk_mq.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

68ab2d76

15 9月, 2016 4 次提交

scsi: cxlflash: Fix context reference tracking on detach · c4a11827

由 Matthew R. Ochs 提交于 9月 02, 2016

Commit 888baf06 ("scsi: cxlflash: Add kref to context") introduced a
kref to the context. In particular, the detach routine was updated to
use the kref services for managing the removal and destruction of a
context.

As part of this change, the tracking mechanism internal to the detach
handler was refactored. This introduced a bug that can cause the
tracking state to be lost. This can lead to a situation where exclusive
access to a context is prematurely [and unknowingly] relinquished for
the executing thread.

To remedy, only update the tracking state when the kref operation
indicates the context was removed.

Fixes: 888baf06 ("scsi: cxlflash: Add kref to context")
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c4a11827

scsi: cxlflash: Refactor WWPN setup · f8013261

由 Matthew R. Ochs 提交于 9月 02, 2016

Commit 964497b3 ("cxlflash: Remove dual port online dependency")
logically removed the ability for the WWPN setup routine afu_set_wwpn()
to return a non-success value. This routine can safely be made a void to
simplify the code as there is no longer a need to report a failure.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

f8013261

scsi: cxlflash: Improve EEH recovery time · 05dab432

由 Matthew R. Ochs 提交于 9月 02, 2016

When an EEH occurs during device initialization, the port timeout logic
can cause excessive delays as MMIO reads will fail. Depending on where
they are experienced, these delays can lead to a prolonged reset,
causing an unnecessary triggering of other timeout logic in the SCSI
stack or user applications.

To expedite recovery, the port timeout logic is updated to decay the
timeout at a much faster rate when in the presence of a likely EEH
frozen event.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

05dab432

scsi: cxlflash: Fix to avoid EEH and host reset collisions · 1d3324c3

由 Matthew R. Ochs 提交于 9月 02, 2016

The EEH reset handler is ignorant to the current state of the driver
when processing a frozen event and initiating a device reset. This can
be an issue if an EEH event occurs while a user or stack initiated reset
is executing. More specifically, if an EEH occurs while the SCSI host
reset handler is active, the reset initiated by the EEH thread will
likely collide with the host reset thread. This can leave the device in
an inconsistent state, or worse, cause a system crash.

As a remedy, the EEH handler is updated to evaluate the device state and
take appropriate action (proceed, wait, or disconnect host). The host
reset handler is also updated to handle situations where an EEH occurred
during a host reset. In such situations, the host reset handler will
delay reporting back a success to give the EEH reset an opportunity to
complete.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

1d3324c3

09 9月, 2016 2 次提交

scsi: cxlflash: Remove the device cleanly in the system shutdown path · babf985d

由 Uma Krishnan 提交于 9月 02, 2016

Commit 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash
cards") was recently introduced to notify the AFU when a system is going
down. Due to the position of the cxlflash driver in the device stack,
cxlflash devices are _always_ removed during a reboot/shutdown. This can
lead to a crash if the cxlflash shutdown hook is invoked _after_ the
shutdown hook for the owning virtual PHB. Furthermore, the current
implementation of shutdown/remove hooks for cxlflash are not tolerant to
being invoked when the device is not enabled. This can also lead to a
crash in situations where the remove hook is invoked after the device
has been removed via the vPHBs shutdown hook. An example of this
scenario would be an EEH reset failure while a reboot/shutdown is in
progress.

To solve both problems, the shutdown hook for cxlflash is updated to
simply remove the device. This path already includes the AFU
notification and thus this solution will continue to perform the
original intent. At the same time, the remove hook is updated to protect
against being called when the device is not enabled.

Fixes: 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash
cards")
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

babf985d

scsi: cxlflash: Scan host only after the port is ready for I/O · bbbfae96

由 Uma Krishnan 提交于 9月 02, 2016

When a port link is established, the AFU sends a 'link up' interrupt.
After the link is up, corresponding initialization steps are performed
on the card. Following that, when the card is ready for I/O, the AFU
sends 'login succeeded' interrupt. Today, cxlflash invokes
scsi_scan_host() upon receipt of both interrupts.

SCSI commands sent to the port prior to the 'login succeeded' interrupt
will fail with 'port not available' error. This is not desirable.
Moreover, when async_scan is active for the host, subsequent scan calls
are terminated with error. Due to this, the scsi_scan_host() call
performed after 'login succeeded' interrupt could portentially return
error and the devices may not be scanned properly.

To avoid this problem, scsi_scan_host() should be called only after the
'login succeeded' interrupt.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

bbbfae96

24 8月, 2016 2 次提交

scsi: cxlflash: Remove adapter file descriptor cache · de9f0b0c

由 Matthew R. Ochs 提交于 8月 09, 2016

The adapter file descriptor was previously cached within the kernel for
a given context in order to support performing a close on behalf of an
application. This is no longer needed as applications are now required
to perform a close on the adapter file descriptor.
Inspired-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

de9f0b0c

scsi: cxlflash: Transition to application close model · cd34af40

由 Matthew R. Ochs 提交于 8月 09, 2016

Caching the adapter file descriptor and performing a close on behalf of
an application is a poor design. This is due to the fact that once a
file descriptor in installed, it is free to be altered without the
knowledge of the cxlflash driver. This can lead to inconsistencies
between the application and kernel. Furthermore, the nature of the
former design is more exploitable and thus should be abandoned.

To support applications performing a close on the adapter file that is
associated with a context, a new flag is introduced to the user API to
indicate to applications that they are responsible for the close
following the cleanup (detach) of a context. The documentation is also
updated to reflect this change in behavior.
Inspired-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

cd34af40

19 8月, 2016 3 次提交

scsi: cxlflash: Add kref to context · 888baf06

由 Matthew R. Ochs 提交于 8月 09, 2016

Currently, context user references are tracked via the list of LUNs that
have attached to the context. While convenient, this is not intuitive
without a deep study of the code and is inconsistent with the existing
reference tracking patterns within the kernel. This design choice can
lead to future bug injection.

To improve code comprehension and better protect against future bugs,
add explicit reference counting to contexts and migrate the context
removal code to the kref release handler.
Inspired-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

888baf06

scsi: cxlflash: Cache owning adapter within context · 44ef38f9

由 Matthew R. Ochs 提交于 8月 09, 2016

The context removal routine requires access to the owning adapter
structure to reset the context within the AFU as part of the tear down
sequence. In order to support kref adoption, the owning adapter must be
accessible from the release handler. As the kref framework only provides
the kref reference as the sole parameter, another means is needed to
derive the owning adapter.

As a remedy, the owning adapter reference is saved off within the
context during initialization.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

44ef38f9

scsi: cxlflash: Avoid mutex when destroying context · 41b99e1a

由 Matthew R. Ochs 提交于 8月 09, 2016

Context information structures are protected by a mutex that is held
when accessing/manipulating the context. When the code that manages
these structures was authored, a decision was made to include taking the
mutex as part of the allocation/initialization sequence and also handle
the scenario where the mutex was already held when freeing the context.

While not a problem outright, this design decision has been deemed as
too flexible and the code should be made more rigid to avoid future
bugs.  In addition, further review of the code yields that the existing
mutex manipulations in both of these context management paths are
superfluous.

This commit removes the obtaining of the context mutex in the context
initialization routine and assumes the mutex is not held in the context
free path.
Inspired-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

41b99e1a

27 7月, 2016 1 次提交

cxlflash: Verify problem state area is mapped before notifying shutdown · 1bd2b282

由 Uma Krishnan 提交于 7月 21, 2016

If an EEH or some other hard error occurs while the adapter instance was
being initialized, on the subsequent shutdown of the device, the system
could crash with:

[c000000f1da03b60] c0000000005eccfc pci_device_shutdown+0x6c/0x100
[c000000f1da03ba0] c0000000006d67d4 device_shutdown+0x1b4/0x2c0
[c000000f1da03c40] c0000000000ea30c kernel_restart_prepare+0x5c/0x80
[c000000f1da03c70] c0000000000ea48c kernel_restart+0x2c/0xc0
[c000000f1da03ce0] c0000000000ea970 SyS_reboot+0x1c0/0x2d0
[c000000f1da03e30] c000000000009204 system_call+0x38/0xb4

This crash is due to the AFU not being mapped when the shutdown
notification routine is called and is a regression that was inserted
recently with Commit 704c4b0d ("cxlflash: Shutdown notify support
for CXL Flash cards").

As a fix, shutdown notification should only occur when the AFU is
mapped.

Fixes: 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash cards")
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

1bd2b282

19 7月, 2016 1 次提交

cxl: remove dead Kconfig options · 1e44727a

由 Andrew Donnellan 提交于 7月 18, 2016

Remove the CXL_KERNEL_API and CXL_EEH Kconfig options, as they were only
needed to coordinate the merging of the cxlflash driver. Also remove the
stub implementation of cxl_perst_reloads_same_image() in cxlflash which is
only used if CXL_EEH isn't defined (i.e. never).
Suggested-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1e44727a

13 7月, 2016 3 次提交

cxlflash: Shutdown notify support for CXL Flash cards · 704c4b0d

由 Uma Krishnan 提交于 6月 15, 2016

Some CXL Flash cards need notification of device shutdown in order to
flush pending I/Os.

A PCI notification hook for shutdown has been added where the driver
notifies the card and returns. When the device is removed in the PCI
remove path, notification code will wait for shutdown processing to
complete.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

704c4b0d

cxlflash: Add device dependent flags · 96e1b660

由 Uma Krishnan 提交于 6月 15, 2016

Device dependent flags are needed to support functions that are specific
to a particular device.

One such case is - some CXL Flash cards need to be notified of device
shutdown. For other CXL devices, this feature does not prove to be
useful yet. Such distinct features need to be identified in the driver
to bypass or invoke specific functionality.

In this patch, a member 'flags' has been added to device dependent
values. These flags will be used and expanded in the future to support
various device specific functions.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

96e1b660

cxlflash: Fix to drain operations from previous reset · f411396d

由 Manoj N. Kumar 提交于 6月 15, 2016

While running 'sg_reset -H' in a loop with a user-space application active,
hit the following exception:

cpu 0x2: Vector: 300 (Data Access)
    pc: : afu_attach+0x50/0x240 [cxlflash]
    lr: : cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash]
    pid   = 20365, comm = run_block_fvt

Linux version 4.5.0-491-26f710d+

cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash]
cxlflash_ioctl+0x5a8/0x6f0 [cxlflash]
scsi_ioctl+0x3b0/0x4c0
sd_ioctl+0x110/0x190
blkdev_ioctl+0x28c/0xc20
block_ioctl+0xa4/0xd0
do_vfs_ioctl+0xd8/0x8c0
SyS_ioctl+0xd4/0xf0
system_call+0x38/0xb4

The problem here is that the problem space area is unmapped while the
application issues the DK_CXLFLASH_RECOVER_AFU ioctl.

This is the order I observe:

proc1				proc2
1) sg_reset
				2) ioctl(DK_CXLFLASH_RECOVER_AFU)
3) sg_reset again
   causing a PSA unmap
				4) continues RECOVER_AFU processing

The resolution to this problem is to have the reset handler drain all
outstanding user space initiated ioctls before proceeding.  It is safe
to drain after the state has been changed to STATE_RESET. Also since
drain_ioctls() was static, it had to be moved up a bit to be before
cxlflash_eh_host_reset_handler().
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

f411396d

06 5月, 2016 1 次提交

cxlflash: Fix to resolve dead-lock during EEH recovery · 635f6b08

由 Manoj N. Kumar 提交于 5月 03, 2016

When a cxlflash adapter goes into EEH recovery and multiple processes
(each having established its own context) are active, the EEH recovery
can hang if the processes attempt to recover in parallel. The symptom
logged after a couple of minutes is:

INFO: task eehd:48 blocked for more than 120 seconds.
Not tainted 4.5.0-491-26f710d+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd            0    48      2
Call Trace:
__switch_to+0x2f0/0x410
__schedule+0x300/0x980
schedule+0x48/0xc0
rwsem_down_write_failed+0x294/0x410
down_write+0x88/0xb0
cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
cxl_vphb_error_detected+0x88/0x110 [cxl]
cxl_pci_error_detected+0xb0/0x1d0 [cxl]
eeh_report_error+0xbc/0x130
eeh_pe_dev_traverse+0x94/0x160
eeh_handle_normal_event+0x17c/0x450
eeh_handle_event+0x184/0x370
eeh_event_handler+0x1c8/0x1d0
kthread+0x110/0x130
ret_from_kernel_thread+0x5c/0xa4
INFO: task blockio:33215 blocked for more than 120 seconds.

Not tainted 4.5.0-491-26f710d+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blockio         0 33215  33213
Call Trace:
0x1 (unreliable)
__switch_to+0x2f0/0x410
__schedule+0x300/0x980
schedule+0x48/0xc0
rwsem_down_read_failed+0x124/0x1d0
down_read+0x68/0x80
cxlflash_ioctl+0x70/0x6f0 [cxlflash]
scsi_ioctl+0x3b0/0x4c0
sg_ioctl+0x960/0x1010
do_vfs_ioctl+0xd8/0x8c0
SyS_ioctl+0xd4/0xf0
system_call+0x38/0xb4
INFO: task eehd:48 blocked for more than 120 seconds.

The hang is because of a 3 way dead-lock:

Process A holds the recovery mutex, and waits for eehd to complete.
Process B holds the semaphore and waits for the recovery mutex.
eehd waits for semaphore.

The fix is to have Process B above release the semaphore before
attempting to acquire the recovery mutex. This will allow
eehd to proceed to completion.
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

635f6b08

29 3月, 2016 2 次提交

cxlflash: Move to exponential back-off when cmd_room is not available · ea765431

由 Manoj N. Kumar 提交于 3月 25, 2016

While profiling the cxlflash_queuecommand() path under a heavy load it
was found that number of retries to find cmd_room was fairly high.

There are two problems with the current back-off:
a) It starts with a udelay of 0
b) It backs-off linearly

Tried several approaches (a higher multiple 10*n, 100*n, as well as n^2,
2^n) and found that the exponential back-off(2^n) approach had the least
overall cost. Cost as being defined as overall time spent waiting.

The fix is to change the linear back-off to an exponential back-off.
This solution also takes care of the problem with the initial
delay (starts with 1 usec).
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ea765431

cxlflash: Fix regression issue with re-ordering patch · 9526f360

由 Manoj N. Kumar 提交于 3月 25, 2016

While running 'sg_reset -H' back to back the following exception was seen:

[  735.115695] Faulting instruction address: 0xd0000000098c0864
cpu 0x0: Vector: 300 (Data Access) at [c000000ffffafa80]
    pc: d0000000098c0864: cxlflash_async_err_irq+0x84/0x5c0 [cxlflash]
    lr: c00000000013aed0: handle_irq_event_percpu+0xa0/0x310
    sp: c000000ffffafd00
   msr: 9000000000009033
   dar: 2010000
 dsisr: 40000000
  current = 0xc000000001510880
  paca    = 0xc00000000fb80000   softe: 0        irq_happened: 0x01
    pid   = 0, comm = swapper/0

Linux version 4.5.0-491-26f710d+

enter ? for help
[c000000ffffafe10] c00000000013aed0 handle_irq_event_percpu+0xa0/0x310
[c000000ffffafed0] c00000000013b1a8 handle_irq_event+0x68/0xc0
[c000000ffffaff00] c0000000001404ec handle_fasteoi_irq+0xec/0x2a0
[c000000ffffaff30] c00000000013a084 generic_handle_irq+0x54/0x80
[c000000ffffaff60] c000000000011130 __do_irq+0x80/0x1d0
[c000000ffffaff90] c000000000024d40 call_do_irq+0x14/0x24
[c000000001573a20] c000000000011318 do_IRQ+0x98/0x140
[c000000001573a70] c000000000002594 hardware_interrupt_common+0x114/0x180

This exception is being hit because the async_err interrupt path performs
an MMIO to read the interrupt status register. The MMIO region in this
case is not available.

Commit 6ded8b3c ("cxlflash: Unmap problem state area before detaching
master context") re-ordered the sequence in which term_mc() and stop_afu()
are called. This introduces a window for interrupts to come in with the
problem space area unmapped, that did not exist previously.

The fix is to separate the disabling of all AFU interrupts to a distinct
function, term_intr() so that it is the first thing that is done in the
tear down process.

To keep the initialization process symmetric, separate the AFU interrupt
setup also to a distinct function: init_intr().

Fixes: 6ded8b3c ("cxlflash: Unmap problem state area before detaching master context")
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9526f360

09 3月, 2016 8 次提交

cxlflash: Use new cxl_pci_read_adapter_vpd() API · ca946d4e

由 Frederic Barrat 提交于 3月 04, 2016

To read the adapter VPD, drivers can't rely on pci config APIs, as it
wouldn't work on powerVM. cxl introduced a new kernel API especially
for this, so start using it.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ca946d4e

cxlflash: Increase cmd_per_lun for better throughput · 83430833

由 Manoj N. Kumar 提交于 3月 04, 2016

With the current value of cmd_per_lun at 16, the throughput
over a single adapter is limited to around 150kIOPS.

Increase the value of cmd_per_lun to 256 to improve
throughput. With this change a single adapter is able to
attain close to the maximum throughput (380kIOPS).
Also change the number of RRQ entries that can be queued.
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

83430833

cxlflash: Fix to avoid unnecessary scan with internal LUNs · 603ecce9

由 Manoj N. Kumar 提交于 3月 04, 2016

When switching to the internal LUN defined on the
IBM CXL flash adapter, there is an unnecessary
scan occurring on the second port. This scan leads
to the following extra lines in the log:

Dec 17 10:09:00 tul83p1 kernel: [ 3708.561134] cxlflash 0008:00:00.0: cxlflash_queuecommand: (scp=c0000000fc1f0f00) 11/1/0/0 cdb=(A0000000-00000000-10000000-00000000)
Dec 17 10:09:00 tul83p1 kernel: [ 3708.561147] process_cmd_err: cmd failed afu_rc=32 scsi_rc=0 fc_rc=0 afu_extra=0xE, scsi_extra=0x0, fc_extra=0x0

By definition, both of the internal LUNs are on the first port/channel.

When the lun_mode is switched to internal LUN the
same value for host->max_channel is retained. This
causes an unnecessary scan over the second port/channel.

This fix alters the host->max_channel to 0 (1 port), if internal
LUNs are configured and switches it back to 1 (2 ports) while
going back to external LUNs.
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

603ecce9

cxlflash: Reorder user context initialization · 5d1952ac

由 Uma Krishnan 提交于 3月 04, 2016

In order to support cxlflash in the PowerVM environment, underlying
hypervisor APIs have imposed a kernel API ordering change.

For the superpipe access to LUN, user applications need a context.
The cxlflash module creates this context by making a sequence of
cxl calls. In the current code, a context is initialized via
cxl_dev_context_init() followed by cxl_process_element(), a function
that obtains the process element id. Finally, cxl_start_work()
is called to attach the process element.

In the PowerVM environment, a process element id cannot be obtained
from the hypervisor until the process element is attached. The
cxlflash module is unable to create contexts without a valid
process element id.

To fix this problem, cxl_start_work() is called before obtaining
the process element id.
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5d1952ac

cxlflash: Simplify attach path error cleanup · 8a96b52a

由 Matthew R. Ochs 提交于 3月 04, 2016

The cxlflash_disk_attach() routine currently uses a cascading error
gate strategy for its error cleanup path. While this strategy is
commonly used to handle cleanup scenarios, it is too restrictive when
function callouts need to be restructured. Problems range from
inserting error path bugs in previously 'good' code to the cleanup
path imposing design changes to how the normal path is structured.
A less restrictive approach is needed to support ordering changes
that come about when operating in different environments.

To overcome this restriction, the error cleanup path is modified to
have a single entrypoint and use conditional logic to cleanup where
necessary. Entities that require multiple cleanup steps must be
carefully vetted to ensure their APIs support state. In cases where
they do not (none as of this commit) additional local variables can
be used to maintain state on their behalf.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8a96b52a

cxlflash: Split out context initialization · 5e6632d1

由 Matthew R. Ochs 提交于 3月 04, 2016

Presently, context information structures are allocated and
initialized in the same routine, create_context(). This imposes
an ordering restriction such that all pieces of information needed
to initialize a context must be known before the context is even
allocated.

This design point is not flexible when the order of context
creation needs to be modified. Specifically, this can lead to
problems when members of the context information structure are
a part of an ordering dependency (i.e. - the 'work' structure
embedded within the context).

To remedy, the allocation is left as-is, inside of the existing
create_context() routine and the initialization is transitioned
to a new void routine, init_context(). At the same time, in
anticipation of these routines not being called in sequence, a
state boolean is added to the context information structure to
track when the context has been initilized. The context teardown
routine, destroy_context(), is modified to support being called
with a non-initialized context.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5e6632d1

cxlflash: Unmap problem state area before detaching master context · 6ded8b3c

由 Uma Krishnan 提交于 3月 04, 2016

When operating in the PowerVM environment, the cxlflash module can
receive an error from the hypervisor indicating that there are
existing mappings in the page table for the process MMIO space.

This issue exists because term_afu() currently invokes term_mc()
before stop_afu(), allowing for the master context to be detached
first and the problem state area to be unmapped second.

To resolve this issue, stop_afu() should be called before term_mc().
Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

6ded8b3c

cxlflash: Simplify PCI registration · 961487e4

由 Manoj N. Kumar 提交于 3月 04, 2016

The calls to pci_request_regions(), pci_resource_start(),
pci_set_dma_mask(), pci_set_master() and pci_save_state() are all
unnecessary for the IBM CXL flash adapter since data buffers
are not required to be mapped to the device's memory.

The use of services such as pci_set_dma_mask() are problematic on
hypervisor managed systems as the IBM CXL flash adapter is operating
under a virtual PCI Host Bridge (virtual PHB) which does not support
these services.

cxlflash 0001:00:00.0: init_pci: Failed to set PCI DMA mask rc=-5

The resolution is to simplify init_pci(), to a point where it does the
bare minimum (pci_enable_device). Similarly, remove the call the
pci_release_regions() from cxlflash_remove().
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

961487e4

07 1月, 2016 2 次提交

cxlflash: Enable device id for future IBM CXL adapter · a2746fb1

由 Manoj Kumar 提交于 12月 14, 2015

This drop enables a future card with a device id of 0x0600 to be
recognized by the cxlflash driver.

As per the design, the Accelerator Function Unit (AFU) for this new IBM
CXL Flash Adapter retains the same host interface as the previous
generation. For the early prototypes of the new card, the driver with
this change behaves exactly as the driver prior to this behaved with the
earlier generation card. Therefore, no card specific programming has
been added. These card specific changes can be staged in later if
needed.
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a2746fb1

cxlflash: Resolve oops in wait_port_offline · b45cdbaf

由 Manoj Kumar 提交于 12月 14, 2015

If an async error interrupt is generated, and the error requires the FC
link to be reset, it cannot be performed in the interrupt context. So a
work element is scheduled to complete the link reset in a process
context. If either an EEH event or an escalation occurs in between when
the interrupt is generated and the scheduled work is started, the MMIO
space may no longer be available. This will cause an oops in the worker
thread.

[  606.806583] NIP kthread_data+0x28/0x40
[  606.806633] LR wq_worker_sleeping+0x30/0x100
[  606.806694] Call Trace:
[  606.806721] 0x50 (unreliable)
[  606.806796] wq_worker_sleeping+0x30/0x100
[  606.806884] __schedule+0x69c/0x8a0
[  606.806959] schedule+0x44/0xc0
[  606.807034] do_exit+0x770/0xb90
[  606.807109] die+0x300/0x460
[  606.807185] bad_page_fault+0xd8/0x150
[  606.807259] handle_page_fault+0x2c/0x30
[  606.807338] wait_port_offline.constprop.12+0x60/0x130 [cxlflash]

To prevent the problem space area from being unmapped, when there is
pending work, a mapcount (using the kref mechanism) is held.  The
mapcount is released only when the work is completed.  The last
reference release is tied to the unmapping service.
Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b45cdbaf

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功