1. 25 10月, 2021 1 次提交
  2. 23 10月, 2021 1 次提交
  3. 22 9月, 2021 3 次提交
    • A
      s390/qeth: fix deadlock during failing recovery · d2b59bd4
      Alexandra Winter 提交于
      Commit 0b9902c1 ("s390/qeth: fix deadlock during recovery") removed
      taking discipline_mutex inside qeth_do_reset(), fixing potential
      deadlocks. An error path was missed though, that still takes
      discipline_mutex and thus has the original deadlock potential.
      
      Intermittent deadlocks were seen when a qeth channel path is configured
      offline, causing a race between qeth_do_reset and ccwgroup_remove.
      Call qeth_set_offline() directly in the qeth_do_reset() error case and
      then a new variant of ccwgroup_set_offline(), without taking
      discipline_mutex.
      
      Fixes: b41b554c ("s390/qeth: fix locking for discipline setup / removal")
      Signed-off-by: NAlexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d2b59bd4
    • A
      s390/qeth: Fix deadlock in remove_discipline · ee909d0b
      Alexandra Winter 提交于
      Problem: qeth_close_dev_handler is a worker that tries to acquire
      card->discipline_mutex via drv->set_offline() in ccwgroup_set_offline().
      Since commit b41b554c
      ("s390/qeth: fix locking for discipline setup / removal")
      qeth_remove_discipline() is called under card->discipline_mutex and
      cancels the work and waits for it to finish.
      
      STOPLAN reception with reason code IPA_RC_VEPA_TO_VEB_TRANSITION is the
      only situation that schedules close_dev_work. In that situation scheduling
      qeth recovery will also result in an offline interface, when resetting the
      isolation mode fails, if the external switch is still set to VEB.
      And since commit 0b9902c1 ("s390/qeth: fix deadlock during recovery")
      qeth recovery does not aquire card->discipline_mutex anymore.
      
      So we accept the longer pathlength of qeth_schedule_recovery in this
      error situation and re-use the existing function.
      
      As a side-benefit this changes the hwtrap to behave like during recovery
      instead of like during a user-triggered set_offline.
      
      Fixes: b41b554c ("s390/qeth: fix locking for discipline setup / removal")
      Signed-off-by: NAlexandra Winter <wintera@linux.ibm.com>
      Acked-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ee909d0b
    • J
      s390/qeth: fix NULL deref in qeth_clear_working_pool_list() · 248f064a
      Julian Wiedmann 提交于
      When qeth_set_online() calls qeth_clear_working_pool_list() to roll
      back after an error exit from qeth_hardsetup_card(), we are at risk of
      accessing card->qdio.in_q before it was allocated by
      qeth_alloc_qdio_queues() via qeth_mpc_initialize().
      
      qeth_clear_working_pool_list() then dereferences NULL, and by writing to
      queue->bufs[i].pool_entry scribbles all over the CPU's lowcore.
      Resulting in a crash when those lowcore areas are used next (eg. on
      the next machine-check interrupt).
      
      Such a scenario would typically happen when the device is first set
      online and its queues aren't allocated yet. An early IO error or certain
      misconfigs (eg. mismatched transport mode, bad portno) then cause us to
      error out from qeth_hardsetup_card() with card->qdio.in_q still being
      NULL.
      
      Fix it by checking the pointer for NULL before accessing it.
      
      Note that we also have (rare) paths inside qeth_mpc_initialize() where
      a configuration change can cause us to free the existing queues,
      expecting that subsequent code will allocate them again. If we then
      error out before that re-allocation happens, the same bug occurs.
      
      Fixes: eff73e16 ("s390/qeth: tolerate pre-filled RX buffer")
      Reported-by: NStefan Raspl <raspl@linux.ibm.com>
      Root-caused-by: NHeiko Carstens <hca@linux.ibm.com>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: NAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      248f064a
  4. 28 7月, 2021 1 次提交
    • A
      qeth: use ndo_siocdevprivate · 18787eee
      Arnd Bergmann 提交于
      qeth has both standard MII ioctls and custom SIOCDEVPRIVATE ones,
      all of which work correctly with compat user space.
      
      Move the private ones over to the new ndo_siocdevprivate callback.
      
      Cc: Julian Wiedmann <jwi@linux.ibm.com>
      Cc: Karsten Graul <kgraul@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: linux-s390@vger.kernel.org
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18787eee
  5. 27 7月, 2021 1 次提交
  6. 20 7月, 2021 3 次提交
  7. 12 6月, 2021 7 次提交
  8. 22 3月, 2021 1 次提交
    • J
      s390/qdio: let driver manage the QAOB · 396c1004
      Julian Wiedmann 提交于
      We are spending way too much effort on qdio-internal bookkeeping for
      QAOB management & caching, and it's still not robust. Once qdio's
      TX path has detached the QAOB from a PENDING buffer, we lost all
      track of it until it shows up in a CQ notification again. So if the
      device is torn down before that notification arrives, we leak the QAOB.
      
      Just have the driver take care of it, and simply pass down a QAOB if
      they want a TX with async-completion capability. For a buffer in PENDING
      state that requires the QAOB for final completion, qeth can now also try
      to recycle the buffer's QAOB rather than unconditionally freeing it.
      
      This also eliminates the qdio_outbuf_state array, which was only needed
      to transfer the aob->user1 tag from the driver to the qdio layer.
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Acked-by: NBenjamin Block <bblock@linux.ibm.com>
      Signed-off-by: NHeiko Carstens <hca@linux.ibm.com>
      396c1004
  9. 19 3月, 2021 2 次提交
  10. 10 3月, 2021 4 次提交
    • J
      s390/qeth: fix notification for pending buffers during teardown · 7eefda7f
      Julian Wiedmann 提交于
      The cited commit reworked the state machine for pending TX buffers.
      In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
      uses NEED_QAOB for buffers that get parked while waiting for their QAOB
      completion.
      
      But it missed to adjust the check in qeth_tx_complete_buf(). So if
      qeth_tx_complete_pending_bufs() is called during teardown to drain
      the parked TX buffers, we no longer raise a notification for af_iucv.
      
      Instead of updating the checked state, just move this code into
      qeth_tx_complete_pending_bufs() itself. This also gets rid of the
      special-case in the common TX completion path.
      
      Fixes: 8908f36d ("s390/qeth: fix af_iucv notification race")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7eefda7f
    • J
      s390/qeth: schedule TX NAPI on QAOB completion · 3e83d467
      Julian Wiedmann 提交于
      When a QAOB notifies us that a pending TX buffer has been delivered, the
      actual TX completion processing by qeth_tx_complete_pending_bufs()
      is done within the context of a TX NAPI instance. We shouldn't rely on
      this instance being scheduled by some other TX event, but just do it
      ourselves.
      
      qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
      instance. To avoid touching the TX queue's NAPI instance
      before/after it is (un-)registered, reorder the code in qeth_open()
      and qeth_stop() accordingly.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e83d467
    • J
      s390/qeth: improve completion of pending TX buffers · c20383ad
      Julian Wiedmann 提交于
      The current design attaches a pending TX buffer to a custom
      single-linked list, which is anchored at the buffer's slot on the
      TX ring. The buffer is then checked for final completion whenever
      this slot is processed during a subsequent TX NAPI poll cycle.
      
      But if there's insufficient traffic on the ring, we might never make
      enough progress to get back to this ring slot and discover the pending
      buffer's final TX completion. In particular if this missing TX
      completion blocks the application from sending further traffic.
      
      So convert the custom single-linked list code to a per-queue list_head,
      and scan this list on every TX NAPI cycle.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c20383ad
    • J
      s390/qeth: fix memory leak after failed TX Buffer allocation · e7a36d27
      Julian Wiedmann 提交于
      When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
      back an Output Queue, the 'out_freeoutqbufs' path will free all
      previously allocated buffers for this queue. But it misses to free the
      half-finished queue struct itself.
      
      Move the buffer allocation into qeth_alloc_output_queue(), and deal with
      such errors internally.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: NAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7a36d27
  11. 14 2月, 2021 2 次提交
    • J
      s390/qdio: remove 'merge_pending' mechanism · 2223318c
      Julian Wiedmann 提交于
      For non-QEBSM devices, get_buf_states() merges PENDING and EMPTY buffers
      into a single group of finished buffers. To allow the upper-layer driver
      to differentiate between the two states, qdio_check_pending() looks at
      each buffer's state again and sets the sbal_state flag to
      QDIO_OUTBUF_STATE_FLAG_PENDING accordingly.
      
      So effectively we're spending overhead on _every_ Output Queue
      inspection, just to avoid some additional TX completion calls in case
      a group of buffers has completed with mixed EMPTY / PENDING state.
      Given that PENDING buffers should rarely occur, this is a bad trade-off.
      In particular so as the additional checks in get_buf_states() affect
      _all_ device types (even those that don't use the PENDING state).
      
      Rip it all out, and just report the PENDING completions separately as
      we already do for QEBSM devices.
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: NBenjamin Block <bblock@linux.ibm.com>
      Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
      2223318c
    • J
      s390/qdio: improve handling of PENDING buffers for QEBSM devices · 7940eaf2
      Julian Wiedmann 提交于
      For QEBSM devices the 'merge_pending' mechanism in get_buf_states()
      doesn't apply, and we can actually get SLSB_P_OUTPUT_PENDING returned.
      
      So for this case propagating the PENDING state to the driver via the
      queue's sbal_state doesn't make sense and creates unnecessary overhead.
      Instead introduce a new QDIO_ERROR_* flag that gets passed to the
      driver, and triggers the same processing as if the buffers were flagged
      as QDIO_OUTBUF_STATE_FLAG_PENDING.
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: NBenjamin Block <bblock@linux.ibm.com>
      Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
      7940eaf2
  12. 29 1月, 2021 4 次提交
  13. 08 1月, 2021 2 次提交
    • J
      s390/qeth: fix locking for discipline setup / removal · b41b554c
      Julian Wiedmann 提交于
      Due to insufficient locking, qeth_core_set_online() and
      qeth_dev_layer2_store() can run in parallel, both attempting to load &
      setup the discipline (and stepping on each other toes along the way).
      A similar race can also occur between qeth_core_remove_device() and
      qeth_dev_layer2_store().
      
      Access to .discipline is meant to be protected by the discipline_mutex,
      so add/expand the locking in qeth_core_remove_device() and
      qeth_core_set_online().
      Adjust the locking in qeth_l*_remove_device() accordingly, as it's now
      handled by the callers in a consistent manner.
      
      Based on an initial patch by Ursula Braun.
      
      Fixes: 9dc48ccc ("qeth: serialize sysfs-triggered device configurations")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: NAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b41b554c
    • J
      s390/qeth: fix deadlock during recovery · 0b9902c1
      Julian Wiedmann 提交于
      When qeth_dev_layer2_store() - holding the discipline_mutex - waits
      inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete,
      we can hit a deadlock if qeth_do_reset() concurrently calls
      qeth_set_online() and thus tries to aquire the discipline_mutex.
      
      Move the discipline_mutex locking outside of qeth_set_online() and
      qeth_set_offline(), and turn the discipline into a parameter so that
      callers understand the dependency.
      
      To fix the deadlock, we can now relax the locking:
      As already established, qeth_l*_remove_device() waits for
      qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk
      of having card->discipline ripped out while it's running, and thus
      doesn't need to take the discipline_mutex.
      
      Fixes: 9dc48ccc ("qeth: serialize sysfs-triggered device configurations")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: NAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      0b9902c1
  14. 07 12月, 2020 5 次提交
  15. 21 11月, 2020 3 次提交
    • J
      s390/qeth: fix tear down of async TX buffers · 7ed10e16
      Julian Wiedmann 提交于
      When qeth_iqd_tx_complete() detects that a TX buffer requires additional
      async completion via QAOB, it might fail to replace the queue entry's
      metadata (and ends up triggering recovery).
      
      Assume now that the device gets torn down, overruling the recovery.
      If the QAOB notification then arrives before the tear down has
      sufficiently progressed, the buffer state is changed to
      QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob().
      
      The tear down code calls qeth_drain_output_queue(), where
      qeth_cleanup_handled_pending() will then attempt to replace such a
      buffer _again_. If it succeeds this time, the buffer ends up dangling in
      its replacement's ->next_pending list ... where it will never be freed,
      since there's no further call to qeth_cleanup_handled_pending().
      
      But the second attempt isn't actually needed, we can simply leave the
      buffer on the queue and re-use it after a potential recovery has
      completed. The qeth_clear_output_buffer() in qeth_drain_output_queue()
      will ensure that it's in a clean state again.
      
      Fixes: 72861ae7 ("qeth: recovery through asynchronous delivery")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      7ed10e16
    • J
      s390/qeth: fix af_iucv notification race · 8908f36d
      Julian Wiedmann 提交于
      The two expected notification sequences are
      1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when
         our TX completion code first observed the pending TX and the QAOB
         then completes at a later time; or
      2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB
         completion before our TX completion code even noticed that the TX
         was pending.
      
      But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run
      concurrently, we may end up with a race that results in a sequence of
      TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse
      the af_iucv code in its tracking of pending transmits.
      
      Rework the notification code, so that qeth_qdio_handle_aob() defers its
      notification if the TX completion code is still active.
      
      Fixes: b3332930 ("qeth: add support for af_iucv HiperSockets transport")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8908f36d
    • J
      s390/qeth: make af_iucv TX notification call more robust · 34c7f50f
      Julian Wiedmann 提交于
      Calling into socket code is ugly already, at least check whether we are
      dealing with the expected sk_family. Only looking at skb->protocol is
      bound to cause troubles (consider eg. af_packet).
      
      Fixes: b3332930 ("qeth: add support for af_iucv HiperSockets transport")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      34c7f50f