1. 14 Nov, 2022 3 commits
  2. 03 Nov, 2022 1 commit
  3. 29 Sep, 2022 1 commit
  4. 04 Jul, 2022 1 commit
  5. 06 May, 2022 1 commit
  6. 29 Apr, 2022 1 commit
  7. 16 Apr, 2022 6 commits
  8. 18 Mar, 2022 1 commit
    • ibmvnic: fix race between xmit and reset · 4219196d
      Sukadev Bhattiprolu authored
      There is a race between reset and the transmit paths that can lead to
      ibmvnic_xmit() accessing an scrq after it has been freed in the reset
      path. It can result in a crash like:
      
      	Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
      	BUG: Kernel NULL pointer dereference on read at 0x00000000
      	Faulting instruction address: 0xc0080000016189f8
      	Oops: Kernel access of bad area, sig: 11 [#1]
      	...
      	NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic]
      	LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
      	Call Trace:
      	[c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable)
      	[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
      	[c000000000c9cfcc] sch_direct_xmit+0xec/0x330
      	[c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0
      	[c000000000c00ad4] __dev_queue_xmit+0x394/0x730
      	[c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding]
      	[c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding]
      	[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
      	[c000000000c00ca4] __dev_queue_xmit+0x564/0x730
      	[c000000000cf97e0] neigh_hh_output+0xd0/0x180
      	[c000000000cfa69c] ip_finish_output2+0x31c/0x5c0
      	[c000000000cfd244] __ip_queue_xmit+0x194/0x4f0
      	[c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0
      	[c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0
      	[c000000000d2d984] tcp_retransmit_skb+0x34/0x130
      	[c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0
      	[c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330
      	[c000000000d317bc] tcp_write_timer+0x5c/0x200
      	[c000000000243270] call_timer_fn+0x50/0x1c0
      	[c000000000243704] __run_timers.part.0+0x324/0x460
      	[c000000000243894] run_timer_softirq+0x54/0xa0
      	[c000000000ea713c] __do_softirq+0x15c/0x3e0
      	[c000000000166258] __irq_exit_rcu+0x158/0x190
      	[c000000000166420] irq_exit+0x20/0x40
      	[c00000000002853c] timer_interrupt+0x14c/0x2b0
      	[c000000000009a00] decrementer_common_virt+0x210/0x220
      	--- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
      
      The immediate cause of the crash is the access of tx_scrq in the following
      snippet during a reset, where the tx_scrq can be either NULL or an address
      that will soon be invalid:
      
      	ibmvnic_xmit()
      	{
      		...
      		tx_scrq = adapter->tx_scrq[queue_num];
      		txq = netdev_get_tx_queue(netdev, queue_num);
      		ind_bufp = &tx_scrq->ind_buf;
      
      		if (test_bit(0, &adapter->resetting)) {
      		...
      	}
      
      But beyond that, the call to ibmvnic_xmit() itself is not safe during a
      reset, and the reset path attempts to avoid this by stopping the queue in
      ibmvnic_cleanup(). However, just after the queue was stopped, an in-flight
      ibmvnic_complete_tx() could have restarted the queue even as the reset is
      progressing.
      
      Since the queue was restarted we could get a call to ibmvnic_xmit() which
      can then access the bad tx_scrq (or other fields).
      
      We cannot, however, simply have ibmvnic_complete_tx() check the ->resetting
      bit and skip starting the queue. This can race at the "back-end" of a good
      reset which just restarted the queue but has not cleared the ->resetting
      bit yet. If we skip restarting the queue due to ->resetting being true,
      the queue would remain stopped indefinitely, potentially leading to
      transmit timeouts.
      
      IOW ->resetting is too broad for this purpose. Instead use a new flag
      that indicates whether or not the queues are active. Only the open/
      reset paths control when the queues are active. ibmvnic_complete_tx()
      and others wake up the queue only if the queue is marked active.
      
      So we will have:
      	A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open()
      
      		->resetting = true
      		->tx_queues_active = false
      		disable tx queues
      		...
      		->tx_queues_active = true
      		start tx queues
      
      	B. Tx interrupt in ibmvnic_complete_tx():
      
      		if (->tx_queues_active)
      			netif_wake_subqueue();
      
      To ensure that ->tx_queues_active and the state of the queues are
      consistent, we need a lock which:
      
      	- must also be taken in the interrupt path (ibmvnic_complete_tx())
      	- is shared across the multiple queues in the adapter (so they don't
      	  become serialized)
      
      Use rcu_read_lock() and have the reset thread call synchronize_rcu()
      after updating the ->tx_queues_active state.
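      
      As a rough sketch of the pattern (field and function names follow the
      commit text; the call sites are simplified, not verbatim driver code):
      
      	/* Tx completion (ibmvnic_complete_tx, interrupt context) */
      	rcu_read_lock();
      	if (adapter->tx_queues_active &&
      	    __netif_subqueue_stopped(adapter->netdev, queue_num))
      		netif_wake_subqueue(adapter->netdev, queue_num);
      	rcu_read_unlock();
      
      	/* Reset path (reset/open thread) */
      	adapter->tx_queues_active = false;
      	/* wait for in-flight readers to observe the update before
      	 * the queues and scrqs are torn down
      	 */
      	synchronize_rcu();
      	netif_tx_disable(adapter->netdev);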
      
      While here, consolidate a few boolean fields in ibmvnic_adapter for
      better alignment.
      
      Based on discussions with Brian King and Dany Madden.
      
      Fixes: 7ed5b31f ("net/ibmvnic: prevent more than one thread from running in reset")
      Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 25 Feb, 2022 8 commits
    • ibmvnic: Allow queueing resets during probe · fd98693c
      Sukadev Bhattiprolu authored
      We currently don't allow queuing resets when the adapter is in the
      VNIC_PROBING state - instead we throw away the reset and return EBUSY.
      The reasoning is probably that during ibmvnic_probe() the
      ibmvnic_adapter itself is being initialized, so performing a reset
      during this time can lead to accessing fields in the ibmvnic_adapter
      that are not fully initialized.
      A review of the code shows that all the adapter state needed to process
      a reset is initialized before registering the CRQ, so that should no
      longer be a concern.
      
      Further the expectation is that if we do get a reset (transport event)
      during probe, the do..while() loop in ibmvnic_probe() will handle this
      by reinitializing the CRQ.
      
      While that is true to some extent, it is possible that the reset might
      occur _after_ the CRQ is registered and the CRQ_INIT message was exchanged
      but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
      such a reset will be thrown away. While the client assumes that the
      adapter is functional, the vnic server will wait for the client to reinit
      the adapter. This disconnect between the two leaves the adapter down,
      needing manual intervention.
      
      Because ibmvnic_probe() has other work to do after initializing the CRQ
      (such as registering the netdev at a minimum) and because the reset event
      can occur at any instant after the CRQ is initialized, there will always
      be a window between initializing the CRQ and considering the adapter
      ready for resets (i.e. state == PROBED).
      
      So rather than discarding resets during this window, allow queueing them
      - but only process them after the adapter is fully initialized.
      
      To do this, introduce a new completion state ->probe_done and have the
      reset worker thread wait on this before processing resets.
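      
      In sketch form (assuming a struct completion member named probe_done,
      per the text above):
      
      	/* ibmvnic_probe() */
      	init_completion(&adapter->probe_done);
      	/* ... register CRQ, register netdev, finish initializing ... */
      	complete(&adapter->probe_done);
      
      	/* __ibmvnic_reset() worker thread */
      	wait_for_completion(&adapter->probe_done);
      	/* adapter fully initialized; safe to process queued resets */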
      
      This change brings up two new situations in or just after ibmvnic_probe().
      First, after one or more resets were queued, we encounter an error and
      decide to retry the initialization. At that point the queued resets are
      no longer relevant since we could be talking to a new vnic server. So we
      must purge/flush the queued resets before restarting the initialization.
      As a side note, since we are still in the probing stage and we have not
      registered the netdev, it will not be a CHANGE_PARAM reset.
      
      Second, this change opens up a potential race between the worker thread
      in __ibmvnic_reset(), the tasklet, and ibmvnic_open(), due to the
      following sequence of events:
      
      	1. Register CRQ
      	2. Get transport event before CRQ_INIT completes.
      	3. Tasklet schedules reset:
      		a) add rwi to list
      		b) schedule_work() to start worker thread which runs
      		   and waits for ->probe_done.
      	4. ibmvnic_probe() decides to retry, purges rwi_list
      	5. Re-register the CRQ and this time the rest of probe succeeds -
      	   register netdev and complete(->probe_done).
      	6. Worker thread resumes in __ibmvnic_reset() from 3b.
      	7. Worker thread sets ->resetting bit
      	8. ibmvnic_open() comes in, notices ->resetting bit, sets state
      	   to IBMVNIC_OPEN and returns early expecting worker thread to
      	   finish the open.
      	9. Worker thread finds rwi_list empty and returns without
      	   opening the interface.
      
      If this happens, the ->ndo_open() call is effectively lost and the
      interface remains down. To address this, ensure that ->rwi_list is
      not empty before setting the ->resetting bit. See also the comments
      in __ibmvnic_reset().
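      
      A sketch of that check (the lock name is assumed for illustration):
      
      	unsigned long flags;
      
      	spin_lock_irqsave(&adapter->rwi_lock, flags);
      	if (!list_empty(&adapter->rwi_list))
      		set_bit(0, &adapter->resetting);
      	spin_unlock_irqrestore(&adapter->rwi_lock, flags);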
      
      Fixes: 6a2fb0e9 ("ibmvnic: driver initialization for kdump/kexec")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: clear fop when retrying probe · f628ad53
      Sukadev Bhattiprolu authored
      Clear ->failover_pending flag that may have been set in the previous
      pass of registering CRQ. If we don't clear, a subsequent ibmvnic_open()
      call would be misled into thinking a failover is pending and assuming
      that the reset worker thread would open the adapter. If this pass of
      registering the CRQ succeeds (i.e. there is no transport event), there
      wouldn't be a reset worker thread.
      
      This would leave the adapter unconfigured and require manual intervention
      to bring it up during boot.
      
      Fixes: 5a18e1e0 ("ibmvnic: Fix failover case for non-redundant configuration")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: init init_done_rc earlier · ae16bf15
      Sukadev Bhattiprolu authored
      We currently initialize the ->init_done completion/return code fields
      before issuing a CRQ_INIT command. But if we get a transport event soon
      after registering the CRQ, the tasklet may already have recorded the
      completion and error code. If we initialize here, we might overwrite/
      lose that and end up issuing the CRQ_INIT only to timeout later.
      
      If that timeout happens during probe, we will leave the adapter in the
      DOWN state rather than retrying to register/init the CRQ.
      
      Initialize the completion before registering the CRQ so we don't lose
      the notification.
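      
      In sketch form, the fix is just a reordering (function names follow the
      driver, details elided):
      
      	/* before: the tasklet can complete() between these two steps */
      	rc = init_crq_queue(adapter);
      	reinit_completion(&adapter->init_done);
      
      	/* after: prepare the completion first, then register the CRQ */
      	reinit_completion(&adapter->init_done);
      	adapter->init_done_rc = 0;
      	rc = init_crq_queue(adapter);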
      
      Fixes: 032c5e82 ("Driver for IBM System i/p VNIC protocol")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: register netdev after init of adapter · 570425f8
      Sukadev Bhattiprolu authored
      Finish initializing the adapter before registering netdev so state
      is consistent.
      
      Fixes: c26eba03 ("ibmvnic: Update reset infrastructure to support tunable parameters")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: complete init_done on transport events · 36491f2d
      Sukadev Bhattiprolu authored
      If we get a transport event, set the error and mark the init as
      complete so the attempts to send crq-init or login fail sooner
      rather than waiting for the timeout.
      
      Fixes: bbd669a8 ("ibmvnic: Fix completion structure initialization")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: define flush_reset_queue helper · 83da53f7
      Sukadev Bhattiprolu authored
      Define and use a helper to flush the reset queue.
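      
      A plausible shape for the helper, sketched from the commit title (not
      verbatim kernel code; callers are assumed to hold the rwi list lock):
      
      	static void flush_reset_queue(struct ibmvnic_adapter *adapter)
      	{
      		struct list_head *entry, *tmp_entry;
      
      		list_for_each_safe(entry, tmp_entry, &adapter->rwi_list) {
      			list_del(entry);
      			kfree(list_entry(entry, struct ibmvnic_rwi, list));
      		}
      	}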
      
      Fixes: 2770a798 ("ibmvnic: Introduce hard reset recovery")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: initialize rc before completing wait · 765559b1
      Sukadev Bhattiprolu authored
      We should initialize ->init_done_rc before calling complete(). Otherwise
      the waiting thread may see ->init_done_rc as 0 before we have updated it
      and may assume that the CRQ was successful.
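      
      The required ordering, in sketch form:
      
      	adapter->init_done_rc = rc;	/* publish the result first */
      	complete(&adapter->init_done);	/* then wake up the waiter */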
      
      Fixes: 6b278c0c ("ibmvnic delay complete()")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: free reset-work-item when flushing · 8d0657f3
      Sukadev Bhattiprolu authored
      Fix a tiny memory leak when flushing the reset work queue.
      
      Fixes: 2770a798 ("ibmvnic: Introduce hard reset recovery")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 23 Feb, 2022 1 commit
  11. 18 Feb, 2022 1 commit
  12. 09 Feb, 2022 1 commit
    • ibmvnic: don't release napi in __ibmvnic_open() · 61772b09
      Sukadev Bhattiprolu authored
      If __ibmvnic_open() encounters an error such as when setting the link
      state, it calls release_resources() which frees the napi structures
      needlessly. Instead, have __ibmvnic_open() only clean up the work it
      did so far (i.e. disable napi and irqs) and leave the rest to the
      callers.
      
      If the caller of __ibmvnic_open() is ibmvnic_open(), it should release
      the resources immediately. If the caller is do_reset() or
      do_hard_reset(), they will release the resources on the next reset.
      
      This fixes the following crash that occurred when running the drmgr
      command several times to add/remove a vnic interface:
      
      	[102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq
      	[102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq
      	[102056] ibmvnic 30000003 env3: Replenished 8 pools
      	Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
      	BUG: Kernel NULL pointer dereference on read at 0x00000010
      	Faulting instruction address: 0xc000000000a3c840
      	Oops: Kernel access of bad area, sig: 11 [#1]
      	LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      	...
      	CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e #1
      	Workqueue: events_long __ibmvnic_reset [ibmvnic]
      	NIP:  c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820
      	REGS: c0000000548e37e0 TRAP: 0300   Not tainted  (5.16.0-rc5-autotest-g6441998e)
      	MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28248484  XER: 00000004
      	CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
      	GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000
      	...
      	NIP [c000000000a3c840] napi_enable+0x20/0xc0
      	LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic]
      	Call Trace:
      	[c0000000548e3a80] [0000000000000006] 0x6 (unreliable)
      	[c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic]
      	[c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic]
      	[c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570
      	[c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660
      	[c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0
      	[c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
      	Instruction dump:
      	7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010
      	38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020
      	---[ end trace 5f8033b08fd27706 ]---
      
      Fixes: ed651a10 ("ibmvnic: Updated reset handling")
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220208001918.900602-1-sukadev@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  13. 24 Jan, 2022 4 commits
    • ibmvnic: remove unused ->wait_capability · 3a5d9db7
      Sukadev Bhattiprolu authored
      With the previous bug fix, the ->wait_capability flag is no longer
      needed and can be removed.
      
      Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: don't spin in tasklet · 48079e7f
      Sukadev Bhattiprolu authored
      ibmvnic_tasklet() continuously spins waiting for responses to all
      capability requests. It does this to avoid encountering an error
      during initialization of the vnic. However, if there is a bug in the
      VIOS and we do not receive a response to one or more queries, the
      tasklet ends up spinning continuously, leading to hard lockups.
      
      If we fail to receive a message from the VIOS, it is reasonable to
      time out the login attempt rather than spin indefinitely in the tasklet.
      
      Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: init ->running_cap_crqs early · 151b6a5c
      Sukadev Bhattiprolu authored
      We use ->running_cap_crqs to determine when the ibmvnic_tasklet() should
      send out the next protocol message type, i.e. when we get back responses
      to all our QUERY_CAPABILITY CRQs we send out REQUEST_CAPABILITY crqs.
      Similarly, when we get responses to all the REQUEST_CAPABILITY crqs, we
      send out the QUERY_IP_OFFLOAD CRQ.
      
      We currently increment ->running_cap_crqs as we send out each CRQ and
      have the ibmvnic_tasklet() send out the next message type, when this
      running_cap_crqs count drops to 0.
      
      This assumes that all the CRQs of the current type were sent out before
      the count drops to 0. However it is possible that we send out, say, 6
      CRQs, get preempted, and receive all 6 responses before we send out the
      remaining CRQs. This can result in the ->running_cap_crqs count dropping
      to zero before all messages of the current type were sent, and we end up
      sending the next protocol message too early.
      
      Instead, initialize ->running_cap_crqs upfront so the tasklet will
      only send the next protocol message after all responses are received.
      
      Use the cap_reqs local variable to also detect any discrepancy (either
      now or in the future) in the number of capability requests we actually
      send.
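      
      A sketch of the new accounting in send_query_cap() (the count and the
      capability names are illustrative):
      
      	union ibmvnic_crq crq;
      	int cap_reqs = 25;	/* number of query-capability CRQs below */
      
      	/* set upfront so the tasklet never sees a premature zero */
      	atomic_set(&adapter->running_cap_crqs, cap_reqs);
      
      	crq.query_capability.capability = cpu_to_be64(MIN_TX_QUEUES);
      	ibmvnic_send_crq(adapter, &crq);
      	cap_reqs--;
      	/* ... one send and one decrement per capability ... */
      
      	/* any mismatch between the count and the actual sends is a bug */
      	WARN_ON(cap_reqs != 0);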
      
      Currently only send_query_cap() is affected by this behavior (of sending
      the next message early) since it is called from the worker thread (during
      reset) and from the application thread (during ->ndo_open()) and they can
      be preempted. send_request_cap() is only called from the tasklet, which
      processes CRQ responses sequentially, and is not affected. But to
      maintain the existing symmetry with send_query_capability() we update
      send_request_capability() also.
      
      Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Allow extra failures before disabling · db9f0e8b
      Sukadev Bhattiprolu authored
      If auto-priority-failover (APF) is enabled and there are at least two
      backing devices of different priorities, some resets like fail-over,
      change-param, etc. can cause at least two back-to-back failovers (failover
      from the high priority backing device to the lower priority one, and then
      back to the higher priority one if that is still functional).
      
      Depending on the timing of the two failovers it is possible to trigger
      a "hard" reset and for the hard reset to fail due to failovers. When this
      occurs, the driver assumes that the network is unstable and disables the
      VNIC for a 60-second "settling time". This in turn can cause the ethtool
      command to fail with "No such device" while the vnic automatically recovers
      a little while later.
      
      Given that it's possible to have two back-to-back failures, allow for
      extra failures before disabling the vnic for the settling time.
      
      Fixes: f15fde9d ("ibmvnic: delay next reset if hard reset fails")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 14 Dec, 2021 1 commit
  15. 02 Dec, 2021 2 commits
  16. 22 Nov, 2021 1 commit
  17. 17 Nov, 2021 1 commit
  18. 01 Nov, 2021 3 commits
  19. 02 Oct, 2021 1 commit
  20. 27 Sep, 2021 1 commit