Commits · 7fdd375e383097a785bb65c66802e468f398bf82 · openeuler / Kernel

10 Dec, 2020 · 22 commits

net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower · 7fdd375e

Authored by Guillaume Nault on Dec 09, 2020

TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (MPLS label is
20 bits long).

Fixes the following bug:

 $ tc filter add dev ethX ingress protocol mpls_uc \
     flower mpls lse depth 2 label 256             \
     action drop

 $ tc filter show dev ethX ingress
   filter protocol mpls_uc pref 49152 flower chain 0
   filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1
     eth_type 8847
     mpls
       lse depth 2 label 0  <-- invalid label 0, should be 256
   ...

Fixes: 61aec25a ("cls_flower: Support filtering on multiple MPLS Label Stack Entries")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
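
The symptom above (label 256 shown as 0) is what one would expect if the 20-bit label were written through an 8-bit attribute on the dump path. The sketch below only illustrates that truncation; the helper names and the mask macro are hypothetical and are not taken from cls_flower.

```c
#include <stdint.h>

/* Hypothetical mask: an MPLS label occupies 20 bits. */
#define TOY_MPLS_LABEL_MASK 0xFFFFFu

/* Assumed buggy behaviour: the label is squeezed into 8 bits on dump. */
uint32_t toy_dump_label_u8(uint32_t label)
{
    return (uint8_t)label;
}

/* Dumping through a 32-bit attribute keeps all 20 label bits. */
uint32_t toy_dump_label_u32(uint32_t label)
{
    return label & TOY_MPLS_LABEL_MASK;
}
```

With label 256, the 8-bit helper yields 0 while the 32-bit helper yields 256, matching the output quoted in the commit message.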


Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · b7e4ba9a

Authored by David S. Miller on Dec 09, 2020

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
   from Subash Abhinov Kasiviswanathan.

2) Fix netlink dump of dynset timeouts later than 23 days.

3) Add comment for the indirect serialization of the nft commit mutex
   with rtnl_mutex.

4) Remove bogus check for confirmed conntrack when matching on the
   conntrack ID, from Brett Mastbergen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 88287773

Authored by David S. Miller on Dec 09, 2020

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2020-12-09

This series contains updates to igb, ixgbe, i40e, and ice drivers.

Sven Auhagen fixes issues with igb XDP: return correct error value in XDP
xmit back, increase header padding to include space for double VLAN, add
an extack error when Rx buffer is too small for frame size, set metasize if
it is set in xdp, change xdp_do_flush_map to xdp_do_flush, and update
trans_start to avoid possible Tx timeout.

Björn fixes an issue where an Rx buffer can be reused prematurely with
XDP redirect for ixgbe, i40e, and ice drivers.

The following are changes since commit 323a391a:
  can: isotp: isotp_setsockopt(): block setsockopt on bound sockets
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'mlx4_en-fixes' · 9a25a30e

Authored by David S. Miller on Dec 09, 2020

Tariq Toukan says:

====================
mlx4_en fixes

This patchset by Moshe contains fixes to the mlx4 Eth driver,
addressing issues in restart flow.

Patch 1 protects the restart task from being rescheduled while active.
  Please queue for -stable >= v2.6.
Patch 2 reconstructs SQs stuck in error state, and adds prints for improved
  debuggability.
  Please queue for -stable >= v3.12.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Handle TX error CQE · ba603d9d

Authored by Moshe Shemesh on Dec 09, 2020

In case error CQE was found while polling TX CQ, the QP is in error
state and all posted WQEs will generate error CQEs without any data
transmitted. Fix it by reopening the channels, via same method used for
TX timeout handling.

In addition add some more info on error CQE and WQE for debug.

Fixes: bd2f631d ("net/mlx4_en: Notify user when TX ring in error state")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Avoid scheduling restart task if it is already running · fed91613

Authored by Moshe Shemesh on Dec 09, 2020

Add restarting state flag to avoid scheduling another restart task while
such task is already running. Change task name from watchdog_task to
restart_task to better fit the task role.

Fixes: 1e338db5 ("mlx4_en: Fix a race at restart task")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
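
The "restarting state flag" idea can be sketched in user space with a C11 atomic flag: the restart task is scheduled only by the caller that flips the flag from clear to set. This is a minimal illustration under assumptions, not the mlx4_en code; `toy_priv`, `toy_schedule_restart()` and `toy_restart_done()` are hypothetical names, and the `scheduled` counter stands in for queueing real work.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_priv {
    atomic_flag restarting;   /* set while a restart is pending or running */
    unsigned scheduled;       /* counts how often the task was queued */
};

/* Queue the restart task unless one is already pending or running. */
bool toy_schedule_restart(struct toy_priv *p)
{
    if (atomic_flag_test_and_set(&p->restarting))
        return false;         /* already restarting: do not queue again */
    p->scheduled++;           /* stand-in for queueing the work item */
    return true;
}

/* Called by the restart task once it has finished. */
void toy_restart_done(struct toy_priv *p)
{
    atomic_flag_clear(&p->restarting);
}
```

A second call to `toy_schedule_restart()` before `toy_restart_done()` is a no-op, which is the property the commit message describes.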


tcp: fix cwnd-limited bug for TSO deferral where we send nothing · 299bcb55

Authored by Neal Cardwell on Dec 08, 2020

When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:

(1) ACK for full-sized skb of N*MSS arrives
  -> tcp_write_xmit() transmit full-sized skb with N*MSS
  -> move pacing release time forward
  -> exit tcp_write_xmit() because pacing time is in the future

(2) TSQ callback or TCP internal pacing timer fires
  -> try to transmit next skb, but TSO deferral finds remainder of
     available cwnd is not big enough to trigger an immediate send
     now, so we defer sending until the next ACK.

(3) repeat...

So we can get into a case where we never mark ourselves as
cwnd-limited for many seconds at a time, even with
bulk/infinite-backlog senders, because:

o In case (1) above, every time in tcp_write_xmit() we have enough
cwnd to send a full-sized skb, we are not fully using the cwnd
(because cwnd is not a multiple of the TSO skb size). So every time we
send data, we are not cwnd limited, and so in the cwnd-limited
tracking code in tcp_cwnd_validate() we mark ourselves as not
cwnd-limited.

o In case (2) above, every time in tcp_write_xmit() that we try to
transmit the "remainder" of the cwnd but defer, we set the local
variable is_cwnd_limited to true, but we do not send any packets, so
sent_pkts is zero, so we don't call the cwnd-limited logic to update
tp->is_cwnd_limited.

Fixes: ca8a2263 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Reported-by: Ingemar Johansson <ingemar.s.johansson@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
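
The send/defer cycle described in this message can be reproduced with a toy model. The sketch below is a heavy simplification and not the kernel's tcp_write_xmit(): `toy_flow`, `toy_write_xmit()`, `toy_ack_then_timer()` and the `update_on_defer` switch are hypothetical, pacing is modelled by sending at most one skb per call, and the "fixed" behaviour is assumed to be "also record cwnd-limited when a send was deferred and nothing was sent".

```c
#include <stdbool.h>

struct toy_flow {
    unsigned cwnd;          /* congestion window, in packets */
    unsigned in_flight;     /* unacknowledged packets */
    bool is_cwnd_limited;   /* stands in for tp->is_cwnd_limited */
};

/* Try to send one TSO skb of tso_segs packets; returns packets sent. */
unsigned toy_write_xmit(struct toy_flow *f, unsigned tso_segs,
                        bool update_on_defer)
{
    unsigned room = f->cwnd > f->in_flight ? f->cwnd - f->in_flight : 0;
    unsigned sent = 0;
    bool limited;

    if (room >= tso_segs) {
        f->in_flight += tso_segs;       /* case (1): full-sized skb sent */
        sent = tso_segs;
        limited = f->in_flight >= f->cwnd;
    } else {
        limited = true;                 /* case (2): TSO deferral */
    }

    /* Original behaviour: the flag is only updated when packets were sent. */
    if (sent || (update_on_defer && limited))
        f->is_cwnd_limited = limited;
    return sent;
}

/* One cycle: an ACK for a full skb arrives, then the pacing timer fires. */
void toy_ack_then_timer(struct toy_flow *f, unsigned tso_segs,
                        bool update_on_defer)
{
    f->in_flight -= tso_segs;
    toy_write_xmit(f, tso_segs, update_on_defer);   /* sends one skb */
    toy_write_xmit(f, tso_segs, update_on_defer);   /* defers */
}
```

With cwnd = 10, in_flight = 8 and 4-packet skbs, each cycle leaves the flag false when `update_on_defer` is false, even though the flow keeps deferring for lack of cwnd; with `update_on_defer` true the flag ends up true.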


net: flow_offload: Fix memory leak for indirect flow block · 5137d303

Authored by Chris Mi on Dec 08, 2020

The offending commit introduces a cleanup callback that is invoked
when the driver module is removed to clean up the tunnel device
flow block. But it returns on the first iteration of the for loop.
The remaining indirect flow blocks will never be freed.

Fixes: 1fac52da ("net: flow_offload: consolidate indirect flow_block infrastructure")
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
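
The class of bug described here, an early return inside a cleanup loop, can be shown with a small stand-alone sketch. The types and function names below are hypothetical and use a plain array rather than the kernel's list of indirect flow blocks.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_block {
    bool bound_to_driver;   /* block belongs to the driver being removed */
    bool released;          /* set once the block has been cleaned up */
};

/* Buggy variant: returns after the first matching block. */
size_t toy_cleanup_buggy(struct toy_block *blocks, size_t n)
{
    size_t released = 0;

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].bound_to_driver)
            continue;
        blocks[i].released = true;
        released++;
        return released;    /* bug: remaining blocks are never visited */
    }
    return released;
}

/* Fixed variant: keeps iterating so every matching block is released. */
size_t toy_cleanup_fixed(struct toy_block *blocks, size_t n)
{
    size_t released = 0;

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].bound_to_driver)
            continue;
        blocks[i].released = true;
        released++;
    }
    return released;
}
```

For three blocks bound to the same driver, the buggy variant releases one and leaks two, while the fixed variant releases all three.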


tcp: Retain ECT bits for tos reflection · 8ef44b6f

Authored by Wei Wang on Dec 08, 2020

For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.

Fixes: ac8f1710 ("tcp: reflect tos value received in SYN to the socket")
Reported-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
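
One way to picture "reflect the SYN TOS but retain the ECT bits" is a bit-mask merge: the DSCP part (upper six bits) comes from the received SYN, and the ECN part (lower two bits) comes from the socket. The sketch below assumes exactly that split; `toy_reflect_tos()` and `TOY_ECN_MASK` are hypothetical names, not the kernel's.

```c
#include <stdint.h>

/* The two least significant bits of the TOS byte carry ECN (RFC 3168). */
#define TOY_ECN_MASK 0x03u

/* Take DSCP from the SYN, keep the ECN bits configured on the socket. */
uint8_t toy_reflect_tos(uint8_t syn_tos, uint8_t sk_tos)
{
    return (uint8_t)((syn_tos & ~TOY_ECN_MASK) | (sk_tos & TOY_ECN_MASK));
}
```

For example, a SYN with TOS 0xb8 and a socket whose congestion control set ECT(0) (0x02) gives a SYN-ACK TOS of 0xba.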


ethtool: fix stack overflow in ethnl_parse_bitset() · a770bf51

Authored by Michal Kubecek on Dec 08, 2020

Syzbot reported a stack overflow in bitmap_from_arr32() called from
ethnl_parse_bitset() when bitset from netlink message is longer than
target bitmap length. While ethnl_compact_sanity_checks() makes sure that
trailing part is all zeros (i.e. the request does not try to touch bits
kernel does not recognize), we also need to cap change_bits to nbits so
that we don't try to write past the prepared bitmaps.

Fixes: 88db6d1e ("ethtool: add ethnl_parse_bitset() helper")
Reported-by: syzbot+9d39fa49d4df294aab93@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/3487ee3a98e14cd526f55b6caaa959d2dcbcad9f.1607465316.git.mkubecek@suse.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
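
Capping the number of bits copied to the size of the destination is the core of the fix described above. The sketch below shows that idea with a hypothetical word-wise copy helper; `toy_copy_bits()` and `toy_capped_len()` are illustrative names and are not the kernel's bitmap_from_arr32() or ethnl_parse_bitset().

```c
#include <stdint.h>

#define TOY_WORD_BITS 32u

/* Copy the first nbits bits of src into dst, clearing the unused tail bits. */
void toy_copy_bits(uint32_t *dst, const uint32_t *src, unsigned nbits)
{
    unsigned words = (nbits + TOY_WORD_BITS - 1) / TOY_WORD_BITS;

    for (unsigned i = 0; i < words; i++)
        dst[i] = src[i];
    if (nbits % TOY_WORD_BITS)
        dst[words - 1] &= (1u << (nbits % TOY_WORD_BITS)) - 1;
}

/* Never copy more bits than the destination bitmap was sized for. */
unsigned toy_capped_len(unsigned change_bits, unsigned nbits)
{
    return change_bits < nbits ? change_bits : nbits;
}
```

If the request carries 64 bits but the target bitmap holds only 20, calling `toy_copy_bits(dst, src, toy_capped_len(64, 20))` touches a single destination word and leaves whatever follows it on the stack alone.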


e1000e: fix S0ix flow to allow S0i3.2 subset entry · a379b01c

Authored by Vitaly Lifshits on Dec 08, 2020

Changed a configuration in the flows to align with
architecture requirements to achieve S0i3.2 substate.

This helps both i219V and i219LM configurations.

Also fixed a typo in the previous commit 632fbd5e
("e1000e: fix S0ix flows for cable connected case").

Fixes: 632fbd5e ("e1000e: fix S0ix flows for cable connected case").
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Link: https://lore.kernel.org/r/20201208185632.151052-1-mario.limonciello@dell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


ice: avoid premature Rx buffer reuse · 1beb7830

Authored by Björn Töpel on Aug 25, 2020

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: efc2214b ("ice: Add support for XDP")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ixgbe: avoid premature Rx buffer reuse · a06316dc

Authored by Björn Töpel on Aug 25, 2020

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: 64530739 ("ixgbe: add initial support for xdp redirect")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


i40e: avoid premature Rx buffer reuse · 75aab4e1

Authored by Björn Töpel on Aug 25, 2020

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is
still using it.

The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
  page count  == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior calling
xdp_do_redirect().

Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, then we need to track whether
XDP_REDIRECT consumed the buffer or not.

Fixes: d9314c47 ("i40e: add support for XDP_REDIRECT")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
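
The counter values in the timeline above can be checked with a small sketch. It assumes the recycle test is "the page may be reused only if at most one reference is held beyond the driver's bias", which fits the numbers in the message but is an assumption about the driver; `toy_can_reuse()` is a hypothetical helper.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Assumed recycle rule: page_count minus pagecnt_bias is the number of
 * references held outside the driver's bias; more than one means some
 * other user (for example an skb) still owns part of the page.
 */
bool toy_can_reuse(unsigned page_count, unsigned pagecnt_bias)
{
    return (page_count - pagecnt_bias) <= 1;
}
```

Evaluated with the page count stored before xdp_do_redirect() (USHRT_MAX against a bias of USHRT_MAX - 2), the check refuses reuse. Evaluated with the count read afterwards (USHRT_MAX - 1), it wrongly allows reuse, which is why the count is sampled before the call.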


igb: avoid transmit queue timeout in xdp path · ec107e77