提交 · 534da5e856334fb54cb0272a9fb3afec28ea3aed · openeuler / Kernel

31 1月, 2019 11 次提交

virtio_net: Don't call free_old_xmit_skbs for xdp_frames · 534da5e8

由 Toshiaki Makita 提交于 1月 29, 2019

When napi_tx is enabled, virtnet_poll_cleantx() called
free_old_xmit_skbs() even for xdp send queue.
This is bogus since the queue has xdp_frames, not sk_buffs, thus mangled
device tx bytes counters because skb->len is meaningless value, and even
triggered oops due to general protection fault on freeing them.

Since xdp send queues do not aquire locks, old xdp_frames should be
freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for
xdp send queues.

Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI
handler is called even without calling start_xmit() because cb for tx is
by default enabled. Once the handler is called, it enabled the cb again,
and then the handler would be called again. We don't need this handler
for XDP, so don't enable cb as well as not calling free_old_xmit_skbs().

Also, we need to disable tx NAPI when disabling XDP, so
virtnet_poll_tx() can safely access curr_queue_pairs and
xdp_queue_pairs, which are not atomically updated while disabling XDP.

Fixes: b92f1e67 ("virtio-net: transmit napi")
Fixes: 7b0411ef ("virtio-net: clean tx descriptors from rx napi")
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

534da5e8

virtio_net: Don't enable NAPI when interface is down · 8be4d9a4

由 Toshiaki Makita 提交于 1月 29, 2019

Commit 4e09ff53 ("virtio-net: disable NAPI only when enabled during
XDP set") tried to fix inappropriate NAPI enabling/disabling when
!netif_running(), but was not complete.

On error path virtio_net could enable NAPI even when !netif_running().
This can cause enabling NAPI twice on virtnet_open(), which would
trigger BUG_ON() in napi_enable().

Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8be4d9a4

Merge branch 'erspan-always-reports-output-key-to-userspace' · 41ef81be

由 David S. Miller 提交于 1月 30, 2019

Lorenzo Bianconi says:

====================
erspan: always reports output key to userspace

Erspan protocol relies on output key to set session id header field.
However TUNNEL_KEY bit is cleared in order to not add key field to
the external GRE header and so the configured o_key is not reported
to userspace.
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter dumping
device info
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41ef81be

net: ip6_gre: always reports o_key to userspace · c706863b

由 Lorenzo Bianconi 提交于 1月 28, 2019

As Erspan_v4, Erspan_v6 protocol relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:

$ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
    key 1 seq erspan_ver 1
$ip link set ip6erspan1 up
ip -d link sh ip6erspan1

ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
    link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
    ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq

Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ip6gre_fill_info

Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c706863b

net: ip_gre: always reports o_key to userspace · feaf5c79

由 Lorenzo Bianconi 提交于 1月 28, 2019

Erspan protocol (version 1 and 2) relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
erspan_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:

$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
    key 1 seq erspan_ver 1
$ip link set erspan1 up
$ip -d link sh erspan1

erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
  link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
  erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0

Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ipgre_fill_info

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

feaf5c79

ucc_geth: Reset BQL queue when stopping device · e15aa3b2

由 Mathias Thore 提交于 1月 28, 2019

After a timeout event caused by for example a broadcast storm, when
the MAC and PHY are reset, the BQL TX queue needs to be reset as
well. Otherwise, the device will exhibit severe performance issues
even after the storm has ended.
Co-authored-by: NDavid Gounaris <david.gounaris@infinera.com>
Signed-off-by: NMathias Thore <mathias.thore@infinera.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e15aa3b2

Merge branch 'net-various-compat-ioctl-fixes' · 794827f3

由 David S. Miller 提交于 1月 30, 2019

Johannes Berg says:

====================
various compat ioctl fixes

Back a long time ago, I already fixed a few of these by passing
the size of the struct ifreq to do_sock_ioctl(). However, Robert
found more cases, and now it won't be as simple because we'd have
to pass that down all the way to e.g. bond_do_ioctl() which isn't
really feasible.

Therefore, restore the old code.

While looking at why SIOCGIFNAME was broken, I realized that Al
had removed that case - which had been handled in an explicit
separate function - as well, and looking through his work at the
time I saw that bond ioctls were also affected by the erroneous
removal.

I've restored SIOCGIFNAME and bond ioctls by going through the
(now renamed) dev_ifsioc() instead of reintroducing their own
helper functions, which I hope is correct but have only tested
with SIOCGIFNAME.
====================
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

794827f3

net: socket: make bond ioctls go through compat_ifreq_ioctl() · 98406133

由 Johannes Berg 提交于 1月 25, 2019

Same story as before, these use struct ifreq and thus need
to be read with the shorter version to not cause faults.

Cc: stable@vger.kernel.org
Fixes: f92d4fc9 ("kill bond_ioctl()")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98406133

net: socket: fix SIOCGIFNAME in compat · c6c9fee3

由 Johannes Berg 提交于 1月 25, 2019

As reported by Robert O'Callahan in
https://bugzilla.kernel.org/show_bug.cgi?id=202273
reverting the previous changes in this area broke
the SIOCGIFNAME ioctl in compat again (I'd previously
fixed it after his previous report of breakage in
https://bugzilla.kernel.org/show_bug.cgi?id=199469).

This is obviously because I fixed SIOCGIFNAME more or
less by accident.

Fix it explicitly now by making it pass through the
restored compat translation code.

Cc: stable@vger.kernel.org
Fixes: 4cf808e7 ("kill dev_ifname32()")
Reported-by: NRobert O'Callahan <robert@ocallahan.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6c9fee3

Revert "kill dev_ifsioc()" · 37ac39bd

由 Johannes Berg 提交于 1月 25, 2019

This reverts commit bf440573 ("kill dev_ifsioc()").

This wasn't really unused as implied by the original commit,
it still handles the copy to/from user differently, and the
commit thus caused issues such as
  https://bugzilla.kernel.org/show_bug.cgi?id=199469
and
  https://bugzilla.kernel.org/show_bug.cgi?id=202273

However, deviating from a strict revert, rename dev_ifsioc()
to compat_ifreq_ioctl() to be clearer as to its purpose and
add a comment.

Cc: stable@vger.kernel.org
Fixes: bf440573 ("kill dev_ifsioc()")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37ac39bd

Revert "socket: fix struct ifreq size in compat ioctl" · 63ff03ab

由 Johannes Berg 提交于 1月 25, 2019

This reverts commit 1cebf8f1 ("socket: fix struct ifreq
size in compat ioctl"), it's a bugfix for another commit that
I'll revert next.

This is not a 'perfect' revert, I'm keeping some coding style
intact rather than revert to the state with indentation errors.

Cc: stable@vger.kernel.org
Fixes: 1cebf8f1 ("socket: fix struct ifreq size in compat ioctl")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63ff03ab

30 1月, 2019 7 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 62967898

由 Linus Torvalds 提交于 1月 29, 2019

Pull networking fixes from David Miller:

 1) Need to save away the IV across tls async operations, from Dave
    Watson.

 2) Upon successful packet processing, we should liberate the SKB with
    dev_consume_skb{_irq}(). From Yang Wei.

 3) Only apply RX hang workaround on effected macb chips, from Harini
    Katakam.

 4) Dummy netdev need a proper namespace assigned to them, from Josh
    Elsasser.

 5) Some paths of nft_compat run lockless now, and thus we need to use a
    proper refcnt_t. From Florian Westphal.

 6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.

 7) netrom does not refcount sockets properly wrt. timers, fix that by
    using the sock timer API. From Cong Wang.

 8) Fix locking of inexact inserts of xfrm policies, from Florian
    Westphal.

 9) Missing xfrm hash generation bump, also from Florian.

10) Missing of_node_put() in hns driver, from Yonglong Liu.

11) Fix DN_IFREQ_SIZE, from Johannes Berg.

12) ip6mr notifier is invoked during traversal of wrong table, from Nir
    Dotan.

13) TX promisc settings not performed correctly in qed, from Manish
    Chopra.

14) Fix OOB access in vhost, from Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  MAINTAINERS: Add entry for XDP (eXpress Data Path)
  net: set default network namespace in init_dummy_netdev()
  net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
  net: caif: call dev_consume_skb_any when skb xmit done
  net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: macb: Apply RXUBR workaround only to versions with errata
  net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
  net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
  net: tls: Fix deadlock in free_resources tx
  net: tls: Save iv in tls_rec for async crypto requests
  vhost: fix OOB in get_rx_bufs()
  qed: Fix stack out of bounds bug
  qed: Fix system crash in ll2 xmit
  qed: Fix VF probe failure while FLR
  qed: Fix LACP pdu drops for VFs
  qed: Fix bug in tx promiscuous mode settings
  net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  netfilter: ipt_CLUSTERIP: fix warning unused variable cn
  ...

62967898

MAINTAINERS: Add entry for XDP (eXpress Data Path) · d07e1e0f

由 Jesper Dangaard Brouer 提交于 1月 29, 2019

Add multiple people as maintainers for XDP, sorted alphabetically.

XDP is also tied to driver level support and code, but we cannot add all
drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
of those changes in drivers.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d07e1e0f

net: set default network namespace in init_dummy_netdev() · 35edfdc7

由 Josh Elsasser 提交于 1月 26, 2019

Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
  IP: napi_busy_loop+0xd6/0x200
  Call Trace:
    sock_poll+0x5e/0x80
    do_sys_poll+0x324/0x5a0
    SyS_poll+0x6c/0xf0
    do_syscall_64+0x6b/0x1f0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: 7db6b048 ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: NJosh Elsasser <jelsasser@appneta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35edfdc7

net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles · 0f0ed828

由 Yang Wei 提交于 1月 29, 2019

The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
when bounce_skb is used. The skb is be replaced by bounce_skb, so the
original skb should be consumed(not drop).

dev_consume_skb_irq() should be called in b44_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f0ed828

net: caif: call dev_consume_skb_any when skb xmit done · e339f863

由 Yang Wei 提交于 1月 29, 2019

The skb shouled be consumed when xmit done, it makes drop profiles
(dropwatch, perf) more friendly.
dev_kfree_skb_irq()/kfree_skb() shouled be replaced by
dev_consume_skb_any(), it makes code cleaner.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e339f863

net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · 896cebc0

由 Yang Wei 提交于 1月 29, 2019

dev_consume_skb_irq() should be called in cp_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

896cebc0

net: macb: Apply RXUBR workaround only to versions with errata · e501070e

由 Harini Katakam 提交于 1月 29, 2019

The interrupt handler contains a workaround for RX hang applicable
to Zynq and AT91RM9200 only. Subsequent versions do not need this
workaround. This workaround unnecessarily resets RX whenever RX used
bit read is observed, which can be often under heavy traffic. There
is no other action performed on RX UBR interrupt. Hence introduce a
CAPS mask; enable this interrupt and workaround only on affected
versions.
Signed-off-by: NHarini Katakam <harini.katakam@xilinx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e501070e

29 1月, 2019 16 次提交

net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · b3379a42

由 Yang Wei 提交于 1月 29, 2019

dev_consume_skb_irq() should be called in cpmac_end_xmit() when
xmit done. It makes drop profiles more friendly.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b3379a42

net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · 10009115

由 Yang Wei 提交于 1月 29, 2019

dev_consume_skb_irq() should be called in bmac_txdma_intr() when
xmit done. It makes drop profiles more friendly.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10009115

net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq · 3afa73dd

由 AlbinYang 提交于 1月 27, 2019

dev_consume_skb_irq() should be called in amd8111e_tx() when xmit
done. It makes drop profiles more friendly.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3afa73dd

net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq · f48af114

由 AlbinYang 提交于 1月 27, 2019

dev_consume_skb_irq() should be called in ace_tx_int() when xmit
done. It makes drop profiles more friendly.
Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f48af114

net: tls: Fix deadlock in free_resources tx · 10231213

由 Dave Watson 提交于 1月 27, 2019

If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock.  Drop the lock while waiting
for the work to complete.

Fixes: a42055e8 ("Add support for async encryption of records...")
Signed-off-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10231213

net: tls: Save iv in tls_rec for async crypto requests · 32eb67b9

由 Dave Watson 提交于 1月 27, 2019

aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it.  Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.

Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.

Fixes: a42055e8 ("Add support for async encryption of records...")
Signed-off-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32eb67b9

vhost: fix OOB in get_rx_bufs() · b46a0bf7

由 Jason Wang 提交于 1月 28, 2019

After batched used ring updating was introduced in commit e2b3b35e
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.

        headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                    vhost_len, &in, vq_log, &log,
                    likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

=============================================================================
BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
    kmem_cache_alloc_trace+0xbb/0x140
    alloc_pd+0x22/0x60
    gen8_ppgtt_create+0x11d/0x5f0
    i915_ppgtt_create+0x16/0x80
    i915_gem_create_context+0x248/0x390
    i915_gem_context_create_ioctl+0x4b/0xe0
    drm_ioctl_kernel+0xa5/0xf0
    drm_ioctl+0x2ed/0x3a0
    do_vfs_ioctl+0x9f/0x620
    ksys_ioctl+0x6b/0x80
    __x64_sys_ioctl+0x11/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.

This fixes CVE-2018-16880.

Fixes: e2b3b35e ("vhost_net: batch used ring update in rx")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b46a0bf7

Merge branch 'qed-Bug-fixes' · bfe2599d

由 David S. Miller 提交于 1月 28, 2019

Manish Chopra says:

====================
qed: Bug fixes

This series have SR-IOV and some general fixes.
Please consider applying it to "net"
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfe2599d

qed: Fix stack out of bounds bug · ffb057f9

由 Manish Chopra 提交于 1月 28, 2019

KASAN reported following bug in qed_init_qm_get_idx_from_flags
due to inappropriate casting of "pq_flags". Fix the type of "pq_flags".

[  196.624707] BUG: KASAN: stack-out-of-bounds in qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[  196.624712] Read of size 8 at addr ffff809b00bc7360 by task kworker/0:9/1712
[  196.624714]
[  196.624720] CPU: 0 PID: 1712 Comm: kworker/0:9 Not tainted 4.18.0-60.el8.aarch64+debug #1
[  196.624723] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL024 09/26/2018
[  196.624733] Workqueue: events work_for_cpu_fn
[  196.624738] Call trace:
[  196.624742]  dump_backtrace+0x0/0x2f8
[  196.624745]  show_stack+0x24/0x30
[  196.624749]  dump_stack+0xe0/0x11c
[  196.624755]  print_address_description+0x68/0x260
[  196.624759]  kasan_report+0x178/0x340
[  196.624762]  __asan_report_load_n_noabort+0x38/0x48
[  196.624786]  qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[  196.624808]  qed_init_qm_info+0xec0/0x2200 [qed]
[  196.624830]  qed_resc_alloc+0x284/0x7e8 [qed]
[  196.624853]  qed_slowpath_start+0x6cc/0x1ae8 [qed]
[  196.624864]  __qede_probe.isra.10+0x1cc/0x12c0 [qede]
[  196.624874]  qede_probe+0x78/0xf0 [qede]
[  196.624879]  local_pci_probe+0xc4/0x180
[  196.624882]  work_for_cpu_fn+0x54/0x98
[  196.624885]  process_one_work+0x758/0x1900
[  196.624888]  worker_thread+0x4e0/0xd18
[  196.624892]  kthread+0x2c8/0x350
[  196.624897]  ret_from_fork+0x10/0x18
[  196.624899]
[  196.624902] Allocated by task 2:
[  196.624906]  kasan_kmalloc.part.1+0x40/0x108
[  196.624909]  kasan_kmalloc+0xb4/0xc8
[  196.624913]  kasan_slab_alloc+0x14/0x20
[  196.624916]  kmem_cache_alloc_node+0x1dc/0x480
[  196.624921]  copy_process.isra.1.part.2+0x1d8/0x4a98
[  196.624924]  _do_fork+0x150/0xfa0
[  196.624926]  kernel_thread+0x48/0x58
[  196.624930]  kthreadd+0x3a4/0x5a0
[  196.624932]  ret_from_fork+0x10/0x18
[  196.624934]
[  196.624937] Freed by task 0:
[  196.624938] (stack is not available)
[  196.624940]
[  196.624943] The buggy address belongs to the object at ffff809b00bc0000
[  196.624943]  which belongs to the cache thread_stack of size 32768
[  196.624946] The buggy address is located 29536 bytes inside of
[  196.624946]  32768-byte region [ffff809b00bc0000, ffff809b00bc8000)
[  196.624948] The buggy address belongs to the page:
[  196.624952] page:ffff7fe026c02e00 count:1 mapcount:0 mapping:ffff809b4001c000 index:0x0 compound_mapcount: 0
[  196.624960] flags: 0xfffff8000008100(slab|head)
[  196.624967] raw: 0fffff8000008100 dead000000000100 dead000000000200 ffff809b4001c000
[  196.624970] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[  196.624973] page dumped because: kasan: bad access detected
[  196.624974]
[  196.624976] Memory state around the buggy address:
[  196.624980]  ffff809b00bc7200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624983]  ffff809b00bc7280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624985] >ffff809b00bc7300: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
[  196.624988]                                                        ^
[  196.624990]  ffff809b00bc7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624993]  ffff809b00bc7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  196.624995] ==================================================================
Signed-off-by: NManish Chopra <manishc@marvell.com>
Signed-off-by: NAriel Elior <aelior@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ffb057f9

qed: Fix system crash in ll2 xmit · 7c81626a

由 Manish Chopra 提交于 1月 28, 2019

Cache number of fragments in the skb locally as in case
of linear skb (with zero fragments), tx completion
(or freeing of skb) may happen before driver tries
to get number of frgaments from the skb which could
lead to stale access to an already freed skb.
Signed-off-by: NManish Chopra <manishc@marvell.com>
Signed-off-by: NAriel Elior <aelior@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c81626a

qed: Fix VF probe failure while FLR · 327852ec

由 Manish Chopra 提交于 1月 28, 2019

VFs may hit VF-PF channel timeout while probing, as in some
cases it was observed that VF FLR and VF "acquire" message
transaction (i.e first message from VF to PF in VF's probe flow)
could occur simultaneously which could lead VF to fail sending
"acquire" message to PF as VF is marked disabled from HW perspective
due to FLR, which will result into channel timeout and VF probe failure.

In such cases, try retrying VF "acquire" message so that in later
attempts it could be successful to pass message to PF after the VF
FLR is completed and can be probed successfully.
Signed-off-by: NManish Chopra <manishc@marvell.com>
Signed-off-by: NAriel Elior <aelior@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

327852ec

qed: Fix LACP pdu drops for VFs · ff929696

由 Manish Chopra 提交于 1月 28, 2019

VF is always configured to drop control frames
(with reserved mac addresses) but to work LACP
on the VFs, it would require LACP control frames
to be forwarded or transmitted successfully.

This patch fixes this in such a way that trusted VFs
(marked through ndo_set_vf_trust) would be allowed to
pass the control frames such as LACP pdus.
Signed-off-by: NManish Chopra <manishc@marvell.com>
Signed-off-by: NAriel Elior <aelior@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff929696

qed: Fix bug in tx promiscuous mode settings · 9e71a15d

由 Manish Chopra 提交于 1月 28, 2019

When running tx switched traffic between VNICs
created via a bridge(to which VFs are added),
adapter drops the unicast packets in tx flow due to
VNIC's ucast mac being unknown to it. But VF interfaces
being in promiscuous mode should have caused adapter
to accept all the unknown ucast packets. Later, it
was found that driver doesn't really configure tx
promiscuous mode settings to accept all unknown unicast macs.

This patch fixes tx promiscuous mode settings to accept all
unknown/unmatched unicast macs and works out the scenario.
Signed-off-by: NManish Chopra <manishc@marvell.com>
Signed-off-by: NAriel Elior <aelior@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e71a15d

net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · ca899324

由 AlbinYang 提交于 1月 28, 2019

dev_consume_skb_irq() should be called in i596_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <albin_yang@163.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca899324

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ff44a837

由 David S. Miller 提交于 1月 28, 2019

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree:

1) The nftnl mutex is now per-netns, therefore use reference counter
   for matches and targets to deal with concurrent updates from netns.
   Moreover, place extensions in a pernet list. Patches from Florian Westphal.

2) Bail out with EINVAL in case of negative timeouts via setsockopt()
   through ip_vs_set_timeout(), from ZhangXiaoxu.

3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
   from Florian.

4) Reset TCP option header parser in case of fingerprint mismatch,
   otherwise follow up overlapping fingerprint definitions including
   TCP options do not work, from Fernando Fernandez Mancera.

5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset.
   From Anders Roxell.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff44a837

Revert "mm, memory_hotplug: initialize struct pages for the full memory section" · 4aa9fc2a

由 Michal Hocko 提交于 1月 25, 2019

This reverts commit 2830bf6f.

The underlying assumption that one sparse section belongs into a single
numa node doesn't hold really. Robert Shteynfeld has reported a boot
failure. The boot log was not captured but his memory layout is as
follows:

  Early memory node ranges
    node   1: [mem 0x0000000000001000-0x0000000000090fff]
    node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
    node   1: [mem 0x0000000100000000-0x0000001423ffffff]
    node   0: [mem 0x0000001424000000-0x0000002023ffffff]

This means that node0 starts in the middle of a memory section which is
also in node1.  memmap_init_zone tries to initialize padding of a
section even when it is outside of the given pfn range because there are
code paths (e.g.  memory hotplug) which assume that the full worth of
memory section is always initialized.

In this particular case, though, such a range is already intialized and
most likely already managed by the page allocator.  Scribbling over
those pages corrupts the internal state and likely blows up when any of
those pages gets used.
Reported-by: NRobert Shteynfeld <robert.shteynfeld@gmail.com>
Fixes: 2830bf6f ("mm, memory_hotplug: initialize struct pages for the full memory section")
Cc: stable@kernel.org
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4aa9fc2a

28 1月, 2019 6 次提交

netfilter: ipt_CLUSTERIP: fix warning unused variable cn · 206b8cc5

由 Anders Roxell 提交于 1月 23, 2019

When CONFIG_PROC_FS isn't set the variable cn isn't used.

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
  struct clusterip_net *cn = clusterip_pernet(net);
                        ^~

Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".

Fixes: b12f7bad ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

206b8cc5

netfilter: nfnetlink_osf: add missing fmatch check · 1a6a0951

由 Fernando Fernandez Mancera 提交于 1月 21, 2019

When we check the tcp options of a packet and it doesn't match the current
fingerprint, the tcp packet option pointer must be restored to its initial
value in order to do the proper tcp options check for the next fingerprint.

Here we can see an example.
Assumming the following fingerprint base with two lines:

S10:64:1:60:M*,S,T,N,W6:      Linux:3.0::Linux 3.0
S20:64:1:60:M*,S,T,N,W7:      Linux:4.19:arch:Linux 4.1

Where TCP options are the last field in the OS signature, all of them overlap
except by the last one, ie. 'W6' versus 'W7'.

In case a packet for Linux 4.19 kicks in, the osf finds no matching because the
TCP options pointer is updated after checking for the TCP options in the first
line.

Therefore, reset pointer back to where it should be.

Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: NFernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

1a6a0951

netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present · 2035f3ff

由 Florian Westphal 提交于 1月 21, 2019

Unlike ip(6)tables ebtables only counts user-defined chains.

The effect is that a 32bit ebtables binary on a 64bit kernel can do
'ebtables -N FOO' only after adding at least one rule, else the request
fails with -EINVAL.

This is a similar fix as done in
3f1e53ab ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").

Fixes: 7d7d7e02 ("netfilter: compat: reject huge allocation requests")
Reported-by: NFrancesco Ruggeri <fruggeri@arista.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2035f3ff

net: dsa: mv88e6xxx: Fix serdes irq setup going recursive · 6fb6e637

由 Andrew Lunn 提交于 1月 27, 2019

Duec to a typo, mv88e6390_serdes_irq_setup() calls itself, rather than
mv88e6390x_serdes_irq_setup(). It then blows the stack, and shortly
after the machine blows up.

Fixes: 2defda1f ("net: dsa: mv88e6xxx: Add support for SERDES on ports 2-8 for 6390X")
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6fb6e637

ip6mr: Fix notifiers call on mroute_clean_tables() · 146820cc

由 Nir Dotan 提交于 1月 27, 2019

When the MC route socket is closed, mroute_clean_tables() is called to
cleanup existing routes. Mistakenly notifiers call was put on the cleanup
of the unresolved MC route entries cache.
In a case where the MC socket closes before an unresolved route expires,
the notifier call leads to a crash, caused by the driver trying to
increment a non initialized refcount_t object [1] and then when handling
is done, to decrement it [2]. This was detected by a test recently added in
commit 6d4efada ("selftests: forwarding: Add multicast routing test").

Fix that by putting notifiers call on the resolved entries traversal,
instead of on the unresolved entries traversal.

[1]

[  245.748967] refcount_t: increment on 0; use-after-free.
[  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
...
[  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
[  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
...
[  245.907487] Call Trace:
[  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
[  245.917913]  notifier_call_chain+0x45/0x7
[  245.922484]  atomic_notifier_call_chain+0x15/0x20
[  245.927729]  call_fib_notifiers+0x15/0x30
[  245.932205]  mroute_clean_tables+0x372/0x3f
[  245.936971]  ip6mr_sk_done+0xb1/0xc0
[  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
...

[2]

[  246.128487] refcount_t: underflow; use-after-free.
[  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
[  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
...
[  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
[  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
...
[  246.298889] Call Trace:
[  246.301617]  refcount_dec_and_test_checked+0x11/0x20
[  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
[  246.315531]  process_one_work+0x1fa/0x3f0
[  246.320005]  worker_thread+0x2f/0x3e0
[  246.324083]  kthread+0x118/0x130
[  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
[  246.332926]  ? kthread_park+0x80/0x80
[  246.337013]  ret_from_fork+0x1f/0x30

Fixes: 088aa3ee ("ip6mr: Support fib notifications")
Signed-off-by: NNir Dotan <nird@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

146820cc

decnet: fix DN_IFREQ_SIZE · 50c29366

由 Johannes Berg 提交于 1月 26, 2019

Digging through the ioctls with Al because of the previous
patches, we found that on 64-bit decnet's dn_dev_ioctl()
is wrong, because struct ifreq::ifr_ifru is actually 24
bytes (not 16 as expected from struct sockaddr) due to the
ifru_map and ifru_settings members.

Clearly, decnet expects the ioctl to be called with a struct
like
  struct ifreq_dn {
    char ifr_name[IFNAMSIZ];
    struct sockaddr_dn ifr_addr;
  };

since it does
  struct ifreq *ifr = ...;
  struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;

This means that DN_IFREQ_SIZE is too big for what it wants on
64-bit, as it is
  sizeof(struct ifreq) - sizeof(struct sockaddr) +
  sizeof(struct sockaddr_dn)

This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
but that isn't true.

Fix this to use offsetof(struct ifreq, ifr_ifru).

This indeed doesn't really matter much - the result is that we
copy in/out 8 bytes more than we should on 64-bit platforms. In
case the "struct ifreq_dn" lands just on the end of a page though
it might lead to faults.

As far as I can tell, it has been like this forever, so it seems
very likely that nobody cares.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50c29366

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功