提交 · 4e33d7d47943aaa84a5904472cf2f9c6d6b0a6ca · gsplhtlxg / clone-Linux

03 7月, 2018 1 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4e33d7d4

由 Linus Torvalds 提交于 7月 02, 2018

Pull networking fixes from David Miller:

 1) Verify netlink attributes properly in nf_queue, from Eric Dumazet.

 2) Need to bump memory lock rlimit for test_sockmap bpf test, from
    Yonghong Song.

 3) Fix VLAN handling in lan78xx driver, from Dave Stevenson.

 4) Fix uninitialized read in nf_log, from Jann Horn.

 5) Fix raw command length parsing in mlx5, from Alex Vesker.

 6) Cleanup loopback RDS connections upon netns deletion, from Sowmini
    Varadhan.

 7) Fix regressions in FIB rule matching during create, from Jason A.
    Donenfeld and Roopa Prabhu.

 8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren.

 9) More bpfilter build fixes/adjustments from Masahiro Yamada.

10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper
    Dangaard Brouer.

11) fib_tests.sh file permissions were broken, from Shuah Khan.

12) Make sure BH/preemption is disabled in data path of mac80211, from
    Denis Kenzior.

13) Don't ignore nla_parse_nested() return values in nl80211, from
    Johannes berg.

14) Properly account sock objects ot kmemcg, from Shakeel Butt.

15) Adjustments to setting bpf program permissions to read-only, from
    Daniel Borkmann.

16) TCP Fast Open key endianness was broken, it always took on the host
    endiannness. Whoops. Explicitly make it little endian. From Yuching
    Cheng.

17) Fix prefix route setting for link local addresses in ipv6, from
    David Ahern.

18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva.

19) Various bpf sockmap fixes, from John Fastabend.

20) Use after free for GRO with ESP, from Sabrina Dubroca.

21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from
    Eric Biggers.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  qede: Adverstise software timestamp caps when PHC is not available.
  qed: Fix use of incorrect size in memcpy call.
  qed: Fix setting of incorrect eswitch mode.
  qed: Limit msix vectors in kdump kernel to the minimum required count.
  ipvlan: call dev_change_flags when ipvlan mode is reset
  ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
  net: fix use-after-free in GRO with ESP
  tcp: prevent bogus FRTO undos with non-SACK flows
  bpf: sockhash, add release routine
  bpf: sockhash fix omitted bucket lock in sock_close
  bpf: sockmap, fix smap_list_map_remove when psock is in many maps
  bpf: sockmap, fix crash when ipv6 sock is added
  net: fib_rules: bring back rule_exists to match rule during add
  hv_netvsc: split sub-channel setup into async and sync
  net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
  atm: zatm: Fix potential Spectre v1
  s390/qeth: consistently re-enable device features
  s390/qeth: don't clobber buffer on async TX completion
  s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
  s390/qeth: fix race when setting MAC address
  ...

4e33d7d4

02 7月, 2018 15 次提交

Merge branch 'qed-fixes' · e48e0979

由 David S. Miller 提交于 7月 02, 2018

Sudarsana Reddy Kalluru says:

====================
qed*: Fix series.

The patch series addresses few issues in the qed* drivers.

Please consider applying it to 'net' branch.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e48e0979

qede: Adverstise software timestamp caps when PHC is not available. · 82a4e71b

由 Sudarsana Reddy Kalluru 提交于 7月 01, 2018

When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode),
get-tsinfo() callback should return the software timestamp capabilities
instead of returning the error.

Fixes: 4c55215c ("qede: Add driver support for PTP")
Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82a4e71b

qed: Fix use of incorrect size in memcpy call. · cc9b27cd

由 Sudarsana Reddy Kalluru 提交于 7月 01, 2018

Use the correct size value while copying chassis/port id values.

Fixes: 6ad8c632 ("qed: Add support for query/config dcbx.")
Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc9b27cd

qed: Fix setting of incorrect eswitch mode. · 538f8d00

由 Sudarsana Reddy Kalluru 提交于 7月 01, 2018

By default, driver sets the eswitch mode incorrectly as VEB (virtual
Ethernet bridging).
Need to set VEB eswitch mode only when sriov is enabled, and it should be
to set NONE by default. The patch incorporates this change.

Fixes: 0fefbfba ("qed*: Management firmware - notifications and defaults")
Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

538f8d00

qed: Limit msix vectors in kdump kernel to the minimum required count. · bb7858ba

由 Sudarsana Reddy Kalluru 提交于 7月 01, 2018

Memory size is limited in the kdump kernel environment. Allocation of more
msix-vectors (or queues) consumes few tens of MBs of memory, which might
lead to the kdump kernel failure.
This patch adds changes to limit the number of MSI-X vectors in kdump
kernel to minimum required value (i.e., 2 per engine).

Fixes: fe56b9e6 ("qed: Add module with basic common support")
Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb7858ba

ipvlan: call dev_change_flags when ipvlan mode is reset · 5dc2d399

由 Hangbin Liu 提交于 7月 01, 2018

After we change the ipvlan mode from l3 to l2, or vice versa, we only
reset IFF_NOARP flag, but don't flush the ARP table cache, which will
cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2().
Then the message will not come out of host.

Here is the reproducer on local host:

ip link set eth1 up
ip addr add 192.168.1.1/24 dev eth1
ip link add link eth1 ipvlan1 type ipvlan mode l3

ip netns add net1
ip link set ipvlan1 netns net1
ip netns exec net1 ip link set ipvlan1 up
ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1

ip route add 192.168.2.0/24 via 192.168.1.2
ping 192.168.2.2 -c 2

ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
ping 192.168.2.2 -c 2

Add the same configuration on remote host. After we set the mode to l2,
we could find that the src/dst MAC addresses are the same on eth1:

21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64

Fix this by calling dev_change_flags(), which will call netdevice notifier
with flag change info.

v2:
a) As pointed out by Wang Cong, check return value for dev_change_flags() when
change dev flags.
b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.
Reported-by: NJianlin Shi <jishi@redhat.com>
Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5dc2d399

ipv6: sr: fix passing wrong flags to crypto_alloc_shash() · fc9c2029

由 Eric Biggers 提交于 6月 30, 2018

The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
not 'gfp_t'.  So don't pass GFP_KERNEL to it.

Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc9c2029

net: fix use-after-free in GRO with ESP · 603d4cf8

由 Sabrina Dubroca 提交于 6月 30, 2018

Since the addition of GRO for ESP, gro_receive can consume the skb and
return -EINPROGRESS. In that case, the lower layer GRO handler cannot
touch the skb anymore.

Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
some of the gro_receive handlers that can lead to ESP's gro_receive so
that they wouldn't access the skb when -EINPROGRESS is returned, but
missed other spots, mainly in tunneling protocols.

This patch finishes the conversion to using skb_gro_flush_final(), and
adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
GUE.

Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

603d4cf8

L

Linux 4.18-rc3 · 021c9179
由 Linus Torvalds 提交于 7月 01, 2018

021c9179

Merge tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · d3bc0e67

由 Linus Torvalds 提交于 7月 01, 2018

Pull btrfs fixes from David Sterba:
 "We have a few regression fixes for qgroup rescan status tracking and
  the vm_fault_t conversion that mixed up the error values"

* tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix mount failure when qgroup rescan is in progress
  Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
  btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf

d3bc0e67

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 4a770e63

由 Linus Torvalds 提交于 7月 01, 2018

Pull vfs fix from Al Viro:
 "Followup to procfs-seq_file series this window"

This fixes a memory leak by making sure that proc seq files release any
private data on close.  The 'proc_seq_open' has to be properly paired
with 'proc_seq_release' that releases the extra private data.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  proc: add proc_seq_release

4a770e63

Merge tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · d7563ca5

由 Linus Torvalds 提交于 7月 01, 2018

Pull staging/IIO fixes from Greg KH:
 "Here are a few small staging and IIO driver fixes for 4.18-rc3.

  Nothing major or big, all just fixes for reported problems since
  4.18-rc1. All of these have been in linux-next this week with no
  reported problems"

* tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: android: ion: Return an ERR_PTR in ion_map_kernel
  staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
  iio: imu: inv_mpu6050: Fix probe() failure on older ACPI based machines
  iio: buffer: fix the function signature to match implementation
  iio: mma8452: Fix ignoring MMA8452_INT_DRDY
  iio: tsl2x7x/tsl2772: avoid potential division by zero
  iio: pressure: bmp280: fix relative humidity unit

d7563ca5

Merge tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 652788a9

由 Linus Torvalds 提交于 7月 01, 2018

Pull tty/serial fixes from Greg KH:
 "Here are five fixes for the tty core and some serial drivers.

  The tty core ones fix some security and other issues reported by the
  syzbot that I have taken too long in responding to (sorry Tetsuo!).

  The 8350 serial driver fix resolves an issue of devices that used to
  work properly stopping working as they shouldn't have been added to a
  blacklist.

  All of these have been in linux-next for a few days with no reported
  issues"

* tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt: prevent leaking uninitialized data to userspace via /dev/vcs*
  serdev: fix memleak on module unload
  serial: 8250_pci: Remove stalled entries in blacklist
  n_tty: Access echo_* variables carefully.
  n_tty: Fix stall at n_tty_receive_char_special().

652788a9

Merge tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c2aee376

由 Linus Torvalds 提交于 7月 01, 2018

Pull USB fixes from Greg KH:
 "Here is a number of USB gadget and other driver fixes for 4.18-rc3.

  There's a bunch of them here, most of them being gadget driver and
  xhci host controller fixes for reported issues (as normal), but there
  are also some new device ids, and some fixes for the typec code.

  There is an acpi core patch in here that was acked by the acpi
  maintainer as it is needed for the typec fixes in order to properly
  solve a problem in that driver.

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
  usb: chipidea: host: fix disconnection detect issue
  usb: typec: tcpm: fix logbuffer index is wrong if _tcpm_log is re-entered
  typec: tcpm: Fix a msecs vs jiffies bug
  NFC: pn533: Fix wrong GFP flag usage
  usb: cdc_acm: Add quirk for Uniden UBC125 scanner
  staging/typec: fix tcpci_rt1711h build errors
  usb: typec: ucsi: Fix for incorrect status data issue
  usb: typec: ucsi: acpi: Workaround for cache mode issue
  acpi: Add helper for deactivating memory region
  usb: xhci: increase CRS timeout value
  usb: xhci: tegra: fix runtime PM error handling
  usb: xhci: remove the code build warning
  xhci: Fix kernel oops in trace_xhci_free_virt_device
  xhci: Fix perceived dead host due to runtime suspend race with event handler
  dwc2: gadget: Fix ISOC IN DDMA PID bitfield value calculation
  usb: gadget: dwc2: fix memory leak in gadget_init()
  usb: gadget: composite: fix delayed_status race condition when set_interface
  usb: dwc2: fix isoc split in transfer with no data
  usb: dwc2: alloc dma aligned buffer for isoc split in
  usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub
  ...

c2aee376

Merge tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping · c350d6d1

由 Linus Torvalds 提交于 7月 01, 2018

Pull dma mapping fixlet from Christoph Hellwig:
 "Add a missing export required by riscv and unicore"

* tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: export swiotlb_dma_ops

c350d6d1

01 7月, 2018 14 次提交

tcp: prevent bogus FRTO undos with non-SACK flows · 1236f22f

由 Ilpo Järvinen 提交于 6月 29, 2018

If SACK is not enabled and the first cumulative ACK after the RTO
retransmission covers more than the retransmitted skb, a spurious
FRTO undo will trigger (assuming FRTO is enabled for that RTO).
The reason is that any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case so the check for that bit won't help like it does with SACK).
Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
in tcp_process_loss.

We need to use more strict condition for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted
to prove that progress is due to original transmissions. Only then
keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
non-SACK case.

(FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
to better indicate its purpose but to keep this change minimal, it
will be done in another patch).

Besides burstiness and congestion control violations, this problem
can result in RTO loop: When the loss recovery is prematurely
undoed, only new data will be transmitted (if available) and
the next retransmission can occur only after a new RTO which in case
of multiple losses (that are not for consecutive packets) requires
one RTO per loss to recover.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1236f22f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 271b955e

由 David S. Miller 提交于 7月 01, 2018

Daniel Borkmann says:

====================
pull-request: bpf 2018-07-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) A bpf_fib_lookup() helper fix to change the API before freeze to
   return an encoding of the FIB lookup result and return the nexthop
   device index in the params struct (instead of device index as return
   code that we had before), from David.

2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
   reject progs when set_memory_*() fails since it could still be RO.
   Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
   an issue, and a memory leak in s390 JIT found during review, from
   Daniel.

3) Multiple fixes for sockmap/hash to address most of the syzkaller
   triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
   a missing sock_map_release() routine to hook up to callbacks, and a
   fix for an omitted bucket lock in sock_close(), from John.

4) Two bpftool fixes to remove duplicated error message on program load,
   and another one to close the libbpf object after program load. One
   additional fix for nfp driver's BPF offload to avoid stopping offload
   completely if replace of program failed, from Jakub.

5) Couple of BPF selftest fixes that bail out in some of the test
   scripts if the user does not have the right privileges, from Jeffrin.

6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
   where we need to set the flag that some of the test cases are expected
   to fail, from Kleber.

7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
   since it has no relation to it and lirc2 users often have configs
   without cgroups enabled and thus would not be able to use it, from Sean.

8) Fix a selftest failure in sockmap by removing a useless setrlimit()
   call that would set a too low limit where at the same time we are
   already including bpf_rlimit.h that does the job, from Yonghong.

9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

271b955e

Merge branch 'bpf-sockmap-fixes' · bf2b866a

由 Daniel Borkmann 提交于 7月 01, 2018

John Fastabend says:

====================
This addresses two syzbot issues that lead to identifying (by Eric and
Wei) a class of bugs where we don't correctly check for IPv4/v6
sockets and their associated state. The second issue was a locking
omission in sockhash.

The first patch addresses IPv6 socks and fixing an error where
sockhash would overwrite the prot pointer with IPv4 prot. To fix
this build similar solution to TLS ULP. Although we continue to
allow socks in all states not just ESTABLISH in this patch set
because as Martin points out there should be no issue with this
on the sockmap ULP because we don't use the ctx in this code. Once
multiple ULPs coexist we may need to revisit this. However we
can do this in *next trees.

The other issue syzbot found that the tcp_close() handler missed
locking the hash bucket lock which could result in corrupting the
sockhash bucket list if delete and close ran at the same time.
And also the smap_list_remove() routine was not working correctly
at all. This was not caught in my testing because in general my
tests (to date at least lets add some more robust selftest in
bpf-next) do things in the "expected" order, create map, add socks,
delete socks, then tear down maps. The tests we have that do the
ops out of this order where only working on single maps not multi-
maps so we never saw the issue. Thanks syzbot. The fix is to
restructure the tcp_close() lock handling. And fix the obvious
bug in smap_list_remove().

Finally, during review I noticed the release handler was omitted
from the upstream code (patch 4) due to an incorrect merge conflict
fix when I ported the code to latest bpf-next before submitting.
This would leave references to the map around if the user never
closes the map.

v3: rework patches, dropping ESTABLISH check and adding rcu
    annotation along with the smap_list_remove fix

v4: missed one more case where maps was being accessed without
    the sk_callback_lock, spoted by Martin as well.

v5: changed to use a specific lock for maps and reduced callback
    lock so that it is only used to gaurd sk callbacks. I think
    this makes the logic a bit cleaner and avoids confusion
    ovoer what each lock is doing.

Also big thanks to Martin for thorough review he caught at least
one case where I missed a rcu_call().
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

bf2b866a

bpf: sockhash, add release routine · caac76a5

由 John Fastabend 提交于 6月 30, 2018

Add map_release_uref pointer to hashmap ops. This was dropped when
original sockhash code was ported into bpf-next before initial
commit.

Fixes: 81110384 ("bpf: sockmap, add hash map support")
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

caac76a5

bpf: sockhash fix omitted bucket lock in sock_close · e9db4ef6

由 John Fastabend 提交于 6月 30, 2018

First the sk_callback_lock() was being used to protect both the
sock callback hooks and the psock->maps list. This got overly
convoluted after the addition of sockhash (in sockmap it made
some sense because masp and callbacks were tightly coupled) so
lets split out a specific lock for maps and only use the callback
lock for its intended purpose. This fixes a couple cases where
we missed using maps lock when it was in fact needed. Also this
makes it easier to follow the code because now we can put the
locking closer to the actual code its serializing.

Next, in sock_hash_delete_elem() the pattern was as follows,

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element
it is deleted from the hashmap to ensure any other path doing a lookup
will fail. Finally, the refcnt is decremented and if zero the psock
is destroyed.

In parallel with the above (or free'ing the map) a tcp close event
may trigger tcp_close(). Which at the moment omits the bucket lock
altogether (oops!) where the flow looks like this,

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a
bucket lock but that would create a lock inversion problem. In
summary to follow locking rules the psocks maps list needs the
sk_callback_lock (after this patch maps_lock) but we need the bucket
lock to do the hlist_del_rcu.

To resolve the lock inversion problem pop the head of the maps list
repeatedly and remove the reference until no more are left. If a
delete happens in parallel from the BPF API that is OK as well because
it will do a similar action, lookup the lock in the map/hash, delete
it from the map/hash, and dec the refcnt. We check for this case
before doing a destroy on the psock to ensure we don't have two
threads tearing down a psock. The new logic is as follows,

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with map lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     //only get here if elmnt was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally for all the above to work add missing locking around  map
operations per above. Then add RCU annotations and use
rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
that the object is not free'd from sock_hash_free() while it is being
referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384 ("bpf: sockmap, add hash map support")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

e9db4ef6

bpf: sockmap, fix smap_list_map_remove when psock is in many maps · 54fedb42

由 John Fastabend 提交于 6月 30, 2018

If a hashmap is free'd with open socks it removes the reference to
the hash entry from the psock. If that is the last reference to the
psock then it will also be free'd by the reference counting logic.
However the current logic that removes the hash reference from the
list of references is broken. In smap_list_remove() we first check
if the sockmap entry matches and then check if the hashmap entry
matches. But, the sockmap entry sill always match because its NULL in
this case which causes the first entry to be removed from the list.
If this is always the "right" entry (because the user adds/removes
entries in order) then everything is OK but otherwise a subsequent
bpf_tcp_close() may reference a free'd object.

To fix this create two list handlers one for sockmap and one for
sockhash.

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384 ("bpf: sockmap, add hash map support")
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

54fedb42

bpf: sockmap, fix crash when ipv6 sock is added · 9901c5d7

由 John Fastabend 提交于 6月 30, 2018

This fixes a crash where we assign tcp_prot to IPv6 sockets instead
of tcpv6_prot.

Previously we overwrote the sk->prot field with tcp_prot even in the
AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.

Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NWei Wang <weiwan@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

9901c5d7

Merge branch 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 883c9ab9

由 Linus Torvalds 提交于 6月 30, 2018

Pull parisc fixes and cleanups from Helge Deller:
 "Nothing exiting in this patchset, just

   - small cleanups of header files

   - default to 4 CPUs when building a SMP kernel

   - mark 16kB and 64kB page sizes broken

   - addition of the new io_pgetevents syscall"

* 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Build kernel without -ffunction-sections
  parisc: Reduce debug output in unwind code
  parisc: Wire up io_pgetevents syscall
  parisc: Default to 4 SMP CPUs
  parisc: Convert printk(KERN_LEVEL) to pr_lvl()
  parisc: Mark 16kB and 64kB page sizes BROKEN
  parisc: Drop struct sigaction from not exported header file

883c9ab9

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 08af78d7

由 Linus Torvalds 提交于 6月 30, 2018

Pull ARM SoC fixes from Olof Johansson:
 "A smaller batch for the end of the week (let's see if I can keep the
  weekly cadence going for once).

  All medium-grade fixes here, nothing worrisome:

   - Fixes for some fairly old bugs around SD card write-protect
     detection and GPIO interrupt assignments on Davinci.

   - Wifi module suspend fix for Hikey.

   - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one
     of which solves booting with third-party u-boot"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: hikey960: Define wl1837 power capabilities
  arm64: dts: hikey: Define wl1835 power capabilities
  ARM64: dts: meson-gxl: fix Mali GPU compatible string
  ARM64: dts: meson-axg: fix ethernet stability issue
  ARM64: dts: meson-gx: fix ATF reserved memory region
  ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
  ARM64: dts: meson: fix register ranges for SD/eMMC
  ARM64: dts: meson: disable sd-uhs modes on the libretech-cc
  ARM: dts: da850: Fix interrups property for gpio
  ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD

08af78d7

Merge tag 'kbuild-fixes-v4.18' of... · 22d3e0c3

由 Linus Torvalds 提交于 6月 30, 2018

Merge tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - introduce __diag_* macros and suppress -Wattribute-alias warnings
   from GCC 8

 - fix stack protector test script for x86_64

 - fix line number handling in Kconfig

 - document that '#' starts a comment in Kconfig

 - handle P_SYMBOL property in dump debugging of Kconfig

 - correct help message of LD_DEAD_CODE_DATA_ELIMINATION

 - fix occasional segmentation faults in Kconfig

* tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: loop boundary condition fix
  kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
  kconfig: handle P_SYMBOL in print_symbol()
  kconfig: document Kconfig source file comments
  kconfig: fix line numbers for if-entries in menu tree
  stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
  powerpc: Remove -Wattribute-alias pragmas
  disable -Wattribute-alias warning for SYSCALL_DEFINEx()
  kbuild: add macro for controlling warnings to linux/compiler.h

22d3e0c3

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0fbc4aea

由 Linus Torvalds 提交于 6月 30, 2018

Pull x86 fixes from Ingo Molnar:
 "The biggest diffstat comes from self-test updates, plus there's entry
  code fixes, 5-level paging related fixes, console debug output fixes,
  and misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Clean up the printk()s in show_fault_oops()
  x86/mm: Drop unneeded __always_inline for p4d page table helpers
  x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y
  selftests/x86/sigreturn: Do minor cleanups
  selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
  x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
  x86/mm: Don't free P4D table when it is folded at runtime
  x86/entry/32: Add explicit 'l' instruction suffix
  x86/mm: Get rid of KERN_CONT in show_fault_oops()

0fbc4aea

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d7d53886

由 Linus Torvalds 提交于 6月 30, 2018

Pull perf fixes from Ingo Molnar:
 "Tooling fixes mostly, plus a build warning fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf/core: Move inline keyword at the beginning of declaration
  tools/headers: Pick up latest kernel ABIs
  perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
  perf script: Fix crash because of missing evsel->priv
  perf script: Add missing output fields in a hint
  perf bench: Fix numa report output code
  perf stat: Remove duplicate event counting
  perf alias: Rebuild alias expression string to make it comparable
  perf alias: Remove trailing newline when reading sysfs files
  perf tools: Fix a clang 7.0 compilation error
  tools include uapi: Synchronize bpf.h with the kernel
  tools include uapi: Update if_link.h to pick IFLA_{BRPORT_ISOLATED,VXLAN_TTL_INHERIT}
  tools include powerpc: Update arch/powerpc/include/uapi/asm/unistd.h copy to get 'rseq' syscall
  perf tools: Update x86's syscall_64.tbl, adding 'io_pgetevents' and 'rseq'
  tools headers uapi: Synchronize drm/drm.h
  perf intel-pt: Fix packet decoding of CYC packets
  perf tests: Add valid callback for parse-events test
  perf tests: Add event parsing error handling to parse events test
  perf report powerpc: Fix crash if callchain is empty
  perf test session topology: Fix test on s390
  ...

d7d53886

Merge tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 34a484d5

由 Linus Torvalds 提交于 6月 30, 2018

Pull selinux fix from Paul Moore:
 "One fairly straightforward patch to fix a longstanding issue where a
  process could stall while accessing files in selinuxfs and block
  everyone else due to a held mutex.

  The patch passes all our tests and looks to apply cleanly to your
  current tree"

* tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: move user accesses in selinuxfs out of locked regions

34a484d5

Merge tag 'for-linus-20180629' of git://git.kernel.dk/linux-block · e6e5bec4

由 Linus Torvalds 提交于 6月 30, 2018

Pull block fixes from Jens Axboe:
 "Small set of fixes for this series. Mostly just minor fixes, the only
  oddball in here is the sg change.

  The sg change came out of the stall fix for NVMe, where we added a
  mempool and limited us to a single page allocation. CONFIG_SG_DEBUG
  sort-of ruins that, since we'd need to account for that. That's
  actually a generic problem, since lots of drivers need to allocate SG
  lists. So this just removes support for CONFIG_SG_DEBUG, which I added
  back in 2007 and to my knowledge it was never useful.

  Anyway, outside of that, this pull contains:

   - clone of request with special payload fix (Bart)

   - drbd discard handling fix (Bart)

   - SATA blk-mq stall fix (me)

   - chunk size fix (Keith)

   - double free nvme rdma fix (Sagi)"

* tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
  sg: remove ->sg_magic member
  drbd: Fix drbd_request_prepare() discard handling
  blk-mq: don't queue more if we get a busy return
  block: Fix cloning of requests with a special payload
  nvme-rdma: fix possible double free of controller async event buffer
  block: Fix transfer when chunk sectors exceeds max

e6e5bec4

30 6月, 2018 10 次提交

net: fib_rules: bring back rule_exists to match rule during add · 35e8c7ba

由 Roopa Prabhu 提交于 6月 29, 2018

After commit f9d4b0c1 ("fib_rules: move common handling of newrule
delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
for existing rule lookup in both the add and del paths. While this
is good for the delete path, it solves a few problems but opens up
a few invalid key matches in the add path.

$ip -4 rule add table main tos 10 fwmark 1
$ip -4 rule add table main tos 10
RTNETLINK answers: File exists

The problem here is rule_find does not check if the key masks in
the new and old rule are the same and hence ends up matching a more
secific rule. Rule key masks cannot be easily compared today without
an elaborate if-else block. Its best to introduce key masks for easier
and accurate rule comparison in the future. Until then, due to fear of
regressions this patch re-introduces older loose rule_exists during add.
Also fixes both rule_exists and rule_find to cover missing attributes.

Fixes: f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35e8c7ba

hv_netvsc: split sub-channel setup into async and sync · 3ffe64f1

由 Stephen Hemminger 提交于 6月 29, 2018

When doing device hotplug the sub channel must be async to avoid
deadlock issues because device is discovered in softirq context.

When doing changes to MTU and number of channels, the setup
must be synchronous to avoid races such as when MTU and device
settings are done in a single ip command.
Reported-by: NThomas Walker <Thomas.Walker@twosigma.com>
Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
Fixes: 732e4985 ("netvsc: fix race on sub channel creation")
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ffe64f1

net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN · 3f76df19

由 Cong Wang 提交于 6月 29, 2018

As noticed by Eric, we need to switch to the helper
dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too,
otheriwse still miss dev_qdisc_change_tx_queue_len().

Fixes: 6a643ddb ("net: introduce helper dev_change_tx_queue_len()")
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f76df19

atm: zatm: Fix potential Spectre v1 · ced9e191

由 Gustavo A. R. Silva 提交于 6月 29, 2018

pool can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue
'zatm_dev->pool_info' (local cap)

Fix this by sanitizing pool before using it to index
zatm_dev->pool_info

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ced9e191

Merge branch 's390-qeth-fixes' · c7f653e0

由 David S. Miller 提交于 6月 30, 2018

Julian Wiedmann says:

====================
s390/qeth: fixes 2018-06-29

please apply a few qeth fixes for -net and your 4.17 stable queue.

Patches 1-3 fix several issues wrt to MAC address management that were
introduced during the 4.17 cycle.
Patch 4 tackles a long-standing issue with busy multi-connection workloads
on devices in af_iucv mode.
Patch 5 makes sure to re-enable all active HW offloads, after a card was
previously set offline and thus lost its HW context.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7f653e0

s390/qeth: consistently re-enable device features · d025da9e

由 Julian Wiedmann 提交于 6月 29, 2018

commit e830baa9 ("qeth: restore device features after recovery") and
commit ce344356 ("s390/qeth: rely on kernel for feature recovery")
made sure that the HW functions for device features get re-programmed
after recovery.

But we missed that the same handling is also required when a card is
first set offline (destroying all HW context), and then online again.
Fix this by moving the re-enable action out of the recovery-only path.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d025da9e

s390/qeth: don't clobber buffer on async TX completion · ce28867f

由 Julian Wiedmann 提交于 6月 29, 2018

If qeth_qdio_output_handler() detects that a transmit requires async
completion, it replaces the pending buffer's metadata object
(qeth_qdio_out_buffer) so that this queue buffer can be re-used while
the data is pending completion.

Later when the CQ indicates async completion of such a metadata object,
qeth_qdio_cq_handler() tries to free any data associated with this
object (since HW has now completed the transfer). By calling
qeth_clear_output_buffer(), it erronously operates on the queue buffer
that _previously_ belonged to this transfer ... but which has been
potentially re-used several times by now.
This results in double-free's of the buffer's data, and failing
transmits as the buffer descriptor is scrubbed in mid-air.

The correct way of handling this situation is to
1. scrub the queue buffer when it is prepared for re-use, and
2. later obtain the data addresses from the async-completion notifier
   (ie. the AOB), instead of the queue buffer.

All this only affects qeth devices used for af_iucv HiperTransport.

Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce28867f

s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6] · 9d0a58fb

由 Vasily Gorbik 提交于 6月 29, 2018

*ether_addr*_64bits functions have been introduced to optimize
performance critical paths, which access 6-byte ethernet address as u64
value to get "nice" assembly. A harmless hack works nicely on ethernet
addresses shoved into a structure or a larger buffer, until busted by
Kasan on smth like plain (u8 *)[6].

qeth_l2_set_mac_address calls qeth_l2_remove_mac passing
u8 old_addr[ETH_ALEN] as an argument.

Adding/removing macs for an ethernet adapter is not that performance
critical. Moreover is_multicast_ether_addr_64bits itself on s390 is not
faster than is_multicast_ether_addr:

is_multicast_ether_addr(%r2) -> %r2
llc	%r2,0(%r2)
risbg	%r2,%r2,63,191,0

is_multicast_ether_addr_64bits(%r2) -> %r2
llgc	%r2,0(%r2)
risbg	%r2,%r2,63,191,0

So, let's just use is_multicast_ether_addr instead of
is_multicast_ether_addr_64bits.

Fixes: bcacfcbc ("s390/qeth: fix MAC address update sequence")
Reviewed-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d0a58fb

s390/qeth: fix race when setting MAC address · 4789a218

由 Julian Wiedmann 提交于 6月 29, 2018

When qeth_l2_set_mac_address() finds the card in a non-reachable state,
it merely copies the new MAC address into dev->dev_addr so that
__qeth_l2_set_online() can later register it with the HW.

But __qeth_l2_set_online() may very well be running concurrently, so we
can't trust the card state without appropriate locking:
If the online sequence is past the point where it registers
dev->dev_addr (but not yet in SOFTSETUP state), any address change needs
to be properly programmed into the HW. Otherwise the netdevice ends up
with a different MAC address than what's set in the HW, and inbound
traffic is not forwarded as expected.

This is most likely to occur for OSD in LPAR, where
commit 21b1702a ("s390/qeth: improve fallback to random MAC address")
now triggers eg. systemd to immediately change the MAC when the netdevice
is registered with a NET_ADDR_RANDOM address.

Fixes: bcacfcbc ("s390/qeth: fix MAC address update sequence")
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4789a218

Revert "s390/qeth: use Read device to query hypervisor for MAC" · 46646105

由 Julian Wiedmann 提交于 6月 29, 2018

This reverts commit b7493e91.

On its own, querying RDEV for a MAC address works fine. But when upgrading
from a qeth that previously queried DDEV on a z/VM NIC (ie. any kernel with
commit ec61bd2f), the RDEV query now returns a _different_ MAC address
than the DDEV query.

If the NIC is configured with MACPROTECT, z/VM apparently requires us to
use the MAC that was initially returned (on DDEV) and registered. So after
upgrading to a kernel that uses RDEV, the SETVMAC registration cmd for the
new MAC address fails and we end up with a non-operabel interface.

To avoid regressions on upgrade, switch back to using DDEV for the MAC
address query. The downgrade path (first RDEV, later DDEV) is fine, in this
case both queries return the same MAC address.

Fixes: b7493e91 ("s390/qeth: use Read device to query hypervisor for MAC")
Reported-by: NMichal Kubecek <mkubecek@suse.com>
Tested-by: NKarsten Graul <kgraul@linux.ibm.com>
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46646105