提交 · d3c89bbc7491d5e288ca2993e999d24ba9ff52ad · openanolis / cloud-kernel

28 8月, 2018 4 次提交

nl80211: Fix nla_put_u8 to u16 for NL80211_WMMR_TXOP · d3c89bbc

由 Haim Dreyfuss 提交于 8月 21, 2018

TXOP (also known as Channel Occupancy Time) is u16 and should be
added using nla_put_u16 instead of u8, fix that.

Fixes: 50f32718 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: NHaim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

d3c89bbc

mac80211: don't update the PM state of a peer upon a multicast frame · 20932750

由 Emmanuel Grumbach 提交于 8月 20, 2018

I changed the way mac80211 updates the PM state of the peer.
I forgot that we could also have multicast frames from the
peer and that those frame should of course not change the
PM state of the peer: A peer goes to power save when it
needs to scan, but it won't send the broadcast Probe Request
with the PM bit set.

This made us mark the peer as awake when it wasn't and then
Intel's firmware would fail to transmit because the peer is
asleep according to its database. The driver warned about
this and it looked like this:

 WARNING: CPU: 0 PID: 184 at /usr/src/linux-4.16.14/drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1369 iwl_mvm_rx_tx_cmd+0x53b/0x860
 CPU: 0 PID: 184 Comm: irq/124-iwlwifi Not tainted 4.16.14 #1
 RIP: 0010:iwl_mvm_rx_tx_cmd+0x53b/0x860
 Call Trace:
  iwl_pcie_rx_handle+0x220/0x880
  iwl_pcie_irq_handler+0x6c9/0xa20
  ? irq_forced_thread_fn+0x60/0x60
  ? irq_thread_dtor+0x90/0x90

The relevant code that spits the WARNING is:

        case TX_STATUS_FAIL_DEST_PS:
                /* the FW should have stopped the queue and not
                 * return this status
                 */
                WARN_ON(1);
                info->flags |= IEEE80211_TX_STAT_TX_FILTERED;

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199967.

Fixes: 9fef6544 ("mac80211: always update the PM state of a peer on MGMT / DATA frames")
Cc: <stable@vger.kernel.org>   #4.16+
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

20932750

cfg80211: make wmm_rule part of the reg_rule structure · 38cb87ee

由 Stanislaw Gruszka 提交于 8月 22, 2018

Make wmm_rule be part of the reg_rule structure. This simplifies the
code a lot at the cost of having bigger memory usage. However in most
cases we have only few reg_rule's and when we do have many like in
iwlwifi we do not save memory as it allocates a separate wmm_rule for
each channel anyway.

This also fixes a bug reported in various places where somewhere the
pointers were corrupted and we ended up doing a null-dereference.

Fixes: 230ebaa1 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
[rephrase commit message slightly]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

38cb87ee

mac80211: correct use of IEEE80211_VHT_CAP_RXSTBC_X · 67d1ba8a

由 Danek Duvall 提交于 8月 22, 2018

The mod mask for VHT capabilities intends to say that you can override
the number of STBC receive streams, and it does, but only by accident.
The IEEE80211_VHT_CAP_RXSTBC_X aren't bits to be set, but values (albeit
left-shifted). ORing the bits together gets the right answer, but we
should use the _MASK macro here instead.
Signed-off-by: NDanek Duvall <duvall@comfychair.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

67d1ba8a

20 8月, 2018 1 次提交

cfg80211: remove division by size of sizeof(struct ieee80211_wmm_rule) · 8a54d8fc

由 Johannes Berg 提交于 6月 18, 2018

Pointer arithmetic already adjusts by the size of the struct,
so the sizeof() calculation is wrong. This is basically the
same as Colin King's patch for similar code in the iwlwifi
driver.

Fixes: 230ebaa1 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

8a54d8fc

14 8月, 2018 2 次提交

mac80211: Run TXQ teardown code before de-registering interfaces · 77cfaf52

由 Toke Høiland-Jørgensen 提交于 8月 13, 2018

The TXQ teardown code can reference the vif data structures that are
stored in the netdev private memory area if there are still packets on
the queue when it is being freed. Since the TXQ teardown code is run
after the netdevs are freed, this can lead to a use-after-free. Fix this
by moving the TXQ teardown code to earlier in ieee80211_unregister_hw().
Reported-by: NBen Greear <greearb@candelatech.com>
Tested-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

77cfaf52

rfkill-gpio: include linux/mod_devicetable.h · f623f75a

由 Arnd Bergmann 提交于 8月 14, 2018

One more driver is apparently broken by the recent change
to linux/platform_device.h:

net/rfkill/rfkill-gpio.c: In function 'rfkill_gpio_acpi_probe':
net/rfkill/rfkill-gpio.c:82:29: error: dereferencing pointer to incomplete type 'const struct acpi_device_id'

Include linux/mod_devicetable.h to get the definition of the
acpi_device_id structure.

Fixes: ac316725 ("headers: separate linux/mod_devicetable.h from linux/platform_device.h")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

f623f75a

09 8月, 2018 5 次提交

dsa: slave: eee: Allow ports to use phylink · 1be52e97

由 Andrew Lunn 提交于 8月 08, 2018

For a port to be able to use EEE, both the MAC and the PHY must
support EEE. A phy can be provided by both a phydev or phylink. Verify
at least one of these exist, not just phydev.

Fixes: aab9c406 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1be52e97

net/smc: move sock lock in smc_ioctl() · 7311d665

由 Ursula Braun 提交于 8月 08, 2018

When an SMC socket is connecting it is decided whether fallback to
TCP is needed. To avoid races between connect and ioctl move the
sock lock before the use_fallback check.

Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com
Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com
Fixes: 1992d998 ("net/smc: take sock lock in smc_ioctl()")
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7311d665

net/smc: allow sysctl rmem and wmem defaults for servers · bd58c7e0

由 Ursula Braun 提交于 8月 08, 2018

Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
optimizations for servers should be ignored.
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd58c7e0

net/smc: no shutdown in state SMC_LISTEN · caa21e19

由 Ursula Braun 提交于 8月 08, 2018

Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caa21e19

rxrpc: Fix the keepalive generator [ver #2] · 330bdcfa

由 David Howells 提交于 8月 08, 2018

AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open.  The implementation is incorrect in the following ways:

 (1) It mixes up ktime_t and time64_t types.

 (2) It uses ktime_get_real(), the output of which may jump forward or
     backward due to adjustments to the time of day.

 (3) If the current time jumps forward too much or jumps backwards, the
     generator function will crank the base of the time ring round one slot
     at a time (ie. a 1s period) until it catches up, spewing out VERSION
     packets as it goes.

Fix the problem by:

 (1) Only using time64_t.  There's no need for sub-second resolution.

 (2) Use ktime_get_seconds() rather than ktime_get_real() so that time
     isn't perceived to go backwards.

 (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
     parts:

     (a) The "worker" function that manages the buckets and the timer.

     (b) The "dispatch" function that takes the pending peers and
     	 potentially transmits a keepalive packet before putting them back
     	 in the ring into the slot appropriate to the revised last-Tx time.

 (4) Taking everything that's pending out of the ring and splicing it into
     a temporary collector list for processing.

     In the case that there's been a significant jump forward, the ring
     gets entirely emptied and then the time base can be warped forward
     before the peers are processed.

     The warping can't happen if the ring isn't empty because the slot a
     peer is in is keepalive-time dependent, relative to the base time.

 (5) Limit the number of iterations of the bucket array when scanning it.

 (6) Set the timer to skip any empty slots as there's no point waking up if
     there's nothing to do yet.

This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started.  The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:

	watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
	...
	Workqueue: krxrpcd rxrpc_peer_keepalive_worker
	EIP: lock_acquire+0x69/0x80
	...
	Call Trace:
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? _raw_spin_lock_bh+0x29/0x60
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
	 ? __lock_acquire+0x3d3/0x870
	 ? process_one_work+0x110/0x340
	 ? process_one_work+0x166/0x340
	 ? process_one_work+0x110/0x340
	 ? worker_thread+0x39/0x3c0
	 ? kthread+0xdb/0x110
	 ? cancel_delayed_work+0x90/0x90
	 ? kthread_stop+0x70/0x70
	 ? ret_from_fork+0x19/0x24

Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

330bdcfa

08 8月, 2018 4 次提交

llc: use refcount_inc_not_zero() for llc_sap_find() · 0dcb8225

由 Cong Wang 提交于 8月 07, 2018

llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.

Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.

Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dcb8225

dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart() · 61ef4b07

由 Alexey Kodanev 提交于 8月 07, 2018

The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].

In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.

When comparing delta and RTO there is a minor difference between TCP
and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.

[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G        W   E     4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503]  dump_stack+0xf1/0x17b
[40851.451331]  ? show_regs_print_info+0x5/0x5
[40851.503555]  ubsan_epilogue+0x9/0x7c
[40851.548363]  __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109]  ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796]  ? xfrm4_output_finish+0x80/0x80
[40851.739827]  ? lock_downgrade+0x6d0/0x6d0
[40851.789744]  ? xfrm4_prepare_output+0x160/0x160
[40851.845912]  ? ip_queue_xmit+0x810/0x1db0
[40851.895845]  ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530]  ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063]  dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254]  dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412]  dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454]  ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833]  ? sched_clock+0x5/0x10
[40852.298508]  ? sched_clock+0x5/0x10
[40852.342194]  ? inet_create+0xdf0/0xdf0
[40852.388988]  sock_sendmsg+0xd9/0x160
...

Fixes: 113ced1f ("dccp ccid-2: Perform congestion-window validation")
Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61ef4b07

tipc: fix an interrupt unsafe locking scenario · 37436d9c

由 Ying Xue 提交于 8月 07, 2018

Commit 9faa89d4 ("tipc: make function tipc_net_finalize() thread
safe") tries to make it thread safe to set node address, so it uses
node_list_lock lock to serialize the whole process of setting node
address in tipc_net_finalize(). But it causes the following interrupt
unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rht_deferred_worker()
  rhashtable_rehash_table()
  lock(&(&ht->lock)->rlock)
			       tipc_nl_compat_doit()
                               tipc_net_finalize()
                               local_irq_disable();
                               lock(&(&tn->node_list_lock)->rlock);
                               tipc_sk_reinit()
                               rhashtable_walk_enter()
                               lock(&(&ht->lock)->rlock);
  <Interrupt>
  tipc_disc_rcv()
  tipc_node_check_dest()
  tipc_node_create()
  lock(&(&tn->node_list_lock)->rlock);

 *** DEADLOCK ***

When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't
disable BH. So if an interrupt happens after the lock, it can create
an inverse lock ordering between ht->lock and tn->node_list_lock. As
a consequence, deadlock might happen.

The reason causing the inverse lock ordering scenario above is because
the initial purpose of node_list_lock is not designed to do the
serialization of node address setting.

As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic,
we use it to replace node_list_lock to ensure setting node address can
be atomically finished. It turns out the potential deadlock can be
avoided as well.

Fixes: 9faa89d4 ("tipc: make function tipc_net_finalize() thread safe")
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <maloy@donjonn.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37436d9c

vsock: split dwork to avoid reinitializations · 455f05ec

由 Cong Wang 提交于 8月 06, 2018

syzbot reported that we reinitialize an active delayed
work in vsock_stream_connect():

	ODEBUG: init active (active state 0) object type: timer_list hint:
	delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
	WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
	debug_print_object+0x16a/0x210 lib/debugobjects.c:326

The pattern is apparently wrong, we should only initialize
the dealyed work once and could repeatly schedule it. So we
have to move out the initializations to allocation side.
And to avoid confusion, we can split the shared dwork
into two, instead of re-using the same one.

Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
Cc: Andy king <acking@vmware.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

455f05ec

07 8月, 2018 1 次提交

packet: refine ring v3 block size test to hold one frame · 4576cd46

由 Willem de Bruijn 提交于 8月 06, 2018

TPACKET_V3 stores variable length frames in fixed length blocks.
Blocks must be able to store a block header, optional private space
and at least one minimum sized frame.

Frames, even for a zero snaplen packet, store metadata headers and
optional reserved space.

In the block size bounds check, ensure that the frame of the
chosen configuration fits. This includes sockaddr_ll and optional
tp_reserve.

Syzbot was able to construct a ring with insuffient room for the
sockaddr_ll in the header of a zero-length frame, triggering an
out-of-bounds write in dev_parse_header.

Convert the comparison to less than, as zero is a valid snap len.
This matches the test for minimum tp_frame_size immediately below.

Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Fixes: eb73190f ("net/packet: refine check for priv area size")
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4576cd46

06 8月, 2018 2 次提交

ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit · 82a40777

由 Xin Long 提交于 8月 05, 2018

According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
device must be able to forward without further fragmentation while 576
bytes is the minimum size of IPv4 datagram every device has to be able
to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
value for the ipv4 min mtu check in ip6_tnl_xmit.

While at it, change to use max() instead of if statement.

Fixes: c9fefa08 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
Reported-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82a40777

ipv6: fix double refcount of fib6_metrics · e70a3aad

由 Cong Wang 提交于 8月 02, 2018

All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt
of the "from" fib6_info, so there is no need to hold fib6_metrics
refcnt again, because fib6_metrics refcnt is only released when
fib6_info is gone, that is, they have the same life time, so the
whole fib6_metrics refcnt can be removed actually.

This fixes a kmemleak warning reported by Sabrina.

Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: NSabrina Dubroca <sd@queasysnail.net>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e70a3aad

05 8月, 2018 2 次提交

netlink: Don't shift on 64 for ngroups · 91874ecf

由 Dmitry Safonov 提交于 8月 05, 2018

It's legal to have 64 groups for netlink_sock.

As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.

The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.

Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: NNathan Chancellor <natechancellor@gmail.com>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91874ecf

net/smc: no cursor update send in state SMC_INIT · 5607016c

由 Ursula Braun 提交于 8月 03, 2018

If a writer blocked condition is received without data, the current
consumer cursor is immediately sent. Servers could already receive this
condition in state SMC_INIT without finished tx-setup. This patch
avoids sending a consumer cursor update in this case.
Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5607016c

04 8月, 2018 1 次提交

l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl() · f664e37d

由 Guillaume Nault 提交于 8月 03, 2018

If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
drop the reference taken by l2tp_session_get().

Fixes: ecd012e4 ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f664e37d

02 8月, 2018 4 次提交

Revert "net/ipv6: fix metrics leak" · e6aed040

由 David S. Miller 提交于 8月 01, 2018

This reverts commit df18b504.

This change causes other problems and use-after-free situations as
found by syzbot.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6aed040

rxrpc: Fix user call ID check in rxrpc_service_prealloc_one · c01f6c9b

由 YueHaibing 提交于 8月 01, 2018

There just check the user call ID isn't already in use, hence should
compare user_call_ID with xcall->user_call_ID, which is current
node's user_call_ID.

Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Suggested-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c01f6c9b

net: dsa: Do not suspend/resume closed slave_dev · a94c689e

由 Florian Fainelli 提交于 7月 31, 2018

If a DSA slave network device was previously disabled, there is no need
to suspend or resume it.

Fixes: 24462549 ("net: dsa: allow switch drivers to implement suspend/resume hooks")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a94c689e

netlink: Fix spectre v1 gadget in netlink_create() · bc5b6c0b

由 Jeremy Cline 提交于 7月 31, 2018

'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.

This addresses the following accesses detected with the help of smatch:

* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
  spectre issue 'nlk_cb_mutex_keys' [w]

* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
  spectre issue 'nlk_cb_mutex_key_strings' [w]

* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
  issue 'nl_table' [w] (local cap)

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NJeremy Cline <jcline@redhat.com>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc5b6c0b

01 8月, 2018 2 次提交

ipv4: frags: handle possible skb truesize change · 4672694b

由 Eric Dumazet 提交于 7月 30, 2018

ip_frag_queue() might call pskb_pull() on one skb that
is already in the fragment queue.

We need to take care of possible truesize change, or we
might have an imbalance of the netns frags memory usage.

IPv6 is immune to this bug, because RFC5722, Section 4,
amended by Errata ID 3089 states :

  When reassembling an IPv6 datagram, if
  one or more its constituent fragments is determined to be an
  overlapping fragment, the entire datagram (and any constituent
  fragments) MUST be silently discarded.

Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4672694b

inet: frag: enforce memory limits earlier · 56e2c94f

由 Eric Dumazet 提交于 7月 30, 2018

We currently check current frags memory usage only when
a new frag queue is created. This allows attackers to first
consume the memory budget (default : 4 MB) creating thousands
of frag queues, then sending tiny skbs to exceed high_thresh
limit by 2 to 3 order of magnitude.

Note that before commit 648700f7 ("inet: frags: use rhashtables
for reassembly units"), work queue could be starved under DOS,
getting no cpu cycles.
After commit 648700f7, only the per frag queue timer can eventually
remove an incomplete frag queue and its skbs.

Fixes: b13d3cbf ("inet: frag: move eviction of queues to work queue")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NJann Horn <jannh@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Peter Oskolkov <posk@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56e2c94f

31 7月, 2018 3 次提交

net: xsk: don't return frames via the allocator on error · 2d55d614

由 Jakub Kicinski 提交于 7月 27, 2018

xdp_return_buff() is used when frame has been successfully
handled (transmitted) or if an error occurred during delayed
processing and there is no way to report it back to
xdp_do_redirect().

In case of __xsk_rcv_zc() error is propagated all the way
back to the driver, so there is no need to call
xdp_return_buff().  Driver will recycle the frame anyway
after seeing that error happened.

Fixes: 173d3adb ("xsk: add zero-copy support for Rx")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

2d55d614

netlink: Don't shift with UB on nlk->ngroups · 61f4b237

由 Dmitry Safonov 提交于 7月 30, 2018

On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.

Fixes: 7acf9d42 ("netlink: Do not subscribe to non-existent groups").
Reported-by: Nkernel test robot <rong.a.chen@intel.com>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61f4b237

net/ipv6: fix metrics leak · df18b504

由 Sabrina Dubroca 提交于 7月 30, 2018

Since commit d4ead6b3 ("net/ipv6: move metrics from dst to
rt6_info"), ipv6 metrics are shared and refcounted. rt6_set_from()
assigns the rt->from pointer and increases the refcount on from's
metrics. This reference is never released.

Introduce the fib6_metrics_release() helper and use it to release the
metrics.

Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df18b504

30 7月, 2018 2 次提交

openvswitch: meter: Fix setting meter id for new entries · 25432eba

由 Justin Pettit 提交于 7月 28, 2018

The meter code would create an entry for each new meter.  However, it
would not set the meter id in the new entry, so every meter would appear
to have a meter id of zero.  This commit properly sets the meter id when
adding the entry.

Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
Signed-off-by: NJustin Pettit <jpettit@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25432eba

netlink: Do not subscribe to non-existent groups · 7acf9d42

由 Dmitry Safonov 提交于 7月 27, 2018

Make ABI more strict about subscribing to group > ngroups.
Code doesn't check for that and it looks bogus.
(one can subscribe to non-existing group)
Still, it's possible to bind() to all possible groups with (-1)

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7acf9d42

29 7月, 2018 6 次提交

tcp_bbr: fix bw probing to raise in-flight data for very small BDPs · 383d4709

由 Neal Cardwell 提交于 7月 27, 2018

For some very small BDPs (with just a few packets) there was a
quantization effect where the target number of packets in flight
during the super-unity-gain (1.25x) phase of gain cycling was
implicitly truncated to a number of packets no larger than the normal
unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow
scenarios some flows could get stuck with a lower bandwidth, because
they did not push enough packets inflight to discover that there was
more bandwidth available. This was really only an issue in multi-flow
LAN scenarios, where RTTs and BDPs are low enough for this to be an
issue.

This fix ensures that gain cycling can raise inflight for small BDPs
by ensuring that in PROBE_BW mode target inflight values with a
super-unity gain are always greater than inflight values with a gain
<= 1. Importantly, this applies whether the inflight value is
calculated for use as a cwnd value, or as a target inflight value for
the end of the super-unity phase in bbr_is_next_cycle_phase() (both
need to be bigger to ensure we can probe with more packets in flight
reliably).

This is a candidate fix for stable releases.

Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NPriyaranjan Jha <priyarjha@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

383d4709

net: socket: Fix potential spectre v1 gadget in sock_is_registered · e978de7a

由 Jeremy Cline 提交于 7月 27, 2018

'family' can be a user-controlled value, so sanitize it after the bounds
check to avoid speculative out-of-bounds access.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJeremy Cline <jcline@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e978de7a

net: socket: fix potential spectre v1 gadget in socketcall · c8e8cd57

由 Jeremy Cline 提交于 7月 27, 2018

'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.

Found with the help of Smatch:

net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJeremy Cline <jcline@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8e8cd57

ipv4: remove BUG_ON() from fib_compute_spec_dst · 9fc12023

由 Lorenzo Bianconi 提交于 7月 27, 2018

Remove BUG_ON() from fib_compute_spec_dst routine and check
in_dev pointer during flowi4 data structure initialization.
fib_compute_spec_dst routine can be run concurrently with device removal
where ip_ptr net_device pointer is set to NULL. This can happen
if userspace enables pkt info on UDP rx socket and the device
is removed while traffic is flowing

Fixes: 35ebf65e ("ipv4: Create and use fib_compute_spec_dst() helper")
Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fc12023

bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog() · 71eb5255

由 Taehee Yoo 提交于 7月 29, 2018

bpf_parse_prog() is protected by rcu_read_lock().
so that GFP_KERNEL is not allowed in the bpf_parse_prog().

[51015.579396] =============================
[51015.579418] WARNING: suspicious RCU usage
[51015.579444] 4.18.0-rc6+ #208 Not tainted
[51015.579464] -----------------------------
[51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section!
[51015.579510] other info that might help us debug this:
[51015.579532] rcu_scheduler_active = 2, debug_locks = 1
[51015.579556] 2 locks held by ip/1861:
[51015.579577]  #0: 00000000a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910
[51015.579711]  #1: 00000000bf815f8e (rcu_read_lock){....}, at: lwtunnel_build_state+0x96/0x390
[51015.579842] stack backtrace:
[51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208
[51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[51015.579911] Call Trace:
[51015.579950]  dump_stack+0x74/0xbb
[51015.580000]  ___might_sleep+0x16b/0x3a0
[51015.580047]  __kmalloc_track_caller+0x220/0x380
[51015.580077]  kmemdup+0x1c/0x40
[51015.580077]  bpf_parse_prog+0x10e/0x230
[51015.580164]  ? kasan_kmalloc+0xa0/0xd0
[51015.580164]  ? bpf_destroy_state+0x30/0x30
[51015.580164]  ? bpf_build_state+0xe2/0x3e0
[51015.580164]  bpf_build_state+0x1bb/0x3e0
[51015.580164]  ? bpf_parse_prog+0x230/0x230
[51015.580164]  ? lock_is_held_type+0x123/0x1a0
[51015.580164]  lwtunnel_build_state+0x1aa/0x390
[51015.580164]  fib_create_info+0x1579/0x33d0
[51015.580164]  ? sched_clock_local+0xe2/0x150
[51015.580164]  ? fib_info_update_nh_saddr+0x1f0/0x1f0
[51015.580164]  ? sched_clock_local+0xe2/0x150
[51015.580164]  fib_table_insert+0x201/0x1990
[51015.580164]  ? lock_downgrade+0x610/0x610
[51015.580164]  ? fib_table_lookup+0x1920/0x1920
[51015.580164]  ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0
[51015.580164]  ? rtm_to_fib_config+0x637/0xbd0
[51015.580164]  inet_rtm_newroute+0xed/0x1b0
[51015.580164]  ? rtm_to_fib_config+0xbd0/0xbd0
[51015.580164]  rtnetlink_rcv_msg+0x331/0x910
[ ... ]

Fixes: 3a0af8fd ("bpf: BPF for lightweight tunnel infrastructure")
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

71eb5255

bpf: fix bpf_skb_load_bytes_relative pkt length check · 3eee1f75

由 Daniel Borkmann 提交于 7月 28, 2018

The len > skb_headlen(skb) cannot be used as a maximum upper bound
for the packet length since it does not have any relation to the full
linear packet length when filtering is used from upper layers (e.g.
in case of reuseport BPF programs) as by then skb->data, skb->len
already got mangled through __skb_pull() and others.

Fixes: 4e1ec56c ("bpf: add skb_load_bytes_relative helper")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMartin KaFai Lau <kafai@fb.com>

3eee1f75

27 7月, 2018 1 次提交

xdp: add NULL pointer check in __xdp_return() · 36e0f12b

由 Taehee Yoo 提交于 7月 26, 2018

rhashtable_lookup() can return NULL. so that NULL pointer
check routine should be added.

Fixes: 02b55e56 ("xdp: add MEM_TYPE_ZERO_COPY")
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

36e0f12b

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功