1. 11 Oct, 2018 9 commits
    •
      tipc: set link tolerance correctly in broadcast link · 047491ea
      Jon Maloy committed
      In the patch referred to below we added link tolerance as an additional
      criterion for declaring broadcast transmission "stale" and resetting the
      affected links.
      
      However, the 'tolerance' field of the broadcast link is never set, and
      remains at zero. This leaves the whole commit without its intended
      improvement, though luckily also without any negative effect.
      
      In this commit we add the missing initialization.
      
      Fixes: a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: ipv4: don't let PMTU updates increase route MTU · 28d35bcd
      Sabrina Dubroca committed
      When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is
      received, we must clamp its value. However, we can receive a PMTU
      exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an
      increase in PMTU.
      
      To fix this, take the smallest of the old MTU and ip_rt_min_pmtu.
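
The clamping rule can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: `clamp_pmtu()`, `struct pmtu_update` and the `MIN_PMTU` constant are hypothetical names, and 552 is only an assumed stand-in for the net.ipv4.route.min_pmtu setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for the net.ipv4.route.min_pmtu sysctl value. */
#define MIN_PMTU 552u

/* Hypothetical result type: what would be stored in the exception. */
struct pmtu_update {
	uint32_t mtu;   /* MTU value to record */
	bool     lock;  /* whether the MTU gets locked */
};

/* Clamp a received PMTU so that the result never exceeds the old MTU. */
static struct pmtu_update clamp_pmtu(uint32_t old_mtu, uint32_t new_pmtu)
{
	struct pmtu_update u = { .mtu = new_pmtu, .lock = false };

	assert(old_mtu > 0);
	if (new_pmtu < MIN_PMTU) {
		u.lock = true;
		/* Clamping to MIN_PMTU alone could raise the MTU above old_mtu. */
		u.mtu = old_mtu < MIN_PMTU ? old_mtu : MIN_PMTU;
	}
	return u;
}
```

With an old MTU of 500 and a received PMTU of 400, the sketch keeps 500 and sets the lock flag instead of raising the value to 552.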
      
      Before this patch, in case of an update, the exception's MTU would
      always change. Now, an exception can have only its lock flag updated,
      but not the MTU, so we need to add a check on locking to the following
      "is this exception getting updated, or close to expiring?" test.
      
      Fixes: d52e5a7e ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce
      Sabrina Dubroca committed
      Since commit 5aad1de5 ("ipv4: use separate genid for next hop
      exceptions"), exceptions get deprecated separately from cached
      routes. In particular, administrative changes don't clear PMTU anymore.
      
      As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
      on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
      the local MTU change can become stale:
       - if the local MTU is now lower than the PMTU, that PMTU is now
         incorrect
       - if the local MTU was the lowest value in the path, and is increased,
         we might discover a higher PMTU
      
      Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
      cases.
      
      If the exception was locked, the discovered PMTU was smaller than the
      minimal accepted PMTU. In that case, if the new local MTU is smaller
      than the current PMTU, let PMTU discovery figure out if locking of the
      exception is still needed.
      
      To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
      notifier. By the time the notifier is called, dev->mtu has been
      changed. This patch adds the old MTU as additional information in the
      notifier structure, and a new call_netdevice_notifiers_u32() function.
      
      Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net/ipv6: stop leaking percpu memory in fib6 info · 7abab7b9
      Mike Rapoport committed
      The fib6_info_alloc() function allocates percpu memory to hold per CPU
      pointers to rt6_info, but this memory is never freed. Fix it.
      
      Fixes: a64efe14 ("net/ipv6: introduce fib6_info struct and helpers")
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      rds: RDS (tcp) hangs on sendto() to unresponding address · 9a4890bd
      Ka-Cheong Poon committed
      In rds_send_mprds_hash(), if the calculated hash value is non-zero and
      the MPRDS connections are not yet up, it will wait.  But it should not
      wait if the send is non-blocking.  In this case, it should just use the
      base c_path for sending the message.
      
      Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: make skb_partial_csum_set() more robust against overflows · 52b5d6f5
      Eric Dumazet committed
      syzbot managed to crash in skb_checksum_help() [1] :
      
              BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));
      
      Root cause is the following check in skb_partial_csum_set()
      
      	if (unlikely(start > skb_headlen(skb)) ||
      	    unlikely((int)start + off > skb_headlen(skb) - 2))
      		return false;
      
      If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff
      and the check fails to detect that ((int)start + off) is off the limit,
      since the compare is unsigned.
      
      When we fix that, then the first condition (start > skb_headlen(skb))
      becomes obsolete.
      
      Then we should also check that (skb_headroom(skb) + start) won't
      overflow a 16-bit field.
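
The wraparound described above can be reproduced outside the kernel. The sketch below is an illustration under assumptions: both function names are hypothetical, the old check is transcribed with the implicit unsigned conversion written out, and the new check follows the description in this message (end of the checksum field within the head, start offset within 16 bits) rather than quoting the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The checksum field that must fit inside the head is two bytes wide. */
static_assert(sizeof(uint16_t) == 2, "checksum field is 16 bits");

/* Old-style check: headlen - 2 is unsigned and wraps when headlen < 2. */
static bool csum_range_ok_old(uint16_t start, uint16_t off,
			      unsigned int headlen)
{
	if (start > headlen ||
	    (unsigned int)((int)start + off) > headlen - 2)
		return false;
	return true;
}

/* Overflow-safe check: compute the end offset instead of subtracting. */
static bool csum_range_ok_new(uint16_t start, uint16_t off,
			      unsigned int headroom, unsigned int headlen)
{
	uint32_t csum_end = (uint32_t)start + (uint32_t)off + sizeof(uint16_t);
	uint32_t csum_start = headroom + (uint32_t)start;

	return csum_start <= UINT16_MAX && csum_end <= headlen;
}
```

With headlen equal to 1, the old check accepts start = 0 and off = 0 because the right-hand side wraps to 0xffffffff, while the new check rejects it.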
      
      [1]
      kernel BUG at net/core/dev.c:2880!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880
      Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf
      RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293
      RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7
      RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006
      RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000
      R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001
      R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8
      FS:  00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269
       validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312
       __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
       packet_snd net/packet/af_packet.c:2928 [inline]
       packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953
      
      Fixes: 5ff8dda3 ("net: Ensure partial checksum offset is inside the skb head")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      devlink: Add helper function for safely copy string param · bde74ad1
      Moshe Shemesh committed
      The devlink string param buffer is allocated at the size of
      DEVLINK_PARAM_MAX_STRING_VALUE. Add a helper function that makes sure
      this size is not exceeded.
      Rename DEVLINK_PARAM_MAX_STRING_VALUE to
      __DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by
      devlink only. The driver should use the helper function instead to
      verify it doesn't exceed the allowed length.
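
A bounded string copy of this kind might look like the following userspace sketch. Everything here is hypothetical: `struct param_value`, `param_value_str_fill()` and the 32-byte size are illustrative only, and rejecting oversized input is a choice made for this sketch, not a statement about how the devlink helper reports that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size; the real limit is private to devlink. */
#define PARAM_MAX_STRING_VALUE 32

/* Hypothetical stand-in for the string member of a param value. */
struct param_value {
	char vstr[PARAM_MAX_STRING_VALUE];
};

/* Copy src into dst; return 0 on success, -1 if src would not fit. */
static int param_value_str_fill(struct param_value *dst, const char *src)
{
	size_t len;

	assert(dst && src);
	len = strlen(src);
	if (len >= sizeof(dst->vstr))
		return -1;              /* leave dst untouched */
	memcpy(dst->vstr, src, len + 1); /* include the terminating NUL */
	return 0;
}
```

Callers never see the buffer size, which is the point of hiding the constant behind a helper.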
      
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      devlink: Fix param cmode driverinit for string type · 1276534c
      Moshe Shemesh committed
      The driverinit configuration mode value is held by devlink so that the
      driver can fetch the value after a reload command. If the param type is
      string, devlink should copy the value from the driver string buffer to
      the devlink string buffer on devlink_param_driverinit_value_set() and
      vice-versa on devlink_param_driverinit_value_get().
      
      Fixes: ec01aeb1 ("devlink: Add support for get/set driverinit value")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      devlink: Fix param set handling for string type · f355cfcd
      Moshe Shemesh committed
      If the devlink param type is string, devlink needs to copy the string
      value it got from the input to devlink_param_value.
      
      Fixes: e3b7ca18 ("devlink: Add param set command")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Oct, 2018 3 commits
    •
      rxrpc: Fix the packet reception routine · c1e15b49
      David Howells committed
      The rxrpc_input_packet() function and its call tree were built around the
      assumption that the data_ready() handler, called from UDP to inform a kernel
      service that there is data to be had, was non-reentrant.  This meant that
      certain locking could be dispensed with.
      
      This, however, turns out not to be the case with a multi-queue network card
      that can deliver packets to multiple cpus simultaneously.  Each of those
      cpus can be in the rxrpc_input_packet() function at the same time.
      
      Fix by adding or changing some structure members:
      
       (1) Add peer->rtt_input_lock to serialise access to the RTT buffer.
      
       (2) Make conn->service_id into a 32-bit variable so that it can be
           cmpxchg'd on all arches.
      
       (3) Add call->input_lock to serialise access to the Rx/Tx state.  Note
           that although the Rx and Tx states are (almost) entirely separate,
           there's no point completing the separation and having separate locks
           since it's a bi-phasal RPC protocol rather than a bi-direction
           streaming protocol.  Data transmission and data reception do not take
           place simultaneously on any particular call.
      
      and making the following functional changes:
      
       (1) In rxrpc_input_data(), hold call->input_lock around the core to
           prevent simultaneous producing of packets into the Rx ring and
           updating of tracking state for a particular call.
      
       (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
           check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
           The bit test and bit clear can then be combined.  No further locking
           is needed here.
      
       (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
           the ACK packet.  The superseded ACK check is then done both before and
           after the lock is taken.
      
           The handling of ackinfo data is split, parsing before the lock is taken
           and processing with it held.  This is keyed on rxMTU being non-zero.
      
           Congestion management is also done within the locked section.
      
       (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
           rotation.  The ACKALL packet carries no information and is only really
           useful after all packets have been transmitted since it's imprecise.
      
       (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
           prevent calls being simultaneously implicitly ended on two cpus and
           also to prevent any races with incoming call setup.
      
       (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
           on a connection.  It is only permitted to happen once for a
           connection.
      
       (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
           rx->incoming_lock to see if someone else set up the call, connection
           or peer whilst we were getting there.  We can't trust the values from
           the earlier routing check unless we pin refs on them - which we want
           to avoid.
      
           Further, we need to allow for an incoming call to have its state
           changed on another CPU between us making it live and us adjusting it
           because the conn is now in the RXRPC_CONN_SERVICE state.
      
       (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
           to the RTT buffer.  Don't need to lock around setting peer->rtt.
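
The once-only service upgrade in item (6) relies on compare-and-swap. Below is a minimal userspace sketch of that pattern using C11 atomics as a stand-in for the kernel's cmpxchg(); `struct conn` and `upgrade_service()` are hypothetical names, not the rxrpc definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connection record holding a 32-bit service ID. */
struct conn {
	_Atomic uint32_t service_id;
};

/*
 * Try to switch the service ID from old_id to new_id.  If several CPUs
 * race here, the compare-and-swap succeeds for exactly one of them.
 */
static bool upgrade_service(struct conn *c, uint32_t old_id, uint32_t new_id)
{
	uint32_t expected = old_id;

	assert(c);
	return atomic_compare_exchange_strong(&c->service_id, &expected, new_id);
}
```

A second caller that still believes the ID is old_id gets false back and can then decide whether the value that won is acceptable.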
      
      For reference, the inventory of state-accessing or state-altering functions
      used by the packet input procedure is:
      
      > rxrpc_input_packet()
        * PACKET CHECKING
      
        * ROUTING
          > rxrpc_post_packet_to_local()
          > rxrpc_find_connection_rcu() - uses RCU
            > rxrpc_lookup_peer_rcu() - uses RCU
            > rxrpc_find_service_conn_rcu() - uses RCU
            > idr_find() - uses RCU
      
        * CONNECTION-LEVEL PROCESSING
          - Service upgrade
            - Can only happen once per conn
            ! Changed to use cmpxchg
          > rxrpc_post_packet_to_conn()
          - Setting conn->hi_serial
            - Probably safe not using locks
            - Maybe use cmpxchg
      
        * CALL-LEVEL PROCESSING
          > Old-call checking
            > rxrpc_input_implicit_end_call()
              > rxrpc_call_completed()
              > rxrpc_queue_call()
              ! Need to take rx->incoming_lock
              > __rxrpc_disconnect_call()
              > rxrpc_notify_socket()
          > rxrpc_new_incoming_call()
            - Uses rx->incoming_lock for the entire process
              - Might be able to drop this earlier in favour of the call lock
            > rxrpc_incoming_call()
              ! Conflicts with rxrpc_input_implicit_end_call()
          > rxrpc_send_ping()
            - Don't need locks to check rtt state
            > rxrpc_propose_ACK
      
        * PACKET DISTRIBUTION
          > rxrpc_input_call_packet()
            > rxrpc_input_data()
              * QUEUE DATA PACKET ON CALL
              > rxrpc_reduce_call_timer()
                - Uses timer_reduce()
              ! Needs call->input_lock()
              > rxrpc_receiving_reply()
                ! Needs locking around ack state
                > rxrpc_rotate_tx_window()
                > rxrpc_end_tx_phase()
              > rxrpc_proto_abort()
              > rxrpc_input_dup_data()
              - Fills the Rx buffer
              - rxrpc_propose_ACK()
              - rxrpc_notify_socket()
      
            > rxrpc_input_ack()
              * APPLY ACK PACKET TO CALL AND DISCARD PACKET
              > rxrpc_input_ping_response()
                - Probably doesn't need any extra locking
                ! Need READ_ONCE() on call->ping_serial
                > rxrpc_input_check_for_lost_ack()
                  - Takes call->lock to consult Tx buffer
                > rxrpc_peer_add_rtt()
                  ! Needs to take a lock (peer->rtt_input_lock)
                  ! Could perhaps manage with cmpxchg() and xadd() instead
              > rxrpc_input_requested_ack
                - Consults Tx buffer
                  ! Probably needs a lock
                > rxrpc_peer_add_rtt()
              > rxrpc_propose_ack()
              > rxrpc_input_ackinfo()
                - Changes call->tx_winsize
                  ! Use cmpxchg to handle change
                  ! Should perhaps track serial number
                - Uses peer->lock to record MTU specification changes
              > rxrpc_proto_abort()
              ! Need to take call->input_lock
              > rxrpc_rotate_tx_window()
              > rxrpc_end_tx_phase()
              > rxrpc_input_soft_acks()
              - Consults the Tx buffer
              > rxrpc_congestion_management()
                - Modifies the Tx annotations
                ! Needs call->input_lock()
                > rxrpc_queue_call()
      
            > rxrpc_input_abort()
              * APPLY ABORT PACKET TO CALL AND DISCARD PACKET
              > rxrpc_set_call_completion()
              > rxrpc_notify_socket()
      
            > rxrpc_input_ackall()
              * APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
              ! Need to take call->input_lock
              > rxrpc_rotate_tx_window()
              > rxrpc_end_tx_phase()
      
          > rxrpc_reject_packet()
      
      There are some functions used by the above that queue the packet, after
      which the procedure is terminated:
      
       - rxrpc_post_packet_to_local()
         - local->event_queue is an sk_buff_head
         - local->processor is a work_struct
       - rxrpc_post_packet_to_conn()
         - conn->rx_queue is an sk_buff_head
         - conn->processor is a work_struct
       - rxrpc_reject_packet()
         - local->reject_queue is an sk_buff_head
         - local->processor is a work_struct
      
      And some that offload processing to process context:
      
       - rxrpc_notify_socket()
         - Uses RCU lock
         - Uses call->notify_lock to call call->notify_rx
         - Uses call->recvmsg_lock to queue recvmsg side
       - rxrpc_queue_call()
         - call->processor is a work_struct
       - rxrpc_propose_ACK()
         - Uses call->lock to wrap __rxrpc_propose_ACK()
      
      And a bunch that complete a call, all of which use call->state_lock to
      protect the call state:
      
       - rxrpc_call_completed()
       - rxrpc_set_call_completion()
       - rxrpc_abort_call()
       - rxrpc_proto_abort()
         - Also uses rxrpc_queue_call()
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: David Howells <dhowells@redhat.com>
    •
      rxrpc: Fix connection-level abort handling · 64753092
      David Howells committed
      Fix connection-level abort handling to cache the abort and error codes
      properly so that a new incoming call can be properly aborted if it races
      with the parent connection being aborted by another CPU.
      
      The abort_code and error parameters can then be dropped from
      rxrpc_abort_calls().
      
      Fixes: f5c17aae ("rxrpc: Calls should only have one terminal state")
      Signed-off-by: David Howells <dhowells@redhat.com>
    •
      rxrpc: Only take the rwind and mtu values from latest ACK · 298bc15b
      David Howells committed
      Move the out-of-order and duplicate ACK packet check to before the call to
      rxrpc_input_ackinfo() so that the receive window size and MTU size are only
      checked in the latest ACK packet and don't regress.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
  3. 08 Oct, 2018 6 commits
    •
      rxrpc: Carry call state out of locked section in rxrpc_rotate_tx_window() · dfe99522
      David Howells committed
      Carry the call state out of the locked section in rxrpc_rotate_tx_window()
      rather than sampling it afterwards.  This is only used to select tracepoint
      data, but could have changed by the time we do the tracepoint.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    •
      rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window() · c479d5f2
      David Howells committed
      We should only call the function to end a call's Tx phase if we rotated the
      marked-last packet out of the transmission buffer.
      
      Make rxrpc_rotate_tx_window() return an indication of whether it just
      rotated the packet marked as the last out of the transmit buffer, carrying
      the information out of the locked section in that function.
      
      We can then check the return value instead of examining RXRPC_CALL_TX_LAST.
      
      Fixes: 70790dbe ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
      Signed-off-by: David Howells <dhowells@redhat.com>
    •
      rxrpc: Don't need to take the RCU read lock in the packet receiver · bfd28211
      David Howells committed
      We don't need to take the RCU read lock in the rxrpc packet receive
      function because it's held further up the stack in the IP input routine
      around the UDP receive routines.
      
      Fix this by dropping the RCU read lock calls from rxrpc_input_packet().
      This simplifies the code.
      
      Fixes: 70790dbe ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
      Signed-off-by: David Howells <dhowells@redhat.com>
    •
      rxrpc: Use the UDP encap_rcv hook · 5271953c
      David Howells committed
      Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception
      in which a packet is placed onto the UDP receive queue and then immediately
      removed again by rxrpc.  Going via the queue in this manner seems like it
      should be unnecessary.
      
      This does, however, require the invention of a value to place in encap_type
      as that's one of the conditions to switch packets out to the encap_rcv
      hook.  Possibly the value doesn't actually matter for anything other than
      sockopts on the UDP socket, which aren't accessible outside of rxrpc
      anyway.
      
      This seems to cut a bit of time out of the time elapsed between each
      sk_buff being timestamped and turning up in rxrpc (the final number in the
      following trace excerpts).  I measured this by making the rxrpc_rx_packet
      trace point print the time elapsed between the skb being timestamped and
      the current time (in ns), e.g.:
      
      	... 424.278721: rxrpc_rx_packet: ...  ACK 25026
      
      So doing a 512MiB DIO read from my test server, with an unmodified kernel:
      
      	N       min     max     sum		mean    stddev
      	27605   2626    7581    7.83992e+07     2840.04 181.029
      
      and with the patch applied:
      
      	N       min     max     sum		mean    stddev
      	27547   1895    12165   6.77461e+07     2459.29 255.02
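
The mean column can be rechecked from the sum and N columns. The sketch below is only arithmetic on the figures quoted above; `mean_ns()` and `relative_reduction()` are hypothetical helper names.

```c
#include <assert.h>

/* Mean latency in ns, recomputed from the reported sum and sample count. */
static double mean_ns(double sum_ns, unsigned long n)
{
	assert(n > 0);
	return sum_ns / (double)n;
}

/* Fractional reduction of the mean when going from 'before' to 'after'. */
static double relative_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before;
}
```

Feeding in 7.83992e+07 / 27605 and 6.77461e+07 / 27547 reproduces the reported means of about 2840 ns and 2459 ns, a reduction of roughly 13%. The max and stddev columns are higher with the patch in this run.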
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    •
      net: sched: cls_u32: fix hnode refcounting · 6d4c4077
      Al Viro committed
      cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
      via ->hlist and via ->tp_root together.  u32_destroy() drops the former
      and, in case when there had been links, leaves the sucker on the list.
      As the result, there's nothing to protect it from getting freed once links
      are dropped.
      That also makes the "is it busy" check incapable of catching the root
      hnode - it *is* busy (there's a reference from tp), but we don't see it as
      something separate.  "Is it our root?" check partially covers that, but
      the problem exists for others' roots as well.
      
      AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
      include oopsen, that is) would be this:
       * count tp->root and tp_c->hlist as separate references, i.e. have
         u32_init() set refcount to 2, not 1;
       * in u32_destroy() always drop the former; in u32_destroy_hnode(), the
         latter.
      
      That way we have *all* references contributing to refcount.  List
      removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
      and in u32_destroy() in case of tc_u_common going away, along with
      everything reachable from it.  IOW, that way we know that
      u32_destroy_key() won't free something still on the list (or pointed to by
      someone's ->root).
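
The idea of counting the root pointer and the list membership as two separate references can be sketched as a toy model. This is not the cls_u32 code: `struct hnode` here is a hypothetical stand-in, and the `freed` flag merely marks the point where the real node would be released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hash node with a plain reference count. */
struct hnode {
	int  refcnt;
	bool freed;   /* marks where the real node would be freed */
};

/* Start with two references: one for the root pointer, one for the list. */
static void hnode_init(struct hnode *h)
{
	h->refcnt = 2;
	h->freed = false;
}

/* A link from another key takes an extra reference. */
static void hnode_get(struct hnode *h)
{
	assert(!h->freed);
	h->refcnt++;
}

/* Drop one reference; the node is released only when none remain. */
static void hnode_put(struct hnode *h)
{
	assert(h->refcnt > 0);
	if (--h->refcnt == 0)
		h->freed = true;
}
```

In this model, dropping the root reference and then the last link still leaves the list reference, so the node cannot be released while it sits on the list.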
      
      Reproducer:
      
      tc qdisc add dev eth0 ingress
      tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: \
      u32 divisor 1
      tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: \
      u32 divisor 1
      tc filter add dev eth0 parent ffff: protocol ip prio 100 \
      handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \
      plus 0 eat match ip protocol 6 ff
      tc filter delete dev eth0 parent ffff: protocol ip prio 200
      tc filter change dev eth0 parent ffff: protocol ip prio 100 \
      handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \
      eat match ip protocol 6 ff
      tc filter delete dev eth0 parent ffff: protocol ip prio 100
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      udp: Unbreak modules that rely on external __skb_recv_udp() availability · 7e823644
      Jiri Kosina committed
      Commit 2276f58a ("udp: use a separate rx queue for packet reception")
      turned static inline __skb_recv_udp() from being a trivial helper around
      __skb_recv_datagram() into a UDP specific implementation, making it
      EXPORT_SYMBOL_GPL() at the same time.
      
      There are external modules that got broken by __skb_recv_udp() not being
      visible to them. Let's unbreak them by making __skb_recv_udp EXPORT_SYMBOL().
      
      Rationale (one of several) for why this is actually the "technically
      correct" thing to do: __skb_recv_udp() used to be an inline wrapper around
      __skb_recv_datagram(), which itself (still, and correctly so, I believe)
      is EXPORT_SYMBOL().
      
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 06 Oct, 2018 4 commits
    •
      ipv6: take rcu lock in rawv6_send_hdrinc() · a688caa3
      Wei Wang committed
      In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
      directly assign the dst to skb and set the passed-in dst to NULL to avoid
      a double free.
      However, in the error case, we free skb and then do the stats update with
      the dst pointer passed in. This causes a use-after-free on the dst.
      Fix it by taking the rcu read lock right before dst could get released, to
      make sure dst does not get freed until the stats update is done.
      Note: we don't have this issue in ipv4 because dst is not used for the
      stats update in v4.
      
      Syzkaller reported the following crash:
      BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
      BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
      Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
      
      CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
       rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457099
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
      RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000
      
      Allocated by task 32088:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:105
       ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
       ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
       ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
       fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
       ip6_route_output include/net/ip6_route.h:88 [inline]
       ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
       rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5356:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x83/0x290 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:141
       dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
       __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
       rcu_do_batch kernel/rcu/tree.c:2576 [inline]
       invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
       __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
       rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
       __do_softirq+0x30b/0xad8 kernel/softirq.c:292
      
      Fixes: 1789a640 ("raw: avoid two atomics in xmit")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: sched: Add policy validation for tc attributes · 8b4c3cdd
      David Ahern committed
      A number of TC attributes are processed without proper validation
      (e.g., length checks). Add a tca policy for all input attributes and use
      it when invoking nlmsg_parse().
      
      The 2 Fixes tags below cover the latest additions. The other attributes
      are a string (KIND), a nested attribute (OPTIONS, which does seem to have
      validation in most cases), used for dumps only, or a flag.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Fixes: d47a6b0e ("net: sched: introduce ingress/egress block index attributes for qdisc")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b4c3cdd
    • M
      rtnetlink: fix rtnl_fdb_dump() for ndmsg header · bd961c9b
      Mauricio Faria de Oliveira committed
      Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
      which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
      
      The problem is, the function bails out early if nlmsg_parse() fails, which
      does occur for iproute2 usage of 'struct ndmsg' because the payload length
      is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
      
      This breaks backward compatibility with userspace -- nothing is sent back.
      
      Some examples with iproute2 and netlink library for go [1]:
      
       1) $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
            This one works, as it uses 'struct ifinfomsg'.
      
            fdb_show() @ iproute2/bridge/fdb.c
              """
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
              ...
              if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
              """
      
       2) $ ip --family bridge neigh
          RTNETLINK answers: Invalid argument
          Dump terminated
      
            This one fails, as it uses 'struct ndmsg'.
      
            do_show_or_flush() @ iproute2/ip/ipneigh.c
              """
              .n.nlmsg_type = RTM_GETNEIGH,
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
              """
      
       3) $ ./neighlist
          < no output >
      
            This one fails, as it is 'struct ndmsg'-based.
      
            neighList() @ netlink/neigh_linux.go
              """
              req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
              msg := Ndmsg{
              """
      
      The actual breakage was introduced by commit 0ff50e83 ("net: rtnetlink:
      bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
      if the payload length (with the _actual_ family header) is less than the
      family header length alone (which is assumed, in parameter 'hdrlen').
      This is true in the examples above with struct ndmsg, with size and payload
      length shorter than struct ifinfomsg.
      
      However, that commit just intends to fix something under the assumption the
      family header is indeed a 'struct ifinfomsg' - by preventing access to the
      payload as such (via 'ifm' pointer) if the payload length is not sufficient
      to actually contain it.
      
      The assumption was introduced by commit 5e6d2435 ("bridge: netlink dump
      interface at par with brctl"), to support iproute2's 'bridge fdb' command
      (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
      
      So, in order to unbreak the 'struct ndmsg' family headers and still allow
      'struct ifinfomsg' to continue to work, check for the known message sizes
      used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
      not used in this function anyway) then do not parse the data as ifinfomsg.
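
A rough userspace sketch of the size check described above, not the kernel patch itself. The helper name payload_is_ndmsg is hypothetical, and the single optional attribute is assumed here to carry a 32-bit value, which the commit message does not state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/netlink.h>    /* NLA_HDRLEN */
#include <linux/rtnetlink.h>  /* struct ifinfomsg */
#include <linux/neighbour.h>  /* struct ndmsg */

/* Premise taken from the commit message: ndmsg is shorter than ifinfomsg. */
static_assert(sizeof(struct ndmsg) < sizeof(struct ifinfomsg),
              "ndmsg is expected to be the shorter family header");

/*
 * Hypothetical helper: returns true when the netlink payload length matches
 * one of the two 'struct ndmsg' layouts mentioned in the text (bare header,
 * or header plus one attribute). The attribute is ASSUMED to hold a u32.
 */
static bool payload_is_ndmsg(size_t payload_len)
{
    const size_t bare = sizeof(struct ndmsg);
    const size_t with_attr = bare + NLA_HDRLEN + sizeof(uint32_t);

    return payload_len == bare || payload_len == with_attr;
}
```

A caller would skip the ifinfomsg-based parsing when payload_is_ndmsg() returns true, and parse with sizeof(struct ifinfomsg) as the header length otherwise.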
      
      Same examples with this patch applied (or revert/before the original fix):
      
          $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
          $ ip --family bridge neigh
          dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
          dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
          dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
      
          $ ./neighlist
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
      
      Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).
      
      References:
      
      [1] netlink library for go (test-case)
          https://github.com/vishvananda/netlink
      
          $ cat ~/go/src/neighlist/main.go
          package main
          import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
          func main() {
              neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
              for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
          }
      
          $ export GOPATH=~/go
          $ go get github.com/vishvananda/netlink
          $ go build neighlist
          $ ~/go/src/neighlist/neighlist
      
      Thanks to David Ahern for suggestions to improve this patch.
      
      Fixes: 0ff50e83 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
      Fixes: 5e6d2435 ("bridge: netlink dump interface at par with brctl")
      Reported-by: Aidan Obley <aobley@pivotal.io>
      Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd961c9b
    • S
      net: bpfilter: Fix type cast and pointer warnings · 33aa8da1
      Shanthosh RK committed
      Fixes the following Sparse warnings:
      
      net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
      of expression
      net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
      NULL pointer
      Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33aa8da1
  5. 05 Oct, 2018 4 commits
    • D
      rxrpc: Fix the data_ready handler · 2cfa2271
      David Howells committed
      Fix the rxrpc_data_ready() function to pick up all packets and to not miss
      any.  There are two problems:
      
       (1) The sk_data_ready pointer on the UDP socket is set *after* it is
           bound.  This means that it's open for business before we're ready to
           dequeue packets, and a tiny window exists in which a packet can
           sneak onto the receive queue without us ever knowing about it.
      
           Fix this by setting the pointers on the socket prior to binding it.
      
       (2) skb_recv_udp() will return an error (such as ENETUNREACH) if there was
           an error on the transmission side, even though we set the
           sk_error_report hook.  Because rxrpc_data_ready() returns immediately
           in such a case, it never actually removes its packet from the receive
           queue.
      
           Fix this by abstracting out the UDP dequeuing and checksumming into a
           separate function that keeps hammering on skb_recv_udp() until it
           returns -EAGAIN, passing the packets extracted to the remainder of the
           function.
      
      and two potential problems:
      
       (3) It might be possible in some circumstances or in the future for
           packets to be added to the UDP receive queue whilst rxrpc is
           consuming them, so the data_ready() handler might get called
           less often than once per packet.
      
           Allow for this by fully draining the queue on each call, as in (2).
      
       (4) If a packet fails the checksum check, the code currently returns after
           discarding the packet without checking for more.
      
           Allow for this by fully draining the queue on each call, as in (2).
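
The drain-until-EAGAIN pattern from (2) to (4) can be sketched in plain C as follows. This is a simplified model, not the rxrpc code: fake_queue, fake_recv() and drain_queue() are hypothetical names, and a transmission-side error is modelled as a queue entry, which is not how the kernel tracks socket errors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one event seen while dequeuing. */
struct fake_entry {
    int err;      /* non-zero: the dequeue call reports this error */
    int csum_ok;  /* zero: the packet fails checksum validation */
};

struct fake_queue {
    const struct fake_entry *entries;
    size_t len;
    size_t head;
};

/* Hypothetical analogue of a dequeue call: 0 on success, -EAGAIN when the
 * queue is empty, or another negative errno for a reported error. */
static int fake_recv(struct fake_queue *q, const struct fake_entry **out)
{
    assert(q->head <= q->len);
    if (q->head == q->len)
        return -EAGAIN;

    const struct fake_entry *e = &q->entries[q->head++];
    if (e->err)
        return e->err;
    *out = e;
    return 0;
}

/* Keep dequeuing until -EAGAIN; errors and bad checksums do not stop the loop. */
static size_t drain_queue(struct fake_queue *q)
{
    size_t delivered = 0;

    for (;;) {
        const struct fake_entry *e = NULL;
        int ret = fake_recv(q, &e);

        if (ret == -EAGAIN)
            break;          /* queue is empty: done for this call */
        if (ret < 0)
            continue;       /* reported error: try the next entry */
        if (!e->csum_ok)
            continue;       /* discard, but keep draining */
        delivered++;        /* hand the packet to the rest of the handler */
    }
    return delivered;
}

/* Usage example: good packet, error, bad checksum, good packet. */
static size_t demo_drain(void)
{
    static const struct fake_entry entries[] = {
        { 0, 1 }, { -ENETUNREACH, 0 }, { 0, 0 }, { 0, 1 },
    };
    struct fake_queue q = { entries, sizeof(entries) / sizeof(entries[0]), 0 };

    return drain_queue(&q);
}
```

The point of the loop shape is that only -EAGAIN terminates it, so neither an error return nor a discarded packet can leave later packets stranded on the queue.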
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      2cfa2271
    • D
      rxrpc: Fix some missed refs to init_net · 5e33a23b
      David Howells committed
      Fix some refs to init_net that should've been changed to the appropriate
      network namespace.
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      5e33a23b
    • J
      net/packet: fix packet drop as of virtio gso · 9d2f67e4
      Jianfeng Tan committed
      When we use a raw socket as the vhost backend, a packet from virtio with
      gso offloading information cannot be sent out in later validation at
      xmit path, as we did not set the correct skb->protocol, which is further
      used for looking up the gso function.
      
      To fix this, we set this field according to virtio hdr information.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d2f67e4
    • F
      openvswitch: load NAT helper · 17c357ef
      Flavio Leitner committed
      Load the respective NAT helper module if the flow uses it.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17c357ef
  6. 04 Oct, 2018 1 commit
  7. 03 Oct, 2018 3 commits
  8. 02 Oct, 2018 6 commits
    • D
      bond: take rcu lock in netpoll_send_skb_on_dev · 6fe94878
      Dave Jones committed
      The bonding driver lacks the rcu lock when it calls down into
      netdev_lower_get_next_private_rcu from bond_poll_controller, which
      results in a trace like:
      
      WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
      CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
      Workqueue: bond0 bond_mii_monitor
      RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
      Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
      RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
      RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
      R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
      R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
      FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
      Call Trace:
       bond_poll_controller+0x52/0x170
       netpoll_poll_dev+0x79/0x290
       netpoll_send_skb_on_dev+0x158/0x2c0
       netpoll_send_udp+0x2d5/0x430
       write_ext_msg+0x1e0/0x210
       console_unlock+0x3c4/0x630
       vprintk_emit+0xfa/0x2f0
       printk+0x52/0x6e
       ? __netdev_printk+0x12b/0x220
       netdev_info+0x64/0x80
       ? bond_3ad_set_carrier+0xe9/0x180
       bond_select_active_slave+0x1fc/0x310
       bond_mii_monitor+0x709/0x9b0
       process_one_work+0x221/0x5e0
       worker_thread+0x4f/0x3b0
       kthread+0x100/0x140
       ? process_one_work+0x5e0/0x5e0
       ? kthread_delayed_work_timer_fn+0x90/0x90
       ret_from_fork+0x24/0x30
      
      We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
      before we call down into netpoll_poll_dev, so just take the lock there.
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fe94878
    • D
      rtnetlink: Fail dump if target netnsid is invalid · 893626d6
      David Ahern committed
      Link dumps can return results from a target namespace. If the namespace id
      is invalid, then the dump request should fail if get_target_net fails
      rather than continuing with a dump of the current namespace.
      
      Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      893626d6
    • F
      Revert "openvswitch: Fix template leak in error cases." · 7f6d6558
      Flavio Leitner committed
      This reverts commit 90c7afc9.
      
      When the commit was merged, the code used nf_ct_put() to free
      the entry, but later on commit 76644232 ("openvswitch: Free
      tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which
      is more appropriate. Now the original problem is removed.
      
      Then 44d6e2f2 ("net: Replace NF_CT_ASSERT() with WARN_ON().")
      replaced a debug assert with a WARN_ON() which is triggered now.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f6d6558
    • L
      tipc: ignore STATE_MSG on wrong link session · d949cfed
      LUU Duc Canh committed
      The initial session number when a link is created is based on a random
      value, taken from struct tipc_net->random. It is then incremented for
      each link reset to avoid mixing protocol messages from different link
      sessions.
      
      However, when a bearer is reset all its links are deleted, and will
      later be re-created using the same random value as the first time.
      This means that if the link never went down between creation and
      deletion we will still sometimes have two subsequent sessions with
      the same session number. In virtual environments with potentially
      long transmission times this has turned out to be a real problem.
      
      We now fix this by randomizing the session number each time a link
      is created.
      
      With a session number size of 16 bits this gives a risk of session
      collision of 1/64k. To reduce this further, we also introduce a sanity
      check on the very first STATE message arriving at a link. If this has
      an acknowledge value differing from 0, which is logically impossible,
      we ignore the message. The final risk for session collision is hence
      reduced to 1/4G, which should be sufficient.
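
The arithmetic behind the 1/64k and 1/4G figures, plus the sanity check on the first STATE message, can be sketched as below. The names collision_space() and accept_state_msg() are hypothetical, and the 1/4G figure is reproduced under the assumption that the acknowledge field contributes another 16 bits of effectively random data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of equally likely values when the session number and the extra
 * check together contribute session_bits + check_bits random bits. */
static uint64_t collision_space(unsigned session_bits, unsigned check_bits)
{
    assert(session_bits + check_bits < 64);
    return UINT64_C(1) << (session_bits + check_bits);
}

/*
 * Hypothetical acceptance rule: a STATE message is ignored when it carries
 * the wrong session number, or when it is the very first STATE message on
 * the link and acknowledges something other than 0.
 */
static bool accept_state_msg(uint16_t msg_session, uint16_t expected_session,
                             bool first_state_msg, uint16_t msg_ack)
{
    if (msg_session != expected_session)
        return false;
    if (first_state_msg && msg_ack != 0)
        return false;
    return true;
}
```

Under that assumption, collision_space(16, 0) gives 65536 (the 1/64k risk) and collision_space(16, 16) gives 4294967296 (the 1/4G risk).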
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d949cfed
    • D
      net: sched: act_ipt: check for underflow in __tcf_ipt_init() · aeadd93f
      Dan Carpenter committed
      If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we
      return -EINVAL.  But we don't check whether it's smaller than
      sizeof(struct xt_entry_target) and that could lead to an out of bounds
      read.
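
A generic sketch of a two-sided check on a size field supplied by userspace follows. It is not the act_ipt code: check_claimed_size() is a hypothetical helper, and comparing the upper bound against the received buffer length is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical validator for a size field embedded in a header.
 * claimed:  the size the header says the whole object has
 * hdr_size: the size of the fixed header itself
 * buf_len:  the number of bytes actually received (ASSUMED upper bound)
 */
static int check_claimed_size(size_t claimed, size_t hdr_size, size_t buf_len)
{
    if (claimed < hdr_size)
        return -EINVAL;   /* underflow: object smaller than its own header */
    if (claimed > buf_len)
        return -EINVAL;   /* overflow: object larger than the data we hold */
    return 0;
}

/* Usage example covering the three cases with illustrative sizes. */
static void self_check(void)
{
    assert(check_claimed_size(4, 32, 64) == -EINVAL);    /* too small */
    assert(check_claimed_size(128, 32, 64) == -EINVAL);  /* too large */
    assert(check_claimed_size(40, 32, 64) == 0);         /* in range */
}
```

Without the first comparison, later code that subtracts the header size from the claimed size would wrap around, which is the kind of underflow the commit title refers to.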
      
      Fixes: 7ba699c6 ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aeadd93f
    • E
      tcp/dccp: fix lockdep issue when SYN is backlogged · 1ad98e9d
      Eric Dumazet committed
      In normal SYN processing, packets are handled without listener
      lock and in RCU protected ingress path.
      
      But syzkaller is known to be able to trick us and SYN
      packets might be processed in process context, after being
      queued into socket backlog.
      
      In commit 06f877d6 ("tcp/dccp: fix other lockdep splats
      accessing ireq_opt") I made a very stupid fix, that happened
      to work mostly because of the regular path being RCU protected.
      
      Really the thing protecting ireq->ireq_opt is RCU read lock,
      and the pseudo request refcnt is not relevant.
      
      This patch extends what I did in commit 449809a6 ("tcp/dccp:
      block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
      pair in the paths that might be taken when processing SYN from
      socket backlog (thus possibly in process context)
      
      Fixes: 06f877d6 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ad98e9d
  9. 01 Oct, 2018 3 commits
    • Y
      cfg80211: fix use-after-free in reg_process_hint() · 1db58529
      Yu Zhao committed
      reg_process_hint_country_ie() can free regulatory_request and return
      REG_REQ_ALREADY_SET. We shouldn't use regulatory_request after it's
      called. KASAN error was observed when this happens.
      
      BUG: KASAN: use-after-free in reg_process_hint+0x839/0x8aa [cfg80211]
      Read of size 4 at addr ffff8800c430d434 by task kworker/1:3/89
      <snipped>
      Workqueue: events reg_todo [cfg80211]
      Call Trace:
       dump_stack+0xc1/0x10c
       ? _atomic_dec_and_lock+0x1ad/0x1ad
       ? _raw_spin_lock_irqsave+0xa0/0xd2
       print_address_description+0x86/0x26f
       ? reg_process_hint+0x839/0x8aa [cfg80211]
       kasan_report+0x241/0x29b
       reg_process_hint+0x839/0x8aa [cfg80211]
       reg_todo+0x204/0x5b9 [cfg80211]
       process_one_work+0x55f/0x8d0
       ? worker_detach_from_pool+0x1b5/0x1b5
       ? _raw_spin_unlock_irq+0x65/0xdd
       ? _raw_spin_unlock_irqrestore+0xf3/0xf3
       worker_thread+0x5dd/0x841
       ? kthread_parkme+0x1d/0x1d
       kthread+0x270/0x285
       ? pr_cont_work+0xe3/0xe3
       ? rcu_read_unlock_sched_notrace+0xca/0xca
       ret_from_fork+0x22/0x40
      
      Allocated by task 2718:
       set_track+0x63/0xfa
       __kmalloc+0x119/0x1ac
       regulatory_hint_country_ie+0x38/0x329 [cfg80211]
       __cfg80211_connect_result+0x854/0xadd [cfg80211]
       cfg80211_rx_assoc_resp+0x3bc/0x4f0 [cfg80211]
       ieee80211_sta_rx_queued_mgmt+0x1803/0x7ed5 [mac80211]
       ieee80211_iface_work+0x411/0x696 [mac80211]
       process_one_work+0x55f/0x8d0
       worker_thread+0x5dd/0x841
       kthread+0x270/0x285
       ret_from_fork+0x22/0x40
      
      Freed by task 89:
       set_track+0x63/0xfa
       kasan_slab_free+0x6a/0x87
       kfree+0xdc/0x470
       reg_process_hint+0x31e/0x8aa [cfg80211]
       reg_todo+0x204/0x5b9 [cfg80211]
       process_one_work+0x55f/0x8d0
       worker_thread+0x5dd/0x841
       kthread+0x270/0x285
       ret_from_fork+0x22/0x40
      <snipped>
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      1db58529
    • F
      mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys · 211710ca
      Felix Fietkau committed
      key->sta is only valid after ieee80211_key_link, which is called later
      in this function. Because of that, the IEEE80211_KEY_FLAG_RX_MGMT is
      never set when management frame protection is enabled.
      
      Fixes: e548c49e ("mac80211: add key flag for management keys")
      Cc: stable@vger.kernel.org
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      211710ca
    • S
      cfg80211: fix wext-compat memory leak · 848e616e
      Stefan Seyfried committed
      cfg80211_wext_giwrate() might allocate sinfo.pertid via
      rdev_get_station(), but never releases it. Fix that.
      
      Fixes: 8689c051 ("cfg80211: dynamically allocate per-tid stats for station info")
      Signed-off-by: Stefan Seyfried <seife+kernel@b1-systems.com>
      [johannes: fix error path, use cfg80211_sinfo_release_content(), add Fixes]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      848e616e
  10. 30 Sep, 2018 1 commit
    • L
      tipc: fix failover problem · c140eb16
      LUU Duc Canh committed
      We see the following scenario:
      1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
         Since there is a second working link, failover procedure is started.
      2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
         A on node 2. The node item 1->2 goes to state FAILINGOVER.
      3) Link endpoint A/2 receives the failover, and is supposed to take
         down its parallel link endpoint B/2, while producing a FAILOVER
         message to send back to A/1.
      4) However, B/2 has already been deleted, so no FAILOVER message can
         be created.
      5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
         any messages that can bring B/1 up again. We are left with a non-
         redundant link between node 1 and 2.
      
      We fix this by letting endpoint A/2 build a dummy FAILOVER message
      to send back to A/1, so that the situation can be resolved.
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c140eb16