1. 20 Aug, 2019 5 commits
  2. 07 Aug, 2019 3 commits
    • V
      net: dsa: sja1105: Fix memory leak on meta state machine error path · 93fa8587
      Vladimir Oltean authored
      When RX timestamping is enabled and two link-local (non-meta) frames are
      received in a row, this constitutes an error.
      
      The tagger is always caching the last link-local frame, in an attempt to
      merge it with the meta follow-up frame when that arrives. To recover
      from the above error condition, the initial cached link-local frame is
      dropped and the second frame in a row is cached (in expectation of the
      second meta frame).
      
      However, when dropping the initial link-local frame, its backing memory
      was being leaked.
      
      Fixes: f3097be2 ("net: dsa: sja1105: Add a state machine for RX timestamping")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93fa8587
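
The error-path fix described above can be pictured with a small userspace sketch. It is only an illustration under assumptions: `struct frame_cache`, `cache_link_local()` and the `frees` counter are hypothetical names, and plain `malloc()`/`free()` stand in for skb allocation and release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tagger's private data: one cached frame. */
struct frame_cache {
    char *stampable;   /* last link-local frame awaiting its meta frame */
    int frees;         /* counts released frames, for illustration only */
};

/* Cache a new link-local frame. If one is already cached (two link-local
 * frames in a row), release the old one instead of overwriting the pointer. */
static void cache_link_local(struct frame_cache *c, char *frame)
{
    if (c->stampable) {
        free(c->stampable);   /* the step that was missing in the leak */
        c->frees++;
    }
    c->stampable = frame;
}
```

Calling `cache_link_local()` twice in a row releases the first buffer exactly once, which is the behaviour the commit message asks for on the error path.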
    • V
      net: dsa: sja1105: Fix memory leak on meta state machine normal path · f163fed2
      Vladimir Oltean authored
      After a meta frame is received, it is associated with the cached
      sp->data->stampable_skb from the DSA tagger private structure.
      
      Cached means its refcount is incremented with skb_get() in order for
      dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.
      
      The mistake is that skb_unref() is not the correct function to use. It
      will correctly decrement the refcount (which will go back to zero) but
      the skb memory will not be freed.  That is the job of kfree_skb(), which
      also calls skb_unref().
      
      But it turns out that freeing the cached stampable_skb is in fact not
      necessary.  It is still a perfectly valid skb, and now it is even
      annotated with the partial RX timestamp.  So remove the skb_copy()
      altogether and simply pass the stampable_skb with a refcount of 1
      (incremented by us, decremented by dsa_switch_rcv) up the stack.
      
      Fixes: f3097be2 ("net: dsa: sja1105: Add a state machine for RX timestamping")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f163fed2
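
The difference between dropping a reference and dropping it with a free, as described above, can be sketched with a toy reference-counted buffer. All names here (`struct buf`, `buf_get()`, `buf_unref()`, `buf_put()`, `freed_count`) are hypothetical and only mirror the roles that the commit message gives to skb_get(), skb_unref() and kfree_skb().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf {
    int users;   /* reference count */
};

static int freed_count;   /* illustration only: how many buffers were freed */

static struct buf *buf_alloc(void)
{
    struct buf *b = malloc(sizeof(*b));
    if (b)
        b->users = 1;
    return b;
}

/* Take an extra reference, in the spirit of skb_get(). */
static struct buf *buf_get(struct buf *b)
{
    b->users++;
    return b;
}

/* Drop one reference and report whether it was the last one.
 * Like the skb_unref() described above, this never frees anything. */
static bool buf_unref(struct buf *b)
{
    return --b->users == 0;
}

/* Drop one reference and free the buffer if it was the last one,
 * in the spirit of kfree_skb(). */
static void buf_put(struct buf *b)
{
    if (buf_unref(b)) {
        free(b);
        freed_count++;
    }
}
```

In this model, a caller that drops its last reference with `buf_unref()` alone leaves the memory allocated, which is the kind of leak the commit removes by handing the still-valid buffer up the stack instead.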
    • R
      net sched: update vlan action for batched events operations · b35475c5
      Roman Mashak authored
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      
      Fixes: c7e2b968 ("sched: introduce vlan action")
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b35475c5
  3. 06 Aug, 2019 7 commits
    • N
      net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER · 091adf9b
      Nikolay Aleksandrov authored
      Most of the bridge device's vlan init bugs come from the fact that its
      default pvid is created at the wrong time, way too early in ndo_init()
      before the device is even assigned an ifindex. This introduces a bug: when
      the bridge's dev_addr is added as an fdb entry during the initial default
      pvid creation, the notification has ifindex/NDA_MASTER both equal to 0 (see
      example below), which makes no sense for user-space[0] and is wrong.
      Usually user-space software would ignore such entries, but they are
      actually valid and will eventually have all necessary attributes.
      It makes much more sense to send a notification *after* the device has
      registered and has a proper ifindex allocated rather than before when
      there's a chance that the registration might still fail or to receive
      it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
      from br_vlan_flush() since that case can no longer happen. At
      NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
      br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
      depending why it was called (if called due to NETDEV_REGISTER error
      it'll still be == 1, otherwise it could be any value changed during the
      device life time).
      
      For the demonstration below a small change to iproute2 for printing all fdb
      notifications is added, because it contained a workaround not to show
      entries with ifindex == 0.
      Command executed while monitoring: $ ip l add br0 type bridge
      Before (both ifindex and master == 0):
      $ bridge monitor fdb
      36:7e:8a:b3:56:ba dev * vlan 1 master * permanent
      
      After (proper br0 ifindex):
      $ bridge monitor fdb
      e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent
      
      v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
      v3: send the correct v2 patch with all changes (stub should return 0)
      v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
          the br_vlan_bridge_event stub when bridge vlans are disabled
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

      Reported-by: michael-dev <michael-dev@fami-braun.de>
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      091adf9b
    • U
      net/smc: avoid fallback in case of non-blocking connect · cd206360
      Ursula Braun authored
      FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
      triggers a fallback to TCP if the socket is in state SMC_INIT.
      But if a nonblocking connect is already started, fallback to TCP
      is no longer possible, even though the socket may still be in state
      SMC_INIT.
      And if a nonblocking connect is already started, a listen() call
      does not make sense.
      
      Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
      Fixes: 50717a37 ("net/smc: nonblocking connect rework")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd206360
    • U
      net/smc: do not schedule tx_work in SMC_CLOSED state · f9cedf1a
      Ursula Braun authored
      The setsockopt options TCP_NODELAY and TCP_CORK may schedule the
      tx worker. Make sure the socket has not already been moved into SMC_CLOSED
      state (for instance by a shutdown SHUT_RDWR call).
      
      Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
      Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
      Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9cedf1a
    • D
      ipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rt · cff6a327
      David Ahern authored
      The nexthop path in rt6_update_exception_stamp_rt needs to call
      rcu_read_unlock if it fails to find a fib6_nh match rather than
      just returning.
      
      Fixes: e659ba31 ("ipv6: Handle all fib6_nh in a nexthop in exception handling")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cff6a327
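
The unbalanced-locking pattern described above is easy to reproduce in miniature. The sketch below is a userspace toy, not kernel code: `toy_read_lock()`, `toy_read_unlock()`, `read_lock_depth` and `lookup_and_stamp()` are hypothetical names, and a simple counter stands in for the RCU read-side section.

```c
#include <assert.h>
#include <stddef.h>

static int read_lock_depth;   /* toy stand-in for an RCU read-side section */

static void toy_read_lock(void)   { read_lock_depth++; }
static void toy_read_unlock(void) { read_lock_depth--; }

/* Look up `key` in `table` under the toy lock. Every exit, including the
 * "no match" one, goes through the unlock label. */
static int lookup_and_stamp(const int *table, size_t n, int key, int *stamp)
{
    int ret = -1;
    size_t i;

    toy_read_lock();
    for (i = 0; i < n; i++) {
        if (table[i] == key)
            break;
    }
    if (i == n)
        goto unlock;      /* a bare "return" here would leak the lock */

    *stamp = (int)i;
    ret = 0;
unlock:
    toy_read_unlock();
    return ret;
}
```

Routing the failure case through the same label as the success case keeps the lock depth at zero on every return path.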
    • J
      net/tls: partially revert fix transition through disconnect with close · 5d92e631
      Jakub Kicinski authored
      Looks like we were slightly overzealous with the shutdown()
      cleanup. Even though the sock->sk_state can reach CLOSED again,
      socket->state will not go back to SS_UNCONNECTED once the
      connection is ESTABLISHED. This means we will see EISCONN if
      we try to reconnect, and EINVAL if we try to listen.
      
      Only listen sockets can be shutdown() and reused, but since
      ESTABLISHED sockets can never be re-connected() or used for
      listen() we don't need to try to clean up the ULP state early.
      
      Fixes: 32857cf5 ("net/tls: fix transition through disconnect with close")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d92e631
    • J
      net: fix bpf_xdp_adjust_head regression for generic-XDP · 065af355
      Jesper Dangaard Brouer authored
      When generic-XDP was moved to a later processing step by commit
      458bf2f2 ("net: core: support XDP generic on stacked devices.")
      a regression was introduced when using bpf_xdp_adjust_head.
      
      The issue is that after this commit the skb->network_header is now
      changed prior to calling generic XDP and not after. Thus, if the header
      is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
      also needs to be updated again.  Fix by calling skb_reset_network_header().
      
      Fixes: 458bf2f2 ("net: core: support XDP generic on stacked devices.")
      Reported-by: Brandon Cazander <brandon.cazander@multapplied.net>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      065af355
    • D
      net: sched: use temporary variable for actions indexes · 7be8ef2c
      Dmytro Linkin authored
      Currently the init calls of all actions (except ipt) set up their 'parm'
      structure as a direct pointer to nla data in the skb. This leads to a race
      condition when some of the filter actions were initialized successfully
      (and were assigned an idr action index that was written directly
      into nla data), but then were deleted and retried (due to a following
      action module missing or a classifier-initiated retry), in which case
      the action init code tries to insert the action into the idr with the index
      that was assigned on the previous iteration. During the retry the index can
      be reused by another action that was inserted concurrently, which causes
      unintended action sharing between filters.
      To fix the described race condition, save the action idr index to a temporary
      stack-allocated variable instead of nla data.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7be8ef2c
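
The idea of keeping the allocated index in a temporary variable, as described above, can be sketched in userspace. This is a deliberately simplified toy with hypothetical names (`struct action_parm`, `index_check_alloc()`, `action_init()`, `in_use`): it uses a flat array rather than an idr, and a taken index simply fails here, whereas the commit message describes unintended sharing.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ACTIONS 8

/* Hypothetical request as it sits in the received message.
 * index == 0 means "allocate a fresh index for me". */
struct action_parm {
    uint32_t index;
};

static int in_use[MAX_ACTIONS + 1];   /* toy index table, slot 0 unused */

/* Reserve `*index` if nonzero, otherwise pick the first free slot and
 * report it through `*index`. Returns 0 on success, -1 if taken or full. */
static int index_check_alloc(uint32_t *index)
{
    uint32_t i;

    if (*index) {
        if (*index > MAX_ACTIONS || in_use[*index])
            return -1;
        in_use[*index] = 1;
        return 0;
    }
    for (i = 1; i <= MAX_ACTIONS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            *index = i;
            return 0;
        }
    }
    return -1;
}

/* Init path: work on a stack copy so the message keeps index == 0 and a
 * retry asks for a fresh index again instead of a stale one. */
static int action_init(const struct action_parm *parm, uint32_t *assigned)
{
    uint32_t index = parm->index;   /* temporary, not the message itself */

    if (index_check_alloc(&index))
        return -1;
    *assigned = index;
    return 0;
}
```

Because `action_init()` takes the request as `const` and copies the index, a second attempt with the same message starts from the original value even if another action has claimed the previously assigned slot in the meantime.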
  4. 03 Aug, 2019 1 commit
    • D
      hv_sock: Fix hang when a connection is closed · 685703b4
      Dexuan Cui authored
      There is a race condition for an established connection that is being closed
      by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
      'remove_sock' is false):
      
      1 for the initial value;
      1 for the sk being in the bound list;
      1 for the sk being in the connected list;
      1 for the delayed close_work.
      
      After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
      decrease the refcnt to 3.
      
      Concurrently, hvs_close_connection() runs in another thread:
        calls vsock_remove_sock() to decrease the refcnt by 2;
        calls sock_put() to decrease the refcnt to 0, and free the sk;
        next, the "release_sock(sk)" may hang due to use-after-free.
      
      In the above, after hvs_release() finishes, if hvs_close_connection() runs
      faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
      because at the beginning of hvs_close_connection(), the refcnt is still 4.
      
      The issue can be resolved if an extra reference is taken when the
      connection is established.
      
      Fixes: a9eeb998 ("hv_sock: Add support for delayed close")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      685703b4
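
The reference-count arithmetic in the commit message above can be replayed as a short calculation. The function name `refs_left_for_release_sock()` and its `extra_ref` parameter are hypothetical; the sketch only follows the failing interleaving, and where the fix eventually drops its extra reference is not modelled.

```c
#include <assert.h>

/* Toy walk through the reference counts listed above. `extra_ref` models
 * the additional reference taken when the connection is established. */
static int refs_left_for_release_sock(int extra_ref)
{
    int refcnt = 1      /* initial value */
               + 1      /* bound list */
               + 1      /* connected list */
               + 1      /* delayed close_work */
               + (extra_ref ? 1 : 0);

    refcnt -= 1;        /* __vsock_release() -> sock_put(sk) ran first */
    refcnt -= 2;        /* vsock_remove_sock() in hvs_close_connection() */
    refcnt -= 1;        /* sock_put() in hvs_close_connection() */
    return refcnt;      /* 0 means sk is freed before release_sock(sk) */
}
```

Without the extra reference the count reaches 0 before release_sock(sk) runs; with it, one reference is still held at that point.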
  5. 02 Aug, 2019 1 commit
    • T
      tipc: compat: allow tipc commands without arguments · 4da5f001
      Taras Kondratiuk authored
      Commit 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit")
      broke older tipc tools that use compat interface (e.g. tipc-config from
      tipcutils package):
      
      % tipc-config -p
      operation not supported
      
      The commit started to reject TIPC netlink compat messages that do not
      have attributes. It is too restrictive because some of such messages are
      valid (they don't need any arguments):
      
      % grep 'tx none' include/uapi/linux/tipc_config.h
      #define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
      #define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
      #define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
      #define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
      #define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
      #define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */
      
      This patch relaxes the original fix and rejects messages without
      arguments only if such arguments are expected by a command (reg_type is
      non zero).
      
      Fixes: 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit")
      Cc: stable@vger.kernel.org
      Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4da5f001
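
The relaxed check described above amounts to a single condition. The sketch below assumes a simplified command descriptor; `struct compat_cmd`, its `expected_arg_type` field and `compat_msg_acceptable()` are hypothetical names and not the kernel's own structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a compat command: expected_arg_type is
 * nonzero only when the command expects an argument. */
struct compat_cmd {
    uint16_t expected_arg_type;
};

/* Accept a message unless it lacks attributes that the command needs. */
static bool compat_msg_acceptable(const struct compat_cmd *cmd, bool has_attrs)
{
    if (!has_attrs && cmd->expected_arg_type != 0)
        return false;   /* argument expected but missing */
    return true;        /* "tx none" commands pass without attributes */
}
```

A "tx none" command such as the ones in the grep output passes with no attributes, while a command that expects an argument is still rejected when the argument is absent.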
  6. 01 Aug, 2019 1 commit
    • N
      net: bridge: mcast: don't delete permanent entries when fast leave is enabled · 5c725b6b
      Nikolay Aleksandrov authored
      When permanent entries were introduced by the commit below, they were
      exempt from timing out and thus igmp leave wouldn't affect them unless
      fast leave was enabled on the port, a feature that was added before
      permanent entries existed. It shouldn't matter whether fast leave is
      enabled or not: if the user added a permanent entry, it shouldn't be
      deleted on igmp leave.
      
      Before:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      $
      
      After:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c725b6b
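
The rule that the Before/After transcripts above demonstrate can be written as a small predicate. This is a sketch under assumptions: `TOY_MDB_PERMANENT`, `struct toy_port_group` and `delete_on_fast_leave()` are hypothetical names, not the bridge code's own.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MDB_PERMANENT 0x1   /* hypothetical flag bit for this sketch */

struct toy_port_group {
    unsigned int flags;
};

/* Decide whether an IGMP leave should remove the entry immediately. */
static bool delete_on_fast_leave(bool fast_leave, const struct toy_port_group *pg)
{
    if (!fast_leave)
        return false;                        /* normal leave: timers decide */
    return !(pg->flags & TOY_MDB_PERMANENT); /* permanent entries survive */
}
```

With fast leave enabled, a dynamic entry is removed on leave while a permanent one stays, matching the "After" output.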
  7. 31 Jul, 2019 2 commits
    • A
      compat_ioctl: pppoe: fix PPPOEIOCSFWD handling · 055d8824
      Arnd Bergmann authored
      Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
      linux-2.5.69 along with hundreds of other commands, but was always broken
      since only the structure is compatible, but the command number is not,
      due to the size being sizeof(size_t), or at first sizeof(sizeof(struct
      sockaddr_pppox)), which is different on 64-bit architectures.
      
      Guillaume Nault adds:
      
        And the implementation was broken until 2016 (see 29e73269 ("pppoe:
        fix reference counting in PPPoE proxy")), and nobody ever noticed. I
        should probably have removed this ioctl entirely instead of fixing it.
        Clearly, it has never been used.
      
      Fix it by adding a compat_ioctl handler for all pppoe variants that
      translates the command number and then calls the regular ioctl function.
      
      All other ioctl commands handled by pppoe are compatible between 32-bit
      and 64-bit, and require compat_ptr() conversion.
      
      This should apply to all stable kernels.
      Acked-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      055d8824
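
Why the command number differs between 32-bit and 64-bit callers can be shown with a toy encoder. The sketch assumes the common asm-generic ioctl layout (some architectures use a different one), and the type value 0xB1 and number 0 are illustrative assumptions rather than values taken from the commit message; `toy_iow()`, `cmd_for_32bit()`, `cmd_for_64bit()` and `toy_compat_translate()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy encoder following the common asm-generic ioctl layout
 * (direction, size, type, number); some architectures differ. */
#define TOY_IOC_WRITE 1u

static uint32_t toy_iow(uint32_t type, uint32_t nr, uint32_t size)
{
    return (TOY_IOC_WRITE << 30) | (size << 16) | (type << 8) | nr;
}

/* Command number as seen by 32-bit and 64-bit user space when the size
 * argument is sizeof(size_t): 4 bytes versus 8 bytes. */
static uint32_t cmd_for_32bit(void) { return toy_iow(0xB1, 0, 4); }
static uint32_t cmd_for_64bit(void) { return toy_iow(0xB1, 0, 8); }

/* Compat-style translation: map the 32-bit number to the native one and
 * pass every other command through unchanged. */
static uint32_t toy_compat_translate(uint32_t cmd)
{
    return cmd == cmd_for_32bit() ? cmd_for_64bit() : cmd;
}
```

Because the size is part of the encoded number, the two callers issue different commands for the same request, so a compat handler has to translate the number before calling the regular ioctl path.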
    • J
      tipc: fix unitilized skb list crash · 2948a1fc
      Jon Maloy authored
      Our test suite sometimes provokes the following crash:
      
      Description of problem:
      [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
      [ 1092.605072] PGD 0 P4D 0
      [ 1092.607620] Oops: 0000 [#1] SMP PTI
      [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1
      [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018
      [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
      [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
      [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
      [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045
      [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000
      [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a
      [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000
      [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a
      [ 1092.692614] FS:  0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000
      [ 1092.700702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0
      [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1092.727843] PKRU: 55555554
      [ 1092.730556] Call Trace:
      [ 1092.733010]  <IRQ>
      [ 1092.735034]  tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
      [ 1092.739828]  ? __kmalloc_node_track_caller+0x1cb/0x290
      [ 1092.744974]  ? dev_hard_start_xmit+0xa5/0x210
      [ 1092.749332]  tipc_sk_rcv+0x389/0x640 [tipc]
      [ 1092.753519]  tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
      [ 1092.758224]  tipc_rcv+0x57a/0xf20 [tipc]
      [ 1092.762154]  ? ktime_get_real_ts64+0x40/0xe0
      [ 1092.766432]  ? tpacket_rcv+0x50/0x9f0
      [ 1092.770098]  tipc_l2_rcv_msg+0x4a/0x70 [tipc]
      [ 1092.774452]  __netif_receive_skb_core+0xb62/0xbd0
      [ 1092.779164]  ? enqueue_entity+0xf6/0x630
      [ 1092.783084]  ? kmem_cache_alloc+0x158/0x1c0
      [ 1092.787272]  ? __build_skb+0x25/0xd0
      [ 1092.790849]  netif_receive_skb_internal+0x42/0xf0
      [ 1092.795557]  napi_gro_receive+0xba/0xe0
      [ 1092.799417]  mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
      [ 1092.804564]  mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
      [ 1092.809536]  mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
      [ 1092.814415]  ? __wake_up_common_lock+0x89/0xc0
      [ 1092.818861]  net_rx_action+0x149/0x3b0
      [ 1092.822616]  __do_softirq+0xe3/0x30a
      [ 1092.826193]  irq_exit+0x100/0x110
      [ 1092.829512]  do_IRQ+0x85/0xd0
      [ 1092.832483]  common_interrupt+0xf/0xf
      [ 1092.836147]  </IRQ>
      [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
      [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
      [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
      [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f
      [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000
      [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940
      [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78
      [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000
      [ 1092.905196]  ? cpuidle_enter_state+0x92/0x2a0
      [ 1092.909555]  do_idle+0x236/0x280
      [ 1092.912785]  cpu_startup_entry+0x6f/0x80
      [ 1092.916715]  start_secondary+0x1a7/0x200
      [ 1092.920642]  secondary_startup_64+0xb7/0xc0
      [...]
      
      The reason is that the skb list tipc_socket::mc_method.deferredq is
      only initialized for connectionless sockets, while nothing stops arriving
      multicast messages from being filtered by connection-oriented sockets,
      with subsequent access to the said list.
      
      We fix this by initializing the list unconditionally at socket creation.
      This eliminates the crash, while the message still is dropped further
      down in tipc_sk_filter_rcv() as it should be.
      Reported-by: Li Shuang <shuali@redhat.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2948a1fc
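
The fix described above, initializing the list for every socket type, can be illustrated with a minimal circular list. Everything here is a hypothetical userspace model (`struct toy_list`, `struct toy_sock`, `toy_sock_create()`); it only shows why a zeroed list head is dangerous and an initialized one is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal circular list head, in the style of kernel list heads. */
struct toy_list {
    struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *h) { h->next = h; h->prev = h; }
static bool toy_list_empty(const struct toy_list *h) { return h->next == h; }

struct toy_sock {
    bool connectionless;
    struct toy_list deferredq;
};

/* Create a socket. The queue is initialized for every socket type, so a
 * later walk never starts from a zeroed head whose next pointer is NULL. */
static void toy_sock_create(struct toy_sock *sk, bool connectionless)
{
    memset(sk, 0, sizeof(*sk));
    sk->connectionless = connectionless;
    toy_list_init(&sk->deferredq);   /* unconditional */
}
```

With the unconditional init, a connection-oriented socket sees an empty queue rather than a NULL pointer when a multicast message is filtered through it.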
  8. 30 Jul, 2019 12 commits
    • D
      rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet · c69565ee
      David Howells authored
      Fix the fact that a notification isn't sent to the recvmsg side to indicate
      a call failed when sendmsg() fails to transmit a DATA packet with the error
      ENETUNREACH, EHOSTUNREACH or ECONNREFUSED.
      
      Without this notification, the afs client just sits there waiting for the
      call to complete in some manner (which it's not now going to do), which
      also pins the rxrpc call in place.
      
      This can be seen if the client has a scope-level IPv6 address, but not a
      global-level IPv6 address, and we try and transmit an operation to a
      server's IPv6 address.
      
      Looking in /proc/net/rxrpc/calls shows completed calls just sat there with
      an abort code of RX_USER_ABORT and an error code of -ENETUNREACH.
      
      Fixes: c54e43d7 ("rxrpc: Fix missing start of call timeout")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
      c69565ee
    • D
      rxrpc: Fix potential deadlock · 60034d3d
      David Howells authored
      There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby
      rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces
      the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which
      then tries to take the already held lock.
      
      Fix this by providing a version of rxrpc_put_peer() that can be called in
      situations where the lock is already held.
      
      The bug may produce the following lockdep report:
      
      ============================================
      WARNING: possible recursive locking detected
      5.2.0-next-20190718 #41 Not tainted
      --------------------------------------------
      kworker/0:3/21678 is trying to acquire lock:
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
      /./include/linux/spinlock.h:343 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      __rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435
      
      but task is already holding lock:
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
      /./include/linux/spinlock.h:343 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline]
      00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
      rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430
      
      Fixes: 330bdcfa ("rxrpc: Fix the keepalive generator [ver #2]")
      Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
      60034d3d
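
The approach described above, a second "put" variant for callers that already hold the lock, can be sketched with a toy non-recursive lock. The names (`toy_lock()`, `toy_put_peer()`, `toy_put_peer_locked()`, `peers_removed`) are hypothetical, and an `assert()` stands in for the deadlock that a real spinlock would produce on recursive acquisition.

```c
#include <assert.h>
#include <stdbool.h>

static bool hash_lock_held;   /* toy non-recursive lock */
static int peers_removed;

static void toy_lock(void)   { assert(!hash_lock_held); hash_lock_held = true; }
static void toy_unlock(void) { assert(hash_lock_held);  hash_lock_held = false; }

struct toy_peer { int usage; };

/* Drop a reference; the caller must NOT hold the lock. */
static void toy_put_peer(struct toy_peer *p)
{
    if (--p->usage == 0) {
        toy_lock();
        peers_removed++;      /* unhash the peer */
        toy_unlock();
    }
}

/* Drop a reference; the caller already holds the lock. */
static void toy_put_peer_locked(struct toy_peer *p)
{
    assert(hash_lock_held);
    if (--p->usage == 0)
        peers_removed++;      /* unhash without re-taking the lock */
}
```

Calling `toy_put_peer()` while the lock is held would trip the assertion in `toy_lock()`, which is the toy equivalent of the recursive locking that lockdep reports; the locked variant avoids it.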
    • J
      Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues" · eef347f8
      Johannes Berg authored
      Revert this for now, it has been reported multiple times that it
      completely breaks connectivity on various devices.
      
      Cc: stable@vger.kernel.org
      Fixes: 8dbb000e ("mac80211: set NETIF_F_LLTX when using intermediate tx queues")
      Reported-by: Jean Delvare <jdelvare@suse.de>
      Reported-by: Peter Lebbing <peter@digitalbrains.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      eef347f8
    • F
      netfilter: ebtables: also count base chain policies · 3b48300d
      Florian Westphal authored
      ebtables doesn't include the base chain policies in the rule count,
      so we need to add them manually when we call into the x_tables core
      to allocate space for the compat offset table.

      This led syzbot to trigger:
      WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
      xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649
      
      Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
      Fixes: 2035f3ff ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      3b48300d
    • E
      net: sctp: drop unneeded likely() call around IS_ERR() · d4e575ba
      Enrico Weigelt authored
      IS_ERR() already calls unlikely(), so this extra unlikely() call
      around IS_ERR() is not needed.
      Signed-off-by: Enrico Weigelt <info@metux.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4e575ba
    • J
      netfilter: ipset: Fix rename concurrency with listing · 6c1f7e2c
      Jozsef Kadlecsik authored
      Shijie Luo reported that when stress-testing ipset with multiple concurrent
      create, rename, flush, list, destroy commands, it can result in
      
      ipset <version>: Broken LIST kernel message: missing DATA part!
      
      error messages and broken list results. The problem was that the rename
      operation was not properly handled with respect to listing. The patch fixes
      the issue.
      Reported-by: Shijie Luo <luoshijie1@huawei.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      6c1f7e2c
    • S
      netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets · 1b4a7510
      Stefano Brivio authored
      In commit 8cc4ccf5 ("ipset: Allow matching on destination MAC address
      for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
      KADT functions for sets matching on MAC addresses the copy of source or
      destination MAC address depending on the configured match.
      
      This was done correctly for hash:mac, but for hash:ip,mac and
      bitmap:ip,mac, copying and pasting the same code block presents an
      obvious problem: in these two set types, the MAC address is the second
      dimension, not the first one, and we are actually selecting the MAC
      address depending on whether the first dimension (IP address) specifies
      source or destination.
      
      Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
      
      This way, mixing source and destination matches for the two dimensions
      of ip,mac set types works as expected. With this setup:
      
        ip netns add A
        ip link add veth1 type veth peer name veth2 netns A
        ip addr add 192.0.2.1/24 dev veth1
        ip -net A addr add 192.0.2.2/24 dev veth2
        ip link set veth1 up
        ip -net A link set veth2 up
      
        dst=$(ip netns exec A cat /sys/class/net/veth2/address)
      
        ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
        ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
        ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
      
        ip netns exec A ipset create test_hash hash:ip,mac
        ip netns exec A ipset add test_hash 192.0.2.1,${dst}
        ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
      
      ipset correctly matches a test packet:
      
        # ping -c1 192.0.2.2 >/dev/null
        # echo $?
        0
      Reported-by: Chen Yi <yiche@redhat.com>
      Fixes: 8cc4ccf5 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      1b4a7510
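
The selection logic described above, choosing the MAC address by the second dimension's direction, can be sketched as follows. The flag values and names (`TOY_DIM_ONE_SRC`, `TOY_DIM_TWO_SRC`, `struct toy_eth`, `pick_mac()`) are hypothetical and only mirror the role that the commit message gives to IPSET_DIM_TWO_SRC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch: one "source" bit per dimension. */
#define TOY_DIM_ONE_SRC (1u << 0)
#define TOY_DIM_TWO_SRC (1u << 1)

struct toy_eth {
    const char *src_mac;
    const char *dst_mac;
};

/* For ip,mac sets the MAC is the second dimension, so its direction must
 * come from the dimension-two bit, not the dimension-one bit. */
static const char *pick_mac(const struct toy_eth *eth, uint32_t flags)
{
    return (flags & TOY_DIM_TWO_SRC) ? eth->src_mac : eth->dst_mac;
}
```

For a "src,dst" match only the dimension-one source bit is set, so the destination MAC is picked, which corresponds to the mixed-direction setup in the test above.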
    • S
      netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too · b89d1548
      Stefano Brivio authored
      In commit 8cc4ccf5 ("ipset: Allow matching on destination MAC address
      for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
      KADT check that prevents matching on destination MAC addresses for
      hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
      
      Drop this check: functionality is now commented in man pages and there's
      no reason to restrict to source MAC address matching anymore.
      Reported-by: Chen Yi <yiche@redhat.com>
      Fixes: 8cc4ccf5 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      b89d1548
    • J
      net: fix ifindex collision during namespace removal · 55b40dbf
      Jiri Pirko authored
      Commit aca51397 ("netns: Fix arbitrary net_device-s corruptions
      on net_ns stop.") introduced a possibility to hit a BUG in case device
      is returning back to init_net and two following conditions are met:
      1) dev->ifindex value is used in a name of another "dev%d"
         device in init_net.
      2) dev->name is used by another device in init_net.
      
      Under real-life circumstances this is hard to hit. Therefore the bug has
      been present, unnoticed, for over 10 years. To reproduce:
      
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip netns add ns1
      $ ip -n ns1 link add dummy1ns1 type dummy
      $ ip -n ns1 link add dummy2ns1 type dummy
      $ ip link set enp0s2 netns ns1
      $ ip -n ns1 link set enp0s2 name dummy0
      [  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
      $ ip link add dev4 type dummy
      $ ip -n ns1 a
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
      3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
      4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
      $ ip netns del ns1
      [  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
      [  158.719316] ------------[ cut here ]------------
      [  158.720591] kernel BUG at net/core/dev.c:9824!
      [  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
      [  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  158.727508] Workqueue: netns cleanup_net
      [  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  158.762758] Call Trace:
      [  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
      [  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.769870]  ops_exit_list.isra.0+0xa8/0x150
      [  158.771544]  cleanup_net+0x446/0x8f0
      [  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
      [  158.775294]  process_one_work+0xa1a/0x1740
      [  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
      [  158.779143]  ? do_raw_spin_lock+0x11b/0x280
      [  158.780848]  worker_thread+0x9e/0x1060
      [  158.782500]  ? process_one_work+0x1740/0x1740
      [  158.784454]  kthread+0x31b/0x420
      [  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
      [  158.788286]  ret_from_fork+0x3a/0x50
      [  158.789871] ---[ end trace defd6c657c71f936 ]---
      [  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by checking whether a device with the same name exists in
      init_net and, if it does, falling back to the original code, which
      uses dev%d to allocate the name.
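The name selection described above can be sketched in user-space C. This is a simplified illustration, not the kernel patch: `name_in_use()`, `pick_fallback_name()`, the flat array of names standing in for init_net, and the search limit of 1024 are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* same size as the kernel's IFNAMSIZ */

/* Hypothetical stand-in for a device-name lookup in init_net. */
static bool name_in_use(const char *const *names, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return true;
	return false;
}

/*
 * Choose a fallback name for a device returning to init_net.
 * Prefer "dev<ifindex>"; if that name is already taken, take the
 * first free "dev<N>" instead, as a "dev%d" pattern would.
 * Returns 0 on success, -1 if no free name was found.
 */
static int pick_fallback_name(const char *const *names, size_t n,
			      int ifindex, char out[NAME_LEN])
{
	snprintf(out, NAME_LEN, "dev%d", ifindex);
	if (!name_in_use(names, n, out))
		return 0;

	for (int i = 0; i < 1024; i++) {
		snprintf(out, NAME_LEN, "dev%d", i);
		if (!name_in_use(names, n, out))
			return 0;
	}
	return -1;
}
```

With the names from the reproducer (lo, dummy0 and dev4 in init_net) and a returning device whose ifindex is 4, the preferred name "dev4" is taken, so the sketch picks another free "dev<N>" name instead of failing with -EEXIST.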
      
      This was found using syzkaller.
      
      Fixes: aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55b40dbf
    • G
      net/af_iucv: mark expected switch fall-throughs · 05bba1ed
      Gustavo A. R. Silva committed
      Mark switch cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      net/iucv/af_iucv.c: warning: this statement may fall
      through [-Wimplicit-fallthrough=]:  => 537:3, 519:6, 2246:6, 510:6
      
      Notice that, in this particular case, the code comment is
      modified in accordance with what GCC is expecting to find.
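As an illustration of the kind of annotation involved, here is a minimal sketch of a switch whose intentional fall-throughs are marked with a comment that GCC's -Wimplicit-fallthrough recognizes. The enum, the function name and the steps are hypothetical and are not taken from net/iucv/af_iucv.c.

```c
/* Hypothetical socket states, used only for illustration. */
enum sock_state { STATE_DISCONN, STATE_CLOSING, STATE_CLOSED };

/* Count how many cleanup steps run for a given starting state. */
static int cleanup_steps(enum sock_state state)
{
	int steps = 0;

	switch (state) {
	case STATE_DISCONN:
		steps++;	/* notify the peer */
		/* fall through */
	case STATE_CLOSING:
		steps++;	/* flush pending data */
		/* fall through */
	case STATE_CLOSED:
		steps++;	/* release resources */
		break;
	}
	return steps;
}
```

The comment has to sit directly before the next case label; a differently worded or misplaced comment may not silence the warning, which is why such patches sometimes adjust an existing comment.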
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05bba1ed
    • N
      net: bridge: delete local fdb on device init failure · d7bae09f
      Nikolay Aleksandrov committed
      On initialization failure we have to delete the local fdb which was
      inserted due to the default pvid creation. This problem has been present
      since the inception of default_pvid. Note that currently there are 2 cases:
      1) in br_dev_init() when br_multicast_init() fails
      2) if register_netdevice() fails after calling ndo_init()
      
      This patch takes care of both since br_vlan_flush() is called on both
      occasions. Also the new fdb delete would be a no-op on normal bridge
      device destruction since the local fdb would've been already flushed by
      br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
      called last when adding a port thus nothing can fail after it.
      
      Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7bae09f
    • J
      net: sched: Fix a possible null-pointer dereference in dequeue_func() · 051c7b39
      Jia-Ju Bai committed
      In dequeue_func(), there is an if statement on line 74 to check whether
      skb is NULL:
          if (skb)
      
      When skb is NULL, it is used on line 77:
          prefetch(&skb->end);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, skb->end is only used when skb is not NULL.
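The shape of such a fix can be sketched in user-space C. The `struct pkt` and `struct queue` types and the `dequeue()` helper below are hypothetical simplifications of the kernel structures, and `__builtin_prefetch()` (a GCC/Clang builtin) stands in for the kernel's prefetch(); this is not the actual patch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct pkt {
	unsigned int len;
	unsigned char *end;
};

struct queue {
	struct pkt *head;	/* at most one queued packet, for brevity */
	unsigned int backlog;	/* bytes currently queued */
};

/* Dequeue a packet; every use of the packet stays inside the NULL check. */
static struct pkt *dequeue(struct queue *q)
{
	struct pkt *p = q->head;

	if (p) {
		q->head = NULL;
		q->backlog -= p->len;
		/* Only reached when p is non-NULL. */
		__builtin_prefetch(&p->end);
	}
	return p;
}
```

Moving the prefetch into the existing `if` block keeps the empty-queue path free of any access through the packet pointer.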
      
      This bug was found by STCheck, a static analysis tool written by us.
      
      Fixes: 76e3cc12 ("codel: Controlled Delay AQM")
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      051c7b39
  9. 29 Jul, 2019 1 commit
  10. 28 Jul, 2019 1 commit
  11. 27 Jul, 2019 1 commit
  12. 26 Jul, 2019 3 commits
  13. 25 Jul, 2019 2 commits