1. 28 Feb, 2020 2 commits
  2. 27 Feb, 2020 9 commits
    • net/smc: check for valid ib_client_data · a2f2ef4a
      Authored by Karsten Graul
      In smc_ib_remove_dev() check if the provided ib device was actually
      initialized for SMC before.
      
      Reported-by: syzbot+84484ccebdd4e5451d91@syzkaller.appspotmail.com
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add dummy icsk_sync_mss() · dc24f8b4
      Authored by Paolo Abeni
      syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
      callback, and was able to trigger a null pointer dereference:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
      RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
      RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
      RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
      R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
      R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
      FS:  0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
       netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
       smack_netlabel security/smack/smack_lsm.c:2425 [inline]
       smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
       security_inode_setsecurity+0xb2/0x140 security/security.c:1364
       __vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
       vfs_setxattr fs/xattr.c:224 [inline]
       setxattr+0x335/0x430 fs/xattr.c:451
       __do_sys_fsetxattr fs/xattr.c:506 [inline]
       __se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
       __x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
       do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440199
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
      RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
      R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
      R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      CR2: 0000000000000000
      
      Address the issue by adding a dummy icsk_sync_mss callback.
      Properly syncing the subflows' MSS and options list needs some
      additional infrastructure, which will land in net-next.
      
      Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: restrict IPV6_ADDRFORM operation · b6f61189
      Authored by Eric Dumazet
      IPV6_ADDRFORM can transform an IPv6 socket into an IPv4 one.
      While this operation sounds illogical, we have to support it.
      
      One of the things it does for a TCP socket is to switch sk->sk_prot
      to tcp_prot.
      
      We now have other layers playing with sk->sk_prot, so we should make
      sure not to interfere with them.
      
      This patch makes sure sk_prot is the default pointer for a TCP IPv6 socket.
      
      syzbot reported:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD a0113067 P4D a0113067 PUD a8771067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
       __sock_release net/socket.c:605 [inline]
       sock_close+0xe1/0x260 net/socket.c:1283
       __fput+0x2e4/0x740 fs/file_table.c:280
       ____fput+0x15/0x20 fs/file_table.c:313
       task_work_run+0x176/0x1b0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:188 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
       prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
       syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
       do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x45c429
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429
      RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004
      RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000
      R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c
      Modules linked in:
      CR2: 0000000000000000
      ---[ end trace 82567b5207e87bae ]---
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix cleanup for linkgroup setup failures · 51e3dfa8
      Authored by Ursula Braun
      When an SMC connection to a certain peer is set up for the first
      time, a new linkgroup is created. In case of setup failures, such a
      linkgroup is unusable and should disappear. As a first step the
      linkgroup is removed from the linkgroup list in smc_lgr_forget().
      
      There are 2 problems:
      * smc_listen_decline() might be called before linkgroup creation,
        resulting in a crash due to calling smc_lgr_forget() with
        parameter NULL.
      * If a setup failure occurs after linkgroup creation, the connection
        is never unregistered from the linkgroup, preventing linkgroup
        freeing.
      
      This patch introduces an enhanced smc_lgr_cleanup_early() function
      which
      * contains a linkgroup check for early smc_listen_decline()
        invocations
      * invokes smc_conn_free() to guarantee unregistering of the
        connection
      * schedules fast linkgroup removal of the unusable linkgroup
      
      The unused function smcd_conn_free() is also removed from smc_core.h.
      
      Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc")
      Fixes: 2a0674ff ("net/smc: improve abnormal termination of link groups")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sched: act: count in the size of action flags bitfield · 1521a67e
      Authored by Jiri Pirko
      The commit referenced in the Fixes tag added the put of the flags,
      but the size of the message was not extended accordingly.
      
      Fix this by adding the size of the flags bitfield to the message size.
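
To illustrate why the size estimate has to grow with every attribute that is put into the message, here is a minimal Python sketch of netlink attribute size arithmetic. The alignment and header constants mirror the usual netlink attribute layout; `action_msg_size` and the payload lengths passed to it are hypothetical and not taken from the patch.

```python
NLA_ALIGNTO = 4      # netlink attributes are padded to 4-byte boundaries
NLA_HDRLEN = 4       # attribute header: u16 length + u16 type
BITFIELD32_LEN = 8   # a 32-bit bitfield payload: u32 value + u32 selector


def nla_align(length):
    """Round length up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_total_size(payload):
    """Space one attribute occupies: header plus payload, padded."""
    return nla_align(NLA_HDRLEN + payload)


def action_msg_size(payload_lengths, with_flags):
    """Hypothetical size estimate for a message carrying the given attributes.

    If the flags bitfield is put into the message but not counted here,
    the estimate is short by nla_total_size(BITFIELD32_LEN) bytes.
    """
    size = sum(nla_total_size(p) for p in payload_lengths)
    if with_flags:
        size += nla_total_size(BITFIELD32_LEN)
    return size
```

With these assumptions, the estimate that omits the flags attribute is 12 bytes too small, which is the kind of mismatch the commit message describes.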
      
      Fixes: e3822678 ("net: sched: update action implementations to support flags")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: devlink.c: Use built-in RCU list checking · 2eb51c75
      Authored by Madhuparna Bhowmik
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass the cond argument to list_for_each_entry_rcu() to silence a
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
      
      The devlink->lock is held when devlink_dpipe_table_find()
      is called outside an RCU read-side section. Therefore, pass struct
      devlink to devlink_dpipe_table_find() for lockdep checking.
      
      Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: xt_hashlimit: unregister proc file before releasing mutex · 99b79c39
      Authored by Cong Wang
      Before releasing the global mutex, we only unlink the hashtable
      from the hash list; its proc file is still not unregistered at
      this point. So syzbot could trigger a race condition where a
      parallel htable_create() could register the same file immediately
      after the mutex is released.
      
      Move htable_remove_proc_entry() back under mutex protection to
      fix this. Also fold htable_destroy() into htable_put() to make
      the code slightly easier to understand.
      
      Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
      Fixes: c4a3922d ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ethtool: limit bitset size · e34f1753
      Authored by Michal Kubecek
      Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
      reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
      attributes and even the message by passing a value between (u32)(-31)
      and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.
      
      The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
      that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
      ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
      attribute "payload".
      
      Prevent this overflow by limiting the bitset size. Technically, compact
      bitset format would allow bitset sizes up to almost 2^18 (so that the
      nest size does not exceed U16_MAX) but bitsets used by ethtool are much
      shorter. S16_MAX, the largest value which can be directly used as an
      upper limit in policy, should be a reasonable compromise.
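
The wraparound described above can be reproduced with a minimal Python sketch. Python integers do not overflow, so the `& U32_MASK` step emulates 32-bit unsigned arithmetic. `words_needed` and `words_needed_limited` are hypothetical helper names used only for illustration; they are not functions from the patch.

```python
U32_MASK = 0xFFFFFFFF
S16_MAX = 0x7FFF  # upper limit the commit message proposes for the bitset size


def div_round_up_u32(n, d):
    """DIV_ROUND_UP(n, d), i.e. (n + d - 1) / d, evaluated in u32 arithmetic."""
    return ((n + d - 1) & U32_MASK) // d


def words_needed(nbits):
    """Number of 32-bit words for nbits, with no upper bound on nbits."""
    return div_round_up_u32(nbits & U32_MASK, 32)


def words_needed_limited(nbits):
    """Same computation, but sizes above S16_MAX are rejected first."""
    if nbits > S16_MAX:
        raise ValueError("bitset size exceeds S16_MAX")
    return div_round_up_u32(nbits, 32)
```

For any size between (u32)(-31) and (u32)(-1), the addition wraps and `words_needed` yields 0, so a zero-length value attribute would pass a length check based on it. Rejecting sizes above S16_MAX keeps the addition far away from the wraparound point.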
      
      Fixes: 10b518d4 ("ethtool: netlink bitset handling")
      Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com
      Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com
      Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix Tx hash bound checking · 6e11d157
      Authored by Amritha Nambiar
      Fixes the lower and upper bounds when there are multiple TCs and
      traffic is on the same TC on the same device.
      
      The lower bound is represented by 'qoffset' and the upper limit for
      hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
      mapping when there are multiple TCs, as the queue indices for upper TCs
      will be offset by 'qoffset'.
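
As a rough illustration of the bounds described above, the following Python sketch maps a recorded Rx queue index into the Tx queue range of one traffic class. `pick_tx_queue` is a hypothetical name, and the sketch assumes `qcount` is greater than zero; it shows one way to honor the bounds, not necessarily the exact code of the patch.

```python
def pick_tx_queue(rx_queue, qoffset, qcount):
    """Map an Rx queue index into [qoffset, qoffset + qcount).

    qoffset is the first Tx queue of the traffic class and qcount is the
    number of queues it owns (assumed to be > 0).
    """
    index = rx_queue
    # Queues of upper TCs are offset by qoffset, so strip it first.
    if index >= qoffset:
        index -= qoffset
    # Fold the remainder into the number of queues of this TC.
    while index >= qcount:
        index -= qcount
    # Re-apply the offset so the result lands inside the TC's range.
    return index + qoffset
```

With `qoffset=4` and `qcount=4`, every input index ends up in queues 4 to 7, so traffic of that TC never spills into the queues of another TC.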
      
      v2: Fixed commit description based on comments.
      
      Fixes: 1b837d48 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
      Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 Feb, 2020 1 commit
    • nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() · 212d58c1
      Authored by Stefano Brivio
      Phil reports that adding elements, flushing and re-adding them
      right away:
      
        nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
        nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
        nft flush set t s
        nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'
      
      triggers, almost reliably, a crash like this one:
      
        [   71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
        [   71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e #192
        [   71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
        [   71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
        [   71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.334596] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.335780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.339718] Call Trace:
        [   71.340093]  nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
        [   71.340973]  nft_set_destroy+0x20/0x50 [nf_tables]
        [   71.341879]  nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
        [   71.342916]  process_one_work+0x1d5/0x3c0
        [   71.343601]  worker_thread+0x4a/0x3c0
        [   71.344229]  kthread+0xfb/0x130
        [   71.344780]  ? process_one_work+0x3c0/0x3c0
        [   71.345477]  ? kthread_park+0x90/0x90
        [   71.346129]  ret_from_fork+0x35/0x40
        [   71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
        [   71.348153] ---[ end trace 2eaa8149ca759bcc ]---
        [   71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.350025] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.350026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.350030] Kernel panic - not syncing: Fatal exception
        [   71.350412] Kernel Offset: disabled
        [   71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      which is caused by dangling elements that have been deactivated, but
      never removed.
      
      On a flush operation, nft_pipapo_walk() walks through all the elements
      in the mapping table, which are then deactivated by nft_flush_set(),
      one by one, and added to the commit list for removal. Element data is
      then freed.
      
      On transaction commit, nft_pipapo_remove() is called, but fails to
      remove these elements, leading to stale references in the mapping.
      The first symptom of this, revealed by KASan, is a one-byte
      use-after-free in subsequent calls to nft_pipapo_walk(), which is
      usually not enough to trigger a panic. When stale elements are used
      more heavily, though, such as double-free via nft_pipapo_destroy()
      as in Phil's case, the problem becomes more noticeable.
      
      The issue comes from the fact that, on a flush operation,
      nft_pipapo_remove() won't get the actual key data via elem->key,
      so elements to be deleted upon commit won't be found by the lookup
      via pipapo_get(), and removal will be skipped. Key data should be
      fetched via nft_set_ext_key() instead.
      
      Reported-by: Phil Sutter <phil@nwl.cc>
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 25 Feb, 2020 1 commit
    • net: bridge: fix stale eth hdr pointer in br_dev_xmit · 823d81b0
      Authored by Nikolay Aleksandrov
      In br_dev_xmit() we perform vlan filtering in br_allowed_ingress(), but
      if the packet has the vlan header inside (e.g. bridge with disabled
      tx-vlan-offload), the vlan filtering code will use skb_vlan_untag()
      to extract the vid before filtering. That in turn calls pskb_may_pull(),
      and we may end up with a stale eth pointer. Moreover, the cached eth header
      pointer will generally be wrong after that operation. Remove the eth header
      caching and just use eth_hdr() directly; the compiler does the right thing
      and calculates it only once, so we don't lose anything.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Feb, 2020 4 commits
  6. 23 Feb, 2020 2 commits
  7. 22 Feb, 2020 2 commits
    • netfilter: ipset: Fix forceadd evaluation path · 8af1c6fb
      Authored by Jozsef Kadlecsik
      When the forceadd option is enabled, the hash:* types should find and replace
      the first entry in the bucket with the new one if there are no reusable
      (deleted or timed out) entries. However, the position index was never set
      to zero and remained at the invalid -1 if there were no reusable entries.
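
The intended behaviour can be sketched in a few lines of Python. This is a simplified model, not the ipset code: a bucket is a plain list, `None` marks a reusable (deleted or timed out) slot, and `pick_slot` is a hypothetical helper name.

```python
def pick_slot(bucket, forceadd):
    """Return the index where a new entry should go, or -1 if there is none.

    bucket is a list of entries; None marks a reusable slot.
    """
    pos = -1
    for i, entry in enumerate(bucket):
        if entry is None:
            pos = i  # remember the first reusable slot
            break
    if pos == -1 and forceadd:
        # No reusable slot: with forceadd, replace the first entry
        # instead of leaving the index at the invalid -1.
        pos = 0
    return pos
```

Without the `pos = 0` fallback, a full bucket combined with forceadd would hand the invalid index -1 to whatever code stores the new entry.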
      
      Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
      Fixes: 23c42a40 ("netfilter: ipset: Introduction of new commands and protocol version 7")
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
    • netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports · f66ee041
      Authored by Jozsef Kadlecsik
      In the case of huge hash:* types of sets, due to the single spinlock of
      a set the processing of the whole set under spinlock protection could take
      too long.
      
      There were four places where the whole hash table of the set was processed
      from bucket to bucket while holding the spinlock:
      
      - During resizing a set, the original set was locked to exclude kernel side
        add/del element operations (userspace add/del is excluded by the
        nfnetlink mutex). The original set is actually just read during the
        resize, so the spinlocking is replaced with rcu locking of regions.
        However, this means there can be parallel kernel side add/del of entries.
        In order not to lose those operations, a backlog is added and replayed
        after the successful resize.
      - Garbage collection of timed out entries was also protected by the spinlock.
        In order not to lock too long, region locking is introduced and a single
        region is processed in one gc go. Also, the simple timer based gc running
        is replaced with a workqueue based solution. The internal book-keeping
        (number of elements, size of extensions) is moved to region level due to
        the region locking.
      - Adding elements: when the max number of the elements is reached, the gc
        was called to evict the timed out entries. The new approach is that the gc
        is called just for the matching region, assuming that if the region
        (proportionally) seems to be full, then the whole set is. We could scan
        the other regions to check every entry under rcu locking, but for huge
        sets it'd mean a slowdown at adding elements.
      - Listing the set header data: when the set was defined with timeout
        support, the garbage collector was called to clean up timed out entries
        to get the correct element numbers and set size values. Now the set is
        scanned to check non-timed out entries, without actually calling the gc
        for the whole set.
      
      Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
      SOFTIRQ-unsafe lock order issues while working on the patch.
      
      Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
      Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
      Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
      Fixes: 23c42a40 ("netfilter: ipset: Introduction of new commands and protocol version 7")
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
  8. 21 Feb, 2020 8 commits
  9. 20 Feb, 2020 4 commits
    • udp: rehash on disconnect · 303d0403
      Authored by Willem de Bruijn
      As of the below commit, udp sockets bound to a specific address can
      coexist with one bound to the any addr for the same port.
      
      The commit also phased out the use of socket hashing based only on
      port (hslot), in favor of always hashing on {addr, port} (hslot2).
      
      The change broke the following behavior with disconnect (AF_UNSPEC):
      
          server binds to 0.0.0.0:1337
          server connects to 127.0.0.1:80
          server disconnects
          client connects to 127.0.0.1:1337
          client sends "hello"
          server reads "hello"	// times out, packet did not find sk
      
      On connect the server acquires a specific source addr suitable for
      routing to its destination. On disconnect it reverts to the any addr.
      
      The connect call triggers a rehash to a different hslot2. On
      disconnect, add the same to return to the original hslot2.
      
      Skip this step if the socket is going to be unhashed completely.
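
The scenario above can be replayed with a toy model in Python. It is only a sketch under simplifying assumptions: the table is a dictionary keyed by (address, port), which stands in for hslot2, and the class and method names (`ToySock`, `ToyUdpTable`, `scenario`) are hypothetical.

```python
ANY = "0.0.0.0"


class ToySock:
    def __init__(self, name):
        self.name = name
        self.rcv_saddr = ANY
        self.port = 0
        self.slot = None  # key of the slot the socket currently sits in


class ToyUdpTable:
    """Sockets hashed on (address, port), loosely modelled on hslot2."""

    def __init__(self):
        self.slots = {}

    def _insert(self, sk):
        sk.slot = (sk.rcv_saddr, sk.port)
        self.slots.setdefault(sk.slot, []).append(sk)

    def rehash(self, sk):
        self.slots[sk.slot].remove(sk)
        self._insert(sk)

    def bind(self, sk, addr, port):
        sk.rcv_saddr, sk.port = addr, port
        self._insert(sk)

    def connect(self, sk, src_addr):
        sk.rcv_saddr = src_addr  # specific source address chosen by routing
        self.rehash(sk)

    def disconnect(self, sk, rehash):
        sk.rcv_saddr = ANY       # revert to the any addr
        if rehash:
            self.rehash(sk)

    def lookup(self, daddr, dport):
        # Try the exact-address slot first, then the any-address slot.
        for key in ((daddr, dport), (ANY, dport)):
            for sk in self.slots.get(key, []):
                if sk.rcv_saddr == key[0]:
                    return sk
        return None


def scenario(rehash_on_disconnect):
    table = ToyUdpTable()
    server = ToySock("server")
    table.bind(server, ANY, 1337)
    table.connect(server, "127.0.0.1")
    table.disconnect(server, rehash_on_disconnect)
    return table.lookup("127.0.0.1", 1337)
```

In this model, `scenario(False)` returns `None` because the socket is left in the slot for its old source address, while `scenario(True)` finds the server again in the any-address slot.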
      
      Fixes: 4cdeeee9 ("net: udp: prefer listeners bound to an address")
      Reported-by: Pavel Roskin <plroskin@gmail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: Fix to avoid gettig invalid tls record · 06f5201c
      Authored by Rohit Maheshwari
      The current code doesn't check whether the tcp sequence number starts at
      (or after) the 1st record's start sequence number. It only checks whether
      the seq number is before the 1st record's end sequence number. This problem
      is always a possibility in the re-transmit case. If the record to which a
      requested seq number belongs has already been deleted, tls_get_record will
      start looking into the list and, as per the check, will test whether the
      seq number is before the end seq of the 1st record. That is always true,
      so the 1st record is always returned, when it should in fact return NULL.
      
      As part of the fix, start looking at each record only if the sequence
      number lies in the list; otherwise return NULL.
      
      One more check is added: the driver looks for the start marker record to
      handle tcp packets which are before the tls offload start sequence number,
      hence return the 1st record if the record is the tls start marker and the
      seq number is before the 1st record's starting sequence number.
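
The lookup rules described above can be summarized in a small Python sketch. It is an illustration only: `Record` and `get_record` are hypothetical names, records are kept in a plain list ordered by sequence number, and 32-bit sequence number wraparound is deliberately ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    start_seq: int
    end_seq: int
    is_start_marker: bool = False


def get_record(records: List[Record], seq: int) -> Optional[Record]:
    """Return the record that covers seq, or None if no record does."""
    if not records:
        return None
    first = records[0]
    if seq < first.start_seq:
        # seq lies before the list: only the start marker may be returned,
        # so packets sent before offload started can still be handled.
        return first if first.is_start_marker else None
    for record in records:
        if seq < record.end_seq:
            return record
    return None  # seq is past the last record
```

The key difference from the behaviour the commit message criticizes is the first check: a sequence number below the 1st record's start no longer matches the 1st record just because it is also below that record's end.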
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: br_stp: Use built-in RCU list checking · 33c4acbe
      Authored by Madhuparna Bhowmik
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
      by default.
      
      Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hsr: Pass lockdep expression to RCU lists · a7a9456e
      Authored by Amol Grover
      node_db is traversed using list_for_each_entry_rcu
      outside an RCU read-side critical section but under the protection
      of hsr->list_lock.
      
      Hence, add corresponding lockdep expression to silence false-positive
      warnings, and harden RCU lists.
      
      Signed-off-by: Amol Grover <frextrite@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 19 Feb, 2020 7 commits