1. 27 2月, 2020 6 次提交
    • U
      net/smc: fix cleanup for linkgroup setup failures · 51e3dfa8
      Ursula Braun 提交于
      If an SMC connection to a certain peer is setup the first time,
      a new linkgroup is created. In case of setup failures, such a
      linkgroup is unusable and should disappear. As a first step the
      linkgroup is removed from the linkgroup list in smc_lgr_forget().
      
      There are 2 problems:
      smc_listen_decline() might be called before linkgroup creation
      resulting in a crash due to calling smc_lgr_forget() with
      parameter NULL.
      If a setup failure occurs after linkgroup creation, the connection
      is never unregistered from the linkgroup, preventing linkgroup
      freeing.
      
      This patch introduces an enhanced smc_lgr_cleanup_early() function
      which
      * contains a linkgroup check for early smc_listen_decline()
        invocations
      * invokes smc_conn_free() to guarantee unregistering of the
        connection.
      * schedules fast linkgroup removal of the unusable linkgroup
      
      And the unused function smcd_conn_free() is removed from smc_core.h.
      
      Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc")
      Fixes: 2a0674ff ("net/smc: improve abnormal termination of link groups")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51e3dfa8
    • J
      sched: act: count in the size of action flags bitfield · 1521a67e
      Jiri Pirko 提交于
      The put of the flags was added by the commit referenced in fixes tag,
      however the size of the message was not extended accordingly.
      
      Fix this by adding size of the flags bitfield to the message size.
      
      Fixes: e3822678 ("net: sched: update action implementations to support flags")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1521a67e
    • M
      net: core: devlink.c: Use built-in RCU list checking · 2eb51c75
      Madhuparna Bhowmik 提交于
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
      
      The devlink->lock is held when devlink_dpipe_table_find()
      is called in non RCU read side section. Therefore, pass struct devlink
      to devlink_dpipe_table_find() for lockdep checking.
      Signed-off-by: NMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2eb51c75
    • C
      netfilter: xt_hashlimit: unregister proc file before releasing mutex · 99b79c39
      Cong Wang 提交于
      Before releasing the global mutex, we only unlink the hashtable
      from the hash list, its proc file is still not unregistered at
      this point. So syzbot could trigger a race condition where a
      parallel htable_create() could register the same file immediately
      after the mutex is released.
      
      Move htable_remove_proc_entry() back to mutex protection to
      fix this. And, fold htable_destroy() into htable_put() to make
      the code slightly easier to understand.
      
      Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
      Fixes: c4a3922d ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      99b79c39
    • M
      ethtool: limit bitset size · e34f1753
      Michal Kubecek 提交于
      Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
      reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
      attributes and even the message by passing a value between (u32)(-31)
      and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.
      
      The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
      that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
      ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
      attribute "payload".
      
      Prevent this overflow byt limiting the bitset size. Technically, compact
      bitset format would allow bitset sizes up to almost 2^18 (so that the
      nest size does not exceed U16_MAX) but bitsets used by ethtool are much
      shorter. S16_MAX, the largest value which can be directly used as an
      upper limit in policy, should be a reasonable compromise.
      
      Fixes: 10b518d4 ("ethtool: netlink bitset handling")
      Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com
      Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com
      Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e34f1753
    • A
      net: Fix Tx hash bound checking · 6e11d157
      Amritha Nambiar 提交于
      Fixes the lower and upper bounds when there are multiple TCs and
      traffic is on the the same TC on the same device.
      
      The lower bound is represented by 'qoffset' and the upper limit for
      hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
      mapping when there are multiple TCs, as the queue indices for upper TCs
      will be offset by 'qoffset'.
      
      v2: Fixed commit description based on comments.
      
      Fixes: 1b837d48 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
      Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
      Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
      Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e11d157
  2. 26 2月, 2020 1 次提交
    • S
      nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() · 212d58c1
      Stefano Brivio 提交于
      Phil reports that adding elements, flushing and re-adding them
      right away:
      
        nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
        nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
        nft flush set t s
        nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'
      
      triggers, almost reliably, a crash like this one:
      
        [   71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
        [   71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e #192
        [   71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
        [   71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
        [   71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.334596] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.335780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.339718] Call Trace:
        [   71.340093]  nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
        [   71.340973]  nft_set_destroy+0x20/0x50 [nf_tables]
        [   71.341879]  nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
        [   71.342916]  process_one_work+0x1d5/0x3c0
        [   71.343601]  worker_thread+0x4a/0x3c0
        [   71.344229]  kthread+0xfb/0x130
        [   71.344780]  ? process_one_work+0x3c0/0x3c0
        [   71.345477]  ? kthread_park+0x90/0x90
        [   71.346129]  ret_from_fork+0x35/0x40
        [   71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
        [   71.348153] ---[ end trace 2eaa8149ca759bcc ]---
        [   71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.350025] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.350026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.350030] Kernel panic - not syncing: Fatal exception
        [   71.350412] Kernel Offset: disabled
        [   71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      which is caused by dangling elements that have been deactivated, but
      never removed.
      
      On a flush operation, nft_pipapo_walk() walks through all the elements
      in the mapping table, which are then deactivated by nft_flush_set(),
      one by one, and added to the commit list for removal. Element data is
      then freed.
      
      On transaction commit, nft_pipapo_remove() is called, and failed to
      remove these elements, leading to the stale references in the mapping.
      The first symptom of this, revealed by KASan, is a one-byte
      use-after-free in subsequent calls to nft_pipapo_walk(), which is
      usually not enough to trigger a panic. When stale elements are used
      more heavily, though, such as double-free via nft_pipapo_destroy()
      as in Phil's case, the problem becomes more noticeable.
      
      The issue comes from that fact that, on a flush operation,
      nft_pipapo_remove() won't get the actual key data via elem->key,
      elements to be deleted upon commit won't be found by the lookup via
      pipapo_get(), and removal will be skipped. Key data should be fetched
      via nft_set_ext_key(), instead.
      Reported-by: NPhil Sutter <phil@nwl.cc>
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      212d58c1
  3. 25 2月, 2020 1 次提交
    • N
      net: bridge: fix stale eth hdr pointer in br_dev_xmit · 823d81b0
      Nikolay Aleksandrov 提交于
      In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
      if the packet has the vlan header inside (e.g. bridge with disabled
      tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
      to extract the vid before filtering which in turn calls pskb_may_pull()
      and we may end up with a stale eth pointer. Moreover the cached eth header
      pointer will generally be wrong after that operation. Remove the eth header
      caching and just use eth_hdr() directly, the compiler does the right thing
      and calculates it only once so we don't lose anything.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      823d81b0
  4. 24 2月, 2020 4 次提交
  5. 23 2月, 2020 2 次提交
  6. 22 2月, 2020 2 次提交
    • J
      netfilter: ipset: Fix forceadd evaluation path · 8af1c6fb
      Jozsef Kadlecsik 提交于
      When the forceadd option is enabled, the hash:* types should find and replace
      the first entry in the bucket with the new one if there are no reuseable
      (deleted or timed out) entries. However, the position index was just not set
      to zero and remained the invalid -1 if there were no reuseable entries.
      
      Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
      Fixes: 23c42a40 ("netfilter: ipset: Introduction of new commands and protocol version 7")
      Signed-off-by: NJozsef Kadlecsik <kadlec@netfilter.org>
      8af1c6fb
    • J
      netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports · f66ee041
      Jozsef Kadlecsik 提交于
      In the case of huge hash:* types of sets, due to the single spinlock of
      a set the processing of the whole set under spinlock protection could take
      too long.
      
      There were four places where the whole hash table of the set was processed
      from bucket to bucket under holding the spinlock:
      
      - During resizing a set, the original set was locked to exclude kernel side
        add/del element operations (userspace add/del is excluded by the
        nfnetlink mutex). The original set is actually just read during the
        resize, so the spinlocking is replaced with rcu locking of regions.
        However, thus there can be parallel kernel side add/del of entries.
        In order not to loose those operations a backlog is added and replayed
        after the successful resize.
      - Garbage collection of timed out entries was also protected by the spinlock.
        In order not to lock too long, region locking is introduced and a single
        region is processed in one gc go. Also, the simple timer based gc running
        is replaced with a workqueue based solution. The internal book-keeping
        (number of elements, size of extensions) is moved to region level due to
        the region locking.
      - Adding elements: when the max number of the elements is reached, the gc
        was called to evict the timed out entries. The new approach is that the gc
        is called just for the matching region, assuming that if the region
        (proportionally) seems to be full, then the whole set does. We could scan
        the other regions to check every entry under rcu locking, but for huge
        sets it'd mean a slowdown at adding elements.
      - Listing the set header data: when the set was defined with timeout
        support, the garbage collector was called to clean up timed out entries
        to get the correct element numbers and set size values. Now the set is
        scanned to check non-timed out entries, without actually calling the gc
        for the whole set.
      
      Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
      SOFTIRQ-unsafe lock order issues during working on the patch.
      
      Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
      Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
      Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
      Fixes: 23c42a40 ("netfilter: ipset: Introduction of new commands and protocol version 7")
      Signed-off-by: NJozsef Kadlecsik <kadlec@netfilter.org>
      f66ee041
  7. 21 2月, 2020 8 次提交
  8. 20 2月, 2020 4 次提交
    • W
      udp: rehash on disconnect · 303d0403
      Willem de Bruijn 提交于
      As of the below commit, udp sockets bound to a specific address can
      coexist with one bound to the any addr for the same port.
      
      The commit also phased out the use of socket hashing based only on
      port (hslot), in favor of always hashing on {addr, port} (hslot2).
      
      The change broke the following behavior with disconnect (AF_UNSPEC):
      
          server binds to 0.0.0.0:1337
          server connects to 127.0.0.1:80
          server disconnects
          client connects to 127.0.0.1:1337
          client sends "hello"
          server reads "hello"	// times out, packet did not find sk
      
      On connect the server acquires a specific source addr suitable for
      routing to its destination. On disconnect it reverts to the any addr.
      
      The connect call triggers a rehash to a different hslot2. On
      disconnect, add the same to return to the original hslot2.
      
      Skip this step if the socket is going to be unhashed completely.
      
      Fixes: 4cdeeee9 ("net: udp: prefer listeners bound to an address")
      Reported-by: NPavel Roskin <plroskin@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      303d0403
    • R
      net/tls: Fix to avoid gettig invalid tls record · 06f5201c
      Rohit Maheshwari 提交于
      Current code doesn't check if tcp sequence number is starting from (/after)
      1st record's start sequnce number. It only checks if seq number is before
      1st record's end sequnce number. This problem will always be a possibility
      in re-transmit case. If a record which belongs to a requested seq number is
      already deleted, tls_get_record will start looking into list and as per the
      check it will look if seq number is before the end seq of 1st record, which
      will always be true and will return 1st record always, it should in fact
      return NULL.
      As part of the fix, start looking each record only if the sequence number
      lies in the list else return NULL.
      There is one more check added, driver look for the start marker record to
      handle tcp packets which are before the tls offload start sequence number,
      hence return 1st record if the record is tls start marker and seq number is
      before the 1st record's starting sequence number.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: NRohit Maheshwari <rohitm@chelsio.com>
      Reviewed-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06f5201c
    • M
      bridge: br_stp: Use built-in RCU list checking · 33c4acbe
      Madhuparna Bhowmik 提交于
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
      by default.
      Signed-off-by: NMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33c4acbe
    • A
      net: hsr: Pass lockdep expression to RCU lists · a7a9456e
      Amol Grover 提交于
      node_db is traversed using list_for_each_entry_rcu
      outside an RCU read-side critical section but under the protection
      of hsr->list_lock.
      
      Hence, add corresponding lockdep expression to silence false-positive
      warnings, and harden RCU lists.
      Signed-off-by: NAmol Grover <frextrite@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7a9456e
  9. 19 2月, 2020 10 次提交
  10. 18 2月, 2020 2 次提交
    • X
      sctp: move the format error check out of __sctp_sf_do_9_1_abort · 245709ec
      Xin Long 提交于
      When T2 timer is to be stopped, the asoc should also be deleted,
      otherwise, there will be no chance to call sctp_association_free
      and the asoc could last in memory forever.
      
      However, in sctp_sf_shutdown_sent_abort(), after adding the cmd
      SCTP_CMD_TIMER_STOP for T2 timer, it may return error due to the
      format error from __sctp_sf_do_9_1_abort() and miss adding
      SCTP_CMD_ASSOC_FAILED where the asoc will be deleted.
      
      This patch is to fix it by moving the format error check out of
      __sctp_sf_do_9_1_abort(), and do it before adding the cmd
      SCTP_CMD_TIMER_STOP for T2 timer.
      
      Thanks Hangbin for reporting this issue by the fuzz testing.
      
      v1->v2:
        - improve the comment in the code as Marcelo's suggestion.
      
      Fixes: 96ca468b ("sctp: check invalid value of length parameter in error cause")
      Reported-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      245709ec
    • J
      net: sched: correct flower port blocking · 8a9093c7
      Jason Baron 提交于
      tc flower rules that are based on src or dst port blocking are sometimes
      ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
      ports from the skb for tc flower to match against. However, the port
      dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in
      key_control->flags. All callers of __skb_flow_dissect(), zero-out the
      key_control field except for fl_classify() as used by the flower
      classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
      __skb_flow_dissect(), since key_control is allocated on the stack
      and may not be initialized.
      
      Since key_basic and key_control are present for all flow keys, let's
      make sure they are initialized.
      
      Fixes: 62230715 ("flow_dissector: do not dissect l4 ports for fragments")
      Co-developed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a9093c7