1. 07 2月, 2020 1 次提交
  2. 05 2月, 2020 1 次提交
    • E
      bonding/alb: properly access headers in bond_alb_xmit() · 38f88c45
      Eric Dumazet 提交于
      syzbot managed to send an IPX packet through bond_alb_xmit()
      and af_packet and triggered a use-after-free.
      
      First, bond_alb_xmit() was using ipx_hdr() helper to reach
      the IPX header, but ipx_hdr() was using the transport offset
      instead of the network offset. In the particular syzbot
      report transport offset was 0xFFFF
      
      This patch removes ipx_hdr() since it was only (mis)used from bonding.
      
      Then we need to make sure IPv4/IPv6/IPX headers are pulled
      in skb->head before dereferencing anything.
      
      BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
      Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
       (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)
      
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53
       [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282
       [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline]
       [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline]
       [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
       [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
       [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
       [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
       [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
       [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
       [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
       [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline]
       [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
       [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
       [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
       [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline]
       [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
       [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684
       [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996
       [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline]
       [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38f88c45
  3. 30 1月, 2020 2 次提交
  4. 28 1月, 2020 1 次提交
    • W
      udp: segment looped gso packets correctly · 6cd021a5
      Willem de Bruijn 提交于
      Multicast and broadcast packets can be looped from egress to ingress
      pre segmentation with dev_loopback_xmit. That function unconditionally
      sets ip_summed to CHECKSUM_UNNECESSARY.
      
      udp_rcv_segment segments gso packets in the udp rx path. Segmentation
      usually executes on egress, and does not expect packets of this type.
      __udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
      the offsets are not correct for gso_make_checksum.
      
      UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
      to the correct pseudo header checksum. Reset ip_summed to this type.
      (CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Fixes: cf329aa4 ("udp: cope with UDP GRO packet misdirection")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6cd021a5
  5. 27 1月, 2020 6 次提交
    • V
      devlink: add macro for "fw.roce" · 41c0d917
      Vasundhara Volam 提交于
      Add definition and documentation for the new generic info "fw.roce".
      
      v2: Remove board.nvm_cfg since fw.psid is similar.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41c0d917
    • S
      udp: Support UDP fraglist GRO/GSO. · 9fd1ff5d
      Steffen Klassert 提交于
      This patch extends UDP GRO to support fraglist GRO/GSO
      by using the previously introduced infrastructure.
      If the feature is enabled, all UDP packets are going to
      fraglist GRO (local input and forward).
      
      After validating the csum,  we mark ip_summed as
      CHECKSUM_UNNECESSARY for fraglist GRO packets to
      make sure that the csum is not touched.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fd1ff5d
    • C
      net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang 提交于
      The current implementations of ops->bind_class() are merely
      searching for classid and updating class in the struct tcf_result,
      without invoking either of cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to
      each filter, they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e24cd75
    • S
      nf_tables: Add set type for arbitrary concatenation of ranges · 3c4287f6
      Stefano Brivio 提交于
      This new set type allows for intervals in concatenated fields,
      which are expressed in the usual way, that is, simple byte
      concatenation with padding to 32 bits for single fields, and
      given as ranges by specifying start and end elements containing,
      each, the full concatenation of start and end values for the
      single fields.
      
      Ranges are expanded to composing netmasks, for each field: these
      are inserted as rules in per-field lookup tables. Bits to be
      classified are divided in 4-bit groups, and for each group, the
      lookup table contains 4^2 buckets, representing all the possible
      values of a bit group. This approach was inspired by the Grouper
      algorithm:
      	http://www.cse.usf.edu/~ligatti/projects/grouper/
      
      Matching is performed by a sequence of AND operations between
      bucket values, with buckets selected according to the value of
      packet bits, for each group. The result of this sequence tells
      us which rules matched for a given field.
      
      In order to concatenate several ranged fields, per-field rules
      are mapped using mapping arrays, one per field, that specify
      which rules should be considered while matching the next field.
      The mapping array for the last field contains a reference to
      the element originally inserted.
      
      The notes in nft_set_pipapo.c cover the algorithm in deeper
      detail.
      
      A pure hash-based approach is of no use here, as ranges need
      to be classified. An implementation based on "proxying" the
      existing red-black tree set type, creating a tree for each
      field, was considered, but deemed impractical due to the fact
      that elements would need to be shared between trees, at least
      as long as we want to keep UAPI changes to a minimum.
      
      A stand-alone implementation of this algorithm is available at:
      	https://pipapo.lameexcu.se
      together with notes about possible future optimisations
      (in pipapo.c).
      
      This algorithm was designed with data locality in mind, and can
      be highly optimised for SIMD instruction sets, as the bulk of
      the matching work is done with repetitive, simple bitwise
      operations.
      
      At this point, without further optimisations, nft_concat_range.sh
      reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
      L2$):
      
      TEST: performance
        net,port                                                      [ OK ]
          baseline (drop from netdev hook):              10190076pps
          baseline hash (non-ranged entries):             6179564pps
          baseline rbtree (match on first field only):    2950341pps
          set with  1000 full, ranged entries:            2304165pps
        port,net                                                      [ OK ]
          baseline (drop from netdev hook):              10143615pps
          baseline hash (non-ranged entries):             6135776pps
          baseline rbtree (match on first field only):    4311934pps
          set with   100 full, ranged entries:            4131471pps
        net6,port                                                     [ OK ]
          baseline (drop from netdev hook):               9730404pps
          baseline hash (non-ranged entries):             4809557pps
          baseline rbtree (match on first field only):    1501699pps
          set with  1000 full, ranged entries:            1092557pps
        port,proto                                                    [ OK ]
          baseline (drop from netdev hook):              10812426pps
          baseline hash (non-ranged entries):             6929353pps
          baseline rbtree (match on first field only):    3027105pps
          set with 30000 full, ranged entries:             284147pps
        net6,port,mac                                                 [ OK ]
          baseline (drop from netdev hook):               9660114pps
          baseline hash (non-ranged entries):             3778877pps
          baseline rbtree (match on first field only):    3179379pps
          set with    10 full, ranged entries:            2082880pps
        net6,port,mac,proto                                           [ OK ]
          baseline (drop from netdev hook):               9718324pps
          baseline hash (non-ranged entries):             3799021pps
          baseline rbtree (match on first field only):    1506689pps
          set with  1000 full, ranged entries:             783810pps
        net,mac                                                       [ OK ]
          baseline (drop from netdev hook):              10190029pps
          baseline hash (non-ranged entries):             5172218pps
          baseline rbtree (match on first field only):    2946863pps
          set with  1000 full, ranged entries:            1279122pps
      
      v4:
       - fix build for 32-bit architectures: 64-bit division needs
         div_u64() (kbuild test robot <lkp@intel.com>)
      v3:
       - rework interface for field length specification,
         NFT_SET_SUBKEY disappears and information is stored in
         description
       - remove scratch area to store closing element of ranges,
         as elements now come with an actual attribute to specify
         the upper range limit (Pablo Neira Ayuso)
       - also remove pointer to 'start' element from mapping table,
         closing key is now accessible via extension data
       - use bytes right away instead of bits for field lengths,
         this way we can also double the inner loop of the lookup
         function to take care of upper and lower bits in a single
         iteration (minor performance improvement)
       - make it clearer that set operations are actually atomic
         API-wise, but we can't e.g. implement flush() as one-shot
         action
       - fix type for 'dup' in nft_pipapo_insert(), check for
         duplicates only in the next generation, and in general take
         care of differentiating generation mask cases depending on
         the operation (Pablo Neira Ayuso)
       - report C implementation matching rate in commit message, so
         that AVX2 implementation can be compared (Pablo Neira Ayuso)
      v2:
       - protect access to scratch maps in nft_pipapo_lookup() with
         local_bh_disable/enable() (Florian Westphal)
       - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
         already implied (Florian Westphal)
       - explain why partial allocation failures don't need handling
         in pipapo_realloc_scratch(), rename 'm' to clone and update
         related kerneldoc to make it clear we're not operating on
         the live copy (Florian Westphal)
       - add expicit check for priv->start_elem in
         nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
         with a NULL start element, and also zero it out in every
         operation that might make it invalid, so that insertion
         doesn't proceed with an invalid element (Florian Westphal)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3c4287f6
    • S
      netfilter: nf_tables: Support for sets with multiple ranged fields · f3a2181e
      Stefano Brivio 提交于
      Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
      to specify the length of each field in a set concatenation.
      
      This allows set implementations to support concatenation of multiple
      ranged items, as they can divide the input key into matching data for
      every single field. Such set implementations would be selected as
      they specify support for NFT_SET_INTERVAL and allow desc->field_count
      to be greater than one. Explicitly disallow this for nft_set_rbtree.
      
      In order to specify the interval for a set entry, userspace would
      include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
      range endpoints as two separate keys, represented by attributes
      NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.
      
      While at it, export the number of 32-bit registers available for
      packet matching, as nftables will need this to know the maximum
      number of field lengths that can be specified.
      
      For example, "packets with an IPv4 address between 192.0.2.0 and
      192.0.2.42, with destination port between 22 and 25", can be
      expressed as two concatenated elements:
      
        NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
        NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25
      
      and NFTA_SET_DESC_CONCAT attribute would contain:
      
        NFTA_LIST_ELEM
          NFTA_SET_FIELD_LEN:		4
        NFTA_LIST_ELEM
          NFTA_SET_FIELD_LEN:		2
      
      v4: No changes
      v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
      v2: No changes
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f3a2181e
    • P
      netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute · 7b225d0b
      Pablo Neira Ayuso 提交于
      Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
      interval between kernel and userspace.
      
      This patch also adds the NFT_SET_EXT_KEY_END extension to store the
      closing element value in this interval.
      
      v4: No changes
      v3: New patch
      
      [sbrivio: refactor error paths and labels; add corresponding
        nft_set_ext_type for new key; rebase]
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7b225d0b
  6. 25 1月, 2020 3 次提交
    • P
      net: sched: Make TBF Qdisc offloadable · ef6aadcc
      Petr Machata 提交于
      Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying
      and dumping of TBF Qdisc.
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef6aadcc
    • F
      mptcp: do not inherit inet proto ops · e42f1ac6
      Florian Westphal 提交于
      We need to initialise the struct ourselves, else we expose tcp-specific
      callbacks such as tcp_splice_read which will then trigger splat because
      the socket is an mptcp one:
      
      BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
      Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478
      
      CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3
      Call Trace:
       tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
       tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612
       tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674
       tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791
       do_splice+0x1259/0x1560 fs/splice.c:1205
      
      To prevent build error with ipv6, add the recv/sendmsg function
      declaration to ipv6.h.  The functions are already accessible "thanks"
      to retpoline related work, but they are currently only made visible
      by socket.c specific INDIRECT_CALLABLE macros.
      Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e42f1ac6
    • P
      netfilter: nf_tables: autoload modules from the abort path · eb014de4
      Pablo Neira Ayuso 提交于
      This patch introduces a list of pending module requests. This new module
      list is composed of nft_module_request objects that contain the module
      name and one status field that tells if the module has been already
      loaded (the 'done' field).
      
      In the first pass, from the preparation phase, the netlink command finds
      that a module is missing on this list. Then, a module request is
      allocated and added to this list and nft_request_module() returns
      -EAGAIN. This triggers the abort path with the autoload parameter set on
      from nfnetlink, request_module() is called and the module request enters
      the 'done' state. Since the mutex is released when loading modules from
      the abort phase, the module list is zapped so this is iteration occurs
      over a local list. Therefore, the request_module() calls happen when
      object lists are in consistent state (after fulling aborting the
      transaction) and the commit list is empty.
      
      On the second pass, the netlink command will find that it already tried
      to load the module, so it does not request it again and
      nft_request_module() returns 0. Then, there is a look up to find the
      object that the command was missing. If the module was successfully
      loaded, the command proceeds normally since it finds the missing object
      in place, otherwise -ENOENT is reported to userspace.
      
      This patch also updates nfnetlink to include the reason to enter the
      abort phase, which is required for this new autoload module rationale.
      
      Fixes: ec7470b8 ("netfilter: nf_tables: store transaction list locally while requesting module")
      Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eb014de4
  7. 24 1月, 2020 6 次提交
  8. 23 1月, 2020 8 次提交
  9. 22 1月, 2020 1 次提交
  10. 19 1月, 2020 3 次提交
  11. 17 1月, 2020 1 次提交
  12. 16 1月, 2020 7 次提交