1. 24 Feb, 2019 5 commits
  2. 23 Feb, 2019 33 commits
    •
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · ea34a003
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-02-23
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a bug in BPF's LPM deletion logic to match correct prefix
         length, from Alban.
      
      2) Fix AF_XDP teardown by not destroying umem prematurely as it
         is still needed till all outstanding skbs are freed, from Björn.
      
      3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking
         signal_pending() outside need_resched() condition which is never
         triggered there, from Stanislav.
      
      4) Fix two nfp JIT bugs, one in code emission for K-based xor, and
         another one to explicitly clear upper bits in alu32, from Jiong.
      
      5) Add bpf list address to maintainers file, from Daniel.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bpf, doc: add bpf list as secondary entry to maintainers file · b4b8bb69
      Daniel Borkmann committed
      We recently created a bpf@vger.kernel.org list (https://lore.kernel.org/bpf/)
      for BPF related discussions, originally in context of BPF track at LSF/MM
      for topic discussions. It's *optional* but *desirable* to keep it in Cc for
      BPF related kernel/loader/llvm/tooling threads, meaning also infrastructure
      like llvm that sits on top of kernel but is crucial to BPF. In any case,
      netdev with its bpf delegate is *as-is* today the primary list for patches, so
      nothing changes in the workflow. Main purpose is to have some more awareness
      for the bpf@vger.kernel.org list that folks can Cc for BPF specific topics.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    •
      Merge branch 'udp-a-few-fixes' · 40e8f0b4
      David S. Miller committed
      Paolo Abeni says:
      
      ====================
      udp: a few fixes
      
      This series includes some UDP-related fixlets. All this stuff has been
      pointed out by the sparse tool. The first two patches are just annotation
      related, while the last 2 cover some very unlikely races.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      udp: fix possible user after free in error handler · 92b95364
      Paolo Abeni committed
      Similar to the previous commit, this addresses the same issue for
      ipv4: use a single fetch operation and use the correct rcu
      annotation.
      
      Fixes: e7cc0824 ("udp: Support for error handlers of tunnels with arbitrary destination port")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      udpv6: fix possible user after free in error handler · 424a7cd0
      Paolo Abeni committed
      Before dereferencing the encap pointer, commit e7cc0824 ("udp: Support
      for error handlers of tunnels with arbitrary destination port") checks
      for a NULL value, but the two fetch operations can race with removal.
      Fix the above using a single access.
      Also fix a couple of type annotations, to make sparse happy.
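
The "single access" idea can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct fake_sock`, `encap_err_fn` and both helper functions are hypothetical names, and C11 atomics stand in for the kernel's own primitives (such as READ_ONCE() or rcu_dereference()).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an encap error handler; not a kernel type. */
typedef int (*encap_err_fn)(int info);

struct fake_sock {
    _Atomic(encap_err_fn) encap_err;   /* may be cleared concurrently */
};

/* Racy shape, shown for contrast only: the pointer is fetched twice,
 * so it can become NULL between the check and the call. */
int call_handler_racy(struct fake_sock *sk, int info)
{
    if (atomic_load(&sk->encap_err) != NULL)
        return atomic_load(&sk->encap_err)(info);  /* second fetch */
    return -1;
}

/* Fixed shape: fetch once into a local, then test and call that copy. */
int call_handler_once(struct fake_sock *sk, int info)
{
    encap_err_fn fn = atomic_load(&sk->encap_err);

    return fn ? fn(info) : -1;
}

/* Trivial handler used to exercise the helpers. */
int sample_handler(int info)
{
    return info + 1;
}
```

With the single fetch, a concurrent removal can only make the helper skip the call; it can no longer make it call through a NULL pointer.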
      
      Fixes: e7cc0824 ("udp: Support for error handlers of tunnels with arbitrary destination port")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      fou6: fix proto error handler argument type · 5de362df
      Paolo Abeni committed
      Last argument of gue6_err_proto_handler() has a wrong type annotation,
      fix it and make sparse happy again.
      
      Fixes: b8a51b38 ("fou, fou6: ICMP error handlers for FoU and GUE")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      udpv6: add the required annotation to mib type · 543fc3fb
      Paolo Abeni committed
      In commit 029a3743 ("udp6: cleanup stats accounting in recvmsg()")
      I forgot to add the percpu annotation for the mib pointer. Add it, and
      make sparse happy.
      
      Fixes: 029a3743 ("udp6: cleanup stats accounting in recvmsg()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      mdio_bus: Fix use-after-free on device_register fails · 6ff7b060
      YueHaibing committed
      KASAN has found a use-after-free in fixed_mdio_bus_init().
      Commit 0c692d07 ("drivers/net/phy/mdio_bus.c: call
      put_device on device_register() failure") calls put_device()
      when device_register() fails, giving up the last reference
      to the device and allowing mdiobus_release() to be executed,
      kfreeing the bus. However, in most drivers, mdiobus_free()
      is called to free the bus when mdiobus_register() fails, so
      a use-after-free occurs when the bus is accessed again. This patch
      reverts that commit to let mdiobus_free() free the bus.
      
      KASAN report details as below:
      
      BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
      Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524
      
      CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
       fixed_mdio_bus_init+0x283/0x1000 [fixed_phy]
       ? 0xffffffffc0e40000
       ? 0xffffffffc0e40000
       ? 0xffffffffc0e40000
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc
      R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004
      
      Allocated by task 3524:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143
       fixed_mdio_bus_init+0x163/0x1000 [fixed_phy]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 3524:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kfree+0xe1/0x270 mm/slub.c:3938
       device_release+0x78/0x200 drivers/base/core.c:919
       kobject_cleanup lib/kobject.c:662 [inline]
       kobject_release lib/kobject.c:691 [inline]
       kref_put include/linux/kref.h:67 [inline]
       kobject_put+0x146/0x240 lib/kobject.c:708
       put_device+0x1c/0x30 drivers/base/core.c:2060
       __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382
       fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881dc824c80
       which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 248 bytes inside of
       2048-byte region [ffff8881dc824c80, ffff8881dc825480)
      The buggy address belongs to the page:
      page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000010200(slab|head)
      raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800
      raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                      ^
       ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 0c692d07 ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 · 97f0082a
      Kalash Nainwal committed
      Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
      keep legacy software happy. This is similar to what was done for
      ipv4 in commit 709772e6 ("net: Fix routing tables with
      id > 255 for legacy software").
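
The mapping described here can be sketched as follows. `rtm_table` is an 8-bit field, so a table ID above 255 cannot be stored in it; the full ID travels in the RTA_TABLE attribute instead. The helper name `rtm_table_for` is hypothetical, and the constant is redefined locally (252 in the rtnetlink UAPI header) so the sketch builds without kernel headers.

```c
#include <stdint.h>

/* Redefined locally for the sketch; matches the rtnetlink UAPI value. */
#define RT_TABLE_COMPAT 252

/* Hypothetical helper: value to place in the 8-bit rtm_table field.
 * IDs that fit are reported as-is; larger IDs get the COMPAT marker. */
uint8_t rtm_table_for(uint32_t table_id)
{
    return table_id < 256 ? (uint8_t)table_id : RT_TABLE_COMPAT;
}
```

Legacy software that only reads `rtm_table` then sees a well-defined marker instead of a truncated ID.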
      Signed-off-by: Kalash Nainwal <kalash@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      Merge branch 'bnxt_en-firmware-message-delay-fixes' · a11f5756
      David S. Miller committed
      Michael Chan says:
      
      ====================
      bnxt_en: firmware message delay fixes.
      
      We were seeing some intermittent firmware message timeouts in our lab and
      these 2 small patches fix them.  Please apply to stable as well.  Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bnxt_en: Wait longer for the firmware message response to complete. · 0000b81a
      Michael Chan committed
      The code waits up to 20 usec for the firmware response to complete
      once we've seen the valid response header in the buffer.  It turns
      out that in some scenarios, this wait time is not long enough.
      Extend it to 150 usec and use usleep_range() instead of udelay().
      
      Fixes: 9751e8e7 ("bnxt_en: reduce timeout on initial HWRM calls")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bnxt_en: Fix typo in firmware message timeout logic. · 67681d02
      Michael Chan committed
      The logic that polls for the firmware message response uses a shorter
      sleep interval for the first few passes.  But there was a typo so it
      was using the wrong counter (larger counter) for these short sleep
      passes.  The result is a slightly shorter timeout period for these
      firmware messages than intended.  Fix it by using the proper counter.
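
The effect of picking the wrong counter can be illustrated with a small model. Everything here is an assumption made for illustration: the function name, the two sleep intervals and the pass counts are hypothetical and are not the driver's real values.

```c
/* Illustrative constants only; not taken from the bnxt_en driver. */
enum { SHORT_SLEEP_US = 40, LONG_SLEEP_US = 600 };

/* Total time spent sleeping when the first `short_passes` iterations use
 * the short interval and the remaining iterations use the long one. */
unsigned long total_wait_us(unsigned int passes, unsigned int short_passes)
{
    unsigned long total = 0;

    for (unsigned int i = 0; i < passes; i++)
        total += (i < short_passes) ? SHORT_SLEEP_US : LONG_SLEEP_US;
    return total;
}
```

In this model, comparing the loop index against a larger counter than intended makes more passes use the short interval, so the overall timeout shrinks: `total_wait_us(10, 10)` is smaller than `total_wait_us(10, 3)`.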
      
      Fixes: 9751e8e7 ("bnxt_en: reduce timeout on initial HWRM calls")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      Merge branch 'bpf-nfp-codegen-fixes' · 7d466e5f
      Daniel Borkmann committed
      Jiong Wang says:
      
      ====================
      Code-gen for BPF_ALU | BPF_XOR | BPF_K is wrong when imm is -1,
      also high 32-bit of 64-bit register should always be cleared.
      
      This set fixed both bugs.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    •
      nfp: bpf: fix ALU32 high bits clearance bug · f036ebd9
      Jiong Wang committed
      NFP BPF JIT compiler is doing a couple of small optimizations when jitting
      ALU imm instructions, some of these optimizations could save code-gen, for
      example:
      
        A & -1 =  A
        A |  0 =  A
        A ^  0 =  A
      
      However, for ALU32, high 32-bit of the 64-bit register should still be
      cleared according to ISA semantics.
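
A small model of the two instruction classes shows why the identity shortcut is only safe for 64-bit ALU operations. The function names are hypothetical, and this models the eBPF semantics in plain C rather than reproducing the NFP JIT.

```c
#include <stdint.h>

/* 64-bit AND with an immediate: the immediate is sign-extended, so
 * "A & -1" leaves all 64 bits unchanged and needs no emitted code. */
uint64_t alu64_and_imm(uint64_t dst, int32_t imm)
{
    return dst & (uint64_t)(int64_t)imm;
}

/* 32-bit AND with an immediate: only the low 32 bits take part and the
 * result is zero-extended, so the upper half must end up cleared even
 * when the low half is unchanged. */
uint64_t alu32_and_imm(uint64_t dst, int32_t imm)
{
    return (uint64_t)((uint32_t)dst & (uint32_t)imm);
}
```

For a register holding 0xdeadbeef12345678, the 64-bit form with imm = -1 returns the same value, while the 32-bit form returns 0x12345678. Skipping code generation for the 32-bit case would leave the stale upper bits in place.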
      
      Fixes: cd7df56e ("nfp: add BPF to NFP code translator")
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    •
      nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K · 71c19024
      Jiong Wang committed
      The intended optimization should be A ^ 0 = A, not A ^ -1 = A.
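
As a quick check of the identity, XOR with 0 is a no-op while XOR with -1 inverts every bit. The predicate below is a hypothetical sketch of the condition a peephole optimization would need, not the JIT's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True only when "dst ^= imm" cannot change dst. */
bool xor_imm_is_identity(int32_t imm)
{
    return imm == 0;
}

/* 32-bit XOR with an immediate, for checking the two cases. */
uint32_t xor32(uint32_t a, int32_t imm)
{
    return a ^ (uint32_t)imm;
}
```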
      
      Fixes: cd7df56e ("nfp: add BPF to NFP code translator")
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    •
      Merge tag 'mac80211-for-davem-2019-02-22' of... · ab01f251
      David S. Miller committed
      Merge tag 'mac80211-for-davem-2019-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Three more fixes:
       * mac80211 mesh code wasn't allocating SKB tailroom properly
         in some cases
       * tx_sk_pacing_shift should be 7 for better performance
       * mac80211_hwsim wasn't propagating genlmsg_reply() errors
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      Documentation: networking: switchdev: Update port parent ID section · 80d79ad2
      Florian Fainelli committed
      Update the section about switchdev drivers having to implement a
      switchdev_port_attr_get() function to return
      SWITCHDEV_ATTR_ID_PORT_PARENT_ID since that is no longer valid after
      commit bccb3025 ("net: Get rid of
      SWITCHDEV_ATTR_ID_PORT_PARENT_ID").
      
      Fixes: bccb3025 ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID")
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: socket: add check for negative optlen in compat setsockopt · 52baf987
      Jann Horn committed
      __sys_setsockopt() already checks for `optlen < 0`. Add an equivalent check
      to the compat path for robustness. This has to be `> INT_MAX` instead of
      `< 0` because the signedness of `optlen` is different here.
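
The difference between the two checks can be sketched with a pair of standalone helpers. The function names are hypothetical; the sketch only shows why the same "negative length" has to be detected differently depending on the signedness of `optlen`.

```c
#include <errno.h>
#include <limits.h>

/* Native path: optlen is a signed int, so a negative value is visible. */
int check_optlen_signed(int optlen)
{
    return optlen < 0 ? -EINVAL : 0;
}

/* Compat path: optlen is unsigned, so a negative value passed by the
 * caller appears as a number above INT_MAX instead. */
int check_optlen_unsigned(unsigned int optlen)
{
    return optlen > (unsigned int)INT_MAX ? -EINVAL : 0;
}
```

Both helpers reject the same bit pattern: -1 as a signed int, and 0xffffffff as an unsigned int on platforms with a 32-bit `int`.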
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      ipv6: route: purge exception on removal · f5b51fe8
      Paolo Abeni committed
      When a netdevice is unregistered, we flush the relevant exception
      via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
      
      Finally, we end up calling rt6_remove_exception(), where we release
      the relevant dst, while we keep the references to the related fib6_info and
      dev. Such references should be released later when the dst is
      destroyed.
      
      There are a number of caches that can keep the exception around for an
      unlimited amount of time - namely dst_cache, possibly even socket cache.
      As a result device unregistration may hang, as demonstrated by this script:
      
      ip netns add cl
      ip netns add rt
      ip netns add srv
      ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
      
      ip link add name cl_veth type veth peer name cl_rt_veth
      ip link set dev cl_veth netns cl
      ip -n cl link set dev cl_veth up
      ip -n cl addr add dev cl_veth 2001::2/64
      ip -n cl route add default via 2001::1
      
      ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
      ip -n cl link set tunv6 up
      ip -n cl addr add 2013::2/64 dev tunv6
      
      ip link set dev cl_rt_veth netns rt
      ip -n rt link set dev cl_rt_veth up
      ip -n rt addr add dev cl_rt_veth 2001::1/64
      
      ip link add name rt_srv_veth type veth peer name srv_veth
      ip link set dev srv_veth netns srv
      ip -n srv link set dev srv_veth up
      ip -n srv addr add dev srv_veth 2002::1/64
      ip -n srv route add default via 2002::2
      
      ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
      ip -n srv link set tunv6 up
      ip -n srv addr add 2013::1/64 dev tunv6
      
      ip link set dev rt_srv_veth netns rt
      ip -n rt link set dev rt_srv_veth up
      ip -n rt addr add dev rt_srv_veth 2002::2/64
      
      ip netns exec srv netserver & sleep 0.1
      ip netns exec cl ping6 -c 4 2013::1
      ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
      ip -n rt link set dev rt_srv_veth mtu 1400
      wait %2
      
      ip -n cl link del cl_veth
      
      This commit addresses the issue by purging all the references held by the
      exception at removal time, as we currently do for e.g. ipv6 pcpu dst entries.
      
      v1 -> v2:
       - re-order the code to avoid accessing dst and net after dst_dev_put()
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      Merge branch 'nic-thunderx-fix-communication-races-between-VF-PF' · aaaf5985
      David S. Miller committed
      Vadim Lomovtsev says:
      
      ====================
      nic: thunderx: fix communication races between VF & PF
      
      The ThunderX CN88XX NIC Virtual Function driver uses a mailbox interface
      to communicate with the physical function driver. Each VF has its own pair
      of mailbox registers to read from and write to. The mailbox registers
      have no protection from possible races, so it has to be implemented
      on the software side.
      
      After long-term testing by looping the 'ip link set <ifname> up/down'
      command, it was found that there are two possible scenarios in which
      a race condition appears:
       1. VF receives a link change message from PF and VF sends an RX mode
      configuration message to PF at the same time from a separate thread.
       2. PF receives RX mode configuration from VF and, at the same time,
      in a separate thread, PF detects a link status change and sends the
      appropriate message to the particular VF.
      
      Both cases lead to mailbox data being rewritten, NIC VF messaging control
      data being updated incorrectly and the communication sequence getting broken.
      
      This patch series is to address race condition with VF & PF communication.
      
      Changes:
      v1 -> v2
       - 0000: correct typo in cover letter subject: 'betwen' -> 'between';
       - move link state polling request task from pf to vf
         instead of checking status of mailbox irq;
      v2 -> v3
       - 0003: change return type of nicvf_send_cfg_done() function
         from int to void;
       - 0007: update subject and remove unused variable 'netdev'
         from nicvf_link_status_check_task() function;
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: remove link change polling code and info from nicpf · 2e1c3fff
      Vadim Lomovtsev committed
      Since the link change polling routine was moved to the nicvf side,
      we no longer need the polling function at the nicpf side, nor
      the link status info for all enabled VFs, as this info is
      already tracked at the VF side.
      
      This commit removes the unnecessary code & fields from the
      nicpf structure.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: move link state polling function to VF · 2c632ad8
      Vadim Lomovtsev committed
      Move the link change polling task to VF side in order to
      prevent races between VF and PF while sending link change
      message(s). This commit is to implement link change request
      to be initiated by VF.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: add mutex to protect mailbox from concurrent calls for same VF · 609ea65c
      Vadim Lomovtsev committed
      In some cases it could happen that nicvf_send_msg_to_pf() could be called
      concurrently for the same NIC VF, and thus re-writing mailbox contents and
      breaking messaging sequence with PF by re-writing NICVF data.
      
      This commit is to implement mutex for NICVF to protect mailbox registers
      and NICVF messaging control data from concurrent access.
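
The locking pattern can be sketched in userspace with a pthread mutex. This is a simplified model under stated assumptions: `struct vf_mailbox` and both functions are hypothetical, the two 64-bit fields merely stand in for the mailbox register pair, and the sketch only writes the registers, with no acknowledgement handling.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one VF's mailbox: two 64-bit registers,
 * a lock, and a counter standing in for messaging control data. */
struct vf_mailbox {
    pthread_mutex_t lock;
    uint64_t reg[2];
    unsigned int sent;
};

void vf_mailbox_init(struct vf_mailbox *mb)
{
    pthread_mutex_init(&mb->lock, NULL);
    mb->reg[0] = 0;
    mb->reg[1] = 0;
    mb->sent = 0;
}

/* Write both registers and update the control data as one unit, so two
 * concurrent senders for the same VF cannot interleave their writes. */
void vf_mailbox_send(struct vf_mailbox *mb, uint64_t w0, uint64_t w1)
{
    pthread_mutex_lock(&mb->lock);
    mb->reg[0] = w0;
    mb->reg[1] = w1;
    mb->sent++;
    pthread_mutex_unlock(&mb->lock);
}
```

Keeping one lock per mailbox rather than one global lock lets different VFs send independently while still serializing senders for the same VF.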
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: rework xcast message structure to make it fit into 64 bit · 53544396
      Vadim Lomovtsev committed
      To communicate with the PF, each ThunderX NIC VF uses a mailbox, which is a
      pair of 64-bit registers available to both VFn and PF.
      
      This commit changes the xcast message structure so that it
      fits into 64 bits.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task · 7db730d9
      Vadim Lomovtsev committed
      The rx_set_mode causes a number of messages to be sent to the PF for receive
      mode configuration. If there are any issues, we need to stop sending
      messages and release the allocated memory.
      
      This commit implements a check of the nicvf_send_msg_to_pf() result.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: make CFG_DONE message to run through generic send-ack sequence · 0dd563b9
      Vadim Lomovtsev committed
      At the end of NIC VF initialization, the VF sends the CFG_DONE message to the
      PF without using the nicvf_send_msg_to_pf() routine. This could potentially
      overwrite data in the mailbox. This commit sends the CFG_DONE message
      the same way as other configuration messages, by using the
      nicvf_send_msg_to_pf() routine.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them. · 2ecbe4f4
      Vadim Lomovtsev committed
      Having one work queue for the receive mode configuration ndo_set_rx_mode()
      call for all VFs makes each of them wait until the
      set_rx_mode() call completes for another VF if any of the close, set
      receive mode or change flags calls has already been invoked. Potentially
      this could cause a device state change before the corresponding receive
      mode configuration call completes, so the call itself becomes meaningless,
      corrupts data or breaks the configuration sequence.
      
      We don't need any delays in the NIC VF configuration sequence, so having a
      delayed work call with 0 delay makes no sense.
      
      This commit implements one work queue per NIC VF for the set_rx_mode
      task so that they work independently, and replaces delayed_work
      with work_struct.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: thunderx: correct typo in macro name · f6d25aca
      Vadim Lomovtsev committed
      Correct STREERING to STEERING at macro name for BGX steering register.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version · efcc9bca
      Lorenzo Bianconi committed
      Fix a possible NULL pointer dereference in ip6erspan_set_version() by
      checking the nlattr data pointer.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7549 Comm: syz-executor432 Not tainted 5.0.0-rc6-next-20190218
      #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6erspan_newlink+0x66/0x7b0 net/ipv6/ip6_gre.c:2210
        __rtnl_newlink+0x107b/0x16c0 net/core/rtnetlink.c:3176
        rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3234
        rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192
        netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485
        rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg+0xdd/0x130 net/socket.c:631
        ___sys_sendmsg+0x806/0x930 net/socket.c:2136
        __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
        __do_sys_sendmsg net/socket.c:2183 [inline]
        __se_sys_sendmsg net/socket.c:2181 [inline]
        __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440159
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffa69156e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
      RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000011 R11: 0000000000000246 R12: 00000000004019e0
      R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace 09f8a7d13b4faaa1 ]---
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 4974d5f6 ("net: ip6_gre: initialize erspan_ver just for erspan tunnels")
      Reported-and-tested-by: syzbot+30191cf1057abd3064af@syzkaller.appspotmail.com
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      team: use operstate consistently for linkup · 8c7a7726
      George Wilkie committed
      When a port is added to a team, its initial state is derived
      from netif_carrier_ok rather than netif_oper_up.
      If it is carrier up but operationally down at the time of being
      added, the port state.linkup will be set prematurely.
      port state.linkup should be set consistently using
      netif_oper_up rather than netif_carrier_ok.
      
      Fixes: f1d22a1e ("team: account for oper state")
      Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      r8152: Fix an error on RTL8153-BD MAC Address Passthrough support · c286909f
      David Chen committed
      RTL8153-BD is used in Dell DA300 type-C dongle.
      Added RTL8153-BD support to activate MAC address pass through on DA300.
      Apply correction on previously submitted patch in net.git tree.
      Signed-off-by: David Chen <david.chen7@dell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      ipvlan: disallow userns cap_net_admin to change global mode/flags · 7cc9f700
      Daniel Borkmann committed
      When running Docker with userns isolation e.g. --userns-remap="default"
      and spawning up some containers with CAP_NET_ADMIN under this realm, I
      noticed that link changes on ipvlan slave device inside that container
      can affect all devices from this ipvlan group which are in other net
      namespaces where the container should have no permission to make changes
      to, such as the init netns, for example.
      
      This effectively allows undoing ipvlan private mode and switching globally
      to bridge mode, where slaves can communicate directly without going through
      hostns, or switching the global operation mode (l2/l3/l3s) for everyone
      bound to the given ipvlan master device. The libnetwork plugin here creates
      an ipvlan master and an ipvlan slave in hostns, plus one slave per container
      that is moved into the container's netns upon the creation event.
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
           link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
           ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
           inet 10.41.0.1/32 scope link cilium_host
             valid_lft forever preferred_lft forever
        [...]
      
      * Spawn container & change ipvlan mode setting inside of it:
      
        # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
        9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
        # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns (mode switched to l2):
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      The same l3 -> l2 switch also happens when another slave is created inside
      the container's network namespace, specifying the existing cilium0
      link to derive the actual (bond0) master:
      
        # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      One way to mitigate this is to check CAP_NET_ADMIN permissions in
      the ipvlan master device's ns, and only then allow changing
      mode or flags for all devices bound to it. The two cases above are
      then disallowed after the patch.
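
The mitigation can be sketched as a toy user-space model. Everything here is hypothetical and simplified: `struct userns_model`, `model_ns_capable()` and `model_change_mode()` are illustrative names, and the capability rule is reduced to "a capability held in a namespace applies to that namespace and its descendants". It is not the ipvlan driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: only the parent link matters. */
struct userns_model {
    const struct userns_model *parent; /* NULL for the initial namespace */
};

/*
 * Simplified stand-in for ns_capable(target, CAP_NET_ADMIN): a capability
 * held in cap_ns applies to cap_ns itself and to its descendants, never to
 * an ancestor or an unrelated namespace.
 */
bool model_ns_capable(const struct userns_model *cap_ns,
                      const struct userns_model *target)
{
    for (const struct userns_model *ns = target; ns != NULL; ns = ns->parent) {
        if (ns == cap_ns)
            return true;
    }
    return false;
}

/*
 * Mode and flags are shared by every slave on the master, so the
 * permission check is made against the master device's namespace.
 * Returns 0 on success, -1 to model -EPERM.
 */
int model_change_mode(const struct userns_model *caller_cap_ns,
                      const struct userns_model *master_ns,
                      int *shared_mode, int new_mode)
{
    if (!model_ns_capable(caller_cap_ns, master_ns))
        return -1;
    *shared_mode = new_mode;
    return 0;
}
```

In this model a container whose CAP_NET_ADMIN lives in a child user namespace is refused when the master sits in the initial namespace, while a caller holding the capability in the initial namespace is still allowed.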
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7cc9f700
    • M
      sctp: don't compare hb_timer expire date before starting it · d1f20c03
      Maciej Kwiecien committed
      hb_timer might not start at all for a particular transport because its
      start is conditional. As a result, the node does not send heartbeats.
      
      Function sctp_transport_reset_hb_timer has two roles:
          - initial start of hb_timer for a given transport,
          - update expire date of hb_timer for a given transport.
      The function is optimized to update the timer's expire value only if it
      is before the newly calculated one, but this comparison is invalid for a
      timer which has not yet started. Such a timer has expire == 0, and if the
      new expire value is (MAX_JIFFIES / 2 + 2) or more, the "time_before" macro
      evaluates to false and the timer is not started, so the node sends no
      heartbeat packets.
      
      This was found when an association was initialized within the first 5
      minutes after system boot, because the initial jiffies value is close to
      MAX_JIFFIES.
      
      Test kernel version: 4.9.154 (ARCH=arm)
      hb_timer.expire = 0;                //initialized, not started timer
      new_expire = MAX_JIFFIES / 2 + 2;   //or more
      time_before(hb_timer.expire, new_expire) == false
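
The wraparound can be reproduced with a 32-bit user-space model of the comparison. This is a sketch under assumptions: `model_time_before()` imitates the kernel's time_before() for 32-bit jiffies, `struct timer_model` and both `rearm_*` helpers are hypothetical, and `rearm_checked()` shows one plausible shape of the fix (always start a timer that is not pending) rather than the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 32-bit model of the kernel's time_before(a, b): true when a is earlier
 * than b, judged by the sign of the wrapped difference. The unsigned to
 * signed conversion wraps on GCC and Clang, which the kernel relies on.
 */
bool model_time_before(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b) < 0;
}

/* Hypothetical timer state: expires stays 0 until the timer is started. */
struct timer_model {
    bool pending;
    uint32_t expires;
};

/* Old logic: compares expires even for a timer that was never started. */
bool rearm_compare_only(const struct timer_model *t, uint32_t new_expires)
{
    return model_time_before(t->expires, new_expires);
}

/* Assumed fix: a timer that is not pending is always (re)started. */
bool rearm_checked(const struct timer_model *t, uint32_t new_expires)
{
    return !t->pending || model_time_before(t->expires, new_expires);
}
```

With expires == 0 and a new value of UINT32_MAX / 2 + 2, the wrapped difference is positive, so the compare-only variant declines to start the timer while the checked variant starts it.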
      
      Fixes: ba6f5e33 ("sctp: avoid refreshing heartbeat timer too often")
      Reported-by: Marcin Stojek <marcin.stojek@nokia.com>
      Tested-by: Marcin Stojek <marcin.stojek@nokia.com>
      Signed-off-by: Maciej Kwiecien <maciej.kwiecien@nokia.com>
      Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1f20c03
  3. 22 Feb, 2019 2 commits
    • A
      bpf, lpm: fix lookup bug in map_delete_elem · 7c0cdf0b
      Alban Crequy committed
      trie_delete_elem() deleted an entry whenever the prefixlen was equal, even
      if the key itself did not match. This patch adds a check on matchlen.
      
      Reproducer:
      
      $ sudo bpftool map create /sys/fs/bpf/mylpm type lpm_trie key 8 value 1 entries 128 name mylpm flags 1
      $ sudo bpftool map update pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 aa bb cc dd value hex 01
      $ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
      key: 10 00 00 00 aa bb cc dd  value: 01
      Found 1 element
      $ sudo bpftool map delete pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 ff ff ff ff
      $ echo $?
      0
      $ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
      Found 0 elements
      
      A similar reproducer is added in the selftests.
      
      Without the patch:
      
      $ sudo ./tools/testing/selftests/bpf/test_lpm_map
      test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed.
      Aborted
      
      With the patch: test_lpm_map runs without errors.
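
The missing check can be sketched with a single-node user-space model. This is a simplification under assumptions, not the kernel's trie code: `struct lpm_key_model`, `model_matchlen()` and the two `delete_matches_*` helpers are hypothetical names, and the keys correspond to the reproducer above with the first four bytes read as prefix length 16 on a little-endian host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_LEN 4 /* key data bytes, as in the 8-byte reproducer key */

/* Same layout idea as struct bpf_lpm_trie_key: prefix length, then data. */
struct lpm_key_model {
    uint32_t prefixlen;
    uint8_t data[DATA_LEN];
};

/* Number of leading bits shared by a and b, capped at both prefix lengths. */
uint32_t model_matchlen(const struct lpm_key_model *a,
                        const struct lpm_key_model *b)
{
    uint32_t limit = a->prefixlen < b->prefixlen ? a->prefixlen : b->prefixlen;
    uint32_t len = 0;

    for (size_t i = 0; i < DATA_LEN && len < limit; i++) {
        uint8_t diff = a->data[i] ^ b->data[i];
        for (int bit = 7; bit >= 0 && len < limit; bit--) {
            if (diff & (1u << bit))
                return len; /* first differing bit ends the match */
            len++;
        }
    }
    return len;
}

/* Buggy check: equal prefix lengths alone select the entry for deletion. */
bool delete_matches_buggy(const struct lpm_key_model *node,
                          const struct lpm_key_model *key)
{
    return node->prefixlen == key->prefixlen;
}

/* Fixed check: the key must also match the node over the whole prefix. */
bool delete_matches_fixed(const struct lpm_key_model *node,
                          const struct lpm_key_model *key)
{
    return node->prefixlen == key->prefixlen &&
           model_matchlen(node, key) == node->prefixlen;
}
```

For the stored key aa bb cc dd and the delete key ff ff ff ff, both with prefix length 16, only the first bit agrees, so the fixed check rejects the delete while the buggy check accepts it.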
      
      Fixes: e454cf59 ("bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE")
      Cc: Craig Gallek <kraig@google.com>
      Signed-off-by: Alban Crequy <alban@kinvolk.io>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7c0cdf0b
    • F
      mac80211: allocate tailroom for forwarded mesh packets · 51d0af22
      Felix Fietkau committed
      Forwarded packets enter the tx path through ieee80211_add_pending_skb,
      which skips the ieee80211_skb_resize call.
      Fixes WARN_ON in ccmp_encrypt_skb and resulting packet loss.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      51d0af22