1. 04 9月, 2018 10 次提交
    • M
      bnxt_en: Clean up unused functions. · ad95c27b
      Michael Chan 提交于
      Remove unused bnxt_subtract_ulp_resources().  Change
      bnxt_get_max_func_irqs() to static since it is only locally used.
      Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad95c27b
    • M
      bnxt_en: Fix firmware signaled resource change logic in open. · 6b95c3e9
      Michael Chan 提交于
      When the driver detects that resources have changed during open, it
      should reset the rx and tx rings to 0.  This will properly setup the
      init sequence to initialize the default rings again.  We also need
      to signal the RDMA driver to stop and clear its interrupts.  We then
      call the RoCE driver to restart if a new set of default rings is
      successfully reserved.
      
      Fixes: 25e1acd6 ("bnxt_en: Notify firmware about IF state changes.")
      Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b95c3e9
    • D
      Merge branch 'sctp-two-fixes-for-spp_ipv6_flowlabel-and-spp_dscp-sockopts' · 6570aa1d
      David S. Miller 提交于
      Xin Long says:
      
      ====================
      sctp: two fixes for spp_ipv6_flowlabel and spp_dscp sockopts
      
      This patchset fixes two problems in sctp_apply_peer_addr_params()
      when setting spp_ipv6_flowlabel or spp_dscp.
      ====================
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6570aa1d
    • X
      sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel · 741880e1
      Xin Long 提交于
      When users set params.spp_address and get a trans, ipv6_flowlabel flag
      should be applied into this trans. But even if this one is not an ipv6
      trans, it should not go to apply it into all other transes of the asoc
      but simply ignore it.
      
      Fixes: 0b0dce7a ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      741880e1
    • X
      sctp: fix invalid reference to the index variable of the iterator · af8a2b8b
      Xin Long 提交于
      Now in sctp_apply_peer_addr_params(), if SPP_IPV6_FLOWLABEL flag is set
      and trans is NULL, it would use trans as the index variable to traverse
      transport_addr_list, then trans is set as the last transport of it.
      
      Later, if SPP_DSCP flag is set, it would enter into the wrong branch as
      trans is actually an invalid reference.
      
      So fix it by using a new index variable to traverse transport_addr_list
      for both SPP_DSCP and SPP_IPV6_FLOWLABEL flags process.
      
      Fixes: 0b0dce7a ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
      Reported-by: NJulia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af8a2b8b
    • I
      net/ibm/emac: wrong emac_calc_base call was used by typo · bf68066f
      Ivan Mikhaylov 提交于
      __emac_calc_base_mr1 was used instead of __emac4_calc_base_mr1
      by copy-paste mistake for emac4syn.
      
      Fixes: 45d6e545 ("net/ibm/emac: add 8192 rx/tx fifo size")
      Signed-off-by: NIvan Mikhaylov <ivan@de.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf68066f
    • V
      net: sched: null actions array pointer before releasing action · c10bbfae
      Vlad Buslov 提交于
      Currently, tcf_action_delete() nulls actions array pointer after putting
      and deleting it. However, if tcf_idr_delete_index() returns an error,
      pointer to action is not set to null. That results it being released second
      time in error handling code of tca_action_gd().
      
      Kasan error:
      
      [  807.367755] ==================================================================
      [  807.375844] BUG: KASAN: use-after-free in tc_setup_cb_call+0x14e/0x250
      [  807.382763] Read of size 8 at addr ffff88033e636000 by task tc/2732
      
      [  807.391289] CPU: 0 PID: 2732 Comm: tc Tainted: G        W         4.19.0-rc1+ #799
      [  807.399542] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  807.407948] Call Trace:
      [  807.410763]  dump_stack+0x92/0xeb
      [  807.414456]  print_address_description+0x70/0x360
      [  807.419549]  kasan_report+0x14d/0x300
      [  807.423582]  ? tc_setup_cb_call+0x14e/0x250
      [  807.428150]  tc_setup_cb_call+0x14e/0x250
      [  807.432539]  ? nla_put+0x65/0xe0
      [  807.436146]  fl_dump+0x394/0x3f0 [cls_flower]
      [  807.440890]  ? fl_tmplt_dump+0x140/0x140 [cls_flower]
      [  807.446327]  ? lock_downgrade+0x320/0x320
      [  807.450702]  ? lock_acquire+0xe2/0x220
      [  807.454819]  ? is_bpf_text_address+0x5/0x140
      [  807.459475]  ? memcpy+0x34/0x50
      [  807.462980]  ? nla_put+0x65/0xe0
      [  807.466582]  tcf_fill_node+0x341/0x430
      [  807.470717]  ? tcf_block_put+0xe0/0xe0
      [  807.474859]  tcf_node_dump+0xdb/0xf0
      [  807.478821]  fl_walk+0x8e/0x170 [cls_flower]
      [  807.483474]  tcf_chain_dump+0x35a/0x4d0
      [  807.487703]  ? tfilter_notify+0x170/0x170
      [  807.492091]  ? tcf_fill_node+0x430/0x430
      [  807.496411]  tc_dump_tfilter+0x362/0x3f0
      [  807.500712]  ? tc_del_tfilter+0x850/0x850
      [  807.505104]  ? kasan_unpoison_shadow+0x30/0x40
      [  807.509940]  ? __mutex_unlock_slowpath+0xcf/0x410
      [  807.515031]  netlink_dump+0x263/0x4f0
      [  807.519077]  __netlink_dump_start+0x2a0/0x300
      [  807.523817]  ? tc_del_tfilter+0x850/0x850
      [  807.528198]  rtnetlink_rcv_msg+0x46a/0x6d0
      [  807.532671]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  807.536878]  ? tc_del_tfilter+0x850/0x850
      [  807.541280]  netlink_rcv_skb+0x18d/0x200
      [  807.545570]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  807.549773]  ? netlink_ack+0x500/0x500
      [  807.553913]  netlink_unicast+0x2d0/0x370
      [  807.558212]  ? netlink_attachskb+0x340/0x340
      [  807.562855]  ? _copy_from_iter_full+0xe9/0x3e0
      [  807.567677]  ? import_iovec+0x11e/0x1c0
      [  807.571890]  netlink_sendmsg+0x3b9/0x6a0
      [  807.576192]  ? netlink_unicast+0x370/0x370
      [  807.580684]  ? netlink_unicast+0x370/0x370
      [  807.585154]  sock_sendmsg+0x6b/0x80
      [  807.589015]  ___sys_sendmsg+0x4a1/0x520
      [  807.593230]  ? copy_msghdr_from_user+0x210/0x210
      [  807.598232]  ? do_wp_page+0x174/0x880
      [  807.602276]  ? __handle_mm_fault+0x749/0x1c10
      [  807.607021]  ? __handle_mm_fault+0x1046/0x1c10
      [  807.611849]  ? __pmd_alloc+0x320/0x320
      [  807.615973]  ? check_chain_key+0x140/0x1f0
      [  807.620450]  ? check_chain_key+0x140/0x1f0
      [  807.624929]  ? __fget_light+0xbc/0xd0
      [  807.628970]  ? __sys_sendmsg+0xd7/0x150
      [  807.633172]  __sys_sendmsg+0xd7/0x150
      [  807.637201]  ? __ia32_sys_shutdown+0x30/0x30
      [  807.641846]  ? up_read+0x53/0x90
      [  807.645442]  ? __do_page_fault+0x484/0x780
      [  807.649949]  ? do_syscall_64+0x1e/0x2c0
      [  807.654164]  do_syscall_64+0x72/0x2c0
      [  807.658198]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  807.663625] RIP: 0033:0x7f42e9870150
      [  807.667568] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
      [  807.687328] RSP: 002b:00007ffdbf595b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  807.695564] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f42e9870150
      [  807.703083] RDX: 0000000000000000 RSI: 00007ffdbf595b80 RDI: 0000000000000003
      [  807.710605] RBP: 00007ffdbf599d90 R08: 0000000000679bc0 R09: 000000000000000f
      [  807.718127] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdbf599d88
      [  807.725651] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      [  807.735048] Allocated by task 2687:
      [  807.738902]  kasan_kmalloc+0xa0/0xd0
      [  807.742852]  __kmalloc+0x118/0x2d0
      [  807.746615]  tcf_idr_create+0x44/0x320
      [  807.750738]  tcf_nat_init+0x41e/0x530 [act_nat]
      [  807.755638]  tcf_action_init_1+0x4e0/0x650
      [  807.760104]  tcf_action_init+0x1ce/0x2d0
      [  807.764395]  tcf_exts_validate+0x1d8/0x200
      [  807.768861]  fl_change+0x55a/0x26b4 [cls_flower]
      [  807.773845]  tc_new_tfilter+0x748/0xa20
      [  807.778051]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  807.782517]  netlink_rcv_skb+0x18d/0x200
      [  807.786804]  netlink_unicast+0x2d0/0x370
      [  807.791095]  netlink_sendmsg+0x3b9/0x6a0
      [  807.795387]  sock_sendmsg+0x6b/0x80
      [  807.799240]  ___sys_sendmsg+0x4a1/0x520
      [  807.803445]  __sys_sendmsg+0xd7/0x150
      [  807.807473]  do_syscall_64+0x72/0x2c0
      [  807.811506]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  807.818776] Freed by task 2728:
      [  807.822283]  __kasan_slab_free+0x122/0x180
      [  807.826752]  kfree+0xf4/0x2f0
      [  807.830080]  __tcf_action_put+0x5a/0xb0
      [  807.834281]  tcf_action_put_many+0x46/0x70
      [  807.838747]  tca_action_gd+0x232/0xc40
      [  807.842862]  tc_ctl_action+0x215/0x230
      [  807.846977]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  807.851444]  netlink_rcv_skb+0x18d/0x200
      [  807.855731]  netlink_unicast+0x2d0/0x370
      [  807.860021]  netlink_sendmsg+0x3b9/0x6a0
      [  807.864312]  sock_sendmsg+0x6b/0x80
      [  807.868166]  ___sys_sendmsg+0x4a1/0x520
      [  807.872372]  __sys_sendmsg+0xd7/0x150
      [  807.876401]  do_syscall_64+0x72/0x2c0
      [  807.880431]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  807.887704] The buggy address belongs to the object at ffff88033e636000
                      which belongs to the cache kmalloc-256 of size 256
      [  807.900909] The buggy address is located 0 bytes inside of
                      256-byte region [ffff88033e636000, ffff88033e636100)
      [  807.913155] The buggy address belongs to the page:
      [  807.918322] page:ffffea000cf98d80 count:1 mapcount:0 mapping:ffff88036f80ee00 index:0x0 compound_mapcount: 0
      [  807.928831] flags: 0x5fff8000008100(slab|head)
      [  807.933647] raw: 005fff8000008100 ffffea000db44f00 0000000400000004 ffff88036f80ee00
      [  807.942050] raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
      [  807.950456] page dumped because: kasan: bad access detected
      
      [  807.958240] Memory state around the buggy address:
      [  807.963405]  ffff88033e635f00: fc fc fc fc fb fb fb fb fb fb fb fc fc fc fc fb
      [  807.971288]  ffff88033e635f80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      [  807.979166] >ffff88033e636000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  807.994882]                    ^
      [  807.998477]  ffff88033e636080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  808.006352]  ffff88033e636100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      [  808.014230] ==================================================================
      [  808.022108] Disabling lock debugging due to kernel taint
      
      Fixes: edfaf94f ("net_sched: improve and refactor tcf_action_put_many()")
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c10bbfae
    • G
      vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition · c48300c9
      Gleb Fotengauer-Malinovskiy 提交于
      The _IOC_READ flag fits this ioctl request more because this request
      actually only writes to, but doesn't read from userspace.
      See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
      
      Fixes: 429711ae ("vhost: switch to use new message format")
      Signed-off-by: NGleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c48300c9
    • A
      r8169: add support for NCube 8168 network card · 9fd0e09a
      Anthony Wong 提交于
      This card identifies itself as:
        Ethernet controller [0200]: NCube Device [10ff:8168] (rev 06)
        Subsystem: TP-LINK Technologies Co., Ltd. Device [7470:3468]
      
      Adding a new entry to rtl8169_pci_tbl makes the card work.
      
      Link: http://launchpad.net/bugs/1788730Signed-off-by: NAnthony Wong <anthony.wong@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fd0e09a
    • H
      ip6_tunnel: respect ttl inherit for ip6tnl · 36feaac3
      Hangbin Liu 提交于
      man ip-tunnel ttl section says:
      0 is a special value meaning that packets inherit the TTL value.
      
      IPv4 tunnel respect this in ip_tunnel_xmit(), but IPv6 tunnel has not
      implement it yet. To make IPv6 behave consistently with IP tunnel,
      add ipv6 tunnel inherit support.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36feaac3
  2. 03 9月, 2018 12 次提交
    • V
      uapi: Fix linux/rds.h userspace compilation errors. · 59a03fea
      Vinson Lee 提交于
      Include linux/in6.h for struct in6_addr.
      
      /usr/include/linux/rds.h:156:18: error: field ‘laddr’ has incomplete type
        struct in6_addr laddr;
                        ^~~~~
      /usr/include/linux/rds.h:157:18: error: field ‘faddr’ has incomplete type
        struct in6_addr faddr;
                        ^~~~~
      /usr/include/linux/rds.h:178:18: error: field ‘laddr’ has incomplete type
        struct in6_addr laddr;
                        ^~~~~
      /usr/include/linux/rds.h:179:18: error: field ‘faddr’ has incomplete type
        struct in6_addr faddr;
                        ^~~~~
      /usr/include/linux/rds.h:198:18: error: field ‘bound_addr’ has incomplete type
        struct in6_addr bound_addr;
                        ^~~~~~~~~~
      /usr/include/linux/rds.h:199:18: error: field ‘connected_addr’ has incomplete type
        struct in6_addr connected_addr;
                        ^~~~~~~~~~~~~~
      /usr/include/linux/rds.h:219:18: error: field ‘local_addr’ has incomplete type
        struct in6_addr local_addr;
                        ^~~~~~~~~~
      /usr/include/linux/rds.h:221:18: error: field ‘peer_addr’ has incomplete type
        struct in6_addr peer_addr;
                        ^~~~~~~~~
      /usr/include/linux/rds.h:245:18: error: field ‘src_addr’ has incomplete type
        struct in6_addr src_addr;
                        ^~~~~~~~
      /usr/include/linux/rds.h:246:18: error: field ‘dst_addr’ has incomplete type
        struct in6_addr dst_addr;
                        ^~~~~~~~
      
      Fixes: b7ff8b10 ("rds: Extend RDS API for IPv6 support")
      Signed-off-by: NVinson Lee <vlee@freedesktop.org>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59a03fea
    • J
      net: cadence: Fix a sleep-in-atomic-context bug in macb_halt_tx() · 16fe10cf
      Jia-Ju Bai 提交于
      The kernel module may sleep with holding a spinlock.
      
      The function call paths (from bottom to top) in Linux-4.16 are:
      
      [FUNC] usleep_range
      drivers/net/ethernet/cadence/macb_main.c, 648:
      	usleep_range in macb_halt_tx
      drivers/net/ethernet/cadence/macb_main.c, 730:
      	macb_halt_tx in macb_tx_error_task
      drivers/net/ethernet/cadence/macb_main.c, 721:
      	_raw_spin_lock_irqsave in macb_tx_error_task
      
      To fix this bug, usleep_range() is replaced with udelay().
      
      This bug is found by my static analysis tool DSAC.
      Signed-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16fe10cf
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a80afe89
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-09-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix one remaining buggy offset override in sockmap's bpf_msg_pull_data()
         when linearizing multiple scatterlist elements, from Tushar.
      
      2) Fix BPF sockmap's misuse of ULP when a collision with another ULP is
         found on map update where it would release existing ULP. syzbot found and
         triggered this couple of times now, fix from John.
      
      3) Add missing xskmap type to bpftool so it will properly show the type
         on map dump, from Prashant.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a80afe89
    • D
      net/ipv6: Only update MTU metric if it set · 15a81b41
      David Ahern 提交于
      Jan reported a regression after an update to 4.18.5. In this case ipv6
      default route is setup by systemd-networkd based on data from an RA. The
      RA contains an MTU of 1492 which is used when the route is first inserted
      but then systemd-networkd pushes down updates to the default route
      without the mtu set.
      
      Prior to the change to fib6_info, metrics such as MTU were held in the
      dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
      any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
      the value from the metrics is used if it is set and finally falling
      back to the idev value.
      
      After the fib6_info change metrics are contained in the fib6_info struct
      and there is no equivalent to rt6i_pmtu. To maintain consistency with
      the old behavior the new code should only reset the MTU in the metrics
      if the route update has it set.
      
      Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info")
      Reported-by: NJan Janssen <medhefgo@web.de>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15a81b41
    • T
      net: ethernet: cpsw-phy-sel: prefer phandle for phy sel · 18eb8aea
      Tony Lindgren 提交于
      The cpsw-phy-sel device is not a child of the cpsw interconnect target
      module. It lives in the system control module.
      
      Let's fix this issue by trying to use cpsw-phy-sel phandle first if it
      exists and if not fall back to current usage of trying to find the
      cpsw-phy-sel child. That way the phy sel driver can be a child of the
      system control module where it belongs in the device tree.
      
      Without this fix, we cannot have a proper interconnect target module
      hierarchy in device tree for things like genpd.
      
      Note that deferred probe is mostly not supported by cpsw and this patch
      does not attempt to fix that. In case deferred probe support is needed,
      this could be added to cpsw_slave_open() and phy_connect() so they start
      handling and returning errors.
      
      For documenting it, looks like the cpsw-phy-sel is used for all cpsw device
      tree nodes. It's missing the related binding documentation, so let's also
      update the binding documentation accordingly.
      
      Cc: devicetree@vger.kernel.org
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18eb8aea
    • T
      dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle · 10d7fac4
      Tony Lindgren 提交于
      The current cpsw usage for cpsw-phy-sel is undocumented but is used for
      all the boards using cpsw. And cpsw-phy-sel is not really a child of
      the cpsw device, it lives in the system control module instead.
      
      Let's document the existing usage, and improve it a bit where we prefer
      to use a phandle instead of a child device for it. That way we can
      properly describe the hardware in dts files for things like genpd.
      
      Cc: devicetree@vger.kernel.org
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10d7fac4
    • D
      Merge branch 'igmp-fix-two-incorrect-unsolicit-report-count-issues' · c60e06c3
      David S. Miller 提交于
      Hangbin Liu says:
      
      ====================
      igmp: fix two incorrect unsolicit report count issues
      
      Just like the subject, fix two minor igmp unsolicit report count issues.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c60e06c3
    • H
      igmp: fix incorrect unsolicit report count after link down and up · ff06525f
      Hangbin Liu 提交于
      After link down and up, i.e. when call ip_mc_up(), we doesn't init
      im->unsolicit_count. So after igmp_timer_expire(), we will not start
      timer again and only send one unsolicit report at last.
      
      Fix it by initializing im->unsolicit_count in igmp_group_added(), so
      we can respect igmp robustness value.
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info when set link down")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff06525f
    • H
      igmp: fix incorrect unsolicit report count when join group · 4fb7253e
      Hangbin Liu 提交于
      We should not start timer if im->unsolicit_count equal to 0 after decrease.
      Or we will send one more unsolicit report message. i.e. 3 instead of 2 by
      default.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fb7253e
    • J
      bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP · 597222f7
      John Fastabend 提交于
      Currently we check sk_user_data is non NULL to determine if the sk
      exists in a map. However, this is not sufficient to ensure the psock
      or the ULP ops are not in use by another user, such as kcm or TLS. To
      avoid this when adding a sock to a map also verify it is of the
      correct ULP type. Additionally, when releasing a psock verify that
      it is the TCP_ULP_BPF type before releasing the ULP. The error case
      where we abort an update due to ULP collision can cause this error
      path.
      
      For example,
      
        __sock_map_ctx_update_elem()
           [...]
           err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS
           if (err)                                <- so err out here
              goto out_free
           [...]
        out_free:
           smap_release_sock() <- calling tcp_cleanup_ulp releases the
                                  TLS ULP incorrectly.
      
      Fixes: 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      597222f7
    • P
      tools/bpf: bpftool, add xskmap in map types · 97911e0c
      Prashant Bhole 提交于
      When listed all maps, bpftool currently shows (null) for xskmap.
      Added xskmap type in map_type_name[] to show correct type.
      Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      97911e0c
    • T
      bpf: Fix bpf_msg_pull_data() · 9db39f4d
      Tushar Dave 提交于
      Helper bpf_msg_pull_data() mistakenly reuses variable 'offset' while
      linearizing multiple scatterlist elements. Variable 'offset' is used
      to find first starting scatterlist element
          i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"
      
      Use different variable name while linearizing multiple scatterlist
      elements so that value contained in variable 'offset' won't get
      overwritten.
      
      Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
      Signed-off-by: NTushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9db39f4d
  3. 02 9月, 2018 1 次提交
    • A
      ipv6: don't get lwtstate twice in ip6_rt_copy_init() · 93bbadd6
      Alexey Kodanev 提交于
      Commit 80f1a0f4 ("net/ipv6: Put lwtstate when destroying fib6_info")
      partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
      with ip6_rt_copy_init(), and it should be done only once there.
      
      rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
      ip6_rt_copy_init(), so there is no need to get it again at the end.
      
      With this patch, lwtstate also isn't copied from RTF_REJECT routes.
      
      [1]:
      unreferenced object 0xffff880b6aaa14e0 (size 64):
        comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
        hex dump (first 32 bytes):
          01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
          [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
          [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
          [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
          [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
          [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
          [<000000004c893c76>] netlink_unicast+0x417/0x5a0
          [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
          [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
          [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
          [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
          [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
          [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000006d21f353>] 0xffffffffffffffff
      
      Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93bbadd6
  4. 01 9月, 2018 8 次提交
    • T
      ibmvnic: Include missing return code checks in reset function · f611a5b4
      Thomas Falcon 提交于
      Check the return codes of these functions and halt reset
      in case of failure. The driver will remain in a dormant state
      until the next reset event, when device initialization will be
      re-attempted.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f611a5b4
    • S
      selftests: pmtu: detect correct binary to ping ipv6 addresses · c81c7012
      Sabrina Dubroca 提交于
      Some systems don't have the ping6 binary anymore, and use ping for
      everything. Detect the absence of ping6 and try to use ping instead.
      
      Fixes: d1f1b9cb ("selftests: net: Introduce first PMTU test")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c81c7012
    • S
      selftests: pmtu: maximum MTU for vti4 is 2^16-1-20 · 902b5417
      Sabrina Dubroca 提交于
      Since commit 82612de1 ("ip_tunnel: restore binding to ifaces with a
      large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of
      the mysterious constant 0xFFF8.  This makes this selftest fail.
      
      Fixes: 82612de1 ("ip_tunnel: restore binding to ifaces with a large mtu")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NStefano Brivio <sbrivio@redhat.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      902b5417
    • F
      tcp: do not restart timewait timer on rst reception · 63cc357f
      Florian Westphal 提交于
      RFC 1337 says:
       ''Ignore RST segments in TIME-WAIT state.
         If the 2 minute MSL is enforced, this fix avoids all three hazards.''
      
      So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
      expire rather than removing it instantly when a reset is received.
      
      However, Linux will also re-start the TIME-WAIT timer.
      
      This causes connect to fail when tying to re-use ports or very long
      delays (until syn retry interval exceeds MSL).
      
      packetdrill test case:
      // Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
      `sysctl net.ipv4.tcp_rfc1337=1`
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive first segment
      0.310 < P. 1:1001(1000) ack 1 win 46
      
      // Send one ACK
      0.310 > . 1:1(0) ack 1001
      
      // read 1000 byte
      0.310 read(4, ..., 1000) = 1000
      
      // Application writes 100 bytes
      0.350 write(4, ..., 100) = 100
      0.350 > P. 1:101(100) ack 1001
      
      // ACK
      0.500 < . 1001:1001(0) ack 101 win 257
      
      // close the connection
      0.600 close(4) = 0
      0.600 > F. 101:101(0) ack 1001 win 244
      
      // Our side is in FIN_WAIT_1 & waits for ack to fin
      0.7 < . 1001:1001(0) ack 102 win 244
      
      // Our side is in FIN_WAIT_2 with no outstanding data.
      0.8 < F. 1001:1001(0) ack 102 win 244
      0.8 > . 102:102(0) ack 1002 win 244
      
      // Our side is now in TIME_WAIT state, send ack for fin.
      0.9 < F. 1002:1002(0) ack 102 win 244
      0.9 > . 102:102(0) ack 1002 win 244
      
      // Peer reopens with in-window SYN:
      1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      
      // Therefore, reply with ACK.
      1.000 > . 102:102(0) ack 1002 win 244
      
      // Peer sends RST for this ACK.  Normally this RST results
      // in tw socket removal, but rfc1337=1 setting prevents this.
      1.100 < R 1002:1002(0) win 244
      
      // second syn. Due to rfc1337=1 expect another pure ACK.
      31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      31.0 > . 102:102(0) ack 1002 win 244
      
      // .. and another RST from peer.
      31.1 < R 1002:1002(0) win 244
      31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`
      
      // third syn after one minute.  Time-Wait socket should have expired by now.
      63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      
      // so we expect a syn-ack & 3whs to proceed from here on.
      63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      
      Without this patch, 'ss' shows restarts of tw timer and last packet is
      thus just another pure ack, more than one minute later.
      
      This restores the original code from commit 283fd6cf0be690a83
      ("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .
      
      For some reason the else branch was removed/lost in 1f28b683339f7
      ("Merge in TCP/UDP optimizations and [..]") and timer restart became
      unconditional.
      Reported-by: NMichal Tesar <mtesar@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63cc357f
    • P
      net/rds: RDS is not Radio Data System · b0e0b0ab
      Pavel Machek 提交于
      Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is
      easily confused with Radio Data System (which we may want to support
      in kernel, too).
      Signed-off-by: NPavel Machek <pavel@ucw.cz>
      Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0e0b0ab
    • D
      hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe() · e04e7a7b
      Dexuan Cui 提交于
      This patch fixes the race between netvsc_probe() and
      rndis_set_subchannel(), which can cause a deadlock.
      
      These are the related 3 paths which show the deadlock:
      
      path #1:
          Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
          Call Trace:
           schedule
           schedule_preempt_disabled
           __mutex_lock
           __device_attach
           bus_probe_device
           device_add
           vmbus_device_register
           vmbus_onoffer
           vmbus_onmessage_work
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      
      path #2:
          schedule
           schedule_preempt_disabled
           __mutex_lock
           netvsc_probe
           vmbus_probe
           really_probe
           __driver_attach
           bus_for_each_dev
           driver_attach_async
           async_run_entry_fn
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      
      path #3:
          Workqueue: events netvsc_subchan_work [hv_netvsc]
          Call Trace:
           schedule
           rndis_set_subchannel
           netvsc_subchan_work
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      
      Before path #1 finishes, path #2 can start to run, because just before
      the "bus_probe_device(dev);" in device_add() in path #1, there is a line
      "object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
      immediately try to load hv_netvsc and hence path #2 can start to run.
      
      Next, path #2 offloads the subchannal's initialization to a workqueue,
      i.e. path #3, so we can end up in a deadlock situation like this:
      
      Path #2 gets the device lock, and is trying to get the rtnl lock;
      Path #3 gets the rtnl lock and is waiting for all the subchannel messages
      to be processed;
      Path #1 is trying to get the device lock, but since #2 is not releasing
      the device lock, path #1 has to sleep; since the VMBus messages are
      processed one by one, this means the sub-channel messages can't be
      procedded, so #3 has to sleep with the rtnl lock held, and finally #2
      has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.
      
      With the patch, we can make sure #2 gets both the device lock and the
      rtnl lock together, gets its job done, and releases the locks, so #1
      and #3 will not be blocked for ever.
      
      Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e04e7a7b
    • J
      nfp: wait for posted reconfigs when disabling the device · 9ad716b9
      Jakub Kicinski 提交于
      To avoid leaking a running timer we need to wait for the
      posted reconfigs after netdev is unregistered.  In common
      case the process of deinitializing the device will perform
      synchronous reconfigs which wait for posted requests, but
      especially with VXLAN ports being actively added and removed
      there can be a race condition leaving a timer running after
      adapter structure is freed leading to a crash.
      
      Add an explicit flush after deregistering and for a good
      measure a warning to check if timer is running just before
      structures are freed.
      
      Fixes: 3d780b92 ("nfp: add async reconfiguration mechanism")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ad716b9
    • E
      Revert "packet: switch kvzalloc to allocate memory" · 3a7ad063
      Eric Dumazet 提交于
      This reverts commit 71e41286.
      
      mmap()/munmap() can not be backed by kmalloced pages :
      
      We fault in :
      
          VM_BUG_ON_PAGE(PageSlab(page), page);
      
          unmap_single_vma+0x8a/0x110
          unmap_vmas+0x4b/0x90
          unmap_region+0xc9/0x140
          do_munmap+0x274/0x360
          vm_munmap+0x81/0xc0
          SyS_munmap+0x2b/0x40
          do_syscall_64+0x13e/0x1c0
          entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: 71e41286 ("packet: switch kvzalloc to allocate memory")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJohn Sperbeck <jsperbeck@google.com>
      Bisected-by: NJohn Sperbeck <jsperbeck@google.com>
      Cc: Zhang Yu <zhangyu31@baidu.com>
      Cc: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a7ad063
  5. 30 8月, 2018 9 次提交