1. 19 3月, 2019 14 次提交
    • B
      i40e: report correct statistics when XDP is enabled · 090ce34b
      Björn Töpel 提交于
      commit cdec2141c24ef177d929765c5a6f95549c266fb3 upstream.
      
      When XDP is enabled, the driver will report incorrect
      statistics. Received frames will reported as transmitted frames.
      
      This commits fixes the i40e implementation of ndo_get_stats64 (struct
      net_device_ops), so that iproute2 will report correct statistics
      (e.g. when running "ip -stats link show dev eth0") even when XDP is
      enabled.
      Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Fixes: 74608d17 ("i40e: add support for XDP_TX action")
      Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Emeric Brun <ebrun@haproxy.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      090ce34b
    • M
      bonding: fix PACKET_ORIGDEV regression · 795cb33c
      Michal Soltys 提交于
      [ Upstream commit 3c963a3306eada999be5ebf4f293dfa3d3945487 ]
      
      This patch fixes a subtle PACKET_ORIGDEV regression which was a side
      effect of fixes introduced by:
      
      6a9e461f bonding: pass link-local packets to bonding master also.
      
      ... to:
      
      b89f04c6 bonding: deliver link-local packets with skb->dev set to link that packets arrived on
      
      While 6a9e461f restored pre-b89f04c6 presence of link-local
      packets on bonding masters (which is required e.g. by linux bridges
      participating in spanning tree or needed for lab-like setups created
      with group_fwd_mask) it also caused the originating device
      information to be lost due to cloning.
      
      Maciej Żenczykowski proposed another solution that doesn't require
      packet cloning and retains original device information - instead of
      returning RX_HANDLER_PASS for all link-local packets it's now limited
      only to packets from inactive slaves.
      
      At the same time, packets passed to bonding masters retain correct
      information about the originating device and PACKET_ORIGDEV can be used
      to determine it.
      
      This elegantly solves all issues so far:
      
      - link-local packets that were removed from bonding masters
      - LLDP daemons being forced to explicitly bind to slave interfaces
      - PACKET_ORIGDEV having no effect on bond interfaces
      
      Fixes: 6a9e461f (bonding: pass link-local packets to bonding master also.)
      Reported-by: NVincent Bernat <vincent@bernat.ch>
      Signed-off-by: NMichal Soltys <soltys@ziu.info>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      795cb33c
    • D
      ipvlan: disallow userns cap_net_admin to change global mode/flags · 1856bbbe
      Daniel Borkmann 提交于
      [ Upstream commit 7cc9f7003a969d359f608ebb701d42cafe75b84a ]
      
      When running Docker with userns isolation e.g. --userns-remap="default"
      and spawning up some containers with CAP_NET_ADMIN under this realm, I
      noticed that link changes on ipvlan slave device inside that container
      can affect all devices from this ipvlan group which are in other net
      namespaces where the container should have no permission to make changes
      to, such as the init netns, for example.
      
      This effectively allows to undo ipvlan private mode and switch globally to
      bridge mode where slaves can communicate directly without going through
      hostns, or it allows to switch between global operation mode (l2/l3/l3s)
      for everyone bound to the given ipvlan master device. libnetwork plugin
      here is creating an ipvlan master and ipvlan slave in hostns and a slave
      each that is moved into the container's netns upon creation event.
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
           link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
           ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
           inet 10.41.0.1/32 scope link cilium_host
             valid_lft forever preferred_lft forever
        [...]
      
      * Spawn container & change ipvlan mode setting inside of it:
      
        # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
        9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
        # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns (mode switched to l2):
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      Same l3 -> l2 switch would also happen by creating another slave inside
      the container's network namespace when specifying the existing cilium0
      link to derive the actual (bond0) master:
      
        # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      One way to mitigate it is to check CAP_NET_ADMIN permissions of
      the ipvlan master device's ns, and only then allow to change
      mode or flags for all devices bound to it. Above two cases are
      then disallowed after the patch.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1856bbbe
    • G
      team: use operstate consistently for linkup · e5c31b5a
      George Wilkie 提交于
      [ Upstream commit 8c7a77267eec81dd81af8412f29e50c0b1082548 ]
      
      When a port is added to a team, its initial state is derived
      from netif_carrier_ok rather than netif_oper_up.
      If it is carrier up but operationally down at the time of being
      added, the port state.linkup will be set prematurely.
      port state.linkup should be set consistently using
      netif_oper_up rather than netif_carrier_ok.
      
      Fixes: f1d22a1e ("team: account for oper state")
      Signed-off-by: NGeorge Wilkie <gwilkie@vyatta.att-mail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5c31b5a
    • Y
      mdio_bus: Fix use-after-free on device_register fails · 96a3b144
      YueHaibing 提交于
      [ Upstream commit 6ff7b060535e87c2ae14dd8548512abfdda528fb ]
      
      KASAN has found use-after-free in fixed_mdio_bus_init,
      commit 0c692d07 ("drivers/net/phy/mdio_bus.c: call
      put_device on device_register() failure") call put_device()
      while device_register() fails,give up the last reference
      to the device and allow mdiobus_release to be executed
      ,kfreeing the bus. However in most drives, mdiobus_free
      be called to free the bus while mdiobus_register fails.
      use-after-free occurs when access bus again, this patch
      revert it to let mdiobus_free free the bus.
      
      KASAN report details as below:
      
      BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
      Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524
      
      CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
       fixed_mdio_bus_init+0x283/0x1000 [fixed_phy]
       ? 0xffffffffc0e40000
       ? 0xffffffffc0e40000
       ? 0xffffffffc0e40000
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc
      R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004
      
      Allocated by task 3524:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143
       fixed_mdio_bus_init+0x163/0x1000 [fixed_phy]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 3524:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kfree+0xe1/0x270 mm/slub.c:3938
       device_release+0x78/0x200 drivers/base/core.c:919
       kobject_cleanup lib/kobject.c:662 [inline]
       kobject_release lib/kobject.c:691 [inline]
       kref_put include/linux/kref.h:67 [inline]
       kobject_put+0x146/0x240 lib/kobject.c:708
       put_device+0x1c/0x30 drivers/base/core.c:2060
       __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382
       fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881dc824c80
       which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 248 bytes inside of
       2048-byte region [ffff8881dc824c80, ffff8881dc825480)
      The buggy address belongs to the page:
      page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000010200(slab|head)
      raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800
      raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                      ^
       ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 0c692d07 ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96a3b144
    • J
      net/mlx4_core: Fix qp mtt size calculation · c3bdcd9d
      Jack Morgenstein 提交于
      [ Upstream commit 8511a653e9250ef36b95803c375a7be0e2edb628 ]
      
      Calculation of qp mtt size (in function mlx4_RST2INIT_wrapper)
      ultimately depends on function roundup_pow_of_two.
      
      If the amount of memory required by the QP is less than one page,
      roundup_pow_of_two is called with argument zero.  In this case, the
      roundup_pow_of_two result is undefined.
      
      Calling roundup_pow_of_two with a zero argument resulted in the
      following stack trace:
      
      UBSAN: Undefined behaviour in ./include/linux/log2.h:61:13
      shift exponent 64 is too large for 64-bit type 'long unsigned int'
      CPU: 4 PID: 26939 Comm: rping Tainted: G OE 4.19.0-rc1
      Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
      Call Trace:
      dump_stack+0x9a/0xeb
      ubsan_epilogue+0x9/0x7c
      __ubsan_handle_shift_out_of_bounds+0x254/0x29d
      ? __ubsan_handle_load_invalid_value+0x180/0x180
      ? debug_show_all_locks+0x310/0x310
      ? sched_clock+0x5/0x10
      ? sched_clock+0x5/0x10
      ? sched_clock_cpu+0x18/0x260
      ? find_held_lock+0x35/0x1e0
      ? mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
      mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
      
      Fix this by explicitly testing for zero, and returning one if the
      argument is zero (assuming that the next higher power of 2 in this case
      should be one).
      
      Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3bdcd9d
    • J
      net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling · c3bcf8cb
      Jack Morgenstein 提交于
      [ Upstream commit c07d27927f2f2e96fcd27bb9fb330c9ea65612d0 ]
      
      In procedures mlx4_cmd_use_events() and mlx4_cmd_use_polling(), we need to
      guarantee that there are no FW commands in progress on the comm channel
      (for VFs) or wrapped FW commands (on the PF) when SRIOV is active.
      
      We do this by also taking the slave_cmd_mutex when SRIOV is active.
      
      This is especially important when switching from event to polling, since we
      free the command-context array during the switch.  If there are FW commands
      in progress (e.g., waiting for a completion event), the completion event
      handler will access freed memory.
      
      Since the decision to use comm_wait or comm_poll is taken before grabbing
      the event_sem/poll_sem in mlx4_comm_cmd_wait/poll, we must take the
      slave_cmd_mutex as well (to guarantee that the decision to use events or
      polling and the call to the appropriate cmd function are atomic).
      
      Fixes: a7e1f049 ("net/mlx4_core: Fix deadlock when switching between polling and event fw commands")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3bcf8cb
    • J
      net/mlx4_core: Fix reset flow when in command polling mode · 1f34d8d2
      Jack Morgenstein 提交于
      [ Upstream commit e15ce4b8d11227007577e6dc1364d288b8874fbe ]
      
      As part of unloading a device, the driver switches from
      FW command event mode to FW command polling mode.
      
      Part of switching over to polling mode is freeing the command context array
      memory (unfortunately, currently, without NULLing the command context array
      pointer).
      
      The reset flow calls "complete" to complete all outstanding fw commands
      (if we are in event mode). The check for event vs. polling mode here
      is to test if the command context array pointer is NULL.
      
      If the reset flow is activated after the switch to polling mode, it will
      attempt (incorrectly) to complete all the commands in the context array --
      because the pointer was not NULLed when the driver switched over to polling
      mode.
      
      As a result, we have a use-after-free situation, which results in a
      kernel crash.
      
      For example:
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
      PGD 0
      Oops: 0000 [#1] SMP
      Modules linked in: netconsole nfsv3 nfs_acl nfs lockd grace ...
      CPU: 2 PID: 940 Comm: kworker/2:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
      Workqueue: events hv_eject_device_work [pci_hyperv]
      task: ffff8d1734ca0fd0 ti: ffff8d17354bc000 task.ti: ffff8d17354bc000
      RIP: 0010:[<ffffffff876c4a8e>]  [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
      RSP: 0018:ffff8d17354bfa38  EFLAGS: 00010082
      RAX: 0000000000000000 RBX: ffff8d17362d42c8 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8d17362d42c8
      RBP: ffff8d17354bfa70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000298 R11: ffff8d173610e000 R12: ffff8d17362d42d0
      R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003
      FS:  0000000000000000(0000) GS:ffff8d1802680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000f16d8000 CR4: 00000000001406e0
      Call Trace:
       [<ffffffff876c7adc>] complete+0x3c/0x50
       [<ffffffffc04242f0>] mlx4_cmd_wake_completions+0x70/0x90 [mlx4_core]
       [<ffffffffc041e7b1>] mlx4_enter_error_state+0xe1/0x380 [mlx4_core]
       [<ffffffffc041fa4b>] mlx4_comm_cmd+0x29b/0x360 [mlx4_core]
       [<ffffffffc041ff51>] __mlx4_cmd+0x441/0x920 [mlx4_core]
       [<ffffffff877f62b1>] ? __slab_free+0x81/0x2f0
       [<ffffffff87951384>] ? __radix_tree_lookup+0x84/0xf0
       [<ffffffffc043a8eb>] mlx4_free_mtt_range+0x5b/0xb0 [mlx4_core]
       [<ffffffffc043a957>] mlx4_mtt_cleanup+0x17/0x20 [mlx4_core]
       [<ffffffffc04272c7>] mlx4_free_eq+0xa7/0x1c0 [mlx4_core]
       [<ffffffffc042803e>] mlx4_cleanup_eq_table+0xde/0x130 [mlx4_core]
       [<ffffffffc0433e08>] mlx4_unload_one+0x118/0x300 [mlx4_core]
       [<ffffffffc0434191>] mlx4_remove_one+0x91/0x1f0 [mlx4_core]
      
      The fix is to set the command context array pointer to NULL after freeing
      the array.
      
      Fixes: f5aef5aa ("net/mlx4_core: Activate reset flow upon fatal command cases")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f34d8d2
    • E
      vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() · f09a656b
      Eric Dumazet 提交于
      [ Upstream commit 59cbf56fcd98ba2a715b6e97c4e43f773f956393 ]
      
      Same reasons than the ones explained in commit 4179cb5a4c92
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx() or gro_cells_receive() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro_cells infrastructure, as
      gro_cells_destroy() will be called only after a full rcu
      grace period is observed after IFF_UP has been cleared.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Otherwise we risk use-after-free and/or crashes.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f09a656b
    • S
      vxlan: Fix GRO cells race condition between receive and link delete · 9f7aeee6
      Stefano Brivio 提交于
      [ Upstream commit ad6c9986bcb627c7c22b8f9e9a934becc27df87c ]
      
      If we receive a packet while deleting a VXLAN device, there's a chance
      vxlan_rcv() is called at the same time as vxlan_dellink(). This is fine,
      except that vxlan_dellink() should never ever touch stuff that's still in
      use, such as the GRO cells list.
      
      Otherwise, vxlan_rcv() crashes while queueing packets via
      gro_cells_receive().
      
      Move the gro_cells_destroy() to vxlan_uninit(), which runs after the RCU
      grace period is elapsed and nothing needs the gro_cells anymore.
      
      This is now done in the same way as commit 8e816df8 ("geneve: Use GRO
      cells infrastructure.") originally implemented for GENEVE.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Fixes: 58ce31cc ("vxlan: GRO support at tunnel layer")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f7aeee6
    • M
      ravb: Decrease TxFIFO depth of Q3 and Q2 to one · ec61b953
      Masaru Nagai 提交于
      [ Upstream commit ae9819e339b451da7a86ab6fe38ecfcb6814e78a ]
      
      Hardware has the CBS (Credit Based Shaper) which affects only Q3
      and Q2. When updating the CBS settings, even if the driver does so
      after waiting for Tx DMA finished, there is a possibility that frame
      data still remains in TxFIFO.
      
      To avoid this, decrease TxFIFO depth of Q3 and Q2 to one.
      
      This patch has been exercised this using netperf TCP_MAERTS, TCP_STREAM
      and UDP_STREAM tests run on an Ebisu board. No performance change was
      detected, outside of noise in the tests, both in terms of throughput and
      CPU utilisation.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: NMasaru Nagai <masaru.nagai.vx@renesas.com>
      Signed-off-by: NKazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      [simon: updated changelog]
      Signed-off-by: NSimon Horman <horms+renesas@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec61b953
    • X
      pptp: dst_release sk_dst_cache in pptp_sock_destruct · 34dc08b9
      Xin Long 提交于
      [ Upstream commit 9417d81f4f8adfe20a12dd1fadf73a618cbd945d ]
      
      sk_setup_caps() is called to set sk->sk_dst_cache in pptp_connect,
      so we have to dst_release(sk->sk_dst_cache) in pptp_sock_destruct,
      otherwise, the dst refcnt will leak.
      
      It can be reproduced by this syz log:
      
        r1 = socket$pptp(0x18, 0x1, 0x2)
        bind$pptp(r1, &(0x7f0000000100)={0x18, 0x2, {0x0, @local}}, 0x1e)
        connect$pptp(r1, &(0x7f0000000000)={0x18, 0x2, {0x3, @remote}}, 0x1e)
      
      Consecutive dmesg warnings will occur:
      
        unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      v1->v2:
        - use rcu_dereference_protected() instead of rcu_dereference_check(),
          as suggested by Eric.
      
      Fixes: 00959ade ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34dc08b9
    • B
      lan743x: Fix TX Stall Issue · ab13fe32
      Bryan Whitehead 提交于
      [ Upstream commit deb6bfabdbb634e91f36a4e9cb00a7137d72d886 ]
      
      It has been observed that tx queue may stall while downloading
      from certain web sites (example www.speedtest.net)
      
      The cause has been tracked down to a corner case where
      the tx interrupt vector was disabled automatically, but
      was not re enabled later.
      
      The lan743x has two mechanisms to enable/disable individual
      interrupts. Interrupts can be enabled/disabled by individual
      source, and they can also be enabled/disabled by individual
      vector which has been mapped to the source. Both must be
      enabled for interrupts to work properly.
      
      The TX code path, primarily uses the interrupt enable/disable of
      the TX source bit, while leaving the vector enabled all the time.
      
      However, while investigating this issue it was noticed that
      the driver requested the use of the vector auto clear feature.
      
      The test above revealed a case where the vector enable was
      cleared unintentionally.
      
      This patch fixes the issue by deleting the lines that request
      the vector auto clear feature to be used.
      
      Fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver")
      Signed-off-by: NBryan Whitehead <Bryan.Whitehead@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab13fe32
    • B
      lan743x: Fix RX Kernel Panic · 22326473
      Bryan Whitehead 提交于
      [ Upstream commit dd9d9f5907bb475f8b1796c47d4ecc7fb9b72136 ]
      
      It has been noticed that running the speed test at
      www.speedtest.net occasionally causes a kernel panic.
      
      Investigation revealed that under this test RX buffer allocation
      sometimes fails and returns NULL. But the lan743x driver did
      not handle this case.
      
      This patch fixes this issue by attempting to allocate a buffer
      before sending the new rx packet to the OS. If the allocation
      fails then the new rx packet is dropped and the existing buffer
      is reused in the DMA ring.
      
      Updates for v2:
          Additional 2 locations where allocation was not checked,
              has been changed to reuse existing buffer.
      
      Fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver")
      Signed-off-by: NBryan Whitehead <Bryan.Whitehead@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22326473
  2. 14 3月, 2019 19 次提交
  3. 10 3月, 2019 7 次提交