1. 05 Jun, 2018 40 commits
    • D
      net/ipv6: prevent use after free in ip6_route_mpath_notify · f7225172
      David Ahern committed
      syzbot reported a use-after-free:
      
      BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
      Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555
      
      CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
       ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
       ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Allocated by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:104
       __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
       ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
       ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
       ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Freed by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:140
       dst_release_immediate+0x71/0x9e net/core/dst.c:205
       fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      The problem is that rt_last can point to a deleted route if the insert
      fails.
      
      One reproducer is to insert a route and then add a multipath route that
      has a duplicate nexthop, e.g.:
          $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
          $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2
      
      Fix by not setting rt_last until it is verified that the insert succeeded.
      
      Fixes: 3b1137fe ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7225172
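The rt_last fix described above can be illustrated with a small userspace sketch. Everything here is hypothetical (`struct route`, `insert_route()`, `add_multipath()` are stand-ins, not the kernel's code); it only shows the pattern of recording the "last" pointer after the insert is known to have succeeded, since a failed insert frees the object.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a route entry; not the kernel's structure. */
struct route {
	int nexthop;
};

#define TABLE_MAX 8
static struct route *table[TABLE_MAX];
static int table_len;

/*
 * Insert a route, taking ownership of it. On failure (duplicate nexthop
 * or full table) the route is freed, so the caller must not touch it again.
 */
static int insert_route(struct route *rt)
{
	for (int i = 0; i < table_len; i++) {
		if (table[i]->nexthop == rt->nexthop) {
			free(rt);
			return -1;
		}
	}
	if (table_len == TABLE_MAX) {
		free(rt);
		return -1;
	}
	table[table_len++] = rt;
	return 0;
}

/*
 * Add several nexthops and return the nexthop of the last route that was
 * actually inserted, or -1 if none was.
 */
static int add_multipath(const int *nexthops, int n)
{
	struct route *rt_last = NULL;

	for (int i = 0; i < n; i++) {
		struct route *rt = malloc(sizeof(*rt));

		if (!rt)
			break;
		rt->nexthop = nexthops[i];
		/* Record rt_last only once the insert has succeeded. */
		if (insert_route(rt) == 0)
			rt_last = rt;
	}
	return rt_last ? rt_last->nexthop : -1;
}
```

Mirroring the reproducer, adding nexthop 2 and then the pair (4, 2) leaves rt_last pointing at the route for nexthop 4, never at the freed duplicate.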
    • K
      net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default. · 69e2eccc
      Kun Yi committed
      The BCM54612E has 4 multi-functional LED pins that can be configured
      through register settings; the LED4 pin can be configured as a 125MHz
      reference clock output by setting the spare register. Since the dedicated
      CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
      pin is the only pin that can provide this function in this package, and
      therefore it is beneficial to just enable the reference clock by default.
      Signed-off-by: Kun Yi <kunyi@google.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69e2eccc
    • G
      l2tp: fix refcount leakage on PPPoL2TP sockets · 3d609342
      Guillaume Nault committed
      Commit d02ba2a6 ("l2tp: fix race in pppol2tp_release with session
      object destroy") tried to fix a race condition where a PPPoL2TP socket
      would disappear while the L2TP session was still using it. However, it
      missed the root issue which is that an L2TP session may accept to be
      reconnected if its associated socket has entered the release process.
      
      The tentative fix makes the session hold the socket it is connected to.
      That saves the kernel from crashing, but introduces refcount leakage,
      preventing the socket from completing the release process. Once stalled,
      everything the socket depends on can't be released anymore, including
      the L2TP session and the l2tp_ppp module.
      
      The root issue is that, when releasing a connected PPPoL2TP socket, the
      session's ->sk pointer (RCU-protected) is reset to NULL and we have to
      wait for a grace period before destroying the socket. The socket drops
      the session in its ->sk_destruct callback function, so the session
      will exist until the last reference on the socket is dropped.
      Therefore, there is a time frame where pppol2tp_connect() may accept
      reconnecting a session, as it only checks ->sk to figure out if the
      session is connected. This time frame is shortened by the fact that
      pppol2tp_release() calls l2tp_session_delete(), making the session
      unreachable before resetting ->sk. However, pppol2tp_connect() may
      grab the session before it gets unhashed by l2tp_session_delete(), but
      it may test ->sk after the latter got reset. The race is not so hard to
      trigger and syzbot found a pretty reliable reproducer:
      https://syzkaller.appspot.com/bug?id=418578d2a4389074524e04d641eacb091961b2cf
      
      Before d02ba2a6, another race could let pppol2tp_release()
      overwrite the ->__sk pointer of an L2TP session, thus tricking
      pppol2tp_put_sk() into calling sock_put() on a socket that is different
      than the one for which pppol2tp_release() was originally called. To get
      there, we had to trigger the race described above, therefore having one
      PPPoL2TP socket being released, while the session it is connected to is
      reconnecting to a different PPPoL2TP socket. When releasing this new
      socket fast enough, pppol2tp_release() overwrites the session's
      ->__sk pointer with the address of the new socket, before the first
      pppol2tp_put_sk() call gets scheduled. Then the pppol2tp_put_sk() call
      invoked by the original socket will sock_put() the new socket,
      potentially dropping its last reference. When the second
      pppol2tp_put_sk() finally runs, its socket has already been freed.
      
      With d02ba2a6, the session takes a reference on both sockets.
      Furthermore, the session's ->sk pointer is reset in the
      pppol2tp_session_close() callback function rather than in
      pppol2tp_release(). Therefore, ->__sk can't be overwritten and
      pppol2tp_put_sk() is called only once (l2tp_session_delete() will only
      run pppol2tp_session_close() once, to protect the session against
      concurrent deletion requests). Now pppol2tp_put_sk() will properly
      sock_put() the original socket, but the new socket will remain, as
      l2tp_session_delete() prevented the release process from completing.
      Here, we don't depend on the ->__sk race to trigger the bug. Getting
      into the pppol2tp_connect() race is enough to leak the reference, no
      matter when the new socket is released.
      
      So it all boils down to pppol2tp_connect() failing to realise that the
      session has already been connected. This patch drops the unneeded extra
      reference counting (mostly reverting d02ba2a6) and checks that
      neither ->sk nor ->__sk is set before allowing a session to be
      connected.
      
      Fixes: d02ba2a6 ("l2tp: fix race in pppol2tp_release with session object destroy")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d609342
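The connect-time check described at the end of the l2tp message above can be sketched as follows. This is a simplified illustration, not the driver's code: `struct session` and `session_connect()` are hypothetical, the field comments are a loose reading of the commit message, and the `-EEXIST` return value is an assumption chosen for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified session: only the two socket pointers matter here. */
struct session {
	void *sk;    /* socket the session is attached to */
	void *__sk;  /* socket whose release is still pending */
};

/* Refuse to connect a session that is, or recently was, attached to a socket. */
static int session_connect(struct session *s, void *sock)
{
	if (s->sk || s->__sk)
		return -EEXIST;   /* illustrative error code (assumption) */
	s->sk = sock;
	return 0;
}
```

Checking both pointers closes the window in which ->sk has already been reset but the old socket has not finished its release.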
    • D
      Merge branch 'net-phy-improve-PM-handling-of-PHY-MDIO' · 7a723099
      David S. Miller committed
      Heiner Kallweit says:
      
      ====================
      net: phy: improve PM handling of PHY/MDIO
      
      The current implementation of MDIO bus PM ops doesn't actually implement
      bus-specific PM ops but just calls PM ops defined on a device level,
      which doesn't seem to be fully in line with the core PM model.
      
      When looking e.g. at __device_suspend() the PM core looks for PM ops
      of a device in a specific order:
      1. device PM domain
      2. device type
      3. device class
      4. device bus
      
      I think there is good reason that there are no PM ops on device level.
      The situation can be improved by modeling PHYs as a device type of
      an MDIO device. If PM ops are needed for some other type of MDIO device,
      it could be modeled as struct device_type as well.
      ====================
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a723099
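The lookup order listed in the cover letter above (PM domain, then device type, then class, then bus) can be sketched as a first-match search. The structures below are hypothetical simplifications, not the kernel's `struct device` or `struct dev_pm_ops`, and the real PM core has further fallbacks that are left out here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct pm_ops {
	const char *owner;
};

struct device_sketch {
	const struct pm_ops *pm_domain;
	const struct pm_ops *type;
	const struct pm_ops *class_ops;
	const struct pm_ops *bus;
};

/* Return the first set of PM ops found, in the order given in the text. */
static const struct pm_ops *pick_pm_ops(const struct device_sketch *dev)
{
	if (dev->pm_domain)
		return dev->pm_domain;
	if (dev->type)
		return dev->type;
	if (dev->class_ops)
		return dev->class_ops;
	return dev->bus;   /* may be NULL if nothing is set */
}
```

With PM ops placed on the device type, a PHY is served at the second step, so bus-level ops are never consulted for it.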
    • H
      net: phy: remove PM ops from MDIO bus · 9107c05e
      Heiner Kallweit committed
      The current implementation of MDIO bus PM ops doesn't actually implement
      bus-specific PM ops but just calls PM ops defined on a device level,
      which doesn't seem to be fully in line with the core PM model.
      
      When looking e.g. at __device_suspend() the PM core looks for PM ops
      of a device in a specific order:
      1. device PM domain
      2. device type
      3. device class
      4. device bus
      
      I think there is good reason that there are no PM ops on device level.
      
      Now that a device type representation of PHYs as a special type of MDIO
      device was added (the only user of MDIO bus PM ops), the MDIO bus
      PM ops can be removed, including member pm of struct mdio_device.
      
      If PM ops are needed for some other type of MDIO device, it should be
      modeled as struct device_type as well.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9107c05e
    • H
      net: phy: add struct device_type representation of a PHY · 7f4828ff
      Heiner Kallweit committed
      A PHY is a type of MDIO device, so let's model it as struct device_type
      and place PM ops, attribute groups and release callback on device type
      level. For this the attribute definitions have to be moved.
      This change allows us to get rid of the PM ops on a bus level in a second
      step.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f4828ff
    • D
      7d840a60
    • D
      Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · d67b66b4
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-06-04
      
      This series contains a smorgasbord of updates to documentation, e1000e,
      igb, ixgbe, ixgbevf and i40e.
      
      Benjamin Poirier fixes a potential kernel crash due to NULL pointer
      dereference in e1000e.
      
      Jeff updates the kernel documentation for e100 and e1000 to correct
      default values and URLs which were incorrect in the documentation.  Also
      took the time to update these to the new reStructured text format for
      kernel documentation.
      
      Joanna Yurdal fixes a missing PTP transmit timestamp by ensuring that
      TSICR gets cleared when ICR is cleared.
      
      Sergey updates igb to reset all the transmit queues at one time so that
      we only have to wait once for all the queues to be reset.
      
      Alex fixes ixgbevf so that malicious driver detection (MDD) can co-exist
      with XDP.
      
      Emil and Tony extend the RTNL lock to ensure we get the most up-to-date
      values for the bits and avoid a possible race condition when going down.
      
      YueHaibing from Huawei introduces a helper function in ixgbe for
      operation reads to simplify the code a bit more.
      
      Daniel Borkmann adds support for XDP meta data when using build SKB
      for i40e.
      
      Shannon Nelson provides two fixes for the IPSec code in ixgbe. The first is
      to make sure we do not try to offload the decryption of any incoming
      packet that is destined for the management engine.  The other fix is to
      resolve a cast problem introduced by a sparse cleanup patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d67b66b4
    • X
      net: hns: Fix the process of adding broadcast addresses to tcam · f0b964e5
      Xi Wang committed
      If the multicast mask value in the device tree is not configured as all
      0xff, the broadcast MAC address will be lost from the tcam table after
      the command 'ifconfig up' is executed. The address is added by
      hns_ae_start, but is cleared later by hns_nic_set_rx_mode, which is
      called in the dev_open process.
      
      This patch fixes it by not using the multicast mask when adding a
      broadcast address.
      
      Fixes: b5996f11 ("net: add Hisilicon Network Subsystem basic ethernet support")
      Signed-off-by: Xi Wang <wangxi11@huawei.com>
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0b964e5
    • V
      net: sched: return error code when tcf proto is not found · 0e399035
      Vlad Buslov committed
      If requested tcf proto is not found, get and del filter netlink protocol
      handlers output error message to extack, but do not return actual error
      code. Add check to return ENOENT when result of tp find function is NULL
      pointer.
      
      Fixes: c431f89b ("net: sched: split tc_ctl_tfilter into three handlers")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e399035
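The bug class fixed above (an error message is set, but the error code stays 0) can be shown with a minimal sketch. The lookup table, `find_proto()`, `get_filter()` and the message text are all hypothetical; only the shape of the fix, returning -ENOENT when the lookup yields NULL, comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a classifier instance keyed by priority. */
struct proto_sketch {
	unsigned int prio;
};

static struct proto_sketch protos[] = { { 1 }, { 2 } };

static struct proto_sketch *find_proto(unsigned int prio)
{
	for (size_t i = 0; i < sizeof(protos) / sizeof(protos[0]); i++)
		if (protos[i].prio == prio)
			return &protos[i];
	return NULL;
}

/* "Get" handler: reports both a message and an error code on a failed lookup. */
static int get_filter(unsigned int prio, const char **msg)
{
	struct proto_sketch *tp = find_proto(prio);
	int err = 0;

	*msg = NULL;
	if (!tp) {
		*msg = "filter not found";   /* illustrative message text */
		err = -ENOENT;               /* without this line the caller sees success */
	}
	return err;
}
```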
    • D
      team: use netdev_features_t instead of u32 · 25ea6654
      Dan Carpenter committed
      This code was introduced in 2011 around the same time that we made
      netdev_features_t a u64 type.  These days a u32 is not big enough to
      hold all the potential features.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25ea6654
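Why a u32 is too small for a 64-bit feature mask can be seen in a short sketch. `features_t` stands in for netdev_features_t, and the two feature bits are hypothetical positions chosen for illustration, not real NETIF_F_* values.

```c
#include <stdint.h>

typedef uint64_t features_t;   /* stands in for netdev_features_t (u64) */

#define FEATURE_LOW  ((features_t)1 << 3)    /* hypothetical bit below bit 32 */
#define FEATURE_HIGH ((features_t)1 << 40)   /* hypothetical bit above bit 31 */

/* Buggy variant: the mask is stored in 32 bits, so high feature bits vanish. */
static features_t mask_with_u32(features_t features, features_t wanted)
{
	uint32_t mask = (uint32_t)wanted;   /* upper 32 bits are dropped here */

	return features & mask;
}

/* Fixed variant: the mask keeps the full 64-bit width. */
static features_t mask_with_u64(features_t features, features_t wanted)
{
	features_t mask = wanted;

	return features & mask;
}
```

With both bits requested, the 32-bit variant silently keeps only FEATURE_LOW, while the 64-bit variant keeps both.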
    • D
      net_failover: Use netdev_features_t instead of u32 · a746407a
      Dan Carpenter committed
      The features mask needs to be a netdev_features_t (u64) because a u32
      is not big enough.
      
      Fixes: cfc80d9a ("net: Introduce net_failover driver")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a746407a
    • Y
      qed: use dma_zalloc_coherent instead of allocator/memset · ff2e351e
      YueHaibing committed
      Use dma_zalloc_coherent instead of dma_alloc_coherent
      followed by memset 0.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff2e351e
    • Y
      wan/fsl_ucc_hdlc: use dma_zalloc_coherent instead of allocator/memset · 1f55c286
      YueHaibing committed
      Use dma_zalloc_coherent instead of dma_alloc_coherent
      followed by memset 0.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f55c286
    • D
      Merge branch 'for-upstream' of... · 828da432
      David S. Miller committed
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2018-06-04
      
      Here's one last bluetooth-next pull request for the 4.18 kernel:
      
       - New USB device IDs for Realtek 8822BE and 8723DE
       - reset/resume fix for Dell Inspiron 5565
       - Fix HCI_UART_INIT_PENDING flag behavior
       - Fix patching behavior for some ATH3012 models
       - A few other minor cleanups & fixes
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      828da432
    • O
      docs: networking: fix minor typos in various documentation files · bb38ccce
      Olivier Gayot committed
      This patch fixes some typos/misspelling errors in the
      Documentation/networking files.
      Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb38ccce
    • M
      net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets · f396922d
      Maciej Żenczykowski committed
      It is not safe to do so because such sockets are already in the
      hash tables and changing these options can result in invalidating
      the tb->fastreuse(port) caching.
      
      This can have later far reaching consequences wrt. bind conflict checks
      which rely on these caches (for optimization purposes).
      
      Not to mention that you can currently end up with two identical
      non-reuseport listening sockets bound to the same local ip:port
      by clearing reuseport on them after they've already both been bound.
      
      There is unfortunately no EISBOUND error or anything similar,
      and EISCONN seems to be misleading for a bound-but-not-connected
      socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
      is the closest you can get to meaning 'socket in bad state'.
      (although perhaps EINVAL wouldn't be a bad choice either?)
      
      This does unfortunately run the risk of breaking buggy
      userspace programs...
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f396922d
    • M
      net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization · 79e9fed4
      Maciej Żenczykowski committed
      This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
      to an integer.
      
      It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
      while 2 enables timewait socket reuse only for sockets that we can
      prove are loopback connections:
        i.e. bound to 'lo' interface or where one of source or destination
        IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
      
      This enables quicker reuse of ephemeral ports for loopback connections
      - where tcp_tw_reuse is 100% safe from a protocol perspective
      (this assumes no artificially induced packet loss on 'lo').
      
      This also makes establishing many loopback connections *much* faster
      (allocating ports out of the first half of the ephemeral port range
      is significantly faster than allocating from the second half).
      
      Without this change in a 32K ephemeral port space my sample program
      (it just establishes and closes [::1]:ephemeral -> [::1]:server_port
      connections in a tight loop) fails after 32765 connections in 24 seconds.
      With it enabled 50000 connections only take 4.7 seconds.
      
      This is particularly problematic for IPv6 where we only have one local
      address and cannot play tricks with varying source IP from 127.0.0.0/8
      pool.
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79e9fed4
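The "provably loopback" condition for tcp_tw_reuse = 2 described above can be sketched as a predicate over raw address bytes. The function names are hypothetical and this is not the kernel's implementation; it only encodes the three address ranges named in the message (127.0.0.0/8, ::ffff:127.0.0.0/104 and ::1) plus the "bound to 'lo'" case.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IPv4 loopback: anything in 127.0.0.0/8 (address bytes in network order). */
static bool ipv4_is_loopback(const uint8_t a[4])
{
	return a[0] == 127;
}

/* IPv6 loopback for this purpose: ::1 or ::ffff:127.0.0.0/104. */
static bool ipv6_is_loopback(const uint8_t a[16])
{
	static const uint8_t lo[16] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 };
	static const uint8_t mapped[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

	if (memcmp(a, lo, sizeof(lo)) == 0)
		return true;
	/* 96 bits of v4-mapped prefix plus the leading IPv4 byte make up the /104. */
	return memcmp(a, mapped, sizeof(mapped)) == 0 && a[12] == 127;
}

/* Loopback if bound to 'lo', or if either endpoint address is loopback. */
static bool ipv6_conn_is_loopback(bool bound_to_lo, const uint8_t saddr[16],
				  const uint8_t daddr[16])
{
	return bound_to_lo || ipv6_is_loopback(saddr) || ipv6_is_loopback(daddr);
}
```

Only one of the two addresses needs to match, which follows the "one of source or destination IPs" wording of the message.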
    • Y
      qed: Add srq core support for RoCE and iWARP · 39dbc646
      Yuval Bason committed
      This patch adds support for configuring SRQ and provides the necessary
      APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
      Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39dbc646
    • D
      Merge branch 'bnx2-warnings' · 7a9ee41b
      David S. Miller committed
      Varsha Rao says:
      
      ====================
      net: bnx2: Fix checkpatch and clang warnings
      
      This patchset fixes NULL comparison and extra parentheses, checkpatch
      and clang warnings.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a9ee41b
    • V
      net: ethernet: bnx2: Replace NULL comparison · b8aac410
      Varsha Rao committed
      This patch fixes the checkpatch issue of NULL comparison. Replace x == NULL
      with !x, by using the following coccinelle script:
      
      @disable is_null@
      expression e;
      @@
      -e==NULL
      +!e
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8aac410
    • V
      net: ethernet: bnx2: Remove extra parentheses · 6dc5aa21
      Varsha Rao committed
      The following coccinelle script removes extra parentheses to fix the
      clang warning of extraneous parentheses.
      
      @disable paren@
      identifier i;
      expression e;
      statement s;
      @@
      if (
      -(i == e)
      +i == e
       )
      s
      Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6dc5aa21
    • Y
      net: gemini: fix spelling mistake: "it" -> "is" · 13ce3bc9
      YueHaibing committed
      Trivial fix to spelling mistake in gemini dev_warn message
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      13ce3bc9
    • P
      cls_flower: Fix comparing of old filter mask with new filter · f6521c58
      Paul Blakey committed
      We incorrectly compare the mask and the result is that we can't modify
      an already existing rule.
      
      Fix that by comparing correctly.
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6521c58
    • P
      cls_flower: Fix missing free of rhashtable · de9dc650
      Paul Blakey committed
      When destroying the instance, destroy the head rhashtable.
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de9dc650
    • R
      net: skbuff.h: drop unneeded <linux/slab.h> · 1f4c7413
      Randy Dunlap committed
      <linux/skbuff.h> does not use nor need <linux/slab.h>, so drop this
      header file from skbuff.h.
      
      <linux/skbuff.h> is currently #included in around 1200 C source and
      header files, making it the 31st most-used header file.
      
      Build tested [allmodconfig] on 20 arch-es.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f4c7413
    • Y
      net: chelsio: Use zeroing memory allocator instead of allocator/memset · 40434a67
      YueHaibing committed
      Use dma_zalloc_coherent for allocating zeroed
      memory and remove unnecessary memset function.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40434a67
    • D
      rxrpc: Fix handling of call quietly cancelled out on server · 1a025028
      David Howells committed
      Sometimes an in-progress call will stop responding on the fileserver when
      the fileserver quietly cancels the call with an internally marked abort
      (RX_CALL_DEAD), without sending an ABORT to the client.
      
      This causes the client's call to eventually expire from lack of incoming
      packets directed its way, which currently leads to it being cancelled
      locally with ETIME.  Note that it's not currently clear as to why this
      happens as it's really hard to reproduce.
      
      The rotation policy implemented by kAFS, however, doesn't differentiate
      between ETIME meaning we didn't get any response from the server and ETIME
      meaning the call got cancelled mid-flow.  The latter leads to an oops when
      fetching data as the rotation partially resets the afs_read descriptor,
      which can result in a cleared page pointer being dereferenced because that
      page has already been filled.
      
      Handle this by the following means:
      
       (1) Set a flag on a call when we receive a packet for it.
      
       (2) Store the highest packet serial number so far received for a call
           (bearing in mind this may wrap).
      
       (3) If, when the "not received anything recently" timeout expires on a
           call, we've received at least one packet for a call and the connection
           as a whole has received packets more recently than that call, then
           cancel the call locally with ECONNRESET rather than ETIME.
      
           This indicates that the call was definitely in progress on the server.
      
       (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME,
           don't try the next server, but rather abort the call.
      
           This avoids the oops as we don't try to reuse the afs_read struct.
           Rather, as-yet ungotten pages will be reread at a later date.
      
      Also:
      
       (5) Add an rxrpc tracepoint to log detection of the call being reset.
      
      Without this, I occasionally see an oops like the following:
      
          general protection fault: 0000 [#1] SMP PTI
          ...
          RIP: 0010:_copy_to_iter+0x204/0x310
          RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206
          RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560
          RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000
          RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400
          R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958
          R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560
          FS:  00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0
          Call Trace:
           skb_copy_datagram_iter+0x14e/0x289
           rxrpc_recvmsg_data.isra.0+0x6f3/0xf68
           ? trace_buffer_unlock_commit_regs+0x4f/0x89
           rxrpc_kernel_recv_data+0x149/0x421
           afs_extract_data+0x1e0/0x798
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_deliver_fs_fetch_data+0x33a/0x5ab
           afs_deliver_to_call+0x1ee/0x5e0
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_wait_for_call_to_complete+0x12b/0x52e
           ? wake_up_q+0x54/0x54
           afs_make_call+0x287/0x462
           ? afs_fs_fetch_data+0x3e6/0x3ed
           ? rcu_read_lock_sched_held+0x5d/0x63
           afs_fs_fetch_data+0x3e6/0x3ed
           afs_fetch_data+0xbb/0x14a
           afs_readpages+0x317/0x40d
           __do_page_cache_readahead+0x203/0x2ba
           ? ondemand_readahead+0x3a7/0x3c1
           ondemand_readahead+0x3a7/0x3c1
           generic_file_buffered_read+0x18b/0x62f
           __vfs_read+0xdb/0xfe
           vfs_read+0xb2/0x137
           ksys_read+0x50/0x8c
           do_syscall_64+0x7d/0x1a0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Note the weird value in RDI which is a result of trying to kmap() a NULL
      page pointer.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a025028
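Steps (1) to (4) of the rxrpc message above can be sketched as a small decision function. The structure and function names are hypothetical, and comparing serial numbers is only one assumed way of expressing "the connection has received packets more recently than that call"; the wrap-tolerant comparison reflects the note that the serial number may wrap.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call state kept for the timeout decision. */
struct call_sketch {
	bool rx_seen;         /* step (1): a packet was received for this call */
	uint32_t rx_serial;   /* step (2): highest serial received for the call */
};

/* True if serial a is after serial b, tolerating 32-bit wrap-around. */
static bool serial_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Step (3): choose the error used to cancel a call whose rx timeout expired. */
static int call_timeout_error(const struct call_sketch *call,
			      uint32_t conn_hi_serial)
{
	if (call->rx_seen && serial_after(conn_hi_serial, call->rx_serial))
		return -ECONNRESET;   /* call was in progress, then went quiet */
	return -ETIME;                /* no sign the server ever handled the call */
}

/* Step (4): in this sketch, only a plain timeout moves on to the next server. */
static bool should_try_next_server(int error)
{
	return error == -ETIME;
}
```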
    • C
      Allow ethtool to change tun link settings · 4e24f2dd
      Chas Williams committed
      Let user space set whatever it would like to advertise for the
      tun interface.  Preserve the existing defaults.
      Signed-off-by: Chas Williams <3chas3@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e24f2dd
    • D
      Merge branch 'sh_eth-fix-and-clean-up-sh_eth_soft_swap' · 4cd328f8
      David S. Miller committed
      Sergei Shtylyov says:
      
      ====================
      sh_eth: fix & clean up sh_eth_soft_swap()
      
      Here's a set of 3 patches against DaveM's 'net-next.git' repo. The first one fixes an
      old buffer endianness issue (luckily, the ARM SoCs are smart enough to not actually
      care), plus a couple of clean-ups around sh_eth_soft_swap()...
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4cd328f8
    • S
      sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap() · 1100149a
      Sergei Shtylyov committed
      When initializing 'maxp' in sh_eth_soft_swap(), the buffer length needs
      to be rounded up -- that's just asking for DIV_ROUND_UP()!
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1100149a
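The rounding mentioned above can be shown in isolation. The macro below matches the usual definition of DIV_ROUND_UP(); `words_to_swap()` is a hypothetical helper that assumes the swap operates on 32-bit words, so a buffer of `len` bytes needs its length rounded up to a whole number of words.

```c
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of 32-bit words that cover a buffer of len bytes. */
static size_t words_to_swap(size_t len)
{
	return DIV_ROUND_UP(len, sizeof(uint32_t));
}
```

A 5-byte buffer therefore spans two words, where plain integer division would give one and leave the tail unswapped.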
    • S
      sh_eth: uninline sh_eth_soft_swap() · bb2fa4e8
      Sergei Shtylyov committed
      sh_eth_tsu_soft_swap() is called twice by the driver, remove *inline* and
      move that function from the header to the driver itself to let gcc decide
      whether to expand it inline or not...
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb2fa4e8
    • S
      sh_eth: make sh_eth_soft_swap() work on ARM · 232b6743
      Sergei Shtylyov committed
      Browsing through the driver disassembly, I noticed that ARM gcc generated
      no code whatsoever for sh_eth_soft_swap() while building a little-endian
      kernel -- apparently __LITTLE_ENDIAN__ was not being #define'd, however
      it got implicitly #define'd when building with the SH gcc (I could only
      find the explicit #define __LITTLE_ENDIAN that was #include'd when building
      a little-endian kernel).  Luckily, the Ether controller only doing big-
      endian DMA is encountered on the early SH771x SoCs only and all ARM SoCs
      implement EDMR.DE and thus set 'sh_eth_cpu_data::hw_swap'. But anyway, we
      need to fix the #ifdef inside sh_eth_soft_swap() to something that would
      work on all architectures...
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      232b6743
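The pitfall described in this commit (an #ifdef on a macro that one toolchain defines and another does not, silently compiling the body away) can be illustrated outside the kernel. The sketch below is an assumption-laden userspace example: it relies on the `__BYTE_ORDER__` macro predefined by GCC and Clang rather than on the kernel's own byte-order macros, and every `sketch_`/`SKETCH_` name is hypothetical. It fails the build when the macro is missing instead of quietly producing no code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: the compiler predefines __BYTE_ORDER__ (GCC and Clang do).
 * Refuse to build if it is missing, so an undefined macro cannot turn
 * the swap into a silent no-op.
 */
#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
#error "byte order macros not available; pick another detection method"
#endif

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_LITTLE_ENDIAN 1
#else
#define SKETCH_LITTLE_ENDIAN 0
#endif

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t sketch_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Hypothetical software swap: on a little-endian CPU, byte-swap every
 * 32-bit word covering len bytes so the buffer ends up in big-endian
 * layout; on a big-endian CPU nothing needs to be done.
 */
static inline void sketch_soft_swap(uint32_t *p, size_t len)
{
#if SKETCH_LITTLE_ENDIAN
	uint32_t *maxp = p + (len + sizeof(uint32_t) - 1) / sizeof(uint32_t);

	for (; p < maxp; p++)
		*p = sketch_swab32(*p);
#else
	(void)p;
	(void)len;
#endif
}
```

Testing the macro's value with `#if`, after checking that it is defined at all, is a deliberate choice here: a plain `#ifdef` on a misspelled or toolchain-specific name fails silently.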
    • S
      ixgbe: fix broken ipsec Rx with proper cast on spi · 9a75fa5c
      Shannon Nelson committed
      Fix up a cast problem introduced by a sparse cleanup patch.  This fixes
      a problem where the encrypted packets were not recognized on Rx and
      subsequently dropped.
      
      Fixes: 9cfbfa70 ("ixgbe: cleanup sparse warnings")
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      9a75fa5c
    • S
      ixgbe: check ipsec ip addr against mgmt filters · 2a8a1552
      Shannon Nelson committed
      Make sure we don't try to offload the decryption of an incoming
      packet that should get delivered to the management engine.  This
      is a corner case that will likely be very seldom seen, but could
      really confuse someone if they were to hit it.
      Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      2a8a1552
    • D
      Merge branch 'mlxsw-Fixes-in-offloading-of-mirror-to-gretap' · 20677108
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      mlxsw: Fixes in offloading of mirror-to-gretap
      
      Petr says:
      
      These two patches fix issues in offloading of mirror-to-gretap when
      bridge is present in the underlay.
      
      In patch #1, reconsideration of SPAN configuration is not done right at
      the point that SWITCHDEV_OBJ_ID_PORT_VLAN deletion notification is
      distributed, but is postponed, because the notifications are actually
      distributed before the relevant change is implemented in the bridge.
      
      Patch #2 fixes a problem in configuring VLAN tagging in situations when a
      VLAN device is on top of an 802.1Q bridge whose egress port is marked as
      "egress untagged". In that case, mlxsw would neglect to suppress the
      tagging implicitly assumed after the VLAN device was seen.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20677108
    • P
      mlxsw: spectrum_span: Suppress VLAN on BRIDGE_VLAN_INFO_UNTAGGED · 1fc68bb7
      Petr Machata committed
      When offloading mirroring to gretap or ip6gretap netdevices, an 802.1q
      bridge is one of the soft devices permissible in the underlay when
      resolving the packet path. After the packet path is resolved to a
      particular bridge egress device, flags on packet VLAN determine whether
      the egressed packet should be tagged.
      
      The current logic, however, only ever sets the VLAN tag and never
      suppresses it. Thus if there's a VLAN netdevice above the bridge that
      determines the packet VLAN, that VLAN is never unset, and mirroring is
      configured with VLAN tagging.
      
      Fix by setting the packet VLAN on both branches: set to zero (for unset)
      when BRIDGE_VLAN_INFO_UNTAGGED, copy the resolved VLAN (e.g. from bridge
      PVID) otherwise.
      
      Fixes: 946a11e7 ("mlxsw: spectrum_span: Allow bridge for gretap mirror")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fc68bb7
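The "set on both branches" fix described in this commit can be sketched as follows. This is only an illustration of the control flow, not the mlxsw code: `SKETCH_VLAN_INFO_UNTAGGED` stands in for BRIDGE_VLAN_INFO_UNTAGGED with an arbitrary value, and `sketch_resolve_vid()` is a hypothetical function name.

```c
#include <stdint.h>

/* Hypothetical stand-in for BRIDGE_VLAN_INFO_UNTAGGED; value is arbitrary. */
#define SKETCH_VLAN_INFO_UNTAGGED (1u << 0)

/*
 * Decide the VLAN used for the mirrored packet once the bridge egress
 * port is known. *p_vid may already hold a VID picked up from a VLAN
 * netdevice above the bridge, so it must be overwritten on both branches:
 * zero (unset) when the port egresses untagged, the resolved VID otherwise.
 */
static inline void sketch_resolve_vid(uint16_t flags, uint16_t resolved_vid,
				      uint16_t *p_vid)
{
	if (flags & SKETCH_VLAN_INFO_UNTAGGED)
		*p_vid = 0;
	else
		*p_vid = resolved_vid;
}
```

A version that only assigns on the tagged branch would leave a stale VID from the VLAN netdevice in place, which appears to be the behavior the commit describes as the bug.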
    • P
      mlxsw: spectrum_switchdev: Postpone respin on object deletion · f07ff014
      Petr Machata committed
      VLAN deletion notifications are emitted before the relevant change is
      projected to bridge configuration. Thus, like with VLAN addition,
      schedule SPAN respin for later.
      
      Fixes: c520bc69 ("mlxsw: Respin SPAN on switchdev events")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f07ff014
    • T
      ixgbe: fix possible race in reset subtask · 88adce4e
      Tony Nguyen committed
      As in ixgbevf, a race is possible in the reset subtask. Extend the RTNL
      lock in ixgbe_reset_subtask() to protect the state bits; this is to make
      sure that we get the most up-to-date values for the bits and avoid a
      possible race when going down.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      88adce4e
    • D
      bpf, i40e: add meta data support · cc5b114d
      Daniel Borkmann committed
      Add support for XDP meta data when using build skb variant of
      the i40e driver. Implementation is analogous to the existing
      ixgbe and ixgbevf support for meta data from 366a88fe ("bpf,
      ixgbe: add meta data support") and be833332 ("ixgbevf: Add
      support for meta data"). With the build skb variant we get
      192 bytes of extra headroom which can be used for encaps or
      meta data.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Tested-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      cc5b114d