1. 22 Apr, 2017 40 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · fb796707
      David S. Miller committed
      Both conflicts were simple overlapping changes.
      
      In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
      conversion of the driver in net-next to use in-netdev stats.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux · 94836ecf
      Linus Torvalds committed
      Pull nfsd bugfix from Bruce Fields:
       "Fix a 4.11 regression that triggers a BUG() on an attempt to use an
        unsupported NFSv4 compound op"
      
      * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux:
        nfsd: fix oops on unsupported operation
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 057a650b
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Don't race in IPSEC dumps, from Yuejie Shi.
      
       2) Verify lengths properly in IPSEC requests, from Herbert Xu.
      
       3) Fix out of bounds access in ipv6 segment routing code, from David
          Lebrun.
      
       4) Don't write into the header of cloned SKBs in smsc95xx driver, from
          James Hughes.
      
       5) Several other drivers have this bug too, fix them. From Eric
          Dumazet.
      
       6) Fix access to uninitialized data in TC action cookie code, from
          Wolfgang Bumiller.
      
       7) Fix double free in IPV6 segment routing, again from David Lebrun.
      
       8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
      
       9) Fix use after free in qrtr code, from Dan Carpenter.
      
      10) Don't double-destroy devices in ip6mr code, from Nikolay
          Aleksandrov.
      
      11) Don't pass out-of-range TX queue indices into drivers, from Tushar
          Dave.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
        netpoll: Check for skb->queue_mapping
        ip6mr: fix notification device destruction
        bpf, doc: update bpf maintainers entry
        net: qrtr: potential use after free in qrtr_sendmsg()
        bpf: Fix values type used in test_maps
        net: ipv6: RTF_PCPU should not be settable from userspace
        gso: Validate assumption of frag_list segementation
        kaweth: use skb_cow_head() to deal with cloned skbs
        ch9200: use skb_cow_head() to deal with cloned skbs
        lan78xx: use skb_cow_head() to deal with cloned skbs
        sr9700: use skb_cow_head() to deal with cloned skbs
        cx82310_eth: use skb_cow_head() to deal with cloned skbs
        smsc75xx: use skb_cow_head() to deal with cloned skbs
        ipv6: sr: fix double free of skb after handling invalid SRH
        MAINTAINERS: Add "B:" field for networking.
        net sched actions: allocate act cookie early
        qed: Fix issue in populating the PFC config paramters.
        qed: Fix possible system hang in the dcbnl-getdcbx() path.
        qed: Fix sending an invalid PFC error mask to MFW.
        qed: Fix possible error in populating max_tc field.
        ...
    • net: Remove NET_CORE_BUDGET_USECS from sysctl binary interface. · 1f4407e2
      David S. Miller committed
      We are not supposed to add new entries to this thing
      any more.
      
      Thanks to Eric Dumazet for noticing this.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: Check for skb->queue_mapping · c70b17b7
      Tushar Dave committed
      Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
      otherwise skbs with queue_mapping greater than real_num_tx_queues
      can be sent to the underlying driver and can result in kernel panic.
      
      One such event is running netconsole and enabling VF on the same
      device. Or running netconsole and changing number of tx queues via
      ethtool on same device.
      
      e.g.
      Unable to handle kernel NULL pointer dereference
      tsk->{mm,active_mm}->context = 0000000000001525
      tsk->{mm,active_mm}->pgd = fff800130ff9a000
                    \|/ ____ \|/
                    "@'/ .. \`@"
                    /_| \__/ |_\
                       \__U_/
      kworker/48:1(475): Oops [#1]
      CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G           OE
      4.11.0-rc3-davem-net+ #7
      Workqueue: events queue_process
      task: fff80013113299c0 task.stack: fff800131132c000
      TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
      00000000    Tainted: G           OE
      TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
      g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
      0000000000000001
      g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
      00000000000000c0
      o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
      0000000000000003
      o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
      000000000049ed94
      RPC: <set_next_entity+0x34/0xb80>
      l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
      0000000000000000
      l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
      fff8001fa7605028
      i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
      0000000000000000
      i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
      00000000103fa4b0
      I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
      Call Trace:
       [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
       [0000000000998c74] netpoll_start_xmit+0xf4/0x200
       [0000000000998e10] queue_process+0x90/0x160
       [0000000000485fa8] process_one_work+0x188/0x480
       [0000000000486410] worker_thread+0x170/0x4c0
       [000000000048c6b8] kthread+0xd8/0x120
       [0000000000406064] ret_from_fork+0x1c/0x2c
       [0000000000000000]           (null)
      Disabling lock debugging due to kernel taint
      Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
      Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
      Caller[0000000000998e10]: queue_process+0x90/0x160
      Caller[0000000000485fa8]: process_one_work+0x188/0x480
      Caller[0000000000486410]: worker_thread+0x170/0x4c0
      Caller[000000000048c6b8]: kthread+0xd8/0x120
      Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
      Caller[0000000000000000]:           (null)
      Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
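The netpoll commit above describes queue indices going stale after real_num_tx_queues shrinks. A minimal userspace sketch of such a check follows. The function name is hypothetical, and folding the index back with a modulo is an assumption: the commit message only says the index has to be checked before the skb reaches the driver.

```python
def fix_queue_mapping(queue_mapping: int, real_num_tx_queues: int) -> int:
    """Return a TX queue index that is valid for the device.

    Toy model only. Assumption: an out-of-range index is folded back
    into range with a modulo; the commit message does not spell out
    the exact remedy.
    """
    if real_num_tx_queues <= 0:
        raise ValueError("device must have at least one TX queue")
    if queue_mapping >= real_num_tx_queues:
        # Stale index from before the queue count was reduced.
        return queue_mapping % real_num_tx_queues
    return queue_mapping
```

An index that is already in range passes through unchanged, so the check costs nothing in the common case.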
    • ip6mr: fix notification device destruction · 723b929c
      Nikolay Aleksandrov committed
      Andrey Konovalov reported a BUG in the ip6mr code, triggered because
      we call unregister_netdevice_many for a device that is already being
      destroyed. In IPv4's ipmr this was resolved by two commits a long time
      ago, which introduced the "notify" parameter to the delete function
      and avoided the unregister when called from a notifier, so let's do
      the same for ip6mr.
      
      The trace from Andrey:
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:6813!
      invalid opcode: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Workqueue: netns cleanup_net
      task: ffff880069208000 task.stack: ffff8800692d8000
      RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
      RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
      RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
      RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
      R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
      R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
      FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
      Call Trace:
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
       ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
       notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
       call_netdevice_notifiers net/core/dev.c:1663
       rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many net/core/dev.c:7880
       default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
       ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
       cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
       process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
       worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
      Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
      47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
      0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
      RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
      ---[ end trace e0b29c57e9b3292c ]---
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
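The "notify" parameter described in the ip6mr commit above can be pictured with a small toy model. All names here are hypothetical stand-ins, not the kernel's functions. The sketch assumes that notify is true when the delete path is entered from the netdevice notifier, in which case the device is already being torn down and must not be queued for unregistration a second time.

```python
def mif_delete(dev: str, notify: bool, unregister_queue: list) -> None:
    """Toy model of a multicast-interface delete function.

    `dev` and `unregister_queue` are hypothetical stand-ins for a
    net_device and the list later handed to the unregister call.
    """
    # ... release per-interface multicast state for `dev` here ...
    if not notify:
        # Called from a normal delete path: we own the teardown.
        unregister_queue.append(dev)
    # Called from the notifier: the core is already destroying `dev`,
    # so queuing it again would destroy it twice.
```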
    • net: dsa: LAN9303: add I2C dependency · 239c599a
      Arnd Bergmann committed
      With CONFIG_I2C=m and NET_DSA_SMSC_LAN9303=y, we run into a link error:
      
      drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_byte_reg_read':
      regmap-i2c.c:(.text.regmap_smbus_byte_reg_read+0x18): undefined reference to `i2c_smbus_read_byte_data'
      drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_byte_reg_write':
      regmap-i2c.c:(.text.regmap_smbus_byte_reg_write+0x18): undefined reference to `i2c_smbus_write_byte_data'
      drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_reg_read':
      regmap-i2c.c:(.text.regmap_smbus_word_reg_read+0x18): undefined reference to `i2c_smbus_read_word_data'
      drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_read_swapped':
      regmap-i2c.c:(.text.regmap_smbus_word_read_swapped+0x18): undefined reference to `i2c_smbus_read_word_data'
      drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_write_swapped':
      
      This adds a Kconfig dependency to avoid the broken configuration.
      
      Fixes: be4e119f ("net: dsa: LAN9303: add I2C managed mode support")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'nfc-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · 69e3948a
      David S. Miller committed
      Samuel Ortiz says:
      
      ====================
      NFC 4.12 pull request
      
      This is the NFC pull request for 4.12. We have:
      
      - Improvements for the pn533 command queue handling and device
        registration order.
      - Removal of platform data for the pn544 and st21nfca drivers.
      - Additional device tree options to support more trf7970a hardware options.
      - Support for Sony's RC-S380P through the port100 driver.
      - Removal of the obsolete nfcwilink driver.
      - Headers inclusion cleanups (miscdevice.h, unaligned.h) for many drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix wq initialization for links created via netlink · ea8ffc08
      Mahesh Bandewar committed
      Earlier patch 4493b81b ("bonding: initialize work-queues during
      creation of bond") moved the work-queue initialization from bond_open()
      to bond_create(). However, links created using the netlink 'create
      bond' option (ip link add bondX type bond) now create the new trunk
      without initializing work-queues. Prior to the above-mentioned
      change, ndo_open was in both paths and things worked correctly. The
      consequence is visible in the report shared by Joe Stringer:
      
      I've noticed that this patch breaks bonding within namespaces if
      you're not careful to perform device cleanup correctly.
      
      Here's my repro script, you can run on any net-next with this patch
      and you'll start seeing some weird behaviour:
      
      ip netns add foo
      ip li add veth0 type veth peer name veth0+ netns foo
      ip li add veth1 type veth peer name veth1+ netns foo
      ip netns exec foo ip li add bond0 type bond
      ip netns exec foo ip li set dev veth0+ master bond0
      ip netns exec foo ip li set dev veth1+ master bond0
      ip netns exec foo ip addr add dev bond0 192.168.0.1/24
      ip netns exec foo ip li set dev bond0 up
      ip li del dev veth0
      ip li del dev veth1
      
      The second to last command segfaults, last command hangs. rtnl is now
      permanently locked. It's not a problem if you take bond0 down before
      deleting veths, or delete bond0 before deleting veths. If you delete
      either end of the veth pair as per above, either inside or outside the
      namespace, it hits this problem.
      
      Here's some kernel logs:
      [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
      [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
      [ 1281.193863] bond0: Releasing backup interface veth0+
      [ 1281.193866] bond0: the permanent HWaddr of veth0+ -
      16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
      veth0+ to a different address to avoid conflicts
      [ 1281.193867] ------------[ cut here ]------------
      [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
      __queue_delayed_work+0x13f/0x150
      [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
      nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
      lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
      serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
      configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
      shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
      nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
      hid mptspi mptscsih e1000 mptbase ahci libahci
      [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G        W
      4.10.0-bisect-bond-v0.14 #37
      [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
      Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
      [ 1281.193906] Call Trace:
      [ 1281.193912]  dump_stack+0x63/0x89
      [ 1281.193915]  __warn+0xd1/0xf0
      [ 1281.193917]  warn_slowpath_null+0x1d/0x20
      [ 1281.193918]  __queue_delayed_work+0x13f/0x150
      [ 1281.193920]  queue_delayed_work_on+0x27/0x40
      [ 1281.193929]  bond_change_active_slave+0x25b/0x670 [bonding]
      [ 1281.193932]  ? synchronize_rcu_expedited+0x27/0x30
      [ 1281.193935]  __bond_release_one+0x489/0x510 [bonding]
      [ 1281.193939]  ? addrconf_notify+0x1b7/0xab0
      [ 1281.193942]  bond_netdev_event+0x2c5/0x2e0 [bonding]
      [ 1281.193944]  ? netconsole_netdev_event+0x124/0x190 [netconsole]
      [ 1281.193947]  notifier_call_chain+0x49/0x70
      [ 1281.193948]  raw_notifier_call_chain+0x16/0x20
      [ 1281.193950]  call_netdevice_notifiers_info+0x35/0x60
      [ 1281.193951]  rollback_registered_many+0x23b/0x3e0
      [ 1281.193953]  unregister_netdevice_many+0x24/0xd0
      [ 1281.193955]  rtnl_delete_link+0x3c/0x50
      [ 1281.193956]  rtnl_dellink+0x8d/0x1b0
      [ 1281.193960]  rtnetlink_rcv_msg+0x95/0x220
      [ 1281.193962]  ? __kmalloc_node_track_caller+0x35/0x280
      [ 1281.193964]  ? __netlink_lookup+0xf1/0x110
      [ 1281.193966]  ? rtnl_newlink+0x830/0x830
      [ 1281.193967]  netlink_rcv_skb+0xa7/0xc0
      [ 1281.193969]  rtnetlink_rcv+0x28/0x30
      [ 1281.193970]  netlink_unicast+0x15b/0x210
      [ 1281.193971]  netlink_sendmsg+0x319/0x390
      [ 1281.193974]  sock_sendmsg+0x38/0x50
      [ 1281.193975]  ___sys_sendmsg+0x25c/0x270
      [ 1281.193978]  ? mem_cgroup_commit_charge+0x76/0xf0
      [ 1281.193981]  ? page_add_new_anon_rmap+0x89/0xc0
      [ 1281.193984]  ? lru_cache_add_active_or_unevictable+0x35/0xb0
      [ 1281.193985]  ? __handle_mm_fault+0x4e9/0x1170
      [ 1281.193987]  __sys_sendmsg+0x45/0x80
      [ 1281.193989]  SyS_sendmsg+0x12/0x20
      [ 1281.193991]  do_syscall_64+0x6e/0x180
      [ 1281.193993]  entry_SYSCALL64_slow_path+0x25/0x25
      [ 1281.193995] RIP: 0033:0x7f6ec122f5a0
      [ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX:
      000000000000002e
      [ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0
      [ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003
      [ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003
      [ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f
      [ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450
      [ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]---
      
      Fixes: 4493b81b ("bonding: initialize work-queues during creation of bond")
      Reported-by: Joe Stringer <joe@ovn.org>
      Tested-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, doc: update bpf maintainers entry · cdb90499
      Daniel Borkmann committed
      Add various related files that have been missing under
      BPF entry covering essential parts of its infrastructure
      and also add myself as co-maintainer.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: arc_emac: switch to phy_start()/phy_stop() · b18b7453
      Alexander Kochetkov committed
      Currently the driver uses phy_start_aneg() in arc_emac_open() to bring
      up the PHY, but phy_start() is more appropriate for this purpose.
      Besides calling phy_start_aneg() as part of the PHY startup sequence,
      it can also correctly bring up the PHY from error and suspended states.
      So the patch replaces phy_start_aneg() with phy_start().

      The patch also adds a call to phy_stop() in arc_emac_stop() to allow
      the PHY device to be fully suspended when the interface is unused.
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: potential use after free in qrtr_sendmsg() · 6f60f438
      Dan Carpenter committed
      If skb_pad() fails then it frees the skb so we should check for errors.
      
      Fixes: bdabad3e ("net: Add Qualcomm IPC router")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix values type used in test_maps · 89087c45
      David Miller committed
      Maps of per-cpu type have their value element size adjusted to 8 if it
      is specified smaller during various map operations.
      
      This makes test_maps as a 32-bit binary fail, in fact the kernel
      writes past the end of the value's array on the user's stack.
      
      To be quite honest, I think the kernel should reject creation of a
      per-cpu map that doesn't have a value size of at least 8 if that's
      what the kernel is going to silently adjust to later.
      
      If the user passed something smaller, it is a sizeof() calculation
      based upon the type they will actually use (just like in this testcase
      code) in later calls to the map operations.
      
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
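The size mismatch described in the test_maps commit above comes down to simple arithmetic. The sketch below uses hypothetical helper names. It assumes the kernel rounds each per-CPU value up to a multiple of 8 bytes, which matches the "adjusted to 8 if it is specified smaller" wording for small values; the treatment of larger sizes is an assumption.

```python
def round_up(value: int, multiple: int) -> int:
    """Round `value` up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def percpu_value_buffer_size(value_size: int, nr_cpus: int) -> int:
    """Bytes a per-CPU map operation may write into the user buffer.

    Assumption: each CPU's slot is padded to a multiple of 8 bytes.
    """
    return round_up(value_size, 8) * nr_cpus


def naive_buffer_size(value_size: int, nr_cpus: int) -> int:
    """What a caller allocates when it trusts sizeof() alone."""
    return value_size * nr_cpus
```

With a 4-byte value on two CPUs the naive buffer holds 8 bytes while the padded layout needs 16, which is the overrun the commit message describes for a 32-bit test_maps binary.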
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 6b633e82
      David S. Miller committed
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2017-04-20
      
      This adds the basic infrastructure for IPsec hardware
      offloading, it creates a configuration API and adjusts
      the packet path.
      
      1) Add the needed netdev features to configure IPsec offloads.
      
      2) Add the IPsec hardware offloading API.
      
      3) Prepare the ESP packet path for hardware offloading.
      
      4) Add gso handlers for esp4 and esp6, this implements
         the software fallback for GSO packets.
      
      5) Add xfrm replay handler functions for offloading.
      
      6) Change ESP to use a synchronous crypto algorithm on
         offloading, we don't have the option for asynchronous
         returns when we handle IPsec at layer2.
      
      7) Add a xfrm validate function to validate_xmit_skb. This
         implements the software fallback for non GSO packets.
      
      8) Set the inner_network and inner_transport members of
         the SKB, as well as encapsulation, to reflect the actual
         positions of these headers, and removes them only once
         encryption is done on the payload.
         From Ilan Tayari.
      
      9) Prepare the ESP GRO codepath for hardware offloading.
      
      10) Fix incorrect null pointer check in esp6.
          From Colin Ian King.
      
      11) Fix for the GSO software fallback path to detect the
          fallback correctly.
          From Ilan Tayari.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: Add new IPsec offloading files. · 77999328
      Steffen Klassert committed
      This adds two new files to IPsec maintenance scope:
      
      net/ipv4/esp4_offload.c
      net/ipv6/ip6_offload.c
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 072cec77
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-04-19
      
      This series contains updates to i40e and i40evf only, most notable being
      the addition of trace points for BPF programs.
      
      Tobias Klauser updates i40evf to use net_device stats struct instead
      of a local private copy.
      
      Preethi updates the VF driver to not enable receive checksum offload by
      default for tunneled packets.
      
      Alex fixes an issue he introduced when he converted the code over to
      using the length field to determine if a descriptor was done or not.
      
      Mitch adds the ability to dump additional information on the VFs, which
      is not available through 'ip link show' using debugfs.
      
      Scott adds trace points to the drivers so that BPF programs can be
      attached for feature testing and verification.
      
      Jingjing adds admin queue functions for Pipeline Personalization Profile
      commands.
      
      Jake does most of the heavy lifting in this series, starting with a
      reduction in the scope of the RTNL lock being held while resetting VFs
      to allow multiple PFs to reset in a timely manner.  Factored out the
      direct queue modification so that we are able to re-use the code.
      Reduced the wait time for admin queue commands to complete, since we were
      waiting a minimum of a millisecond, when in practice the admin queue
      command is often processed much faster.  Cleaned up code (flag) we never
      use.  Optimized the code that resets all the VFs to run in parallel
      instead of the current serialized fashion, to help reduce the time it
      takes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netvsc: fix use after free on module removal · 76bb5db5
      stephen hemminger committed
      The NAPI data structure is embedded in the netvsc_device structure
      and is freed when device is closed. There is still a reference
      (in NAPI list) to this which causes a crash in netif_napi_del
      when device is removed. Fix by managing NAPI instances correctly.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tc-filter-cleanup-destroy-delete' · dfb05553
      David S. Miller committed
      Cong Wang says:
      
      ====================
      net_sched: clean up tc filter destroy and delete logic
      
      The first patch fixes a potential race condition, the second one
      is pure cleanup.
      ====================
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: remove useless NULL to tp->root · 43920538
      WANG Cong committed
      There is no need to NULL tp->root in ->destroy(), since tp is
      going to be freed very soon, and existing readers are still
      safe to read them.
      
      For cls_route, we always init its tp->root, so it can't be NULL,
      we can drop more useless code.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      WANG Cong committed
      We could have a race condition where in ->classify() path we
      dereference tp->root and meanwhile a parallel ->destroy() makes it
      a NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").
      
      This happens when ->destroy() is called for deleting a filter to
      check if we are the last one in tp, this tp is still linked and
      visible at that time. The root cause of this problem is the semantic
      of ->destroy(), it does two things (for non-force case):
      
      1) check if tp is empty
      2) if tp is empty we could really destroy it
      
      and its caller, if it cares, needs to check the return value to see if
      tp is really destroyed. Therefore we can't unlink tp unless we know it
      is empty.
      
      As suggested by Daniel, we could actually move the test logic to ->delete()
      so that we can safely unlink tp after ->delete() tells us the last one is
      just deleted and before ->destroy().
      
      Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
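The ordering argued for in the net_sched commit above (->delete() reports whether the last filter is gone, the caller unlinks tp, and only then calls ->destroy()) can be sketched with a toy model. The class and function names are hypothetical, a plain list stands in for the filter chain, and no RCU or locking is modeled.

```python
class Proto:
    """Toy stand-in for a tcf_proto that holds filters by handle."""

    def __init__(self, handles):
        self.filters = set(handles)
        self.destroyed = False

    def delete(self, handle) -> bool:
        """Remove one filter; return True if it was the last one."""
        self.filters.remove(handle)
        return not self.filters

    def destroy(self) -> None:
        """Tear down unconditionally; no emptiness check lives here."""
        self.filters.clear()
        self.destroyed = True


def delete_filter(chain: list, tp: Proto, handle) -> bool:
    """Delete a filter and retire tp once it becomes empty."""
    last = tp.delete(handle)
    if last:
        chain.remove(tp)  # unlink first so new readers cannot find tp
        tp.destroy()      # then destroy the now-invisible tp
    return last
```

Because the emptiness check happens in delete(), tp stays linked and intact for as long as it still holds filters.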
    • net: ipv6: RTF_PCPU should not be settable from userspace · 557c44be
      David Ahern committed
      Andrey reported a fault in the IPv6 route code:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff880069809600 task.stack: ffff880062dc8000
      RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
      RSP: 0018:ffff880062dced30 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
      RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
      RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
      Call Trace:
       ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
      ...
      
      Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
      set. Flags passed to the kernel are blindly copied to the allocated
      rt6_info by ip6_route_info_create making a newly inserted route appear
      as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
      and expects rt->dst.from to be set - which it is not since it is not
      really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
      generates the fault.
      
      Fix by checking for the flag and failing with EINVAL.
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
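The fix described in the RTF_PCPU commit above ("checking for the flag and failing with EINVAL") amounts to a one-line validation of user-supplied flags. In this sketch the function name is hypothetical and the numeric value of RTF_PCPU is an assumption taken from memory of the uapi header, so treat it as illustrative.

```python
import errno

# Assumed value of the kernel-internal per-cpu route flag.
RTF_PCPU = 0x40000000


def check_user_route_flags(flags: int) -> int:
    """Validate route flags supplied by userspace.

    Returns 0 on success or a negative errno, kernel style.
    """
    if flags & RTF_PCPU:
        # Userspace must never be able to mark a route as a per-cpu copy.
        return -errno.EINVAL
    return 0
```

Rejecting the request up front means a route inserted from userspace can no longer masquerade as a per-cpu copy.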
    • bpf: add napi_id read access to __sk_buff · b1d9fc41
      Daniel Borkmann committed
      Add napi_id access to __sk_buff for socket filter program types, tc
      program types and other bpf_convert_ctx_access() users. Having access
      to skb->napi_id is useful for per RX queue listener siloing, f.e.
      in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
      used, meaning SO_REUSEPORT enabled listeners can then select the
      corresponding socket at SYN time already [1]. The skb is marked via
      skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
      
      Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d433902
      ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
      the NAPI ID associated with the queue for steering, which requires a
      prior sk_mark_napi_id() after the socket was looked up.
      
      Semantics for the __sk_buff napi_id access are similar, meaning if
      skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
      then an invalid napi_id of 0 is returned to the program, otherwise a
      valid non-zero napi_id.
      
        [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf

      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1d9fc41
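      The napi_id semantics described above can be sketched in plain C as
      follows. This is only an illustration: visible_napi_id() is a
      hypothetical helper, and the MIN_NAPI_ID value is assumed here (in the
      kernel it depends on the build configuration).

```c
#include <stdint.h>

/* Assumed threshold for illustration only. */
#define MIN_NAPI_ID 9u

/*
 * Map a raw skb->napi_id to the value a program would observe:
 * anything below MIN_NAPI_ID is reported as the invalid ID 0,
 * everything else is passed through unchanged.
 */
uint32_t visible_napi_id(uint32_t raw_napi_id)
{
    return raw_napi_id < MIN_NAPI_ID ? 0 : raw_napi_id;
}
```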
    • K
      netvsc: Deal with rescinded channels correctly · 73e64fa4
      K. Y. Srinivasan committed
      We will not be able to send packets over a channel that has been
      rescinded. Make the necessary adjustments so we can clean up properly
      even when the channel is rescinded. This issue can be triggered
      in the NIC hot-remove path.
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73e64fa4
    • D
      Merge branch 'ibmvnic-updates-and-bug-fixes' · 87e978ed
      David S. Miller committed
      Nathan Fontenot says:
      
      ====================
      ibmvnic: Updates and bug fixes
      
      This set of patches is a series of updates to remove some unneeded
      and unused code in the driver as well as bug fixes for the
      ibmvnic driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87e978ed
    • N
      ibmvnic: Remove unused bounce buffer · d76e0fec
      Nathan Fontenot committed
      The bounce buffer is not used in the ibmvnic driver, so
      get rid of it.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d76e0fec
    • N
      ibmvnic: Allocate zero-filled memory for sub crqs · 7f7adc50
      Nathan Fontenot committed
      Update the allocation of memory for the sub crq structs and their
      associated pages to allocate zero-filled memory.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f7adc50
    • B
      ibmvnic: Disable irq prior to close · dd9c20fa
      Brian King committed
          Add some code to call disable_irq on all the vnic interface's irqs.
          This fixes a crash observed when closing an active interface, as
          seen in the oops below when we try to access a buffer in the interrupt
          handler which we've already freed.
      
          Unable to handle kernel paging request for data at address 0x00000001
          Faulting instruction address: 0xd000000003886824
          Oops: Kernel access of bad area, sig: 11 [#1]
          SMP NR_CPUS=2048 NUMA pSeries
          Modules linked in: ibmvnic(OEN) rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_
          Supported: No, Unsupported modules are loaded
          CPU: 8 PID: 0 Comm: swapper/8 Tainted: G           OE   NX 4.4.49-92.11-default #1
          task: c00000007f990110 ti: c0000000fffa0000 task.ti: c00000007f9b8000
          NIP: d000000003886824 LR: d000000003886824 CTR: c0000000007eff60
          REGS: c0000000fffa3a70 TRAP: 0300   Tainted: G           OE   NX  (4.4.49-92.11-default)
          MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22008042  XER: 20000008
          CFAR: c000000000008468 DAR: 0000000000000001 DSISR: 40000000 SOFTE: 0
          GPR00: d000000003886824 c0000000fffa3cf0 d000000003894118 0000000000000000
          GPR04: 0000000000000000 0000000000000000 c000000001249da0 0000000000000000
          GPR08: 000000000000000e 0000000000000000 c0000000ccb00000 d000000003889180
          GPR12: c0000000007eff60 c000000007af4c00 0000000000000001 c0000000010def30
          GPR16: c00000007f9b8000 c000000000b98c30 c00000007f9b8080 c000000000bab858
          GPR20: 0000000000000005 0000000000000000 c0000000ff5d7e80 c0000000f809f648
          GPR24: c0000000ff5d7ec8 0000000000000000 0000000000000000 c0000000ccb001a0
          GPR28: 000000000000000a c0000000f809f600 c0000000fd4cd900 c0000000f9cd5b00
          NIP [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic]
          LR [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic]
          Call Trace:
          [c0000000fffa3cf0] [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic] (unreliable)
          [c0000000fffa3dd0] [c000000000132940] __handle_irq_event_percpu+0x90/0x2e0
          [c0000000fffa3e90] [c000000000132bcc] handle_irq_event_percpu+0x3c/0x90
          [c0000000fffa3ed0] [c000000000132c88] handle_irq_event+0x68/0xc0
          [c0000000fffa3f00] [c000000000137edc] handle_fasteoi_irq+0xec/0x250
          [c0000000fffa3f30] [c000000000131b04] generic_handle_irq+0x54/0x80
          [c0000000fffa3f60] [c000000000011190] __do_irq+0x80/0x1d0
          [c0000000fffa3f90] [c0000000000248d8] call_do_irq+0x14/0x24
          [c00000007f9bb9e0] [c000000000011380] do_IRQ+0xa0/0x120
          [c00000007f9bba40] [c000000000002594] hardware_interrupt_common+0x114/0x180
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dd9c20fa
    • N
      ibmvnic: Correct crq and resource releasing · 37489055
      Nathan Fontenot committed
      We should not be releasing the crqs when calling close for the
      adapter; these need to remain open to facilitate operations such
      as updating the mac address. The crqs should be released in the
      adapter's remove routine.
      
      Additionally, we need to call release_resources from remove. This
      corrects the scenario of trying to remove an adapter that has only
      been probed.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37489055
    • N
      ibmvnic: Remove inflight list · 661a2622
      Nathan Fontenot committed
      The inflight list used to track memory that is allocated for crqs that
      are in flight is not needed. The one piece of the inflight list that does
      need to be cleaned at module exit is the error buffer list, which is
      already attached to the adapter struct.
      
      This patch removes the inflight list and moves checking the error buffer
      list to ibmvnic_remove.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      661a2622
    • B
      ibmvnic: Do not disable IRQ after scheduling tasklet · ed7ecbf7
      Brian King committed
      Since the primary CRQ is only used for service functions and
      not in the performance path, simplify the code a bit and avoid
      disabling the IRQ.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed7ecbf7
    • B
      ibmvnic: Fixup atomic API usage · 58c8c0c0
      Brian King committed
      Replace a couple of places that modify an atomic and then read
      it back, a sequence that is not atomic as a whole, with the
      atomic_XX_return variants to avoid race conditions.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58c8c0c0
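      The difference between the two patterns can be sketched in userspace
      with C11 atomics. This is an analogy, not the kernel atomic_t API:
      dec_then_read() and dec_return() are hypothetical names, and
      atomic_fetch_sub() stands in for the kernel's atomic_XX_return helpers.

```c
#include <stdatomic.h>

/*
 * Racy pattern: the decrement and the read are two separate atomic
 * operations, so another thread may change the counter in between
 * and the returned value may not be the result of this decrement.
 */
int dec_then_read(atomic_int *v)
{
    atomic_fetch_sub(v, 1);
    return atomic_load(v);
}

/*
 * Single-operation pattern: the new value is derived from the one
 * atomic read-modify-write, so it always reflects this decrement.
 */
int dec_return(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) - 1;
}
```

      In a single thread both functions return the same value; they only
      diverge when another thread updates the counter concurrently.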
    • B
      ibmvnic: Unmap longer term buffer before free · 59af56c2
      Brian King committed
      Make sure we unregister a long term buffer from the adapter
      prior to DMA unmapping it and freeing it. Failure
      to do so could result in a DMA to a now invalid address.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59af56c2
    • M
      ibmvnic: Fix ibmvnic_change_mac_addr struct format · 993a82b0
      Murilo Fossa Vicentini committed
      The ibmvnic_change_mac_addr struct layout did not match the format
      defined in PAPR+: it had the reserved and return code fields swapped. As a
      consequence, the CHANGE_MAC_ADDR_RSP commands were being improperly handled
      and executed even when the operation wasn't successfully completed by the
      system firmware.
      
      Also change the endianness of the debug message to make it easier to
      parse the CRQ content.
      Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      993a82b0
    • T
      ibmvnic: Report errors when failing to release sub-crqs · ffa73855
      Thomas Falcon committed
      Add reporting of errors when releasing sub-crqs fails.
      Signed-off-by: Thomas Falcon <tlfalcon@us.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ffa73855
    • I
      gso: Validate assumption of frag_list segmentation · 43170c4e
      Ilan Tayari committed
      Commit 07b26c94 ("gso: Support partial splitting at the frag_list
      pointer") assumes that all SKBs in a frag_list (except maybe the last
      one) contain the same amount of GSO payload.
      
      This assumption is not always correct, resulting in the following
      warning message in the log:
          skb_segment: too many frags
      
      For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
      one frag, and some with 2 frags.
      After GRO, the frag_list SKBs end up having different amounts of payload.
      If this frag_list SKB is then forwarded, the aforementioned assumption
      is violated.
      
      Validate the assumption, and fall back to software GSO if it is not true.
      
      Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
      Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43170c4e
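      The assumption being validated can be sketched as a walk over a linked
      list of segments. This is a simplified userspace illustration, not the
      kernel code: struct seg and frag_list_is_uniform() are hypothetical
      names, and only the payload length of each entry is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one skb on a frag_list. */
struct seg {
    size_t payload_len;
    struct seg *next;
};

/*
 * Returns true when every segment except possibly the last one
 * carries exactly gso_size bytes of payload. A caller would fall
 * back to a slower, fully general path when this returns false.
 */
bool frag_list_is_uniform(const struct seg *head, size_t gso_size)
{
    for (const struct seg *s = head; s != NULL; s = s->next) {
        if (s->next != NULL && s->payload_len != gso_size)
            return false;
    }
    return true;
}
```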
    • A
      liquidio: remove unnecessary variable assignment · ca1cb28d
      Arnd Bergmann committed
      gcc points out a useless assignment that was added during code refactoring:
      
      drivers/net/ethernet/cavium/liquidio/lio_ethtool.c: In function 'octnet_intrmod_callback':
      drivers/net/ethernet/cavium/liquidio/lio_ethtool.c:1315:59: error: parameter 'oct_dev' set but not used [-Werror=unused-but-set-parameter]
      
      This is harmless but can clearly be removed to avoid the warning.
      
      Fixes: 50c0add5 ("liquidio: refactor interrupt moderation code")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca1cb28d
    • D
      Merge branch 'skb_cow_head' · 918b7024
      David S. Miller committed
      Eric Dumazet says:
      
      ====================
      net: use skb_cow_head() to deal with cloned skbs
      
      James Hughes found an issue with the smsc95xx driver. The same
      problematic code is found in other drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      918b7024
    • E
      kaweth: use skb_cow_head() to deal with cloned skbs · 39fba783
      Eric Dumazet committed
      We can use skb_cow_head() to properly deal with clones,
      especially the ones coming from the TCP stack that allow their head
      to be modified. This avoids a copy.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39fba783
    • E
      ch9200: use skb_cow_head() to deal with cloned skbs · 6bc6895b
      Eric Dumazet committed
      We need to ensure there is enough headroom to push an extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: 4a476bd6 ("usbnet: New driver for QinHeng CH9200 devices")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6bc6895b
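      The two conditions named above (enough headroom, and permission to
      modify the header) can be sketched with a toy buffer descriptor. This
      is not the kernel's struct sk_buff or the skb_cow_head()
      implementation: struct toy_skb and needs_private_head() are
      hypothetical and model only the decision, not the reallocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy buffer descriptor for illustration. */
struct toy_skb {
    size_t headroom;      /* free bytes in front of the packet data */
    bool header_shared;   /* header area is shared with a clone */
};

/*
 * Returns true when the header must be made private (copied or
 * reallocated) before `needed` bytes can be pushed in front of the
 * data: either the header is shared, or the headroom is too small.
 */
bool needs_private_head(const struct toy_skb *skb, size_t needed)
{
    return skb->header_shared || skb->headroom < needed;
}
```

      Checking only the headroom, as the affected drivers did, misses the
      shared-header case and can end up writing into a buffer that another
      user still relies on.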
    • E
      lan78xx: use skb_cow_head() to deal with cloned skbs · d4ca7359
      Eric Dumazet committed
      We need to ensure there is enough headroom to push an extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4ca7359