1. 21 May, 2016 · 40 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 087afe8a
      Linus Torvalds committed
      Pull networking fixes and more updates from David Miller:
      
       1) Tunneling fixes from Tom Herbert and Alexander Duyck.
      
       2) AF_UNIX updates some struct sock bit fields without the socket lock,
          whereas setsockopt() sets overlapping ones with locking.  Separate
          out the synchronized vs. the AF_UNIX unsynchronized ones to avoid
          corruption.  From Andrey Ryabinin.
      
       3) Mount BPF filesystem with mount_nodev rather than mount_ns, from
          Eric Biederman.
      
       4) A couple kmemdup conversions, from Muhammad Falak R Wani.
      
       5) BPF verifier fixes from Alexei Starovoitov.
      
       6) Don't let tunneled UDP packets get stuck in socket queues, if
          something goes wrong during the encapsulation just drop the packet
          rather than signalling an error up the call stack.  From Hannes
          Frederic Sowa.
      
       7) SKB ref after free in batman-adv, from Florian Westphal.
      
       8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in their TCP
          callbacks since the TCP stack runs pre-emptibly now.  From Eric
          Dumazet.
      
       9) Fix crash in fixed_phy_add, from Rabin Vincent.
      
      10) Fix length checks in xen-netback, from Paul Durrant.
      
      11) Fix mixup in KEY vs KEYID macsec attributes, from Sabrina Dubroca.
      
      12) RDS connection spamming bug fixes from Sowmini Varadhan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
        net: suppress warnings on dev_alloc_skb
        uapi glibc compat: fix compilation when !__USE_MISC in glibc
        udp: prevent skbs lingering in tunnel socket queues
        bpf: teach verifier to recognize imm += ptr pattern
        bpf: support decreasing order in direct packet access
        net: usb: ch9200: use kmemdup
        ps3_gelic: use kmemdup
        net:liquidio: use kmemdup
        bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
        net: cdc_ncm: update datagram size after changing mtu
        tuntap: correctly wake up process during uninit
        intel: Add support for IPv6 IP-in-IP offload
        ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE
        RDS: TCP: Avoid rds connection churn from rogue SYNs
        RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp
        net: sock: move ->sk_shutdown out of bitfields.
        ipv6: Don't reset inner headers in ip6_tnl_xmit
        ip4ip6: Support for GSO/GRO
        ip6ip6: Support for GSO/GRO
        ipv6: Set features for IPv6 tunnels
        ...
    • locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait() · 54cf809b
      Peter Zijlstra committed
      Similar to commits:
      
        51d7d520 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
        d86b8da0 ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
      
      qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
      unordered inside the ACQUIRE of the lock.
      
      And while this is not a problem for the regular mutual exclusive
      critical section usage of spinlocks, it breaks creative locking like:
      
      	spin_lock(A)			spin_lock(B)
      	spin_unlock_wait(B)		if (!spin_is_locked(A))
      	do_something()			  do_something()
      
      In that both CPUs can end up running do_something at the same time,
      because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait() /
      spin_is_locked() loads (even on x86!!).
      
      To avoid making the normal case slower, add smp_mb()s to the less used
      spin_unlock_wait() / spin_is_locked() side of things to avoid this
      problem.
      Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
      Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org   # v4.2 and later
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
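The ordering problem in the qspinlock message above can be sketched in userspace C11. This is only an illustration under stated assumptions: the "lock" is reduced to the single store that marks it held, `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's smp_mb(), and every function and variable name here is hypothetical rather than taken from the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_a;   /* 0 = unlocked, 1 = locked */
static atomic_int lock_b;

/* Simplified "lock": only the store that marks the lock as held. */
static void mark_locked(atomic_int *lock)
{
    atomic_store_explicit(lock, 1, memory_order_relaxed);
}

/*
 * Simplified spin_is_locked(): a full fence is issued before the load,
 * so the caller's earlier store cannot be reordered after it.
 */
static bool is_locked_with_barrier(atomic_int *lock)
{
    atomic_thread_fence(memory_order_seq_cst);   /* stands in for smp_mb() */
    return atomic_load_explicit(lock, memory_order_relaxed) != 0;
}

/* "CPU 0" side: take A, then proceed only if B is seen unlocked. */
bool cpu0_may_proceed(void)
{
    mark_locked(&lock_a);
    return !is_locked_with_barrier(&lock_b);
}

/* "CPU 1" side: take B, then proceed only if A is seen unlocked. */
bool cpu1_may_proceed(void)
{
    mark_locked(&lock_b);
    return !is_locked_with_barrier(&lock_a);
}
```

With the fence on both sides, the C11 memory model does not allow both functions to return true when they run concurrently. Without it, the two relaxed accesses on each side may be reordered, which is the failure the commit describes.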
    • Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · b99a9e87
      Linus Torvalds committed
      Pull cifs fixes from Steve French:
       "Two small cifs fixes, including one spnego upcall cifs security fix
        for stable"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Remove some obsolete comments
        cifs: Create dedicated keyring for spnego operations
    • net: suppress warnings on dev_alloc_skb · 95829b3a
      Neil Horman committed
      Noticed an allocation failure in a network driver the other day on a 32 bit
      system:
      
      DMA-API: debugging out of memory - disabling
      bnx2fc: adapter_lookup: hba NULL
      lldpad: page allocation failure. order:0, mode:0x4120
      Pid: 4556, comm: lldpad Not tainted 2.6.32-639.el6.i686.debug #1
      Call Trace:
       [<c08a4086>] ? printk+0x19/0x23
       [<c05166a4>] ? __alloc_pages_nodemask+0x664/0x830
       [<c0649d02>] ? free_object+0x82/0xa0
       [<fb4e2c9b>] ? ixgbe_alloc_rx_buffers+0x10b/0x1d0 [ixgbe]
       [<fb4e2fff>] ? ixgbe_configure_rx_ring+0x29f/0x420 [ixgbe]
       [<fb4e228c>] ? ixgbe_configure_tx_ring+0x15c/0x220 [ixgbe]
       [<fb4e3709>] ? ixgbe_configure+0x589/0xc00 [ixgbe]
       [<fb4e7be7>] ? ixgbe_open+0xa7/0x5c0 [ixgbe]
       [<fb503ce6>] ? ixgbe_init_interrupt_scheme+0x5b6/0x970 [ixgbe]
       [<fb4e8e54>] ? ixgbe_setup_tc+0x1a4/0x260 [ixgbe]
       [<fb505a9f>] ? ixgbe_dcbnl_set_state+0x7f/0x90 [ixgbe]
       [<c088d80d>] ? dcb_doit+0x10ed/0x16d0
      ...
      
      Thought that perhaps the big splat in the logs wasn't really necessary, as
      all call sites for dev_alloc_skb:
      
      a) check the return code for the function
      
      and
      
      b) either print their own error message or have a recovery path that makes the
      warning moot.
      
      Fix it by modifying dev_alloc_pages to pass __GFP_NOWARN as a gfp flag to
      suppress the warning
      
      applies to the net tree
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Alexander Duyck <alexander.duyck@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • uapi glibc compat: fix compilation when !__USE_MISC in glibc · f0a3fdca
      Nicolas Dichtel committed
      These structures are defined only if __USE_MISC is set in glibc net/if.h
      headers, i.e. when _BSD_SOURCE or _SVID_SOURCE are defined.
      
      CC: Jan Engelhardt <jengelh@inai.de>
      CC: Josh Boyer <jwboyer@fedoraproject.org>
      CC: Stephen Hemminger <shemming@brocade.com>
      CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
      CC: Gabriel Laskar <gabriel@lse.epita.fr>
      CC: Mikko Rapeli <mikko.rapeli@iki.fi>
      Fixes: 4a91cb61 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: prevent skbs lingering in tunnel socket queues · e5aed006
      Hannes Frederic Sowa committed
      In case we find a socket with encapsulation enabled we should call
      the encap_recv function even if just a udp header without payload is
      available. The callbacks are responsible for correctly verifying and
      dropping the packets.
      
      Also, in case the header validation fails for geneve and vxlan we
      shouldn't put the skb back into the socket queue, no one will pick
      them up there.  Instead we can simply discard them in the respective
      encap_recv functions.
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-verifier-fixes' · cb543e80
      David S. Miller committed
      Alexei Starovoitov says:
      
      ====================
      bpf: verifier fixes
      
      Further testing of 'direct packet access' uncovered
      several usability issues. Fix them.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: teach verifier to recognize imm += ptr pattern · 1b9b69ec
      Alexei Starovoitov committed
      Humans don't write C code like:
        u8 *ptr = skb->data;
        int imm = 4;
        imm += ptr;
      but from the llvm backend's point of view 'imm' and 'ptr' are registers and
      imm += ptr may be preferred vs ptr += imm depending on which register value
      will be used further in the code, while the verifier could only recognize ptr += imm.
      That caused small unrelated changes in the C code of the bpf program to
      trigger rejection by the verifier. Therefore teach the verifier to recognize
      both ptr += imm and imm += ptr.
      For example:
      when R6=pkt(id=0,off=0,r=62) R7=imm22
      after r7 += r6 instruction
      will be R6=pkt(id=0,off=0,r=62) R7=pkt(id=0,off=22,r=62)
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
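The register-state example in the message above (R7=imm22 becoming R7=pkt(id=0,off=22,r=62) after r7 += r6) can be modelled with a small sketch. The types and the `track_add()` helper below are hypothetical simplifications, not the verifier's actual data structures, and no sign or bounds checks on the constant are modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified register types. */
enum reg_type { REG_IMM, REG_PKT };

struct reg_state {
    enum reg_type type;
    int imm;     /* constant value, meaningful when type == REG_IMM */
    int off;     /* offset from packet start, meaningful when type == REG_PKT */
    int range;   /* bytes already checked against data_end (the "r=" value) */
};

/*
 * Model the instruction "dst += src".
 * Returns false for operand combinations this sketch does not track.
 */
bool track_add(struct reg_state *dst, const struct reg_state *src)
{
    if (dst->type == REG_PKT && src->type == REG_IMM) {
        /* ptr += imm: the form that was already recognized */
        dst->off += src->imm;
        return true;
    }
    if (dst->type == REG_IMM && src->type == REG_PKT) {
        /* imm += ptr: dst becomes a packet pointer shifted by its old constant */
        int imm = dst->imm;

        *dst = *src;
        dst->off += imm;
        return true;
    }
    if (dst->type == REG_IMM && src->type == REG_IMM) {
        dst->imm += src->imm;
        return true;
    }
    return false;   /* ptr += ptr is not tracked here */
}
```

Treating both operand orders the same way means the result no longer depends on which register the compiler happens to pick as the destination.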
    • bpf: support decreasing order in direct packet access · d91b28ed
      Alexei Starovoitov committed
      when packet headers are accessed in 'decreasing' order (like TCP port
      may be fetched before the program reads IP src) the llvm may generate
      the following code:
      [...]                // R7=pkt(id=0,off=22,r=70)
      r2 = *(u32 *)(r7 +0) // good access
      [...]
      r7 += 40             // R7=pkt(id=0,off=62,r=70)
      r8 = *(u32 *)(r7 +0) // good access
      [...]
      r1 = *(u32 *)(r7 -20) // this one will fail though it's within a safe range
                            // it's doing *(u32*)(skb->data + 42)
      Fix verifier to recognize such code pattern
      
      Also, it turned out that the 'off > range' condition is not a verifier bug.
      It's a buggy program that may do something like:
      if (ptr + 50 > data_end)
        return 0;
      ptr += 60;
      *(u32*)ptr;
      in such case emit
      "invalid access to packet, off=0 size=4, R1(id=0,off=60,r=50)" error message,
      so all information is available for the program author to fix the program.
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
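The two cases in the message above (a negative instruction offset that is still in range, and a pointer advanced past the checked range) come down to one bounds test on the effective offset. The sketch below is an assumption about the shape of such a check, using hypothetical names, and is not the verifier's code.

```c
#include <stdbool.h>

/* Hypothetical view of a packet-pointer register. */
struct pkt_reg {
    int off;     /* constant already added to the pointer ("off=") */
    int range;   /* bytes from packet start known to be below data_end ("r=") */
};

/* Check an access of `size` bytes at *(reg + insn_off). */
bool pkt_access_ok(const struct pkt_reg *reg, int insn_off, int size)
{
    int start = reg->off + insn_off;   /* effective offset from skb->data */

    return size > 0 && start >= 0 && start + size <= reg->range;
}
```

With R7=pkt(off=62,r=70), a 4-byte load at offset -20 starts at byte 42 and ends at byte 46, so it passes. With off=60 and r=50, a 4-byte load at offset 0 would end at byte 64 and is rejected, which matches the "invalid access to packet" case.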
    • net: usb: ch9200: use kmemdup · 238a9584
      Muhammad Falak R Wani committed
      Use kmemdup when another buffer is immediately copied into the newly
      allocated region. This replaces an allocation call followed by memcpy
      with a single call to kmemdup.
      Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
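The conversion described in these kmemdup commits can be sketched in userspace C. This is an analogue only: `mem_dup()` is a hypothetical helper standing in for the kernel's kmemdup(), `malloc()` stands in for the kernel allocator (so there is no gfp argument), and `struct cfg` is made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for kmemdup(): allocate and copy in one call. */
void *mem_dup(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}

/* Made-up payload type for the example. */
struct cfg {
    int id;
    char name[16];
};

/* Before the conversion: open-coded allocation followed by memcpy(). */
struct cfg *clone_cfg_open_coded(const struct cfg *src)
{
    struct cfg *copy = malloc(sizeof(*copy));

    if (!copy)
        return NULL;
    memcpy(copy, src, sizeof(*copy));
    return copy;
}

/* After the conversion: a single duplicating call. */
struct cfg *clone_cfg(const struct cfg *src)
{
    return mem_dup(src, sizeof(*src));
}
```

Both variants return a copy the caller must free. The second one simply removes the window between allocation and copy where a size mismatch could creep in.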
    • ps3_gelic: use kmemdup · 5877debe
      Muhammad Falak R Wani committed
      Use kmemdup when another buffer is immediately copied into the newly
      allocated region. This replaces an allocation call followed by memcpy
      with a single call to kmemdup.
      Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:liquidio: use kmemdup · 7c542772
      Muhammad Falak R Wani committed
      Use kmemdup when another buffer is immediately copied into the newly
      allocated region. This replaces an allocation call followed by memcpy
      with a single call to kmemdup.
      Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Use mount_nodev not mount_ns to mount the bpf filesystem · e27f4a94
      Eric W. Biederman committed
      While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
      bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
      with current->nsproxy->mnt_ns. As the code does not acquire a
      reference to the mount namespace it can not possibly be correct to
      store the mount namespace on the superblock as it does.
      
      Replace mount_ns with mount_nodev so that each mount of the bpf
      filesystem returns a distinct instance, and the code is not buggy.
      
      In discussion with Hannes Frederic Sowa it was reported that the use
      of mount_ns was an attempt to have one bpf instance per mount
      namespace, in an attempt to keep resources that pin resources from
      hiding.  That intent simply does not work, the vfs is not built to
      allow that kind of behavior.  Which means that the bpf filesystem
      really is buggy both semantically and in its implementation as it does
      not nor can it implement the original intent.

      This change is userspace visible, but my experience with similar
      filesystems leads me to believe nothing will break with a model in which
      each mount of the bpf filesystem is distinct from all others.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of... · 56025caa
      David S. Miller committed
      Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers patches for 4.7
      
      Major changes:
      
      iwlwifi
      
      * remove IWLWIFI_DEBUG_EXPERIMENTAL_UCODE kconfig option
      * work for RX multiqueue continues
      * dynamic queue allocation work continues
      * add Luca as maintainer
      * a bunch of fixes and improvements all over
      
      brcmfmac
      
      * add 4356 sdio support
      
      ath6kl
      
      * add ability to set debug uart baud rate with a module parameter
      
      wil6210
      
      * add debugfs file to configure firmware led functionality
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ncm: update datagram size after changing mtu · 05a56487
      Rafal Redzimski committed
      Current implementation updates the mtu size and notifies the cdc_ncm
      device about the datagram size change using the
      USB_CDC_SET_MAX_DATAGRAM_SIZE request instead of changing rx_urb_size.

      Whenever mtu is being changed, datagram size should also be
      updated. Also update the maxmtu formula so it obtains max_datagram_size
      via cdc_ncm_max_dgram_size() rather than from ctx.
      Signed-off-by: Robert Dobrowolski <robert.dobrowolski@linux.intel.com>
      Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: correctly wake up process during uninit · addf8fc4
      Jason Wang committed
      We used to check dev->reg_state against NETREG_REGISTERED each time
      we were woken up. But after commit 9e641bdc ("net-tun:
      restructure tun_do_read for better sleep/wakeup efficiency"), it uses
      skb_recv_datagram() which does not check dev->reg_state. This causes
      a problem if we delete a tun/tap device after a process has blocked
      in a read: the device will wait forever for the reference count
      held by that process.

      Fix this by using RCV_SHUTDOWN, which is checked during
      skb_recv_datagram(), before trying to wake up the process during uninit.
      
      Fixes: 9e641bdc ("net-tun: restructure tun_do_read for better
      sleep/wakeup efficiency")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Xi Wang <xii@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'GREoIPV6-followups' · 7fd3c56d
      David S. Miller committed
      Alexander Duyck says:
      
      ====================
      Follow-ups for GUEoIPv6 patches
      
      This patch series is meant to be applied after:
      [PATCH v7 net-next 00/16] ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling
      
      The first patch addresses an issue we already resolved in GREv4 that is
      now present in GREv6 with the introduction of FOU/GUE for IPv6 based GRE
      tunnels.
      
      The second patch goes through and enables IPv6 tunnel offloads for the Intel
      NICs that already support the IPv4 based IP-in-IP tunnel offloads.  I have
      only done a bit of touch testing but have seen ~20 Gb/s over an i40e
      interface using a v4-in-v6 tunnel, and I have verified IPv6 GRE is still
      passing traffic at around the same rate.  I plan to do further testing but
      with these patches present it should enable a wider audience to be able to
      test the new features introduced in Tom's patchset with hardware offloads.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • intel: Add support for IPv6 IP-in-IP offload · bf2d1df3
      Alexander Duyck committed
      This patch adds support for offloading IPXIP6 type packets that represent
      either IPv4 or IPv6 encapsulated inside of an IPv6 outer IP header.  In
      addition, with this change we should also be able to support FOU
      encapsulated traffic with outer IPv6 headers.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE · 6a553681
      Alexander Duyck committed
      This patch addresses the same issue we had for IPv4 where enabling GRE with
      an inner checksum cannot be supported with FOU/GUE due to the fact that
      they will jump past the GRE header as it is treated like a tunnel header.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rds-conn-spamming' · 9a0351df
      David S. Miller committed
      Sowmini Varadhan says:
      
      ====================
      RDS: TCP: connection spamming fixes
      
      We have been testing the RDS-TCP code with a connection spammer
      that sends incoming SYNs to the RDS listen port well after
      an rds-tcp connection has been established, and found a few
      race-windows that are fixed by this patch series.
      
      Patch 1 avoids a null pointer deref when an incoming SYN
      shows up when a netns is being dismantled, or when the
      rds-tcp module is being unloaded.
      
      Patch 2 addresses the case when a SYN is received after the
      connection arbitration algorithm has converged: the incoming
      SYN should not needlessly quiesce the transmit path, and it
      should not result in needless TCP connection resets due to
      re-execution of the connection arbitration logic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: TCP: Avoid rds connection churn from rogue SYNs · c948bb5c
      Sowmini Varadhan committed
      When a rogue SYN is received after the connection arbitration
      algorithm has converged, the incoming SYN should not needlessly
      quiesce the transmit path, and it should not result in needless
      TCP connection resets due to re-execution of the connection
      arbitration logic.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp · 37e14f4f
      Sowmini Varadhan committed
      There are two instances where we want to terminate RDS-TCP: when
      exiting the netns or during module unload. In either case, the
      termination sequence is to stop the listen socket, mark the
      rtn->rds_tcp_listen_sock as null, and flush any accept workqs.
      Thus any workqs that get flushed at this point will encounter a
      null rds_tcp_listen_sock, and must exit gracefully to allow
      the RDS-TCP termination to complete successfully.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · be1332c0
      Linus Torvalds committed
      Pull GFS2 updates from Bob Peterson:
       "We've got nine patches this time:
      
         - Abhi Das has two patches that fix a GFS2 splice issue (and an
           adjustment).
      
         - Ben Marzinski has a patch which allows the proper unmount of a GFS2
           file system after hitting a withdraw error.
      
         - I have a patch to fix a problem where GFS2 would dereference an
           error value, plus three cosmetic / refactoring patches.
      
         - Daniel DeFreez has a patch to fix two glock reference count
           problems, where GFS2 was not properly "uninitializing" its glock
           holder on error paths.
      
         - Denys Vlasenko has a patch to change a function to not be inlined,
           thus reducing the memory footprint of the GFS2 module"
      
      * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        GFS2: Refactor gfs2_remove_from_journal
        GFS2: Remove allocation parms from gfs2_rbm_find
        gfs2: use inode_lock/unlock instead of accessing i_mutex directly
        GFS2: Add calls to gfs2_holder_uninit in two error handlers
        GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
        GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
        gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
        GFS2: Get rid of dead code in inode_go_demote_ok
        GFS2: ignore unlock failures after withdraw
    • net: sock: move ->sk_shutdown out of bitfields. · fc64869c
      Andrey Ryabinin committed
      ->sk_shutdown bits share one bitfield with some other bits in sock struct,
      such as ->sk_no_check_[r,t]x, ->sk_userlocks ...
      sock_setsockopt() may write to these bits, while holding the socket lock.
      
      In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only
      unix_state_lock(). So concurrent setsockopt() and shutdown() may lead
      to corrupting these bits.
      
      Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte.
      This will not change the 'struct sock' size since ->sk_shutdown is moved
      into a previously unused 16-bit hole.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
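The reason the separate byte helps can be shown with a reduced layout. Adjacent bit-fields share one memory location in C11, so a store to any of them is a read-modify-write of the whole unit, while an ordinary byte member is a memory location of its own. The struct and function names below are hypothetical and do not reproduce the real 'struct sock' layout.

```c
#include <stdint.h>

/*
 * Hypothetical "before" layout: the shutdown bits share one storage unit
 * with option bits, so writers holding different locks can clobber each
 * other's updates.
 */
struct sock_before {
    unsigned int shutdown    : 2;
    unsigned int no_check_tx : 1;
    unsigned int no_check_rx : 1;
    unsigned int userlocks   : 4;
};

/*
 * Hypothetical "after" layout: shutdown is a plain byte, separate from
 * the bit-fields that setsockopt() updates.
 */
struct sock_after {
    unsigned int padding     : 2;   /* bits formerly used by shutdown */
    unsigned int no_check_tx : 1;
    unsigned int no_check_rx : 1;
    unsigned int userlocks   : 4;
    uint8_t      shutdown;
};

/* Path that would run under the AF_UNIX state lock only. */
void sketch_shutdown(struct sock_after *sk, uint8_t how)
{
    sk->shutdown |= how;
}

/* Path that would run under the socket lock (the setsockopt side). */
void sketch_set_userlocks(struct sock_after *sk, unsigned int locks)
{
    sk->userlocks = locks & 0xf;
}
```

In the "after" layout the two functions write to different memory locations, so running them under different locks no longer risks a lost update.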
    • Merge branch 'GREoIPV6' · 911f6b1f
      David S. Miller committed
      Tom Herbert says:
      
      ====================
      ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling
      
      This patch set:
        - Fixes GRE6 to process translate flags correctly from configuration
        - Adds support for GSO and GRO for ip6ip6 and ip4ip6
        - Add support for FOU and GUE in IPv6
        - Support GRE, ip6ip6 and ip4ip6 over FOU/GUE
        - Fixes ip6_input to deal with UDP encapsulations
        - Some other minor fixes
      
      v2:
        - Removed a check of GSO types in MPLS
        - Define GSO type SKB_GSO_IPXIP6 and SKB_GSO_IPXIP4 (based on input
          from Alexander)
        - Don't define GSO types specifically for IP6IP6 and IP4IP6, above
          fix makes that unnecessary
        - Don't bother clearing encapsulation flag in UDP tunnel segment
          (another item suggested by Alexander).
      
      v3:
        - Address some minor comments from Alexander
      
      v4:
        - Rebase on changes to fix IP TX tunnels
        - Fix MTU issues in ip4ip6, ip6ip6
        - Add test data for above
      
      v5:
        - Address feedback from Shmulik Ladkani regarding extension header
          code that does not return next header but instead relies
          on returning value via nhoff. Solution here is to fix EH
          processing to return nexthdr value.
        - Refactored IPv4 encaps so that we won't need to create
          a ip6_tunnel_core.c when adding encap support for IPv6.
      
      v6:
        - Fix build issues with regard to new GSO constants
        - Fix MTU calculation issues in ip6_tunnel.c pointed out by Alex
        - Add encap_hlen into headroom for GREv6 to work with FOU/GUE
      
      v7:
        - Added skb_set_inner_ipproto to ip4ip6 and ip6ip6
        - Clarified max_headroom in ip6_tnl_xmit
        - Set features for IPv6 tunnels
        - Other cleanup suggested by Alexander
        - Above fixes throughput performance issues in ip4ip6 and ip6ip6,
          updated test results to reflect that
      
      Tested: Various cases of IP tunnels with netperf TCP_STREAM and TCP_RR.

          - IPv4/GRE/GUE/IPv6 with RCO
            1 TCP_STREAM
              6616 Mbps
            200 TCP_RR
              1244043 tps
              141/243/446 90/95/99% latencies
              86.61% CPU utilization

          - IPv6/GRE/GUE/IPv6 with RCO
            1 TCP_STREAM
              6940 Mbps
            200 TCP_RR
              1270903 tps
              138/236/440 90/95/99% latencies
              87.51% CPU utilization

          - IP6IP6
            1 TCP_STREAM
              5307 Mbps
            200 TCP_RR
              498981 tps
              388/498/631 90/95/99% latencies
              19.75% CPU utilization (1 CPU saturated)

          - IP6IP6/GUE with RCO
            1 TCP_STREAM
              5575 Mbps
            200 TCP_RR
              1233818 tps
              143/244/451 90/95/99% latencies
              87.57% CPU utilization

          - IP4IP6
            1 TCP_STREAM
              5235 Mbps
            200 TCP_RR
              763774 tps
              250/318/466 90/95/99% latencies
              35.25% CPU utilization (1 CPU saturated)

          - IP4IP6/GUE with RCO
            1 TCP_STREAM
              5337 Mbps
            200 TCP_RR
              1196385 tps
              148/251/460 90/95/99% latencies
              87.56% CPU utilization

          - GRE with keyid
            200 TCP_RR
              744173 tps
              258/332/461 90/95/99% latencies
              34.59% CPU utilization (1 CPU saturated)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Don't reset inner headers in ip6_tnl_xmit · 3ee93eaf
      Tom Herbert committed
      Since iptunnel_handle_offloads() is called in all paths we can
      probably drop the block in ip6_tnl_xmit that was checking for
      skb->encapsulation and resetting the inner headers.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip4ip6: Support for GSO/GRO · b8921ca8
      Tom Herbert committed
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6ip6: Support for GSO/GRO · 815d22e5
      Tom Herbert committed
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Set features for IPv6 tunnels · 51c052d4
      Tom Herbert committed
      Need to set dev features, use same values that are used in GREv6.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: Add support for fou/gue encapsulation · b3a27b51
      Tom Herbert committed
      Add netlink and setup for encapsulation
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: Add support for fou/gue encapsulation · 1faf3d9f
      Tom Herbert committed
      Add netlink and setup for encapsulation
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fou: Add encap ops for IPv6 tunnels · aa3463d6
      Tom Herbert committed
      This patch adds a new fou6 module that provides encapsulation
      operations for IPv6.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tun: Add infrastructure for doing encapsulation · 058214a4
      Tom Herbert committed
      Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
      for getting encap hlen, setting up encap on a tunnel, performing
      encapsulation operation.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fou: Support IPv6 in fou · 5f914b68
      Tom Herbert committed
      This patch adds receive path support for IPv6 with fou.
      
      - Add address family to fou structure for open sockets. This supports
        AF_INET and AF_INET6. Lookups for fou ports are performed on both the
        port number and family.
      - In fou and gue receive adjust tot_len in IPv4 header or payload_len
        based on address family.
      - Allow AF_INET6 in FOU_ATTR_AF netlink attribute.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
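The lookup rule stated in the message above (ports are matched on both port number and address family) can be sketched with a plain table scan. `struct fou_entry` and `fou_find()` are hypothetical names for this illustration and do not mirror the kernel's fou structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical record for one open fou port. */
struct fou_entry {
    uint16_t port;
    int family;   /* AF_INET or AF_INET6 */
};

/*
 * Find the entry matching both port and family.
 * The same port number may be open once per address family.
 */
const struct fou_entry *fou_find(const struct fou_entry *tbl, size_t n,
                                 uint16_t port, int family)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].port == port && tbl[i].family == family)
            return &tbl[i];
    }
    return NULL;
}
```

Keying on the pair rather than the port alone lets an IPv4 and an IPv6 listener coexist on the same port number.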
    • fou: Split out {fou,gue}_build_header · dc969b81
      Tom Herbert committed
      Create __fou_build_header and __gue_build_header. These implement the
      protocol generic parts of building the fou and gue header.
      fou_build_header and gue_build_header implement the IPv4 specific
      functions and call the __*_build_header functions.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fou: Call setup_udp_tunnel_sock · 440924bb
      Tom Herbert committed
      Use helper function to set up UDP tunnel related information for a fou
      socket.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Cleanup encap items in ip_tunnels.h · 55c2bc14
      Tom Herbert committed
      Consolidate all the ip_tunnel_encap definitions in one spot in the
      header file. Also, move ip_encap_hlen and ip_tunnel_encap from
      ip_tunnel.c to ip_tunnels.h so they can be called without a dependency
      on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Change "final" protocol processing for encapsulation · 1da44f9c
      Tom Herbert committed
      When performing foo-over-UDP, UDP packets are processed by the
      encapsulation handler which returns another protocol to process.
      This may result in processing two (or more) protocols in the
      loop that are marked as INET6_PROTO_FINAL. The actions taken
      for hitting a final protocol, in particular the skb_postpull_rcsum
      can only be performed once.
      
      This patch set adds a check of whether a final protocol has been seen. The
      rules are:
        - If the final protocol has not been seen any protocol is processed
          (final and non-final). In the case of a final protocol, the final
          actions are taken (like the skb_postpull_rcsum)
        - If a final protocol has been seen (e.g. an encapsulating UDP
          header) then no further non-final protocols are allowed
          (e.g. extension headers). For more final protocols the
          final actions are not taken (e.g. skb_postpull_rcsum).
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
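The two rules listed in the message above amount to a small state machine over a "final protocol seen" flag. The sketch below encodes just that decision. The enum and `classify_proto()` are hypothetical names, and the real receive loop does considerably more than this.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one protocol handler in the receive loop. */
enum proto_verdict {
    HANDLE_NONFINAL,            /* e.g. an extension header */
    HANDLE_FINAL_WITH_ACTIONS,  /* first final protocol: do skb_postpull_rcsum etc. */
    HANDLE_FINAL_SKIP_ACTIONS,  /* later final protocol: final actions already done */
    REJECT                      /* non-final protocol after a final one */
};

/*
 * Decide how to treat the next protocol.
 * *have_final carries the "a final protocol has been seen" state across calls.
 */
enum proto_verdict classify_proto(bool proto_is_final, bool *have_final)
{
    if (!proto_is_final)
        return *have_final ? REJECT : HANDLE_NONFINAL;

    if (*have_final)
        return HANDLE_FINAL_SKIP_ACTIONS;

    *have_final = true;
    return HANDLE_FINAL_WITH_ACTIONS;
}
```

For a packet carrying an extension header, then an encapsulating UDP header, then an inner TCP header, the verdicts would be non-final, final with actions, and final without actions, in that order.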
    • ipv6: Fix nexthdr for reinjection · 4c64242a
      Tom Herbert committed
      In ip6_input_finish the nexthdr protocol is retrieved from the
      next header offset that is returned in the cb of the skb.
      This method does not work for UDP encapsulation that may not
      even have a concept of a nexthdr field (e.g. FOU).
      
      This patch checks for a final protocol (INET6_PROTO_FINAL) when a
      protocol handler returns > 0. If the protocol is not final then
      resubmission is performed on nhoff value. If the protocol is final
      then the nexthdr is taken to be the return value.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: define gso types for IPx over IPv4 and IPv6 · 7e13318d
      Tom Herbert committed
      This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
      SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
      NETIF_F_GSO_IPXIP6. These are used to describe an IP-in-IP
      tunnel and what the outer protocol is. The inner protocol
      can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
      SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
      are removed (these are both instances of SKB_GSO_IPXIP4).
      SKB_GSO_IPXIP6 will be used when support for GSO with IP
      encapsulation over IPv6 is added.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>