1. 22 Apr 2017 · 25 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 6b633e82
      Committed by David S. Miller
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2017-04-20
      
      This adds the basic infrastructure for IPsec hardware
      offloading: it creates a configuration API and adjusts
      the packet path.
      
      1) Add the needed netdev features to configure IPsec offloads.
      
      2) Add the IPsec hardware offloading API.
      
      3) Prepare the ESP packet path for hardware offloading.
      
      4) Add gso handlers for esp4 and esp6, this implements
         the software fallback for GSO packets.
      
      5) Add xfrm replay handler functions for offloading.
      
      6) Change ESP to use a synchronous crypto algorithm when
         offloading, since we don't have the option of asynchronous
         returns when we handle IPsec at layer 2.
      
      7) Add a xfrm validate function to validate_xmit_skb. This
         implements the software fallback for non GSO packets.
      
      8) Set the inner_network and inner_transport members of
         the SKB, as well as encapsulation, to reflect the actual
         positions of these headers, and remove them only once
         encryption is done on the payload.
         From Ilan Tayari.
      
      9) Prepare the ESP GRO codepath for hardware offloading.
      
      10) Fix incorrect null pointer check in esp6.
          From Colin Ian King.
      
      11) Fix for the GSO software fallback path to detect the
          fallback correctly.
          From Ilan Tayari.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: Add new IPsec offloading files. · 77999328
      Committed by Steffen Klassert
      This adds two new files to IPsec maintenance scope:
      
      net/ipv4/esp4_offload.c
      net/ipv6/ip6_offload.c
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 072cec77
      Committed by David S. Miller
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-04-19
      
      This series contains updates to i40e and i40evf only, most notable being
      the addition of trace points for BPF programs.
      
      Tobias Klauser updates i40evf to use the net_device stats struct instead
      of a local private copy.
      
      Preethi updates the VF driver to not enable receive checksum offload by
      default for tunneled packets.
      
      Alex fixes an issue he introduced when he converted the code over to
      using the length field to determine if a descriptor was done or not.
      
      Mitch adds the ability to use debugfs to dump additional information on
      the VFs that is not available through 'ip link show'.
      
      Scott adds trace points to the drivers so that BPF programs can be
      attached for feature testing and verification.
      
      Jingjing adds admin queue functions for Pipeline Personalization Profile
      commands.
      
      Jake does most of the heavy lifting in this series, starting with a
      reduction in the scope of the RTNL lock held while resetting VFs, to
      allow multiple PFs to reset in a timely manner.  He factored out the
      direct queue modification so that the code can be reused, and reduced
      the wait time for admin queue commands to complete: we were waiting a
      minimum of a millisecond, when in practice the admin queue command is
      often processed much faster.  He also cleaned up a flag we never use,
      and made the code that resets all the VFs run in parallel instead of
      serially, to help reduce the time it takes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netvsc: fix use after free on module removal · 76bb5db5
      Committed by stephen hemminger
      The NAPI data structure is embedded in the netvsc_device structure
      and is freed when the device is closed. There is still a reference
      to it (in the NAPI list), which causes a crash in netif_napi_del
      when the device is removed. Fix by managing NAPI instances correctly.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
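      The bug class above can be sketched in userspace C: a list node embedded in a structure must be unlinked from any shared list before that structure is freed. This is an illustration under assumptions, not the netvsc code. The list helpers and the dev_sketch type are hypothetical simplifications loosely modeled on the kernel's list_head. In the driver, removing a NAPI instance from its list is done through netif_napi_del().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified intrusive doubly linked list (hypothetical, list_head-like). */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

/* Hypothetical device with an embedded, NAPI-like list node. */
struct dev_sketch {
    int id;
    struct node napi;
};

/*
 * Correct teardown order: unlink the embedded node first, then free the
 * containing structure. Freeing first would leave the shared list
 * pointing into freed memory, which is the use-after-free pattern.
 */
static void dev_teardown(struct dev_sketch *d)
{
    assert(d->napi.next != NULL); /* must still be linked */
    list_del(&d->napi);
    free(d);
}
```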
    • Merge branch 'tc-filter-cleanup-destroy-delete' · dfb05553
      Committed by David S. Miller
      Cong Wang says:
      
      ====================
      net_sched: clean up tc filter destroy and delete logic
      
      The first patch fixes a potential race condition; the second one
      is pure cleanup.
      ====================
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: remove useless NULL to tp->root · 43920538
      Committed by WANG Cong
      There is no need to NULL tp->root in ->destroy(), since tp is
      going to be freed very soon, and existing readers can still
      safely read it.
      
      For cls_route, we always initialize tp->root, so it can't be NULL,
      and we can drop more useless code.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      Committed by WANG Cong
      We could have a race condition where, in the ->classify() path, we
      dereference tp->root while a parallel ->destroy() sets it to
      NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").
      
      This happens when ->destroy() is called while deleting a filter, to
      check whether we are the last one in tp; this tp is still linked and
      visible at that time. The root cause of this problem is the semantics
      of ->destroy(): it does two things (for the non-force case):
      
      1) check if tp is empty
      2) if tp is empty, really destroy it
      
      and its caller, if it cares, needs to check its return value to see
      whether tp was really destroyed. Therefore we can't unlink tp unless
      we know it is empty.
      
      As suggested by Daniel, we can actually move the test logic to ->delete(),
      so that we can safely unlink tp after ->delete() tells us the last filter
      has just been deleted, and before ->destroy() is called.
      
      Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
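      The new ordering can be sketched in plain C. This is a single-threaded illustration under stated assumptions. tp_sketch, tp_delete(), tp_destroy() and delete_filter() are hypothetical names. The real code unlinks tp with RCU and defers the free, and neither step is modeled here. The sketch only shows the new division of labor: ->delete() reports whether the last filter is gone, the caller unlinks tp, and only then is ->destroy() called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a tcf_proto. */
struct tp_sketch {
    int nfilters;   /* filters still installed in this tp */
    bool linked;    /* still reachable from the classify() path? */
};

/*
 * ->delete(): remove one filter and tell the caller whether it was
 * the last one. Returns 0 on success, -1 if there was nothing to delete.
 */
static int tp_delete(struct tp_sketch *tp, bool *last)
{
    if (tp->nfilters == 0)
        return -1;
    tp->nfilters--;
    *last = (tp->nfilters == 0);
    return 0;
}

/* ->destroy(): no emptiness test any more; it only releases tp. */
static void tp_destroy(struct tp_sketch *tp)
{
    assert(!tp->linked); /* must already be invisible to readers */
    free(tp);
}

/* Caller: unlink tp before destroying it. Returns true if tp was freed. */
static bool delete_filter(struct tp_sketch *tp)
{
    bool last = false;

    if (tp_delete(tp, &last) != 0)
        return false;
    if (last) {
        tp->linked = false; /* unlink first ... */
        tp_destroy(tp);     /* ... then destroy */
    }
    return last;
}
```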
    • bpf: add napi_id read access to __sk_buff · b1d9fc41
      Committed by Daniel Borkmann
      Add napi_id access to __sk_buff for socket filter program types, tc
      program types and other bpf_convert_ctx_access() users. Having access
      to skb->napi_id is useful for per-RX-queue listener siloing, e.g.
      in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
      used, meaning SO_REUSEPORT-enabled listeners can then select the
      corresponding socket at SYN time already [1]. The skb is marked via
      skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
      
      Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d433902
      ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
      the NAPI ID associated with the queue for steering, which requires a
      prior sk_mark_napi_id() after the socket was looked up.
      
      Semantics for the __sk_buff napi_id access are similar, meaning if
      skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
      then an invalid napi_id of 0 is returned to the program, otherwise a
      valid non-zero napi_id.
      
        [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
      
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
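      The napi_id semantics described above come down to a range check. This is a minimal sketch that assumes MIN_NAPI_ID is NR_CPUS + 1, as defined in include/net/busy_poll.h. The NR_CPUS value and the function name visible_napi_id() are illustrative only. In the kernel, this check is emitted as BPF instructions by bpf_convert_ctx_access() and is not called as a C function.

```c
#include <assert.h>

/* Illustrative value; NR_CPUS is a kernel build-time constant. */
#define NR_CPUS_EXAMPLE 64u

/* Values 1..NR_CPUS in skb->napi_id are really sender_cpu, not NAPI IDs. */
#define MIN_NAPI_ID_EXAMPLE (NR_CPUS_EXAMPLE + 1u)

static_assert(MIN_NAPI_ID_EXAMPLE > NR_CPUS_EXAMPLE,
              "NAPI IDs start above the sender_cpu range");

/* What a program reading __sk_buff->napi_id would observe. */
static unsigned int visible_napi_id(unsigned int skb_napi_id)
{
    return skb_napi_id >= MIN_NAPI_ID_EXAMPLE ? skb_napi_id : 0;
}
```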
    • netvsc: Deal with rescinded channels correctly · 73e64fa4
      Committed by K. Y. Srinivasan
      We will not be able to send packets over a channel that has been
      rescinded. Make the necessary adjustments so we can properly clean up
      even when the channel is rescinded. This issue can be triggered
      in the NIC hot-remove path.
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ibmvnic-updates-and-bug-fixes' · 87e978ed
      Committed by David S. Miller
      Nathan Fontenot says:
      
      ====================
      ibmvnic: Updates and bug fixes
      
      This patch set removes some unneeded and unused code from the
      ibmvnic driver and fixes several bugs in it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Remove unused bounce buffer · d76e0fec
      Committed by Nathan Fontenot
      The bounce buffer is not used in the ibmvnic driver, just
      get rid of it.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Allocate zero-filled memory for sub crqs · 7f7adc50
      Committed by Nathan Fontenot
      Update the allocation of memory for the sub crq structs and their
      associated pages to allocate zero-filled memory.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Disable irq prior to close · dd9c20fa
      Committed by Brian King
          Add some code to call disable_irq on all the vnic interface's irqs.
          This fixes a crash observed when closing an active interface, as
          seen in the oops below when we try to access a buffer in the interrupt
          handler which we've already freed.
      
          Unable to handle kernel paging request for data at address 0x00000001
          Faulting instruction address: 0xd000000003886824
          Oops: Kernel access of bad area, sig: 11 [#1]
          SMP NR_CPUS=2048 NUMA pSeries
          Modules linked in: ibmvnic(OEN) rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_
          Supported: No, Unsupported modules are loaded
          CPU: 8 PID: 0 Comm: swapper/8 Tainted: G           OE   NX 4.4.49-92.11-default #1
          task: c00000007f990110 ti: c0000000fffa0000 task.ti: c00000007f9b8000
          NIP: d000000003886824 LR: d000000003886824 CTR: c0000000007eff60
          REGS: c0000000fffa3a70 TRAP: 0300   Tainted: G           OE   NX  (4.4.49-92.11-default)
          MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22008042  XER: 20000008
          CFAR: c000000000008468 DAR: 0000000000000001 DSISR: 40000000 SOFTE: 0
          GPR00: d000000003886824 c0000000fffa3cf0 d000000003894118 0000000000000000
          GPR04: 0000000000000000 0000000000000000 c000000001249da0 0000000000000000
          GPR08: 000000000000000e 0000000000000000 c0000000ccb00000 d000000003889180
          GPR12: c0000000007eff60 c000000007af4c00 0000000000000001 c0000000010def30
          GPR16: c00000007f9b8000 c000000000b98c30 c00000007f9b8080 c000000000bab858
          GPR20: 0000000000000005 0000000000000000 c0000000ff5d7e80 c0000000f809f648
          GPR24: c0000000ff5d7ec8 0000000000000000 0000000000000000 c0000000ccb001a0
          GPR28: 000000000000000a c0000000f809f600 c0000000fd4cd900 c0000000f9cd5b00
          NIP [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic]
          LR [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic]
          Call Trace:
          [c0000000fffa3cf0] [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic] (unreliable)
          [c0000000fffa3dd0] [c000000000132940] __handle_irq_event_percpu+0x90/0x2e0
          [c0000000fffa3e90] [c000000000132bcc] handle_irq_event_percpu+0x3c/0x90
          [c0000000fffa3ed0] [c000000000132c88] handle_irq_event+0x68/0xc0
          [c0000000fffa3f00] [c000000000137edc] handle_fasteoi_irq+0xec/0x250
          [c0000000fffa3f30] [c000000000131b04] generic_handle_irq+0x54/0x80
          [c0000000fffa3f60] [c000000000011190] __do_irq+0x80/0x1d0
          [c0000000fffa3f90] [c0000000000248d8] call_do_irq+0x14/0x24
          [c00000007f9bb9e0] [c000000000011380] do_IRQ+0xa0/0x120
          [c00000007f9bba40] [c000000000002594] hardware_interrupt_common+0x114/0x180
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Correct crq and resource releasing · 37489055
      Committed by Nathan Fontenot
      We should not be releasing the CRQs when calling close for the
      adapter; these need to remain open to facilitate operations such
      as updating the MAC address. The CRQs should be released in the
      adapter's remove routine.
      
      Additionally, we need to call release_resources() from remove. This
      corrects the scenario of trying to remove an adapter that has only
      been probed.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Remove inflight list · 661a2622
      Committed by Nathan Fontenot
      The inflight list, used to track memory allocated for in-flight CRQs,
      is not needed. The one piece of the inflight list that does need
      to be cleaned up at module exit is the error buffer list, which is
      already attached to the adapter struct.
      
      This patch removes the inflight list and moves checking the error buffer
      list to ibmvnic_remove.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Do not disable IRQ after scheduling tasklet · ed7ecbf7
      Committed by Brian King
      Since the primary CRQ is only used for service functions and
      not in the performance path, simplify the code a bit and avoid
      disabling the IRQ.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fixup atomic API usage · 58c8c0c0
      Committed by Brian King
      Replace a couple of places that modify an atomic and then read it
      back (a sequence that, as a whole, is not atomic) with the
      atomic_XX_return variants, to avoid race conditions.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
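      The difference can be shown with userspace C11 atomics. This is a sketch, not the driver code. add_return() and sub_return() are hypothetical helpers that mimic the kernel's atomic_add_return() and atomic_sub_return(), which return the new value. C11's atomic_fetch_add() and atomic_fetch_sub() return the old value instead. The queue_full helpers and the capacity check are also illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helpers mimicking the kernel's atomic_add_return() and
 * atomic_sub_return(): they return the NEW value. C11's fetch variants
 * return the OLD value, so the delta is applied to the result. */
static int add_return(int i, atomic_int *v)
{
    return atomic_fetch_add(v, i) + i;
}

static int sub_return(int i, atomic_int *v)
{
    return atomic_fetch_sub(v, i) - i;
}

/* Racy: modify, then read back. Another thread can change *used between
 * the two operations, so the compared value may not be the one this
 * thread produced. */
static bool queue_full_racy(atomic_int *used, int capacity)
{
    atomic_fetch_add(used, 1);
    return atomic_load(used) >= capacity;
}

/* Race-free: compare the value returned by the single read-modify-write. */
static bool queue_full(atomic_int *used, int capacity)
{
    assert(capacity > 0);
    return add_return(1, used) >= capacity;
}
```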
    • ibmvnic: Unmap longer term buffer before free · 59af56c2
      Committed by Brian King
      Make sure we unregister long term buffers from the adapter
      before DMA unmapping and freeing them. Failure to do so
      could result in DMA to a now-invalid address.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fix ibmvnic_change_mac_addr struct format · 993a82b0
      Committed by Murilo Fossa Vicentini
      The ibmvnic_change_mac_addr struct layout did not match the format
      defined in PAPR+: it had the reserved and return code fields swapped. As a
      consequence, CHANGE_MAC_ADDR_RSP commands were handled improperly
      and executed even when the operation wasn't successfully completed by the
      system firmware.
      
      Also change the endianness of the debug message to make it easier to
      parse the CRQ content.
      Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
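      Two byte-only layouts show why a field swap can make failures look like successes. This sketch rests on assumptions: the 16-byte message size, the field sizes and the offsets are illustrative and are not copied from ibmvnic.h or PAPR+. Suppose firmware writes a non-zero return code after the reserved bytes, but the driver reads the code from the position where the zero reserved bytes actually sit. Every response then decodes as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layouts; all members are bytes, so there is no padding. */
struct change_mac_old {          /* rc and reserved swapped (buggy) */
    uint8_t first;
    uint8_t cmd;
    uint8_t mac_addr[6];
    uint8_t rc_code;
    uint8_t rc_detail[3];
    uint8_t reserved[4];
};

struct change_mac_fixed {        /* reserved comes before rc */
    uint8_t first;
    uint8_t cmd;
    uint8_t mac_addr[6];
    uint8_t reserved[4];
    uint8_t rc_code;
    uint8_t rc_detail[3];
};

static_assert(sizeof(struct change_mac_old) == 16, "assumed 16-byte message");
static_assert(sizeof(struct change_mac_fixed) == 16, "assumed 16-byte message");

/* Decode the return code from a raw 16-byte response with each layout. */
static uint8_t rc_old(const uint8_t raw[16])
{
    struct change_mac_old m;
    memcpy(&m, raw, sizeof(m));
    return m.rc_code;
}

static uint8_t rc_fixed(const uint8_t raw[16])
{
    struct change_mac_fixed m;
    memcpy(&m, raw, sizeof(m));
    return m.rc_code;
}
```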
    • ibmvnic: Report errors when failing to release sub-crqs · ffa73855
      Committed by Thomas Falcon
      Add reporting of errors when releasing sub-crqs fails.
      Signed-off-by: Thomas Falcon <tlfalcon@us.ibm.com>
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: remove unnecessary variable assignment · ca1cb28d
      Committed by Arnd Bergmann
      gcc points out a useless assignment that was added during code refactoring:
      
      drivers/net/ethernet/cavium/liquidio/lio_ethtool.c: In function 'octnet_intrmod_callback':
      drivers/net/ethernet/cavium/liquidio/lio_ethtool.c:1315:59: error: parameter 'oct_dev' set but not used [-Werror=unused-but-set-parameter]
      
      This is harmless, but it can clearly be removed to avoid the warning.
      
      Fixes: 50c0add5 ("liquidio: refactor interrupt moderation code")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning · 7acf8a1e
      Committed by Matthew Whitehead
      Constants used for tuning are generally a bad idea, especially as hardware
      changes over time. Replace the constant 2 jiffies with sysctl variable
      netdev_budget_usecs to enable sysadmins to tune the softirq processing.
      Also document the variable.
      
      For example, a very fast machine might tune this to 1000 microseconds,
      while my regression testing 486DX-25 needs it to be 4000 microseconds on
      a nearly idle network to prevent time_squeeze from being incremented.
      
      Version 2: changed jiffies to microseconds for predictable units.
      Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'iptunnel-policy-based-routing' · 20da848f
      Committed by David S. Miller
      Craig Gallek says:
      
      ====================
      ip_tunnel: Allow policy-based routing through tunnels
      
      iproute2 changes to follow.  Example usage:
        ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4
        ip -detail link show gre-test
        ...
        ip link set gre-test type gre fwmark 0
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_tunnel: Allow policy-based routing through tunnels · 9830ad4c
      Committed by Craig Gallek
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      
      There is no concept of per-packet routing decisions through IPv4
      tunnels, so this implementation does not need to work with
      per-packet route lookups as the v6 implementation may
      (with IP6_TNL_F_USE_ORIG_FWMARK).
      
      Further, since the v4 tunnel ioctls share data structures
      (which cannot be trivially modified) with the kernel's internal
      tunnel configuration structures, the mark attribute must be stored
      in the tunnel structure itself and passed as a parameter when
      creating or changing tunnel attributes.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: Allow policy-based routing through tunnels · 0a473b82
      Committed by Craig Gallek
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Apr 2017 · 15 commits