1. 25 Jan 2018, 1 commit
    • net: separate SIOCGIFCONF handling from dev_ioctl() · 36fd633e
      Committed by Al Viro
      Only two of dev_ioctl()'s callers may pass SIOCGIFCONF to it.
      Separating that codepath from the rest of dev_ioctl() allows both
      to simplify dev_ioctl() itself (all other cases work with struct ifreq *)
      *and* seriously simplify the compat side of that beast: all it takes
      is passing to inet_gifconf() an extra argument - the size of individual
      records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)).  With
      dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf()
      that's easy to arrange.
      
      As a result, the compat side of SIOCGIFCONF doesn't need any
      allocations, copy_in_user() back and forth, etc.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
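
      A minimal userspace sketch of the record-size idea follows, under stated assumptions. One fill routine can serve both native and compat callers because the caller passes the per-record stride, much as inet_gifconf() gains a size argument in this commit. The record types and fill_records() are hypothetical stand-ins. They are not the kernel's struct ifreq, struct compat_ifreq or inet_gifconf().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct ifreq / struct compat_ifreq:
 * same leading name field, different total record size. */
#define NAME_LEN 16
struct rec_native { char name[NAME_LEN]; unsigned long addr; unsigned long pad[2]; };
struct rec_compat { char name[NAME_LEN]; unsigned int addr; unsigned int pad[2]; };

/* Fill 'buf' (capacity 'len' bytes) with one record per interface name.
 * Each record is 'size' bytes long; only the leading name field is written.
 * Stops when the next record would not fit. Returns the bytes used. */
static size_t fill_records(char *buf, size_t len, size_t size,
                           const char *const *names, size_t count)
{
    size_t done = 0;
    for (size_t i = 0; i < count && done + size <= len; i++) {
        memset(buf + done, 0, size);                 /* zero the whole record */
        strncpy(buf + done, names[i], NAME_LEN - 1); /* NUL kept by memset */
        done += size;                                /* advance by the stride */
    }
    return done;
}
```

      The same routine packs records at sizeof(struct rec_native) or sizeof(struct rec_compat) steps, so the compat path needs no bounce buffer of its own.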
  2. 24 Jan 2018, 1 commit
  3. 23 Jan 2018, 1 commit
  4. 15 Jan 2018, 2 commits
    • bpf: offload: add map offload infrastructure · a3884572
      Committed by Jakub Kicinski
      BPF map offload follows a similar path to program offload.  At creation
      time users may specify the ifindex of the device on which they want to
      create the map.  The map will be validated by the kernel's
      .map_alloc_check callback and the device driver will be called for the
      actual allocation.  The map will have an empty set of operations
      associated with it (save for alloc and free callbacks).  The real
      device callbacks are kept in map->offload->dev_ops because they
      have slightly different signatures.  Map operations are called in
      process context so the driver may communicate with HW freely,
      msleep(), wait() etc.
      
      Map alloc and free callbacks are muxed via existing .ndo_bpf, and
      are always called with rtnl lock held.  Maps and programs are
      guaranteed to be destroyed before .ndo_uninit (i.e. before
      unregister_netdev() returns).  Map callbacks are invoked with
      bpf_devs_lock *read* locked; drivers must take care of exclusive
      locking if necessary.
      
      All offload-specific branches are marked with unlikely() (through
      bpf_map_is_dev_bound()), given that the branch penalty will be
      negligible compared to IO anyway, and we don't want to penalize the
      SW path unnecessarily.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
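
      The following plain-C sketch illustrates two patterns described above, under assumptions. The first is a single driver entry point multiplexed by a command code, as map alloc and free are muxed through .ndo_bpf. The second is an unlikely() hint on the "is this map device-bound?" check, so the software path stays the predicted branch. All names (struct map, MAP_CMD_ALLOC, drv_bpf(), map_lookup()) are hypothetical. unlikely() is defined locally via GCC/Clang's __builtin_expect, which is also what the kernel macro expands to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

enum map_cmd { MAP_CMD_ALLOC, MAP_CMD_FREE };

struct map {
    int ifindex;        /* 0 = host (software) map, else bound to a device */
    bool dev_allocated; /* toggled by the hypothetical driver */
    int sw_value;       /* trivial software "map" storage */
};

/* Hypothetical driver entry point: one callback, muxed by command. */
static int drv_bpf(enum map_cmd cmd, struct map *m)
{
    switch (cmd) {
    case MAP_CMD_ALLOC: m->dev_allocated = true;  return 0;
    case MAP_CMD_FREE:  m->dev_allocated = false; return 0;
    }
    return -EINVAL;
}

/* Offload check, hinted as the unlikely branch. */
static bool map_is_dev_bound(const struct map *m)
{
    return unlikely(m->ifindex != 0);
}

/* Lookup: the software path is the predicted (fall-through) branch. */
static int map_lookup(const struct map *m, int *out)
{
    if (map_is_dev_bound(m))
        return -EOPNOTSUPP; /* in this sketch: would go to device callbacks */
    *out = m->sw_value;
    return 0;
}
```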
    • net: sch: prio: Add offload ability to PRIO qdisc · 7fdb61b4
      Committed by Nogah Frankel
      Add the ability to offload PRIO qdisc by using ndo_setup_tc.
      There are three commands for PRIO offloading:
      * TC_PRIO_REPLACE: handles set and tune
      * TC_PRIO_DESTROY: handles qdisc destroy
      * TC_PRIO_STATS: updates the qdisc's counters (given as reference)
      
      Like RED qdisc, the indication of whether PRIO is being offloaded is
      set and updated as part of the dump function. This is because the driver
      could decide whether to offload based on the qdisc parent, which could
      change without notifying the qdisc.
      Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
      Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
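
      Below is a hedged sketch of a driver-side handler for the three PRIO commands. It also shows how a dump-time STATS request can double as the "is it offloaded?" indication. The command names come from the commit but are re-declared locally. The request and port structures, the 2..8 band hardware limit and every function name are hypothetical; they are not the kernel's tc_prio_qopt_offload API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the three commands named in the commit. */
enum prio_cmd { TC_PRIO_REPLACE, TC_PRIO_DESTROY, TC_PRIO_STATS };

struct prio_stats { uint64_t bytes, packets; };

/* Hypothetical offload request handed to the driver. */
struct prio_req {
    enum prio_cmd command;
    unsigned int bands;       /* REPLACE: requested number of bands */
    struct prio_stats *stats; /* STATS: counters updated by reference */
};

/* Hypothetical per-port driver state. */
struct port {
    bool offloaded;
    unsigned int bands;
    struct prio_stats hw;     /* counters read back from "hardware" */
};

static int port_setup_prio(struct port *p, const struct prio_req *req)
{
    switch (req->command) {
    case TC_PRIO_REPLACE:               /* covers both set and tune */
        if (req->bands < 2 || req->bands > 8)
            return -EOPNOTSUPP;         /* hypothetical HW limit: stay in SW */
        p->bands = req->bands;
        p->offloaded = true;
        return 0;
    case TC_PRIO_DESTROY:
        p->offloaded = false;
        p->bands = 0;
        return 0;
    case TC_PRIO_STATS:                 /* add HW counters to caller's struct */
        if (!p->offloaded)
            return -EOPNOTSUPP;
        req->stats->bytes += p->hw.bytes;
        req->stats->packets += p->hw.packets;
        return 0;
    }
    return -EOPNOTSUPP;
}

/* Dump-time refresh: "offloaded" is derived from whether the driver
 * accepted the STATS request, so it tracks changes the qdisc never saw. */
static bool prio_dump(struct port *p, struct prio_stats *s)
{
    struct prio_req req = { .command = TC_PRIO_STATS, .stats = s };
    return port_setup_prio(p, &req) == 0;
}
```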
  5. 11 Jan 2018, 1 commit
  6. 09 Jan 2018, 2 commits
  7. 06 Jan 2018, 1 commit
  8. 31 Dec 2017, 1 commit
  9. 21 Dec 2017, 1 commit
  10. 20 Dec 2017, 1 commit
  11. 03 Dec 2017, 2 commits
  12. 24 Nov 2017, 1 commit
    • net: accept UFO datagrams from tuntap and packet · 0c19f846
      Committed by Willem de Bruijn
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -u -p 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
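
      Software UFO segmentation splits a single large UDP datagram into IP fragments, as in the guest test above with its 2000-byte payload. The following is a minimal sketch of the fragment-layout arithmetic, assuming IPv4 with a 20-byte header and no options, and a 1500-byte MTU (the MTU is an assumption; the commit does not state it). Every fragment except the last carries a multiple of 8 bytes, and offsets are counted in 8-byte units. ufo_layout() is hypothetical and does not reproduce the kernel's segmentation code.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20  /* assumes no IP options */
#define UDP_HDR_LEN   8

struct frag { size_t offset_units; size_t data_len; int more_fragments; };

/* Compute the IPv4 fragment layout for one UDP datagram carrying
 * 'payload' bytes over a link with the given MTU. Writes up to 'max'
 * entries into 'out' and returns the fragment count, or 0 if 'out' is
 * too small or the MTU cannot carry any fragment data. */
static size_t ufo_layout(size_t payload, size_t mtu, struct frag *out, size_t max)
{
    size_t total = UDP_HDR_LEN + payload;     /* UDP header rides in frag 0 */
    if (mtu <= IPV4_HDR_LEN + 8)
        return 0;
    size_t per_frag = (mtu - IPV4_HDR_LEN) & ~(size_t)7; /* multiple of 8 */
    size_t n = 0;
    for (size_t off = 0; off < total; off += per_frag) {
        if (n == max)
            return 0;
        size_t len = total - off < per_frag ? total - off : per_frag;
        out[n].offset_units = off / 8;        /* IPv4 offset field units */
        out[n].data_len = len;
        out[n].more_fragments = off + len < total;
        n++;
    }
    return n;
}
```

      With these assumptions, the 2008 bytes of IP payload (8-byte UDP header plus 2000 bytes of data) become a 1480-byte fragment at offset 0 with MF set, followed by a 528-byte fragment at offset 185.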
  13. 10 Nov 2017, 1 commit
  14. 09 Nov 2017, 1 commit
  15. 08 Nov 2017, 3 commits
  16. 05 Nov 2017, 2 commits
  17. 03 Nov 2017, 1 commit
  18. 28 Oct 2017, 1 commit
  19. 21 Oct 2017, 1 commit
  20. 18 Oct 2017, 1 commit
    • bpf: cpumap xdp_buff to skb conversion and allocation · 1c601d82
      Committed by Jesper Dangaard Brouer
      This patch makes cpumap functional, by adding SKB allocation and
      invoking the network stack on the dequeuing CPU.
      
      For constructing the SKB on the remote CPU, the xdp_buff is converted
      into a struct xdp_pkt, which is mapped into the top headroom of the
      packet, to avoid allocating separate memory.  For now, struct xdp_pkt is
      just a cpumap-internal data structure, carrying info from
      enqueue to dequeue.
      
      If a driver doesn't provide enough headroom, the packet is simply dropped,
      with return code -EOVERFLOW.  This will be picked up by the xdp tracepoint
      infrastructure, allowing users to catch this.
      
      V2: take into account xdp->data_meta
      
      V4:
       - Drop busypoll tricks, keeping it simpler.
       - Skip RPS and Generic-XDP-recursive-reinjection, suggested by Alexei
      
      V5: correct RCU read protection around __netif_receive_skb_core.
      
      V6: Setting TASK_RUNNING vs TASK_INTERRUPTIBLE based on talk with Rik van Riel
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
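
      The "reuse the headroom" step can be sketched in userspace C, assuming a buffer laid out as [start ... data ... data+len] whose start is suitably aligned. Metadata is written at the very top of the headroom, so no separate allocation is needed. Too little headroom yields -EOVERFLOW, matching the return code the commit names. struct pkt_meta and convert_to_meta() are hypothetical; struct pkt_meta models only a few of the fields a struct like xdp_pkt might carry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enqueue-to-dequeue metadata (fields for illustration). */
struct pkt_meta {
    void    *data;     /* start of packet data */
    uint16_t len;      /* packet length */
    uint16_t headroom; /* headroom remaining after the metadata */
};

/* Place metadata at buf_start (assumed aligned for struct pkt_meta).
 * On success stores the pointer in *out and returns 0; if the driver
 * left less headroom than the metadata needs, returns -EOVERFLOW and
 * the caller is expected to drop the packet. */
static int convert_to_meta(void *buf_start, void *data, uint16_t len,
                           struct pkt_meta **out)
{
    size_t headroom = (size_t)((char *)data - (char *)buf_start);
    if (headroom < sizeof(struct pkt_meta))
        return -EOVERFLOW;
    struct pkt_meta *m = buf_start;   /* reuse headroom: no allocation */
    m->data = data;
    m->len = len;
    m->headroom = (uint16_t)(headroom - sizeof(*m));
    *out = m;
    return 0;
}
```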
  21. 17 Oct 2017, 1 commit
    • tun: call dev_get_valid_name() before register_netdevice() · 0ad646c8
      Committed by Cong Wang
      register_netdevice() could fail early when we have an invalid
      dev name, in which case ->ndo_uninit() is not called. For a tun
      device, this is a problem because a timer etc. are already
      initialized and it expects ->ndo_uninit() to clean them up.
      
      We could move these initializations into a ->ndo_init() so
      that register_netdevice() knows better, however this is still
      complicated due to the logic in tun_detach().
      
      Therefore, I chose to just call dev_get_valid_name() before
      register_netdevice(), which is quicker and much easier to audit.
      And for this specific case, it is already enough.
      
      Fixes: 96442e42 ("tuntap: choose the txq based on rxq")
      Reported-by: Dmitry Alexeev <avekceeb@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
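
      The following sketch shows why validating first helps: if the name check runs before any per-device resources are set up, a bad name fails with nothing left to clean up. The rules in name_is_valid() are modeled on those the kernel's dev_valid_name() applies: the name is non-empty, shorter than IFNAMSIZ, not "." or "..", and contains no '/', ':' or whitespace. The real dev_get_valid_name() additionally expands "%d" templates and rejects names already in use. struct fake_tun, tun_create() and the live_timers counter are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define IFNAMSIZ 16  /* same value as Linux's IFNAMSIZ */

static bool name_is_valid(const char *name)
{
    size_t len = 0;
    while (len < IFNAMSIZ && name[len] != '\0')
        len++;
    if (len == 0 || len == IFNAMSIZ)      /* empty, or no room for NUL */
        return false;
    if (!strcmp(name, ".") || !strcmp(name, ".."))
        return false;
    for (; *name; name++)
        if (*name == '/' || *name == ':' || isspace((unsigned char)*name))
            return false;
    return true;
}

/* Hypothetical device owning a resource that needs explicit teardown. */
struct fake_tun { bool timer_armed; char name[IFNAMSIZ]; };

static int live_timers; /* timers initialized and not yet cleaned up */

static int tun_create(struct fake_tun *t, const char *name)
{
    if (!name_is_valid(name))  /* validate BEFORE touching resources */
        return -EINVAL;
    strcpy(t->name, name);     /* safe: length checked above */
    t->timer_armed = true;     /* stands in for timer/queue init */
    live_timers++;
    return 0;
}
```

      Because a rejected name returns before the timer is armed, no uninit-style cleanup path is needed for that failure.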
  22. 05 Oct 2017, 3 commits
  23. 04 Oct 2017, 1 commit
  24. 01 Oct 2017, 1 commit
  25. 02 Sep 2017, 1 commit
  26. 01 Sep 2017, 1 commit
  27. 30 Aug 2017, 1 commit
  28. 29 Aug 2017, 1 commit
    • smp: Avoid using two cache lines for struct call_single_data · 966a9671
      Committed by Ying Huang
      struct call_single_data is used in IPIs to transfer information between
      CPUs.  Its size is bigger than sizeof(unsigned long) and smaller than the
      cache line size.  Currently it is not allocated with any explicit alignment
      requirements.  This makes it possible for an allocated call_single_data to
      cross two cache lines, which doubles the number of cache lines
      that need to be transferred among CPUs.
      
      This can be fixed by requiring call_single_data to be aligned with the
      size of call_single_data. Currently the size of call_single_data is a
      power of 2.  If we add new fields to call_single_data, we may need to
      add padding to make sure the size of the new definition is a power of 2
      as well.
      
      Fortunately, this is enforced by GCC, which will report bad sizes.
      
      To set the alignment requirement of call_single_data to the size of
      call_single_data, a struct definition and a typedef are used.
      
      To test the effect of the patch, I used the vm-scalability multiple
      thread swap test case (swap-w-seq-mt).  The test will create multiple
      threads and each thread will eat memory until all RAM and part of swap
      is used, so that a huge number of IPIs are triggered when unmapping
      memory.  In the test, the throughput of memory writing improves ~5%
      compared with misaligned call_single_data, because of faster IPIs.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Huang, Ying <ying.huang@intel.com>
      [ Add call_single_data_t and align with size of call_single_data. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
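
      A userspace analog of the struct-plus-typedef approach follows, assuming GCC or Clang and a 64-byte cache line. The fields of struct csd are illustrative rather than the kernel's. The typedef is aligned to the struct's own size. GCC rejects an aligned() value that is not a power of 2, which is the build-time size check the commit relies on. crosses_cache_line() is a hypothetical helper that shows what the alignment buys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size */

/* Userspace analog of struct call_single_data (fields illustrative). */
struct csd {
    void *next;                 /* list linkage */
    void (*func)(void *info);   /* function to run on the target CPU */
    void *info;
    unsigned int flags;
};

_Static_assert((sizeof(struct csd) & (sizeof(struct csd) - 1)) == 0,
               "struct csd size must be a power of 2");

/* Align the typedef to the struct's own size (a power of 2, <= 64 here),
 * so an object of this type can never straddle a cache-line boundary. */
typedef struct csd csd_t __attribute__((aligned(sizeof(struct csd))));

/* True if an object of 'size' bytes at 'addr' spans two cache lines. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
    return addr / CACHE_LINE != (addr + size - 1) / CACHE_LINE;
}
```

      An unaligned struct csd placed at byte 56 of a line straddles two lines. Every element of a csd_t array stays within one line.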
  29. 28 Aug 2017, 1 commit
    • netfilter: convert hook list to an array · 960632ec
      Committed by Aaron Conole
      This converts the storage and layout of netfilter hook entries from a
      linked list to an array.  After this commit, hook entries will be
      stored adjacently in memory.  The next pointer is no longer required.
      
      The ops pointers are stored at the end of the array as they are only
      used in the register/unregister path and in the legacy br_netfilter code.
      
      nf_unregister_net_hooks() is slower than needed as it just calls
      nf_unregister_net_hook() in a loop (i.e. at least n synchronize_net()
      calls); this will be addressed in a follow-up patch.
      
      Test setup:
       - ixgbe 10gbit
       - netperf UDP_STREAM, 64 byte packets
       - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):
      empty mangle and raw prerouting, mangle and filter input hooks: 353.9
      this patch: 364.2
      Signed-off-by: Aaron Conole <aconole@bytheb.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
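
      The array layout can be sketched as a single allocation, under stated assumptions. A count is followed by contiguous (hook, priv) entries that the per-packet path walks, and the cold ops pointers are stored after the last entry, so they do not share cache lines with the hot data. Every name here (struct hook_entries, entries_ops(), run_hooks(), VERDICT_*) is hypothetical. The sketch omits anything else the kernel keeps in that trailer, such as RCU bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { VERDICT_DROP, VERDICT_ACCEPT };

typedef int hookfn(void *priv, int pkt);

struct hook_entry { hookfn *hook; void *priv; };  /* hot: used per packet */
struct hook_ops   { int priority; };              /* cold: register path only */

struct hook_entries {
    unsigned short num;
    struct hook_entry hooks[];  /* followed in memory by num ops pointers */
};

/* The ops pointer array begins right after the last entry. */
static const struct hook_ops **entries_ops(struct hook_entries *e)
{
    return (const struct hook_ops **)(e->hooks + e->num);
}

/* One allocation: header, num entries, then num ops pointers. */
static struct hook_entries *entries_alloc(unsigned short num)
{
    struct hook_entries *e = malloc(sizeof(*e)
                                    + num * sizeof(struct hook_entry)
                                    + num * sizeof(const struct hook_ops *));
    if (e)
        e->num = num;
    return e;
}

/* Walk entries in array order; stop at the first non-accept verdict. */
static int run_hooks(const struct hook_entries *e, int pkt)
{
    for (unsigned short i = 0; i < e->num; i++) {
        int v = e->hooks[i].hook(e->hooks[i].priv, pkt);
        if (v != VERDICT_ACCEPT)
            return v;
    }
    return VERDICT_ACCEPT;
}

/* Two sample hooks for illustration. */
static int count_hook(void *priv, int pkt) { (void)pkt; ++*(int *)priv; return VERDICT_ACCEPT; }
static int drop_odd_hook(void *priv, int pkt) { (void)priv; return pkt % 2 ? VERDICT_DROP : VERDICT_ACCEPT; }
```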
  30. 19 Aug 2017, 3 commits