1. 20 4月, 2018 2 次提交
  2. 13 4月, 2018 1 次提交
  3. 24 3月, 2018 1 次提交
    • J
      virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS · bda7fab5
      Jay Vosburgh 提交于
      The operstate update logic will leave an interface in the
      default UNKNOWN operstate if the interface carrier state never changes
      from the default carrier up state set at creation.  This includes the
      case of an explicit call to netif_carrier_on, as the carrier on to on
      transition has no effect on operstate.
      
      	This affects virtio-net for the case that the virtio peer does
      not support VIRTIO_NET_F_STATUS (the feature that provides carrier state
      updates).  Without this feature, the virtio specification states that
      "the link should be assumed active," so, logically, the operstate should
      be UP instead of UNKNOWN.  This has impact on user space applications
      that use the operstate to make availability decisions for the interface.
      
      	Resolve this by changing the virtio probe logic slightly to call
      netif_carrier_off for both the "with" and "without" VIRTIO_NET_F_STATUS
      cases, and then the existing call to netif_carrier_on for the "without"
      case will cause an operstate transition.
      
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bda7fab5
  4. 05 3月, 2018 2 次提交
  5. 01 3月, 2018 1 次提交
  6. 22 2月, 2018 4 次提交
    • J
      virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP · 8dcc5b0a
      Jesper Dangaard Brouer 提交于
      When a driver implements the ndo_xdp_xmit() function, there is
      (currently) no generic way to determine whether it is safe to call.
      
      It is e.g. unsafe to call the drivers ndo_xdp_xmit, if it have not
      allocated the needed XDP TX queues yet.  This is the case for
      virtio_net, which first allocates the XDP TX queues once an XDP/bpf
      prog is attached (in virtnet_xdp_set()).
      
      Thus, a crash will occur for virtio_net when redirecting to another
      virtio_net device's ndo_xdp_xmit, which have not attached a XDP prog.
      The sample xdp_redirect_map tries to attach a dummy XDP prog to take
      this into account, but it can also easily fail if the virtio_net (or
      actually underlying vhost driver) have not allocated enough extra
      queues for the device.
      
      Allocating more queue this is currently a manual config.
      Hint for libvirt XML add:
      
        <driver name='vhost' queues='16'>
          <host mrg_rxbuf='off'/>
          <guest tso4='off' tso6='off' ecn='off' ufo='off'/>
        </driver>
      
      The solution in this patch is to check that the device have loaded an
      XDP/bpf prog before proceeding.  This is similar to the check
      performed in driver ixgbe.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dcc5b0a
    • J
      virtio_net: fix memory leak in XDP_REDIRECT · 11b7d897
      Jesper Dangaard Brouer 提交于
      XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons
      (which can be inspected by tracepoints). The current semantics is that
      on failure the driver calling xdp_do_redirect() must handle freeing or
      recycling the page associated with this frame.  This can be seen as an
      optimization, as drivers usually have an optimized XDP_DROP code path
      for frame recycling in place already.
      
      The virtio_net driver didn't handle when xdp_do_redirect() failed.
      This caused a memory leak as the page refcnt wasn't decremented on
      failures.
      
      The function __virtnet_xdp_xmit() did handle one type of failure,
      when the xmit queue virtqueue_add_outbuf() is full, which "hides"
      releasing a refcnt on the page.  Instead the function __virtnet_xdp_xmit()
      must follow API of xdp_do_redirect(), which on errors leave it up to
      the caller to free the page, of the failed send operation.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11b7d897
    • J
      virtio_net: fix XDP code path in receive_small() · 95dbe9e7
      Jesper Dangaard Brouer 提交于
      When configuring virtio_net to use the code path 'receive_small()',
      in-order to get correct XDP_REDIRECT support, I discovered TCP packets
      would get silently dropped when loading an XDP program action XDP_PASS.
      
      The bug seems to be that receive_small() when XDP is loaded check that
      hdr->hdr.flags is zero, which seems wrong as hdr.flags contains the
      flags VIRTIO_NET_HDR_F_* :
       #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
       #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
      
      TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set.
      
      The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags
      stored in hdr->hdr.gso_type. Thus, the fix is just check that none of
      the gso_type flags have been set.
      
      Fixes: bb91accf ("virtio-net: XDP support for small buffers")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95dbe9e7
    • J
      virtio_net: disable XDP_REDIRECT in receive_mergeable() case · 7324f539
      Jesper Dangaard Brouer 提交于
      The virtio_net code have three different RX code-paths in receive_buf().
      Two of these code paths can handle XDP, but one of them is broken for
      at least XDP_REDIRECT.
      
      Function(1): receive_big() does not support XDP.
      Function(2): receive_small() support XDP fully and uses build_skb().
      Function(3): receive_mergeable() broken XDP_REDIRECT uses napi_alloc_skb().
      
      The simple explanation is that receive_mergeable() is broken because
      it uses napi_alloc_skb(), which violates XDP given XDP assumes packet
      header+data in single page and enough tail room for skb_shared_info.
      
      The longer explaination is that receive_mergeable() tries to
      work-around and satisfy these XDP requiresments e.g. by having a
      function xdp_linearize_page() that allocates and memcpy RX buffers
      around (in case packet is scattered across multiple rx buffers).  This
      does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
      we have not implemented bpf_xdp_adjust_tail yet).
      
      The XDP_REDIRECT action combined with cpumap is broken, and cause hard
      to debug crashes.  The main issue is that the RX packet does not have
      the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing
      skb_shared_info to overlap the next packets head-room (in which cpumap
      stores info).
      
      Reproducing depend on the packet payload length and if RX-buffer size
      happened to have tail-room for skb_shared_info or not.  But to make
      this even harder to troubleshoot, the RX-buffer size is runtime
      dynamically change based on an Exponentially Weighted Moving Average
      (EWMA) over the packet length, when refilling RX rings.
      
      This patch only disable XDP_REDIRECT support in receive_mergeable()
      case, because it can cause a real crash.
      
      IMHO we should consider NOT supporting XDP in receive_mergeable() at
      all, because the principles behind XDP are to gain speed by (1) code
      simplicity, (2) sacrificing memory and (3) where possible moving
      runtime checks to setup time.  These principles are clearly being
      violated in receive_mergeable(), that e.g. runtime track average
      buffer size to save memory consumption.
      
      In the longer run, we should consider introducing a separate receive
      function when attaching an XDP program, and also change the memory
      model to be compatible with XDP when attaching an XDP prog.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7324f539
  7. 23 1月, 2018 1 次提交
    • T
      virtio_net: Add ethtool stats · d7dfc5cf
      Toshiaki Makita 提交于
      The main purpose of this patch is adding a way of checking per-queue stats.
      It's useful to debug performance problems on multiqueue environment.
      
      $ ethtool -S ens10
      NIC statistics:
           rx_queue_0_packets: 2090408
           rx_queue_0_bytes: 3164825094
           rx_queue_1_packets: 2082531
           rx_queue_1_bytes: 3152932314
           tx_queue_0_packets: 2770841
           tx_queue_0_bytes: 4194955474
           tx_queue_1_packets: 3084697
           tx_queue_1_bytes: 4670196372
      
      This change converts existing per-cpu stats structure into per-queue one.
      This should not impact on performance since each queue counter is not
      updated concurrently by multiple cpus.
      
      Performance numbers:
       - Guest has 2 vcpus and 2 queues
       - Guest runs netserver
       - Host runs 100-flow super_netperf
      
                           Before      After       Diff
      UDP_STREAM 18byte        86.22       87.00   +0.90%
      UDP_STREAM 1472byte    4055.27     4042.18   -0.32%
      TCP_STREAM            16956.32    16890.63   -0.39%
      UDP_RR               178667.11   185862.70   +4.03%
      TCP_RR               128473.04   124985.81   -2.71%
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7dfc5cf
  8. 10 1月, 2018 1 次提交
    • J
      virtio_net: propagate linkspeed/duplex settings from the hypervisor · faa9b39f
      Jason Baron 提交于
      The ability to set speed and duplex for virtio_net is useful in various
      scenarios as described here:
      
      16032be5 virtio_net: add ethtool support for set and get of settings
      
      However, it would be nice to be able to set this from the hypervisor,
      such that virtio_net doesn't require custom guest ethtool commands.
      
      Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
      the hypervisor to export a linkspeed and duplex setting. The user can
      subsequently overwrite it later if desired via: 'ethtool -s'.
      
      Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention
      is that device feature bits are to grow down from bit 63, since the
      transports are starting from bit 24 and growing up.
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: virtio-dev@lists.oasis-open.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      faa9b39f
  9. 06 1月, 2018 1 次提交
    • J
      virtio_net: setup xdp_rxq_info · 754b8a21
      Jesper Dangaard Brouer 提交于
      The virtio_net driver doesn't dynamically change the RX-ring queue
      layout and backing pages, but instead reject XDP setup if all the
      conditions for XDP is not meet.  Thus, the xdp_rxq_info also remains
      fairly static.  This allow us to simply add the reg/unreg to
      net_device open/close functions.
      
      Driver hook points for xdp_rxq_info:
       * reg  : virtnet_open
       * unreg: virtnet_close
      
      V3:
       - bugfix, also setup xdp.rxq in receive_mergeable()
       - Tested bpf-sample prog inside guest on a virtio_net device
      
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      754b8a21
  10. 09 12月, 2017 1 次提交
    • T
      virtio_net: Disable interrupts if napi_complete_done rescheduled napi · fdaa767a
      Toshiaki Makita 提交于
      Since commit 39e6c820 ("net: solve a NAPI race") napi has been able
      to be rescheduled within napi_complete_done() even in non-busypoll case,
      but virtnet_poll() always enabled interrupts before complete, and when
      napi was rescheduled within napi_complete_done() it did not disable
      interrupts.
      This caused more interrupts when event idx is disabled.
      
      According to commit cbdadbbf ("virtio_net: fix race in RX VQ
      processing") we cannot place virtqueue_enable_cb_prepare() after
      NAPI_STATE_SCHED is cleared, so disable interrupts again if
      napi_complete_done() returned false.
      
      Tested with vhost-user of OVS 2.7 on host, which does not have the event
      idx feature.
      
      * Before patch:
      
      $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET
      Socket  Message  Elapsed      Messages
      Size    Size     Time         Okay Errors   Throughput
      bytes   bytes    secs            #      #   10^6bits/sec
      
      212992    1472   60.00     32763206      0    6430.32
      212992           60.00     23384299           4589.56
      
      Interrupts on guest: 9872369
      Packets/interrupt:   2.37
      
      * After patch
      
      $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET
      Socket  Message  Elapsed      Messages
      Size    Size     Time         Okay Errors   Throughput
      bytes   bytes    secs            #      #   10^6bits/sec
      
      212992    1472   60.00     32794646      0    6436.49
      212992           60.00     32793501           6436.27
      
      Interrupts on guest: 4941299
      Packets/interrupt:   6.64
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fdaa767a
  11. 08 12月, 2017 1 次提交
  12. 16 11月, 2017 1 次提交
    • M
      mm: remove __GFP_COLD · 453f85d4
      Mel Gorman 提交于
      As the page free path makes no distinction between cache hot and cold
      pages, there is no real useful ordering of pages in the free list that
      allocation requests can take advantage of.  Juding from the users of
      __GFP_COLD, it is likely that a number of them are the result of copying
      other sites instead of actually measuring the impact.  Remove the
      __GFP_COLD parameter which simplifies a number of paths in the page
      allocator.
      
      This is potentially controversial but bear in mind that the size of the
      per-cpu pagelists versus modern cache sizes means that the whole per-cpu
      list can often fit in the L3 cache.  Hence, there is only a potential
      benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
      even worse when THP is taken into account which has little or no chance
      of getting a cache-hot page as the per-cpu list is bypassed and the
      zeroing of multiple pages will thrash the cache anyway.
      
      The truncate microbenchmarks are not shown as this patch affects the
      allocation path and not the free path.  A page fault microbenchmark was
      tested but it showed no sigificant difference which is not surprising
      given that the __GFP_COLD branches are a miniscule percentage of the
      fault path.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      453f85d4
  13. 05 11月, 2017 1 次提交
  14. 27 9月, 2017 1 次提交
    • D
      bpf: add meta pointer for direct access · de8f3a83
      Daniel Borkmann 提交于
      This work enables generic transfer of metadata from XDP into skb. The
      basic idea is that we can make use of the fact that the resulting skb
      must be linear and already comes with a larger headroom for supporting
      bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
      on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
      for adjusting a new pointer called xdp->data_meta. Thus, the packet has
      a flexible and programmable room for meta data, followed by the actual
      packet data. struct xdp_buff is therefore laid out that we first point
      to data_hard_start, then data_meta directly prepended to data followed
      by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
      account whether we have meta data already prepended and if so, memmove()s
      this along with the given offset provided there's enough room.
      
      xdp->data_meta is optional and programs are not required to use it. The
      rationale is that when we process the packet in XDP (e.g. as DoS filter),
      we can push further meta data along with it for the XDP_PASS case, and
      give the guarantee that a clsact ingress BPF program on the same device
      can pick this up for further post-processing. Since we work with skb
      there, we can also set skb->mark, skb->priority or other skb meta data
      out of BPF, thus having this scratch space generic and programmable
      allows for more flexibility than defining a direct 1:1 transfer of
      potentially new XDP members into skb (it's also more efficient as we
      don't need to initialize/handle each of such new members). The facility
      also works together with GRO aggregation. The scratch space at the head
      of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
      yet supporting xdp->data_meta can simply be set up with xdp->data_meta
      as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
      such that the subsequent match against xdp->data for later access is
      guaranteed to fail.
      
      The verifier treats xdp->data_meta/xdp->data the same way as we treat
      xdp->data/xdp->data_end pointer comparisons. The requirement for doing
      the compare against xdp->data is that it hasn't been modified from it's
      original address we got from ctx access. It may have a range marking
      already from prior successful xdp->data/xdp->data_end pointer comparisons
      though.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de8f3a83
  15. 23 9月, 2017 1 次提交
  16. 21 9月, 2017 3 次提交
  17. 25 8月, 2017 1 次提交
  18. 19 8月, 2017 1 次提交
  19. 17 8月, 2017 1 次提交
  20. 14 8月, 2017 1 次提交
  21. 01 8月, 2017 1 次提交
  22. 26 7月, 2017 1 次提交
    • A
      virtio-net: mark PM functions as __maybe_unused · 67a75194
      Arnd Bergmann 提交于
      After removing the reset function, the freeze and restore functions
      are now unused when CONFIG_PM_SLEEP is disabled:
      
      drivers/net/virtio_net.c:1881:12: error: 'virtnet_restore_up' defined but not used [-Werror=unused-function]
       static int virtnet_restore_up(struct virtio_device *vdev)
      drivers/net/virtio_net.c:1859:13: error: 'virtnet_freeze_down' defined but not used [-Werror=unused-function]
       static void virtnet_freeze_down(struct virtio_device *vdev)
      
      A more robust way to do this is to remove the #ifdef around the callers
      and instead mark them as __maybe_unused. The compiler will now just
      silently drop the unused code.
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67a75194
  23. 25 7月, 2017 5 次提交
  24. 18 7月, 2017 1 次提交
  25. 08 7月, 2017 1 次提交
  26. 30 6月, 2017 1 次提交
  27. 16 6月, 2017 2 次提交
  28. 05 6月, 2017 1 次提交