1. 03 Apr, 2015 1 commit
    • xen-netfront: transmit fully GSO-sized packets · 0c36820e
      Jonathan Davies authored
      xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
      GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
      bytes each. This slight reduction in the size of packets means a slight loss in
      efficiency.
      
      Since c/s 9ecd1a75, xen-netfront sets gso_max_size to
          XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
      where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
      
      The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
      6c09fa09) in determining when to split an skb into two is
          sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
      
      So the maximum permitted size of an skb is calculated to be
          (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
      
      Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
      Instead, there is no need to deviate from the default gso_max_size of 65536 as
      this already accommodates the size of the header.
      
      Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
      of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
      skbs of up to 65160 bytes (45 segments of 1448 bytes each).
      
      Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
      it relates to the size of the whole packet, including the header.
      
      Fixes: 9ecd1a75 ("xen-netfront: reduce gso_max_size to account for max TCP header")
      Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
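
The segment counts quoted in this commit message can be checked with a short calculation. The sketch below is only an illustration: it assumes MAX_TCP_HEADER is 320 bytes (the real value depends on the kernel configuration) and uses the 1448-byte segment size from the message. `max_segments` is a hypothetical helper, not a kernel function.

```python
MSS = 1448                      # segment payload size quoted in the commit message
XEN_NETIF_MAX_TX_SIZE = 65535   # from the commit message
DEFAULT_GSO_MAX_SIZE = 65536    # default gso_max_size, from the commit message
ASSUMED_MAX_TCP_HEADER = 320    # assumption: config-dependent in a real kernel


def max_segments(gso_max_size, max_tcp_header=ASSUMED_MAX_TCP_HEADER, mss=MSS):
    """Whole segments that fit under the split threshold
    sk_gso_max_size - 1 - MAX_TCP_HEADER described in the message."""
    size_goal = gso_max_size - 1 - max_tcp_header
    return size_goal // mss


# Before the patch: the header is subtracted once by netfront and once by TCP.
before = max_segments(XEN_NETIF_MAX_TX_SIZE - ASSUMED_MAX_TCP_HEADER)
# After the patch: the default gso_max_size is used unchanged.
after = max_segments(DEFAULT_GSO_MAX_SIZE)

print(before, before * MSS)  # 44 63712
print(after, after * MSS)    # 45 65160
```

Under the assumed header size the double subtraction costs exactly one segment, which matches the 63712-byte and 65160-byte figures observed via tcpdump.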
  2. 05 Feb, 2015 1 commit
  3. 14 Jan, 2015 3 commits
  4. 13 Jan, 2015 1 commit
  5. 17 Dec, 2014 1 commit
    • xen-netfront: use napi_complete() correctly to prevent Rx stalling · 6a6dc08f
      David Vrabel authored
      After d75b1ade (net: less interrupt
      masking in NAPI) the napi instance is removed from the per-cpu list
      prior to calling the n->poll(), and is only requeued if all of the
      budget was used.  This inadvertently broke netfront because netfront
      does not use NAPI correctly.
      
      If netfront had not used all of its budget it would do a final check
      for any Rx responses and avoid calling napi_complete() if there were
      more responses.  It would still return under budget so it would never
      be rescheduled.  The final check would also not re-enable the Rx
      interrupt.
      
      Additionally, xenvif_poll() would also call napi_complete() /after/
      enabling the interrupt.  This resulted in a race between the
      napi_complete() and the napi_schedule() in the interrupt handler.  The
      use of local_irq_save/restore() avoided this race iff the handler was
      running on the same CPU but not if it was running on a different CPU.
      
      Fix both of these by always calling napi_complete() if the budget was
      not all used, and then calling napi_schedule() if the final check
      says there's more work.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
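
The stall described in this commit can be reproduced with a toy model. The sketch below is not kernel code: `Napi`, `poll` and `run` are hypothetical names, the `sched` flag stands in for the NAPI scheduled state, and `queued` stands in for membership of the per-CPU poll list. It assumes that scheduling an instance that is still marked as scheduled is a no-op, which is why the fix completes first and reschedules afterwards.

```python
BUDGET = 64


class Napi:
    """Toy stand-in for a NAPI instance (names are illustrative only)."""

    def __init__(self):
        self.sched = False   # still owned by the poller
        self.queued = False  # on the poll list, waiting to be polled

    def schedule(self):
        # No-op while the instance is still marked as scheduled.
        if not self.sched:
            self.sched = True
            self.queued = True

    def complete(self):
        self.sched = False


def poll(napi, ring, late, fixed):
    work = min(BUDGET, ring["pending"])
    ring["pending"] -= work
    if work < BUDGET:
        ring["pending"] += late          # responses arriving before the final check
        more = ring["pending"] > 0       # the final check
        if fixed:
            napi.complete()              # always complete when under budget...
            if more:
                napi.schedule()          # ...then reschedule if work remains
        elif not more:
            napi.complete()              # old logic: skips complete when more work
    return work


def run(fixed):
    napi, ring = Napi(), {"pending": 3}
    napi.schedule()
    late, delivered = 2, 0
    while napi.queued:
        napi.queued = False              # removed from the list before polling
        work = poll(napi, ring, late, fixed)
        late = 0
        delivered += work
        if work == BUDGET:               # requeued only if the budget was used up
            napi.queued = True
    return delivered, ring["pending"], napi.sched


print(run(fixed=False))  # (3, 2, True): two responses stranded, instance stuck
print(run(fixed=True))   # (5, 0, False): everything delivered
```

In the old path the instance returns under budget without completing, so it is neither requeued nor reschedulable, and the late responses are never processed.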
  6. 10 Dec, 2014 1 commit
  7. 03 Dec, 2014 1 commit
  8. 27 Oct, 2014 1 commit
    • xen-netfront: always keep the Rx ring full of requests · 1f3c2eba
      David Vrabel authored
      A full Rx ring only requires 1 MiB of memory.  This is too little
      memory to make it worthwhile to dynamically scale the number of Rx
      requests in the ring based on traffic rates, because:
      
      a) Even the full 1 MiB is a tiny fraction of a typical modern Linux
         VM (for example, the AWS micro instance still has 1 GiB of memory).
      
      b) Netfront would have used up to 1 MiB already even with moderate
         data rates (there was no adjustment of target based on memory
         pressure).
      
      c) Small VMs are going to typically have one VCPU and hence only one
         queue.
      
      Keeping the ring full of Rx requests handles bursty traffic better
      than trying to converge on an optimal number of requests to keep
      filled.
      
      On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest
      improved from 5.1 Gbit/s to 5.6 Gbit/s.  Gains with more bursty
      traffic are expected to be higher.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 16 Oct, 2014 1 commit
    • net: Add ndo_gso_check · 04ffcb25
      Tom Herbert authored
      Add ndo_gso_check which a device can define to indicate whether it
      is capable of doing GSO on a packet. This function would be called from
      the stack to determine whether software GSO needs to be done. A
      driver should populate this function if it advertises GSO types for
      which there are combinations that it wouldn't be able to handle. For
      instance a device that performs UDP tunneling might only implement
      support for transparent Ethernet bridging type of inner packets
      or might have limitations on lengths of inner headers.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 06 Oct, 2014 1 commit
  11. 12 Aug, 2014 1 commit
    • xen-netfront: Fix handling packets on compound pages with skb_linearize · 97a6d1bb
      Zoltan Kiss authored
      There is a long-known problem with the netfront/netback interface: if the guest
      tries to send a packet which constitutes more than MAX_SKB_FRAGS + 1 ring slots,
      it gets dropped. The reason is that netback maps these slots to a frag in the
      frags array, which is limited in size. Having so many slots can occur since
      compound pages were introduced, as the ring protocol slices them up into
      individual (non-compound) page-aligned slots. The theoretical worst case
      scenario looks like this (note, skbs are limited to 64 KiB here):
      - linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping a page boundary,
        using 2 slots
      - first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
        end and the beginning of a page, therefore they use 3 * 15 = 45 slots
      - last 2 frags: 1 + 1 bytes, overlapping a page boundary, 2 * 2 = 4 slots
      Although I don't think this 51-slot skb can really happen, we need a solution
      which can deal with every scenario. In real life there are only a few slots
      too many, but usually it causes the TCP stream to be blocked, as the retry will
      most likely have the same buffer layout.
      This patch solves this problem by linearizing the packet. This is not the
      fastest way, and it can fail much more easily as it tries to allocate a big linear
      area for the whole packet, but it is probably easier by an order of magnitude than
      anything else. This code path is probably not touched very frequently anyway.
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Paul Durrant <paul.durrant@citrix.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
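
The 51-slot worst case in this commit message can be verified by counting how many page-aligned slots each buffer touches. The sketch below assumes 4 KiB pages and places every buffer so that it starts on the last byte of a page; `slots` is a hypothetical helper written for this illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def slots(offset, length):
    """Number of page-aligned ring slots touched by `length` bytes
    starting at byte `offset` (length must be positive)."""
    start = offset % PAGE_SIZE
    return (start + length + PAGE_SIZE - 1) // PAGE_SIZE


last_byte = PAGE_SIZE - 1  # start on the final byte of a page

# (offset, length) pairs following the worst-case layout in the message.
linear = (last_byte, PAGE_SIZE - 17 * 2)
frags = [(last_byte, 1 + PAGE_SIZE + 1)] * 15 + [(last_byte, 1 + 1)] * 2

total_slots = slots(*linear) + sum(slots(off, length) for off, length in frags)
total_bytes = linear[1] + sum(length for _, length in frags)

print(total_slots)  # 51
print(total_bytes)  # 65536
```

The byte total comes to exactly 64 KiB, which is why the linear buffer in the worst case is sized as PAGE_SIZE - 17 * 2: the 17 frags each carry 2 bytes beyond a whole number of pages.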
  12. 01 Aug, 2014 3 commits
  13. 09 Jul, 2014 2 commits
  14. 22 Jun, 2014 2 commits
  15. 05 Jun, 2014 3 commits
  16. 14 May, 2014 1 commit
  17. 13 Apr, 2014 1 commit
  18. 25 Mar, 2014 1 commit
  19. 15 Mar, 2014 1 commit
  20. 20 Feb, 2014 1 commit
    • xen-netfront: reset skb network header before checksum · d554f73d
      Wei Liu authored
      In ed1f50c3 ("net: add skb_checksum_setup") we introduced some checksum
      functions in core driver. Subsequent change b5cf66cd ("xen-netfront:
      use new skb_checksum_setup function") made use of those functions to
      replace its own implementation.
      
      However, with that change netfront is broken. It sees a lot of checksum
      errors. That's because its own implementation of the checksum function was a
      bit hacky (dereferencing skb->data directly) while the new function was
      implemented using ip_hdr(). The network header is not reset before the skb
      is passed to the new function. When the new function tries to do its
      job, it's confused and reports an error.
      
      The fix is simple, we need to reset network header before passing skb to
      checksum function. Netback is not affected as it already does the right
      thing.
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Paul Durrant <paul.durrant@citrix.com>
      Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 15 Feb, 2014 1 commit
  22. 05 Feb, 2014 1 commit
  23. 28 Jan, 2014 1 commit
    • xen-netfront: fix resource leak in netfront · cefe0078
      Annie Li authored
      This patch removes grant transfer releasing code from netfront, and uses
      gnttab_end_foreign_access to end grant access since
      gnttab_end_foreign_access_ref may fail when the grant entry is
      currently used for reading or writing.
      
      * clean up grant transfer code kept from old netfront(2.6.18) which grants
      pages for access/map and transfer. But grant transfer is deprecated in current
      netfront, so remove corresponding release code for transfer.
      
      * fix resource leak, release grant access (through gnttab_end_foreign_access)
      and skb for tx/rx path, use get_page to ensure page is released when grant
      access is completed successfully.
      
      Xen-blkfront/xen-tpmfront/xen-pcifront also have a similar issue, but patches
      for them will be created separately.
      
      V6: Correct subject line and commit message.
      
      V5: Remove unnecessary change in xennet_end_access.
      
      V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
      single patch.
      
      V3: Changes as suggested by David Vrabel, ensure pages are not freed until
      grant access is ended.
      
      V2: Improve patch comments.
      Signed-off-by: Annie Li <annie.li@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 17 Jan, 2014 1 commit
  25. 15 Jan, 2014 1 commit
  26. 04 Jan, 2014 1 commit
    • xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4). · 51c71a3b
      Konrad Rzeszutek Wilk authored
      The user has the option of disabling the platform driver:
      00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
      
      which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
      and allow the PV drivers to take over. If the user wishes
      to disable that they can set:
      
        xen_platform_pci=0
        (in the guest config file)
      
      or
        xen_emul_unplug=never
        (on the Linux command line)
      
      except it does not work properly. The PV drivers still try to
      load and since the Xen platform driver is not run - and it
      has not initialized the grant tables, most of the PV drivers
      stumble upon:
      
      input: Xen Virtual Keyboard as /devices/virtual/input/input5
      input: Xen Virtual Pointer as /devices/virtual/input/input6M
      ------------[ cut here ]------------
      kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
      CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1
      Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
      RIP: 0010:[<ffffffff813ddc40>]  [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300
      Call Trace:
       [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
       [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
       [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
       [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
       [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
       [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
       [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230
       [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
       [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
       [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
       [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
       [<ffffffff8145e7d9>] driver_attach+0x19/0x20
       [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220
       [<ffffffff8145f1ff>] driver_register+0x5f/0xf0
       [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
       [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
       [<ffffffffa0015000>] ? 0xffffffffa0014fff
       [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
       [<ffffffff81002049>] do_one_initcall+0x49/0x170
      
      .. snip..
      
      which is hardly nice. This patch fixes this by having each
      PV driver check for:
       - if running in PV, then it is fine to execute (as that is their
         native environment).
       - if running in HVM, check if user wanted 'xen_emul_unplug=never',
         in which case bail out and don't load any PV drivers.
       - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
         does not exist, then bail out and do not load PV drivers.
       - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
         then bail out for all PV devices _except_ the block one.
         Ditto for the network one ('nics').
       - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
         then load block PV driver, and also setup the legacy IDE paths.
         In (v3) make it actually load PV drivers.
      
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Reported-by: Anthony PERARD <anthony.perard@citrix.com>
      Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v2: Add extra logic to handle the myriad ways 'xen_emul_unplug'
      can be used per Ian and Stefano suggestion]
      [v3: Make the unnecessary case work properly]
      [v4: s/disks/ide-disks/ spotted by Fabio]
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
      CC: stable@vger.kernel.org
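
The per-driver checks listed in this commit message can be read as a small decision function. The sketch below is one possible reading of that list, not the kernel's implementation: `pv_driver_may_load`, its parameters and the driver labels "block" and "net" are hypothetical, and the order in which the checks are applied is an assumption.

```python
def pv_driver_may_load(driver, hvm, platform_pci_present=True, unplug=None):
    """Return True if a PV frontend should load, following the list above.

    driver: "block", "net", or any other PV frontend name (illustrative labels)
    hvm: True for an HVM guest, False for a PV guest
    platform_pci_present: whether PCI device 5853:0001 exists
    unplug: value of xen_emul_unplug on the command line, or None if unset
    """
    if not hvm:
        return True                      # PV is the drivers' native environment
    if unplug == "never":
        return False                     # user asked to keep emulated devices
    if not platform_pci_present:
        return False                     # xen_platform_pci=0: grant tables not set up
    if unplug == "ide-disks":
        return driver == "block"         # only the block frontend may load
    if unplug == "nics":
        return driver == "net"           # only the network frontend may load
    return True                          # includes 'unnecessary' (v3 behaviour)


print(pv_driver_may_load("kbd", hvm=True, platform_pci_present=False))  # False
print(pv_driver_may_load("net", hvm=True, unplug="ide-disks"))          # False
print(pv_driver_may_load("block", hvm=True, unplug="ide-disks"))        # True
```

The first call corresponds to the crash scenario in the message: with the platform device disabled, the keyboard frontend should decline to load instead of hitting the BUG in the grant-table code.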
  27. 19 Nov, 2013 1 commit
    • xen-netfront: fix missing rx_refill_timer when allocate memory failed · fdcf7765
      Ma JieYue authored
      There was a bug in xennet_alloc_rx_buffers: when allocating a page or
      sk_buff failed while the rx_batch queue was not empty, the
      rx_refill_timer timer was not scheduled. If the number of request
      buffers remaining in the rx ring then fell below what the backend driver
      expected, the backend driver would treat the rx ring as full and start
      dropping packets. In that situation the netfront driver had no way to
      recover automatically, so the device could not work properly.
      
      The patch fixes the problem by always scheduling the rx_refill_timer timer when
      alloc_page or __netdev_alloc_skb fails, whether or not the rx_batch queue is
      empty. It ensures that the rx ring request buffers will eventually meet
      the backend's needs.
      Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 06 Nov, 2013 1 commit
    • net: Explicitly initialize u64_stats_sync structures for lockdep · 827da44c
      John Stultz authored
      In order to enable lockdep on seqcount/seqlock structures, we
      must explicitly initialize any locks.
      
      The u64_stats_sync structure uses a seqcount, and thus we need
      to introduce a u64_stats_init() function and use it to initialize
      the structure.
      
      This unfortunately adds a lot of fairly trivial initialization code
      to a number of drivers. But the benefit of ensuring correctness makes
      this worthwhile.
      
      Because these changes are required for lockdep to be enabled, and the
      changes are quite trivial, I've not yet split this patch out into 30-some
      separate patches, as I figured it would be better to get the various
      maintainers' thoughts on how to best merge this change along with
      the seqcount lockdep enablement.
      
      Feedback would be appreciated!
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Roger Luethi <rl@hellgate.ch>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  29. 03 Oct, 2013 1 commit
  30. 18 Jul, 2013 1 commit
    • xen-netfront: pull on receive skb may need to happen earlier · 093b9c71
      Jan Beulich authored
      Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure
      linear area is big enough on RX") xennet_fill_frags() may end up
      filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce
      the fragment count subsequently via __pskb_pull_tail(). That's a
      result of xennet_get_responses() allowing a maximum of one more slot to
      be consumed (and intermediately transformed into a fragment) if the
      head slot has a size less than or equal to RX_COPY_THRESHOLD.
      
      Hence we need to adjust xennet_fill_frags() to pull earlier if we
      reached the maximum fragment count - due to the described behavior of
      xennet_get_responses() this guarantees that at least the first fragment
      will get completely consumed, and hence the fragment count reduced.
      
      In order to not needlessly call __pskb_pull_tail() twice, make the
      original call conditional upon the pull target not having been reached
      yet, and defer the newly added one as much as possible (an alternative
      would have been to always call the function right before the call to
      xennet_fill_frags(), but that would imply more frequent cases of
      needing to call it twice).
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: stable@vger.kernel.org (3.6 onwards)
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 02 Jul, 2013 1 commit
  32. 11 Jun, 2013 1 commit