1. 26 2月, 2016 2 次提交
    • D
      net: ethtool: add new ETHTOOL_xLINKSETTINGS API · 3f1ac7a7
      David Decotigny 提交于
      This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
      handled by the new get_link_ksettings/set_link_ksettings callbacks.
      This API provides support for most legacy ethtool_cmd fields, adds
      support for larger link mode masks (up to 4064 bits, variable length),
      and removes ethtool_cmd deprecated
      fields (transceiver/maxrxpkt/maxtxpkt).
      
      This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
      the following backward compatibility properties:
       - legacy ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks.
       - legacy ethtool with new get/set_link_ksettings drivers: the new
         driver callbacks are used, data internally converted to legacy
         ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
         mode mask. ETHTOOL_SSET will fail if user tries to set the
         ethtool_cmd deprecated fields to
         non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
         driver sets higher bits.
       - future ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks, internally converted to new data
         structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
         ignored and seen as 0 from user space. Note that that "future"
         ethtool tool will not allow changes to these deprecated fields.
       - future ethtool with new drivers: direct call to the new callbacks.
      
      By "future" ethtool, what is meant is:
       - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
         fails
       - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
         ETHTOOL_GSET was successful
         + if ETHTOOL_GLINKSETTINGS was successful, then change config with
           ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
           ETHTOOL_SSET).
         + otherwise ETHTOOL_GSET was successful, change config with
           ETHTOOL_SSET. A failure there is final (do not try
           ETHTOOL_SLINKSETTINGS).
      
      The interaction user/kernel via the new API requires a small
      ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
      mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
      length it is expecting from user as a negative length (and cmd field is
      0). When kernel and user agree, kernel returns valid info in all
      fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).
      
      Data structure crossing user/kernel boundary is 32/64-bit
      agnostic. Converted internally to a legal kernel bitmap.
      
      The internal __ethtool_get_settings kernel helper will gradually be
      replaced by __ethtool_get_link_ksettings by the time the first
      "link_settings" drivers start to appear. So this patch doesn't change
      it, it will be removed before it needs to be changed.
      Signed-off-by: NDavid Decotigny <decot@googlers.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f1ac7a7
    • T
      net: Facility to report route quality of connected sockets · a87cb3e4
      Tom Herbert 提交于
      This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
      purpose is to allow an application to give feedback to the kernel about
      the quality of the network path for a connected socket. The value
      argument indicates the type of quality report. For this initial patch
      the only supported advice is a value of 1 which indicates "bad path,
      please reroute"-- the action taken by the kernel is to call
      dst_negative_advice which will attempt to choose a different ECMP route,
      reset the TX hash for flow label and UDP source port in encapsulation,
      etc.
      
      This facility should be useful for connected UDP sockets where only the
      application can provide any feedback about path quality. It could also
      be useful for TCP applications that have additional knowledge about the
      path outside of the normal TCP control loop.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a87cb3e4
  2. 25 2月, 2016 4 次提交
  3. 22 2月, 2016 6 次提交
  4. 20 2月, 2016 5 次提交
    • D
      net: use skb_postpush_rcsum instead of own implementations · 6b83d28a
      Daniel Borkmann 提交于
      Replace individual implementations with the recently introduced
      skb_postpush_rcsum() helper.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b83d28a
    • K
      net/ethtool: support set coalesce per queue · f38d138a
      Kan Liang 提交于
      This patch implements sub command ETHTOOL_SCOALESCE for ioctl
      ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to
      set coalesce of each masked queue to device driver. The wanted coalesce
      information are stored in "data" for each masked queue, which can copy
      from userspace.
      If it fails to set coalesce to device driver, the value which already
      set to specific queue will be tried to rollback.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Reviewed-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f38d138a
    • K
      net/ethtool: support get coalesce per queue · 421797b1
      Kan Liang 提交于
      This patch implements sub command ETHTOOL_GCOALESCE for ioctl
      ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to
      get coalesce of each masked queue from device driver. Then the interrupt
      coalescing parameters will be copied back to user space one by one.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Reviewed-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      421797b1
    • K
      net/ethtool: introduce a new ioctl for per queue setting · ac2c7ad0
      Kan Liang 提交于
      Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
      The following patches will enable some SUB_COMMANDs for per queue
      setting.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Reviewed-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac2c7ad0
    • N
      net: make netdev_for_each_lower_dev safe for device removal · cfdd28be
      Nikolay Aleksandrov 提交于
      When I used netdev_for_each_lower_dev in commit bad53162 ("vrf:
      remove slave queue and private slave struct") I thought that it acts
      like netdev_for_each_lower_private and can be used to remove the current
      device from the list while walking, but unfortunately it acts more like
      netdev_for_each_lower_private_rcu and doesn't allow it. The difference
      is where the "iter" points to, right now it points to the current element
      and that makes it impossible to remove it. Change the logic to be
      similar to netdev_for_each_lower_private and make it point to the "next"
      element so we can safely delete the current one. VRF is the only such
      user right now, there's no change for the read-only users.
      
      Here's what can happen now:
      [98423.249858] general protection fault: 0000 [#1] SMP
      [98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
      sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
      gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
      parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
      9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
      virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
      ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
      virtio_ring virtio scsi_mod [last unloaded: bridge]
      [98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G           O
      4.5.0-rc2+ #81
      [98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.8.1-20150318_183358- 04/01/2014
      [98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
      ffff88003428c000
      [98423.256123] RIP: 0010:[<ffffffff81514f3e>]  [<ffffffff81514f3e>]
      netdev_lower_get_next+0x1e/0x30
      [98423.256534] RSP: 0018:ffff88003428f940  EFLAGS: 00010207
      [98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
      0000000000000000
      [98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
      ffff880054ff90c0
      [98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
      0000000000000000
      [98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
      ffff88003428f9e0
      [98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
      0000000000000001
      [98423.258055] FS:  00007f3d76881700(0000) GS:ffff88005d000000(0000)
      knlGS:0000000000000000
      [98423.258418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
      00000000000406e0
      [98423.258902] Stack:
      [98423.259075]  ffff88003428f960 ffffffffa0442636 0002000100000004
      ffff880054ff9000
      [98423.259647]  ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
      ffff88003428f978
      [98423.260208]  ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
      ffff880035b35f00
      [98423.260739] Call Trace:
      [98423.260920]  [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf]
      [98423.261156]  [<ffffffff81518205>]
      rollback_registered_many+0x205/0x390
      [98423.261401]  [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70
      [98423.261641]  [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50
      [98423.271557]  [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0
      [98423.271800]  [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90
      [98423.272049]  [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200
      [98423.272279]  [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10
      [98423.272513]  [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40
      [98423.272755]  [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40
      [98423.272983]  [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0
      [98423.273209]  [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40
      [98423.273476]  [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0
      [98423.273710]  [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610
      [98423.273947]  [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70
      [98423.274175]  [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0
      [98423.274416]  [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140
      [98423.274658]  [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210
      [98423.274894]  [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210
      [98423.275130]  [<ffffffff81269611>] ? __fget_light+0x91/0xb0
      [98423.275365]  [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80
      [98423.275595]  [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20
      [98423.275827]  [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
      90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
      89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
      [98423.279639] RIP  [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30
      [98423.279920]  RSP <ffff88003428f940>
      
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      Fixes: bad53162 ("vrf: remove slave queue and private slave struct")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfdd28be
  5. 19 2月, 2016 1 次提交
    • P
      IFF_NO_QUEUE: Fix for drivers not calling ether_setup() · a813104d
      Phil Sutter 提交于
      My implementation around IFF_NO_QUEUE driver flag assumed that leaving
      tx_queue_len untouched (specifically: not setting it to zero) by drivers
      would make it possible to assign a regular qdisc to them without having
      to worry about setting tx_queue_len to a useful value. This was only
      partially true: I overlooked that some drivers don't call ether_setup()
      and therefore not initialize tx_queue_len to the default value of 1000.
      Consequently, removing the workarounds in place for that case in qdisc
      implementations which cared about it (namely, pfifo, bfifo, gred, htb,
      plug and sfb) leads to problems with these specific interface types and
      qdiscs.
      
      Luckily, there's already a sanitization point for drivers setting
      tx_queue_len to zero, which can be reused to assign the fallback value
      most qdisc implementations used, which is 1.
      
      Fixes: 348e3435 ("net: sched: drop all special handling of tx_queue_len == 0")
      Tested-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a813104d
  6. 18 2月, 2016 2 次提交
  7. 17 2月, 2016 6 次提交
  8. 12 2月, 2016 2 次提交
    • J
      net: bulk free SKBs that were delay free'ed due to IRQ context · 15fad714
      Jesper Dangaard Brouer 提交于
      The network stack defers SKBs free, in-case free happens in IRQ or
      when IRQs are disabled. This happens in __dev_kfree_skb_irq() that
      writes SKBs that were free'ed during IRQ to the softirq completion
      queue (softnet_data.completion_queue).
      
      These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ
      in function net_tx_action().  Take advantage of this a use the skb
      defer and flush API, as we are already in softirq context.
      
      For modern drivers this rarely happens. Although most drivers do call
      dev_kfree_skb_any(), which detects the situation and calls
      __dev_kfree_skb_irq() when needed.  This due to netpoll can call from
      IRQ context.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15fad714
    • J
      net: bulk free infrastructure for NAPI context, use napi_consume_skb · 795bb1c0
      Jesper Dangaard Brouer 提交于
      Discovered that network stack were hitting the kmem_cache/SLUB
      slowpath when freeing SKBs.  Doing bulk free with kmem_cache_free_bulk
      can speedup this slowpath.
      
      NAPI context is a bit special, lets take advantage of that for bulk
      free'ing SKBs.
      
      In NAPI context we are running in softirq, which gives us certain
      protection.  A softirq can run on several CPUs at once.  BUT the
      important part is a softirq will never preempt another softirq running
      on the same CPU.  This gives us the opportunity to access per-cpu
      variables in softirq context.
      
      Extend napi_alloc_cache (before only contained page_frag_cache) to be
      a struct with a small array based stack for holding SKBs.  Introduce a
      SKB defer and flush API for accessing this.
      
      Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any()
      when running in NAPI context.  A small trick to handle/detect if we
      are called from netpoll is to see if budget is 0.  In that case, we
      need to invoke dev_consume_skb_irq().
      
      Joint work with Alexander Duyck.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      795bb1c0
  9. 11 2月, 2016 6 次提交
  10. 09 2月, 2016 2 次提交
  11. 08 2月, 2016 1 次提交
  12. 06 2月, 2016 3 次提交