1. 11 Nov, 2014 4 commits
    • mlx4: use napi_complete_done() · 1a288172
      Committed by Eric Dumazet
      To enable gro_flush_timeout, a driver has to use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACK per second.
      
      GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce number of ACK packets. (811/19.2 = 42)
      
      Receiver performs fewer calls to upper stacks and fewer wakeups.
      This also reduces CPU usage on the sender, as it receives fewer ACK
      packets.
      
      Note that reducing the number of wakeups increases CPU efficiency, but can
      decrease QPS, as applications won't have the chance to warm up CPU caches
      by doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
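The aggregation ratios quoted in the test notes above can be recomputed from the two `sar` averages. The sketch below assumes, as the commit message does, that the transmit packet rate on B is dominated by ACKs, so received segments divided by transmitted packets approximates the number of segments per GRO packet. The variable and function names are illustrative only.

```python
# Packet rates (packets per second) copied from the two `sar -n DEV` averages.
# Columns used: rxpck/s (received packets) and txpck/s (transmitted packets).
rx_pps_before = 811269.80   # segments received by B, gro_flush_timeout unset
tx_pps_before = 305732.30   # packets sent back by B (assumed to be mostly ACKs)
rx_pps_after = 811577.30    # segments received by B, gro_flush_timeout = 2000
tx_pps_after = 19230.80     # packets sent back by B (assumed to be mostly ACKs)


def segments_per_gro_packet(rx_pps: float, tx_pps: float) -> float:
    """Approximate GRO aggregation: received segments per transmitted ACK."""
    return rx_pps / tx_pps


ratio_before = segments_per_gro_packet(rx_pps_before, tx_pps_before)
ratio_after = segments_per_gro_packet(rx_pps_after, tx_pps_after)

print(f"without timer: {ratio_before:.2f} segments per GRO packet")
print(f"with 2000 ns timer: {ratio_after:.2f} segments per GRO packet")
print(f"ACK rate reduced by a factor of {tx_pps_before / tx_pps_after:.1f}")
```

The commit message rounds the rates to thousands (811/305 and 811/19.2), so its figures differ slightly in the last digit from what the unrounded averages give.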
    • net: gro: add a per device gro flush timer · 3b47d303
      Committed by Eric Dumazet
      Tuning coalescing parameters on NIC can be really hard.
      
      Servers can handle both bulk and RPC-like traffic, with conflicting
      goals: bulk flows want GRO packets that are as big as possible, RPC wants
      minimal latencies.
      
      To reach big GRO packets on 10Gbe NIC, one can use :
      
      ethtool -C eth0 rx-usecs 4 rx-frames 44
      
      But this penalizes rpc sessions, with an increase of latencies, up to
      50% in some cases, as NICs generally do not force an interrupt when
      a packet with TCP Push flag is received.
      
      Some NICs do not have an absolute timer, only a timer rearmed for every
      incoming packet.
      
      This patch uses a different strategy: let the GRO stack decide what to do,
      based on the traffic pattern.
      
      Packets with the Push flag won't be delayed.
      Packets without the Push flag might be held in the GRO engine, if we keep
      receiving data.
      
      This new mechanism is off by default, and is enabled by setting
      /sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds.
      
      To fully enable this mechanism, drivers should use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACK per second.
      
      GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce number of ACK packets. (811/19.2 = 42)
      
      Receiver performs fewer calls to upper stacks and fewer wakeups.
      This also reduces CPU usage on the sender, as it receives fewer ACK
      packets.
      
      Note that reducing the number of wakeups increases CPU efficiency, but can
      decrease QPS, as applications won't have the chance to warm up CPU caches
      by doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
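The hold-or-flush behaviour described in this commit message can be illustrated with a toy model. This is not the kernel implementation: `Segment`, `ToyGro`, and `run` are hypothetical names, expiry is only checked when the next segment arrives, and a timeout of 0 is modelled as "deliver every segment on its own", which is a simplification of what happens when the mechanism is off.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    arrival_ns: int
    push: bool = False  # models the TCP Push flag


@dataclass
class ToyGro:
    flush_timeout_ns: int = 0  # 0 models the default: mechanism off
    held: List[Segment] = field(default_factory=list)
    delivered: List[int] = field(default_factory=list)  # segments per delivered packet
    deadline_ns: Optional[int] = None

    def flush(self) -> None:
        """Hand everything currently held to the upper stack as one packet."""
        if self.held:
            self.delivered.append(len(self.held))
            self.held = []
        self.deadline_ns = None

    def receive(self, seg: Segment) -> None:
        # Toy simplification: an expired timer is noticed on the next arrival.
        if self.deadline_ns is not None and seg.arrival_ns >= self.deadline_ns:
            self.flush()
        self.held.append(seg)
        if seg.push or self.flush_timeout_ns == 0:
            # Push-flagged data is never delayed; neither is anything when off.
            self.flush()
        else:
            # Data keeps arriving: keep holding and re-arm the flush deadline.
            self.deadline_ns = seg.arrival_ns + self.flush_timeout_ns


def run(timeout_ns: int, segments: List[Segment]) -> List[int]:
    gro = ToyGro(flush_timeout_ns=timeout_ns)
    for seg in segments:
        gro.receive(seg)
    gro.flush()  # end of the simulated trace
    return gro.delivered


bulk = [Segment(t) for t in range(0, 8000, 1000)]   # steady stream, no Push
rpc = [Segment(0), Segment(500, push=True)]          # short request ending in Push
print(run(0, bulk), run(2000, bulk), run(2000, rpc))
```

In this model a steady bulk stream is aggregated into one large packet once a timeout is set, while the Push-terminated exchange is delivered as soon as the flagged segment arrives, which mirrors the two goals named in the commit message.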
    • rtnetlink: add babel protocol recognition · be955b29
      Committed by Dave Taht
      Babel uses rt_proto 42. Add it to the userspace-visible header file.
      Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Nov, 2014 1 commit
  3. 08 Nov, 2014 10 commits
  4. 07 Nov, 2014 25 commits
    • 4e84b496
    • vxlan: Fix to enable UDP checksums on interface · 5c91ae08
      Committed by Tom Herbert
      Add definition to vxlan nla_policy for UDP checksum. This is necessary
      to enable UDP checksums on VXLAN.
      
      In some instances, enabling UDP checksums can improve performance on
      receive for devices that return legacy checksum-unnecessary for UDP/IP.
      Also, UDP checksum provides some protection against VNI corruption.
      
      Testing:
      
      Ran 200 instances of TCP_STREAM and TCP_RR on bnx2x.
      
      TCP_STREAM
        IPv4, without UDP checksums
            14.41% TX CPU utilization
            25.71% RX CPU utilization
            9083.4 Mbps
        IPv4, with UDP checksums
            13.99% TX CPU utilization
            13.40% RX CPU utilization
            9095.65 Mbps
      
      TCP_RR
        IPv4, without UDP checksums
            94.08% TX CPU utilization
            156/248/462 90/95/99% latencies
            1.12743e+06 tps
        IPv4, with UDP checksums
            94.43% TX CPU utilization
            158/250/462 90/95/99% latencies
            1.13345e+06 tps
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
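To put the benchmark figures above in proportion, the relative changes can be worked out directly from the reported numbers. The helper name `relative_change` is illustrative; the values are copied from the test results in the commit message.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100.0


# TCP_STREAM, IPv4: without UDP checksums vs. with UDP checksums
stream_rx_cpu = relative_change(25.71, 13.40)      # RX CPU utilization (%)
stream_tx_cpu = relative_change(14.41, 13.99)      # TX CPU utilization (%)
stream_mbps = relative_change(9083.4, 9095.65)     # throughput (Mbps)

# TCP_RR, IPv4: without UDP checksums vs. with UDP checksums
rr_tps = relative_change(1.12743e06, 1.13345e06)   # transactions per second

print(f"TCP_STREAM RX CPU:     {stream_rx_cpu:+.1f}%")
print(f"TCP_STREAM TX CPU:     {stream_tx_cpu:+.1f}%")
print(f"TCP_STREAM throughput: {stream_mbps:+.2f}%")
print(f"TCP_RR transactions:   {rr_tps:+.2f}%")
```

The large change is on the receive side of TCP_STREAM, where RX CPU utilization falls by nearly half; throughput and transaction rate move by well under one percent.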
    • Merge branch 'amd-xgbe-next' · a1f5313c
      Committed by David S. Miller
      Tom Lendacky says:
      
      ====================
      amd-xgbe: AMD XGBE driver updates 2014-11-06
      
      The following series of patches fixes a couple of bugs that slipped
      through my last series.
      
      - Free channel structure after freeing the per channel interrupts
      - If an skb allocation error occurs during receive processing, check
        whether more descriptors are associated with the packet or whether
        to start on a new packet
      
      This patch series is based on net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Check for complete packet on skb allocation error · f5eecbbe
      Committed by Lendacky, Thomas
      If the skb allocation fails during receive processing, the driver would
      continue reading descriptors without first determining if there were
      any more descriptors for the current packet. Update the code to check
      whether more descriptors are associated with the current packet or
      whether to move on to the next descriptor as a new packet.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
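The control flow described in this commit message can be sketched as a toy receive loop. This is an illustration under assumptions, not the amd-xgbe driver code: descriptors are modelled as `(data, is_last)` tuples, `alloc_ok` is a hypothetical stand-in for skb allocation, and a packet whose allocation fails is dropped as a whole.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Descriptor = Tuple[bytes, bool]  # (payload fragment, is last descriptor of packet)


def receive(descriptors: Iterable[Descriptor],
            alloc_ok: Callable[[int], bool]) -> List[bytes]:
    """Reassemble packets; skip all descriptors of a packet whose allocation failed."""
    packets: List[bytes] = []
    current: Optional[List[bytes]] = None  # buffer for the packet being built
    dropping = False                       # True while skipping a failed packet
    packet_index = 0

    for data, is_last in descriptors:
        if current is None and not dropping:
            # First descriptor of a new packet: try to allocate a buffer.
            if alloc_ok(packet_index):
                current = []
            else:
                dropping = True
        if not dropping:
            current.append(data)
        if is_last:
            # Packet boundary: deliver if it was built, then start fresh.
            if not dropping:
                packets.append(b"".join(current))
            current = None
            dropping = False
            packet_index += 1
        # If not last, more descriptors belong to the current packet, so the
        # loop keeps consuming them instead of treating the next one as new.
    return packets


ring = [(b"a", False), (b"b", True), (b"c", False), (b"d", True), (b"e", True)]
print(receive(ring, alloc_ok=lambda index: index != 1))
```

The point of the sketch is the `is_last` check after a failed allocation: the remaining fragments of the failed packet are consumed and discarded, so the next packet starts on a descriptor that really is its first.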
    • amd-xgbe: Free channel/ring structures later · e98c72c9
      Committed by Lendacky, Thomas
      The channel structure is freed before freeing the per channel
      interrupts resulting in a kernel oops. Move the call to free
      the channel structure to after the freeing of the per channel
      interrupts.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netxen: Fix link event handling. · 9d01412a
      Committed by Manish Chopra
      o Poll for link events only if the firmware lacks the capability
        to notify the driver of link events.
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: update desc properly in rx_copybreak · f6b7734b
      Committed by Govindarajulu Varadarajan
      When we reuse the rx buffer, we need to update the desc. If not, the hardware
      sees a stale value.
      
      In the following crash, when the MTU is changed, the hardware sees the old rx
      buffer value and crashes on skb_put.
      
      Fix this by using the enic_queue_rq_desc helper function, which updates the
      necessary desc.
      
      [   64.657376] skbuff: skb_over_panic: text:ffffffffa041f55d len:9010 put:9010 head:ffff8800d3ca9fc0 data:ffff8800d3caa000 tail:0x2372 end:0x640 dev:enp0s3
      [   64.659965] ------------[ cut here ]------------
      [   64.661322] kernel BUG at net/core/skbuff.c:100!
      [   64.662644] invalid opcode: 0000 [#1] PREEMPT SMP
      [   64.664001] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 cirrus ttm drm_kms_helper drm enic psmouse microcode evdev serio_raw syscopyarea sysfillrect sysimgblt i2c_piix4 i2c_core pcspkr nfs lockd grace sunrpc fscache ext4 crc16 mbcache jbd2 sd_mod ata_generic virtio_balloon ata_piix libata uhci_hcd virtio_pci virtio_ring usbcore usb_common virtio scsi_mod
      [   64.664834] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W      3.17.0-netnext-10335-g942396b0-dirty #273
      [   64.664834] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   64.664834] task: ffffffff81a1d580 ti: ffffffff81a00000 task.ti: ffffffff81a00000
      [   64.664834] RIP: 0010:[<ffffffff81392cf1>]  [<ffffffff81392cf1>] skb_panic+0x61/0x70
      [   64.664834] RSP: 0018:ffff880210603d48  EFLAGS: 00010292
      [   64.664834] RAX: 000000000000008c RBX: ffff88020b0f6930 RCX: 0000000000000000
      [   64.664834] RDX: 000000000000008c RSI: ffffffff8178b288 RDI: 00000000ffffffff
      [   64.664834] RBP: ffff880210603d68 R08: 0000000000000001 R09: 0000000000000001
      [   64.664834] R10: 00000000000005ce R11: 0000000000000001 R12: ffff88020b1f0b40
      [   64.664834] R13: 000000000000a332 R14: ffff880209a1a000 R15: 0000000000000001
      [   64.664834] FS:  0000000000000000(0000) GS:ffff880210600000(0000) knlGS:0000000000000000
      [   64.664834] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   64.664834] CR2: 00007f6752935e48 CR3: 0000000035743000 CR4: 00000000000006f0
      [   64.664834] Stack:
      [   64.664834]  ffff8800d3caa000 0000000000002372 0000000000000640 ffff88020b1f0000
      [   64.664834]  ffff880210603d78 ffffffff81392d54 ffff880210603e08 ffffffffa041f55d
      [   64.664834]  0000000000000296 ffffffff00000000 00008e7e00008e7e ffff880200002332
      [   64.664834] Call Trace:
      [   64.664834]  <IRQ>
      [   64.664834]
      [   64.664834]  [<ffffffff81392d54>] skb_put+0x54/0x60
      [   64.664834]  [<ffffffffa041f55d>] enic_rq_service.constprop.47+0x3ad/0x730 [enic]
      [   64.664834]  [<ffffffffa041fa79>] enic_poll_msix_rq+0x199/0x370 [enic]
      [   64.664834]  [<ffffffff813a5499>] net_rx_action+0x139/0x210
      [   64.664834]  [<ffffffff81290db3>] ? __this_cpu_preempt_check+0x13/0x20
      [   64.664834]  [<ffffffff8106110e>] __do_softirq+0x14e/0x280
      [   64.664834]  [<ffffffff8106152e>] irq_exit+0x8e/0xb0
      [   64.664834]  [<ffffffff8100fd21>] do_IRQ+0x61/0x100
      [   64.664834]  [<ffffffff814a2bf2>] common_interrupt+0x72/0x72
      
      fixes: a03bb56e ("enic: implement rx_copybreak")
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: handle error condition properly in enic_rq_indicate_buf · 44aa91ab
      Committed by Govindarajulu Varadarajan
      In case of error in rx path, we free the buf->os_buf but we do not make it NULL.
      In next iteration we use the skb which is already freed. This causes the
      following crash.
      
      [  886.154772] general protection fault: 0000 [#1] PREEMPT SMP
      [  886.154851] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 microcode evdev cirrus ttm drm_kms_helper drm enic syscopyarea sysfillrect sysimgblt psmouse i2c_piix4 serio_raw pcspkr i2c_core nfs lockd grace sunrpc fscache ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common ata_generic ata_piix virtio_balloon libata scsi_mod uhci_hcd usbcore virtio_pci virtio_ring virtio usb_common
      [  886.155199] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W      3.17.0-netnext-05668-g876bc7f #272
      [  886.155263] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  886.155304] task: ffffffff81a1d580 ti: ffffffff81a00000 task.ti: ffffffff81a00000
      [  886.155356] RIP: 0010:[<ffffffff81384030>]  [<ffffffff81384030>] kfree_skb_list+0x10/0x30
      [  886.155418] RSP: 0018:ffff880210603d48  EFLAGS: 00010206
      [  886.155456] RAX: 0000000000000020 RBX: 0000000000000000 RCX: 0000000000000000
      [  886.155504] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 004500084e000017
      [  886.155553] RBP: ffff880210603d50 R08: 00000000fe13d1b6 R09: 0000000000000001
      [  886.155601] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880209ff2f00
      [  886.155650] R13: ffff88020ac0fe40 R14: ffff880209ff2f00 R15: ffff8800da8e3a80
      [  886.155699] FS:  0000000000000000(0000) GS:ffff880210600000(0000) knlGS:0000000000000000
      [  886.155774] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  886.155814] CR2: 00007f0e0c925000 CR3: 0000000035e8b000 CR4: 00000000000006f0
      [  886.155865] Stack:
      [  886.155882]  0000000000000000 ffff880210603d78 ffffffff81383f79 ffff880209ff2f00
      [  886.155942]  ffff88020b0c0b40 000000000000c000 ffff880210603d90 ffffffff81383faf
      [  886.156001]  ffff880209ff2f00 ffff880210603da8 ffffffff8138406d ffff88020b1b08c0
      [  886.156061] Call Trace:
      [  886.156080]  <IRQ>
      [  886.156095]
      [  886.156112]  [<ffffffff81383f79>] skb_release_data+0xa9/0xc0
      [  886.157656]  [<ffffffff81383faf>] skb_release_all+0x1f/0x30
      [  886.159195]  [<ffffffff8138406d>] consume_skb+0x1d/0x40
      [  886.160719]  [<ffffffff813942e5>] __dev_kfree_skb_any+0x35/0x40
      [  886.162224]  [<ffffffffa02dc1d5>] enic_rq_service.constprop.47+0xe5/0x5a0 [enic]
      [  886.163756]  [<ffffffffa02dc829>] enic_poll_msix_rq+0x199/0x370 [enic]
      [  886.164730]  [<ffffffff81397e29>] net_rx_action+0x139/0x210
      [  886.164730]  [<ffffffff8105fb2e>] __do_softirq+0x14e/0x280
      [  886.164730]  [<ffffffff8105ff2e>] irq_exit+0x8e/0xb0
      [  886.164730]  [<ffffffff8100fc1d>] do_IRQ+0x5d/0x100
      [  886.164730]  [<ffffffff81496832>] common_interrupt+0x72/0x72
      
      fixes: a03bb56e ("enic: implement rx_copybreak")
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5-net' · c8119067
      Committed by David S. Miller
      Eli Cohen says:
      
      ====================
      mlx5_core fixes for 3.18
      
      The following two patches fix races that could lead to a kernel panic in some cases.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5_core: Fix race on driver load · 364d1798
      Committed by Eli Cohen
      When events arrive at driver load, the event handler gets called even before
      the spinlock and list are initialized. Fix this by moving the initialization
      before EQs creation.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5_core: Fix race in create EQ · a158906d
      Committed by Eli Cohen
      After the EQ is created, it can possibly generate interrupts, and the interrupt
      handler references eq->dev. It is therefore required to set eq->dev before
      calling request_irq() so that if an event is generated before request_irq()
      returns, we will have a valid eq->dev field.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
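The ordering problem in this commit message can be shown with a small toy model. It is not the mlx5 code: `Device`, `EventQueue`, and the Python `request_irq` below are hypothetical stand-ins, and the interrupt is modelled in the worst case, firing before the registration call returns.

```python
class Device:
    def __init__(self, name: str) -> None:
        self.name = name


class EventQueue:
    def __init__(self) -> None:
        self.dev = None   # stands in for eq->dev, unset until assigned
        self.seen = []    # device names observed by the handler


def irq_handler(eq: EventQueue) -> None:
    # Like the handler in the commit message, this dereferences eq.dev.
    eq.seen.append(eq.dev.name)


def request_irq(handler, eq: EventQueue) -> None:
    # Worst case: an event is generated before registration returns.
    handler(eq)


def create_eq_racy(dev: Device) -> EventQueue:
    eq = EventQueue()
    request_irq(irq_handler, eq)  # handler may run while eq.dev is still None
    eq.dev = dev
    return eq


def create_eq_fixed(dev: Device) -> EventQueue:
    eq = EventQueue()
    eq.dev = dev                  # publish the field first
    request_irq(irq_handler, eq)  # handler now always sees a valid eq.dev
    return eq


try:
    create_eq_racy(Device("dev0"))
except AttributeError as exc:
    print("racy ordering failed:", exc)

print("fixed ordering saw:", create_eq_fixed(Device("dev0")).seen)
```

In the kernel the failure would be an invalid pointer dereference rather than a Python exception, but the remedy is the same: every field the handler reads is initialized before the handler can be invoked.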
    • Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch · 6b798d70
      Committed by David S. Miller
      Pravin B Shelar says:
      
      ====================
      Open vSwitch
      
      The first two patches are related to OVS MPLS support. The rest of the
      patches are mostly refactoring and minor improvements to openvswitch.
      
      v1-v2:
       - Fix conflicts due to "gue: Remote checksum offload"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sunvnet-next' · 271d70f4
      Committed by David S. Miller
      Sowmini Varadhan says:
      
      ====================
      sunvnet: bug fixes
      
      This patch series has a coding-style fix and a bug fix.
      
      The coding style fix (patch 1) is the extra indentation flagged by
      Ben Hutchings:
        http://marc.info/?l=linux-netdev&m=141529243409594&w=2
      
      The bugfix (patch 2) is the following:
      when vnet_event_napi() is  called as part of napi_resume
      (i.e., continuation of a previous NAPI read that was truncated
      due to budget constraints), and then finds no more packets to read,
      the code was trying to avoid an additional trip through ldc_rx
      as an optimization. However, when this corner case happens, we would
      need to reset a number of dring state bits such as rcv_nxt carefully,
      which quickly becomes complex and hacky.  The cleaner solution
      is to just roll back to vnet_poll, re-enable interrupts and set up
      dring state as was done in the pre-NAPI version of the driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sunvnet: Return from vnet_napi_event() if no packets to read · 8c4ee3e7
      Committed by Sowmini Varadhan
      vnet_event_napi() may be called as part of the NAPI ->poll,
      to resume reading descriptor rings. When no data is available,
      descriptor ring state (e.g., rcv_nxt) needs to be reset
      carefully to stay in lock-step with ldc_read(). In the interest
      of simplicity, the best way to do this is to return from
      vnet_event_napi() when there are no more packets to read.
      The next trip through ldc_rx will correctly set up the dring state.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Tested-by: David Stevens <david.stevens@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sunvnet: Fix indentation in maybe_tx_wakeup() · 6c3ce8a3
      Committed by Sowmini Varadhan
      Remove redundant tab.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'r8152-next' · 4e865a5a
      Committed by David S. Miller
      Hayes Wang says:
      
      ====================
      r8152: rtl_ops_init modify
      
      Initialize the ops through tp->version. This avoids checking
      each VID/PID.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: remove the definitions of the PID · 662412d1
      Committed by hayeswang
      The PIDs are only used in the id table, so the definitions are
      unnecessary. Removing them does not cause any confusion.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: modify rtl_ops_init · 55b65475
      Committed by hayeswang
      Use tp->version instead of the VID/PID to initialize the ops.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
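The idea of selecting an ops table from a detected chip version, rather than matching every VID/PID pair, can be sketched as a lookup table. Everything below is hypothetical: the version names, the two ops tables, and the function names are made up for illustration and do not reflect the r8152 driver's actual values.

```python
from enum import Enum


class ChipVersion(Enum):
    # Hypothetical hardware revisions, standing in for the value of tp->version.
    VER_A1 = 1
    VER_A2 = 2
    VER_B1 = 3


# Hypothetical ops tables: each maps an operation name to a callable.
OPS_FAMILY_A = {"init": lambda: "family A init", "up": lambda: "family A up"}
OPS_FAMILY_B = {"init": lambda: "family B init", "up": lambda: "family B up"}

# One entry per version; several revisions can share the same ops table.
OPS_BY_VERSION = {
    ChipVersion.VER_A1: OPS_FAMILY_A,
    ChipVersion.VER_A2: OPS_FAMILY_A,
    ChipVersion.VER_B1: OPS_FAMILY_B,
}


def ops_init(version: ChipVersion) -> dict:
    """Pick the ops table for an already-detected chip version."""
    try:
        return OPS_BY_VERSION[version]
    except KeyError:
        raise ValueError(f"unsupported chip version: {version!r}") from None


ops = ops_init(ChipVersion.VER_A2)
print(ops["init"]())
```

With this shape, supporting a new product ID that uses an existing chip revision needs no change to the selection logic, because the lookup depends only on the version read from the hardware.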
    • r8152: move r8152b_get_version · 82cf94cb
      Committed by hayeswang
      Move r8152b_get_version() to a location before rtl_ops_init().
      Then rtl_ops_init() can use tp->version.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock.h: Remove unused NETDEBUG macro · 926c5126
      Committed by Joe Perches
      It's unused now, just delete it.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: esp: Convert NETDEBUG to pr_info · 45083497
      Committed by Joe Perches
      Commit 64ce2073 ("[NET]: Make NETDEBUG pure printk wrappers")
      originally had these NETDEBUG printks as always emitting.
      
      Commit a2a316fd ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
      added a net_msg_warn sysctl to these NETDEBUG uses.
      
      Convert these NETDEBUG uses to normal pr_info calls.
      
      This changes the output prefix from "ESP: " to include
      "IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.
      
      These output lines are now like the other messages in the files.
      
      Other miscellanea:
      
      Neaten the arithmetic spacing to be consistent with other
      arithmetic spacing in the files.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages · cbffccc9
      Committed by Joe Perches
      These messages aren't useful as there's a generic dump_stack()
      on OOM.
      
      Neaten the comment and if test above the OOM by separating the
      assign in if into an allocation then if test.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dsa: mv88e6171: Add support for mv88e6172 · f03ae5f9
      Committed by Andrew Lunn
      The mv88e6172 is very similar to the mv88e6171, so extend the
      mv88e6171 driver to support it.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: slave: Fix autoneg for phys on switch MDIO bus · b31f65fb
      Committed by Andrew Lunn
      When the ports' PHYs are connected to the switch's internal MDIO bus,
      we need to connect the PHY to the slave netdev; otherwise
      auto-negotiation etc. does not work.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sched: fix act file names in header comment · 0c6965dd
      Committed by Jiri Pirko
      Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>