1. 17 Nov 2010, 15 commits
    • ixgbe: remove unnecessary re-init of adapter on Rx-csum change · 4c0ec654
      Authored by Alexander Duyck
      There is no need to reset the adapter when changing the Rx checksum
      settings. Since the only change is a software flag, we can toggle it
      without resetting the entire adapter.
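      A rough sketch of the idea (the function shape here is illustrative;
      the driver's actual ethtool hook differs in detail, though
      IXGBE_FLAG_RX_CSUM_ENABLED is the flag involved):

          static int example_set_rx_csum(struct net_device *netdev, u32 data)
          {
                  struct ixgbe_adapter *adapter = netdev_priv(netdev);

                  /* flip the software flag only; no ixgbe_reinit_locked() */
                  if (data)
                          adapter->flags |= IXGBE_FLAG_RX_CSUM_ENABLED;
                  else
                          adapter->flags &= ~IXGBE_FLAG_RX_CSUM_ENABLED;
                  return 0;
          }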
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: DCB: credit max only needs to be gt TSO size for 82598 · 80ab193d
      Authored by John Fastabend
      The maximum credits per traffic class only needs to be greater
      than the TSO size for 82598 devices. The 82599 devices do not
      have this requirement, so only do this test for 82598 devices.
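      A minimal sketch of the check (names hedged; MINIMUM_CREDIT_FOR_TSO
      stands in for whatever constant the driver uses):

          /* Only 82598 needs each TC's max credits to cover a TSO frame */
          if (hw->mac.type == ixgbe_mac_82598EB)
                  credit_max = max_t(u16, credit_max,
                                     MINIMUM_CREDIT_FOR_TSO);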
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: DCB set PFC high and low water marks per data sheet specs · 16b61beb
      Authored by John Fastabend
      Currently the high and low water marks for PFC are set
      conservatively for jumbo frames, which means the RX buffers
      are underutilized at the default 1500 MTU. This patch
      fixes this so that the water marks are set as described in
      the data sheet, taking the MTU size into account.
      
      The equation used is,
      
      RTT * 1.44 + MTU * 1.44 + MTU
      
      where RTT is the round trip time and MTU is the max frame size
      in KB. To avoid floating-point arithmetic, FC_HIGH_WATER is
      defined as
      
      ((((RTT + MTU) * 144) + 99) / 100) + MTU
      
      This changes how the fields fc.low_water and fc.high_water
      are used. With this change they no longer store the actual
      low and high water marks but the required headroom in the
      buffer. This simplifies the logic, and we do not need to
      account for the size of the buffer when setting the
      thresholds.
      
      Testing with iperf and 16 threads showed a slight uptick in
      throughput over a single traffic class (0.1-0.2 Gbps) and a
      reduction in pause frames. Without the patch a 30 second run
      would show ~10-15 pause frames being transmitted; with the
      patch ~2-5 are seen. Tests were run back to back against 82599.
      
      Note RXPBSIZE is in KB, and the low and high water mark fields
      are also in KB. However, the FCRT* registers have 32B granularity
      and are right-shifted 5 into the register, so
      
      (((rx_pbsize - water_mark) * 1024) / 32) << 5
      
      is the most explicit conversion. Here we simplify:
      
      (rx_pbsize - water_mark) * 32 << 5 = (rx_pbsize - water_mark) << 10
      
      This patch updates the PFC thresholds and legacy FC thresholds.
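      Putting the two calculations together, a minimal sketch of the
      integer math (variable names are illustrative, not the driver's
      actual identifiers):

          /* Required headroom in KB: RTT*1.44 + MTU*1.44 + MTU, done
           * in integer math; (x * 144 + 99) / 100 rounds up.
           */
          u32 headroom = (((rtt_kb + mtu_kb) * 144) + 99) / 100 + mtu_kb;

          /* FCRT* wants 32B units shifted left 5, so for a KB value:
           * ((x * 1024) / 32) << 5  ==  x << 10
           */
          u32 fcrth = (rx_pbsize_kb - headroom) << 10;  /* high water */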
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbevf: Update Version String and Copyright Notice · 66c87bd5
      Authored by Greg Rose
      Update version string and copyright notice.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: delay rx_ring freeing · 1a51502b
      Authored by Eric Dumazet
      "cat /proc/net/dev" uses RCU protection only.
      
      Its quite possible we call a driver get_stats() method while device is
      dismantling and freeing its data structures.
      
      So get_stats() methods must be very careful not accessing driver private
      data without appropriate locking.
      
      In ixgbe case, we access rx_ring pointers. These pointers are freed in
      ixgbe_clear_interrupt_scheme() and set to NULL, this can trigger NULL
      dereference in ixgbe_get_stats64()
      
      A possible fix is to use RCU locking in ixgbe_get_stats64() and defer
      rx_ring freeing after a grace period in ixgbe_clear_interrupt_scheme()
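      A minimal sketch of the RCU pattern described (simplified; the
      driver wraps this differently):

          /* reader: the stats path holds rcu_read_lock(), no ring lock */
          rcu_read_lock();
          for (i = 0; i < adapter->num_rx_queues; i++) {
                  struct ixgbe_ring *ring =
                          rcu_dereference(adapter->rx_ring[i]);

                  if (ring)
                          packets += ring->stats.packets;
          }
          rcu_read_unlock();

          /* writer: clear the pointer, wait a grace period, then free */
          rcu_assign_pointer(adapter->rx_ring[i], NULL);
          synchronize_rcu();
          kfree(ring);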
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: Tantilov, Emil S <emil.s.tantilov@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • net: reorder struct sock fields · b178bb3d
      Authored by Eric Dumazet
      Right now, fields in struct sock are not optimally ordered, because
      each path (RX softirq, TX completion, RX user, TX user) has to touch
      fields that are spread across many different cache lines.
      
      The really critical thing is to shrink the number of cache lines used
      at RX softirq time: a CPU handling softirqs for a device can receive
      many frames per second for many sockets. If the load is too big, we
      drop frames at the NIC level. RPS or multiqueue cards can help, but it
      is better to reduce latency if possible.
      
      This patch starts with the UDP protocol; additional patches will try
      to reduce the latencies of the other protocols as well.
      
      At RX softirq time, the fields of interest for the UDP protocol are
      (not counting the ones in the inet struct used for the lookup):
      
      Read/Written:
      sk_refcnt   (atomic increment/decrement)
      sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
      sk_receive_queue
      sk_backlog (if socket locked by user program)
      sk_rxhash
      sk_forward_alloc
      sk_drops
      
      Read only:
      sk_rcvbuf (sk_rcvqueues_full())
      sk_filter
      sk_wq
      sk_policy[0]
      sk_flags
      
      Additional notes:
      
      - sk_backlog has one hole on 64bit arches; we can fill it to save 8
      bytes.
      - sk_backlog is used only if the RX softirq handler finds the socket
      locked by a user program.
      - sk_rxhash is written only once per flow.
      - sk_drops is written only if the queues are full.
      
      Final layout :
      
      [1] One section grouping all read/write fields, with sk_rxhash and
      sk_backlog placed at the end of this section.
      
      [2] One section grouping all fields that are only read in the RX
      handler (sk_filter, sk_rcvbuf, sk_wq).
      
      [3] A section used by the other paths.
      
      I'll post a separate patch to put sk_refcnt at the end of struct
      sock_common so that it shares the same cache line as section [1].
      
      New offsets on a 64bit arch:
      
      sizeof(struct sock)=0x268
      offsetof(struct sock, sk_refcnt)  =0x10
      offsetof(struct sock, sk_lock)    =0x48
      offsetof(struct sock, sk_receive_queue)=0x68
      offsetof(struct sock, sk_backlog)=0x80
      offsetof(struct sock, sk_rmem_alloc)=0x80
      offsetof(struct sock, sk_forward_alloc)=0x98
      offsetof(struct sock, sk_rxhash)=0x9c
      offsetof(struct sock, sk_rcvbuf)=0xa4
      offsetof(struct sock, sk_drops) =0xa0
      offsetof(struct sock, sk_filter)=0xa8
      offsetof(struct sock, sk_wq)=0xb0
      offsetof(struct sock, sk_policy)=0xd0
      offsetof(struct sock, sk_flags) =0xe0
      
      Instead of:
      
      sizeof(struct sock)=0x270
      offsetof(struct sock, sk_refcnt)  =0x10
      offsetof(struct sock, sk_lock)    =0x50
      offsetof(struct sock, sk_receive_queue)=0xc0
      offsetof(struct sock, sk_backlog)=0x70
      offsetof(struct sock, sk_rmem_alloc)=0xac
      offsetof(struct sock, sk_forward_alloc)=0x10c
      offsetof(struct sock, sk_rxhash)=0x128
      offsetof(struct sock, sk_rcvbuf)=0x4c
      offsetof(struct sock, sk_drops) =0x16c
      offsetof(struct sock, sk_filter)=0x198
      offsetof(struct sock, sk_wq)=0x88
      offsetof(struct sock, sk_policy)=0x98
      offsetof(struct sock, sk_flags) =0x130
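      Offset tables like the two above can be generated with a trivial
      offsetof() dump; a generic userspace sketch (not kernel code):

          #include <stdio.h>
          #include <stddef.h>

          struct example { int a; long b; char c; };

          /* print one field's offset in the same 0x%zx style as above */
          #define SHOW(t, f) \
                  printf("offsetof(" #t ", " #f ")=0x%zx\n", offsetof(t, f))

          int main(void)
          {
                  printf("sizeof(struct example)=0x%zx\n",
                         sizeof(struct example));
                  SHOW(struct example, a);
                  SHOW(struct example, b);
                  SHOW(struct example, c);
                  return 0;
          }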
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: use atomic_inc_not_zero_hint · c31504dc
      Authored by Eric Dumazet
      A UDP socket's refcount is usually 2, unless an incoming frame is
      about to be queued in the receive or backlog queue.
      
      Using atomic_inc_not_zero_hint() reduces latency, because the
      processor issues fewer memory transactions.
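      The call-site pattern looks roughly like this (a sketch; the real
      lookup path carries more context):

          /* Take a reference only if the socket is still live. The hint
           * (2) tells the cmpxchg loop which value to expect first, so
           * the common case succeeds on the first attempt.
           */
          if (unlikely(!atomic_inc_not_zero_hint(&sk->sk_refcnt, 2)))
                  sk = NULL;  /* raced with the last put */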
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vlan: remove ndo_select_queue() logic · 213b15ca
      Authored by Eric Dumazet
      Now that vlan devices are lockless, we don't need the special
      ndo_select_queue() logic; dev_pick_tx() will do the multiqueue work
      on the real device's transmit path.
      Suggested-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vlan: lockless transmit path · 4af429d2
      Authored by Eric Dumazet
      vlan is a stacked device, like tunnels. We should use the lockless
      mechanism we are using in tunnels and loopback.
      
      This patch completely removes locking in TX path.
      
      TX stat counters are added to the existing percpu stats structure,
      which is renamed from vlan_rx_stats to vlan_pcpu_stats.
      
      Note: this partially reverts commit 2e59af3d (vlan: multiqueue vlan
      device).
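      The per-cpu stat update in the TX path then needs no lock at all; a
      minimal sketch of the pattern (field set abridged):

          struct vlan_pcpu_stats {
                  u64                     tx_packets;
                  u64                     tx_bytes;
                  struct u64_stats_sync   syncp;
          };

          /* in ndo_start_xmit: each CPU touches only its own counters */
          struct vlan_pcpu_stats *stats =
                  this_cpu_ptr(vlan->vlan_pcpu_stats);

          u64_stats_update_begin(&stats->syncp);
          stats->tx_packets++;
          stats->tx_bytes += skb->len;
          u64_stats_update_end(&stats->syncp);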
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: lockless tx path · 8ffab51b
      Authored by Eric Dumazet
      macvlan is a stacked device, like tunnels. We should use the lockless
      mechanism we are using in tunnels and loopback.
      
      This patch completely removes locking in TX path.
      
      TX stat counters are added to the existing percpu stats structure,
      which is renamed from rx_stats to pcpu_stats.
      
      Note: this reverts commit 2c114553 (macvlan: add multiqueue
      capability).
      
      Note: rx_errors is converted to a 32bit counter, like tx_dropped,
      since these counters don't need a 64bit range.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) · 0e3125c7
      Authored by Neil Horman
      
      Version 4 of this patch.
      
      Change notes:
      1) Removed extra memset; didn't realize kcalloc adds __GFP_ZERO the way kzalloc does :)
      
      Summary:
      It was shown to me recently that systems under high load were driven
      very deep into swap when tcpdump was run. The reason is that the
      AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the
      user-space application to specify how many entries an AF_PACKET
      socket will have and how large each entry will be. The default
      setting for tcpdump is a ring buffer of 32 entries of 64 KB each,
      which implies 32 order-5 allocations. That's difficult under good
      circumstances, and horrid under memory pressure.
      
      I thought it would be good to make that a bit more usable. I was
      going to do a simple conversion of the ring buffer from contiguous
      pages to iovecs, but unfortunately the metadata which AF_PACKET
      places in these buffers can easily span a page boundary. Given that
      these buffers get mapped into user space, and the data layout doesn't
      easily allow for a change to the padding between frames to avoid
      that, a simple iovec change would break user-space ABI consistency.
      
      So I've done this: I've added a three-tiered mechanism to the
      af_packet set_ring socket option. It attempts to allocate memory in
      the following order:
      
      1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
      digging into swap
      
      2) Using vmalloc
      
      3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
      needed to get the memory
      
      The effect is that we don't disturb the system as much when we're under load,
      while still being able to conduct tcpdumps effectively.
      
      Tested successfully by me.
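      A sketch of the three-tier allocation order (simplified; the helper
      name and exact flags here are illustrative):

          static char *alloc_one_block(unsigned int order)
          {
                  char *buf;

                  /* 1) fail fast rather than dig into swap */
                  buf = (char *)__get_free_pages(GFP_KERNEL |
                                                 __GFP_NORETRY |
                                                 __GFP_NOWARN, order);
                  if (buf)
                          return buf;

                  /* 2) fall back to virtually contiguous memory */
                  buf = vzalloc((1 << order) * PAGE_SIZE);
                  if (buf)
                          return buf;

                  /* 3) last resort: retry as hard as needed */
                  return (char *)__get_free_pages(GFP_KERNEL, order);
          }

      Note the free path must then distinguish the two cases, e.g. with
      is_vmalloc_addr().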
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/isdn/mISDN: Use printf extension %pV · 020f01eb
      Authored by Joe Perches
      Using %pV reduces the number of printk calls and
      eliminates any possible message interleaving from
      other printk calls.
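      The standard %pV pattern, for reference (a condensed sketch; the
      wrapper name is illustrative):

          static void example_dbg(const char *fmt, ...)
          {
                  struct va_format vaf;
                  va_list args;

                  va_start(args, fmt);
                  vaf.fmt = fmt;
                  vaf.va = &args;
                  /* one printk call, so messages cannot interleave */
                  printk(KERN_DEBUG "mISDN: %pV", &vaf);
                  va_end(args);
          }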
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: let nlmsg and nla functions take pointer-to-const args · 3654654f
      Authored by Jan Engelhardt
      The changed functions do not modify the NL messages and/or attributes
      at all. They should use const (similar to strchr), so that callers
      which have a const nlmsg/nlattr around can make use of them without
      casting.
      
      While at it, constify a data array.
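      The shape of the change for one representative helper (illustrative;
      the patch touches many such inlines in include/net/netlink.h):

          /* before: forces callers holding a const nlmsghdr to cast */
          static inline int nlmsg_len(struct nlmsghdr *nlh)
          {
                  return nlh->nlmsg_len - NLMSG_HDRLEN;
          }

          /* after: const-correct, like strchr() taking a const char * */
          static inline int nlmsg_len(const struct nlmsghdr *nlh)
          {
                  return nlh->nlmsg_len - NLMSG_HDRLEN;
          }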
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix missing in6_ifa_put in addrconf · 9d82ca98
      Authored by John Fastabend
      Fix a refcount bug introduced by:
      
      commit 2de79570 ("ipv6: addrconf: don't remove address state on
      ifdown if the address is being kept")
      Author: Lorenzo Colitti <lorenzo@google.com>
      Date:   Wed Oct 27 18:16:49 2010 +0000
      
      Fix the logic so that addrconf_ifdown() decrements the inet6_ifaddr
      refcnt correctly with in6_ifa_put().
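      The shape of the fix, heavily simplified (keep_addr is an
      illustrative name; the real addrconf_ifdown() loop carries more
      state):

          if (keep_addr) {
                  /* ... retain the address state across ifdown ... */

                  /* the "keep" path added by 2de79570 must still drop
                   * the reference taken earlier in the loop
                   */
                  in6_ifa_put(ifa);
          }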
      Reported-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Nov 2010, 25 commits