1. 07 10月, 2009 1 次提交
    • E
      net: speedup sk_wake_async() · bcdce719
      Eric Dumazet 提交于
      An incoming datagram must bring into cpu cache *lot* of cache lines,
      in particular : (other parts omitted (hash chains, ip route cache...))
      
      On 32bit arches :
      
      offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
      offsetof(struct sock, sk_lock)         =0x34   (rw)
      
      offsetof(struct sock, sk_sleep)        =0x50 (read)
      offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
      offsetof(struct sock, sk_receive_queue)=0x74   (rw)
      
      offsetof(struct sock, sk_forward_alloc)=0x98   (rw)
      
      offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
      offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
      offsetof(struct sock, sk_filter)       =0xf8    (read)
      
      offsetof(struct sock, sk_socket)       =0x138 (read)
      
      offsetof(struct sock, sk_data_ready)   =0x15c   (read)
      
      
      We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
      with no fasync() structures. (socket->fasync_list ptr is probably already in cache
      because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
      
      This avoids one cache line load per incoming packet for common cases (no fasync())
      
      We can leave (or even move in a future patch) sk->sk_socket in a cold location
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcdce719
  2. 05 10月, 2009 10 次提交
    • A
      net: export device speed and duplex via sysfs · d519e17e
      Andy Gospodarek 提交于
      This patch exports the link-speed (in Mbps) and duplex of an interface
      via sysfs.  This eliminates the need to use ethtool just to check the
      link-speed.  Not requiring 'ethtool' and not relying on the SIOCETHTOOL
      ioctl should be helpful in an embedded environment where space is at a
      premium as well.
      
      NOTE: This patch also intentionally allows non-root users to check the link
      speed and duplex -- something not possible with ethtool.
      
      Here's some sample output:
      
      # cat /sys/class/net/eth0/speed
      100
      # cat /sys/class/net/eth0/duplex
      half
      # ethtool eth0
      Settings for eth0:
              Supported ports: [ TP ]
              Supported link modes:   10baseT/Half 10baseT/Full
                                      100baseT/Half 100baseT/Full
                                      1000baseT/Half 1000baseT/Full
              Supports auto-negotiation: Yes
              Advertised link modes:  Not reported
              Advertised auto-negotiation: No
              Speed: 100Mb/s
              Duplex: Half
              Port: Twisted Pair
              PHYAD: 1
              Transceiver: internal
              Auto-negotiation: off
              Supports Wake-on: g
              Wake-on: g
              Current message level: 0x000000ff (255)
              Link detected: yes
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d519e17e
    • M
      cfg80211: assign device type in netdev notifier callback · 053a93dd
      Marcel Holtmann 提交于
      Instead of having to modify every non-mac80211 for device type assignment,
      do this inside the netdev notifier callback of cfg80211. So all drivers
      that integrate with cfg80211 will export a proper device type.
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053a93dd
    • J
      net: introduce NETDEV_POST_INIT notifier · 7ffbe3fd
      Johannes Berg 提交于
      For various purposes including a wireless extensions
      bugfix, we need to hook into the netdev creation before
      before netdev_register_kobject(). This will also ease
      doing the dev type assignment that Marcel was working
      on for cfg80211 drivers w/o touching them all.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ffbe3fd
    • E
      tunnels: Optimize tx path · 0bfbedb1
      Eric Dumazet 提交于
      We currently dirty a cache line to update tunnel device stats
      (tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
      counters that already are present in cpu cache, in the cache
      line shared with txq->_xmit_lock
      
      This patch extends IPTUNNEL_XMIT() macro to use txq pointer
      provided by the caller.
      
      Also &tunnel->dev->stats can be replaced by &dev->stats
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bfbedb1
    • S
      ipv4: fib table algorithm performance improvement · 16c6cf8b
      Stephen Hemminger 提交于
      The FIB algorithim for IPV4 is set at compile time, but kernel goes through
      the overhead of function call indirection at runtime. Save some
      cycles by turning the indirect calls to direct calls to either
      hash or trie code.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16c6cf8b
    • N
      af_packet: add interframe drop cmsg (v6) · 97775007
      Neil Horman 提交于
      Add Ancilliary data to better represent loss information
      
      I've had a few requests recently to provide more detail regarding frame loss
      during an AF_PACKET packet capture session.  Specifically the requestors want to
      see where in a packet sequence frames were lost, i.e. they want to see that 40
      frames were lost between frames 302 and 303 in a packet capture file.  In order
      to do this we need:
      
      1) The kernel to export this data to user space
      2) The applications to make use of it
      
      This patch addresses item (1).  It does this by doing the following:
      
      A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
      also no increment a stats called po->stats.tp_gap.
      
      B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
      value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
      record this, but since all the space there is used up, we're overloading
      skb->mark.  Its safe to do since any enqueued packet is guaranteed to be
      unshared at this point, and skb->mark isn't used for anything else in the rx
      path to the application.  After we record tp_gap in the skb, we zero
      po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
      between any two enqueued packets
      
      C) When the application goes to dequeue a frame from the packet socket, we look
      at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
      msghdr of level SOL_PACKET and type PACKET_GAPDATA.  Its a 32 bit integer that
      represents the number of frames lost between this packet and the last previous
      frame received.
      
      Note there is a chance that if there is frame loss after a receive, and then the
      socket is closed, some gap data might be lost.  This is covered by the use of
      the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
      math, the final gap can be determined that way.
      
      I've tested this patch myself, and it works well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      
       include/linux/if_packet.h |    2 ++
       net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
       2 files changed, 35 insertions(+)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97775007
    • E
      pktgen: Avoid dirtying skb->users when txq is full · 0835acfe
      Eric Dumazet 提交于
      We can avoid two atomic ops on skb->users if packet is not going to be
      sent to the device (because hardware txqueue is full)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0835acfe
    • E
      icmp: No need to call sk_write_space() · b3a5b6cc
      Eric Dumazet 提交于
      We can make icmp messages tx completion callback a litle bit faster.
      
      Setting SOCK_USE_WRITE_QUEUE sk flag tells sock_wfree() to
      not call sk_write_space() on a socket we know no thread is posssibly
      waiting for write space. (on per cpu kernel internal icmp sockets only)
      
      This avoids the sock_def_write_space() call and
      read_lock(&sk->sk_callback_lock)/read_unlock(&sk->sk_callback_lock) calls
      as well.
      
      We avoid three atomic ops.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3a5b6cc
    • B
      ethtool: Remove support for obsolete string query operations · a9828ec6
      Ben Hutchings 提交于
      The in-tree implementations have all been converted to
      get_sset_count().
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9828ec6
    • A
      a99bbaf5
  3. 03 10月, 2009 1 次提交
  4. 02 10月, 2009 4 次提交
    • A
      net: Use sk_mark for routing lookup in more places · 914a9ab3
      Atis Elsts 提交于
      This patch against v2.6.31 adds support for route lookup using sk_mark in some 
      more places. The benefits from this patch are the following.
      First, SO_MARK option now has effect on UDP sockets too.
      Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing 
      lookup correctly if TCP sockets with SO_MARK were used.
      Signed-off-by: NAtis Elsts <atis@mikrotik.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      914a9ab3
    • O
      IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Ori Finkelman 提交于
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with TCP window scaling option and non zero window scale
      value to a Linux box.
      2. Linux box notes large receive window from client.
      3. Linux decides on a zero value of window scale for its part.
      4. Due to compare against requested window scale size option, Linux does not to
       send windows scale TCP option header on SYN/ACK at all.
      
      With the following result:
      
      Client box thinks TCP window scaling is not supported, since SYN/ACK had no
      TCP window scale option, while Linux thinks that TCP window scaling is
      supported (and scale might be non zero), since SYN had  TCP window scale
      option and we have a mismatched idea between the client and server
      regarding window sizes.
      
      Probably it also fixes up the following bug (not observed in practice):
      
      1. Linux box opens TCP connection to some server.
      2. Linux decides on zero value of window scale.
      3. Due to compare against computed window scale size option, Linux does
      not to set windows scale TCP  option header on SYN.
      
      With the expected result that the server OS does not use window scale option
      due to not receiving such an option in the SYN headers, leading to suboptimal
      performance.
      Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: NOri Finkelman <ori@comsleep.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89e95a61
    • A
      net/ipv4/tcp.c: fix min() type mismatch warning · 4fdb78d3
      Andrew Morton 提交于
      net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
      net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fdb78d3
    • E
      pktgen: Fix delay handling · 417bc4b8
      Eric Dumazet 提交于
      After last pktgen changes, delay handling is wrong.
      
      pktgen actually sends packets at full line speed.
      
      Fix is to update pkt_dev->next_tx even if spin() returns early,
      so that next spin() calls have a chance to see a positive delay.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      417bc4b8
  5. 01 10月, 2009 6 次提交
  6. 30 9月, 2009 1 次提交
    • I
      mac80211: Fix [re]association power saving issue on AP side · 1f08e84f
      Igor Perminov 提交于
      Consider the following step-by step:
      1. A STA authenticates and associates with the AP and exchanges
      traffic.
      2. The STA reports to the AP that it is going to PS state.
      3. Some time later the STA device goes to the stand-by mode (not only
      its wi-fi card, but the device itself) and drops the association state
      without sending a disassociation frame.
      4. The STA device wakes up and begins authentication with an
      Auth frame as it hasn't been authenticated/associated previously.
      
      At the step 4 the AP "remembers" the STA and considers it is still in
      the PS state, so the AP buffers frames, which it has to send to the STA.
      But the STA isn't actually in the PS state and so it neither checks
      TIM bits nor reports to the AP that it isn't power saving.
      Because of that authentication/[re]association fails.
      
      To fix authentication/[re]association stage of this issue, Auth, Assoc
      Resp and Reassoc Resp frames are transmitted disregarding of STA's power
      saving state.
      
      N.B. This patch doesn't fix further data frame exchange after
      authentication/[re]association. A patch in hostapd is required to fix
      that.
      Signed-off-by: NIgor Perminov <igor.perminov@inbox.ru>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      1f08e84f
  7. 29 9月, 2009 9 次提交
  8. 28 9月, 2009 1 次提交
  9. 27 9月, 2009 4 次提交
  10. 26 9月, 2009 1 次提交
  11. 25 9月, 2009 2 次提交
    • J
      genetlink: fix netns vs. netlink table locking (2) · b8273570
      Johannes Berg 提交于
      Similar to commit d136f1bd,
      there's a bug when unregistering a generic netlink family,
      which is caught by the might_sleep() added in that commit:
      
          BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
          in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
          2 locks held by rmmod/1510:
           #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
           #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
          Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
          Call Trace:
           [<ffffffff81044ff9>] __might_sleep+0x119/0x150
           [<ffffffff81380501>] netlink_table_grab+0x21/0x100
           [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
           [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
           [<ffffffff81382866>] genl_unregister_family+0x56/0x130
           [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
           [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]
      
      Fix in the same way by grabbing the netlink table lock
      before doing rcu_read_lock().
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8273570
    • E
      tunnel: eliminate recursion field · a43912ab
      Eric Dumazet 提交于
      It seems recursion field from "struct ip_tunnel" is not anymore needed.
      recursion prevention is done at the upper level (in dev_queue_xmit()),
      since we use HARD_TX_LOCK protection for tunnels.
      
      This avoids a cache line ping pong on "struct ip_tunnel" : This structure
      should be now mostly read on xmit and receive paths.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a43912ab