  1. 26 May 2009 · 1 commit
  2. 25 May 2009 · 3 commits
  3. 22 May 2009 · 5 commits
  4. 21 May 2009 · 1 commit
  5. 19 May 2009 · 5 commits
    • net: release dst entry in dev_hard_start_xmit() · 93f154b5
      Committed by Eric Dumazet
      One point of contention under high network load is the dst_release() performed
      when a transmitted skb is freed. This is because NIC tx completion calls
      dev_kfree_skb() long after the original call to dev_queue_xmit(skb).
      
      By then the CPU cache is cold, so the atomic op in dst_release() stalls. On SMP
      this is quite visible if one CPU is 100% busy handling softirqs for a network
      device, since dst_clone() is done by other CPUs, which causes cache-line ping-pong.
      
      The right place to release the dst seems to be dev_hard_start_xmit(), for most
      devices except virtual ones and a few other exceptions.
      
      David Miller suggested defining a new device flag, set in alloc_netdev_mq()
      (so that most devices set it at init time), and carefully cleared in devices
      that don't want a NULL skb->dst in their ndo_start_xmit().
      
      Devices that must clear this flag:
      
      - loopback device, because it calls netif_rx(); quoting Patrick:
          "ip_route_input() doesn't accept loopback addresses, so loopback packets
           already need to have a dst_entry attached."
      - appletalk/ipddp.c: needs skb->dst in its xmit function
      
      - all devices that call dev_queue_xmit() again from their xmit function
      (since some classifiers need skb->dst): bonding, vlan, macvlan, eql, ifb, hdlc_fr
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
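The flag scheme above can be modelled in plain userspace C. This is a minimal sketch, assuming hypothetical stand-ins (`model_dst`, `model_skb`, `model_dev`, `MODEL_XMIT_DST_RELEASE`) rather than the kernel's own types and flag: the flag is set for every device at allocation, devices that need `skb->dst` in their xmit routine clear it, and the transmit path drops the dst reference only when the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the kernel's real flag name and value may differ. */
#define MODEL_XMIT_DST_RELEASE 0x1u

struct model_dst {
	int refcnt;             /* stands in for the dst's atomic refcount */
};

struct model_skb {
	struct model_dst *dst;  /* route attached to the packet */
};

struct model_dev {
	unsigned int priv_flags;
};

static void model_dst_release(struct model_dst *dst)
{
	if (dst)
		dst->refcnt--;  /* the kernel does an atomic decrement here */
}

/*
 * Every device gets the flag at allocation time. Devices that still need
 * skb->dst in their xmit routine (loopback, bonding, vlan, ...) clear it;
 * that step is folded into this helper for brevity.
 */
static void model_alloc_netdev(struct model_dev *dev, bool needs_dst_in_xmit)
{
	dev->priv_flags = MODEL_XMIT_DST_RELEASE;
	if (needs_dst_in_xmit)
		dev->priv_flags &= ~MODEL_XMIT_DST_RELEASE;
}

/*
 * Drop the dst on the transmitting CPU, shortly after it was last touched,
 * instead of at tx completion time when the cache line is likely cold.
 */
static void model_hard_start_xmit(struct model_dev *dev, struct model_skb *skb)
{
	if (dev->priv_flags & MODEL_XMIT_DST_RELEASE) {
		model_dst_release(skb->dst);
		skb->dst = NULL;
	}
	/* ...the driver's ndo_start_xmit() would be called here... */
}
```

For a physical NIC the dst reference is gone before the driver runs; for a loopback-like device it survives into the driver. The plain `int` refcount replaces the atomic one only to keep the sketch single-threaded.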
    • net-sysfs: Use rtnl_trylock in sysfs methods. · 336ca57c
      Committed by Eric W. Biederman
      The earlier patch to fix the deadlock between a network device going
      away and writing to sysfs attributes was incomplete.
      - It did not set signal_pending, so we would leak ERESTARTSYS to user space.
      - It used ERESTARTSYS, which only restarts if sigaction configures it to.
      - It did not cover store and show for ifalias.
      
      So fix all of these up, and use the new helper restart_syscall() so that we
      get the details of what it takes right.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
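In the kernel the fix takes roughly the form `if (!rtnl_trylock()) return restart_syscall();` at the top of each affected sysfs method. The sketch below models the same try-lock-or-retry idea in userspace, assuming a POSIX mutex as a stand-in for the RTNL; `model_rtnl`, `model_ifalias`, `model_store_ifalias()` and `MODEL_RESTART` are hypothetical, and `MODEL_RESTART` merely plays the role of restart_syscall()'s return value (the kernel uses its own restart code, not -EAGAIN).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for the value restart_syscall() hands back. */
#define MODEL_RESTART (-EAGAIN)

static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;
static char model_ifalias[64];

/*
 * Sysfs-style store method. It never blocks on the lock, because the thread
 * holding it may be unregistering the device and waiting for this very
 * sysfs operation to finish. Instead it asks the caller to retry.
 */
static int model_store_ifalias(const char *buf)
{
	if (pthread_mutex_trylock(&model_rtnl) != 0)
		return MODEL_RESTART;
	strncpy(model_ifalias, buf, sizeof(model_ifalias) - 1);
	model_ifalias[sizeof(model_ifalias) - 1] = '\0';
	pthread_mutex_unlock(&model_rtnl);
	return (int)strlen(buf);  /* bytes consumed, as a store method reports */
}
```

When the lock is free the write goes through; when another path holds it, the call returns immediately with the retry code and leaves the stored alias unchanged.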
    • net: fix skb_seq_read returning wrong offset/length for page frag data · 995b3379
      Committed by Thomas Chenault
      When called with a consumed value that lies less than skb_headlen(skb)
      bytes into a page frag, skb_seq_read() incorrectly returns an
      offset/length relative to skb->data. Ensure that data which should come
      from a page frag actually does.
      Signed-off-by: Thomas Chenault <thomas_chenault@dell.com>
      Tested-by: Shyam Iyer <shyam_iyer@dell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
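To make the bug concrete, here is a simplified model of what a sequential reader must return: find the block (linear head or page frag) that contains the absolute offset `consumed`, and report a pointer and length relative to that block. `model_seq_skb` and `model_seq_read()` are hypothetical; the real skb_seq_read() keeps its position in struct skb_seq_state and also walks frag lists, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4

struct model_frag {
	const unsigned char *page;  /* start of this fragment's data */
	size_t size;
};

struct model_seq_skb {
	const unsigned char *data;  /* linear head */
	size_t headlen;             /* bytes in the linear head */
	struct model_frag frags[MODEL_MAX_FRAGS];
	size_t nr_frags;
};

/*
 * Return how many contiguous bytes are readable at absolute offset
 * `consumed` and store a pointer to them in *out; return 0 at the end.
 */
static size_t model_seq_read(const struct model_seq_skb *skb, size_t consumed,
			     const unsigned char **out)
{
	size_t block_start;

	if (consumed < skb->headlen) {
		*out = skb->data + consumed;
		return skb->headlen - consumed;
	}
	block_start = skb->headlen;
	for (size_t i = 0; i < skb->nr_frags; i++) {
		size_t limit = block_start + skb->frags[i].size;

		if (consumed < limit) {
			/* Offset relative to this frag, never to skb->data. */
			*out = skb->frags[i].page + (consumed - block_start);
			return limit - consumed;
		}
		block_start = limit;
	}
	*out = NULL;
	return 0;
}
```

With a 4-byte head and one 9-byte frag, `consumed = 6` sits 2 bytes into the frag (fewer than skb_headlen() bytes), which is the case the commit describes; the model returns the frag pointer plus 2 with 7 bytes remaining, not a pointer into the linear head.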
    • pkt_sched: gen_estimator: use 64 bit intermediate counters for bps · 511e11e3
      Committed by Eric Dumazet
      gen_estimator can overflow bps (bytes per second) on Gbit links, even though
      it was designed with a u32 API whose theoretical limit is 34360 Mbit/s
      (2^32 bytes per second).
      
      Using 64-bit intermediate avbps/brate counters allows us to reach
      this theoretical limit.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
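The limit works out as 2^32 bytes/s * 8 bits/byte = 34,359,738,368 bit/s, i.e. about 34360 Mbit/s. The sketch below assumes, for illustration only, that the moving average is kept in fixed point scaled by 2^5 and that each per-second byte count is scaled up before averaging. Under that assumption a u32 accumulator already wraps at 2^27 bytes/s (about 1.07 Gbit/s), while a u64 accumulator lets the exported u32 value use its full range. The `model_*` names and the scale factor are hypothetical, not the kernel's exact arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SCALE_LOG 5  /* hypothetical fixed-point scale: 2^5 */

/* One EWMA step, avg += (sample - avg) / 2^ewma_log, with a u64 accumulator. */
static uint64_t model_ewma64(uint64_t avg, uint64_t bytes_per_sec,
			     unsigned int ewma_log)
{
	uint64_t sample = bytes_per_sec << MODEL_SCALE_LOG;

	return avg + (sample >> ewma_log) - (avg >> ewma_log);
}

/* The same step with a u32 accumulator: the scaled sample wraps on fast links. */
static uint32_t model_ewma32(uint32_t avg, uint32_t bytes_per_sec,
			     unsigned int ewma_log)
{
	uint32_t sample = bytes_per_sec << MODEL_SCALE_LOG;

	return avg + (sample >> ewma_log) - (avg >> ewma_log);
}

/* Export the scaled average through a u32 bytes-per-second API. */
static uint32_t model_bps(uint64_t avg)
{
	return (uint32_t)(avg >> MODEL_SCALE_LOG);
}
```

For a 10 Gbit/s rate (1,250,000,000 bytes/s) the 64-bit step exports the correct value, since the result still fits in u32, while the 32-bit step wraps and reports a far smaller rate.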
    • net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue · 7004bf25
      Committed by Eric Dumazet
      offsetof(struct net_device, features)=0x44
      offsetof(struct net_device, stats.tx_packets)=0x54
      offsetof(struct net_device, stats.tx_bytes)=0x5c
      offsetof(struct net_device, stats.tx_dropped)=0x6c
      
      Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
      tx path can slow down SMP operation, since they dirty a cache line
      that should stay shared (dev->features is needed in both the rx and tx paths).
      
      We could move the stats field elsewhere in net_device, but it won't help that
      much (two cache lines would then be dirtied in the tx path, where we can
      dirty only one).
      
      A better solution is to add tx_packets/tx_bytes/tx_dropped to struct
      netdev_queue, because this structure is already touched in the tx path, so
      counter updates become free (with no increase in size).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
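The layout change above can be sketched in userspace C, assuming hypothetical `model_*` structures: the tx path bumps counters in the per-queue structure it already writes, leaving read-mostly device fields such as `features` untouched, and a slower stats reader sums the queues on demand. How the kernel itself aggregates these counters is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_TX_QUEUES 4

/* Per-queue counters live next to state the tx path already writes. */
struct model_netdev_queue {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long tx_dropped;
};

struct model_net_device {
	unsigned long features;  /* read-mostly, shared by rx and tx */
	struct model_netdev_queue txq[MODEL_NUM_TX_QUEUES];
};

struct model_tx_stats {
	unsigned long tx_packets, tx_bytes, tx_dropped;
};

/* Driver tx path: update only the queue's counters, not device-wide fields. */
static void model_xmit_done(struct model_net_device *dev, size_t q,
			    size_t len, int dropped)
{
	struct model_netdev_queue *txq;

	assert(q < MODEL_NUM_TX_QUEUES);
	txq = &dev->txq[q];
	if (dropped) {
		txq->tx_dropped++;
	} else {
		txq->tx_packets++;
		txq->tx_bytes += len;
	}
}

/* Slow path: fold the per-queue counters into device-wide totals. */
static struct model_tx_stats model_get_tx_stats(const struct model_net_device *dev)
{
	struct model_tx_stats s = { 0, 0, 0 };

	for (size_t i = 0; i < MODEL_NUM_TX_QUEUES; i++) {
		s.tx_packets += dev->txq[i].tx_packets;
		s.tx_bytes += dev->txq[i].tx_bytes;
		s.tx_dropped += dev->txq[i].tx_dropped;
	}
	return s;
}
```

The trade-off is that reading totals now costs a loop over the queues, which is acceptable because stats are read far less often than packets are sent.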
  6. 18 May 2009 · 3 commits
  7. 10 May 2009 · 1 commit
  8. 09 May 2009 · 1 commit
  9. 07 May 2009 · 1 commit
  10. 06 May 2009 · 1 commit
    • net: introduce a list of device addresses dev_addr_list (v6) · f001fde5
      Committed by Jiri Pirko
      v5 -> v6 (current):
      -removed static functions that are still unused
      -corrected dev_addr_del_multiple to call del instead of add
      
      v4 -> v5:
      -added device address type (suggested by davem)
      -removed refcounting (better to have simpler code than to save potentially a
       few bytes)
      
      v3 -> v4:
      -changed kzalloc to kmalloc in __hw_addr_add_ii()
      -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()
      
      v2 -> v3:
      -removed unnecessary rcu read locking
      -moved the dev_addr_flush() call to ensure no NULL dereference of dev_addr
      
      v1 -> v2:
      -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
      -removed unnecessary rcu_read locking in dev_addr_init
      -use compare_ether_addr_64bits instead of compare_ether_addr
      -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
      -use call_rcu instead of rcu_synchronize
      -moved is_etherdev_addr into __KERNEL__ ifdef
      
      This patch introduces a new list in struct net_device and brings a set of
      functions to handle the work with the device address list. The list replaces
      the original dev_addr field, because in some situations the net device needs
      to carry several device addresses. To be backward compatible, dev_addr is made
      to point to the first member of the list, so existing drivers see no difference.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
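A minimal sketch of the backward-compatibility trick described above, assuming hypothetical `model_*` types: addresses live in a singly linked list, and the legacy `dev_addr` pointer is kept aimed at the first entry's bytes, so code that only reads `dev_addr` still works. The real patch uses struct netdev_hw_addr, call_rcu() for freeing and ASSERT_RTNL() checks, none of which are modelled; the `type` byte only loosely mirrors the device address type added in v5.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

struct model_hw_addr {
	struct model_hw_addr *next;
	unsigned char type;                   /* hypothetical address type */
	unsigned char addr[MODEL_ETH_ALEN];
};

struct model_netdev {
	struct model_hw_addr *dev_addr_list;  /* head = primary address */
	unsigned char *dev_addr;              /* legacy view of the head */
};

/* Keep the legacy pointer aimed at the first list member. */
static void model_sync_dev_addr(struct model_netdev *dev)
{
	dev->dev_addr = dev->dev_addr_list ? dev->dev_addr_list->addr : NULL;
}

/* Append an address; the first one added becomes dev_addr. */
static int model_dev_addr_add(struct model_netdev *dev,
			      const unsigned char *addr, unsigned char type)
{
	struct model_hw_addr **pp = &dev->dev_addr_list;
	struct model_hw_addr *ha = malloc(sizeof(*ha));

	if (!ha)
		return -ENOMEM;
	ha->next = NULL;
	ha->type = type;
	memcpy(ha->addr, addr, MODEL_ETH_ALEN);
	while (*pp)
		pp = &(*pp)->next;
	*pp = ha;
	model_sync_dev_addr(dev);
	return 0;
}

/* Remove the first entry matching addr and type, then resync dev_addr. */
static int model_dev_addr_del(struct model_netdev *dev,
			      const unsigned char *addr, unsigned char type)
{
	for (struct model_hw_addr **pp = &dev->dev_addr_list; *pp; pp = &(*pp)->next) {
		struct model_hw_addr *ha = *pp;

		if (ha->type == type && memcmp(ha->addr, addr, MODEL_ETH_ALEN) == 0) {
			*pp = ha->next;
			free(ha);
			model_sync_dev_addr(dev);
			return 0;
		}
	}
	return -ENOENT;
}
```

Deleting the primary address promotes the next entry, and dev_addr follows it automatically.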
  11. 05 May 2009 · 2 commits
  12. 04 May 2009 · 1 commit
  13. 02 May 2009 · 1 commit
  14. 30 April 2009 · 1 commit
  15. 28 April 2009 · 1 commit
    • net: Avoid extra wakeups of threads blocked in wait_for_packet() · bf368e4e
      Committed by Eric Dumazet
      In 2.6.25 we added UDP mem accounting.
      
      This unfortunately added a penalty when a frame is transmitted, since
      at TX completion time we have to call sock_wfree() to perform the necessary
      memory accounting. This calls sock_def_write_space() and ultimately the
      scheduler if any thread is waiting on the socket.
      Threads waiting for an incoming frame were scheduled, then had to sleep
      again because the event was meaningless to them.
      
      (All threads waiting on a socket use the same sk_sleep anchor.)
      
      This adds a lot of extra wakeups and increases latencies, as noted
      by Christoph Lameter, and slows down the softirq handler.
      
      Reference: http://marc.info/?l=linux-netdev&m=124060437012283&w=2
      
      Fortunately, Davide Libenzi recently added the concept of keyed wakeups
      to the kernel, in particular for sockets (see commit 37e5540b,
      "epoll keyed wakeups: make sockets use keyed wakeups").
      
      Davide's goal was to optimize epoll, but this new wakeup infrastructure
      can help non-epoll users as well, if they care to set up an appropriate
      handler.
      
      This patch introduces a new DEFINE_WAIT_FUNC() helper and uses it
      in wait_for_packet(), so that only relevant events can wake up a thread
      blocked in this function.
      
      Trace of function calls from the bnx2 TX completion handler bnx2_poll_work():
      __kfree_skb()
       skb_release_head_state()
        sock_wfree()
         sock_def_write_space()
          __wake_up_sync_key()
           __wake_up_common()
            receiver_wake_function(): stops here, since the thread is waiting for INPUT
      Reported-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
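Keyed wakeups can be modelled in userspace to show why the reader stays asleep. This sketch assumes hypothetical `model_*` wait-queue types; the key is an event mask built from the `POLLIN`/`POLLOUT`/`POLLERR` bits of <poll.h>, and each waiter's callback decides whether the event concerns it. `model_receiver_wake()` follows the filtering the commit describes (a write-space event does not wake a thread waiting for input), but it is not the kernel's wait-queue API or its receiver_wake_function().

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

struct model_wait_entry;
typedef int (*model_wake_fn)(struct model_wait_entry *w, unsigned long key);

struct model_wait_entry {
	model_wake_fn func;  /* per-waiter wake callback */
	int woken;           /* set when the waiter is actually woken */
};

/* Default behaviour: every event wakes the waiter. */
static int model_default_wake(struct model_wait_entry *w, unsigned long key)
{
	(void)key;
	w->woken = 1;
	return 1;
}

/* Input waiter: ignore keyed events that carry no input or error bits. */
static int model_receiver_wake(struct model_wait_entry *w, unsigned long key)
{
	if (key && !(key & (POLLIN | POLLERR)))
		return 0;
	return model_default_wake(w, key);
}

/* Walk every waiter sharing one queue (like one sk_sleep anchor). */
static int model_wake_up_key(struct model_wait_entry **q, size_t n,
			     unsigned long key)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++)
		woken += q[i]->func(q[i], key);
	return woken;
}
```

A POLLOUT wakeup from the TX completion path now reaches only the waiter that uses the default callback, while a POLLIN wakeup still wakes both.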
  16. 27 April 2009 · 2 commits
  17. 21 April 2009 · 2 commits
  18. 20 April 2009 · 3 commits
  19. 16 April 2009 · 1 commit
  20. 15 April 2009 · 1 commit
  21. 11 April 2009 · 1 commit
  22. 02 April 2009 · 1 commit
  23. 01 April 2009 · 1 commit