1. 29 8月, 2009 15 次提交
  2. 20 8月, 2009 1 次提交
  3. 14 8月, 2009 1 次提交
    • N
      net: skb ftracer - add tracepoint to skb_copy_datagram_iovec (v3) · e9b3cc1b
      Neil Horman 提交于
      skb allocation / cosumption tracer - Add consumption tracepoint
      
      This patch adds a tracepoint to skb_copy_datagram_iovec, which is called each
      time a userspace process copies a frame from a socket receive queue to a user
      space buffer.  It allows us to hook in and examine each sk_buff that the system
      receives on a per-socket bases, and can be use to compile a list of which skb's
      were received by which processes.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      
       include/trace/events/skb.h |   20 ++++++++++++++++++++
       net/core/datagram.c        |    3 +++
       2 files changed, 23 insertions(+)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9b3cc1b
  4. 07 8月, 2009 1 次提交
    • K
      net: Avoid enqueuing skb for default qdiscs · bbd8a0d3
      Krishna Kumar 提交于
      dev_queue_xmit enqueue's a skb and calls qdisc_run which
      dequeue's the skb and xmits it. In most cases, the skb that
      is enqueue'd is the same one that is dequeue'd (unless the
      queue gets stopped or multiple cpu's write to the same queue
      and ends in a race with qdisc_run). For default qdiscs, we
      can remove the redundant enqueue/dequeue and simply xmit the
      skb since the default qdisc is work-conserving.
      
      The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the
      default fast queue. The controversial part of the patch is
      incrementing qlen when a skb is requeued - this is to avoid
      checks like the second line below:
      
      +  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
      >>         !q->gso_skb &&
      +          !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
      
      Results of a 2 hour testing for multiple netperf sessions (1,
      2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are
      aggregate Mb/s across iterations tested with this version on
      System-X boxes with Chelsio 10gbps cards:
      
      ----------------------------------
      Size |  ORG BW          NEW BW   |
      ----------------------------------
      128K |  156964          159381   |
      256K |  158650          162042   |
      ----------------------------------
      
      Changes from ver1:
      
      1. Move sch_direct_xmit declaration from sch_generic.h to
         pkt_sched.h
      2. Update qdisc basic statistics for direct xmit path.
      3. Set qlen to zero in qdisc_reset.
      4. Changed some function names to more meaningful ones.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbd8a0d3
  5. 06 8月, 2009 3 次提交
  6. 05 8月, 2009 1 次提交
    • I
      net: Fix spinlock use in alloc_netdev_mq() · 0bf52b98
      Ingo Molnar 提交于
      -tip testing found this lockdep warning:
      
      [    2.272010] calling  net_dev_init+0x0/0x164 @ 1
      [    2.276033] device class 'net': registering
      [    2.280191] INFO: trying to register non-static key.
      [    2.284005] the code is fine but needs lockdep annotation.
      [    2.284005] turning off the locking correctness validator.
      [    2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145
      [    2.284005] Call Trace:
      [    2.284005]  [<7958eb4e>] ? printk+0xf/0x11
      [    2.284005]  [<7904f83c>] __lock_acquire+0x11b/0x622
      [    2.284005]  [<7908c9b7>] ? alloc_debug_processing+0xf9/0x144
      [    2.284005]  [<7904e2be>] ? mark_held_locks+0x3a/0x52
      [    2.284005]  [<7908dbc4>] ? kmem_cache_alloc+0xa8/0x13f
      [    2.284005]  [<7904e475>] ? trace_hardirqs_on_caller+0xa2/0xc3
      [    2.284005]  [<7904fdf6>] lock_acquire+0xb3/0xd0
      [    2.284005]  [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<79591514>] _spin_lock_bh+0x2d/0x5d
      [    2.284005]  [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<79489678>] alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<793a38f2>] ? loopback_setup+0x0/0x74
      [    2.284005]  [<798eecd0>] loopback_net_init+0x20/0x5d
      [    2.284005]  [<79483efb>] register_pernet_device+0x23/0x4b
      [    2.284005]  [<798f5c9f>] net_dev_init+0x115/0x164
      [    2.284005]  [<7900104f>] do_one_initcall+0x4a/0x11a
      [    2.284005]  [<798f5b8a>] ? net_dev_init+0x0/0x164
      [    2.284005]  [<79066f6d>] ? register_irq_proc+0x8c/0xa8
      [    2.284005]  [<798cc29a>] do_basic_setup+0x42/0x52
      [    2.284005]  [<798cc30a>] kernel_init+0x60/0xa1
      [    2.284005]  [<798cc2aa>] ? kernel_init+0x0/0xa1
      [    2.284005]  [<79003e03>] kernel_thread_helper+0x7/0x10
      [    2.284078] device: 'lo': device_add
      [    2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs
      [    2.292010] calling  neigh_init+0x0/0x66 @ 1
      [    2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs
      
      it's using an zero-initialized spinlock. This is a side-effect of:
      
              dev_unicast_init(dev);
      
      in alloc_netdev_mq() making use of dev->addr_list_lock.
      
      The device has just been allocated freshly, it's not accessible
      anywhere yet so no locking is needed at all - in fact it's wrong
      to lock it here (the lock isnt initialized yet).
      
      This bug was introduced via:
      
      | commit a6ac65db
      | Date:   Thu Jul 30 01:06:12 2009 +0000
      |
      |     net: restore the original spinlock to protect unicast list
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJiri Pirko <jpirko@redhat.com>
      Tested-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bf52b98
  7. 03 8月, 2009 3 次提交
  8. 28 7月, 2009 2 次提交
    • J
      cfg80211: make aware of net namespaces · 463d0183
      Johannes Berg 提交于
      In order to make cfg80211/nl80211 aware of network namespaces,
      we have to do the following things:
      
       * del_virtual_intf method takes an interface index rather
         than a netdev pointer - simply change this
      
       * nl80211 uses init_net a lot, it changes to use the sender's
         network namespace
      
       * scan requests use the interface index, hold a netdev pointer
         and reference instead
      
       * we want a wiphy and its associated virtual interfaces to be
         in one netns together, so
          - we need to be able to change ns for a given interface, so
            export dev_change_net_namespace()
          - for each virtual interface set the NETIF_F_NETNS_LOCAL
            flag, and clear that flag only when the wiphy changes ns,
            to disallow breaking this invariant
      
       * when a network namespace goes away, we need to reparent the
         wiphy to init_net
      
       * cfg80211 users that support creating virtual interfaces must
         create them in the wiphy's namespace, currently this affects
         only mac80211
      
      The end result is that you can now switch an entire wiphy into
      a different network namespace with the new command
      	iw phy#<idx> set netns <pid>
      and all virtual interfaces will follow (or the operation fails).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      463d0183
    • E
  9. 27 7月, 2009 1 次提交
  10. 25 7月, 2009 2 次提交
  11. 20 7月, 2009 1 次提交
  12. 17 7月, 2009 1 次提交
    • E
      net: sock_copy() fixes · 4dc6dc71
      Eric Dumazet 提交于
      Commit e912b114
      (net: sk_prot_alloc() should not blindly overwrite memory)
      took care of not zeroing whole new socket at allocation time.
      
      sock_copy() is another spot where we should be very careful.
      We should not set refcnt to a non null value, until
      we are sure other fields are correctly setup, or
      a lockless reader could catch this socket by mistake,
      while not fully (re)initialized.
      
      This patch puts sk_node & sk_refcnt to the very beginning
      of struct sock to ease sock_copy() & sk_prot_alloc() job.
      
      We add appropriate smp_wmb() before sk_refcnt initializations
      to match our RCU requirements (changes to sock keys should
      be committed to memory before sk_refcnt setting)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dc6dc71
  13. 14 7月, 2009 1 次提交
  14. 13 7月, 2009 2 次提交
  15. 12 7月, 2009 1 次提交
  16. 10 7月, 2009 1 次提交
    • J
      net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa 提交于
      Adding memory barrier after the poll_wait function, paired with
      receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.
      
      Without the memory barrier, following race can happen.
      The race fires, when following code paths meet, and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there was no cache the code would work ok, since the wait_queue and
      rcv_nxt are opposit to each other.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
      endup calling schedule and sleep forever if there are no more data on the
      socket.
      
      Calls to poll_wait in following modules were ommited:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a57de0b4
  17. 09 7月, 2009 2 次提交
  18. 06 7月, 2009 1 次提交