1. 18 Aug, 2009 1 commit
  2. 05 Aug, 2009 1 commit
    • net: Fix spinlock use in alloc_netdev_mq() · 0bf52b98
      Committed by Ingo Molnar
      -tip testing found this lockdep warning:
      
      [    2.272010] calling  net_dev_init+0x0/0x164 @ 1
      [    2.276033] device class 'net': registering
      [    2.280191] INFO: trying to register non-static key.
      [    2.284005] the code is fine but needs lockdep annotation.
      [    2.284005] turning off the locking correctness validator.
      [    2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145
      [    2.284005] Call Trace:
      [    2.284005]  [<7958eb4e>] ? printk+0xf/0x11
      [    2.284005]  [<7904f83c>] __lock_acquire+0x11b/0x622
      [    2.284005]  [<7908c9b7>] ? alloc_debug_processing+0xf9/0x144
      [    2.284005]  [<7904e2be>] ? mark_held_locks+0x3a/0x52
      [    2.284005]  [<7908dbc4>] ? kmem_cache_alloc+0xa8/0x13f
      [    2.284005]  [<7904e475>] ? trace_hardirqs_on_caller+0xa2/0xc3
      [    2.284005]  [<7904fdf6>] lock_acquire+0xb3/0xd0
      [    2.284005]  [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<79591514>] _spin_lock_bh+0x2d/0x5d
      [    2.284005]  [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<79489678>] alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<793a38f2>] ? loopback_setup+0x0/0x74
      [    2.284005]  [<798eecd0>] loopback_net_init+0x20/0x5d
      [    2.284005]  [<79483efb>] register_pernet_device+0x23/0x4b
      [    2.284005]  [<798f5c9f>] net_dev_init+0x115/0x164
      [    2.284005]  [<7900104f>] do_one_initcall+0x4a/0x11a
      [    2.284005]  [<798f5b8a>] ? net_dev_init+0x0/0x164
      [    2.284005]  [<79066f6d>] ? register_irq_proc+0x8c/0xa8
      [    2.284005]  [<798cc29a>] do_basic_setup+0x42/0x52
      [    2.284005]  [<798cc30a>] kernel_init+0x60/0xa1
      [    2.284005]  [<798cc2aa>] ? kernel_init+0x0/0xa1
      [    2.284005]  [<79003e03>] kernel_thread_helper+0x7/0x10
      [    2.284078] device: 'lo': device_add
      [    2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs
      [    2.292010] calling  neigh_init+0x0/0x66 @ 1
      [    2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs
      
      It's using a zero-initialized spinlock. This is a side-effect of:
      
              dev_unicast_init(dev);
      
      in alloc_netdev_mq() making use of dev->addr_list_lock.
      
      The device has just been freshly allocated; it's not accessible
      anywhere yet, so no locking is needed at all. In fact it's wrong
      to lock it here (the lock isn't initialized yet).
      
      This bug was introduced via:
      
      | commit a6ac65db
      | Date:   Thu Jul 30 01:06:12 2009 +0000
      |
      |     net: restore the original spinlock to protect unicast list
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Jiri Pirko <jpirko@redhat.com>
      Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
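The point of the commit above (a freshly allocated object is still private, so its setup needs no lock, and a lock must be initialized before it is first taken) can be illustrated with a small userspace sketch. Everything here is hypothetical: `toy_lock`, `toy_dev` and the `toy_*` functions are stand-ins invented for illustration, not the kernel's `spinlock_t`, `struct net_device` or `dev_unicast_init()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy spinlock that remembers whether it was initialized, loosely
 * imitating the kind of check lockdep performs. Not the kernel type. */
struct toy_lock {
    atomic_flag held;
    bool initialized;
};

void toy_lock_init(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
    l->initialized = true;
}

void toy_lock_acquire(struct toy_lock *l)
{
    assert(l->initialized);     /* taking a zeroed lock is a bug */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* spin until the holder releases */
}

void toy_lock_release(struct toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical stand-in for a network device with a unicast list. */
struct toy_dev {
    struct toy_lock addr_list_lock;
    int uc_count;
};

/* Runs while the device is still private to the allocator: no lock. */
static void toy_unicast_init(struct toy_dev *dev)
{
    dev->uc_count = 0;
}

struct toy_dev *toy_alloc_dev(void)
{
    struct toy_dev *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    toy_unicast_init(dev);                  /* private object, lock untouched */
    toy_lock_init(&dev->addr_list_lock);    /* initialized before first use */
    return dev;
}

/* Once the device is visible to others, updates take the lock. */
void toy_unicast_add(struct toy_dev *dev)
{
    toy_lock_acquire(&dev->addr_list_lock);
    dev->uc_count++;
    toy_lock_release(&dev->addr_list_lock);
}
```

If `toy_unicast_init()` took the lock, the assertion in `toy_lock_acquire()` would fire, which is roughly the situation the lockdep warning reports.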
  3. 03 Aug, 2009 2 commits
  4. 20 Jul, 2009 1 commit
  5. 17 Jul, 2009 1 commit
    • net: sock_copy() fixes · 4dc6dc71
      Committed by Eric Dumazet
      Commit e912b114
      (net: sk_prot_alloc() should not blindly overwrite memory)
      took care not to zero the whole new socket at allocation time.
      
      sock_copy() is another spot where we should be very careful.
      We should not set refcnt to a non-zero value until
      we are sure the other fields are correctly set up, or
      a lockless reader could catch this socket by mistake
      while it is not fully (re)initialized.
      
      This patch puts sk_node & sk_refcnt at the very beginning
      of struct sock to ease the job of sock_copy() & sk_prot_alloc().
      
      We add an appropriate smp_wmb() before sk_refcnt initializations
      to match our RCU requirements (changes to sock keys should
      be committed to memory before sk_refcnt is set).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
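The ordering rule described above (fill in every other field first, publish the refcount last) can be sketched in userspace with C11 atomics. This is only an illustration under assumptions: `toy_sock` and the `toy_*` functions are hypothetical, and a release store stands in for the `smp_wmb()` plus refcount write mentioned in the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a socket: the refcount sits at the very
 * beginning so everything after it can be copied in one memcpy(). */
struct toy_sock {
    atomic_int refcnt;   /* 0 means "not valid yet" to lockless readers */
    int family;
    int port;
};

/* Copy all fields after refcnt, then publish by setting refcnt last. */
void toy_sock_copy(struct toy_sock *nsk, const struct toy_sock *osk)
{
    const size_t off = offsetof(struct toy_sock, family);

    assert(nsk != osk);
    memcpy((char *)nsk + off, (const char *)osk + off, sizeof(*osk) - off);
    /* Release ordering: the copied fields are visible before refcnt is. */
    atomic_store_explicit(&nsk->refcnt, 1, memory_order_release);
}

/* Lockless reader: take a reference only if the socket is live. */
int toy_sock_tryget(struct toy_sock *sk)
{
    int old = atomic_load_explicit(&sk->refcnt, memory_order_acquire);

    while (old != 0) {
        if (atomic_compare_exchange_weak_explicit(&sk->refcnt, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_acquire))
            return 1;   /* reference taken */
    }
    return 0;           /* socket not (yet) valid */
}
```

A reader that sees a zero refcount simply skips the socket, so it can never observe the half-copied fields.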
  6. 12 Jul, 2009 1 commit
  7. 10 Jul, 2009 1 commit
    • net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Committed by Jiri Olsa
      Add a memory barrier after the poll_wait function, paired with the
      receive callbacks. Add the functions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.
      
      Without the memory barrier, the following race can happen.
      The race fires when the following code paths meet and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there were no cache the code would work fine, since the wait_queue and
      rcv_nxt accesses happen in opposite order on the two CPUs.
      
      This means that once tp->rcv_nxt is updated by CPU2, CPU1 has either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value of
      tp->rcv_nxt and return with the new data mask.
      In both cases the process (CPU1) has been added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss it and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on the CPU2 side.  CPU1 will then
      end up calling schedule and sleep forever if no more data arrives on the
      socket.
      
      Calls to poll_wait in the following modules were omitted:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
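The race described above is the classic pattern where each side stores one variable and then loads the other. A minimal userspace sketch with C11 atomics follows. The names are hypothetical (`has_sleeper` stands in for a non-empty wait queue, `rcv_nxt` for `tp->rcv_nxt`), and the sequentially consistent fences are an assumption standing in for the barriers the commit wraps in `sock_poll_wait` and `sk_has_sleeper`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state between the poll side and the receive side. */
static atomic_bool has_sleeper;   /* "someone is on the wait queue" */
static atomic_uint rcv_nxt;       /* "how much data has arrived" */
static atomic_bool woken;         /* set when a wake-up is delivered */

/* Poll side (CPU1): enqueue, full barrier, then check for new data.
 * Returns true if data is visible, meaning the caller must not sleep. */
bool toy_poll_check(unsigned int seen)
{
    atomic_store_explicit(&has_sleeper, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* barrier after enqueue */
    return atomic_load_explicit(&rcv_nxt, memory_order_relaxed) != seen;
}

/* Receive side (CPU2): publish data, full barrier, then look for sleepers.
 * Returns true if a sleeper was found and woken. */
bool toy_data_ready(unsigned int new_nxt)
{
    atomic_store_explicit(&rcv_nxt, new_nxt, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* barrier before the check */
    if (atomic_load_explicit(&has_sleeper, memory_order_relaxed)) {
        atomic_store_explicit(&woken, true, memory_order_relaxed);
        return true;
    }
    return false;
}
```

With both fences in place, at least one side is guaranteed to observe the other's store: either the poller sees the new `rcv_nxt` and does not sleep, or the receiver sees the sleeper and wakes it. Dropping either fence allows both loads to return stale values, which is the "sleep forever" case.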
  8. 09 Jul, 2009 1 commit
  9. 27 Jun, 2009 1 commit
  10. 24 Jun, 2009 1 commit
    • net: Move rx skb_orphan call to where needed · d55d87fd
      Committed by Herbert Xu
      In order to get the tun driver to account packets, we need to be
      able to receive packets with destructors set.  To be on the safe
      side, I added an skb_orphan call for all protocols by default since
      some of them (IP in particular) cannot handle receiving packets
      with destructors properly.
      
      Now it seems that at least one protocol (CAN) expects to be able
      to pass skb->sk through the rx path without getting clobbered.
      
      So this patch attempts to fix this properly by moving the skb_orphan
      call to where it's actually needed.  In particular, I've added it
      to skb_set_owner_[rw] which is what most users of skb->destructor
      call.
      
      This is actually an improvement for tun too since it means that
      we only give back the amount charged to the socket when the skb
      is passed to another socket that will also be charged accordingly.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: Oliver Hartkopp <olver@hartkopp.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
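The idea of orphaning a packet only at the point where ownership changes can be sketched as follows. This is a simplified userspace model: `toy_sock`, `toy_skb` and the `toy_*` functions are hypothetical and only mirror the general shape of `skb_orphan` and `skb_set_owner_r`, not their actual kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket that tracks how many bytes are charged to it. */
struct toy_sock {
    int charged;
};

/* Hypothetical packet with an owner and a destructor callback. */
struct toy_skb {
    struct toy_sock *sk;
    void (*destructor)(struct toy_skb *skb);
    int truesize;
};

/* Give back whatever the previous owner was charged and detach from it. */
void toy_skb_orphan(struct toy_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Destructor used for receive-side accounting. */
static void toy_rfree(struct toy_skb *skb)
{
    skb->sk->charged -= skb->truesize;
}

/* Orphan at the moment a new owner takes over, not earlier in the rx path. */
void toy_set_owner_r(struct toy_skb *skb, struct toy_sock *sk)
{
    assert(sk != NULL);
    toy_skb_orphan(skb);            /* uncharge the old owner, if any */
    skb->sk = sk;
    skb->destructor = toy_rfree;
    sk->charged += skb->truesize;   /* charge the new owner */
}
```

Because the old owner is uncharged only when a new one is charged, `skb->sk` survives the rest of the receive path untouched, which is what a protocol that reads `skb->sk` on rx relies on.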
  11. 18 Jun, 2009 3 commits
  12. 15 Jun, 2009 2 commits
    • net: annotate struct sock bitfield · a98b65a3
      Committed by Vegard Nossum
      2009/2/24 Ingo Molnar <mingo@elte.hu>:
      > ok, this is the last warning i have from today's overnight -tip
      > testruns - a 32-bit system warning in sock_init_data():
      >
      > [    2.610389] NET: Registered protocol family 16
      > [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
      > [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
      > [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
      > [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
      > [    2.641038]          ^
      > [    2.643376]
      > [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
      > [    2.648003] EIP: 0060:[<c07141a1>] EFLAGS: 00010282 CPU: 0
      > [    2.652008] EIP is at sock_init_data+0xa1/0x190
      > [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
      > [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
      > [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      > [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
      > [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      > [    2.676003] DR6: ffff4ff0 DR7: 00000400
      > [    2.680002]  [<c07423e5>] __netlink_create+0x35/0xa0
      > [    2.684002]  [<c07443cc>] netlink_kernel_create+0x4c/0x140
      > [    2.688002]  [<c072755e>] rtnetlink_net_init+0x1e/0x40
      > [    2.696002]  [<c071b601>] register_pernet_operations+0x11/0x30
      > [    2.700002]  [<c071b72c>] register_pernet_subsys+0x1c/0x30
      > [    2.704002]  [<c0bf3c8c>] rtnetlink_init+0x4c/0x100
      > [    2.708002]  [<c0bf4669>] netlink_proto_init+0x159/0x170
      > [    2.712002]  [<c0101124>] do_one_initcall+0x24/0x150
      > [    2.716002]  [<c0bbf3c7>] do_initcalls+0x27/0x40
      > [    2.723201]  [<c0bbf3fc>] do_basic_setup+0x1c/0x20
      > [    2.728002]  [<c0bbfb8a>] kernel_init+0x5a/0xa0
      > [    2.732002]  [<c0103e47>] kernel_thread_helper+0x7/0x10
      > [    2.736002]  [<ffffffff>] 0xffffffff
      
      We fix this false positive by annotating the bitfield in struct
      sock.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
    • net: use kmemcheck bitfields API for skbuff · fe55f6d5
      Committed by Vegard Nossum
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
  13. 12 Jun, 2009 2 commits
  14. 11 Jun, 2009 3 commits
    • neigh: fix state transition INCOMPLETE->FAILED via Netlink request · 5ef12d98
      Committed by Timo Teras
      The current code errors out the INCOMPLETE neigh entry skb queue only from
      the timer if maximum probes have been attempted and there has been no reply.
      This also causes the transition to FAILED state.
      
      However, the neigh entry can also be updated via Netlink to inform that the
      address is unavailable.  Currently, neigh_update() just stops the timers and
      leaves the pending skbs unreleased. As a result, the clean-up code in
      the timer callback is never called, which also prevents proper garbage collection.
      
      This fixes neigh_update() to process the pending skb queue immediately if
      an INCOMPLETE -> FAILED state transition occurs due to a Netlink request.
      Signed-off-by: Timo Teras <timo.teras@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
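The behavior the fix describes (flush the pending queue right away when an external update moves an entry from INCOMPLETE to FAILED) can be modeled with a tiny state machine. This is a sketch under assumptions: `toy_neigh`, its fields and the `toy_*` functions are hypothetical, and counters replace the real skb queue.

```c
#include <assert.h>
#include <stddef.h>

enum toy_nud_state { TOY_INCOMPLETE, TOY_REACHABLE, TOY_FAILED };

/* Hypothetical neighbour entry; counters stand in for the skb queue. */
struct toy_neigh {
    enum toy_nud_state state;
    int timer_armed;   /* 1 while the probe timer is pending */
    int queued;        /* packets waiting for address resolution */
    int errored;       /* packets already completed with an error */
};

/* Error out every queued packet (stands in for reporting and freeing). */
static void toy_flush_queue(struct toy_neigh *n)
{
    n->errored += n->queued;
    n->queued = 0;
}

/* State change requested from outside the timer, e.g. via Netlink. */
void toy_neigh_update(struct toy_neigh *n, enum toy_nud_state new_state)
{
    enum toy_nud_state old_state;

    assert(n != NULL);
    old_state = n->state;
    n->timer_armed = 0;    /* the timer will no longer clean up for us */
    n->state = new_state;

    /* Without this step the queued packets would never be released. */
    if (old_state == TOY_INCOMPLETE && new_state == TOY_FAILED)
        toy_flush_queue(n);
}
```

Other transitions leave the queue alone; only the INCOMPLETE to FAILED case needs the immediate flush, because that is the one the stopped timer would otherwise have handled.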
    • net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Committed by Eric Dumazet
      One of the problems with sock memory accounting is that it uses
      a pair of sock_hold()/sock_put() for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on socket and might use a different
      cpu than transmit path or transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be in top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to decrement the initial offset and atomically check if any packets
      are in flight.
      
      skb_set_owner_w() doesn't call sock_hold() anymore.
      
      sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
      reached 0 to perform the final freeing.
      
      Drawback is that a skb->truesize error could lead to unfreeable sockets, or
      even worse, prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
      on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
      contention point. 5 % speedup on a UDP transmit workload (depends
      on number of flows), lowering TX completion cpu usage.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
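The "offset by one" trick described above can be sketched in userspace with a single atomic counter. The names are hypothetical (`toy_sock`, `wmem_alloc` and the `toy_*` functions only mirror the roles of `sk_wmem_alloc`, `skb_set_owner_w()`, `sock_wfree()` and `sk_free()`); the sketch assumes every charge is later uncharged with the same size.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket: one counter does both accounting and lifetime. */
struct toy_sock {
    atomic_int wmem_alloc;  /* bytes in flight, plus 1 while the owner holds it */
    bool freed;             /* set by the final free, for demonstration */
};

static void toy_final_free(struct toy_sock *sk)
{
    assert(!sk->freed);     /* the final free must happen exactly once */
    sk->freed = true;
}

void toy_sock_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);    /* the one-unit initial offset */
    sk->freed = false;
}

/* Charge a packet to the socket; no separate reference is taken. */
void toy_set_owner_w(struct toy_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge; whoever brings the counter to 0 frees. */
void toy_wfree(struct toy_sock *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        toy_final_free(sk);
}

/* The owner releases the socket: drop the initial offset. */
void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        toy_final_free(sk);
}
```

The sketch also shows the drawback the commit mentions: if a packet were uncharged with a size different from the one it was charged with, the counter would either never reach zero or reach it too early.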
    • mac80211: do not pass PS frames out of mac80211 again · 8f77f384
      Committed by Johannes Berg
      In order to handle powersave frames properly we needed
      to pass these out to the device queues again, and introduced
      the skb->requeue bit. This, however, adds unnecessary
      overhead because already tried frames need to be 'cleaned up', and
      this clean-up code is also buggy when software encryption
      is used.
      
      Instead of sending the frames via the master netdev queue
      again, simply put them into the pending queue. This also
      fixes a problem where frames for that particular station
      could be reordered when some were still on the software
      queues and older ones are re-injected into the software
      queue after them.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
  15. 09 Jun, 2009 6 commits
  16. 08 Jun, 2009 6 commits
  17. 04 Jun, 2009 1 commit
  18. 03 Jun, 2009 1 commit
  19. 30 May, 2009 1 commit
    • net: convert unicast addr list · ccffad25
      Committed by Jiri Pirko
      This patch converts the unicast address list to a standard list_head using
      the previously introduced struct netdev_hw_addr. It also relaxes the
      locking. The original spinlock (still used for multicast addresses) is not
      needed and is no longer used to protect this list. All
      reading and writing takes place under rtnl (with no changes).
      
      I also removed the possibility to specify the length of the address
      while adding or deleting a unicast address. It's always dev->addr_len.
      
      The conversion especially touched the e1000 and ixgbe code, where the
      change is not so trivial.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      
       drivers/net/bnx2.c               |   13 +--
       drivers/net/e1000/e1000_main.c   |   24 +++--
       drivers/net/ixgbe/ixgbe_common.c |   14 ++--
       drivers/net/ixgbe/ixgbe_common.h |    4 +-
       drivers/net/ixgbe/ixgbe_main.c   |    6 +-
       drivers/net/ixgbe/ixgbe_type.h   |    4 +-
       drivers/net/macvlan.c            |   11 +-
       drivers/net/mv643xx_eth.c        |   11 +-
       drivers/net/niu.c                |    7 +-
       drivers/net/virtio_net.c         |    7 +-
       drivers/s390/net/qeth_l2_main.c  |    6 +-
       drivers/scsi/fcoe/fcoe.c         |   16 ++--
       include/linux/netdevice.h        |   18 ++--
       net/8021q/vlan.c                 |    4 +-
       net/8021q/vlan_dev.c             |   10 +-
       net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
       net/dsa/slave.c                  |   10 +-
       net/packet/af_packet.c           |    4 +-
       18 files changed, 227 insertions(+), 137 deletions(-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 28 May, 2009 2 commits
  21. 27 May, 2009 2 commits