1. 27 Jul, 2009 1 commit
  2. 20 Jul, 2009 1 commit
  3. 17 Jul, 2009 1 commit
    • net: sock_copy() fixes · 4dc6dc71
      Eric Dumazet committed
      Commit e912b114
      (net: sk_prot_alloc() should not blindly overwrite memory)
      took care of not zeroing the whole new socket at allocation time.

      sock_copy() is another spot where we should be very careful.
      We should not set refcnt to a non-zero value until
      we are sure the other fields are correctly set up, or
      a lockless reader could catch this socket by mistake
      while it is not fully (re)initialized.

      This patch moves sk_node and sk_refcnt to the very beginning
      of struct sock to ease the job of sock_copy() and sk_prot_alloc().

      We add appropriate smp_wmb() calls before sk_refcnt initializations
      to match our RCU requirements (changes to sock keys should
      be committed to memory before sk_refcnt is set).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
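The ordering rule in this commit message (fill in every other field first, set the refcount last, with a write barrier in between) can be sketched in userspace C11. This is only an illustration under stated assumptions: `struct toy_sock`, `toy_sock_copy()` and `toy_sock_try_get()` are hypothetical names, not kernel code, and a release store stands in for the `smp_wmb()` plus refcount write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sock: the refcount sits first so a
 * copy can skip it by offset, in the spirit of moving sk_refcnt to the
 * very beginning of the structure. */
struct toy_sock {
    atomic_int refcnt;  /* 0 means "not visible to lockless readers yet" */
    int        key;     /* lookup key that readers compare against */
    int        state;
};

/* Copy everything after refcnt, then publish by setting refcnt last. */
static void toy_sock_copy(struct toy_sock *nsk, const struct toy_sock *osk)
{
    const size_t off = offsetof(struct toy_sock, key);

    assert(nsk != osk);
    memcpy((char *)nsk + off, (const char *)osk + off, sizeof(*nsk) - off);

    /* The release store plays the role of smp_wmb() followed by the
     * refcount write: key and state are visible before refcnt != 0. */
    atomic_store_explicit(&nsk->refcnt, 1, memory_order_release);
}

/* Lockless reader: take a reference only if refcnt is non-zero, and
 * look at the key only once the reference is held. */
static int toy_sock_try_get(struct toy_sock *sk, int key)
{
    int old = atomic_load_explicit(&sk->refcnt, memory_order_acquire);

    while (old != 0) {
        if (atomic_compare_exchange_weak_explicit(&sk->refcnt, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            if (sk->key == key)
                return 1;
            /* Wrong socket: give the reference back. */
            atomic_fetch_sub_explicit(&sk->refcnt, 1, memory_order_release);
            return 0;
        }
    }
    return 0;  /* not published yet, the reader must skip it */
}
```

A reader that sees a zero refcount never inspects the key, which is why the copy must not carry the old refcount over along with the rest of the structure.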
  4. 10 Jul, 2009 2 commits
    • memory barrier: adding smp_mb__after_lock · ad462769
      Jiri Olsa committed
      Add an smp_mb__after_lock define, to be used as an smp_mb call after
      taking a lock.

      Make it a no-op on x86, since {read|write|spin}_lock() on x86 are
      full memory barriers.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa committed
      Add a memory barrier after the poll_wait function, paired with the
      receive callbacks. Add the functions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.

      Without the memory barrier, the following race can happen.
      The race fires when the following code paths meet and the tp->rcv_nxt
      and __add_wait_queue updates stay in the CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there were no cache the code would work fine, since the wait_queue and
      rcv_nxt accesses happen in opposite order on the two CPUs.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on the CPU2 side.  CPU1 will then
      end up calling schedule and sleep forever if there is no more data on the
      socket.

      Calls to poll_wait in the following modules were omitted:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
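The pairing described in this commit (each side writes its own variable, issues a full barrier, then reads the other side's variable) can be sketched with C11 atomics. This is a userspace illustration only: `struct toy_sk`, `toy_poll()` and `toy_data_ready()` are hypothetical names, a boolean stands in for the wait queue, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's full memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_sk {
    atomic_bool has_sleeper;  /* stands in for a non-empty wait queue */
    atomic_int  rcv_nxt;      /* stands in for tp->rcv_nxt */
};

/* Poll side (CPU1 in the diagram): enqueue, full barrier, then check
 * for data. Returns true if new data is already visible. */
static bool toy_poll(struct toy_sk *sk, int seen)
{
    assert(sk);
    atomic_store_explicit(&sk->has_sleeper, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sk->rcv_nxt, memory_order_relaxed) != seen;
}

/* Receive side (CPU2 in the diagram): publish data, full barrier, then
 * look for sleepers. Returns true if a wake-up must be issued. */
static bool toy_data_ready(struct toy_sk *sk, int new_nxt)
{
    assert(sk);
    atomic_store_explicit(&sk->rcv_nxt, new_nxt, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sk->has_sleeper, memory_order_relaxed);
}
```

With both fences in place, at least one side observes the other's store: either the poller sees the new `rcv_nxt`, or the receive path sees the sleeper and wakes it. Dropping either fence allows both loads to return stale values, which is the lost wake-up described above.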
  5. 29 Jun, 2009 1 commit
  6. 25 Jun, 2009 1 commit
  7. 24 Jun, 2009 1 commit
    • net: Move rx skb_orphan call to where needed · d55d87fd
      Herbert Xu committed
      In order to get the tun driver to account packets, we need to be
      able to receive packets with destructors set.  To be on the safe
      side, I added an skb_orphan call for all protocols by default since
      some of them (IP in particular) cannot handle receiving packets
      with destructors properly.
      
      Now it seems that at least one protocol (CAN) expects to be able
      to pass skb->sk through the rx path without getting clobbered.
      
      So this patch attempts to fix this properly by moving the skb_orphan
      call to where it's actually needed.  In particular, I've added it
      to skb_set_owner_[rw] which is what most users of skb->destructor
      call.
      
      This is actually an improvement for tun too since it means that
      we only give back the amount charged to the socket when the skb
      is passed to another socket that will also be charged accordingly.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: Oliver Hartkopp <olver@hartkopp.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 23 Jun, 2009 1 commit
  9. 19 Jun, 2009 1 commit
  10. 17 Jun, 2009 1 commit
  11. 16 Jun, 2009 1 commit
  12. 15 Jun, 2009 4 commits
    • net: annotate struct sock bitfield · a98b65a3
      Vegard Nossum committed
      2009/2/24 Ingo Molnar <mingo@elte.hu>:
      > ok, this is the last warning i have from today's overnight -tip
      > testruns - a 32-bit system warning in sock_init_data():
      >
      > [    2.610389] NET: Registered protocol family 16
      > [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
      > [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
      > [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
      > [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
      > [    2.641038]          ^
      > [    2.643376]
      > [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
      > [    2.648003] EIP: 0060:[<c07141a1>] EFLAGS: 00010282 CPU: 0
      > [    2.652008] EIP is at sock_init_data+0xa1/0x190
      > [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
      > [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
      > [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      > [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
      > [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      > [    2.676003] DR6: ffff4ff0 DR7: 00000400
      > [    2.680002]  [<c07423e5>] __netlink_create+0x35/0xa0
      > [    2.684002]  [<c07443cc>] netlink_kernel_create+0x4c/0x140
      > [    2.688002]  [<c072755e>] rtnetlink_net_init+0x1e/0x40
      > [    2.696002]  [<c071b601>] register_pernet_operations+0x11/0x30
      > [    2.700002]  [<c071b72c>] register_pernet_subsys+0x1c/0x30
      > [    2.704002]  [<c0bf3c8c>] rtnetlink_init+0x4c/0x100
      > [    2.708002]  [<c0bf4669>] netlink_proto_init+0x159/0x170
      > [    2.712002]  [<c0101124>] do_one_initcall+0x24/0x150
      > [    2.716002]  [<c0bbf3c7>] do_initcalls+0x27/0x40
      > [    2.723201]  [<c0bbf3fc>] do_basic_setup+0x1c/0x20
      > [    2.728002]  [<c0bbfb8a>] kernel_init+0x5a/0xa0
      > [    2.732002]  [<c0103e47>] kernel_thread_helper+0x7/0x10
      > [    2.736002]  [<ffffffff>] 0xffffffff
      
      We fix this false positive by annotating the bitfield in struct
      sock.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
    • net: annotate inet_timewait_sock bitfields · 9e337b0f
      Vegard Nossum committed
      The use of bitfields here would lead to false positive warnings with
      kmemcheck. Silence them.
      
      (Additionally, one erroneous comment related to the bitfield was also
      fixed.)
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
    • net: annotate bitfields in struct inet_sock · 45e3ff82
      Vegard Nossum committed
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
    • pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US · ca44d6e6
      Jarek Poplawski committed
      Let's use TICKS instead of US, giving PSCHED_TICKS2NS and PSCHED_NS2TICKS
      (as PSCHED_TICKS_PER_SEC already does), to avoid misleading names.
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 13 Jun, 2009 3 commits
    • netfilter: conntrack: optional reliable conntrack event delivery · dd7669a9
      Pablo Neira Ayuso committed
      This patch improves ctnetlink event reliability if one broadcast
      listener has set the NETLINK_BROADCAST_ERROR socket option.
      
      The logic is the following: if an event delivery fails, we keep
      the undelivered events in the missed event cache. Once the next
      packet arrives, we add the new events (if any) to the missed
      events in the cache and we try a new delivery, and so on. Thus,
      if ctnetlink fails to deliver an event, we try to deliver them
      once we see a new packet. Therefore, we may lose state
      transitions but the userspace process gets in sync at some point.
      
      In the worst case, if no events were delivered to userspace, we make
      sure that destroy events are successfully delivered. Basically,
      if ctnetlink fails to deliver the destroy event, we remove the
      conntrack entry from the hashes and insert it in the dying
      list, which contains inactive entries. Then, the conntrack timer
      is added with an extra grace timeout of random32() % 15 seconds
      to trigger the event again (this grace timeout is tunable via
      /proc). The use of a limited random timeout value allows
      distributing the "destroy" resends, thus avoiding accumulating
      lots of "destroy" events at the same time. Events may be delivered
      out of order, but we can identify them by means of the tuple plus
      the conntrack ID.
      
      The maximum number of conntrack entries (active or inactive) is
      still handled by nf_conntrack_max. Thus, we may start dropping
      packets at some point if we accumulate a lot of inactive conntrack
      entries that did not successfully report the destroy event to
      userspace.
      
      During my stress tests, which consisted of setting a very small buffer
      of 2048 bytes for conntrackd plus the NETLINK_BROADCAST_ERROR socket
      flag and generating lots of very small connections, I noticed
      very few destroy entries on the fly waiting to be resent.

      A simple way to test this patch consists of creating a lot of
      entries, setting a very small Netlink buffer in conntrackd (+ a patch
      which is not in the git tree to set the BROADCAST_ERROR flag)
      and invoking `conntrack -F'.
      
      For expectations, no changes are introduced in this patch.
      Currently, event delivery is only done for new expectations (no
      events from expectation expiration, removal and confirmation).
      In that case, they need a per-expectation event cache to implement
      the same idea that is described in this patch.

      This patch can be useful to provide reliable flow-accounting. We
      still have to add a new conntrack extension to store the creation
      and destroy time.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
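The retry logic described in this commit (keep undelivered events in a missed-event cache, merge them with the next packet's events, and try again) can be sketched as follows. This is a minimal illustration, not the ctnetlink implementation: `struct toy_ecache`, `toy_deliver_cached()` and the two listener functions are hypothetical names, and events are modelled as a plain bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-conntrack cache of event bits that failed delivery. */
struct toy_ecache {
    uint32_t missed;
};

/* A listener returns false when it cannot take the events, for example
 * because its receive buffer is full. */
typedef bool (*toy_deliver_fn)(uint32_t events);

/* Called once per packet: merge new events with the missed ones and
 * attempt a single delivery of the combined set. */
static bool toy_deliver_cached(struct toy_ecache *e, uint32_t new_events,
                               toy_deliver_fn deliver)
{
    uint32_t pending = e->missed | new_events;

    assert(deliver);
    if (pending == 0)
        return true;            /* nothing to report */
    if (deliver(pending)) {
        e->missed = 0;          /* the listener is back in sync */
        return true;
    }
    e->missed = pending;        /* retry when the next packet arrives */
    return false;
}

/* Two toy listeners, used only to exercise the sketch. */
static uint32_t toy_last_delivered;

static bool toy_listener_ok(uint32_t events)
{
    toy_last_delivered = events;
    return true;
}

static bool toy_listener_full(uint32_t events)
{
    (void)events;
    return false;
}
```

Because the missed bits are merged rather than queued, intermediate state transitions can be lost, but the listener eventually receives a set of events that brings it back in sync, as the commit message notes.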
    • netfilter: conntrack: move helper destruction to nf_ct_helper_destroy() · 9858a3ae
      Pablo Neira Ayuso committed
      This patch moves the helper destruction to a function that lives
      in nf_conntrack_helper.c. This new function is used in the patch
      to add ctnetlink reliable event delivery.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: conntrack: move event caching to conntrack extension infrastructure · a0891aa6
      Pablo Neira Ayuso committed
      This patch reworks the per-cpu event caching to use the conntrack
      extension infrastructure.
      
      The main drawback is that we consume more memory per conntrack
      if event delivery is enabled. This patch is required by the
      reliable event delivery patch that follows it.

      BTW, this patch allows you to enable/disable event delivery via
      /proc/sys/net/netfilter/nf_conntrack_events at runtime, although
      you can still disable event caching as a compile-time option.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  14. 11 Jun, 2009 2 commits
    • net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Eric Dumazet committed
      One of the problems with sock memory accounting is that it uses
      a pair of sock_hold()/sock_put() for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on socket and might use a different
      cpu than transmit path or transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be in top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to remove the initial offset and atomically check whether any packets
      are in flight.

      skb_set_owner_w() doesn't call sock_hold() anymore.

      sock_wfree() doesn't call sock_put() anymore, but checks whether
      sk_wmem_alloc reached 0 to perform the final freeing.
      
      Drawback is that a skb->truesize error could lead to unfreeable sockets, or
      even worse, prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
      on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
      contention point. 5 % speedup on a UDP transmit workload (depends
      on number of flows), lowering TX completion cpu usage.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
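The counter trick in this commit (bias sk_wmem_alloc by one while the socket is live, so a single atomic counter covers both "socket released" and "no packets in flight") can be sketched in userspace C11. The names below (`struct toy_tx_sock`, `toy_set_owner_w()`, `toy_wfree()`, `toy_sk_free()`) are hypothetical stand-ins that only mirror the roles of the kernel functions mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_tx_sock {
    atomic_int wmem_alloc;  /* bytes in flight, plus 1 while the socket is live */
    bool       freed;
};

static void toy_sock_init(struct toy_tx_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);  /* the initial one-unit offset */
    sk->freed = false;
}

/* Final teardown; must run exactly once. */
static void toy_final_free(struct toy_tx_sock *sk)
{
    assert(!sk->freed);
    sk->freed = true;
}

/* Charge a packet to the socket: no separate reference is taken. */
static void toy_set_owner_w(struct toy_tx_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free the socket if this was the last user. */
static void toy_wfree(struct toy_tx_sock *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        toy_final_free(sk);
}

/* Owner releases the socket: drop the initial offset, and free only if
 * no packets are still in flight. */
static void toy_sk_free(struct toy_tx_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        toy_final_free(sk);
}
```

Whichever caller brings the counter to zero performs the final free. The sketch also makes the stated drawback visible: if a completion subtracts a different `truesize` than was charged, the counter either never reaches zero or reaches it too early.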
    • mac80211: do not pass PS frames out of mac80211 again · 8f77f384
      Johannes Berg committed
      In order to handle powersave frames properly we needed
      to pass these out to the device queues again, and introduced
      the skb->requeue bit. This, however, also has unnecessary
      overhead because already tried frames need to be 'cleaned up',
      and this clean-up code is also buggy when software encryption
      is used.
      
      Instead of sending the frames via the master netdev queue
      again, simply put them into the pending queue. This also
      fixes a problem where frames for that particular station
      could be reordered when some were still on the software
      queues and older ones are re-injected into the software
      queue after them.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
  15. 10 Jun, 2009 1 commit
  16. 09 Jun, 2009 5 commits
  17. 08 Jun, 2009 6 commits
  18. 04 Jun, 2009 4 commits
    • cfg80211: add rfkill support · 1f87f7d3
      Johannes Berg committed
      To be easier on drivers and users, have cfg80211 register an
      rfkill structure that drivers can access. When soft-killed,
      simply take down all interfaces; when hard-killed the driver
      needs to notify us and we will take down the interfaces
      after the fact. While rfkilled, interfaces cannot be set UP.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • cfg80211: move txpower wext from mac80211 · 7643a2c3
      Johannes Berg committed
      This patch introduces new cfg80211 API to set the TX power
      via cfg80211, puts the wext code into cfg80211 and updates
      mac80211 to use all that. The -ENETDOWN bits are a hack but
      will go away soon.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • rfkill: rewrite · 19d337df
      Johannes Berg committed
      This patch completely rewrites the rfkill core to address
      the following deficiencies:
      
       * all rfkill drivers need to implement polling where necessary
         rather than having one central implementation
      
       * updating the rfkill state cannot be done from arbitrary
         contexts, forcing drivers to use schedule_work and requiring
         lots of code
      
       * rfkill drivers need to keep track of soft/hard blocked
         internally -- the core should do this
      
       * the rfkill API has many unexpected quirks, for example being
         asymmetric wrt. alloc/free and register/unregister
      
       * rfkill can call back into a driver from within a function the
         driver called -- this is prone to deadlocks and generally
         should be avoided
      
       * rfkill-input pointlessly is a separate module
      
       * drivers need to #ifdef rfkill functions (unless they want to
         depend on or select RFKILL) -- rfkill should provide inlines
         that do nothing if it isn't compiled in
      
       * the rfkill structure is not opaque -- drivers need to initialise
         it correctly (lots of sanity checking code required) -- instead
         force drivers to pass the right variables to rfkill_alloc()
      
       * the documentation is hard to read because it always assumes the
         reader is completely clueless and contains way TOO MANY CAPS
      
       * the rfkill code needlessly uses a lot of locks and atomic
         operations in locked sections
      
       * fix LED trigger to actually change the LED when the radio state
         changes -- this wasn't done before
      Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
      Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> [thinkpad]
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • mac80211: deprecate conf.beacon_int properly · e535c756
      Johannes Berg committed
      Ivo has updated the driver to no longer use the change flag,
      so we can remove that, but rt2x00 and ath5k still use the
      actual value so let's mark it as deprecated too.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
  19. 03 Jun, 2009 3 commits
    • sctp: support non-blocking version of the new sctp_connectx() API · c6ba68a2
      Vlad Yasevich committed
      The prior implementation of the new sctp_connectx() call that returns
      an association ID did not work correctly on a non-blocking socket.
      This is because we could not return both an EINPROGRESS error and
      an association id.  This is a new implementation that supports this.

      Originally from Ivan Skytte Jørgensen <isj-sctp@i1.dk>

      Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: fix to choose alternate destination when retransmit ASCONF chunk · 9919b455
      Wei Yongjun committed
      RFC 5061 Section 5.1, ASCONF Chunk Procedures, says:
      
      B4)  Re-transmit the ASCONF Chunk last sent and if possible choose an
           alternate destination address (please refer to [RFC4960],
           Section 6.4.1).  An endpoint MUST NOT add new parameters to this
           chunk; it MUST be the same (including its Sequence Number) as
           the last ASCONF sent.  An endpoint MAY, however, bundle an
           additional ASCONF with new ASCONF parameters with the next
           Sequence Number.  For details, see Section 5.5.
      
      This patch fixes retransmission of the ASCONF chunk to choose an
      alternate destination address, and cleans up some duplicated code.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • net: skb->dst accessors · adf30907
      Eric Dumazet committed
      Define three accessors to get/set the dst attached to an skb:

      struct dst_entry *skb_dst(const struct sk_buff *skb)

      void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

      void skb_dst_drop(struct sk_buff *skb)
      This one should replace occurrences of:
      dst_release(skb->dst);
      skb->dst = NULL;

      Delete the skb->dst field.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
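The accessor pattern from this commit can be illustrated with a small userspace mock. The three function shapes follow the prototypes quoted in the commit message, but everything else is an assumption made for the sketch: the `toy_` types, the `dst_private` member name, and the simple integer reference count do not reflect the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dst_entry and struct sk_buff. */
struct toy_dst {
    int refcnt;
};

struct toy_skb {
    struct toy_dst *dst_private;  /* callers must go through the accessors */
};

/* Drop one reference on a dst; NULL is accepted and ignored. */
static void toy_dst_release(struct toy_dst *dst)
{
    if (dst) {
        assert(dst->refcnt > 0);
        dst->refcnt--;
    }
}

/* Getter: the only way callers read the attached dst. */
static struct toy_dst *toy_skb_dst(const struct toy_skb *skb)
{
    return skb->dst_private;
}

/* Setter: attaches a dst; the skb takes over the caller's reference. */
static void toy_skb_dst_set(struct toy_skb *skb, struct toy_dst *dst)
{
    skb->dst_private = dst;
}

/* Replaces the open-coded "release, then set to NULL" pair. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
    toy_dst_release(toy_skb_dst(skb));
    toy_skb_dst_set(skb, NULL);
}
```

Routing every access through three small functions means the storage behind them can change later without touching callers, which is presumably why the commit removes direct access to the field.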