1. 19 6月, 2009 1 次提交
  2. 17 6月, 2009 1 次提交
  3. 16 6月, 2009 1 次提交
  4. 15 6月, 2009 1 次提交
  5. 13 6月, 2009 3 次提交
    • P
      netfilter: conntrack: optional reliable conntrack event delivery · dd7669a9
      Pablo Neira Ayuso 提交于
      This patch improves ctnetlink event reliability if one broadcast
      listener has set the NETLINK_BROADCAST_ERROR socket option.
      
      The logic is the following: if an event delivery fails, we keep
      the undelivered events in the missed event cache. Once the next
      packet arrives, we add the new events (if any) to the missed
      events in the cache and we try a new delivery, and so on. Thus,
      if ctnetlink fails to deliver an event, we try to deliver them
      once we see a new packet. Therefore, we may lose state
      transitions but the userspace process gets in sync at some point.
      
      At worst case, if no events were delivered to userspace, we make
      sure that destroy events are successfully delivered. Basically,
      if ctnetlink fails to deliver the destroy event, we remove the
      conntrack entry from the hashes and we insert them in the dying
      list, which contains inactive entries. Then, the conntrack timer
      is added with an extra grace timeout of random32() % 15 seconds
      to trigger the event again (this grace timeout is tunable via
      /proc). The use of a limited random timeout value allows
      distributing the "destroy" resends, thus, avoiding accumulating
      lots "destroy" events at the same time. Event delivery may
      re-order but we can identify them by means of the tuple plus
      the conntrack ID.
      
      The maximum number of conntrack entries (active or inactive) is
      still handled by nf_conntrack_max. Thus, we may start dropping
      packets at some point if we accumulate a lot of inactive conntrack
      entries that did not successfully report the destroy event to
      userspace.
      
      During my stress tests consisting of setting a very small buffer
      of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
      flag, and generating lots of very small connections, I noticed
      very few destroy entries on the fly waiting to be resend.
      
      A simple way to test this patch consist of creating a lot of
      entries, set a very small Netlink buffer in conntrackd (+ a patch
      which is not in the git tree to set the BROADCAST_ERROR flag)
      and invoke `conntrack -F'.
      
      For expectations, no changes are introduced in this patch.
      Currently, event delivery is only done for new expectations (no
      events from expectation expiration, removal and confirmation).
      In that case, they need a per-expectation event cache to implement
      the same idea that is exposed in this patch.
      
      This patch can be useful to provide reliable flow-accouting. We
      still have to add a new conntrack extension to store the creation
      and destroy time.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      dd7669a9
    • P
      netfilter: conntrack: move helper destruction to nf_ct_helper_destroy() · 9858a3ae
      Pablo Neira Ayuso 提交于
      This patch moves the helper destruction to a function that lives
      in nf_conntrack_helper.c. This new function is used in the patch
      to add ctnetlink reliable event delivery.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      9858a3ae
    • P
      netfilter: conntrack: move event caching to conntrack extension infrastructure · a0891aa6
      Pablo Neira Ayuso 提交于
      This patch reworks the per-cpu event caching to use the conntrack
      extension infrastructure.
      
      The main drawback is that we consume more memory per conntrack
      if event delivery is enabled. This patch is required by the
      reliable event delivery that follows to this patch.
      
      BTW, this patch allows you to enable/disable event delivery via
      /proc/sys/net/netfilter/nf_conntrack_events in runtime, although
      you can still disable event caching as compilation option.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      a0891aa6
  6. 11 6月, 2009 2 次提交
    • E
      net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Eric Dumazet 提交于
      One of the problem with sock memory accounting is it uses
      a pair of sock_hold()/sock_put() for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on socket and might use a different
      cpu than transmit path or transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be in top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to decrement initial offset and atomicaly check if any packets
      are in flight.
      
      skb_set_owner_w() doesnt call sock_hold() anymore
      
      sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
      reached 0 to perform the final freeing.
      
      Drawback is that a skb->truesize error could lead to unfreeable sockets, or
      even worse, prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
      on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
      contention point. 5 % speedup on a UDP transmit workload (depends
      on number of flows), lowering TX completion cpu usage.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b85a34e
    • J
      mac80211: do not pass PS frames out of mac80211 again · 8f77f384
      Johannes Berg 提交于
      In order to handle powersave frames properly we had needed
      to pass these out to the device queues again, and introduce
      the skb->requeue bit. This, however, also has unnecessary
      overhead by needing to 'clean up' already tried frames, and
      this clean-up code is also buggy when software encryption
      is used.
      
      Instead of sending the frames via the master netdev queue
      again, simply put them into the pending queue. This also
      fixes a problem where frames for that particular station
      could be reordered when some were still on the software
      queues and older ones are re-injected into the software
      queue after them.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      8f77f384
  7. 10 6月, 2009 1 次提交
  8. 09 6月, 2009 5 次提交
  9. 08 6月, 2009 6 次提交
  10. 04 6月, 2009 4 次提交
    • J
      cfg80211: add rfkill support · 1f87f7d3
      Johannes Berg 提交于
      To be easier on drivers and users, have cfg80211 register an
      rfkill structure that drivers can access. When soft-killed,
      simply take down all interfaces; when hard-killed the driver
      needs to notify us and we will take down the interfaces
      after the fact. While rfkilled, interfaces cannot be set UP.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      1f87f7d3
    • J
      cfg80211: move txpower wext from mac80211 · 7643a2c3
      Johannes Berg 提交于
      This patch introduces new cfg80211 API to set the TX power
      via cfg80211, puts the wext code into cfg80211 and updates
      mac80211 to use all that. The -ENETDOWN bits are a hack but
      will go away soon.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      7643a2c3
    • J
      rfkill: rewrite · 19d337df
      Johannes Berg 提交于
      This patch completely rewrites the rfkill core to address
      the following deficiencies:
      
       * all rfkill drivers need to implement polling where necessary
         rather than having one central implementation
      
       * updating the rfkill state cannot be done from arbitrary
         contexts, forcing drivers to use schedule_work and requiring
         lots of code
      
       * rfkill drivers need to keep track of soft/hard blocked
         internally -- the core should do this
      
       * the rfkill API has many unexpected quirks, for example being
         asymmetric wrt. alloc/free and register/unregister
      
       * rfkill can call back into a driver from within a function the
         driver called -- this is prone to deadlocks and generally
         should be avoided
      
       * rfkill-input pointlessly is a separate module
      
       * drivers need to #ifdef rfkill functions (unless they want to
         depend on or select RFKILL) -- rfkill should provide inlines
         that do nothing if it isn't compiled in
      
       * the rfkill structure is not opaque -- drivers need to initialise
         it correctly (lots of sanity checking code required) -- instead
         force drivers to pass the right variables to rfkill_alloc()
      
       * the documentation is hard to read because it always assumes the
         reader is completely clueless and contains way TOO MANY CAPS
      
       * the rfkill code needlessly uses a lot of locks and atomic
         operations in locked sections
      
       * fix LED trigger to actually change the LED when the radio state
         changes -- this wasn't done before
      Tested-by: NAlan Jenkins <alan-jenkins@tuffmail.co.uk>
      Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> [thinkpad]
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      19d337df
    • J
      mac80211: deprecate conf.beacon_int properly · e535c756
      Johannes Berg 提交于
      Ivo has updated the driver to no longer use the change flag,
      so we can remove that, but rt2x00 and ath5k still use the
      actual value so let's mark it as deprecated too.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      e535c756
  11. 03 6月, 2009 9 次提交
  12. 02 6月, 2009 1 次提交
    • N
      ipv4: New multicast-all socket option · f771bef9
      Nivedita Singhvi 提交于
      After some discussion offline with Christoph Lameter and David Stevens
      regarding multicast behaviour in Linux, I'm submitting a slightly
      modified patch from the one Christoph submitted earlier.
      
      This patch provides a new socket option IP_MULTICAST_ALL.
      
      In this case, default behaviour is _unchanged_ from the current
      Linux standard. The socket option is set by default to provide
      original behaviour. Sockets wishing to receive data only from
      multicast groups they join explicitly will need to clear this
      socket option.
      Signed-off-by: NNivedita Singhvi <niv@us.ibm.com>
      Signed-off-by: Christoph Lameter<cl@linux.com>
      Acked-by: NDavid Stevens <dlstevens@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f771bef9
  13. 27 5月, 2009 1 次提交
  14. 23 5月, 2009 1 次提交
  15. 22 5月, 2009 1 次提交
  16. 21 5月, 2009 2 次提交