1. 31 Oct 2008: 14 commits
  2. 30 Oct 2008: 4 commits
  3. 29 Oct 2008: 8 commits
    • udp: RCU handling for Unicast packets. · 271b72c7
      Committed by Eric Dumazet:
      Goals are :
      
      1) Optimizing handling of incoming Unicast UDP frames, so that no memory
       writes should happen in the fast path.
      
       Note: Multicasts and broadcasts still will need to take a lock,
       because doing a full lockless lookup in this case is difficult.
      
      2) No expensive operations in the socket bind/unhash phases :
        - No expensive synchronize_rcu() calls.
      
        - No added rcu_head in the socket structure, which would increase
        memory needs and, more important, force us to use call_rcu() calls,
        which have the bad property of making the socket structure cold.
        (The rcu grace period between socket freeing and its potential
         reuse makes the socket cold in the CPU cache.)
        David did a previous patch using call_rcu() and noticed a 20%
        impact on TCP connection rates.
        Quoting Christoph Lameter:
         "Right. That results in cacheline cooldown. You'd want to recycle
          the object as they are cache hot on a per cpu basis. That is screwed
          up by the delayed regular rcu processing. We have seen multiple
          regressions due to cacheline cooldown.
          The only choice in cacheline hot sensitive areas is to deal with the
          complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
      
        - Because udp sockets are allocated from dedicated kmem_cache,
        use of SLAB_DESTROY_BY_RCU can help here.
      
      Theory of operation :
      ---------------------
      
      As the lookup is lockless (using rcu_read_lock()/rcu_read_unlock()),
      special care must be taken by readers and writers.

      Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be
      freed, reused, and inserted into a different chain, or in the worst
      case the same chain, while readers are doing lookups at the same time.
      
      In order to avoid loops, a reader must check that each socket found
      in a chain really belongs to the chain the reader was traversing. If
      it finds a mismatch, the lookup must start again at the beginning.
      This *restart* loop is the reason we had to keep the rdlock for the
      multicast case, because we don't want to send the same message
      several times to the same socket.
      
      We use RCU only for fast path.
      Thus, /proc/net/udp still takes spinlocks.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
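The lookup/restart rule this commit describes can be sketched in plain userspace C. This is a minimal single-threaded illustration, not the kernel code: `struct sock`, the table layout, and `udp_lookup` are simplified stand-ins, and the real lookup also involves refcounting and memory barriers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy socket: remembers which chain it was inserted into. Under
 * SLAB_DESTROY_BY_RCU the memory may be reused for a socket in a
 * different chain while a reader walks it, so 'hash' can change. */
struct sock {
    unsigned int hash;        /* chain this socket currently belongs to */
    unsigned short port;
    struct sock *next;
};

/* Lockless-style lookup: if an entry turns out to live in a different
 * chain (it was freed and reused under us), restart from the head. */
static struct sock *udp_lookup(struct sock **table, unsigned int slot,
                               unsigned short port)
{
    struct sock *sk;
begin:
    for (sk = table[slot]; sk != NULL; sk = sk->next) {
        if (sk->hash != slot)
            goto begin;       /* chain changed under us: restart */
        if (sk->port == port)
            return sk;        /* real code would re-check the key after
                               * taking a reference on the socket */
    }
    return NULL;
}
```

In the multicast path such a restart could deliver the same datagram twice to one socket, which is why the commit keeps a lock there.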
    • udp: introduce struct udp_table and multiple spinlocks · 645ca708
      Committed by Eric Dumazet:
      UDP sockets are hashed in a 128 slots hash table.
      
      This hash table is protected by *one* rwlock.
      
      This rwlock is readlocked each time an incoming UDP message is handled.
      
      This rwlock is writelocked each time a socket must be inserted in
      hash table (bind time), or deleted from this table (close time)
      
      This is not scalable on SMP machines :
      
      1) Even in read mode, lock() and unlock() are atomic operations and
       must dirty a contended cache line, shared by all cpus.
      
      2) A writer might be starved if many readers are 'in flight'. This can
       happen on a machine with some NIC receiving many UDP messages. User
       process can be delayed a long time at socket creation/dismantle time.
      
      This patch prepares RCU migration, by introducing 'struct udp_table
      and struct udp_hslot', and using one spinlock per chain, to reduce
      contention on central rwlock.
      
      Introducing one spinlock per chain reduces latencies for port
      randomization on heavily loaded UDP servers. This also speeds up
      binding to specific ports.

      udp_lib_unhash() was uninlined, having become too big.
      
      Some cleanups were done to ease review of the following patch
      (RCUification of UDP Unicast lookups).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
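The data layout this commit introduces can be sketched in userspace C. Illustrative assumptions: `pthread_mutex_t` stands in for the kernel's `spinlock_t`, `sock_node` and the trivial hash function are invented here, and the real udp_hashfn also mixes in a net-namespace value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define UDP_HTABLE_SIZE 128   /* same 128 slots as the old single-lock table */

struct sock_node {
    unsigned short port;
    struct sock_node *next;
};

struct udp_hslot {            /* one hash chain plus its private lock */
    struct sock_node *head;
    pthread_mutex_t lock;     /* userspace stand-in for spinlock_t */
};

struct udp_table {
    struct udp_hslot hash[UDP_HTABLE_SIZE];
};

static void udp_table_init(struct udp_table *t)
{
    int i;
    for (i = 0; i < UDP_HTABLE_SIZE; i++) {
        t->hash[i].head = NULL;
        pthread_mutex_init(&t->hash[i].lock, NULL);
    }
}

static unsigned int udp_hashfn(unsigned short port)
{
    return port & (UDP_HTABLE_SIZE - 1);
}

/* Bind path: only the chain being modified is locked, so binds to
 * ports hashing to other slots no longer contend with each other. */
static void udp_insert(struct udp_table *t, struct sock_node *sk)
{
    struct udp_hslot *slot = &t->hash[udp_hashfn(sk->port)];

    pthread_mutex_lock(&slot->lock);
    sk->next = slot->head;
    slot->head = sk;
    pthread_mutex_unlock(&slot->lock);
}

static int udp_chain_len(struct udp_table *t, unsigned int slot)
{
    struct sock_node *sk;
    int n = 0;

    for (sk = t->hash[slot].head; sk; sk = sk->next)
        n++;
    return n;
}
```

The per-slot lock also localizes the cache-line dirtying that the commit complains about: a bind only touches the line of its own slot, not one rwlock shared by all CPUs.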
    • net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final users · b189db5d
      Committed by Harvey Harrison:
      Open code NIP6_FMT in the one call inside sscanf and one user
      of NIP6() that could use %p6 in the netfilter code.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 0c6ce78a
    • HID: fix hid_device_id for cross compiling · 8175fe2d
      Committed by Andreas Schwab:
      struct hid_device_id contains hidden padding which is bad for cross
      compiling.  Make the padding explicit and consistent across
      architectures.
      Signed-off-by: Andreas Schwab <schwab@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
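The hidden-padding problem can be reproduced with a toy struct. This is not the real hid_device_id layout; the field names and widths below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With only `uint16_t bus;` directly followed by `uint32_t vendor;`,
 * the compiler inserts two hidden bytes so `vendor` is 4-byte aligned.
 * Their contents are unspecified, so a binary table built from such a
 * struct (e.g. one consumed by cross-compiled userspace tools) is not
 * reproducible across builds. Naming the pad makes the layout explicit
 * and lets code zero it deliberately. */
struct dev_id_fixed {
    uint16_t bus;
    uint16_t padding;     /* explicit, always set to 0 */
    uint32_t vendor;
    uint32_t product;
};
```

With the pad named, the layout is the same 12 bytes on every common ABI, and zeroing it is a normal member assignment rather than a memset over unknown holes.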
    • xfrm: Notify changes in UDP encapsulation via netlink · 3a2dfbe8
      Committed by Martin Willi:
      Add new_mapping() implementation to the netlink xfrm_mgr to notify
      address/port changes detected in UDP encapsulated ESP packets.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: reduce structures when XFRM=n · def8b4fa
      Committed by Alexey Dobriyan:
      ifdef out
      * struct sk_buff::sp		(pointer)
      * struct dst_entry::xfrm	(pointer)
      * struct sock::sk_policy	(2 pointers)
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
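The space saving can be illustrated with a toy struct gated the same way. CONFIG_XFRM here is just a preprocessor macro and `toy_skb` is invented for the example; the real members affected are the ones listed in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* When CONFIG_XFRM is not defined, the security-path pointer is
 * compiled out entirely, shrinking every instance of the struct. */
#ifdef CONFIG_XFRM
struct sec_path { int dummy; };
#define XFRM_SKB_MEMBERS struct sec_path *sp;
#else
#define XFRM_SKB_MEMBERS
#endif

struct toy_skb {
    void *data;
    XFRM_SKB_MEMBERS      /* present only in XFRM=y builds */
    void *dst;
};
```

Because sk_buff, dst_entry, and sock are allocated in huge numbers, removing even one pointer per object is a measurable win on XFRM=n kernels.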
  4. 28 Oct 2008: 5 commits
    • KVM: Future-proof device assignment ABI · bb45e202
      Committed by Avi Kivity:
      Reserve some space so we can add more data.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: Fix guest shared interrupt with in-kernel irqchip · 5550af4d
      Committed by Sheng Yang:
      Every call of kvm_set_irq() should offer an irq_source_id, which is
      allocated by kvm_request_irq_source_id(). Based on irq_source_id, we
      identify the irq source and implement logical OR for shared level
      interrupts.
      
      The allocated irq_source_id can be freed by kvm_free_irq_source_id().
      
      Currently, we support at most sizeof(unsigned long) different irq sources.
      
      [Amit: - rebase to kvm.git HEAD
             - move definition of KVM_USERSPACE_IRQ_SOURCE_ID to common file
             - move kvm_request_irq_source_id to the update_irq ioctl]
      
      [Xiantao: - Add kvm/ia64 stuff and make it work for kvm/ia64 guests]
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
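The source-id scheme can be sketched in plain C. This illustrates the idea only: the names differ from kvm.git, there is no locking, and the one-word-per-line state table is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>

#define NR_IRQ_LINES 16

static unsigned long irq_sources_bitmap;      /* one bit per allocated id */
static unsigned long irq_state[NR_IRQ_LINES]; /* per line: one bit per source */

/* Allocate a free irq source id, or -1 if all ids are taken. */
static int request_irq_source_id(void)
{
    int id;
    for (id = 0; id < (int)(sizeof(unsigned long) * CHAR_BIT); id++) {
        if (!(irq_sources_bitmap & (1UL << id))) {
            irq_sources_bitmap |= 1UL << id;
            return id;
        }
    }
    return -1;
}

static void free_irq_source_id(int id)
{
    irq_sources_bitmap &= ~(1UL << id);
}

/* A shared level-triggered line stays asserted while ANY source
 * asserts it: the line level is the logical OR of the source bits.
 * Returns the resulting line level. */
static int set_irq(int line, int source_id, int level)
{
    if (level)
        irq_state[line] |= 1UL << source_id;
    else
        irq_state[line] &= ~(1UL << source_id);
    return irq_state[line] != 0;
}
```

The OR is what fixes the shared-interrupt bug: one device deasserting its copy of the line no longer lowers a line that another source still holds high.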
    • net: implement emergency route cache rebuilds when gc_elasticity is exceeded · 1080d709
      Committed by Neil Horman:
      This is a patch to provide on-demand route cache rebuilding.  Currently, our
      route cache is rebuilt periodically regardless of need.  This introduced
      unneeded periodic latency.  This patch offers a better approach.  Using code
      provided by Eric Dumazet, we compute the standard deviation of the average hash
      bucket chain length while running rt_check_expire.  Should any given chain
      length grow larger than the average plus 4 standard deviations, we trigger an
      emergency hash table rebuild for that net namespace.  This allows the common
      case, in which chains are well behaved and do not grow unevenly, to incur no
      latency at all, while those systems (which may be being maliciously attacked)
      only rebuild when the attack is detected.  This patch takes 2 other factors
      into account:
      1) chains with multiple entries that differ by attributes that do not affect
      the hash value are only counted once, so as not to unduly bias the system
      toward rebuilding if features like QoS are heavily used
      2) if rebuilding crosses a certain threshold (adjustable via the sysctl added
      in this patch), route caching is disabled entirely for that net namespace,
      since constant rebuilding is less efficient than no caching at all
      
      Tested successfully by me.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
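The mean-plus-4-standard-deviations trigger can be sketched as a stand-alone function. The kernel accumulates these statistics incrementally inside rt_check_expire; `needs_rebuild` is a name invented here, and comparing squared deviations avoids a sqrt.

```c
#include <assert.h>

/* Return 1 if any chain length exceeds mean + 4 * stddev. */
static int needs_rebuild(const unsigned int *len, int n)
{
    double sum = 0.0, sum2 = 0.0, mean, var, diff;
    int i;

    for (i = 0; i < n; i++) {
        sum  += len[i];
        sum2 += (double)len[i] * len[i];
    }
    mean = sum / n;
    var  = sum2 / n - mean * mean;   /* E[x^2] - E[x]^2 */

    for (i = 0; i < n; i++) {
        diff = len[i] - mean;
        /* len > mean + 4*sd  <=>  diff > 0 && diff^2 > 16*var */
        if (diff > 0.0 && diff * diff > 16.0 * var)
            return 1;
    }
    return 0;
}
```

Uniform chains never trip the test (the deviation is zero only when every length equals the mean), while a hash-collision attack concentrating entries in one bucket does.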
    • mac80211.h: fix kernel-doc excesses · ea2d8b59
      Committed by Randy Dunlap:
      Fix mac80211.h kernel-doc: it had some extra parameters that were
      no longer valid and an incorrect format for a return value in 2 places.
      
      Warning(lin2628-rc2//include/net/mac80211.h:1487): Excess function parameter or struct member 'control' description in 'ieee80211_beacon_get'
      Warning(lin2628-rc2//include/net/mac80211.h:1596): Excess function parameter or struct member 'control' description in 'ieee80211_get_buffered_bc'
      Warning(lin2628-rc2//include/net/mac80211.h:1632): Excess function parameter or struct member 'rc4key' description in 'ieee80211_get_tkip_key'
      Warning(lin2628-rc2//include/net/mac80211.h:1735): Excess function parameter or struct member 'return' description in 'ieee80211_start_tx_ba_session'
      Warning(lin2628-rc2//include/net/mac80211.h:1775): Excess function parameter or struct member 'return' description in 'ieee80211_stop_tx_ba_session'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Acked-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • scsi: make sure that scsi_init_shared_tag_map() doesn't overwrite existing map · 3070f69b
      Committed by Jens Axboe:
      Right now callers have to check whether scsi_host->bqt is already
      set up; it's much cleaner to just have scsi_init_shared_tag_map()
      do this check on its own.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
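The intended calling convention can be sketched with a toy model: `tag_map`, `toy_host`, and the single static map below are invented stand-ins for the real blk_queue_tag and Scsi_Host types.

```c
#include <assert.h>
#include <stddef.h>

struct tag_map { int depth; };
struct toy_host { struct tag_map *bqt; };

static struct tag_map shared_map;

/* The "doesn't overwrite" rule: if the host already has a map, keep
 * it, so callers no longer need their own `if (shost->bqt)` guard. */
static int init_shared_tag_map(struct toy_host *h, int depth)
{
    if (h->bqt)
        return 0;          /* already set up: leave the existing map */
    shared_map.depth = depth;
    h->bqt = &shared_map;
    return 0;
}
```

Moving the idempotence check into the init function means every caller can call it unconditionally, and a later call can never clobber a map that earlier devices already share.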
  5. 27 Oct 2008: 4 commits
  6. 25 Oct 2008: 1 commit
  7. 24 Oct 2008: 4 commits