1. 13 Sep, 2008 1 commit
    • B
      net: fix scheduling of dst_gc_task by __dst_free · f262b59b
      Committed by Benjamin Thery
      The dst garbage collector dst_gc_task() may not be scheduled as we
      expect it to be in __dst_free().
      
      Indeed, when the dst_gc_timer was replaced by the delayed_work
      dst_gc_work, the mod_timer() call used to schedule the garbage
      collector at an earlier date was replaced by a schedule_delayed_work()
      (see commit 86bba269).
      
      But, the behaviour of mod_timer() and schedule_delayed_work() is
      different in the way they handle the delay. 
      
      mod_timer() stops the timer and re-arms it with the new given delay,
      whereas schedule_delayed_work() only checks whether the work is already
      queued in the workqueue (and queues it, with the delay, if it is not).
      It does NOT take the new delay into account, even if the new delay
      is earlier in time.
      schedule_delayed_work() returns 0 if it didn't queue the work,
      but we don't check the return code in __dst_free().
      
      If I understand the code in __dst_free() correctly, we want dst_gc_task
      to be queued after DST_GC_INC jiffies if we pass the test (and not in
      some undetermined time in the future), so I think we should add a call
      to cancel_delayed_work() before schedule_delayed_work(). Patch below.
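
The difference described above, and the proposed cancel-then-schedule fix, can be sketched with a small userspace model. This is only an illustration: `struct toy_job` and every `toy_*` function are hypothetical stand-ins, not kernel code, and they model only the "is it already pending?" behaviour the message talks about.

```c
#include <stdbool.h>

/* Toy model of a one-shot deferred job; NOT the kernel's data structures. */
struct toy_job {
    bool pending;            /* is the job currently queued? */
    unsigned long expires;   /* absolute expiry time, in ticks */
};

/* Models mod_timer(): always re-arms with the new expiry. */
static void toy_mod_timer(struct toy_job *j, unsigned long expires)
{
    j->pending = true;
    j->expires = expires;
}

/*
 * Models schedule_delayed_work(): queues the job only if it is not
 * already pending. Returns 0 (and ignores the new delay) otherwise.
 */
static int toy_schedule_delayed_work(struct toy_job *j,
                                     unsigned long now, unsigned long delay)
{
    if (j->pending)
        return 0;
    j->pending = true;
    j->expires = now + delay;
    return 1;
}

/* Models cancel_delayed_work(): returns 1 if a pending job was removed. */
static int toy_cancel_delayed_work(struct toy_job *j)
{
    int was_pending = j->pending ? 1 : 0;

    j->pending = false;
    return was_pending;
}

/* The approach proposed in the message: cancel first, then schedule. */
static void toy_reschedule(struct toy_job *j,
                           unsigned long now, unsigned long delay)
{
    toy_cancel_delayed_work(j);
    toy_schedule_delayed_work(j, now, delay);
}
```

In this model, a second `toy_schedule_delayed_work()` call with a shorter delay leaves `expires` unchanged, while `toy_reschedule()` moves it to the earlier time, which is the behaviour `mod_timer()` used to provide.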
      
      Or we should at least test the return code of schedule_delayed_work(),
      and reset the values of dst_garbage.timer_inc and dst_garbage.timer_expires
      back to their former values if schedule_delayed_work() failed.
      Otherwise the subsequent calls to __dst_free() will test the wrong values
      and assume the wrong thing about when the garbage collector is supposed to
      be scheduled.
      
      dst_gc_task() also calls schedule_delayed_work() without checking
      its return code (or calling cancel_delayed_work() first), but it
      should be fine there: dst_gc_task is the routine of the delayed_work, so
      no dst_gc_work should be pending in the queue when it's running.
      Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Mar, 2008 1 commit
  3. 26 Mar, 2008 1 commit
  4. 29 Feb, 2008 1 commit
  5. 29 Jan, 2008 4 commits
  6. 07 Nov, 2007 1 commit
  7. 11 Oct, 2007 4 commits
    • E
      [NET]: Make the loopback device per network namespace. · 2774c7ab
      Committed by Eric W. Biederman
      This patch makes loopback_dev per network namespace.  It adds
      code to create a different loopback device for each network
      namespace and code to free a loopback device when a network
      namespace exits.
      
      This patch modifies all users of loopback_dev so they
      access it as init_net.loopback_dev, keeping all of the
      code compiling and working.  A later pass will be needed to
      update the users to use something other than the initial network
      namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      [NET]: Dynamically allocate the loopback device, part 1. · de3cb747
      Committed by Daniel Lezcano
      This patch replaces all occurrences of the static variable
      loopback_dev with a pointer loopback_dev. That provides the
      mindless, trivial, uninteresting part of the change for the
      dynamic allocation of the loopback device.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: Kirill Korotaev <dev@sw.ru>
      Acked-by: Benjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue · 86bba269
      Committed by Eric Dumazet
      When the periodic IP route cache flush is done (every 600 seconds in
      the default configuration), some hosts suffer a lot and eventually
      trigger the "soft lockup" message.
      
      dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
      eventually freeing some (less than 1%) of them, while holding the
      dst_lock spinlock for the whole scan.
      
      Then it rearms a timer to redo the full thing 1/10 s later.
      The slowdown can last one minute or so, depending on how active
      the TCP sessions are.
      
      This second version of the patch converts the processing from a softirq
      based one to a workqueue.
      
      Even if the list of entries in garbage_list is huge, the host is still
      responsive to softirqs and can make progress.
      
      Instead of resetting gc timer to 0.1 second if one entry was freed in a
      gc run, we do this if more than 10% of entries were freed.
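
The change in the re-arm rule can be sketched as two predicates. This is a hedged illustration only: the function names are hypothetical, and measuring the 10% against the number of entries scanned in one run is an assumption, since the message does not say which count is used.

```c
#include <stdbool.h>

/* Old rule as described: reset the gc interval if any entry was freed. */
static bool toy_old_policy_resets(unsigned long freed)
{
    return freed > 0;
}

/*
 * New rule as described: reset the gc interval only if more than 10%
 * of the entries were freed. "scanned" is an assumed denominator.
 * The comparison is written with a multiplication to avoid the
 * truncation an integer division would introduce.
 */
static bool toy_new_policy_resets(unsigned long freed, unsigned long scanned)
{
    return freed * 10 > scanned;
}
```

Under the old rule a run that frees a single entry out of tens of thousands still resets the interval to its shortest value; under the new rule only a run that makes substantial progress does.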
      
      Before patch :
      
      Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
      Aug 16 06:21:37 SRV1 kernel:
      Aug 16 06:21:37 SRV1 kernel: Call Trace:
      Aug 16 06:21:37 SRV1 kernel:  <IRQ>  [<ffffffff802286f0>] wake_up_process+0x10/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80251e09>] softlockup_tick+0xe9/0x110
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd380>] dst_run_gc+0x0/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802376f3>] run_local_timers+0x13/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802379c7>] update_process_times+0x57/0x90
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80216034>] smp_local_timer_interrupt+0x34/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802165cc>] smp_apic_timer_interrupt+0x5c/0x80
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8020a816>] apic_timer_interrupt+0x66/0x70
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd3d3>] dst_run_gc+0x53/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd3c6>] dst_run_gc+0x46/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80237148>] run_timer_softirq+0x148/0x1c0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8023340c>] __do_softirq+0x6c/0xe0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8020ad6c>] call_softirq+0x1c/0x30
      Aug 16 06:21:37 SRV1 kernel:  <EOI>  [<ffffffff8020cb34>] do_softirq+0x34/0x90
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802331cf>] local_bh_enable_ip+0x3f/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422913>] _spin_unlock_bh+0x13/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803dfde8>] rt_garbage_collect+0x1d8/0x320
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd4dd>] dst_alloc+0x1d/0xa0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803e1433>] __ip_route_output_key+0x573/0x800
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c02e2>] sock_common_recvmsg+0x32/0x50
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803e16dc>] ip_route_output_flow+0x1c/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80400160>] tcp_v4_connect+0x150/0x610
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803ebf07>] inet_bind_bucket_create+0x17/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8040cd16>] inet_stream_connect+0xa6/0x2c0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c0bdf>] lock_sock_nested+0xcf/0xe0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803be551>] sys_connect+0x71/0xa0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803eee3f>] tcp_setsockopt+0x1f/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c030f>] sock_common_setsockopt+0xf/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803be4bd>] sys_setsockopt+0x9d/0xc0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8028881e>] sys_ioctl+0x5e/0x80
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80209c4e>] system_call+0x7e/0x83
      
      After patch : (RT_CACHE_DEBUG set to 2 to get following traces)
      
      dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
      dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
      dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
      dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
      dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
      dst_total: 154522 delayed: 58316 work_perf: 6395 expires: 25 elapsed: 11402 us
      dst_total: 154957 delayed: 58252 work_perf: 64 expires: 150 elapsed: 6148 us
      dst_total: 157377 delayed: 57843 work_perf: 409 expires: 400 elapsed: 6350 us
      dst_total: 163745 delayed: 56679 work_perf: 1164 expires: 775 elapsed: 7051 us
      dst_total: 176577 delayed: 53965 work_perf: 2714 expires: 1389 elapsed: 8120 us
      dst_total: 198993 delayed: 49627 work_perf: 4338 expires: 1997 elapsed: 8909 us
      dst_total: 226638 delayed: 46865 work_perf: 2762 expires: 2748 elapsed: 7351 us
      
      Thanks to this patch, I successfully reduced the IP route cache of many
      hosts by a factor of four. Previously, I had to disable
      "ip route flush cache" to avoid crashes.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      [NET]: Make device event notification network namespace safe · e9dc8653
      Committed by Eric W. Biederman
      Every user of the network device notifiers is either a protocol
      stack or a pseudo device.  If a protocol stack that does not have
      support for multiple network namespaces receives an event for a
      device that is not in the initial network namespace it quite possibly
      can get confused and do the wrong thing.
      
      To avoid problems until all of the protocol stacks are converted
      this patch modifies all netdev event handlers to ignore events on
      devices that are not in the initial network namespace.
      
      As the rest of the code is made network namespace aware these
      checks can be removed.
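
The guard described above can be sketched in a userspace model. All types and names here (`toy_net`, `toy_net_device`, `toy_init_net`, `toy_netdev_event`, the `TOY_NOTIFY_*` values) are hypothetical stand-ins, not the kernel's definitions; the sketch only shows the shape of an early return for devices outside the initial namespace.

```c
/* Toy stand-ins for a network namespace and a network device. */
struct toy_net {
    int id;
};

struct toy_net_device {
    const char *name;
    struct toy_net *net;   /* namespace the device belongs to */
};

/* Stand-in for the initial network namespace. */
static struct toy_net toy_init_net = { 0 };

enum {
    TOY_NOTIFY_DONE = 0,   /* event ignored */
    TOY_NOTIFY_OK   = 1    /* event handled */
};

/*
 * Event handler for a protocol that is not namespace aware yet:
 * events for devices outside the initial namespace are ignored.
 */
static int toy_netdev_event(const struct toy_net_device *dev,
                            unsigned long event)
{
    (void)event;   /* the event type does not matter for the guard */

    if (dev->net != &toy_init_net)
        return TOY_NOTIFY_DONE;

    /* ... protocol-specific handling would go here ... */
    return TOY_NOTIFY_OK;
}
```

Once a handler has been made namespace aware, the early return is the only part that needs to be removed.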
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 08 Jun, 2007 1 commit
  9. 15 Feb, 2007 1 commit
    • T
      [PATCH] remove many unneeded #includes of sched.h · cd354f1a
      Committed by Tim Schmielau
      After Al Viro (finally) succeeded in removing the sched.h #include in module.h
      recently, it makes sense again to remove other superfluous sched.h includes.
      There are quite a lot of files which include it but don't actually need
      anything defined in there.  Presumably these includes were once needed for
      macros that used to live in sched.h, but moved to other header files in the
      course of cleaning it up.
      
      To ease the pain, this time I did not fiddle with any header files and only
      removed #includes from .c-files, which tend to cause less trouble.
      
      Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
      arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
      allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
      configs in arch/arm/configs on arm.  I also checked that no new warnings were
      introduced by the patch (actually, some warnings are removed that were emitted
      by unnecessarily included header files).
      Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 12 Feb, 2007 1 commit
  11. 11 Feb, 2007 1 commit
  12. 09 Feb, 2007 1 commit
    • A
      [NET]: user of the jiffies rounding code: Networking · f5a6e01c
      Committed by Arjan van de Ven
      This patch introduces users of the round_jiffies() function in the
      networking code.
      
      These timers all were of the "about once a second" or "about once
      every X seconds" variety and several showed up in the "what wakes the
      cpu up" profiles that the tickless patches provide.  Some timers are
      highly dynamic based on network load; but even on low activity systems
      they still show up so the rounding is done only in cases of low
      activity, allowing higher frequency timers in the high activity case.
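
The idea behind rounding can be sketched as follows. This is a simplified model, not the kernel's round_jiffies(): `TOY_HZ`, `toy_round_to_second()` and the quarter-second threshold are assumptions chosen for illustration, and details of the real function are deliberately left out.

```c
/* Assumed tick rate for the model: 1000 ticks per second. */
#define TOY_HZ 1000UL

/*
 * Round an absolute expiry time (in ticks) to a whole-second boundary,
 * so that timers that only need "about once a second" precision fire
 * together and wake the CPU once instead of several times.
 */
static unsigned long toy_round_to_second(unsigned long j)
{
    unsigned long rem = j % TOY_HZ;

    /* Just past a boundary: round down. Otherwise: round up. */
    if (rem < TOY_HZ / 4)
        return j - rem;
    return j - rem + TOY_HZ;
}
```

In this model two timers due at ticks 2300 and 2900 both end up at tick 3000, so a single wakeup serves both.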
      
      The various hardware watchdogs are an obvious case; they run every 2
      seconds but are otherwise not particular about exactly when they need
      to run.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 08 Dec, 2006 1 commit
  14. 09 Aug, 2006 1 commit
    • D
      [NET]: add_timer -> mod_timer() in dst_run_gc() · 7c91767a
      Committed by Dmitry Mishin
      Patch from Dmitry Mishin <dim@openvz.org>:
      
      Replace add_timer() by mod_timer() in dst_run_gc
      in order to avoid BUG message.
      
             CPU1                            CPU2
      dst_run_gc()  entered           dst_run_gc() entered
      spin_lock(&dst_lock)                   .....
      del_timer(&dst_gc_timer)         fail to get lock
             ....                         mod_timer() <--- puts 
                                                       timer back
                                                       to the list
      add_timer(&dst_gc_timer) <--- BUG because timer is in list already.
      
      Found during OpenVZ internal testing.
      
      At first we thought that it was OpenVZ specific, as we
      added a dst_run_gc(0) call in dst_dev_event(),
      but as Alexey pointed out to me, it is possible to trigger
      this condition in the mainstream kernel.
      
      For example, the timer has fired on CPU2, but the handler was preempted
      by an irq before dst_lock is tried.
      Meanwhile, someone on CPU1 adds an entry to gc list and
      starts the timer.
      If CPU2 was preempted long enough, this timer can expire
      simultaneously with resuming timer handler on CPU1, arriving
      exactly to the situation described.
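
The interleaving described above can be replayed on a small single-threaded model. The `sim_*` names are hypothetical and this is not kernel code: the model only tracks whether the timer is pending, and reports with -1 the case where the real add_timer() would hit the BUG.

```c
#include <stdbool.h>

/* Toy timer: only tracks whether it is queued and when it expires. */
struct sim_timer {
    bool pending;
    unsigned long expires;
};

/* Models del_timer(): returns 1 if the timer was pending. */
static int sim_del_timer(struct sim_timer *t)
{
    int was_pending = t->pending ? 1 : 0;

    t->pending = false;
    return was_pending;
}

/*
 * Models add_timer(): only valid on a timer that is not pending.
 * Returns -1 where the kernel would report a BUG.
 */
static int sim_add_timer(struct sim_timer *t, unsigned long expires)
{
    if (t->pending)
        return -1;
    t->pending = true;
    t->expires = expires;
    return 0;
}

/*
 * Models mod_timer(): safe whether or not the timer is pending.
 * Returns 1 if it modified a pending timer, 0 if it started one.
 */
static int sim_mod_timer(struct sim_timer *t, unsigned long expires)
{
    int was_pending = t->pending ? 1 : 0;

    t->pending = true;
    t->expires = expires;
    return was_pending;
}
```

Replaying the sequence from the diagram: after `sim_del_timer()` on CPU1 and `sim_mod_timer()` on CPU2, a `sim_add_timer()` on CPU1 returns -1, whereas `sim_mod_timer()` in the same place simply updates the expiry.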
      Signed-off-by: Dmitry Mishin <dim@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 10 Sep, 2005 1 commit
  16. 31 Jul, 2005 1 commit
  17. 17 Apr, 2005 2 commits