1. 04 10月, 2010 1 次提交
    • N
      net: Fix the condition passed to sk_wait_event() · 482964e5
      Nagendra Tomar 提交于
      This patch fixes the condition (3rd arg) passed to sk_wait_event() in
      sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
      causes the following soft lockup in tcp_sendmsg() when the global tcp
      memory pool has exhausted.
      
      >>> snip <<<
      
      localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
      localhost kernel: CPU 3:
      localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200]  [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
      localhost kernel:
      localhost kernel: Call Trace:
      localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
      localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
      localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
      localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
      localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
      localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
      localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
      localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
      localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
      localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83
      
      >>> snip <<<
      
      What is happening is, that the sk_wait_event() condition passed from
      sk_stream_wait_memory() evaluates to true for the case of tcp global memory
      exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
      which causes sk_wait_event() to *not* call schedule_timeout().
      Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
      This causes the caller to again try allocation, which again fails and again
      calls sk_stream_wait_memory(), and so on.
      
      [ Bug introduced by commit c1cbe4b7
        ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]
      Signed-off-by: NNagendra Singh Tomar <tomer_iisc@yahoo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      482964e5
  2. 13 7月, 2010 1 次提交
  3. 02 5月, 2010 1 次提交
    • E
      net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet 提交于
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
      socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to eventually wakeup tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They dont need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43815482
  4. 21 4月, 2010 1 次提交
  5. 18 5月, 2009 1 次提交
  6. 14 10月, 2008 1 次提交
  7. 26 7月, 2008 1 次提交
  8. 29 1月, 2008 4 次提交
    • H
      [NET] CORE: Introducing new memory accounting interface. · 3ab224be
      Hideo Aoki 提交于
      This patch introduces new memory accounting functions for each network
      protocol. Most of them are renamed from memory accounting functions
      for stream protocols. At the same time, some stream memory accounting
      functions are removed since other functions do same thing.
      
      Renaming:
      	sk_stream_free_skb()		->	sk_wmem_free_skb()
      	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
      	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
      	sk_stream_mem_schedule 		->    	__sk_mem_schedule()
      	sk_stream_pages()      		->	sk_mem_pages()
      	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
      	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
      	sk_charge_skb()			->	sk_mem_charge()
      
      Removeing
      	sk_stream_rfree():	consolidates into sock_rfree()
      	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
      	sk_stream_mem_schedule()
      
      The following functions are added.
          	sk_has_account(): check if the protocol supports accounting
      	sk_mem_uncharge(): do the opposite of sk_mem_charge()
      
      In addition, to achieve consolidation, updating sk_wmem_queued is
      removed from sk_mem_charge().
      
      Next, to consolidate memory accounting functions, this patch adds
      memory accounting calls to network core functions. Moreover, present
      memory accounting call is renamed to new accounting call.
      
      Finally we replace present memory accounting calls with new interface
      in TCP and SCTP.
      Signed-off-by: NTakahiro Yasui <tyasui@redhat.com>
      Signed-off-by: NHideo Aoki <haoki@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ab224be
    • E
      [SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim() · 21371f76
      Eric Dumazet 提交于
      sk_forward_alloc being signed, we should take care of divides by
      SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and
      __sk_stream_mem_reclaim()
      
      This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
      as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
      shifts instead of plain divides.
      
      This should help compiler to choose right shifts instead of
      expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21371f76
    • P
      [NET]: Name magic constants in sock_wake_async() · 8d8ad9d7
      Pavel Emelyanov 提交于
      The sock_wake_async() performs a bit different actions
      depending on "how" argument. Unfortunately this argument
      ony has numerical magic values.
      
      I propose to give names to their constants to help people
      reading this function callers understand what's going on
      without looking into this function all the time.
      
      I suppose this is 2.6.25 material, but if it's not (or the
      naming seems poor/bad/awful), I can rework it against the
      current net-2.6 tree.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d8ad9d7
    • P
      [NET]: Compact sk_stream_mem_schedule() code · 9859a790
      Pavel Emelyanov 提交于
      This function references sk->sk_prot->xxx for many times.
      It turned out, that there's so many code in it, that gcc
      cannot always optimize access to sk->sk_prot's fields.
      
      After saving the sk->sk_prot on the stack and comparing
      disassembled code, it turned out that the function became
      ~10 bytes shorter and made less dereferences (on i386 and
      x86_64). Stack consumption didn't grow.
      
      Besides, this patch drives most of this function into the
      80 columns limit.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9859a790
  9. 11 2月, 2007 1 次提交
  10. 13 7月, 2006 1 次提交
  11. 20 4月, 2006 1 次提交
    • D
      [NET]: Add skb->truesize assertion checking. · dc6de336
      David S. Miller 提交于
      Add some sanity checking.  truesize should be at least sizeof(struct
      sk_buff) plus the current packet length.  If not, then truesize is
      seriously mangled and deserves a kernel log message.
      
      Currently we'll do the check for release of stream socket buffers.
      
      But we can add checks to more spots over time.
      
      Incorporating ideas from Herbert Xu.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc6de336
  12. 04 1月, 2006 1 次提交
  13. 06 11月, 2005 1 次提交
    • H
      [NET]: Fix race condition in sk_stream_wait_connect · 6151b31c
      Herbert Xu 提交于
      When sk_stream_wait_connect detects a state transition to ESTABLISHED
      or CLOSE_WAIT prior to it going to sleep, it will return without
      calling finish_wait and decrementing sk_write_pending.
      
      This may result in crashes and other unintended behaviour.
      
      The fix is to always call finish_wait and update sk_write_pending since
      it is safe to do so even if the wait entry is no longer on the queue.
      
      This bug was tracked down with the help of Alex Sidorenko and the
      fix is also based on his suggestion.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
      6151b31c
  14. 01 5月, 2005 1 次提交
    • P
      [PATCH] DocBook: changes and extensions to the kernel documentation · 4dc3b16b
      Pavel Pisa 提交于
      I have recompiled Linux kernel 2.6.11.5 documentation for me and our
      university students again.  The documentation could be extended for more
      sources which are equipped by structured comments for recent 2.6 kernels.  I
      have tried to proceed with that task.  I have done that more times from 2.6.0
      time and it gets boring to do same changes again and again.  Linux kernel
      compiles after changes for i386 and ARM targets.  I have added references to
      some more files into kernel-api book, I have added some section names as well.
       So please, check that changes do not break something and that categories are
      not too much skewed.
      
      I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
      by kernel convention.  Most of the other changes are modifications in the
      comments to make kernel-doc happy, accept some parameters description and do
      not bail out on errors.  Changed <pid> to @pid in the description, moved some
      #ifdef before comments to correct function to comments bindings, etc.
      
      You can see result of the modified documentation build at
        http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz
      
      Some more sources are ready to be included into kernel-doc generated
      documentation.  Sources has been added into kernel-api for now.  Some more
      section names added and probably some more chaos introduced as result of quick
      cleanup work.
      Signed-off-by: NPavel Pisa <pisa@cmp.felk.cvut.cz>
      Signed-off-by: NMartin Waitz <tali@admingilde.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4dc3b16b
  15. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4