1. 22 Mar, 2006 1 commit
  2. 19 Mar, 2006 1 commit
  3. 17 Mar, 2006 1 commit
  4. 14 Mar, 2006 1 commit
    • [PATCH] Fix sigaltstack corruption among cloned threads · f9a3879a
      GOTO Masanori committed
      This patch fixes alternate signal stack corruption among cloned threads
      with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.
      
      The alternate signal stack setting is currently inherited across a call to
      clone(...  CLONE_SIGHAND | CLONE_VM).  If a parent thread sets sigaltstack
      and several cloned child threads (plus the parent) then run signal
      handlers at the same time, they can collide, because they all use the
      same alternate signal stack region, and they end up with SIGSEGV.  It is
      an undesirable race condition.  Note that child threads created by NPTL
      pthread_create() also hit this conflict when the parent thread uses
      sigaltstack, without my patch.

      To fix this problem, this patch clears the child threads' sigaltstack
      information, as exec() does.  This behavior follows the SUSv3 specification.
      In SUSv3, pthread_create() says "The alternate stack shall not be inherited
      (when new threads are initialized)".  It means that sigaltstack should be
      cleared when sigaltstack memory space is shared by cloned threads with
      CLONE_SIGHAND.

      Note that I chose the "if (clone_flags & CLONE_SIGHAND)" test because:
        - Without a clone_flags test, fork() would not inherit sigaltstack.
        - CLONE_VM is another choice, but then vfork() would not inherit sigaltstack.
        - CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
        - CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
          but this flag has slightly different semantics.
      I decided to use CLONE_SIGHAND.
      
      [ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
      Signed-off-by: GOTO Masanori <gotom@sanori.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: Linus Torvalds <torvalds@osdl.org>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
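The bracketed note from Linus in the sigaltstack commit says the merged test is "CLONE_VM && !CLONE_VFORK" rather than CLONE_SIGHAND. A minimal user-space sketch of such a predicate follows. The `DEMO_` constants copy the flag values from the Linux headers, and `clears_altstack()` is a hypothetical helper made up for this illustration, not a kernel function.

```c
#include <stdbool.h>

/* Illustrative copies of the clone flag bits (values as in <linux/sched.h>). */
#define DEMO_CLONE_VM      0x00000100UL
#define DEMO_CLONE_SIGHAND 0x00000800UL
#define DEMO_CLONE_VFORK   0x00004000UL
#define DEMO_CLONE_THREAD  0x00010000UL

/*
 * Hypothetical helper: true when the child shares the parent's address
 * space (CLONE_VM set) but is not a vfork() child (CLONE_VFORK clear).
 * That is the case in which the child's alternate signal stack setting
 * would be cleared instead of inherited.
 */
static bool clears_altstack(unsigned long clone_flags)
{
    return (clone_flags & (DEMO_CLONE_VM | DEMO_CLONE_VFORK)) == DEMO_CLONE_VM;
}
```

Under this predicate a plain fork() (no sharing flags) and a vfork() (CLONE_VM | CLONE_VFORK) keep the parent's setting, while a thread-style clone with CLONE_VM drops it.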
  5. 12 Mar, 2006 1 commit
  6. 16 Feb, 2006 2 commits
    • [PATCH] fix kill_proc_info() vs fork() theoretical race · dadac81b
      Oleg Nesterov committed
      copy_process:
      
      	attach_pid(p, PIDTYPE_PID, p->pid);
      	attach_pid(p, PIDTYPE_TGID, p->tgid);
      
      What if kill_proc_info(p->pid) happens in between?
      
      copy_process() holds current->sighand.siglock, so we are safe
      in CLONE_THREAD case, because current->sighand == p->sighand.
      
      Otherwise, p->sighand is unlocked, and the new process is already
      visible to find_task_by_pid(), but has a copy of the parent's
      'struct pid' in ->pids[PIDTYPE_TGID].
      
      This means that __group_complete_signal() may hang while doing
      
      	do ... while (next_thread() != p)
      
      We can solve this problem if we reverse these 2 attach_pid()s:
      
      	attach_pid() does wmb()
      
      	group_send_sig_info() calls spin_lock(), which
      	provides a read barrier. // Yes ?
      
      I don't think we can hit this race in practice, but still.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix kill_proc_info() vs CLONE_THREAD race · 3f17da69
      Oleg Nesterov committed
      There is a window after copy_process() unlocks ->sighand.siglock
      and before it adds the new thread to the thread list.
      
      In that window __group_complete_signal(SIGKILL) will not see the
      new thread yet, so this thread will start running while the whole
      thread group was supposed to exit.
      
      I believe we have another good reason to place attach_pid(PID/TGID)
      under ->sighand.siglock. We can do the same for
      
      	release_task()->__unhash_process()
      
      	de_thread()->switch_exec_pids()
      
      After that we don't need tasklist_lock to iterate over the thread
      list, and we can simplify things, see for example do_sigaction()
      or sys_times().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 08 Feb, 2006 5 commits
  8. 02 Feb, 2006 1 commit
  9. 12 Jan, 2006 2 commits
  10. 11 Jan, 2006 1 commit
  11. 10 Jan, 2006 1 commit
  12. 09 Jan, 2006 4 commits
  13. 29 Nov, 2005 2 commits
  14. 23 Nov, 2005 1 commit
  15. 14 Nov, 2005 3 commits
  16. 08 Nov, 2005 1 commit
    • [SPARC64] mm: context switch ptlock · dedeb002
      Hugh Dickins committed
      sparc64 is unique among architectures in taking the page_table_lock in
      its context switch (well, cris does too, but erroneously, and it's not
      yet SMP anyway).
      
      This seems to be a private affair between switch_mm and activate_mm,
      using page_table_lock as a per-mm lock, without any relation to its uses
      elsewhere.  That's fine, but comment it as such; and unlock sooner in
      switch_mm, more like in activate_mm (preemption is disabled here).
      
      There is a block of "if (0)"ed code in smp_flush_tlb_pending which would
      have liked to rely on the page_table_lock, in switch_mm and elsewhere;
      but its comment explains how dup_mmap's flush_tlb_mm defeated it.  And
      though that could have been changed at any time over the past few years,
      now the chance vanishes as we push the page_table_lock downwards, and
      perhaps split it per page table page.  Just delete that block of code.
      
      Which leaves the mysterious spin_unlock_wait(&oldmm->page_table_lock)
      in kernel/fork.c copy_mm.  Textual analysis (supported by Nick Piggin)
      suggests that the comment was written by DaveM, and that it relates to
      the defeated approach in the sparc64 smp_flush_tlb_pending.  Just delete
      this block too.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 07 Nov, 2005 1 commit
  18. 30 Oct, 2005 6 commits
    • [PATCH] mm: ptd_alloc take ptlock · c74df32c
      Hugh Dickins committed
      Second step in pushing down the page_table_lock.  Remove the temporary
      bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
      to hold page_table_lock, whether it's on init_mm or a user mm; take
      page_table_lock internally to check if a racing task already allocated.
      
      Convert their callers from common code.  But avoid coming back to change them
      again later: instead of moving the spin_lock(&mm->page_table_lock) down,
      switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
      encapsulate the mapping+locking and unlocking+unmapping together, and in the
      end may use alternatives to the mm page_table_lock itself.
      
      These callers all hold mmap_sem (some exclusively, some not), so at no level
      can a page table be whipped away from beneath them; and pte_alloc uses the
      "atomic" pmd_present to test whether it needs to allocate.  It appears that on
      all arches we can safely descend without page_table_lock.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
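The ptd_alloc commit describes allocators that no longer expect the caller to hold page_table_lock and instead take it internally, only to check whether a racing task already allocated. A rough user-space sketch of that "allocate first, lock, re-check, discard the spare" pattern follows. Everything named `demo_*` is hypothetical: the spinlock built on `atomic_flag` merely stands in for page_table_lock, and the single `table` slot stands in for a page table entry at some level.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mm: one lock and one table slot. */
struct demo_mm {
    atomic_flag lock;   /* plays the role of page_table_lock */
    void *table;        /* NULL until a table has been installed */
};

static void demo_lock(struct demo_mm *mm)
{
    while (atomic_flag_test_and_set_explicit(&mm->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_mm *mm)
{
    atomic_flag_clear_explicit(&mm->lock, memory_order_release);
}

/*
 * Allocate without the lock held, then take the lock only to check
 * whether a racing task installed a table first.  Returns 0 when a
 * table is in place afterwards (ours or the winner's), -1 if the
 * allocation failed.
 */
static int demo_table_alloc(struct demo_mm *mm, size_t size)
{
    void *fresh = calloc(1, size);
    if (!fresh)
        return -1;

    demo_lock(mm);
    if (mm->table == NULL) {
        mm->table = fresh;  /* we won: install our table */
        fresh = NULL;
    }
    demo_unlock(mm);

    free(fresh);            /* lost the race: drop the spare (free(NULL) is a no-op) */
    return 0;
}
```

Calling `demo_table_alloc()` twice on the same `demo_mm` leaves the first table installed; the second call allocates, finds the slot taken, and frees its spare.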
    • [PATCH] mm: dup_mmap down new mmap_sem · 7ee78232
      Hugh Dickins committed
      One anomaly remains from when Andrea rationalized the responsibilities of
      mmap_sem and page_table_lock: in dup_mmap we add vmas to the child holding its
      page_table_lock, but not the mmap_sem which normally guards the vma list and
      rbtree.  Which could be an issue for unuse_mm: though since it just walks down
      the list (today with page_table_lock, tomorrow not), it's probably okay.  Will
      need a memory barrier?  Oh, keep it simple, Nick and I agreed, no harm in
      taking child's mmap_sem here.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: dup_mmap use oldmm more · fd3e42fc
      Hugh Dickins committed
      Use the parent's oldmm throughout dup_mmap, instead of perversely going back
      to current->mm.  (Can you hear the sigh of relief from those mpnts?  Usually I
      squash them, but not today.)
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: rss = file_rss + anon_rss · 4294621f
      Hugh Dickins committed
      I was lazy when we added anon_rss, and chose to change as few places as
      possible.  So currently each anonymous page has to be counted twice, in rss
      and in anon_rss.  Which won't be so good if those are atomic counts in some
      configurations.
      
      Change that around: keep file_rss and anon_rss separately, and add them
      together (with get_mm_rss macro) when the total is needed - reading two
      atomics is much cheaper than updating two atomics.  And update anon_rss
      upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
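The rss commit argues that each page should update exactly one counter and that the total should be computed on demand, since reading two atomics is cheaper than updating two. A small user-space sketch of that accounting scheme, using C11 atomics, is shown here. The `demo_*` names are hypothetical; in the kernel the counters are only atomic in some configurations and the total is obtained through the get_mm_rss macro mentioned in the commit message.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the two per-mm counters. */
struct demo_mm_counters {
    atomic_long file_rss;   /* file-backed pages */
    atomic_long anon_rss;   /* anonymous pages */
};

/* Each page is counted once, in the counter that matches its kind. */
static void demo_add_file_page(struct demo_mm_counters *c)
{
    atomic_fetch_add(&c->file_rss, 1);
}

static void demo_add_anon_page(struct demo_mm_counters *c)
{
    atomic_fetch_add(&c->anon_rss, 1);
}

/* The total is derived only when needed: two reads, no extra update. */
static long demo_get_rss(struct demo_mm_counters *c)
{
    return atomic_load(&c->file_rss) + atomic_load(&c->anon_rss);
}
```

The two loads are not taken as one atomic snapshot, so the sum can be momentarily stale under concurrent updates; the sketch assumes that is acceptable for a statistics counter.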
    • [PATCH] mm: mm_init set_mm_counters · 404351e6
      Hugh Dickins committed
      How is anon_rss initialized?  In dup_mmap, and by mm_alloc's memset; but
      that's not so good if an mm_counter_t is a special type.  And how is rss
      initialized?  By set_mm_counter, all over the place.  Come on, we just need to
      initialize them both at once by set_mm_counter in mm_init (which follows the
      memcpy when forking).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: vm_stat_account unshackled · ab50b8ed
      Hugh Dickins committed
      The original vm_stat_account has fallen into disuse, with only one user, and
      only one user of vm_stat_unaccount.  It's easier to keep track if we convert
      them all to __vm_stat_account, then free it from its __shackles.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 20 Oct, 2005 1 commit
  20. 18 Sep, 2005 1 commit
  21. 10 Sep, 2005 3 commits
    • [PATCH] files: files struct with RCU · ab2af1f5
      Dipankar Sarma committed
      Patch to eliminate struct files_struct.file_lock spinlock on the reader side
      and use rcu refcounting rcuref_xxx api for the f_count refcounter.  The
      updates to the fdtable are done by allocating a new fdtable structure and
      setting files->fdt to point to the new structure.  The fdtable structure is
      protected by RCU thereby allowing lock-free lookup.  For fd arrays/sets that
      are vmalloced, we use keventd to free them since RCU callbacks can't sleep.  A
      global list of fdtable to be freed is not scalable, so we use a per-cpu list.
      If keventd is already handling the current cpu's work, we use a timer to defer
      queueing of that work.
      
      Since the last publication, this patch has been re-written to avoid using
      explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
      primitives instead.  This required that the fd information is kept in a
      separate structure (fdtable) and updated atomically.
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
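The RCU commit describes updates that allocate a new fdtable and then point files->fdt at it, so that readers can look up entries without a lock. A user-space analogue of that publish step is sketched below with C11 atomics: a release store plays the part of rcu_assign_pointer() and an acquire load plays the part of rcu_dereference(). All `demo_*` names are hypothetical, a single updater is assumed, and freeing the retired table is left to the caller, whereas the kernel defers it until readers are done.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct demo_fdtable {
    size_t max_fds;
    void **fd;                          /* array of max_fds slots */
};

struct demo_files {
    _Atomic(struct demo_fdtable *) fdt; /* current table, replaced wholesale */
};

/* Reader side: load the current table pointer with acquire ordering. */
static struct demo_fdtable *demo_files_fdtable(struct demo_files *files)
{
    return atomic_load_explicit(&files->fdt, memory_order_acquire);
}

static void demo_free_fdtable(struct demo_fdtable *fdt)
{
    if (fdt) {
        free(fdt->fd);
        free(fdt);
    }
}

/*
 * Updater side: build a larger table, copy the old entries into it,
 * publish it with a release store, and hand the old table back through
 * *retired so the caller can free it once no reader can still hold it.
 * Returns 0 on success, -1 on allocation failure or if nr would shrink
 * the table.
 */
static int demo_expand(struct demo_files *files, size_t nr,
                       struct demo_fdtable **retired)
{
    struct demo_fdtable *old = demo_files_fdtable(files);
    struct demo_fdtable *fresh;

    if (old && nr < old->max_fds)
        return -1;

    fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return -1;
    fresh->fd = calloc(nr, sizeof(*fresh->fd));
    if (!fresh->fd) {
        free(fresh);
        return -1;
    }
    fresh->max_fds = nr;
    if (old)
        memcpy(fresh->fd, old->fd, old->max_fds * sizeof(*old->fd));

    /* The table is fully initialized before it becomes visible. */
    atomic_store_explicit(&files->fdt, fresh, memory_order_release);
    *retired = old;
    return 0;
}
```

Because the array and its size live in one structure that is swapped as a unit, a reader sees either the old pair or the new pair, never a new size with an old array.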
    • [PATCH] files: break up files struct · badf1662
      Dipankar Sarma committed
      In order for the RCU to work, the file table array, sets and their sizes must
      be updated atomically.  Instead of ensuring this through too many memory
      barriers, we put the arrays and their sizes in a separate structure.  This
      patch takes the first step of putting the file table elements in a separate
      structure fdtable that is embedded within files_struct.  It also changes all
      the users to refer to the file table using files_fdtable() macro.  Subsequent
      application of RCU becomes easier after this.
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix disassociate_ctty vs. fork race · b0d62e6d
      Jason Baron committed
      Race is as follows. Process A forks process B, both being part of the same
      session. Then, A calls disassociate_ctty while B forks C:
      
      A				B
      ====				====
      				fork()
      				  copy_signal()
      disassociate_ctty()		....
      				  attach_pid(p, PIDTYPE_SID, p->signal->session);
      
      Now, C can have current->signal->tty pointing to a freed tty structure, as
      it hasn't yet been added to the session group (to have its controlling tty
      cleared on the disassociate_ctty() call).

      This has shown up as an oops but could be even more serious.  I haven't
      tried to create a test case, but a customer has verified that the patch
      below resolves the issue, which was occurring quite frequently.  I'll try
      and post the test case if I can.

      The patch simply checks for a NULL tty *after* it has been attached to the
      proper session group and clears it as necessary.  Alternatively, we could
      simply do the tty assignment after the process is added to the proper
      session group.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>