1. 01 Feb, 2006 7 commits
  2. 24 Jan, 2006 1 commit
  3. 22 Jan, 2006 5 commits
  4. 21 Jan, 2006 1 commit
  5. 20 Jan, 2006 2 commits
  6. 19 Jan, 2006 12 commits
    • D
      [PATCH] Add pselect/ppoll system call implementation · 9f72949f
      David Woodhouse committed
      The following implementation of ppoll() and pselect() system calls
      depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
      thread_info.
      
      These system calls have to change the signal mask during their
      operation, and signal handlers must be invoked using the new, temporary
      signal mask. The old signal mask must be restored either upon successful
      exit from the system call, or upon returning from the invoked signal
      handler if the system call is interrupted. We can't simply restore the
      original signal mask and return to userspace, since the restored signal
      mask may actually block the signal which interrupted the system call.
      
      The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
      path to trap into do_signal() just as TIF_SIGPENDING does, and by
      causing do_signal() to use the saved signal mask instead of the current
      signal mask when setting up the stack frame for the signal handler -- or
      by causing do_signal() to simply restore the saved signal mask in the
      case where there is no handler to be invoked.
      
      The first patch implements the sys_pselect() and sys_ppoll() system
      calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
      #ifdef should go away in time when all architectures have implemented
      it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
      kernel (in the -mm tree), and the third patch then removes the
      arch-specific implementations of sys_rt_sigsuspend() and replaces them
      with generic versions using the same trick.
      
      The fourth and fifth patches, provided by David Howells, implement
      TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
      adds the syscalls to the i386 syscall table.
      
      This patch:
      
      Add the pselect() and ppoll() system calls, providing core routines usable by
      the original select() and poll() system calls and also the new calls (with
      their semantics w.r.t timeouts).
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9f72949f
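The temporary signal mask described in this commit message can be observed from userspace. The following is a minimal sketch, not code from the patch: `demo_pselect_mask` and `on_usr1` are hypothetical names chosen for illustration, and the example assumes a POSIX system whose `pselect()` installs its mask argument atomically for the duration of the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: only records that it ran. */
static void on_usr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Returns 1 if a pending SIGUSR1 was delivered while pselect() had the
 * temporary mask installed, and the caller's mask (SIGUSR1 blocked) was
 * back in place after pselect() returned. Returns 0 otherwise.
 */
int demo_pselect_mask(void)
{
    struct sigaction sa;
    sigset_t block, orig, during, after;
    struct timespec timeout = { 2, 0 };   /* safety net: 2 seconds */
    int rc, interrupted, restored;

    sa.sa_handler = on_usr1;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    /* Block SIGUSR1 so that raising it leaves it pending. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &orig) != 0)
        return 0;

    got_signal = 0;
    raise(SIGUSR1);
    if (got_signal)                        /* must not run while blocked */
        return 0;

    /* Temporary mask for the call: the original mask minus SIGUSR1. */
    during = orig;
    sigdelset(&during, SIGUSR1);
    rc = pselect(0, NULL, NULL, NULL, &timeout, &during);
    interrupted = (rc == -1 && errno == EINTR);

    /* Query the current mask: SIGUSR1 should be blocked again. */
    sigprocmask(SIG_SETMASK, NULL, &after);
    restored = (sigismember(&after, SIGUSR1) == 1);

    sigprocmask(SIG_SETMASK, &orig, NULL); /* undo our blocking */
    return interrupted && got_signal && restored;
}
```

Calling `demo_pselect_mask()` from a single-threaded program is expected to return 1 when the handler ran under the temporary mask and the previous mask was restored afterwards.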
    • D
      [PATCH] Generic sys_rt_sigsuspend() · 150256d8
      David Woodhouse committed
      The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
      sys_rt_sigsuspend() instead of duplicating it for each architecture.  This
      provides such an implementation and makes arch/powerpc use it.
      
      It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      150256d8
    • U
      [PATCH] vfs: *at functions: core · 5590ff0d
      Ulrich Drepper committed
      Here is a series of patches which introduce in total 13 new system calls
      which take a file descriptor/filename pair instead of a single file
      name.  These functions, openat etc, have been discussed on numerous
      occasions.  They are needed to implement race-free filesystem traversal,
      they are necessary to implement a virtual per-thread current working
      directory (think multi-threaded backup software), etc.
      
      We have in glibc today implementations of the interfaces which use the
      /proc/self/fd magic.  But this code is rather expensive.  Here are some
      results (similar to what Jim Meyering posted before).
      
      The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
      rm -fr is used to remove all directories.  Without syscall support I get
      this:
      
      real    0m31.921s
      user    0m0.688s
      sys     0m31.234s
      
      With syscall support the results are much better:
      
      real    0m20.699s
      user    0m0.536s
      sys     0m20.149s
      
      The interfaces are for obvious reasons currently not much used.  But they'll
      be used.  coreutils (and Jeff's posixutils) are already using them.
      Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
      them.  I expect a patch to make follow soon.  Every program which is walking
      the filesystem tree will benefit.
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5590ff0d
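The file descriptor/filename pairing described in this commit message can be illustrated from userspace. This is a minimal sketch that assumes the POSIX.1-2008 wrappers `openat()`, `fstatat()` and `unlinkat()` are available; `demo_openat`, the `/tmp` template and the file name `entry.txt` are hypothetical choices for illustration, and the commit message does not say which of the 13 calls these correspond to beyond `openat`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates a scratch directory, then creates, inspects and removes a file
 * inside it using only names relative to the directory's descriptor.
 * Returns 0 on success, -1 on failure.
 */
int demo_openat(void)
{
    char dir_template[] = "/tmp/openat-demo-XXXXXX";
    struct stat st;
    int dir_fd, fd;
    int ok = -1;

    if (mkdtemp(dir_template) == NULL)
        return -1;

    /* Pin the directory: later lookups are relative to this descriptor,
     * so they are unaffected if the path is renamed in the meantime. */
    dir_fd = open(dir_template, O_RDONLY | O_DIRECTORY);
    if (dir_fd >= 0) {
        fd = openat(dir_fd, "entry.txt", O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            close(fd);
            if (fstatat(dir_fd, "entry.txt", &st, AT_SYMLINK_NOFOLLOW) == 0 &&
                S_ISREG(st.st_mode) &&
                unlinkat(dir_fd, "entry.txt", 0) == 0)
                ok = 0;
        }
        close(dir_fd);
    }

    rmdir(dir_template);   /* best-effort cleanup of the scratch directory */
    return ok;
}
```

Because every lookup starts from `dir_fd` rather than from a full path, a tree walker built this way does not need to change the process-wide working directory, which is the per-thread use case the message mentions.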
    • J
      [PATCH] nfsd4: rename lk_stateowner · 3a65588a
      J. Bruce Fields committed
      One of the things that's confusing about nfsd4_lock is that the lk_stateowner
      field could be set to either of two different lockowners: the open owner or
      the lock owner.  Rename to lk_replay_owner and add a comment to make it clear
      that it's used for whichever stateowner has its sequence id bumped for replay
      detection.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3a65588a
    • J
      [PATCH] svcrpc: save and restore the daddr field when request deferred · 1918e341
      J. Bruce Fields committed
      The server code currently keeps track of the destination address on every
      request so that it can reply using the same address.  However we forget to do
      that in the case of a deferred request.  Remedy this oversight.  From folks
      at PolyServe.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1918e341
    • Y
      [PATCH] nfsd: check error status from nfsd_sync_dir · f193fbab
      YAMAMOTO Takashi committed
      Change nfsd_sync_dir to return an error if ->sync fails, and pass that error
      up through the stack.  This involves a number of rearrangements of error
      paths, and care to distinguish between Linux -errno numbers and NFSERR
      numbers.
      
      In the 'create' routines, we continue with the 'setattr' even if a previous
      sync_dir failed.
      
      This patch is quite different from Takashi's in a few ways, but there is still
      a strong lineage.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f193fbab
    • A
      [PATCH] add missing syscall declarations · 5131cf15
      Arnd Bergmann committed
      All standard system calls should be declared in include/linux/syscalls.h.
      
      Add some of the new additions that were previously missed.
      Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5131cf15
    • C
      [PATCH] NUMA policies in the slab allocator V2 · dc85da15
      Christoph Lameter committed
      This patch fixes a regression in 2.6.14 against 2.6.13 that causes an
      imbalance in memory allocation during bootup.
      
      The slab allocator in 2.6.13 is not numa aware and simply calls
      alloc_pages().  This means that memory policies may control the behavior of
      alloc_pages().  During bootup the memory policy is set to MPOL_INTERLEAVE
      resulting in the spreading out of allocations during bootup over all
      available nodes.  The slab allocator in 2.6.13 has only a single list of
      slab pages.  As a result the per cpu slab cache and the spinlock controlled
      page lists may contain slab entries from off node memory.  The slab
      allocator in 2.6.13 makes no effort to discern the locality of an entry on
      its lists.
      
      The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages
      explicitly by calling alloc_pages_node().  The NUMA slab allocator manages
      slab entries by having lists of available slab pages for each node.  The
      per cpu slab cache can only contain slab entries associated with the node
      local to the processor.  This guarantees that the default allocation mode
      of the slab allocator always assigns local memory if available.
      
      Setting MPOL_INTERLEAVE as a default policy during bootup has no effect
      anymore.  In 2.6.14 all node unspecific slab allocations are performed on
      the boot processor.  This means that most key data structures are
      allocated on one node.  Most processors will have to refer to these
      structures making the boot node a potential bottleneck.  This may reduce
      performance and cause unnecessary memory pressure on the boot node.
      
      This patch implements NUMA policies in the slab layer.  The slab allocator
      itself must apply NUMA memory policies explicitly, since the NUMA slab
      allocator no longer lets the page allocator control locality.
      
      The check for policies is made directly at the beginning of __cache_alloc
      using current->mempolicy.  The memory policy is already frequently checked
      by the page allocator (alloc_page_vma() and alloc_page_current()).  So it
      is highly likely that the cacheline is present.  For MPOL_INTERLEAVE
      kmalloc() will spread out each request to one node after another so that an
      equal distribution of allocations can be obtained during bootup.
      
      It is not possible to push the policy check to lower layers of the NUMA
      slab allocator since the per cpu caches are now only containing slab
      entries from the current node.  If the policy says that the local node is
      not to be preferred or forbidden then there is no point in checking the
      slab cache or local list of slab pages.  The allocation better be directed
      immediately to the lists containing slab entries for the allowed set of
      nodes.
      
      This way of applying policy also fixes another strange behavior in 2.6.13.
      alloc_pages() is controlled by the memory allocation policy of the current
      process.  It could therefore be that one process is running with
      MPOL_INTERLEAVE and would, for example, obtain a new page following that
      policy since no slab entries are in the lists anymore.  A page can typically
      be used for multiple slab entries but let's say that the current process is
      only using one.  The other entries are then added to the slab lists.  These
      are now non-local entries in the slab lists despite the possible
      availability of local pages that would provide faster access and increase
      the performance of the application.
      
      Another process without MPOL_INTERLEAVE may now run and expect a local slab
      entry from kmalloc().  However, there are still these free slab entries
      from the off node page obtained from the other process via MPOL_INTERLEAVE
      in the cache.  The process will then get an off node slab entry although
      other slab entries may be available that are local to that process.  This
      means that the policy of one process may contaminate the locality of the
      slab caches for other processes.
      
      This patch in effect ensures that a per process policy is followed for the
      allocation of slab entries and that there cannot be a memory policy
      influence from one process to another.  A process with default policy will
      always get a local slab entry if one is available.  And the process using
      memory policies will get its memory arranged as requested.  Off-node slab
      allocation will require the use of spinlocks and will make the use of per
      cpu caches not possible.  A process using memory policies to redirect
      allocations offnode will have to cope with additional lock overhead in
      addition to the latency added by the need to access a remote slab entry.
      
      Changes V1->V2
      - Remove #ifdef CONFIG_NUMA by moving forward declaration into
        prior #ifdef CONFIG_NUMA section.
      
      - Give the function determining the node number to use a saner
        name.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      dc85da15
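The node selection that this commit message describes for MPOL_INTERLEAVE, where each request goes to one node after another while the default policy stays local, can be modelled with a toy function. This is a simplified sketch under stated assumptions, not the kernel implementation: `toy_policy`, `toy_task` and `toy_pick_node` are hypothetical names, and real policies also involve allowed-node masks that are ignored here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a task's memory policy. */
enum toy_policy { TOY_DEFAULT, TOY_INTERLEAVE };

struct toy_task {
    enum toy_policy policy;
    unsigned next_node;     /* round-robin cursor used when interleaving */
};

/*
 * Picks the node for one allocation. With the default policy the local
 * node is used; with interleave, successive calls walk over all nodes.
 * nr_nodes must be greater than zero.
 */
unsigned toy_pick_node(struct toy_task *task, unsigned local_node,
                       unsigned nr_nodes)
{
    if (task->policy == TOY_INTERLEAVE) {
        unsigned node = task->next_node % nr_nodes;

        task->next_node = (node + 1) % nr_nodes;
        return node;
    }
    return local_node;
}
```

In this model the policy check happens before any per-node cache would be consulted, which mirrors the message's point that the decision has to be made at the top of the allocation path.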
    • C
      [PATCH] Zone reclaim: proc override · 1743660b
      Christoph Lameter committed
      proc support for zone reclaim
      
      This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
      used to override the automatic determination of the zone reclaim made on
      bootup.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1743660b
    • C
      [PATCH] Zone reclaim: Reclaim logic · 9eeff239
      Christoph Lameter committed
      Some bits for zone reclaim exist in 2.6.15 but they are not usable.  This
      patch fixes them up, removes unused code and makes zone reclaim usable.
      
      Zone reclaim allows the reclaiming of pages from a zone if the number of
      free pages falls below the watermarks even if other zones still have enough
      pages available.  Zone reclaim is of particular importance for NUMA
      machines.  It can be more beneficial to reclaim a page than taking the
      performance penalties that come with allocating a page on a remote zone.
      
      Zone reclaim is enabled if the maximum distance to another node is higher
      than RECLAIM_DISTANCE, which may be defined by an arch.  By default
      RECLAIM_DISTANCE is 20.  20 is the distance to another node in the same
      component (enclosure or motherboard) on IA64.  The meaning of the NUMA
      distance information seems to vary by arch.
      
      If zone reclaim is not successful then no further reclaim attempts will
      occur for a certain time period (ZONE_RECLAIM_INTERVAL).
      
      This patch was discussed before. See
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
      http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
      http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
      http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9eeff239
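The two rules stated in this commit message, enabling zone reclaim when some node is farther away than RECLAIM_DISTANCE and pausing attempts for an interval after a failure, can be sketched as a toy model. This is an illustration only: the `toy_` names are hypothetical, the value 20 is the default quoted in the message, and the time units are arbitrary rather than the kernel's.

```c
#include <stddef.h>

#define TOY_RECLAIM_DISTANCE 20   /* default quoted in the commit message */

/*
 * Returns 1 if zone reclaim would be enabled by default, that is, if any
 * node distance in the table is higher than the threshold; 0 otherwise.
 */
int toy_zone_reclaim_default(const int *distances, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (distances[i] > TOY_RECLAIM_DISTANCE)
            return 1;
    return 0;
}

/*
 * Returns 1 if a new reclaim attempt is allowed. Assumes a failure was
 * recorded at last_failure and that now >= last_failure; attempts are
 * skipped until the interval has elapsed.
 */
int toy_may_reclaim(unsigned long now, unsigned long last_failure,
                    unsigned long interval)
{
    return now - last_failure >= interval;
}
```

A distance table containing only values up to 20 leaves reclaim off in this model, while a single larger entry turns it on.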
    • N
      [PATCH] mm: migration page refcounting fix · 053837fc
      Nick Piggin committed
      Migration code currently does not take a reference to the target page
      properly, so between unlocking the pte and trying to take a new
      reference to the page with isolate_lru_page, anything could happen to
      it.
      
      Fix this by holding the pte lock until we get a chance to elevate the
      refcount.
      
      Other small cleanups while we're here.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      053837fc
    • A
      [CPUFREQ] convert remaining cpufreq semaphore to a mutex · 83933af4
      Arjan van de Ven committed
      This one fell through the automation at first because it initializes the
      semaphore to locked, but that's easily remedied.
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Dave Jones <davej@redhat.com>
      
       drivers/cpufreq/cpufreq.c |   37 +++++++++++++++++++------------------
       include/linux/cpufreq.h   |    3 ++-
       2 files changed, 21 insertions(+), 19 deletions(-)
      83933af4
  7. 18 Jan, 2006 4 commits
  8. 17 Jan, 2006 8 commits