  1. 07 Apr, 2009 1 commit
  2. 03 Apr, 2009 2 commits
  3. 01 Apr, 2009 1 commit
  4. 16 Mar, 2009 1 commit
    • Use f_lock to protect f_flags · db1dd4d3
      Authored by Jonathan Corbet
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
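The locking rule in this commit message can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the `File` class, the `set_flag` helper and the flag values are hypothetical stand-ins for struct file, its f_lock and the f_flags bits. The point it shows is that every read-modify-write of the flags word happens with the lock held, so concurrent updates to different bits cannot lose each other's changes.

```python
import threading

# Hypothetical flag bits, chosen only for illustration.
FLAG_A = 0x1
FLAG_B = 0x2


class File:
    """Toy stand-in for struct file: a flags word plus the lock guarding it."""

    def __init__(self):
        self.f_flags = 0
        self.f_lock = threading.Lock()


def set_flag(filp, flag, on):
    """Read-modify-write of f_flags, done entirely under f_lock."""
    with filp.f_lock:
        if on:
            filp.f_flags |= flag
        else:
            filp.f_flags &= ~flag


def hammer(filp, flag, rounds):
    """Toggle one bit repeatedly, leaving it set at the end."""
    for _ in range(rounds):
        set_flag(filp, flag, False)
        set_flag(filp, flag, True)


def run_demo(rounds=10000):
    """Two threads update different bits of the same flags word."""
    filp = File()
    threads = [
        threading.Thread(target=hammer, args=(filp, FLAG_A, rounds)),
        threading.Thread(target=hammer, args=(filp, FLAG_B, rounds)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return filp.f_flags
```

Because both updaters take the same lock, `run_demo()` always ends with both bits set, whatever the thread interleaving.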
  5. 11 Feb, 2009 1 commit
    • Do not account for the address space used by hugetlbfs using VM_ACCOUNT · 5a6fe125
      Authored by Mel Gorman
      When overcommit is disabled, the core VM accounts for pages used by anonymous
      shared, private mappings and special mappings. It keeps track of VMAs that
      should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
      with VM_NORESERVE.
      
      Overcommit for hugetlbfs is much riskier than overcommit for base pages
      due to contiguity requirements. It avoids overcommitting on both shared and
      private mappings using reservation counters that are checked and updated
      during mmap(). This ensures (within limits) that hugepages exist in the
      future when faults occur; otherwise it would be too easy for applications
      to be SIGKILLed.
      
      As hugetlbfs makes its own reservations of a different unit to the base page
      size, VM_ACCOUNT should never be set. Even if the units were correct, we would
      double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
      be set because an application can request no reserves be made for hugetlbfs
      at the risk of getting killed later.
      
      With commit fc8744ad, VM_NORESERVE and
      VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
      breaks the accounting for both the core VM and hugetlbfs, and can trigger an
      OOM storm when hugepage pools are too small, or lockups and corrupted
      counters otherwise. This patch brings hugetlbfs more in line with how the
      core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
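The flag rule this commit message describes can be sketched as a small decision function. This is a simplified toy model in Python, not the kernel's mmap path: `mapping_flags` and its parameters are hypothetical, the bit values are illustrative, and the treatment of ordinary mappings is an assumption made only to give the hugetlbfs case something to contrast with.

```python
# Illustrative bit values; only their distinctness matters here.
VM_ACCOUNT = 0x00100000
VM_NORESERVE = 0x00200000


def mapping_flags(is_hugetlbfs, accountable, noreserve_requested):
    """Return the accounting-related VMA flags for a new mapping (toy model).

    accountable: whether the core VM would normally charge this mapping
                 against the overcommit limit.
    noreserve_requested: whether the application asked for no reservation.
    """
    flags = 0
    if noreserve_requested:
        flags |= VM_NORESERVE
    if is_hugetlbfs:
        # hugetlbfs keeps its own reservation counters in hugepage units,
        # so the core VM must not account for the mapping as well.
        return flags
    if accountable and not noreserve_requested:
        flags |= VM_ACCOUNT
    return flags
```

In this model a hugetlbfs mapping can carry VM_NORESERVE but never VM_ACCOUNT, which is the invariant the message asks for.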
  6. 06 Feb, 2009 2 commits
  7. 01 Feb, 2009 1 commit
    • Stop playing silly games with the VM_ACCOUNT flag · fc8744ad
      Authored by Linus Torvalds
      The mmap_region() code would temporarily set the VM_ACCOUNT flag for
      anonymous shared mappings just to inform shmem_zero_setup() that it
      should enable accounting for the resulting shm object.  It would then
      clear the flag after calling ->mmap (for the /dev/zero case) or doing
      shmem_zero_setup() (for the MAP_ANON case).
      
      This not only resulted in vma merge issues, but also made for
      unnecessary confusion.  Use the already-existing VM_NORESERVE flag for
      this instead, and let shmem_{zero|file}_setup() just figure it out from
      that.
      
      This also happens to make it obvious that the new DRI2 GEM layer uses a
      non-reserving backing store for its object allocation - which is quite
      possibly not intentional.  But since I didn't want to change semantics
      in this patch, I left it alone, and just updated the caller to use the
      new flag semantics.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 14 Jan, 2009 5 commits
  9. 09 Jan, 2009 1 commit
    • mqueue: fix si_pid value in mqueue do_notify() · a6684999
      Authored by Sukadev Bhattiprolu
      If a process registers for asynchronous notification on a POSIX message
      queue, it gets a signal and a siginfo_t structure when a message arrives
      on the message queue.  The si_pid in the siginfo_t structure is set to the
      PID of the process that sent the message to the message queue.
      
      The principle is the following:
      . when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
        notification when a msg arrives. The associated pid structure is stored in
        inode_info->notify_owner. Let's call this process P1.
      . when mq_send() is called by, say, P2, P2 sends a signal to P1 to notify
        it of the msg arrival.
      
      The way .si_pid is set today is not correct, since it doesn't take into account
      the fact that the process that is sending the message might not be in the
      same namespace as the notified one.
      
      This patch proposes to set si_pid to the sender's pid as seen in the
      notify_owner's namespace.
      Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Bastian Blank <bastian@waldi.eu.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
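The namespace issue in this commit message can be made concrete with a toy model. The sketch below is plain Python and does not use any kernel or POSIX message queue API: `Task`, `pid_in_ns` and `notify_si_pid` are hypothetical names, namespaces are just strings, and returning 0 for a sender that is not visible in the owner's namespace is an assumption of the model.

```python
class Task:
    """Toy process: maps each pid namespace it is visible in to its pid there."""

    def __init__(self, name, pids_by_ns):
        self.name = name
        self.pids_by_ns = dict(pids_by_ns)


def pid_in_ns(task, ns):
    """Pid of `task` as seen from namespace `ns`; 0 if it is not visible there.

    Returning 0 for an invisible task is an assumption of this model.
    """
    return task.pids_by_ns.get(ns, 0)


def notify_si_pid(sender, owner_ns):
    """si_pid to deliver: the sender's pid translated into the owner's namespace."""
    return pid_in_ns(sender, owner_ns)


# P1 registered for notification from inside a child namespace.
p1 = Task("P1", {"init": 4242, "child": 1})
# P2 lives only in the initial namespace; P3 lives in the child namespace too.
p2 = Task("P2", {"init": 1234})
p3 = Task("P3", {"init": 4300, "child": 7})
```

With these tasks, a message sent by P3 reaches P1 with si_pid 7 (P3's pid inside the child namespace) rather than 4300, and a message sent by P2 yields 0 in this model because P2 has no pid in P1's namespace.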
  10. 08 Jan, 2009 1 commit
    • NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131
      Authored by David Howells
      Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:
      
       (1) In SYSV SHM where nattch for a segment does not reflect the number of
           shmat's (and forks) done.
      
       (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
           exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
           that a VMA might be shared and already have its vm_mm assigned to another
           process or a dead process.
      
      A new struct (vm_region) is introduced to track a mapped region and to remember
      the circumstances under which it may be shared and the vm_list_struct structure
      is discarded as it's no longer required.
      
      This patch makes the following additional changes:
      
       (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
           with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
           each page has a reference on it held by the region.  Anything else that is
           interested in such a page will have to get a reference on it to retain it.
           When the pages are released due to unmapping, each page is passed to
           put_page() and will be freed when the page usage count reaches zero.
      
       (2) Excess pages are trimmed after an allocation as the allocation must be
           made as a power-of-2 quantity of pages.
      
       (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
           end up with overlapping VMAs within the tree, the VMA struct address is
           appended to the sort key.
      
       (4) Non-anonymous VMAs are now added to the backing inode's prio list.
      
       (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
           the backing region.  The VMA and region structs will be split if
           necessary.
      
       (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
           segment instead of all the attachments at that address.  Multiple
           shmat()'s return the same address under NOMMU-mode instead of different
           virtual addresses as under MMU-mode.
      
       (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
      
       (8) /proc/maps is now the global list of mapped regions, and may list bits
           that aren't actually mapped anywhere.
      
       (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
           of RAM currently allocated by mmap to hold mappable regions that can't be
           mapped directly.  These are copies of the backing device or file if not
           anonymous.
      
      These changes make NOMMU mode more similar to MMU mode.  The downside is that
      NOMMU mode requires some extra memory to track things over NOMMU without this
      patch (VMAs are no longer shared, and there are now region structs).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Mike Frysinger <vapier.adi@gmail.com>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
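Items (1) and (2) of the list above describe allocating a region as a power-of-2 number of pages and then trimming the excess. The arithmetic can be sketched as follows. This is an illustration in Python only: `region_allocation` is a hypothetical helper, and the 4096-byte page size is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def region_allocation(length):
    """Return (pages_needed, pages_allocated, pages_trimmed) for a mapping.

    The allocation is rounded up to a power-of-2 number of pages, and the
    excess beyond what the mapping needs is trimmed off again.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    needed = (length + PAGE_SIZE - 1) // PAGE_SIZE
    order = (needed - 1).bit_length()  # smallest order with 2**order >= needed
    allocated = 1 << order
    return needed, allocated, allocated - needed
```

For a five-page mapping the allocation is eight pages, of which three are trimmed; a mapping that is already a power-of-2 number of pages needs no trimming.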
  11. 07 Jan, 2009 3 commits
  12. 06 Jan, 2009 2 commits
  13. 05 Jan, 2009 6 commits
  14. 20 Nov, 2008 1 commit
  15. 14 Nov, 2008 4 commits
  16. 21 Oct, 2008 1 commit
  17. 20 Oct, 2008 2 commits
    • message queues: increase range limits · b231cca4
      Authored by Joe Korty
      Increase the range of various posix message queue limits.
      
      Posix gives the message queue user the ability to 'trade off' the maximum
      size of messages with the number of possible messages that can be 'in
      flight'.  Linux currently makes this trade off more restrictive than it
      needs to be.
      
      In particular, the maximum message size today can be made no smaller than
      8192.  This greatly restricts those applications that would like to have
      the ability to post large numbers of very small messages.
      
      So this patch lowers the limit that the maximum message size can be set to,
      from 8192 to 128.  It also lowers the limit that the maximum number of
      messages in flight can be set to, from 10 to 1.
      
      With these changes the message queue user can make better trade offs
      between #messages and message size, in order to get everything to fit
      within the setrlimit(RLIMIT_MSGQUEUE) limit for that particular user.
      
      This patch also applies the values in
      
      	/proc/sys/fs/mqueue/msg_max
      	/proc/sys/fs/mqueue/msgsize_max
      
      as the defaults for the max #messages allowed and the max message size
      allowed, respectively, for those applications that do not supply these.
      Previously, the defaults were hardwired to 10 and 8192, respectively.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Joe Korty <joe.korty@ccur.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
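The trade-off between message size and message count can be worked through numerically. The sketch below is Python and rests on several assumptions that are not stated in the commit message: the per-queue cost formula (message count times pointer size, plus message count times message size) follows the getrlimit(2) description of RLIMIT_MSGQUEUE for kernels of that era, the 819200-byte budget is taken as the default limit, and pointers are assumed to be 8 bytes. It also ignores the separate cap imposed by msg_max.

```python
RLIMIT_MSGQUEUE_DEFAULT = 819200  # assumed per-user byte budget
POINTER_SIZE = 8                  # assumed sizeof(struct msg_msg *) on 64-bit


def queue_cost(maxmsg, msgsize, pointer_size=POINTER_SIZE):
    """Bytes one queue charges against RLIMIT_MSGQUEUE (formula assumed above)."""
    return maxmsg * pointer_size + maxmsg * msgsize


def max_messages(msgsize, budget=RLIMIT_MSGQUEUE_DEFAULT,
                 pointer_size=POINTER_SIZE):
    """Largest mq_maxmsg whose cost still fits within the budget."""
    return budget // (pointer_size + msgsize)
```

Under these assumptions, a queue of 8192-byte messages fits 99 messages in the budget, while a queue of 128-byte messages fits 6023, which is the kind of room the lowered minimum message size is meant to open up.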
    • SHM_LOCKED pages are unevictable · 89e004ea
      Authored by Lee Schermerhorn
      Shmem segments locked into memory via shmctl(SHM_LOCK) should not be
      kept on the normal LRU, since scanning them is a waste of time and might
      throw off kswapd's balancing algorithms.  Place them on the unevictable
      LRU list instead.
      
      Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared
      memory regions as unevictable.  Then these pages will be culled off the
      normal LRU lists during vmscan.
      
      Add a new wrapper function to clear the mapping's unevictable state when/if
      the shared memory segment is unlocked.
      
      Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in
      the shmem segment's mapping [struct address_space] for evictability now
      that they're no longer locked.  Pages found to be evictable are moved to
      the appropriate zone lru list.
      
      Changes depend on [CONFIG_]UNEVICTABLE_LRU.
      
      [kosaki.motohiro@jp.fujitsu.com: revert shm change]
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
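The list handling this commit message describes can be pictured with a toy model. The following Python sketch is not the vmscan implementation: `Mapping`, `Lru`, `shm_lock` and `shm_unlock` are hypothetical names, pages are plain strings, and a boolean stands in for the AS_UNEVICTABLE flag. It only shows the sequence of marking a mapping, culling its pages during a scan, and moving them back after unlock.

```python
class Mapping:
    """Toy address_space: a set of pages plus an unevictable marker."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.unevictable = False  # stands in for AS_UNEVICTABLE


class Lru:
    """Two toy lists: the normal LRU and the unevictable list."""

    def __init__(self):
        self.normal = []
        self.unevictable = []
        self.owner = {}  # page -> Mapping

    def add(self, mapping):
        for page in mapping.pages:
            self.owner[page] = mapping
            self.normal.append(page)

    def scan(self):
        """Reclaim scan: cull pages of unevictable mappings off the normal list."""
        keep = []
        for page in self.normal:
            if self.owner[page].unevictable:
                self.unevictable.append(page)
            else:
                keep.append(page)
        self.normal = keep

    def rescue(self, mapping):
        """After unlock: move the mapping's pages back to the normal list."""
        if mapping.unevictable:
            return
        for page in mapping.pages:
            if page in self.unevictable:
                self.unevictable.remove(page)
                self.normal.append(page)


def shm_lock(mapping):
    """Mark the segment's mapping as unevictable."""
    mapping.unevictable = True


def shm_unlock(mapping, lru):
    """Clear the marker, then rescan the mapping's pages."""
    mapping.unevictable = False
    lru.rescue(mapping)
```

After `shm_lock()` and one `scan()`, the locked segment's pages sit on the unevictable list and are no longer visited by later scans; `shm_unlock()` returns them to the normal list.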
  18. 17 Oct, 2008 2 commits
  19. 27 Jul, 2008 2 commits
  20. 26 Jul, 2008 1 commit
    • ipc: do not use a negative value to re-enable msgmni automatic recomputing · 9eefe520
      Authored by Nadia Derbey
      This patch proposes an alternative to the "magical
      positive-versus-negative number trick" Andrew complained about last week
      in http://lkml.org/lkml/2008/6/24/418.
      
      This had been introduced with the patches that scale msgmni to the amount
      of lowmem.  With these patches, msgmni has a registered notification
      routine that recomputes the msgmni value upon memory add/remove or ipc
      namespace creation/removal.
      
      When msgmni is changed from user space (i.e.  value written to the proc
      file), that notification routine is unregistered, and the way to make it
      registered back is to write a negative value into the proc file.  This is
      the "magical positive-versus-negative number trick".
      
      To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
      This file acts as ON/OFF for msgmni automatic recomputing.
      
      With this patch, the process is the following:
      1) kernel boots in "automatic recomputing mode"
         /proc/sys/kernel/msgmni contains the value that has been computed (depends
                                 on lowmem)
         /proc/sys/kernel/auto_msgmni contains "1"
      
      2) echo <val> > /proc/sys/kernel/msgmni
         . sets msg_ctlmni to <val>
         . de-activates automatic recomputing (i.e. if, say, some memory is added
           msgmni won't be recomputed anymore)
         . /proc/sys/kernel/auto_msgmni now contains "0"
      
      3) echo "0" > /proc/sys/kernel/auto_msgmni
         . de-activates msgmni automatic recomputing
           (this has the same effect as 2) except that msg_ctlmni's value stays
           blocked at its current value)
      
      4) echo "1" > /proc/sys/kernel/auto_msgmni
         . recomputes msgmni's value based on the current available memory size
           and number of ipc namespaces
         . re-activates automatic recomputing for msgmni.
      Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
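The behaviour of the msgmni value and the auto_msgmni switch described in this commit message amounts to a small state machine. The following Python sketch models it under stated assumptions: `MsgmniControl` and its method names are hypothetical, the `compute` callback stands in for whatever value automatic recomputing would pick from lowmem and the number of ipc namespaces, and no real proc file is touched.

```python
class MsgmniControl:
    """Toy model of msgmni and its ON/OFF switch for automatic recomputing."""

    def __init__(self, compute):
        # `compute` is a caller-supplied function returning the value that
        # automatic recomputing would pick right now (hypothetical here).
        self._compute = compute
        self.auto = 1                 # kernel boots in automatic mode
        self.msgmni = compute()

    def write_msgmni(self, value):
        """echo <val> > msgmni: set the value and switch auto mode off."""
        self.msgmni = value
        self.auto = 0

    def write_auto(self, value):
        """echo 0/1 > auto_msgmni: freeze the value, or recompute and re-arm."""
        if value not in (0, 1):
            raise ValueError("this model accepts only 0 or 1")
        self.auto = value
        if value == 1:
            self.msgmni = self._compute()

    def memory_changed(self):
        """Memory add/remove or ipc namespace event: recompute only in auto mode."""
        if self.auto:
            self.msgmni = self._compute()
```

A short usage example: after `ctl.write_msgmni(42)` the value stays at 42 across `ctl.memory_changed()` calls, and `ctl.write_auto(1)` both recomputes the value and makes later memory events take effect again. No negative "magic" value is involved at any point.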