1. 22 6月, 2009 1 次提交
  2. 19 6月, 2009 5 次提交
  3. 10 6月, 2009 1 次提交
  4. 22 5月, 2009 2 次提交
  5. 15 4月, 2009 1 次提交
  6. 14 4月, 2009 1 次提交
  7. 07 4月, 2009 3 次提交
    • S
      namespaces: mqueue namespace: adapt sysctl · bdc8e5f8
      Serge E. Hallyn 提交于
      Largely inspired from ipc/ipc_sysctl.c.  This patch isolates the mqueue
      sysctl stuff in its own file.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NNadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdc8e5f8
    • S
      namespaces: ipc namespaces: implement support for posix msqueues · 7eafd7c7
      Serge E. Hallyn 提交于
      Implement multiple mounts of the mqueue file system, and link it to usage
      of CLONE_NEWIPC.
      
      Each ipc ns has a corresponding mqueuefs superblock.  When a user does
      clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
      internal mount of a new mqueuefs sb linked to the new ipc ns.
      
      When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
      mqueuefs superblock.
      
      Posix message queues can be worked with both through the mq_* system calls
      (see mq_overview(7)), and through the VFS through the mqueue mount.  Any
      usage of mq_open() and friends will work with the acting task's ipc
      namespace.  Any actions through the VFS will work with the mqueuefs in
      which the file was created.  So if a user doesn't remount mqueuefs after
      unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
      /dev/mqueue".
      
      If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
      ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
      ipc_ns:1 will be freed, (2) it's superblock will live on until task b
      umounts the corresponding mqueuefs, and vfs actions will continue to
      succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
      the deceased ipc_ns:1.
      
      To make this happen, we must protect the ipc reference count when
      
      a) a task exits and drops its ipcns->count, since it might be dropping
         it to 0 and freeing the ipcns
      
      b) a task accesses the ipcns through its mqueuefs interface, since it
         bumps the ipcns refcount and might race with the last task in the ipcns
         exiting.
      
      So the kref is changed to an atomic_t so we can use
      atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
      through ns = mqueuefs_sb->s_fs_info is protected by the same lock.
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7eafd7c7
    • S
      namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace · 614b84cf
      Serge E. Hallyn 提交于
      Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
      The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
      posix message queue namespaces and the SYSV ipc namespaces.
      
      The sysctl code will be fixed separately in patch 3.  After just this
      patch, making a change to posix mqueue tunables always changes the values
      in the initial ipc namespace.
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      614b84cf
  8. 03 4月, 2009 2 次提交
  9. 01 4月, 2009 1 次提交
  10. 16 3月, 2009 1 次提交
    • J
      Use f_lock to protect f_flags · db1dd4d3
      Jonathan Corbet 提交于
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      db1dd4d3
  11. 11 2月, 2009 1 次提交
    • M
      Do not account for the address space used by hugetlbfs using VM_ACCOUNT · 5a6fe125
      Mel Gorman 提交于
      When overcommit is disabled, the core VM accounts for pages used by anonymous
      shared, private mappings and special mappings. It keeps track of VMAs that
      should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
      with VM_NORESERVE.
      
      Overcommit for hugetlbfs is much riskier than overcommit for base pages
      due to contiguity requirements. It avoids overcommiting on both shared and
      private mappings using reservation counters that are checked and updated
      during mmap(). This ensures (within limits) that hugepages exist in the
      future when faults occurs or it is too easy to applications to be SIGKILLed.
      
      As hugetlbfs makes its own reservations of a different unit to the base page
      size, VM_ACCOUNT should never be set. Even if the units were correct, we would
      double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
      be set because an application can request no reserves be made for hugetlbfs
      at the risk of getting killed later.
      
      With commit fc8744ad, VM_NORESERVE and
      VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
      breaks the accounting for both the core VM and hugetlbfs, can trigger an
      OOM storm when hugepage pools are too small lockups and corrupted counters
      otherwise are used. This patch brings hugetlbfs more in line with how the
      core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a6fe125
  12. 06 2月, 2009 2 次提交
  13. 01 2月, 2009 1 次提交
    • L
      Stop playing silly games with the VM_ACCOUNT flag · fc8744ad
      Linus Torvalds 提交于
      The mmap_region() code would temporarily set the VM_ACCOUNT flag for
      anonymous shared mappings just to inform shmem_zero_setup() that it
      should enable accounting for the resulting shm object.  It would then
      clear the flag after calling ->mmap (for the /dev/zero case) or doing
      shmem_zero_setup() (for the MAP_ANON case).
      
      This just resulted in vma merge issues, but also made for just
      unnecessary confusion.  Use the already-existing VM_NORESERVE flag for
      this instead, and let shmem_{zero|file}_setup() just figure it out from
      that.
      
      This also happens to make it obvious that the new DRI2 GEM layer uses a
      non-reserving backing store for its object allocation - which is quite
      possibly not intentional.  But since I didn't want to change semantics
      in this patch, I left it alone, and just updated the caller to use the
      new flag semantics.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc8744ad
  14. 14 1月, 2009 5 次提交
  15. 09 1月, 2009 1 次提交
    • S
      mqueue: fix si_pid value in mqueue do_notify() · a6684999
      Sukadev Bhattiprolu 提交于
      If a process registers for asynchronous notification on a POSIX message
      queue, it gets a signal and a siginfo_t structure when a message arrives
      on the message queue.  The si_pid in the siginfo_t structure is set to the
      PID of the process that sent the message to the message queue.
      
      The principle is the following:
      . when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
        notification when a msg arrives. The associated pid structure is stroed into
        inode_info->notify_owner. Let's call this process P1.
      . when mq_send() is called by say P2, P2 sends a signal to P1 to notify
        him about msg arrival.
      
      The way .si_pid is set today is not correct, since it doesn't take into account
      the fact that the process that is sending the message might not be in the
      same namespace as the notified one.
      
      This patch proposes to set si_pid to the sender's pid into the notify_owner
      namespace.
      Signed-off-by: NNadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Bastian Blank <bastian@waldi.eu.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6684999
  16. 08 1月, 2009 1 次提交
    • D
      NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131
      David Howells 提交于
      Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:
      
       (1) In SYSV SHM where nattch for a segment does not reflect the number of
           shmat's (and forks) done.
      
       (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
           exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
           that a VMA might be shared and already have its vm_mm assigned to another
           process or a dead process.
      
      A new struct (vm_region) is introduced to track a mapped region and to remember
      the circumstances under which it may be shared and the vm_list_struct structure
      is discarded as it's no longer required.
      
      This patch makes the following additional changes:
      
       (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
           with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
           each page has a reference on it held by the region.  Anything else that is
           interested in such a page will have to get a reference on it to retain it.
           When the pages are released due to unmapping, each page is passed to
           put_page() and will be freed when the page usage count reaches zero.
      
       (2) Excess pages are trimmed after an allocation as the allocation must be
           made as a power-of-2 quantity of pages.
      
       (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
           end up with overlapping VMAs within the tree, the VMA struct address is
           appended to the sort key.
      
       (4) Non-anonymous VMAs are now added to the backing inode's prio list.
      
       (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
           the backing region.  The VMA and region structs will be split if
           necessary.
      
       (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
           segment instead of all the attachments at that addresss.  Multiple
           shmat()'s return the same address under NOMMU-mode instead of different
           virtual addresses as under MMU-mode.
      
       (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
      
       (8) /proc/maps is now the global list of mapped regions, and may list bits
           that aren't actually mapped anywhere.
      
       (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
           of RAM currently allocated by mmap to hold mappable regions that can't be
           mapped directly.  These are copies of the backing device or file if not
           anonymous.
      
      These changes make NOMMU mode more similar to MMU mode.  The downside is that
      NOMMU mode requires some extra memory to track things over NOMMU without this
      patch (VMAs are no longer shared, and there are now region structs).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMike Frysinger <vapier.adi@gmail.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      8feae131
  17. 07 1月, 2009 3 次提交
  18. 06 1月, 2009 2 次提交
  19. 05 1月, 2009 6 次提交