1. 07 2月, 2008 1 次提交
    • P
      IPC: fix error check in all new xxx_lock() and xxx_exit_ns() functions · b1ed88b4
      Pierre Peiffer 提交于
      In the new implementation of the [sem|shm|msg]_lock[_check]() routines, we
      use the return value of ipc_lock() in container_of() without any check.
      But ipc_lock may return a errcode.  The use of this errcode in
      container_of() may alter this errcode, and we don't want this.
      
      And in xxx_exit_ns, the pointer return by idr_find is of type 'struct
      kern_ipc_per'...
      
      Today, the code will work as is because the member used in these
      container_of() is the first member of its container (offset == 0), the
      errcode isn't changed then.  But in the general case, we can't count on
      this assumption and this may lead later to a real bug if we don't correct
      this.
      
      Again, the proposed solution is simple and correct.  But, as pointed by
      Nadia, with this solution, the same check will be done several times (in
      all sub-callers...), what is not very funny/optimal...
      Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1ed88b4
  2. 20 10月, 2007 10 次提交
  3. 17 10月, 2007 2 次提交
    • D
      r/o bind mounts: filesystem helpers for custom 'struct file's · ce8d2cdf
      Dave Hansen 提交于
      Why do we need r/o bind mounts?
      
      This feature allows a read-only view into a read-write filesystem.  In the
      process of doing that, it also provides infrastructure for keeping track of
      the number of writers to any given mount.
      
      This has a number of uses.  It allows chroots to have parts of filesystems
      writable.  It will be useful for containers in the future because users may
      have root inside a container, but should not be allowed to write to
      somefilesystems.  This also replaces patches that vserver has had out of the
      tree for several years.
      
      It allows security enhancement by making sure that parts of your filesystem
      read-only (such as when you don't trust your FTP server), when you don't want
      to have entire new filesystems mounted, or when you want atime selectively
      updated.  I've been using the following script to test that the feature is
      working as desired.  It takes a directory and makes a regular bind and a r/o
      bind mount of it.  It then performs some normal filesystem operations on the
      three directories, including ones that are expected to fail, like creating a
      file on the r/o mount.
      
      This patch:
      
      Some filesystems forego the vfs and may_open() and create their own 'struct
      file's.
      
      This patch creates a couple of helper functions which can be used by these
      filesystems, and will provide a unified place which the r/o bind mount code
      may patch.
      
      Also, rename an existing, static-scope init_file() to a less generic name.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce8d2cdf
    • A
      ipc/shm.c: make 2 functions static · d823e3e7
      Adrian Bunk 提交于
      This patch makes two needlessly global functions static.
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d823e3e7
  4. 01 8月, 2007 2 次提交
  5. 20 7月, 2007 2 次提交
    • N
      mm: fault feedback #1 · d0217ac0
      Nick Piggin 提交于
      Change ->fault prototype.  We now return an int, which contains
      VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
       FAULT_RET_ code tells the VM whether a page was found, whether it has been
      locked, and potentially other things.  This is not quite the way he wanted
      it yet, but that's changed in the next patch (which requires changes to
      arch code).
      
      This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
      that a page is locked which requires filemap_nopage to go away (because we
      can no longer remain backward compatible without that flag), but we were
      going to do that anyway.
      
      struct fault_data is renamed to struct vm_fault as Linus asked. address
      is now a void __user * that we should firmly encourage drivers not to use
      without really good reason.
      
      The page is now returned via a page pointer in the vm_fault struct.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0217ac0
    • N
      mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin 提交于
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should need to know anything about the virtual memory mapping.  The hitch here
      is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
       But it is more logical to pass pgoff rather than have the ->nopage function
      calculate it itself anyway (because that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54cb8821
  6. 17 7月, 2007 1 次提交
  7. 17 6月, 2007 3 次提交
  8. 02 3月, 2007 2 次提交
  9. 21 2月, 2007 1 次提交
    • E
      [PATCH] shm: make sysv ipc shared memory use stacked files · bc56bba8
      Eric W. Biederman 提交于
      The current ipc shared memory code runs into several problems because it
      does not quite use files like the rest of the kernel.  With the option of
      backing ipc shared memory with either hugetlbfs or ordinary shared memory
      the problems got worse.  With the added support for ipc namespaces things
      behaved so unexpected that we now have several bad namespace reference
      counting bugs when using what appears at first glance to be a reasonable
      idiom.
      
      So to attack these problems and hopefully make the code more maintainable
      this patch simply uses the files provided by other parts of the kernel and
      builds it's own files out of them.  The shm files are allocated in do_shmat
      and freed when their reference count drops to zero with their last unmap.
      The file and vm operations that we don't want to implement or we don't
      implement completely we just delegate to the operations of our backing
      file.
      
      This means that we now get an accurate shm_nattch count for we have a
      hugetlbfs inode for backing store, and the shm accounting of last attach
      and last detach time work as well.
      
      This means that getting a reference to the ipc namespace when we create the
      file and dropping the referenece in the release method is now safe and
      correct.
      
      This means we no longer need a special case for clearing VM_MAYWRITE
      as our file descriptor now only has write permissions when we have
      requested write access when calling shmat.  Although VM_SHARED is now
      cleared as well which I believe is harmless and is mostly likely a
      minor bug fix.
      
      By using the same set of operations for both the hugetlb case and regular
      shared memory case shmdt is not simplified and made slightly more correct
      as now the test "vma->vm_ops == &shm_vm_ops" is 100% accurate in spotting
      all shared memory regions generated from sysvipc shared memory.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc56bba8
  10. 13 2月, 2007 1 次提交
  11. 24 1月, 2007 1 次提交
  12. 09 12月, 2006 1 次提交
  13. 04 11月, 2006 1 次提交
    • P
      [PATCH] Fix ipc entries removal · c7e12b83
      Pavel Emelianov 提交于
      Fix two issuses related to ipc_ids->entries freeing.
      
      1. When freeing ipc namespace we need to free entries allocated
         with ipc_init_ids().
      
      2. When removing old entries in grow_ary() ipc_rcu_putref()
         may be called on entries set to &ids->nullentry earlier in
         ipc_init_ids().
         This is almost impossible without namespaces, but with
         them this situation becomes possible.
      
      Found during OpenVZ testing after obvious leaks in beancounters.
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Cc: Kirill Korotaev <dev@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c7e12b83
  14. 02 10月, 2006 1 次提交
  15. 01 7月, 2006 1 次提交
  16. 23 6月, 2006 1 次提交
  17. 20 6月, 2006 1 次提交
    • L
      [PATCH] update of IPC audit record cleanup · ac03221a
      Linda Knippers 提交于
      The following patch addresses most of the issues with the IPC_SET_PERM
      records as described in:
      https://www.redhat.com/archives/linux-audit/2006-May/msg00010.html
      and addresses the comments I received on the record field names.
      
      To summarize, I made the following changes:
      
      1. Changed sys_msgctl() and semctl_down() so that an IPC_SET_PERM
         record is emitted in the failure case as well as the success case.
         This matches the behavior in sys_shmctl().  I could simplify the
         code in sys_msgctl() and semctl_down() slightly but it would mean
         that in some error cases we could get an IPC_SET_PERM record
         without an IPC record and that seemed odd.
      
      2. No change to the IPC record type, given no feedback on the backward
         compatibility question.
      
      3. Removed the qbytes field from the IPC record.  It wasn't being
         set and when audit_ipc_obj() is called from ipcperms(), the
         information isn't available.  If we want the information in the IPC
         record, more extensive changes will be necessary.  Since it only
         applies to message queues and it isn't really permission related, it
         doesn't seem worth it.
      
      4. Removed the obj field from the IPC_SET_PERM record.  This means that
         the kern_ipc_perm argument is no longer needed.
      
      5. Removed the spaces and renamed the IPC_SET_PERM field names.  Replaced iuid and
         igid fields with ouid and ogid in the IPC record.
      
      I tested this with the lspp.22 kernel on an x86_64 box.  I believe it
      applies cleanly on the latest kernel.
      
      -- ljk
      Signed-off-by: NLinda Knippers <linda.knippers@hp.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ac03221a
  18. 01 5月, 2006 1 次提交
    • S
      [PATCH] Rework of IPC auditing · 073115d6
      Steve Grubb 提交于
      1) The audit_ipc_perms() function has been split into two different
      functions:
              - audit_ipc_obj()
              - audit_ipc_set_perm()
      
      There's a key shift here...  The audit_ipc_obj() collects the uid, gid,
      mode, and SElinux context label of the current ipc object.  This
      audit_ipc_obj() hook is now found in several places.  Most notably, it
      is hooked in ipcperms(), which is called in various places around the
      ipc code permforming a MAC check.  Additionally there are several places
      where *checkid() is used to validate that an operation is being
      performed on a valid object while not necessarily having a nearby
      ipcperms() call.  In these locations, audit_ipc_obj() is called to
      ensure that the information is captured by the audit system.
      
      The audit_set_new_perm() function is called any time the permissions on
      the ipc object changes.  In this case, the NEW permissions are recorded
      (and note that an audit_ipc_obj() call exists just a few lines before
      each instance).
      
      2) Support for an AUDIT_IPC_SET_PERM audit message type.  This allows
      for separate auxiliary audit records for normal operations on an IPC
      object and permissions changes.  Note that the same struct
      audit_aux_data_ipcctl is used and populated, however there are separate
      audit_log_format statements based on the type of the message.  Finally,
      the AUDIT_IPC block of code in audit_free_aux() was extended to handle
      aux messages of this new type.  No more mem leaks I hope ;-)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      073115d6
  19. 18 4月, 2006 1 次提交
    • H
      [PATCH] shmat: stop mprotect from giving write permission to a readonly attachment (CVE-2006-1524) · b78b6af6
      Hugh Dickins 提交于
      I found that all of 2.4 and 2.6 have been letting mprotect give write
      permission to a readonly attachment of shared memory, whether or not IPC
      would give the caller that permission.
      
      SUS says "The behaviour of this function [mprotect] is unspecified if the
      mapping was not established by a call to mmap", but I don't think we can
      interpret that as allowing it to subvert IPC permissions.
      
      I haven't tried 2.2, but the 2.2.26 source looks like it gets it right; and
      the patch below reproduces that behaviour - mprotect cannot be used to add
      write permission to a shared memory segment attached readonly.
      
      This patch is simple, and I'm sure it's what we should have done in 2.4.0:
      if you want to go on to switch write permission on and off with mprotect,
      just don't attach the segment readonly in the first place.
      
      However, we could have accumulated apps which attach readonly (even though
      they would be permitted to attach read/write), and which subsequently use
      mprotect to switch write permission on and off: it's not unreasonable.
      
      I was going to add a second ipcperms check in do_shmat, to check for
      writable when readonly, and if not writable find_vma and clear VM_MAYWRITE.
       But security_ipc_permission might do auditing, and it seems wrong to
      report an attempt for write permission when there has been none.  Or we
      could flag the vma as SHM, note the shmid or shp in vm_private_data, and
      then get mprotect to check.
      
      But the patch below is a lot simpler: I'd rather stick with it, if we can
      convince ourselves somehow that it'll be safe.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      b78b6af6
  20. 02 4月, 2006 1 次提交
  21. 27 3月, 2006 1 次提交
  22. 24 3月, 2006 1 次提交
    • H
      [PATCH] shmdt: check address alignment · df1e2fb5
      Hugh Dickins 提交于
      SUSv3 says the shmdt() function shall fail with EINVAL if the value of
      shmaddr is not the data segment start address of a shared memory segment:
      our sys_shmdt needs to reject a shmaddr which is not page-aligned.
      
      Does it have the potential to break existing apps?
      
      Hugh says
      
        "sys_shmdt() just does the wrong (unexpected) thing with a misaligned
        address: it'll fail on what you might expect it to succeed on, and only
        succeed on what it should definitely fail on.
      
        "That is, I think it behaves as if shmaddr gets rounded up, when the only
        understandable behaviour would be if it rounded it down.
      
        "Which does mean you'd have to be devious to see anything but EINVAL from
        a misaligned shmaddr there, so it's not terribly important."
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      df1e2fb5
  23. 21 3月, 2006 1 次提交
    • D
      [PATCH] Capture selinux subject/object context information. · 8c8570fb
      Dustin Kirkland 提交于
      This patch extends existing audit records with subject/object context
      information. Audit records associated with filesystem inodes, ipc, and
      tasks now contain SELinux label information in the field "subj" if the
      item is performing the action, or in "obj" if the item is the receiver
      of an action.
      
      These labels are collected via hooks in SELinux and appended to the
      appropriate record in the audit code.
      
      This additional information is required for Common Criteria Labeled
      Security Protection Profile (LSPP).
      
      [AV: fixed kmalloc flags use]
      [folded leak fixes]
      [folded cleanup from akpm (kfree(NULL)]
      [folded audit_inode_context() leak fix]
      [folded akpm's fix for audit_ipc_perm() definition in case of !CONFIG_AUDIT]
      Signed-off-by: NDustin Kirkland <dustin.kirkland@us.ibm.com>
      Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8c8570fb
  24. 11 2月, 2006 1 次提交
  25. 12 1月, 2006 1 次提交