1. 20 10月, 2012 1 次提交
  2. 17 10月, 2012 1 次提交
    • D
      mm, mempolicy: fix printing stack contents in numa_maps · 32f8516a
      David Rientjes 提交于
      When reading /proc/pid/numa_maps, it's possible to return the contents of
      the stack where the mempolicy string should be printed if the policy gets
      freed from beneath us.
      
      This happens because mpol_to_str() may return an error the
      stack-allocated buffer is then printed without ever being stored.
      
      There are two possible error conditions in mpol_to_str():
      
       - if the buffer allocated is insufficient for the string to be stored,
         and
      
       - if the mempolicy has an invalid mode.
      
      The first error condition is not triggered in any of the callers to
      mpol_to_str(): at least 50 bytes is always allocated on the stack and this
      is sufficient for the string to be written.  A future patch should convert
      this into BUILD_BUG_ON() since we know the maximum strlen possible, but
      that's not -rc material.
      
      The second error condition is possible if a race occurs in dropping a
      reference to a task's mempolicy causing it to be freed during the read().
      The slab poison value is then used for the mode and mpol_to_str() returns
      -EINVAL.
      
      This race is only possible because get_vma_policy() believes that
      mm->mmap_sem protects task->mempolicy, which isn't true.  The exit path
      does not hold mm->mmap_sem when dropping the reference or setting
      task->mempolicy to NULL: it uses task_lock(task) instead.
      
      Thus, it's required for the caller of a task mempolicy to hold
      task_lock(task) while grabbing the mempolicy and reading it.  Callers with
      a vma policy store their mempolicy earlier and can simply increment the
      reference count so it's guaranteed not to be freed.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32f8516a
  3. 13 10月, 2012 1 次提交
  4. 09 10月, 2012 5 次提交
    • N
      kpageflags: fix wrong KPF_THP on non-huge compound pages · 7a71932d
      Naoya Horiguchi 提交于
      KPF_THP can be set on non-huge compound pages (like slab pages or pages
      allocated by drivers with __GFP_COMP) because PageTransCompound only
      checks PG_head and PG_tail.  Obviously this is a bug and breaks user space
      applications which look for thp via /proc/kpageflags.
      
      This patch rules out setting KPF_THP wrongly by additionally checking
      PageLRU on the head pages.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a71932d
    • M
      rbtree: fix incorrect rbtree node insertion in fs/proc/proc_sysctl.c · ea5272f5
      Michel Lespinasse 提交于
      The recently added code to use rbtrees in sysctl did not follow the proper
      rbtree interface on insertion - it was calling rb_link_node() which
      inserts a new node into the binary tree, but missed the call to
      rb_insert_color() which properly balances the rbtree and establishes all
      expected rbtree invariants.
      
      I found out about this only because faulty commit also used
      rb_init_node(), which I am removing within this patchset.  But I think
      it's an easy mistake to make, and it makes me wonder if we should change
      the rbtree API so that insertions would be done with a single rb_insert()
      call (even if its implementation could still inline the rb_link_node()
      part and call a private __rb_insert_color function to do the rebalancing).
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea5272f5
    • M
      rbtree: empty nodes have no color · 4c199a93
      Michel Lespinasse 提交于
      Empty nodes have no color.  We can make use of this property to simplify
      the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
      we can get rid of the rb_init_node function which had been introduced by
      commit 88d19cf3 ("timers: Add rb_init_node() to allow for stack
      allocated rb nodes") to avoid some issue with the empty node's color not
      being initialized.
      
      I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
      doing there, though.  axboe introduced them in commit 10fd48f2
      ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
      see it, the 'empty node' abstraction is only used by rbtree users to
      flag nodes that they haven't inserted in any rbtree, so asking the
      predecessor or successor of such nodes doesn't make any sense.
      
      One final rb_init_node() caller was recently added in sysctl code to
      implement faster sysctl name lookups.  This code doesn't make use of
      RB_EMPTY_NODE at all, and from what I could see it only called
      rb_init_node() under the mistaken assumption that such initialization was
      required before node insertion.
      
      [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c199a93
    • D
      oom: remove deprecated oom_adj · 01dc52eb
      Davidlohr Bueso 提交于
      The deprecated /proc/<pid>/oom_adj is scheduled for removal this month.
      Signed-off-by: NDavidlohr Bueso <dave@gnu.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01dc52eb
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov 提交于
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
  5. 06 10月, 2012 6 次提交
  6. 27 9月, 2012 5 次提交
  7. 18 9月, 2012 3 次提交
    • E
      userns: Add kprojid_t and associated infrastructure in projid.h · f76d207a
      Eric W. Biederman 提交于
      Implement kprojid_t a cousin of the kuid_t and kgid_t.
      
      The per user namespace mapping of project id values can be set with
      /proc/<pid>/projid_map.
      
      A full compliment of helpers is provided: make_kprojid, from_kprojid,
      from_kprojid_munged, kporjid_has_mapping, projid_valid, projid_eq,
      projid_eq, projid_lt.
      
      Project identifiers are part of the generic disk quota interface,
      although it appears only xfs implements project identifiers currently.
      
      The xfs code allows anyone who has permission to set the project
      identifier on a file to use any project identifier so when
      setting up the user namespace project identifier mappings I do
      not require a capability.
      
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      f76d207a
    • E
      userns: Convert the audit loginuid to be a kuid · e1760bd5
      Eric W. Biederman 提交于
      Always store audit loginuids in type kuid_t.
      
      Print loginuids by converting them into uids in the appropriate user
      namespace, and then printing the resulting uid.
      
      Modify audit_get_loginuid to return a kuid_t.
      
      Modify audit_set_loginuid to take a kuid_t.
      
      Modify /proc/<pid>/loginuid on read to convert the loginuid into the
      user namespace of the opener of the file.
      
      Modify /proc/<pid>/loginud on write to convert the loginuid
      rom the user namespace of the opener of the file.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com> ?
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      e1760bd5
    • F
      fs/proc: fix potential unregister_sysctl_table hang · 6bf61045
      Francesco Ruggeri 提交于
      The unregister_sysctl_table() function hangs if all references to its
      ctl_table_header structure are not dropped.
      
      This can happen sometimes because of a leak in proc_sys_lookup():
      proc_sys_lookup() gets a reference to the table via lookup_entry(), but
      it does not release it when a subsequent call to sysctl_follow_link()
      fails.
      
      This patch fixes this leak by making sure the reference is always
      dropped on return.
      
      See also commit 076c3eed ("sysctl: Rewrite proc_sys_lookup
      introducing find_entry and lookup_entry") which reorganized this code in
      3.4.
      
      Tested in Linux 3.4.4.
      Signed-off-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6bf61045
  8. 31 7月, 2012 2 次提交
    • D
      proc: do not allow negative offsets on /proc/<pid>/environ · bc452b4b
      Djalal Harouni 提交于
      __mem_open() which is called by both /proc/<pid>/environ and
      /proc/<pid>/mem ->open() handlers will allow the use of negative offsets.
      /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ.
      
      Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open()
      to allow negative offsets only on /proc/<pid>/mem.
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc452b4b
    • D
      proc: environ_read() make sure offset points to environment address range · e8905ec2
      Djalal Harouni 提交于
      Currently the following offset and environment address range check in
      environ_read() of /proc/<pid>/environ is buggy:
      
        int this_len = mm->env_end - (mm->env_start + src);
        if (this_len <= 0)
          break;
      
      Large or negative offsets on /proc/<pid>/environ converted to 'unsigned
      long' may pass this check since '(mm->env_start + src)' can overflow and
      'this_len' will be positive.
      
      This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since
      (mm->env_start + src) will point and read from another VMA.
      
      There are two fixes here plus some code cleaning:
      
      1) Fix the overflow by checking if the offset that was converted to
         unsigned long will always point to the [mm->env_start, mm->env_end]
         address range.
      
      2) Remove the truncation that was made to the result of the check,
         storing the result in 'int this_len' will alter its value and we can
         not depend on it.
      
      For kernels that have commit b409e578 ("proc: clean up
      /proc/<pid>/environ handling") which adds the appropriate ptrace check and
      saves the 'mm' at ->open() time, this is not a security issue.
      
      This patch is taken from the grsecurity patch since it was just made
      available.
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8905ec2
  9. 14 7月, 2012 5 次提交
  10. 11 7月, 2012 1 次提交
  11. 05 6月, 2012 1 次提交
    • L
      vfs: Fix /proc/<tid>/fdinfo/<fd> file handling · 0640113b
      Linus Torvalds 提交于
      Cyrill Gorcunov reports that I broke the fdinfo files with commit
      30a08bf2 ("proc: move fd symlink i_mode calculations into
      tid_fd_revalidate()"), and he's quite right.
      
      The tid_fd_revalidate() function is not just used for the <tid>/fd
      symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the
      permission model for those are different.
      
      So do the dynamic symlink permission handling just for symlinks, making
      the fdinfo files once more appear as the proper regular files they are.
      
      Of course, Al Viro argued (probably correctly) that we shouldn't do the
      symlink permission games at all, and make the symlinks always just be
      the normal 'lrwxrwxrwx'.  That would have avoided this issue too, but
      since somebody noticed that the permissions had changed (which was the
      reason for that original commit 30a08bf2 in the first place), people
      do apparently use this feature.
      
      [ Basically, you can use the symlink permission data as a cheap "fdinfo"
        replacement, since you see whether the file is open for reading and/or
        writing by just looking at st_mode of the symlink.  So the feature
        does make sense, even if the pain it has caused means we probably
        shouldn't have done it to begin with. ]
      Reported-and-tested-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0640113b
  12. 01 6月, 2012 9 次提交