1. 19 Nov, 2012 4 commits
    • pidns: Make the pidns proc mount/umount logic obvious. · 0a01f2cc
      Committed by Eric W. Biederman
      Track the number of pids in the proc hash table.  When the number of
      pids goes to 0 schedule work to unmount the kernel mount of proc.
      
      Move the mount of proc into alloc_pid when we allocate the pid for
      init.
      
      Remove the surprising calls of pid_ns_release_proc in fork and
      proc_flush_task.  Those code paths really shouldn't know about proc
      namespace implementation details and people have demonstrated several
      times that finding and understanding those code paths is difficult and
      non-obvious.
      
      Because of the call path detach_pid is always called with the
      rtnl_lock held, free_pid is not allowed to sleep, so the work of
      unmounting proc is moved to a work queue.  This has the side benefit
      of not blocking the entire world waiting for the unnecessary
      rcu_barrier in deactivate_locked_super.
      
      In the process of making the code clear and obvious this fixes a bug
      reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
      mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
      succeeded and copy_net_ns failed.
      Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
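The counting scheme described in this commit message can be sketched in userspace C. This is a toy model under stated assumptions, not the kernel code: `struct toy_pid_ns`, its fields, and the `toy_*` functions are hypothetical names, and a flag stands in for a queued work item.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pid namespace; not the kernel's struct. */
struct toy_pid_ns {
    unsigned int nr_hashed;   /* pids currently tracked in the hash table */
    int unmount_scheduled;    /* stands in for work queued on a work queue */
    int proc_mounted;         /* whether the internal proc mount exists */
};

/* Allocating the pid for init is where proc gets mounted. */
static void toy_alloc_pid(struct toy_pid_ns *ns, int is_init)
{
    if (is_init)
        ns->proc_mounted = 1;
    ns->nr_hashed++;
}

/*
 * Freeing a pid must not sleep, so it only records that the unmount
 * is needed once the last pid is gone.
 */
static void toy_free_pid(struct toy_pid_ns *ns)
{
    if (--ns->nr_hashed == 0)
        ns->unmount_scheduled = 1;
}

/* Runs later, in a context that is allowed to sleep. */
static void toy_run_deferred_work(struct toy_pid_ns *ns)
{
    if (ns->unmount_scheduled) {
        ns->proc_mounted = 0;
        ns->unmount_scheduled = 0;
    }
}
```

The point of the split is that the free path stays cheap and non-blocking, while the expensive unmount happens afterwards in the deferred step.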
    • pidns: Use task_active_pid_ns where appropriate · 17cf22c3
      Committed by Eric W. Biederman
      The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
      aka ns_of_pid(task_pid(tsk)) should have the same number of
      cache line misses with the practical difference that
      ns_of_pid(task_pid(tsk)) is released later in a process's life.
      
      Furthermore by using task_active_pid_ns it becomes trivial
      to write an unshare implementation for the pid namespace.
      
      So I have used task_active_pid_ns everywhere I can.
      
      In fork since the pid has not yet been attached to the
      process I use ns_of_pid, to achieve the same effect.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • procfs: Don't cache a pid in the root inode. · ae06c7c8
      Committed by Eric W. Biederman
      Now that we have s_fs_info pointing to our pid namespace
      the original reason for the proc root inode having a struct
      pid is gone.
      
      Caching a pid in the root inode has led to some complicated
      code.  Now that we don't need the struct pid, just remove it.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • procfs: Use the proc generic infrastructure for proc/self. · e656d8a6
      Committed by Eric W. Biederman
      I had visions at one point of splitting proc into two filesystems.  If
      that had happened proc/self being the part of proc that actually deals
      with pids would have been a nice cleanup.  As it is proc/self requires
      a lot of unnecessary infrastructure for a single file.
      
      The only user visible change is that a mounted /proc for a pid namespace
      that is dead now shows a broken proc symlink, instead of being completely
      invisible.  I don't think anyone will notice or care.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  2. 15 Nov, 2012 2 commits
    • userns: Support fuse interacting with multiple user namespaces · 499dcf20
      Committed by Eric W. Biederman
      Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data.
      
      The connection between a fuse filesystem and a fuse daemon is
      established when a fuse filesystem is mounted and provided with a file
      descriptor the fuse daemon created by opening /dev/fuse.
      
      For now restrict the communication of uids and gids between the fuse
      filesystem and the fuse daemon to the initial user namespace.  Enforce
      this by verifying the file descriptor passed to the mount of fuse was
      opened in the initial user namespace.  Ensuring the mount happens in
      the initial user namespace is not necessary as mounts from non-initial
      user namespaces are not yet allowed.
      
      In fuse_req_init_context convert the current fsuid and fsgid into the
      initial user namespace for the request that will be sent to the fuse
      daemon.
      
      In fuse_fill_attr convert the uid and gid passed from the fuse daemon
      from the initial user namespace into kuids and kgids.
      
      In iattr_to_fattr called from fuse_setattr convert kuids and kgids
      into the uids and gids in the initial user namespace before passing
      them to the fuse filesystem.
      
      In fuse_change_attributes_common called from fuse_dentry_revalidate,
      fuse_permission, fuse_getattr, fuse_setattr, and fuse_iget convert
      the uid and gid from the fuse daemon into a kuid and a kgid to store
      on the fuse inode.
      
      By default fuse mounts are restricted to tasks whose uid, suid, and
      euid match the fuse user_id and whose gid, sgid, and egid match
      the fuse group_id.  Convert the user_id and group_id mount options
      into kuids and kgids at mount time, and use uid_eq and gid_eq to
      compare them in fuse_allow_task.
      
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
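The typed-uid comparison described in the last paragraph of this message can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `toy_kuid_t`, `toy_uid_eq`, `struct toy_cred`, and `toy_allow_task` are hypothetical names modeled loosely on the description above, and only the uid half of the check is shown (the gid half would be analogous).

```c
#include <stdbool.h>

/*
 * Wrapping the raw number in a struct means the compiler rejects
 * accidental comparisons between a namespace-local uid and a
 * kernel-internal one.
 */
typedef struct { unsigned int val; } toy_kuid_t;

static inline bool toy_uid_eq(toy_kuid_t a, toy_kuid_t b)
{
    return a.val == b.val;
}

/* Hypothetical credential set holding the three uids the check inspects. */
struct toy_cred {
    toy_kuid_t uid;
    toy_kuid_t euid;
    toy_kuid_t suid;
};

/* Allow the task only if uid, euid, and suid all match the mount owner. */
static bool toy_allow_task(const struct toy_cred *cred, toy_kuid_t user_id)
{
    return toy_uid_eq(cred->uid, user_id) &&
           toy_uid_eq(cred->euid, user_id) &&
           toy_uid_eq(cred->suid, user_id);
}
```

Converting the mount option once at mount time, as the message describes, means every later check is a plain comparison of already-converted values.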
    • userns: Support autofs4 interacting with multiple user namespaces · 45634cd8
      Committed by Eric W. Biederman
      Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue.
      
      When creating directories and symlinks default the uid and gid of
      the mount requester to the global root uid and gid.  autofs4_wait
      will update these fields when a mount is requested.
      
      When generating autofsv5 packets report the uid and gid of the mount
      requester in the user namespace of the process that opened the pipe,
      reporting unmapped uids and gids as overflowuid and overflowgid.
      
      In autofs_dev_ioctl_requester return the uid and gid of the last mount
      requester converted into the calling process's user namespace.  When the
      uid or gid doesn't map return overflowuid and overflowgid as appropriate,
      allowing failure to find a mount requester to be distinguished from
      failure to map a mount requester.
      
      The uid and gid mount options specifying the user and group of the
      root autofs inode are converted into kuid and kgid as they are parsed
      defaulting to the current uid and current gid of the process that
      mounts autofs.
      
      Mounting of autofs for the present remains confined to processes in
      the initial user namespace.
      
      Cc: Ian Kent <raven@themaw.net>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  3. 29 Oct, 2012 1 commit
  4. 27 Oct, 2012 1 commit
    • VFS: don't do protected {sym,hard}links by default · 561ec64a
      Committed by Linus Torvalds
      In commit 800179c9 ("This adds symlink and hardlink restrictions to
      the Linux VFS"), the new link protections were enabled by default, in
      the hope that no actual application would care, despite it being
      technically against legacy UNIX (and documented POSIX) behavior.
      
      However, it does turn out to break some applications.  It's rare, and
      it's unfortunate, but it's unacceptable to break existing systems, so
      we'll have to default to legacy behavior.
      
      In particular, it has broken the way AFD distributes files, see
      
        http://www.dwd.de/AFD/
      
      along with some legacy scripts.
      
      Distributions can end up setting this at initrd time or in system
      scripts: if you have security problems due to link attacks during your
      early boot sequence, you have bigger problems than some kernel sysctl
      setting. Do:
      
      	echo 1 > /proc/sys/fs/protected_symlinks
      	echo 1 > /proc/sys/fs/protected_hardlinks
      
      to re-enable the link protections.
      
      Alternatively, we may at some point introduce a kernel config option
      that sets these kinds of "more secure but not traditional" behavioural
      options automatically.
      Reported-by: Nick Bowler <nbowler@elliptictech.com>
      Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # v3.6
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 26 Oct, 2012 13 commits
  6. 25 Oct, 2012 1 commit
  7. 24 Oct, 2012 6 commits
  8. 23 Oct, 2012 3 commits
  9. 22 Oct, 2012 2 commits
  10. 20 Oct, 2012 1 commit
  11. 19 Oct, 2012 1 commit
  12. 17 Oct, 2012 5 commits
    • jfs: Fix FITRIM argument handling · 4e7a4b01
      Committed by Lukas Czerner
      Currently when 'range->start' is beyond the end of the file system
      nothing is done and that fact is ignored, when in fact we should return
      EINVAL. The same problem occurs when 'range.len' is smaller than the
      file system block size.
      
      Fix this by adding a check for such conditions and returning EINVAL
      appropriately.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Acked-by: Tino Reichardt <milky-kernel@mcmilk.de>
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
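The two conditions named in this message can be sketched as a standalone check. This is a minimal illustration, not the JFS patch: `struct trim_range` is a hypothetical stand-in whose field names follow the message, and `fs_bytes` and `block_bytes` are assumed parameters rather than real JFS fields.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the trim request passed in by the caller. */
struct trim_range {
    uint64_t start;   /* byte offset where trimming should begin */
    uint64_t len;     /* number of bytes to consider for trimming */
};

/*
 * Returns 0 when the range is usable and -EINVAL otherwise.
 * fs_bytes is the file system size in bytes and block_bytes is the
 * file system block size; both are illustrative parameters.
 */
static int check_trim_range(const struct trim_range *range,
                            uint64_t fs_bytes, uint64_t block_bytes)
{
    /* A start offset beyond the end of the file system is an error. */
    if (range->start >= fs_bytes)
        return -EINVAL;

    /* A length smaller than one block cannot cover any whole block. */
    if (range->len < block_bytes)
        return -EINVAL;

    return 0;
}
```

Rejecting both cases up front lets the caller distinguish a bad argument from a request that simply found nothing to trim.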
    • NLM: nlm_lookup_file() may return NLMv4-specific error codes · cd0b16c1
      Committed by Trond Myklebust
      If the filehandle is stale, or open access is denied for some reason,
      nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
      or nlm4_failed. These get passed right through nlm_lookup_file(),
      and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
      the result through the cast_status() machinery.
      
      Failure to do so will trigger the BUG_ON() in encode_nlm_stat...
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Reported-by: Larry McVoy <lm@bitmover.com>
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • mm, mempolicy: fix printing stack contents in numa_maps · 32f8516a
      Committed by David Rientjes
      When reading /proc/pid/numa_maps, it's possible to return the contents of
      the stack where the mempolicy string should be printed if the policy gets
      freed from beneath us.
      
      This happens because mpol_to_str() may return an error, in which case the
      stack-allocated buffer is printed without anything ever being stored in it.
      
      There are two possible error conditions in mpol_to_str():
      
       - if the buffer allocated is insufficient for the string to be stored,
         and
      
       - if the mempolicy has an invalid mode.
      
      The first error condition is not triggered in any of the callers to
      mpol_to_str(): at least 50 bytes is always allocated on the stack and this
      is sufficient for the string to be written.  A future patch should convert
      this into BUILD_BUG_ON() since we know the maximum strlen possible, but
      that's not -rc material.
      
      The second error condition is possible if a race occurs in dropping a
      reference to a task's mempolicy causing it to be freed during the read().
      The slab poison value is then used for the mode and mpol_to_str() returns
      -EINVAL.
      
      This race is only possible because get_vma_policy() believes that
      mm->mmap_sem protects task->mempolicy, which isn't true.  The exit path
      does not hold mm->mmap_sem when dropping the reference or setting
      task->mempolicy to NULL: it uses task_lock(task) instead.
      
      Thus, it's required for the caller of a task mempolicy to hold
      task_lock(task) while grabbing the mempolicy and reading it.  Callers with
      a vma policy store their mempolicy earlier and can simply increment the
      reference count so it's guaranteed not to be freed.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fix a leak in replace_fd() users · 45525b26
      Committed by Al Viro
      replace_fd() began with "eats a reference, tries to insert into
      descriptor table" semantics; at some point I'd switched it to
      much saner current behaviour ("try to insert into descriptor
      table, grabbing a new reference if inserted; caller should do
      fput() in any case"), but forgot to update the callers.
      Mea culpa...
      
      [Spotted by Pavel Roskin, who has a really weird system with pipe-fed
      coredumps as part of what he considers a normal boot ;-)]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
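The "caller should do fput() in any case" convention described in this message can be modeled with a toy reference counter. This is a userspace sketch under stated assumptions: `struct toy_file`, the descriptor table, and every `toy_*` function are hypothetical and only imitate the ownership rule, not the kernel's implementation.

```c
#include <stddef.h>

/* Hypothetical file object with a bare reference count. */
struct toy_file {
    int refcount;
};

#define TOY_TABLE_SIZE 4
static struct toy_file *toy_fd_table[TOY_TABLE_SIZE];

static void toy_get(struct toy_file *f) { f->refcount++; }
static void toy_put(struct toy_file *f) { f->refcount--; }

/*
 * Try to insert into the descriptor table. On success the table takes
 * its OWN new reference; the caller's reference is never consumed.
 */
static int toy_replace_fd(unsigned int fd, struct toy_file *f)
{
    if (fd >= TOY_TABLE_SIZE)
        return -1;
    if (toy_fd_table[fd])
        toy_put(toy_fd_table[fd]);   /* drop the table's old entry */
    toy_get(f);
    toy_fd_table[fd] = f;
    return 0;
}

/* Caller pattern: release the caller's reference on success and failure alike. */
static int toy_install_and_release(unsigned int fd, struct toy_file *f)
{
    int err = toy_replace_fd(fd, f);

    toy_put(f);
    return err;
}
```

A caller written for the older "eats a reference" semantics would skip the final put on success, leaving the count one too high, which is the kind of leak the message describes.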
    • NFSv4: Fix the return value for nfs_callback_start_svc · e9b7e917
      Committed by Trond Myklebust
      Returning PTR_ERR(cb_info->task) just after we have set it to
      NULL looks like a typo...
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
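The ordering problem this message points at can be reproduced in a few lines of userspace C. The helpers below (`err_ptr`, `ptr_err`, `is_err`) are simplified imitations of the kernel's error-pointer idiom written for this sketch, and `struct toy_cb_info` plus both `start_*` functions are hypothetical; the actual patch may differ in detail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_cb_info {
    void *task;
};

/* Problematic order: the pointer is cleared before the error is read. */
static long start_buggy(struct toy_cb_info *info, void *result)
{
    info->task = result;
    if (is_err(info->task)) {
        info->task = NULL;
        return ptr_err(info->task);   /* reads NULL, so the error is lost */
    }
    return 0;
}

/* Safer order: save the error code first, then clear the pointer. */
static long start_fixed(struct toy_cb_info *info, void *result)
{
    info->task = result;
    if (is_err(info->task)) {
        long ret = ptr_err(info->task);

        info->task = NULL;
        return ret;
    }
    return 0;
}
```

In the problematic version the caller sees success even though startup failed, because decoding a NULL pointer yields zero.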