1. 27 Aug, 2013 3 commits
    • E
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman committed
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current as of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • E
      vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces · 4ce5d2b1
      Eric W. Biederman committed
      Don't copy bind mounts of /proc/<pid>/ns/mnt between namespaces.
      These files hold references to a mount namespace and copying them
      between namespaces could result in a reference counting loop.
      
      The current mnt_ns_loop test prevents loops on the assumption that
      mounts don't cross between namespaces.  Unfortunately unsharing a
      mount namespace and shared subtrees can both cause mounts to
      propagate between mount namespaces.
      
      Two flags, CL_COPY_UNBINDABLE and CL_COPY_MNT_NS_FILE, are added to
      control this behavior, and CL_COPY_ALL is redefined as both of them.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • E
      proc: Restrict mounting the proc filesystem · aee1c13d
      Eric W. Biederman committed
      Don't allow mounting the proc filesystem unless the caller has
      CAP_SYS_ADMIN rights over the pid namespace.  The principle here is if
      you create or have capabilities over it you can mount it, otherwise
      you get to live with what other people have mounted.
      
      Andy pointed out that this is needed to prevent users in a user
      namespace from remounting proc and specifying different hidepid and gid
      options on already existing proc mounts.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  2. 25 Jul, 2013 1 commit
    • E
      vfs: Lock in place mounts from more privileged users · 5ff9d8a6
      Eric W. Biederman committed
      When creating a less privileged mount namespace or propagating mounts
      from a more privileged to a less privileged mount namespace, lock the
      submounts so they may not be unmounted individually in the child mount
      namespace revealing what is under them.
      
      This enforces the reasonable expectation that it is not possible to
      see under a mount point.  Most of the time mounts are on empty
      directories and revealing that does not matter; however, I have seen the
      occasional sloppy configuration where there were interesting things
      concealed under a mount point that probably should not be revealed.
      
      Expirable submounts are not locked because they will eventually
      unmount automatically so whatever is under them already needs
      to be safe for unprivileged users to access.
      
      From a practical standpoint these restrictions do not appear to be
      significant for unprivileged users of the mount namespace.  Recursive
      bind mounts and pivot_root continue to work, and mounts that are
      created in a mount namespace may be unmounted there.  All of which
      means that the common idiom of keeping a directory of interesting
      files and using pivot_root to throw everything else away continues to
      work just fine.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  3. 14 Jul, 2013 3 commits
  4. 13 Jul, 2013 3 commits
  5. 12 Jul, 2013 3 commits
  6. 11 Jul, 2013 12 commits
  7. 10 Jul, 2013 15 commits
    • C
      xfs: fix sgid inheritance for subdirectories inheriting default acls [V3] · 42c49d7f
      Carlos Maiolino committed
      XFS removes sgid bits of subdirectories under a directory containing a default
      acl.
      
      When a default acl is set, XFS ends up calling xfs_setattr_nonsize() in its
      code path. This function is shared by the mkdir and chmod system calls, and
      does some checks that mkdir does not need (calling inode_change_ok()). These
      checks remove the sgid bit from the inode after it has been granted.
      
      With this patch, we extend the meaning of XFS_ATTR_NOACL flag to avoid these
      checks when acls are being inherited (thanks hch).
      
      Also, xfs_setattr_mode() doesn't need to re-check group id and capability
      permissions; doing so only results in another attempt to remove the sgid bit
      from the directories. That check is already done in either inode_change_ok()
      or xfs_setattr_nonsize().
      
      Changelog:
      
      V2: Extends the meaning of XFS_ATTR_NOACL instead of wrapping the tests into
          another function
      
      V3: Remove S_ISDIR check in xfs_setattr_nonsize() from the patch
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • M
      Use ecryptfs_dentry_to_lower_path in a couple of places · cc18ec3c
      Matthew Wilcox committed
      There are two places in ecryptfs that benefit from using
      ecryptfs_dentry_to_lower_path() instead of separate calls to
      ecryptfs_dentry_to_lower() and ecryptfs_dentry_to_lower_mnt().  Both
      sites use fewer instructions and less stack (determined by examining
      objdump output).
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
    • S
      NFS: Allow nfs_updatepage to extend a write under additional circumstances · c7559663
      Scott Mayhew committed
      Currently nfs_updatepage allows a write to be extended to cover a full
      page only if we don't have a byte range lock on the file... but if
      we have a write delegation on the file or if we have the whole file
      locked for writing then we should be allowed to extend the write as
      well.
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      [Trond: fix up call to nfs_have_delegation()]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • D
      xfs: dquot log reservations are too small · b0a9dab7
      Dave Chinner committed
      During review of the separate project quota inode patches, it became
      obvious that the dquot log reservation calculation underestimated
      the number of dquots that can be modified in a transaction. This has
      its roots way back in the Irix quota implementation.
      
      That is, when quotas were first implemented in XFS, it only
      supported user and project quotas as Irix did not have group quotas.
      Hence the worst case operation involving dquot modification was
      calculated to involve 2 user dquots and 1 project dquot or 1 user
      dquot and 2 project dquots. i.e. 3 dquots. This was determined back
      in 1996, and has remained unchanged ever since.
      
      However, back in 2001, the Linux XFS port dropped all support for
      project quota and implemented group quotas over the top. This was
      effectively done with a search-and-replace of project with group,
      and as such the log reservation was not changed. However, with the
      advent of group quotas, chmod and rename now could modify more than
      3 dquots in a single transaction - both could modify 4 dquots. Hence
      this log reservation has been wrong for a long time.
      
      In 2005, project quota support was reintroduced into Linux, but it
      was implemented to be mutually exclusive to group quotas and so this
      didn't add any new changes to the dquot log reservation. Hence when
      project quotas were in use (rather than group quotas) the log
      reservation was again valid, just like in the Irix days.
      
      Now, with the addition of the separate project quota inode, group
      and project quotas are no longer mutually exclusive, and hence
      operations can now modify three dquots per inode where previously it
      was only two. The worst case here is the rename transaction, which
      can allocate/free space on two different directory inodes, and if
      they have different uid/gid/prid configurations and are world
      writeable, then rename can actually modify 6 different dquots now.
      
      Further, the dquot log reservation doesn't take into account the
      space used by the dquot log format structure that precedes the dquot
      that is logged, and hence further underestimates the worst case
      log space required by dquots during a transaction. This has been
      missing since the first commit in 1996.
      
      Hence the worst case log reservation needs to be increased from 3 to
      6, and it needs to take into account a log format header for each of
      those dquots.
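      
      The arithmetic above can be sketched roughly as follows. This is an
      illustration only, not the XFS code: the helper names are hypothetical,
      and the byte sizes are left as parameters because the commit message
      does not state them.

```c
#include <stddef.h>

/* Quota types tracked per inode once group and project quotas can
 * coexist: user, group and project. */
enum { QUOTA_TYPES_PER_INODE = 3 };

/* Worst-case number of dquots touched by a transaction that changes
 * space on `inodes` inodes whose uid/gid/prid are all different. */
static unsigned int worst_case_dquots(unsigned int inodes)
{
	return inodes * QUOTA_TYPES_PER_INODE;
}

/* Hypothetical helper: bytes of log space needed for `ndquots` dquots,
 * each preceded by its own log format header. Both sizes are supplied
 * by the caller; they are assumptions, not values from the source. */
static size_t dquot_log_res(unsigned int ndquots, size_t dquot_size,
			    size_t format_hdr_size)
{
	return (size_t)ndquots * (dquot_size + format_hdr_size);
}
```

      For the rename case described above, worst_case_dquots(2) gives 6,
      and the reservation grows with both the dquot size and the header size.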
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • D
      xfs: remove local fork format handling from xfs_bmapi_write() · f3508bcd
      Dave Chinner committed
      The conversion from local format to extent format requires
      interpretation of the data in the fork being converted, so it cannot
      be done in a generic way. It is up to the caller to convert the fork
      format to extent format before calling into xfs_bmapi_write() so
      format conversion can be done correctly.
      
      The code in xfs_bmapi_write() to convert the format is used
      implicitly by the attribute and directory code, but they
      specifically zero the fork size so that the conversion does not do
      any allocation or manipulation. Move this conversion into the
      shortform to leaf functions for the dir/attr code so the conversions
      are explicitly controlled by all callers.
      
      Now we can remove the conversion code in xfs_bmapi_write.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • S
      NFS: Make nfs_readdir revalidate less often · 07b5ce8e
      Scott Mayhew committed
      Make nfs_readdir revalidate only when we're at the beginning of the directory or
      if the cached attributes have expired.
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • S
      NFS: Make nfs_attribute_cache_expired() non-static · 43f291cd
      Scott Mayhew committed
      NFS: Make nfs_attribute_cache_expired() non-static so we can call it from
      nfs_readdir().
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • J
      nfs: set verifier on existing dentries in nfs_prime_dcache · cda57a1e
      Jeff Layton committed
      nfs_prime_dcache currently only sets the verifier when it doesn't
      initially find a matching dentry in the dcache. Set the verifier in the case
      where we do find a dentry in the dcache. This ensures that we don't
      have to look up the dentry again if we want to use it after a readdir.
      
      Cc: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • Y
      xfs: use get_unused_fd_flags(0) instead of get_unused_fd() · 862a6293
      Yann Droneaud committed
      Macro get_unused_fd() is used to allocate a file descriptor with
      default flags. Those default flags (0) can be "unsafe":
      O_CLOEXEC must be used by default to avoid leaking file descriptors
      across exec().
      
      Instead of macro get_unused_fd(), functions anon_inode_getfd()
      or get_unused_fd_flags() should be used with flags given by userspace.
      If not possible, flags should be set to O_CLOEXEC to provide userspace
      with a safe default behavior.
      
      In a further patch, get_unused_fd() will be removed so that
      new code starts using anon_inode_getfd() or get_unused_fd_flags()
      with correct flags.
      
      This patch replaces calls to get_unused_fd() with an equivalent call to
      get_unused_fd_flags(0) to preserve current behavior for existing code.
      
      The hard coded flag value (0) should be reviewed on a per-subsystem basis,
      and, if possible, set to O_CLOEXEC.
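      
      The effect of the flag can be observed from user space. The sketch
      below is a userspace analogue, not the kernel change itself: the helper
      name opens_with_cloexec() is hypothetical, and it simply checks whether
      a descriptor opened with or without O_CLOEXEC carries FD_CLOEXEC.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* make sure O_CLOEXEC is visible */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-only, optionally with O_CLOEXEC, and report whether
 * the resulting descriptor has the close-on-exec flag set.
 * Returns 1 if set, 0 if clear, -1 on error. */
static int opens_with_cloexec(const char *path, int use_cloexec)
{
	int flags = O_RDONLY | (use_cloexec ? O_CLOEXEC : 0);
	int fd = open(path, flags);
	int fdflags;

	if (fd < 0)
		return -1;

	fdflags = fcntl(fd, F_GETFD);
	close(fd);
	if (fdflags < 0)
		return -1;

	return (fdflags & FD_CLOEXEC) ? 1 : 0;
}
```

      A descriptor opened with flags of 0 stays open across exec(), which is
      the "unsafe" default the message refers to.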
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • J
      xfs: clean up unused codes at xfs_bulkstat() · 9cee4c5b
      Jie Liu committed
      There is some unused code in xfs_bulkstat():
      
      - Variable bp is defined to point to the on-disk inode cluster
        buffer, but it proved to be of no practical help.
      
      - We process the chunks of good inodes which were fetched by iterating
        btree records from an AG.  When processing inodes from each chunk,
        the code recomputes agbno when it reaches the first inode of a cluster;
        however, agbno is not used thereafter.
      
      This patch tries to clean up those things.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • E
      net/fs: change busy poll time accounting · 76b1e9b9
      Eliezer Tamir committed
      Suggested by Linus:
      Changed time accounting for busy-poll:
      - Make it microsecond based.
      - Use unsigned longs.
      - Revert to using time_after instead of time_in_range.
      Reorder poll/select busy loop conditions:
      - Clear busy_flag the first time we can't busy-poll.
      - Only init busy_end if we actually are going to busy-poll.
      Added one more missing need_resched() test.
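      
      The time_after style of comparison on unsigned long timestamps can be
      sketched in user space as below. This is a minimal re-statement of the
      idiom, assuming microsecond timestamps; ts_after() and
      busy_loop_timed_out() are hypothetical names, not the kernel functions.

```c
#include <limits.h>

/* Userspace re-statement of the time_after() idiom: true when timestamp
 * `a` is later than `b`, even if the counter has wrapped around. Relies
 * on the usual two's-complement conversion from unsigned long to long. */
static int ts_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical helper: has a busy-poll loop that must stop at `end_us`
 * run out of time at `now_us`? Both values are in microseconds. */
static int busy_loop_timed_out(unsigned long now_us, unsigned long end_us)
{
	return ts_after(now_us, end_us);
}
```

      Because the subtraction is done in unsigned arithmetic before the sign
      test, an end time that has wrapped past ULONG_MAX still compares as
      being in the future.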
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      xfs: use XFS_BMAP_BMDR_SPACE vs. XFS_BROOT_SIZE_ADJ · a69c7c07
      Eric Sandeen committed
      XFS_BROOT_SIZE_ADJ is an undocumented macro which accounts for
      the difference in size between the on-disk and in-core btree
      root.  It's much clearer to just use the newly-added
      XFS_BMAP_BMDR_SPACE macro which gives us the on-disk size
      directly.
      
      In one case, we must test that the if_broot exists before
      applying the macro, however.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • M
      fatfs: add FAT_IOCTL_GET_VOLUME_ID · 6e5b93ee
      Mike Lockwood committed
      This patch, originally from the Android kernel, adds the vfat ioctl command
      FAT_IOCTL_GET_VOLUME_ID. With this command we can get the vfat volume ID
      using the following code:
      
      	ioctl(fd, FAT_IOCTL_GET_VOLUME_ID, &volume_ID)
      
      This patch is a modified version of the patch by Mike Lockwood, with
      changes from Dmitry Pervushin, who noticed that the original patch makes
      some volume IDs ambiguous with error returns: for example, if the volume
      ID is 0xFFFFFDAD, which matches -ENOIOCTLCMD, user space gets "FFFFFFFF".
      
      So add a parameter to ioctl to get the correct volume ID.
      
      Android uses the vfat volume ID to tell SD cards apart: when a new SD
      card is inserted into the device, Android can scan the media on it and
      show the new contents.
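      
      A fuller version of the call above might look like the following
      sketch. It assumes the ioctl is exported to user space through
      <linux/msdos_fs.h>; the wrapper name fat_volume_id() is hypothetical.
      The ID comes back through the pointer argument, so every 32-bit value
      stays distinguishable from an error return.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/msdos_fs.h>	/* FAT_IOCTL_GET_VOLUME_ID (assumed location) */

/* Read the volume ID of the FAT filesystem that contains `path`.
 * Returns 0 and stores the ID in *id on success, or -1 on failure,
 * for example when `path` cannot be opened or is not on a FAT volume. */
static int fat_volume_id(const char *path, uint32_t *id)
{
	__u32 vol_id = 0;
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;

	ret = ioctl(fd, FAT_IOCTL_GET_VOLUME_ID, &vol_id);
	close(fd);
	if (ret < 0)
		return -1;

	*id = vol_id;
	return 0;
}
```

      A caller would pass a file or directory on the mounted card and compare
      the returned ID against the ones it has already seen.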
      Signed-off-by: Bintian Wang <bintian.wang@linaro.org>
      Cc: dmitry pervushin <dpervushin@gmail.com>
      Cc: Mike Lockwood <lockwood@android.com>
      Cc: Colin Cross <ccross@android.com>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Sean McNeil <sean@mcneil.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • W
      ncpfs: fix error return code in ncp_parse_options() · 2417898b
      Wei Yongjun committed
      Fix to return -EINVAL from the option parse error handling case instead
      of 0, as done elsewhere in this function.
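      
      The class of bug being fixed can be illustrated with a small option
      parser. This is a hypothetical sketch, not the ncpfs code: the function
      parse_uid_option() and the "uid=" option are made up for illustration.
      The point is that every jump to the shared error label must set a
      negative error code first.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option parser: accepts "uid=<decimal>" and returns 0 on
 * success or a negative errno value on failure. */
static int parse_uid_option(const char *opt, unsigned long *uid)
{
	char *end;
	int ret = 0;

	if (strncmp(opt, "uid=", 4) != 0) {
		ret = -EINVAL;	/* unknown option */
		goto err;
	}

	errno = 0;
	*uid = strtoul(opt + 4, &end, 10);
	if (errno || end == opt + 4 || *end != '\0') {
		ret = -EINVAL;	/* leaving ret at 0 here would report success */
		goto err;
	}
	return 0;

err:
	/* shared cleanup would go here */
	return ret;
}
```

      Forgetting the assignment before the goto makes the caller see 0 and
      carry on with a half-parsed option set.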
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • W
      mm/writeback: don't check force_wait to handle bdi->work_list · 25d130ba
      Wanpeng Li committed
      After commit 839a8e86 ("writeback: replace custom worker pool
      implementation with unbound workqueue"), bdi_writeback_workfn runs off
      bdi_writeback->dwork. On each execution, it processes bdi->work_list and
      reschedules if there are more things to do, instead of flushing any work
      that raced with us exiting.  It is unnecessary to check force_wait in
      wb_do_writeback since it is always 0 after the mentioned commit.  This
      patch removes force_wait from wb_do_writeback.
      Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>