1. 05 8月, 2013 1 次提交
    • A
      reiserfs: fix deadlock in umount · 672fe15d
      Al Viro 提交于
      Since remove_proc_entry() started to wait for IO in progress (i.e.
      since 2007 or so), the locking in fs/reiserfs/proc.c became wrong;
      if procfs read happens between the moment when umount() locks the
      victim superblock and removal of /proc/fs/reiserfs/<device>/*,
      we'll get a deadlock - read will wait for s_umount (in sget(),
      called by r_start()), while umount will wait in remove_proc_entry()
      for that read to finish, holding s_umount all along.
      
      Fortunately, the same change allows a much simpler race avoidance -
      all we need to do is remove the procfs entries in the very beginning
      of reiserfs ->kill_sb(); that'll guarantee that pointer to superblock
      will remain valid for the duration for procfs IO, so we don't need
      sget() to keep the sucker alive.  As the matter of fact, we can
      get rid of the home-grown iterator completely, and use single_open()
      instead.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      672fe15d
  2. 29 6月, 2013 3 次提交
  3. 01 6月, 2013 3 次提交
    • J
      reiserfs: fix deadlock with nfs racing on create/lookup · a1457c0c
      Jeff Mahoney 提交于
      Reiserfs is currently able to be deadlocked by having two NFS clients
      where one has removed and recreated a file and another is accessing the
      file with an open file handle.
      
      If one client deletes and recreates a file with timing such that the
      recreated file obtains the same [dirid, objectid] pair as the original
      file while another client accesses the file via file handle, the create
      and lookup can race and deadlock if the lookup manages to create the
      in-memory inode first.
      
      The create thread, in insert_inode_locked4, will hold the write lock
      while waiting on the other inode to be unlocked. The lookup thread,
      anywhere in the iget path, will release and reacquire the write lock while
      it schedules. If it needs to reacquire the lock while the create thread
      has it, it will never be able to make forward progress because it needs
      to reacquire the lock before ultimately unlocking the inode.
      
      This patch drops the write lock across the insert_inode_locked4 call so
      that the ordering of inode_wait -> write lock is retained. Since this
      would have been the case before the BKL push-down, this is safe.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      a1457c0c
    • J
      reiserfs: fix problems with chowning setuid file w/ xattrs · 4a857011
      Jeff Mahoney 提交于
      reiserfs_chown_xattrs() takes the iattr struct passed into ->setattr
      and uses it to iterate over all the attrs associated with a file to change
      ownership of xattrs (and transfer quota associated with the xattr files).
      
      When the setuid bit is cleared during chown, ATTR_MODE and iattr->ia_mode
      are passed to all the xattrs as well. This means that the xattr directory
      will have S_IFREG added to its mode bits.
      
      This has been prevented in practice by a missing IS_PRIVATE check
      in reiserfs_acl_chmod, which caused a double-lock to occur while holding
      the write lock. Since the file system was completely locked up, the
      writeout of the corrupted mode never happened.
      
      This patch temporarily clears everything but ATTR_UID|ATTR_GID for the
      calls to reiserfs_setattr and adds the missing IS_PRIVATE check.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      4a857011
    • J
      reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry · 0bdc7acb
      Jeff Mahoney 提交于
      After sleeping for filldir(), we check to see if the file system has
      changed and research. The next_pos pointer is updated but its value
      isn't pushed into the key used for the search itself. As a result,
      the search returns the same item that the last cycle of the loop did
      and filldir() is called multiple times with the same data.
      
      The end result is that the buffer can contain the same name multiple
      times. This can be returned to userspace or used internally in the
      xattr code where it can manifest with the following warning:
      
      jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)
      
      reiserfs_for_each_xattr uses reiserfs_readdir_dentry to iterate over
      the xattr names and ends up trying to unlink the same name twice. The
      second attempt fails with -ENOENT and the error is returned. At some
      point I'll need to add support into reiserfsck to remove the orphaned
      directories left behind when this occurs.
      
      The fix is to push the value into the key before researching.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      0bdc7acb
  4. 22 5月, 2013 2 次提交
    • L
      reiserfs: use ->invalidatepage() length argument · bad54831
      Lukas Czerner 提交于
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in reiserfs_invalidatepage()
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: reiserfs-devel@vger.kernel.org
      bad54831
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  5. 08 5月, 2013 1 次提交
  6. 07 5月, 2013 1 次提交
  7. 02 5月, 2013 2 次提交
  8. 10 4月, 2013 2 次提交
  9. 30 3月, 2013 1 次提交
  10. 12 3月, 2013 1 次提交
  11. 04 3月, 2013 1 次提交
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  12. 26 2月, 2013 1 次提交
  13. 23 2月, 2013 1 次提交
  14. 21 12月, 2012 1 次提交
  15. 20 11月, 2012 4 次提交
  16. 10 10月, 2012 1 次提交
    • H
      tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking · 35c2a7f4
      Hugh Dickins 提交于
      Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
      	u64 inum = fid->raw[2];
      which is unhelpfully reported as at the end of shmem_alloc_inode():
      
      BUG: unable to handle kernel paging request at ffff880061cd3000
      IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
      Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Call Trace:
       [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
       [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
       [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
       [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
      
      Right, tmpfs is being stupid to access fid->raw[2] before validating that
      fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
      fall at the end of a page, and the next page not be present.
      
      But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
      careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
      could oops in the same way: add the missing fh_len checks to those.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      35c2a7f4
  17. 03 10月, 2012 1 次提交
  18. 21 9月, 2012 1 次提交
  19. 18 9月, 2012 1 次提交
    • E
      userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr · 5f3a4a28
      Eric W. Biederman 提交于
       - Pass the user namespace the uid and gid values in the xattr are stored
         in into posix_acl_from_xattr.
      
       - Pass the user namespace kuid and kgid values should be converted into
         when storing uid and gid values in an xattr in posix_acl_to_xattr.
      
      - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
        pass in &init_user_ns.
      
      In the short term this change is not strictly needed but it makes the
      code clearer.  In the longer term this change is necessary to be able to
      mount filesystems outside of the initial user namespace that natively
      store posix acls in the linux xattr format.
      
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      5f3a4a28
  20. 04 9月, 2012 1 次提交
  21. 15 8月, 2012 1 次提交
    • J
      reiserfs: fix deadlocks with quotas · 48d17884
      Jeff Mahoney 提交于
      The BKL push-down for reiserfs made lock recursion a special case that needs
      to be handled explicitly. One of the cases that was unhandled is dropping
      the quota during inode eviction. Both reiserfs_evict_inode and
      reiserfs_write_dquot take the write lock, but when the journal lock is
      taken it only drops one the references. The locking rules are that the journal
      lock be acquired before the write lock so leaving the reference open leads
      to a ABBA deadlock.
      
      This patch pushes the unlock up before clear_inode and avoids the recursive
      locking.
      
      Another ABBA situation can occur when the write lock is dropped while reading
      the bitmap buffer while in the quota code. When the lock is reacquired, it
      will deadlock against dquot->dq_lock and dqopt->dqio_mutex in the dquot_acquire
      path. It's safe to retain the lock across the read and should be cached under
      write load.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      48d17884
  22. 23 7月, 2012 2 次提交
    • A
      don't expose I_NEW inodes via dentry->d_inode · 8fc37ec5
      Al Viro 提交于
      	d_instantiate(dentry, inode);
      	unlock_new_inode(inode);
      
      is a bad idea; do it the other way round...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8fc37ec5
    • J
      quota: Move quota syncing to ->sync_fs method · a1177825
      Jan Kara 提交于
      Since the moment writes to quota files are using block device page cache and
      space for quota structures is reserved at the moment they are first accessed we
      have no reason to sync quota before inode writeback. In fact this order is now
      only harmful since quota information can easily change during inode writeback
      (either because conversion of delayed-allocated extents or simply because of
      allocation of new blocks for simple filesystems not using page_mkwrite).
      
      So move syncing of quota information after writeback of inodes into ->sync_fs
      method. This way we do not have to use ->quota_sync callback which is primarily
      intended for use by quotactl syscall anyway and we get rid of calling
      ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
      proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
      also support legacy non-journalled quotas) and thus there are no dirty quota
      structures.
      
      CC: "Theodore Ts'o" <tytso@mit.edu>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: reiserfs-devel@vger.kernel.org
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Acked-by: NDave Kleikamp <shaggy@kernel.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a1177825
  23. 14 7月, 2012 4 次提交
  24. 01 6月, 2012 3 次提交
    • A
      reiserfs: get rid of resierfs_sync_super · 033369d1
      Artem Bityutskiy 提交于
      This patch stops reiserfs using the VFS 'write_super()' method along with the
      s_dirt flag, because they are on their way out.
      
      The whole "superblock write-out" VFS infrastructure is served by the
      'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
      writes out all dirty superblock using the '->write_super()' call-back.  But the
      problem with this thread is that it wastes power by waking up the system every
      5 seconds, even if there are no diry superblocks, or there are no client
      file-systems which would need this (e.g., btrfs does not use
      '->write_super()'). So we want to kill it completely and thus, we need to make
      file-systems to stop using the '->write_super()' VFS service, and then remove
      it together with the kernel thread.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      033369d1
    • A
      reiserfs: mark the superblock as dirty a bit later · 5c5fd819
      Artem Bityutskiy 提交于
      The 'journal_mark_dirty()' function currently first marks the superblock as
      dirty by setting 's_dirt' to 1, then does various sanity checks and returns,
      then actuall does all the magic with the journal.
      
      This is not an ideal order, though. It makes more sense to first do all the
      checks, then do all the internal stuff, and at the end notify the VFS that the
      superblock is now dirty.
      
      This patch moves the 's_dirt = 1' assignment from the very beginning of this
      function to the very end.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5c5fd819
    • A
      reiserfs: remove useless superblock dirtying · 717f03c4
      Artem Bityutskiy 提交于
      The 'reiserfs_resize()' function marks the superblock as dirty by assigning 1
      to 's_dirt' and then calls 'journal_mark_dirty()' which does the same. Thus,
      we can remove the assignment from 'reiserfs_resize()'.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      717f03c4