1. 19 10月, 2015 1 次提交
    • R
      ext2: Add locking for DAX faults · 5726b27b
      Ross Zwisler 提交于
      Add locking to ensure that DAX faults are isolated from ext2 operations
      that modify the data blocks allocation for an inode.  This is intended to
      be analogous to the work being done in XFS by Dave Chinner:
      
      http://www.spinics.net/lists/linux-fsdevel/msg90260.html
      
      Compared with XFS the ext2 case is greatly simplified by the fact that ext2
      already allocates and zeros new blocks before they are returned as part of
      ext2_get_block(), so DAX doesn't need to worry about getting unmapped or
      unwritten buffer heads.
      
      This means that the only work we need to do in ext2 is to isolate the DAX
      faults from inode block allocation changes.  I believe this just means that
      we need to isolate the DAX faults from truncate operations.
      
      The newly introduced dax_sem is intended to replicate the protection
      offered by i_mmaplock in XFS.  In addition to truncate the i_mmaplock also
      protects XFS operations like hole punching, fallocate down, extent
      manipulation IOCTLS like xfs_ioc_space() and extent swapping.  Truncate is
      the only one of these operations supported by ext2.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.com>
      5726b27b
  2. 18 6月, 2015 1 次提交
    • T
      vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB · 46b15caa
      Tejun Heo 提交于
      FS_CGROUP_WRITEBACK indicates whether a file_system_type supports
      cgroup writeback; however, different super_blocks of the same
      file_system_type may or may not support cgroup writeback depending on
      filesystem options.  This patch replaces FS_CGROUP_WRITEBACK with a
      per-super_block flag.
      
      super_block->s_flags carries some internal flags in the high bits but
      it's exposd to userland through uapi header and running out of space
      anyway.  This patch adds a new field super_block->s_iflags to carry
      kernel-internal flags.  It is currently only used by the new
      SB_I_CGROUPWB flag whose concatenated and abbreviated name is for
      consistency with other super_block flags.
      
      ext2_fill_super() is updated to set SB_I_CGROUPWB.
      
      v2: Added super_block->s_iflags instead of stealing another high bit
          from sb->s_flags as suggested by Christoph and Jan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-ext4@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      46b15caa
  3. 02 6月, 2015 1 次提交
    • T
      ext2: enable cgroup writeback support · 108dad65
      Tejun Heo 提交于
      Writeback now supports cgroup writeback and the generic writeback,
      buffer, libfs, and mpage helpers that ext2 uses are all updated to
      work with cgroup writeback.
      
      This patch enables cgroup writeback for ext2 by adding
      FS_CGROUP_WRITEBACK to its ->fs_flags.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-ext4@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@fb.com>
      108dad65
  4. 17 2月, 2015 4 次提交
  5. 10 11月, 2014 1 次提交
  6. 08 9月, 2014 1 次提交
    • T
      percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Tejun Heo 提交于
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      908c7f19
  7. 16 7月, 2014 1 次提交
  8. 13 3月, 2014 1 次提交
    • T
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o 提交于
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  9. 03 3月, 2014 1 次提交
  10. 04 12月, 2013 1 次提交
  11. 04 3月, 2013 1 次提交
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  12. 21 1月, 2013 1 次提交
  13. 10 10月, 2012 1 次提交
  14. 03 10月, 2012 1 次提交
  15. 31 7月, 2012 1 次提交
    • J
      ext2: Implement freezing · 1e8b212f
      Jan Kara 提交于
      The only missing piece to make freezing work reliably with ext2 is to
      stop iput() of unlinked inode from deleting the inode on frozen filesystem.
      So add a necessary protection to ext2_evict_inode().
      
      We also provide appropriate ->freeze_fs and ->unfreeze_fs functions.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1e8b212f
  16. 23 7月, 2012 1 次提交
    • J
      quota: Move quota syncing to ->sync_fs method · a1177825
      Jan Kara 提交于
      Since the moment writes to quota files are using block device page cache and
      space for quota structures is reserved at the moment they are first accessed we
      have no reason to sync quota before inode writeback. In fact this order is now
      only harmful since quota information can easily change during inode writeback
      (either because conversion of delayed-allocated extents or simply because of
      allocation of new blocks for simple filesystems not using page_mkwrite).
      
      So move syncing of quota information after writeback of inodes into ->sync_fs
      method. This way we do not have to use ->quota_sync callback which is primarily
      intended for use by quotactl syscall anyway and we get rid of calling
      ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
      proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
      also support legacy non-journalled quotas) and thus there are no dirty quota
      structures.
      
      CC: "Theodore Ts'o" <tytso@mit.edu>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: reiserfs-devel@vger.kernel.org
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Acked-by: NDave Kleikamp <shaggy@kernel.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a1177825
  17. 09 7月, 2012 1 次提交
  18. 16 5月, 2012 3 次提交
  19. 11 4月, 2012 3 次提交
    • A
      ext2: do not register write_super within VFS · f72cf5e2
      Artem Bityutskiy 提交于
      Jan Kara removed 'sb->s_dirt' VFS flag references, so we do not need to
      register the ext2 'ext2_write_super()' method in the VFS superblock operations,
      because 'sb->s_dirt' won't be ever set to 1 and VFS won't ever call
      '->write_super()' anyway. Thus, remove the method.
      
      Tested using xfstests.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      f72cf5e2
    • J
      ext2: Remove s_dirt handling · b838ec22
      Jan Kara 提交于
      Places which modify superblock feature / state fields mark the superblock
      buffer dirty so it is written out by flusher thread. Thus there's no need to
      set s_dirt there.
      
      The only other fields changing in the superblock are the numbers of free
      blocks, free inodes and s_wtime. There's no real need to write (or even
      compute) these periodically. Free blocks / inodes counters are recomputed on
      every mount from group counters anyway and value of s_wtime is only
      informational and imprecise anyway. So it should be enough to write these
      opportunistically on mount, remount, umount, and sync_fs times.
      Signed-off-by: NJan Kara <jack@suse.cz>
      b838ec22
    • A
      ext2: write superblock only once on unmount · f2b22420
      Artem Bityutskiy 提交于
      Currently on unmount if we are mounted R/W, we first write the superblock to
      the media if it is dirty, and then write it again, which is not optimal. This
      patch makes ext2 write the superblock on unmount less times.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      f2b22420
  20. 21 3月, 2012 2 次提交
  21. 09 1月, 2012 1 次提交
  22. 07 1月, 2012 1 次提交
  23. 04 1月, 2012 1 次提交
    • A
      vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05
      Al Viro 提交于
      Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
      it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
      the cost of taking it into inode_init_always() will be negligible for pipes
      and sockets and negative for everything else.  Not to mention the removal of
      boilerplate code from ->destroy_inode() instances...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6b520e05
  24. 30 8月, 2011 1 次提交
  25. 17 5月, 2011 1 次提交
  26. 31 3月, 2011 1 次提交
  27. 07 1月, 2011 1 次提交
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
  28. 06 1月, 2011 1 次提交
  29. 29 10月, 2010 1 次提交
  30. 26 10月, 2010 1 次提交
  31. 05 10月, 2010 2 次提交
    • J
      BKL: Remove BKL from ext2 filesystem · 3e44f9f1
      Jan Blunck 提交于
      The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs()
      ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(),
      ext2_fill_super() and ext2_remount() are protected against each other by
      the struct super_block s_umount rw semaphore. The call in ext2_write_inode()
      could only protect the modification of the ext2_sb_info through
      ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount().
      ext2_fill_super() and ext2_put_super() can be left out because you need a
      valid filesystem reference in all three cases, which you do not have when
      you are one of these functions.
      
      If the BKL is only protecting the modification of the ext2_sb_info it can
      safely be removed since this is protected by the struct ext2_sb_info s_lock.
      Signed-off-by: NJan Blunck <jblunck@infradead.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      3e44f9f1
    • J
      BKL: Explicitly add BKL around get_sb/fill_super · db719222
      Jan Blunck 提交于
      This patch is a preparation necessary to remove the BKL from do_new_mount().
      It explicitly adds calls to lock_kernel()/unlock_kernel() around
      get_sb/fill_super operations for filesystems that still uses the BKL.
      
      I've read through all the code formerly covered by the BKL inside
      do_kern_mount() and have satisfied myself that it doesn't need the BKL
      any more.
      
      do_kern_mount() is already called without the BKL when mounting the rootfs
      and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
      from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
      through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
      afs_mntpt_follow_link(). Both later functions are actually the filesystems
      follow_link inode operation. vfs_kern_mount() is calling the specified
      get_sb function and lets the filesystem do its job by calling the given
      fill_super function.
      
      Therefore I think it is safe to push down the BKL from the VFS to the
      low-level filesystems get_sb/fill_super operation.
      
      [arnd: do not add the BKL to those file systems that already
             don't use it elsewhere]
      Signed-off-by: NJan Blunck <jblunck@infradead.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      db719222