1. 18 3月, 2016 1 次提交
    • J
      fs crypto: move per-file encryption from f2fs tree to fs/crypto · 0b81d077
      Jaegeuk Kim 提交于
      This patch adds the renamed functions moved from the f2fs crypto files.
      
      1. definitions for per-file encryption used by ext4 and f2fs.
      
      2. crypto.c for encrypt/decrypt functions
       a. IO preparation:
        - fscrypt_get_ctx / fscrypt_release_ctx
       b. before IOs:
        - fscrypt_encrypt_page
        - fscrypt_decrypt_page
        - fscrypt_zeroout_range
       c. after IOs:
        - fscrypt_decrypt_bio_pages
        - fscrypt_pullback_bio_page
        - fscrypt_restore_control_page
      
      3. policy.c supporting context management.
       a. For ioctls:
        - fscrypt_process_policy
        - fscrypt_get_policy
       b. For context permission
        - fscrypt_has_permitted_context
        - fscrypt_inherit_context
      
      4. keyinfo.c to handle permissions
        - fscrypt_get_encryption_info
        - fscrypt_free_encryption_info
      
      5. fname.c to support filename encryption
       a. general wrapper functions
        - fscrypt_fname_disk_to_usr
        - fscrypt_fname_usr_to_disk
        - fscrypt_setup_filename
        - fscrypt_free_filename
      
       b. specific filename handling functions
        - fscrypt_fname_alloc_buffer
        - fscrypt_fname_free_buffer
      
      6. Makefile and Kconfig
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: NMichael Halcrow <mhalcrow@google.com>
      Signed-off-by: NIldar Muslukhov <ildarm@google.com>
      Signed-off-by: NUday Savagaonkar <savagaon@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0b81d077
  2. 31 1月, 2016 1 次提交
    • D
      block: revert runtime dax control of the raw block device · 9f4736fe
      Dan Williams 提交于
      Dynamically enabling DAX requires that the page cache first be flushed
      and invalidated.  This must occur atomically with the change of DAX mode
      otherwise we confuse the fsync/msync tracking and violate data
      durability guarantees.  Eliminate the possibilty of DAX-disabled to
      DAX-enabled transitions for now and revisit this for the next cycle.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9f4736fe
  3. 09 1月, 2016 2 次提交
    • D
      block: enable dax for raw block devices · 5a023cdb
      Dan Williams 提交于
      If an application wants exclusive access to all of the persistent memory
      provided by an NVDIMM namespace it can use this raw-block-dax facility
      to forgo establishing a filesystem.  This capability is targeted
      primarily to hypervisors wanting to provision persistent memory for
      guests.  It can be disabled / enabled dynamically via the new BLKDAXSET
      ioctl.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Reviewed-by: NJan Kara <jack@suse.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5a023cdb
    • T
      fs: clean up the flags definition in uapi/linux/fs.h · 68ce7bfc
      Theodore Ts'o 提交于
      Add an explanation for the flags used by FS_IOC_[GS]ETFLAGS and remind
      people that changes should be revised by linux-fsdevel and linux-api.
      
      Add flags that are used on-disk for ext4, and remove FS_DIRECTIO_FL
      since it was used only by gfs2 and support was removed in 2008 in
      commit c9f6a6bb ("The ability to mark files for direct i/o access
      when opened normally is both unused and pointless, so this patch
      removes support for that feature.")  Now we have _two_ remaining flags
      left.  But since we want to discourage people from assigning new
      flags, that's OK.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      68ce7bfc
  4. 04 1月, 2016 2 次提交
    • D
      xfs: introduce per-inode DAX enablement · 58f88ca2
      Dave Chinner 提交于
      Rather than just being able to turn DAX on and off via a mount
      option, some applications may only want to enable DAX for certain
      performance critical files in a filesystem.
      
      This patch introduces a new inode flag to enable DAX in the v3 inode
      di_flags2 field. It adds support for setting and clearing flags in
      the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
      S_DAX inode flag appropriately when it is seen.
      
      When this flag is set on a directory, it acts as an "inherit flag".
      That is, inodes created in the directory will automatically inherit
      the on-disk inode DAX flag, enabling administrators to set up
      directory heirarchies that automatically use DAX. Setting this flag
      on an empty root directory will make the entire filesystem use DAX
      by default.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      58f88ca2
    • D
      fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion · 334e580a
      Dave Chinner 提交于
      Hoist the ioctl definitions for the XFS_IOC_FS[SG]SETXATTR API from
      fs/xfs/libxfs/xfs_fs.h to include/uapi/linux/fs.h so that the ioctls
      can be used by all filesystems, not just XFS. This enables
      (initially) ext4 to use the ioctl to set project IDs on inodes.
      
      Based-on-patch-from: Li Xi <lixi@ddn.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      334e580a
  5. 01 1月, 2016 1 次提交
  6. 08 12月, 2015 1 次提交
    • C
      vfs: pull btrfs clone API to vfs layer · 04b38d60
      Christoph Hellwig 提交于
      The btrfs clone ioctls are now adopted by other file systems, with NFS
      and CIFS already having support for them, and XFS being under active
      development.  To avoid growth of various slightly incompatible
      implementations, add one to the VFS.  Note that clones are different from
      file copies in several ways:
      
       - they are atomic vs other writers
       - they support whole file clones
       - they support 64-bit legth clones
       - they do not allow partial success (aka short writes)
       - clones are expected to be a fast metadata operation
      
      Because of that it would be rather cumbersome to try to piggyback them on
      top of the recent clone_file_range infrastructure.  The converse isn't
      true and the clone_file_range system call could try clone file range as
      a first attempt to copy, something that further patches will enable.
      
      Based on earlier work from Peng Tao.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      04b38d60
  7. 18 10月, 2015 1 次提交
  8. 05 2月, 2015 1 次提交
    • T
      vfs: add support for a lazytime mount option · 0ae45f63
      Theodore Ts'o 提交于
      Add a new mount option which enables a new "lazytime" mode.  This mode
      causes atime, mtime, and ctime updates to only be made to the
      in-memory version of the inode.  The on-disk times will only get
      updated when (a) if the inode needs to be updated for some non-time
      related change, (b) if userspace calls fsync(), syncfs() or sync(), or
      (c) just before an undeleted inode is evicted from memory.
      
      This is OK according to POSIX because there are no guarantees after a
      crash unless userspace explicitly requests via a fsync(2) call.
      
      For workloads which feature a large number of random write to a
      preallocated file, the lazytime mount option significantly reduces
      writes to the inode table.  The repeated 4k writes to a single block
      will result in undesirable stress on flash devices and SMR disk
      drives.  Even on conventional HDD's, the repeated writes to the inode
      table block will trigger Adjacent Track Interference (ATI) remediation
      latencies, which very negatively impact long tail latencies --- which
      is a very big deal for web serving tiers (for example).
      
      Google-Bug-Id: 18297052
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0ae45f63
  9. 24 10月, 2014 1 次提交
  10. 01 4月, 2014 2 次提交
    • M
      vfs: add cross-rename · da1ce067
      Miklos Szeredi 提交于
      If flags contain RENAME_EXCHANGE then exchange source and destination files.
      There's no restriction on the type of the files; e.g. a directory can be
      exchanged with a symlink.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      da1ce067
    • M
      vfs: add RENAME_NOREPLACE flag · 0a7c3937
      Miklos Szeredi 提交于
      If this flag is specified and the target of the rename exists then the
      rename syscall fails with EEXIST.
      
      The VFS does the existence checking, so it is trivial to enable for most
      local filesystems.  This patch only enables it in ext4.
      
      For network filesystems the VFS check is not enough as there may be a race
      between a remote create and the rename, so these filesystems need to handle
      this flag in their ->rename() implementations to ensure atomicity.
      
      Andy writes about why this is useful:
      
      "The trivial answer: to eliminate the race condition from 'mv -i'.
      
      Another answer: there's a common pattern to atomically create a file
      with contents: open a temporary file, write to it, optionally fsync
      it, close it, then link(2) it to the final name, then unlink the
      temporary file.
      
      The reason to use link(2) is because it won't silently clobber the destination.
      
      This is annoying:
       - It requires an extra system call that shouldn't be necessary.
       - It doesn't work on (IMO sensible) filesystems that don't support
      hard links (e.g. vfat).
       - It's not atomic -- there's an intermediate state where both files exist.
       - It's ugly.
      
      The new rename flag will make this totally sensible.
      
      To be fair, on new enough kernels, you can also use O_TMPFILE and
      linkat to achieve the same thing even more cleanly."
      
      Suggested-by: Andy Lutomirski <luto@amacapital.net> 
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      0a7c3937
  11. 11 9月, 2013 1 次提交
    • G
      fs: bump inode and dentry counters to long · 3942c07c
      Glauber Costa 提交于
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
      
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort in providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit in
         the near future with little or no modification.  Let us know if you
         have any issues.
      
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to modify the up-to-now hammer called shrink_slab to proceed
         with node-reclaim instead of always searching memory from all over like
         it has been doing.
      
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      change.
      
      Although we have no official benchmarks for this version - be our guest to
      independently evaluate this - earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
      visible performance regressions while yielding a better qualitative
      behavior in NUMA machines.
      
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      
      This patch:
      
      There are situations in very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when umounting a filesystem, where eventually since every live object will
      eventually be discarded.
      
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset.  So we believe it is time for a change.  This
      patch just moves int to longs.  Machines where it matters should have a
      big long anyway.
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3942c07c
  12. 30 4月, 2013 1 次提交
    • D
      mm: make snapshotting pages for stable writes a per-bio operation · 71368511
      Darrick J. Wong 提交于
      Walking a bio's page mappings has proved problematic, so create a new
      bio flag to indicate that a bio's data needs to be snapshotted in order
      to guarantee stable pages during writeback.  Next, for the one user
      (ext3/jbd) of snapshotting, hook all the places where writes can be
      initiated without PG_writeback set, and set BIO_SNAP_STABLE there.
      
      We must also flag journal "metadata" bios for stable writeout, since
      file data can be written through the journal.  Finally, the
      MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get
      rid of it.
      
      [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
      [akpm@linux-foundation.org: teeny cleanup]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71368511
  13. 22 2月, 2013 1 次提交
    • D
      block: optionally snapshot page contents to provide stable pages during write · ffecfd1a
      Darrick J. Wong 提交于
      This provides a band-aid to provide stable page writes on jbd without
      needing to backport the fixed locking and page writeback bit handling
      schemes of jbd2.  The band-aid works by using bounce buffers to snapshot
      page contents instead of waiting.
      
      For those wondering about the ext3 bandage -- fixing the jbd locking
      (which was done as part of ext4dev years ago) is a lot of surgery, and
      setting PG_writeback on data pages when we actually hold the page lock
      dropped ext3 performance by nearly an order of magnitude.  If we're
      going to migrate iscsi and raid to use stable page writes, the
      complaints about high latency will likely return.  We might as well
      centralize their page snapshotting thing to one place.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Tested-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ffecfd1a
  14. 17 10月, 2012 1 次提交
  15. 13 10月, 2012 1 次提交