1. 22 5月, 2018 1 次提交
    • D
      mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS · e7638488
      Dan Williams 提交于
      In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
      be able to rely on the fact that they will get wakeups on dev_pagemap
      page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
      generic_dax_page_free() as common indicator / infrastructure for dax
      filesytems to require. With this change there are no users of the
      MEMORY_DEVICE_HOST designation, so remove it.
      
      The HMM sub-system extended dev_pagemap to arrange a callback when a
      dev_pagemap managed page is freed. Since a dev_pagemap page is free /
      idle when its reference count is 1 it requires an additional branch to
      check the page-type at put_page() time. Given put_page() is a hot-path
      we do not want to incur that check if HMM is not in use, so a static
      branch is used to avoid that overhead when not necessary.
      
      Now, the FS_DAX implementation wants to reuse this mechanism for
      receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
      static-key into a generic mechanism that either HMM or FS_DAX code paths
      can enable.
      
      For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
      care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
      However, we still need to support FS_DAX in the FS_DAX_LIMITED case
      implemented by the s390/dcssblk driver.
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Reported-by: NThomas Meyer <thomas@m3y3r.de>
      Reported-by: NDave Jiang <dave.jiang@intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e7638488
  2. 20 1月, 2018 1 次提交
    • D
      dax: require 'struct page' by default for filesystem dax · 569d0365
      Dan Williams 提交于
      If a dax buffer from a device that does not map pages is passed to
      read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
      gdb attempts to examine the contents of a dax buffer from a device that
      does not map pages it triggers SIGBUS. If fork(2) is called on a process
      with a dax mapping from a device that does not map pages it triggers
      SIGBUS. 'struct page' is required otherwise several kernel code paths
      break in surprising ways. Disable filesystem-dax on devices that do not
      map pages.
      
      In addition to needing pfn_to_page() to be valid we also require devmap
      pages.  We need this to detect dax pages in the get_user_pages_fast()
      path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
      drivers that have not supported get_user_pages() to date we allow them
      to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
      option which requires ->direct_access() to return pfn_t_special() pfns.
      This leaves DAX support in brd disabled and scheduled for removal.
      
      Note that when the initial dax support was being merged a few years back
      there was concern that struct page was unsuitable for use with next
      generation persistent memory devices. The theoretical concern was that
      struct page access, being such a hotly used data structure in the
      kernel, would lead to media wear out. While that was a reasonable
      conservative starting position it has not held true in practice. We have
      long since committed to using devm_memremap_pages() to support higher
      order kernel functionality that needs get_user_pages() and
      pfn_to_page().
      
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      569d0365
  3. 02 1月, 2018 1 次提交
  4. 28 11月, 2017 1 次提交
  5. 13 7月, 2017 1 次提交
  6. 09 5月, 2017 1 次提交
    • D
      block, dax: move "select DAX" from BLOCK to FS_DAX · ef510424
      Dan Williams 提交于
      For configurations that do not enable DAX filesystems or drivers, do not
      require the DAX core to be built.
      
      Given that the 'direct_access' method has been removed from
      'block_device_operations', we can also go ahead and remove the
      block-related dax helper functions from fs/block_dev.c to
      drivers/dax/super.c. This keeps dax details out of the block layer and
      lets the DAX core be built as a module in the FS_DAX=n case.
      
      Filesystems need to include dax.h to call bdev_dax_supported().
      
      Cc: linux-xfs@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.com>
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ef510424
  7. 25 1月, 2017 1 次提交
  8. 15 12月, 2016 1 次提交
    • C
      logfs: remove from tree · 1d0fd57a
      Christoph Hellwig 提交于
      Logfs was introduced to the kernel in 2009, and hasn't seen any non
      drive-by changes since 2012, while having lots of unsolved issues
      including the complete lack of error handling, with more and more
      issues popping up without any fixes.
      
      The logfs.org domain has been bouncing from a mail, and the maintainer
      on the non-logfs.org domain hasn't repsonded to past queries either.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1d0fd57a
  9. 08 11月, 2016 1 次提交
  10. 08 10月, 2016 1 次提交
  11. 22 9月, 2016 1 次提交
  12. 16 7月, 2016 1 次提交
  13. 21 6月, 2016 1 次提交
    • C
      fs: introduce iomap infrastructure · ae259a9c
      Christoph Hellwig 提交于
      Add infrastructure for multipage buffered writes.  This is implemented
      using an main iterator that applies an actor function to a range that
      can be written.
      
      This infrastucture is used to implement a buffered write helper, one
      to zero file ranges and one to implement the ->page_mkwrite VM
      operations.  All of them borrow a fair amount of code from fs/buffers.
      for now by using an internal version of __block_write_begin that
      gets passed an iomap and builds the corresponding buffer head.
      
      The file system is gets a set of paired ->iomap_begin and ->iomap_end
      calls which allow it to map/reserve a range and get a notification
      once the write code is finished with it.
      
      Based on earlier code from Dave Chinner.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ae259a9c
  14. 20 5月, 2016 1 次提交
    • J
      dax: Make huge page handling depend of CONFIG_BROKEN · 348e967a
      Jan Kara 提交于
      Currently the handling of huge pages for DAX is racy. For example the
      following can happen:
      
      CPU0 (THP write fault)			CPU1 (normal read fault)
      
      __dax_pmd_fault()			__dax_fault()
        get_block(inode, block, &bh, 0) -> not mapped
      					get_block(inode, block, &bh, 0)
      					  -> not mapped
        if (!buffer_mapped(&bh) && write)
          get_block(inode, block, &bh, 1) -> allocates blocks
        truncate_pagecache_range(inode, lstart, lend);
      					dax_load_hole();
      
      This results in data corruption since process on CPU1 won't see changes
      into the file done by CPU0.
      
      The race can happen even if two normal faults race however with THP the
      situation is even worse because the two faults don't operate on the same
      entries in the radix tree and we want to use these entries for
      serialization. So make THP support in DAX code depend on CONFIG_BROKEN
      for now.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      348e967a
  15. 18 3月, 2016 1 次提交
    • J
      fs crypto: move per-file encryption from f2fs tree to fs/crypto · 0b81d077
      Jaegeuk Kim 提交于
      This patch adds the renamed functions moved from the f2fs crypto files.
      
      1. definitions for per-file encryption used by ext4 and f2fs.
      
      2. crypto.c for encrypt/decrypt functions
       a. IO preparation:
        - fscrypt_get_ctx / fscrypt_release_ctx
       b. before IOs:
        - fscrypt_encrypt_page
        - fscrypt_decrypt_page
        - fscrypt_zeroout_range
       c. after IOs:
        - fscrypt_decrypt_bio_pages
        - fscrypt_pullback_bio_page
        - fscrypt_restore_control_page
      
      3. policy.c supporting context management.
       a. For ioctls:
        - fscrypt_process_policy
        - fscrypt_get_policy
       b. For context permission
        - fscrypt_has_permitted_context
        - fscrypt_inherit_context
      
      4. keyinfo.c to handle permissions
        - fscrypt_get_encryption_info
        - fscrypt_free_encryption_info
      
      5. fname.c to support filename encryption
       a. general wrapper functions
        - fscrypt_fname_disk_to_usr
        - fscrypt_fname_usr_to_disk
        - fscrypt_setup_filename
        - fscrypt_free_filename
      
       b. specific filename handling functions
        - fscrypt_fname_alloc_buffer
        - fscrypt_fname_free_buffer
      
      6. Makefile and Kconfig
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: NMichael Halcrow <mhalcrow@google.com>
      Signed-off-by: NIldar Muslukhov <ildarm@google.com>
      Signed-off-by: NUday Savagaonkar <savagaon@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0b81d077
  16. 16 1月, 2016 1 次提交
  17. 17 11月, 2015 1 次提交
    • D
      dax: disable pmd mappings · ee82c9ed
      Dan Williams 提交于
      While dax pmd mappings are functional in the nominal path they trigger
      kernel crashes in the following paths:
      
       BUG: unable to handle kernel paging request at ffffea0004098000
       IP: [<ffffffff812362f7>] follow_trans_huge_pmd+0x117/0x3b0
       [..]
       Call Trace:
        [<ffffffff811f6573>] follow_page_mask+0x2d3/0x380
        [<ffffffff811f6708>] __get_user_pages+0xe8/0x6f0
        [<ffffffff811f7045>] get_user_pages_unlocked+0x165/0x1e0
        [<ffffffff8106f5b1>] get_user_pages_fast+0xa1/0x1b0
      
       kernel BUG at arch/x86/mm/gup.c:131!
       [..]
       Call Trace:
        [<ffffffff8106f34c>] gup_pud_range+0x1bc/0x220
        [<ffffffff8106f634>] get_user_pages_fast+0x124/0x1b0
      
       BUG: unable to handle kernel paging request at ffffea0004088000
       IP: [<ffffffff81235f49>] copy_huge_pmd+0x159/0x350
       [..]
       Call Trace:
        [<ffffffff811fad3c>] copy_page_range+0x34c/0x9f0
        [<ffffffff810a0daf>] copy_process+0x1b7f/0x1e10
        [<ffffffff810a11c1>] _do_fork+0x91/0x590
      
      All of these paths are interpreting a dax pmd mapping as a transparent
      huge page and making the assumption that the pfn is covered by the
      memmap, i.e. that the pfn has an associated struct page.  PTE mappings
      do not suffer the same fate since they have the _PAGE_SPECIAL flag to
      cause the gup path to fault.  We can do something similar for the PMD
      path, or otherwise defer pmd support for cases where a struct page is
      available.  For now, 4.4-rc and -stable need to disable dax pmd support
      by default.
      
      For development the "depends on BROKEN" line can be removed from
      CONFIG_FS_DAX_PMD.
      
      Cc: <stable@vger.kernel.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ee82c9ed
  18. 16 11月, 2015 1 次提交
    • J
      locks: Allow disabling mandatory locking at compile time · 9e8925b6
      Jeff Layton 提交于
      Mandatory locking appears to be almost unused and buggy and there
      appears no real interest in doing anything with it.  Since effectively
      no one uses the code and since the code is buggy let's allow it to be
      disabled at compile time.  I would just suggest removing the code but
      undoubtedly that will break some piece of userspace code somewhere.
      
      For the distributions that don't care about this piece of code
      this gives a nice starting point to make mandatory locking go away.
      
      Cc: Benjamin Coddington <bcodding@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jeff Layton <jeff.layton@primarydata.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      9e8925b6
  19. 03 10月, 2015 1 次提交
  20. 24 7月, 2015 1 次提交
    • J
      fs: Remove ext3 filesystem driver · c290ea01
      Jan Kara 提交于
      The functionality of ext3 is fully supported by ext4 driver. Major
      distributions (SUSE, RedHat) already use ext4 driver to handle ext3
      filesystems for quite some time. There is some ugliness in mm resulting
      from jbd cleaning buffers in a dirty page without cleaning page dirty
      bit and also support for buffer bouncing in the block layer when stable
      pages are required is there only because of jbd. So let's remove the
      ext3 driver. This saves us some 28k lines of duplicated code.
      Acked-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJan Kara <jack@suse.cz>
      c290ea01
  21. 11 4月, 2015 1 次提交
  22. 17 2月, 2015 2 次提交
  23. 05 1月, 2015 1 次提交
  24. 24 10月, 2014 1 次提交
    • M
      overlay filesystem · e9be9d5e
      Miklos Szeredi 提交于
      Overlayfs allows one, usually read-write, directory tree to be
      overlaid onto another, read-only directory tree.  All modifications
      go to the upper, writable layer.
      
      This type of mechanism is most often used for live CDs but there's a
      wide variety of other uses.
      
      The implementation differs from other "union filesystem"
      implementations in that after a file is opened all operations go
      directly to the underlying, lower or upper, filesystems.  This
      simplifies the implementation and allows native performance in these
      cases.
      
      The dentry tree is duplicated from the underlying filesystems, this
      enables fast cached lookups without adding special support into the
      VFS.  This uses slightly more memory than union mounts, but dentries
      are relatively small.
      
      Currently inodes are duplicated as well, but it is a possible
      optimization to share inodes for non-directories.
      
      Opening non directories results in the open forwarded to the
      underlying filesystem.  This makes the behavior very similar to union
      mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
      descriptors).
      
      Usage:
      
        mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay
      
      The following cotributions have been folded into this patch:
      
      Neil Brown <neilb@suse.de>:
       - minimal remount support
       - use correct seek function for directories
       - initialise is_real before use
       - rename ovl_fill_cache to ovl_dir_read
      
      Felix Fietkau <nbd@openwrt.org>:
       - fix a deadlock in ovl_dir_read_merged
       - fix a deadlock in ovl_remove_whiteouts
      
      Erez Zadok <ezk@fsl.cs.sunysb.edu>
       - fix cleanup after WARN_ON
      
      Sedat Dilek <sedat.dilek@googlemail.com>
       - fix up permission to confirm to new API
      
      Robin Dong <hao.bigrat@gmail.com>
       - fix possible leak in ovl_new_inode
       - create new inode in ovl_link
      
      Andy Whitcroft <apw@canonical.com>
       - switch to __inode_permission()
       - copy up i_uid/i_gid from the underlying inode
      
      AV:
       - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
       - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
         lack of check for udir being the parent of upper, dropping and regaining
         the lock on udir (which would require _another_ check for parent being
         right).
       - bogus d_drop() in copyup and rename [fix from your mail]
       - copyup/remove and copyup/rename races [fix from your mail]
       - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
       - ovl_entry_free() is pointless - it's just a kfree_rcu()
       - fold ovl_do_lookup() into ovl_lookup()
       - manually assigning ->d_op is wrong.  Just use ->s_d_op.
       [patches picked from Miklos]:
       * copyup/remove and copyup/rename races
       * bogus d_drop() in copyup and rename
      
      Also thanks to the following people for testing and reporting bugs:
      
        Jordi Pujol <jordipujolp@gmail.com>
        Andy Whitcroft <apw@canonical.com>
        Michal Suchanek <hramrach@centrum.cz>
        Felix Fietkau <nbd@openwrt.org>
        Erez Zadok <ezk@fsl.cs.sunysb.edu>
        Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      e9be9d5e
  25. 18 9月, 2014 1 次提交
  26. 08 2月, 2014 1 次提交
  27. 26 1月, 2014 1 次提交
  28. 17 4月, 2013 1 次提交
    • M
      efivarfs: Move to fs/efivarfs · d68772b7
      Matt Fleming 提交于
      Now that efivarfs uses the efivar API, move it out of efivars.c and
      into fs/efivarfs where it belongs. This move will eventually allow us
      to enable the efivarfs code without having to also enable
      CONFIG_EFI_VARS built, and vice versa.
      
      Furthermore, things like,
      
          mount -t efivarfs none /sys/firmware/efi/efivars
      
      will now work if efivarfs is built as a module without requiring the
      use of MODULE_ALIAS(), which would have been necessary when the
      efivarfs code was part of efivars.c.
      
      Cc: Matthew Garrett <matthew.garrett@nebula.com>
      Cc: Jeremy Kerr <jk@ozlabs.org>
      Reviewed-by: NTom Gundersen <teg@jklm.no>
      Tested-by: NTom Gundersen <teg@jklm.no>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      d68772b7
  29. 17 1月, 2013 1 次提交
  30. 11 12月, 2012 2 次提交
    • J
      f2fs: update Kconfig and Makefile · a14d5393
      Jaegeuk Kim 提交于
      This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files
      in the fs directory.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      a14d5393
    • T
      ext4: Remove CONFIG_EXT4_FS_XATTR · 939da108
      Tao Ma 提交于
      Ted has sent out a RFC about removing this feature. Eric and Jan
      confirmed that both RedHat and SUSE enable this feature in all their
      product.  David also said that "As far as I know, it's enabled in all
      Android kernels that use ext4."  So it seems OK for us.
      
      And what's more, as inline data depends its implementation on xattr,
      and to be frank, I don't run any test again inline data enabled while
      xattr disabled.  So I think we should add inline data and remove this
      config option in the same release.
      
      [ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which
        isn't much in the grand scheme of things.  Since no one seems to be
        testing this configuration except for some automated compile farms, on
        balance we are better removing this config option, and so that it is
        effectively always enabled. -- tytso ]
      
      Cc: David Brown <davidb@codeaurora.org>
      Cc: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      939da108
  31. 21 3月, 2012 1 次提交
    • K
      fs: initial qnx6fs addition · 5d026c72
      Kai Bankett 提交于
      Adds support for qnx6fs readonly support to the linux kernel.
      
      * Mount option
        The option mmi_fs can be used to mount Harman Becker/Audi MMI 3G
        HDD qnx6fs filesystems.
      
      * Documentation
        A high level filesystem stucture description can be found in the
        Documentation/filesystems directory. (qnx6.txt)
      
      * Additional features
        - Active (stable) superblock selection
        - Superblock checksum check (enforced)
        - Supports mount of qnx6 filesystems with to host different endianess
        - Automatic endianess detection
        - Longfilename support (with non-enfocing crc check)
        - All blocksizes (512, 1024, 2048 and 4096 supported)
      Signed-off-by: NKai Bankett <chaosman@ontika.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5d026c72
  32. 09 3月, 2012 1 次提交
  33. 06 1月, 2012 1 次提交
    • B
      ore: FIX breakage when MISC_FILESYSTEMS is not set · 831c2dc5
      Boaz Harrosh 提交于
      As Reported by Randy Dunlap
      
      When MISC_FILESYSTEMS is not enabled and NFS4.1 is:
      
      fs/built-in.o: In function `objio_alloc_io_state':
      objio_osd.c:(.text+0xcb525): undefined reference to `ore_get_rw_state'
      fs/built-in.o: In function `_write_done':
      objio_osd.c:(.text+0xcb58d): undefined reference to `ore_check_io'
      fs/built-in.o: In function `_read_done':
      ...
      
      When MISC_FILESYSTEMS, which is more of a GUI thing then anything else,
      is not selected. exofs/Kconfig is never examined during Kconfig,
      and it can not do it's magic stuff to automatically select everything
      needed.
      
      We must split exofs/Kconfig in two. The ore one is always included.
      And the exofs one is left in it's old place in the menu.
      
      [Needed for the 3.2.0 Kernel]
      CC: Stable Tree <stable@kernel.org>
      Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      831c2dc5
  34. 04 1月, 2012 1 次提交
  35. 01 11月, 2011 1 次提交
  36. 04 8月, 2011 1 次提交
  37. 26 5月, 2011 2 次提交