1. 17 Apr, 2015 2 commits
  2. 26 Mar, 2015 1 commit
  3. 11 Dec, 2014 1 commit
    • nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races · 705304a8
      Committed by Ryusuke Konishi
      Same story as in commit 41080b5a ("nfsd race fixes: ext2") (similar
      ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
      of insert_inode_locked() and a bug of a check for dead inodes needs to
      be fixed.
      
      If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
      insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
      the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
      
      If nilfs_iget() is called before nilfs_new_inode() calls
      insert_inode_locked4(), it will create an in-core inode and read its
      data from the on-disk inode.  However, nilfs_iget() will find that
      i_nlink equals zero and fail in nilfs_read_inode_common(), which leads
      it to call iget_failed() and fail cleanly.
      
      However, this sanity check doesn't work as expected for reused on-disk
      inodes because they leave a non-zero value in i_mode field and it
      hinders the test of i_nlink.  This patch also fixes the issue by
      removing the test on i_mode that nilfs2 doesn't need.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
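
The dead-inode check described in this commit can be modelled in user space. This is a minimal sketch, not the kernel code: the struct, the function names and the two error codes are assumptions made for illustration. It shows why a reused on-disk inode (zero link count, stale non-zero i_mode) slips past a check that also tests i_mode.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk inode fields the check reads. */
struct sketch_raw_inode {
	uint16_t i_mode;         /* may keep a stale, non-zero value after reuse */
	uint16_t i_links_count;  /* zero means the inode is not live */
};

/* Check that also tests i_mode: a reused inode is wrongly accepted. */
static int sketch_check_old(const struct sketch_raw_inode *raw)
{
	if (raw->i_links_count == 0 && raw->i_mode == 0)
		return -EINVAL;  /* assumed error code */
	return 0;
}

/* Check on the link count alone: a zero count always marks a dead inode. */
static int sketch_check_new(const struct sketch_raw_inode *raw)
{
	if (raw->i_links_count == 0)
		return -ESTALE;  /* assumed error code */
	return 0;
}
```

With i_mode set to a regular-file mode and a link count of zero, the first check returns 0 while the second one rejects the inode.
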
  4. 14 Oct, 2014 1 commit
    • nilfs2: improve the performance of fdatasync() · b9f66140
      Committed by Andreas Rohner
      Support for fdatasync() has been implemented in NILFS2 for a long time,
      but whenever the corresponding inode is dirty the implementation falls
      back to a full-fledged sync().  Since every write operation has to
      update the modification time of the file, the inode will almost always
      be dirty and fdatasync() will fall back to sync() most of the time.  But
      this fallback is only necessary for a change of the file size and not
      for a change of the various timestamps.
      
      This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between
      those two situations.
      
       * If it is set the file size was changed and a full sync is necessary.
       * If it is not set then only the timestamps were updated and
         fdatasync() can go ahead.
      
      There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with
      the exact same semantics.  Unfortunately it cannot be used directly,
      because NILFS2 doesn't implement write_inode() and doesn't clear the VFS
      flags when inodes are written out.  As a result, the VFS writeback
      thread can clear I_DIRTY_DATASYNC at any time without notifying NILFS2,
      so I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in
      nilfs_update_inode().
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
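
The flag logic in this commit can be sketched as a small user-space model. Only the names NILFS_I_INODE_SYNC, I_DIRTY_SYNC and I_DIRTY_DATASYNC come from the commit message; the bit values, the struct and the function names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit values; only the flag names come from the commit text. */
#define SKETCH_I_DIRTY_SYNC      (1u << 0)  /* timestamps only */
#define SKETCH_I_DIRTY_DATASYNC  (1u << 1)  /* e.g. the file size changed */

struct sketch_inode {
	bool dirty;       /* inode has unwritten changes of any kind */
	bool inode_sync;  /* models NILFS_I_INODE_SYNC */
};

/* Models the mapping of the VFS flag onto the filesystem's own flag. */
static void sketch_mark_inode_dirty(struct sketch_inode *ino,
				    unsigned int flags)
{
	ino->dirty = true;
	if (flags & SKETCH_I_DIRTY_DATASYNC)
		ino->inode_sync = true;
}

/* Returns true when fdatasync() has to fall back to a full sync. */
static bool sketch_fdatasync_needs_full_sync(const struct sketch_inode *ino)
{
	return ino->inode_sync;
}
```

A timestamp-only update leaves the inode dirty but does not force the fallback; a size change does.
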
  5. 26 Sep, 2014 1 commit
    • nilfs2: fix data loss with mmap() · 56d7acc7
      Committed by Andreas Rohner
      This bug leads to reproducible silent data loss, despite the use of
      msync(), sync() and a clean unmount of the file system.  It is easily
      reproducible with the following script:
      
        ----------------[BEGIN SCRIPT]--------------------
        mkfs.nilfs2 -f /dev/sdb
        mount /dev/sdb /mnt
      
        dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
      
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
      
        /root/mmaptest/mmaptest /mnt/testfile 30 10 5
      
        sync
        CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
        umount /mnt
      
        printf 'BEFORE MMAP:\t%s\n' "$CHECKSUM_BEFORE"
        printf 'AFTER MMAP:\t%s\n' "$CHECKSUM_AFTER"
        printf 'AFTER REMOUNT:\t%s\n' "$CHECKSUM_AFTER_REMOUNT"
        ----------------[END SCRIPT]--------------------
      
      The mmaptest tool looks something like this (very simplified, with
      error checking removed):
      
        ----------------[BEGIN mmaptest]--------------------
        data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, file_offset);
      
        for (i = 0; i < write_count; ++i) {
              memcpy(data + i * 4096, buf, sizeof(buf));
              msync(data, file_size - file_offset, MS_SYNC);
        }
        ----------------[END mmaptest]--------------------
      
      The output of the script looks something like this:
      
        BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
        AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
        AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      
      So it is clear that the changes made through mmap() do not survive a
      remount.  This can be reproduced 100% of the time.  The problem was
      introduced in commit 136e8770 ("nilfs2: fix issue of
      nilfs_set_page_dirty() for page at EOF boundary").
      
      If the page was read with mpage_readpage() or mpage_readpages() for
      example, then it has no buffers attached to it.  In that case
      page_has_buffers(page) in nilfs_set_page_dirty() will be false.
      Therefore nilfs_set_file_dirty() is never called and the pages are never
      collected and never written to disk.
      
      This patch fixes the problem by also calling nilfs_set_file_dirty() if the
      page has no buffers attached to it.
      
      [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
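
The mmaptest fragment quoted in this commit can be fleshed out into a small user-space check. This is a minimal sketch, not the original tool: the function name, the 0xab fill pattern and the fixed 4096-byte page size are assumptions. It only confirms that data written through a shared mapping is readable with pread() after msync(); reproducing the bug itself still needs the unmount and remount cycle from the script.

```c
#define _POSIX_C_SOURCE 200809L  /* for ftruncate() and pread() */

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGE 4096  /* assumed page size */

/*
 * Create `path` with `pages` zero-filled pages, dirty the first
 * `write_count` pages through a shared mapping, msync() after each
 * write, then re-read the pages with pread() and compare.
 * Returns 0 on success and -1 on any failure.
 */
static int sketch_mmap_write_verify(const char *path, size_t pages,
				    size_t write_count)
{
	char buf[SKETCH_PAGE], check[SKETCH_PAGE];
	size_t len = pages * SKETCH_PAGE;
	char *data;
	int fd, ret = -1;

	if (pages == 0 || write_count > pages)
		return -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0)
		goto out_close;

	data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		goto out_close;

	memset(buf, 0xab, sizeof(buf));
	for (size_t i = 0; i < write_count; ++i) {
		memcpy(data + i * SKETCH_PAGE, buf, sizeof(buf));
		if (msync(data, len, MS_SYNC) != 0)
			goto out_unmap;
	}

	/* Read the written pages back through the file descriptor. */
	ret = 0;
	for (size_t i = 0; i < write_count; ++i) {
		ssize_t n = pread(fd, check, sizeof(check),
				  (off_t)(i * SKETCH_PAGE));
		if (n != (ssize_t)sizeof(check) ||
		    memcmp(check, buf, sizeof(buf)) != 0)
			ret = -1;
	}

out_unmap:
	munmap(data, len);
out_close:
	close(fd);
	return ret;
}
```

To approximate the script, call it on a file under the mount point, remount, and compare checksums as the script does.
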
  6. 07 May, 2014 3 commits
  7. 04 Apr, 2014 1 commit
    • mm + fs: store shadow entries in page cache · 91b0abe3
      Committed by Johannes Weiner
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
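
The ordering rule behind AS_EXITING can be illustrated with a single-threaded model. This is only a sketch: the struct, the bit value and the function names are hypothetical, and the real code performs these steps under the tree lock, which the model leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_AS_EXITING (1u << 0)  /* hypothetical bit; models AS_EXITING */

struct sketch_mapping {
	unsigned int flags;
	size_t nr_shadows;  /* shadow entries currently left in the tree */
};

/* Inode teardown: set the flag first, then empty the tree. */
static void sketch_evict_mapping(struct sketch_mapping *m)
{
	m->flags |= SKETCH_AS_EXITING;
	m->nr_shadows = 0;  /* stands in for the final truncate */
}

/* Reclaim: leave a shadow entry only while the mapping is not exiting. */
static bool sketch_reclaim_page(struct sketch_mapping *m)
{
	if (m->flags & SKETCH_AS_EXITING)
		return false;
	m->nr_shadows++;
	return true;
}
```

Once the teardown path has run, later reclaim calls leave the tree empty.
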
  8. 13 Sep, 2013 1 commit
  9. 04 Jul, 2013 1 commit
  10. 25 May, 2013 1 commit
    • nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary · 136e8770
      Committed by Ryusuke Konishi
      DESCRIPTION:
      A NILFS2 file system formatted with a block size smaller than 4 KB
      can be remounted read-only (RO) after hitting a "broken bmap" error.
      
      The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
       "The machine I've been trialling nilfs on is running Debian Testing,
        Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
        version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
        also reproduced it (identically) with Debian Unstable amd64 and Debian
        Experimental (using the 3.8-trunk kernel).  The problematic partitions
        were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
      
      SYMPTOMS:
      (1) System log contains error messages like the following:
      
          [63102.496756] nilfs_direct_assign: invalid pointer: 0
          [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
          [63102.496798]
          [63102.524403] Remounting filesystem read-only
      
      (2) The NILFS2 file system is remounted in RO mode.
      
      REPRODUCTION PATH:
      (1) Create volume group with name "unencrypted" by means of vgcreate utility.
      (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
      
      ----------------[BEGIN SCRIPT]--------------------
      
      VG=unencrypted
      lvcreate --size 2G --name ntest $VG
      mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
      mkdir /var/tmp/n
      mkdir /var/tmp/n/ntest
      mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
      mkdir /var/tmp/n/ntest/thedir
      cd /var/tmp/n/ntest/thedir
      sleep 2
      date
      darcs init
      sleep 2
      dmesg|tail -n 5
      date
      darcs whatsnew || true
      date
      sleep 2
      dmesg|tail -n 5
      ----------------[END SCRIPT]--------------------
      
      REPRODUCIBILITY: 100%
      
      INVESTIGATION:
      The issue occurs during segment construction after the following
      sequence of user-space operations:
      
        open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
        fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
        ftruncate(7, 60)
      
      The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
      bmap (inode number=28)" appears because the segment constructor tries to
      get the block number for the block at logical offset 3072 bytes (block
      index 3 with 1 KB blocks).  As the output above shows, the file is only
      60 bytes long, so a single 1 KB block is enough to hold it.  More than
      one block is processed because nilfs_segctor_scan_file() finds several
      dirty buffers for this file.
      
      The root cause of this issue is in nilfs_set_page_dirty function which
      is called just before writing to an mmapped page.
      
      When nilfs_page_mkwrite function handles a page at EOF boundary, it
      fills hole blocks only inside EOF through __block_page_mkwrite().
      
      The __block_page_mkwrite() function calls set_page_dirty() after filling
      hole blocks, thus nilfs_set_page_dirty function (=
      a_ops->set_page_dirty) is called.  However, the current implementation
      of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
      at EOF boundary.
      
      As a result, buffers outside EOF are inconsistently marked dirty and
      queued for write even though they are not mapped with nilfs_get_block
      function.
      
      FIX:
      This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
      
      Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
      for this issue.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
      Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
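
The block arithmetic in this report can be worked through with a short helper. This is a simplified sketch with hypothetical function names: it treats "inside EOF" as the criterion for dirtying a buffer, whereas the commit describes the fix as not marking hole blocks dirty.

```c
#include <stddef.h>

/* Number of blocks needed to hold `size` bytes (rounded up). */
static size_t sketch_blocks_for_size(size_t size, size_t blksize)
{
	return (size + blksize - 1) / blksize;
}

/*
 * Count the buffers of one page that lie inside EOF and may therefore
 * be marked dirty. `page_index` is the index of the page in the file.
 */
static size_t sketch_dirty_buffers_in_page(size_t file_size, size_t blksize,
					   size_t page_size, size_t page_index)
{
	size_t per_page = page_size / blksize;
	size_t first = page_index * per_page;
	size_t total = sketch_blocks_for_size(file_size, blksize);

	if (total <= first)
		return 0;
	return (total - first < per_page) ? total - first : per_page;
}
```

For the 60-byte file with 1 KB blocks and 4 KB pages, only one of the four buffers in the first page is inside EOF. Marking all four dirty is what queued the block at offset 3072 for writing.
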
  11. 08 May, 2013 1 commit
  12. 01 May, 2013 2 commits
    • nilfs2: remove unneeded test in nilfs_writepage() · eb53b6db
      Committed by Vyacheslav Dubeyko
      page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
      unneeded test.
      
      This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
      error: we previously assumed 'inode' could be null (see line 195)".
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix issue with flush kernel thread after remount in RO mode because of... · 8c26c4e2
      Committed by Vyacheslav Dubeyko
      nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption
      
      The NILFS2 driver remounts itself in RO mode when it discovers
      metadata corruption (for example, a broken bmap).  Usually, file
      system operations have already taken place before the remount.

      As a result, the NILFS2 driver can be in RO mode while dirty pages
      remain in the address spaces of modified inodes.  The flush kernel
      thread then tries endlessly to flush those dirty pages.  The visible
      side effects are: (1) the flush kernel thread occupies 50% - 99% of
      CPU time; (2) the system can't be shut down without manually
      switching off the power.
      
      SYMPTOMS:
      (1) System log contains error message: "Remounting filesystem read-only".
      (2) The flush kernel thread occupies 50% - 99% of CPU time.
      (3) The system can't be shut down without manually switching off the power.
      
      REPRODUCTION PATH:
      (1) Create volume group with name "unencrypted" by means of vgcreate utility.
      (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
      
        ----------------[BEGIN SCRIPT]--------------------
        #!/bin/bash
      
        VG=unencrypted
        #apt-get install nilfs-tools darcs
        lvcreate --size 2G --name ntest $VG
        mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
        mkdir /var/tmp/n
        mkdir /var/tmp/n/ntest
        mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
        mkdir /var/tmp/n/ntest/thedir
        cd /var/tmp/n/ntest/thedir
        sleep 2
        date
        darcs init
        sleep 2
        dmesg|tail -n 5
        date
        darcs whatsnew || true
        date
        sleep 2
        dmesg|tail -n 5
        ----------------[END SCRIPT]--------------------
      
      (3) Try to shut down the system.
      
      REPRODUCIBILITY: 100%
      
      FIX:
      
      This patch checks the mount state of the NILFS2 driver in
      nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page().
      If the file system is mounted read-only, all dirty pages are simply
      discarded and a warning message is written to the system log.
      
      [akpm@linux-foundation.org: fix printk warning]
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
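
The effect of the read-only check can be modelled in user space. This is a sketch under stated assumptions: the struct, the function name and the -EROFS return value are hypothetical and do not reproduce the kernel implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fs {
	bool read_only;      /* models the read-only mount state */
	size_t dirty_pages;  /* pages waiting for writeback */
	size_t written;      /* pages actually written out */
};

/*
 * Models a writepage-style entry point. On a read-only mount the dirty
 * page is dropped and an error is returned, so the flusher has nothing
 * left to retry.
 */
static int sketch_writepage(struct sketch_fs *fs)
{
	if (fs->dirty_pages == 0)
		return 0;
	fs->dirty_pages--;
	if (fs->read_only)
		return -EROFS;  /* assumed error code */
	fs->written++;
	return 0;
}
```

Without the discard step, the dirty page count would never reach zero on a read-only mount, which corresponds to the endless flushing described in the commit.
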
  13. 21 Dec, 2012 1 commit
  14. 21 Sep, 2012 1 commit
  15. 31 Jul, 2012 1 commit
  16. 06 May, 2012 1 commit
  17. 04 Jan, 2012 1 commit
  18. 02 Nov, 2011 2 commits
  19. 21 Jul, 2011 2 commits
  20. 20 Jul, 2011 3 commits
  21. 20 Jun, 2011 1 commit
  22. 27 May, 2011 1 commit
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Committed by Christoph Hellwig
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  23. 10 May, 2011 4 commits
  24. 10 Mar, 2011 1 commit
  25. 09 Mar, 2011 3 commits
    • nilfs2: get rid of nilfs_sb_info structure · e3154e97
      Committed by Ryusuke Konishi
      This directly uses sb->s_fs_info to keep a nilfs filesystem object and
      fully removes the intermediate nilfs_sb_info structure.  With this
      change, the hierarchy of on-memory structures of nilfs will be
      simplified as follows:
      
      Before:
        super_block
             -> nilfs_sb_info
                   -> the_nilfs
                         -> cptree --+-> nilfs_root (current file system)
                                     +-> nilfs_root (snapshot A)
                                     +-> nilfs_root (snapshot B)
                                     :
                   -> nilfs_sc_info (log writer structure)
      After:
        super_block
             -> the_nilfs
                   -> cptree --+-> nilfs_root (current file system)
                               +-> nilfs_root (snapshot A)
                               +-> nilfs_root (snapshot B)
                               :
                   -> nilfs_sc_info (log writer structure)
      
      We didn't design it this way from the beginning because the initial
      shape also differed from the above.  The early hierarchy was
      composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a
      shared nilfs object.  In kernel 2.6.37, it was changed to the
      current shape in order to unify super block instances into one per
      device, and this cleanup became applicable as a result.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
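
The before and after hierarchies can be compared with a pair of accessor functions. This is an illustrative sketch only: every type and field here is a hypothetical stand-in, apart from the s_fs_info name taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the structures named in the commit. */
struct sketch_the_nilfs {
	int placeholder;  /* the real object holds filesystem-wide state */
};

struct sketch_sb_info {
	struct sketch_the_nilfs *s_nilfs;  /* intermediate hop (before) */
};

struct sketch_super_block {
	void *s_fs_info;  /* filesystem-private pointer */
};

/* Before: super_block -> nilfs_sb_info -> the_nilfs (two hops). */
static struct sketch_the_nilfs *
sketch_get_nilfs_before(struct sketch_super_block *sb)
{
	struct sketch_sb_info *sbi = sb->s_fs_info;

	return sbi->s_nilfs;
}

/* After: super_block -> the_nilfs (one hop). */
static struct sketch_the_nilfs *
sketch_get_nilfs_after(struct sketch_super_block *sb)
{
	return sb->s_fs_info;
}
```

Both accessors reach the same object; the second one simply skips the intermediate structure.
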
    • nilfs2: move next generation counter into nilfs object · 9b1fc4e4
      Committed by Ryusuke Konishi
      Moves the s_next_generation counter and the spinlock protecting it
      from the nilfs_sb_info structure to the nilfs object.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • nilfs2: move s_inode_lock and s_dirty_files into nilfs object · 693dd321
      Committed by Ryusuke Konishi
      Moves the s_inode_lock spinlock and the s_dirty_files list from the
      nilfs_sb_info structure to the nilfs object.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
  26. 08 Mar, 2011 2 commits
    • nilfs2: record used amount of each checkpoint in checkpoint list · be667377
      Committed by Ryusuke Konishi
      This records the number of used blocks per checkpoint in each
      checkpoint entry of cpfile.  Even though userland tools can get the
      block count via nilfs_get_cpinfo ioctl, it was not updated by the
      nilfs2 kernel code.  This fixes the issue and lets userland tools
      calculate the used amount per checkpoint.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Jiro SEKIBA <jir@unicus.jp>
    • nilfs2: tighten restrictions on inode flags · b253a3e4
      Committed by Ryusuke Konishi
      Nilfs has few restrictions on which flags may be set on which inodes,
      as the ext2/3/4 filesystems used to.  Specifically, DIRSYNC may only
      be set on directories, and IMMUTABLE and APPEND may not be set on
      links.  Tighten that to disallow TOPDIR being set on non-directories
      and to allow only NODUMP and NOATIME on inodes that are neither
      regular files nor directories.
      
      This introduces a flags masking function like those of extN and uses
      it during inode creation.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>