1. 24 May 2016: 6 commits
  2. 02 May 2016: 1 commit
  3. 05 Apr 2016: 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with chunks bigger than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely to.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files; I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
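The first two rewrites in this commit are safe only because PAGE_CACHE_SHIFT always equalled PAGE_SHIFT, so the shift amount is zero. A minimal user-space sketch of that identity follows; the 4 KiB page size and the locally defined macros are assumptions for illustration, not the kernel headers.

```c
#include <assert.h>

/* Assumed values for illustration: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

/* Legacy aliases: in practice they always equalled the PAGE_* values. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE  PAGE_SIZE
#define PAGE_CACHE_MASK  PAGE_MASK

/* The assumption the whole conversion rests on. */
_Static_assert(PAGE_CACHE_SHIFT == PAGE_SHIFT, "page cache chunks must equal pages");

/* Old idiom: <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT), a shift by zero. */
static unsigned long old_shift_left(unsigned long foo)
{
	return foo << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* Old idiom: <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT), also a shift by zero. */
static unsigned long old_shift_right(unsigned long foo)
{
	return foo >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* After the rewrite both expressions collapse to <foo>. */
static unsigned long new_expr(unsigned long foo)
{
	return foo;
}
```

The `_Static_assert` encodes the precondition of the conversion: if the two shifts ever differed, the build would fail instead of silently changing index arithmetic.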
  4. 23 Jan 2016: 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      Wrappers parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      with inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become an rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
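The wrapper pattern this commit introduces (inode_foo(inode) being mutex_foo(&inode->i_mutex)) can be sketched in user space as below. The `struct mutex` here is a toy spinlock built on C11 atomics and the `struct inode` carries only `i_mutex`; both are assumptions, not the kernel definitions, and the `lock_nested` variant is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct mutex (assumption): a spinlock. */
struct mutex {
	atomic_bool locked;
};

static void mutex_lock(struct mutex *m)
{
	while (atomic_exchange(&m->locked, true))
		; /* spin until the previous value was "unlocked" */
}

static void mutex_unlock(struct mutex *m)
{
	atomic_store(&m->locked, false);
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int mutex_trylock(struct mutex *m)
{
	return !atomic_exchange(&m->locked, true);
}

static int mutex_is_locked(struct mutex *m)
{
	return atomic_load(&m->locked);
}

/* Minimal inode carrying only the field the wrappers touch. */
struct inode {
	struct mutex i_mutex;
};

/* inode_foo(inode) is simply mutex_foo(&inode->i_mutex). */
static void inode_lock(struct inode *inode)      { mutex_lock(&inode->i_mutex); }
static void inode_unlock(struct inode *inode)    { mutex_unlock(&inode->i_mutex); }
static int  inode_trylock(struct inode *inode)   { return mutex_trylock(&inode->i_mutex); }
static int  inode_is_locked(struct inode *inode) { return mutex_is_locked(&inode->i_mutex); }
```

Because callers only see the inode_* wrappers, turning ->i_mutex into an rwsem later (as the commit message announces) means changing the wrapper bodies rather than every call site.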
  5. 09 Dec 2015: 1 commit
    • don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro committed
      kmap() in page_follow_link_light() needed to go: allowing an arbitrary
      number of kmaps to be held for a long time is a great way to deadlock
      the system.
      
      A new helper, inode_nohighmem(inode), needs to be used for pagecache
      symlink inodes; this is done for all in-tree cases.
      page_follow_link_light() is instrumented to yell about anything missed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
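The effect of the new inode_nohighmem() helper can be modelled as clearing the highmem bit from the mapping's allocation mask, so symlink pages never need kmap(). The sketch below assumes the helper amounts to setting the mapping's gfp mask to GFP_USER; the flag values, `GFP_HIGHMEM_FLAG`, `may_need_kmap()`, and the struct layouts are made up for illustration.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Made-up bit values; they only mimic the structure of the real flags. */
#define GFP_HIGHMEM_FLAG 0x2u                          /* plays the role of the highmem bit */
#define GFP_USER         0x1u                          /* lowmem-only user allocation */
#define GFP_HIGHUSER     (GFP_USER | GFP_HIGHMEM_FLAG) /* may be served from highmem */

struct address_space {
	gfp_t gfp_mask; /* flags used when allocating page-cache pages */
};

struct inode {
	struct address_space *i_mapping;
};

static void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
{
	m->gfp_mask = mask;
}

/* Assumed shape of the helper: keep this inode's page cache out of highmem,
 * so symlink bodies can be read through page_address() without kmap(). */
static void inode_nohighmem(struct inode *inode)
{
	mapping_set_gfp_mask(inode->i_mapping, GFP_USER);
}

/* Hypothetical check: could a page allocated for this inode land in highmem? */
static int may_need_kmap(const struct inode *inode)
{
	return (inode->i_mapping->gfp_mask & GFP_HIGHMEM_FLAG) != 0;
}
```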
  6. 07 Nov 2015: 1 commit
  7. 24 Jun 2015: 1 commit
  8. 17 Apr 2015: 3 commits
  9. 16 Apr 2015: 1 commit
  10. 12 Apr 2015: 3 commits
  11. 26 Mar 2015: 1 commit
  12. 11 Dec 2014: 1 commit
    • nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races · 705304a8
      Ryusuke Konishi committed
      Same story as in commit 41080b5a ("nfsd race fixes: ext2") (similar
      ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
      of insert_inode_locked(), and a bug in the check for dead inodes needs
      to be fixed.
      
      If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
      insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
      the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
      
      If nilfs_iget() is called before nilfs_new_inode() calls
      insert_inode_locked4(), it will create an in-core inode and read its
      data from the on-disk inode.  But nilfs_iget() will find that i_nlink
      equals zero and fail in nilfs_read_inode_common(), which will lead it
      to call iget_failed() and fail cleanly.
      
      However, this sanity check doesn't work as expected for reused on-disk
      inodes, because they leave a non-zero value in the i_mode field, which
      defeats the i_nlink test.  This patch also fixes that issue by removing
      the i_mode test, which nilfs2 doesn't need.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
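The i_mode problem in this commit can be shown with a small model of the dead-inode check. The struct, the function names, and the `-ESTALE` return value are hypothetical; only the i_nlink/i_mode logic is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: the two on-disk inode fields the check looks at. */
struct disk_inode {
	unsigned int i_nlink;
	unsigned int i_mode;
};

/* Old-style check: the inode counts as dead only if BOTH fields are zero,
 * so a reused inode with a leftover i_mode slips through. */
static int read_inode_check_old(const struct disk_inode *di)
{
	if (di->i_nlink == 0 && di->i_mode == 0)
		return -ESTALE;
	return 0;
}

/* Fixed check: a zero link count alone marks the inode as dead. */
static int read_inode_check_new(const struct disk_inode *di)
{
	if (di->i_nlink == 0)
		return -ESTALE;
	return 0;
}
```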
  13. 14 Oct 2014: 1 commit
    • nilfs2: improve the performance of fdatasync() · b9f66140
      Andreas Rohner committed
      Support for fdatasync() has been implemented in NILFS2 for a long time,
      but whenever the corresponding inode is dirty the implementation falls
      back to a full-fledged sync().  Since every write operation has to
      update the modification time of the file, the inode will almost always
      be dirty and fdatasync() will fall back to sync() most of the time.  But
      this fallback is only necessary for a change of the file size and not
      for a change of the various timestamps.
      
      This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between
      those two situations.
      
       * If it is set the file size was changed and a full sync is necessary.
       * If it is not set then only the timestamps were updated and
         fdatasync() can go ahead.
      
      There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with
      the exact same semantics.  Unfortunately it cannot be used directly,
      because NILFS2 doesn't implement write_inode() and doesn't clear the VFS
      flags when inodes are written out.  So the VFS writeback thread can
      clear I_DIRTY_DATASYNC at any time without notifying NILFS2.  Therefore
      I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in
      nilfs_update_inode().
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
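The decision this commit adds can be sketched as follows: size changes set a sync flag, timestamp-only updates do not, and fdatasync() falls back to a full sync only when the flag is set. `INODE_SYNC_FLAG`, `struct model_inode`, and the helper names are hypothetical stand-ins for NILFS_I_INODE_SYNC and the nilfs2 internals; this is not the actual nilfs2 fsync code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for NILFS_I_INODE_SYNC. */
#define INODE_SYNC_FLAG 0x1u

struct model_inode {
	unsigned int state; /* per-inode private flags */
	bool dirty;         /* inode metadata is dirty */
};

/* Record an inode update; only a size change demands a full sync. */
static void update_inode(struct model_inode *ii, bool size_changed)
{
	ii->dirty = true;
	if (size_changed)
		ii->state |= INODE_SYNC_FLAG;
}

enum sync_kind { SYNC_DATA_ONLY, SYNC_FULL };

/* Decide how much work fsync(datasync) has to do. */
static enum sync_kind choose_sync(const struct model_inode *ii, bool datasync)
{
	if (!ii->dirty)
		return SYNC_DATA_ONLY;
	if (datasync && !(ii->state & INODE_SYNC_FLAG))
		return SYNC_DATA_ONLY; /* only timestamps changed */
	return SYNC_FULL;
}
```

Usage: a plain write() that only bumps mtime leaves fdatasync() on the cheap path, while an extending write forces the full sync.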
  14. 26 Sep 2014: 1 commit
    • nilfs2: fix data loss with mmap() · 56d7acc7
      Andreas Rohner committed
      This bug leads to reproducible silent data loss, despite the use of
      msync(), sync() and a clean unmount of the file system.  It is easily
      reproducible with the following script:
      
        ----------------[BEGIN SCRIPT]--------------------
        mkfs.nilfs2 -f /dev/sdb
        mount /dev/sdb /mnt
      
        dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
      
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
      
        /root/mmaptest/mmaptest /mnt/testfile 30 10 5
      
        sync
        CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
        umount /mnt
      
        echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
        echo "AFTER MMAP:\t$CHECKSUM_AFTER"
        echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
        ----------------[END SCRIPT]--------------------
      
      The mmaptest tool looks something like this (very simplified, with
      error checking removed):
      
        ----------------[BEGIN mmaptest]--------------------
        data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, file_offset);
      
        for (i = 0; i < write_count; ++i) {
              memcpy(data + i * 4096, buf, sizeof(buf));
              msync(data, file_size - file_offset, MS_SYNC);
        }
        ----------------[END mmaptest]--------------------
      
      The output of the script looks something like this:
      
        BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
        AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
        AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      
      So it is clear that the changes done using mmap() do not survive a
      remount.  This can be reproduced 100% of the time.  The problem was
      introduced in commit 136e8770 ("nilfs2: fix issue of
      nilfs_set_page_dirty() for page at EOF boundary").
      
      If the page was read with mpage_readpage() or mpage_readpages(), for
      example, then it has no buffers attached to it.  In that case
      page_has_buffers(page) in nilfs_set_page_dirty() will be false.
      Therefore nilfs_set_file_dirty() is never called and the pages are never
      collected and never written to disk.
      
      This patch fixes the problem by also calling nilfs_set_file_dirty() if the
      page has no buffers attached to it.
      
      [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
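The fix in this commit can be modelled as giving the no-buffers branch of set_page_dirty its own dirty accounting: a page read via mpage_readpage() still covers several filesystem blocks that must be collected. This is a hypothetical user-space sketch; it assumes 4 KiB pages and a count of `1 << (PAGE_SHIFT - i_blkbits)` blocks per page, and `set_file_dirty()` merely stands in for nilfs_set_file_dirty().

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct model_inode {
	unsigned int i_blkbits;     /* log2 of the filesystem block size */
	unsigned long dirty_blocks; /* blocks the segment constructor will collect */
};

struct model_page {
	struct model_inode *host;
	bool has_buffers;                    /* false after mpage_readpage()-style reads */
	unsigned int nr_newly_dirty_buffers; /* supplied by the caller in this sketch */
	bool dirty;
};

/* Hypothetical stand-in for nilfs_set_file_dirty(). */
static void set_file_dirty(struct model_inode *inode, unsigned int nr_dirty)
{
	inode->dirty_blocks += nr_dirty;
}

/* Model of set_page_dirty after the fix; returns true if newly dirtied. */
static bool model_set_page_dirty(struct model_page *page)
{
	bool newly_dirty = !page->dirty;

	page->dirty = true;
	if (page->has_buffers) {
		/* Pre-existing path: account the buffers that were dirtied. */
		if (page->nr_newly_dirty_buffers)
			set_file_dirty(page->host, page->nr_newly_dirty_buffers);
	} else if (newly_dirty) {
		/* The fix: without buffers, count every block the page covers.
		 * Before the fix nothing was accounted here, so the page was
		 * never collected and never written. */
		set_file_dirty(page->host, 1u << (PAGE_SHIFT - page->host->i_blkbits));
	}
	return newly_dirty;
}
```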
  15. 07 May 2014: 3 commits
  16. 04 Apr 2014: 1 commit
    • mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner committed
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
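The AS_EXITING protocol in this commit can be sketched as a single-threaded model: once the freeing path has set the flag, reclaim still removes pages but stops leaving shadow entries, so the final truncate really empties the tree. The struct and helper names are hypothetical, the radix tree is reduced to two counters, and the tree lock the commit mentions is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an address_space's page tree: counts only. */
struct model_mapping {
	bool exiting;            /* plays the role of the AS_EXITING flag */
	unsigned long nrpages;   /* real pages in the tree */
	unsigned long nrshadows; /* shadow entries left behind by reclaim */
};

/* Inode freeing, step 1: set the flag (under the tree lock in the kernel). */
static void set_exiting(struct model_mapping *m)
{
	m->exiting = true;
}

/* Inode freeing, step 2: the final truncate drops pages and shadows. */
static void truncate_all(struct model_mapping *m)
{
	m->nrpages = 0;
	m->nrshadows = 0;
}

/* Reclaim evicting one page found on the LRU: leave a shadow entry in
 * its slot unless the mapping is being torn down. */
static bool reclaim_page(struct model_mapping *m)
{
	if (m->nrpages == 0)
		return false;
	m->nrpages--;
	if (!m->exiting)
		m->nrshadows++;
	return true;
}

static bool tree_empty(const struct model_mapping *m)
{
	return m->nrpages == 0 && m->nrshadows == 0;
}
```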
  17. 13 Sep 2013: 1 commit
  18. 04 Jul 2013: 1 commit
  19. 25 May 2013: 1 commit
    • nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary · 136e8770
      Ryusuke Konishi committed
      nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary
      
      DESCRIPTION:
      There are use cases in which a NILFS2 file system (formatted with a block
      size smaller than 4 KB) can be remounted in RO mode after encountering a
      "broken bmap" issue.
      
      The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
       "The machine I've been trialling nilfs on is running Debian Testing,
        Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
        version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
        also reproduced it (identically) with Debian Unstable amd64 and Debian
        Experimental (using the 3.8-trunk kernel).  The problematic partitions
        were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
      
      SYMPTOMS:
      (1) The system log contains error messages like these:
      
          [63102.496756] nilfs_direct_assign: invalid pointer: 0
          [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
          [63102.496798]
          [63102.524403] Remounting filesystem read-only
      
      (2) The NILFS2 file system is remounted in RO mode.
      
      REPRODUCTION PATH:
      (1) Create a volume group named "unencrypted" with the vgcreate utility.
      (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
      
      ----------------[BEGIN SCRIPT]--------------------
      
      VG=unencrypted
      lvcreate --size 2G --name ntest $VG
      mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
      mkdir /var/tmp/n
      mkdir /var/tmp/n/ntest
      mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
      mkdir /var/tmp/n/ntest/thedir
      cd /var/tmp/n/ntest/thedir
      sleep 2
      date
      darcs init
      sleep 2
      dmesg|tail -n 5
      date
      darcs whatsnew || true
      date
      sleep 2
      dmesg|tail -n 5
      ----------------[END SCRIPT]--------------------
      
      REPRODUCIBILITY: 100%
      
      INVESTIGATION:
      As was discovered, the issue takes place during segment
      construction after the following sequence of user-space operations:
      
        open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
        fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
        ftruncate(7, 60)
      
      The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
      bmap (inode number=28)" occurs because of an attempt to get the block
      number of the block at logical offset 3072 bytes (block index 3) of the
      file.  As the output above shows, the whole file is only 60 bytes long,
      so allocating a single block (1 KB in size) is enough for the whole
      file.  The attempt to operate on several blocks instead of one happens
      because nilfs_segctor_scan_file() discovers several dirty buffers for
      this file.
      
      The root cause of this issue is in the nilfs_set_page_dirty() function,
      which is called just before writing to an mmapped page.

      When nilfs_page_mkwrite() handles a page at the EOF boundary, it fills
      hole blocks only inside EOF through __block_page_mkwrite().

      The __block_page_mkwrite() function calls set_page_dirty() after filling
      hole blocks, so nilfs_set_page_dirty() (=
      a_ops->set_page_dirty) is called.  However, the current implementation
      of nilfs_set_page_dirty() wrongly marks all buffers dirty even for a
      page at the EOF boundary.

      As a result, buffers outside EOF are inconsistently marked dirty and
      queued for write even though they are not mapped by the
      nilfs_get_block() function.
      
      FIX:
      This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
      
      Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
      for this issue.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
      Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
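The EOF-boundary bug in this commit can be reproduced in a toy model using the numbers from its investigation: 1 KB blocks in a 4 KB page and a 60-byte file. Only block 0 lies inside EOF, so only it gets mapped; the old behaviour dirties all four buffers (including block index 3 at offset 3072), the fixed one only the mapped buffer. The buffer struct and helpers are hypothetical, and the 4 KB page size is an assumption; the real code walks the page's buffer_head ring.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

struct model_buffer {
	bool mapped; /* block was mapped by get_block (lies inside EOF) */
	bool dirty;
};

/* Simulate page_mkwrite on a page at EOF: only blocks inside i_size are
 * mapped; holes beyond EOF stay unmapped. */
static void map_blocks_inside_eof(struct model_buffer *bufs, unsigned int n,
				  unsigned int blocksize, unsigned long i_size)
{
	for (unsigned int i = 0; i < n; i++) {
		bufs[i].mapped = (unsigned long)i * blocksize < i_size;
		bufs[i].dirty = false;
	}
}

/* Mark a page's buffers dirty and return how many were newly dirtied.
 * mapped_only = false mimics the old behaviour, true the fixed one. */
static unsigned int mark_buffers_dirty(struct model_buffer *bufs, unsigned int n,
				       bool mapped_only)
{
	unsigned int nr = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (bufs[i].dirty || (mapped_only && !bufs[i].mapped))
			continue; /* skip already-dirty and (when fixed) hole buffers */
		bufs[i].dirty = true;
		nr++;
	}
	return nr;
}
```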
  20. 08 May 2013: 1 commit
  21. 01 May 2013: 2 commits
    • nilfs2: remove unneeded test in nilfs_writepage() · eb53b6db
      Vyacheslav Dubeyko committed
      page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
      unneeded test.
      
      This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
      error: we previously assumed 'inode' could be null (see line 195)".
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix issue with flush kernel thread after remount in RO mode because of... · 8c26c4e2
      Vyacheslav Dubeyko committed
      nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption
      
      The NILFS2 driver remounts itself in RO mode in the case of discovering
      metadata corruption (for example, discovering a broken bmap).  But
      usually, this takes place when there have been file system operations
      before remounting in RO mode.
      
      Thereby, the NILFS2 driver can be in RO mode while dirty pages are still
      present in the address spaces of modified inodes.  This results in the
      flush kernel thread endlessly trying to flush dirty pages in RO mode.  As
      a result, side effects such as these can be seen: (1) the flush kernel
      thread occupies 50% - 99% of CPU time; (2) the system can't be shut down
      without manually switching the power off.
      
      SYMPTOMS:
      (1) The system log contains the error message: "Remounting filesystem read-only".
      (2) The flush kernel thread occupies 50% - 99% of CPU time.
      (3) The system can't be shut down without manually switching the power off.
      
      REPRODUCTION PATH:
      (1) Create a volume group named "unencrypted" with the vgcreate utility.
      (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
      
        ----------------[BEGIN SCRIPT]--------------------
        #!/bin/bash
      
        VG=unencrypted
        #apt-get install nilfs-tools darcs
        lvcreate --size 2G --name ntest $VG
        mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
        mkdir /var/tmp/n
        mkdir /var/tmp/n/ntest
        mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
        mkdir /var/tmp/n/ntest/thedir
        cd /var/tmp/n/ntest/thedir
        sleep 2
        date
        darcs init
        sleep 2
        dmesg|tail -n 5
        date
        darcs whatsnew || true
        date
        sleep 2
        dmesg|tail -n 5
        ----------------[END SCRIPT]--------------------
      
      (3) Try to shut down the system.
      
      REPRODUCIBILITY: 100%
      
      FIX:
      
      This patch implements a check of the NILFS2 driver's mount state in
      nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page().  If
      the RO mount state is detected, all dirty pages are simply discarded
      and a warning message is written to the system log.
      
      [akpm@linux-foundation.org: fix printk warning]
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
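The livelock this commit fixes, and the fix itself, can be modelled with a toy flusher that keeps calling writepage until no dirty page remains. With `discard_on_ro` false, a read-only mount leaves pages dirty forever (the toy gives up after `max_rounds`, where the real flush thread just keeps spinning); with it true, pages are dropped with a warning, as the commit describes. All names are hypothetical, and the pre-fix behaviour is simplified to "the page stays dirty".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct model_sb {
	bool read_only;
};

struct model_page {
	bool dirty;
	bool written;
};

/* One writepage call. With discard_on_ro, a read-only mount drops the
 * dirty state and warns; without it, the page simply stays dirty. */
static int model_writepage(const struct model_sb *sb, struct model_page *page,
			   bool discard_on_ro)
{
	if (!page->dirty)
		return 0;
	if (sb->read_only) {
		if (!discard_on_ro)
			return -EROFS; /* page stays dirty and is retried */
		fprintf(stderr, "warning: discarding dirty page on read-only mount\n");
		page->dirty = false; /* discarded, never written */
		return 0;
	}
	page->written = true;
	page->dirty = false;
	return 0;
}

/* Toy flusher: returns the round in which all pages became clean, or -1
 * if they were still dirty after max_rounds (the pre-fix livelock). */
static int flush(const struct model_sb *sb, struct model_page *pages, int n,
		 int max_rounds, bool discard_on_ro)
{
	for (int round = 1; round <= max_rounds; round++) {
		bool any_dirty = false;

		for (int i = 0; i < n; i++) {
			(void)model_writepage(sb, &pages[i], discard_on_ro);
			if (pages[i].dirty)
				any_dirty = true;
		}
		if (!any_dirty)
			return round;
	}
	return -1;
}
```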
  22. 21 Dec 2012: 1 commit
  23. 21 Sep 2012: 1 commit
  24. 31 Jul 2012: 1 commit
  25. 06 May 2012: 1 commit
  26. 04 Jan 2012: 1 commit
  27. 02 Nov 2011: 2 commits