1. 02 5月, 2016 6 次提交
  2. 11 4月, 2016 1 次提交
    • L
      Revert "ext4: allow readdir()'s of large empty directories to be interrupted" · 9f2394c9
      Linus Torvalds 提交于
      This reverts commit 1028b55b.
      
      It's broken: it makes ext4 return an error at an invalid point, causing
      the readdir wrappers to write the the position of the last successful
      directory entry into the position field, which means that the next
      readdir will now return that last successful entry _again_.
      
      You can only return fatal errors (that terminate the readdir directory
      walk) from within the filesystem readdir functions, the "normal" errors
      (that happen when the readdir buffer fills up, for example) happen in
      the iterorator where we know the position of the actual failing entry.
      
      I do have a very different patch that does the "signal_pending()"
      handling inside the iterator function where it is allowable, but while
      that one passes all the sanity checks, I screwed up something like four
      times while emailing it out, so I'm not going to commit it today.
      
      So my track record is not good enough, and the stars will have to align
      better before that one gets committed.  And it would be good to get some
      review too, of course, since celestial alignments are always an iffy
      debugging model.
      
      IOW, let's just revert the commit that caused the problem for now.
      Reported-by: NGreg Thelen <gthelen@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9f2394c9
  3. 09 4月, 2016 7 次提交
  4. 07 4月, 2016 1 次提交
    • F
      Btrfs: fix file/data loss caused by fsync after rename and new inode · 56f23fdb
      Filipe Manana 提交于
      If we rename an inode A (be it a file or a directory), create a new
      inode B with the old name of inode A and under the same parent directory,
      fsync inode B and then power fail, at log tree replay time we end up
      removing inode A completely. If inode A is a directory then all its files
      are gone too.
      
      Example scenarios where this happens:
      This is reproducible with the following steps, taken from a couple of
      test cases written for fstests which are going to be submitted upstream
      soon:
      
         # Scenario 1
      
         mkfs.btrfs -f /dev/sdc
         mount /dev/sdc /mnt
         mkdir -p /mnt/a/x
         echo "hello" > /mnt/a/x/foo
         echo "world" > /mnt/a/x/bar
         sync
         mv /mnt/a/x /mnt/a/y
         mkdir /mnt/a/x
         xfs_io -c fsync /mnt/a/x
         <power failure happens>
      
         The next time the fs is mounted, log tree replay happens and
         the directory "y" does not exist nor do the files "foo" and
         "bar" exist anywhere (neither in "y" nor in "x", nor the root
         nor anywhere).
      
         # Scenario 2
      
         mkfs.btrfs -f /dev/sdc
         mount /dev/sdc /mnt
         mkdir /mnt/a
         echo "hello" > /mnt/a/foo
         sync
         mv /mnt/a/foo /mnt/a/bar
         echo "world" > /mnt/a/foo
         xfs_io -c fsync /mnt/a/foo
         <power failure happens>
      
         The next time the fs is mounted, log tree replay happens and the
         file "bar" does not exists anymore. A file with the name "foo"
         exists and it matches the second file we created.
      
      Another related problem that does not involve file/data loss is when a
      new inode is created with the name of a deleted snapshot and we fsync it:
      
         mkfs.btrfs -f /dev/sdc
         mount /dev/sdc /mnt
         mkdir /mnt/testdir
         btrfs subvolume snapshot /mnt /mnt/testdir/snap
         btrfs subvolume delete /mnt/testdir/snap
         rmdir /mnt/testdir
         mkdir /mnt/testdir
         xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir
         <power failure>
      
         The next time the fs is mounted the log replay procedure fails because
         it attempts to delete the snapshot entry (which has dir item key type
         of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
         resulting in the following error that causes mount to fail:
      
         [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
         [52174.512570] ------------[ cut here ]------------
         [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
         [52174.514681] BTRFS: Transaction aborted (error -2)
         [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
         [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G        W       4.5.0-rc6-btrfs-next-27+ #1
         [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
         [52174.524053]  0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
         [52174.524053]  0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
         [52174.524053]  00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
         [52174.524053] Call Trace:
         [52174.524053]  [<ffffffff81264e93>] dump_stack+0x67/0x90
         [52174.524053]  [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2
         [52174.524053]  [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
         [52174.524053]  [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50
         [52174.524053]  [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs]
         [52174.524053]  [<ffffffff8118f5e9>] ? iput+0xb0/0x284
         [52174.524053]  [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
         [52174.524053]  [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs]
         [52174.524053]  [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs]
         [52174.524053]  [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs]
         [52174.524053]  [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs]
         [52174.524053]  [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs]
         [52174.524053]  [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs]
         [52174.524053]  [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs]
         [52174.524053]  [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs]
         [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
         [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
         [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
         [52174.524053]  [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs]
         [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
         [52174.524053]  [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3
         [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
         [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
         [52174.524053]  [<ffffffff8119590f>] do_mount+0x8a6/0x9e8
         [52174.524053]  [<ffffffff811358dd>] ? strndup_user+0x3f/0x59
         [52174.524053]  [<ffffffff81195c65>] SyS_mount+0x77/0x9f
         [52174.524053]  [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b
         [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---
      
      Fix this by forcing a transaction commit when such cases happen.
      This means we check in the commit root of the subvolume tree if there
      was any other inode with the same reference when the inode we are
      fsync'ing is a new inode (created in the current transaction).
      
      Test cases for fstests, covering all the scenarios given above, were
      submitted upstream for fstests:
      
        * fstests: generic test for fsync after renaming directory
          https://patchwork.kernel.org/patch/8694281/
      
        * fstests: generic test for fsync after renaming file
          https://patchwork.kernel.org/patch/8694301/
      
        * fstests: add btrfs test for fsync after snapshot deletion
          https://patchwork.kernel.org/patch/8670671/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      56f23fdb
  5. 05 4月, 2016 2 次提交
    • K
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov 提交于
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  6. 04 4月, 2016 9 次提交
  7. 03 4月, 2016 1 次提交
  8. 02 4月, 2016 1 次提交
  9. 01 4月, 2016 4 次提交
    • J
      ext4: retry block allocation for failed DIO and DAX writes · e84dfbe2
      Jan Kara 提交于
      Currently if block allocation for DIO or DAX write fails due to ENOSPC,
      we just returned it to userspace. However these ENOSPC errors can be
      transient because the transaction freeing blocks has not yet committed.
      This demonstrates as failures of generic/102 test when the filesystem is
      mounted with 'dax' mount option.
      
      Fix the problem by properly retrying the allocation in case of ENOSPC
      error in get blocks functions used for direct IO.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>                                                                               
      e84dfbe2
    • T
      ext4: add lockdep annotations for i_data_sem · daf647d2
      Theodore Ts'o 提交于
      With the internal Quota feature, mke2fs creates empty quota inodes and
      quota usage tracking is enabled as soon as the file system is mounted.
      Since quotacheck is no longer preallocating all of the blocks in the
      quota inode that are likely needed to be written to, we are now seeing
      a lockdep false positive caused by needing to allocate a quota block
      from inside ext4_map_blocks(), while holding i_data_sem for a data
      inode.  This results in this complaint:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&ei->i_data_sem);
                                      lock(&s->s_dquot.dqio_mutex);
                                      lock(&ei->i_data_sem);
         lock(&s->s_dquot.dqio_mutex);
      
      Google-Bug-Id: 27907753
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      daf647d2
    • M
      orangefs: minimum userspace version is 2.9.3 · 878dfd32
      Martin Brandenburg 提交于
      Version 2.9.4 isn't even released yet.
      Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
      878dfd32
    • M
      orangefs: don't put readdir slot twice · 641bb324
      Martin Brandenburg 提交于
      This was quite an oversight. After a readdir, the module could not be
      unloaded, the number of slots is wrong, and memory near the slot bitmap
      is possibly corrupt. Oops.
      Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
      641bb324
  10. 31 3月, 2016 5 次提交
    • A
      fix the braino in "namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()" · 7500c38a
      Al Viro 提交于
      We should try to trigger automount *before* bailing out on negative dentry.
      Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Reported-by: NArend van Spriel <arend@broadcom.com>
      Tested-by: NArend van Spriel <arend@broadcom.com>
      Tested-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7500c38a
    • T
      ext4: allow readdir()'s of large empty directories to be interrupted · 1028b55b
      Theodore Ts'o 提交于
      If a directory has a large number of empty blocks, iterating over all
      of them can take a long time, leading to scheduler warnings and users
      getting irritated when they can't kill a process in the middle of one
      of these long-running readdir operations.  Fix this by adding checks to
      ext4_readdir() and ext4_htree_fill_tree().
      Reported-by: NBenjamin LaHaise <bcrl@kvack.org>
      Google-Bug-Id: 27880676
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1028b55b
    • F
      btrfs: fix crash/invalid memory access on fsync when using overlayfs · de17e793
      Filipe Manana 提交于
      If the lower or upper directory of an overlayfs mount belong to a btrfs
      file system and we fsync the file through the overlayfs' merged directory
      we ended up accessing an inode that didn't belong to btrfs as if it were
      a btrfs inode at btrfs_sync_file() resulting in a crash like the following:
      
      [ 7782.588845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000544
      [ 7782.590624] IP: [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
      [ 7782.591931] PGD 4d954067 PUD 1e878067 PMD 0
      [ 7782.592016] Oops: 0002 [#6] PREEMPT SMP DEBUG_PAGEALLOC
      [ 7782.592016] Modules linked in: btrfs overlay ppdev crc32c_generic evdev xor raid6_pq psmouse pcspkr sg serio_raw acpi_cpufreq parport_pc parport tpm_tis i2c_piix4 tpm i2c_core processor button loop autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
      [ 7782.592016] CPU: 10 PID: 16437 Comm: xfs_io Tainted: G      D         4.5.0-rc6-btrfs-next-26+ #1
      [ 7782.592016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [ 7782.592016] task: ffff88001b8d40c0 ti: ffff880137488000 task.ti: ffff880137488000
      [ 7782.592016] RIP: 0010:[<ffffffffa030b7ab>]  [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
      [ 7782.592016] RSP: 0018:ffff88013748be40  EFLAGS: 00010286
      [ 7782.592016] RAX: 0000000080000000 RBX: ffff880133b30c88 RCX: 0000000000000001
      [ 7782.592016] RDX: 0000000000000001 RSI: ffffffff8148fec0 RDI: 00000000ffffffff
      [ 7782.592016] RBP: ffff88013748bec0 R08: 0000000000000001 R09: 0000000000000000
      [ 7782.624248] R10: ffff88013748be40 R11: 0000000000000246 R12: 0000000000000000
      [ 7782.624248] R13: 0000000000000000 R14: 00000000009305a0 R15: ffff880015e3be40
      [ 7782.624248] FS:  00007fa83b9cb700(0000) GS:ffff88023ed40000(0000) knlGS:0000000000000000
      [ 7782.624248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 7782.624248] CR2: 0000000000000544 CR3: 00000001fa652000 CR4: 00000000000006e0
      [ 7782.624248] Stack:
      [ 7782.624248]  ffffffff8108b5cc ffff88013748bec0 0000000000000246 ffff8800b005ded0
      [ 7782.624248]  ffff880133b30d60 8000000000000000 7fffffffffffffff 0000000000000246
      [ 7782.624248]  0000000000000246 ffffffff81074f9b ffffffff8104357c ffff880015e3be40
      [ 7782.624248] Call Trace:
      [ 7782.624248]  [<ffffffff8108b5cc>] ? arch_local_irq_save+0x9/0xc
      [ 7782.624248]  [<ffffffff81074f9b>] ? ___might_sleep+0xce/0x217
      [ 7782.624248]  [<ffffffff8104357c>] ? __do_page_fault+0x3c0/0x43a
      [ 7782.624248]  [<ffffffff811a2351>] vfs_fsync_range+0x8c/0x9e
      [ 7782.624248]  [<ffffffff811a237f>] vfs_fsync+0x1c/0x1e
      [ 7782.624248]  [<ffffffff811a24d6>] do_fsync+0x31/0x4a
      [ 7782.624248]  [<ffffffff811a2700>] SyS_fsync+0x10/0x14
      [ 7782.624248]  [<ffffffff81493617>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [ 7782.624248] Code: 85 c0 0f 85 e2 02 00 00 48 8b 45 b0 31 f6 4c 29 e8 48 ff c0 48 89 45 a8 48 8d 83 d8 00 00 00 48 89 c7 48 89 45 a0 e8 fc 43 18 e1 <f0> 41 ff 84 24 44 05 00 00 48 8b 83 58 ff ff ff 48 c1 e8 07 83
      [ 7782.624248] RIP  [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs]
      [ 7782.624248]  RSP <ffff88013748be40>
      [ 7782.624248] CR2: 0000000000000544
      [ 7782.661994] ---[ end trace 721e14960eb939bc ]---
      
      This started happening since commit 4bacc9c9 (overlayfs: Make f_path
      always point to the overlay and f_inode to the underlay) and even though
      after this change we could still access the btrfs inode through
      struct file->f_mapping->host or struct file->f_inode, we would end up
      resulting in more similar issues later on at check_parent_dirs_for_sync()
      because the dentry we got (from struct file->f_path.dentry) was from
      overlayfs and not from btrfs, that is, we had no way of getting the dentry
      that belonged to btrfs (we always got the dentry that belonged to
      overlayfs).
      
      The new patch from Miklos Szeredi, titled "vfs: add file_dentry()" and
      recently submitted to linux-fsdevel, adds a file_dentry() API that allows
      us to get the btrfs dentry from the input file and therefore being able
      to fsync when the upper and lower directories belong to btrfs filesystems.
      
      This issue has been reported several times by users in the mailing list
      and bugzilla. A test case for xfstests is being submitted as well.
      
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101951
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109791Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Cc: stable@vger.kernel.org
      de17e793
    • S
      f2fs: retrieve IO write stat from the right place · b2dde6fc
      Shuoran Liu 提交于
      In the following patch,
      
          f2fs: split journal cache from curseg cache
      
      journal cache is split from curseg cache. So IO write statistics should be
      retrived from journal cache but not curseg->sum_blk. Otherwise, it will
      get 0, and the stat is lost.
      Signed-off-by: NShuoran Liu <liushuoran@huawei.com>
      Reviewed-by: NChao Yu <chao@kernel.org>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b2dde6fc
    • J
      f2fs crypto: fix corrupted symlink in encrypted case · c90e09f7
      Jaegeuk Kim 提交于
      In the encrypted symlink case, we should check its corrupted symname after
      decrypting it.
      Otherwise, we can report -ENOENT incorrectly, if encrypted symname starts with
      '\0'.
      
      Cc: stable 4.5+ <stable@vger.kernel.org>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c90e09f7
  11. 29 3月, 2016 3 次提交