1. 09 Jun, 2019 1 commit
    • F
      Btrfs: fix wrong ctime and mtime of a directory after log replay · 7301bbea
      Filipe Manana committed
      commit 5338e43abbab13791144d37fd8846847062351c6 upstream.
      
      When replaying a log that contains a new file or directory name that needs
      to be added to its parent directory, we end up updating the mtime and the
      ctime of the parent directory to the current time after we have set their
      values to the correct ones (set at fsync time), effectively losing them.
      
      Sample reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ touch /mnt/dir/file
      
        # fsync of the directory is optional, not needed
        $ xfs_io -c fsync /mnt/dir
        $ xfs_io -c fsync /mnt/dir/file
      
        $ stat -c %Y /mnt/dir
        1557856079
      
        <power failure>
      
        $ sleep 3
        $ mount /dev/sdb /mnt
        $ stat -c %Y /mnt/dir
        1557856082
      
          --> should have been 1557856079, the mtime is updated to the current
              time when replaying the log
      
      Fix this by not updating the mtime and ctime to the current time at
      btrfs_add_link() when we are replaying a log tree.
      
      This could be triggered by my recent fsync fuzz tester for fstests, for
      which an fstests patch exists titled "fstests: generic, fsync fuzz tester
      with fsstress".
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7301bbea
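The fix described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `Dir`, `add_link` and the `replaying_log` flag are hypothetical names that mirror the idea of skipping the timestamp update in btrfs_add_link() while a log tree is being replayed. The timestamps are the ones from the reproducer.

```python
from dataclasses import dataclass


@dataclass
class Dir:
    """Hypothetical stand-in for a parent directory inode."""
    mtime: int
    ctime: int


def add_link(parent: Dir, now: int, replaying_log: bool) -> None:
    """Model of adding a new name to `parent`.

    In normal operation the parent's mtime/ctime move to `now`.
    During log replay they are left alone, because replay has already
    restored the values recorded at fsync time.
    """
    if not replaying_log:
        parent.mtime = now
        parent.ctime = now


# Timestamps taken from the reproducer in the commit message.
fsync_time, replay_time = 1557856079, 1557856082

fixed = Dir(mtime=fsync_time, ctime=fsync_time)
add_link(fixed, now=replay_time, replaying_log=True)

buggy = Dir(mtime=fsync_time, ctime=fsync_time)
add_link(buggy, now=replay_time, replaying_log=False)  # behaviour before the fix
```

With the flag set, the directory keeps the mtime recorded at fsync time; without it, the mtime jumps to the replay time, which is the symptom shown by `stat -c %Y`.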
  2. 13 Feb, 2019 1 commit
    • E
      btrfs: use tagged writepage to mitigate livelock of snapshot · f5d5b543
      Ethan Lien committed
      [ Upstream commit 3cd24c698004d2f7668e0eb9fc1f096f533c791b ]
      
      Snapshot is expected to be fast. But if there are writers steadily
      creating dirty pages in our subvolume, the snapshot may take a very long
      time to complete. To fix the problem, we use tagged writepage for
      snapshot flusher as we do in the generic write_cache_pages(), so we can
      omit pages dirtied after the snapshot command.
      
      This does not change the semantics regarding which data gets into the
      snapshot when pages are being dirtied during the snapshotting
      operation. A sync is called before the snapshot is taken in both the old
      and the new case; any IO in flight just after that may end up in the
      snapshot, but this depends on other system effects that might still sync
      the IO.
      
      We do a simple snapshot speed test on an Intel D-1531 box:
      
      fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
      --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
      --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
      time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
      
      original: 1m58sec
      patched:  6.54sec
      
      This is the best case for this patch since for a sequential write case,
      we omit nearly all pages dirtied after the snapshot command.
      
      For a multi-writer, random write test:
      
      fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
      --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
      --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
      time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
      
      original: 15.83sec
      patched:  10.35sec
      
      The improvement is smaller compared to the sequential write case,
      since we omit only half of the pages dirtied after snapshot command.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Ethan Lien <ethanlien@synology.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f5d5b543
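The effect of tagged writeback described in this commit can be sketched with a toy model. It assumes a hypothetical `flush` function in which a writer dirties one new page for every page written; it is not the kernel's write_cache_pages() logic, only an illustration of why writing just the pages that were dirty when the flush started bounds the work.

```python
def flush(dirty_pages, incoming, tagged):
    """Write back pages and return how many were written.

    dirty_pages: page indexes that are dirty when the flush starts
    incoming:    page indexes dirtied while the flush runs (one per written page)
    tagged:      if True, only the pages dirty at the start are written
    """
    dirty = set(dirty_pages)
    # Tagged mode works on a copy taken at the start; untagged mode works on
    # the live set, so newly dirtied pages keep extending the flush.
    to_write = set(dirty) if tagged else dirty
    incoming = iter(incoming)
    written = 0
    while to_write:
        to_write.pop()
        written += 1
        new_page = next(incoming, None)
        if new_page is not None:
            dirty.add(new_page)
    return written


tagged_count = flush(range(4), range(100, 120), tagged=True)
untagged_count = flush(range(4), range(100, 120), tagged=False)
```

In this model the tagged flush writes only the 4 pages that were dirty at the start, while the untagged flush also writes all 20 pages dirtied in the meantime.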
  3. 26 Jan, 2019 1 commit
  4. 23 Jan, 2019 1 commit
    • D
      Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io" · 4675f90e
      David Sterba committed
      commit 77b7aad195099e7c6da11e94b7fa6ef5e6fb0025 upstream.
      
      This reverts commit e73e81b6.
      
      This patch causes a few problems:
      
      - adds latency to btrfs_finish_ordered_io
      - as btrfs_finish_ordered_io is used for free space cache, generating
        more work from btrfs_btree_balance_dirty_nodelay could end up in the
        same workqueue, effectively deadlocking
      
      12260 kworker/u96:16+btrfs-freespace-write D
      [<0>] balance_dirty_pages+0x6e6/0x7ad
      [<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
      [<0>] btrfs_finish_ordered_io+0x3da/0x770
      [<0>] normal_work_helper+0x1c5/0x5a0
      [<0>] process_one_work+0x1ee/0x5a0
      [<0>] worker_thread+0x46/0x3d0
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      Transaction commit will wait on the freespace cache:
      
      838 btrfs-transacti D
      [<0>] btrfs_start_ordered_extent+0x154/0x1e0
      [<0>] btrfs_wait_ordered_range+0xbd/0x110
      [<0>] __btrfs_wait_cache_io+0x49/0x1a0
      [<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
      [<0>] commit_cowonly_roots+0x215/0x2b0
      [<0>] btrfs_commit_transaction+0x37e/0x910
      [<0>] transaction_kthread+0x14d/0x180
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      And then writepages ends up waiting on transaction commit:
      
      9520 kworker/u96:13+flush-btrfs-1 D
      [<0>] wait_current_trans+0xac/0xe0
      [<0>] start_transaction+0x21b/0x4b0
      [<0>] cow_file_range_inline+0x10b/0x6b0
      [<0>] cow_file_range.isra.69+0x329/0x4a0
      [<0>] run_delalloc_range+0x105/0x3c0
      [<0>] writepage_delalloc+0x119/0x180
      [<0>] __extent_writepage+0x10c/0x390
      [<0>] extent_write_cache_pages+0x26f/0x3d0
      [<0>] extent_writepages+0x4f/0x80
      [<0>] do_writepages+0x17/0x60
      [<0>] __writeback_single_inode+0x59/0x690
      [<0>] writeback_sb_inodes+0x291/0x4e0
      [<0>] __writeback_inodes_wb+0x87/0xb0
      [<0>] wb_writeback+0x3bb/0x500
      [<0>] wb_workfn+0x40d/0x610
      [<0>] process_one_work+0x1ee/0x5a0
      [<0>] worker_thread+0x1e0/0x3d0
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      Eventually, we have every process in the system waiting on
      balance_dirty_pages(), and nobody is able to make progress on page
      writeback.
      
      The original patch tried to fix an OOM condition that happened on 4.4, but
      attempts to reproduce it on later kernels (4.19 and 4.20) were not
      successful. This is more likely a problem in OOM itself.
      
      Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
      Reported-by: Chris Mason <clm@fb.com>
      CC: stable@vger.kernel.org # 4.18+
      CC: ethanlien <ethanlien@synology.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4675f90e
  5. 10 Jan, 2019 2 commits
    • F
      Btrfs: fix fsync of files with multiple hard links in new directories · 10b04210
      Filipe Manana committed
      commit 41bd60676923822de1df2c50b3f9a10171f4338a upstream.
      
      The log tree has a long standing problem that when a file is fsync'ed we
      only check for new ancestors, created in the current transaction, by
      following only the hard link for which the fsync was issued. We follow the
      ancestors using the VFS' dget_parent() API. This means that if we create a
      new link for a file in a directory that is new (or in any other new
      ancestor directory) and then fsync the file using an old hard link, we end
      up not logging the new ancestor, and on log replay that new hard link and
      ancestor do not exist. In some cases, involving renames, the file will not
      exist at all.
      
      Example:
      
        mkfs.btrfs -f /dev/sdb
        mount /dev/sdb /mnt
      
        mkdir /mnt/A
        touch /mnt/foo
        ln /mnt/foo /mnt/A/bar
        xfs_io -c fsync /mnt/foo
      
        <power failure>
      
      In this example after log replay only the hard link named 'foo' exists
      and directory A does not exist, which is unexpected. In other major Linux
      filesystems, such as ext4, xfs and f2fs, both hard links exist
      and so does directory A after mounting the filesystem again.
      
      Checking whether any ancestors are new and need to be logged was added in
      2009 by commit 12fcfd22 ("Btrfs: tree logging unlink/rename fixes"),
      however only for the ancestors of the hard link (dentry) for which the
      fsync was issued, instead of checking for all ancestors for all of the
      inode's hard links.
      
      So fix this by tracking the id of the last transaction where a hard link
      was created for an inode and then on fsync fall back to a full transaction
      commit when an inode has more than one hard link and at least one new hard
      link was created in the current transaction. This is the simplest solution
      since this is not a common use case (frequently adding hard links for
      which there's an ancestor created in the current transaction and then
      fsyncing the file). In case it ever becomes a common use case, a solution
      that consists of iterating the fs/subvol btree for each hard link and
      checking if any ancestor is new could be implemented.
      
      This solves many unexpected scenarios reported by Jayashree Mohan and
      Vijay Chidambaram, and for which there is a new test case for fstests
      under review.
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      CC: stable@vger.kernel.org # 4.4+
      Reported-by: Vijay Chidambaram <vvijay03@gmail.com>
      Reported-by: Jayashree Mohan <jayashree2912@gmail.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      10b04210
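The fallback rule described in this commit (more than one hard link, and at least one link created in the current transaction) can be sketched as a small predicate. This is a user-space illustration only: the `Inode` class, the field name `last_link_trans`, the helper `fsync_needs_full_commit` and the transaction ids are assumptions made for the example, not the kernel's definitions.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    """Hypothetical in-memory inode, reduced to the two fields the check needs."""
    nlink: int            # number of hard links
    last_link_trans: int  # id of the last transaction that added a hard link


def fsync_needs_full_commit(inode: Inode, last_committed_trans: int) -> bool:
    """Return True when fsync should fall back to a full transaction commit."""
    has_new_link = inode.last_link_trans > last_committed_trans
    return inode.nlink > 1 and has_new_link


# Illustrative ids: the last committed transaction is 6, the running one is 7.
foo = Inode(nlink=2, last_link_trans=7)       # like 'foo' after 'ln foo A/bar'
old_file = Inode(nlink=2, last_link_trans=3)  # both links created long ago
single = Inode(nlink=1, last_link_trans=7)    # only one name exists
```

Only `foo` would take the full-commit path; files whose links all predate the current transaction, or that have a single name, keep using the fast log-based fsync.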
    • L
      btrfs: skip file_extent generation check for free_space_inode in run_delalloc_nocow · 7708a830
      Lu Fengqi committed
      commit 27a7ff554e8d349627a90bda275c527b7348adae upstream.
      
      The test case btrfs/001 with inode_cache mount option will encounter the
      following warning:
      
        WARNING: CPU: 1 PID: 23700 at fs/btrfs/inode.c:956 cow_file_range.isra.19+0x32b/0x430 [btrfs]
        CPU: 1 PID: 23700 Comm: btrfs Kdump: loaded Tainted: G        W  O      4.20.0-rc4-custom+ #30
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:cow_file_range.isra.19+0x32b/0x430 [btrfs]
        Call Trace:
         ? free_extent_buffer+0x46/0x90 [btrfs]
         run_delalloc_nocow+0x455/0x900 [btrfs]
         btrfs_run_delalloc_range+0x1a7/0x360 [btrfs]
         writepage_delalloc+0xf9/0x150 [btrfs]
         __extent_writepage+0x125/0x3e0 [btrfs]
         extent_write_cache_pages+0x1b6/0x3e0 [btrfs]
         ? __wake_up_common_lock+0x63/0xc0
         extent_writepages+0x50/0x80 [btrfs]
         do_writepages+0x41/0xd0
         ? __filemap_fdatawrite_range+0x9e/0xf0
         __filemap_fdatawrite_range+0xbe/0xf0
         btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
         __btrfs_write_out_cache+0x42c/0x480 [btrfs]
         btrfs_write_out_ino_cache+0x84/0xd0 [btrfs]
         btrfs_save_ino_cache+0x551/0x660 [btrfs]
         commit_fs_roots+0xc5/0x190 [btrfs]
         btrfs_commit_transaction+0x2bf/0x8d0 [btrfs]
         btrfs_mksubvol+0x48d/0x4d0 [btrfs]
         btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs]
         btrfs_ioctl_snap_create_v2+0x124/0x180 [btrfs]
         btrfs_ioctl+0x123f/0x3030 [btrfs]
      
      The file extent generation of the free space inode is equal to the last
      snapshot of the file root, so the inode will be passed to cow_file_range.
      But the inode was created and its extents were preallocated in
      btrfs_save_ino_cache, so there are no cow copies on disk.
      
      The preallocated extent is not yet in the extent tree, and
      btrfs_cross_ref_exist will ignore the -ENOENT returned by
      check_committed_ref, so we can directly write the inode to the disk.
      
      Fixes: 78d4295b ("btrfs: lift some btrfs_cross_ref_exist checks in nocow path")
      CC: stable@vger.kernel.org # 4.18+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7708a830
  6. 21 Nov, 2018 1 commit
    • R
      Btrfs: fix cur_offset in the error case for nocow · db39065c
      Robbie Ko committed
      commit 506481b20e818db40b6198815904ecd2d6daee64 upstream.
      
      When the cow_file_range fails, the related resources are unlocked
      according to the range [start..end), so the unlock cannot be repeated in
      run_delalloc_nocow.
      
      In some cases (e.g. cur_offset <= end && cow_start != -1), cur_offset is
      not updated correctly, so move the cur_offset update before
      cow_file_range.
      
        kernel BUG at mm/page-writeback.c:2663!
        Internal error: Oops - BUG: 0 [#1] SMP
        CPU: 3 PID: 31525 Comm: kworker/u8:7 Tainted: P O
        Hardware name: Realtek_RTD1296 (DT)
        Workqueue: writeback wb_workfn (flush-btrfs-1)
        task: ffffffc076db3380 ti: ffffffc02e9ac000 task.ti: ffffffc02e9ac000
        PC is at clear_page_dirty_for_io+0x1bc/0x1e8
        LR is at clear_page_dirty_for_io+0x14/0x1e8
        pc : [<ffffffc00033c91c>] lr : [<ffffffc00033c774>] pstate: 40000145
        sp : ffffffc02e9af4f0
        Process kworker/u8:7 (pid: 31525, stack limit = 0xffffffc02e9ac020)
        Call trace:
        [<ffffffc00033c91c>] clear_page_dirty_for_io+0x1bc/0x1e8
        [<ffffffbffc514674>] extent_clear_unlock_delalloc+0x1e4/0x210 [btrfs]
        [<ffffffbffc4fb168>] run_delalloc_nocow+0x3b8/0x948 [btrfs]
        [<ffffffbffc4fb948>] run_delalloc_range+0x250/0x3a8 [btrfs]
        [<ffffffbffc514c0c>] writepage_delalloc.isra.21+0xbc/0x1d8 [btrfs]
        [<ffffffbffc516048>] __extent_writepage+0xe8/0x248 [btrfs]
        [<ffffffbffc51630c>] extent_write_cache_pages.isra.17+0x164/0x378 [btrfs]
        [<ffffffbffc5185a8>] extent_writepages+0x48/0x68 [btrfs]
        [<ffffffbffc4f5828>] btrfs_writepages+0x20/0x30 [btrfs]
        [<ffffffc00033d758>] do_writepages+0x30/0x88
        [<ffffffc0003ba0f4>] __writeback_single_inode+0x34/0x198
        [<ffffffc0003ba6c4>] writeback_sb_inodes+0x184/0x3c0
        [<ffffffc0003ba96c>] __writeback_inodes_wb+0x6c/0xc0
        [<ffffffc0003bac20>] wb_writeback+0x1b8/0x1c0
        [<ffffffc0003bb0f0>] wb_workfn+0x150/0x250
        [<ffffffc0002b0014>] process_one_work+0x1dc/0x388
        [<ffffffc0002b02f0>] worker_thread+0x130/0x500
        [<ffffffc0002b6344>] kthread+0x10c/0x110
        [<ffffffc000284590>] ret_from_fork+0x10/0x40
        Code: d503201f a9025bb5 a90363b7 f90023b9 (d4210000)
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      db39065c
  7. 14 Nov, 2018 3 commits
  8. 23 Aug, 2018 1 commit
    • F
      Btrfs: sync log after logging new name · d4682ba0
      Filipe Manana committed
      When we add a new name for an inode which was logged in the current
      transaction, we update the inode in the log so that its new name and
      ancestors are added to the log. However when we do this we do not persist
      the log, so the changes remain in memory only, and as a consequence, any
      ancestors that were created in the current transaction are updated such
      that future calls to btrfs_inode_in_log() return true. This leads to a
      subsequent fsync against such new ancestor directories returning
      immediately, without persisting the log, therefore after a power failure
      the new ancestor directories do not exist, despite fsync being called
      against them explicitly.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/A
        $ mkdir /mnt/B
        $ mkdir /mnt/A/C
        $ touch /mnt/B/foo
        $ xfs_io -c "fsync" /mnt/B/foo
        $ ln /mnt/B/foo /mnt/A/C/foo
        $ xfs_io -c "fsync" /mnt/A
        <power failure>
      
      After the power failure, directory "A" does not exist, despite the explicit
      fsync on it.
      
      Instead of fixing this by changing the behaviour of the explicit fsync on
      directory "A" to persist the log instead of doing nothing, make the logging
      of the new file name (which happens when creating a hard link or renaming)
      persist the log. This approach is not only simpler, as it does not require
      adding new fields to the in-memory inode structure, but also gives us the
      same behaviour as ext4, xfs and f2fs (possibly other filesystems too).
      
      A test case for fstests follows soon.
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      Reported-by: Vijay Chidambaram <vvijay03@gmail.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d4682ba0
  9. 18 Aug, 2018 1 commit
    • R
      Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space · 8ecebf4d
      Robbie Ko committed
      Commit e9894fd3 ("Btrfs: fix snapshot vs nocow writting") forced
      nocow writes to fall back to COW, during writeback, when a snapshot is
      created. This caused writes made before creating the snapshot to
      unexpectedly fail with ENOSPC during writeback even though success (0) had
      been returned to user space through the write system call.
      
      The steps leading to this problem are:
      
      1. When it's not possible to allocate data space for a write, the
         buffered write path checks if a NOCOW write is possible.  If it is,
         it will not reserve space and success (0) is returned to user space.
      
      2. Then when a snapshot is created, the root's will_be_snapshotted
         atomic is incremented and writeback is triggered for all inodes that
         belong to the root being snapshotted. Incrementing that atomic forces
         all previous writes to fall back to COW during writeback (running
         delalloc).
      
      3. This causes the writeback for the inodes to fail and therefore sets
         the ENOSPC error in their mappings, so that a subsequent
         fsync on them will report the error to user space. So it's not a
         completely silent data loss (since fsync will report ENOSPC) but it's
         a very unexpected and undesirable behaviour, because if a clean
         shutdown/unmount of the filesystem happens without previous calls to
         fsync, it is expected to have the data present in the files after
         mounting the filesystem again.
      
      So fix this by adding a new atomic named snapshot_force_cow to the
      root structure which prevents this behaviour and works the following way:
      
      1. It is incremented when we start to create a snapshot after triggering
         writeback and before waiting for writeback to finish.
      
      2. This new atomic is now what is used by writeback (running delalloc)
         to decide whether we need to fall back to COW or not. Because we
         incremented this new atomic after triggering writeback in the
         snapshot creation ioctl, we ensure that all buffered writes that
         happened before snapshot creation will succeed and not fall back to
         COW (which would make them fail with ENOSPC).
      
      3. The existing atomic, will_be_snapshotted, is kept because it is used
         to force new buffered writes, that start after we started
         snapshotting, to reserve data space even when NOCOW is possible.
         This makes these writes fail early with ENOSPC when there's no
         available space to allocate, preventing the unexpected behaviour of
         writeback later failing with ENOSPC due to a fallback to COW mode.
      
      Fixes: e9894fd3 ("Btrfs: fix snapshot vs nocow writting")
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8ecebf4d
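The division of labour between the two atomics in this commit can be sketched as follows. This is a simplified user-space model: the `Root` class and the two helper functions are hypothetical, and plain integers stand in for the kernel atomics named will_be_snapshotted and snapshot_force_cow.

```python
from dataclasses import dataclass


@dataclass
class Root:
    """Hypothetical stand-in for the root structure with its two counters."""
    will_be_snapshotted: int = 0
    snapshot_force_cow: int = 0


def new_write_may_skip_reservation(root: Root, nocow_possible: bool) -> bool:
    """New buffered writes may skip reserving data space only if NOCOW is
    possible and no snapshot is pending."""
    return nocow_possible and root.will_be_snapshotted == 0


def writeback_must_cow(root: Root) -> bool:
    """Writeback (running delalloc) falls back to COW only once the new
    counter has been raised."""
    return root.snapshot_force_cow > 0


root = Root()
root.will_be_snapshotted += 1  # snapshot creation starts
# Writeback for writes made before the snapshot is triggered at this point.
early_writeback_cow = writeback_must_cow(root)
root.snapshot_force_cow += 1   # raised only after triggering writeback
late_writeback_cow = writeback_must_cow(root)
new_write_skips = new_write_may_skip_reservation(root, nocow_possible=True)
```

In this model, writeback triggered before the second counter is raised still uses NOCOW, so earlier writes cannot fail with ENOSPC, while writes that start after snapshotting began must reserve space up front.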
  10. 06 Aug, 2018 18 commits
  11. 04 Aug, 2018 1 commit
  12. 28 Jun, 2018 1 commit
    • C
      Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion · 717beb96
      Chris Mason committed
      The vm_fault_t conversion commit introduced a ret2 variable for tracking
      the integer return values from internal btrfs functions.  It was
      sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
      and had been removed from the radix.  Something like this:
      
          ret2 = btrfs_delalloc_reserve_space() // returns zero on success

          lock_page(page)
          if (page->mapping != inode->i_mapping)
              goto out_unlock;

          ...

      out_unlock:
          if (!ret2) {
              ...
              return VM_FAULT_LOCKED;
          }
      
      This ends up triggering this WARNING in btrfs_destroy_inode()
          WARN_ON(BTRFS_I(inode)->block_rsv.size);
      
      xfstests generic/095 was able to reliably reproduce the errors.
      
      Since out_unlock: is only used for errors, this fix moves it below the
      if (!ret2) check we use to return VM_FAULT_LOCKED for success.
      
      Fixes: a528a241 ("btrfs: change return type of btrfs_page_mkwrite to vm_fault_t")
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      717beb96
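The control-flow problem described in this commit can be modelled without goto. The sketch below is a heavy simplification with hypothetical function names; strings stand in for the vm_fault_t codes, and the space reservation is assumed to have succeeded (ret2 == 0) in both variants.

```python
VM_FAULT_LOCKED = "VM_FAULT_LOCKED"  # success: page is returned locked
VM_FAULT_NOPAGE = "VM_FAULT_NOPAGE"  # page went away, the fault is retried


def mkwrite_before_fix(page_still_mapped: bool) -> str:
    ret = VM_FAULT_NOPAGE
    ret2 = 0  # space reservation succeeded
    if page_still_mapped:
        pass  # ... normal work, then fall through to out_unlock ...
    # out_unlock: shared by the success path and the invalid-page path,
    # so an invalid page is reported as locked.
    if ret2 == 0:
        return VM_FAULT_LOCKED
    return ret


def mkwrite_after_fix(page_still_mapped: bool) -> str:
    ret = VM_FAULT_NOPAGE
    ret2 = 0  # space reservation succeeded
    if page_still_mapped:
        # ... normal work ...
        if ret2 == 0:
            return VM_FAULT_LOCKED  # success check now sits above the label
    # out_unlock: reached only on errors
    return ret
```

For a page that is no longer in the mapping, the first variant returns VM_FAULT_LOCKED while the second returns the error value, which matches the intent of moving the label below the success check.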
  13. 22 Jun, 2018 1 commit
    • F
      Btrfs: fix return value on rename exchange failure · c5b4a50b
      Filipe Manana committed
      If we failed during a rename exchange operation after starting/joining a
      transaction, we would end up replacing the return value, stored in the
      local 'ret' variable, with the return value from btrfs_end_transaction().
      So this could end up returning 0 (success) to user space despite the
      operation having failed and aborted the transaction, because if there are
      multiple tasks having a reference on the transaction at the time
      btrfs_end_transaction() is called by the rename exchange, that function
      returns 0 (otherwise it returns -EIO and not the original error value).
      So fix this by not overwriting the return value on error after getting
      a transaction handle.
      
      Fixes: cdd1fedf ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
      CC: stable@vger.kernel.org # 4.9+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c5b4a50b
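The return-value problem described in this commit follows a common pattern: a cleanup call overwrites an earlier error. The sketch below models it in user space. `end_transaction` and both `rename_exchange_*` functions are hypothetical stand-ins, and the 0 / -EIO behaviour of the cleanup call is taken from the description in the commit message.

```python
import errno


def end_transaction(aborted: bool, other_holders: bool) -> int:
    """Model of the cleanup call: returns 0 while other tasks still hold the
    transaction, and -EIO after an abort otherwise."""
    if aborted and not other_holders:
        return -errno.EIO
    return 0


def rename_exchange_before_fix(op_error: int, other_holders: bool) -> int:
    ret = op_error
    # Bug: the original error is overwritten by the cleanup result.
    ret = end_transaction(aborted=op_error != 0, other_holders=other_holders)
    return ret


def rename_exchange_after_fix(op_error: int, other_holders: bool) -> int:
    ret = op_error
    ret2 = end_transaction(aborted=op_error != 0, other_holders=other_holders)
    # Keep the first error; use the cleanup result only if nothing failed.
    return ret if ret else ret2
```

With another task holding the transaction, the first variant reports success for a failed operation, while the second keeps the original error code.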
  14. 07 Jun, 2018 1 commit
  15. 06 Jun, 2018 1 commit
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani committed
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following Coccinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script could be a little shorter if different cases were combined,
      but this version was sufficient for my use case.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
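The y2038 limit that motivates this commit can be checked with a little arithmetic. The sketch below assumes a 32-bit signed seconds field, which is what tv_sec in struct timespec amounts to on 32-bit architectures, and uses ctypes integers purely to demonstrate the wraparound; the variable names are made up for the example.

```python
import ctypes
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1

# Last second representable in a signed 32-bit seconds field.
last_safe = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)

# One second later the 32-bit value wraps to a large negative number,
# while a 64-bit field (as in struct timespec64) holds it without trouble.
wrapped_32 = ctypes.c_int32(INT32_MAX + 1).value
stored_64 = ctypes.c_int64(INT32_MAX + 1).value
```

Here `last_safe` is 19 January 2038, 03:14:07 UTC; `wrapped_32` becomes negative, which would be read as a date in 1901.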
  16. 31 May, 2018 3 commits
    • O
      Btrfs: clean up error handling in btrfs_truncate() · ad7e1a74
      Omar Sandoval committed
      btrfs_truncate() uses two variables for error handling, ret and err (if
      this sounds familiar, it's because btrfs_truncate_inode_items() did
      something similar). This is error prone, as was made evident by "Btrfs:
      fix error handling in btrfs_truncate()". We only have err because we
      don't want to mask an error if we call btrfs_update_inode() and
      btrfs_end_transaction(), so let's make that its own scoped return
      variable and use ret everywhere else.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ad7e1a74
    • N
      btrfs: Factor out write portion of btrfs_get_blocks_direct · c5794e51
      Nikolay Borisov committed
      Now that the read side is extracted into its own function, do the same
      to the write side. This leaves btrfs_get_blocks_direct_write with the
      sole purpose of handling common locking required. Also flip the
      condition in btrfs_get_blocks_direct_write so that the write case
      comes first and we check for if (create) rather than if (!create). This
      is purely subjective but I believe makes reading a bit more "linear".
      No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c5794e51
    • N
      btrfs: Factor out read portion of btrfs_get_blocks_direct · 1c8d0175
      Nikolay Borisov committed
      Currently this function handles both the READ and WRITE dio cases. This
      is facilitated by a bunch of 'if' statements, a goto short-circuit
      statement and a very perverse aliasing of the "!create" (READ) case
      by setting lockstart = lockend and checking for lockstart < lockend for
      detecting the write. Let's simplify this mess by extracting the
      READ-only code into a separate __btrfs_get_block_direct_read function.
      This is only the first step, the next one will be to factor out the
      write side as well. The end goal will be to have the common locking/
      unlocking code in btrfs_get_blocks_direct and then it will call either
      the read|write subvariants. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      1c8d0175
  17. 30 May, 2018 2 commits