1. 31 Jan, 2019 8 commits
    • convert do_remount_sb() to fs_context · 8d0347f6
      Committed by David Howells
      Replace do_remount_sb() with a function, reconfigure_super(), that's
      fs_context aware.  The fs_context is expected to be parameterised already
      and have ->root pointing to the superblock to be reconfigured.
      
      A legacy wrapper is provided that is intended to be called from the
      fs_context ops when those appear, but for now is called directly from
      reconfigure_super().  This wrapper invokes the ->remount_fs() superblock op
      for the moment.  It is intended that the remount_fs() op will be phased
      out.
      
      The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
      that the context is being used for reconfiguration.
      
      do_umount_root() is provided to consolidate remount-to-R/O for umount and
      emergency remount by creating a context and invoking reconfiguration.
      
      do_remount(), do_umount() and do_emergency_remount_callback() are switched
      to use the new process.
      
      [AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
      umount / bug, gets rid of pointless complexity]
      [AV -- set ->net_ns in all cases; nfs remount will need that]
      [AV -- shift security_sb_remount() call into reconfigure_super(); the callers
      that didn't do security_sb_remount() have NULL fc->security anyway, so it's
      a no-op for them]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs_get_tree(): evict the call of security_sb_kern_mount() · c9ce29ed
      Committed by Al Viro
      Right now vfs_get_tree() calls security_sb_kern_mount() (i.e.
      mount MAC) unless it gets MS_KERNMOUNT or MS_SUBMOUNT in flags.
      Doing it that way is both clumsy and imprecise.
      
      Consider the callers' tree of vfs_get_tree():
      vfs_get_tree()
              <- do_new_mount()
              <- vfs_kern_mount()
                      <- simple_pin_fs()
                      <- vfs_submount()
                      <- kern_mount_data()
                      <- init_mount_tree()
                      <- btrfs_mount()
                              <- vfs_get_tree()
                      <- nfs_do_root_mount()
                              <- nfs4_try_mount()
                                      <- nfs_fs_mount()
                                              <- vfs_get_tree()
                              <- nfs4_referral_mount()
      
      do_new_mount() always does need MAC (we are guaranteed that neither
      MS_KERNMOUNT nor MS_SUBMOUNT will be passed there).
      
      simple_pin_fs(), vfs_submount() and kern_mount_data() pass explicit
      flags inhibiting that check.  So does nfs4_referral_mount() (the
      flags there are ultimately coming from vfs_submount()).
      
      init_mount_tree() is called too early for anything LSM-related; it
      doesn't matter whether we attempt those checks, they'll do nothing.
      
      Finally, in case of btrfs_mount() and nfs_fs_mount(), doing MAC
      is pointless - either the caller will do it, or the flags are
      such that we wouldn't have done it either.
      
      In other words, the one and only case when we want that check
      done is when we are called from do_new_mount(), and there we
      want it unconditionally.
      
      So let's simply move it there.  The superblock is still locked,
      so nobody is going to get access to it (via ustat(2), etc.)
      until we get a chance to apply the checks - we are free to
      move them to any point up to where we drop ->s_umount (in
      do_new_mount_fc()).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new helper: do_new_mount_fc() · 132e4608
      Committed by David Howells
      Create an fs_context-aware version of do_new_mount().  This takes an
      fs_context with a superblock already attached to it.
      
      Make do_new_mount() use do_new_mount_fc() rather than do_new_mount(); this
      allows the consolidation of the mount creation, check and add steps.
      
      To make this work, mount_too_revealing() is changed to take a superblock
      rather than a mount (which the fs_context doesn't have available), allowing
      this check to be done before the mount object is created.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • teach vfs_get_tree() to handle subtype, switch do_new_mount() to it · a0c9a8b8
      Committed by Al Viro
      Roll the handling of subtypes into do_new_mount() and vfs_get_tree().  The
      former determines any subtype string and hangs it off the fs_context; the
      latter applies it.
      
      Make do_new_mount() create, parameterise and commit an fs_context and
      create a mount for itself rather than calling vfs_kern_mount().
      
      [AV -- missing kstrdup()]
      [AV -- ... and no kstrdup() if we get to setting ->s_submount - we
      simply transfer it from fc, leaving NULL behind]
      [AV -- constify ->s_submount, while we are at it]
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
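The subtype handling described in the commit above can be pictured with a small userspace sketch. It assumes the familiar `type.subtype` spelling (as in `fuse.sshfs`); the helper name `toy_split_subtype()` and its return convention are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate the subtype in a "type.subtype" string.
 * On success returns 0, stores the length of the type part in *type_len
 * and points *subtype at the text after the dot (NULL when no dot exists).
 * Returns -1 when the dot is followed by an empty subtype.
 */
int toy_split_subtype(const char *fstype, size_t *type_len,
                      const char **subtype)
{
    const char *dot;

    assert(fstype != NULL && type_len != NULL && subtype != NULL);

    dot = strchr(fstype, '.');
    if (!dot) {
        *type_len = strlen(fstype);
        *subtype = NULL;
        return 0;
    }
    if (dot[1] == '\0')
        return -1;              /* "fuse." has no usable subtype */

    *type_len = (size_t)(dot - fstype);
    *subtype = dot + 1;
    return 0;
}
```

In this sketch the caller would hang the returned subtype off its context object and let the later tree-creation step apply it, mirroring the split of duties the commit message describes.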
    • new helpers: vfs_create_mount(), fc_mount() · 8f291889
      Committed by Al Viro
      Create a new helper, vfs_create_mount(), that creates a detached vfsmount
      object from an fs_context that has a superblock attached to it.
      
      Almost all uses will be paired with immediately preceding vfs_get_tree();
      add a helper for such combination.
      
      Switch vfs_kern_mount() to use this.
      
      NOTE: mild behaviour change; passing NULL as 'device name' to
      something like procfs will change /proc/*/mountstats - "device none"
      instead of "no device".  That is consistent with /proc/mounts et al.
      
      [do'h - EXPORT_SYMBOL_GPL slipped in by mistake; removed]
      [AV -- remove confused comment from vfs_create_mount()]
      [AV -- removed the second argument]
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Introduce fs_context, switch vfs_kern_mount() to it. · 9bc61ab1
      Committed by David Howells
      Introduce a filesystem context concept to be used during superblock
      creation for mount and superblock reconfiguration for remount.  This is
      allocated at the beginning of the mount procedure and into it is placed:
      
       (1) Filesystem type.
      
       (2) Namespaces.
      
       (3) Source/Device names (there may be multiple).
      
       (4) Superblock flags (SB_*).
      
       (5) Security details.
      
       (6) Filesystem-specific data, as set by the mount options.
      
      Accessor functions are then provided to set up a context, parameterise it
      from monolithic mount data (the data page passed to mount(2)) and tear it
      down again.
      
      A legacy wrapper is provided that implements what will be the basic
      operations, wrapping access to filesystems that aren't yet aware of the
      fs_context.
      
      Finally, vfs_kern_mount() is changed to make use of the fs_context and
      mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
      [AV -- add missing kstrdup()]
      [AV -- put_cred() can be unconditional - fc->cred can't be NULL]
      [AV -- take legacy_validate() contents into legacy_parse_monolithic()]
      [AV -- merge KERNEL_MOUNT and USER_MOUNT]
      [AV -- don't unlock superblock on success return from vfs_get_tree()]
      [AV -- kill 'reference' argument of init_fs_context()]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
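The context lifecycle listed in the commit above (set up, parameterise from monolithic mount data, tear down) can be modelled in userspace roughly as follows. This is only a sketch: `struct toy_fs_context` and the `toy_fc_*()` functions are hypothetical names, the field set is a reduced subset of the six items listed, and none of it is the kernel's actual definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_OPTS 8

/* Hypothetical userspace model; names are illustrative, not the kernel's. */
struct toy_fs_context {
    const char  *fs_type;             /* (1) filesystem type */
    char        *source;              /* (3) source/device name */
    unsigned int sb_flags;            /* (4) superblock flags */
    char        *opts[TOY_MAX_OPTS];  /* (6) filesystem-specific options */
    int          nr_opts;
};

/* Copy n bytes of s into a fresh NUL-terminated string. */
static char *toy_copy(const char *s, size_t n)
{
    char *p = malloc(n + 1);

    if (p) {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

/* Set up a context at the start of a mount attempt. */
struct toy_fs_context *toy_fc_alloc(const char *fs_type, const char *source,
                                    unsigned int sb_flags)
{
    struct toy_fs_context *fc;

    assert(fs_type != NULL && source != NULL);
    fc = calloc(1, sizeof(*fc));
    if (!fc)
        return NULL;
    fc->fs_type = fs_type;
    fc->sb_flags = sb_flags;
    fc->source = toy_copy(source, strlen(source));
    if (!fc->source) {
        free(fc);
        return NULL;
    }
    return fc;
}

/* Parameterise the context from a comma-separated option string. */
int toy_fc_parse_monolithic(struct toy_fs_context *fc, const char *data)
{
    assert(fc != NULL);
    while (data && *data) {
        const char *end = strchr(data, ',');
        size_t len = end ? (size_t)(end - data) : strlen(data);

        if (len > 0) {                /* skip empty fields such as ",," */
            if (fc->nr_opts == TOY_MAX_OPTS)
                return -1;
            fc->opts[fc->nr_opts] = toy_copy(data, len);
            if (!fc->opts[fc->nr_opts])
                return -1;
            fc->nr_opts++;
        }
        data = end ? end + 1 : NULL;
    }
    return 0;
}

/* Tear the context down again, releasing everything it owns. */
void toy_fc_free(struct toy_fs_context *fc)
{
    if (!fc)
        return;
    for (int i = 0; i < fc->nr_opts; i++)
        free(fc->opts[i]);
    free(fc->source);
    free(fc);
}
```

The point of the model is ownership: everything gathered for the mount lives in one object, so a failure at any stage only needs a single teardown call.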
    • saner handling of temporary namespaces · 74e83122
      Committed by Al Viro
      mount_subtree() creates (and soon destroys) a temporary namespace,
      so that automounts could function normally.  These beasts should
      never become anyone's current namespaces; they don't, but it would
      be better to make prevention of that more straightforward.  And
      since they don't become anyone's current namespace, we don't need
      to bother with reserving procfs inums for those.
      
      Teach alloc_mnt_ns() to skip inum allocation if told so, adjust
      put_mnt_ns() accordingly, make mount_subtree() use temporary
      (anon) namespace.  is_anon_ns() checks if a namespace is such.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • separate copying and locking mount tree on cross-userns copies · 3bd045cc
      Committed by Al Viro
      Rather than having propagate_mnt() check doing unprivileged copies,
      lock them before commit_tree().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 18 Jan, 2019 2 commits
    • kill kernfs_pin_sb() · 6d7fbce7
      Committed by Al Viro
      unused now and impossible to use safely anyway.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fix cgroup_do_mount() handling of failure exits · 399504e2
      Committed by Al Viro
      same story as with last May fixes in sysfs (7b745a4e
      "unfuck sysfs_mount()"); new_sb is left uninitialized
      in case of early errors in kernfs_mount_ns() and papering
      over it by treating any error from kernfs_mount_ns() as
      equivalent to !new_ns ends up conflating the cases when
      objects had never been transferred to a superblock with
      ones when that has happened and resulting new superblock
      had been dropped.  Easily fixed (same way as in sysfs
      case).  Additionally, there's a superblock leak on
      kernfs_node_dentry() failure *and* a dentry leak inside
      kernfs_node_dentry() itself - the latter on probably
      impossible errors, but the former not impossible to trigger
      (as a matter of fact, injecting allocation failures
      at that point *does* trigger it).
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 11 Jan, 2019 14 commits
  4. 09 Jan, 2019 4 commits
    • Btrfs: fix deadlock when using free space tree due to block group creation · a6d8654d
      Committed by Filipe Manana
      When modifying the free space tree we can end up COWing one of its extent
      buffers which in turn might result in allocating a new chunk, which in
      turn can result in flushing (finish creation) of pending block groups. If
      that happens we can deadlock because creating a pending block group needs
      to update the free space tree, and if any of the updates tries to modify
      the same extent buffer that we are COWing, we end up in a deadlock since
      we try to write-lock the same extent buffer twice.
      
      So fix this by skipping pending block group creation if we are COWing an
      extent buffer from the free space tree. This is a case missed by commit
      5ce55557 ("Btrfs: fix deadlock when writing out free space caches").
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202173
      Fixes: 5ce55557 ("Btrfs: fix deadlock when writing out free space caches")
      CC: stable@vger.kernel.org # 4.18+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
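The self-deadlock described in the commit above can be reduced to a toy state machine. The sketch below is a hypothetical single-threaded model, not btrfs code: it assumes every COW needs a new chunk, treats the write lock as a plain flag, and records a deadlock instead of actually blocking. All `toy_*` names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_tree { TOY_EXTENT_TREE, TOY_FREE_SPACE_TREE };

struct toy_state {
    bool fst_buffer_locked;     /* write lock on one free space tree buffer */
    int  pending_block_groups;  /* block groups still waiting to be created */
    bool skip_when_cowing_fst;  /* models the fix from the commit message */
    bool deadlocked;            /* set when the same lock is taken twice */
};

/* Finishing a pending block group has to update the free space tree. */
static void toy_create_pending_block_groups(struct toy_state *s)
{
    while (s->pending_block_groups > 0) {
        if (s->fst_buffer_locked) {
            /* second write lock on the buffer we are already COWing */
            s->deadlocked = true;
            return;
        }
        s->pending_block_groups--;
    }
}

/*
 * COW one buffer of the given tree. In this model the COW always
 * allocates a chunk, and chunk allocation flushes pending block groups
 * unless the fix tells it to skip that step.
 */
void toy_cow_buffer(struct toy_state *s, enum toy_tree tree)
{
    assert(!s->fst_buffer_locked);

    if (tree == TOY_FREE_SPACE_TREE)
        s->fst_buffer_locked = true;

    if (!(s->skip_when_cowing_fst && tree == TOY_FREE_SPACE_TREE))
        toy_create_pending_block_groups(s);

    s->fst_buffer_locked = false;
}
```

With the skip enabled, the pending block groups simply stay queued until a later operation that does not hold a free space tree buffer flushes them.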
    • Btrfs: fix race between reflink/dedupe and relocation · d8b55242
      Committed by Filipe Manana
      The recent rework that makes btrfs' remap_file_range operation use the
      generic helper generic_remap_file_range_prep() introduced a race between
      relocation and reflinking (for both cloning and deduplication) the file
      extents between the source and destination inodes.
      
      This happens because we no longer lock the source range, and we do
      not lock it because we wait for direct IO writes and writeback to
      complete early on the code path right after locking the inodes, which
      guarantees no other file operations interfere with the reflinking. However
      there is one exception which is relocation, since it replaces the byte
      number of file extents items in the fs tree after locking the range the
      file extent items represent. This is a problem because after finding each
      file extent to clone in the fs tree, the reflink process copies the file
      extent item into a local buffer, releases the search path, inserts new
      file extent items in the destination range and then increments the
      reference count for the extent mentioned in the file extent item that it
      previously copied to the buffer. If right after copying the file extent
      item into the buffer and releasing the path the relocation process
      updates the file extent item to point to the new extent, the reflink
      process ends up creating a delayed reference to increment the reference
      count of the old extent, for which the relocation process already created
      a delayed reference to drop it. This results in failure to run delayed
      references because we will attempt to increment the count of a reference
      that was already dropped. This is illustrated by the following diagram:
      
              CPU 1                                       CPU 2
      
                                              relocation is running
      
        btrfs_clone_files()
      
          btrfs_clone()
            --> finds extent item
                in source range
                points to extent
                at bytenr X
            --> copies it into a
                local buffer
            --> releases path
      
                                              replace_file_extents()
                                                --> successfully locks the
                                                    range represented by
                                                    the file extent item
                                                --> replaces disk_bytenr
                                                    field in the file
                                                    extent item with some
                                                    other value Y
                                                --> creates delayed reference
                                                    to increment reference
                                                    count for extent at
                                                    bytenr Y
                                                --> creates delayed reference
                                                    to drop the extent at
                                                    bytenr X
      
            --> starts transaction
            --> creates delayed
                reference to
                increment extent
                at bytenr X
      
                          <delayed references are run, due to a transaction
                           commit for example, and the transaction is aborted
                           with -EIO because we attempt to increment reference
                           count for the extent at bytenr X after we freed it>
      
      When this race is hit the running transaction ends up getting aborted with
      an -EIO error and a trace like the following is produced:
      
      [ 4382.553858] WARNING: CPU: 2 PID: 3648 at fs/btrfs/extent-tree.c:1552 lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
      (...)
      [ 4382.556293] CPU: 2 PID: 3648 Comm: btrfs Tainted: G        W         4.20.0-rc6-btrfs-next-41 #1
      [ 4382.556294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
      [ 4382.556308] RIP: 0010:lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
      (...)
      [ 4382.556310] RSP: 0018:ffffac784408f738 EFLAGS: 00010202
      [ 4382.556311] RAX: 0000000000000001 RBX: ffff8980673c3a48 RCX: 0000000000000001
      [ 4382.556312] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
      [ 4382.556312] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
      [ 4382.556313] R10: 0000000000000001 R11: ffff897f40000000 R12: 0000000000001000
      [ 4382.556313] R13: 00000000c224f000 R14: ffff89805de9bd40 R15: ffff8980453f4548
      [ 4382.556315] FS:  00007f5e759178c0(0000) GS:ffff89807b300000(0000) knlGS:0000000000000000
      [ 4382.563130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4382.563562] CR2: 00007f2e9789fcbc CR3: 0000000120512001 CR4: 00000000003606e0
      [ 4382.564005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 4382.564451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 4382.564887] Call Trace:
      [ 4382.565343]  insert_inline_extent_backref+0x55/0xe0 [btrfs]
      [ 4382.565796]  __btrfs_inc_extent_ref.isra.60+0x88/0x260 [btrfs]
      [ 4382.566249]  ? __btrfs_run_delayed_refs+0x93/0x1650 [btrfs]
      [ 4382.566702]  __btrfs_run_delayed_refs+0xa22/0x1650 [btrfs]
      [ 4382.567162]  btrfs_run_delayed_refs+0x7e/0x1d0 [btrfs]
      [ 4382.567623]  btrfs_commit_transaction+0x50/0x9c0 [btrfs]
      [ 4382.568112]  ? _raw_spin_unlock+0x24/0x30
      [ 4382.568557]  ? block_rsv_release_bytes+0x14e/0x410 [btrfs]
      [ 4382.569006]  create_subvol+0x3c8/0x830 [btrfs]
      [ 4382.569461]  ? btrfs_mksubvol+0x317/0x600 [btrfs]
      [ 4382.569906]  btrfs_mksubvol+0x317/0x600 [btrfs]
      [ 4382.570383]  ? rcu_sync_lockdep_assert+0xe/0x60
      [ 4382.570822]  ? __sb_start_write+0xd4/0x1c0
      [ 4382.571262]  ? mnt_want_write_file+0x24/0x50
      [ 4382.571712]  btrfs_ioctl_snap_create_transid+0x117/0x1a0 [btrfs]
      [ 4382.572155]  ? _copy_from_user+0x66/0x90
      [ 4382.572602]  btrfs_ioctl_snap_create+0x66/0x80 [btrfs]
      [ 4382.573052]  btrfs_ioctl+0x7c1/0x30e0 [btrfs]
      [ 4382.573502]  ? mem_cgroup_commit_charge+0x8b/0x570
      [ 4382.573946]  ? do_raw_spin_unlock+0x49/0xc0
      [ 4382.574379]  ? _raw_spin_unlock+0x24/0x30
      [ 4382.574803]  ? __handle_mm_fault+0xf29/0x12d0
      [ 4382.575215]  ? do_vfs_ioctl+0xa2/0x6f0
      [ 4382.575622]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
      [ 4382.576020]  do_vfs_ioctl+0xa2/0x6f0
      [ 4382.576405]  ksys_ioctl+0x70/0x80
      [ 4382.576776]  __x64_sys_ioctl+0x16/0x20
      [ 4382.577137]  do_syscall_64+0x60/0x1b0
      [ 4382.577488]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      (...)
      [ 4382.578837] RSP: 002b:00007ffe04bf64c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [ 4382.579174] RAX: ffffffffffffffda RBX: 00005564136f3050 RCX: 00007f5e74724dd7
      [ 4382.579505] RDX: 00007ffe04bf64d0 RSI: 000000005000940e RDI: 0000000000000003
      [ 4382.579848] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000044
      [ 4382.580164] R10: 0000000000000541 R11: 0000000000000202 R12: 00005564136f3010
      [ 4382.580477] R13: 0000000000000003 R14: 00005564136f3035 R15: 00005564136f3050
      [ 4382.580792] irq event stamp: 0
      [ 4382.581106] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
      [ 4382.581441] hardirqs last disabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
      [ 4382.581772] softirqs last  enabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
      [ 4382.582095] softirqs last disabled at (0): [<0000000000000000>]           (null)
      [ 4382.582413] ---[ end trace d3c188e3e9367382 ]---
      [ 4382.623855] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2981: errno=-5 IO failure
      [ 4382.624295] BTRFS info (device sdc): forced readonly
      
      Fix this by locking the source range before searching for the file extent
      items in the fs tree, since the relocation process will try to lock the
      range a file extent item represents before updating it with the new extent
      location.
      
      Fixes: 34a28e3d ("Btrfs: use generic_remap_file_range_prep() for cloning and deduplication")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
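The interleaving in the diagram of the commit above can be replayed with a deliberately simplified model. The sketch assumes delayed references are applied strictly in queue order and that adding a reference to an extent whose count has reached zero fails with -EIO; real btrfs delayed-ref processing is more involved. The `toy_*` types and functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_REFS 8

/* Hypothetical model of an extent: only its reference count matters. */
struct toy_extent {
    int refs;                   /* 0 means the extent has been freed */
};

struct toy_delayed_ref {
    struct toy_extent *extent;
    int delta;                  /* +1 adds a reference, -1 drops one */
};

struct toy_ref_queue {
    struct toy_delayed_ref refs[TOY_MAX_REFS];
    size_t nr;
};

void toy_queue_ref(struct toy_ref_queue *q, struct toy_extent *e, int delta)
{
    assert(q->nr < TOY_MAX_REFS);
    q->refs[q->nr].extent = e;
    q->refs[q->nr].delta = delta;
    q->nr++;
}

/* Apply queued updates in order; referencing a freed extent is an error. */
int toy_run_delayed_refs(struct toy_ref_queue *q)
{
    for (size_t i = 0; i < q->nr; i++) {
        struct toy_extent *e = q->refs[i].extent;

        if (q->refs[i].delta > 0 && e->refs == 0)
            return -EIO;
        e->refs += q->refs[i].delta;
    }
    q->nr = 0;
    return 0;
}

/* Replay clone vs relocation, with or without the source range locked. */
int toy_clone_vs_relocation(int lock_source_range)
{
    struct toy_extent x = { .refs = 1 };    /* old location, bytenr X */
    struct toy_extent y = { .refs = 1 };    /* new location, bytenr Y */
    struct toy_ref_queue q = { .nr = 0 };

    if (lock_source_range) {
        /* Relocation must wait for the lock, so the clone queues first. */
        toy_queue_ref(&q, &x, +1);          /* clone: increment X */
        toy_queue_ref(&q, &y, +1);          /* relocation: increment Y */
        toy_queue_ref(&q, &x, -1);          /* relocation: drop X */
    } else {
        /* Clone copied the item, released the path, relocation ran. */
        toy_queue_ref(&q, &y, +1);          /* relocation: increment Y */
        toy_queue_ref(&q, &x, -1);          /* relocation: drop X */
        toy_queue_ref(&q, &x, +1);          /* clone: stale increment of X */
    }
    return toy_run_delayed_refs(&q);
}
```

Calling `toy_clone_vs_relocation(0)` reproduces the failing order, while `toy_clone_vs_relocation(1)` shows one of the two orders the range lock allows; in the other allowed order relocation finishes first and the clone simply sees the item already pointing at Y.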
    • Btrfs: fix race between cloning range ending at eof and writeback · f7fa1107
      Committed by Filipe Manana
      The recent rework that makes btrfs' remap_file_range operation use the
      generic helper generic_remap_file_range_prep() introduced a race between
      writeback and cloning a range that covers the eof extent of the source
      file into a destination offset that is greater than the same file's size.
      
      This happens because we now wait for writeback to complete before doing
      the truncation of the eof block, while previously we did the truncation
      and then waited for writeback to complete. This leads to a race between
      writeback of the truncated block and cloning the file extents in the
      source range, because we copy each file extent item we find in the fs
      root into a buffer, then release the path and then increment the reference
      count for the extent referred to in that file extent item we copied, which
      can no longer exist if writeback of the truncated eof block completes
      after we copied the file extent item into the buffer and before we
      incremented the reference count. This is illustrated by the following
      diagram:
      
              CPU 1                                       CPU 2
      
        btrfs_clone_files()
          btrfs_cont_expand()
            btrfs_truncate_block()
               --> zeroes part of the
                   page containing eof,
                   marking it for
                   delalloc
      
          btrfs_clone()
            --> finds extent item
                covering eof,
                points to extent
                at bytenr X
            --> copies it into a
                local buffer
            --> releases path
      
                                              writeback starts
      
                                              btrfs_finish_ordered_io()
                                                insert_reserved_file_extent()
                                                  __btrfs_drop_extents()
                                                    --> creates delayed
                                                        reference to drop
                                                        the extent at
                                                        bytenr X
      
            --> starts transaction
            --> creates delayed
                reference to
                increment extent
                at bytenr X
      
                          <delayed references are run, due to a transaction
                           commit for example, and the transaction is aborted
                           with -EIO because we attempt to increment reference
                           count for the extent at bytenr X after we freed it>
      
      When this race is hit the running transaction ends up getting aborted with
      an -EIO error and a trace like the following is produced:
      
      [ 4382.553858] WARNING: CPU: 2 PID: 3648 at fs/btrfs/extent-tree.c:1552 lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
      (...)
      [ 4382.556293] CPU: 2 PID: 3648 Comm: btrfs Tainted: G        W         4.20.0-rc6-btrfs-next-41 #1
      [ 4382.556294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
      [ 4382.556308] RIP: 0010:lookup_inline_extent_backref+0x4f4/0x650 [btrfs]
      (...)
      [ 4382.556310] RSP: 0018:ffffac784408f738 EFLAGS: 00010202
      [ 4382.556311] RAX: 0000000000000001 RBX: ffff8980673c3a48 RCX: 0000000000000001
      [ 4382.556312] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
      [ 4382.556312] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
      [ 4382.556313] R10: 0000000000000001 R11: ffff897f40000000 R12: 0000000000001000
      [ 4382.556313] R13: 00000000c224f000 R14: ffff89805de9bd40 R15: ffff8980453f4548
      [ 4382.556315] FS:  00007f5e759178c0(0000) GS:ffff89807b300000(0000) knlGS:0000000000000000
      [ 4382.563130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4382.563562] CR2: 00007f2e9789fcbc CR3: 0000000120512001 CR4: 00000000003606e0
      [ 4382.564005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 4382.564451] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 4382.564887] Call Trace:
      [ 4382.565343]  insert_inline_extent_backref+0x55/0xe0 [btrfs]
      [ 4382.565796]  __btrfs_inc_extent_ref.isra.60+0x88/0x260 [btrfs]
      [ 4382.566249]  ? __btrfs_run_delayed_refs+0x93/0x1650 [btrfs]
      [ 4382.566702]  __btrfs_run_delayed_refs+0xa22/0x1650 [btrfs]
      [ 4382.567162]  btrfs_run_delayed_refs+0x7e/0x1d0 [btrfs]
      [ 4382.567623]  btrfs_commit_transaction+0x50/0x9c0 [btrfs]
      [ 4382.568112]  ? _raw_spin_unlock+0x24/0x30
      [ 4382.568557]  ? block_rsv_release_bytes+0x14e/0x410 [btrfs]
      [ 4382.569006]  create_subvol+0x3c8/0x830 [btrfs]
      [ 4382.569461]  ? btrfs_mksubvol+0x317/0x600 [btrfs]
      [ 4382.569906]  btrfs_mksubvol+0x317/0x600 [btrfs]
      [ 4382.570383]  ? rcu_sync_lockdep_assert+0xe/0x60
      [ 4382.570822]  ? __sb_start_write+0xd4/0x1c0
      [ 4382.571262]  ? mnt_want_write_file+0x24/0x50
      [ 4382.571712]  btrfs_ioctl_snap_create_transid+0x117/0x1a0 [btrfs]
      [ 4382.572155]  ? _copy_from_user+0x66/0x90
      [ 4382.572602]  btrfs_ioctl_snap_create+0x66/0x80 [btrfs]
      [ 4382.573052]  btrfs_ioctl+0x7c1/0x30e0 [btrfs]
      [ 4382.573502]  ? mem_cgroup_commit_charge+0x8b/0x570
      [ 4382.573946]  ? do_raw_spin_unlock+0x49/0xc0
      [ 4382.574379]  ? _raw_spin_unlock+0x24/0x30
      [ 4382.574803]  ? __handle_mm_fault+0xf29/0x12d0
      [ 4382.575215]  ? do_vfs_ioctl+0xa2/0x6f0
      [ 4382.575622]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
      [ 4382.576020]  do_vfs_ioctl+0xa2/0x6f0
      [ 4382.576405]  ksys_ioctl+0x70/0x80
      [ 4382.576776]  __x64_sys_ioctl+0x16/0x20
      [ 4382.577137]  do_syscall_64+0x60/0x1b0
      [ 4382.577488]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      (...)
      [ 4382.578837] RSP: 002b:00007ffe04bf64c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [ 4382.579174] RAX: ffffffffffffffda RBX: 00005564136f3050 RCX: 00007f5e74724dd7
      [ 4382.579505] RDX: 00007ffe04bf64d0 RSI: 000000005000940e RDI: 0000000000000003
      [ 4382.579848] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000044
      [ 4382.580164] R10: 0000000000000541 R11: 0000000000000202 R12: 00005564136f3010
      [ 4382.580477] R13: 0000000000000003 R14: 00005564136f3035 R15: 00005564136f3050
      [ 4382.580792] irq event stamp: 0
      [ 4382.581106] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
      [ 4382.581441] hardirqs last disabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
      [ 4382.581772] softirqs last  enabled at (0): [<ffffffff8d085842>] copy_process.part.32+0x6e2/0x2320
      [ 4382.582095] softirqs last disabled at (0): [<0000000000000000>]           (null)
      [ 4382.582413] ---[ end trace d3c188e3e9367382 ]---
      [ 4382.623855] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2981: errno=-5 IO failure
      [ 4382.624295] BTRFS info (device sdc): forced readonly
      
      Fix this by waiting for writeback to complete after truncating the eof
      block.
      
      Fixes: 34a28e3d ("Btrfs: use generic_remap_file_range_prep() for cloning and deduplication")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race" · e7c58097
      Committed by Mike Kravetz
      This reverts c86aa7bb
      
      The reverted commit caused ABBA deadlocks when file migration raced with
      file eviction for specific hugetlbfs files.  This was discovered with a
      modified version of the LTP move_pages12 test.
      
      The purpose of the reverted patch was to close a long existing race
      between hugetlbfs file truncation and page faults.  After more analysis
      of the patch and impacted code, it was determined that i_mmap_rwsem can
      not be used for all required synchronization.  Therefore, revert this
      patch while working on another approach to the underlying issue.
      
      Link: http://lkml.kernel.org/r/20190103235452.29335-1-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 08 Jan, 2019 2 commits
  6. 07 Jan, 2019 1 commit
  7. 06 Jan, 2019 1 commit
    • fscrypt: add Adiantum support · 8094c3ce
      Committed by Eric Biggers
      Add support for the Adiantum encryption mode to fscrypt.  Adiantum is a
      tweakable, length-preserving encryption mode with security provably
      reducible to that of XChaCha12 and AES-256, subject to a security bound.
      It's also a true wide-block mode, unlike XTS.  See the paper
      "Adiantum: length-preserving encryption for entry-level processors"
      (https://eprint.iacr.org/2018/720.pdf) for more details.  Also see
      commit 059c2a4d ("crypto: adiantum - add Adiantum support").
      
      On sufficiently long messages, Adiantum's bottlenecks are XChaCha12 and
      the NH hash function.  These algorithms are fast even on processors
      without dedicated crypto instructions.  Adiantum makes it feasible to
      enable storage encryption on low-end mobile devices that lack AES
      instructions; currently such devices are unencrypted.  On ARM Cortex-A7,
      on 4096-byte messages Adiantum encryption is about 4 times faster than
      AES-256-XTS encryption; decryption is about 5 times faster.
      
      In fscrypt, Adiantum is suitable for encrypting both file contents and
      names.  With filenames, it fixes a known weakness: when two filenames in
      a directory share a common prefix of >= 16 bytes, with CTS-CBC their
      encrypted filenames share a common prefix too, leaking information.
      Adiantum does not have this problem.
      
      Since Adiantum also accepts long tweaks (IVs), it's also safe to use the
      master key directly for Adiantum encryption rather than deriving
      per-file keys, provided that the per-file nonce is included in the IVs
      and the master key isn't used for any other encryption mode.  This
      configuration saves memory and improves performance.  A new fscrypt
      policy flag is added to allow users to opt-in to this configuration.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
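The direct-key configuration described in the commit above relies on the per-file nonce being part of each IV. The sketch below illustrates why that matters, under assumptions of its own: a 32-byte IV, a 16-byte nonce, and a layout of block number first, nonce second. These sizes, the layout, and the name `toy_build_iv()` are illustrative guesses, not values taken from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_IV_SIZE    32   /* assumed IV length for this sketch */
#define TOY_NONCE_SIZE 16   /* assumed per-file nonce length */

/*
 * Hypothetical IV layout: an 8-byte little-endian block number, then
 * the per-file nonce when the master key is used directly, then zero
 * padding up to TOY_IV_SIZE.
 */
void toy_build_iv(uint8_t iv[TOY_IV_SIZE], uint64_t block_num,
                  const uint8_t nonce[TOY_NONCE_SIZE], int direct_key)
{
    assert(iv != NULL);

    memset(iv, 0, TOY_IV_SIZE);
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t)(block_num >> (8 * i));

    if (direct_key) {
        assert(nonce != NULL);
        memcpy(iv + 8, nonce, TOY_NONCE_SIZE);
    }
}
```

Without the nonce, block 5 of two different files would get the same IV, which is only acceptable when each file has its own derived key. With the nonce folded in, the IVs differ per file, so a single shared key can be used, which is the trade the commit message describes.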
  8. 05 Jan, 2019 8 commits