1. 12 10月, 2022 1 次提交
  2. 22 9月, 2022 1 次提交
  3. 31 8月, 2021 1 次提交
    • J
      ext4: Support for checksumming from journal triggers · 188c299e
      Jan Kara 提交于
      JBD2 layer support triggers which are called when journaling layer moves
      buffer to a certain state. We can use the frozen trigger, which gets
      called when buffer data is frozen and about to be written out to the
      journal, to compute block checksums for some buffer types (similarly as
      does ocfs2). This avoids unnecessary repeated recomputation of the
      checksum (at the cost of larger window where memory corruption won't be
      caught by checksumming) and is even necessary when there are
      unsynchronized updaters of the checksummed data.
      
      So add superblock and journal trigger type arguments to
      ext4_journal_get_write_access() and ext4_journal_get_create_access() so
      that frozen triggers can be set accordingly. Also add inode argument to
      ext4_walk_page_buffers() and all the callbacks used with that function
      for the same purpose. This patch is mostly only a change of prototype of
      the above mentioned functions and a few small helpers. Real checksumming
      will come later.
      Reviewed-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      188c299e
  4. 23 6月, 2021 1 次提交
  5. 03 6月, 2021 1 次提交
    • R
      ext4: fix accessing uninit percpu counter variable with fast_commit · b45f189a
      Ritesh Harjani 提交于
      When running generic/527 with fast_commit configuration, the following
      issue is seen on Power.  With fast_commit, during ext4_fc_replay()
      (which can be called from ext4_fill_super()), if inode eviction
      happens then it can access an uninitialized percpu counter variable.
      
      This patch adds the check before accessing the counters in
      ext4_free_inode() path.
      
      [  321.165371] run fstests generic/527 at 2021-04-29 08:38:43
      [  323.027786] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: block_validity. Quota mode: none.
      [  323.618772] BUG: Unable to handle kernel data access on read at 0x1fbd80000
      [  323.619767] Faulting instruction address: 0xc000000000bae78c
      cpu 0x1: Vector: 300 (Data Access) at [c000000010706ef0]
          pc: c000000000bae78c: percpu_counter_add_batch+0x3c/0x100
          lr: c0000000006d0bb0: ext4_free_inode+0x780/0xb90
          pid   = 5593, comm = mount
      	ext4_free_inode+0x780/0xb90
      	ext4_evict_inode+0xa8c/0xc60
      	evict+0xfc/0x1e0
      	ext4_fc_replay+0xc50/0x20f0
      	do_one_pass+0xfe0/0x1350
      	jbd2_journal_recover+0x184/0x2e0
      	jbd2_journal_load+0x1c0/0x4a0
      	ext4_fill_super+0x2458/0x4200
      	mount_bdev+0x1dc/0x290
      	ext4_mount+0x28/0x40
      	legacy_get_tree+0x4c/0xa0
      	vfs_get_tree+0x4c/0x120
      	path_mount+0xcf8/0xd70
      	do_mount+0x80/0xd0
      	sys_mount+0x3fc/0x490
      	system_call_exception+0x384/0x3d0
      	system_call_common+0xec/0x278
      
      Cc: stable@kernel.org
      Fixes: 8016e29f ("ext4: fast commit recovery path")
      Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/6cceb9a75c54bef8fa9696c1b08c8df5ff6169e2.1619692410.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      b45f189a
  6. 13 4月, 2021 1 次提交
  7. 09 4月, 2021 1 次提交
  8. 23 3月, 2021 2 次提交
  9. 28 1月, 2021 1 次提交
  10. 24 1月, 2021 2 次提交
    • C
      ext4: support idmapped mounts · 14f3db55
      Christian Brauner 提交于
      Enable idmapped mounts for ext4. All dedicated helpers we need for this
      exist. So this basically just means we're passing down the
      user_namespace argument from the VFS methods to the relevant helpers.
      
      Let's create simple example where we idmap an ext4 filesystem:
      
       root@f2-vm:~# truncate -s 5G ext4.img
      
       root@f2-vm:~# mkfs.ext4 ./ext4.img
       mke2fs 1.45.5 (07-Jan-2020)
       Discarding device blocks: done
       Creating filesystem with 1310720 4k blocks and 327680 inodes
       Filesystem UUID: 3fd91794-c6ca-4b0f-9964-289a000919cf
       Superblock backups stored on blocks:
               32768, 98304, 163840, 229376, 294912, 819200, 884736
      
       Allocating group tables: done
       Writing inode tables: done
       Creating journal (16384 blocks): done
       Writing superblocks and filesystem accounting information: done
      
       root@f2-vm:~# losetup -f --show ./ext4.img
       /dev/loop0
      
       root@f2-vm:~# mount /dev/loop0 /mnt
      
       root@f2-vm:~# ls -al /mnt/
       total 24
       drwxr-xr-x  3 root root  4096 Oct 28 13:34 .
       drwxr-xr-x 30 root root  4096 Oct 28 13:22 ..
       drwx------  2 root root 16384 Oct 28 13:34 lost+found
      
       # Let's create an idmapped mount at /idmapped1 where we map uid and gid
       # 0 to uid and gid 1000
       root@f2-vm:/# ./mount-idmapped --map-mount b:0:1000:1 /mnt/ /idmapped1/
      
       root@f2-vm:/# ls -al /idmapped1/
       total 24
       drwxr-xr-x  3 ubuntu ubuntu  4096 Oct 28 13:34 .
       drwxr-xr-x 30 root   root    4096 Oct 28 13:22 ..
       drwx------  2 ubuntu ubuntu 16384 Oct 28 13:34 lost+found
      
       # Let's create an idmapped mount at /idmapped2 where we map uid and gid
       # 0 to uid and gid 2000
       root@f2-vm:/# ./mount-idmapped --map-mount b:0:2000:1 /mnt/ /idmapped2/
      
       root@f2-vm:/# ls -al /idmapped2/
       total 24
       drwxr-xr-x  3 2000 2000  4096 Oct 28 13:34 .
       drwxr-xr-x 31 root root  4096 Oct 28 13:39 ..
       drwx------  2 2000 2000 16384 Oct 28 13:34 lost+found
      
      Let's create another example where we idmap the rootfs filesystem
      without a mapping for uid 0 and gid 0:
      
       # Create an idmapped mount of for a full POSIX range of rootfs under
       # /mnt but without a mapping for uid 0 to reduce attack surface
      
       root@f2-vm:/# ./mount-idmapped --map-mount b:1:1:65536 / /mnt/
      
       # Since we don't have a mapping for uid and gid 0 all files owned by
       # uid and gid 0 should show up as uid and gid 65534:
       root@f2-vm:/# ls -al /mnt/
       total 664
       drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 .
       drwxr-xr-x 31 root   root      4096 Oct 28 13:39 ..
       lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 bin -> usr/bin
       drwxr-xr-x  4 nobody nogroup   4096 Oct 28 13:17 boot
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:48 dev
       drwxr-xr-x 81 nobody nogroup   4096 Oct 28 04:00 etc
       drwxr-xr-x  4 nobody nogroup   4096 Oct 28 04:00 home
       lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 lib -> usr/lib
       lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib32 -> usr/lib32
       lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib64 -> usr/lib64
       lrwxrwxrwx  1 nobody nogroup     10 Aug 25 07:44 libx32 -> usr/libx32
       drwx------  2 nobody nogroup  16384 Aug 25 07:47 lost+found
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 media
       drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 mnt
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 opt
       drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 proc
       drwx--x--x  6 nobody nogroup   4096 Oct 28 13:34 root
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:46 run
       lrwxrwxrwx  1 nobody nogroup      8 Aug 25 07:44 sbin -> usr/sbin
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 srv
       drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 sys
       drwxrwxrwt 10 nobody nogroup   4096 Oct 28 13:19 tmp
       drwxr-xr-x 14 nobody nogroup   4096 Oct 20 13:00 usr
       drwxr-xr-x 12 nobody nogroup   4096 Aug 25 07:45 var
      
       # Since we do have a mapping for uid and gid 1000 all files owned by
       # uid and gid 1000 should simply show up as uid and gid 1000:
       root@f2-vm:/# ls -al /mnt/home/ubuntu/
       total 40
       drwxr-xr-x 3 ubuntu ubuntu  4096 Oct 28 00:43 .
       drwxr-xr-x 4 nobody nogroup 4096 Oct 28 04:00 ..
       -rw------- 1 ubuntu ubuntu  2936 Oct 28 12:26 .bash_history
       -rw-r--r-- 1 ubuntu ubuntu   220 Feb 25  2020 .bash_logout
       -rw-r--r-- 1 ubuntu ubuntu  3771 Feb 25  2020 .bashrc
       -rw-r--r-- 1 ubuntu ubuntu   807 Feb 25  2020 .profile
       -rw-r--r-- 1 ubuntu ubuntu     0 Oct 16 16:11 .sudo_as_admin_successful
       -rw------- 1 ubuntu ubuntu  1144 Oct 28 00:43 .viminfo
      
      Link: https://lore.kernel.org/r/20210121131959.646623-39-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      14f3db55
    • C
      inode: make init and permission helpers idmapped mount aware · 21cb47be
      Christian Brauner 提交于
      The inode_owner_or_capable() helper determines whether the caller is the
      owner of the inode or is capable with respect to that inode. Allow it to
      handle idmapped mounts. If the inode is accessed through an idmapped
      mount it according to the mount's user namespace. Afterwards the checks
      are identical to non-idmapped mounts. If the initial user namespace is
      passed nothing changes so non-idmapped mounts will see identical
      behavior as before.
      
      Similarly, allow the inode_init_owner() helper to handle idmapped
      mounts. It initializes a new inode on idmapped mounts by mapping the
      fsuid and fsgid of the caller from the mount's user namespace. If the
      initial user namespace is passed nothing changes so non-idmapped mounts
      will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJames Morris <jamorris@linux.microsoft.com>
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      21cb47be
  11. 22 10月, 2020 1 次提交
  12. 18 10月, 2020 2 次提交
    • Z
      ext4: use common helpers in all places reading metadata buffers · 2d069c08
      zhangyi (F) 提交于
      Revome all open codes that read metadata buffers, switch to use
      ext4_read_bh_*() common helpers.
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Suggested-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200924073337.861472-4-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      2d069c08
    • Z
      ext4: clear buffer verified flag if read meta block from disk · d9befeda
      zhangyi (F) 提交于
      The metadata buffer is no longer trusted after we read it from disk
      again because it is not uptodate for some reasons (e.g. failed to write
      back). Otherwise we may get below memory corruption problem in
      ext4_ext_split()->memset() if we read stale data from the newly
      allocated extent block on disk which has been failed to async write
      out but miss verify again since the verified bit has already been set
      on the buffer.
      
      [   29.774674] BUG: unable to handle kernel paging request at ffff88841949d000
      ...
      [   29.783317] Oops: 0002 [#2] SMP
      [   29.784219] R10: 00000000000f4240 R11: 0000000000002e28 R12: ffff88842fa1c800
      [   29.784627] CPU: 1 PID: 126 Comm: kworker/u4:3 Tainted: G      D W
      [   29.785546] R13: ffffffff9cddcc20 R14: ffffffff9cddd420 R15: ffff88842fa1c2f8
      [   29.786679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS ?-20190727_0738364
      [   29.787588] FS:  0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
      [   29.789288] Workqueue: writeback wb_workfn
      [   29.790319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   29.790321]  (flush-8:0)
      [   29.790844] CR2: 0000000000000008 CR3: 00000004234f2000 CR4: 00000000000006f0
      [   29.791924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   29.792839] RIP: 0010:__memset+0x24/0x30
      [   29.793739] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   29.794256] Code: 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 033
      [   29.795161] Kernel panic - not syncing: Fatal exception in interrupt
      ...
      [   29.808149] Call Trace:
      [   29.808475]  ext4_ext_insert_extent+0x102e/0x1be0
      [   29.809085]  ext4_ext_map_blocks+0xa89/0x1bb0
      [   29.809652]  ext4_map_blocks+0x290/0x8a0
      [   29.809085]  ext4_ext_map_blocks+0xa89/0x1bb0
      [   29.809652]  ext4_map_blocks+0x290/0x8a0
      [   29.810161]  ext4_writepages+0xc85/0x17c0
      ...
      
      Fix this by clearing buffer's verified bit if we read meta block from
      disk again.
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20200924073337.861472-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      d9befeda
  13. 22 9月, 2020 2 次提交
  14. 04 6月, 2020 1 次提交
  15. 29 5月, 2020 1 次提交
  16. 22 5月, 2020 1 次提交
  17. 16 4月, 2020 2 次提交
  18. 02 4月, 2020 1 次提交
  19. 26 3月, 2020 1 次提交
  20. 22 2月, 2020 1 次提交
  21. 27 12月, 2019 2 次提交
  22. 15 12月, 2019 1 次提交
    • Y
      ext4: reserve revoke credits in __ext4_new_inode · a70fd5ac
      yangerkun 提交于
      It's possible that __ext4_new_inode will release the xattr block, so
      it will trigger a warning since there is revoke credits will be 0 if
      the handle == NULL. The below scripts can reproduce it easily.
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 3861 at fs/jbd2/revoke.c:374 jbd2_journal_revoke+0x30e/0x540 fs/jbd2/revoke.c:374
      ...
      __ext4_forget+0x1d7/0x800 fs/ext4/ext4_jbd2.c:248
      ext4_free_blocks+0x213/0x1d60 fs/ext4/mballoc.c:4743
      ext4_xattr_release_block+0x55b/0x780 fs/ext4/xattr.c:1254
      ext4_xattr_block_set+0x1c2c/0x2c40 fs/ext4/xattr.c:2112
      ext4_xattr_set_handle+0xa7e/0x1090 fs/ext4/xattr.c:2384
      __ext4_set_acl+0x54d/0x6c0 fs/ext4/acl.c:214
      ext4_init_acl+0x218/0x2e0 fs/ext4/acl.c:293
      __ext4_new_inode+0x352a/0x42b0 fs/ext4/ialloc.c:1151
      ext4_mkdir+0x2e9/0xbd0 fs/ext4/namei.c:2774
      vfs_mkdir+0x386/0x5f0 fs/namei.c:3811
      do_mkdirat+0x11c/0x210 fs/namei.c:3834
      do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:294
      ...
      -------------------------------------
      
      scripts:
      mkfs.ext4 /dev/vdb
      mount /dev/vdb /mnt
      cd /mnt && mkdir dir && for i in {1..8}; do setfacl -dm "u:user_"$i":rx" dir; done
      mkdir dir/dir1 && mv dir/dir1 ./
      sh repro.sh && add some user
      
      [root@localhost ~]# cat repro.sh
      while [ 1 -eq 1 ]; do
          rm -rf dir
          rm -rf dir1/dir1
          mkdir dir
          for i in {1..8}; do  setfacl -dm "u:test"$i":rx" dir; done
          setfacl -m "u:user_9:rx" dir &
          mkdir dir1/dir1 &
      done
      
      Before exec repro.sh, dir1 has inherit the default acl from dir, and
      xattr block of dir1 dir is not the same, so the h_refcount of these
      two dir's xattr block will be 1. Then repro.sh can trigger the warning
      with the situation show as below. The last h_refcount can be clear
      with mkdir, and __ext4_new_inode has not reserved revoke credits, so
      the warning will happened, fix it by reserve revoke credits in
      __ext4_new_inode.
      
      Thread 1                        Thread 2
      mkdir dir
      set default acl(will create
      a xattr block blk1 and the
      refcount of ext4_xattr_header
      will be 1)
      				...
                                      mkdir dir1/dir1
      				->....->ext4_init_acl
      				->__ext4_set_acl(set default acl,
      			          will reuse blk1, and h_refcount
      				  will be 2)
      
      setfacl->ext4_set_acl->...
      ->ext4_xattr_block_set(will create
      new block blk2 to store xattr)
      
      				->__ext4_set_acl(set access acl, since
      				  h_refcount of blk1 is 2, will create
      				  blk3 to store xattr)
      
        ->ext4_xattr_release_block(dec
        h_refcount of blk1 to 1)
      				  ->ext4_xattr_release_block(dec
      				    h_refcount and since it is 0,
      				    will release the block and trigger
      				    the warning)
      
      Link: https://lore.kernel.org/r/20191213014900.47228-1-yangerkun@huawei.comReported-by: NHulk Robot <hulkci@huawei.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: Nyangerkun <yangerkun@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      a70fd5ac
  23. 15 11月, 2019 1 次提交
  24. 06 11月, 2019 1 次提交
    • J
      ext4: Reserve revoke credits for freed blocks · 83448bdf
      Jan Kara 提交于
      So far we have reserved only relatively high fixed amount of revoke
      credits for each transaction. We over-reserved by large amount for most
      cases but when freeing large directories or files with data journalling,
      the fixed amount is not enough. In fact the worst case estimate is
      inconveniently large (maximum extent size) for freeing of one extent.
      
      We fix this by doing proper estimate of the amount of blocks that need
      to be revoked when removing blocks from the inode due to truncate or
      hole punching and otherwise reserve just a small amount of revoke
      credits for each transaction to accommodate freeing of xattrs block or
      so.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      83448bdf
  25. 26 4月, 2019 1 次提交
    • G
      ext4: Support case-insensitive file name lookups · b886ee3e
      Gabriel Krisman Bertazi 提交于
      This patch implements the actual support for case-insensitive file name
      lookups in ext4, based on the feature bit and the encoding stored in the
      superblock.
      
      A filesystem that has the casefold feature set is able to configure
      directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
      to succeed in that directory in a case-insensitive fashion, i.e: match
      a directory entry even if the name used by userspace is not a byte per
      byte match with the disk name, but is an equivalent case-insensitive
      version of the Unicode string.  This operation is called a
      case-insensitive file name lookup.
      
      The feature is configured as an inode attribute applied to directories
      and inherited by its children.  This attribute can only be enabled on
      empty directories for filesystems that support the encoding feature,
      thus preventing collision of file names that only differ by case.
      
      * dcache handling:
      
      For a +F directory, Ext4 only stores the first equivalent name dentry
      used in the dcache. This is done to prevent unintentional duplication of
      dentries in the dcache, while also allowing the VFS code to quickly find
      the right entry in the cache despite which equivalent string was used in
      a previous lookup, without having to resort to ->lookup().
      
      d_hash() of casefolded directories is implemented as the hash of the
      casefolded string, such that we always have a well-known bucket for all
      the equivalencies of the same string. d_compare() uses the
      utf8_strncasecmp() infrastructure, which handles the comparison of
      equivalent, same case, names as well.
      
      For now, negative lookups are not inserted in the dcache, since they
      would need to be invalidated anyway, because we can't trust missing file
      dentries.  This is bad for performance but requires some leveraging of
      the vfs layer to fix.  We can live without that for now, and so does
      everyone else.
      
      * on-disk data:
      
      Despite using a specific version of the name as the internal
      representation within the dcache, the name stored and fetched from the
      disk is a byte-per-byte match with what the user requested, making this
      implementation 'name-preserving'. i.e. no actual information is lost
      when writing to storage.
      
      DX is supported by modifying the hashes used in +F directories to make
      them case/encoding-aware.  The new disk hashes are calculated as the
      hash of the full casefolded string, instead of the string directly.
      This allows us to efficiently search for file names in the htree without
      requiring the user to provide an exact name.
      
      * Dealing with invalid sequences:
      
      By default, when a invalid UTF-8 sequence is identified, ext4 will treat
      it as an opaque byte sequence, ignoring the encoding and reverting to
      the old behavior for that unique file.  This means that case-insensitive
      file name lookup will not work only for that file.  An optional bit can
      be set in the superblock telling the filesystem code and userspace tools
      to enforce the encoding.  When that optional bit is set, any attempt to
      create a file name using an invalid UTF-8 sequence will fail and return
      an error to userspace.
      
      * Normalization algorithm:
      
      The UTF-8 algorithms used to compare strings in ext4 is implemented
      lives in fs/unicode, and is based on a previous version developed by
      SGI.  It implements the Canonical decomposition (NFD) algorithm
      described by the Unicode specification 12.1, or higher, combined with
      the elimination of ignorable code points (NFDi) and full
      case-folding (CF) as documented in fs/unicode/utf8_norm.c.
      
      NFD seems to be the best normalization method for EXT4 because:
      
        - It has a lower cost than NFC/NFKC (which requires
          decomposing to NFD as an intermediary step)
        - It doesn't eliminate important semantic meaning like
          compatibility decompositions.
      
      Although:
      
        - This implementation is not completely linguistic accurate, because
        different languages have conflicting rules, which would require the
        specialization of the filesystem to a given locale, which brings all
        sorts of problems for removable media and for users who use more than
        one language.
      Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b886ee3e
  26. 24 1月, 2019 1 次提交
  27. 20 12月, 2018 1 次提交
    • T
      ext4: avoid declaring fs inconsistent due to invalid file handles · 8a363970
      Theodore Ts'o 提交于
      If we receive a file handle, either from NFS or open_by_handle_at(2),
      and it points at an inode which has not been initialized, and the file
      system has metadata checksums enabled, we shouldn't try to get the
      inode, discover the checksum is invalid, and then declare the file
      system as being inconsistent.
      
      This can be reproduced by creating a test file system via "mke2fs -t
      ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
      directory, and then running the following program.
      
      #define _GNU_SOURCE
      #include <fcntl.h>
      
      struct handle {
      	struct file_handle fh;
      	unsigned char fid[MAX_HANDLE_SZ];
      };
      
      int main(int argc, char **argv)
      {
      	struct handle h = {{8, 1 }, { 12, }};
      
      	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
      	return 0;
      }
      
      Google-Bug-Id: 120690101
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      8a363970
  28. 11 10月, 2018 1 次提交
  29. 02 8月, 2018 1 次提交
  30. 30 7月, 2018 2 次提交
    • A
      ext4: use timespec64 for all inode times · 7b62b293
      Arnd Bergmann 提交于
      This is the last missing piece for the inode times on 32-bit systems:
      now that VFS interfaces use timespec64, we just need to stop truncating
      the tv_sec values for y2038 compatibililty.
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7b62b293
    • T
      ext4: fix check to prevent initializing reserved inodes · 50122847
      Theodore Ts'o 提交于
      Commit 8844618d: "ext4: only look at the bg_flags field if it is
      valid" will complain if block group zero does not have the
      EXT4_BG_INODE_ZEROED flag set.  Unfortunately, this is not correct,
      since a freshly created file system has this flag cleared.  It gets
      almost immediately after the file system is mounted read-write --- but
      the following somewhat unlikely sequence will end up triggering a
      false positive report of a corrupted file system:
      
         mkfs.ext4 /dev/vdc
         mount -o ro /dev/vdc /vdc
         mount -o remount,rw /dev/vdc
      
      Instead, when initializing the inode table for block group zero, test
      to make sure that itable_unused count is not too large, since that is
      the case that will result in some or all of the reserved inodes
      getting cleared.
      
      This fixes the failures reported by Eric Whiteney when running
      generic/230 and generic/231 in the the nojournal test case.
      
      Fixes: 8844618d ("ext4: only look at the bg_flags field if it is valid")
      Reported-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      50122847
  31. 13 7月, 2018 1 次提交
  32. 14 6月, 2018 1 次提交
  33. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00