1. 04 Nov, 2021 1 commit
  2. 01 Oct, 2021 3 commits
    • Y
      ext4: flush s_error_work before journal destroy in ext4_fill_super · bb9464e0
      Committed by yangerkun
      The error path in ext4_fill_super forgets to flush s_error_work before
      journal destroy, and it may trigger the following bug since
      flush_stashed_error_work can run concurrently with journal destroy
      without any protection for sbi->s_journal.
      
      [32031.740193] EXT4-fs (loop66): get root inode failed
      [32031.740484] EXT4-fs (loop66): mount failed
      [32031.759805] ------------[ cut here ]------------
      [32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
      [32031.760075] invalid opcode: 0000 [#1] SMP PTI
      [32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded
      4.18.0
      [32031.765112] Call Trace:
      [32031.765375]  ? __switch_to_asm+0x35/0x70
      [32031.765635]  ? __switch_to_asm+0x41/0x70
      [32031.765893]  ? __switch_to_asm+0x35/0x70
      [32031.766148]  ? __switch_to_asm+0x41/0x70
      [32031.766405]  ? _cond_resched+0x15/0x40
      [32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
      [32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
      [32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
      [32031.767487]  process_one_work+0x195/0x390
      [32031.767747]  worker_thread+0x30/0x390
      [32031.768007]  ? process_one_work+0x390/0x390
      [32031.768265]  kthread+0x10d/0x130
      [32031.768521]  ? kthread_flush_work_fn+0x10/0x10
      [32031.768778]  ret_from_fork+0x35/0x40
      
      static int start_this_handle(...)
          BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this
      
      Besides, after we enable fast commit, ext4_fc_replay can add work to
      s_error_work but return success, so the later journal destroy in
      ext4_load_journal can trigger this problem too.
      
      Fix this problem in two steps:
      1. Call ext4_commit_super directly in ext4_handle_error for the case
         where it is called from ext4_fc_replay
      2. Since it's hard to pair the init and flush for s_error_work, we'd
         better add an extra flush_work before journal destroy in
         ext4_fill_super
      
      Besides, this patch will call ext4_commit_super in ext4_handle_error for
      any nojournal case too. But it seems safe since the reason we call
      schedule_work was that we should save error info to sb through journal
      if available. Conversely, for the nojournal case, it seems useless to
      delay committing the superblock to s_error_work.
      
      Fixes: c92dc856 ("ext4: defer saving error info from atomic context")
      Fixes: 2d01ddc8 ("ext4: save error info to sb through journal if available")
      Cc: stable@kernel.org
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Link: https://lore.kernel.org/r/20210924093917.1953239-1-yangerkun@huawei.com
    • R
      ext4: fix loff_t overflow in ext4_max_bitmap_size() · 75ca6ad4
      Committed by Ritesh Harjani
      We should use unsigned long long rather than loff_t to avoid
      overflow in ext4_max_bitmap_size() for comparison before returning.
      Without this patch, sbi->s_bitmap_maxbytes became a negative
      value due to overflow of upper_limit (with has_huge_files as true).
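The overflow can be illustrated in user space. The following is a simplified model, not the kernel function: the helper names are hypothetical, the arithmetic only loosely follows the description above, and the cast from an out-of-range unsigned value to a signed one is assumed to wrap as it does with GCC and Clang.

```c
#include <stdint.h>

#define NDIR_BLOCKS 12  /* direct block pointers in the inode */

/* Upper limit in bytes for the has_huge_files case (hypothetical helper). */
static uint64_t upper_limit_bytes(int bits)
{
    uint64_t upper_limit = (1ULL << 48) - 1;
    uint64_t ptrs = 1ULL << (bits - 2);      /* block pointers per block */
    /* single, double and triple indirect metadata blocks */
    uint64_t meta_blocks = 1 + (1 + ptrs) + (1 + ptrs + ptrs * ptrs);

    upper_limit -= meta_blocks;
    return upper_limit << bits;              /* exceeds INT64_MAX for bits = 16 */
}

/* Bytes addressable through direct and indirect blocks (hypothetical helper). */
static uint64_t mapped_bytes(int bits)
{
    uint64_t ptrs = 1ULL << (bits - 2);
    uint64_t res = NDIR_BLOCKS + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;

    return res << bits;
}

/* Buggy variant: the comparison happens on signed 64-bit values. */
static int64_t max_size_signed(int bits)
{
    int64_t upper_limit = (int64_t)upper_limit_bytes(bits); /* may wrap negative */
    int64_t res = (int64_t)mapped_bytes(bits);

    if (res > upper_limit)
        res = upper_limit;
    return res;
}

/* Fixed variant: compare and clamp as unsigned, convert only at the end. */
static int64_t max_size_unsigned(int bits)
{
    uint64_t upper_limit = upper_limit_bytes(bits);
    uint64_t res = mapped_bytes(bits);

    if (res > upper_limit)
        res = upper_limit;
    if (res > (uint64_t)INT64_MAX)
        res = (uint64_t)INT64_MAX;
    return (int64_t)res;
}
```

With bits = 16 (64KB blocks) the signed variant returns a negative size, while with bits = 12 both variants agree.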
      
      Below is a quick test to trigger it on a 64KB pagesize system.
      
      sudo mkfs.ext4 -b 65536 -O ^has_extents,^64bit /dev/loop2
      sudo mount /dev/loop2 /mnt
      sudo echo "hello" > /mnt/hello 	-> This will error out with
      				"echo: write error: File too large"
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/594f409e2c543e90fd836b78188dfa5c575065ba.1622867594.git.riteshh@linux.ibm.com
    • J
      ext4: fix reserved space counter leakage · 6fed8395
      Committed by Jeffle Xu
      When ext4_insert_delayed_block() receives and recovers from an error from
      ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
      space it has reserved for that block insertion as it should. One effect
      of this bug is that s_dirtyclusters_counter is not decremented and
      remains incorrectly elevated until the file system has been unmounted.
      This can result in premature ENOSPC returns and apparent loss of free
      space.
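The pattern behind the fix is to undo the reservation on the error path. Below is a minimal user-space sketch of that idea; the struct, the function names, and the simulated failure flag are all hypothetical and only stand in for the kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-filesystem reservation counter. */
struct fs_counters {
    long dirty_clusters;
};

/* Simulates the extent status insertion, which may fail with -ENOMEM. */
static int track_delayed_extent(bool simulate_failure)
{
    return simulate_failure ? -ENOMEM : 0;
}

static int insert_delayed_block(struct fs_counters *c, bool simulate_failure)
{
    int ret;

    c->dirty_clusters++;                 /* reserve space for the block */

    ret = track_delayed_extent(simulate_failure);
    if (ret) {
        c->dirty_clusters--;             /* release the reservation on error */
        return ret;
    }
    return 0;
}
```

Without the decrement in the error branch, the counter would stay elevated after every failed insertion, which is the leak described above.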
      
      Another effect of this bug is that
      /sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
      after syncfs has been executed on the filesystem.
      
      In addition, add a check for s_dirtyclusters_counter when an inode is
      about to be evicted and freed. s_dirtyclusters_counter can remain
      non-zero until the inode is written back in .evict_inode(), so the
      check is delayed to .destroy_inode().
      
      Fixes: 51865fda ("ext4: let ext4 maintain extent status tree")
      Cc: stable@kernel.org
      Suggested-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Link: https://lore.kernel.org/r/20210823061358.84473-1-jefflexu@linux.alibaba.com
  3. 31 Aug, 2021 4 commits
    • J
      ext4: Speedup ext4 orphan inode handling · 02f310fc
      Committed by Jan Kara
      Ext4 orphan inode handling is a bottleneck for workloads which heavily
      truncate / unlink small files since it contends on the global
      s_orphan_mutex lock (and generally it's difficult to improve scalability
      of the ondisk linked list of orphaned inodes).
      
      This patch implements a new way of handling orphan inodes. Instead of
      linking an orphaned inode into a linked list, we store its inode number in
      a new special file which we call the "orphan file". Only if there's no more
      space in the orphan file (too many inodes are currently orphaned) do we
      fall back to using the old-style linked list. Currently we protect
      operations in the orphan file with a spinlock for simplicity, but even in
      this setting we can substantially reduce the length of the critical
      section and thus speed up some workloads. In the next patch we improve
      this by making orphan handling lockless.
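The slot-based scheme can be pictured with a small user-space model. This is only a sketch: the slot count, the names, and the convention that 0 marks a free slot are assumptions for illustration, and the spinlock mentioned above is omitted.

```c
#include <stdint.h>

#define ORPHAN_SLOTS    4     /* hypothetical capacity of the orphan file */
#define SLOT_FREE       0     /* assumption: inode number 0 marks a free slot */
#define USE_LEGACY_LIST (-1)  /* caller should fall back to the linked list */

struct orphan_file {
    uint32_t slot[ORPHAN_SLOTS];
};

/* Store the inode number in the first free slot, or report that the file is full. */
static int orphan_add(struct orphan_file *of, uint32_t ino)
{
    for (int i = 0; i < ORPHAN_SLOTS; i++) {
        if (of->slot[i] == SLOT_FREE) {
            of->slot[i] = ino;
            return i;
        }
    }
    return USE_LEGACY_LIST;
}

/* Removing an orphan only clears its own slot; no list neighbours are touched. */
static void orphan_del(struct orphan_file *of, int idx)
{
    of->slot[idx] = SLOT_FREE;
}
```

Because each orphan touches only its own slot, the work done under the lock is short compared with updating a shared linked list.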
      
      Note that the change is backwards compatible when the filesystem is
      clean - the existence of the orphan file is a compat feature, we set
      another ro-compat feature indicating orphan file needs scanning for
      orphaned inodes when mounting filesystem read-write. This ro-compat
      feature gets cleared on unmount / remount read-only.
      
      Some performance data from 80 CPU Xeon Server with 512 GB of RAM,
      filesystem located on SSD, average of 5 runs:
      
      stress-orphan (microbenchmark truncating files byte-by-byte from N
      processes in parallel)
      
      Threads Time            Time
              Vanilla         Patched
        1       1.057200        0.945600
        2       1.680400        1.331800
        4       2.547000        1.995000
        8       7.049400        6.424200
       16      14.827800       14.937600
       32      40.948200       33.038200
       64      87.787400       60.823600
      128     206.504000      122.941400
      
      So we can see significant wins all over the board.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210816095713.16537-3-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • J
      ext4: Move orphan inode handling into a separate file · 25c6d98f
      Committed by Jan Kara
      Move functions for handling orphan inodes into a new file
      fs/ext4/orphan.c to have them in one place and somewhat reduce size of
      other files. No code changes.
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210816095713.16537-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • J
      ext4: Support for checksumming from journal triggers · 188c299e
      Committed by Jan Kara
      The JBD2 layer supports triggers which are called when the journaling
      layer moves a buffer to a certain state. We can use the frozen trigger, which gets
      called when buffer data is frozen and about to be written out to the
      journal, to compute block checksums for some buffer types (similarly as
      does ocfs2). This avoids unnecessary repeated recomputation of the
      checksum (at the cost of larger window where memory corruption won't be
      caught by checksumming) and is even necessary when there are
      unsynchronized updaters of the checksummed data.
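The idea of deferring the checksum to a "frozen" callback can be sketched in user space as follows. Everything here is hypothetical (the struct layout, the byte-sum checksum, the function names); it only models the control flow, not the JBD2 API.

```c
#include <stddef.h>
#include <stdint.h>

struct buffer;
typedef void (*frozen_trigger_fn)(struct buffer *);

struct buffer {
    uint8_t data[8];
    uint32_t csum;
    unsigned int csum_runs;       /* how many times the checksum was computed */
    frozen_trigger_fn frozen;     /* called once when the data is frozen */
};

/* Toy checksum: a plain byte sum, used only for illustration. */
static void csum_trigger(struct buffer *b)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < sizeof(b->data); i++)
        sum += b->data[i];
    b->csum = sum;
    b->csum_runs++;
}

/* Updaters modify the data without recomputing the checksum. */
static void buffer_update(struct buffer *b, size_t off, uint8_t value)
{
    b->data[off] = value;
}

/* The journaling layer freezes the buffer and fires the trigger, if any. */
static void journal_freeze(struct buffer *b)
{
    if (b->frozen)
        b->frozen(b);
}
```

Three updates followed by one freeze compute the checksum once, at the cost that the stored checksum is stale between the first update and the freeze.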
      
      So add superblock and journal trigger type arguments to
      ext4_journal_get_write_access() and ext4_journal_get_create_access() so
      that frozen triggers can be set accordingly. Also add inode argument to
      ext4_walk_page_buffers() and all the callbacks used with that function
      for the same purpose. This patch is mostly only a change of prototype of
      the above mentioned functions and a few small helpers. Real checksumming
      will come later.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • J
      ext4: fix e2fsprogs checksum failure for mounted filesystem · b2bbb92f
      Committed by Jan Kara
      Commit 81414b4d ("ext4: remove redundant sb checksum
      recomputation") removed checksum recalculation after updating
      superblock free space / inode counters in ext4_fill_super() based on
      the fact that we will recalculate the checksum on superblock
      writeout.
      
      That is a correct assumption, but until the writeout happens (which can
      take a long time) the checksum is incorrect in the buffer cache, and
      programs such as tune2fs or resize2fs can fail if they are called
      shortly after a file system is mounted.  So restore the checksum
      recalculation and add a comment explaining why.
      
      Fixes: 81414b4d ("ext4: remove redundant sb checksum recomputation")
      Cc: stable@kernel.org
      Reported-by: Boyang Xue <bxue@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Link: https://lore.kernel.org/r/20210812124737.21981-1-jack@suse.cz
  4. 27 Aug, 2021 1 commit
  5. 13 Jul, 2021 1 commit
  6. 08 Jul, 2021 2 commits
  7. 01 Jul, 2021 1 commit
    • Y
      ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock · 558d6450
      Committed by Ye Bin
      If a writeback of the superblock fails with an I/O error, the buffer
      is marked not uptodate.  However, this can cause a WARN_ON to trigger
      when we attempt to write the superblock a second time.  (Which might
      succeed this time, for certain types of block devices such as iSCSI
      devices over a flaky network.)
      
      Try to detect this case in flush_stashed_error_work(), and also change
      __ext4_handle_dirty_metadata() so we always set the uptodate flag, not
      just in the nojournal case.
      
      Before this commit, this problem could be replicated via:
      
      1. dmsetup  create dust1 --table  '0 2097152 dust /dev/sdc 0 4096'
      2. mount  /dev/mapper/dust1  /home/test
      3. dmsetup message dust1 0 addbadblock 0 10
      4. cd /home/test
      5. echo "XXXXXXX" > t
      
      After a few seconds, we got the following warning:
      
      [   80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
      [   80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
      [   85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
      [   91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
      [   91.417038] ------------[ cut here ]------------
      [   91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
      [   91.440322] Call Trace:
      [   91.440652]  __jbd2_journal_temp_unlink_buffer+0x135/0x220
      [   91.441354]  __jbd2_journal_unfile_buffer+0x24/0x90
      [   91.441981]  __jbd2_journal_refile_buffer+0x134/0x1d0
      [   91.442628]  jbd2_journal_commit_transaction+0x249a/0x3240
      [   91.443336]  ? put_prev_entity+0x2a/0x200
      [   91.443856]  ? kjournald2+0x12e/0x510
      [   91.444324]  kjournald2+0x12e/0x510
      [   91.444773]  ? woken_wake_function+0x30/0x30
      [   91.445326]  kthread+0x150/0x1b0
      [   91.445739]  ? commit_timeout+0x20/0x20
      [   91.446258]  ? kthread_flush_worker+0xb0/0xb0
      [   91.446818]  ret_from_fork+0x1f/0x30
      [   91.447293] ---[ end trace 66f0b6bf3d1abade ]---
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  8. 30 Jun, 2021 1 commit
  9. 24 Jun, 2021 2 commits
  10. 23 Jun, 2021 1 commit
  11. 17 Jun, 2021 3 commits
  12. 06 Jun, 2021 1 commit
    • A
      ext4: fix memory leak in ext4_fill_super · afd09b61
      Committed by Alexey Makhalov
      Buffer head references must be released before calling kill_bdev();
      otherwise the buffer head (and its page referenced by b_data) will not
      be freed by kill_bdev, and subsequently that bh will be leaked.
      
      If blocksizes differ, sb_set_blocksize() will kill current buffers and
      page cache by using kill_bdev(). And then super block will be reread
      again but using correct blocksize this time. sb_set_blocksize() didn't
      fully free superblock page and buffer head, and being busy, they were
      not freed and instead leaked.
      
      This can easily be reproduced by calling an infinite loop of:
      
        systemctl start <ext4_on_lvm>.mount, and
        systemctl stop <ext4_on_lvm>.mount
      
      ... since systemd creates a cgroup for each slice which it mounts, and
      the bh leak gets amplified by a dying memory cgroup that also never
      gets freed, and memory consumption is much more easily noticed.
      
      Fixes: ce40733c ("ext4: Check for return value from sb_set_blocksize")
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
      Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
  13. 19 Apr, 2021 1 commit
  14. 10 Apr, 2021 4 commits
  15. 09 Apr, 2021 3 commits
    • H
      ext4: make prefetch_block_bitmaps default · 21175ca4
      Committed by Harshad Shirwadkar
      Block bitmap prefetching is needed for these allocator optimization
      data structures to get populated and provide better group scanning
      order. So, turn it on by default. prefetch_block_bitmaps mount option
      is now marked as removed and a new option no_prefetch_block_bitmaps is
      added to disable block bitmap prefetching.
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20210401172129.189766-8-harshadshirwadkar@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • H
      ext4: improve cr 0 / cr 1 group scanning · 196e402a
      Committed by Harshad Shirwadkar
      Instead of traversing through groups linearly, scan groups in specific
      orders at cr 0 and cr 1. At cr 0, we want to find groups that have the
      largest free order >= the order of the request. So, with this patch,
      we maintain lists for each possible order and insert each group into a
      list based on the largest free order in its buddy bitmap. During cr 0
      allocation, we traverse these lists in the increasing order of largest
      free orders. This allows us to find a group with the best available cr
      0 match in constant time. If nothing can be found, we fall back to cr 1
      immediately.
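The cr 0 lookup described above can be sketched as a set of buckets indexed by largest free order. This is a simplified model under stated assumptions: the number of orders is hypothetical, each bucket holds a single group instead of a list, and the names do not come from the kernel.

```c
#define NUM_ORDERS 14    /* hypothetical number of buddy orders */
#define NO_GROUP   (-1)

/* One bucket per "largest free order"; a real list would hold many groups. */
struct order_index {
    int first_group[NUM_ORDERS];
};

static void order_index_init(struct order_index *idx)
{
    for (int o = 0; o < NUM_ORDERS; o++)
        idx->first_group[o] = NO_GROUP;
}

/* File a group under the largest free order found in its buddy bitmap. */
static void order_index_insert(struct order_index *idx, int group,
                               int largest_free_order)
{
    if (largest_free_order >= 0 && largest_free_order < NUM_ORDERS)
        idx->first_group[largest_free_order] = group;
}

/* Walk buckets from the requested order upwards and take the first hit. */
static int order_index_pick(const struct order_index *idx, int request_order)
{
    if (request_order < 0)
        request_order = 0;
    for (int o = request_order; o < NUM_ORDERS; o++) {
        if (idx->first_group[o] != NO_GROUP)
            return idx->first_group[o];
    }
    return NO_GROUP;     /* caller would move on to cr 1 */
}
```

The loop visits at most NUM_ORDERS buckets regardless of how many groups exist, which is why the lookup cost does not grow with the number of groups.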
      
      At CR1, the story is slightly different. We want to traverse in the
      order of increasing average fragment size. For CR1, we maintain an rb
      tree of groupinfos which is sorted by average fragment size. Instead
      of traversing linearly, at CR1, we traverse in the order of increasing
      average fragment size, starting at the most optimal group. This brings
      down cr 1 search complexity to log(num groups).
      
      For cr >= 2, we just perform the linear search as before. Also, in
      case of lock contention, we intermittently fall back to linear search
      even in CR 0 and CR 1 cases. This allows us to proceed during the
      allocation path even in case of high contention.
      
      There is an opportunity to do optimization at CR2 too. That's because
      at CR2 we only consider groups where bb_free counter (number of free
      blocks) is greater than the request extent size. That's left as future
      work.
      
      All the changes introduced in this patch are protected under a new
      mount option "mb_optimize_scan".
      
      With this patchset, the following experiment was performed:
      
      Created a highly fragmented disk of size 65TB. The disk had no
      contiguous 2M regions. The following command was run 3 times
      consecutively:
      
      time dd if=/dev/urandom of=file bs=2M count=10
      
      Here are the results with and without cr 0/1 optimizations introduced
      in this patch:
      
      |---------+------------------------------+---------------------------|
      |         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
      |---------+------------------------------+---------------------------|
      | 1st run | 5m1.871s                     | 2m47.642s                 |
      | 2nd run | 2m28.390s                    | 0m0.611s                  |
      | 3rd run | 2m26.530s                    | 0m1.255s                  |
      |---------+------------------------------+---------------------------|
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • H
      ext4: add ability to return parsed options from parse_options · b237e304
      Committed by Harshad Shirwadkar
      Before this patch, the function parse_options() was returning
      journal_devnum and journal_ioprio variables to the caller. This patch
      generalizes that interface to allow parse_options() to return any parsed
      options back to the caller. In this patch series, it gets
      used to capture the value of "mb_optimize_scan=%u" mount option.
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/r/20210401172129.189766-3-harshadshirwadkar@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  16. 06 Apr, 2021 1 commit
  17. 21 Mar, 2021 1 commit
  18. 07 Mar, 2021 1 commit
    • E
      ext4: shrink race window in ext4_should_retry_alloc() · efc61345
      Committed by Eric Whitney
      When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it
      fails at significant rates on the two test scenarios that disable
      delayed allocation (ext3conv and data_journal) and force actual block
      allocation for the fallocate and pwrite functions in the test.  The
      failure rate on 5.10 for both ext3conv and data_journal on one test
      system typically runs about 85%.  On 5.11, the failure rate on ext3conv
      sometimes drops to as low as 1% while the rate on data_journal
      increases to nearly 100%.
      
      The observed failures are largely due to ext4_should_retry_alloc()
      cutting off block allocation retries when s_mb_free_pending (used to
      indicate that a transaction in progress will free blocks) is 0.
      However, free space is usually available when this occurs during runs
      of generic/371.  It appears that a thread attempting to allocate
      blocks is just missing transaction commits in other threads that
      increase the free cluster count and reset s_mb_free_pending while
      the allocating thread isn't running.  Explicitly testing for free space
      availability avoids this race.
      
      The current code uses a post-increment operator in the conditional
      expression that determines whether the retry limit has been exceeded.
      This means that the conditional expression uses the value of the
      retry counter before it's increased, resulting in an extra retry cycle.
      The current code actually retries twice before hitting its retry limit
      rather than once.
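The effect of the post-increment can be reproduced with a small counting loop. This is a stand-alone sketch with hypothetical helper names, not the ext4 function; it only compares how many retries each form of the check grants for the same limit.

```c
#include <stdbool.h>

/* Post-increment: the comparison sees the counter value from before the increment. */
static bool should_retry_post(int *retries, int limit)
{
    return !((*retries)++ > limit);
}

/* Pre-increment: the comparison sees the counter value after the increment. */
static bool should_retry_pre(int *retries, int limit)
{
    return !(++(*retries) > limit);
}

/* Count how many retries a given check grants before it cuts off. */
static int count_retries(bool (*should_retry)(int *, int), int limit)
{
    int retries = 0;
    int granted = 0;

    while (should_retry(&retries, limit))
        granted++;
    return granted;
}
```

With a limit of 1, the post-increment form grants two retries and the pre-increment form grants one, which matches the extra retry cycle described above.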
      
      Increasing the retry limit to 3 from the current actual maximum retry
      count of 2 in combination with the change described above reduces the
      observed failure rate to less than 0.1% on both ext3conv and
      data_journal with what should be limited impact on users sensitive to
      the overhead caused by retries.
      
      A per filesystem percpu counter exported via sysfs is added to allow
      users or developers to track the number of times the retry limit is
      exceeded without resorting to debugging methods.  This should provide
      some insight into worst case retry behavior.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  19. 03 Feb, 2021 2 commits
  20. 28 Jan, 2021 1 commit
  21. 24 Jan, 2021 1 commit
    • C
      ext4: support idmapped mounts · 14f3db55
      Committed by Christian Brauner
      Enable idmapped mounts for ext4. All dedicated helpers we need for this
      exist. So this basically just means we're passing down the
      user_namespace argument from the VFS methods to the relevant helpers.
      
      Let's create a simple example where we idmap an ext4 filesystem:
      
       root@f2-vm:~# truncate -s 5G ext4.img
      
       root@f2-vm:~# mkfs.ext4 ./ext4.img
       mke2fs 1.45.5 (07-Jan-2020)
       Discarding device blocks: done
       Creating filesystem with 1310720 4k blocks and 327680 inodes
       Filesystem UUID: 3fd91794-c6ca-4b0f-9964-289a000919cf
       Superblock backups stored on blocks:
               32768, 98304, 163840, 229376, 294912, 819200, 884736
      
       Allocating group tables: done
       Writing inode tables: done
       Creating journal (16384 blocks): done
       Writing superblocks and filesystem accounting information: done
      
       root@f2-vm:~# losetup -f --show ./ext4.img
       /dev/loop0
      
       root@f2-vm:~# mount /dev/loop0 /mnt
      
       root@f2-vm:~# ls -al /mnt/
       total 24
       drwxr-xr-x  3 root root  4096 Oct 28 13:34 .
       drwxr-xr-x 30 root root  4096 Oct 28 13:22 ..
       drwx------  2 root root 16384 Oct 28 13:34 lost+found
      
       # Let's create an idmapped mount at /idmapped1 where we map uid and gid
       # 0 to uid and gid 1000
       root@f2-vm:/# ./mount-idmapped --map-mount b:0:1000:1 /mnt/ /idmapped1/
      
       root@f2-vm:/# ls -al /idmapped1/
       total 24
       drwxr-xr-x  3 ubuntu ubuntu  4096 Oct 28 13:34 .
       drwxr-xr-x 30 root   root    4096 Oct 28 13:22 ..
       drwx------  2 ubuntu ubuntu 16384 Oct 28 13:34 lost+found
      
       # Let's create an idmapped mount at /idmapped2 where we map uid and gid
       # 0 to uid and gid 2000
       root@f2-vm:/# ./mount-idmapped --map-mount b:0:2000:1 /mnt/ /idmapped2/
      
       root@f2-vm:/# ls -al /idmapped2/
       total 24
       drwxr-xr-x  3 2000 2000  4096 Oct 28 13:34 .
       drwxr-xr-x 31 root root  4096 Oct 28 13:39 ..
       drwx------  2 2000 2000 16384 Oct 28 13:34 lost+found
      
      Let's create another example where we idmap the rootfs filesystem
      without a mapping for uid 0 and gid 0:
      
       # Create an idmapped mount for a full POSIX range of rootfs under
       # /mnt but without a mapping for uid 0 to reduce attack surface
      
       root@f2-vm:/# ./mount-idmapped --map-mount b:1:1:65536 / /mnt/
      
       # Since we don't have a mapping for uid and gid 0 all files owned by
       # uid and gid 0 should show up as uid and gid 65534:
       root@f2-vm:/# ls -al /mnt/
       total 664
       drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 .
       drwxr-xr-x 31 root   root      4096 Oct 28 13:39 ..
       lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 bin -> usr/bin
       drwxr-xr-x  4 nobody nogroup   4096 Oct 28 13:17 boot
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:48 dev
       drwxr-xr-x 81 nobody nogroup   4096 Oct 28 04:00 etc
       drwxr-xr-x  4 nobody nogroup   4096 Oct 28 04:00 home
       lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 lib -> usr/lib
       lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib32 -> usr/lib32
       lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib64 -> usr/lib64
       lrwxrwxrwx  1 nobody nogroup     10 Aug 25 07:44 libx32 -> usr/libx32
       drwx------  2 nobody nogroup  16384 Aug 25 07:47 lost+found
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 media
       drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 mnt
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 opt
       drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 proc
       drwx--x--x  6 nobody nogroup   4096 Oct 28 13:34 root
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:46 run
       lrwxrwxrwx  1 nobody nogroup      8 Aug 25 07:44 sbin -> usr/sbin
       drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 srv
       drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 sys
       drwxrwxrwt 10 nobody nogroup   4096 Oct 28 13:19 tmp
       drwxr-xr-x 14 nobody nogroup   4096 Oct 20 13:00 usr
       drwxr-xr-x 12 nobody nogroup   4096 Aug 25 07:45 var
      
       # Since we do have a mapping for uid and gid 1000 all files owned by
       # uid and gid 1000 should simply show up as uid and gid 1000:
       root@f2-vm:/# ls -al /mnt/home/ubuntu/
       total 40
       drwxr-xr-x 3 ubuntu ubuntu  4096 Oct 28 00:43 .
       drwxr-xr-x 4 nobody nogroup 4096 Oct 28 04:00 ..
       -rw------- 1 ubuntu ubuntu  2936 Oct 28 12:26 .bash_history
       -rw-r--r-- 1 ubuntu ubuntu   220 Feb 25  2020 .bash_logout
       -rw-r--r-- 1 ubuntu ubuntu  3771 Feb 25  2020 .bashrc
       -rw-r--r-- 1 ubuntu ubuntu   807 Feb 25  2020 .profile
       -rw-r--r-- 1 ubuntu ubuntu     0 Oct 16 16:11 .sudo_as_admin_successful
       -rw------- 1 ubuntu ubuntu  1144 Oct 28 00:43 .viminfo
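The ownership changes in the listings above follow from a simple range translation. The sketch below models a single mapping entry in user space; the struct and function names are hypothetical, and 65534 is used as the fallback id because that is what the unmapped files show up as in the example.

```c
#include <stdint.h>

#define OVERFLOW_ID 65534u   /* shown as nobody/nogroup in the listing above */

/* One mapping entry: ids [from, from + count) appear as [to, to + count). */
struct id_map {
    uint32_t from;
    uint32_t to;
    uint32_t count;
};

/* Translate an on-disk id to the id visible through the mount. */
static uint32_t map_id(const struct id_map *m, uint32_t id)
{
    if (id >= m->from && id - m->from < m->count)
        return m->to + (id - m->from);
    return OVERFLOW_ID;      /* no mapping for this id */
}
```

For example, an entry of {0, 1000, 1} corresponds to `b:0:1000:1` and turns id 0 into 1000, while {1, 1, 65536} corresponds to `b:1:1:65536` and leaves id 0 unmapped.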
      
      Link: https://lore.kernel.org/r/20210121131959.646623-39-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  22. 23 Dec, 2020 4 commits