1. 24 8月, 2021 19 次提交
  2. 21 8月, 2021 2 次提交
  3. 19 8月, 2021 1 次提交
    • L
      pipe: avoid unnecessary EPOLLET wakeups under normal loads · 3b844826
      Linus Torvalds 提交于
      I had forgotten just how sensitive hackbench is to extra pipe wakeups,
      and commit 3a34b13a ("pipe: make pipe writes always wake up
      readers") ended up causing a quite noticeable regression on larger
      machines.
      
      Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
      not clear that this matters in real life all that much, but as Mel
      points out, it's used often enough when comparing kernels and so the
      performance regression shows up like a sore thumb.
      
      It's easy enough to fix at least for the common cases where pipes are
      used purely for data transfer, and you never have any exciting poll
      usage at all.  So set a special 'poll_usage' flag when there is polling
      activity, and make the ugly "EPOLLET has crazy legacy expectations"
      semantics explicit to only that case.
      
      I would love to limit it to just the broken EPOLLET case, but the pipe
      code can't see the difference between epoll and regular select/poll, so
      any non-read/write waiting will trigger the extra wakeup behavior.  That
      is sufficient for at least the hackbench case.
      
      Apart from making the odd extra wakeup cases more explicitly about
      EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
      write, not at the first write chunk.  That is actually much saner
      semantics (as much as you can call any of the legacy edge-triggered
      expectations for EPOLLET "sane") since it means that you know the wakeup
      will happen once the write is done, rather than possibly in the middle
      of one.
      
      [ For stable people: I'm putting a "Fixes" tag on this, but I leave it
        up to you to decide whether you actually want to backport it or not.
        It likely has no impact outside of synthetic benchmarks  - Linus ]
      
      Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
      Fixes: 3a34b13a ("pipe: make pipe writes always wake up readers")
      Reported-by: Nkernel test robot <oliver.sang@intel.com>
      Tested-by: NSandeep Patil <sspatil@android.com>
      Tested-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3b844826
  4. 18 8月, 2021 1 次提交
  5. 16 8月, 2021 1 次提交
    • N
      btrfs: prevent rename2 from exchanging a subvol with a directory from different parents · 3f79f6f6
      NeilBrown 提交于
      Cross-rename lacks a check when that would prevent exchanging a
      directory and subvolume from different parent subvolume. This causes
      data inconsistencies and is caught before commit by tree-checker,
      turning the filesystem to read-only.
      
      Calling the renameat2 with RENAME_EXCHANGE flags like
      
        renameat2(AT_FDCWD, namesrc, AT_FDCWD, namedest, (1 << 1))
      
      on two paths:
      
        namesrc = dir1/subvol1/dir2
       namedest = subvol2/subvol3
      
      will cause key order problem with following write time tree-checker
      report:
      
        [1194842.307890] BTRFS critical (device loop1): corrupt leaf: root=5 block=27574272 slot=10 ino=258, invalid previous key objectid, have 257 expect 258
        [1194842.322221] BTRFS info (device loop1): leaf 27574272 gen 8 total ptrs 11 free space 15444 owner 5
        [1194842.331562] BTRFS info (device loop1): refs 2 lock_owner 0 current 26561
        [1194842.338772]        item 0 key (256 1 0) itemoff 16123 itemsize 160
        [1194842.338793]                inode generation 3 size 16 mode 40755
        [1194842.338801]        item 1 key (256 12 256) itemoff 16111 itemsize 12
        [1194842.338809]        item 2 key (256 84 2248503653) itemoff 16077 itemsize 34
        [1194842.338817]                dir oid 258 type 2
        [1194842.338823]        item 3 key (256 84 2363071922) itemoff 16043 itemsize 34
        [1194842.338830]                dir oid 257 type 2
        [1194842.338836]        item 4 key (256 96 2) itemoff 16009 itemsize 34
        [1194842.338843]        item 5 key (256 96 3) itemoff 15975 itemsize 34
        [1194842.338852]        item 6 key (257 1 0) itemoff 15815 itemsize 160
        [1194842.338863]                inode generation 6 size 8 mode 40755
        [1194842.338869]        item 7 key (257 12 256) itemoff 15801 itemsize 14
        [1194842.338876]        item 8 key (257 84 2505409169) itemoff 15767 itemsize 34
        [1194842.338883]                dir oid 256 type 2
        [1194842.338888]        item 9 key (257 96 2) itemoff 15733 itemsize 34
        [1194842.338895]        item 10 key (258 12 256) itemoff 15719 itemsize 14
        [1194842.339163] BTRFS error (device loop1): block=27574272 write time tree block corruption detected
        [1194842.339245] ------------[ cut here ]------------
        [1194842.443422] WARNING: CPU: 6 PID: 26561 at fs/btrfs/disk-io.c:449 csum_one_extent_buffer+0xed/0x100 [btrfs]
        [1194842.511863] CPU: 6 PID: 26561 Comm: kworker/u17:2 Not tainted 5.14.0-rc3-git+ #793
        [1194842.511870] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
        [1194842.511876] Workqueue: btrfs-worker-high btrfs_work_helper [btrfs]
        [1194842.511976] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
        [1194842.512068] RSP: 0018:ffffa2c284d77da0 EFLAGS: 00010282
        [1194842.512074] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff928867bd9978
        [1194842.512078] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff928867bd9970
        [1194842.512081] RBP: ffff92876b958000 R08: 0000000000000001 R09: 00000000000c0003
        [1194842.512085] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
        [1194842.512088] R13: ffff92875f989f98 R14: 0000000000000000 R15: 0000000000000000
        [1194842.512092] FS:  0000000000000000(0000) GS:ffff928867a00000(0000) knlGS:0000000000000000
        [1194842.512095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [1194842.512099] CR2: 000055f5384da1f0 CR3: 0000000102fe4000 CR4: 00000000000006e0
        [1194842.512103] Call Trace:
        [1194842.512128]  ? run_one_async_free+0x10/0x10 [btrfs]
        [1194842.631729]  btree_csum_one_bio+0x1ac/0x1d0 [btrfs]
        [1194842.631837]  run_one_async_start+0x18/0x30 [btrfs]
        [1194842.631938]  btrfs_work_helper+0xd5/0x1d0 [btrfs]
        [1194842.647482]  process_one_work+0x262/0x5e0
        [1194842.647520]  worker_thread+0x4c/0x320
        [1194842.655935]  ? process_one_work+0x5e0/0x5e0
        [1194842.655946]  kthread+0x135/0x160
        [1194842.655953]  ? set_kthread_struct+0x40/0x40
        [1194842.655965]  ret_from_fork+0x1f/0x30
        [1194842.672465] irq event stamp: 1729
        [1194842.672469] hardirqs last  enabled at (1735): [<ffffffffbd1104f5>] console_trylock_spinning+0x185/0x1a0
        [1194842.672477] hardirqs last disabled at (1740): [<ffffffffbd1104cc>] console_trylock_spinning+0x15c/0x1a0
        [1194842.672482] softirqs last  enabled at (1666): [<ffffffffbdc002e1>] __do_softirq+0x2e1/0x50a
        [1194842.672491] softirqs last disabled at (1651): [<ffffffffbd08aab7>] __irq_exit_rcu+0xa7/0xd0
      
      The corrupted data will not be written, and filesystem can be unmounted
      and mounted again (all changes since the last commit will be lost).
      
      Add the missing check for new_ino so that all non-subvolumes must reside
      under the same parent subvolume. There's an exception allowing to
      exchange two subvolumes from any parents as the directory representing a
      subvolume is only a logical link and does not have any other structures
      related to the parent subvolume, unlike files, directories etc, that
      are always in the inode namespace of the parent subvolume.
      
      Fixes: cdd1fedf ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
      CC: stable@vger.kernel.org # 4.7+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3f79f6f6
  6. 15 8月, 2021 1 次提交
    • J
      io_uring: only assign io_uring_enter() SQPOLL error in actual error case · 21f96522
      Jens Axboe 提交于
      If an SQPOLL based ring is newly created and an application issues an
      io_uring_enter(2) system call on it, then we can return a spurious
      -EOWNERDEAD error. This happens because there's nothing to submit, and
      if the caller doesn't specify any other action, the initial error
      assignment of -EOWNERDEAD never gets overwritten. This causes us to
      return it directly, even if it isn't valid.
      
      Move the error assignment into the actual failure case instead.
      
      Cc: stable@vger.kernel.org
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Reported-by: Sherlock Holo sherlockya@gmail.com
      Link: https://github.com/axboe/liburing/issues/413Signed-off-by: NJens Axboe <axboe@kernel.dk>
      21f96522
  7. 13 8月, 2021 2 次提交
  8. 10 8月, 2021 11 次提交
  9. 09 8月, 2021 2 次提交