1. 22 5月, 2019 40 次提交
    • J
      ext4: avoid panic during forced reboot due to aborted journal · 316063bf
      Jan Kara 提交于
      commit 2c1d0e3631e5732dba98ef49ac0bec1388776793 upstream.
      
      Handling of aborted journal is a special code path different from
      standard ext4_error() one and it can call panic() as well. Commit
      1dc1097ff60e ("ext4: avoid panic during forced reboot") forgot to update
      this path so fix that omission.
      
      Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot")
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 5.1
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      316063bf
    • S
      ext4: fix use-after-free in dx_release() · c19db366
      Sahitya Tummala 提交于
      commit 08fc98a4d6424af66eb3ac4e2cedd2fc927ed436 upstream.
      
      The buffer_head (frames[0].bh) and it's corresping page can be
      potentially free'd once brelse() is done inside the for loop
      but before the for loop exits in dx_release(). It can be free'd
      in another context, when the page cache is flushed via
      drop_caches_sysctl_handler(). This results into below data abort
      when accessing info->indirect_levels in dx_release().
      
      Unable to handle kernel paging request at virtual address ffffffc17ac3e01e
      Call trace:
       dx_release+0x70/0x90
       ext4_htree_fill_tree+0x2d4/0x300
       ext4_readdir+0x244/0x6f8
       iterate_dir+0xbc/0x160
       SyS_getdents64+0x94/0x174
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c19db366
    • L
      ext4: fix data corruption caused by overlapping unaligned and aligned IO · 0db24122
      Lukas Czerner 提交于
      commit 57a0da28ced8707cb9f79f071a016b9d005caf5a upstream.
      
      Unaligned AIO must be serialized because the zeroing of partial blocks
      of unaligned AIO can result in data corruption in case it's overlapping
      another in flight IO.
      
      Currently we wait for all unwritten extents before we submit unaligned
      AIO which protects data in case of unaligned AIO is following overlapping
      IO. However if a unaligned AIO is followed by overlapping aligned AIO we
      can still end up corrupting data.
      
      To fix this, we must make sure that the unaligned AIO is the only IO in
      flight by waiting for unwritten extents conversion not just before the
      IO submission, but right after it as well.
      
      This problem can be reproduced by xfstest generic/538
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0db24122
    • S
      ext4: zero out the unused memory region in the extent tree block · 25d010f4
      Sriram Rajagopalan 提交于
      commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream.
      
      This commit zeroes out the unused memory region in the buffer_head
      corresponding to the extent metablock after writing the extent header
      and the corresponding extent node entries.
      
      This is done to prevent random uninitialized data from getting into
      the filesystem when the extent block is synced.
      
      This fixes CVE-2019-11833.
      Signed-off-by: NSriram Rajagopalan <sriramr@arista.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25d010f4
    • A
      tty: Don't force RISCV SBI console as preferred console · c907ce3f
      Anup Patel 提交于
      commit f91253a3d005796404ae0e578b3394459b5f9b71 upstream.
      
      The Linux kernel will auto-disables all boot consoles whenever it
      gets a preferred real console.
      
      Currently on RISC-V systems, if we have a real console which is not
      RISCV SBI console then boot consoles (such as earlycon=sbi) are not
      auto-disabled when a real console (ttyS0 or ttySIF0) is available.
      This results in duplicate prints at boot-time after kernel starts
      using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel
      parameter was passed by bootloader.
      
      The reason for above issue is that RISCV SBI console always adds
      itself as preferred console which is causing other real consoles
      to be not used as preferred console.
      
      Ideally "console=" kernel parameter passed by bootloaders should
      be the one selecting a preferred real console.
      
      This patch fixes above issue by not forcing RISCV SBI console as
      preferred console.
      
      Fixes: afa6b1cc ("tty: New RISC-V SBI console driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: NAnup Patel <anup.patel@wdc.com>
      Reviewed-by: NAtish Patra <atish.patra@wdc.com>
      Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c907ce3f
    • J
      fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount · 986d3453
      Jiufei Xue 提交于
      commit ec084de929e419e51bcdafaafe567d9e7d0273b7 upstream.
      
      synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
      switch may not go to the workqueue after synchronize_rcu().  Thus
      previous scheduled switches was not finished even flushing the
      workqueue, which will cause a NULL pointer dereferenced followed below.
      
        VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
          evict+0xb3/0x180
          iput+0x1b0/0x230
          inode_switch_wbs_work_fn+0x3c0/0x6a0
          worker_thread+0x4e/0x490
          ? process_one_work+0x410/0x410
          kthread+0xe6/0x100
          ret_from_fork+0x39/0x50
      
      Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
      pending callbacks to finish.  And inc isw_nr_in_flight after call_rcu()
      in inode_switch_wbs() to make more sense.
      
      Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.comSigned-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      986d3453
    • E
      crypto: ccm - fix incompatibility between "ccm" and "ccm_base" · a80da82d
      Eric Biggers 提交于
      commit 6a1faa4a43f5fabf9cbeaa742d916e7b5e73120f upstream.
      
      CCM instances can be created by either the "ccm" template, which only
      allows choosing the block cipher, e.g. "ccm(aes)"; or by "ccm_base",
      which allows choosing the ctr and cbcmac implementations, e.g.
      "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
      
      However, a "ccm_base" instance prevents a "ccm" instance from being
      registered using the same implementations.  Nor will the instance be
      found by lookups of "ccm".  This can be used as a denial of service.
      Moreover, "ccm_base" instances are never tested by the crypto
      self-tests, even if there are compatible "ccm" tests.
      
      The root cause of these problems is that instances of the two templates
      use different cra_names.  Therefore, fix these problems by making
      "ccm_base" instances set the same cra_name as "ccm" instances, e.g.
      "ccm(aes)" instead of "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
      
      This requires extracting the block cipher name from the name of the ctr
      and cbcmac algorithms.  It also requires starting to verify that the
      algorithms are really ctr and cbcmac using the same block cipher, not
      something else entirely.  But it would be bizarre if anyone were
      actually using non-ccm-compatible algorithms with ccm_base, so this
      shouldn't break anyone in practice.
      
      Fixes: 4a49b499 ("[CRYPTO] ccm: Added CCM mode")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      a80da82d
    • K
      ipmi:ssif: compare block number correctly for multi-part return messages · f6de0a3b
      Kamlakant Patel 提交于
      commit 55be8658c7e2feb11a5b5b33ee031791dbd23a69 upstream.
      
      According to ipmi spec, block number is a number that is incremented,
      starting with 0, for each new block of message data returned using the
      middle transaction.
      
      Here, the 'blocknum' is data[0] which always starts from zero(0) and
      'ssif_info->multi_pos' starts from 1.
      So, we need to add +1 to blocknum while comparing with multi_pos.
      
      Fixes: 7d6380cd40f79 ("ipmi:ssif: Fix handling of multi-part return messages").
      Reported-by: NKiran Kolukuluru <kirank@ami.com>
      Signed-off-by: NKamlakant Patel <kamlakantp@marvell.com>
      Message-Id: <1556106615-18722-1-git-send-email-kamlakantp@marvell.com>
      [Also added a debug log if the block numbers don't match.]
      Signed-off-by: NCorey Minyard <cminyard@mvista.com>
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6de0a3b
    • C
      bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim() · 88681649
      Coly Li 提交于
      commit 1bee2addc0c8470c8aaa65ef0599eeae96dd88bc upstream.
      
      In journal_reclaim() ja->cur_idx of each cache will be update to
      reclaim available journal buckets. Variable 'int n' is used to count how
      many cache is successfully reclaimed, then n is set to c->journal.key
      by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache()
      loop will write the jset data onto each cache.
      
      The problem is, if all jouranl buckets on each cache is full, the
      following code in journal_reclaim(),
      
      529 for_each_cache(ca, c, iter) {
      530       struct journal_device *ja = &ca->journal;
      531       unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
      532
      533       /* No space available on this device */
      534       if (next == ja->discard_idx)
      535               continue;
      536
      537       ja->cur_idx = next;
      538       k->ptr[n++] = MAKE_PTR(0,
      539                         bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
      540                         ca->sb.nr_this_dev);
      541 }
      542
      543 bkey_init(k);
      544 SET_KEY_PTRS(k, n);
      
      If there is no available bucket to reclaim, the if() condition at line
      534 will always true, and n remains 0. Then at line 544, SET_KEY_PTRS()
      will set KEY_PTRS field of c->journal.key to 0.
      
      Setting KEY_PTRS field of c->journal.key to 0 is wrong. Because in
      journal_write_unlocked() the journal data is written in following loop,
      
      649	for (i = 0; i < KEY_PTRS(k); i++) {
      650-671		submit journal data to cache device
      672	}
      
      If KEY_PTRS field is set to 0 in jouranl_reclaim(), the journal data
      won't be written to cache device here. If system crahed or rebooted
      before bkeys of the lost journal entries written into btree nodes, data
      corruption will be reported during bcache reload after rebooting the
      system.
      
      Indeed there is only one cache in a cache set, there is no need to set
      KEY_PTRS field in journal_reclaim() at all. But in order to keep the
      for_each_cache() logic consistent for now, this patch fixes the above
      problem by not setting 0 KEY_PTRS of journal key, if there is no bucket
      available to reclaim.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88681649
    • L
      bcache: fix a race between cache register and cacheset unregister · ecfc882f
      Liang Chen 提交于
      commit a4b732a248d12cbdb46999daf0bf288c011335eb upstream.
      
      There is a race between cache device register and cache set unregister.
      For an already registered cache device, register_bcache will call
      bch_is_open to iterate through all cachesets and check every cache
      there. The race occurs if cache_set_free executes at the same time and
      clears the caches right before ca is dereferenced in bch_is_open_cache.
      To close the race, let's make sure the clean up work is protected by
      the bch_register_lock as well.
      
      This issue can be reproduced as follows,
      while true; do echo /dev/XXX> /sys/fs/bcache/register ; done&
      while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done &
      
      and results in the following oops,
      
      [  +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998
      [  +0.000457] #PF error: [normal kernel read fault]
      [  +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0
      [  +0.000388] Oops: 0000 [#1] SMP PTI
      [  +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6
      [  +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
      [  +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache]
      [  +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d
      [  +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202
      [  +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000
      [  +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001
      [  +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a
      [  +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0
      [  +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620
      [  +0.000384] FS:  00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000
      [  +0.000420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0
      [  +0.000837] Call Trace:
      [  +0.000682]  ? _cond_resched+0x10/0x20
      [  +0.000691]  ? __kmalloc+0x131/0x1b0
      [  +0.000710]  kernfs_fop_write+0xfa/0x170
      [  +0.000733]  __vfs_write+0x2e/0x190
      [  +0.000688]  ? inode_security+0x10/0x30
      [  +0.000698]  ? selinux_file_permission+0xd2/0x120
      [  +0.000752]  ? security_file_permission+0x2b/0x100
      [  +0.000753]  vfs_write+0xa8/0x1a0
      [  +0.000676]  ksys_write+0x4d/0xb0
      [  +0.000699]  do_syscall_64+0x3a/0xf0
      [  +0.000692]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NLiang Chen <liangchen.linux@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecfc882f
    • F
      Btrfs: do not start a transaction at iterate_extent_inodes() · 8a8f671b
      Filipe Manana 提交于
      commit bfc61c36260ca990937539cd648ede3cd749bc10 upstream.
      
      When finding out which inodes have references on a particular extent, done
      by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both
      v1 and v2) ioctl and from scrub we use the transaction join API to grab a
      reference on the currently running transaction, since in order to give
      accurate results we need to inspect the delayed references of the currently
      running transaction.
      
      However, if there is currently no running transaction, the join operation
      will create a new transaction. This is inefficient as the transaction will
      eventually be committed, doing unnecessary IO and introducing a potential
      point of failure that will lead to a transaction abort due to -ENOSPC, as
      recently reported [1].
      
      That's because the join, creates the transaction but does not reserve any
      space, so when attempting to update the root item of the root passed to
      btrfs_join_transaction(), during the transaction commit, we can end up
      failling with -ENOSPC. Users of a join operation are supposed to actually
      do some filesystem changes and reserve space by some means, which is not
      the case of iterate_extent_inodes(), it is a read-only operation for all
      contextes from which it is called.
      
      The reported [1] -ENOSPC failure stack trace is the following:
      
       heisenberg kernel: ------------[ cut here ]------------
       heisenberg kernel: BTRFS: Transaction aborted (error -28)
       heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
       heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
       heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
       heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
       heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
       heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
       heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
       heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
       heisenberg kernel: FS:  0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
       heisenberg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
       heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       heisenberg kernel: Call Trace:
       heisenberg kernel:  commit_fs_roots+0x166/0x1d0 [btrfs]
       heisenberg kernel:  ? _cond_resched+0x15/0x30
       heisenberg kernel:  ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
       heisenberg kernel:  btrfs_commit_transaction+0x2bd/0x870 [btrfs]
       heisenberg kernel:  ? start_transaction+0x9d/0x3f0 [btrfs]
       heisenberg kernel:  transaction_kthread+0x147/0x180 [btrfs]
       heisenberg kernel:  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
       heisenberg kernel:  kthread+0x112/0x130
       heisenberg kernel:  ? kthread_bind+0x30/0x30
       heisenberg kernel:  ret_from_fork+0x35/0x40
       heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
      
      So fix that by using the attach API, which does not create a transaction
      when there is currently no running transaction.
      
      [1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/Reported-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a8f671b
    • F
      Btrfs: do not start a transaction during fiemap · 0388d45a
      Filipe Manana 提交于
      commit 03628cdbc64db6262e50d0357960a4e9562676a1 upstream.
      
      During fiemap, for regular extents (non inline) we need to check if they
      are shared and if they are, set the shared bit. Checking if an extent is
      shared requires checking the delayed references of the currently running
      transaction, since some reference might have not yet hit the extent tree
      and be only in the in-memory delayed references.
      
      However we were using a transaction join for this, which creates a new
      transaction when there is no transaction currently running. That means
      that two more potential failures can happen: creating the transaction and
      committing it. Further, if no write activity is currently happening in the
      system, and fiemap calls keep being done, we end up creating and
      committing transactions that do nothing.
      
      In some extreme cases this can result in the commit of the transaction
      created by fiemap to fail with ENOSPC when updating the root item of a
      subvolume tree because a join does not reserve any space, leading to a
      trace like the following:
      
       heisenberg kernel: ------------[ cut here ]------------
       heisenberg kernel: BTRFS: Transaction aborted (error -28)
       heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
       heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
       heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
       heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
       heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
       heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
       heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
       heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
       heisenberg kernel: FS:  0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
       heisenberg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
       heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       heisenberg kernel: Call Trace:
       heisenberg kernel:  commit_fs_roots+0x166/0x1d0 [btrfs]
       heisenberg kernel:  ? _cond_resched+0x15/0x30
       heisenberg kernel:  ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
       heisenberg kernel:  btrfs_commit_transaction+0x2bd/0x870 [btrfs]
       heisenberg kernel:  ? start_transaction+0x9d/0x3f0 [btrfs]
       heisenberg kernel:  transaction_kthread+0x147/0x180 [btrfs]
       heisenberg kernel:  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
       heisenberg kernel:  kthread+0x112/0x130
       heisenberg kernel:  ? kthread_bind+0x30/0x30
       heisenberg kernel:  ret_from_fork+0x35/0x40
       heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
      
      Since fiemap (and btrfs_check_shared()) is a read-only operation, do not do
      a transaction join to avoid the overhead of creating a new transaction (if
      there is currently no running transaction) and introducing a potential
      point of failure when the new transaction gets committed, instead use a
      transaction attach to grab a handle for the currently running transaction
      if any.
      Reported-by: NChristoph Anton Mitterer <calestyo@scientia.net>
      Link: https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/
      Fixes: afce772e ("btrfs: fix check_shared for fiemap ioctl")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0388d45a
    • F
      Btrfs: send, flush dellaloc in order to avoid data loss · 74ca0a76
      Filipe Manana 提交于
      commit 9f89d5de8631c7930898a601b6612e271aa2261c upstream.
      
      When we set a subvolume to read-only mode we do not flush dellaloc for any
      of its inodes (except if the filesystem is mounted with -o flushoncommit),
      since it does not affect correctness for any subsequent operations - except
      for a future send operation. The send operation will not be able to see the
      delalloc data since the respective file extent items, inode item updates,
      backreferences, etc, have not hit yet the subvolume and extent trees.
      
      Effectively this means data loss, since the send stream will not contain
      any data from existing delalloc. Another problem from this is that if the
      writeback starts and finishes while the send operation is in progress, we
      have the subvolume tree being being modified concurrently which can result
      in send failing unexpectedly with EIO or hitting runtime errors, assertion
      failures or hitting BUG_ONs, etc.
      
      Simple reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ btrfs subvolume create /mnt/sv
        $ xfs_io -f -c "pwrite -S 0xea 0 108K" /mnt/sv/foo
      
        $ btrfs property set /mnt/sv ro true
        $ btrfs send -f /tmp/send.stream /mnt/sv
      
        $ od -t x1 -A d /mnt/sv/foo
        0000000 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea
        *
        0110592
      
        $ umount /mnt
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
      
        $ btrfs receive -f /tmp/send.stream /mnt
        $ echo $?
        0
        $ od -t x1 -A d /mnt/sv/foo
        0000000
        # ---> empty file
      
      Since this a problem that affects send only, fix it in send by flushing
      dellaloc for all the roots used by the send operation before send starts
      to process the commit roots.
      
      This is a problem that affects send since it was introduced (commit
      31db9f7c ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive"))
      but backporting it to older kernels has some dependencies:
      
      - For kernels between 3.19 and 4.20, it depends on commit 3cd24c698004d2
        ("btrfs: use tagged writepage to mitigate livelock of snapshot") because
        the function btrfs_start_delalloc_snapshot() does not exist before that
        commit. So one has to either pick that commit or replace the calls to
        btrfs_start_delalloc_snapshot() in this patch with calls to
        btrfs_start_delalloc_inodes().
      
      - For kernels older than 3.19 it also requires commit e5fa8f86
        ("Btrfs: ensure send always works on roots without orphans") because
        it depends on the function ensure_commit_roots_uptodate() which that
        commits introduced.
      
      - No dependencies for 5.0+ kernels.
      
      A test case for fstests follows soon.
      
      CC: stable@vger.kernel.org # 3.19+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74ca0a76
    • N
      btrfs: Honour FITRIM range constraints during free space trim · 8b13bb91
      Nikolay Borisov 提交于
      commit c2d1b3aae33605a61cbab445d8ae1c708ccd2698 upstream.
      
      Up until now trimming the freespace was done irrespective of what the
      arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
      will be entirely ignored. Fix it by correctly handling those paramter.
      This requires breaking if the found freespace extent is after the end of
      the passed range as well as completing trim after trimming
      fstrim_range::len bytes.
      
      Fixes: 499f377f ("btrfs: iterate over unused chunk space in FITRIM")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b13bb91
    • N
      btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails · 87dcf0c6
      Nikolay Borisov 提交于
      commit 537f38f019fa0b762dbb4c0fc95d7fcce9db8e2d upstream.
      
      If a an eb fails to be read for whatever reason - it's corrupted on disk
      and parent transid/key validations fail or IO for eb pages fail then
      this buffer must be removed from the buffer cache. Currently the code
      calls free_extent_buffer if an error occurs. Unfortunately this doesn't
      achieve the desired behavior since btrfs_find_create_tree_block returns
      with eb->refs == 2.
      
      On the other hand free_extent_buffer will only decrement the refs once
      leaving it added to the buffer cache radix tree.  This enables later
      code to look up the buffer from the cache and utilize it potentially
      leading to a crash.
      
      The correct way to free the buffer is call free_extent_buffer_stale.
      This function will correctly call atomic_dec explicitly for the buffer
      and subsequently call release_extent_buffer which will decrement the
      final reference thus correctly remove the invalid buffer from buffer
      cache. This change affects only newly allocated buffers since they have
      eb->refs == 2.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755Reported-by: NJungyeon <jungyeon@gatech.edu>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87dcf0c6
    • Q
      btrfs: Check the first key and level for cached extent buffer · d8925a1f
      Qu Wenruo 提交于
      commit 448de471cd4cab0cedd15770082567a69a784a11 upstream.
      
      [BUG]
      When reading a file from a fuzzed image, kernel can panic like:
      
        BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
        assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3500!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
        Call Trace:
         btrfs_lookup_csum+0x52/0x150 [btrfs]
         __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
         btrfs_submit_bio_hook+0x103/0x170 [btrfs]
         submit_one_bio+0x59/0x80 [btrfs]
         extent_read_full_page+0x58/0x80 [btrfs]
         generic_file_read_iter+0x2f6/0x9d0
         __vfs_read+0x14d/0x1a0
         vfs_read+0x8d/0x140
         ksys_read+0x52/0xc0
         do_syscall_64+0x60/0x210
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      The fuzzed image has a corrupted leaf whose first key doesn't match its
      parent:
      
        checksum tree key (CSUM_TREE ROOT_ITEM 0)
        node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
        	...
                key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
      
        leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
        leaf 29761536 flags 0x1(WRITTEN) backref revision 1
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
                item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
                        range start 8798638964736 end 8798641262592 length 2297856
      
      When reading the above tree block, we have extent_buffer->refs = 2 in
      the context:
      
      - initial one from __alloc_extent_buffer()
        alloc_extent_buffer()
        |- __alloc_extent_buffer()
           |- atomic_set(&eb->refs, 1)
      
      - one being added to fs_info->buffer_radix
        alloc_extent_buffer()
        |- check_buffer_tree_ref()
           |- atomic_inc(&eb->refs)
      
      So if even we call free_extent_buffer() in read_tree_block or other
      similar situation, we only decrease the refs by 1, it doesn't reach 0
      and won't be freed right now.
      
      The staled eb and its corrupted content will still be kept cached.
      
      Furthermore, we have several extra cases where we either don't do first
      key check or the check is not proper for all callers:
      
      - scrub
        We just don't have first key in this context.
      
      - shared tree block
        One tree block can be shared by several snapshot/subvolume trees.
        In that case, the first key check for one subvolume doesn't apply to
        another.
      
      So for the above reasons, a corrupted extent buffer can sneak into the
      buffer cache.
      
      [FIX]
      Call verify_level_key in read_block_for_search to do another
      verification. For that purpose the function is exported.
      
      Due to above reasons, although we can free corrupted extent buffer from
      cache, we still need the check in read_block_for_search(), for scrub and
      shared tree blocks.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769Reported-by: NYoon Jungyeon <jungyeon@gatech.edu>
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8925a1f
    • D
      ext4: fix ext4_show_options for file systems w/o journal · 45123ae5
      Debabrata Banerjee 提交于
      commit 50b29d8f033a7c88c5bc011abc2068b1691ab755 upstream.
      
      Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as
      I assume was intended, all other options were blown away leading to
      _ext4_show_options() output being incorrect.
      
      Fixes: 1e381f60 ("ext4: do not allow journal_opts for fs w/o journal")
      Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45123ae5
    • K
      ext4: actually request zeroing of inode table after grow · f7952475
      Kirill Tkhai 提交于
      commit 310a997fd74de778b9a4848a64be9cda9f18764a upstream.
      
      It is never possible, that number of block groups decreases,
      since only online grow is supported.
      
      But after a growing occured, we have to zero inode tables
      for just created new block groups.
      
      Fixes: 19c5246d ("ext4: add new online resize interface")
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7952475
    • B
      ext4: fix use-after-free race with debug_want_extra_isize · 2a18c9c7
      Barret Rhoden 提交于
      commit 7bc04c5c2cc467c5b40f2b03ba08da174a0d5fa7 upstream.
      
      When remounting with debug_want_extra_isize, we were not performing the
      same checks that we do during a normal mount.  That allowed us to set a
      value for s_want_extra_isize that reached outside the s_inode_size.
      
      Fixes: e2b911c5 ("ext4: clean up feature test macros with predicate functions")
      Reported-by: syzbot+f584efa0ac7213c226b7@syzkaller.appspotmail.com
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NBarret Rhoden <brho@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a18c9c7
    • P
      ext4: avoid drop reference to iloc.bh twice · b12a8d80
      Pan Bian 提交于
      commit 8c380ab4b7b59c0c602743810be1b712514eaebc upstream.
      
      The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty.
      However, the reference is dropped again if error occurs during
      ext4_handle_dirty_metadata, which may result in use-after-free bugs.
      
      Fixes: fb265c9cb49e("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
      Signed-off-by: NPan Bian <bianpan2016@163.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b12a8d80
    • T
      ext4: ignore e_value_offs for xattrs with value-in-ea-inode · f0f805f8
      Theodore Ts'o 提交于
      commit e5d01196c0428a206f307e9ee5f6842964098ff0 upstream.
      
      In other places in fs/ext4/xattr.c, if e_value_inum is non-zero, the
      code ignores the value in e_value_offs.  The e_value_offs *should* be
      zero, but we shouldn't depend upon it, since it might not be true in a
      corrupted/fuzzed file system.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202897
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202877Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0f805f8
    • J
      ext4: make sanity check in mballoc more strict · 71478ef6
      Jan Kara 提交于
      commit 31562b954b60f02acb91b7349dc6432d3f8c3c5f upstream.
      
      The sanity check in mb_find_extent() only checked that returned extent
      does not extend past blocksize * 8, however it should not extend past
      EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group <
      blocksize * 8 and the tail of the bitmap is not properly filled by 1s
      which happened e.g. when ancient kernels have grown the filesystem.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71478ef6
    • J
      jbd2: check superblock mapped prior to committing · 001fe0da
      Jiufei Xue 提交于
      commit 742b06b5628f2cd23cb51a034cb54dc33c6162c5 upstream.
      
      We hit a BUG at fs/buffer.c:3057 if we detached the nbd device
      before unmounting ext4 filesystem.
      
      The typical chain of events leading to the BUG:
      jbd2_write_superblock
        submit_bh
          submit_bh_wbc
            BUG_ON(!buffer_mapped(bh));
      
      The block device is removed and all the pages are invalidated. JBD2
      was trying to write journal superblock to the block device which is
      no longer present.
      
      Fix this by checking the journal superblock's buffer head prior to
      submitting.
      Reported-by: NEric Ren <renzhen@linux.alibaba.com>
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      001fe0da
    • S
      tty/vt: fix write/write race in ioctl(KDSKBSENT) handler · 0fd2df64
      Sergei Trofimovich 提交于
      commit 46ca3f735f345c9d87383dd3a09fa5d43870770e upstream.
      
      The bug manifests as an attempt to access deallocated memory:
      
          BUG: unable to handle kernel paging request at ffff9c8735448000
          #PF error: [PROT] [WRITE]
          PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 80000007f5448161
          Oops: 0003 [#1] PREEMPT SMP
          CPU: 6 PID: 388 Comm: loadkeys Tainted: G         C        5.0.0-rc6-00153-g5ded5871030e #91
          Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
          RIP: 0010:__memmove+0x81/0x1a0
          Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 <f3> 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
          RSP: 0018:ffffa1b9002d7d08 EFLAGS: 00010203
          RAX: ffff9c873541af43 RBX: ffff9c873541af43 RCX: 00000c6f105cd6bf
          RDX: 0000637882e986b6 RSI: ffff9c8735447ffb RDI: ffff9c8735447ffb
          RBP: ffff9c8739cd3800 R08: ffff9c873b802f00 R09: 00000000fffff73b
          R10: ffffffffb82b35f1 R11: 00505b1b004d5b1b R12: 0000000000000000
          R13: ffff9c873541af3d R14: 000000000000000b R15: 000000000000000c
          FS:  00007f450c390580(0000) GS:ffff9c873f180000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffff9c8735448000 CR3: 00000007e213c002 CR4: 00000000000606e0
          Call Trace:
           vt_do_kdgkb_ioctl+0x34d/0x440
           vt_ioctl+0xba3/0x1190
           ? __bpf_prog_run32+0x39/0x60
           ? mem_cgroup_commit_charge+0x7b/0x4e0
           tty_ioctl+0x23f/0x920
           ? preempt_count_sub+0x98/0xe0
           ? __seccomp_filter+0x67/0x600
           do_vfs_ioctl+0xa2/0x6a0
           ? syscall_trace_enter+0x192/0x2d0
           ksys_ioctl+0x3a/0x70
           __x64_sys_ioctl+0x16/0x20
           do_syscall_64+0x54/0xe0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The bug manifests on systemd systems with multiple vtcon devices:
        # cat /sys/devices/virtual/vtconsole/vtcon0/name
        (S) dummy device
        # cat /sys/devices/virtual/vtconsole/vtcon1/name
        (M) frame buffer device
      
      There systemd runs 'loadkeys' tool in tapallel for each vtcon
      instance. This causes two parallel ioctl(KDSKBSENT) calls to
      race into adding the same entry into 'func_table' array at:
      
          drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()
      
      The function has no locking around writes to 'func_table'.
      
      The simplest reproducer is to have initrams with the following
      init on a 8-CPU machine x86_64:
      
          #!/bin/sh
      
          loadkeys -q windowkeys ru4 &
          loadkeys -q windowkeys ru4 &
          loadkeys -q windowkeys ru4 &
          loadkeys -q windowkeys ru4 &
      
          loadkeys -q windowkeys ru4 &
          loadkeys -q windowkeys ru4 &
          loadkeys -q windowkeys ru4 &
          loadkeys -q windowkeys ru4 &
          wait
      
      The change adds lock on write path only. Reads are still racy.
      
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      CC: Jiri Slaby <jslaby@suse.com>
      Link: https://lkml.org/lkml/2019/2/17/256Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0fd2df64
    • Y
      tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0 · d90824ec
      Yifeng Li 提交于
      commit 75ddbc1fb11efac87b611d48e9802f6fe2bb2163 upstream.
      
      Previously, in the userspace, it was possible to use the "setterm" command
      from util-linux to blank the VT console by default, using the following
      command.
      
      According to the man page,
      
      > The force option keeps the screen blank even if a key is pressed.
      
      It was implemented by calling TIOCL_BLANKSCREEN.
      
      	case BLANKSCREEN:
      		ioctlarg = TIOCL_BLANKSCREEN;
      		if (ioctl(STDIN_FILENO, TIOCLINUX, &ioctlarg))
      			warn(_("cannot force blank"));
      		break;
      
      However, after Linux 4.12, this command ceased to work anymore, which is
      unexpected. By inspecting the kernel source, it shows that the issue was
      triggered by the side-effect from commit a4199f5e ("tty: Disable
      default console blanking interval").
      
      The console blanking is implemented by function do_blank_screen() in vt.c:
      "blank_state" will be initialized to "blank_normal_wait" in con_init() if
      AND ONLY IF ("blankinterval" > 0). If "blankinterval" is 0, "blank_state"
      will be "blank_off" (== 0), and a call to do_blank_screen() will always
      abort, even if a forced blanking is required from the user by calling
      TIOCL_BLANKSCREEN, the console won't be blanked.
      
      This behavior is unexpected from a user's point-of-view, since it's not
      mentioned in any documentation. The setterm man page suggests it will
      always work, and the kernel comments in uapi/linux/tiocl.h says
      
      > /* keep screen blank even if a key is pressed */
      > #define TIOCL_BLANKSCREEN 14
      
      To fix it, we simply remove the "blank_state != blank_off" check, as
      pointed out by Nicolas Pitre, this check doesn't logically make sense
      and it's safe to remove.
      Suggested-by: NNicolas Pitre <nicolas.pitre@linaro.org>
      Fixes: a4199f5e ("tty: Disable default console blanking interval")
      Signed-off-by: NYifeng Li <tomli@tomli.me>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d90824ec
    • A
      mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write · 6a01793e
      Alexander Sverdlin 提交于
      commit 2b75ebeea6f4937d4d05ec4982c471cef9a29b7f upstream.
      
      It was observed that reads crossing 4K address boundary are failing.
      
      This limitation is mentioned in Intel documents:
      
      Intel(R) 9 Series Chipset Family Platform Controller Hub (PCH) Datasheet:
      
      "5.26.3 Flash Access
      Program Register Access:
      * Program Register Accesses are not allowed to cross a 4 KB boundary..."
      
      Enhanced Serial Peripheral Interface (eSPI)
      Interface Base Specification (for Client and Server Platforms):
      
      "5.1.4 Address
      For other memory transactions, the address may start or end at any byte
      boundary. However, the address and payload length combination must not
      cross the naturally aligned address boundary of the corresponding Maximum
      Payload Size. It must not cross a 4 KB address boundary."
      
      Avoid this by splitting an operation crossing the boundary into two
      operations.
      
      Fixes: 8afda8b2 ("spi-nor: Add support for Intel SPI serial flash controller")
      Cc: stable@vger.kernel.org
      Reported-by: NRomain Porte <romain.porte@nokia.com>
      Tested-by: NPascal Fabreges <pascal.fabreges@nokia.com>
      Signed-off-by: NAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Reviewed-by: NTudor Ambarus <tudor.ambarus@microchip.com>
      Acked-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a01793e
    • D
      mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values · dc6d69bd
      Dmitry Osipenko 提交于
      commit ea611d1cc180fbb56982c83cd5142a2b34881f5c upstream.
      
      The FPS_PERIOD_MAX_US definitions are swapped for MAX20024 and MAX77620,
      fix it.
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NDmitry Osipenko <digetx@gmail.com>
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc6d69bd
    • S
      mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L · 5185672f
      Steve Twiss 提交于
      commit 6b4814a9451add06d457e198be418bf6a3e6a990 upstream.
      
      Mismatch between what is found in the Datasheets for DA9063 and DA9063L
      provided by Dialog Semiconductor, and the register names provided in the
      MFD registers file. The changes are for the OTP (one-time-programming)
      control registers. The two naming errors are OPT instead of OTP, and
      COUNT instead of CONT (i.e. control).
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NSteve Twiss <stwiss.opensource@diasemi.com>
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5185672f
    • R
      ACPI: PM: Set enable_for_wake for wakeup GPEs during suspend-to-idle · 770e46b3
      Rajat Jain 提交于
      commit 2f844b61db8297a1f7a06adf2eb5c43381f2c183 upstream.
      
      I noticed that recently multiple systems (chromebooks) couldn't wake
      from S0ix using LID or Keyboard after updating to a newer kernel. I
      bisected and it turned up commit f941d3e41da7 ("ACPI: EC / PM: Disable
      non-wakeup GPEs for suspend-to-idle"). I checked that the issue got
      fixed if that commit was reverted.
      
      I debugged and found that although PNP0C0D:00 (representing the LID)
      is wake capable and should wakeup the system per the code in
      acpi_wakeup_gpe_init() and in drivers/acpi/button.c:
      
      localhost /sys # cat /proc/acpi/wakeup
      Device  S-state   Status   Sysfs node
      LID0      S4    *enabled   platform:PNP0C0D:00
      CREC      S5    *disabled  platform:GOOG0004:00
                      *disabled  platform:cros-ec-dev.1.auto
                      *disabled  platform:cros-ec-accel.0
                      *disabled  platform:cros-ec-accel.1
                      *disabled  platform:cros-ec-gyro.0
                      *disabled  platform:cros-ec-ring.0
                      *disabled  platform:cros-usbpd-charger.2.auto
                      *disabled  platform:cros-usbpd-logger.3.auto
      D015      S3    *enabled   i2c:i2c-ELAN0000:00
      PENH      S3    *enabled   platform:PRP0001:00
      XHCI      S3    *enabled   pci:0000:00:14.0
      GLAN      S4    *disabled
      WIFI      S3    *disabled  pci:0000:00:14.3
      localhost /sys #
      
      On debugging, I found that its corresponding GPE is not being enabled.
      The particular GPE's "gpe_register_info->enable_for_wake" does not
      have any bits set when acpi_enable_all_wakeup_gpes() comes around to
      use it. I looked at code and could not find any other code path that
      should set the bits in "enable_for_wake" bitmask for the wake enabled
      devices for s2idle.  [I do see that it happens for S3 in
      acpi_sleep_prepare()].
      
      Thus I used the same call to enable the GPEs for wake enabled devices,
      and verified that this fixes the regression I was seeing on multiple
      of my devices.
      
      [ rjw: The problem is that commit f941d3e41da7 ("ACPI: EC / PM:
        Disable non-wakeup GPEs for suspend-to-idle") forgot to add
        the acpi_enable_wakeup_devices() call for s2idle along with
        acpi_enable_all_wakeup_gpes(). ]
      
      Fixes: f941d3e41da7 ("ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203579Signed-off-by: NRajat Jain <rajatja@google.com>
      [ rjw: Subject & changelog ]
      Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      770e46b3
    • A
      userfaultfd: use RCU to free the task struct when fork fails · 8bae4398
      Andrea Arcangeli 提交于
      commit c3f3ce049f7d97cc7ec9c01cb51d9ec74e0f37c2 upstream.
      
      The task structure is freed while get_mem_cgroup_from_mm() holds
      rcu_read_lock() and dereferences mm->owner.
      
        get_mem_cgroup_from_mm()                failing fork()
        ----                                    ---
        task = mm->owner
                                                mm->owner = NULL;
                                                free(task)
        if (task) *task; /* use after free */
      
      The fix consists in freeing the task with RCU also in the fork failure
      case, exactly like it always happens for the regular exit(2) path.  That
      is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm()
      (left side above) effective to avoid a use after free when dereferencing
      the task structure.
      
      An alternate possible fix would be to defer the delivery of the
      userfaultfd contexts to the monitor until after fork() is guaranteed to
      succeed.  Such a change would require more changes because it would
      create a strict ordering dependency where the uffd methods would need to
      be called beyond the last potentially failing branch in order to be
      safe.  This solution as opposed only adds the dependency to common code
      to set mm->owner to NULL and to free the task struct that was pointed by
      mm->owner with RCU, if fork ends up failing.  The userfaultfd methods
      can still be called anywhere during the fork runtime and the monitor
      will keep discarding orphaned "mm" coming from failed forks in userland.
      
      This race condition couldn't trigger if CONFIG_MEMCG was set =n at build
      time.
      
      [aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
        Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
      Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
      Fixes: 893e26e6 ("userfaultfd: non-cooperative: Add fork() event")
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Tested-by: Nzhong jiang <zhongjiang@huawei.com>
      Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bae4398
    • S
      ocfs2: fix ocfs2 read inode data panic in ocfs2_iget · 3574bc98
      Shuning Zhang 提交于
      commit e091eab028f9253eac5c04f9141bbc9d170acab3 upstream.
      
      In some cases, ocfs2_iget() reads the data of inode, which has been
      deleted for some reason.  That will make the system panic.  So We should
      judge whether this inode has been deleted, and tell the caller that the
      inode is a bad inode.
      
      For example, the ocfs2 is used as the backed of nfs, and the client is
      nfsv3.  This issue can be reproduced by the following steps.
      
      on the nfs server side,
      ..../patha/pathb
      
      Step 1: The process A was scheduled before calling the function fh_verify.
      
      Step 2: The process B is removing the 'pathb', and just completed the call
      to function dput.  Then the dentry of 'pathb' has been deleted from the
      dcache, and all ancestors have been deleted also.  The relationship of
      dentry and inode was deleted through the function hlist_del_init.  The
      following is the call stack.
      dentry_iput->hlist_del_init(&dentry->d_u.d_alias)
      
      At this time, the inode is still in the dcache.
      
      Step 3: The process A call the function ocfs2_get_dentry, which get the
      inode from dcache.  Then the refcount of inode is 1.  The following is the
      call stack.
      nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)
      
      Step 4: Dirty pages are flushed by bdi threads.  So the inode of 'patha'
      is evicted, and this directory was deleted.  But the inode of 'pathb'
      can't be evicted, because the refcount of the inode was 1.
      
      Step 5: The process A keep running, and call the function
      reconnect_path(in exportfs_decode_fh), which call function
      ocfs2_get_parent of ocfs2.  Get the block number of parent
      directory(patha) by the name of ...  Then read the data from disk by the
      block number.  But this inode has been deleted, so the system panic.
      
      Process A                                             Process B
      1. in nfsd3_proc_getacl                   |
      2.                                        |        dput
      3. fh_to_dentry(ocfs2_get_dentry)         |
      4. bdi flush dirty cache                  |
      5. ocfs2_iget                             |
      
      [283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
      Invalid dinode #580640: OCFS2_VALID_FL not set
      
      [283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
      after error
      
      [283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G        W
      4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
      [283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
      Desktop Reference Platform, BIOS 6.00 09/21/2015
      [283465.549657]  0000000000000000 ffff8800a56fb7b8 ffffffff816e839c
      ffffffffa0514758
      [283465.550392]  000000000008dc20 ffff8800a56fb838 ffffffff816e62d3
      0000000000000008
      [283465.551056]  ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8
      ffff88005df9f000
      [283465.551710] Call Trace:
      [283465.552516]  [<ffffffff816e839c>] dump_stack+0x63/0x81
      [283465.553291]  [<ffffffff816e62d3>] panic+0xcb/0x21b
      [283465.554037]  [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
      [283465.554882]  [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2]
      [283465.555768]  [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230
      [ocfs2]
      [283465.556683]  [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
      [283465.557408]  [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20
      [ocfs2]
      [283465.557973]  [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60
      [ocfs2]
      [283465.558525]  [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2]
      [283465.559082]  [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2]
      [283465.559622]  [<ffffffff81297c05>] reconnect_path+0xb5/0x300
      [283465.560156]  [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0
      [283465.560708]  [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
      [283465.561262]  [<ffffffff810a8196>] ? prepare_creds+0x26/0x110
      [283465.561932]  [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd]
      [283465.562862]  [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
      [283465.563697]  [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
      [283465.564510]  [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [283465.565358]  [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30
      [sunrpc]
      [283465.566272]  [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc]
      [283465.567155]  [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc]
      [283465.568020]  [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd]
      [283465.568962]  [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [283465.570112]  [<ffffffff810a622b>] kthread+0xcb/0xf0
      [283465.571099]  [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
      [283465.572114]  [<ffffffff816f11b8>] ret_from_fork+0x58/0x90
      [283465.573156]  [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
      
      Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@oracle.comSigned-off-by: NShuning Zhang <sunny.s.zhang@oracle.com>
      Reviewed-by: NJoseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: piaojun <piaojun@huawei.com>
      Cc: "Gang He" <ghe@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3574bc98
    • M
      hugetlb: use same fault hash key for shared and private mappings · a3ccc156
      Mike Kravetz 提交于
      commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream.
      
      hugetlb uses a fault mutex hash table to prevent page faults of the
      same pages concurrently.  The key for shared and private mappings is
      different.  Shared keys off address_space and file index.  Private keys
      off mm and virtual address.  Consider a private mappings of a populated
      hugetlbfs file.  A fault will map the page from the file and if needed
      do a COW to map a writable page.
      
      Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
      pages.  It uses the address_space file index key.  However, private
      mappings will use a different key and could race with this code to map
      the file page.  This causes problems (BUG) for the page cache remove
      code as it expects the page to be unmapped.  A sample stack is:
      
      page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
      kernel BUG at mm/filemap.c:169!
      ...
      RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
      ...
      Call Trace:
      __delete_from_page_cache+0x39/0x220
      delete_from_page_cache+0x45/0x70
      remove_inode_hugepages+0x13c/0x380
      ? __add_to_page_cache_locked+0x162/0x380
      hugetlbfs_fallocate+0x403/0x540
      ? _cond_resched+0x15/0x30
      ? __inode_security_revalidate+0x5d/0x70
      ? selinux_file_permission+0x100/0x130
      vfs_fallocate+0x13f/0x270
      ksys_fallocate+0x3c/0x80
      __x64_sys_fallocate+0x1a/0x20
      do_syscall_64+0x5b/0x180
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      There seems to be another potential COW issue/race with this approach
      of different private and shared keys as noted in commit 8382d914
      ("mm, hugetlb: improve page-fault scalability").
      
      Since every hugetlb mapping (even anon and private) is actually a file
      mapping, just use the address_space index key for all mappings.  This
      results in potentially more hash collisions.  However, this should not
      be the common case.
      
      Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
      Fixes: b5cec28d ("hugetlbfs: truncate_hugepages() takes a range of pages")
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: NDavidlohr Bueso <dbueso@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3ccc156
    • K
      mm/hugetlb.c: don't put_page in lock of hugetlb_lock · 0b16b09a
      Kai Shen 提交于
      commit 2bf753e64b4a702e27ce26ff520c59563c62f96b upstream.
      
      spinlock recursion happened when do LTP test:
      #!/bin/bash
      ./runltp -p -f hugetlb &
      ./runltp -p -f hugetlb &
      ./runltp -p -f hugetlb &
      ./runltp -p -f hugetlb &
      ./runltp -p -f hugetlb &
      
      The dtor returned by get_compound_page_dtor in __put_compound_page may be
      the function of free_huge_page which will lock the hugetlb_lock, so don't
      put_page in lock of hugetlb_lock.
      
       BUG: spinlock recursion on CPU#0, hugemmap05/1079
        lock: hugetlb_lock+0x0/0x18, .magic: dead4ead, .owner: hugemmap05/1079, .owner_cpu: 0
       Call trace:
        dump_backtrace+0x0/0x198
        show_stack+0x24/0x30
        dump_stack+0xa4/0xcc
        spin_dump+0x84/0xa8
        do_raw_spin_lock+0xd0/0x108
        _raw_spin_lock+0x20/0x30
        free_huge_page+0x9c/0x260
        __put_compound_page+0x44/0x50
        __put_page+0x2c/0x60
        alloc_surplus_huge_page.constprop.19+0xf0/0x140
        hugetlb_acct_memory+0x104/0x378
        hugetlb_reserve_pages+0xe0/0x250
        hugetlbfs_file_mmap+0xc0/0x140
        mmap_region+0x3e8/0x5b0
        do_mmap+0x280/0x460
        vm_mmap_pgoff+0xf4/0x128
        ksys_mmap_pgoff+0xb4/0x258
        __arm64_sys_mmap+0x34/0x48
        el0_svc_common+0x78/0x130
        el0_svc_handler+0x38/0x78
        el0_svc+0x8/0xc
      
      Link: http://lkml.kernel.org/r/b8ade452-2d6b-0372-32c2-703644032b47@huawei.com
      Fixes: 9980d744 ("mm, hugetlb: get rid of surplus page accounting tricks")
      Signed-off-by: NKai Shen <shenkai8@huawei.com>
      Signed-off-by: NFeilong Lin <linfeilong@huawei.com>
      Reported-by: NWang Wang <wangwang2@huawei.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b16b09a
    • D
      mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses · 58db3813
      Dan Williams 提交于
      commit fce86ff5802bac3a7b19db171aa1949ef9caac31 upstream.
      
      Starting with c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page
      protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls
      pmdp_set_access_flags().  That helper enforces a pmd aligned @address
      argument via VM_BUG_ON() assertion.
      
      Update the implementation to take a 'struct vm_fault' argument directly
      and apply the address alignment fixup internally to fix crash signatures
      like:
      
          kernel BUG at arch/x86/mm/pgtable.c:515!
          invalid opcode: 0000 [#1] SMP NOPTI
          CPU: 51 PID: 43713 Comm: java Tainted: G           OE     4.19.35 #1
          [..]
          RIP: 0010:pmdp_set_access_flags+0x48/0x50
          [..]
          Call Trace:
           vmf_insert_pfn_pmd+0x198/0x350
           dax_iomap_fault+0xe82/0x1190
           ext4_dax_huge_fault+0x103/0x1f0
           ? __switch_to_asm+0x40/0x70
           __handle_mm_fault+0x3f6/0x1370
           ? __switch_to_asm+0x34/0x70
           ? __switch_to_asm+0x40/0x70
           handle_mm_fault+0xda/0x200
           __do_page_fault+0x249/0x4f0
           do_page_fault+0x32/0x110
           ? page_fault+0x8/0x30
           page_fault+0x1e/0x30
      
      Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NPiotr Balcer <piotr.balcer@intel.com>
      Tested-by: NYan Ma <yan.ma@intel.com>
      Tested-by: NPankaj Gupta <pagupta@redhat.com>
      Reviewed-by: NMatthew Wilcox <willy@infradead.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Chandan Rajendra <chandan@linux.ibm.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58db3813
    • J
      mm/mincore.c: make mincore() more conservative · f580a54b
      Jiri Kosina 提交于
      commit 134fca9063ad4851de767d1768180e5dede9a881 upstream.
      
      The semantics of what mincore() considers to be resident is not
      completely clear, but Linux has always (since 2.3.52, which is when
      mincore() was initially done) treated it as "page is available in page
      cache".
      
      That's potentially a problem, as that [in]directly exposes
      meta-information about pagecache / memory mapping state even about
      memory not strictly belonging to the process executing the syscall,
      opening possibilities for sidechannel attacks.
      
      Change the semantics of mincore() so that it only reveals pagecache
      information for non-anonymous mappings that belog to files that the
      calling process could (if it tried to) successfully open for writing;
      otherwise we'd be including shared non-exclusive mappings, which
      
       - is the sidechannel
      
       - is not the usecase for mincore(), as that's primarily used for data,
         not (shared) text
      
      [jkosina@suse.cz: v2]
        Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
      [mhocko@suse.com: restructure can_do_mincore() conditions]
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pmSigned-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJosh Snyder <joshs@netflix.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Originally-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Originally-by: NDominique Martinet <asmadeus@codewreck.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Kevin Easton <kevin@guarana.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Cyril Hrubis <chrubis@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Daniel Gruss <daniel@gruss.cc>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f580a54b
    • O
      crypto: ccree - handle tee fips error during power management resume · 681f3695
      Ofir Drang 提交于
      commit 7138377ce10455b7183c6dde4b2c51b33f464c45 upstream.
      
      in order to support cryptocell tee fips error that may occurs while
      cryptocell ree is suspended, an cc_tee_handle_fips_error  call added
      to the cc_pm_resume function.
      Signed-off-by: NOfir Drang <ofir.drang@arm.com>
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      681f3695
    • O
      crypto: ccree - add function to handle cryptocell tee fips error · 4fb3d87e
      Ofir Drang 提交于
      commit 897ab2316910a66bb048f1c9cefa25e6a592dcd7 upstream.
      
      Adds function that checks if cryptocell tee fips error occurred
      and in such case triggers system error through kernel panic.
      Change fips function to use this new routine.
      Signed-off-by: NOfir Drang <ofir.drang@arm.com>
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fb3d87e
    • O
      crypto: ccree - HOST_POWER_DOWN_EN should be the last CC access during suspend · 65f5c14a
      Ofir Drang 提交于
      commit 3499efbeed39d114873267683b9e776bcb34b058 upstream.
      
      During power management suspend the driver need to prepare the device
      for the power down operation and as a last indication write to the
      HOST_POWER_DOWN_EN register which signals to the hardware that
      The ccree is ready for power down.
      Signed-off-by: NOfir Drang <ofir.drang@arm.com>
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65f5c14a
    • O
      crypto: ccree - pm resume first enable the source clk · 1a4fc3d2
      Ofir Drang 提交于
      commit 7766dd774d80463cec7b81d90c8672af91de2da1 upstream.
      
      On power management resume function first enable the device clk source
      to allow access to the device registers.
      Signed-off-by: NOfir Drang <ofir.drang@arm.com>
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a4fc3d2
    • G
      crypto: ccree - don't map AEAD key and IV on stack · 120ab825
      Gilad Ben-Yossef 提交于
      commit e8662a6a5f8f7f2cadc0edb934aef622d96ac3ee upstream.
      
      The AEAD authenc key and IVs might be passed to us on stack. Copy it to
      a slab buffer before mapping to gurantee proper DMA mapping.
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      120ab825