1. 29 8月, 2019 40 次提交
    • D
      rxrpc: Fix local refcounting · 6d471741
      David Howells 提交于
      [ Upstream commit 68553f1a6f746bf860bce3eb42d78c26a717d9c0 ]
      
      Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
      on an unbound socket on which rx->local is not yet set.
      
      The following reproduced (includes omitted):
      
      	int main(void)
      	{
      		socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
      		return 0;
      	}
      
      causes the following oops to occur:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000010
      	...
      	RIP: 0010:rxrpc_unuse_local+0x8/0x1b
      	...
      	Call Trace:
      	 rxrpc_release+0x2b5/0x338
      	 __sock_release+0x37/0xa1
      	 sock_close+0x14/0x17
      	 __fput+0x115/0x1e9
      	 task_work_run+0x72/0x98
      	 do_exit+0x51b/0xa7a
      	 ? __context_tracking_exit+0x4e/0x10e
      	 do_group_exit+0xab/0xab
      	 __x64_sys_exit_group+0x14/0x17
      	 do_syscall_64+0x89/0x1d4
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
      Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6d471741
    • D
      rxrpc: Fix local endpoint replacement · ce3f9e19
      David Howells 提交于
      [ Upstream commit b00df840fb4004b7087940ac5f68801562d0d2de ]
      
      When a local endpoint (struct rxrpc_local) ceases to be in use by any
      AF_RXRPC sockets, it starts the process of being destroyed, but this
      doesn't cause it to be removed from the namespace endpoint list immediately
      as tearing it down isn't trivial and can't be done in softirq context, so
      it gets deferred.
      
      If a new socket comes along that wants to bind to the same endpoint, a new
      rxrpc_local object will be allocated and rxrpc_lookup_local() will use
      list_replace() to substitute the new one for the old.
      
      Then, when the dying object gets to rxrpc_local_destroyer(), it is removed
      unconditionally from whatever list it is on by calling list_del_init().
      
      However, list_replace() doesn't reset the pointers in the replaced
      list_head and so the list_del_init() will likely corrupt the local
      endpoints list.
      
      Fix this by using list_replace_init() instead.
      
      Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting")
      Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ce3f9e19
    • D
      rxrpc: Fix read-after-free in rxrpc_queue_local() · a05354cb
      David Howells 提交于
      commit 06d9532fa6b34f12a6d75711162d47c17c1add72 upstream.
      
      rxrpc_queue_local() attempts to queue the local endpoint it is given and
      then, if successful, prints a trace line.  The trace line includes the
      current usage count - but we're not allowed to look at the local endpoint
      at this point as we passed our ref on it to the workqueue.
      
      Fix this by reading the usage count before queuing the work item.
      
      Also fix the reading of local->debug_id for trace lines, which must be done
      with the same consideration as reading the usage count.
      
      Fixes: 09d2bf59 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting")
      Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a05354cb
    • D
      rxrpc: Fix local endpoint refcounting · f28023c4
      David Howells 提交于
      commit 730c5fd42c1e3652a065448fd235cb9fafb2bd10 upstream.
      
      The object lifetime management on the rxrpc_local struct is broken in that
      the rxrpc_local_processor() function is expected to clean up and remove an
      object - but it may get requeued by packets coming in on the backing UDP
      socket once it starts running.
      
      This may result in the assertion in rxrpc_local_rcu() firing because the
      memory has been scheduled for RCU destruction whilst still queued:
      
      	rxrpc: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at net/rxrpc/local_object.c:468!
      
      Note that if the processor comes around before the RCU free function, it
      will just do nothing because ->dead is true.
      
      Fix this by adding a separate refcount to count active users of the
      endpoint that causes the endpoint to be destroyed when it reaches 0.
      
      The original refcount can then be used to refcount objects through the work
      processor and cause the memory to be rcu freed when that reaches 0.
      
      Fixes: 4f95dd78 ("rxrpc: Rework local endpoint management")
      Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f28023c4
    • A
      powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB · 32df8a30
      Alastair D'Silva 提交于
      The upstream commit:
      22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
      has a similar effect, but since it is a rewrite of the assembler to C, is
      too invasive for stable. This patch is a minimal fix to address the issue in
      assembler.
      
      This patch applies cleanly to v5.2, v4.19 & v4.14.
      
      When calling flush_(inval_)dcache_range with a size >4GB, we were masking
      off the upper 32 bits, so we would incorrectly flush a range smaller
      than intended.
      
      This patch replaces the 32 bit shifts with 64 bit ones, so that
      the full size is accounted for.
      Signed-off-by: NAlastair D'Silva <alastair@d-silva.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32df8a30
    • D
      dm zoned: fix potential NULL dereference in dmz_do_reclaim() · 0d5e34c1
      Dan Carpenter 提交于
      [ Upstream commit e0702d90b79d430b0ccc276ead4f88440bb51352 ]
      
      This function is supposed to return error pointers so it matches the
      dmz_get_rnd_zone_for_reclaim() function.  The current code could lead to
      a NULL dereference in dmz_do_reclaim()
      
      Fixes: b234c6d7a703 ("dm zoned: improve error handling in reclaim")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0d5e34c1
    • D
      xfs: always rejoin held resources during defer roll · 655bb2c4
      Darrick J. Wong 提交于
      commit 710d707d2fa9cf4c2aa9def129e71e99513466ea upstream.
      
      During testing of xfs/141 on a V4 filesystem, I observed some
      inconsistent behavior with regards to resources that are held (i.e.
      remain locked) across a defer roll.  The transaction roll always gives
      the defer roll function a new transaction, even if committing the old
      transaction fails.  However, the defer roll function only rejoins the
      held resources if the transaction commit succeedied.  This means that
      callers of defer roll have to figure out whether the held resources are
      attached to the transaction being passed back.
      
      Worse yet, if the defer roll was part of a defer finish call, we have a
      third possibility: the defer finish could pass back a dirty transaction
      with dirty held resources and an error code.
      
      The only sane way to handle all of these scenarios is to require that
      the code that held the resource either cancel the transaction before
      unlocking and releasing the resources, or use functions that detach
      resources from a transaction properly (e.g.  xfs_trans_brelse) if they
      need to drop the reference before committing or cancelling the
      transaction.
      
      In order to make this so, change the defer roll code to join held
      resources to the new transaction unconditionally and fix all the bhold
      callers to release the held buffers correctly.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      [mcgrof: fixes kz#204223 ]
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      655bb2c4
    • A
      xfs: Add attibute remove and helper functions · 83a8e6b2
      Allison Henderson 提交于
      commit 068f985a9e5ec70fde58d8f679994fdbbd093a36 upstream.
      
      This patch adds xfs_attr_remove_args. These sub-routines remove
      the attributes specified in @args. We will use this later for setting
      parent pointers as a deferred attribute operation.
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      83a8e6b2
    • A
      xfs: Add attibute set and helper functions · b21ff6cf
      Allison Henderson 提交于
      commit 2f3cd8091963810d85e6a5dd6ed1247e10e9e6f2 upstream.
      
      This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
      These sub-routines set the attributes specified in @args.
      We will use this later for setting parent pointers as a deferred
      attribute operation.
      
      [dgc: remove attr fork init code from xfs_attr_set_args().]
      [dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
      [dgc: correct sf add error handling.]
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b21ff6cf
    • A
      xfs: Add helper function xfs_attr_try_sf_addname · b3a248f2
      Allison Henderson 提交于
      commit 4c74a56b9de76bb6b581274b76b52535ad77c2a7 upstream.
      
      This patch adds a subroutine xfs_attr_try_sf_addname
      used by xfs_attr_set.  This subrotine will attempt to
      add the attribute name specified in args in shortform,
      as well and perform error handling previously done in
      xfs_attr_set.
      
      This patch helps to pre-simplify xfs_attr_set for reviewing
      purposes and reduce indentation.  New function will be added
      in the next patch.
      
      [dgc: moved commit to helper function, too.]
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b3a248f2
    • A
      xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h · a9912f34
      Allison Henderson 提交于
      commit e2421f0b5ff3ce279573036f5cfcb0ce28b422a9 upstream.
      
      This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
      since xfs_attr.c is in libxfs.  We will need these later in
      xfsprogs.
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a9912f34
    • B
      xfs: don't trip over uninitialized buffer on extent read of corrupted inode · 17c2b7af
      Brian Foster 提交于
      commit 6958d11f77d45db80f7e22a21a74d4d5f44dc667 upstream.
      
      We've had rather rare reports of bmap btree block corruption where
      the bmap root block has a level count of zero. The root cause of the
      corruption is so far unknown. We do have verifier checks to detect
      this form of on-disk corruption, but this doesn't cover a memory
      corruption variant of the problem. The latter is a reasonable
      possibility because the root block is part of the inode fork and can
      reside in-core for some time before inode extents are read.
      
      If this occurs, it leads to a system crash such as the following:
      
       BUG: unable to handle kernel paging request at ffffffff00000221
       PF error: [normal kernel read fault]
       ...
       RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs]
       ...
       Call Trace:
        xfs_iread_extents+0x379/0x540 [xfs]
        xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs]
        ? xfs_attr_get+0xd1/0x120 [xfs]
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        xfs_file_iomap_begin+0x4c4/0x6d0 [xfs]
        ? __vfs_getxattr+0x53/0x70
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        iomap_apply+0x63/0x130
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        iomap_file_buffered_write+0x62/0x90
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs]
        __vfs_write+0x150/0x1b0
        vfs_write+0xba/0x1c0
        ksys_pwrite64+0x64/0xa0
        do_syscall_64+0x5a/0x1d0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The crash occurs because xfs_iread_extents() attempts to release an
      uninitialized buffer pointer as the level == 0 value prevented the
      buffer from ever being allocated or read. Change the level > 0
      assert to an explicit error check in xfs_iread_extents() to avoid
      crashing the kernel in the event of localized, in-core inode
      corruption.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      17c2b7af
    • D
      xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT · 11f85d4d
      Darrick J. Wong 提交于
      commit 1fb254aa983bf190cfd685d40c64a480a9bafaee upstream.
      
      Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
      fails on account of being out of disk quota.  I ran his reproducer
      script:
      
      # adduser dummy
      # adduser dummy plugdev
      
      # dd if=/dev/zero bs=1M count=100 of=test.img
      # mkfs.xfs test.img
      # mount -t xfs -o gquota test.img /mnt
      # mkdir -p /mnt/dummy
      # chown -c dummy /mnt/dummy
      # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
      
      (and then as user dummy)
      
      $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
      $ chgrp plugdev /mnt/dummy/foo
      
      and saw:
      
      ================================================
      WARNING: lock held when returning to user space!
      5.3.0-rc5 #rc5 Tainted: G        W
      ------------------------------------------------
      chgrp/47006 is leaving the kernel with locks still held!
      1 lock held by chgrp/47006:
       #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
      
      ...which is clearly caused by xfs_setattr_nonsize failing to unlock the
      ILOCK after the xfs_qm_vop_chown_reserve call fails.  Add the missing
      unlock.
      
      Reported-by: benjamin.moody@gmail.com
      Fixes: 253f4911 ("xfs: better xfs_trans_alloc interface")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NSalvatore Bonaccorso <carnil@debian.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11f85d4d
    • H
      mm/zsmalloc.c: fix race condition in zs_destroy_pool · ed11e600
      Henry Burns 提交于
      commit 701d678599d0c1623aaf4139c03eea260a75b027 upstream.
      
      In zs_destroy_pool() we call flush_work(&pool->free_work).  However, we
      have no guarantee that migration isn't happening in the background at
      that time.
      
      Since migration can't directly free pages, it relies on free_work being
      scheduled to free the pages.  But there's nothing preventing an
      in-progress migrate from queuing the work *after*
      zs_unregister_migration() has called flush_work().  Which would mean
      pages still pointing at the inode when we free it.
      
      Since we know at destroy time all objects should be free, no new
      migrations can come in (since zs_page_isolate() fails for fully-free
      zspages).  This means it is sufficient to track a "# isolated zspages"
      count by class, and have the destroy logic ensure all such pages have
      drained before proceeding.  Keeping that state under the class spinlock
      keeps the logic straightforward.
      
      In this case a memory leak could lead to an eventual crash if compaction
      hits the leaked page.  This crash would only occur if people are
      changing their zswap backend at runtime (which eventually starts
      destruction).
      
      Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
      Fixes: 48b4800a ("zsmalloc: page migration support")
      Signed-off-by: NHenry Burns <henryburns@google.com>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Henry Burns <henrywolfeburns@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Jonathan Adams <jwadams@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed11e600
    • H
      mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely · b30a2f60
      Henry Burns 提交于
      commit 1a87aa03597efa9641e92875b883c94c7f872ccb upstream.
      
      In zs_page_migrate() we call putback_zspage() after we have finished
      migrating all pages in this zspage.  However, the return value is
      ignored.  If a zs_free() races in between zs_page_isolate() and
      zs_page_migrate(), freeing the last object in the zspage,
      putback_zspage() will leave the page in ZS_EMPTY for potentially an
      unbounded amount of time.
      
      To fix this, we need to do the same thing as zs_page_putback() does:
      schedule free_work to occur.
      
      To avoid duplicated code, move the sequence to a new
      putback_zspage_deferred() function which both zs_page_migrate() and
      zs_page_putback() call.
      
      Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
      Fixes: 48b4800a ("zsmalloc: page migration support")
      Signed-off-by: NHenry Burns <henryburns@google.com>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Henry Burns <henrywolfeburns@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Jonathan Adams <jwadams@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b30a2f60
    • V
      mm, page_owner: handle THP splits correctly · db67ac03
      Vlastimil Babka 提交于
      commit f7da677bc6e72033f0981b9d58b5c5d409fa641e upstream.
      
      THP splitting path is missing the split_page_owner() call that
      split_page() has.
      
      As a result, split THP pages are wrongly reported in the page_owner file
      as order-9 pages.  Furthermore when the former head page is freed, the
      remaining former tail pages are not listed in the page_owner file at
      all.  This patch fixes that by adding the split_page_owner() call into
      __split_huge_page().
      
      Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
      Fixes: a9627bc5 ("mm/page_owner: introduce split_page_owner and replace manual handling")
      Reported-by: NKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db67ac03
    • M
      genirq: Properly pair kobject_del() with kobject_add() · 42731def
      Michael Kelley 提交于
      commit d0ff14fdc987303aeeb7de6f1bd72c3749ae2a9b upstream.
      
      If alloc_descs() fails before irq_sysfs_init() has run, free_desc() in the
      cleanup path will call kobject_del() even though the kobject has not been
      added with kobject_add().
      
      Fix this by making the call to kobject_del() conditional on whether
      irq_sysfs_init() has run.
      
      This problem surfaced because commit aa30f47cf666 ("kobject: Add support
      for default attribute groups to kobj_type") makes kobject_del() stricter
      about pairing with kobject_add(). If the pairing is incorrrect, a WARNING
      and backtrace occur in sysfs_remove_group() because there is no parent.
      
      [ tglx: Add a comment to the code and make it work with CONFIG_SYSFS=n ]
      
      Fixes: ecb3f394 ("genirq: Expose interrupt information through sysfs")
      Signed-off-by: NMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1564703564-4116-1-git-send-email-mikelley@microsoft.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42731def
    • D
      dm zoned: properly handle backing device failure · c14fe4e8
      Dmitry Fomichev 提交于
      commit 75d66ffb48efb30f2dd42f041ba8b39c5b2bd115 upstream.
      
      dm-zoned is observed to lock up or livelock in case of hardware
      failure or some misconfiguration of the backing zoned device.
      
      This patch adds a new dm-zoned target function that checks the status of
      the backing device. If the request queue of the backing device is found
      to be in dying state or the SCSI backing device enters offline state,
      the health check code sets a dm-zoned target flag prompting all further
      incoming I/O to be rejected. In order to detect backing device failures
      timely, this new function is called in the request mapping path, at the
      beginning of every reclaim run and before performing any metadata I/O.
      
      The proper way out of this situation is to do
      
      dmsetup remove <dm-zoned target>
      
      and recreate the target when the problem with the backing device
      is resolved.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
      Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c14fe4e8
    • D
      dm zoned: improve error handling in i/o map code · 4530f2f1
      Dmitry Fomichev 提交于
      commit d7428c50118e739e672656c28d2b26b09375d4e0 upstream.
      
      Some errors are ignored in the I/O path during queueing chunks
      for processing by chunk works. Since at least these errors are
      transient in nature, it should be possible to retry the failed
      incoming commands.
      
      The fix -
      
      Errors that can happen while queueing chunks are carried upwards
      to the main mapping function and it now returns DM_MAPIO_REQUEUE
      for any incoming requests that can not be properly queued.
      
      Error logging/debug messages are added where needed.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
      Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4530f2f1
    • D
      dm zoned: improve error handling in reclaim · 8b7c17bb
      Dmitry Fomichev 提交于
      commit b234c6d7a703661b5045c5bf569b7c99d2edbf88 upstream.
      
      There are several places in reclaim code where errors are not
      propagated to the main function, dmz_reclaim(). This function
      is responsible for unlocking zones that might be still locked
      at the end of any failed reclaim iterations. As the result,
      some device zones may be left permanently locked for reclaim,
      degrading target's capability to reclaim zones.
      
      This patch fixes these issues as follows -
      
      Make sure that dmz_reclaim_buf(), dmz_reclaim_seq_data() and
      dmz_reclaim_rnd_data() return error codes to the caller.
      
      dmz_reclaim() function is renamed to dmz_do_reclaim() to avoid
      clashing with "struct dmz_reclaim" and is modified to return the
      error to the caller.
      
      dmz_get_zone_for_reclaim() now returns an error instead of NULL
      pointer and reclaim code checks for that error.
      
      Error logging/debug messages are added where necessary.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
      Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b7c17bb
    • M
      dm table: fix invalid memory accesses with too high sector number · ded8e524
      Mikulas Patocka 提交于
      commit 1cfd5d3399e87167b7f9157ef99daa0e959f395d upstream.
      
      If the sector number is too high, dm_table_find_target() should return a
      pointer to a zeroed dm_target structure (the caller should test it with
      dm_target_is_valid).
      
      However, for some table sizes, the code in dm_table_find_target() that
      performs btree lookup will access out of bound memory structures.
      
      Fix this bug by testing the sector number at the beginning of
      dm_table_find_target(). Also, add an "inline" keyword to the function
      dm_table_get_size() because this is a hot path.
      
      Fixes: 512875bd ("dm: table detect io beyond device")
      Cc: stable@vger.kernel.org
      Reported-by: NZhang Tao <kontais@zoho.com>
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ded8e524
    • Z
      dm space map metadata: fix missing store of apply_bops() return value · 53e73d10
      ZhangXiaoxu 提交于
      commit ae148243d3f0816b37477106c05a2ec7d5f32614 upstream.
      
      In commit 6096d91a ("dm space map metadata: fix occasional leak
      of a metadata block on resize"), we refactor the commit logic to a new
      function 'apply_bops'.  But when that logic was replaced in out() the
      return value was not stored.  This may lead out() returning a wrong
      value to the caller.
      
      Fixes: 6096d91a ("dm space map metadata: fix occasional leak of a metadata block on resize")
      Cc: stable@vger.kernel.org
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53e73d10
    • W
      dm raid: add missing cleanup in raid_ctr() · 2cff6c87
      Wenwen Wang 提交于
      commit dc1a3e8e0cc6b2293b48c044710e63395aeb4fb4 upstream.
      
      If rs_prepare_reshape() fails, no cleanup is executed, leading to
      leak of the raid_set structure allocated at the beginning of
      raid_ctr(). To fix this issue, go to the label 'bad' if the error
      occurs.
      
      Fixes: 11e47232 ("dm raid: stop keeping raid set frozen altogether")
      Cc: stable@vger.kernel.org
      Signed-off-by: NWenwen Wang <wenwen@cs.uga.edu>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cff6c87
    • M
      dm integrity: fix a crash due to BUG_ON in __journal_read_write() · 795b0572
      Mikulas Patocka 提交于
      commit 5729b6e5a1bcb0bbc28abe82d749c7392f66d2c7 upstream.
      
      Fix a crash that was introduced by the commit 724376a0. The crash is
      reported here: https://gitlab.com/cryptsetup/cryptsetup/issues/468
      
      When reading from the integrity device, the function
      dm_integrity_map_continue calls find_journal_node to find out if the
      location to read is present in the journal. Then, it calculates how many
      sectors are consecutively stored in the journal. Then, it locks the range
      with add_new_range and wait_and_add_new_range.
      
      The problem is that during wait_and_add_new_range, we hold no locks (we
      don't hold ic->endio_wait.lock and we don't hold a range lock), so the
      journal may change arbitrarily while wait_and_add_new_range sleeps.
      
      The code then goes to __journal_read_write and hits
      BUG_ON(journal_entry_get_sector(je) != logical_sector); because the
      journal has changed.
      
      In order to fix this bug, we need to re-check the journal location after
      wait_and_add_new_range. We restrict the length to one block in order to
      not complicate the code too much.
      
      Fixes: 724376a0 ("dm integrity: implement fair range locks")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      795b0572
    • Z
      dm btree: fix order of block initialization in btree_split_beneath · 8114012d
      ZhangXiaoxu 提交于
      commit e4f9d6013820d1eba1432d51dd1c5795759aa77f upstream.
      
      When btree_split_beneath() splits a node to two new children, it will
      allocate two blocks: left and right.  If right block's allocation
      failed, the left block will be unlocked and marked dirty.  If this
      happened, the left block'ss content is zero, because it wasn't
      initialized with the btree struct before the attempot to allocate the
      right block.  Upon return, when flushing the left block to disk, the
      validator will fail when check this block.  Then a BUG_ON is raised.
      
      Fix this by completely initializing the left block before allocating and
      initializing the right block.
      
      Fixes: 4dcb8b57 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
      Cc: stable@vger.kernel.org
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8114012d
    • D
      dm kcopyd: always complete failed jobs · e0fb8135
      Dmitry Fomichev 提交于
      commit d1fef41465f0e8cae0693fb184caa6bfafb6cd16 upstream.
      
      This patch fixes a problem in dm-kcopyd that may leave jobs in
      complete queue indefinitely in the event of backing storage failure.
      
      This behavior has been observed while running 100% write file fio
      workload against an XFS volume created on top of a dm-zoned target
      device. If the underlying storage of dm-zoned goes to offline state
      under I/O, kcopyd sometimes never issues the end copy callback and
      dm-zoned reclaim work hangs indefinitely waiting for that completion.
      
      This behavior was traced down to the error handling code in
      process_jobs() function that places the failed job to complete_jobs
      queue, but doesn't wake up the job handler. In case of backing device
      failure, all outstanding jobs may end up going to complete_jobs queue
      via this code path and then stay there forever because there are no
      more successful I/O jobs to wake up the job handler.
      
      This patch adds a wake() call to always wake up kcopyd job wait queue
      for all I/O jobs that fail before dm_io() gets called for that job.
      
      The patch also sets the write error status in all sub jobs that are
      failed because their master job has failed.
      
      Fixes: b73c67c2 ("dm kcopyd: add sequential write feature")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
      Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0fb8135
    • J
      x86/boot: Fix boot regression caused by bootparam sanitizing · f7d157f3
      John Hubbard 提交于
      commit 7846f58fba964af7cb8cf77d4d13c33254725211 upstream.
      
      commit a90118c445cc ("x86/boot: Save fields explicitly, zero out everything
      else") had two errors:
      
          * It preserved boot_params.acpi_rsdp_addr, and
          * It failed to preserve boot_params.hdr
      
      Therefore, zero out acpi_rsdp_addr, and preserve hdr.
      
      Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else")
      Reported-by: NNeil MacLeod <neil@nmacleod.com>
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NNeil MacLeod <neil@nmacleod.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190821192513.20126-1-jhubbard@nvidia.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7d157f3
    • J
      x86/boot: Save fields explicitly, zero out everything else · d9556011
      John Hubbard 提交于
      commit a90118c445cc7f07781de26a9684d4ec58bfcfd1 upstream.
      
      Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
      memset, if the memset goes accross several fields of a struct. This
      generated a couple of warnings on x86_64 builds in sanitize_boot_params().
      
      Fix this by explicitly saving the fields in struct boot_params
      that are intended to be preserved, and zeroing all the rest.
      
      [ tglx: Tagged for stable as it breaks the warning free build there as well ]
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Suggested-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9556011
    • T
      x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h · e063b03b
      Tom Lendacky 提交于
      commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 upstream.
      
      There have been reports of RDRAND issues after resuming from suspend on
      some AMD family 15h and family 16h systems. This issue stems from a BIOS
      not performing the proper steps during resume to ensure RDRAND continues
      to function properly.
      
      RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
      reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
      support using CPUID, including the kernel, will believe that RDRAND is
      not supported.
      
      Update the CPU initialization to clear the RDRAND CPUID bit for any family
      15h and 16h processor that supports RDRAND. If it is known that the family
      15h or family 16h system does not have an RDRAND resume issue or that the
      system will not be placed in suspend, the "rdrand=force" kernel parameter
      can be used to stop the clearing of the RDRAND CPUID bit.
      
      Additionally, update the suspend and resume path to save and restore the
      MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
      place after resuming from suspend.
      
      Note, that clearing the RDRAND CPUID bit does not prevent a processor
      that normally supports the RDRAND instruction from executing it. So any
      code that determined the support based on family and model won't #UD.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chen Yu <yu.c.chen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
      Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "x86@kernel.org" <x86@kernel.org>
      Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e063b03b
    • T
      x86/apic: Handle missing global clockevent gracefully · 685e598e
      Thomas Gleixner 提交于
      commit f897e60a12f0b9146357780d317879bce2a877dc upstream.
      
      Some newer machines do not advertise legacy timers. The kernel can handle
      that situation if the TSC and the CPU frequency are enumerated by CPUID or
      MSRs and the CPU supports TSC deadline timer. If the CPU does not support
      TSC deadline timer the local APIC timer frequency has to be known as well.
      
      Some Ryzens machines do not advertize legacy timers, but there is no
      reliable way to determine the bus frequency which feeds the local APIC
      timer when the machine allows overclocking of that frequency.
      
      As there is no legacy timer the local APIC timer calibration crashes due to
      a NULL pointer dereference when accessing the not installed global clock
      event device.
      
      Switch the calibration loop to a non interrupt based one, which polls
      either TSC (if frequency is known) or jiffies. The latter requires a global
      clockevent. As the machines which do not have a global clockevent installed
      have a known TSC frequency this is a non issue. For older machines where
      TSC frequency is not known, there is no known case where the legacy timers
      do not exist as that would have been reported long ago.
      Reported-by: NDaniel Drake <drake@endlessm.com>
      Reported-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NDaniel Drake <drake@endlessm.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
      Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      685e598e
    • S
      x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 · f9747104
      Sean Christopherson 提交于
      commit b63f20a778c88b6a04458ed6ffc69da953d3a109 upstream.
      
      Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
      avoid clobbering flags.
      
      KVM's emulator makes indirect calls into a jump table of sorts, where
      the destination of the CALL_NOSPEC is a small blob of code that performs
      fast emulation by executing the target instruction with fixed operands.
      
        adcb_al_dl:
           0x000339f8 <+0>:   adc    %dl,%al
           0x000339fa <+2>:   ret
      
      A major motiviation for doing fast emulation is to leverage the CPU to
      handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
      both an input and output to the target of CALL_NOSPEC.  Clobbering flags
      results in all sorts of incorrect emulation, e.g. Jcc instructions often
      take the wrong path.  Sans the nops...
      
        asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
           0x0003595a <+58>:  mov    0xc0(%ebx),%eax
           0x00035960 <+64>:  mov    0x60(%ebx),%edx
           0x00035963 <+67>:  mov    0x90(%ebx),%ecx
           0x00035969 <+73>:  push   %edi
           0x0003596a <+74>:  popf
           0x0003596b <+75>:  call   *%esi
           0x000359a0 <+128>: pushf
           0x000359a1 <+129>: pop    %edi
           0x000359a2 <+130>: mov    %eax,0xc0(%ebx)
           0x000359b1 <+145>: mov    %edx,0x60(%ebx)
      
        ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
           0x000359a8 <+136>: mov    -0x10(%ebp),%eax
           0x000359ab <+139>: and    $0x8d5,%edi
           0x000359b4 <+148>: and    $0xfffff72a,%eax
           0x000359b9 <+153>: or     %eax,%edi
           0x000359bd <+157>: mov    %edi,0x4(%ebx)
      
      For the most part this has gone unnoticed as emulation of guest code
      that can trigger fast emulation is effectively limited to MMIO when
      running on modern hardware, and MMIO is rarely, if ever, accessed by
      instructions that affect or consume flags.
      
      Breakage is almost instantaneous when running with unrestricted guest
      disabled, in which case KVM must emulate all instructions when the guest
      has invalid state, e.g. when the guest is in Big Real Mode during early
      BIOS.
      
      Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
      Fixes: 1a29b5b7 ("KVM: x86: Make indirect calls in emulator speculation safe")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9747104
    • O
      userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx · cf13e30c
      Oleg Nesterov 提交于
      commit 46d0b24c5ee10a15dfb25e20642f5a5ed59c5003 upstream.
      
      userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
      mm->core_state != NULL.
      
      Otherwise a page fault can see userfaultfd_missing() == T and use an
      already freed userfaultfd_ctx.
      
      Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
      Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Tested-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf13e30c
    • D
      Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE · a6f236e1
      Dexuan Cui 提交于
      commit a9fc4340aee041dd186d1fb8f1b5d1e9caf28212 upstream.
      
      In the case of X86_PAE, unsigned long is u32, but the physical address type
      should be u64. Due to the bug here, the netvsc driver can not load
      successfully, and sometimes the VM can panic due to memory corruption (the
      hypervisor writes data to the wrong location).
      
      Fixes: 6ba34171 ("Drivers: hv: vmbus: Remove use of slow_virt_to_phys()")
      Cc: stable@vger.kernel.org
      Cc: Michael Kelley <mikelley@microsoft.com>
      Reported-and-tested-by: NJuliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Reviewed-by: NMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6f236e1
    • B
      gpiolib: never report open-drain/source lines as 'input' to user-space · 3783c7ee
      Bartosz Golaszewski 提交于
      commit 2c60e6b5c9241b24b8b523fefd3e44fb85622cda upstream.
      
      If the driver doesn't support open-drain/source config options, we
      emulate this behavior when setting the direction by calling
      gpiod_direction_input() if the default value is 0 (open-source) or
      1 (open-drain), thus not actively driving the line in those cases.
      
      This however clears the FLAG_IS_OUT bit for the GPIO line descriptor
      and makes the LINEINFO ioctl() incorrectly report this line's mode as
      'input' to user-space.
      
      This commit modifies the ioctl() to always set the GPIOLINE_FLAG_IS_OUT
      bit in the lineinfo structure's flags field. Since it's impossible to
      use the input mode and open-drain/source options at the same time, we
      can be sure the reported information will be correct.
      
      Fixes: 521a2ad6 ("gpio: add userspace ABI for GPIO line information")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Link: https://lore.kernel.org/r/20190806114151.17652-1-brgl@bgdev.plSigned-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3783c7ee
    • L
      drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX · f88c31b4
      Lyude Paul 提交于
      commit c358ebf59634f06d8ed176da651ec150df3c8686 upstream.
      
      While I had thought I had fixed this issue in:
      
      commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after
      ->fini()")
      
      It turns out that while I did fix the error messages I was seeing on my
      P50 when trying to access i2c busses with the GPU in runtime suspend, I
      accidentally had missed one important detail that was mentioned on the
      bug report this commit was supposed to fix: that the CPU would only lock
      up when trying to access i2c busses _on connected devices_ _while the
      GPU is not in runtime suspend_. Whoops. That definitely explains why I
      was not able to get my machine to hang with i2c bus interactions until
      now, as plugging my P50 into it's dock with an HDMI monitor connected
      allowed me to finally reproduce this locally.
      
      Now that I have managed to reproduce this issue properly, it looks like
      the problem is much simpler then it looks. It turns out that some
      connected devices, such as MST laptop docks, will actually ACK i2c reads
      even if no data was actually read:
      
      [  275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
      [  275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
      [  275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
      [  275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
      [  275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
      [  275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
      
      Because we don't handle the situation of i2c ack without any data, we
      end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
      value of cnt always remains at 0. This finally properly explains how
      this could result in a CPU hang like the ones observed in the
      aforementioned commit.
      
      So, fix this by retrying transactions if no data is written or received,
      and give up and fail the transaction if we continue to not write or
      receive any data after 32 retries.
      Signed-off-by: NLyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f88c31b4
    • I
      libceph: fix PG split vs OSD (re)connect race · 51f6afdd
      Ilya Dryomov 提交于
      commit a561372405cf6bc6f14239b3a9e57bb39f2788b0 upstream.
      
      We can't rely on ->peer_features in calc_target() because it may be
      called both when the OSD session is established and open and when it's
      not.  ->peer_features is not valid unless the OSD session is open.  If
      this happens on a PG split (pg_num increase), that could mean we don't
      resend a request that should have been resent, hanging the client
      indefinitely.
      
      In userspace this was fixed by looking at require_osd_release and
      get_xinfo[osd].features fields of the osdmap.  However these fields
      belong to the OSD section of the osdmap, which the kernel doesn't
      decode (only the client section is decoded).
      
      Instead, let's drop this feature check.  It effectively checks for
      luminous, so only pre-luminous OSDs would be affected in that on a PG
      split the kernel might resend a request that should not have been
      resent.  Duplicates can occur in other scenarios, so both sides should
      already be prepared for them: see dup/replay logic on the OSD side and
      retry_attempt check on the client side.
      
      Cc: stable@vger.kernel.org
      Fixes: 7de030d6 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
      Link: https://tracker.ceph.com/issues/41162Reported-by: NJerry Lee <leisurelysw24@gmail.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Tested-by: NJerry Lee <leisurelysw24@gmail.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51f6afdd
    • J
      ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply · f2951720
      Jeff Layton 提交于
      commit 28a282616f56990547b9dcd5c6fbd2001344664c upstream.
      
      When ceph_mdsc_do_request returns an error, we can't assume that the
      filelock_reply pointer will be set. Only try to fetch fields out of
      the r_reply_info when it returns success.
      
      Cc: stable@vger.kernel.org
      Reported-by: NHector Martin <hector@marcansoft.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2951720
    • E
      ceph: clear page dirty before invalidate page · 7bed2889
      Erqi Chen 提交于
      commit c95f1c5f436badb9bb87e9b30fd573f6b3d59423 upstream.
      
      clear_page_dirty_for_io(page) before mapping->a_ops->invalidatepage().
      invalidatepage() clears page's private flag, if dirty flag is not
      cleared, the page may cause BUG_ON failure in ceph_set_page_dirty().
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/40862Signed-off-by: NErqi Chen <chenerqi@gmail.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bed2889
    • D
      clk: socfpga: stratix10: fix rate caclulationg for cnt_clks · a8f7703f
      Dinh Nguyen 提交于
      commit c7ec75ea4d5316518adc87224e3cff47192579e7 upstream.
      
      Checking bypass_reg is incorrect for calculating the cnt_clk rates.
      Instead we should be checking that there is a proper hardware register
      that holds the clock divider.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NDinh Nguyen <dinguyen@kernel.org>
      Link: https://lkml.kernel.org/r/20190814153014.12962-1-dinguyen@kernel.orgSigned-off-by: NStephen Boyd <sboyd@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8f7703f
    • M
      Revert "dm bufio: fix deadlock with loop device" · b608a5a2
      Mikulas Patocka 提交于
      commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream.
      
      Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper
      fix has been made available with commit d0a255e795ab ("loop: set
      PF_MEMALLOC_NOIO for the worker thread").
      
      Note that the fix offered by commit bd293d071ffe doesn't really prevent
      the deadlock from occuring - if we look at the stacktrace reported by
      Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
      i.e. it has already successfully taken the mutex. Changing the mutex
      from mutex_lock to mutex_trylock won't help with deadlocks that happen
      afterwards.
      
      PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
         #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
         #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
         #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
         #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
         #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
         #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
         #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
         #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
         #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
         #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
        #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
        #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
        #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
        #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
        #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device")
      Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread")
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b608a5a2