提交 · 5a136b4ae327e7f6be9c984a010df8d7ea5a4f83 · openeuler / Kernel

28 6月, 2019 1 次提交

mm/hmm: Fix error flows in hmm_invalidate_range_start · 5a136b4a

由 Jason Gunthorpe 提交于 6月 07, 2019

If the trylock on the hmm->mirrors_sem fails the function will return
without decrementing the notifiers that were previously incremented. Since
the caller will not call invalidate_range_end() on EAGAIN this will result
in notifiers becoming permanently incremented and deadlock.

If the sync_cpu_device_pagetables() required blocking the function will
not return EAGAIN even though the device continues to touch the
pages. This is a violation of the mmu notifier contract.

Switch, and rename, the ranges_lock to a spin lock so we can reliably
obtain it without blocking during error unwind.

The error unwind is necessary since the notifiers count must be held
incremented across the call to sync_cpu_device_pagetables() as we cannot
allow the range to become marked valid by a parallel
invalidate_start/end() pair while doing sync_cpu_device_pagetables().
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

5a136b4a

25 6月, 2019 3 次提交

mm/hmm: Remove confusing comment and logic from hmm_release · 14331726

由 Jason Gunthorpe 提交于 5月 24, 2019

hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use after-free race of the form:

       CPU0                                   CPU1
                                           hmm_release()
                                             up_write(&hmm->mirrors_sem);
 hmm_mirror_unregister(mirror)
  down_write(&hmm->mirrors_sem);
  up_write(&hmm->mirrors_sem);
  kfree(mirror)
                                             mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race drivers are not allowed to
register/unregister mirrors from within a release op.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

14331726

mm/hmm: Poison hmm_range during unregister · 2dcc3eb8

由 Jason Gunthorpe 提交于 5月 23, 2019

Trying to misuse a range outside its lifetime is a kernel bug. Use poison
bytes to help detect this condition. Double unregister will reliably crash.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Acked-by: NSouptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

2dcc3eb8

mm/hmm: Remove racy protection against double-unregistration · 187229c2

由 Jason Gunthorpe 提交于 5月 23, 2019

No other register/unregister kernel API attempts to provide this kind of
protection as it is inherently racy, so just drop it.

Callers should provide their own protection, and it appears nouveau
already does.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

187229c2

18 6月, 2019 5 次提交

mm/hmm: Use lockdep instead of comments · 8a1a0cd0

由 Jason Gunthorpe 提交于 5月 23, 2019

So we can check locking at runtime.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Acked-by: NSouptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

8a1a0cd0

mm/hmm: Hold on to the mmget for the lifetime of the range · 47f24598

由 Jason Gunthorpe 提交于 5月 23, 2019

Range functions like hmm_range_snapshot() and hmm_range_fault() call
find_vma, which requires hodling the mmget() and the mmap_sem for the mm.

Make this simpler for the callers by holding the mmget() inside the range
for the lifetime of the range. Other functions that accept a range should
only be called if the range is registered.

This has the side effect of directly preventing hmm_release() from
happening while a range is registered. That means range->dead cannot be
false during the lifetime of the range, so remove dead and
hmm_mirror_mm_is_alive() entirely.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

47f24598

mm/hmm: Do not use list*_rcu() for hmm->ranges · 157816f3

由 Jason Gunthorpe 提交于 5月 23, 2019

This list is always read and written while holding hmm->lock so there is
no need for the confusing _rcu annotations.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Acked-by: NSouptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NIra Weiny <iweiny@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

157816f3

mm/hmm: Remove duplicate condition test before wait_event_timeout · 378a6040

由 Jason Gunthorpe 提交于 5月 23, 2019

The wait_event_timeout macro already tests the condition as its first
action, so there is no reason to open code another version of this, all
that does is skip the might_sleep() debugging in common cases, which is
not helpful.

Further, based on prior patches, we can now simplify the required condition
test:
 - If range is valid memory then so is range->hmm
 - If hmm_release() has run then range->valid is set to false
   at the same time as dead, so no reason to check both.
 - A valid hmm has a valid hmm->mm.

Allowing the return value of wait_event_timeout() (along with its internal
barriers) to compute the result of the function.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

378a6040

mm/hmm: Simplify hmm_get_or_create and make it reliable · 8a9320b7

由 Jason Gunthorpe 提交于 5月 23, 2019

As coded this function can false-fail in various racy situations. Make it
reliable and simpler by running under the write side of the mmap_sem and
avoiding the false-failing compare/exchange pattern. Due to the mmap_sem
this no longer has to avoid racing with a 2nd parallel
hmm_get_or_create().

Unfortunately this still has to use the page_table_lock as the
non-sleeping lock protecting mm->hmm, since the contexts where we free the
hmm are incompatible with mmap_sem.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

8a9320b7

10 6月, 2019 2 次提交

mm/hmm: Hold a mmgrab from hmm to mm · c8a53b2d

由 Jason Gunthorpe 提交于 5月 23, 2019

So long as a struct hmm pointer exists, so should the struct mm it is
linked too. Hold the mmgrab() as soon as a hmm is created, and mmdrop() it
once the hmm refcount goes to zero.

Since mmdrop() (ie a 0 kref on struct mm) is now impossible with a !NULL
mm->hmm delete the hmm_hmm_destroy().
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

c8a53b2d

mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_register · e36acfe6

由 Jason Gunthorpe 提交于 5月 23, 2019

Ralph observes that hmm_range_register() can only be called by a driver
while a mirror is registered. Make this clear in the API by passing in the
mirror structure as a parameter.

This also simplifies understanding the lifetime model for struct hmm, as
the hmm pointer must be valid as part of a registered mirror so all we
need in hmm_register_range() is a simple kref_get.
Suggested-by: NRalph Campbell <rcampbell@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

e36acfe6

07 6月, 2019 6 次提交

mm/hmm: fix use after free with struct hmm in the mmu notifiers · 6d7c3cde

由 Jason Gunthorpe 提交于 5月 22, 2019

mmu_notifier_unregister_no_release() is not a fence and the mmu_notifier
system will continue to reference hmm->mn until the srcu grace period
expires.

Resulting in use after free races like this:

         CPU0                                     CPU1
                                               __mmu_notifier_invalidate_range_start()
                                                 srcu_read_lock
                                                 hlist_for_each ()
                                                   // mn == hmm->mn
hmm_mirror_unregister()
  hmm_put()
    hmm_free()
      mmu_notifier_unregister_no_release()
         hlist_del_init_rcu(hmm-mn->list)
			                           mn->ops->invalidate_range_start(mn, range);
					             mm_get_hmm()
      mm->hmm = NULL;
      kfree(hmm)
                                                     mutex_lock(&hmm->lock);

Use SRCU to kfree the hmm memory so that the notifiers can rely on hmm
existing. Get the now-safe hmm struct through container_of and directly
check kref_get_unless_zero to lock it against free.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NPhilip Yang <Philip.Yang@amd.com>

6d7c3cde

mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking · 9b1ae605

由 Kuehling, Felix 提交于 5月 10, 2019

Don't set this flag by default in hmm_vma_do_fault. It is set
conditionally just a few lines below. Setting it unconditionally can lead
to handle_mm_fault doing a non-blocking fault, returning -EBUSY and
unlocking mmap_sem unexpectedly.
Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

9b1ae605

mm/hmm: support automatic NUMA balancing · 789c2af8

由 Philip Yang 提交于 5月 23, 2019

While the page is migrating by NUMA balancing, HMM failed to detect this
condition and still return the old page. Application will use the new page
migrated, but driver pass the old page physical address to GPU, this crash
the application later.

Use pte_protnone(pte) to return this condition and then hmm_vma_do_fault
will allocate new page.
Signed-off-by: NPhilip Yang <Philip.Yang@amd.com>
Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

789c2af8

mm/hmm: clean up some coding style and comments · 085ea250

由 Ralph Campbell 提交于 5月 06, 2019

There are no functional changes, just some coding style clean ups and
minor comment changes.

Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

085ea250

mm/hmm: update HMM documentation · 2076e5c0

由 Ralph Campbell 提交于 5月 06, 2019

Update the HMM documentation to reflect the latest API and make a few
minor wording changes.

Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

2076e5c0

mm/hmm.c: suppress compilation warnings when CONFIG_HUGETLB_PAGE is not set · 1c2308f0

由 Jason Gunthorpe 提交于 5月 27, 2019

gcc reports that several variables are defined but not used.

For the first hunk CONFIG_HUGETLB_PAGE the entire if block is already
protected by pud_huge() which is forced to 0.  None of the stuff under the
ifdef causes compilation problems as it is already stubbed out in the
header files.

For the second hunk the dummy huge_page_shift macro doesn't touch the
argument, so just inline the argument.

Link: http://lkml.kernel.org/r/20190522195151.GA23955@ziepe.caSigned-off-by: NJason Gunthorpe <jgg@mellanox.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

1c2308f0

03 6月, 2019 13 次提交

L

Linux 5.2-rc3 · f2c7c76c
由 Linus Torvalds 提交于 6月 02, 2019

f2c7c76c

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7bd1d5ed

由 Linus Torvalds 提交于 6月 02, 2019

Pull x86 fixes from Ingo Molnar:
 "Two fixes: a quirk for KVM guests running on certain AMD CPUs, and a
  KASAN related build fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
  x86/boot: Provide KASAN compatible aliases for string routines

7bd1d5ed

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6751b8d9

由 Linus Torvalds 提交于 6月 02, 2019

Pull perf fixes from Ingo Molnar:
"On the kernel side there's a bunch of ring-buffer ordering fixes for a
reproducible bug, plus a PEBS constraints regression fix.

Plus tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools headers UAPI: Sync kvm.h headers with the kernel sources
perf record: Fix s390 missing module symbol and warning for non-root users
perf machine: Read also the end of the kernel
perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms
perf session: Add missing swap ops for namespace events
perf namespace: Protect reading thread's namespace
tools headers UAPI: Sync drm/drm.h with the kernel
tools headers UAPI: Sync drm/i915_drm.h with the kernel
tools headers UAPI: Sync linux/fs.h with the kernel
tools headers UAPI: Sync linux/sched.h with the kernel
tools arch x86: Sync asm/cpufeatures.h with the with the kernel
tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls
perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel
perf data: Fix 'strncat may truncate' build failure with recent gcc
perf/ring-buffer: Use regular variables for nesting
perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
perf/ring_buffer: Add ordering to rb->nest increment
perf/ring_buffer: Fix exposing a temporarily decreased data_head
perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints

6751b8d9

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af042452

由 Linus Torvalds 提交于 6月 02, 2019

Pull EFI fixes from Ingo Molnar:
 "Two EFI fixes: a quirk for weird systabs, plus add more robust error
  handling in the old 1:1 mapping code"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Allow the number of EFI configuration tables entries to be zero
  efi/x86/Add missing error handling to old_memmap 1:1 mapping code

af042452

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4fb5741c

由 Linus Torvalds 提交于 6月 02, 2019

Pull stacktrace fix from Ingo Molnar:
 "Fix a stack_trace_save_tsk_reliable() regression"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stacktrace: Unbreak stack_trace_save_tsk_reliable()

4fb5741c

Merge tag 'spdx-5.2-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · a68dc618

由 Linus Torvalds 提交于 6月 02, 2019

Pull SPDX fixes from Greg KH:
 "Here are just two small patches, that fix up some found SPDX
  identifier issues.

  The first patch fixes an error in a previous SPDX fixup patch, that
  causes build errors when doing 'make clean' on the tree (the fact that
  almost no one noticed it reflects the fact that kernel developers
  don't like doing that option very often...)

  The second patch fixes up a number of places in the tree where people
  mistyped the string "SPDX-License-Identifier". Given that people can
  not even type their own name all the time without mistakes, this was
  bound to happen, and odds are, we will have to add some type of check
  for this to checkpatch.pl to catch this happening in the future.

  Both of these have passed testing by 0-day"

* tag 'spdx-5.2-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  treewide: fix typos of SPDX-License-Identifier
  crypto: ux500 - fix license comment syntax error

a68dc618

Merge tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 460b48a0

由 Linus Torvalds 提交于 6月 02, 2019

Pull powerpc fixes from Michael Ellerman:
 "A minor fix to our IMC PMU code to print a less confusing error
  message when the driver can't initialise properly.

  A fix for a bug where a user requesting an unsupported branch sampling
  filter can corrupt PMU state, preventing the PMU from counting
  properly.

  And finally a fix for a bug in our support for kexec_file_load(),
  which prevented loading a kernel and initramfs. Most versions of kexec
  don't yet use kexec_file_load().

  Thanks to: Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi
  Bangoria, Thiago Jung Bauermann"

* tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load()
  powerpc/perf: Fix MMCRA corruption by bhrb_filter
  powerpc/powernv: Return for invalid IMC domain

460b48a0

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b44a1dd3

由 Linus Torvalds 提交于 6月 02, 2019

Pull KVM fixes from Paolo Bonzini:
 "Fixes for PPC and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry()
  KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9
  KVM: PPC: Book3S HV: XIVE: Fix page offset when clearing ESB pages
  KVM: PPC: Book3S HV: XIVE: Take the srcu read lock when accessing memslots
  KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts
  KVM: PPC: Book3S HV: XIVE: Introduce a new mutex for the XIVE device
  KVM: PPC: Book3S HV: XIVE: Fix the enforced limit on the vCPU identifier
  KVM: PPC: Book3S HV: XIVE: Do not test the EQ flag validity when resetting
  KVM: PPC: Book3S HV: XIVE: Clear file mapping when device is released
  KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
  KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
  KVM: PPC: Book3S HV: Use new mutex to synchronize MMU setup
  KVM: PPC: Book3S HV: Avoid touching arch.mmu_ready in XIVE release functions
  KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID
  kvm: fix compile on s390 part 2

b44a1dd3

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 38baf0bb

由 Linus Torvalds 提交于 6月 02, 2019

Pull i2c fixes from Wolfram Sang:
 "A memleak fix for the core, two driver bugfixes, as well as fixing
  missing file patterns to MAINTAINERS"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: add I2C DT bindings to ARM platforms
  MAINTAINERS: add DT bindings to i2c drivers
  i2c: synquacer: fix synquacer_i2c_doxfer() return value
  i2c: mlxcpld: Fix wrong initialization order in probe
  i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr

38baf0bb

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal · 378e853f

由 Linus Torvalds 提交于 6月 02, 2019

Pull thermal SoC fix from Eduardo Valentin:
 "A single revert, detected to cause issues on the tsens driver"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  Revert "drivers: thermal: tsens: Add new operation to check if a sensor is enabled"

378e853f

Merge tag 'led-fixes-for-5.2-rc3' of... · f58c356e

由 Linus Torvalds 提交于 6月 02, 2019

Merge tag 'led-fixes-for-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED fix from Jacek Anaszewski:
 "Fix for a recent change in LED core, that didn't take into account the
  possibility of calling led_blink_setup() from atomic context"

* tag 'led-fixes-for-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: avoid flush_work in atomic context

f58c356e

Merge tag 'for-linus-20190601' of git://git.kernel.dk/linux-block · 9221dced

由 Linus Torvalds 提交于 6月 02, 2019

Pull block fixes from Jens Axboe:

 - A set of patches fixing code comments / kerneldoc (Bart)

 - Don't allow loop file change for exclusive open (Jan)

 - Fix revalidate of hidden genhd (Jan)

 - Init queue failure memory free fix (Jes)

 - Improve rq limits failure print (John)

 - Fixup for queue removal/addition (Ming)

 - Missed error progagation for io_uring buffer registration (Pavel)

* tag 'for-linus-20190601' of git://git.kernel.dk/linux-block:
  block: print offending values when cloned rq limits are exceeded
  blk-mq: Document the blk_mq_hw_queue_to_node() arguments
  blk-mq: Fix spelling in a source code comment
  block: Fix bsg_setup_queue() kernel-doc header
  block: Fix rq_qos_wait() kernel-doc header
  block: Fix blk_mq_*_map_queues() kernel-doc headers
  block: Fix throtl_pending_timer_fn() kernel-doc header
  block: Convert blk_invalidate_devt() header into a non-kernel-doc header
  block/partitions/ldm: Convert a kernel-doc header into a non-kernel-doc header
  blk-mq: Fix memory leak in error handling
  block: don't protect generic_make_request_checks with blk_queue_enter
  block: move blk_exit_queue into __blk_release_queue
  block: Don't revalidate bdev of hidden gendisk
  loop: Don't change loop device under exclusive opener
  io_uring: Fix __io_uring_register() false success

9221dced

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 1975b337

由 Linus Torvalds 提交于 6月 02, 2019

Pull SCSI fixes from James Bottomley:
 "Six minor fixes to device drivers and one to the multipath alua
  handler.

  The most extensive fix is the zfcp port remove prevention one, but
  it's impact is only s390"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libsas: delete sas port if expander discover failed
  scsi: libsas: only clear phy->in_shutdown after shutdown event done
  scsi: scsi_dh_alua: Fix possible null-ptr-deref
  scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
  scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
  scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
  scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()

1975b337

02 6月, 2019 10 次提交

Merge branch 'akpm' (patches from Andrew) · 7b3064f0

由 Linus Torvalds 提交于 6月 02, 2019

Merge misc fixes from Andrew Morton:
 "Various fixes and followups"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, compaction: make sure we isolate a valid PFN
  include/linux/generic-radix-tree.h: fix kerneldoc comment
  kernel/signal.c: trace_signal_deliver when signal_group_exit
  drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used
  spdxcheck.py: fix directory structures
  kasan: initialize tag to 0xff in __kasan_kmalloc
  z3fold: fix sheduling while atomic
  scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set
  mm/gup: continue VM_FAULT_RETRY processing even for pre-faults
  ocfs2: fix error path kobject memory leak
  memcg: make it work on sparse non-0-node systems
  mm, memcg: consider subtrees in memory.events
  prctl_set_mm: downgrade mmap_sem to read lock
  prctl_set_mm: refactor checks from validate_prctl_map
  kernel/fork.c: make max_threads symbol static
  arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changes
  arch/parisc/configs/c8000_defconfig: remove obsoleted CONFIG_DEBUG_SLAB_LEAK
  mm/vmalloc.c: fix typo in comment
  lib/sort.c: fix kernel-doc notation warnings
  mm: fix Documentation/vm/hmm.rst Sphinx warnings

7b3064f0

mm, compaction: make sure we isolate a valid PFN · e577c8b6

由 Suzuki K Poulose 提交于 5月 31, 2019

When we have holes in a normal memory zone, we could endup having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled ( via __reset_isolation_suitable(),
triggered by kswapd).

Later if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with, as valid page.  This
could lead to accessing NULL pointer derefernces like below, due to an
invalid mem_section pointer.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
 [0000000000000008] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
 pstate: 60000005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 [...]
 Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
 ---[ end trace af6a35219325a9b6 ]---

The issue was reported on an arm64 server with 128GB with holes in the
zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.

This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fallback to using the lower limit of the scan range upon
failure in fast_isolate_freepages().

Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com
Fixes: 5a811889 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NMel Gorman <mgorman@techsingularity.net>
Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e577c8b6

include/linux/generic-radix-tree.h: fix kerneldoc comment · 590ba22b

由 Jonathan Corbet 提交于 5月 31, 2019

The DOC comment block section in include/linux/generic-radix-tree.h
contained a spurious colon, causing this warning in the documentation
build:

  include/linux/generic-radix-tree.h:1: warning: no structured comments found

Remove the colon and make the docs build happy.

Link: http://lkml.kernel.org/r/20190524141933.74ae9050@lwn.netSigned-off-by: NJonathan Corbet <corbet@lwn.net>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

590ba22b

kernel/signal.c: trace_signal_deliver when signal_group_exit · 98af37d6

由 Zhenliang Wei 提交于 5月 31, 2019

In the fixes commit, removing SIGKILL from each thread signal mask and
executing "goto fatal" directly will skip the call to
"trace_signal_deliver".  At this point, the delivery tracking of the
SIGKILL signal will be inaccurate.

Therefore, we need to add trace_signal_deliver before "goto fatal" after
executing sigdelset.

Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.

Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com
Fixes: cf43a757 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Signed-off-by: NZhenliang Wei <weizhenliang@huawei.com>
Reviewed-by: NChristian Brauner <christian@brauner.io>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ivan Delalande <colona@arista.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

98af37d6

drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used · d3ed71e5

由 Qian Cai 提交于 5月 31, 2019

Commit cf04eee8 ("iommu/vt-d: Include ACPI devices in iommu=pt")
added for_each_active_iommu() in iommu_prepare_static_identity_mapping()
but never used the each element, i.e, "drhd->iommu".

drivers/iommu/intel-iommu.c: In function
'iommu_prepare_static_identity_mapping':
drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
 struct intel_iommu *iommu;

Fixed the warning by appending a compiler attribute __maybe_unused for it.

Link: http://lkml.kernel.org/r/20190523013314.2732-1-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d3ed71e5

spdxcheck.py: fix directory structures · 8d7a7abf

由 Vincenzo Frascino 提交于 5月 31, 2019

The LICENSE directory has recently changed structure and this makes
spdxcheck fails as per below:

FAIL: "Blob or Tree named 'other' not found"
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 240, in <module>
spdx = read_spdxdata(repo)
  File "scripts/spdxcheck.py", line 41, in read_spdxdata
for el in lictree[d].traverse():
[...]
KeyError: "Blob or Tree named 'other' not found"

Fix the script to restore the correctness on checkpatch License checking.

References: 62be257e ("LICENSES: Rename other to deprecated")
References: 8ea8814f ("LICENSES: Clearly mark dual license only licenses")
Link: http://lkml.kernel.org/r/20190523084755.56739-1-vincenzo.frascino@arm.comSigned-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Cline <jcline@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8d7a7abf

kasan: initialize tag to 0xff in __kasan_kmalloc · 0600597c

由 Nathan Chancellor 提交于 5月 31, 2019

When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang
warns:

mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when
used here [-Wuninitialized]
        kasan_unpoison_shadow(set_tag(object, tag), size);
                                              ^~~

set_tag ignores tag in this configuration but clang doesn't realize it at
this point in its pipeline, as it points to arch_kasan_set_tag as being
the point where it is used, which will later be expanded to (void
*)(object) without a use of tag.  Initialize tag to 0xff, as it removes
this warning and doesn't change the meaning of the code.

Link: https://github.com/ClangBuiltLinux/linux/issues/465
Link: http://lkml.kernel.org/r/20190502163057.6603-1-natechancellor@gmail.com
Fixes: 7f94ffbc ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Reviewed-by: NAndrey Konovalov <andreyknvl@google.com>
Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0600597c

z3fold: fix sheduling while atomic · bb9f6f63

由 Vitaly Wool 提交于 5月 31, 2019

kmem_cache_alloc() may be called from z3fold_alloc() in atomic context, so
we need to pass correct gfp flags to avoid "scheduling while atomic" bug.

Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com
Fixes: 7c2b8baa ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: NVitaly Wool <vitaly.vul@sony.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bb9f6f63

scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set · ef7a77c6

由 Fabiano Rosas 提交于 5月 31, 2019

CLK_GET_RATE_NOCACHE depends on CONFIG_COMMON_CLK.  Importing constants.py
when CONFIG_COMMON_CLK is not defined causes:

  (gdb) lx-symbols
  (...)
    File "scripts/gdb/linux/proc.py", line 15, in <module>
      from linux import constants
    File "scripts/gdb/linux/constants.py", line 2, in <module>
      LX_CLK_GET_RATE_NOCACHE = gdb.parse_and_eval("CLK_GET_RATE_NOCACHE")
  gdb.error: No symbol "CLK_GET_RATE_NOCACHE" in current context.

Link: http://lkml.kernel.org/r/20190523195313.24701-1-farosas@linux.ibm.com
Fixes: e7e6f462 ("scripts/gdb: print cached rate in lx-clk-summary")
Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: NStephen Boyd <swboyd@chromium.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef7a77c6

mm/gup: continue VM_FAULT_RETRY processing even for pre-faults · df17277b

由 Mike Rapoport 提交于 5月 31, 2019

When get_user_pages*() is called with pages = NULL, the processing of
VM_FAULT_RETRY terminates early without actually retrying to fault-in all
the pages.

If the pages in the requested range belong to a VMA that has userfaultfd
registered, handle_userfault() returns VM_FAULT_RETRY *after* user space
has populated the page, but for the gup pre-fault case there's no actual
retry and the caller will get no pages although they are present.

This issue was uncovered when running post-copy memory restore in CRIU
after d9c9ce34 ("x86/fpu: Fault-in user stack if
copy_fpstate_to_sigframe() fails").

After this change, the copying of FPU state to the sigframe switched from
copy_to_user() variants which caused a real page fault to get_user_pages()
with pages parameter set to NULL.

In post-copy mode of CRIU, the destination memory is managed with
userfaultfd and lack of the retry for pre-fault case in get_user_pages()
causes a crash of the restored process.

Making the pre-fault behavior of get_user_pages() the same as the "normal"
one fixes the issue.

Link: http://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
Fixes: d9c9ce34 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Tested-by: Andrei Vagin <avagin@gmail.com> [https://travis-ci.org/avagin/linux/builds/533184940]
Tested-by: NHugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df17277b

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功