1. 12 10月, 2019 40 次提交
    • G
      powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt() · f5f31a6e
      Gautham R. Shenoy 提交于
      [ Upstream commit c784be435d5dae28d3b03db31753dd7a18733f0c ]
      
      The calls to arch_add_memory()/arch_remove_memory() are always made
      with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin().
      On pSeries, arch_add_memory()/arch_remove_memory() eventually call
      resize_hpt() which in turn calls stop_machine() which acquires the
      read-side cpu_hotplug_lock again, thereby resulting in the recursive
      acquisition of this lock.
      
      In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
      lockup during a memory hotplug operation because cpus_read_lock() is a
      per-cpu rwsem read, which, in the fast-path (in the absence of the
      writer, which in our case is a CPU-hotplug operation) simply
      increments the read_count on the semaphore. Thus a recursive read in
      the fast-path doesn't cause any problems.
      
      However, we can hit this problem in practice if there is a concurrent
      CPU-Hotplug operation in progress which is waiting to acquire the
      write-side of the lock. This will cause the second recursive read to
      block until the writer finishes. While the writer is blocked since the
      first read holds the lock. Thus both the reader as well as the writers
      fail to make any progress thereby blocking both CPU-Hotplug as well as
      Memory Hotplug operations.
      
      Memory-Hotplug				CPU-Hotplug
      CPU 0					CPU 1
      ------                                  ------
      
      1. down_read(cpu_hotplug_lock.rw_sem)
         [memory_hotplug_begin]
      					2. down_write(cpu_hotplug_lock.rw_sem)
      					[cpu_up/cpu_down]
      3. down_read(cpu_hotplug_lock.rw_sem)
         [stop_machine()]
      
      Lockdep complains as follows in these code-paths.
      
       swapper/0/1 is trying to acquire lock:
       (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60
      
      but task is already holding lock:
      (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(cpu_hotplug_lock.rw_sem);
         lock(cpu_hotplug_lock.rw_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       3 locks held by swapper/0/1:
        #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
        #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
        #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0
      
      stack backtrace:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
       Call Trace:
         dump_stack+0xe8/0x164 (unreliable)
         __lock_acquire+0x1110/0x1c70
         lock_acquire+0x240/0x290
         cpus_read_lock+0x64/0xf0
         stop_machine+0x2c/0x60
         pseries_lpar_resize_hpt+0x19c/0x2c0
         resize_hpt_for_hotplug+0x70/0xd0
         arch_add_memory+0x58/0xfc
         devm_memremap_pages+0x5e8/0x8f0
         pmem_attach_disk+0x764/0x830
         nvdimm_bus_probe+0x118/0x240
         really_probe+0x230/0x4b0
         driver_probe_device+0x16c/0x1e0
         __driver_attach+0x148/0x1b0
         bus_for_each_dev+0x90/0x130
         driver_attach+0x34/0x50
         bus_add_driver+0x1a8/0x360
         driver_register+0x108/0x170
         __nd_driver_register+0xd0/0xf0
         nd_pmem_driver_init+0x34/0x48
         do_one_initcall+0x1e0/0x45c
         kernel_init_freeable+0x540/0x64c
         kernel_init+0x2c/0x160
         ret_from_kernel_thread+0x5c/0x68
      
      Fix this issue by
        1) Requiring all the calls to pseries_lpar_resize_hpt() be made
           with cpu_hotplug_lock held.
      
        2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
           as a consequence of 1)
      
        3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
           with cpu_hotplug_lock held.
      
      Fixes: dbcf929c ("powerpc/pseries: Add support for hash table resizing")
      Cc: stable@vger.kernel.org # v4.11+
      Reported-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.ibm.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      f5f31a6e
    • X
      nbd: fix crash when the blksize is zero · c688982f
      Xiubo Li 提交于
      [ Upstream commit 553768d1169a48c0cd87c4eb4ab57534ee663415 ]
      
      This will allow the blksize to be set zero and then use 1024 as
      default.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NXiubo Li <xiubli@redhat.com>
      [fix to use goto out instead of return in genl_connect]
      Signed-off-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c688982f
    • S
      KVM: nVMX: Fix consistency check on injected exception error code · 63bb8b76
      Sean Christopherson 提交于
      [ Upstream commit 567926cca99ba1750be8aae9c4178796bf9bb90b ]
      
      Current versions of Intel's SDM incorrectly state that "bits 31:15 of
      the VM-Entry exception error-code field" must be zero.  In reality, bits
      31:16 must be zero, i.e. error codes are 16-bit values.
      
      The bogus error code check manifests as an unexpected VM-Entry failure
      due to an invalid code field (error number 7) in L1, e.g. when injecting
      a #GP with error_code=0x9f00.
      
      Nadav previously reported the bug[*], both to KVM and Intel, and fixed
      the associated kvm-unit-test.
      
      [*] https://patchwork.kernel.org/patch/11124749/Reported-by: NNadav Amit <namit@vmware.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      63bb8b76
    • C
      KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP · 34b13ff6
      Cédric Le Goater 提交于
      [ Upstream commit 237aed48c642328ff0ab19b63423634340224a06 ]
      
      When a vCPU is brought done, the XIVE VP (Virtual Processor) is first
      disabled and then the event notification queues are freed. When freeing
      the queues, we check for possible escalation interrupts and free them
      also.
      
      But when a XIVE VP is disabled, the underlying XIVE ENDs also are
      disabled in OPAL. When an END (Event Notification Descriptor) is
      disabled, its ESB pages (ESn and ESe) are disabled and loads return all
      1s. Which means that any access on the ESB page of the escalation
      interrupt will return invalid values.
      
      When an interrupt is freed, the shutdown handler computes a 'saved_p'
      field from the value returned by a load in xive_do_source_set_mask().
      This value is incorrect for escalation interrupts for the reason
      described above.
      
      This has no impact on Linux/KVM today because we don't make use of it
      but we will introduce in future changes a xive_get_irqchip_state()
      handler. This handler will use the 'saved_p' field to return the state
      of an interrupt and 'saved_p' being incorrect, softlockup will occur.
      
      Fix the vCPU cleanup sequence by first freeing the escalation interrupts
      if any, then disable the XIVE VP and last free the queues.
      
      Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.orgSigned-off-by: NSasha Levin <sashal@kernel.org>
      34b13ff6
    • H
      drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed · 1b155b4f
      Hans de Goede 提交于
      [ Upstream commit 9dbc88d013b79c62bd845cb9e7c0256e660967c5 ]
      
      Bail from the pci_driver probe function instead of from the drm_driver
      load function.
      
      This avoid /dev/dri/card0 temporarily getting registered and then
      unregistered again, sending unwanted add / remove udev events to
      userspace.
      
      Specifically this avoids triggering the (userspace) bug fixed by this
      plymouth merge-request:
      https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59
      
      Note that despite that being an userspace bug, not sending unnecessary
      udev events is a good idea in general.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490Reviewed-by: NMichel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1b155b4f
    • N
      nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs · 04e0c84f
      Navid Emamdoost 提交于
      [ Upstream commit 8ce39eb5a67aee25d9f05b40b673c95b23502e3e ]
      
      In nfp_flower_spawn_vnic_reprs in the loop if initialization or the
      allocations fail memory is leaked. Appropriate releases are added.
      
      Fixes: b9452452 ("nfp: flower: add per repr private data for LAG offload")
      Signed-off-by: NNavid Emamdoost <navid.emamdoost@gmail.com>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      04e0c84f
    • A
      perf unwind: Fix libunwind build failure on i386 systems · 575a5bb3
      Arnaldo Carvalho de Melo 提交于
      [ Upstream commit 26acf400d2dcc72c7e713e1f55db47ad92010cc2 ]
      
      Naresh Kamboju reported, that on the i386 build pr_err()
      doesn't get defined properly due to header ordering:
      
        perf-in.o: In function `libunwind__x86_reg_id':
        tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109:
        undefined reference to `pr_err'
      Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      575a5bb3
    • V
      kernel/elfcore.c: include proper prototypes · b0aaf65b
      Valdis Kletnieks 提交于
      [ Upstream commit 0f74914071ab7e7b78731ed62bf350e3a344e0a5 ]
      
      When building with W=1, gcc properly complains that there's no prototypes:
      
        CC      kernel/elfcore.o
      kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes]
          7 | Elf_Half __weak elf_core_extra_phdrs(void)
            |                 ^~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes]
         12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes]
         17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes]
         22 | size_t __weak elf_core_extra_data_size(void)
            |               ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Provide the include file so gcc is happy, and we don't have potential code drift
      
      Link: http://lkml.kernel.org/r/29875.1565224705@turing-policeSigned-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b0aaf65b
    • T
      perf build: Add detection of java-11-openjdk-devel package · bab46480
      Thomas Richter 提交于
      [ Upstream commit 815c1560bf8fd522b8d93a1d727868b910c1cc24 ]
      
      With Java 11 there is no seperate JRE anymore.
      
      Details:
      
        https://coderanch.com/t/701603/java/JRE-JDK
      
      Therefore the detection of the JRE needs to be adapted.
      
      This change works for s390 and x86.  I have not tested other platforms.
      
      Committer testing:
      
      Continues to work with the OpenJDK 8:
      
        $ rm -f ~acme/lib64/libperf-jvmti.so
        $ rpm -qa | grep jdk-devel
        java-1.8.0-openjdk-devel-1.8.0.222.b10-0.fc30.x86_64
        $ git log --oneline -1
        a51937170f33 (HEAD -> perf/core) perf build: Add detection of java-11-openjdk-devel package
        $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install > /dev/null 2>1
        $ ls -la ~acme/lib64/libperf-jvmti.so
        -rwxr-xr-x. 1 acme acme 230744 Sep 24 16:46 /home/acme/lib64/libperf-jvmti.so
        $
      Suggested-by: NAndreas Krebbel <krebbel@linux.ibm.com>
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20190909114116.50469-4-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      bab46480
    • K
      sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() · 46ff0e2f
      KeMeng Shi 提交于
      [ Upstream commit 714e501e16cd473538b609b3e351b2cc9f7f09ed ]
      
      An oops can be triggered in the scheduler when running qemu on arm64:
      
       Unable to handle kernel paging request at virtual address ffff000008effe40
       Internal error: Oops: 96000007 [#1] SMP
       Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
       pstate: 20000085 (nzCv daIf -PAN -UAO)
       pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
       lr : move_queued_task.isra.21+0x124/0x298
       ...
       Call trace:
        __ll_sc___cmpxchg_case_acq_4+0x4/0x20
        __migrate_task+0xc8/0xe0
        migration_cpu_stop+0x170/0x180
        cpu_stopper_thread+0xec/0x178
        smpboot_thread_fn+0x1ac/0x1e8
        kthread+0x134/0x138
        ret_from_fork+0x10/0x18
      
      __set_cpus_allowed_ptr() will choose an active dest_cpu in affinity mask to
      migrage the process if process is not currently running on any one of the
      CPUs specified in affinity mask. __set_cpus_allowed_ptr() will choose an
      invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
      CPUS in an affinity mask are deactived by cpu_down after cpumask_intersects
      check. cpumask_test_cpu() of dest_cpu afterwards is overflown and may pass if
      corresponding bit is coincidentally set. As a consequence, kernel will
      access an invalid rq address associate with the invalid CPU in
      migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
      
      The reproduce the crash:
      
        1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
        sched_setaffinity.
      
        2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
        and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
      
        3) Oops appears if the invalid CPU is set in memory after tested cpumask.
      Signed-off-by: NKeMeng Shi <shikemeng@huawei.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NValentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      46ff0e2f
    • M
      sched/membarrier: Fix private expedited registration check · 6cb7aa1b
      Mathieu Desnoyers 提交于
      [ Upstream commit fc0d77387cb5ae883fd774fc559e056a8dde024c ]
      
      Fix a logic flaw in the way membarrier_register_private_expedited()
      handles ready state checks for private expedited sync core and private
      expedited registrations.
      
      If a private expedited membarrier registration is first performed, and
      then a private expedited sync_core registration is performed, the ready
      state check will skip the second registration when it really should not.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6cb7aa1b
    • M
      sched/membarrier: Call sync_core only before usermode for same mm · e250f2b6
      Mathieu Desnoyers 提交于
      [ Upstream commit 2840cf02fae627860156737e83326df354ee4ec6 ]
      
      When the prev and next task's mm change, switch_mm() provides the core
      serializing guarantees before returning to usermode. The only case
      where an explicit core serialization is needed is when the scheduler
      keeps the same mm for prev and next.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e250f2b6
    • N
      libnvdimm/nfit_test: Fix acpi_handle redefinition · 9f33b178
      Nathan Chancellor 提交于
      [ Upstream commit 59f08896f058a92f03a0041b397a1a227c5e8529 ]
      
      After commit 62974fc389b3 ("libnvdimm: Enable unit test infrastructure
      compile checks"), clang warns:
      
      In file included from
      ../drivers/nvdimm/../../tools/testing/nvdimm/test/iomap.c:15:
      ../drivers/nvdimm/../../tools/testing/nvdimm/test/nfit_test.h:206:15:
      warning: redefinition of typedef 'acpi_handle' is a C11 feature
      [-Wtypedef-redefinition]
      typedef void *acpi_handle;
                    ^
      ../include/acpi/actypes.h:424:15: note: previous definition is here
      typedef void *acpi_handle;      /* Actually a ptr to a NS Node */
                    ^
      1 warning generated.
      
      The include chain:
      
      iomap.c ->
          linux/acpi.h ->
              acpi/acpi.h ->
                  acpi/actypes.h
          nfit_test.h
      
      Avoid this by including linux/acpi.h in nfit_test.h, which allows us to
      remove both the typedef and the forward declaration of acpi_object.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/660Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/20190918042148.77553-1-natechancellor@gmail.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9f33b178
    • Z
      fuse: fix memleak in cuse_channel_open · 7b4f541f
      zhengbin 提交于
      [ Upstream commit 9ad09b1976c562061636ff1e01bfc3a57aebe56b ]
      
      If cuse_send_init fails, need to fuse_conn_put cc->fc.
      
      cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
                       ->fuse_dev_alloc->fuse_conn_get
                       ->fuse_dev_free->fuse_conn_put
      
      Fixes: cc080e9e ("fuse: introduce per-instance fuse_dev structure")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7b4f541f
    • A
      libnvdimm/region: Initialize bad block for volatile namespaces · 2e93d24a
      Aneesh Kumar K.V 提交于
      [ Upstream commit c42adf87e4e7ed77f6ffe288dc90f980d07d68df ]
      
      We do check for a bad block during namespace init and that use
      region bad block list. We need to initialize the bad block
      for volatile regions for this to work. We also observe a lockdep
      warning as below because the lock is not initialized correctly
      since we skip bad block init for volatile regions.
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2e93d24a
    • S
      thermal_hwmon: Sanitize thermal_zone type · 9025adf3
      Stefan Mavrodiev 提交于
      [ Upstream commit 8c7aa184281c01fc26f319059efb94725012921d ]
      
      When calling thermal_add_hwmon_sysfs(), the device type is sanitized by
      replacing '-' with '_'. However tz->type remains unsanitized. Thus
      calling thermal_hwmon_lookup_by_type() returns no device. And if there is
      no device, thermal_remove_hwmon_sysfs() fails with "hwmon device lookup
      failed!".
      
      The result is unregisted hwmon devices in the sysfs.
      
      Fixes: 409ef0ba ("thermal_hwmon: Sanitize attribute name passed to hwmon")
      Signed-off-by: NStefan Mavrodiev <stefan@olimex.com>
      Signed-off-by: NZhang Rui <rui.zhang@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9025adf3
    • I
      thermal: Fix use-after-free when unregistering thermal zone device · c01a9dbe
      Ido Schimmel 提交于
      [ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ]
      
      thermal_zone_device_unregister() cancels the delayed work that polls the
      thermal zone, but it does not wait for it to finish. This is racy with
      respect to the freeing of the thermal zone device, which can result in a
      use-after-free [1].
      
      Fix this by waiting for the delayed work to finish before freeing the
      thermal zone device. Note that thermal_zone_device_set_polling() is
      never invoked from an atomic context, so it is safe to call
      cancel_delayed_work_sync() that can block.
      
      [1]
      [  +0.002221] ==================================================================
      [  +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
      [  +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17
      
      [  +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
      [  +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
      [  +0.000012] Call Trace:
      [  +0.000021]  dump_stack+0xa9/0x10e
      [  +0.000020]  print_address_description.cold.2+0x9/0x25e
      [  +0.000018]  __kasan_report.cold.3+0x78/0x9d
      [  +0.000016]  kasan_report+0xe/0x20
      [  +0.000016]  __mutex_lock+0x1076/0x11c0
      [  +0.000014]  step_wise_throttle+0x72/0x150
      [  +0.000018]  handle_thermal_trip+0x167/0x760
      [  +0.000019]  thermal_zone_device_update+0x19e/0x5f0
      [  +0.000019]  process_one_work+0x969/0x16f0
      [  +0.000017]  worker_thread+0x91/0xc40
      [  +0.000014]  kthread+0x33d/0x400
      [  +0.000015]  ret_from_fork+0x3a/0x50
      
      [  +0.000020] Allocated by task 1:
      [  +0.000015]  save_stack+0x19/0x80
      [  +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
      [  +0.000014]  kmem_cache_alloc_trace+0x152/0x320
      [  +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
      [  +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
      [  +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
      [  +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
      [  +0.000013]  mlxsw_pci_probe+0x355/0x4b0
      [  +0.000014]  local_pci_probe+0xc3/0x150
      [  +0.000013]  pci_device_probe+0x280/0x410
      [  +0.000013]  really_probe+0x26a/0xbb0
      [  +0.000013]  driver_probe_device+0x208/0x2e0
      [  +0.000013]  device_driver_attach+0xfe/0x140
      [  +0.000013]  __driver_attach+0x110/0x310
      [  +0.000013]  bus_for_each_dev+0x14b/0x1d0
      [  +0.000013]  driver_register+0x1c0/0x400
      [  +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
      [  +0.000014]  do_one_initcall+0x239/0x4dd
      [  +0.000013]  kernel_init_freeable+0x42b/0x4e8
      [  +0.000012]  kernel_init+0x11/0x18b
      [  +0.000013]  ret_from_fork+0x3a/0x50
      
      [  +0.000015] Freed by task 581:
      [  +0.000013]  save_stack+0x19/0x80
      [  +0.000014]  __kasan_slab_free+0x125/0x170
      [  +0.000013]  kfree+0xf3/0x310
      [  +0.000013]  thermal_release+0xc7/0xf0
      [  +0.000014]  device_release+0x77/0x200
      [  +0.000014]  kobject_put+0x1a8/0x4c0
      [  +0.000014]  device_unregister+0x38/0xc0
      [  +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
      [  +0.000014]  mlxsw_thermal_fini+0x184/0x35a
      [  +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
      [  +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
      [  +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
      [  +0.000014]  genl_family_rcv_msg+0x700/0xee0
      [  +0.000013]  genl_rcv_msg+0xca/0x170
      [  +0.000013]  netlink_rcv_skb+0x137/0x3a0
      [  +0.000012]  genl_rcv+0x29/0x40
      [  +0.000013]  netlink_unicast+0x49b/0x660
      [  +0.000013]  netlink_sendmsg+0x755/0xc90
      [  +0.000013]  __sys_sendto+0x3de/0x430
      [  +0.000013]  __x64_sys_sendto+0xe2/0x1b0
      [  +0.000013]  do_syscall_64+0xa4/0x4d0
      [  +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [  +0.000017] The buggy address belongs to the object at ffff8881e48e0008
                     which belongs to the cache kmalloc-2k of size 2048
      [  +0.000012] The buggy address is located 1096 bytes inside of
                     2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
      [  +0.000007] The buggy address belongs to the page:
      [  +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
      [  +0.000020] flags: 0x200000000010200(slab|head)
      [  +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
      [  +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
      [  +0.000007] page dumped because: kasan: bad access detected
      
      [  +0.000012] Memory state around the buggy address:
      [  +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000008]                                                  ^
      [  +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  +0.000007] ==================================================================
      
      Fixes: b1569e99 ("ACPI: move thermal trip handling to generic thermal layer")
      Reported-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NZhang Rui <rui.zhang@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c01a9dbe
    • S
      ntb: point to right memory window index · 55ebeb4e
      Sanjay R Mehta 提交于
      [ Upstream commit ae89339b08f3fe02457ec9edd512ddc3d246d0f8 ]
      
      second parameter of ntb_peer_mw_get_addr is pointing to wrong memory
      window index by passing "peer gidx" instead of "local gidx".
      
      For ex, "local gidx" value is '0' and "peer gidx" value is '1', then
      
      on peer side ntb_mw_set_trans() api is used as below with gidx pointing to
      local side gidx which is '0', so memroy window '0' is chosen and XLAT '0'
      will be programmed by peer side.
      
          ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat,
                          peer->inbuf_size);
      
      Now, on local side ntb_peer_mw_get_addr() is been used as below with gidx
      pointing to "peer gidx" which is '1', so pointing to memory window '1'
      instead of memory window '0'.
      
          ntb_peer_mw_get_addr(perf->ntb,  peer->gidx, &phys_addr,
                              &peer->outbuf_size);
      
      So this patch pass "local gidx" as parameter to ntb_peer_mw_get_addr().
      Signed-off-by: NSanjay R Mehta <sanju.mehta@amd.com>
      Signed-off-by: NJon Mason <jdmason@kudzu.us>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      55ebeb4e
    • A
      x86/purgatory: Disable the stackleak GCC plugin for the purgatory · 9dabade5
      Arvind Sankar 提交于
      [ Upstream commit ca14c996afe7228ff9b480cf225211cc17212688 ]
      
      Since commit:
      
        b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      
      kexec breaks if GCC_PLUGIN_STACKLEAK=y is enabled, as the purgatory
      contains undefined references to stackleak_track_stack.
      
      Attempting to load a kexec kernel results in this failure:
      
        kexec: Undefined symbol: stackleak_track_stack
        kexec-bzImage64: Loading purgatory failed
      
      Fix this by disabling the stackleak plugin for the purgatory.
      Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
      Link: https://lkml.kernel.org/r/20190923171753.GA2252517@rani.riverdale.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9dabade5
    • F
      pwm: stm32-lp: Add check in case requested period cannot be achieved · 65348659
      Fabrice Gasnier 提交于
      [ Upstream commit c91e3234c6035baf5a79763cb4fcd5d23ce75c2b ]
      
      LPTimer can use a 32KHz clock for counting. It depends on clock tree
      configuration. In such a case, PWM output frequency range is limited.
      Although unlikely, nothing prevents user from requesting a PWM frequency
      above counting clock (32KHz for instance):
      - This causes (prd - 1) = 0xffff to be written in ARR register later in
      the apply() routine.
      This results in badly configured PWM period (and also duty_cycle).
      Add a check to report an error is such a case.
      Signed-off-by: NFabrice Gasnier <fabrice.gasnier@st.com>
      Reviewed-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      65348659
    • T
      pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors · 19b1c70e
      Trond Myklebust 提交于
      [ Upstream commit 9c47b18cf722184f32148784189fca945a7d0561 ]
      
      IF the server rejected our layout return with a state error such as
      NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want
      to clear out all the remaining layout segments and mark that stateid
      as invalid.
      
      Fixes: 1c5bd76d ("pNFS: Enable layoutreturn operation for...")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      19b1c70e
    • T
      drm/amdgpu: Check for valid number of registers to read · 1c70ae6a
      Trek 提交于
      [ Upstream commit 73d8e6c7b841d9bf298c8928f228fb433676635c ]
      
      Do not try to allocate any amount of memory requested by the user.
      Instead limit it to 128 registers. Actually the longest series of
      consecutive allowed registers are 48, mmGB_TILE_MODE0-31 and
      mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
      
      Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273Signed-off-by: NTrek <trek00@inbox.ru>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1c70ae6a
    • F
      drm/amdgpu: Fix KFD-related kernel oops on Hawaii · e0af3b19
      Felix Kuehling 提交于
      [ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ]
      
      Hawaii needs to flush caches explicitly, submitting an IB in a user
      VMID from kernel mode. There is no s_fence in this case.
      
      Fixes: eb3961a5 ("drm/amdgpu: remove fence context from the job")
      Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: NChristian König <christian.koenig@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e0af3b19
    • F
      netfilter: nf_tables: allow lookups in dynamic sets · f7ace7f2
      Florian Westphal 提交于
      [ Upstream commit acab713177377d9e0889c46bac7ff0cfb9a90c4d ]
      
      This un-breaks lookups in sets that have the 'dynamic' flag set.
      Given this active example configuration:
      
      table filter {
        set set1 {
          type ipv4_addr
          size 64
          flags dynamic,timeout
          timeout 1m
        }
      
        chain input {
           type filter hook input priority 0; policy accept;
        }
      }
      
      ... this works:
      nft add rule ip filter input add @set1 { ip saddr }
      
      -> whenever rule is triggered, the source ip address is inserted
      into the set (if it did not exist).
      
      This won't work:
      nft add rule ip filter input ip saddr @set1 counter
      Error: Could not process rule: Operation not supported
      
      In other words, we can add entries to the set, but then can't make
      matching decision based on that set.
      
      That is just wrong -- all set backends support lookups (else they would
      not be very useful).
      The failure comes from an explicit rejection in nft_lookup.c.
      
      Looking at the history, it seems like NFT_SET_EVAL used to mean
      'set contains expressions' (aka. "is a meter"), for instance something like
      
       nft add rule ip filter input meter example { ip saddr limit rate 10/second }
       or
       nft add rule ip filter input meter example { ip saddr counter }
      
      The actual meaning of NFT_SET_EVAL however, is
      'set can be updated from the packet path'.
      
      'meters' and packet-path insertions into sets, such as
      'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
      and thus require a set backend that provides the ->update() function.
      
      The only set that provides this also is the only one that has the
      NFT_SET_EVAL feature flag.
      
      Removing the wrong check makes the above example work.
      While at it, also fix the flag check during set instantiation to
      allow supported combinations only.
      
      Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f7ace7f2
    • R
      watchdog: aspeed: Add support for AST2600 · f217883b
      Ryan Chen 提交于
      [ Upstream commit b3528b4874480818e38e4da019d655413c233e6a ]
      
      The ast2600 can be supported by the same code as the ast2500.
      Signed-off-by: NRyan Chen <ryan_chen@aspeedtech.com>
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20190819051738.17370-3-joel@jms.id.auSigned-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NWim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f217883b
    • E
      ceph: reconnect connection if session hang in opening state · 520c2a64
      Erqi Chen 提交于
      [ Upstream commit 71a228bc8d65900179e37ac309e678f8c523f133 ]
      
      If client mds session is evicted in CEPH_MDS_SESSION_OPENING state,
      mds won't send session msg to client, and delayed_work skip
      CEPH_MDS_SESSION_OPENING state session, the session hang forever.
      
      Allow ceph_con_keepalive to reconnect a session in OPENING to avoid
      session hang. Also, ensure that we skip sessions in RESTARTING and
      REJECTED states since those states can't be resurrected by issuing
      a keepalive.
      
      Link: https://tracker.ceph.com/issues/41551
      Signed-off-by: Erqi Chen chenerqi@gmail.com
      Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      520c2a64
    • L
      ceph: fix directories inode i_blkbits initialization · 0275113f
      Luis Henriques 提交于
      [ Upstream commit 750670341a24cb714e624e0fd7da30900ad93752 ]
      
      When filling an inode with info from the MDS, i_blkbits is being
      initialized using fl_stripe_unit, which contains the stripe unit in
      bytes.  Unfortunately, this doesn't make sense for directories as they
      have fl_stripe_unit set to '0'.  This means that i_blkbits will be set
      to 0xff, causing an UBSAN undefined behaviour in i_blocksize():
      
        UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12
        shift exponent 255 is too large for 32-bit type 'int'
      
      Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit
      is zero.
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0275113f
    • I
      xen/pci: reserve MCFG areas earlier · 2bc2a90a
      Igor Druzhinin 提交于
      [ Upstream commit a4098bc6eed5e31e0391bcc068e61804c98138df ]
      
      If MCFG area is not reserved in E820, Xen by default will defer its usage
      until Dom0 registers it explicitly after ACPI parser recognizes it as
      a reserved resource in DSDT. Having it reserved in E820 is not
      mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
      and firmware is free to keep a hole in E820 in that place. Xen doesn't know
      what exactly is inside this hole since it lacks full ACPI view of the
      platform therefore it's potentially harmful to access MCFG region
      without additional checks as some machines are known to provide
      inconsistent information on the size of the region.
      
      Now xen_mcfg_late() runs after acpi_init() which is too late as some basic
      PCI enumeration starts exactly there as well. Trying to register a device
      prior to MCFG reservation causes multiple problems with PCIe extended
      capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are
      no convenient hooks for us to subscribe to so register MCFG areas earlier
      upon the first invocation of xen_add_device(). It should be safe to do once
      since all the boot time buses must have their MCFG areas in MCFG table
      already and we don't support PCI bus hot-plug.
      Signed-off-by: NIgor Druzhinin <igor.druzhinin@citrix.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2bc2a90a
    • C
      9p: avoid attaching writeback_fid on mmap with type PRIVATE · 18dd2b05
      Chengguang Xu 提交于
      [ Upstream commit c87a37ebd40b889178664c2c09cc187334146292 ]
      
      Currently on mmap cache policy, we always attach writeback_fid
      whether mmap type is SHARED or PRIVATE. However, in the use case
      of kata-container which combines 9p(Guest OS) with overlayfs(Host OS),
      this behavior will trigger overlayfs' copy-up when excute command
      inside container.
      
      Link: http://lkml.kernel.org/r/20190820100325.10313-1-cgxu519@zoho.com.cnSigned-off-by: NChengguang Xu <cgxu519@zoho.com.cn>
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      18dd2b05
    • L
      9p: Transport error uninitialized · 07f3596c
      Lu Shuaibing 提交于
      [ Upstream commit 0ce772fe79b68f83df40f07f28207b292785c677 ]
      
      The p9_tag_alloc() does not initialize the transport error t_err field.
      The struct p9_req_t *req is allocated and stored in a struct p9_client
      variable. The field t_err is never initialized before p9_conn_cancel()
      checks its value.
      
      KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
      reports this bug.
      
      ==================================================================
      BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0
      Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216
      
      CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      Workqueue: events p9_write_work
      Call Trace:
       dump_stack+0x75/0xae
       __kumsan_report+0x17c/0x3e6
       kumsan_report+0xe/0x20
       p9_conn_cancel+0x2d9/0x3b0
       p9_write_work+0x183/0x4a0
       process_one_work+0x4d1/0x8c0
       worker_thread+0x6e/0x780
       kthread+0x1ca/0x1f0
       ret_from_fork+0x35/0x40
      
      Allocated by task 1979:
       save_stack+0x19/0x80
       __kumsan_kmalloc.constprop.3+0xbc/0x120
       kmem_cache_alloc+0xa7/0x170
       p9_client_prepare_req.part.9+0x3b/0x380
       p9_client_rpc+0x15e/0x880
       p9_client_create+0x3d0/0xac0
       v9fs_session_init+0x192/0xc80
       v9fs_mount+0x67/0x470
       legacy_get_tree+0x70/0xd0
       vfs_get_tree+0x4a/0x1c0
       do_mount+0xba9/0xf90
       ksys_mount+0xa8/0x120
       __x64_sys_mount+0x62/0x70
       do_syscall_64+0x6d/0x1e0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 0:
      (stack is not available)
      
      The buggy address belongs to the object at ffff88805f9b6008
       which belongs to the cache p9_req_t of size 144
      The buggy address is located 4 bytes inside of
       144-byte region [ffff88805f9b6008, ffff88805f9b6098)
      The buggy address belongs to the page:
      page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0
      flags: 0x100000000010200(slab|head)
      raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740
      raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000
      page dumped because: kumsan: bad access detected
      ==================================================================
      
      Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.comSigned-off-by: NLu Shuaibing <shuaibinglu@126.com>
      [dominique.martinet@cea.fr: grouped the added init with the others]
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      07f3596c
    • J
      fs: nfs: Fix possible null-pointer dereferences in encode_attrs() · 448deb13
      Jia-Ju Bai 提交于
      [ Upstream commit e2751463eaa6f9fec8fea80abbdc62dbc487b3c5 ]
      
      In encode_attrs(), there is an if statement on line 1145 to check
      whether label is NULL:
          if (label && (attrmask[2] & FATTR4_WORD2_SECURITY_LABEL))
      
      When label is NULL, it is used on lines 1178-1181:
          *p++ = cpu_to_be32(label->lfs);
          *p++ = cpu_to_be32(label->pi);
          *p++ = cpu_to_be32(label->len);
          p = xdr_encode_opaque_fixed(p, label->label, label->len);
      
      To fix these bugs, label is checked before being used.
      
      These bugs are found by a static analysis tool STCheck written by us.
      Signed-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      448deb13
    • S
      ima: fix freeing ongoing ahash_request · 4753e7a8
      Sascha Hauer 提交于
      [ Upstream commit 4ece3125f21b1d42b84896c5646dbf0e878464e1 ]
      
      integrity_kernel_read() can fail in which case we forward to call
      ahash_request_free() on a currently running request. We have to wait
      for its completion before we can free the request.
      
      This was observed by interrupting a "find / -type f -xdev -print0 | xargs -0
      cat 1>/dev/null" with ctrl-c on an IMA enabled filesystem.
      Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: NMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4753e7a8
    • S
      ima: always return negative code for error · b69c3085
      Sascha Hauer 提交于
      [ Upstream commit f5e1040196dbfe14c77ce3dfe3b7b08d2d961e88 ]
      
      integrity_kernel_read() returns the number of bytes read. If this is
      a short read then this positive value is returned from
      ima_calc_file_hash_atfm(). Currently this is only indirectly called from
      ima_calc_file_hash() and this function only tests for the return value
      being zero or nonzero and also doesn't forward the return value.
      Nevertheless there's no point in returning a positive value as an error,
      so translate a short read into -EINVAL.
      Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: NMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b69c3085
    • W
      arm64: cpufeature: Detect SSBS and advertise to userspace · 6df3c66d
      Will Deacon 提交于
      commit d71be2b6c0e19180b5f80a6d42039cc074a693a2 upstream.
      
      Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass
      Safe (SSBS) which can be used as a mitigation against Spectre variant 4.
      
      Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS
      directly, so that userspace can toggle the SSBS control without trapping
      to the kernel.
      
      This patch probes for the existence of SSBS and advertise the new instructions
      to userspace if they exist.
      Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6df3c66d
    • J
      cfg80211: initialize on-stack chandefs · 3a0e6733
      Johannes Berg 提交于
      commit f43e5210c739fe76a4b0ed851559d6902f20ceb1 upstream.
      
      In a few places we don't properly initialize on-stack chandefs,
      resulting in EDMG data to be non-zero, which broke things.
      
      Additionally, in a few places we rely on the driver to init the
      data completely, but perhaps we shouldn't as non-EDMG drivers
      may not initialize the EDMG data, also initialize it there.
      
      Cc: stable@vger.kernel.org
      Fixes: 2a38075cd0be ("nl80211: Add support for EDMG channels")
      Reported-by: NDmitry Osipenko <digetx@gmail.com>
      Tested-by: NDmitry Osipenko <digetx@gmail.com>
      Link: https://lore.kernel.org/r/1569239475-I2dcce394ecf873376c386a78f31c2ec8b538fa25@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a0e6733
    • V
      s390/cio: avoid calling strlen on null pointer · 16c75eb1
      Vasily Gorbik 提交于
      commit ea298e6ee8b34b3ed4366be7eb799d0650ebe555 upstream.
      
      Fix the following kasan finding:
      BUG: KASAN: global-out-of-bounds in ccwgroup_create_dev+0x850/0x1140
      Read of size 1 at addr 0000000000000000 by task systemd-udevd.r/561
      
      CPU: 30 PID: 561 Comm: systemd-udevd.r Tainted: G    B
      Hardware name: IBM 3906 M04 704 (LPAR)
      Call Trace:
      ([<0000000231b3db7e>] show_stack+0x14e/0x1a8)
       [<0000000233826410>] dump_stack+0x1d0/0x218
       [<000000023216fac4>] print_address_description+0x64/0x380
       [<000000023216f5a8>] __kasan_report+0x138/0x168
       [<00000002331b8378>] ccwgroup_create_dev+0x850/0x1140
       [<00000002332b618a>] group_store+0x3a/0x50
       [<00000002323ac706>] kernfs_fop_write+0x246/0x3b8
       [<00000002321d409a>] vfs_write+0x132/0x450
       [<00000002321d47da>] ksys_write+0x122/0x208
       [<0000000233877102>] system_call+0x2a6/0x2c8
      
      Triggered by:
      openat(AT_FDCWD, "/sys/bus/ccwgroup/drivers/qeth/group",
      		O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 16
      write(16, "0.0.bd00,0.0.bd01,0.0.bd02", 26) = 26
      
      The problem is that __get_next_id in ccwgroup_create_dev might set "buf"
      buffer pointer to NULL and explicit check for that is required.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NSebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16c75eb1
    • J
      ieee802154: atusb: fix use-after-free at disconnect · 3f41e88f
      Johan Hovold 提交于
      commit 7fd25e6fc035f4b04b75bca6d7e8daa069603a76 upstream.
      
      The disconnect callback was accessing the hardware-descriptor private
      data after having having freed it.
      
      Fixes: 7490b008 ("ieee802154: add support for atusb transceiver")
      Cc: stable <stable@vger.kernel.org>     # 4.2
      Cc: Alexander Aring <alex.aring@gmail.com>
      Reported-by: syzbot+f4509a9138a1472e7e80@syzkaller.appspotmail.com
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f41e88f
    • J
      xen/xenbus: fix self-deadlock after killing user process · 975859bb
      Juergen Gross 提交于
      commit a8fabb38525c51a094607768bac3ba46b3f4a9d5 upstream.
      
      In case a user process using xenbus has open transactions and is killed
      e.g. via ctrl-C the following cleanup of the allocated resources might
      result in a deadlock due to trying to end a transaction in the xenbus
      worker thread:
      
      [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
      [ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
      [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 2551.528585] xenbus          D    0    37      2 0x80000080
      [ 2551.528590] Call Trace:
      [ 2551.528603]  __schedule+0x2c0/0x870
      [ 2551.528606]  ? _cond_resched+0x19/0x40
      [ 2551.528632]  schedule+0x2c/0x70
      [ 2551.528637]  xs_talkv+0x1ec/0x2b0
      [ 2551.528642]  ? wait_woken+0x80/0x80
      [ 2551.528645]  xs_single+0x53/0x80
      [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
      [ 2551.528651]  xenbus_file_free+0x5a/0x160
      [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
      [ 2551.528657]  xenbus_thread+0x7de/0x880
      [ 2551.528660]  ? wait_woken+0x80/0x80
      [ 2551.528665]  kthread+0x121/0x140
      [ 2551.528667]  ? xb_read+0x1d0/0x1d0
      [ 2551.528670]  ? kthread_park+0x90/0x90
      [ 2551.528673]  ret_from_fork+0x35/0x40
      
      Fix this by doing the cleanup via a workqueue instead.
      Reported-by: NJames Dingwall <james@dingwall.me.uk>
      Fixes: fd8aa909 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
      Cc: <stable@vger.kernel.org> # 4.11
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      975859bb
    • W
      Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" · e409b81d
      Wanpeng Li 提交于
      commit 89340d0935c9296c7b8222b6eab30e67cb57ab82 upstream.
      
      This patch reverts commit 75437bb3 (locking/pvqspinlock: Don't
      wait if vCPU is preempted).  A large performance regression was caused
      by this commit.  on over-subscription scenarios.
      
      The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads,
      with three VMs of 80 vCPUs each.  The score of ebizzy -M is reduced from
      13000-14000 records/s to 1700-1800 records/s:
      
                Host                Guest                score
      
      vanilla w/o kvm optimizations     upstream    1700-1800 records/s
      vanilla w/o kvm optimizations     revert      13000-14000 records/s
      vanilla w/ kvm optimizations      upstream    4500-5000 records/s
      vanilla w/ kvm optimizations      revert      14000-15500 records/s
      
      Exit from aggressive wait-early mechanism can result in premature yield
      and extra scheduling latency.
      
      Actually, only 6% of wait_early events are caused by vcpu_is_preempted()
      being true.  However, when one vCPU voluntarily releases its vCPU, all
      the subsequently waiters in the queue will do the same and the cascading
      effect leads to bad performance.
      
      kvm optimizations:
      [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
      [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
      
      Tested-by: loobinliu@tencent.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: loobinliu@tencent.com
      Cc: stable@vger.kernel.org
      Fixes: 75437bb3 (locking/pvqspinlock: Don't wait if vCPU is preempted)
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e409b81d
    • R
      mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence · 7ed2867c
      Russell King 提交于
      commit 121bd08b029e03404c451bb237729cdff76eafed upstream.
      
      We must not unconditionally set the DMA snoop bit; if the DMA API is
      assuming that the device is not DMA coherent, and the device snoops the
      CPU caches, the device can see stale cache lines brought in by
      speculative prefetch.
      
      This leads to the device seeing stale data, potentially resulting in
      corrupted data transfers.  Commonly, this results in a descriptor fetch
      error such as:
      
      mmc0: ADMA error
      mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
      mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
      mmc0: sdhci: Blk size:  0x00000008 | Blk cnt:  0x00000001
      mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
      mmc0: sdhci: Present:   0x01f50008 | Host ctl: 0x00000038
      mmc0: sdhci: Power:     0x00000003 | Blk gap:  0x00000000
      mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x000040d8
      mmc0: sdhci: Timeout:   0x00000003 | Int stat: 0x00000001
      mmc0: sdhci: Int enab:  0x037f108f | Sig enab: 0x037f108b
      mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
      mmc0: sdhci: Caps:      0x35fa0000 | Caps_1:   0x0000af00
      mmc0: sdhci: Cmd:       0x0000333a | Max curr: 0x00000000
      mmc0: sdhci: Resp[0]:   0x00000920 | Resp[1]:  0x001d8a33
      mmc0: sdhci: Resp[2]:   0x325b5900 | Resp[3]:  0x3f400e00
      mmc0: sdhci: Host ctl2: 0x00000000
      mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr: 0x000000236d43820c
      mmc0: sdhci: ============================================
      mmc0: error -5 whilst initialising SD card
      
      but can lead to other errors, and potentially direct the SDHCI
      controller to read/write data to other memory locations (e.g. if a valid
      descriptor is visible to the device in a stale cache line.)
      
      Fix this by ensuring that the DMA snoop bit corresponds with the
      behaviour of the DMA API.  Since the driver currently only supports DT,
      use of_dma_is_coherent().  Note that device_get_dma_attr() can not be
      used as that risks re-introducing this bug if/when the driver is
      converted to ACPI.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ed2867c