1. 19 September 2019 (3 commits)
    • modules: fix compile error if don't have strict module rwx · 52bfcc9c
      Yang Yingliang authored
      commit 93651f80dcb616b8c9115cdafc8e57a781af22d0 upstream.
      
      If CONFIG_ARCH_HAS_STRICT_MODULE_RWX is not defined,
      we need stubs for module_enable_nx() and module_enable_x().

      If CONFIG_ARCH_HAS_STRICT_MODULE_RWX is defined, but
      CONFIG_STRICT_MODULE_RWX is disabled, we need a stub for
      module_enable_nx().

      Move frob_text() outside of the CONFIG_STRICT_MODULE_RWX block,
      because it is needed anyway.
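
      A minimal sketch of the stub layout this describes (illustrative only, not
      the verbatim upstream hunk):

      	#ifdef CONFIG_ARCH_HAS_STRICT_MODULE_RWX
      	/* frob_text() and module_enable_x() are defined here in either case */
      	#ifndef CONFIG_STRICT_MODULE_RWX
      	static void module_enable_nx(const struct module *mod) { }
      	#endif
      	#else /* !CONFIG_ARCH_HAS_STRICT_MODULE_RWX */
      	static void module_enable_nx(const struct module *mod) { }
      	static void module_enable_x(const struct module *mod) { }
      	#endif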
      
      Fixes: 2eef1399a866 ("modules: fix BUG when load module with rodata=n")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • modules: fix BUG when load module with rodata=n · ae415d7a
      Yang Yingliang authored
      commit 2eef1399a866c57687962e15142b141a4f8e7862 upstream.
      
      When loading a module with rodata=n, it causes an "executing
      NX-protected page" BUG.
      
      [   32.379191] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
      [   32.382917] BUG: unable to handle page fault for address: ffffffffc0005000
      [   32.385947] #PF: supervisor instruction fetch in kernel mode
      [   32.387662] #PF: error_code(0x0011) - permissions violation
      [   32.389352] PGD 240c067 P4D 240c067 PUD 240e067 PMD 421a52067 PTE 8000000421a53063
      [   32.391396] Oops: 0011 [#1] SMP PTI
      [   32.392478] CPU: 7 PID: 2697 Comm: insmod Tainted: G           O      5.2.0-rc5+ #202
      [   32.394588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   32.398157] RIP: 0010:ko_test_init+0x0/0x1000 [ko_test]
      [   32.399662] Code: Bad RIP value.
      [   32.400621] RSP: 0018:ffffc900029f3ca8 EFLAGS: 00010246
      [   32.402171] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   32.404332] RDX: 00000000000004c7 RSI: 0000000000000cc0 RDI: ffffffffc0005000
      [   32.406347] RBP: ffffffffc0005000 R08: ffff88842fbebc40 R09: ffffffff810ede4a
      [   32.408392] R10: ffffea00108e3480 R11: 0000000000000000 R12: ffff88842bee21a0
      [   32.410472] R13: 0000000000000001 R14: 0000000000000001 R15: ffffc900029f3e78
      [   32.412609] FS:  00007fb4f0c0a700(0000) GS:ffff88842fbc0000(0000) knlGS:0000000000000000
      [   32.414722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   32.416290] CR2: ffffffffc0004fd6 CR3: 0000000421a90004 CR4: 0000000000020ee0
      [   32.418471] Call Trace:
      [   32.419136]  do_one_initcall+0x41/0x1df
      [   32.420199]  ? _cond_resched+0x10/0x40
      [   32.421433]  ? kmem_cache_alloc_trace+0x36/0x160
      [   32.422827]  do_init_module+0x56/0x1f7
      [   32.423946]  load_module+0x1e67/0x2580
      [   32.424947]  ? __alloc_pages_nodemask+0x150/0x2c0
      [   32.426413]  ? map_vm_area+0x2d/0x40
      [   32.427530]  ? __vmalloc_node_range+0x1ef/0x260
      [   32.428850]  ? __do_sys_init_module+0x135/0x170
      [   32.430060]  ? _cond_resched+0x10/0x40
      [   32.431249]  __do_sys_init_module+0x135/0x170
      [   32.432547]  do_syscall_64+0x43/0x120
      [   32.433853]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Because set_memory_x() can't be called if rodata=n, fix this by
      calling set_memory_x() in complete_formation().
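
      An illustrative sketch of the described change (assuming the layout of
      complete_formation() in kernel/module.c of this kernel; not the verbatim
      diff):

      	static int complete_formation(struct module *mod, struct load_info *info)
      	{
      		...
      		module_enable_ro(mod, false);
      		module_enable_nx(mod);
      		/* Make the module text executable even when rodata=n. */
      		module_enable_x(mod);
      		...
      	}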
      
      Fixes: f2c65fb3221a ("x86/modules: Avoid breaking W^X while loading modules")
      Suggested-by: Jian Cheng <cj.chengjian@huawei.com>
      Reviewed-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq: Prevent NULL pointer dereference in resend_irqs() · 991b3458
      Yunfeng Ye authored
      commit eddf3e9c7c7e4d0707c68d1bb22cc6ec8aef7d4a upstream.
      
      The following crash was observed:
      
        Unable to handle kernel NULL pointer dereference at 0000000000000158
        Internal error: Oops: 96000004 [#1] SMP
        pc : resend_irqs+0x68/0xb0
        lr : resend_irqs+0x64/0xb0
        ...
        Call trace:
         resend_irqs+0x68/0xb0
         tasklet_action_common.isra.6+0x84/0x138
         tasklet_action+0x2c/0x38
         __do_softirq+0x120/0x324
         run_ksoftirqd+0x44/0x60
         smpboot_thread_fn+0x1ac/0x1e8
         kthread+0x134/0x138
         ret_from_fork+0x10/0x18
      
      The reason for this is that the interrupt resend mechanism happens in soft
      interrupt context, which is asynchronous with respect to other
      operations on interrupts. free_irq() does not take resend handling into
      account. Thus, the irq descriptor might already be freed before the resend
      tasklet is executed. resend_irqs() does not check the return value of the
      interrupt descriptor lookup and dereferences the return value
      unconditionally.
      
        1):
        __setup_irq
          irq_startup
            check_irq_resend  // activate softirq to handle resend irq
        2):
        irq_domain_free_irqs
          irq_free_descs
            free_desc
              call_rcu(&desc->rcu, delayed_free_desc)
        3):
        __do_softirq
          tasklet_action
            resend_irqs
              desc = irq_to_desc(irq)
              desc->handle_irq(desc)  // desc is NULL --> Ooops
      
      Fix this by adding a NULL pointer check in resend_irqs() before dereferencing
      the irq descriptor.
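
      A sketch of the resulting function (illustrative; it mirrors the shape of
      resend_irqs() in kernel/irq/resend.c):

      	static void resend_irqs(unsigned long arg)
      	{
      		struct irq_desc *desc;
      		int irq;

      		while (!bitmap_empty(irqs_resend, nr_irqs)) {
      			irq = find_first_bit(irqs_resend, nr_irqs);
      			clear_bit(irq, irqs_resend);
      			desc = irq_to_desc(irq);
      			if (!desc)
      				continue;
      			local_irq_disable();
      			desc->handle_irq(desc);
      			local_irq_enable();
      		}
      	}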
      
      Fixes: a4633adc ("[PATCH] genirq: add genirq sw IRQ-retrigger")
      Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1630ae13-5c8e-901e-de09-e740b6a426a7@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 16 September 2019 (7 commits)
  3. 10 September 2019 (1 commit)
    • kprobes: Fix potential deadlock in kprobe_optimizer() · 5e1d50a3
      Andrea Righi authored
      [ Upstream commit f1c6ece23729257fb46562ff9224cf5f61b818da ]
      
      lockdep reports the following deadlock scenario:
      
       WARNING: possible circular locking dependency detected
      
       kworker/1:1/48 is trying to acquire lock:
       000000008d7a62b2 (text_mutex){+.+.}, at: kprobe_optimizer+0x163/0x290
      
       but task is already holding lock:
       00000000850b5e2d (module_mutex){+.+.}, at: kprobe_optimizer+0x31/0x290
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (module_mutex){+.+.}:
              __mutex_lock+0xac/0x9f0
              mutex_lock_nested+0x1b/0x20
              set_all_modules_text_rw+0x22/0x90
              ftrace_arch_code_modify_prepare+0x1c/0x20
              ftrace_run_update_code+0xe/0x30
              ftrace_startup_enable+0x2e/0x50
              ftrace_startup+0xa7/0x100
              register_ftrace_function+0x27/0x70
              arm_kprobe+0xb3/0x130
              enable_kprobe+0x83/0xa0
              enable_trace_kprobe.part.0+0x2e/0x80
              kprobe_register+0x6f/0xc0
              perf_trace_event_init+0x16b/0x270
              perf_kprobe_init+0xa7/0xe0
              perf_kprobe_event_init+0x3e/0x70
              perf_try_init_event+0x4a/0x140
              perf_event_alloc+0x93a/0xde0
              __do_sys_perf_event_open+0x19f/0xf30
              __x64_sys_perf_event_open+0x20/0x30
              do_syscall_64+0x65/0x1d0
              entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       -> #0 (text_mutex){+.+.}:
              __lock_acquire+0xfcb/0x1b60
              lock_acquire+0xca/0x1d0
              __mutex_lock+0xac/0x9f0
              mutex_lock_nested+0x1b/0x20
              kprobe_optimizer+0x163/0x290
              process_one_work+0x22b/0x560
              worker_thread+0x50/0x3c0
              kthread+0x112/0x150
              ret_from_fork+0x3a/0x50
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(module_mutex);
                                      lock(text_mutex);
                                      lock(module_mutex);
         lock(text_mutex);
      
        *** DEADLOCK ***
      
      As a reproducer I've been using bcc's funccount.py
      (https://github.com/iovisor/bcc/blob/master/tools/funccount.py),
      for example:
      
       # ./funccount.py '*interrupt*'
      
      That immediately triggers the lockdep splat.
      
      Fix by acquiring text_mutex before module_mutex in kprobe_optimizer().
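
      An illustrative sketch of the resulting lock ordering (not the verbatim
      diff; only the ordering matters here):

      	static void kprobe_optimizer(struct work_struct *work)
      	{
      		mutex_lock(&kprobe_mutex);
      		cpus_read_lock();
      		/* Match ftrace's ordering: text_mutex before module_mutex. */
      		mutex_lock(&text_mutex);
      		mutex_lock(&module_mutex);
      		...
      		mutex_unlock(&module_mutex);
      		mutex_unlock(&text_mutex);
      		cpus_read_unlock();
      		mutex_unlock(&kprobe_mutex);
      	}
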
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: d5b844a2cf50 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
      Link: http://lkml.kernel.org/r/20190812184302.GA7010@xps-13
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. 06 September 2019 (3 commits)
  5. 29 August 2019 (1 commit)
  6. 25 August 2019 (1 commit)
  7. 16 August 2019 (1 commit)
  8. 09 August 2019 (5 commits)
    • cgroup: Fix css_task_iter_advance_css_set() cset skip condition · ebda41dd
      Tejun Heo authored
      commit c596687a008b579c503afb7a64fcacc7270fae9e upstream.
      
      While adding handling for dying task group leaders, c03cd7738a83
      ("cgroup: Include dying leaders with live threads in PROCS
      iterations") added an inverted cset skip condition to
      css_task_iter_advance_css_set().  It should skip a cset if it's
      completely empty, but it was incorrectly testing the inverse condition
      on the dying_tasks list.  Fix it.
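
      Illustratively, the skip condition should only trigger when the cset has
      neither live tasks nor dying tasks (a sketch of the corrected test, not the
      verbatim diff):

      	/* css_task_iter_advance_css_set(): keep looking while the cset is empty */
      	} while (!css_set_populated(cset) && list_empty(&cset->dying_tasks));
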
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
      Reported-by: syzbot+d4bba5ccd4f9a2a68681@syzkaller.appspotmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup: css_task_iter_skip()'d iterators must be advanced before accessed · 0a9abd27
      Tejun Heo authored
      commit cee0c33c546a93957a52ae9ab6bebadbee765ec5 upstream.
      
      b636fd38dc40 ("cgroup: Implement css_task_iter_skip()") introduced
      css_task_iter_skip() which is used to fix task iterations skipping
      dying threadgroup leaders with live threads.  Skipping is implemented
      as a subportion of full advancing, but css_task_iter_next() forgot to
      fully advance a skipped iterator before determining the next task to
      visit, causing it to return invalid task pointers.
      
      Fix it by making css_task_iter_next() fully advance the iterator if it
      has been skipped since the previous iteration.
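
      A sketch of the fix (illustrative; roughly the shape of css_task_iter_next()
      after the change):

      	struct task_struct *css_task_iter_next(struct css_task_iter *it)
      	{
      		if (it->cur_task) {
      			put_task_struct(it->cur_task);
      			it->cur_task = NULL;
      		}

      		spin_lock_irq(&css_set_lock);

      		/* @it may be half-advanced by skips, finish advancing */
      		if (it->flags & CSS_TASK_ITER_SKIPPED)
      			css_task_iter_advance(it);

      		if (it->task_pos) {
      			it->cur_task = list_entry(it->task_pos, struct task_struct,
      						  cg_list);
      			get_task_struct(it->cur_task);
      			css_task_iter_advance(it);
      		}

      		spin_unlock_irq(&css_set_lock);

      		return it->cur_task;
      	}
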
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: syzbot
      Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com
      Fixes: b636fd38dc40 ("cgroup: Implement css_task_iter_skip()")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup: Include dying leaders with live threads in PROCS iterations · 4340d175
      Tejun Heo authored
      commit c03cd7738a83b13739f00546166969342c8ff014 upstream.
      
      CSS_TASK_ITER_PROCS currently iterates live group leaders; however,
      this means that a process with a dying leader and live threads will be
      skipped.  IOW, cgroup.procs might be empty while cgroup.threads isn't,
      which is confusing to say the least.
      
      Fix it by making cset track dying tasks and include dying leaders with
      live threads in PROCS iteration.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup: Implement css_task_iter_skip() · 370b9e63
      Tejun Heo authored
      commit b636fd38dc40113f853337a7d2a6885ad23b8811 upstream.
      
      When a task is moved out of a cset, task iterators pointing to the
      task are advanced using the normal css_task_iter_advance() call.  This
      is fine but we'll be tracking dying tasks on csets and thus moving
      tasks from cset->tasks to (to be added) cset->dying_tasks.  When we
      remove a task from cset->tasks, if we advance the iterators, they may
      move over to the next cset before we had the chance to add the task
      back on the dying list, which can allow the task to escape iteration.
      
      This patch separates out skipping from advancing.  Skipping only moves
      the affected iterators to the next pointer rather than fully advancing
      them, and the following advance will recognize that the cursor has
      already been moved forward and do the rest of the advancing.  This ensures
      that when a task moves from one list to another in its cset, as long
      as it moves in the right direction, it's always visible to iteration.
      
      This doesn't cause any visible behavior changes.
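
      A sketch of the separated-out skip helper (illustrative):

      	static void css_task_iter_skip(struct css_task_iter *it,
      				       struct task_struct *task)
      	{
      		lockdep_assert_held(&css_set_lock);

      		if (it->task_pos == &task->cg_list) {
      			it->task_pos = it->task_pos->next;
      			it->flags |= CSS_TASK_ITER_SKIPPED;
      		}
      	}
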
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup: Call cgroup_release() before __exit_signal() · 7528e95b
      Tejun Heo authored
      commit 6b115bf58e6f013ca75e7115aabcbd56c20ff31d upstream.
      
      cgroup_release() calls cgroup_subsys->release() which is used by the
      pids controller to uncharge its pid.  We want to use it to manage
      iteration of dying tasks which requires putting it before
      __unhash_process().  Move cgroup_release() above __exit_signal().
      While this makes it uncharge before the pid is freed, pid is RCU freed
      anyway and the window is very narrow.
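
      An illustrative sketch of the resulting ordering in release_task() (not the
      verbatim diff):

      	void release_task(struct task_struct *p)
      	{
      		...
      		cgroup_release(p);

      		write_lock_irq(&tasklist_lock);
      		ptrace_release_task(p);
      		__exit_signal(p);
      		...
      	}
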
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  9. 07 August 2019 (2 commits)
    • kernel/module.c: Only return -EEXIST for modules that have finished loading · 09ec6c67
      Prarit Bhargava authored
      [ Upstream commit 6e6de3dee51a439f76eb73c22ae2ffd2c9384712 ]
      
      Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
      linux guests boot with repeated errors:
      
      amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
      amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
      amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
      amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
      amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
      amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
      
      The warnings occur because the module code erroneously returns -EEXIST
      for modules that have failed to load and are in the process of being
      removed from the module list.
      
      module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
      modules.dep, systemd will load edac_mce_amd for every request of
      amd64_edac_mod.  When the edac_mce_amd module loads, the module has
      state MODULE_STATE_UNFORMED; once the module load fails, the state
      becomes MODULE_STATE_GOING.  Another request for the edac_mce_amd module
      executes, and add_unformed_module() will erroneously return -EEXIST even
      though the previous instance of edac_mce_amd is in MODULE_STATE_GOING.
      Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
      fails because of unknown symbols from edac_mce_amd.
      
      add_unformed_module() must wait to return for any case other than
      MODULE_STATE_LIVE to prevent a race between multiple loads of
      dependent modules.
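
      A sketch of the described wait (illustrative; assuming the surrounding
      add_unformed_module() flow in kernel/module.c):

      	static int add_unformed_module(struct module *mod)
      	{
      		int err;
      		struct module *old;

      		mod->state = MODULE_STATE_UNFORMED;

      	again:
      		mutex_lock(&module_mutex);
      		old = find_module_all(mod->name, strlen(mod->name), true);
      		if (old != NULL) {
      			if (old->state != MODULE_STATE_LIVE) {
      				/* Wait in case it fails to load. */
      				mutex_unlock(&module_mutex);
      				err = wait_event_interruptible(module_wq,
      						       finished_loading(mod->name));
      				if (err)
      					goto out_unlocked;
      				goto again;
      			}
      			err = -EEXIST;
      			goto out;
      		}
      		...
      	}
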
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Barret Rhoden <brho@google.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ftrace: Enable trampoline when rec count returns back to one · f486088d
      Cheng Jian authored
      [ Upstream commit a124692b698b00026a58d89831ceda2331b2e1d0 ]
      
      Custom trampolines can only be enabled if there is only a single ops
      attached to it. If there's only a single callback registered to a function,
      and the ops has a trampoline registered for it, then we can call the
      trampoline directly. This is very useful for improving the performance of
      ftrace and livepatch.
      
      If more than one callback is registered to a function, the general
      trampoline is used, and the custom trampoline is not restored back to the
      direct call even if all the other callbacks were unregistered and we are
      back to one callback for the function.
      
      To fix this, set the FTRACE_FL_TRAMP flag if the rec count is decremented
      to one, and the ops that is left has a trampoline.
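
      A sketch of the described check in the rec-count decrement path (illustrative;
      helper names as found in kernel/trace/ftrace.c):

      	rec->flags--;
      	/*
      	 * Re-enable the custom trampoline if the rec count dropped back to
      	 * one and the single remaining ops has a trampoline.
      	 */
      	if (ftrace_rec_count(rec) == 1 && ftrace_find_tramp_ops_any(rec))
      		rec->flags |= FTRACE_FL_TRAMP;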
      
      Testing after this patch:
      
      insmod livepatch_unshare_files.ko
      cat /sys/kernel/debug/tracing/enabled_functions
      
      	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0
      
      echo unshare_files > /sys/kernel/debug/tracing/set_ftrace_filter
      echo function > /sys/kernel/debug/tracing/current_tracer
      cat /sys/kernel/debug/tracing/enabled_functions
      
      	unshare_files (2) R I ->ftrace_ops_list_func+0x0/0x150
      
      echo nop > /sys/kernel/debug/tracing/current_tracer
      cat /sys/kernel/debug/tracing/enabled_functions
      
      	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0
      
      Link: http://lkml.kernel.org/r/1556969979-111047-1-git-send-email-cj.chengjian@huawei.com
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  10. 04 August 2019 (2 commits)
  11. 31 July 2019 (3 commits)
    • access: avoid the RCU grace period for the temporary subjective credentials · 408af823
      Linus Torvalds authored
      commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream.
      
      It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
      work because it installs a temporary credential that gets allocated and
      freed for each system call.
      
      The allocation and freeing overhead is mostly benign, but because
      credentials can be accessed under the RCU read lock, the freeing
      involves an RCU grace period.
      
      Which is not a huge deal normally, but if you have a lot of access()
      calls, this causes a fair amount of secondary damage: instead of having
      nice alloc/free patterns that hit hot per-CPU slab caches, you have all
      those delayed frees, and on big machines with hundreds of cores, the
      RCU overhead can end up being enormous.
      
      But it turns out that all of this is entirely unnecessary.  Exactly
      because access() only installs the credential as the thread-local
      subjective credential, the temporary cred pointer doesn't actually need
      to be RCU free'd at all.  Once we're done using it, we can just free it
      synchronously and avoid all the RCU overhead.
      
      So add a 'non_rcu' flag to 'struct cred', which can be set by users that
      know they only use it in non-RCU context (there are other potential
      users for this).  We can make it a union with the rcu freeing list head
      that we need for the RCU case, so this doesn't need any extra storage.
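
      A minimal sketch of that union and the synchronous-free path it enables
      (illustrative, assuming the include/linux/cred.h and kernel/cred.c layout of
      this era):

      	struct cred {
      		...
      		union {
      			int non_rcu;		/* Can we skip RCU deletion? */
      			struct rcu_head	rcu;	/* RCU deletion hook */
      		};
      	};

      	/* kernel/cred.c */
      	void __put_cred(struct cred *cred)
      	{
      		...
      		if (cred->non_rcu)
      			put_cred_rcu(&cred->rcu);	/* free synchronously */
      		else
      			call_rcu(&cred->rcu, put_cred_rcu);
      	}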
      
      Note that this also makes 'get_current_cred()' clear the new non_rcu
      flag, in case we have filesystems that take a long-term reference to the
      cred and then expect the RCU delayed freeing afterwards.  It's not
      entirely clear that this is required, but it makes for clear semantics:
      the subjective cred remains non-RCU as long as you only access it
      synchronously using the thread-local accessors, but you _can_ use it as
      a generic cred if you want to.
      
      It is possible that we should just remove the whole RCU markings for
      ->cred entirely.  Only ->real_cred is really supposed to be accessed
      through RCU, and the long-term cred copies that nfs uses might want to
      explicitly re-enable RCU freeing if required, rather than have
      get_current_cred() do it implicitly.
      
      But this is a "minimal semantic changes" change for the immediate
      problem.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Glauber <jglauber@marvell.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • locking/lockdep: Hide unused 'class' variable · 148959cc
      Arnd Bergmann authored
      [ Upstream commit 68037aa78208f34bda4e5cd76c357f718b838cbb ]
      
      The usage is now hidden in an #ifdef, so we need to move
      the variable itself in there as well to avoid this warning:
      
        kernel/locking/lockdep_proc.c:203:21: error: unused variable 'class' [-Werror,-Wunused-variable]
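
      A sketch of the fix (illustrative): the declaration moves under the same
      #ifdef that already guards its only use in lockdep_stats_show():

      	#ifdef CONFIG_PROVE_LOCKING
      		struct lock_class *class;

      		list_for_each_entry(class, &all_lock_classes, lock_entry)
      			sum_forward_deps += lockdep_count_forward_deps(class);
      	#endif
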
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yuyang Du <duyuyang@gmail.com>
      Cc: frederic@kernel.org
      Fixes: 68d41d8c94a3 ("locking/lockdep: Fix lock used or unused stats error")
      Link: https://lkml.kernel.org/r/20190715092809.736834-1-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • locking/lockdep: Fix lock used or unused stats error · 4acb04ef
      Yuyang Du authored
      [ Upstream commit 68d41d8c94a31dfb8233ab90b9baf41a2ed2da68 ]
      
      The stats variable nr_unused_locks is incremented every time a new lock
      class is registered and decremented when the lock is first used in
      __lock_acquire(). Finally, it is shown and checked in lockdep_stats.

      However, under configurations where either CONFIG_TRACE_IRQFLAGS or
      CONFIG_PROVE_LOCKING is not defined:

      The commit:

        091806515124b20 ("locking/lockdep: Consolidate lock usage bit initialization")

      missed marking the LOCK_USED flag at IRQ usage initialization because
      mark_usage() is not called. And the commit:

        886532aee3cd42d ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")

      further made mark_lock() not defined, such that the LOCK_USED flag cannot be
      marked at all when the lock is first acquired.

      As a result, fix this by not showing and not checking these stats in
      lockdep_stats under such configurations.
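
      A sketch of the kind of guard this implies in lockdep_proc.c (illustrative;
      the exact hunks differ):

      	#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING)
      		/* Only meaningful when LOCK_USED can actually get marked. */
      		seq_printf(m, " unused locks:                  %11lu\n",
      			   nr_unused);
      	#endif
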
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Yuyang Du <duyuyang@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: arnd@arndb.de
      Cc: frederic@kernel.org
      Link: https://lkml.kernel.org/r/20190709101522.9117-1-duyuyang@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  12. 28 July 2019 (2 commits)
    • perf/core: Fix race between close() and fork() · 4a5cc64d
      Peter Zijlstra authored
      commit 1cf8dfe8a661f0462925df943140e9f6d1ea5233 upstream.
      
      Syzkaller reported the following Use-after-Free bug:
      
      	close()						clone()
      
      							  copy_process()
      							    perf_event_init_task()
      							      perf_event_init_context()
      							        mutex_lock(parent_ctx->mutex)
      								inherit_task_group()
      								  inherit_group()
      								    inherit_event()
      								      mutex_lock(event->child_mutex)
      								      // expose event on child list
      								      list_add_tail()
      								      mutex_unlock(event->child_mutex)
      							        mutex_unlock(parent_ctx->mutex)
      
      							    ...
      							    goto bad_fork_*
      
      							  bad_fork_cleanup_perf:
      							    perf_event_free_task()
      
      	  perf_release()
      	    perf_event_release_kernel()
      	      list_for_each_entry()
      		mutex_lock(ctx->mutex)
      		mutex_lock(event->child_mutex)
      		// event is from the failing inherit
      		// on the other CPU
      		perf_remove_from_context()
      		list_move()
      		mutex_unlock(event->child_mutex)
      		mutex_unlock(ctx->mutex)
      
      							      mutex_lock(ctx->mutex)
      							      list_for_each_entry_safe()
      							        // event already stolen
      							      mutex_unlock(ctx->mutex)
      
      							    delayed_free_task()
      							      free_task()
      
      	     list_for_each_entry_safe()
      	       list_del()
      	       free_event()
      	         _free_event()
      		   // and so event->hw.target
      		   // is the already freed failed clone()
      		   if (event->hw.target)
      		     put_task_struct(event->hw.target)
      		       // WHOOPSIE, already quite dead
      
      Which puts the lie to the comment on perf_event_free_task():
      'unexposed, unused context' not so much.
      
      Which is a 'fun' confluence of fail; copy_process() doing an
      unconditional free_task() and not respecting refcounts, and perf having
      creative locking. In particular:
      
        82d94856 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
      
      seems to have overlooked this 'fun' parade.
      
      Solve it by using the fact that detached events still have a reference
      count on their (previous) context. With this perf_event_free_task()
      can detect when events have escaped and wait for their destruction.
      Debugged-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 82d94856 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/core: Fix exclusive events' grouping · 75100ec5
      Alexander Shishkin authored
      commit 8a58ddae23796c733c5dfbd717538d89d036c5bd upstream.
      
      So far, we tried to disallow grouping exclusive events for fear of the
      complications they would cause with moving between contexts. Specifically,
      moving a software group to a hardware context would violate the exclusivity
      rules if both groups contain matching exclusive events.
      
      This attempt was, however, unsuccessful: the check that we have in the
      perf_event_open() syscall is both wrong (looks at wrong PMU) and
      insufficient (group leader may still be exclusive), as can be illustrated
      by running:
      
        $ perf record -e '{intel_pt//,cycles}' uname
        $ perf record -e '{cycles,intel_pt//}' uname
      
      ultimately successfully.
      
      Furthermore, we are completely free to trigger the exclusivity violation
      by:
      
         perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'
      
      even though the helpful perf record will not allow that, the ABI will.
      
      The warning later in the perf_event_open() path will also not trigger, because
      it's also wrong.
      
      Fix all this by validating the original group before moving, getting rid
      of broken safeguards and placing a useful one to perf_install_in_context().
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: mathieu.poirier@linaro.org
      Cc: will.deacon@arm.com
      Fixes: bed5b25a ("perf: Add a pmu capability for "exclusive" events")
      Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  13. 26 July 2019 (8 commits)
    • padata: use smp_mb in padata_reorder to avoid orphaned padata jobs · 1e4247d7
      Daniel Jordan authored
      commit cf144f81a99d1a3928f90b0936accfd3f45c9a0a upstream.
      
      Testing padata with the tcrypt module on a 5.2 kernel...
      
          # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
          # modprobe tcrypt mode=211 sec=1
      
      ...produces this splat:
      
          INFO: task modprobe:10075 blocked for more than 120 seconds.
                Not tainted 5.2.0-base+ #16
          modprobe        D    0 10075  10064 0x80004080
          Call Trace:
           ? __schedule+0x4dd/0x610
           ? ring_buffer_unlock_commit+0x23/0x100
           schedule+0x6c/0x90
           schedule_timeout+0x3b/0x320
           ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
           wait_for_common+0x160/0x1a0
           ? wake_up_q+0x80/0x80
           { crypto_wait_req }             # entries in braces added by hand
           { do_one_aead_op }
           { test_aead_jiffies }
           test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
           do_test+0x4053/0x6a2b [tcrypt]
           ? 0xffffffffa00f4000
           tcrypt_mod_init+0x50/0x1000 [tcrypt]
           ...
      
      The second modprobe command never finishes because in padata_reorder,
      CPU0's load of reorder_objects is executed before the unlocking store in
      spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:
      
      CPU0                                 CPU1
      
      padata_reorder                       padata_do_serial
        LOAD reorder_objects  // 0
                                             INC reorder_objects  // 1
                                             padata_reorder
                                               TRYLOCK pd->lock   // failed
        UNLOCK pd->lock
      
      CPU0 deletes the timer before returning from padata_reorder and since no
      other job is submitted to padata, modprobe waits indefinitely.
      
      Add a pair of full barriers to guarantee proper ordering:
      
      CPU0                                 CPU1
      
      padata_reorder                       padata_do_serial
        UNLOCK pd->lock
        smp_mb()
        LOAD reorder_objects
                                             INC reorder_objects
                                             smp_mb__after_atomic()
                                             padata_reorder
                                               TRYLOCK pd->lock
      
      smp_mb__after_atomic is needed so the read part of the trylock operation
      comes after the INC, as Andrea points out.   Thanks also to Andrea for
      help with writing a litmus test.
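
      A sketch of the two sides of the pairing (illustrative; abbreviated from
      kernel/padata.c):

      	/* padata_reorder() */
      	spin_unlock_bh(&pd->lock);
      	/*
      	 * Ensure reorder_objects is read after pd->lock is dropped, so a
      	 * concurrent increment in padata_do_serial() is not missed.
      	 * Pairs with smp_mb__after_atomic() below.
      	 */
      	smp_mb();
      	if (atomic_read(&pd->reorder_objects) && !(pinst->flags & PADATA_RESET))
      		mod_timer(&pd->timer, jiffies + HZ);

      	/* padata_do_serial() */
      	atomic_inc(&pd->reorder_objects);
      	/* Order the increment before the trylock in padata_reorder(). */
      	smp_mb__after_atomic();
      	padata_reorder(pd);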
      
      Fixes: 16295bec ("padata: Generic parallelization/serialization interface")
      Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • timer_list: Guard procfs specific code · b9f547b7
      Nathan Huckleberry authored
      [ Upstream commit a9314773a91a1d3b36270085246a6715a326ff00 ]
      
      With CONFIG_PROC_FS=n the following warning is emitted:
      
      kernel/time/timer_list.c:361:36: warning: unused variable
      'timer_list_sops' [-Wunused-const-variable]
         static const struct seq_operations timer_list_sops = {
      
      Add #ifdef guard around procfs specific code.
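
      A minimal sketch of the guard (illustrative; the seq helpers already exist in
      kernel/time/timer_list.c):

      	#ifdef CONFIG_PROC_FS
      	static const struct seq_operations timer_list_sops = {
      		.start = timer_list_start,
      		.next  = timer_list_next,
      		.stop  = timer_list_stop,
      		.show  = timer_list_show,
      	};

      	static int __init init_timer_list_procfs(void)
      	{
      		struct proc_dir_entry *pe;

      		pe = proc_create_seq_private("timer_list", 0400, NULL,
      					     &timer_list_sops,
      					     sizeof(struct timer_list_iter), NULL);
      		if (!pe)
      			return -ENOMEM;
      		return 0;
      	}
      	__initcall(init_timer_list_procfs);
      	#endif
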
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: john.stultz@linaro.org
      Cc: sboyd@kernel.org
      Cc: clang-built-linux@googlegroups.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/534
      Link: https://lkml.kernel.org/r/20190614181604.112297-1-nhuck@google.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ntp: Limit TAI-UTC offset · d86c0b73
      Miroslav Lichvar authored
      [ Upstream commit d897a4ab11dc8a9fda50d2eccc081a96a6385998 ]
      
      Don't allow the TAI-UTC offset of the system clock to be set by adjtimex()
      to a value larger than 100000 seconds.
      
      This prevents an overflow in the conversion to int, prevents the CLOCK_TAI
      clock from getting too far ahead of the CLOCK_REALTIME clock, and it is
      still large enough to allow leap seconds to be inserted at the maximum rate
      currently supported by the kernel (once per day) for the next ~270 years,
      however unlikely it is that someone can survive a catastrophic event which
      slowed down the rotation of the Earth so much.
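
      A sketch of the added bound (illustrative; assuming a MAX_TAI_OFFSET constant
      of 100000 in kernel/time/ntp.c):

      	if (txc->modes & ADJ_TAI &&
      	    txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
      		*time_tai = txc->constant;
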
      Reported-by: Weikang shi <swkhack@gmail.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Link: https://lkml.kernel.org/r/20190618154713.20929-1-mlichvar@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/fair: Fix "runnable_avg_yN_inv" not used warnings · 7fc96cd2
      Qian Cai authored
      [ Upstream commit 509466b7d480bc5d22e90b9fbe6122ae0e2fbe39 ]
      
      runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c but was
      included in several other places because they need other macros that
      all come from kernel/sched/sched-pelt.h, which was generated by
      Documentation/scheduler/sched-pelt. As a result, it causes a lot of
      compilation warnings:
      
        kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
        kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
        kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
        ...
      
      Silence it by appending the __maybe_unused attribute to the variable, so all
      generated variables and macros can still be kept in the same file.
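
      A sketch of the annotation (illustrative; the table contents are generated):

      	static const u32 runnable_avg_yN_inv[] __maybe_unused = {
      		0xffffffff, 0xfa83b2da, 0xf5257d14, /* ... generated values ... */
      	};
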
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/core: Add __sched tag for io_schedule() · d8b7db6c
      Gao Xiang authored
      [ Upstream commit e3b929b0a184edb35531153c5afcaebb09014f9d ]
      
      Non-inline io_schedule() was introduced in:
      
        commit 10ab5643 ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()")
      
      Keep in line with io_schedule_timeout(), otherwise "/proc/<pid>/wchan" will
      report io_schedule() rather than its callers when waiting for IO.
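
      A sketch of the annotated function (illustrative; it mirrors the
      io_schedule_prepare()/io_schedule_finish() split referenced above):

      	void __sched io_schedule(void)
      	{
      		int token;

      		token = io_schedule_prepare();
      		schedule();
      		io_schedule_finish(token);
      	}
      	EXPORT_SYMBOL(io_schedule);
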
      Reported-by: Jilong Kou <koujilong@huawei.com>
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miao Xie <miaoxie@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 10ab5643 ("sched/core: Separate out io_schedule_prepare() and io_schedule_finish()")
      Link: https://lkml.kernel.org/r/20190603091338.2695-1-gaoxiang25@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: silence warning messages in core · 7c10f894
      Valdis Klētnieks authored
      [ Upstream commit aee450cbe482a8c2f6fa5b05b178ef8b8ff107ca ]
      
      Compiling kernel/bpf/core.c with W=1 causes a flood of warnings:
      
      kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init]
       1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
            |                                                                 ^~~~
      kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
       1087 |  INSN_3(ALU, ADD,  X),   \
            |  ^~~~~~
      kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
       1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
            |   ^~~~~~~~~~~~
      kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]')
       1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
            |                                                                 ^~~~
      kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
       1087 |  INSN_3(ALU, ADD,  X),   \
            |  ^~~~~~
      kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
       1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
            |   ^~~~~~~~~~~~
      
      98 copies of the above.
      
      The attached patch silences the warnings, because we *know* we're overwriting
      the default initializer. That leaves bpf/core.c with only 6 other warnings,
      which become more visible in comparison.
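
      A sketch of how the warning is silenced for this one object file (illustrative;
      assuming the knob lives in kernel/bpf/Makefile):

      	# kernel/bpf/Makefile
      	CFLAGS_core.o += $(call cc-disable-warning, override-init)
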
      Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • locking/lockdep: Fix merging of hlocks with non-zero references · 2fbde274
      Imre Deak authored
      [ Upstream commit d9349850e188b8b59e5322fda17ff389a1c0cd7d ]
      
      The sequence
      
      	static DEFINE_WW_CLASS(test_ww_class);
      
      	struct ww_acquire_ctx ww_ctx;
      	struct ww_mutex ww_lock_a;
      	struct ww_mutex ww_lock_b;
      	struct ww_mutex ww_lock_c;
      	struct mutex lock_c;
      
      	ww_acquire_init(&ww_ctx, &test_ww_class);
      
      	ww_mutex_init(&ww_lock_a, &test_ww_class);
      	ww_mutex_init(&ww_lock_b, &test_ww_class);
      	ww_mutex_init(&ww_lock_c, &test_ww_class);
      
      	mutex_init(&lock_c);
      
      	ww_mutex_lock(&ww_lock_a, &ww_ctx);
      
      	mutex_lock(&lock_c);
      
      	ww_mutex_lock(&ww_lock_b, &ww_ctx);
      	ww_mutex_lock(&ww_lock_c, &ww_ctx);
      
      	mutex_unlock(&lock_c);	(*)
      
      	ww_mutex_unlock(&ww_lock_c);
      	ww_mutex_unlock(&ww_lock_b);
      	ww_mutex_unlock(&ww_lock_a);
      
      	ww_acquire_fini(&ww_ctx); (**)
      
      will trigger the following error in __lock_release() when calling
      mutex_release() at **:
      
      	DEBUG_LOCKS_WARN_ON(depth <= 0)
      
      The problem is that the hlock merging happening at * updates the
      references for test_ww_class incorrectly to 3 whereas it should've
      updated it to 4 (representing all the instances for ww_ctx and
      ww_lock_[abc]).
      
      Fix this by updating the references during merging correctly, taking into
      account that we can have non-zero references both for the hlock that we
      merge into another hlock and for the hlock we are merging into.
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/20190524201509.9199-2-imre.deak@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig · b397462a
      Eric W. Biederman authored
      [ Upstream commit f9070dc94542093fd516ae4ccea17ef46a4362c5 ]
      
      The locking in force_sig_info is not prepared to deal with a task that
      exits or execs (as sighand may change).  This is not a locking problem
      in force_sig, as force_sig is only built to handle synchronous
      exceptions.
      
      Further the function force_sig_info changes the signal state if the
      signal is ignored, or blocked or if SIGNAL_UNKILLABLE will prevent the
      delivery of the signal.  The signal SIGKILL can not be ignored and can
      not be blocked and SIGNAL_UNKILLABLE won't prevent it from being
      delivered.
      
      So using force_sig rather than send_sig for SIGKILL is confusing
      and pointless.
      
      Because it won't impact the sending of the signal, and because
      using force_sig is wrong, replace force_sig with send_sig.
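
      A sketch of the resulting call in reboot_pid_ns() (illustrative):

      	read_lock(&tasklist_lock);
      	send_sig(SIGKILL, pid_ns->child_reaper, 1);
      	read_unlock(&tasklist_lock);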
      
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Fixes: cf3f8921 ("pidns: add reboot_pid_ns() to handle the reboot syscall")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  14. 21 July 2019 (1 commit)
    • genirq: Add optional hardware synchronization for shutdown · 6074f604
      Thomas Gleixner authored
      commit 62e0468650c30f0298822c580f382b16328119f6 upstream
      
      free_irq() ensures that no hardware interrupt handler is executing on a
      different CPU before actually releasing resources and deactivating the
      interrupt completely in a domain hierarchy.
      
      But that does not catch the case where the interrupt is in flight at the
      hardware level but not yet serviced by the target CPU. That creates an
      interesting race condition:
      
         CPU 0                  CPU 1               IRQ CHIP
      
                                                    interrupt is raised
                                                    sent to CPU1
      			  Unable to handle
      			  immediately
      			  (interrupts off,
      			   deep idle delay)
         mask()
         ...
         free()
           shutdown()
           synchronize_irq()
           release_resources()
                                do_IRQ()
                                  -> resources are not available
      
      That might be harmless and just trigger a spurious interrupt warning, but
      some interrupt chips might get into a wedged state.
      
      Utilize the existing irq_get_irqchip_state() callback for the
      synchronization in free_irq().
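
      A sketch of the chip-level check this adds (illustrative; roughly the shape
      of __synchronize_hardirq() after the change):

      	static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
      	{
      		struct irq_data *irqd = irq_desc_get_irq_data(desc);
      		bool inprogress;

      		do {
      			unsigned long flags;

      			/* Wait until any handler running on another CPU is done. */
      			while (irqd_irq_inprogress(&desc->irq_data))
      				cpu_relax();

      			/* Double-check under the descriptor lock. */
      			raw_spin_lock_irqsave(&desc->lock, flags);
      			inprogress = irqd_irq_inprogress(&desc->irq_data);

      			/*
      			 * If requested and supported, also check whether the
      			 * interrupt is still active at the chip level.
      			 */
      			if (!inprogress && sync_chip)
      				__irq_get_irqchip_state(irqd, IRQCHIP_STATE_ACTIVE,
      							&inprogress);
      			raw_spin_unlock_irqrestore(&desc->lock, flags);
      		} while (inprogress);
      	}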
      
      synchronize_hardirq() is not using this mechanism as it might actually
      deadlock under certain conditions, e.g. when called with interrupts
      disabled and the target CPU is the one on which the synchronization is
      invoked. synchronize_irq() uses it because that function cannot be called
      from non-preemptible contexts as it might sleep.
      
      No functional change intended and according to Marc the existing GIC
      implementations where the driver supports the callback should be able
      to cope with that core change. Famous last words.
      
      Fixes: 464d1230 ("x86/vector: Switch IOAPIC to global reservation mode")
      Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>