1. 24 Mar, 2019 40 commits
    • rcu: Do RCU GP kthread self-wakeup from softirq and interrupt · e97a32a5
      Zhang, Jun committed
      commit 1d1f898df6586c5ea9aeaf349f13089c6fa37903 upstream.
      
      The rcu_gp_kthread_wake() function is invoked when it might be necessary
      to wake the RCU grace-period kthread.  Because self-wakeups are normally
      a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
      this kthread, it naturally refuses to do the wakeup.
      
      Unfortunately, natural though it might be, this heuristic fails when
      rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
      that interrupted the grace-period kthread just after the final check of
      the wait-event condition but just before the schedule() call.  In this
      case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
      is within the RCU grace-period kthread's context.  Failing to provide
      this wakeup can result in grace periods failing to start, which in turn
      results in out-of-memory conditions.
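      
      The difference between the old and the new check can be modelled in user
      space. This is only a sketch: struct wake_ctx and both function names are
      hypothetical, the three booleans stand in for "current is the
      grace-period kthread", in_interrupt() and in_serving_softirq(), and any
      other conditions the real rcu_gp_kthread_wake() tests are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the caller's context at the time of the call. */
struct wake_ctx {
	bool is_gp_kthread;      /* "current" is the grace-period kthread */
	bool in_interrupt;       /* stands in for in_interrupt() */
	bool in_serving_softirq; /* stands in for in_serving_softirq() */
};

/* Old heuristic: never wake when called in the kthread's own context. */
static bool old_skips_wakeup(const struct wake_ctx *c)
{
	assert(c != NULL);
	return c->is_gp_kthread;
}

/*
 * New check: skip the self-wakeup only when the call comes from the
 * kthread's own task-level code, not from a handler that interrupted it.
 */
static bool new_skips_wakeup(const struct wake_ctx *c)
{
	assert(c != NULL);
	return c->is_gp_kthread && !c->in_interrupt && !c->in_serving_softirq;
}
```

      In this model, a call made from an interrupt handler that interrupted the
      kthread is skipped by the old check but not by the new one.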
      
      This race window is quite narrow, but it actually did happen during real
      testing.  It would of course need to be fixed even if it was strictly
      theoretical in nature.
      
      This patch does not Cc stable because it does not apply cleanly to
      earlier kernel versions.
      
      Fixes: 48a7639c ("rcu: Make callers awaken grace-period kthread")
      Reported-by: "He, Bo" <bo.he@intel.com>
      Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
      Co-developed-by: "He, Bo" <bo.he@intel.com>
      Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
      Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
      Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
      Signed-off: "He, Bo" <bo.he@intel.com>
      Signed-off: "xiao, jin" <jin.xiao@intel.com>
      Signed-off: Bai, Jie A <jie.a.bai@intel.com>
      Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
      [ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
        !in_serving_softirq()" to avoid redundant wakeups and to also handle the
        interrupt-handler scenario as well as the softirq-handler scenario that
        actually occurred in testing. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tpm: Unify the send callback behaviour · bce45a54
      Jarkko Sakkinen committed
      commit f5595f5baa30e009bf54d0d7653a9a0cc465be60 upstream.
      
      The send() callback should never return the length in the success case.
      Drivers are currently inconsistent about this: tpm_crb behaves differently
      from every other driver. The reason is that the main transmit
      functionality only cares whether the transmit was successful and ignores
      the count completely.
      
      Suggested-by: Stefan Berger <stefanb@linux.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
      Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tpm/tpm_crb: Avoid unaligned reads in crb_recv() · af0c1bd0
      Jarkko Sakkinen committed
      commit 3d7a850fdc1a2e4d2adbc95cc0fc962974725e88 upstream.
      
      The current approach, which reads the first 6 bytes of the response and
      then the tail of the response, can cause the second memcpy_fromio() to do
      an unaligned read (e.g. reading a 32-bit word from an address that is only
      16-bit aligned), depending on how memcpy_fromio() is implemented. If this
      happens, the read fails and the memory controller fills the result with
      1's.
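      
      One way to keep the second read aligned is to make the first read a full
      8 bytes, so that the tail starts at offset 8 instead of offset 6. The
      sketch below models this in user space with plain memcpy(); it is not
      necessarily what the patch does. The function names, the 10-byte minimum
      header size and the big-endian length field at offset 2 are assumptions
      made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 10u /* assumed: 2-byte tag, 4-byte length, 4-byte code */

/* Big-endian 32-bit load that does not depend on pointer alignment. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Copy a response from `rsp` (standing in for the I/O buffer) into `buf`.
 * Returns the response length, or -1 if the response is malformed or does
 * not fit into `count` bytes.
 */
static long copy_response(uint8_t *buf, size_t count, const uint8_t *rsp)
{
	uint32_t expected;

	assert(buf != NULL && rsp != NULL);
	if (count < 8)
		return -1;

	/* First read: 8 bytes, so the next read starts at an 8-byte offset. */
	memcpy(buf, rsp, 8);

	/* The length field is assumed to sit at byte offset 2. */
	expected = load_be32(&buf[2]);
	if (expected > count || expected < HDR_SIZE)
		return -1;

	/* Second read starts at offset 8 rather than offset 6. */
	memcpy(&buf[8], &rsp[8], expected - 8);
	return (long)expected;
}
```

      The length is parsed from the bytes already copied, so the I/O buffer is
      touched exactly twice, both times at an 8-byte-aligned offset.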
      
      This was triggered by 170d13ca3a2f, which should probably be refined to
      check and react to the address alignment. Before that commit, on x86
      memcpy_fromio() turned out to be memcpy(). By luck, GCC has done the right
      thing (from tpm_crb's perspective) for us so far, but we should not rely on
      that. Thus, it makes sense to fix this in tpm_crb as well, not least because
      the fix can then be backported to stable kernels and make them more robust
      when compiled in differing environments.
      
      Cc: stable@vger.kernel.org
      Cc: James Morris <jmorris@namei.org>
      Cc: Tomas Winkler <tomas.winkler@intel.com>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Fixes: 30fc8d13 ("tpm: TPM 2.0 CRB Interface")
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Acked-by: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md: Fix failed allocation of md_register_thread · cc3b79d4
      Aditya Pakki committed
      commit e406f12dde1a8375d77ea02d91f313fb1a9c6aec upstream.
      
      mddev->sync_thread can be set to NULL on kzalloc failure downstream.
      The patch checks for such a scenario and frees allocated resources.
      
      Committer note:
      
      Added similar fix to raid5.c, as suggested by Guoqing.
      
      Cc: stable@vger.kernel.org # v3.16+
      Acked-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Aditya Pakki <pakki001@umn.edu>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf intel-pt: Fix divide by zero when TSC is not available · 01088750
      Adrian Hunter committed
      commit 076333870c2f5bdd9b6d31e7ca1909cf0c84cbfa upstream.
      
      When TSC is not available, "timeless" decoding is used but a divide by
      zero occurs if perf_time_to_tsc() is called.
      
      Ensure the divisor is not zero.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel/uncore: Fix client IMC events return huge result · 30cedf18
      Kan Liang committed
      commit 8041ffd36f42d8521d66dd1e236feb58cecd68bc upstream.
      
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_read and 0x2 for data write).
      The counter bit width is calculated by common code, which assumes that
      the standard encoding format is used for the freerunning counters.
      As a result, the wrong bit width is calculated.
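      
      Why a wrong bit width inflates the result can be seen in a small model of
      a width-aware counter delta. This is a sketch only: the function name and
      the widths 32 and 64 used in the note below are illustrative assumptions,
      not values taken from the uncore code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two raw reads of a free-running counter that is `width`
 * bits wide. Both values are shifted up so that bits above `width` fall
 * off, subtracted modulo 2^64, and shifted back down.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int width)
{
	unsigned int shift;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;
	return ((now << shift) - (prev << shift)) >> shift;
}
```

      For a counter that wrapped from 0xFFFFFFF0 to 0x10, a width of 32 yields
      the small true delta 0x20, while a width of 64 yields a value above 2^63.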
      
      The patch intends to convert the old client IMC event encoding to the
      standard encoding format.
      
      Current common code uses event->attr.config, which is copied directly
      from user space. We should not implicitly modify it for a converted
      event. Instead, event->hw.config replaces event->attr.config in
      common code.
      
      For client IMC events, the event->attr.config is used to calculate a
      converted event with standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      For other events of freerunning counters, they already use the standard
      encoding format. The same value as event->attr.config is assigned to
      event->hw.config in common event_init().
      
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf intel-pt: Fix overlap calculation for padding · a46a8cdf
      Adrian Hunter committed
      commit 5a99d99e3310a565b0cf63f785b347be9ee0da45 upstream.
      
      Auxtrace records might have up to 7 bytes of padding appended. Adjust
      the overlap accordingly.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf auxtrace: Define auxtrace record alignment · fa592fc0
      Adrian Hunter committed
      commit c3fcadf0bb765faf45d6d562246e1d08885466df upstream.
      
      Define auxtrace record alignment so that it can be referenced elsewhere.
      
      Note this is preparation for patch "perf intel-pt: Fix overlap calculation
      for padding"
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols · d8f691f2
      Adrian Hunter committed
      commit d6d457451eb94fa747dc202765592eb8885a7352 upstream.
      
      Kallsyms symbols do not have a size, so the size becomes the distance to
      the next symbol.
      
      Consequently the recently added trampoline symbols end up with large
      sizes because the trampolines are some distance from one another and the
      main kernel map.
      
      However, symbols that end outside their map can disrupt the symbol tree
      because, after mapping, it can appear incorrectly that they overlap
      other symbols.
      
      Add logic to truncate symbol size to the end of the corresponding map.
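      
      The truncation step can be sketched as a simple clamp. The struct and
      function names here are hypothetical and the sketch assumes that the
      symbol starts inside its map; it is not the perf implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symbol covering the half-open address range [start, end). */
struct sym_range {
	uint64_t start;
	uint64_t end;
};

/* Clamp a symbol so that it does not extend past the end of its map. */
static void truncate_to_map(struct sym_range *sym, uint64_t map_end)
{
	/* Assumption: the symbol starts inside the map. */
	assert(sym->start <= sym->end && sym->start < map_end);
	if (sym->end > map_end)
		sym->end = map_end;
}
```

      A trampoline symbol whose inferred size reaches far beyond its map is cut
      back to the map end, while a symbol that already fits is left unchanged.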
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: d83212d5 ("kallsyms, x86: Export addresses of PTI entry trampolines")
      Link: http://lkml.kernel.org/r/20190109091835.5570-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf intel-pt: Fix CYC timestamp calculation after OVF · e25353a0
      Adrian Hunter committed
      commit 03997612904866abe7cdcc992784ef65cb3a4b81 upstream.
      
      CYC packet timestamp calculation depends upon CBR which was being
      cleared upon overflow (OVF). That can cause errors due to failing to
      synchronize with sideband events. Even if a CBR change has been lost,
      the old CBR is still a better estimate than zero. So remove the clearing
      of CBR.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/unwind/orc: Fix ORC unwind table alignment · 3e5a054b
      Josh Poimboeuf committed
      commit f76a16adc485699f95bb71fce114f97c832fe664 upstream.
      
      The .orc_unwind section is a packed array of 6-byte structs.  It's
      currently aligned to 6 bytes, which is causing warnings in the LLD
      linker.
      
      Six isn't a power of two, so it's not a valid alignment value.  The
      actual alignment doesn't matter much because it's an array of packed
      structs.  An alignment of two is sufficient.  In reality it always gets
      aligned to four bytes because it comes immediately after the
      4-byte-aligned .orc_unwind_ip section.
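      
      The power-of-two requirement is easy to check, and it is also what makes
      the usual mask-based rounding work. A minimal sketch with hypothetical
      helper names (this is not linker or kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when n has exactly one bit set. */
static bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Round `offset` up to a multiple of `align`; the mask trick needs 2^k. */
static size_t align_up(size_t offset, size_t align)
{
	assert(is_power_of_two(align));
	return (offset + align - 1) & ~(align - 1);
}
```

      is_power_of_two(6) is false, so 6 is rejected as an alignment, while 2
      and 4 are accepted.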
      
      Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
      Reported-by: Nick Desaulniers <ndesaulniers@google.com>
      Reported-by: Dmitry Golovin <dima@golovin.in>
      Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://github.com/ClangBuiltLinux/linux/issues/218
      Link: https://lkml.kernel.org/r/d55027ee95fe73e952dcd8be90aebd31b0095c45.1551892041.git.jpoimboe@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vt: perform safe console erase in the right order · b05581b8
      Nicolas Pitre committed
      commit a6dbe442755999960ca54a9b8ecfd9606be0ea75 upstream.
      
      Commit 4b4ecd9c ("vt: Perform safe console erase only once") removed
      what appeared to be an extra call to scr_memsetw(). This missed the fact
      that set_origin() must be called before clearing the screen otherwise
      old screen content gets restored on the screen when using vgacon. Let's
      fix that by moving all the scrollback handling to flush_scrollback()
      where it logically belongs, and invoking it before the actual screen
      clearing in csi_J(), making the code simpler in the end.
      
      Reported-by: Matthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Tested-by: Matthew Whitehead <tedheadster@gmail.com>
      Fixes: 4b4ecd9c ("vt: Perform safe console erase only once")
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • stable-kernel-rules.rst: add link to networking patch queue · 2ca85aac
      Greg Kroah-Hartman committed
      commit a41e8f25fa8f8f67360d88eb0eebbabe95a64bdf upstream.
      
      The networking maintainer keeps a public list of the patches being
      queued up for the next round of stable releases.  Be sure to check there
      before asking for a patch to be applied so that you do not waste
      people's time.
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bcache: never writeback a discard operation · 622afe5c
      Daniel Axtens committed
      commit 9951379b0ca88c95876ad9778b9099e19a95d566 upstream.
      
      Some users see panics like the following when performing fstrim on a
      bcached volume:
      
      [  529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  530.183928] #PF error: [normal kernel read fault]
      [  530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0
      [  530.750887] Oops: 0000 [#1] SMP PTI
      [  530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3
      [  531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
      [  531.693137] RIP: 0010:blk_queue_split+0x148/0x620
      [  531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48 8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3
      [  532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246
      [  533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000
      [  533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000
      [  533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000
      [  534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000
      [  534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020
      [  534.833319] FS:  00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000
      [  535.224098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0
      [  535.851759] Call Trace:
      [  535.970308]  ? mempool_alloc_slab+0x15/0x20
      [  536.174152]  ? bch_data_insert+0x42/0xd0 [bcache]
      [  536.403399]  blk_mq_make_request+0x97/0x4f0
      [  536.607036]  generic_make_request+0x1e2/0x410
      [  536.819164]  submit_bio+0x73/0x150
      [  536.980168]  ? submit_bio+0x73/0x150
      [  537.149731]  ? bio_associate_blkg_from_css+0x3b/0x60
      [  537.391595]  ? _cond_resched+0x1a/0x50
      [  537.573774]  submit_bio_wait+0x59/0x90
      [  537.756105]  blkdev_issue_discard+0x80/0xd0
      [  537.959590]  ext4_trim_fs+0x4a9/0x9e0
      [  538.137636]  ? ext4_trim_fs+0x4a9/0x9e0
      [  538.324087]  ext4_ioctl+0xea4/0x1530
      [  538.497712]  ? _copy_to_user+0x2a/0x40
      [  538.679632]  do_vfs_ioctl+0xa6/0x600
      [  538.853127]  ? __do_sys_newfstat+0x44/0x70
      [  539.051951]  ksys_ioctl+0x6d/0x80
      [  539.212785]  __x64_sys_ioctl+0x1a/0x20
      [  539.394918]  do_syscall_64+0x5a/0x110
      [  539.568674]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      We have observed it where both:
      1) LVM/devmapper is involved (bcache backing device is LVM volume) and
      2) writeback cache is involved (bcache cache_mode is writeback)
      
      On one machine, we can reliably reproduce it with:
      
       # echo writeback > /sys/block/bcache0/bcache/cache_mode
         (not sure whether above line is required)
       # mount /dev/bcache0 /test
       # for i in {0..10}; do
             file="$(mktemp /test/zero.XXX)"
             dd if=/dev/zero of="$file" bs=1M count=256
             sync
             rm "$file"
         done
       # fstrim -v /test
      
      Observing this with tracepoints on, we see the following writes:
      
      fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4260112 + 196352 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4456464 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4718608 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5324816 + 180224 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5505040 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5767184 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 6373392 + 180224 hit 1 bypass 0
      <crash>
      
      Note the final one has different hit/bypass flags.
      
      This is because in should_writeback(), we were hitting a case where
      the partial stripe condition was returning true and so
      should_writeback() was returning true early.
      
      If that hadn't been the case, it would have hit the would_skip test, and
      as would_skip == s->iop.bypass == true, should_writeback() would have
      returned false.
      
      Looking at the git history from 'commit 72c27061 ("bcache: Write out
      full stripes")', it looks like the idea was to optimise for raid5/6:
      
             * If a stripe is already dirty, force writes to that stripe to
      	 writeback mode - to help build up full stripes of dirty data
      
      To fix this issue, make sure that should_writeback() on a discard op
      never returns true.
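      
      The effect of the check order can be shown with a reduced model of the
      decision. Everything here is a simplification under stated assumptions:
      the type and function names are hypothetical, only the two conditions
      discussed above are modelled, and the remaining checks of the real
      should_writeback() are assumed to pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum req_op { OP_WRITE, OP_DISCARD };

/* Hypothetical reduced view of a request. */
struct req {
	enum req_op op;
	bool partial_stripe_dirty; /* the partial stripe condition holds */
	bool would_skip;           /* mirrors s->iop.bypass */
};

/* Before the fix: the partial stripe condition wins over would_skip. */
static bool should_writeback_old(const struct req *r)
{
	assert(r != NULL);
	if (r->partial_stripe_dirty)
		return true;
	if (r->would_skip)
		return false;
	return true; /* remaining checks omitted, assumed to pass */
}

/* After the fix: a discard is rejected before any other condition. */
static bool should_writeback_fixed(const struct req *r)
{
	assert(r != NULL);
	if (r->op == OP_DISCARD)
		return false;
	return should_writeback_old(r);
}
```

      A discard that hits a dirty partial stripe is sent to writeback by the
      old order and rejected by the fixed one; ordinary writes are unaffected.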
      
      More details of debugging:
      https://www.spinics.net/lists/linux-bcache/msg06996.html
      
      Previous reports:
       - https://bugzilla.kernel.org/show_bug.cgi?id=201051
       - https://bugzilla.kernel.org/show_bug.cgi?id=196103
       - https://www.spinics.net/lists/linux-bcache/msg06885.html
      
      (Coly Li: minor modification to follow maximum 75 chars per line rule)
      
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 72c27061 ("bcache: Write out full stripes")
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / wakeup: Rework wakeup source timer cancellation · cd738246
      Viresh Kumar committed
      commit 1fad17fb1bbcd73159c2b992668a6957ecc5af8a upstream.
      
      If wakeup_source_add() is called right after wakeup_source_remove()
      for the same wakeup source, timer_setup() may be called for a
      potentially scheduled timer which is incorrect.
      
      To avoid that, move the wakeup source timer cancellation from
      wakeup_source_drop() to wakeup_source_remove().
      
      Moreover, make wakeup_source_remove() clear the timer function after
      canceling the timer to let wakeup_source_not_registered() treat
      unregistered wakeup sources in the same way as the ones that have
      never been registered.
      
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
      [ rjw: Subject, changelog, merged two patches together ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • svcrpc: fix UDP on servers with lots of threads · 43bceddc
      J. Bruce Fields committed
      commit b7e5034cbecf5a65b7bfdc2b20a8378039577706 upstream.
      
      James Pearson found that an NFS server stopped responding to UDP
      requests if started with more than 1017 threads.
      
      sv_max_mesg is about 2^20, so that is probably where the calculation
      performed by
      
          svc_sock_setbufsize(svsk->sk_sock,
                              (serv->sv_nrthreads+3) * serv->sv_max_mesg,
                              (serv->sv_nrthreads+3) * serv->sv_max_mesg);
      
      starts to overflow an int.
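      
      The arithmetic can be checked with 64-bit math. The sketch below rests on
      two assumptions that are not stated above: that sv_max_mesg is 1 MiB plus
      one 4 KiB page (the text only says "about 2^20"), and that the requested
      size is doubled before it is stored in an int. All names are
      hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed illustrative value for sv_max_mesg: 1 MiB plus one 4 KiB page. */
#define MAX_MESG (1048576u + 4096u)

/* The requested buffer size, computed without overflow. */
static uint64_t bufsize_wanted(unsigned int nrthreads, unsigned int max_mesg)
{
	assert(max_mesg > 0);
	return ((uint64_t)nrthreads + 3) * max_mesg;
}

/* Assumption: twice the requested size ends up in an int field. */
static bool fits_in_int(uint64_t requested)
{
	return requested * 2 <= (uint64_t)INT_MAX;
}
```

      Under these assumptions 1017 threads still fit and 1018 do not, which is
      consistent with the reported threshold.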
      
      Reported-by: James Pearson <jcpearson@gmail.com>
      Tested-by: James Pearson <jcpearson@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4.1: Reinitialise sequence results before retransmitting a request · 4af185fe
      Trond Myklebust committed
      commit c1dffe0bf7f9c3d57d9f237a7cb2a81e62babd2b upstream.
      
      If we have to retransmit a request, we should ensure that we reinitialise
      the sequence results structure, since in the event of a signal
      we need to treat the request as if it had not been sent.
      
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: fix wrong check in write_v4_end_grace() · ecab6ab1
      Yihao Wu committed
      commit dd838821f0a29781b185cd8fb8e48d5c177bd838 upstream.
      
      Commit 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before
      nfsd startup" is trying to fix a NULL dereference issue, but it
      mistakenly checks if the nfsd server is started. So fix it.
      
      Fixes: 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before nfsd startup"
      Cc: stable@vger.kernel.org
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: fix memory corruption caused by readdir · 8056912c
      NeilBrown committed
      commit b602345da6cbb135ba68cf042df8ec9a73da7981 upstream.
      
      If the result of an NFSv3 readdir{,plus} request results in the
      "offset" on one entry having to be split across 2 pages, and is sized
      so that the next directory entry doesn't fit in the requested size,
      then memory corruption can happen.
      
      When encode_entry() is called after encoding the last entry that fits,
      it notices that ->offset and ->offset1 are set, and so stores the
      offset value in the two pages as required.  It clears ->offset1 but
      *does not* clear ->offset.
      
      Normally this omission doesn't matter as encode_entry_baggage() will
      be called, and will set ->offset to a suitable value (not on a page
      boundary).
      But in the case where cd->buflen < elen and nfserr_toosmall is
      returned, ->offset is not reset.
      
      This means that nfsd3proc_readdirplus will see ->offset with a value 4
      bytes before the end of a page, and ->offset1 set to NULL.
      It will try to write 8 bytes to ->offset.
      If we are lucky, the next page will be read-only, and the system will
        BUG: unable to handle kernel paging request at...
      
      If we are unlucky, some innocent page will have the first 4 bytes
      corrupted.
      
      nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
      writes 8 bytes to the offset wherever it is.
      
      Fix this by clearing ->offset after it is used, and copying the
      ->offset handling code from nfsd3_proc_readdirplus into
      nfsd3_proc_readdir.
      
      (Note that the commit hash in the Fixes tag is from the 'history'
       tree - this bug predates git).
      
      Fixes: 0b1d57cf7654 ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
      Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
      Cc: stable@vger.kernel.org (v2.6.12+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: fix performance-limiting session calculation · 10a68cdf
      J. Bruce Fields committed
      commit c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 upstream.
      
      We're unintentionally limiting the number of slots per nfsv4.1 session
      to 10.  Often more than 10 simultaneous RPCs are needed for the best
      performance.
      
      This calculation was meant to prevent any one client from using up more
      than a third of the limit we set for total memory use across all clients
      and sessions.  Instead, it's limiting the client to a third of the
      maximum for a single session.
      
      Fix this.
      
      Reported-by: Chris Tracy <ctracy@engr.scu.edu>
      Cc: stable@vger.kernel.org
      Fixes: de766e57 "nfsd: give out fewer session slots as limit approaches"
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFS: Don't recoalesce on error in nfs_pageio_complete_mirror() · 2c648caf
      Trond Myklebust committed
      commit 8127d82705998568b52ac724e28e00941538083d upstream.
      
      If the I/O completion failed with a fatal error, then we should just
      exit nfs_pageio_complete_mirror() rather than try to recoalesce.
      
      Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFS: Fix an I/O request leakage in nfs_do_recoalesce · 63b0ee12
      Trond Myklebust committed
      commit 4d91969ed4dbcefd0e78f77494f0cb8fada9048a upstream.
      
      Whether we need to exit early, or just reprocess the list, we
      must not lose track of the request which failed to get recoalesced.
      
      Fixes: 03d5eb65 ("NFS: Fix a memory leak in nfs_do_recoalesce")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFS: Fix I/O request leakages · be74fddc
      Trond Myklebust committed
      commit f57dcf4c72113c745d83f1c65f7291299f65c14f upstream.
      
      When we fail to add the request to the I/O queue, we currently leave it
      to the caller to free the failed request. However since some of the
      requests that fail are actually created by nfs_pageio_add_request()
      itself, and are not passed back to the caller, this leads to a leakage
      issue, which can again cause page locks to leak.
      
      This commit addresses the leakage by freeing the created requests on
      error, using desc->pg_completion_ops->error_cleanup()
      
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer")
      Cc: stable@vger.kernel.org # v4.0: c18b96a1: nfs: clean up rest of reqs
      Cc: stable@vger.kernel.org # v4.0: d600ad1f: NFS41: pop some layoutget
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cpcap-charger: generate events for userspace · 4ea4f347
      Pavel Machek committed
      commit fd10606f93a149a9f3d37574e5385b083b4a7b32 upstream.
      
      The driver doesn't generate uevents on charger connect/disconnect.
      This leads to UPower not detecting when AC is on or off... and that is
      bad.
      
      Reported by Arthur D. on github (
      https://github.com/maemo-leste/bugtracker/issues/206 ), thanks to
      Merlijn Wajer for suggesting a fix.
      
      Cc: stable@kernel.org
      Signed-off-by: NPavel Machek <pavel@ucw.cz>
      Acked-by: NTony Lindgren <tony@atomide.com>
      Signed-off-by: NSebastian Reichel <sebastian.reichel@collabora.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ea4f347
    • G
      mfd: sm501: Fix potential NULL pointer dereference · ce02d82c
      Gustavo A. R. Silva authored
      commit ae7b8eda27b33b1f688dfdebe4d46f690a8f9162 upstream.
      
      There is a potential NULL pointer dereference in case devm_kzalloc()
      fails and returns NULL.
      
      Fix this by adding a NULL check on *lookup*.
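
      The pattern can be sketched in user-space C. This is only an
      illustration, not the sm501 driver code: register_lookup(), the
      injectable allocator, and struct lookup_table are hypothetical
      stand-ins for the devm_kzalloc() call and the lookup table.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the table the driver allocates. */
struct lookup_table {
    const char *dev_id;
};

/* Allocator is injectable so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t nmemb, size_t size);

/* Always fails, like an allocation under memory pressure. */
void *failing_alloc(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

int register_lookup(alloc_fn alloc, struct lookup_table **out)
{
    struct lookup_table *lookup = alloc(1, sizeof(*lookup));

    /* Without this check, the store below dereferences NULL. */
    if (!lookup)
        return -ENOMEM;

    lookup->dev_id = "example-device"; /* illustrative value */
    *out = lookup;
    return 0;
}
```

      Calling register_lookup(failing_alloc, &t) returns -ENOMEM and
      leaves t untouched instead of crashing.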
      
      This bug was detected with the help of Coccinelle.
      
      Fixes: b2e63555 ("i2c: gpio: Convert to use descriptors")
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce02d82c
    • M
      dm integrity: limit the rate of error messages · 5579d97e
      Mikulas Patocka authored
      commit 225557446856448039a9e495da37b72c20071ef2 upstream.
      
      When using dm-integrity underneath md-raid, some tests with raid
      auto-correction trigger large numbers of integrity failures, and each
      failure prints an error message. These messages can bring the
      system to a halt if it is using a serial console.

      Fix this by limiting the rate of error messages. This improves the speed
      of raid recovery and avoids the hang.
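
      As a rough illustration of what rate limiting means here, the
      sketch below allows at most `burst` messages per `interval`. It is
      not the kernel's implementation and not the code in this patch;
      struct ratelimit and ratelimit_allow() are hypothetical names, and
      time is passed in explicitly as abstract ticks.

```c
#include <stdbool.h>

struct ratelimit {
    long interval;     /* window length, in caller-defined ticks */
    int burst;         /* messages allowed per window */
    long window_start; /* tick at which the current window began */
    int printed;       /* messages already allowed in this window */
};

/* Returns true if a message may be printed at time `now`. */
bool ratelimit_allow(struct ratelimit *rl, long now)
{
    /* Start a fresh window once the old one has expired. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false; /* suppress: budget for this window is spent */
    rl->printed++;
    return true;
}
```

      With interval 5 and burst 2, calls at ticks 0 and 1 are allowed, a
      call at tick 2 is suppressed, and a call at tick 5 is allowed again.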
      
      Fixes: 7eada909 ("dm: add integrity target")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5579d97e
    • N
      dm: fix to_sector() for 32bit · 7668d6e4
      NeilBrown authored
      commit 0bdb50c531f7377a9da80d3ce2d61f389c84cb30 upstream.
      
      A dm-raid array with devices larger than 4GB won't assemble on
      a 32-bit host since _check_data_dev_sectors() was added in 4.16.
      This is because to_sector() treats its argument as an "unsigned long",
      which is 32 bits wide (4GB at most) on a 32-bit host.  Using
      "unsigned long long" is more correct.
      
      Kernels as early as 4.2 can have other problems due to to_sector()
      being used on the size of a device.
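
      The truncation can be reproduced on any host by modelling the
      32-bit "unsigned long" with uint32_t. This is a sketch under that
      assumption; to_sector_32() and to_sector_64() are hypothetical
      names, and SECTOR_SHIFT is taken to be 9 (512-byte sectors).

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Models to_sector() with a 32-bit "unsigned long" argument. */
uint64_t to_sector_32(uint32_t bytes)
{
    return bytes >> SECTOR_SHIFT;
}

/* Models the fixed version taking "unsigned long long". */
uint64_t to_sector_64(unsigned long long bytes)
{
    return bytes >> SECTOR_SHIFT;
}
```

      A 5GB size passed through the 32-bit version loses its upper bits
      before the shift and comes out as the sector count of a 1GB device,
      while the 64-bit version returns the full count.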
      
      Fixes: 0cf45031 ("dm raid: add support for the MD RAID0 personality")
      cc: stable@vger.kernel.org (v4.2+)
      Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
      Signed-off-by: NeilBrown <neil@brown.name>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7668d6e4
    • Y
      ipmi_si: fix use-after-free of resource->name · a441fdaf
      Yang Yingliang authored
      commit 401e7e88d4ef80188ffa07095ac00456f901b8c4 upstream.
      
      When we execute the following commands, we get an oops:
      rmmod ipmi_si
      cat /proc/ioports
      
      [ 1623.482380] Unable to handle kernel paging request at virtual address ffff00000901d478
      [ 1623.482382] Mem abort info:
      [ 1623.482383]   ESR = 0x96000007
      [ 1623.482385]   Exception class = DABT (current EL), IL = 32 bits
      [ 1623.482386]   SET = 0, FnV = 0
      [ 1623.482387]   EA = 0, S1PTW = 0
      [ 1623.482388] Data abort info:
      [ 1623.482389]   ISV = 0, ISS = 0x00000007
      [ 1623.482390]   CM = 0, WnR = 0
      [ 1623.482393] swapper pgtable: 4k pages, 48-bit VAs, pgdp = 00000000d7d94a66
      [ 1623.482395] [ffff00000901d478] pgd=000000dffbfff003, pud=000000dffbffe003, pmd=0000003f5d06e003, pte=0000000000000000
      [ 1623.482399] Internal error: Oops: 96000007 [#1] SMP
      [ 1623.487407] Modules linked in: ipmi_si(E) nls_utf8 isofs rpcrdma ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm dm_mirror dm_region_hash dm_log iw_cm dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher ses ghash_ce sha2_ce enclosure sha256_arm64 sg sha1_ce hisi_sas_v2_hw hibmc_drm sbsa_gwdt hisi_sas_main ip_tables mlx5_ib ib_uverbs marvell ib_core mlx5_core ixgbe mdio hns_dsaf ipmi_devintf hns_enet_drv ipmi_msghandler hns_mdio [last unloaded: ipmi_si]
      [ 1623.532410] CPU: 30 PID: 11438 Comm: cat Kdump: loaded Tainted: G            E     5.0.0-rc3+ #168
      [ 1623.541498] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 11/21/2017
      [ 1623.548822] pstate: a0000005 (NzCv daif -PAN -UAO)
      [ 1623.553684] pc : string+0x28/0x98
      [ 1623.557040] lr : vsnprintf+0x368/0x5e8
      [ 1623.560837] sp : ffff000013213a80
      [ 1623.564191] x29: ffff000013213a80 x28: ffff00001138abb5
      [ 1623.569577] x27: ffff000013213c18 x26: ffff805f67d06049
      [ 1623.574963] x25: 0000000000000000 x24: ffff00001138abb5
      [ 1623.580349] x23: 0000000000000fb7 x22: ffff0000117ed000
      [ 1623.585734] x21: ffff000011188fd8 x20: ffff805f67d07000
      [ 1623.591119] x19: ffff805f67d06061 x18: ffffffffffffffff
      [ 1623.596505] x17: 0000000000000200 x16: 0000000000000000
      [ 1623.601890] x15: ffff0000117ed748 x14: ffff805f67d07000
      [ 1623.607276] x13: ffff805f67d0605e x12: 0000000000000000
      [ 1623.612661] x11: 0000000000000000 x10: 0000000000000000
      [ 1623.618046] x9 : 0000000000000000 x8 : 000000000000000f
      [ 1623.623432] x7 : ffff805f67d06061 x6 : fffffffffffffffe
      [ 1623.628817] x5 : 0000000000000012 x4 : ffff00000901d478
      [ 1623.634203] x3 : ffff0a00ffffff04 x2 : ffff805f67d07000
      [ 1623.639588] x1 : ffff805f67d07000 x0 : ffffffffffffffff
      [ 1623.644974] Process cat (pid: 11438, stack limit = 0x000000008d4cbc10)
      [ 1623.651592] Call trace:
      [ 1623.654068]  string+0x28/0x98
      [ 1623.657071]  vsnprintf+0x368/0x5e8
      [ 1623.660517]  seq_vprintf+0x70/0x98
      [ 1623.668009]  seq_printf+0x7c/0xa0
      [ 1623.675530]  r_show+0xc8/0xf8
      [ 1623.682558]  seq_read+0x330/0x440
      [ 1623.689877]  proc_reg_read+0x78/0xd0
      [ 1623.697346]  __vfs_read+0x60/0x1a0
      [ 1623.704564]  vfs_read+0x94/0x150
      [ 1623.711339]  ksys_read+0x6c/0xd8
      [ 1623.717939]  __arm64_sys_read+0x24/0x30
      [ 1623.725077]  el0_svc_common+0x120/0x148
      [ 1623.732035]  el0_svc_handler+0x30/0x40
      [ 1623.738757]  el0_svc+0x8/0xc
      [ 1623.744520] Code: d1000406 aa0103e2 54000149 b4000080 (39400085)
      [ 1623.753441] ---[ end trace f91b6a4937de9835 ]---
      [ 1623.760871] Kernel panic - not syncing: Fatal exception
      [ 1623.768935] SMP: stopping secondary CPUs
      [ 1623.775718] Kernel Offset: disabled
      [ 1623.781998] CPU features: 0x002,21006008
      [ 1623.788777] Memory Limit: none
      [ 1623.798329] Starting crashdump kernel...
      [ 1623.805202] Bye!
      
      If io_setup() succeeds in try_smi_init() but try_smi_init() then
      jumps to out_err before calling ipmi_register_smi(),
      ipmi_unregister_smi() will not be called when the module is removed.
      As a result, the resource allocated in io_setup() is never freed,
      while the name (DEVICE_NAME) of the resource is freed when the module
      is removed. This causes a use-after-free on cat /proc/ioports.

      Fix this by calling io_cleanup() when try_smi_init() goes to out_err,
      and don't call io_cleanup() unless io_setup() has returned
      successfully, to avoid warning prints.
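
      The shape of the fix can be sketched as follows. This is a
      user-space model, not the ipmi_si code: struct smi, try_init() and
      the fail_before_register flag are hypothetical, and io_setup() and
      io_cleanup() here are trivial stand-ins that only track a flag.

```c
#include <stdbool.h>

struct smi {
    bool io_claimed; /* set while the I/O resource is held */
    bool registered; /* set once registration succeeded */
};

static int io_setup(struct smi *s)
{
    s->io_claimed = true;
    return 0;
}

static void io_cleanup(struct smi *s)
{
    s->io_claimed = false;
}

/* fail_before_register simulates an error after io_setup() succeeded. */
int try_init(struct smi *s, bool fail_before_register)
{
    int rv = io_setup(s);

    if (rv)
        return rv; /* nothing claimed yet, so nothing to clean up */

    if (fail_before_register) {
        rv = -1;
        goto out_err;
    }

    s->registered = true;
    return 0;

out_err:
    io_cleanup(s); /* release what io_setup() claimed */
    return rv;
}
```

      After a failed try_init(), io_claimed is false again, so no
      resource outlives the code that owns its name.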
      
      Fixes: 93c303d2 ("ipmi_si: Clean up shutdown a bit")
      Cc: stable@vger.kernel.org
      Reported-by: NuoHan Qiao <qiaonuohan@huawei.com>
      Suggested-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a441fdaf
    • D
      arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 · 3cbae9fa
      Dave Martin authored
      commit c88b093693ccbe41991ef2e9b1d251945e6e54ed upstream.
      
      Due to what looks like a typo dating back to the original addition
      of FPEXC32_EL2 handling, KVM currently initialises this register to
      an architecturally invalid value.
      
      As a result, the VECITR field (RES1) in bits [10:8] is initialised
      with 0, and the two reserved (RES0) bits [6:5] are initialised with
      1.  (In the Common VFP Subarchitecture as specified by ARMv7-A,
      these two bits were IMP DEF.  ARMv8-A removes them.)
      
      This patch changes the reset value from 0x70 to 0x700, which
      reflects the architectural constraints and is presumably what was
      originally intended.
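
      The bit arithmetic above can be checked with a small mask helper.
      This is a sketch, not KVM code: field_mask() is a hypothetical
      helper, and the bit ranges are the ones quoted in this message.

```c
#include <stdint.h>

/* Mask covering bits [hi:lo] of a 32-bit register (hi >= lo, hi < 32). */
uint32_t field_mask(unsigned hi, unsigned lo)
{
    return (uint32_t)(((1ULL << (hi - lo + 1)) - 1) << lo);
}
```

      field_mask(10, 8) is 0x700, so the old value 0x70 sets none of
      bits [10:8] and does set bits [6:5], while 0x700 sets all of
      bits [10:8] and leaves bits [6:5] clear.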
      
      Cc: <stable@vger.kernel.org> # 4.12.x-
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Fixes: 62a89c44 ("arm64: KVM: 32bit handling of coprocessor traps")
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cbae9fa
    • W
      arm64: debug: Ensure debug handlers check triggering exception level · c113a7fb
      Will Deacon authored
      commit 6bd288569b50bc89fa5513031086746968f585cb upstream.
      
      Debug exception handlers may be called for exceptions generated both by
      user and kernel code. In many cases, this is checked explicitly, but
      in other cases things either happen to work by happy accident or they
      go slightly wrong. For example, executing 'brk #4' from userspace will
      enter the kprobes code and be ignored, but the instruction will be
      retried forever in userspace instead of delivering a SIGTRAP.
      
      Fix this issue in the most stable-friendly fashion by simply adding
      explicit checks of the triggering exception level to all of our debug
      exception handlers.
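
      A minimal model of such a check is shown below. It is a sketch
      only: struct regs, its from_user flag, enum hook_result and
      kernel_brk_handler() are hypothetical and do not mirror the arm64
      handler signatures.

```c
#include <stdbool.h>

enum hook_result { HOOK_HANDLED, HOOK_ERROR };

/* Hypothetical register snapshot; from_user marks an exception from userspace. */
struct regs {
    bool from_user;
};

/* A handler meant only for kernel-generated debug exceptions. */
enum hook_result kernel_brk_handler(const struct regs *regs)
{
    /* Refuse exceptions from the wrong level so the caller can
     * fall through to its default action, such as raising a signal. */
    if (regs->from_user)
        return HOOK_ERROR;

    return HOOK_HANDLED;
}
```

      The explicit check means a user-generated exception is reported as
      unhandled rather than being silently swallowed by kernel-only code.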
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c113a7fb
    • J
      arm64: Fix HCR.TGE status for NMI contexts · 85c8ea22
      Julien Thierry authored
      commit 5870970b9a828d8693aa6d15742573289d7dbcd0 upstream.
      
      When using VHE, the host needs to clear HCR_EL2.TGE bit in order
      to interact with guest TLBs, switching from EL2&0 translation regime
      to EL1&0.
      
      However, a non-maskable asynchronous event such as SDEI could happen
      while TGE is cleared. Address translation operations that rely on
      the EL2&0 translation regime (TLB invalidation, userspace access, ...)
      could then fail.

      Fix this by properly setting HCR_EL2.TGE when entering NMI context and
      clearing it again if necessary when returning to the interrupted context.
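
      The save, set and conditional restore sequence can be modelled on
      a plain variable. This is a sketch, not the arm64 code: the
      register is a uint64_t passed by pointer, both function names are
      hypothetical, and HCR_TGE is assumed to be bit 27.

```c
#include <stdint.h>

#define HCR_TGE (1ULL << 27) /* assumed bit position for this sketch */

/* On NMI entry: remember the old value, then force TGE on. */
uint64_t nmi_enter_set_tge(uint64_t *hcr)
{
    uint64_t saved = *hcr;

    *hcr |= HCR_TGE;
    return saved;
}

/* On NMI exit: clear TGE only if it was clear before entry. */
void nmi_exit_restore_tge(uint64_t *hcr, uint64_t saved)
{
    if (!(saved & HCR_TGE))
        *hcr &= ~HCR_TGE;
}
```

      If TGE was already set on entry, the exit path leaves it alone, so
      the interrupted context always sees the value it had before.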

      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85c8ea22
    • G
      ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify · 58691e6a
      Gustavo A. R. Silva authored
      commit e2477233145f2156434afb799583bccd878f3e9f upstream.
      
      Fix boolean expressions by using logical AND operator '&&' instead of
      bitwise operator '&'.
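
      The two operators can give different answers when an operand is
      not strictly 0 or 1. The sketch below uses hypothetical int
      parameters named old_dvs and new_dvs for illustration. It does not
      reproduce the driver code.

```c
#include <stdbool.h>

/* Bitwise form: ANDs the raw bits of old_dvs with the 0/1 result of !new_dvs. */
bool bitwise_and(int old_dvs, int new_dvs)
{
    return old_dvs & !new_dvs;
}

/* Logical form: true when old_dvs is nonzero and new_dvs is zero. */
bool logical_and(int old_dvs, int new_dvs)
{
    return old_dvs && !new_dvs;
}
```

      With old_dvs = 2 and new_dvs = 0, the bitwise form computes
      2 & 1, which is 0, while the logical form is true. '&&' also
      short-circuits, which '&' does not.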
      
      This issue was detected with the help of Coccinelle.
      
      Fixes: 4fa084af ("ARM: OSIRIS: DVS (Dynamic Voltage Scaling) supoort.")
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      [krzk: Fix -Wparentheses warning]
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      58691e6a
    • C
      powerpc/traps: Fix the message printed when stack overflows · d6d004b3
      Christophe Leroy authored
      commit 9bf3d3c4e4fd82c7174f4856df372ab2a71005b9 upstream.
      
      Today's message is useless:
      
        [   42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0
      
      This patch fixes it:
      
        [   66.905235] Kernel stack overflow in process sh[356], r1=c65560b0
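
      The improved message identifies the task by name and PID rather
      than by a hashed pointer. A user-space sketch of that formatting
      follows. format_overflow_msg() is a hypothetical helper and uses
      snprintf() in place of the kernel's printing functions.

```c
#include <stdio.h>

/* Formats the message with the task name and PID; returns snprintf()'s result. */
int format_overflow_msg(char *buf, size_t len, const char *comm, int pid,
                        unsigned long r1)
{
    return snprintf(buf, len,
                    "Kernel stack overflow in process %s[%d], r1=%lx",
                    comm, pid, r1);
}
```

      Called with "sh", 356 and 0xc65560b0, it yields the text shown in
      the fixed message above, without the timestamp prefix.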
      
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Use task_pid_nr()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6d004b3
    • C
      powerpc/traps: fix recoverability of machine check handling on book3s/32 · 461a52a4
      Christophe Leroy authored
      commit 0bbea75c476b77fa7d7811d6be911cc7583e640f upstream.
      
      It looks like book3s/32 doesn't set RI on machine check, so
      checking RI before calling die() will always be fatal,
      although this is not an issue in most cases.
      
      Fixes: b96672dd ("powerpc: Machine check interrupt is a non-maskable interrupt")
      Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      461a52a4
    • A
      powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration · baed68a9
      Aneesh Kumar K.V authored
      commit 35f2806b481f5b9207f25e1886cba5d1c4d12cc7 upstream.
      
      We added runtime allocation of 16G pages in commit 4ae279c2
      ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.") That was done
      to enable 16G allocation on PowerNV and KVM config. In the case of KVM
      config, we mostly would have the entire guest RAM backed by 16G
      hugetlb pages for this to work. PAPR does support partial backing of
      guest RAM with hugepages via the ibm,expected#pages attribute of the
      memory node in the device tree. This means the rest of the guest RAM
      won't be backed by 16G contiguous pages in the host, and hence a hash
      page table insertion can fail in such a case.
      
      An example error message will look like
      
        hash-mmu: mm: Hashing failure ! EA=0x7efc00000000 access=0x8000000000000006 current=readback
        hash-mmu:     trap=0x300 vsid=0x67af789 ssize=1 base psize=14 psize 14 pte=0xc000000400000386
        readback[12260]: unhandled signal 7 at 00007efc00000000 nip 00000000100012d0 lr 000000001000127c code 2
      
      This patch addresses that by preventing runtime allocation of 16G
      hugepages in LPAR config. To allocate 16G hugetlb pages, one needs to
      use the kernel command line: hugepagesz=16G hugepages=<number of 16G pages>
      
      With radix translation mode we don't run into this issue.
      
      This change will prevent runtime allocation of 16G hugetlb pages on
      kvm with hash translation mode. However, with the current upstream it
      was observed that 16G hugetlbfs backed guest doesn't boot at all.
      
      We observe boot failure with the below message:
        [131354.647546] KVM: map_vrma at 0 failed, ret=-4
      
      That means this patch is not resulting in an observable regression.
      Once we fix the boot issue with 16G hugetlb backed memory, we need to
      use ibm,expected#pages memory node attribute to indicate 16G page
      reservation to the guest. This will also enable partial backing of
      guest RAM with 16G pages.
      
      Fixes: 4ae279c2 ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      baed68a9
    • M
      powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning · 9d2e929c
      Michael Ellerman authored
      commit ca6d5149d2ad0a8d2f9c28cbe379802260a0a5e0 upstream.
      
      GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
      the build:
      
        In function ‘user_regset_copyin’,
            inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
        include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is
        out of the bounds [0, 16] of object ‘vrsave’ with type ‘union
        <anonymous>’ [-Werror=array-bounds]
        arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
        arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here
           } vrsave;
      
      This has been identified as a regression in GCC, see GCC bug 88273.
      
      However we can avoid the warning and also simplify the logic and make
      it more robust.
      
      Currently we pass -1 as end_pos to user_regset_copyout(). This says
      "copy up to the end of the regset".
      
      The definition of the regset is:
      	[REGSET_VMX] = {
      		.core_note_type = NT_PPC_VMX, .n = 34,
      		.size = sizeof(vector128), .align = sizeof(vector128),
      		.active = vr_active, .get = vr_get, .set = vr_set
      	},
      
      The end is calculated as (n * size), ie. 34 * sizeof(vector128).
      
      In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
      we can copy up to sizeof(vector128) into/out-of vrsave.
      
      The on-stack vrsave is defined as:
        union {
      	  elf_vrreg_t reg;
      	  u32 word;
        } vrsave;
      
      And elf_vrreg_t is:
        typedef __vector128 elf_vrreg_t;
      
      So there is no bug, but we rely on all those sizes lining up,
      otherwise we would have a kernel stack exposure/overwrite on our
      hands.
      
      Rather than relying on that we can pass an explicit end_pos based on
      the sizeof(vrsave). The result should be exactly the same but it's
      more obviously not over-reading/writing the stack and it avoids the
      compiler warning.
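
      The size arithmetic can be verified in isolation. The sketch below
      assumes a 16-byte vector type standing in for __vector128, and
      regset_end() and explicit_end() are hypothetical helpers, not
      functions from the regset API.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-byte stand-in for __vector128. */
typedef struct {
    uint32_t u[4];
} vector128;

/* Same shape as the on-stack vrsave union described above. */
union vrsave_buf {
    vector128 reg;
    uint32_t word;
};

/* End of the regset, n * size: what an end_pos of -1 resolves to. */
size_t regset_end(size_t n, size_t size)
{
    return n * size;
}

/* Explicit end_pos derived from the buffer actually being copied. */
size_t explicit_end(size_t start_pos)
{
    return start_pos + sizeof(union vrsave_buf);
}
```

      With n = 34 and start_pos = 33 * sizeof(vector128), both functions
      give 544, so the copy range is unchanged, but the explicit form
      stays correct even if the sizes stop lining up.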

      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Mathieu Malaterre <malat@debian.org>
      Cc: stable@vger.kernel.org
      Tested-by: Mathieu Malaterre <malat@debian.org>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d2e929c
    • M
      powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest · 344996a8
      Mark Cave-Ayland authored
      commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f upstream.
      
      Commit 8792468d ("powerpc: Add the ability to save FPU without
      giving it up") unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
      the bitmask used to update the MSR of the previous thread in
      __giveup_fpu(), causing a KVM-PR MacOS guest to lock up and panic the
      host kernel.

      Leaving FE0/1 enabled means unrelated processes might receive FPEs
      when they're not expecting them and crash. In particular, if this
      happens to init, the host will then panic.
      
      eg (transcribed):
        qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
        systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      Reinstate these bits to the MSR bitmask to enable MacOS guests to run
      under 32-bit KVM-PR once again without issue.
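
      The effect of the missing bits can be shown with plain masks. This
      is a sketch, not the __giveup_fpu() code: both function names are
      hypothetical, and the bit positions used for MSR_FP, MSR_FE0 and
      MSR_FE1 are assumptions of this example.

```c
/* Assumed bit positions for this sketch. */
#define MSR_FP  (1UL << 13)
#define MSR_FE0 (1UL << 11)
#define MSR_FE1 (1UL << 8)

/* Regressed behaviour: only FP is cleared, FE0/FE1 stay set. */
unsigned long giveup_fpu_msr_regressed(unsigned long msr)
{
    return msr & ~MSR_FP;
}

/* Intended behaviour: FP and both FP exception mode bits are cleared. */
unsigned long giveup_fpu_msr_fixed(unsigned long msr)
{
    return msr & ~(MSR_FP | MSR_FE0 | MSR_FE1);
}
```

      Starting from an MSR with all three bits set, the regressed mask
      leaves FE0 and FE1 enabled for whatever runs next, while the fixed
      mask clears them.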
      
      Fixes: 8792468d ("powerpc: Add the ability to save FPU without giving it up")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      344996a8
    • P
      powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit · 3bf8ff7b
      Paul Mackerras authored
      commit 19f8a5b5be2898573a5e1dc1db93e8d40117606a upstream.
      
      Commit 24be85a2 ("powerpc/powernv: Clear PECE1 in LPCR via stop-api
      only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg()
      inside pnv_cpu_offline(), with the aim of changing the LPCR value in
      the SLW image to disable wakeups from the decrementer while a CPU is
      offline.  However, pnv_cpu_offline() gets called each time a secondary
      CPU thread is woken up to participate in running a KVM guest, that is,
      not just when a CPU is offlined.
      
      Since opal_slw_set_reg() is a very slow operation (with observed
      execution times around 20 milliseconds), this means that an offline
      secondary CPU can often be busy doing the opal_slw_set_reg() call
      when the primary CPU wants to grab all the secondary threads so that
      it can run a KVM guest.  This leads to messages like "KVM: couldn't
      grab CPU n" being printed and guest execution failing.
      
      There is no need to reprogram the SLW image on every KVM guest entry
      and exit.  So that we do it only when a CPU is really transitioning
      between online and offline, this moves the calls to
      pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().
      
      Fixes: 24be85a2 ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bf8ff7b
    • C
      powerpc/83xx: Also save/restore SPRG4-7 during suspend · f6f03d60
      Christophe Leroy authored
      commit 36da5ff0bea2dc67298150ead8d8471575c54c7d upstream.
      
      The 83xx has 8 SPRG registers and uses at least SPRG4
      for DTLB handling LRU.
      
      Fixes: 2319f123 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround")
      Cc: stable@vger.kernel.org
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6f03d60
    • J
      powerpc/powernv: Make opal log only readable by root · b0934990
      Jordan Niethe authored
      commit 7b62f9bd2246b7d3d086e571397c14ba52645ef1 upstream.
      
      Currently the opal log is globally readable. It is kernel policy to
      limit the visibility of physical addresses / kernel pointers to root.
      Given this, and the fact that the opal log may contain this
      information, it would be better to limit readability to root.
      
      Fixes: bfc36894 ("powerpc/powernv: Add OPAL message log interface")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
      Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0934990