1. 26 September 2017, 1 commit
    • powerpc/powernv: Rework EEH initialization on powernv · b9fde58d
      Authored by Benjamin Herrenschmidt
      Remove the post_init callback, which is only used
      by powernv; we can just call it explicitly from
      the powernv code.
      
      This partially kills the ability to "disable" eeh at
      runtime via debugfs as this was calling that same
      callback again, but this is both unused and broken
      in several ways. If we want to revive it, we need
      to create a dedicated enable/disable callback on the
      backend that does the right thing.
      
      Let the bulk of eeh initialize normally at
      core_initcall() like it does on pseries by removing
      the hack in eeh_init() that delays it.
      
      Instead, we make sure our eeh->probe cleanly bails
      out if the PEs haven't been created yet, and we force
      a re-probe where we used to call eeh_init() again.
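
      A rough sketch of what the probe-side bail-out looks like (hedged:
      eeh_phb_pe_get() and the pnv_eeh_probe() signature follow the
      v4.13-era EEH API, but the body is illustrative, not the exact diff):

          static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
          {
                  struct eeh_pe *phb_pe = eeh_phb_pe_get(pdn->phb);

                  /* EEH now initializes at core_initcall(); if the PEs
                   * for this PHB don't exist yet, bail cleanly and rely
                   * on the forced re-probe later. */
                  if (!phb_pe)
                          return NULL;

                  /* ... normal per-device EEH probe work ... */
                  return NULL;
          }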
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 21 September 2017, 3 commits
    • powerpc/pseries: Fix parent_dn reference leak in add_dt_node() · b537ca6f
      Authored by Tyrel Datwyler
      A reference to the parent device node is held by add_dt_node() for the
      node to be added. If the call to dlpar_configure_connector() fails,
      add_dt_node() returns -ENOENT and that reference is never released.
      
      Add a call to of_node_put(parent_dn) prior to bailing out after a
      failed dlpar_configure_connector() call.
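
      As a sketch, the error path now looks roughly like this (simplified;
      the real add_dt_node() works from DRC indices and looks the parent
      up itself):

          dn = dlpar_configure_connector(drc_index, parent_dn);
          if (!dn) {
                  /* the fix: drop the parent reference on this path too */
                  of_node_put(parent_dn);
                  return -ENOENT;
          }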
      
      Fixes: 8d5ff320 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPAR · 087ff6a5
      Authored by Tyrel Datwyler
      Commit 215ee763 ("powerpc: pseries: remove dlpar_attach_node
      dependency on full path") reworked dlpar_attach_node() to no longer
      look up the parent node "/cpus", but instead to have the parent node
      passed by the caller in the function parameter list.
      
      As a result dlpar_attach_node() is no longer responsible for freeing
      the reference to the parent node. However, commit 215ee763 failed
      to remove the of_node_put(parent) call in dlpar_attach_node(), or to
      take into account that the reference to the parent in the caller
      dlpar_cpu_add() needs to be held until after dlpar_attach_node()
      returns.
      
      As a result, repeated DLPAR cpu add/remove operations will
      eventually trigger the following error:
      
        OF: ERROR: Bad of_node_put() on /cpus
        CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
        Call Trace:
         dump_stack+0x15c/0x1f8 (unreliable)
         of_node_release+0x1a4/0x1c0
         kobject_put+0x1a8/0x310
         kobject_del+0xbc/0xf0
         __of_detach_node_sysfs+0x144/0x210
         of_detach_node+0xf0/0x180
         dlpar_detach_node+0xc4/0x120
         dlpar_cpu_remove+0x280/0x560
         dlpar_cpu_release+0xbc/0x1b0
         arch_cpu_release+0x6c/0xb0
         cpu_release_store+0xa0/0x100
         dev_attr_store+0x68/0xa0
         sysfs_kf_write+0xa8/0xf0
         kernfs_fop_write+0x2cc/0x400
         __vfs_write+0x5c/0x340
         vfs_write+0x1a8/0x3d0
         SyS_write+0xa8/0x1a0
         system_call+0x58/0x6c
      
      Fix the issue by removing the of_node_put(parent) call from
      dlpar_attach_node(), and ensuring that the reference to the parent
      node is properly held and released by the caller dlpar_cpu_add().
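
      A sketch of the resulting caller-side ordering (simplified from
      dlpar_cpu_add(); DRC acquire/release and error details elided):

          parent = of_find_node_by_path("/cpus");
          if (!parent)
                  return -ENODEV;

          dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
          if (!dn) {
                  of_node_put(parent);
                  return -EINVAL;
          }

          rc = dlpar_attach_node(dn, parent);

          /* dlpar_attach_node() no longer puts the parent; the caller
           * drops its reference only after the call returns. */
          of_node_put(parent);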
      
      Fixes: 215ee763 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
      Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      [mpe: Add a comment in the code and frob the change log slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Create PHB PEs after EEH is initialized · 3e77adee
      Authored by Benjamin Herrenschmidt
      Otherwise, on powernv where EEH initialization is delayed, the right
      diag data size has not yet been computed when the PHB PEs are created,
      which causes memory corruption later on when calling OPAL.
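
      Schematically, the resulting order in eeh_init() is (a sketch,
      simplified):

          /* Platform backend first: on powernv this sizes the OPAL
           * diag data buffer. */
          ret = eeh_ops->init();
          if (ret)
                  return ret;

          /* Only now create the PHB PEs. */
          list_for_each_entry(hose, &hose_list, list_node)
                  eeh_phb_pe_create(hose);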
      
      Fixes: 5cb1f8fd ("powerpc/powernv/pci: Dynamically allocate PHB diag data")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 20 September 2017, 8 commits
  4. 15 September 2017, 2 commits
  5. 14 September 2017, 1 commit
    • mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Authored by Michal Hocko
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  Its
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing callers provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with highlevel semantic should be clearly defined to prevent from
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replacing all existing users with plain GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
      so they will be placed properly for memory fragmentation prevention.
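
      The conversion itself is purely mechanical; schematically (buf and
      len are placeholder names):

          /* Before: "temporary" with no enforceable meaning. */
          buf = kmalloc(len, GFP_TEMPORARY);

          /* After: plain GFP_KERNEL; SLAB caches with shrinkers keep
           * their __GFP_RECLAIMABLE placement regardless. */
          buf = kmalloc(len, GFP_KERNEL);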
      
      I can see reasons we might want some gfp flag to reflect short-term
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 12 September 2017, 3 commits
    • KVM: PPC: Book3S HV: Fix bug causing host SLB to be restored incorrectly · 67f8a8c1
      Authored by Paul Mackerras
      Aneesh Kumar reported seeing host crashes when running recent kernels
      on POWER8.  The symptom was an oops like this:
      
      Unable to handle kernel paging request for data at address 0xf00000000786c620
      Faulting instruction address: 0xc00000000030e1e4
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE SMP NR_CPUS=2048 NUMA PowerNV
      Modules linked in: powernv_op_panel
      CPU: 24 PID: 6663 Comm: qemu-system-ppc Tainted: G        W 4.13.0-rc7-43932-gfc36c59 #2
      task: c000000fdeadfe80 task.stack: c000000fdeb68000
      NIP:  c00000000030e1e4 LR: c00000000030de6c CTR: c000000000103620
      REGS: c000000fdeb6b450 TRAP: 0300   Tainted: G        W        (4.13.0-rc7-43932-gfc36c59)
      MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24044428  XER: 20000000
      CFAR: c00000000030e134 DAR: f00000000786c620 DSISR: 40000000 SOFTE: 0
      GPR00: 0000000000000000 c000000fdeb6b6d0 c0000000010bd000 000000000000e1b0
      GPR04: c00000000115e168 c000001fffa6e4b0 c00000000115d000 c000001e1b180386
      GPR08: f000000000000000 c000000f9a8913e0 f00000000786c600 00007fff587d0000
      GPR12: c000000fdeb68000 c00000000fb0f000 0000000000000001 00007fff587cffff
      GPR16: 0000000000000000 c000000000000000 00000000003fffff c000000fdebfe1f8
      GPR20: 0000000000000004 c000000fdeb6b8a8 0000000000000001 0008000000000040
      GPR24: 07000000000000c0 00007fff587cffff c000000fdec20bf8 00007fff587d0000
      GPR28: c000000fdeca9ac0 00007fff587d0000 00007fff587c0000 00007fff587d0000
      NIP [c00000000030e1e4] __get_user_pages_fast+0x434/0x1070
      LR [c00000000030de6c] __get_user_pages_fast+0xbc/0x1070
      Call Trace:
      [c000000fdeb6b6d0] [c00000000139dab8] lock_classes+0x0/0x35fe50 (unreliable)
      [c000000fdeb6b7e0] [c00000000030ef38] get_user_pages_fast+0xf8/0x120
      [c000000fdeb6b830] [c000000000112318] kvmppc_book3s_hv_page_fault+0x308/0xf30
      [c000000fdeb6b960] [c00000000010e10c] kvmppc_vcpu_run_hv+0xfdc/0x1f00
      [c000000fdeb6bb20] [c0000000000e915c] kvmppc_vcpu_run+0x2c/0x40
      [c000000fdeb6bb40] [c0000000000e5650] kvm_arch_vcpu_ioctl_run+0x110/0x300
      [c000000fdeb6bbe0] [c0000000000d6468] kvm_vcpu_ioctl+0x528/0x900
      [c000000fdeb6bd40] [c0000000003bc04c] do_vfs_ioctl+0xcc/0x950
      [c000000fdeb6bde0] [c0000000003bc930] SyS_ioctl+0x60/0x100
      [c000000fdeb6be30] [c00000000000b96c] system_call+0x58/0x6c
      Instruction dump:
      7ca81a14 2fa50000 41de0010 7cc8182a 68c60002 78c6ffe2 0b060000 3cc2000a
      794a3664 390610d8 e9080000 7d485214 <e90a0020> 7d435378 790507e1 408202f0
      ---[ end trace fad4a342d0414aa2 ]---
      
      It turns out that what has happened is that the SLB entry for the
      vmemmap region hasn't been reloaded on exit from a guest, and it has
      the wrong page size.  Then, when the host next accesses the vmemmap
      region, it gets a page fault.
      
      Commit a25bd72b ("powerpc/mm/radix: Workaround prefetch issue with
      KVM", 2017-07-24) modified the guest exit code so that it now only clears
      out the SLB for hash guests.  The code tests the radix flag and puts the
      result in a non-volatile CR field, CR2, and later branches based on CR2.
      
      Unfortunately, the kvmppc_save_tm function, which gets called between
      those two points, modifies all the user-visible registers in the case
      where the guest was in transactional or suspended state, except for a
      few which it restores (namely r1, r2, r9 and r13).  Thus the
      hash/radix indication in CR2 gets corrupted.
      
      This fixes the problem by re-doing the comparison just before the
      result is needed.  For good measure, this also adds comments next to
      the call sites of kvmppc_save_tm and kvmppc_restore_tm pointing out
      that non-volatile register state will be lost.
      
      Cc: stable@vger.kernel.org # v4.13
      Fixes: a25bd72b ("powerpc/mm/radix: Workaround prefetch issue with KVM")
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Hold kvm->lock around call to kvmppc_update_lpcr · cf5f6f31
      Authored by Paul Mackerras
      Commit 468808bd ("KVM: PPC: Book3S HV: Set process table for HPT
      guests on POWER9", 2017-01-30) added a call to kvmppc_update_lpcr()
      which doesn't hold the kvm->lock mutex around the call, as required.
      This adds the lock/unlock pair, and for good measure, includes
      the kvmppc_setup_partition_table() call in the locked region, since
      it is altering global state of the VM.
      
      This error appears not to have any fatal consequences for the host;
      the consequences would be that the VCPUs could end up running with
      different LPCR values, or an update to the LPCR value by userspace
      using the one_reg interface could get overwritten, or the update
      done by kvmhv_configure_mmu() could get overwritten.
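
      A sketch of the locked region (simplified from kvmhv_configure_mmu();
      validation and error handling elided):

          mutex_lock(&kvm->lock);    /* required by kvmppc_update_lpcr() */

          kvm->arch.process_table = cfg->process_table;
          kvmppc_setup_partition_table(kvm);    /* also global VM state */

          lpcr = (cfg->flags & KVM_PPC_MMUV3_GTSE) ? LPCR_GTSE : 0;
          kvmppc_update_lpcr(kvm, lpcr, LPCR_GTSE);

          mutex_unlock(&kvm->lock);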
      
      Cc: stable@vger.kernel.org # v4.10+
      Fixes: 468808bd ("KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Don't access XIVE PIPR register using byte accesses · d222af07
      Authored by Benjamin Herrenschmidt
      The XIVE interrupt controller on POWER9 machines doesn't support byte
      accesses to any register in the thread management area other than the
      CPPR (current processor priority register).  In particular, when
      reading the PIPR (pending interrupt priority register), we need to
      do a 32-bit or 64-bit load.
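
      Schematically, the read becomes a wide load with the byte extracted
      in software (a sketch; the TM_QW1_OS/TM_PIPR offsets follow the XIVE
      TIMA layout in xive-regs.h, and the accessor choice is an assumption):

          /* Byte loads to PIPR are unsupported: load the whole 8-byte
           * OS queue word and pull out the PIPR byte. */
          u64 qw1 = be64_to_cpu(__raw_readq(tima + TM_QW1_OS));
          u8 pipr = qw1 & 0xff;    /* TM_PIPR (0x7) is the last byte */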
      
      Cc: stable@vger.kernel.org # v4.13
      Fixes: 2c4fb78f ("KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss")
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  7. 09 September 2017, 2 commits
    • treewide: make "nr_cpu_ids" unsigned · 9b130ad5
      Authored by Alexey Dobriyan
      First, the number of CPUs can't be a negative number.
      
      Second, the differing signedness leads to suboptimal code in the
      following cases:
      
      1)
      	kmalloc(nr_cpu_ids * sizeof(X));
      
      "int" has to be sign extended to size_t.
      
      2)
      	while (loff_t *pos < nr_cpu_ids)
      
      MOVSXD is 1 byte longer than the same MOV.
      
      Other cases exist as well. Basically, the compiler is told that
      nr_cpu_ids can't be negative, which can't be deduced if it is "int".
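
      The change itself is essentially one declaration (schematically, as
      in include/linux/cpumask.h):

          /* Before: mixing with size_t/loff_t forces sign extension. */
          extern int nr_cpu_ids;

          /* After: the compiler knows the value is non-negative. */
          extern unsigned int nr_cpu_ids;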
      
      Code savings on allyesconfig kernel: -3KB
      
      	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
      	function                                     old     new   delta
      	coretemp_cpu_online                          450     512     +62
      	rcu_init_one                                1234    1272     +38
      	pci_device_probe                             374     399     +25
      
      				...
      
      	pgdat_reclaimable_pages                      628     556     -72
      	select_fallback_rq                           446     369     -77
      	task_numa_find_cpu                          1923    1807    -116
      
      Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vga: optimise console scrolling · ac036f95
      Authored by Matthew Wilcox
      Where possible, call memset16(), memmove() or memcpy() instead of using
      open-coded loops.  I don't like the calling convention that uses a byte
      count instead of a count of u16s, but it's a little late to change that.
      Reduces code size of fbcon.o by almost 400 bytes on my laptop build.
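
      The pattern, as a sketch (mirroring scr_memsetw() in vt_buffer.h;
      note the count stays in bytes, the convention lamented above):

          static inline void scr_memsetw(u16 *s, u16 c, unsigned int count)
          {
                  memset16(s, c, count / 2);    /* bytes -> u16 cells */
          }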
      
      [akpm@linux-foundation.org: fix build]
      Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 07 September 2017, 1 commit
  9. 04 September 2017, 2 commits
    • powerpc/xive: Fix section __init warning · 265601f0
      Authored by Cédric Le Goater
      xive_spapr_init() is called from a __init routine and calls __init
      routines.
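
      A sketch of the annotation (the body is elided):

          bool __init xive_spapr_init(void)
          {
                  /* ... device-tree probing, all in .init.text;
                   * marking this __init resolves the section mismatch
                   * and lets the code be discarded after boot ... */
                  return true;
          }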
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix kernel crash in emulation of vector loads and stores · 4716e488
      Authored by Paul Mackerras
      Commit 350779a2 ("powerpc: Handle most loads and stores in
      instruction emulation code", 2017-08-30) changed the register usage
      in get_vr and put_vr with the aim of leaving the register number in
      r3 untouched on return.  Unfortunately, r6 was not a good choice, as
      the callers as of 350779a2 store an MSR value in r6.  Then, in
      commit c22435a5 ("powerpc: Emulate FP/vector/VSX loads/stores
      correctly when regs not live", 2017-08-30), the saving and restoring
      of the MSR got moved into get_vr and put_vr.  Either way, the effect
      is that we put a value in MSR that only has the 0x3f8 bits non-zero,
      meaning that we are switching to 32-bit mode.  That leads to a crash
      like this:
      
      Unable to handle kernel paging request for instruction fetch
      Faulting instruction address: 0x0007bea0
      Oops: Kernel access of bad area, sig: 11 [#12]
      LE SMP NR_CPUS=2048 NUMA PowerNV
      Modules linked in: vmx_crypto binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
      CPU: 6 PID: 32659 Comm: trashy_testcase Tainted: G      D         4.13.0-rc2-00313-gf3026f57e6ed-dirty #23
      task: c000000f1bb9e780 task.stack: c000000f1ba98000
      NIP:  000000000007bea0 LR: c00000000007b054 CTR: c00000000007be70
      REGS: c000000f1ba9b960 TRAP: 0400   Tainted: G      D          (4.13.0-rc2-00313-gf3026f57e6ed-dirty)
      MSR:  10000000400010a1 <HV,ME,IR,LE>  CR: 48000228  XER: 00000000
      CFAR: c00000000007be74 SOFTE: 1
      GPR00: c00000000007b054 c000000f1ba9bbe0 c000000000e6e000 000000000000001d
      GPR04: c000000f1ba9bc00 c00000000007be70 00000000000000e8 9000000002009033
      GPR08: 0000000002000000 100000000282f033 000000000b0a0900 0000000000001009
      GPR12: 0000000000000000 c00000000fd42100 0706050303020100 a5a5a5a5a5a5a5a5
      GPR16: 2e2e2e2e2e2de70c 2e2e2e2e2e2e2e2d 0000000000ff00ff 0606040202020000
      GPR20: 000000000000005b ffffffffffffffff 0000000003020100 0000000000000000
      GPR24: c000000f1ab90020 c000000f1ba9bc00 0000000000000001 0000000000000001
      GPR28: c000000f1ba9bc90 c000000f1ba9bea0 000000000b0a0908 0000000000000001
      NIP [000000000007bea0] 0x7bea0
      LR [c00000000007b054] emulate_loadstore+0x1044/0x1280
      Call Trace:
      [c000000f1ba9bbe0] [c000000000076b80] analyse_instr+0x60/0x34f0 (unreliable)
      [c000000f1ba9bc70] [c00000000007b7ec] emulate_step+0x23c/0x544
      [c000000f1ba9bce0] [c000000000053424] arch_uprobe_skip_sstep+0x24/0x40
      [c000000f1ba9bd00] [c00000000024b2f8] uprobe_notify_resume+0x598/0xba0
      [c000000f1ba9be00] [c00000000001c284] do_notify_resume+0xd4/0xf0
      [c000000f1ba9be30] [c00000000000bd44] ret_from_except_lite+0x70/0x74
      Instruction dump:
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      ---[ end trace a7ae7a7f3e0256b5 ]---
      
      To fix this, we just revert to using r3 as before, since the callers
      don't rely on r3 being left unmodified.
      
      Fortunately, this can't be triggered by a misaligned load or store,
      because vector loads and stores truncate misaligned addresses rather
      than taking an alignment interrupt.  It can be triggered using
      uprobes.
      
      Fixes: 350779a2 ("powerpc: Handle most loads and stores in instruction emulation code")
      Reported-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 02 September 2017, 9 commits
  11. 01 September 2017, 8 commits