1. 01 Nov, 2017 (2 commits)
    • KVM: PPC: Book3S HV: Rename hpte_setup_done to mmu_ready · 1b151ce4
      Paul Mackerras authored
      This renames the kvm->arch.hpte_setup_done field to mmu_ready because
      we will want to use it for radix guests too -- both for setting things
      up before vcpu execution, and for excluding vcpus from executing while
      MMU-related things get changed, such as in future switching the MMU
      from radix to HPT mode or vice-versa.
      
      This also moves the call to kvmppc_setup_partition_table() that was
      done in kvmppc_hv_setup_htab_rma() for HPT guests, and the setting
      of mmu_ready, into the caller in kvmppc_vcpu_run_hv().
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
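      A much-simplified sketch of the flow described above, using stand-in types and
      helpers rather than the real KVM structures (the real code also distinguishes
      HPT from radix guests and takes the appropriate locks): the caller performs the
      one-time setup, calls the partition-table setup, and only then sets mmu_ready
      so vcpus may execute.

       #include <stdbool.h>

       /* Simplified stand-ins for the real KVM structures (illustrative only). */
       struct kvm_arch { bool mmu_ready; };
       struct kvm { struct kvm_arch arch; };

       /* Stand-in for kvmppc_hv_setup_htab_rma(): one-time HPT setup. */
       static int setup_htab_rma(struct kvm *kvm) { (void)kvm; return 0; }

       /* Stand-in for kvmppc_setup_partition_table(). */
       static void setup_partition_table(struct kvm *kvm) { (void)kvm; }

       /* Caller-side flow, as now done in kvmppc_vcpu_run_hv(): do the one-time
        * setup, then set mmu_ready so vcpus are allowed to execute. */
       static int vcpu_run(struct kvm *kvm)
       {
           if (!kvm->arch.mmu_ready) {
               int r = setup_htab_rma(kvm);
               if (r)
                   return r;
               setup_partition_table(kvm);
               kvm->arch.mmu_ready = true;
           }
           /* ... enter the guest ... */
           return 0;
       }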
    • KVM: PPC: Book3S HV: Don't rely on host's page size information · 8dc6cca5
      Paul Mackerras authored
      This removes the dependence of KVM on the mmu_psize_defs array (which
      stores information about hardware support for various page sizes) and
      the things derived from it, chiefly hpte_page_sizes[], hpte_page_size(),
      hpte_actual_page_size() and get_sllp_encoding().  We also no longer
      rely on the mmu_slb_size variable or the MMU_FTR_1T_SEGMENTS feature
      bit.
      
      The reason for doing this is so we can support a HPT guest on a radix
      host.  In a radix host, the mmu_psize_defs array contains information
      about page sizes supported by the MMU in radix mode rather than the
      page sizes supported by the MMU in HPT mode.  Similarly, mmu_slb_size
      and the MMU_FTR_1T_SEGMENTS bit are not set.
      
      Instead we hard-code knowledge of the behaviour of the HPT MMU in the
      POWER7, POWER8 and POWER9 processors (which are the only processors
      supported by HV KVM) - specifically the encoding of the LP fields in
      the HPT and SLB entries, and the fact that they have 32 SLB entries
      and support 1TB segments.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  2. 19 Oct, 2017 (1 commit)
  3. 09 Sep, 2017 (1 commit)
    • vga: optimise console scrolling · ac036f95
      Matthew Wilcox authored
      Where possible, call memset16(), memmove() or memcpy() instead of using
      open-coded loops.  I don't like the calling convention that uses a byte
      count instead of a count of u16s, but it's a little late to change that.
      Reduces code size of fbcon.o by almost 400 bytes on my laptop build.
      
      [akpm@linux-foundation.org: fix build]
      Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
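      A userspace sketch of the kind of replacement the patch makes: clear newly
      exposed rows of a 16-bit character/attribute buffer with memset16() instead of
      an open-coded loop, and move the surviving rows with memmove(). memset16() here
      follows the lib/string.c semantics (a count of u16 elements); the scroll_up()
      helper and its parameters are illustrative, not the fbcon code, which as noted
      above passes byte counts.

       #include <stdint.h>
       #include <stddef.h>
       #include <string.h>

       /* Generic memset16(): fill 'count' 16-bit cells with 'v'. */
       static void *memset16(uint16_t *s, uint16_t v, size_t count)
       {
           uint16_t *p = s;

           while (count--)
               *p++ = v;
           return s;
       }

       /* Illustrative console scroll: 'cols' cells per row, 'rows' rows,
        * scroll up by 'nr' lines, filling the bottom with the 'erase' cell. */
       static void scroll_up(uint16_t *screen, unsigned int rows, unsigned int cols,
                             unsigned int nr, uint16_t erase)
       {
           /* move the surviving rows up by 'nr' lines ... */
           memmove(screen, screen + nr * cols,
                   (rows - nr) * cols * sizeof(uint16_t));
           /* ... and clear the lines exposed at the bottom */
           memset16(screen + (rows - nr) * cols, erase, nr * cols);
       }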
  4. 02 Sep, 2017 (4 commits)
    • powerpc/xive: add XIVE Exploitation Mode to CAS · ac5e5a54
      Cédric Le Goater authored
      On POWER9, the Client Architecture Support (CAS) negotiation process
      determines whether the guest operates in XIVE Legacy compatibility or
      in XIVE exploitation mode. Now that we have initial guest support for
      the XIVE interrupt controller, let's inform the hypervisor what we can
      do.
      
      The platform advertises XIVE Exploitation Mode support using the
      property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1:

       - 0b00 XIVE legacy mode only
       - 0b01 XIVE exploitation mode only
       - 0b10 XIVE legacy or exploitation mode

      The OS asks for XIVE Exploitation Mode support using the property
      "ibm,architecture-vec-5", byte 23 bits 0-1:

       - 0b00 XIVE legacy mode only
       - 0b01 XIVE exploitation mode only
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
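      A small illustrative helper showing how an option-vector byte could encode the
      request described above (byte 23, bits 0-1 of "ibm,architecture-vec-5"). The
      exact mask and macro names used by the kernel are not taken from the commit;
      the sketch assumes the usual IBM numbering where bit 0 is the most significant
      bit of the byte.

       #include <stdint.h>

       /* XIVE mode values for byte 23, bits 0-1 (bit 0 = MSB, IBM numbering). */
       #define XIVE_MODE_LEGACY   0x0  /* 0b00 */
       #define XIVE_MODE_EXPLOIT  0x1  /* 0b01 */
       #define XIVE_MODE_EITHER   0x2  /* 0b10, platform-side value only */

       /* Illustrative: set bits 0-1 of byte 23 of an option vector to 'mode'. */
       static void vec5_set_xive_mode(uint8_t *vec5, uint8_t mode)
       {
           vec5[23] = (vec5[23] & ~0xC0) | ((mode & 0x3) << 6);
       }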
    • powerpc/xive: introduce H_INT_ESB hcall · bed81ee1
      Cédric Le Goater authored
      The H_INT_ESB hcall is used to issue a load or store to the ESB page
      instead of using the MMIO pages. This can be used as a workaround for
      some HW issues. The OS knows that this hcall should be used on an
      interrupt source when the ESB hcall flag is set to 1 in the hcall
      H_INT_GET_SOURCE_INFO.

      To maintain the boundary between the xive frontend and backend, we
      introduce a new xive operation 'esb_rw' to be used in the routines
      doing memory accesses on the ESBs.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
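      A schematic sketch of the split described above: the routines doing memory
      accesses on the ESBs go through an 'esb_rw' backend operation when the source
      is flagged for it, and fall back to a direct MMIO access otherwise. Everything
      below except the 'esb_rw' name and the H_INT_ESB / H_INT_GET_SOURCE_INFO
      references is an illustrative stand-in (the hcall itself is a stub), not the
      kernel's actual structures.

       #include <stdint.h>
       #include <stdbool.h>

       struct xive_source {
           uint32_t hw_irq;      /* HW IRQ number (see the following commit)          */
           void *esb_mmio;       /* ESB MMIO page, unused when the hcall path is used */
           bool use_esb_hcall;   /* set when H_INT_GET_SOURCE_INFO reported the flag  */
       };

       /* Backend operations; esb_rw is the new hook for ESB loads and stores. */
       struct xive_ops {
           uint64_t (*esb_rw)(uint32_t hw_irq, uint32_t offset, uint64_t data, bool write);
       };

       /* Stub standing in for the H_INT_ESB hypervisor call. */
       static uint64_t h_int_esb(uint32_t hw_irq, uint32_t offset, uint64_t data, bool write)
       {
           (void)hw_irq; (void)offset; (void)data; (void)write;
           return 0;
       }

       static const struct xive_ops spapr_ops = { .esb_rw = h_int_esb };

       /* Frontend helper: route ESB reads through the backend hook when required. */
       static uint64_t esb_read(const struct xive_ops *ops, struct xive_source *src,
                                uint32_t offset)
       {
           if (src->use_esb_hcall && ops->esb_rw)
               return ops->esb_rw(src->hw_irq, offset, 0, false);
           return *(volatile uint64_t *)((char *)src->esb_mmio + offset);
       }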
    • powerpc/xive: add the HW IRQ number under xive_irq_data · c58a14a9
      Cédric Le Goater authored
      It will be required later by the H_INT_ESB hcall.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/xive: guest exploitation of the XIVE interrupt controller · eac1e731
      Cédric Le Goater authored
      This is the framework for using XIVE in a PowerVM guest. The support
      is very similar to the native one in a much simpler form.
      
      Each source is associated with an Event State Buffer (ESB). This is
      a two-bit state machine used to trigger events. The bits are named
      "P" (pending) and "Q" (queued) and can be controlled by MMIO.
      The Guest OS registers event (or notification) queues on which the HW
      will post event data to notify a target.
      
      Instead of OPAL calls, a set of hypervisor calls is used to configure
      the interrupt sources and the event/notification queues of the guest:
      
       - H_INT_GET_SOURCE_INFO
      
         used to obtain the address of the MMIO page of the Event State
         Buffer (PQ bits) entry associated with the source.
      
       - H_INT_SET_SOURCE_CONFIG
      
         assigns a source to a "target".
      
       - H_INT_GET_SOURCE_CONFIG
      
         determines which "target" and "priority" a source is assigned to
      
       - H_INT_GET_QUEUE_INFO
      
         returns the address of the notification management page associated
         with the specified "target" and "priority".
      
       - H_INT_SET_QUEUE_CONFIG
      
         sets or resets the event queue for a given "target" and "priority".
         It is also used to set the notification config associated with the
         queue, only unconditional notification for the moment.  Reset is
         performed with a queue size of 0 and queueing is disabled in that
         case.
      
       - H_INT_GET_QUEUE_CONFIG
      
         returns the queue settings for a given "target" and "priority".
      
       - H_INT_RESET
      
         resets all of the partition's interrupt exploitation structures to
         their initial state, losing all configuration set via the hcalls
         H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
      
       - H_INT_SYNC
      
         issues a synchronisation on a source to make sure all
         notifications have reached their queue.
      
      As for XICS, the XIVE interface for the guest is described in the
      device tree under the "interrupt-controller" node. A couple of new
      properties are specific to XIVE:
      
       - "reg"
      
         contains the base address and size of the thread interrupt
         management areas (TIMA), also called rings, for the User level and
         for the Guest OS level. Only the Guest OS level is taken into
         account today.
      
       - "ibm,xive-eq-sizes"
      
         the size of the event queues. One cell per size supported, contains
         log2 of size, in ascending order.
      
       - "ibm,xive-lisn-ranges"
      
         the interrupt number ranges assigned to the guest. These are
         allocated using a simple bitmap.
      
      and also:
      
       - "/ibm,plat-res-int-priorities"
      
         contains a list of priorities that the hypervisor has reserved for
         its own use.
      
      Tested with a QEMU XIVE model for pseries and with the Power hypervisor.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
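      As an example of consuming one of the properties listed above, here is a small
      sketch of choosing an event-queue size from "ibm,xive-eq-sizes": the property is
      an array of big-endian cells, each holding log2 of a supported queue size, in
      ascending order. This is a userspace-style sketch over raw property bytes rather
      than the kernel's of_* API; the helper names are illustrative.

       #include <stdint.h>
       #include <stddef.h>

       /* Decode one big-endian 32-bit cell from raw property bytes. */
       static uint32_t be32_cell(const uint8_t *p)
       {
           return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                  ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
       }

       /* "ibm,xive-eq-sizes": one cell per supported queue size, each holding
        * log2(size), in ascending order.  Pick the first order >= wanted_order. */
       static int xive_pick_queue_order(const uint8_t *prop, size_t proplen,
                                        unsigned int wanted_order)
       {
           for (size_t off = 0; off + 4 <= proplen; off += 4) {
               uint32_t order = be32_cell(prop + off);

               if (order >= wanted_order)
                   return (int)order;
           }
           return -1; /* no supported queue size is large enough */
       }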
  5. 01 Sep, 2017 (11 commits)
    • crypto/nx: Add P9 NX specific error codes for 842 engine · 146e9f1b
      Haren Myneni authored
      This patch adds checks for P9-specific 842 engine error codes. These
      errors are reported in the coprocessor status block (CSB) on
      failures.
      Signed-off-by: Haren Myneni <haren@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/32: add memset16() · da74f659
      Christophe Leroy authored
      Commit 694fc88c ("powerpc/string: Implement optimized
      memset variants") added memset16(), memset32() and memset64()
      for 64-bit PPC.

      On 32-bit, memset64() is not relevant, and as shown below, the
      generic version of memset32() generates good code, so only
      memset16() is a candidate for an optimised version.
      
      000009c0 <memset32>:
       9c0:   2c 05 00 00     cmpwi   r5,0
       9c4:   39 23 ff fc     addi    r9,r3,-4
       9c8:   4d 82 00 20     beqlr
       9cc:   7c a9 03 a6     mtctr   r5
       9d0:   94 89 00 04     stwu    r4,4(r9)
       9d4:   42 00 ff fc     bdnz    9d0 <memset32+0x10>
       9d8:   4e 80 00 20     blr
      
      The tail of memset() that handles lengths which are not multiples of
      4 bytes operates on individual bytes, making it unsuitable for
      filling halfwords without modification. As that would increase the
      complexity of memset(), it is better to implement memset16() from
      scratch. This also allows a more optimised memset16() than we would
      get by reusing memset().
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
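      For reference, this is the generic C form of memset32() (the version in
      lib/string.c) whose ppc32 code generation is quoted above; the new optimised
      memset16() implements the same semantics for halfwords in assembly.

       #include <stdint.h>
       #include <stddef.h>

       /* Generic memset32(): fill 'count' 32-bit words with 'v'.  On ppc32 the
        * compiler turns this into a tight mtctr/stwu/bdnz loop like the listing
        * shown in the commit message. */
       void *memset32(uint32_t *s, uint32_t v, size_t count)
       {
           uint32_t *p = s;

           while (count--)
               *p++ = v;
           return s;
       }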
    • powerpc: Emulate load/store floating point as integer word instructions · d2b65ac6
      Paul Mackerras authored
      This adds emulation for the lfiwax, lfiwzx and stfiwx instructions.
      This necessitated adding a new flag to indicate whether a floating
      point or an integer conversion was needed for LOAD_FP and STORE_FP,
      so this moves the size field in op->type up 4 bits.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
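      A sketch of the kind of encoding change described above: the op->type word
      packs an operation class, flags, and an access size, and making room for a new
      FP-vs-integer-word conversion flag means shifting the size field up by 4 bits.
      The bit positions and macro names below are illustrative assumptions, not the
      actual arch/powerpc sstep.h layout.

       #include <stdint.h>

       /* Illustrative layout only -- not the real sstep.h encoding. */
       #define OP_CLASS_MASK   0xffu                  /* e.g. LOAD_FP, STORE_FP       */
       #define FP_INT_CONV     0x100u                 /* hypothetical new flag         */
       #define SIZE(n)         ((uint32_t)(n) << 12)  /* size field, moved up 4 bits   */
       #define GETSIZE(t)      (((t) >> 12) & 0x1f)

       static inline uint32_t mk_fp_load(uint32_t op_class, unsigned int size,
                                         int int_conv)
       {
           return (op_class & OP_CLASS_MASK) | SIZE(size) |
                  (int_conv ? FP_INT_CONV : 0);
       }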
    • powerpc: Separate out load/store emulation into its own function · a53d5182
      Paul Mackerras authored
      This moves the parts of emulate_step() that deal with emulating
      load and store instructions into a new function called
      emulate_loadstore().  This is to make it possible to reuse this
      code in the alignment handler.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Handle opposite-endian processes in emulation code · d955189a
      Paul Mackerras authored
      This adds code to the load and store emulation code to byte-swap
      the data appropriately when the process being emulated is set to
      the opposite endianness to that of the kernel.
      
      This also enables the emulation for the multiple-register loads
      and stores (lmw, stmw, lswi, stswi, lswx, stswx) to work for
      little-endian.  In little-endian mode, the partial word at the
      end of a transfer for lsw*/stsw* (when the byte count is not a
      multiple of 4) is loaded/stored at the least-significant end of
      the register.  Additionally, this fixes a bug in the previous
      code in that it could call read_mem/write_mem with a byte count
      that was not 1, 2, 4 or 8.
      
      Note that this only works correctly on processors with "true"
      little-endian mode, such as IBM POWER processors from POWER6 on, not
      the so-called "PowerPC" little-endian mode that uses address swizzling
      as implemented on the old 32-bit 603, 604, 740/750, 74xx CPUs.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
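      A userspace illustration of the byte-swapping step: when the emulated context
      runs in the opposite endianness to the kernel, the bytes read from memory are
      reversed before being placed in the register image (and symmetrically on the
      store side). Only the 2-, 4- and 8-byte cases are handled here; the helper
      names are illustrative, not the sstep.c ones.

       #include <stdint.h>
       #include <stdbool.h>
       #include <string.h>

       /* Reverse 'len' bytes in place; len is expected to be 2, 4 or 8. */
       static void byterev(unsigned char *p, unsigned int len)
       {
           for (unsigned int i = 0; i < len / 2; i++) {
               unsigned char t = p[i];

               p[i] = p[len - 1 - i];
               p[len - 1 - i] = t;
           }
       }

       /* Load 'len' bytes from 'mem', swapping them if the emulated process is the
        * opposite endianness to the kernel, and return them as a register image
        * (placed in the first 'len' bytes of the image, host layout). */
       static uint64_t emul_load(const void *mem, unsigned int len, bool cross_endian)
       {
           unsigned char buf[sizeof(uint64_t)] = { 0 };
           uint64_t val = 0;

           memcpy(buf, mem, len);
           if (cross_endian)
               byterev(buf, len);
           memcpy(&val, buf, len);
           return val;
       }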
    • powerpc: Emulate the dcbz instruction · b2543f7b
      Paul Mackerras authored
      This adds code to analyse_instr() and emulate_step() to understand the
      dcbz (data cache block zero) instruction.  The emulate_dcbz() function
      is made public so it can be used by the alignment handler in future.
      (The apparently unnecessary cropping of the address to 32 bits is
      there because it will be needed in that situation.)
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
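      The operation itself is easy to picture: dcbz zeroes the whole cache block
      containing the effective address, so the emulation rounds the EA down to a
      block boundary and clears block-size bytes. A hedged userspace sketch (the real
      code also applies the 32-bit cropping mentioned above; the block size would be
      the CPU's L1 dcache block size, 128 bytes on recent POWER CPUs):

       #include <stdint.h>
       #include <string.h>

       /* Illustrative dcbz emulation: zero the cache block containing 'addr'.
        * 'block_size' must be a power of two. */
       static void emulate_dcbz(void *addr, unsigned int block_size)
       {
           uintptr_t ea = (uintptr_t)addr & ~((uintptr_t)block_size - 1);

           memset((void *)ea, 0, block_size);
       }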
    • powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live · c22435a5
      Paul Mackerras authored
      At present, the analyse_instr/emulate_step code checks for the
      relevant MSR_FP/VEC/VSX bit being set when a FP/VMX/VSX load
      or store is decoded, but doesn't recheck the bit before reading or
      writing the relevant FP/VMX/VSX register in emulate_step().
      
      Since we don't have preemption disabled, it is possible that we get
      preempted between checking the MSR bit and doing the register access.
      If that happened, then the registers would have been saved to the
      thread_struct for the current process.  Accesses to the CPU registers
      would then potentially read stale values, or write values that would
      never be seen by the user process.
      
      Another way that the registers can become non-live is if a page
      fault occurs when accessing user memory, and the page fault code
      calls a copy routine that wants to use the VMX or VSX registers.
      
      To fix this, the code for all the FP/VMX/VSX loads gets restructured
      so that it forms an image in a local variable of the desired register
      contents, then disables preemption, checks the MSR bit and either
      sets the CPU register or writes the value to the thread struct.
      Similarly, the code for stores checks the MSR bit, copies either the
      CPU register or the thread struct to a local variable, then reenables
      preemption and then copies the register image to memory.
      
      If the instruction being emulated is in the kernel, then we must not
      use the register values in the thread_struct.  In this case, if the
      relevant MSR enable bit is not set, then emulate_step refuses to
      emulate the instruction.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
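      A schematic, userspace-style rendering of the restructured load path described
      above. preempt_disable()/preempt_enable(), the MSR check and the live/saved
      register arrays are all simplified stand-ins for the kernel facilities, so this
      only shows the ordering: form the value in a local image first, then commit it
      inside the no-preempt window.

       #include <stdint.h>
       #include <stdbool.h>

       /* Stand-ins for kernel facilities -- illustrative only. */
       static void preempt_disable(void) { }
       static void preempt_enable(void)  { }
       static bool msr_fp_enabled(void)  { return true; }  /* stands in for the MSR_FP test */

       static uint64_t live_fpr[32];   /* "CPU register" stand-in          */
       static uint64_t saved_fpr[32];  /* thread_struct save-area stand-in */

       /* Load path: the register image is built first; only the final commit runs
        * with preemption disabled, going to the live register or the saved state. */
       static int emulate_fp_load(int rn, uint64_t image, bool in_kernel)
       {
           int err = 0;

           preempt_disable();
           if (msr_fp_enabled())
               live_fpr[rn] = image;   /* registers are live on this CPU */
           else if (!in_kernel)
               saved_fpr[rn] = image;  /* not live: update the saved thread state */
           else
               err = -1;               /* in-kernel with facility disabled: refuse */
           preempt_enable();
           return err;
       }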
    • powerpc/64: Fix update forms of loads and stores to write 64-bit EA · d120cdbc
      Paul Mackerras authored
      When a 64-bit processor is executing in 32-bit mode, the update forms
      of load and store instructions are required by the architecture to
      write the full 64-bit effective address into the RA register, though
      only the bottom 32 bits are used to address memory.  Currently,
      the instruction emulation code writes the truncated address to the
      RA register.  This fixes it by keeping the full 64-bit EA in the
      instruction_op structure, truncating the address in emulate_step()
      where it is used to address memory, rather than in the address
      computations in analyse_instr().
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
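      A minimal illustration of the distinction the fix makes, with a simplified
      stand-in for the instruction_op structure: the full 64-bit EA is what gets
      recorded (and is the value an update form writes back to RA), and the
      truncation to 32 bits happens only where the address is actually used to
      access memory.

       #include <stdint.h>
       #include <stdbool.h>

       struct emu_op { uint64_t ea; };   /* simplified stand-in for instruction_op */

       /* Analysis step: always record the full 64-bit effective address. */
       static void record_ea(struct emu_op *op, uint64_t ra_val, int64_t disp)
       {
           op->ea = ra_val + (uint64_t)disp;
       }

       /* Emulation step: truncate only when the EA is used to address memory
        * while executing in 32-bit mode. */
       static uint64_t ea_for_access(const struct emu_op *op, bool is_32bit)
       {
           return is_32bit ? (uint64_t)(uint32_t)op->ea : op->ea;
       }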
    • powerpc: Handle most loads and stores in instruction emulation code · 350779a2
      Paul Mackerras authored
      This extends the instruction emulation infrastructure in sstep.c to
      handle all the load and store instructions defined in the Power ISA
      v3.0, except for the atomic memory operations, ldmx (which was never
      implemented), lfdp/stfdp, and the vector element load/stores.
      
      The instructions added are:
      
      Integer loads and stores: lbarx, lharx, lqarx, stbcx., sthcx., stqcx.,
      lq, stq.
      
      VSX loads and stores: lxsiwzx, lxsiwax, stxsiwx, lxvx, lxvl, lxvll,
      lxvdsx, lxvwsx, stxvx, stxvl, stxvll, lxsspx, lxsdx, stxsspx, stxsdx,
      lxvw4x, lxsibzx, lxvh8x, lxsihzx, lxvb16x, stxvw4x, stxsibx, stxvh8x,
      stxsihx, stxvb16x, lxsd, lxssp, lxv, stxsd, stxssp, stxv.
      
      These instructions are handled both in the analyse_instr phase and in
      the emulate_step phase.
      
      The code for lxvd2ux and stxvd2ux has been taken out, as those
      instructions were never implemented in any processor and have been
      taken out of the architecture, and their opcodes have been reused for
      other instructions in POWER9 (lxvb16x and stxvb16x).
      
      The emulation for the VSX loads and stores uses helper functions
      which don't access registers or memory directly, which can hopefully
      be reused by KVM later.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Change analyse_instr so it doesn't modify *regs · 3cdfcbfd
      Paul Mackerras authored
      The analyse_instr function currently doesn't just work out what an
      instruction does, it also executes those instructions whose effect
      is only to update CPU registers that are stored in struct pt_regs.
      This is undesirable because optprobes uses analyse_instr to work out
      if an instruction could be successfully emulated in future.
      
      This changes analyse_instr so it doesn't modify *regs; instead it
      stores information in the instruction_op structure to indicate what
      registers (GPRs, CR, XER, LR) would be set and what value they would
      be set to.  A companion function called emulate_update_regs() can
      then use that information to update a pt_regs struct appropriately.
      
      As a minor cleanup, this replaces inline asm using the cntlzw and
      cntlzd instructions with calls to __builtin_clz() and __builtin_clzl().
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
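      On the cntlzw/cntlzd cleanup mentioned last: those instructions are defined to
      return 32 or 64 for a zero input, whereas __builtin_clz()/__builtin_clzl() are
      undefined for 0, so an equivalent helper needs a zero check. A hedged sketch
      (using __builtin_clzll for the 64-bit case for portability; the commit itself
      uses __builtin_clzl, which is 64-bit on ppc64):

       #include <stdint.h>

       /* Count leading zeros with the cntlzw/cntlzd convention for zero inputs. */
       static inline unsigned int clz32(uint32_t v)
       {
           return v ? (unsigned int)__builtin_clz(v) : 32;
       }

       static inline unsigned int clz64(uint64_t v)
       {
           return v ? (unsigned int)__builtin_clzll(v) : 64;
       }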
    • KVM: update to new mmu_notifier semantic v2 · fb1522e0
      Jérôme Glisse authored
      Calls to mmu_notifier_invalidate_page() were replaced by calls to
      mmu_notifier_invalidate_range() and are now bracketed by calls to
      mmu_notifier_invalidate_range_start()/end().
      
      Remove now useless invalidate_page callback.
      
      Changed since v1 (Linus Torvalds)
          - remove now useless kvm_arch_mmu_notifier_invalidate_page()
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Tested-by: Adam Borowski <kilobyte@angband.pl>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 31 Aug, 2017 (19 commits)
  7. 29 Aug, 2017 (2 commits)