1. 08 1月, 2015 3 次提交
    • G
      ARM: 8253/1: mm: use phys_addr_t type in map_lowmem() for kernel mem region · ac084688
      Grygorii Strashko 提交于
      Now local variables kernel_x_start and kernel_x_end defined using
      'unsigned long' type which is wrong because they represent physical
      memory range and will be calculated wrongly if LPAE is enabled.
      As result, all following code in map_lowmem() will not work correctly.
      
      For example, Keystone 2 boot is broken because
       kernel_x_start == 0x0000 0000
       kernel_x_end   == 0x0080 0000
      
      instead of
       kernel_x_start == 0x0000 0008 0000 0000
       kernel_x_end   == 0x0000 0008 0080 0000
      and as result whole low memory will be mapped with MT_MEMORY_RW
      permissions by code (start > kernel_x_end):
      		} else if (start >= kernel_x_end) {
      			map.pfn = __phys_to_pfn(start);
      			map.virtual = __phys_to_virt(start);
      			map.length = end - start;
      			map.type = MT_MEMORY_RW;
      
      			create_mapping(&map);
      		}
      
      Hence, fix it by using phys_addr_t type for variables kernel_x_start
      and kernel_x_end.
      Tested-by: NMurali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: NGrygorii Strashko <grygorii.strashko@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      ac084688
    • M
      ARM: 8249/1: mm: dump: don't skip regions · cca547e9
      Mark Rutland 提交于
      Currently the arm page table dumping code starts dumping page tables
      from USER_PGTABLES_CEILING. This is unnecessary for skipping any entries
      related to userspace as the swapper_pg_dir does not contain such
      entries, and results in a couple of unfortuante side effects.
      
      Firstly, any kernel mappings which might exist below
      USER_PGTABLES_CEILING will not be accounted in the dump output. This
      masks any entries erroneously created below this address.
      
      Secondly, if the final page table entry walked is part of a valid
      mapping the page table dumping code will not log the region this entry
      is part of, as the final note_page call in walk_pgd will trigger an
      early return when 0 < USER_PGTABLES_CEILING. Luckily this isn't seen on
      contemporary systems as they typically don't have enough RAM to extend
      the linear mapping right to the end of the address space.
      
      Due to the way addr is constructed in the walk_* functions, it can never
      be less than USER_PGTABLES_CEILING when walking the page tables, so it
      is not necessary to avoid dereferencing invalid table addresses. The
      existing checks for st->current_prot and st->marker[1].start_address are
      sufficient to ensure we will not print and/or dereference garbage when
      trying to log information.
      
      This patch removes both problematic uses of USER_PGTABLES_CEILING from
      the arm page table dumping code, preventing both of these issues. We
      will now report any low mappings, and the final note_page call will not
      return early, ensuring all regions are logged.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      cca547e9
    • R
      ARM: wire up execveat syscall · 841ee230
      Russell King 提交于
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      841ee230
  2. 20 12月, 2014 15 次提交
  3. 19 12月, 2014 2 次提交
  4. 18 12月, 2014 12 次提交
  5. 17 12月, 2014 8 次提交
    • S
      KVM: PPC: Book3S HV: Improve H_CONFER implementation · 90fd09f8
      Sam Bobroff 提交于
      Currently the H_CONFER hcall is implemented in kernel virtual mode,
      meaning that whenever a guest thread does an H_CONFER, all the threads
      in that virtual core have to exit the guest.  This is bad for
      performance because it interrupts the other threads even if they
      are doing useful work.
      
      The H_CONFER hcall is called by a guest VCPU when it is spinning on a
      spinlock and it detects that the spinlock is held by a guest VCPU that
      is currently not running on a physical CPU.  The idea is to give this
      VCPU's time slice to the holder VCPU so that it can make progress
      towards releasing the lock.
      
      To avoid having the other threads exit the guest unnecessarily,
      we add a real-mode implementation of H_CONFER that checks whether
      the other threads are doing anything.  If all the other threads
      are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
      it returns H_TOO_HARD which causes a guest exit and allows the
      H_CONFER to be handled in virtual mode.
      
      Otherwise it spins for a short time (up to 10 microseconds) to give
      other threads the chance to observe that this thread is trying to
      confer.  The spin loop also terminates when any thread exits the guest
      or when all other threads are idle or trying to confer.  If the
      timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
      guest VCPU will recheck the spinlock word and most likely call
      H_CONFER again.
      
      This also improves the implementation of the H_CONFER virtual mode
      handler.  If the VCPU is part of a virtual core (vcore) which is
      runnable, there will be a 'runner' VCPU which has taken responsibility
      for running the vcore.  In this case we yield to the runner VCPU
      rather than the target VCPU.
      
      We also introduce a check on the target VCPU's yield count: if it
      differs from the yield count passed to H_CONFER, the target VCPU
      has run since H_CONFER was called and may have already released
      the lock.  This check is required by PAPR.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      90fd09f8
    • P
      KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61
      Paul Mackerras 提交于
      There are two ways in which a guest instruction can be obtained from
      the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
      exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
      instruction), the offending instruction is in the HEIR register
      (Hypervisor Emulation Instruction Register).  If the exit was caused
      by a load or store to an emulated MMIO device, we load the instruction
      from the guest by turning data relocation on and loading the instruction
      with an lwz instruction.
      
      Unfortunately, in the case where the guest has opposite endianness to
      the host, these two methods give results of different endianness, but
      both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
      using guest endianness, whereas the lwz will load the instruction using
      host endianness.  The rest of the code that uses vcpu->arch.last_inst
      assumes it was loaded using host endianness.
      
      To fix this, we define a new vcpu field to store the HEIR value.  Then,
      in kvmppc_handle_exit_hv(), we transfer the value from this new field to
      vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
      differ.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4a157d61
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras 提交于
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c17b98cf
    • S
      KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions · 3c78f78a
      Suresh E. Warrier 提交于
      This patch adds trace points in the guest entry and exit code and also
      for exceptions handled by the host in kernel mode - hypercalls and page
      faults. The new events are added to /sys/kernel/debug/tracing/events
      under a new subsystem called kvm_hv.
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3c78f78a
    • P
      KVM: PPC: Book3S HV: Simplify locking around stolen time calculations · 2711e248
      Paul Mackerras 提交于
      Currently the calculations of stolen time for PPC Book3S HV guests
      uses fields in both the vcpu struct and the kvmppc_vcore struct.  The
      fields in the kvmppc_vcore struct are protected by the
      vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
      running the virtual core.  This works correctly but confuses lockdep,
      because it sees that the code takes the tbacct_lock for a vcpu in
      kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
      vcore_stolen_time(), and it thinks there is a possibility of deadlock,
      causing it to print reports like this:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
      ---------------------------------------------
      qemu-system-ppc/6188 is trying to acquire lock:
       (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
      
      but task is already holding lock:
       (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&vcpu->arch.tbacct_lock)->rlock);
        lock(&(&vcpu->arch.tbacct_lock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by qemu-system-ppc/6188:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
       #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
       #2:  (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
      
      stack backtrace:
      CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
      Call Trace:
      [c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
      [c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
      [c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
      [c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
      [c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
      [c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
      [c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
      [c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
      [c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
      [c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
      [c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
      [c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98
      
      In order to make the locking easier to analyse, we change the code to
      use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
      preempt_tb fields.  This lock needs to be an irq-safe lock since it is
      used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
      functions, which are called with the scheduler rq lock held, which is
      an irq-safe lock.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      2711e248
    • R
      arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function · a0499cf7
      Rickard Strandqvist 提交于
      Remove the function inst_set_field() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: NRickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0499cf7
    • R
      arch: powerpc: kvm: book3s_pr.c: Remove unused function · 6178839b
      Rickard Strandqvist 提交于
      Remove the function get_fpr_index() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: NRickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6178839b
    • R
      arch: powerpc: kvm: book3s.c: Remove some unused functions · 54ca162a
      Rickard Strandqvist 提交于
      Removes some functions that are not used anywhere:
      kvmppc_core_load_guest_debugstate() kvmppc_core_load_host_debugstate()
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: NRickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      54ca162a