1. 21 December 2014, 1 commit
    • x86_64, vdso: Fix the vdso address randomization algorithm · 394f56fe
      Authored by Andy Lutomirski
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the meantime, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
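
      A minimal sketch of the selection logic described above (not the
      patch itself; PMD_SIZE, PMD_MASK, PAGE_SHIFT and get_random_int()
      are the usual kernel definitions, and the clamp against the task
      size limit is left out):

      /* Pick a page-aligned slot for the vdso that stays within the PMD
       * containing 'start', chosen uniformly over the allowed range. */
      static unsigned long vdso_addr_sketch(unsigned long start, unsigned long len)
      {
              unsigned long addr, end, offset;

              /* End of the PMD that contains 'start', minus room for the vdso. */
              end = (start + PMD_SIZE - 1) & PMD_MASK;
              end -= len;

              if (end > start) {
                      /* Uniform over every page-aligned address in [start, end]. */
                      offset = get_random_int() % (((end - start) >> PAGE_SHIFT) + 1);
                      addr = start + (offset << PAGE_SHIFT);
              } else {
                      addr = start;
              }
              return addr;
      }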
      
      As an easy test, doing this:
      
      for i in `seq 10000`; do
        grep -P vdso /proc/self/maps | cut -d- -f1
      done | sort | uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      394f56fe
  2. 20 December 2014, 15 commits
  3. 19 December 2014, 2 commits
  4. 18 December 2014, 12 commits
  5. 17 December 2014, 10 commits
    • KVM: PPC: Book3S HV: Improve H_CONFER implementation · 90fd09f8
      Authored by Sam Bobroff
      Currently the H_CONFER hcall is implemented in kernel virtual mode,
      meaning that whenever a guest thread does an H_CONFER, all the threads
      in that virtual core have to exit the guest.  This is bad for
      performance because it interrupts the other threads even if they
      are doing useful work.
      
      The H_CONFER hcall is called by a guest VCPU when it is spinning on a
      spinlock and it detects that the spinlock is held by a guest VCPU that
      is currently not running on a physical CPU.  The idea is to give this
      VCPU's time slice to the holder VCPU so that it can make progress
      towards releasing the lock.
      
      To avoid having the other threads exit the guest unnecessarily,
      we add a real-mode implementation of H_CONFER that checks whether
      the other threads are doing anything.  If all the other threads
      are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
      it returns H_TOO_HARD which causes a guest exit and allows the
      H_CONFER to be handled in virtual mode.
      
      Otherwise it spins for a short time (up to 10 microseconds) to give
      other threads the chance to observe that this thread is trying to
      confer.  The spin loop also terminates when any thread exits the guest
      or when all other threads are idle or trying to confer.  If the
      timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
      guest VCPU will recheck the spinlock word and most likely call
      H_CONFER again.
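
      A rough sketch of the real-mode decision described above (the helpers
      vcore_exit_count(), vcore_threads_idle_or_conferring() and
      vcore_threads_running() are placeholders for the vcore bookkeeping the
      real handler consults; get_tb() and tb_ticks_per_usec are the usual
      powerpc timebase helpers):

      static long h_confer_rm_sketch(struct kvmppc_vcore *vc)
      {
              u64 stop = get_tb() + 10 * tb_ticks_per_usec;   /* ~10 microseconds */
              long ret = H_SUCCESS;   /* on timeout the guest rechecks the lock word */

              while (get_tb() < stop && vcore_exit_count(vc) == 0) {
                      /* All other threads are in H_CEDE or H_CONFER, so a real
                       * yield is worthwhile: punt to the virtual-mode handler. */
                      if (vcore_threads_idle_or_conferring(vc) >=
                          vcore_threads_running(vc)) {
                              ret = H_TOO_HARD;
                              break;
                      }
              }
              return ret;
      }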
      
      This also improves the implementation of the H_CONFER virtual mode
      handler.  If the VCPU is part of a virtual core (vcore) which is
      runnable, there will be a 'runner' VCPU which has taken responsibility
      for running the vcore.  In this case we yield to the runner VCPU
      rather than the target VCPU.
      
      We also introduce a check on the target VCPU's yield count: if it
      differs from the yield count passed to H_CONFER, the target VCPU
      has run since H_CONFER was called and may have already released
      the lock.  This check is required by PAPR.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      90fd09f8
    • KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61
      Authored by Paul Mackerras
      There are two ways in which a guest instruction can be obtained from
      the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
      exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
      instruction), the offending instruction is in the HEIR register
      (Hypervisor Emulation Instruction Register).  If the exit was caused
      by a load or store to an emulated MMIO device, we load the instruction
      from the guest by turning data relocation on and loading the instruction
      with an lwz instruction.
      
      Unfortunately, in the case where the guest has opposite endianness to
      the host, these two methods give results of different endianness, but
      both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
      using guest endianness, whereas the lwz will load the instruction using
      host endianness.  The rest of the code that uses vcpu->arch.last_inst
      assumes it was loaded using host endianness.
      
      To fix this, we define a new vcpu field to store the HEIR value.  Then,
      in kvmppc_handle_exit_hv(), we transfer the value from this new field to
      vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
      differ.
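
      Roughly, the fix amounts to the following sketch (the new field is
      assumed here to be vcpu->arch.emul_inst, and kvmppc_need_byteswap()
      is taken to report a guest/host endianness mismatch):

      static void fixup_last_inst_sketch(struct kvm_vcpu *vcpu)
      {
              u32 inst = vcpu->arch.emul_inst;        /* saved HEIR, guest byte order */

              if (kvmppc_need_byteswap(vcpu))         /* guest endianness != host */
                      inst = swab32(inst);
              vcpu->arch.last_inst = inst;            /* host order, as callers expect */
      }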
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      4a157d61
    • KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Authored by Paul Mackerras
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      c17b98cf
    • KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions · 3c78f78a
      Authored by Suresh E. Warrier
      This patch adds tracepoints in the guest entry and exit code and also
      for exceptions handled by the host in kernel mode - hypercalls and page
      faults. The new events are added to /sys/kernel/debug/tracing/events
      under a new subsystem called kvm_hv.
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      3c78f78a
    • KVM: PPC: Book3S HV: Simplify locking around stolen time calculations · 2711e248
      Authored by Paul Mackerras
      Currently the calculation of stolen time for PPC Book3S HV guests
      uses fields in both the vcpu struct and the kvmppc_vcore struct.  The
      fields in the kvmppc_vcore struct are protected by the
      vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
      running the virtual core.  This works correctly but confuses lockdep,
      because it sees that the code takes the tbacct_lock for a vcpu in
      kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
      vcore_stolen_time(), and it thinks there is a possibility of deadlock,
      causing it to print reports like this:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
      ---------------------------------------------
      qemu-system-ppc/6188 is trying to acquire lock:
       (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
      
      but task is already holding lock:
       (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&vcpu->arch.tbacct_lock)->rlock);
        lock(&(&vcpu->arch.tbacct_lock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by qemu-system-ppc/6188:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
       #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
       #2:  (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
      
      stack backtrace:
      CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
      Call Trace:
      [c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
      [c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
      [c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
      [c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
      [c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
      [c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
      [c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
      [c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
      [c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
      [c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
      [c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
      [c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98
      
      In order to make the locking easier to analyse, we change the code to
      use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
      preempt_tb fields.  This lock needs to be an irq-safe lock since it is
      used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
      functions, which are called with the scheduler rq lock held, which is
      an irq-safe lock.
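
      A minimal sketch of the new scheme (the lock name stoltb_lock and the
      TB_NIL sentinel are assumptions about naming; mftb() reads the
      timebase):

      static void end_stolen_sketch(struct kvmppc_vcore *vc)
      {
              unsigned long flags;

              /* irq-safe: this path can run with the scheduler's rq lock held */
              spin_lock_irqsave(&vc->stoltb_lock, flags);
              if (vc->preempt_tb != TB_NIL) {
                      vc->stolen_tb += mftb() - vc->preempt_tb;
                      vc->preempt_tb = TB_NIL;
              }
              spin_unlock_irqrestore(&vc->stoltb_lock, flags);
      }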
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      2711e248
    • arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function · a0499cf7
      Authored by Rickard Strandqvist
      Remove the function inst_set_field() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      a0499cf7
    • arch: powerpc: kvm: book3s_pr.c: Remove unused function · 6178839b
      Authored by Rickard Strandqvist
      Remove the function get_fpr_index() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      6178839b
    • arch: powerpc: kvm: book3s.c: Remove some unused functions · 54ca162a
      Authored by Rickard Strandqvist
      Remove some functions that are not used anywhere:
      kvmppc_core_load_guest_debugstate() and kvmppc_core_load_host_debugstate().
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      54ca162a
    • arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function · 24aaaf22
      Authored by Rickard Strandqvist
      Remove the function sr_nx() that is not used anywhere.
      
      This was partially found by using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      24aaaf22
    • microblaze: Fix mmap for cache coherent memory · 3a8e3265
      Authored by Lars-Peter Clausen
      When running in a non-cache-coherent configuration, the memory that was
      allocated with dma_alloc_coherent() has a custom mapping and so there is no
      1-to-1 relationship between the kernel virtual address and the PFN. This
      means that virt_to_pfn() will not work correctly for those addresses and the
      default mmap implementation in the form of dma_common_mmap() will map some
      random, but not the requested, memory area.
      
      Fix this by providing a custom mmap implementation that looks up the PFN
      from the page table rather than using virt_to_pfn.
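
      A sketch of the approach (function names are illustrative, not the
      patch's; the walk uses the classic pgd/pud/pmd/pte accessors):

      /* Resolve the PFN behind a dma_alloc_coherent() address by walking
       * the kernel page tables instead of using virt_to_pfn(). */
      static unsigned long coherent_virt_to_pfn_sketch(void *vaddr)
      {
              unsigned long addr = (unsigned long)vaddr;
              pgd_t *pgd = pgd_offset_k(addr);
              pud_t *pud = pud_offset(pgd, addr);
              pmd_t *pmd = pmd_offset(pud, addr);
              pte_t *pte = pte_offset_kernel(pmd, addr);

              return pte_pfn(*pte);
      }

      /* Custom mmap: remap the looked-up PFN into the user VMA. */
      static int dma_mmap_sketch(struct vm_area_struct *vma, void *cpu_addr)
      {
              unsigned long pfn = coherent_virt_to_pfn_sketch(cpu_addr);

              return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
                                     vma->vm_end - vma->vm_start,
                                     vma->vm_page_prot);
      }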
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Michal Simek <michal.simek@xilinx.com>
      3a8e3265