1. 15 9月, 2012 3 次提交
    • O
      ptrace/x86: Partly fix set_task_blockstep()->update_debugctlmsr() logic · 95cf00fa
      Oleg Nesterov 提交于
      Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
      step.c was always very wrong.
      
      1. update_debugctlmsr() was simply unneeded. The child sleeps
         TASK_TRACED, __switch_to_xtra(next_p => child) should notice
         TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
         needed.
      
      2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
         should always match the state of current's TIF_BLOCKSTEP bit.
      
      3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
         look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
         register or the caller can be preempted in between.
      
      4. It is not safe to play with TIF_BLOCKSTEP if task != current.
         DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
         other if the task is running. The tracee is stopped but it
         can be SIGKILL'ed right before set/clear_tsk_thread_flag().
      
      However, now that uprobes uses user_enable_single_step(current)
      we can't simply remove update_debugctlmsr(). So this patch adds
      the additional "task == current" check and disables irqs to avoid
      the race with interrupts/preemption.
      
      Unfortunately this patch doesn't solve the last problem, we need
      another fix. Probably we should teach ptrace_stop() to set/clear
      single/block stepping after resume.
      
      And afaics there is yet another problem: perf can play with
      MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
      __switch_to_xtra() has problems.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      95cf00fa
    • O
      ptrace/x86: Introduce set_task_blockstep() helper · 848e8f5f
      Oleg Nesterov 提交于
      No functional changes, preparation for the next fix and for uprobes
      single-step fixes.
      
      Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the
      new helper, set_task_blockstep().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      848e8f5f
    • S
      uprobes/x86: Implement x86 specific arch_uprobe_*_step · bdc1e472
      Sebastian Andrzej Siewior 提交于
      The arch specific implementation behaves like user_enable_single_step()
      except that it does not disable single stepping if it was already
      enabled by ptrace. This allows the debugger to single step over an
      uprobe. The state of block stepping is not restored. It makes only sense
      together with TF and if that was enabled then the debugger is notified.
      
      Note: this is still not correct. For example, TIF_SINGLESTEP check
      is not right, the application itself can set X86_EFLAGS_TF. And otoh
      we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf".
      See the next patches, we need the changes in arch/x86/kernel/step.c
      first.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      bdc1e472
  2. 27 8月, 2012 1 次提交
  3. 23 8月, 2012 4 次提交
    • S
      ftrace/x86: Add support for -mfentry to x86_64 · d57c5d51
      Steven Rostedt 提交于
      If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
      then use that instead of mcount.
      
      With mcount, frame pointers are forced with the -pg option and we
      get something like:
      
      <can_vma_merge_before>:
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             53                      push   %rbx
             41 51                   push   %r9
             e8 fe 6a 39 00          callq  ffffffff81483d00 <mcount>
             31 c0                   xor    %eax,%eax
             48 89 fb                mov    %rdi,%rbx
             48 89 d7                mov    %rdx,%rdi
             48 33 73 30             xor    0x30(%rbx),%rsi
             48 f7 c6 ff ff ff f7    test   $0xfffffffff7ffffff,%rsi
      
      With -mfentry, frame pointers are no longer forced and the call looks
      like this:
      
      <can_vma_merge_before>:
             e8 33 af 37 00          callq  ffffffff81461b40 <__fentry__>
             53                      push   %rbx
             48 89 fb                mov    %rdi,%rbx
             31 c0                   xor    %eax,%eax
             48 89 d7                mov    %rdx,%rdi
             41 51                   push   %r9
             48 33 73 30             xor    0x30(%rbx),%rsi
             48 f7 c6 ff ff ff f7    test   $0xfffffffff7ffffff,%rsi
      
      This adds the ftrace hook at the beginning of the function before a
      frame is set up, and allows the function callbacks to be able to access
      parameters. As kprobes now can use function tracing (at least on x86)
      this speeds up the kprobe hooks that are at the beginning of the
      function.
      
      Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.orgAcked-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d57c5d51
    • K
      xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M. · c96aae1f
      Konrad Rzeszutek Wilk 提交于
      When we are finished with return PFNs to the hypervisor, then
      populate it back, and also mark the E820 MMIO and E820 gaps
      as IDENTITY_FRAMEs, we then call P2M to set areas that can
      be used for ballooning. We were off by one, and ended up
      over-writting a P2M entry that most likely was an IDENTITY_FRAME.
      For example:
      
      1-1 mapping on 40000->40200
      1-1 mapping on bc558->bc5ac
      1-1 mapping on bc5b4->bc8c5
      1-1 mapping on bc8c6->bcb7c
      1-1 mapping on bcd00->100000
      Released 614 pages of unused memory
      Set 277889 page(s) to 1-1 mapping
      Populating 40200-40466 pfn range: 614 pages added
      
      => here we set from 40466 up to bc559 P2M tree to be
      INVALID_P2M_ENTRY. We should have done it up to bc558.
      
      The end result is that if anybody is trying to construct
      a PTE for PFN bc558 they end up with ~PAGE_PRESENT.
      
      CC: stable@vger.kernel.org
      Reported-by-and-Tested-by: NAndre Przywara <andre.przywara@amd.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c96aae1f
    • A
      x86, microcode, AMD: Fix broken ucode patch size check · 36bf50d7
      Andreas Herrmann 提交于
      This issue was recently observed on an AMD C-50 CPU where a patch of
      maximum size was applied.
      
      Commit be62adb4 ("x86, microcode, AMD: Simplify ucode verification")
      added current_size in get_matching_microcode(). This is calculated as
      size of the ucode patch + 8 (ie. size of the header). Later this is
      compared against the maximum possible ucode patch size for a CPU family.
      And of course this fails if the patch has already maximum size.
      
      Cc: <stable@vger.kernel.org> [3.3+]
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      36bf50d7
    • A
      KVM: x86 emulator: use stack size attribute to mask rsp in stack ops · 5ad105e5
      Avi Kivity 提交于
      The sub-register used to access the stack (sp, esp, or rsp) is not
      determined by the address size attribute like other memory references,
      but by the stack segment's B bit (if not in x86_64 mode).
      
      Fix by using the existing stack_mask() to figure out the correct mask.
      
      This long-existing bug was exposed by a combination of a27685c3
      (emulate invalid guest state by default), which causes many more
      instructions to be emulated, and a seabios change (possibly a bug) which
      causes the high 16 bits of esp to become polluted across calls to real
      mode software interrupts.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      5ad105e5
  4. 22 8月, 2012 5 次提交
    • T
      KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended · 35f2d16b
      Takuya Yoshikawa 提交于
      Although the possible race described in
      
        commit 85b70591
        KVM: MMU: fix shrinking page from the empty mmu
      
      was correct, the real cause of that issue was a more trivial bug of
      mmu_shrink() introduced by
      
        commit 19526396
        KVM: MMU: do not iterate over all VMs in mmu_shrink()
      
      Here is the bug:
      
      	if (kvm->arch.n_used_mmu_pages > 0) {
      		if (!nr_to_scan--)
      			break;
      		continue;
      	}
      
      We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
      in other words we try to shrink empty ones by mistake.
      
      This patch reverses the logic so that mmu_shrink() can free pages from
      the first VM whose n_used_mmu_pages is not zero.  Note that we also add
      comments explaining the role of nr_to_scan which is not practically
      important now, hoping this will be improved in the future.
      Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      35f2d16b
    • A
      x86/alternatives: Fix p6 nops on non-modular kernels · cb09cad4
      Avi Kivity 提交于
      Probably a leftover from the early days of self-patching, p6nops
      are marked __initconst_or_module, which causes them to be
      discarded in a non-modular kernel.  If something later triggers
      patching, it will overwrite kernel code with garbage.
      Reported-by: NTomas Racek <tracek@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Cc: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: qemu-devel@nongnu.org
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb09cad4
    • L
      x86/fixup_irq: Use cpu_online_mask instead of cpu_all_mask · 2530cd4f
      Liu, Chuansheng 提交于
      When one CPU is going down and this CPU is the last one in irq
      affinity, current code is setting cpu_all_mask as the new
      affinity for that irq.
      
      But for some systems (such as in Medfield Android mobile) the
      firmware sends the interrupt to each CPU in the irq affinity
      mask, averaged, and cpu_all_mask includes all potential CPUs,
      i.e. offline ones as well.
      
      So replace cpu_all_mask with cpu_online_mask.
      Signed-off-by: Nliu chuansheng <chuansheng.liu@intel.com>
      Acked-by: NYanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2530cd4f
    • R
      x86/spinlocks: Fix comment in spinlock.h · 83be4ffa
      Richard Weinberger 提交于
      This comment is no longer true.  We support up to 2^16 CPUs
      because __ticket_t is an u16 if NR_CPUS is larger than 256.
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      83be4ffa
    • M
      mm: hugetlbfs: correctly populate shared pmd · eb48c071
      Michal Hocko 提交于
      Each page mapped in a process's address space must be correctly
      accounted for in _mapcount.  Normally the rules for this are
      straightforward but hugetlbfs page table sharing is different.  The page
      table pages at the PMD level are reference counted while the mapcount
      remains the same.
      
      If this accounting is wrong, it causes bugs like this one reported by
      Larry Woodman:
      
        kernel BUG at mm/filemap.c:135!
        invalid opcode: 0000 [#1] SMP
        CPU 22
        Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
        Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
        RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
        Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
        Call Trace:
          delete_from_page_cache+0x40/0x80
          truncate_hugepages+0x115/0x1f0
          hugetlbfs_evict_inode+0x18/0x30
          evict+0x9f/0x1b0
          iput_final+0xe3/0x1e0
          iput+0x3e/0x50
          d_kill+0xf8/0x110
          dput+0xe2/0x1b0
          __fput+0x162/0x240
      
      During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
      shared page tables with the check dst_pte == src_pte.  The logic is if
      the PMD page is the same, they must be shared.  This assumes that the
      sharing is between the parent and child.  However, if the sharing is
      with a different process entirely then this check fails as in this
      diagram:
      
        parent
          |
          ------------>pmd
                       src_pte----------> data page
                                              ^
        other--------->pmd--------------------|
                        ^
        child-----------|
                       dst_pte
      
      For this situation to occur, it must be possible for Parent and Other to
      have faulted and failed to share page tables with each other.  This is
      possible due to the following style of race.
      
        PROC A                                          PROC B
        copy_hugetlb_page_range                         copy_hugetlb_page_range
          src_pte == huge_pte_offset                      src_pte == huge_pte_offset
          !src_pte so no sharing                          !src_pte so no sharing
      
        (time passes)
      
        hugetlb_fault                                   hugetlb_fault
          huge_pte_alloc                                  huge_pte_alloc
            huge_pmd_share                                 huge_pmd_share
              LOCK(i_mmap_mutex)
              find nothing, no sharing
              UNLOCK(i_mmap_mutex)
                                                            LOCK(i_mmap_mutex)
                                                            find nothing, no sharing
                                                            UNLOCK(i_mmap_mutex)
            pmd_alloc                                       pmd_alloc
            LOCK(instantiation_mutex)
            fault
            UNLOCK(instantiation_mutex)
                                                        LOCK(instantiation_mutex)
                                                        fault
                                                        UNLOCK(instantiation_mutex)
      
      These two processes are not poing to the same data page but are not
      sharing page tables because the opportunity was missed.  When either
      process later forks, the src_pte == dst pte is potentially insufficient.
      As the check falls through, the wrong PTE information is copied in
      (harmless but wrong) and the mapcount is bumped for a page mapped by a
      shared page table leading to the BUG_ON.
      
      This patch addresses the issue by moving pmd_alloc into huge_pmd_share
      which guarantees that the shared pud is populated in the same critical
      section as pmd.  This also means that huge_pte_offset test in
      huge_pmd_share is serialized correctly now which in turn means that the
      success of the sharing will be higher as the racing tasks see the pud
      and pmd populated together.
      
      Race identified and changelog written mostly by Mel Gorman.
      
      {akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
      Reported-by: NLarry Woodman <lwoodman@redhat.com>
      Tested-by: NLarry Woodman <lwoodman@redhat.com>
      Reviewed-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Ken Chen <kenchen@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb48c071
  5. 19 8月, 2012 1 次提交
    • M
      x32: Use compat shims for {g,s}etsockopt · 515c7af8
      Mike Frysinger 提交于
      Some of the arguments to {g,s}etsockopt are passed in userland pointers.
      If we try to use the 64bit entry point, we end up sometimes failing.
      
      For example, dhcpcd doesn't run in x32:
      	# dhcpcd eth0
      	dhcpcd[1979]: version 5.5.6 starting
      	dhcpcd[1979]: eth0: broadcasting for a lease
      	dhcpcd[1979]: eth0: open_socket: Invalid argument
      	dhcpcd[1979]: eth0: send_raw_packet: Bad file descriptor
      
      The code in particular is getting back EINVAL when doing:
      	struct sock_fprog pf;
      	setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf));
      
      Diving into the kernel code, we can see:
      include/linux/filter.h:
      	struct sock_fprog {
      		unsigned short len;
      		struct sock_filter __user *filter;
      	};
      
      net/core/sock.c:
      	case SO_ATTACH_FILTER:
      		ret = -EINVAL;
      		if (optlen == sizeof(struct sock_fprog)) {
      			struct sock_fprog fprog;
      
      			ret = -EFAULT;
      			if (copy_from_user(&fprog, optval, sizeof(fprog)))
      				break;
      
      			ret = sk_attach_filter(&fprog, sk);
      		}
      		break;
      
      arch/x86/syscalls/syscall_64.tbl:
      	54 common setsockopt sys_setsockopt
      	55 common getsockopt sys_getsockopt
      
      So for x64, sizeof(sock_fprog) is 16 bytes.  For x86/x32, it's 8 bytes.
      This comes down to the pointer being 32bit for x32, which means we need
      to do structure size translation.  But since x32 comes in directly to
      sys_setsockopt, it doesn't get translated like x86.
      
      After changing the syscall table and rebuilding glibc with the new kernel
      headers, dhcp runs fine in an x32 userland.
      
      Oddly, it seems like Linus noted the same thing during the initial port,
      but I guess that was missed/lost along the way:
      	https://lkml.org/lkml/2011/8/26/452
      
      [ hpa: tagging for -stable since this is an ABI fix. ]
      
      Bugzilla: https://bugs.gentoo.org/423649Reported-by: NMads <mads@ab3.no>
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Link: http://lkml.kernel.org/r/1345320697-15713-1-git-send-email-vapier@gentoo.org
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Cc: <stable@vger.kernel.org> v3.4..v3.5
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      515c7af8
  6. 17 8月, 2012 2 次提交
  7. 15 8月, 2012 2 次提交
  8. 14 8月, 2012 4 次提交
  9. 11 8月, 2012 1 次提交
  10. 10 8月, 2012 4 次提交
    • J
      perf: Add ability to attach user stack dump to sample · c5ebcedb
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
      of the user level stack on sample. The size of the dump is specified by
      sample_stack_user value.
      
      Being able to dump parts of the user stack, starting from the stack
      pointer, will be useful to make a post mortem dwarf CFI based stack
      unwinding.
      
      Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
      architecture provides user stack dump on perf event samples.  This needs
      access to the user stack pointer which is not unified across
      architectures. Enabling this for x86 architecture.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c5ebcedb
    • F
      perf: Factor __output_copy to be usable with specific copy function · 91d7753a
      Frederic Weisbecker 提交于
      Adding a generic way to use __output_copy function with specific copy
      function via DEFINE_PERF_OUTPUT_COPY macro.
      
      Using this to add new __output_copy_user function, that provides output
      copy from user pointers. For x86 the copy_from_user_nmi function is used
      and __copy_from_user_inatomic for the rest of the architectures.
      
      This new function will be used in user stack dump on sample, coming in
      next patches.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.comSigned-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91d7753a
    • J
      perf: Add ability to attach user level registers dump to sample · 4018994f
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
      user level registers on sample. Registers we want to dump are specified
      by sample_regs_user bitmask.
      
      Only user level registers are dumped at the moment. Meaning the register
      values of the user space context as it was before the user entered the
      kernel for whatever reason (syscall, irq, exception, or a PMI happening
      in userspace).
      
      The layout of the sample_regs_user bitmap is described in
      asm/perf_regs.h for archs that support register dump.
      
      This is going to be useful to bring Dwarf CFI based stack unwinding on
      top of samples.
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      [ Dump registers ABI specification. ]
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Suggested-by: NStephane Eranian <eranian@google.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4018994f
    • J
      perf: Unified API to record selective sets of arch registers · c5e63197
      Jiri Olsa 提交于
      This brings a new API to help the selective dump of registers on event
      sampling, and its implementation for x86 arch.
      
      Added HAVE_PERF_REGS config option to determine if the architecture
      provides perf registers ABI.
      
      The information about desired registers will be passed in u64 mask.
      It's up to the architecture to map the registers into the mask bits.
      
      For the x86 arch implementation, both 32 and 64 bit registers bits are
      defined within single enum to ensure 64 bit system can provide register
      dump for compat task if needed in the future.
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      [ Added missing linux/errno.h include ]
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-2-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c5e63197
  11. 09 8月, 2012 1 次提交
  12. 05 8月, 2012 1 次提交
  13. 03 8月, 2012 1 次提交
  14. 02 8月, 2012 5 次提交
    • K
      xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. · 5bc6f988
      Konrad Rzeszutek Wilk 提交于
      When we release pages back during bootup:
      
      Freeing  9d-100 pfn range: 99 pages freed
      Freeing  9cf36-9d0d2 pfn range: 412 pages freed
      Freeing  9f6bd-9f6bf pfn range: 2 pages freed
      Freeing  9f714-9f7bf pfn range: 171 pages freed
      Freeing  9f7e0-9f7ff pfn range: 31 pages freed
      Freeing  9f800-100000 pfn range: 395264 pages freed
      Released 395979 pages of unused memory
      
      We then try to populate those pages back. In the P2M tree however
      the space for those leafs must be reserved - as such we use extend_brk.
      We reserve 8MB of _brk space, which means we can fit over
      1048576 PFNs - which is more than we should ever need.
      
      Without this, on certain compilation of the kernel we would hit:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff818aad3b>]
      (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81a7c000   rbx: 000000000000003d   rcx: 0000000000001000
      (XEN) rdx: ffffffff81a7b000   rsi: 0000000000001000   rdi: 0000000000001000
      (XEN) rbp: ffffffff81801cd8   rsp: ffffffff81801c98   r8:  0000000000100000
      (XEN) r9:  ffffffff81a7a000   r10: 0000000000000001   r11: 0000000000000003
      (XEN) r12: 0000000000000004   r13: 0000000000000004   r14: 000000000000003d
      (XEN) r15: 00000000000001e8   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000125803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801c98:
      
      .. which is extend_brk hitting a BUG_ON.
      
      Interestingly enough, most of the time we are not going to hit this
      b/c the _brk space is quite large (v3.5):
       ffffffff81a25000 B __brk_base
       ffffffff81e43000 B __brk_limit
      = ~4MB.
      
      vs earlier kernels (with this back-ported), the space is smaller:
       ffffffff81a25000 B __brk_base
       ffffffff81a7b000 B __brk_limit
      = 344 kBytes.
      
      where we would certainly hit this and hit extend_brk.
      
      Note that git commit c3d93f88
      (xen: populate correct number of pages when across mem boundary (v2))
      exposed this bug).
      
      [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
      
      CC: stable@vger.kernel.org #only for 3.5
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5bc6f988
    • A
      KVM: VMX: Fix ds/es corruption on i386 with preemption · aa67f609
      Avi Kivity 提交于
      Commit b2da15ac ("KVM: VMX: Optimize %ds, %es reload") broke i386
      in the following scenario:
      
        vcpu_load
        ...
        vmx_save_host_state
        vmx_vcpu_run
        (ds.rpl, es.rpl cleared by hardware)
      
        interrupt
          push ds, es  # pushes bad ds, es
          schedule
            vmx_vcpu_put
              vmx_load_host_state
                reload ds, es (with __USER_DS)
          pop ds, es  # of other thread's stack
          iret
        # other thread runs
        interrupt
          push ds, es
          schedule  # back in vcpu thread
          pop ds, es  # now with rpl=0
          iret
        ...
        vcpu_put
        resume_userspace
        iret  # clears ds, es due to mismatched rpl
      
      (instead of resume_userspace, we might return with SYSEXIT and then
      take an exception; when the exception IRETs we end up with cleared
      ds, es)
      
      Fix by avoiding the optimization on i386 and reloading ds, es on the
      lightweight exit path.
      Reported-by: NChris Clayron <chris2553@googlemail.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      aa67f609
    • H
      x86-64, kcmp: The kcmp system call can be common · eaf4ce6c
      H. Peter Anvin 提交于
      We already use the same system call handler for i386 and x86-64, there
      is absolutely no reason x32 can't use the same system call, too.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: <stable@vger.kernel.org> v3.5
      Link: http://lkml.kernel.org/n/tip-vwzk3qbcr3yjyxjg2j38vgy9@git.kernel.org
      eaf4ce6c
    • A
      a3170d2e
    • B
      KVM: x86: apply kvmclock offset to guest wall clock time · 4b648665
      Bruce Rogers 提交于
      When a guest migrates to a new host, the system time difference from the
      previous host is used in the updates to the kvmclock system time visible
      to the guest, resulting in a continuation of correct kvmclock based guest
      timekeeping.
      
      The wall clock component of the kvmclock provided time is currently not
      updated with this same time offset. Since the Linux guest caches the
      wall clock based time, this discrepency is not noticed until the guest is
      rebooted. After reboot the guest's time calculations are off.
      
      This patch adjusts the wall clock by the kvmclock_offset, resulting in
      correct guest time after a reboot.
      
      Cc: Zachary Amsden <zamsden@gmail.com>
      Signed-off-by: NBruce Rogers <brogers@suse.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      4b648665
  15. 01 8月, 2012 5 次提交