1. 28 9月, 2012 4 次提交
    • G
      um: Preinclude include/linux/kern_levels.h · 9429ec96
      Geert Uytterhoeven 提交于
      The userspace part of UML uses the asm-offsets.h generator mechanism to
      create definitions for UM_KERN_<LEVEL> that match the in-kernel
      KERN_<LEVEL> constant definitions.
      
      As of commit 04d2c8c8 ("printk: convert
      the format for KERN_<LEVEL> to a 2 byte pattern"), KERN_<LEVEL> is no
      longer expanded to the literal '"<LEVEL>"', but to '"\001" "LEVEL"', i.e.
      it contains two parts.
      
      However, the combo of DEFINE_STR() in
      arch/x86/um/shared/sysdep/kernel-offsets.h and sed-y in Kbuild doesn't
      support string literals consisting of multiple parts. Hence for all
      UM_KERN_<LEVEL> definitions, only the SOH character is retained in the actual
      definition, while the remainder ends up in the comment. E.g. in
      include/generated/asm-offsets.h we get
      
          #define UM_KERN_INFO "\001" /* "6" KERN_INFO */
      
      instead of
      
          #define UM_KERN_INFO "\001" "6" /* KERN_INFO */
      
      This causes spurious '^A' output in some kernel messages:
      
          Calibrating delay loop... 4640.76 BogoMIPS (lpj=23203840)
          pid_max: default: 32768 minimum: 301
          Mount-cache hash table entries: 256
          ^AChecking that host ptys support output SIGIO...Yes
          ^AChecking that host ptys support SIGIO on close...No, enabling workaround
          ^AUsing 2.6 host AIO
          NET: Registered protocol family 16
          bio: create slab <bio-0> at 0
          Switching to clocksource itimer
      
      To fix this:
        - Move the mapping from UM_KERN_<LEVEL> to KERN_<LEVEL> from
          arch/um/include/shared/common-offsets.h to
          arch/um/include/shared/user.h, which is preincluded for all userspace
          parts,
        - Preinclude include/linux/kern_levels.h for all userspace parts, to
          obtain the in-kernel KERN_<LEVEL> constant definitions. This doesn't
          violate the kernel/userspace separation, as include/linux/kern_levels.h
          is self-contained and doesn't expose any other kernel internals.
        - Remove the now unused STR() and DEFINE_STR() macros.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      9429ec96
    • R
      um: Fix IPC on um · bbb35efc
      Richard Weinberger 提交于
      commit c1d7e01d (ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION)
      forgot UML and broke IPC on it.
      Also UML has to select ARCH_WANT_IPC_PARSE_VERSION usin Kconfig.
      
      Reported-and-tested-by: <Toralf Förster toralf.foerster@gmx.de>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      bbb35efc
    • A
      um: kill thread->forking · d2ce4e92
      Al Viro 提交于
      we only use that to tell copy_thread() done by syscall from that
      done by kernel_thread().  However, it's easier to do simply by
      checking PF_KTHREAD in thread flags.
      
      Merge sys_clone() guts for 32bit and 64bit, while we are at it...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d2ce4e92
    • A
      um: let signal_delivered() do SIGTRAP on singlestepping into handler · f9a38eac
      Al Viro 提交于
      ... rather than duplicating that in sigframe setup code (and doing that
      inconsistently, at that)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f9a38eac
  2. 27 9月, 2012 1 次提交
  3. 26 9月, 2012 9 次提交
    • F
      x86: Exit RCU extended QS on notify resume · edf55fda
      Frederic Weisbecker 提交于
      do_notify_resume() may be called on irq or exception
      exit. But at that time the exception has already called
      rcu_user_enter() and the irq has already called rcu_irq_exit().
      
      Since it can use RCU read side critical section, we must call
      rcu_user_exit() before doing anything there. Then we must call
      back rcu_user_enter() after this function because we know we are
      going to userspace from there.
      
      This complete support for userspace RCU extended quiescent state
      in x86-64.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      edf55fda
    • F
      x86: Use the new schedule_user API on userspace preemption · 0430499c
      Frederic Weisbecker 提交于
      This way we can exit the RCU extended quiescent state before
      we schedule a new task from irq/exception exit.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      0430499c
    • F
      x86: Exception hooks for userspace RCU extended QS · 6ba3c97a
      Frederic Weisbecker 提交于
      Add necessary hooks to x86 exception for userspace
      RCU extended quiescent state support.
      
      This includes traps, page fault, debug exceptions, etc...
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      6ba3c97a
    • F
      x86: Unspaghettize do_general_protection() · ef3f6288
      Frederic Weisbecker 提交于
      There is some unnatural label based layout in this function.
      Convert the unnecessary goto to readable conditional blocks.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      ef3f6288
    • F
      x86: Syscall hooks for userspace RCU extended QS · bf5a3c13
      Frederic Weisbecker 提交于
      Add syscall slow path hooks to notify syscall entry
      and exit on CPUs that want to support userspace RCU
      extended quiescent state.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      bf5a3c13
    • F
      x86: Unspaghettize do_trap() · c416ddf5
      Frederic Weisbecker 提交于
      Cleanup the label maze in this function. Having a
      seperate function to first handle the traps that don't
      generate a signal makes it easier to convert into
      more readable conditional paths.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1348577479-2564-1-git-send-email-fweisbec@gmail.com
      [ Fixed 32-bit build failure. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c416ddf5
    • T
      x86_64: Work around old GAS bug · 1b2b23d8
      Tao Guo 提交于
      GAS in binutils(2.16.91) could not parse parentheses within
      macro parameters unless fully parenthesized, and this is a
      workaround to make old gas work without generating below errors:
      
       arch/x86/kernel/entry_64.S: Assembler messages:
       arch/x86/kernel/entry_64.S:387: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:389: Error: too many positional arguments
       [...]
      Signed-off-by: NTao Guo <glorioustao@gmail.com>
      Reluctantly-Acked-by: NJan Beulich <jbeulich@novell.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1348648102-12653-1-git-send-email-glorioustao@gmail.com
      [ Jan argues that these old GAS versions are fragile - which is so, but lets give them a chance. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b2b23d8
    • Y
      x86/api: Rename mp_register_lapic in a comment · 57c078ce
      Yasuaki Ishimatsu 提交于
      Commit 31d2092e ("x86: move
      mp_register_lapic_address to boot.c") renamed mp_register_lapic
      to acpi_register_lapic. But mp_register_lapic remains in a
      comment. So the patch rename it.
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/50625239.3050403@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      57c078ce
    • M
      x86: Remove the useless branch in c_start() · dec08a83
      Michael Wang 提交于
      Since 'cpu == -1' in cpumask_next() is legal, no need to handle
      '*pos == 0' specially.
      
      About the comments:
      
      	/* just in case, cpu 0 is not the first */
      
      A test with a cpumask in which cpu 0 is not the first has been
      done, and it works well.
      
      This patch will remove that useless branch to clean the code.
      Signed-off-by: NMichael Wang <wangyun@linux.vnet.ibm.com>
      Cc: kjwinchester@gmail.com
      Cc: borislav.petkov@amd.com
      Cc: ak@linux.intel.com
      Link: http://lkml.kernel.org/r/1348033343-23658-1-git-send-email-wangyun@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dec08a83
  4. 25 9月, 2012 1 次提交
    • F
      cputime: Make finegrained irqtime accounting generally available · fdf9c356
      Frederic Weisbecker 提交于
      There is no known reason for this option to be unavailable on other
      archs than x86. They just need to call enable_sched_clock_irqtime()
      if they have a sufficiently finegrained clock to make it working.
      
      Move it to the general option and let the user choose between
      it and pure tick based or virtual cputime accounting.
      
      Note that virtual cputime accounting already performs a finegrained
      irqtime accounting. CONFIG_IRQ_TIME_ACCOUNTING is a kind of middle ground
      between tick and virtual based accounting. So CONFIG_IRQ_TIME_ACCOUNTING
      and CONFIG_VIRT_CPU_ACCOUNTING are mutually exclusive choices.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      fdf9c356
  5. 24 9月, 2012 1 次提交
    • K
      xen/boot: Disable NUMA for PV guests. · 8d54db79
      Konrad Rzeszutek Wilk 提交于
      The hypervisor is in charge of allocating the proper "NUMA" memory
      and dealing with the CPU scheduler to keep them bound to the proper
      NUMA node. The PV guests (and PVHVM) have no inkling of where they
      run and do not need to know that right now. In the future we will
      need to inject NUMA configuration data (if a guest spans two or more
      NUMA nodes) so that the kernel can make the right choices. But those
      patches are not yet present.
      
      In the meantime, disable the NUMA capability in the PV guest, which
      also fixes a bootup issue. Andre says:
      
      "we see Dom0 crashes due to the kernel detecting the NUMA topology not
      by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
      
      This will detect the actual NUMA config of the physical machine, but
      will crash about the mismatch with Dom0's virtual memory. Variation of
      the theme: Dom0 sees what it's not supposed to see.
      
      This happens with the said config option enabled and on a machine where
      this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
      
      We have this dump then:
      NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
      Scanning NUMA topology in Northbridge 24
      Number of physical nodes 4
      Node 0 MemBase 0000000000000000 Limit 0000000040000000
      Node 1 MemBase 0000000040000000 Limit 0000000138000000
      Node 2 MemBase 0000000138000000 Limit 00000001f8000000
      Node 3 MemBase 00000001f8000000 Limit 0000000238000000
      Initmem setup node 0 0000000000000000-0000000040000000
        NODE_DATA [000000003ffd9000 - 000000003fffffff]
      Initmem setup node 1 0000000040000000-0000000138000000
        NODE_DATA [0000000137fd9000 - 0000000137ffffff]
      Initmem setup node 2 0000000138000000-00000001f8000000
        NODE_DATA [00000001f095e000 - 00000001f0984fff]
      Initmem setup node 3 00000001f8000000-0000000238000000
      Cannot find 159744 bytes in node 3
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
      Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
      RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
      .. snip..
        [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
        [<ffffffff81d23348>] sparse_init+0xe4/0x25a
        [<ffffffff81d16840>] paging_init+0x13/0x22
        [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
        [<ffffffff81683954>] ? printk+0x3c/0x3e
        [<ffffffff81d01a38>] start_kernel+0xe5/0x468
        [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
        [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
        [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
      "
      
      so we just disable NUMA scanning by setting numa_off=1.
      
      CC: stable@vger.kernel.org
      Reported-and-Tested-by: NAndre Przywara <andre.przywara@amd.com>
      Acked-by: NAndre Przywara <andre.przywara@amd.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8d54db79
  6. 23 9月, 2012 2 次提交
    • S
      Use get_online_cpus to avoid races involving CPU hotplug · 429227bb
      Silas Boyd-Wickizer 提交于
      If arch/x86/kernel/cpuid.c is a module, a CPU might offline or online
      between the for_each_online_cpu() loop and the call to
      register_hotcpu_notifier in cpuid_init or the call to
      unregister_hotcpu_notifier in cpuid_exit.  The potential races can
      lead to leaks/duplicates, attempts to destroy non-existant devices, or
      random pointer dereferences.
      
      For example, in cpuid_exit if:
      
              for_each_online_cpu(cpu)
                      cpuid_device_destroy(cpu);
              class_destroy(cpuid_class);
              __unregister_chrdev(CPUID_MAJOR, 0, NR_CPUS, "cpu/cpuid");
              <----- CPU onlines
              unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
      
      the hotcpu notifier will attempt to create a device for the
      cpuid_class, which the module already destroyed.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier or
      unregister_hotcpu_notifier with get_online_cpus+put_online_cpus.
      
      Tested on a VM.
      Signed-off-by: NSilas Boyd-Wickizer <sbw@mit.edu>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      429227bb
    • S
      Use get_online_cpus to avoid races involving CPU hotplug · a2db672a
      Silas Boyd-Wickizer 提交于
      If arch/x86/kernel/msr.c is a module, a CPU might offline or online
      between the for_each_online_cpu(i) loop and the call to
      register_hotcpu_notifier in msr_init or the call to
      unregister_hotcpu_notifier in msr_exit. The potential races can lead
      to leaks/duplicates, attempts to destroy non-existant devices, or
      random pointer dereferences.
      
      For example, in msr_init if:
      
              for_each_online_cpu(i) {
                      err = msr_device_create(i);
                      if (err != 0)
                              goto out_class;
              }
              <----- CPU offlines
              register_hotcpu_notifier(&msr_class_cpu_notifier);
      
      and the CPU never onlines before msr_exit, then the module will never
      call msr_device_destroy for the associated CPU.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier or
      unregister_hotcpu_notifier with get_online_cpus+put_online_cpus.
      
      Tested on a VM.
      Signed-off-by: NSilas Boyd-Wickizer <sbw@mit.edu>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a2db672a
  7. 21 9月, 2012 2 次提交
  8. 20 9月, 2012 2 次提交
    • B
      kprobes/x86: Move skip_singlestep up · 50a011f6
      Borislav Petkov 提交于
      I get this warning:
      
        arch/x86/kernel/kprobes.c:544:23: warning: ‘skip_singlestep’ declared ‘static’ but never defined
      
      on tip/auto-latest.
      
      Put the skip_singlestep function declaration up, in
      KPROBES_CAN_USE_FTRACE and drop the superfluous forward
      declaration.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1348145034-16603-1-git-send-email-bp@amd64.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      50a011f6
    • K
      xen/boot: Disable BIOS SMP MP table search. · bd49940a
      Konrad Rzeszutek Wilk 提交于
      As the initial domain we are able to search/map certain regions
      of memory to harvest configuration data. For all low-level we
      use ACPI tables - for interrupts we use exclusively ACPI _PRT
      (so DSDT) and MADT for INT_SRC_OVR.
      
      The SMP MP table is not used at all. As a matter of fact we do
      not even support machines that only have SMP MP but no ACPI tables.
      
      Lets follow how Moorestown does it and just disable searching
      for BIOS SMP tables.
      
      This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
      
      9f->100 for 1:1 PTE
      Freeing 9f-100 pfn range: 97 pages freed
      1-1 mapping on 9f->100
      .. snip..
      e820: BIOS-provided physical RAM map:
      Xen: [mem 0x0000000000000000-0x000000000009efff] usable
      Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
      Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
      .. snip..
      Scan for SMP in [mem 0x00000000-0x000003ff]
      Scan for SMP in [mem 0x0009fc00-0x0009ffff]
      Scan for SMP in [mem 0x000f0000-0x000fffff]
      found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
      (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
      . snip..
      Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
      .. snip..
      Call Trace:
       [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81ad32ac>] early_ioremap+0x13/0x15
       [<ffffffff81acc140>] get_mpc_size+0x2f/0x67
       [<ffffffff81acc284>] smp_scan_config+0x10c/0x136
       [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
       [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81abca7f>] start_kernel+0x90/0x390
       [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
       [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
      (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
      
      which is that ioremap would end up mapping 0xff using _PAGE_IOMAP
      (which is what early_ioremap sticks as a flag) - which meant
      we would get MFN 0xFF (pte ff461, which is OK), and then it would
      also map 0x100 (b/c ioremap tries to get page aligned request, and
      it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page)
      as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP
      bypasses the P2M lookup we would happily set the PTE to 1000461.
      Xen would deny the request since we do not have access to the
      Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example
      0x80140.
      
      CC: stable@vger.kernel.org
      Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bd49940a
  9. 19 9月, 2012 5 次提交
  10. 18 9月, 2012 1 次提交
  11. 15 9月, 2012 7 次提交
    • O
      uprobes: Make arch_uprobe_task->saved_trap_nr "unsigned int" · baedbf02
      Oleg Nesterov 提交于
      Make arch_uprobe_task->saved_trap_nr "unsigned int" and move it down
      after ->saved_scratch_register, this changes sizeof() from 24 to 16.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      baedbf02
    • O
      uprobes/x86: Fix arch_uprobe_disable_step() && UTASK_SSTEP_TRAPPED interaction · d6a00b35
      Oleg Nesterov 提交于
      arch_uprobe_disable_step() should also take UTASK_SSTEP_TRAPPED into
      account. In this case the probed insn was not executed, we need to
      clear X86_EFLAGS_TF if it was set by us and that is all.
      
      Again, this code will look more clean when we move it into
      arch_uprobe_post_xol() and arch_uprobe_abort_xol().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      d6a00b35
    • O
      uprobes/x86: Xol should send SIGTRAP if X86_EFLAGS_TF was set · 3a4664aa
      Oleg Nesterov 提交于
      arch_uprobe_disable_step() correctly preserves X86_EFLAGS_TF and
      returns to user-mode. But this means the application gets SIGTRAP
      only after the next insn.
      
      This means that UPROBE_CLEAR_TF logic is not really right. _enable
      should only record the state of X86_EFLAGS_TF, and _disable should
      check it separately from UPROBE_FIX_SETF.
      
      Remove arch_uprobe_task->restore_flags, add ->saved_tf instead, and
      change enable/disable accordingly. This assumes that the probed insn
      was not trapped, see the next patch.
      
      arch_uprobe_skip_sstep() logic has the same problem, change it to
      check X86_EFLAGS_TF and send SIGTRAP as well. We will cleanup this
      all after we fold enable/disable_step into pre/post_hol hooks.
      
      Note: send_sig(SIGTRAP) is not actually right, we need send_sigtrap().
      But this needs more changes, handle_swbp() does the same and this is
      equally wrong.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      3a4664aa
    • O
      uprobes/x86: Do not (ab)use TIF_SINGLESTEP/user_*_single_step() for single-stepping · 9bd1190a
      Oleg Nesterov 提交于
      user_enable/disable_single_step() was designed for ptrace, it assumes
      a single user and does unnecessary and wrong things for uprobes. For
      example:
      
      	- arch_uprobe_enable_step() can't trust TIF_SINGLESTEP, an
      	  application itself can set X86_EFLAGS_TF which must be
      	  preserved after arch_uprobe_disable_step().
      
      	- we do not want to set TIF_SINGLESTEP/TIF_FORCED_TF in
      	  arch_uprobe_enable_step(), this only makes sense for ptrace.
      
      	- otoh we leak TIF_SINGLESTEP if arch_uprobe_disable_step()
      	  doesn't do user_disable_single_step(), the application will
      	  be killed after the next syscall.
      
      	- arch_uprobe_enable_step() does access_process_vm() we do
      	  not need/want.
      
      Change arch_uprobe_enable/disable_step() to set/clear X86_EFLAGS_TF
      directly, this is much simpler and more correct. However, we need to
      clear TIF_BLOCKSTEP/DEBUGCTLMSR_BTF before executing the probed insn,
      add set_task_blockstep(false).
      
      Note: with or without this patch, there is another (hopefully minor)
      problem. A probed "pushf" insn can see the wrong X86_EFLAGS_TF set by
      uprobes. Perhaps we should change _disable to update the stack, or
      teach arch_uprobe_skip_sstep() to emulate this insn.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      9bd1190a
    • O
      ptrace/x86: Partly fix set_task_blockstep()->update_debugctlmsr() logic · 95cf00fa
      Oleg Nesterov 提交于
      Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
      step.c was always very wrong.
      
      1. update_debugctlmsr() was simply unneeded. The child sleeps
         TASK_TRACED, __switch_to_xtra(next_p => child) should notice
         TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
         needed.
      
      2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
         should always match the state of current's TIF_BLOCKSTEP bit.
      
      3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
         look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
         register or the caller can be preempted in between.
      
      4. It is not safe to play with TIF_BLOCKSTEP if task != current.
         DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
         other if the task is running. The tracee is stopped but it
         can be SIGKILL'ed right before set/clear_tsk_thread_flag().
      
      However, now that uprobes uses user_enable_single_step(current)
      we can't simply remove update_debugctlmsr(). So this patch adds
      the additional "task == current" check and disables irqs to avoid
      the race with interrupts/preemption.
      
      Unfortunately this patch doesn't solve the last problem, we need
      another fix. Probably we should teach ptrace_stop() to set/clear
      single/block stepping after resume.
      
      And afaics there is yet another problem: perf can play with
      MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
      __switch_to_xtra() has problems.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      95cf00fa
    • O
      ptrace/x86: Introduce set_task_blockstep() helper · 848e8f5f
      Oleg Nesterov 提交于
      No functional changes, preparation for the next fix and for uprobes
      single-step fixes.
      
      Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the
      new helper, set_task_blockstep().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      848e8f5f
    • S
      uprobes/x86: Implement x86 specific arch_uprobe_*_step · bdc1e472
      Sebastian Andrzej Siewior 提交于
      The arch specific implementation behaves like user_enable_single_step()
      except that it does not disable single stepping if it was already
      enabled by ptrace. This allows the debugger to single step over an
      uprobe. The state of block stepping is not restored. It makes only sense
      together with TF and if that was enabled then the debugger is notified.
      
      Note: this is still not correct. For example, TIF_SINGLESTEP check
      is not right, the application itself can set X86_EFLAGS_TF. And otoh
      we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf".
      See the next patches, we need the changes in arch/x86/kernel/step.c
      first.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      bdc1e472
  12. 14 9月, 2012 4 次提交
  13. 13 9月, 2012 1 次提交