1. 14 Jan 2014, 1 commit
  2. 13 Jan 2014, 3 commits
    • sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs · 20d1c86a
      Peter Zijlstra authored
      Use a ring-buffer-like multi-version object structure which allows
      always having a coherent object; we use this to avoid having to
      disable IRQs while reading sched_clock(), and to avoid problems when
      an NMI arrives while the cyc2ns data is being changed.
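
      The pattern, roughly (a minimal sketch of the multi-version idea; the
      kernel's actual structure uses more slots plus a sequence count to
      cope with a reader straddling two updates):

        /* Sketch only: a reader picks the current {mul, shift, offset}
         * triplet by index and never blocks; the writer fills the unused
         * slot and then flips the index, so even an NMI always observes
         * one coherent version. */
        struct cyc2ns_data {
                u32 mul;        /* cycles -> ns multiplier */
                u32 shift;      /* cycles -> ns shift */
                u64 offset;     /* ns offset at calibration time */
        };

        static struct cyc2ns_data cyc2ns_ring[2];
        static unsigned int cyc2ns_head;

        static u64 cycles_2_ns(u64 cycles)      /* reader: no IRQ disabling */
        {
                struct cyc2ns_data *d = &cyc2ns_ring[ACCESS_ONCE(cyc2ns_head)];

                return d->offset + mul_u64_u32_shr(cycles, d->mul, d->shift);
        }

        static void cyc2ns_update(u32 mul, u32 shift, u64 offset)
        {
                unsigned int next = (cyc2ns_head + 1) & 1;

                cyc2ns_ring[next] = (struct cyc2ns_data){ mul, shift, offset };
                smp_wmb();              /* order data before the index flip */
                ACCESS_ONCE(cyc2ns_head) = next;
        }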
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     331312     257223
          (cold) local_clock: 301773     310296     309889
          (warm) sched_clock: 38375      38247      25280
          (warm) local_clock: 100371     102713     85268
          (warm) rdtsc:       27340      27289      24247
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     372706     301224
          (cold) local_clock: 396890     399275     399870
          (warm) sched_clock: 38194      38124      25630
          (warm) local_clock: 143452     148698     129629
          (warm) rdtsc:       27345      27365      24307
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      20d1c86a
    • sched/clock, x86: Move some cyc2ns() code around · 57c67da2
      Peter Zijlstra authored
      There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
      so move it there.
      
      There are no cycles_2_ns() users.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      57c67da2
    • sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock() · 5dd12c21
      Peter Zijlstra authored
      Use mul_u64_u32_shr() so that x86_64 can use a single 64x64->128 mul.
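
      For reference, the helper's semantics (this sketch matches the generic
      64-bit definition when a 128-bit type is available):

        /* (a * mul) >> shift, computed in 128 bits so the multiplication
         * cannot overflow; on x86_64 this is a single mul plus a shift. */
        static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
        {
                return (u64)(((unsigned __int128)a * mul) >> shift);
        }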
      
      Before:
      
      0000000000000560 <native_sched_clock>:
       560:   44 8b 1d 00 00 00 00    mov    0x0(%rip),%r11d        # 567 <native_sched_clock+0x7>
       567:   55                      push   %rbp
       568:   48 89 e5                mov    %rsp,%rbp
       56b:   45 85 db                test   %r11d,%r11d
       56e:   75 4f                   jne    5bf <native_sched_clock+0x5f>
       570:   0f 31                   rdtsc
       572:   89 c0                   mov    %eax,%eax
       574:   48 c1 e2 20             shl    $0x20,%rdx
       578:   48 c7 c1 00 00 00 00    mov    $0x0,%rcx
       57f:   48 09 c2                or     %rax,%rdx
       582:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
       589:   65 8b 04 25 00 00 00    mov    %gs:0x0,%eax
       590:   00
       591:   48 98                   cltq
       593:   48 8b 34 c5 00 00 00    mov    0x0(,%rax,8),%rsi
       59a:   00
       59b:   48 89 d0                mov    %rdx,%rax
       59e:   81 e2 ff 03 00 00       and    $0x3ff,%edx
       5a4:   48 c1 e8 0a             shr    $0xa,%rax
       5a8:   48 0f af 14 0e          imul   (%rsi,%rcx,1),%rdx
       5ad:   48 0f af 04 0e          imul   (%rsi,%rcx,1),%rax
       5b2:   5d                      pop    %rbp
       5b3:   48 03 04 3e             add    (%rsi,%rdi,1),%rax
       5b7:   48 c1 ea 0a             shr    $0xa,%rdx
       5bb:   48 01 d0                add    %rdx,%rax
       5be:   c3                      retq
      
      After:
      
      0000000000000550 <native_sched_clock>:
       550:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 556 <native_sched_clock+0x6>
       556:   55                      push   %rbp
       557:   48 89 e5                mov    %rsp,%rbp
       55a:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
       55e:   85 ff                   test   %edi,%edi
       560:   75 2c                   jne    58e <native_sched_clock+0x3e>
       562:   0f 31                   rdtsc
       564:   89 c0                   mov    %eax,%eax
       566:   48 c1 e2 20             shl    $0x20,%rdx
       56a:   48 09 c2                or     %rax,%rdx
       56d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
       574:   00 00
       576:   89 c0                   mov    %eax,%eax
       578:   48 f7 e2                mul    %rdx
       57b:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
       582:   00 00
       584:   c9                      leaveq
       585:   48 0f ac d0 0a          shrd   $0xa,%rdx,%rax
       58a:   48 01 c8                add    %rcx,%rax
       58d:   c3                      retq
      
                              MAINLINE   POST
      
          sched_clock_stable: 1          1
          (cold) sched_clock: 329841     331312
          (cold) local_clock: 301773     310296
          (warm) sched_clock: 38375      38247
          (warm) local_clock: 100371     102713
          (warm) rdtsc:       27340      27289
          sched_clock_stable: 0          0
          (cold) sched_clock: 382634     372706
          (cold) local_clock: 396890     399275
          (warm) sched_clock: 38194      38124
          (warm) local_clock: 143452     148698
          (warm) rdtsc:       27345      27365
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-piu203ses5y1g36bnyw2n16x@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5dd12c21
  3. 12 Jan 2014, 3 commits
    • x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs · 9345005f
      Prarit Bhargava authored
      During heavy CPU-hotplug operations the following spurious kernel warnings
      can trigger:
      
        do_IRQ: No ... irq handler for vector (irq -1)
      
        [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]
      
      When downing a cpu it is possible that there are unhandled irqs
      left in the APIC IRR register.  The following code path shows
      how the problem can occur:
      
       1. CPU 5 is to go down.
      
       2. cpu_disable() on CPU 5 executes with interrupt flag cleared
          by local_irq_save() via stop_machine().
      
       3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because the
          interrupt flag is cleared (the CPU is unable to handle the irq).
      
       4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
          to -1.

       5. stop_machine() finishes cpu_disable().
      
       6. cpu_die() for CPU 5 executes in normal context.
      
       7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
          IRQ 12.  The code attempts to find the vector's IRQ and cannot
          because it has been set to -1.

       8. The do_IRQ() warning is displayed for CPU 5, IRQ 12.
      
      I added a debug printk to output which CPU and vector was
      retriggered, and discovered that we are getting bogus events.  I see
      a 100% correlation between this debug printk in fixup_irqs() and the
      do_IRQ() warning.
      
      This patchset resolves this by adding definitions for
      VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
      the code to use them.
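
      A sketch of the result (the two definitions as named above;
      report_bad_vector() is an illustrative wrapper, the real check sits
      inline in do_IRQ()):

        #define VECTOR_UNDEFINED        (-1)    /* vector never had an irq */
        #define VECTOR_RETRIGGERED      (-2)    /* irq migrated away; IRR may
                                                   still deliver the vector */

        static void report_bad_vector(unsigned int vector, int irq)
        {
                /* warn only for vectors that were never mapped at all */
                if (irq != VECTOR_RETRIGGERED)
                        pr_emerg("do_IRQ: %d.%u No irq handler for vector (irq %d)\n",
                                 smp_processor_id(), vector, irq);
        }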
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: Rui Wang <rui.y.wang@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: janet.morgan@Intel.com
      Cc: tony.luck@Intel.com
      Cc: ruiv.wang@gmail.com
      Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
      [ Cleaned up the code a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9345005f
    • arch: Introduce smp_load_acquire(), smp_store_release() · 47933ad4
      Peter Zijlstra authored
      A number of situations currently require the heavyweight smp_mb(),
      even though there is no need to order prior stores against later
      loads.  Many architectures have much cheaper ways to handle these
      situations, but the Linux kernel currently has no portable way
      to make use of them.
      
      This commit therefore supplies smp_load_acquire() and
      smp_store_release() to remedy this situation.  The new
      smp_load_acquire() primitive orders the specified load against
      any subsequent reads or writes, while the new smp_store_release()
      primitive orders the specified store against any prior reads or
      writes.  These primitives allow array-based circular FIFOs to be
      implemented without an smp_mb(), and also allow a theoretical
      hole in rcu_assign_pointer() to be closed at no additional
      expense on most architectures.
      
      In addition, the RCU experience transitioning from explicit
      smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
      and rcu_assign_pointer(), respectively, resulted in substantial
      improvements in readability.  It therefore seems likely that
      replacing other explicit barriers with smp_load_acquire() and
      smp_store_release() will provide similar benefits.  It appears
      that roughly half of the explicit barriers in core kernel code
      might be so replaced.
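
      A minimal usage sketch (illustrative names, not from the patch): a
      flag published with smp_store_release() guarantees that a consumer
      which observes it also observes every store made before it.

        static int payload;
        static int flag;

        static void producer(void)
        {
                payload = 42;                   /* ordered before the release */
                smp_store_release(&flag, 1);    /* publish */
        }

        static int consumer(void)
        {
                if (!smp_load_acquire(&flag))   /* ordered before later loads */
                        return -EAGAIN;
                return payload;                 /* guaranteed to see 42 */
        }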
      
      [Changelog by PaulMck]
      Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      47933ad4
    • x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround · 26bef131
      Linus Torvalds authored
      Before we do an EMMS in the AMD FXSAVE information leak workaround we
      need to clear any pending exceptions, otherwise we trap with a
      floating-point exception inside this code.
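
      The shape of the fix, as a simplified sketch (the real workaround
      sits behind an X86_FEATURE_FXSAVE_LEAK alternative; the function and
      its safe_addr parameter are illustrative): fnclex is a non-waiting
      instruction, so it clears the pending exception state without itself
      trapping.

        static void fxsave_leak_workaround(const unsigned int *safe_addr)
        {
                asm volatile("fnclex\n\t"       /* clear pending exceptions */
                             "emms\n\t"         /* empty MMX state */
                             "fildl %P[addr]"   /* set F?P to a known value */
                             : : [addr] "m" (*safe_addr));
        }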
      Reported-by: halfdog <me@halfdog.net>
      Tested-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      26bef131
  4. 05 Jan 2014, 1 commit
    • x86, sparse: Do not force removal of __user when calling copy_to/from_user_nocheck() · df90ca96
      Steven Rostedt authored
      Commit ff47ab4f "x86: Add 1/2/4/8 byte optimization to 64bit
      __copy_{from,to}_user_inatomic" added a "_nocheck" call in between
      the copy_to/from_user() and copy_user_generic(). As both the
      normal and nocheck versions of these calls use the proper __user
      annotation, a typecast to remove it should not be added.
      This causes sparse to spin out the following warnings:
      
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
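
      The shape of the fix, sketched (simplified from uaccess_64.h): drop
      the __force cast so the __user annotation survives into the _nocheck
      helper.

        /* before: the cast strips the address-space annotation */
        static inline int __copy_from_user(void *dst, const void __user *src,
                                           unsigned size)
        {
                might_fault();
                return __copy_from_user_nocheck(dst, (__force void *)src, size);
        }

        /* after: pass the __user pointer through unchanged */
        static inline int __copy_from_user(void *dst, const void __user *src,
                                           unsigned size)
        {
                might_fault();
                return __copy_from_user_nocheck(dst, src, size);
        }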
      
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140103164500.5f6478f5@gandalf.local.home
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      df90ca96
  5. 28 Dec 2013, 2 commits
  6. 20 Dec 2013, 2 commits
  7. 19 Dec 2013, 1 commit
    • mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Rik van Riel authored
      There are a few subtle races between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time, however compaction or the NUMA migration
      code may come in, and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making
      pte_accessible() aware that PROT_NONE and PROT_NUMA memory may still
      be accessible while a TLB flush is pending for the mm.
      
      This should fix both NUMA migration and compaction.
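
      The resulting check, roughly (close to the x86 change; CONFIG guards
      omitted):

        static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
        {
                if (pte_flags(a) & _PAGE_PRESENT)
                        return true;

                /* Non-present PROT_NONE/NUMA ptes may still be reachable
                 * through stale TLB entries until the pending flush runs. */
                if ((pte_flags(a) & (_PAGE_PROTNONE | _PAGE_NUMA)) &&
                    mm_tlb_flush_pending(mm))
                        return true;

                return false;
        }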
      
      [mgorman@suse.de: fix build]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20841405
  8. 11 Dec 2013, 1 commit
  9. 05 Dec 2013, 1 commit
  10. 19 Nov 2013, 1 commit
  11. 15 Nov 2013, 2 commits
    • x86, mm: enable split page table lock for PMD level · 9491846f
      Kirill A. Shutemov authored
      Enable PMD split page table lock for X86_64 and PAE.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9491846f
    • ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node · 7b199811
      Rafael J. Wysocki authored
      Modify struct acpi_dev_node to contain a pointer to struct acpi_device
      associated with the given device object (that is, its ACPI companion
      device) instead of an ACPI handle corresponding to it.  Introduce two
      new macros for manipulating that pointer in a CONFIG_ACPI-safe way,
      ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the
      ACPI_HANDLE() macro to take the above changes into account.
      Drop the ACPI_HANDLE_SET() macro entirely and rework its users to
      use ACPI_COMPANION_SET() instead.  For some of them who used to
      pass the result of acpi_get_child() directly to ACPI_HANDLE_SET()
      introduce a helper routine acpi_preset_companion() doing an
      equivalent thing.
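
      A sketch of the reworked node and macros (simplified; the in-tree
      versions go through small helper functions):

        struct acpi_dev_node {
        #ifdef CONFIG_ACPI
                struct acpi_device *companion;  /* was: acpi_handle handle */
        #endif
        };

        #ifdef CONFIG_ACPI
        #define ACPI_COMPANION(dev)             ((dev)->acpi_node.companion)
        #define ACPI_COMPANION_SET(dev, adev)   ((dev)->acpi_node.companion = (adev))
        #define ACPI_HANDLE(dev) \
                (ACPI_COMPANION(dev) ? ACPI_COMPANION(dev)->handle : NULL)
        #else
        #define ACPI_COMPANION(dev)             (NULL)
        #define ACPI_COMPANION_SET(dev, adev)   do { } while (0)
        #define ACPI_HANDLE(dev)                (NULL)
        #endif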
      
      The main motivation for doing this is that there are things
      represented by struct acpi_device objects that don't have valid
      ACPI handles (so called fixed ACPI hardware features, such as
      power and sleep buttons) and we would like to create platform
      device objects for them and "glue" them to their ACPI companions
      in the usual way (which currently is impossible due to the
      lack of valid ACPI handles).  However, there are more reasons
      why it may be useful.
      
      First, struct acpi_device pointers allow much better type checking
      than the void pointers used as ACPI handles, so it should be harder
      to write buggy code using the modified struct acpi_dev_node and the
      new macros.  Second, the change should help to reduce (over
      time) the number of places in which the result of ACPI_HANDLE() is
      passed to acpi_bus_get_device() in order to obtain a pointer to the
      struct acpi_device associated with the given "physical" device,
      because now that pointer is returned by ACPI_COMPANION() directly.
      Finally, the change should make it easier to write generic code that
      will build both for CONFIG_ACPI set and unset without adding explicit
      compiler directives to it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
      7b199811
  12. 14 Nov 2013, 1 commit
  13. 13 Nov 2013, 3 commits
    • x86: move fpu_counter into ARCH specific thread_struct · c375f15a
      Vineet Gupta authored
      Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
      be moved out into ARCH specific thread_struct, reducing the size of
      task_struct for other arches.
      
      Compile tested i386_defconfig + gcc 4.7.3
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c375f15a
    • tools / power turbostat: Support Silvermont · 144b44b1
      Len Brown authored
      Support the next-generation Intel Atom processor
      micro-architecture, formerly called Silvermont.
      
      The server version, formerly called "Avoton",
      is named the "Intel(R) Atom(TM) Processor C2000 Product Family".
      
      The client version, formerly called "Bay Trail",
      is named the "Intel Atom Processor Z3000 Series",
      as well as various "Intel Pentium Processor"
      and "Intel Celeron Processor" brands, depending
      on form-factor.
      
      Silvermont has a set of MSRs not far off from NHM,
      but the RAPL register set is a sub-set of those previously supported.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      144b44b1
    • x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Jiri Slaby authored
      Consider a kernel crash in a module, simulated the following way:
      
       static int my_init(void)
       {
               char *map = (void *)0x5;
               *map = 3;
               return 0;
       }
       module_init(my_init);
      
      When we turn off FRAME_POINTERs, the very first instruction in
      that function causes a BUG. The problem is that we print IP in
      the BUG report using %pB (from printk_address). And %pB
      decrements the pointer by one to fix printing addresses of
      functions with tail calls.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use
      %pB format specifier for stack trace") to fix the call stack
      printouts.
      
      So instead of correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pB only for stack address printouts (via the
      newly added printk_stack_address) and %pS for regs->ip (via
      printk_address).  I.e. we revert to the old behaviour for everything
      except call stacks.  And since all remaining callers pass
      reliable = 1, we remove that parameter from printk_address.
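
      The resulting pair, roughly (close to the patched dumpstack code):

        /* stack entries are return addresses: %pB compensates for tail calls */
        static void printk_stack_address(unsigned long address, int reliable)
        {
                pr_cont(" [<%p>] %s%pB\n",
                        (void *)address, reliable ? "" : "? ", (void *)address);
        }

        /* regs->ip is a direct address: print it as-is with %pS */
        void printk_address(unsigned long address)
        {
                pr_cont(" [<%p>] %pS\n", (void *)address);
        }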
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5f01c988
  14. 12 Nov 2013, 2 commits
  15. 09 Nov 2013, 4 commits
  16. 07 Nov 2013, 3 commits
    • PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() · 0e4ccb15
      Konrad Rzeszutek Wilk authored
      Certain platforms do not allow writes in the MSI-X BARs to set up or
      tear down vector values.  To guard against the generic code trying to
      write to them and either being silently ignored or crashing because
      the page tables are marked read-only, this patch introduces a
      platform override.
      
      Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
      and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
      and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
      code.
      
      For Xen, which does not allow the guest to write to MSI-X tables - as the
      hypervisor is solely responsible for setting the vector values - we
      implement two nops.
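
      For Xen the override then reduces to something like this (a sketch;
      the exact hook signatures are assumed, and the init wrapper is
      illustrative):

        static u32 xen_nop_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
        {
                return 0;       /* the hypervisor owns the MSI table */
        }

        static u32 xen_nop_msix_mask_irq(struct msi_desc *desc, u32 flag)
        {
                return 0;       /* likewise for MSI-X */
        }

        static void __init xen_override_msi_mask(void)
        {
                x86_msi.msi_mask_irq  = xen_nop_msi_mask_irq;
                x86_msi.msix_mask_irq = xen_nop_msix_mask_irq;
        }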
      
      This fixes a Xen guest crash when passing a PCI device with MSI-X to the
      guest.  See the bugzilla for more details.
      
      [bhelgaas: add bugzilla info]
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      0e4ccb15
    • uprobes: Introduce arch_uprobe->ixol · 8a8de66c
      Oleg Nesterov authored
      Currently xol_get_insn_slot() assumes that we should simply copy
      arch_uprobe->insn[] which is (ignoring arch_uprobe_analyze_insn)
      just the copy of the original insn.
      
      This is not true for arm which needs to create another insn to
      execute it out-of-line.
      
      So this patch simply adds the new member, ->ixol into the union.
      This doesn't make any difference for x86 and powerpc, but arm
      can divorce insn/ixol and initialize the correct xol insn in
      arch_uprobe_analyze_insn().
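
      On x86 the change amounts to aliasing the two in a union (sketch):

        struct arch_uprobe {
                union {
                        u8 insn[MAX_UINSN_BYTES];  /* copy of the original insn */
                        u8 ixol[MAX_UINSN_BYTES];  /* insn executed out of line */
                };
                /* ... arch-specific analysis state ... */
        };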
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      8a8de66c
    • uprobes: Move function declarations out of arch · 3820b4d2
      David A. Long authored
      Move the function declarations from the arch headers to the common
      header, since only the function bodies are architecture-specific.
      These changes are from Vincent Rabin's uprobes patch.
      
      [ oleg: update arch/powerpc/include/asm/uprobes.h ]
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: David A. Long <dave.long@linaro.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      3820b4d2
  17. 06 Nov 2013, 3 commits
  18. 31 Oct 2013, 6 commits
    • percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Greg Thelen authored
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before negation to
      sign extend the adjustment.  This helps in cases where the counter type
      is wider than an unsigned adjustment.  An alternative to this patch is
      to declare such operations unsupported, but it seemed useful to avoid
      surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1
        preempt_disable()
        this_cpu_write(long_counter, 0)
        this_cpu_sub(long_counter, delta)
        preempt_enable()
      
      Before this change long_counter on a 64 bit machine ends with value
      0xffffffff, rather than 0xffffffffffffffff.  This is because
      this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
      which is basically:
        long_counter = 0 + 0xffffffff
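
      A user-space illustration of the same arithmetic (not kernel code;
      run on a 64-bit machine):

        #include <stdio.h>

        int main(void)
        {
                unsigned int delta = 1;
                long counter = 0;

                counter += -delta;          /* -delta is the 32-bit 0xffffffff */
                printf("no cast:   0x%lx\n", counter);  /* 0xffffffff */

                counter = 0;
                counter += -(long)delta;    /* cast to the counter type first */
                printf("with cast: 0x%lx\n", counter);  /* 0xffffffffffffffff */
                return 0;
        }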
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko passes, especially the following cases which
      previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bd09d9a3
    • kvm: Create non-coherent DMA registration · e0f0bbc5
      Alex Williamson authored
      We currently use some ad-hoc arch variables tied to legacy KVM device
      assignment to manage emulation of instructions that depend on whether
      non-coherent DMA is present.  Create an interface for this, adapting
      legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
      For now we assume that non-coherent DMA is possible any time we have a
      VFIO group.  Eventually an interface can be developed as part of the
      VFIO external user interface to query the coherency of a group.
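
      The interface amounts to a reference count on the VM, roughly (a
      sketch; the counter field name is an assumption):

        void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
        {
                atomic_inc(&kvm->arch.noncoherent_dma_count);
        }

        void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
        {
                atomic_dec(&kvm->arch.noncoherent_dma_count);
        }

        bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
        {
                return atomic_read(&kvm->arch.noncoherent_dma_count);
        }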
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e0f0bbc5
    • kvm/x86: Convert iommu_flags to iommu_noncoherent · d96eb2c6
      Alex Williamson authored
      Default to operating in coherent mode.  This simplifies the logic when
      we switch to a model of registering and unregistering noncoherent I/O
      with KVM.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d96eb2c6
    • kvm, emulator: Rename VendorSpecific flag · b51e974f
      Borislav Petkov authored
      Call it EmulateOnUD which is exactly what we're trying to do with
      vendor-specific instructions.
      
      Rename ->only_vendor_specific_insn to something shorter, while at it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b51e974f
    • kvm, emulator: Use opcode length · 1ce19dc1
      Borislav Petkov authored
      Add a field to the current emulation context which contains the
      instruction opcode length. This will streamline handling of opcodes of
      different length.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1ce19dc1
    • kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d
      Borislav Petkov authored
      Add a kvm ioctl which states which system functionality kvm emulates.
      The format used is that of CPUID and we return the corresponding CPUID
      bits set for which we do emulate functionality.
      
      Make sure ->padding is being passed on clean from userspace so that we
      can use it for something in the future, after the ioctl gets cast in
      stone.
      
      s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
      it.
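
      Illustrative user-space usage (error handling trimmed; the entry
      count is a guess that an E2BIG return would correct):

        #include <linux/kvm.h>
        #include <stdlib.h>
        #include <sys/ioctl.h>

        struct kvm_cpuid2 *get_emulated_cpuid(int kvm_fd)
        {
                int nent = 64;          /* assumed upper bound for this sketch */
                struct kvm_cpuid2 *cpuid;

                cpuid = calloc(1, sizeof(*cpuid) +
                                  nent * sizeof(struct kvm_cpuid_entry2));
                if (!cpuid)
                        return NULL;
                cpuid->nent = nent;

                if (ioctl(kvm_fd, KVM_GET_EMULATED_CPUID, cpuid) < 0) {
                        free(cpuid);    /* e.g. E2BIG: nent was too small */
                        return NULL;
                }
                return cpuid;
        }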
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9c15bb1d