1. 15 1月, 2014 1 次提交
  2. 12 1月, 2014 1 次提交
  3. 19 12月, 2013 1 次提交
    • R
      mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Rik van Riel 提交于
      There are a few subtle races, between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time, however compaction or the NUMA migration
      code may come in, and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
      still be accessible if there is a TLB flush pending for the mm.
      
      This should fix both NUMA migration and compaction.
      
      [mgorman@suse.de: fix build]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20841405
  4. 11 12月, 2013 1 次提交
  5. 05 12月, 2013 1 次提交
  6. 19 11月, 2013 1 次提交
  7. 15 11月, 2013 2 次提交
    • K
      x86, mm: enable split page table lock for PMD level · 9491846f
      Kirill A. Shutemov 提交于
      Enable PMD split page table lock for X86_64 and PAE.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9491846f
    • R
      ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node · 7b199811
      Rafael J. Wysocki 提交于
      Modify struct acpi_dev_node to contain a pointer to struct acpi_device
      associated with the given device object (that is, its ACPI companion
      device) instead of an ACPI handle corresponding to it.  Introduce two
      new macros for manipulating that pointer in a CONFIG_ACPI-safe way,
      ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the
      ACPI_HANDLE() macro to take the above changes into account.
      Drop the ACPI_HANDLE_SET() macro entirely and rework its users to
      use ACPI_COMPANION_SET() instead.  For some of them who used to
      pass the result of acpi_get_child() directly to ACPI_HANDLE_SET()
      introduce a helper routine acpi_preset_companion() doing an
      equivalent thing.
      
      The main motivation for doing this is that there are things
      represented by struct acpi_device objects that don't have valid
      ACPI handles (so called fixed ACPI hardware features, such as
      power and sleep buttons) and we would like to create platform
      device objects for them and "glue" them to their ACPI companions
      in the usual way (which currently is impossible due to the
      lack of valid ACPI handles).  However, there are more reasons
      why it may be useful.
      
      First, struct acpi_device pointers allow of much better type checking
      than void pointers which are ACPI handles, so it should be more
      difficult to write buggy code using modified struct acpi_dev_node
      and the new macros.  Second, the change should help to reduce (over
      time) the number of places in which the result of ACPI_HANDLE() is
      passed to acpi_bus_get_device() in order to obtain a pointer to the
      struct acpi_device associated with the given "physical" device,
      because now that pointer is returned by ACPI_COMPANION() directly.
      Finally, the change should make it easier to write generic code that
      will build both for CONFIG_ACPI set and unset without adding explicit
      compiler directives to it.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
      7b199811
  8. 14 11月, 2013 1 次提交
  9. 13 11月, 2013 3 次提交
    • V
      x86: move fpu_counter into ARCH specific thread_struct · c375f15a
      Vineet Gupta 提交于
      Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
      be moved out into ARCH specific thread_struct, reducing the size of
      task_struct for other arches.
      
      Compile tested i386_defconfig + gcc 4.7.3
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c375f15a
    • L
      tools / power turbostat: Support Silvermont · 144b44b1
      Len Brown 提交于
      Support the next generation Intel Atom processor
      mirco-architecture, formerly called Silvermont.
      
      The server version, formerly called "Avoton",
      is named the "Intel(R) Atom(TM) Processor C2000 Product Family".
      
      The client version, formerly called "Bay Trail",
      is named the "Intel Atom Processor Z3000 Series",
      as well as various "Intel Pentium Processor"
      and "Intel Celeron Processor" brands, depending
      on form-factor.
      
      Silvermont has a set of MSRs not far off from NHM,
      but the RAPL register set is a sub-set of those previously supported.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      144b44b1
    • J
      x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Jiri Slaby 提交于
      Consider a kernel crash in a module, simulated the following way:
      
       static int my_init(void)
       {
               char *map = (void *)0x5;
               *map = 3;
               return 0;
       }
       module_init(my_init);
      
      When we turn off FRAME_POINTERs, the very first instruction in
      that function causes a BUG. The problem is that we print IP in
      the BUG report using %pB (from printk_address). And %pB
      decrements the pointer by one to fix printing addresses of
      functions with tail calls.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use
      %pB format specifier for stack trace") to fix the call stack
      printouts.
      
      So instead of correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pS only for stack addresses printouts (via
      newly added printk_stack_address) and %pB for regs->ip (via
      printk_address). I.e. we revert to the old behaviour for all
      except call stacks. And since from all those reliable is 1, we
      remove that parameter from printk_address.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5f01c988
  10. 12 11月, 2013 2 次提交
  11. 09 11月, 2013 4 次提交
  12. 07 11月, 2013 3 次提交
    • K
      PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() · 0e4ccb15
      Konrad Rzeszutek Wilk 提交于
      Certain platforms do not allow writes in the MSI-X BARs to setup or tear
      down vector values.  To combat against the generic code trying to write to
      that and either silently being ignored or crashing due to the pagetables
      being marked R/O this patch introduces a platform override.
      
      Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
      and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
      and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
      code.
      
      For Xen, which does not allow the guest to write to MSI-X tables - as the
      hypervisor is solely responsible for setting the vector values - we
      implement two nops.
      
      This fixes a Xen guest crash when passing a PCI device with MSI-X to the
      guest.  See the bugzilla for more details.
      
      [bhelgaas: add bugzilla info]
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      0e4ccb15
    • O
      uprobes: Introduce arch_uprobe->ixol · 8a8de66c
      Oleg Nesterov 提交于
      Currently xol_get_insn_slot() assumes that we should simply copy
      arch_uprobe->insn[] which is (ignoring arch_uprobe_analyze_insn)
      just the copy of the original insn.
      
      This is not true for arm which needs to create another insn to
      execute it out-of-line.
      
      So this patch simply adds the new member, ->ixol into the union.
      This doesn't make any difference for x86 and powerpc, but arm
      can divorce insn/ixol and initialize the correct xol insn in
      arch_uprobe_analyze_insn().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      8a8de66c
    • D
      uprobes: Move function declarations out of arch · 3820b4d2
      David A. Long 提交于
      Move the function declarations from the arch headers to the common
      header, since only the function bodies are architecture-specific.
      These changes are from Vincent Rabin's uprobes patch.
      
      [ oleg: update arch/powerpc/include/asm/uprobes.h ]
      Signed-off-by: NRabin Vincent <rabin@rab.in>
      Signed-off-by: NDavid A. Long <dave.long@linaro.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      3820b4d2
  13. 06 11月, 2013 3 次提交
  14. 31 10月, 2013 6 次提交
    • G
      percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Greg Thelen 提交于
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before negation to
      sign extend the adjustment.  This helps in cases where the counter type
      is wider than an unsigned adjustment.  An alternative to this patch is
      to declare such operations unsupported, but it seemed useful to avoid
      surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1
        preempt_disable()
        this_cpu_write(long_counter, 0)
        this_cpu_sub(long_counter, delta)
        preempt_enable()
      
      Before this change long_counter on a 64 bit machine ends with value
      0xffffffff, rather than 0xffffffffffffffff.  This is because
      this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
      which is basically:
        long_counter = 0 + 0xffffffff
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko passes, especially the following cases which
      previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd09d9a3
    • A
      kvm: Create non-coherent DMA registeration · e0f0bbc5
      Alex Williamson 提交于
      We currently use some ad-hoc arch variables tied to legacy KVM device
      assignment to manage emulation of instructions that depend on whether
      non-coherent DMA is present.  Create an interface for this, adapting
      legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
      For now we assume that non-coherent DMA is possible any time we have a
      VFIO group.  Eventually an interface can be developed as part of the
      VFIO external user interface to query the coherency of a group.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e0f0bbc5
    • A
      kvm/x86: Convert iommu_flags to iommu_noncoherent · d96eb2c6
      Alex Williamson 提交于
      Default to operating in coherent mode.  This simplifies the logic when
      we switch to a model of registering and unregistering noncoherent I/O
      with KVM.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d96eb2c6
    • B
      kvm, emulator: Rename VendorSpecific flag · b51e974f
      Borislav Petkov 提交于
      Call it EmulateOnUD which is exactly what we're trying to do with
      vendor-specific instructions.
      
      Rename ->only_vendor_specific_insn to something shorter, while at it.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b51e974f
    • B
      kvm, emulator: Use opcode length · 1ce19dc1
      Borislav Petkov 提交于
      Add a field to the current emulation context which contains the
      instruction opcode length. This will streamline handling of opcodes of
      different length.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1ce19dc1
    • B
      kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d
      Borislav Petkov 提交于
      Add a kvm ioctl which states which system functionality kvm emulates.
      The format used is that of CPUID and we return the corresponding CPUID
      bits set for which we do emulate functionality.
      
      Make sure ->padding is being passed on clean from userspace so that we
      can use it for something in the future, after the ioctl gets cast in
      stone.
      
      s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
      it.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9c15bb1d
  15. 29 10月, 2013 1 次提交
  16. 27 10月, 2013 1 次提交
  17. 26 10月, 2013 2 次提交
    • J
      x86: Unify copy_to_user() and add size checking to it · 7a3d9b0f
      Jan Beulich 提交于
      Similarly to copy_from_user(), where the range check is to
      protect against kernel memory corruption, copy_to_user() can
      benefit from such checking too: Here it protects against kernel
      information leaks.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: <arjan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/5265059502000078000FC4F6@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      7a3d9b0f
    • J
      x86: Unify copy_from_user() size checking · 3df7b41a
      Jan Beulich 提交于
      Commits 4a312769 ("x86: Turn the
      copy_from_user check into an (optional) compile time warning")
      and 63312b6a ("x86: Add a
      Kconfig option to turn the copy_from_user warnings into errors")
      touched only the 32-bit variant of copy_from_user(), whereas the
      original commit 9f0cf4ad ("x86:
      Use __builtin_object_size() to validate the buffer size for
      copy_from_user()") also added the same code to the 64-bit one.
      
      Further the earlier conversion from an inline WARN() to the call
      to copy_from_user_overflow() went a little too far: When the
      number of bytes to be copied is not a constant (e.g. [looking at
      3.11] in drivers/net/tun.c:__tun_chr_ioctl() or
      drivers/pci/pcie/aer/aer_inject.c:aer_inject_write()), the
      compiler will always have to keep the funtion call, and hence
      there will always be a warning. By using __builtin_constant_p()
      we can avoid this.
      
      And then this slightly extends the effect of
      CONFIG_DEBUG_STRICT_USER_COPY_CHECKS in that apart from
      converting warnings to errors in the constant size case, it
      retains the (possibly wrong) warnings in the non-constant size
      case, such that if someone is prepared to get a few false
      positives, (s)he'll be able to recover the current behavior
      (except that these diagnostics now will never be converted to
      errors).
      
      Since the 32-bit variant (intentionally) didn't call
      might_fault(), the unification results in this being called
      twice now. Adding a suitable #ifdef would be the alternative if
      that's a problem.
      
      I'd like to point out though that with
      __compiletime_object_size() being restricted to gcc before 4.6,
      the whole construct is going to become more and more pointless
      going forward. I would question however that commit
      2fb0815c ("gcc4: disable
      __compiletime_object_size for GCC 4.6+") was really necessary,
      and instead this should have been dealt with as is done here
      from the beginning.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5265056D02000078000FC4F3@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3df7b41a
  18. 25 10月, 2013 1 次提交
  19. 24 10月, 2013 1 次提交
  20. 18 10月, 2013 4 次提交