1. 10 5月, 2012 1 次提交
  2. 09 5月, 2012 7 次提交
  3. 08 5月, 2012 2 次提交
    • D
      KVM: PPC: Book3S HV: Fix refcounting of hugepages · de6c0b02
      David Gibson 提交于
      The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
      guest memory pages into host memory so that they can be safely accessed
      from usermode.  It does this used get_user_pages_fast().  When the VPA is
      unregistered, or the VCPUs are cleaned up, these pages are released using
      put_page().
      
      However, the get_user_pages() is invoked on the specific memory are of the
      VPA which could lie within hugepages.  In case the pinned page is huge,
      we explicitly find the head page of the compound page before calling
      put_page() on it.
      
      At least with the latest kernel, this is not correct.  put_page() already
      handles finding the correct head page of a compound, and also deals with
      various counts on the individual tail page which are important for
      transparent huge pages.  We don't support transparent hugepages on Power,
      but even so, bypassing this count maintenance can lead (when the VM ends)
      to a hugepage being released back to the pool with a non-zero mapcount on
      one of the tail pages.  This can then lead to a bad_page() when the page
      is released from the hugepage pool.
      
      This removes the explicit compound_head() call to correct this bug.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      de6c0b02
    • D
      xen/pci: don't use PCI BIOS service for configuration space accesses · 76a8df7b
      David Vrabel 提交于
      The accessing PCI configuration space with the PCI BIOS32 service does
      not work in PV guests.
      
      On systems without MMCONFIG or where the BIOS hasn't marked the
      MMCONFIG region as reserved in the e820 map, the BIOS service is
      probed (even though direct access is preferred) and this hangs.
      
      CC: stable@kernel.org
      Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      [v1: Fixed compile error when CONFIG_PCI is not set]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      76a8df7b
  4. 07 5月, 2012 4 次提交
    • K
      xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs · b7e5ffe5
      Konrad Rzeszutek Wilk 提交于
      If I try to do "cat /sys/kernel/debug/kernel_page_tables"
      I end up with:
      
      BUG: unable to handle kernel paging request at ffffc7fffffff000
      IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
      PGD 0
      Oops: 0000 [#1] SMP
      CPU 0
      .. snip..
      RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
      RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000
      
      which is due to the fact we are trying to access a PFN that is not
      accessible to us. The reason (at least in this case) was that
      PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
      hypervisor) to point to a read-only linear map of the MFN->PFN array.
      During our parsing we would get the MFN (a valid one), try to look
      it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
      Then pte_mfn_to_pfn would happilly feed that in, attach the flags
      and return it back to the caller. 'ptdump_show' bitshifts it and
      gets and invalid value that it tries to dereference.
      
      Instead of doing all of that, we detect the ~0 case and just
      return !_PAGE_PRESENT.
      
      This bug has been in existence .. at least until 2.6.37 (yikes!)
      
      CC: stable@kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b7e5ffe5
    • K
      xen/apic: Return the APIC ID (and version) for CPU 0. · 558daa28
      Konrad Rzeszutek Wilk 提交于
      On x86_64 on AMD machines where the first APIC_ID is not zero, we get:
      
      ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
      BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10
      BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10
      
      which means that when the ACPI processor driver loads and
      tries to parse the _Pxx states it fails to do as, as it
      ends up calling acpi_get_cpuid which does this:
      
      for_each_possible_cpu(i) {
              if (cpu_physical_id(i) == apic_id)
                      return i;
      }
      
      And the bootup CPU, has not been found so it fails and returns -1
      for the first CPU - which then subsequently in the loop that
      "acpi_processor_get_info" does results in returning an error, which
      means that "acpi_processor_add" failing and per_cpu(processor)
      is never set (and is NULL).
      
      That means that when xen-acpi-processor tries to load (much much
      later on) and parse the P-states it gets -ENODEV from
      acpi_processor_register_performance() (which tries to read
      the per_cpu(processor)) and fails to parse the data.
      Reported-by-and-Tested-by: NStefan Bader <stefan.bader@canonical.com>
      Suggested-by: NBoris Ostrovsky <boris.ostrovsky@amd.com>
      [v2: Bit-shift APIC ID by 24 bits]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      558daa28
    • L
      IA32 emulation: Fix build problem for modular ia32 a.out support · febb72a6
      Larry Finger 提交于
      Commit ce7e5d2d ("x86: fix broken TASK_SIZE for ia32_aout") breaks
      kernel builds when "CONFIG_IA32_AOUT=m" with
      
        ERROR: "set_personality_ia32" [arch/x86/ia32/ia32_aout.ko] undefined!
        make[1]: *** [__modpost] Error 1
      
      The entry point needs to be exported.
      Signed-off-by: NLarry Finger <Larry.Finger@lwfinger.net>
      Acked-by: NAl Viro <viro@zeniv.linux.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      febb72a6
    • A
      x86: fix broken TASK_SIZE for ia32_aout · ce7e5d2d
      Al Viro 提交于
      Setting TIF_IA32 in load_aout_binary() used to be enough; these days
      TASK_SIZE is controlled by TIF_ADDR32 and that one doesn't get set
      there.  Switch to use of set_personality_ia32()...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce7e5d2d
  5. 06 5月, 2012 4 次提交
    • G
      KVM: Do not take reference to mm during async #PF · 62c49cc9
      Gleb Natapov 提交于
      It turned to be totally unneeded. The reason the code was introduced is
      so that KVM can prefault swapped in page, but prefault can fail even
      if mm is pinned since page table can change anyway. KVM handles this
      situation correctly though and does not inject spurious page faults.
      
      Fixes:
       "INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected" warning while
       running LTP inside a KVM guest using the recent -next kernel.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      62c49cc9
    • G
      KVM: ensure async PF event wakes up vcpu from halt · a4fa1635
      Gleb Natapov 提交于
      If vcpu executes hlt instruction while async PF is waiting to be delivered
      vcpu can block and deliver async PF only after another even wakes it
      up. This happens because kvm_check_async_pf_completion() will remove
      completion event from vcpu->async_pf.done before entering kvm_vcpu_block()
      and this will make kvm_arch_vcpu_runnable() return false. The solution
      is to make vcpu runnable when processing completion.
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      a4fa1635
    • C
      ARM: 7414/1: SMP: prevent use of the console when using idmap_pgd · fde165b2
      Colin Cross 提交于
      Commit 4e8ee7de (ARM: SMP: use
      idmap_pgd for mapping MMU enable during secondary booting)
      switched secondary boot to use idmap_pgd, which is initialized
      during early_initcall, instead of a page table initialized during
      __cpu_up.  This causes idmap_pgd to contain the static mappings
      but be missing all dynamic mappings.
      
      If a console is registered that creates a dynamic mapping, the
      printk in secondary_start_kernel will trigger a data abort on
      the missing mapping before the exception handlers have been
      initialized, leading to a hang.  Initial boot is not affected
      because no consoles have been registered, and resume is usually
      not affected because the offending console is suspended.
      Onlining a cpu with hotplug triggers the problem.
      
      A workaround is to the printk in secondary_start_kernel until
      after the page tables have been switched back to init_mm.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NColin Cross <ccross@android.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      fde165b2
    • J
      TTY: pdc_cons, fix regression in close · 49a5f3cf
      Jiri Slaby 提交于
      The test in pdc_console_tty_close '!tty->count' was always wrong
      because tty->count is decremented after tty->ops->close is called and
      thus can never be zero. Hence the 'then' branch was never executed and
      the timer never deleted.
      
      This did not matter until commit 5dd5bc40 ("TTY: pdc_cons, use
      tty_port").  There we needed to set TTY in tty_port to NULL, but this
      never happened due to the bug above.
      
      So change the test to really trigger at the last close by changing the
      condition to 'tty->count == 1'.
      
      Well, the driver should not touch tty->count at all.  It should use
      tty_port->count and count open count there itself.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Reported-and-tested-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49a5f3cf
  6. 05 5月, 2012 6 次提交
  7. 04 5月, 2012 1 次提交
    • L
      vfs: make word-at-a-time accesses handle a non-existing page · e419b4cc
      Linus Torvalds 提交于
      It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that
      can have holes in the kernel address space: it seems to happen easily
      with Xen, and it looks like the AMD gart64 code will also punch holes
      dynamically.
      
      Actually hitting that case is still very unlikely, so just do the
      access, and take an exception and fix it up for the very unlikely case
      of it being a page-crosser with no next page.
      
      And hey, this abstraction might even help other architectures that have
      other issues with unaligned word accesses than the possible missing next
      page.  IOW, this could do the byte order magic too.
      
      Peter Anvin fixed a thinko in the shifting for the exception case.
      Reported-and-tested-by: NJana Saout <jana@saout.de>
      Cc:  Peter Anvin <hpa@zytor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e419b4cc
  8. 03 5月, 2012 4 次提交
  9. 01 5月, 2012 2 次提交
  10. 30 4月, 2012 3 次提交
    • G
      powerpc/pseries: Rivet CONFIG_EEH for pSeries platform · e49f7a99
      Gavin Shan 提交于
      Recently, Ryan Wang tried to compile PPC pSeries platform without
      CONFIG_EEH and eventually run into errors. Nishanth Aravamudan
      helped to narrow down the root cause. Actually, the pSeries platform
      depends on CONFIG_EEH heavily and that won't work properly without
      EEH support.
      
      According to Ben's suggestion, the patch make CONFIG_EEH invisible
      and keep it as always selected on pSeries platform.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e49f7a99
    • G
      powerpc/irqdomain: Fix broken NR_IRQ references · 4013369f
      Grant Likely 提交于
      The switch from using irq_map to irq_alloc_desc*() for managing irq
      number allocations introduced new bugs in some of the powerpc
      interrupt code.  Several functions rely on the value of NR_IRQS to
      determine the maximum irq number that could get allocated.  However,
      with sparse_irq and using irq_alloc_desc*() the maximum possible irq
      number is now specified with 'nr_irqs' which may be a number larger
      than NR_IRQS.  This has caused breakage on powermac when
      CONFIG_NR_IRQS is set to 32.
      
      This patch removes most of the direct references to NR_IRQS in the
      powerpc code and replaces them with either a nr_irqs reference or by
      using the common for_each_irq_desc() macro.  The powerpc-specific
      for_each_irq() macro is removed at the same time.
      
      Also, the Cell axon_msi driver is refactored to remove the global
      build assumption on the size of NR_IRQS and instead add a limit to the
      maximum irq number when calling irq_domain_add_nomap().
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4013369f
    • G
      powerpc/8xx: Fix NR_IRQ bugs and refactor 8xx interrupt controller · 8751ed14
      Grant Likely 提交于
      The mpc8xx driver uses a reference to NR_IRQS that is buggy.  It uses
      NR_IRQs for the array size of the ppc_cached_irq_mask bitmap, but
      NR_IRQs could be smaller than the number of hardware irqs that
      ppc_cached_irq_mask tracks.
      
      Also, while fixing that problem, it became apparent that the interrupt
      controller only supports 32 interrupt numbers, but it is written as if
      it supports multiple register banks which is more complicated.
      
      This patch pulls out the buggy reference to NR_IRQs and fixes the size
      of the ppc_cached_irq_mask to match the number of HW irqs.  It also
      drops the now-unnecessary code since ppc_cached_irq_mask is no longer
      an array.
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8751ed14
  11. 28 4月, 2012 6 次提交
    • W
      ARM: 7406/1: hotplug: copy the affinity mask when forcefully migrating IRQs · 5e7371de
      Will Deacon 提交于
      When a CPU is hotplugged off, we migrate any IRQs currently affine to it
      away and onto another online CPU by calling the irq_set_affinity
      function of the relevant interrupt controller chip. This function
      returns either IRQ_SET_MASK_OK or IRQ_SET_MASK_OK_NOCOPY, to indicate
      whether irq_data.affinity was updated.
      
      If we are forcefully migrating an interrupt (because the affinity mask
      no longer identifies any online CPUs) then we should update the IRQ
      affinity mask to reflect the new CPU set. Failure to do so can
      potentially leave /proc/irq/n/smp_affinity identifying only offline
      CPUs, which may confuse userspace IRQ balancing daemons.
      
      This patch updates migrate_one_irq to copy the affinity mask when
      the interrupt chip returns IRQ_SET_MASK_OK after forcefully changing the
      affinity of an interrupt.
      
      Cc: stable@vger.kernel.org
      Reported-by: NLeif Lindholm <leif.lindholm@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      5e7371de
    • W
      ARM: 7405/1: kexec: call platform_cpu_kill on the killer rather than the victim · 6fa99b7f
      Will Deacon 提交于
      When performing a kexec on an SMP system, the secondary cores are stopped
      by calling machine_shutdown(), which in turn issues IPIs to offline the
      other CPUs. Unfortunately, this isn't enough to reboot the cores into
      a new kernel (since they are just executing a cpu_relax loop somewhere
      in memory) so we make use of platform_cpu_kill, part of the CPU hotplug
      implementation, to place the cores somewhere safe. This function expects
      to be called on the killing CPU for each core that it takes out.
      
      This patch moves the platform_cpu_kill callback out of the IPI handler
      and into smp_send_stop, therefore ensuring that it executes on the
      killing CPU rather than on the victim, matching what the hotplug code
      requires.
      
      Cc: stable@vger.kernel.org
      Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      6fa99b7f
    • W
      ARM: 7403/1: tls: remove covert channel via TPIDRURW · 6a1c5312
      Will Deacon 提交于
      TPIDRURW is a user read/write register forming part of the group of
      thread registers in more recent versions of the ARM architecture (~v6+).
      
      Currently, the kernel does not touch this register, which allows tasks
      to communicate covertly by reading and writing to the register without
      context-switching affecting its contents.
      
      This patch clears TPIDRURW when TPIDRURO is updated via the set_tls
      macro, which is called directly from __switch_to. Since the current
      behaviour makes the register useless to userspace as far as thread
      pointers are concerned, simply clearing the register (rather than saving
      and restoring it) will not cause any problems to userspace.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      6a1c5312
    • S
      ARM: 7401/1: mm: Fix section mismatches · 14904927
      Stephen Boyd 提交于
      WARNING: vmlinux.o(.text+0x111b8): Section mismatch in reference
      from the function arm_memory_present() to the function
      .init.text:memory_present()
      The function arm_memory_present() references
      the function __init memory_present().
      This is often because arm_memory_present lacks a __init
      annotation or the annotation of memory_present is wrong.
      
      WARNING: arch/arm/mm/built-in.o(.text+0x1edc): Section mismatch
      in reference from the function alloc_init_pud() to the function
      .init.text:alloc_init_section()
      The function alloc_init_pud() references
      the function __init alloc_init_section().
      This is often because alloc_init_pud lacks a __init
      annotation or the annotation of alloc_init_section is wrong.
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      14904927
    • D
      xen: correctly check for pending events when restoring irq flags · 7eb7ce4d
      David Vrabel 提交于
      In xen_restore_fl_direct(), xen_force_evtchn_callback() was being
      called even if no events were pending.  This resulted in (depending on
      workload) about a 100 times as many xen_version hypercalls as
      necessary.
      
      Fix this by correcting the sense of the conditional jump.
      
      This seems to give a significant performance benefit for some
      workloads.
      
      There is some subtle tricksy "..since the check here is trying to
      check both pending and masked in a single cmpw, but I think this is
      correct. It will call check_events now only when the combined
      mask+pending word is 0x0001 (aka unmasked, pending)." (Ian)
      
      CC: stable@kernel.org
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7eb7ce4d
    • D
      MIPS: Remove get_current_pgd(). · dc5efaa0
      David Daney 提交于
      It is unused in the tree.
      Signed-off-by: NDavid Daney <ddaney@caviumnetworks.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/3557/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      dc5efaa0