1. 03 Oct 2014, 1 commit
  2. 23 Sep 2014, 5 commits
    • x86: remove the Xen-specific _PAGE_IOMAP PTE flag · f955371c
      Authored by David Vrabel
      The _PAGE_IOMAP PTE flag was used only by Xen PV guests to mark PTEs
      that were used to map I/O regions that are 1:1 in the p2m.  This
      allowed Xen to obtain the correct PFN when converting the MFNs read
      from a PTE back to their PFN.
      
      Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
      returns the correct PFN by using a combination of the m2p and p2m to
      determine if an MFN corresponds to a 1:1 mapping in the p2m.
      
      Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
      future uses of the PTE flag.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
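      As a rough illustration of what retiring such a software PTE bit
      looks like (a sketch only; the bit position and neighbouring
      definitions follow x86's pgtable_types.h conventions, not the
      exact diff):
      
      /* One of the software-available PTE bits, previously claimed by
       * Xen as _PAGE_BIT_IOMAP, returned to the "unused" pool. */
      #define _PAGE_BIT_UNUSED2  10  /* was _PAGE_BIT_IOMAP */
      #define _PAGE_UNUSED2      (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED2)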
    • x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings · 7f2f8822
      Authored by David Vrabel
      Since mfn_to_pfn() returns the correct PFN for identity mappings (as
      used for MMIO regions), the use of _PAGE_IOMAP is not required in
      pte_mfn_to_pfn().
      
      Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it
      in pte_mfn_to_pfn().
      
      This will allow _PAGE_IOMAP to be removed, making it available for
      future use.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
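      A simplified sketch of the conversion after this change (details
      such as PAT handling trimmed; this is not the exact diff):
      
      /* Convert the MFN in a PTE value back to a PFN.  No _PAGE_IOMAP
       * special case: mfn_to_pfn() already returns the correct PFN for
       * identity-mapped (MMIO) frames. */
      static pteval_t pte_mfn_to_pfn(pteval_t val)
      {
              if (val & _PAGE_PRESENT) {
                      unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
                      pteval_t flags = val & PTE_FLAGS_MASK;
      
                      val = ((pteval_t)mfn_to_pfn(mfn) << PAGE_SHIFT) | flags;
              }
              return val;
      }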
    • x86: skip check for spurious faults for non-present faults · 31668511
      Authored by David Vrabel
      If a fault on a kernel address is due to a non-present page, then it
      cannot be the result of stale TLB entry from a protection change (RO
      to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
      skipped.
      
      See the initial check in spurious_fault() and the tests in
      spurious_fault_check() for the set of possible error codes checked
      for spurious faults.  These are:
      
               IRUWP
      Before   x00xx && ( 1xxxx || xxx1x )
      After  ( 10001 || 00011 ) && ( 1xxxx || xxx1x )
      
      Thus the new condition is a subset of the previous one, excluding only
      non-present faults (I == 1 and W == 1 are mutually exclusive).
      
      This avoids spurious_fault() oopsing in some cases when the page
      tables it attempts to walk are not accessible; such an oops would
      obscure the location of the original fault.
      
      This also fixes a crash with Xen PV guests when they access entries in
      the M2P corresponding to device MMIO regions.  The M2P is mapped
      (read-only) by Xen into the kernel address space of the guest and this
      mapping may contain holes for non-RAM regions.  Read faults will
      result in calls to spurious_fault(), but because the page tables for
      the M2P mappings are not accessible by the guest the pagetable walk
      would fault.
      
      This was not previously a problem, as MMIO mappings did not result
      in an M2P lookup because of the _PAGE_IOMAP bit in the PTE.  However,
      removing the _PAGE_IOMAP bit requires M2P lookups for MMIO mappings
      as well.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
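      The shape of the new initial test, per the truth table above (a
      sketch using the PF_* error-code flag names from
      arch/x86/mm/fault.c):
      
      /* A spurious fault can only be a write to a present RO page
       * (W|P == 00011) or an instruction fetch from a present NX page
       * (I|P == 10001).  Anything else -- in particular any non-present
       * fault -- returns before the page-table walk. */
      if (error_code != (PF_WRITE | PF_PROT) &&
          error_code != (PF_INSTR | PF_PROT))
              return 0;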
    • xen/efi: Directly include needed headers · 342cd340
      Authored by Daniel Kiper
      Some required definitions and declarations live in headers that are
      not included directly.  The build currently works, but if somebody
      removes those headers from the ones we do include, it will break.
      To be safe, directly include all needed headers.
      Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
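      A hypothetical two-line illustration of the rule (not code from
      the patch):
      
      /* This file calls memset(), so it includes <linux/string.h>
       * itself instead of relying on some other header to drag it in
       * transitively. */
      #include <linux/string.h>
      
      static void zero_buf(char *buf, unsigned int len)
      {
              memset(buf, 0, len);
      }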
    • xen/setup: Remap Xen Identity Mapped RAM · 4fbb67e3
      Authored by Matt Rushton
      Instead of ballooning dom0 memory up and down, remap the existing
      MFNs that were replaced by the identity map.  The previous
      implementation ballooned memory up and down, which left dom0 with
      discontiguous pages; in some cases this resulted in the use of
      bounce buffers, which reduced network I/O performance significantly.
      This change honors the existing order of the pages, with the
      exception of some boundary conditions.
      
      To do this we need to update both the Linux p2m table and the Xen m2p table.
      Particular care must be taken when updating the p2m table since it's important
      to limit table memory consumption and reuse the existing leaf pages which get
      freed when an entire leaf page is set to the identity map. To implement this,
      mapping updates are grouped into blocks with table entries getting cached
      temporarily and then released.
      
      On my test system before:
      Total pages: 2105014
      Total contiguous: 1640635
      
      After:
      Total pages: 2105014
      Total contiguous: 2098904
      Signed-off-by: Matthew Rushton <mrushton@amazon.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
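      For each remapped frame both lookup directions must stay
      consistent.  A sketch of the idea using the standard Xen
      interfaces (the helper itself is illustrative; as described
      above, the real code batches these updates into blocks):
      
      /* Keep Linux's p2m and Xen's m2p in sync for one remapped frame. */
      static int remap_one_frame(unsigned long new_pfn, unsigned long mfn)
      {
              struct mmu_update update;
      
              if (!set_phys_to_machine(new_pfn, mfn))        /* p2m side */
                      return -ENOMEM;
      
              /* m2p side: tell the hypervisor about the reverse mapping. */
              update.ptr = ((u64)mfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
              update.val = new_pfn;
              return HYPERVISOR_mmu_update(&update, 1, NULL, DOMID_SELF);
      }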
  3. 17 Sep 2014, 2 commits
    • MIPS: SmartMIPS: Disable assembler warnings · dab1b445
      Authored by Markos Chandras
      The kernel code overrides the default ISA as passed by the compiler
      in quite a few places.  This has unfortunate side effects when
      smartmips is enabled, leading to hundreds of build warnings such as:
      
      {standard input}: Assembler messages:
      {standard input}:411: Warning: the `smartmips' extension requires MIPS32
      revision 1 or greater
      {standard input}: Assembler messages:
      {standard input}:43: Warning: the 64-bit MIPS architecture does not support the
      `smartmips' extension
      [...]
      
      Until the kernel code is fixed properly (if possible), disable all the
      assembler warning messages to make the build logs readable again.
      This has no runtime side effects but it makes it easier to spot
      more critical warnings and problems during build.
      Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/7356/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • vgaarb: Don't default exclusively to first video device with mem+io · 86fd887b
      Authored by Bruno Prémont
      Commit 20cde694 ("x86, ia64: Move EFI_FB vga_default_device()
      initialization to pci_vga_fixup()") moved boot video device detection from
      efifb to x86 and ia64 pci/fixup.c.
      
      For dual-GPU Apple computers the above change represents a
      regression: the efifb code forcefully overrode vga_default_device,
      while the moved code does not (vgaarb runs before the PCI fixup).
      
      To improve vgaarb's initial device selection (it cannot know whether
      a PCI device that is not behind a bridge sees/decodes legacy VGA
      I/O), move the screen_info-based check from pci_video_fixup() to
      vgaarb's init function and use it to refine/override the decision
      taken while adding the individual PCI VGA devices.  This way the PCI
      fixup no longer has any reason to adjust vga_default_device, but can
      depend on its value when flagging a shadowed VBIOS.
      
      This has the nice benefit of removing duplicated code, but it does
      introduce a #if defined() block in vgaarb: not all architectures
      have screen_info, and the build would fail without the guard.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=84461
      Reported-and-Tested-By: Andreas Noever <andreas.noever@gmail.com>
      Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: Matthew Garrett <matthew.garrett@nebula.com>
      CC: stable@vger.kernel.org # v3.5+
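      A condensed sketch of the screen_info-based refinement now done in
      vgaarb's init path (simplified; pdev stands for the candidate VGA
      device being considered, and the guard mirrors the #if defined()
      block mentioned above):
      
      #if defined(CONFIG_X86) || defined(CONFIG_IA64)
              /* Prefer the device whose memory BAR contains the
               * firmware framebuffer over the first mem+io device. */
              u64 base = screen_info.lfb_base;
              u64 limit = base + screen_info.lfb_size;
              int i;
      
              for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
                      struct resource *r = &pdev->resource[i];
      
                      if (!(r->flags & IORESOURCE_MEM))
                              continue;
                      if (base >= r->start && limit <= r->end)
                              vga_set_default_device(pdev);
              }
      #endif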
  4. 16 Sep 2014, 4 commits
    • ARM: 8149/1: perf: Don't sleep while atomic when enabling per-cpu interrupts · 505013bc
      Authored by Stephen Boyd
      Rob Clark reports a sleeping while atomic bug when using perf.
      
      BUG: sleeping function called from invalid context at ../kernel/locking/mutex.c:583
      in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0
      ------------[ cut here ]------------
      WARNING: CPU: 2 PID: 4828 at ../kernel/locking/mutex.c:479 mutex_lock_nested+0x3a0/0x3e8()
      DEBUG_LOCKS_WARN_ON(in_interrupt())
      Modules linked in:
      CPU: 2 PID: 4828 Comm: Xorg.bin Tainted: G        W      3.17.0-rc3-00234-gd535c45-dirty #819
      [<c0216690>] (unwind_backtrace) from [<c0212174>] (show_stack+0x10/0x14)
      [<c0212174>] (show_stack) from [<c0867cc0>] (dump_stack+0x98/0xb8)
      [<c0867cc0>] (dump_stack) from [<c02492a4>] (warn_slowpath_common+0x70/0x8c)
      [<c02492a4>] (warn_slowpath_common) from [<c02492f0>] (warn_slowpath_fmt+0x30/0x40)
      [<c02492f0>] (warn_slowpath_fmt) from [<c086a3f8>] (mutex_lock_nested+0x3a0/0x3e8)
      [<c086a3f8>] (mutex_lock_nested) from [<c0294d08>] (irq_find_host+0x20/0x9c)
      [<c0294d08>] (irq_find_host) from [<c0769d50>] (of_irq_get+0x28/0x48)
      [<c0769d50>] (of_irq_get) from [<c057d104>] (platform_get_irq+0x1c/0x8c)
      [<c057d104>] (platform_get_irq) from [<c021a06c>] (cpu_pmu_enable_percpu_irq+0x14/0x38)
      [<c021a06c>] (cpu_pmu_enable_percpu_irq) from [<c02b1634>] (flush_smp_call_function_queue+0x88/0x178)
      [<c02b1634>] (flush_smp_call_function_queue) from [<c0214dc0>] (handle_IPI+0x88/0x160)
      [<c0214dc0>] (handle_IPI) from [<c0208930>] (gic_handle_irq+0x64/0x68)
      [<c0208930>] (gic_handle_irq) from [<c0212d04>] (__irq_svc+0x44/0x5c)
      Exception stack(0xe63ddea0 to 0xe63ddee8)
      dea0: 00000001 00000001 00000000 c2f3b200 c16db380 c032d4a0 e63ddf40 60010013
      dec0: 00000000 001fbfd4 00000100 00000000 00000001 e63ddee8 c0284770 c02a2e30
      dee0: 20010013 ffffffff
      [<c0212d04>] (__irq_svc) from [<c02a2e30>] (ktime_get_ts64+0x1c8/0x200)
      [<c02a2e30>] (ktime_get_ts64) from [<c032d4a0>] (poll_select_set_timeout+0x60/0xa8)
      [<c032d4a0>] (poll_select_set_timeout) from [<c032df64>] (SyS_select+0xa8/0x118)
      [<c032df64>] (SyS_select) from [<c020e8e0>] (ret_fast_syscall+0x0/0x48)
      ---[ end trace 0bb583b46342da6f ]---
      INFO: lockdep is turned off.
      
      We don't really need to get the platform irq again when we're
      enabling or disabling the per-cpu irq. Furthermore, we don't
      really need to set and clear bits in the active_irqs bitmask
      because that's only used in the non-percpu irq case to figure out
      when the last CPU PMU has been disabled. Just pass the irq
      directly to the enable/disable functions to clean all this up.
      This should be slightly more efficient and also fix the
      scheduling while atomic bug.
      
      Fixes: bbd64559 "ARM: perf: support percpu irqs for the CPU PMU"
      Reported-by: Rob Clark <robdclark@gmail.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
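      The shape of the fix, following the description above (sketch; cf.
      the CPU PMU code under arch/arm/kernel/):
      
      /* Runs on each CPU via an IPI: the irq number was resolved once
       * in task context and is passed in, so nothing here can sleep. */
      static void cpu_pmu_enable_percpu_irq(void *data)
      {
              int irq = *(int *)data;
      
              enable_percpu_irq(irq, IRQ_TYPE_NONE);
      }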
    • ARM: 8148/1: flush TLS and thumbee register state during exec · fbfb872f
      Authored by Nathan Lynch
      The TPIDRURO and TPIDRURW registers need to be flushed during exec;
      otherwise TLS information is potentially leaked.  TPIDRURO in
      particular needs careful treatment.  Since flush_thread basically
      needs the same code used to set the TLS in arm_syscall, pull that into
      a common set_tls helper in tls.h and use it in both places.
      
      Similarly, TEEHBR needs to be cleared during exec as well.  Clearing
      its save slot in thread_info isn't right as there is no guarantee
      that a thread switch will occur before the new program runs.  Just
      setting the register directly is sufficient.
      Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
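      A simplified sketch of the shared helper (the real one in
      arch/arm/include/asm/tls.h also handles the TLS-emulation
      configurations omitted here):
      
      /* Record the value in the save slot and, when the CPU has the
       * TLS registers, write TPIDRURO directly. */
      static inline void set_tls(unsigned long val)
      {
              struct thread_info *thread = current_thread_info();
      
              thread->tp_value[0] = val;
              if (has_tls_reg)
                      asm("mcr p15, 0, %0, c13, c0, 3" : : "r" (val));
      }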
    • ARM: 8151/1: add missing exports for asm functions required by get_user macro · 7a0bd497
      Authored by Victor Kamensky
      Previous commits that extended get_user to 64-bit types failed to
      export the corresponding functions, so when the get_user macro is
      used with those target/value types in a kernel module, modpost
      produces an 'undefined!' error.  The fix is to export all required
      functions.
      Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
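      The shape of the fix (an illustrative subset; the exact symbol
      list is in the commit):
      
      /* Every assembler helper the get_user macro may expand to must be
       * exported for modules, e.g. the 64-bit accessor: */
      EXPORT_SYMBOL(__get_user_8);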
    • ia64: Fix syscall number for memfd_create · f3b59331
      Authored by Tony Luck
      Cut & paste typo from the line above.
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
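      The shape of the bug and fix (a sketch; the numbers follow ia64's
      unistd.h of that era but should be treated as illustrative):
      
      #define __NR_getrandom          1339
      #define __NR_memfd_create       1340   /* was pasted as 1339 */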
  5. 14 Sep 2014, 2 commits
    • parisc: Implement new LWS CAS supporting 64 bit operations. · 89206491
      Authored by Guy Martin
      The current LWS CAS only works correctly for 32-bit operands.  The
      new LWS allows CAS operations of variable size.
      Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
      Cc: <stable@vger.kernel.org> # 3.13+
      Signed-off-by: Helge Deller <deller@gmx.de>
    • Make ARCH_HAS_FAST_MULTIPLIER a real config variable · 72d93104
      Authored by Linus Torvalds
      It used to be an ad-hoc hack defined by the x86 version of
      <asm/bitops.h> that enabled a couple of library routines to know whether
      an integer multiply is faster than repeated shifts and additions.
      
      This just makes it use the real Kconfig system instead, and makes x86
      (which was the only architecture that did this) select the option.
      
      NOTE! Even for x86, this really is kind of wrong.  If we cared, we would
      probably not enable this for builds optimized for netburst (P4), where
      shifts-and-adds are generally faster than multiplies.  This patch does
      *not* change that kind of logic, though, it is purely a syntactic change
      with no code changes.
      
      This was triggered by the fact that we have other places that really
      want to know "do I want to expand multiplications by constants by
      hand or not", particularly the hash generation code.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
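      How library code keys off the option, shown with the classic
      population-count multiply (a sketch in the style of lib/hweight.c):
      
      static unsigned int sw_hweight64(__u64 w)
      {
              __u64 res = w - ((w >> 1) & 0x5555555555555555ul);
      
              res = (res & 0x3333333333333333ul) +
                    ((res >> 2) & 0x3333333333333333ul);
              res = (res + (res >> 4)) & 0x0f0f0f0f0f0f0f0ful;
      #ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
              /* one multiply gathers the per-byte partial sums */
              return (res * 0x0101010101010101ul) >> 56;
      #else
              /* shift-and-add fallback for slow-multiplier CPUs */
              return (res + (res >> 8) + (res >> 16) + (res >> 32)) & 0xff;
      #endif
      }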
  6. 13 Sep 2014, 2 commits
  7. 12 Sep 2014, 4 commits
    • xen/arm: remove mach_to_phys rbtree · d50582e0
      Authored by Stefano Stabellini
      Remove the rbtree used to keep track of machine to physical mappings:
      the frontend can grant the same page multiple times, leading to errors
      inserting or removing entries from the mach_to_phys tree.
      
      Linux only needed to know the physical address corresponding to a given
      machine address in swiotlb-xen. Now that swiotlb-xen can call the
      xen_dma_* functions passing the machine address directly, we can remove
      it.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Tested-by: Denis Schneider <v1ne2go@gmail.com>
    • xen/arm: reimplement xen_dma_unmap_page & friends · 340720be
      Authored by Stefano Stabellini
      xen_dma_unmap_page, xen_dma_sync_single_for_cpu and
      xen_dma_sync_single_for_device are currently implemented by calling
      into the corresponding generic ARM implementation of these functions.
      To do so, the dma_addr_t handle, which on Xen is a machine address,
      must first be translated into a physical address.  The operation is
      expensive and inaccurate, given that a single machine address can
      correspond to multiple physical addresses in one domain, because the
      same page can be granted multiple times by the frontend.
      
      To avoid this problem, we introduce a Xen specific implementation of
      xen_dma_unmap_page, xen_dma_sync_single_for_cpu and
      xen_dma_sync_single_for_device, that can operate on machine addresses
      directly.
      
      The new implementation relies on the fact that the hypervisor creates
      a second p2m mapping of any grant page at physical address == machine
      address of the page for dom0.  Therefore we can access memory at
      physical address == dma_addr_t handle and perform the cache flushing
      there.  Some cache maintenance operations require a virtual address.
      Instead of using ioremap_cache, which is not safe in interrupt
      context, we allocate a per-cpu PAGE_KERNEL scratch page and manually
      update its pte.
      
      arm64 doesn't need cache maintenance operations on unmap for now.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Tested-by: Denis Schneider <v1ne2go@gmail.com>
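      A condensed sketch of the per-cpu scratch-mapping trick described
      above (helper names and composition are illustrative):
      
      static DEFINE_PER_CPU(pte_t *, scratch_ptep);  /* pte of the slot */
      static DEFINE_PER_CPU(void *, scratch_virt);   /* its kernel VA */
      
      /* Point the per-cpu PAGE_KERNEL slot at the machine frame, run the
       * cache maintenance op on it, then tear the mapping down again.
       * Usable in interrupt context, unlike ioremap_cache(). */
      static void cache_op_on_mfn(unsigned long mfn, size_t len,
                                  void (*op)(const void *, size_t))
      {
              pte_t *ptep = get_cpu_var(scratch_ptep);
              void *vaddr = __this_cpu_read(scratch_virt);
      
              set_pte_ext(ptep, pfn_pte(mfn, PAGE_KERNEL), 0);
              local_flush_tlb_kernel_page((unsigned long)vaddr);
      
              op(vaddr, len);
      
              set_pte_ext(ptep, __pte(0), 0);
              local_flush_tlb_kernel_page((unsigned long)vaddr);
              put_cpu_var(scratch_ptep);
      }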
    • xen/arm: introduce XENFEAT_grant_map_identity · 5ebc77de
      Authored by Stefano Stabellini
      The flag tells us that the hypervisor maps a grant page at guest
      physical address == machine address of the page, in addition to the
      normal grant mapping address.  It is needed to properly issue cache
      maintenance operations at the completion of a DMA operation involving
      a foreign grant.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Tested-by: Denis Schneider <v1ne2go@gmail.com>
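      Consumers can gate on the flag through the standard feature
      interface (a minimal sketch; the wrapper name is hypothetical):
      
      #include <xen/features.h>
      
      /* True when the hypervisor also maps grants at gpfn == mfn, so
       * the machine address is directly usable for cache maintenance. */
      static bool grants_identity_mapped(void)
      {
              return xen_feature(XENFEAT_grant_map_identity);
      }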
    • arm64: flush TLS registers during exec · eb35bdd7
      Authored by Will Deacon
      Nathan reports that we leak TLS information from the parent context
      during an exec, as we don't clear the TLS registers when flushing the
      thread state.
      
      This patch updates the flushing code so that we:
      
        (1) Unconditionally zero the tpidr_el0 register (since this is fully
            context switched for native tasks and zeroed for compat tasks)
      
        (2) Zero the tp_value state in thread_info before clearing the
            tpidrro_el0 register for compat tasks (since this is only
            writable by the set_tls compat syscall and therefore not fully
            switched).
      
      A missing compiler barrier is also added to the compat set_tls syscall.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: Nathan Lynch <Nathan_Lynch@mentor.com>
      Reported-by: Nathan Lynch <Nathan_Lynch@mentor.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
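      A sketch of the resulting flush logic, closely following the two
      steps above (cf. arch/arm64/kernel/process.c):
      
      static void tls_thread_flush(void)
      {
              asm ("msr tpidr_el0, xzr");     /* (1) native TLS, always */
      
              if (is_compat_task()) {
                      current->thread.tp_value = 0;
                      /* Order the zeroing before the register write, so
                       * a preempting context switch cannot restore the
                       * stale value. */
                      barrier();
                      asm ("msr tpidrro_el0, xzr");   /* (2) compat TLS */
              }
      }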
  8. 11 Sep 2014, 1 commit
  9. 10 Sep 2014, 1 commit
    • x86/xen: don't copy bogus duplicate entries into kernel page tables · 0b5a5063
      Authored by Stefan Bader
      When RANDOMIZE_BASE (KASLR) is enabled, or the sum of all loaded
      modules exceeds 512 MiB, loading modules fails with a warning
      (and hence a vmalloc allocation failure) because the PTEs for the
      newly-allocated vmalloc address space are not zero.
      
        WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
                 vmap_page_range_noflush+0x2a1/0x360()
      
      This is caused by xen_setup_kernel_pagetables() copying
      level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
      entries.
      
      Without KASLR, the normal kernel image size only covers the first half
      of level2_kernel_pgt and module space starts after that.
      
      L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[  0..255]->kernel
                                                        [256..511]->module
                                [511]->level2_fixmap_pgt[  0..505]->module
      
      This allows 512 MiB of module vmalloc space to be used before
      having to use the corrupted level2_fixmap_pgt entries.
      
      With KASLR enabled, the kernel image uses the full PUD range of 1G and
      module space starts in the level2_fixmap_pgt. So basically:
      
      L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
                                [511]->level2_fixmap_pgt[0..505]->module
      
      And now no module vmalloc space can be used without using the corrupt
      level2_fixmap_pgt entries.
      
      Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
      and setting level1_fixmap_pgt as read-only.
      
      A number of comments were also using the wrong L3 offset for
      level2_kernel_pgt.  These have been corrected.
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
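      The essence of the fix in xen_setup_kernel_pagetables() (a sketch
      per the description above; cf. arch/x86/xen/mmu.c):
      
      /* Convert the fixmap L2's entries from PFNs to MFNs before the
       * tables are handed to Xen, and write-protect the fixmap L1 like
       * every other page-table page. */
      convert_pfn_mfn(level2_fixmap_pgt);
      set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);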
  10. 09 Sep 2014, 15 commits
  11. 05 Sep 2014, 3 commits