1. 30 May 2012, 3 commits
    • KVM: PPC: booke: Added DECAR support · 21bd000a
      Committed by Bharat Bhushan
      Added the decrementer auto-reload support. DECAR is readable
      on e500v2/e500mc and later CPUs (a brief sketch follows below).
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
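      A hedged sketch of what this looks like on the emulation side
      (illustrative only; the vcpu field name "decar" is an assumption,
      not necessarily the exact diff):

          /* In the booke mtspr emulation switch: remember the guest's
           * auto-reload value.  When the decrementer later expires with
           * TCR[ARE] set, DEC is reloaded from this value. */
          case SPRN_DECAR:
                  vcpu->arch.decar = spr_val;
                  break;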
    • KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Committed by Paul Mackerras
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT that was allocated (see the usage sketch below).
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
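      To make the new interface concrete, here is a hedged userspace sketch
      of the ioctl call (the surrounding VM setup is assumed and is not part
      of this commit):

          #include <stdint.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Resize (or clear) the guest HPT before any vcpu runs.
           * vm_fd: a KVM VM descriptor obtained via KVM_CREATE_VM.
           * order: log base 2 of the desired HPT size in bytes,
           *        e.g. 24 for a 16MB table. */
          static long set_htab_order(int vm_fd, uint32_t order)
          {
                  /* On success the kernel updates 'order' to the order
                   * of the HPT actually allocated. */
                  if (ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, &order) < 0) {
                          perror("KVM_PPC_ALLOCATE_HTAB");
                          return -1;
                  }
                  return (long)order;
          }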
    • KVM: PPC: Factor out guest epapr initialization · 2e1ae9c0
      Committed by Liu Yu-B13201
      ePAPR paravirtualization support is now a Kconfig-selectable option.
      Signed-off-by: Liu Yu <yu.liu@freescale.com>
      [stuart.yoder@freescale.com: misc minor fixes, description update]
      Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  2. 28 May 2012, 2 commits
  3. 27 May 2012, 4 commits
    • sparc: use the new generic strnlen_user() function · 2c66f623
      Committed by David Miller
      This throws away the sparc-specific functions in favor of the generic
      optimized version.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: use the new generic strnlen_user() function · 5723aa99
      Committed by Linus Torvalds
      This throws away the old x86-specific functions in favor of the generic
      optimized version.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • word-at-a-time: make the interfaces truly generic · 36126f8f
      Committed by Linus Torvalds
      This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
      complicated, but a lot more generic.
      
      In particular, it allows us to really do the operations efficiently on
      both little-endian and big-endian machines, pretty much regardless of
      machine details.  For example, if you can rely on a fast population
      count instruction on your architecture, this will allow you to make your
      optimized <asm/word-at-a-time.h> file with that.
      
      NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
      not truly generic: it actually only works on big-endian.  Why? Because
      on little-endian the generic algorithms are wasteful, since you can
      invariably do better.  The x86 implementation is an example of that.
      
      (The only truly non-generic part of the asm-generic implementation is
      the "find_zero()" function, and you could make a little-endian version
      of it.  And if the Kbuild infrastructure allowed us to pick a particular
      header file, that would be lovely.)
      
      The <asm/word-at-a-time.h> functions are as follows (a usage sketch
      appears after this list):
      
       - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
         uses.
      
       - has_zero(): take a word, and determine if it has a zero byte in it.
         It gets the word, the pointer to the constant pool, and a pointer to
         an intermediate "data" field it can set.
      
         This is the "quick-and-dirty" zero tester: it's what is run inside
         the hot loops.
      
       - "prep_zero_mask()": take the word, the data that has_zero() produced,
         and the constant pool, and generate an *exact* mask of which byte had
         the first zero.  This is run directly *outside* the loop, and allows
         the "has_zero()" function to answer the "is there a zero byte"
         question without necessarily getting exactly *which* byte is the
         first one to contain a zero.
      
         If you do multiple byte lookups concurrently (e.g. "hash_name()", which
         looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
         phase, the result of those can be or'ed together to get the "either
         or" case.
      
       - The result from "prep_zero_mask()" can then be fed into "find_zero()"
         (to find the byte offset of the first byte that was zero) or into
         "zero_bytemask()" (to find the bytemask of the bytes preceding the
         zero byte).
      
         The existence of zero_bytemask() is optional, and is not necessary
         for the normal string routines.  But dentry name hashing needs it, so
         if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
      
      This changes the generic strncpy_from_user() function and the dentry
      hashing functions to use these modified word-at-a-time interfaces.  This
      gets us back to the optimized state of the x86 strncpy that we lost in
      the previous commit when moving over to the generic version.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
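      To show how the pieces fit together, here is a hedged, kernel-style
      sketch of the hot-loop idiom (a simplified word-aligned strlen, not
      the exact lib/ code; real callers may insert an extra normalization
      step between prep_zero_mask() and find_zero() on some architectures):

          #include <asm/word-at-a-time.h>

          /* Assumes 'src' is word-aligned and NUL-terminated, and that
           * reading the whole final word is safe (the usual kernel
           * word-at-a-time assumption). */
          static inline size_t wordwise_strlen(const char *src)
          {
                  const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
                  const unsigned long *p = (const unsigned long *)src;
                  unsigned long c, data;
                  size_t res = 0;

                  /* Hot loop: only the quick-and-dirty zero test. */
                  while (!has_zero(c = *p++, &data, &constants))
                          res += sizeof(unsigned long);

                  /* Outside the loop: refine to an exact mask, then
                   * locate the first zero byte within the word. */
                  data = prep_zero_mask(c, data, &constants);
                  return res + find_zero(data);
          }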
    • x86: use generic strncpy_from_user routine · 4ae73f2d
      Committed by Linus Torvalds
      The generic strncpy_from_user() is not really optimal, since it is
      designed to work on both little-endian and big-endian.  And on
      little-endian you can simplify much of the logic to find the first zero
      byte, since little-endian arithmetic doesn't have to worry about the
      carry bit propagating into earlier bytes (only later bytes, which we
      don't care about); the illustration below makes this concrete.
      
      But I have patches to make the generic routines use the architecture-
      specific <asm/word-at-a-time.h> infrastructure, so that we can regain
      the little-endian optimizations.  But before we do that, switch over to
      the generic routines to make the patches each do just one well-defined
      thing.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
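      The little-endian simplification referred to above is the classic
      zero-byte bit trick; a self-contained illustration (plain C, not the
      kernel's code):

          #include <stdint.h>
          #include <stdio.h>

          /* (x - 0x01..01) borrows out of every byte that is zero; masking
           * with ~x and 0x80..80 leaves 0x80 exactly in the bytes that were
           * zero, with false positives possible only in HIGHER bytes, since
           * borrows propagate only toward later bytes.  On little-endian the
           * lowest set bit therefore marks the first zero byte. */
          static uint64_t zero_byte_mask(uint64_t x)
          {
                  return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
          }

          int main(void)
          {
                  /* Bytes from the low end: 'A','A','A','A',0,'A','A','A' */
                  uint64_t w = 0x4141410041414141ULL;
                  uint64_t m = zero_byte_mask(w);

                  printf("first zero at byte %d\n", (int)(__builtin_ctzll(m) >> 3));
                  return 0;
          }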
  4. 26 May 2012, 15 commits
    • tile: default to tilegx_defconfig for ARCH=tile · 1fcb78e9
      Committed by Chris Metcalf
      There is no "ARCH=tile" (just like there is no "ARCH=x86") so we need
      to pick a default configuration, either tilepro or tilegx, when users
      specify ARCH=tile.  We'll use tilegx, since that's our current chip.
      Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • tile: fix bug where fls(0) was not returning 0 · 9f1d62be
      Committed by Chris Metcalf
      This is because __builtin_clz(0) returns 64 for the "undefined" case
      of 0, since the builtin just does a right-shift 32 and "clz" instruction.
      So, use the alpha approach of casting to u32 and using __builtin_clzll().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: mark TILEGX as not EXPERIMENTAL · acd1a19e
      Committed by Chris Metcalf
      Also create a TILEPRO config setting to use for #ifdefs where it
      is cleaner to do so, and make the 64BIT setting depend directly
      on the setting of TILEGX.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • tile/mm/fault.c: Port OOM changes to handle_page_fault · 4ce6bea2
      Committed by Kautuk Consul
      Commit d065bd81 (mm: retry page fault when blocking on disk
      transfer) and commit 37b23e05 (x86,mm: make pagefault killable)
      introduced changes into the x86 pagefault handler to make it
      retryable as well as killable.
      
      These changes reduce the mmap_sem hold time, which is crucial
      during OOM killer invocation.
      
      Port these changes to tile (the idiom is sketched below).
      Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
      [cmetcalf@tilera.com: initialize "flags" after "write" updated.]
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
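      The idiom being ported looks roughly like the following (a simplified,
      kernel-style sketch of the retry/killable pattern from the commits
      named above, not the exact tile diff):

          unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
                               (write ? FAULT_FLAG_WRITE : 0);
      retry:
          fault = handle_mm_fault(mm, vma, address, flags);
          if (fault & VM_FAULT_RETRY) {
                  /* handle_mm_fault() already dropped mmap_sem for us,
                   * which is what shortens the hold time; give up cleanly
                   * if a fatal (e.g. OOM-kill) signal arrived meanwhile. */
                  if (fatal_signal_pending(current))
                          return;
                  /* Only one retry: the second attempt may block. */
                  flags &= ~FAULT_FLAG_ALLOW_RETRY;
                  down_read(&mm->mmap_sem);
                  goto retry;
          }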
    • arch/tile: add descriptive text if the kernel reports a bad trap · c6f696f6
      Committed by Chris Metcalf
      If the kernel unexpectedly takes a bad trap, it's convenient to
      have it report the type of trap as part of the error.  This gives
      customers a bit more context before they call up customer support.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: allow querying cpu module information from the hypervisor · 8703d6e0
      Committed by Chris Metcalf
      This just adds a few more attributes to the information Linux
      can query from the hypervisor for the /sys/hypervisor/board/ directory,
      providing part, serial#, revision#, and description for cpu modules
      (as opposed to the board itself, or any mezzanine boards).
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: fix hardwall for tilegx and generalize for idn and ipi · b8ace083
      Committed by Chris Metcalf
      The hardwall drain code was not properly implemented for tilegx,
      just tilepro, so you couldn't reliably restart an application that
      made use of the udn.
      
      In addition, the code was only applicable to the udn (user dynamic
      network).  On tilegx there is a second user network that is available
      (the "idn"), and there is support for having I/O shims deliver
      user-level interrupts to applications ("ipi") which functions in a
      very similar way to the inter-core permissions used for udn/idn.
      So this change also generalizes the code from supporting just the udn
      to supporting udn/idn/ipi on tilegx.
      
      By default we now use /dev/hardwall/{udn,idn,ipi} with separate
      minor numbers for the three devices.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: support multiple huge page sizes dynamically · 621b1955
      Committed by Chris Metcalf
      This change adds support for a new "super" bit in the PTE, using the new
      arch_make_huge_pte() method.  The Tilera hypervisor sees the bit set at a
      given level of the page table and gangs together 4, 16, or 64 consecutive
      pages from that level of the hierarchy to create a larger TLB entry.
      
      One extra "super" page size can be specified at each of the three levels
      of the page table hierarchy on tilegx, using the "hugepagesz" argument
      on the boot command line.  A new hypervisor API is added to allow Linux
      to tell the hypervisor how many PTEs to gang together at each level of
      the page table.
      
      To allow pre-allocating huge pages larger than the buddy allocator can
      handle, this change modifies the Tilera bootmem support to put all of
      memory on tilegx platforms into bootmem.
      
      As part of this change I eliminate the vestigial CONFIG_HIGHPTE support,
      which never worked anyway, and eliminate the hv_page_size() API in favor
      of the standard vma_kernel_pagesize() API.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: support kexec() for tilegx · fc0c49f5
      Committed by Chris Metcalf
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: support <asm/cachectl.h> header for cacheflush() syscall · cd6f32aa
      Committed by Chris Metcalf
      We already had a syscall that did some dcache flushing, but it was
      not used in practice.  Make it MIPS-compatible instead so it can
      do both the DCACHE and ICACHE actions.  We have code that wants to
      be able to use the ICACHE flush mode from userspace, so this change
      enables that (see the sketch below).
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
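      A hedged userspace sketch of the MIPS-compatible interface (assumes
      the architecture exposes __NR_cacheflush and <asm/cachectl.h>, as on
      tile after this change):

          #include <stddef.h>
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <asm/cachectl.h>       /* ICACHE, DCACHE action flags */

          /* After writing freshly generated instructions into 'buf',
           * flush them out to the instruction stream before executing. */
          static int flush_jit_code(void *buf, size_t len)
          {
                  return syscall(__NR_cacheflush, buf, len, ICACHE);
          }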
    • arch/tile: Allow tilegx to build with either 16K or 64K page size · d5d14ed6
      Committed by Chris Metcalf
      This change introduces new flags for the hv_install_context()
      API that passes a page table pointer to the hypervisor.  Clients
      can explicitly request 4K, 16K, or 64K small pages when they
      install a new context.  In practice, the page size is fixed at
      kernel compile time and the same size is always requested every
      time a new page table is installed.
      
      The <hv/hypervisor.h> header changes so that it provides more abstract
      macros for managing "page" things like PFNs and page tables.  For
      example there is now a HV_DEFAULT_PAGE_SIZE_SMALL instead of the old
      HV_PAGE_SIZE_SMALL.  The various PFN routines have been eliminated and
      only PA- or PTFN-based ones remain (since PTFNs are always expressed
      in fixed 2KB "page" size).  The page-table management macros are
      renamed with a leading underscore and take page-size arguments with
      the presumption that clients will use those macros in some single
      place to provide the "real" macros they will use themselves.
      
      I happened to notice the old hv_set_caching() API was totally broken
      (it assumed 4KB pages) so I changed it so it would nominally work
      correctly with other page sizes.
      
      Tag modules with the page size so you can't load a module built with
      a conflicting page size.  (And add a test for SMP while we're at it.)
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: optimize get_user/put_user and friends · 47d632f9
      Committed by Chris Metcalf
      Use direct load/store for the get_user/put_user.
      
      Previously, we would call out to a helper routine that would do the
      appropriate thing and then return, handling the possible exception
      internally.  Now we inline the load or store, along with a "we succeeded"
      indication in a register; if the load or store faults, we write a
      "we failed" indication into the same register and then return to the
      following instruction.  This is more efficient and gives us more compact
      code, as well as being more in line with what other architectures do.
      
      The special futex assembly source file for TILE-Gx also disappears in
      this change; we just use the same inlining idiom there as well, putting
      the appropriate atomic operations directly into futex_atomic_op_inuser()
      (and thus into the FUTEX_WAIT function).
      
      The underlying atomic copy_from_user, copy_to_user functions were
      renamed using the (cryptic) x86 convention as copy_from_user_ll and
      copy_to_user_ll.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: support building big-endian kernel · 1efea40d
      Committed by Chris Metcalf
      The toolchain supports big-endian mode now, so add support for building
      the kernel to run big-endian as well.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: allow building Linux with transparent huge pages enabled · 73636b1a
      Committed by Chris Metcalf
      The change adds some infrastructure for managing tile pmd's more generally,
      using pte_pmd() and pmd_pte() methods to translate pmd values to and
      from ptes, since on TILEPro a pmd is really just a nested structure
      holding a pgd (aka pte).  Several existing pmd methods are moved into
      this framework, and a whole raft of additional pmd accessors are defined
      that are used by the transparent hugepage framework.
      
      The tile PTE now has a "client2" bit.  The bit is used to indicate a
      transparent huge page is in the process of being split into subpages.
      
      This change also fixes a generic bug where the return value of the
      generic pmdp_splitting_flush() was incorrect.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • arch/tile: use interrupt critical sections less · 51007004
      Committed by Chris Metcalf
      In general we want to avoid ever touching memory while within an
      interrupt critical section, since the page fault path goes through
      a different path from the hypervisor when in an interrupt critical
      section, and we carefully decided with tilegx that we didn't need
      to support this path in the kernel.  (On tilepro we did implement
      that path as part of supporting atomic instructions in software.)
      
      In practice we always need to touch the kernel stack, since that's
      where we store the interrupt state before releasing the critical
      section, but this change cleans up a few things.  The IRQ_ENABLE
      macro is split up so that when we want to enable interrupts in a
      deferred way (e.g. for cpu_idle or for interrupt return) we can
      read the per-cpu enable mask before entering the critical section.
      The cache-migration code is changed to use interrupt masking instead
      of interrupt critical sections.  And, the interrupt-entry code is
      changed so that we defer loading "tp" from per-cpu data until after
      we have released the interrupt critical section.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
  5. 25 May 2012, 8 commits
  6. 24 May 2012, 6 commits
  7. 23 May 2012, 2 commits