1. 28 Apr, 2008: 1 commit
    • mm: introduce pte_special pte bit · 7e675137
      Nick Piggin committed
      s390, for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
      model (which is more dynamic than most).  Instead, they had proposed to
      implement it with an additional path through vm_normal_page(), using a bit in
      the pte to determine whether or not the page should be refcounted:
      
      vm_normal_page()
      {
          ...
          if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
              if (vma->vm_flags & VM_MIXEDMAP) {
      #ifdef s390
                  if (!mixedmap_refcount_pte(pte))
                      return NULL;
      #else
                  if (!pfn_valid(pfn))
                      return NULL;
      #endif
                  goto out;
              }
          ...
      }
      
      This is fine, however if we are allowed to use a bit in the pte to determine
      refcountedness, we can use that to _completely_ replace all the vma based
      schemes.  So instead of adding more cases to the already complex vma-based
      scheme, we can have a clearly separate and simple pte-based scheme (and get
      slightly better code generation in the process):
      
      vm_normal_page()
      {
      #ifdef s390
          if (!mixedmap_refcount_pte(pte))
              return NULL;
          return pte_page(pte);
      #else
          ...
      #endif
      }
      
      And finally, we may as well make this concept usable by any architecture
      rather than making it s390-only, so implement a new type of pte state for
      this. Unfortunately the old vma-based code must stay, because some
      architectures may not be able to spare pte bits. This makes vm_normal_page a
      little uglier than we would like, but the two cases are clearly separate.
      
      So introduce a pte_special pte state, and use it in mm/memory.c.  It is
      currently a noop for all architectures, so this doesn't actually result in any
      compiled code changes to mm/memory.o.
      
      BTW:
      I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
      The reason is that, regardless of where vm_normal_page is actually
      implemented, the *abstraction* is still exactly the same. Also, while it
      depends on whether the architecture has pte_special or not, there are only
      two possible cases, and it really isn't an arch specific function --
      the role of the arch code should be to provide primitive functions and
      accessors with which to build the core code; pte_special does that. We do
      not want architectures to know or care about vm_normal_page itself, and
      we definitely don't want them being able to invent something new there
      out of sight of mm/ code. If we made vm_normal_page an arch function, then
      we would have to make vm_insert_mixed (next patch) an arch function too. So I
      don't think moving it to arch code fundamentally improves any abstractions,
      while it does practically make the code more difficult to follow, for both
      mm and arch developers, and easier to misuse.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Carsten Otte <cotte@de.ibm.com>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
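      A minimal C sketch of the pte-based scheme described above. It assumes a
      made-up pte layout (bit 0 present, bit 1 special, pfn from bit 12 up) and a
      tiny flat mem_map. pte_is_special() and vm_normal_page_sketch() are
      hypothetical stand-ins, not the kernel's pte_special()/vm_normal_page(). The
      point is that the pte alone decides whether a struct page may be
      refcounted, with no VM_PFNMAP/VM_MIXEDMAP vma checks and no pfn_valid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pte layout, for illustration only. */
typedef uint64_t pte_t;
#define PTE_PRESENT   (1ULL << 0)
#define PTE_SPECIAL   (1ULL << 1)   /* "do not refcount this mapping" */
#define PTE_PFN_SHIFT 12

struct page { int refcount; };

/* Hypothetical flat mem_map covering a few page frames. */
#define NR_FRAMES 16
static struct page mem_map[NR_FRAMES];

static bool pte_is_special(pte_t pte) { return (pte & PTE_SPECIAL) != 0; }
static uint64_t pte_to_pfn(pte_t pte) { return pte >> PTE_PFN_SHIFT; }

static pte_t make_pte(uint64_t pfn, bool special)
{
    return (pfn << PTE_PFN_SHIFT) | PTE_PRESENT | (special ? PTE_SPECIAL : 0);
}

/* pte-based scheme: only the pte is consulted to decide refcountedness. */
static struct page *vm_normal_page_sketch(pte_t pte)
{
    if (pte_is_special(pte))
        return NULL;                    /* raw pfn mapping: caller must not refcount */
    assert(pte_to_pfn(pte) < NR_FRAMES);/* this sketch only models NR_FRAMES frames */
    return &mem_map[pte_to_pfn(pte)];   /* ordinary page with a struct page */
}
```

      An architecture then only has to supply the bit accessor; the decision
      logic itself stays in core code.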
  2. 27 Apr, 2008: 13 commits
    • KVM: s390: Improve pgste accesses · c71799c1
      Heiko Carstens committed
      There is no need to use interlocked updates when the rcp
      lock is held. Therefore the simple bitops variants can be
      used. This should improve performance.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
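      A C11 sketch of why holding the lock makes interlocked updates
      unnecessary. The struct, the atomic_flag spinlock (standing in for the RCP
      lock) and all function names are hypothetical. Once every writer holds the
      lock, plain read-modify-write operations (like the kernel's __set_bit())
      cannot race, so the cost of an atomic operation per bit is avoided.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical status word guarded by a lock, loosely modelled on a pgste. */
struct status_word {
    atomic_flag lock;     /* stands in for the RCP lock */
    unsigned long bits;   /* only modified while lock is held */
};

static void sw_lock(struct status_word *sw)
{
    while (atomic_flag_test_and_set_explicit(&sw->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
}

static void sw_unlock(struct status_word *sw)
{
    atomic_flag_clear_explicit(&sw->lock, memory_order_release);
}

/* Interlocked variant: safe without any lock, but every bit update is an
 * atomic read-modify-write. */
static void set_bit_interlocked(_Atomic unsigned long *word, unsigned int nr)
{
    atomic_fetch_or(word, 1UL << nr);
}

/* Simple variant: plain |= suffices because the lock excludes all other
 * writers for the whole critical section. */
static void set_two_bits_locked(struct status_word *sw, unsigned int a, unsigned int b)
{
    sw_lock(sw);
    sw->bits |= 1UL << a;
    sw->bits |= 1UL << b;
    sw_unlock(sw);
}
```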
    • s390: KVM guest: virtio device support, and kvm hypercalls · e976a2b9
      Christian Borntraeger committed
      This patch implements kvm guest kernel support for paravirtualized devices
      and contains two parts:
      o a basic virtio stub using virtio_ring and external interrupts and hypercalls
      o full hypercall implementation in kvm_para.h
      
      Currently we don't have PCI on s390. Making virtio_pci usable for s390 seems
      more complicated than providing our own stub. This virtio stub is similar to
      the lguest one: the memory for the descriptors and the device detection are
      provided via additional memory mapped on top of the guest storage. We use an
      external interrupt with extint code 0x2603 for host->guest notification.
      
      The hypercall definition uses the diag instruction for issuing a hypercall. The
      parameters are written in R2-R7, the hypercall number is written in R1. This is
      similar to the system call ABI (svc) which can use R1 for the number and R2-R6
      for the parameters.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
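      The register convention above can be modelled in plain C. This is only a
      sketch: struct guest_regs, hc_pack(), hc_nr() and hc_arg() are
      hypothetical, and the real kvm_para.h issues the diag instruction with
      inline assembly, which is not reproduced here. It shows only where the
      hypercall number (R1) and up to six parameters (R2-R7) go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the general-purpose registers the host sees when
 * the guest issues the hypercall diag instruction. */
struct guest_regs { unsigned long gpr[16]; };

#define HC_MAX_ARGS 6   /* parameters travel in R2..R7 */

/* Guest side: hypercall number in R1, parameters in R2 onwards. */
static int hc_pack(struct guest_regs *r, unsigned long nr,
                   const unsigned long *args, size_t nargs)
{
    if (nargs > HC_MAX_ARGS)
        return -1;                  /* would not fit in R2-R7 */
    r->gpr[1] = nr;
    for (size_t i = 0; i < nargs; i++)
        r->gpr[2 + i] = args[i];
    return 0;
}

/* Host side: recover the hypercall number and the i-th parameter. */
static unsigned long hc_nr(const struct guest_regs *r) { return r->gpr[1]; }
static unsigned long hc_arg(const struct guest_regs *r, size_t i) { return r->gpr[2 + i]; }
```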
    • s390: KVM guest: detect when running on kvm · fa587743
      Carsten Otte committed
      This patch adds functionality to detect if the kernel runs under the KVM
      hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
      allows drivers to skip device detection if the system runs non-virtualized.
      We also define a preferred console to avoid using ttyS0, which is a
      line-mode-only console.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: s390: intercepts for diagnose instructions · e28acfea
      Christian Borntraeger committed
      This patch introduces interpretation of some diagnose instruction intercepts.
      Diagnose is our classic architected way of doing a hypercall. This patch
      features the following diagnose codes:
      - vm storage size, that tells the guest about its memory layout
      - time slice end, which is used by the guest to indicate that it waits
        for a lock and thus cannot use up its time slice in a useful way
      - ipl functions, which a guest can use to reset and reboot itself
      
      In order to implement ipl functions, we also introduce an exit reason that
      causes userspace to perform various resets on the virtual machine. All resets
      are described in the principles of operation book, except KVM_S390_RESET_IPL
      which causes a reboot of the machine.
      Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: s390: interprocessor communication via sigp · 5288fbf0
      Christian Borntraeger committed
      This patch introduces in-kernel handling of _some_ sigp interprocessor
      signals (similar to ipi).
      kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
      handlers depending on the operation requested:
      - sigp sense tries to retrieve information such as existence or running state
        of the remote cpu
      - sigp emergency sends an external interrupt to the remote cpu
      - sigp stop stops a remote cpu
      - sigp stop store status stops a remote cpu, and stores its entire internal
        state to the cpu's lowcore
      - sigp set arch sets the architecture mode of the remote cpu. setting to
        ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
        denied, all others are passed to userland
      - sigp set prefix sets the prefix register of a remote cpu
      
      To implement this, the stop intercept indication is deliberately reused: a
      set of action bits defines what to do once a cpu gets stopped:
      ACTION_STOP_ON_STOP  really stops the cpu when a stop intercept is recognized
      ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
                           recognized
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
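      The action-bit scheme can be sketched as follows. ACTION_STOP_ON_STOP and
      ACTION_STORE_ON_STOP are the names from the text, but their values, struct
      vcpu_sketch and the helper functions are assumptions for illustration. A
      sigp order only records what should happen; the work is done once, when
      the stop intercept is recognized.

```c
#include <assert.h>
#include <stdbool.h>

/* Names from the commit message; the bit values are assumed. */
#define ACTION_STORE_ON_STOP (1u << 0)
#define ACTION_STOP_ON_STOP  (1u << 1)

/* Hypothetical vcpu model. */
struct vcpu_sketch {
    unsigned int action_bits;  /* requested by sigp orders */
    bool stopped;
    bool status_stored;        /* stands in for storing status to lowcore */
};

/* sigp stop: request that the cpu be stopped. */
static void sigp_stop(struct vcpu_sketch *v)
{
    v->action_bits |= ACTION_STOP_ON_STOP;
}

/* sigp stop and store status: request both actions. */
static void sigp_stop_and_store(struct vcpu_sketch *v)
{
    v->action_bits |= ACTION_STOP_ON_STOP | ACTION_STORE_ON_STOP;
}

/* Stop intercept recognized: carry out the requested actions (in this
 * sketch, store before stop), then clear them. */
static void handle_stop_intercept(struct vcpu_sketch *v)
{
    if (v->action_bits & ACTION_STORE_ON_STOP)
        v->status_stored = true;
    if (v->action_bits & ACTION_STOP_ON_STOP)
        v->stopped = true;
    v->action_bits = 0;
}
```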
    • KVM: s390: intercepts for privileged instructions · 453423dc
      Christian Borntraeger committed
      This patch introduces in-kernel handling of some intercepts for privileged
      instructions:
      
      handle_set_prefix()        sets the prefix register of the local cpu
      handle_store_prefix()      stores the content of the prefix register to memory
      handle_store_cpu_address() stores the cpu number of the current cpu to memory
      handle_skey()              just decrements the instruction address and retries
      handle_stsch()             delivers condition code 3 "operation not supported"
      handle_chsc()              same here
      handle_stfl()              stores the facility list which contains the
                                 capabilities of the cpu
      handle_stidp()             stores cpu type/model/revision and such
      handle_stsi()              stores information about the system topology
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: s390: interrupt subsystem, cpu timer, waitpsw · ba5c1e9b
      Carsten Otte committed
      This patch contains the s390 interrupt subsystem (similar to the in-kernel
      APIC) including timer interrupts (similar to the in-kernel PIT) and enabled
      wait (similar to in-kernel hlt).
      
      In order to achieve that, this patch also introduces intercept handling
      for instruction intercepts, and it implements load control instructions.
      
      This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
      the vm file descriptors and the vcpu file descriptors. In case this ioctl is
      issued against a vm file descriptor, the interrupt is considered floating.
      Floating interrupts may be delivered to any virtual cpu in the configuration.
      
      The following interrupts are supported:
      SIGP STOP       - interprocessor signal that stops a remote cpu
      SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
                        (stopped) remote cpu
      INT EMERGENCY   - interprocessor interrupt, usually used to signal need_resched
                        and for smp_call_function() in the guest.
      PROGRAM INT     - exception during program execution such as page fault, illegal
                        instruction and friends
      RESTART         - interprocessor signal that starts a stopped cpu
      INT VIRTIO      - floating interrupt for virtio signalisation
      INT SERVICE     - floating interrupt for signalisations from the system
                        service processor
      
      struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
      an interrupt, also carries parameter data for interrupts along with the interrupt
      type. Interrupts on s390 usually have a state that represents the current
      operation, or identifies which device has caused the interruption on s390.
      
      kvm_s390_handle_wait() handles waitpsw in two flavors: in case of a
      disabled wait (that is, disabled for interrupts), we exit to userspace. In case
      of an enabled wait we set up a timer that equals the cpu clock comparator value
      and sleep on a wait queue.
      
      [christian: change virtio interrupt to 0x2603]
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
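      A simplified C model of two ideas above. First, floating interrupts
      (injected via the vm file descriptor) can be taken by any vcpu. Second, a
      wait PSW is handled differently depending on whether interrupts are
      enabled. All types and functions here are hypothetical. Delivering local
      interrupts before floating ones and using a fixed-size queue are
      simplifications of this sketch, not details taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QLEN 8

/* Hypothetical fixed-size FIFO of interrupt types. */
struct irq_queue {
    int type[QLEN];
    int n;
};

struct vm_sketch { struct irq_queue floating; };    /* injected via the vm fd */
struct vcpu_sketch {
    struct irq_queue local;                          /* injected via a vcpu fd */
    struct vm_sketch *vm;
};

static bool enqueue(struct irq_queue *q, int type)
{
    if (q->n == QLEN)
        return false;
    q->type[q->n++] = type;
    return true;
}

static bool dequeue(struct irq_queue *q, int *type)
{
    if (q->n == 0)
        return false;
    *type = q->type[0];
    for (int i = 1; i < q->n; i++)
        q->type[i - 1] = q->type[i];
    q->n--;
    return true;
}

/* Take one pending interrupt: this vcpu's local queue first, then the
 * VM-wide floating queue shared by all vcpus. */
static bool deliver_one(struct vcpu_sketch *v, int *type)
{
    return dequeue(&v->local, type) || dequeue(&v->vm->floating, type);
}

enum wait_result { WAIT_EXIT_TO_USERSPACE, WAIT_SLEEP };

/* Disabled wait: no interrupt can end it, so exit to userspace.
 * Enabled wait: sleep for (clock comparator - now), or until woken. */
static enum wait_result handle_wait(bool irqs_enabled, uint64_t now,
                                    uint64_t clock_comparator, uint64_t *sleep_for)
{
    if (!irqs_enabled)
        return WAIT_EXIT_TO_USERSPACE;
    *sleep_for = clock_comparator > now ? clock_comparator - now : 0;
    return WAIT_SLEEP;
}
```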
    • KVM: s390: sie intercept handling · 8f2abe6a
      Christian Borntraeger committed
      This patch introduces handling of sie intercepts in three flavors: Intercepts
      are either handled completely in-kernel by kvm_handle_sie_intercept(),
      or passed to userspace with corresponding data in struct kvm_run in case
      kvm_handle_sie_intercept() returns -ENOTSUPP.
      In case of partial execution in kernel with the need of userspace support,
      kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
      -EREMOTE.
      
      The trivial intercept reasons are handled in this patch:
      handle_noop() just does nothing for intercepts that don't require our support
        at all
      handle_stop() is called when a cpu enters stopped state, and it drops out to
        userland after updating our vcpu state
      handle_validity() faults in the cpu lowcore if needed, or passes the request
        to userland
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
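      The three outcomes of kvm_handle_sie_intercept() can be sketched from the
      caller's side. This is a hypothetical model: struct run_sketch stands in
      for struct kvm_run, and the exit-reason values are made up. ENOTSUPP is a
      kernel-internal error code that is not in userspace <errno.h>, so both
      error values are defined locally as stand-ins.

```c
#include <assert.h>

/* Stand-in error codes (ENOTSUPP is kernel-internal). */
#define SKETCH_ENOTSUPP 524
#define SKETCH_EREMOTE  66

/* Hypothetical exit reasons written into the run structure. */
#define EXIT_REASON_INTERCEPT 1
#define EXIT_REASON_PREPARED  2

struct run_sketch { int exit_reason; int intercept_code; };

enum exit_kind { RESUME_GUEST, EXIT_GENERIC, EXIT_PREPARED };

typedef int (*intercept_fn)(struct run_sketch *run, int code);

static enum exit_kind dispatch_intercept(intercept_fn handle,
                                         struct run_sketch *run, int code)
{
    int rc = handle(run, code);
    if (rc == 0)
        return RESUME_GUEST;           /* handled completely in-kernel */
    if (rc == -SKETCH_EREMOTE)
        return EXIT_PREPARED;          /* handler already filled in run */
    /* -ENOTSUPP (and, in this sketch, any other error): pass the raw
     * intercept data to userspace. */
    run->exit_reason = EXIT_REASON_INTERCEPT;
    run->intercept_code = code;
    return EXIT_GENERIC;
}

/* Hypothetical handler exercising all three outcomes. */
static int example_handler(struct run_sketch *run, int code)
{
    switch (code) {
    case 0:
        return 0;                              /* nothing to do, like a noop */
    case 1:
        run->exit_reason = EXIT_REASON_PREPARED; /* partial in-kernel work */
        return -SKETCH_EREMOTE;
    default:
        return -SKETCH_ENOTSUPP;
    }
}
```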
    • KVM: s390: arch backend for the kvm kernel module · b0c632db
      Heiko Carstens committed
      This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
      (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
      instruction SIE to run virtual machines with up to 64 virtual CPUs each.
      This port is only usable on 64bit host kernels, and can only run 64bit guest
      kernels. However, running 31bit applications in guest userspace is possible.
      
      The following source files are introduced by this patch:
      arch/s390/kvm/kvm-s390.c    similar to arch/x86/kvm/x86.c, this implements all
                                  arch callbacks for kvm. __vcpu_run calls back into
                                  sie64a to enter the guest machine context
      arch/s390/kvm/sie64a.S      assembler function sie64a, which enters guest
                                  context via SIE, and switches world before and after
                                  that
      include/asm-s390/kvm_host.h contains all vital data structures needed to run
                                  virtual machines on the mainframe
      include/asm-s390/kvm.h      defines kvm_regs and friends for user access to
                                  guest register content
      arch/s390/kvm/gaccess.h     functions similar to uaccess to access guest memory
      arch/s390/kvm/kvm-s390.h    header file for kvm-s390 internals, extended by
                                  later patches
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • s390: KVM preparation: address of the 64bit extint parm in lowcore · 8a88ac61
      Christian Borntraeger committed
      The address 0x11b8 is used by z/VM for pfault and diag 250 I/O to
      provide a 64 bit extint parameter. virtio uses the same address, so
      it's time to update the lowcore structure.
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • s390: KVM preparation: host memory management changes for s390 kvm · 5b7baf05
      Christian Borntraeger committed
      This patch changes the s390 memory management definitions to use the pgste field
      for dirty and reference bit tracking of host and guest code. Usually on s390,
      dirty and referenced are tracked in storage keys, which belong to the physical
      page. This changes with virtualization: The guest and host dirty/reference bits
      are defined to be the logical OR of the values for the mapping and the physical
      page. This patch implements the necessary changes in pgtable.h for s390.
      
      There is a common code change in mm/rmap.c, the call to
      page_test_and_clear_young must be moved. This is a no-op for all
      architectures except s390. page_referenced checks the referenced bits for
      the physical page and for all mappings:
      o The physical page is checked with page_test_and_clear_young.
      o The mappings are checked with ptep_test_and_clear_young and friends.
      
      Without pgstes (the current implementation on Linux s390) the physical page
      check is implemented but the mapping callbacks are no-ops because dirty
      and referenced are not tracked in the s390 page tables. The pgstes introduce
      guest and host dirty and reference bits for s390 in the host mapping. These
      mappings must be checked before page_test_and_clear_young resets the reference
      bit.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
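      A toy C model of why the order of the checks matters. This is an
      illustration under assumptions, not the s390 pgtable.h code. One physical
      page has a storage-key reference bit, and its single mapping has a pgste
      with separate guest and host reference bits. In this model the mapping
      check copies the key bit into the guest bit before the key is consumed.
      If the physical check clears the key first, the guest's reference
      information is lost, although the page is still reported as referenced.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one physical page, one mapping with a pgste. */
struct page_model {
    bool key_ref;     /* reference bit in the storage key (physical page) */
    bool guest_ref;   /* guest reference bit in the pgste */
    bool host_ref;    /* host reference bit in the pgste */
};

/* Mapping check (stands in for ptep_test_and_clear_young): preserve the
 * guest's view of the key bit, then report and reset the host's view. */
static bool mapping_test_and_clear_young(struct page_model *p)
{
    bool young = p->host_ref || p->key_ref;
    p->guest_ref = p->guest_ref || p->key_ref;
    p->host_ref = false;
    return young;
}

/* Physical check (stands in for page_test_and_clear_young). */
static bool physical_test_and_clear_young(struct page_model *p)
{
    bool young = p->key_ref;
    p->key_ref = false;
    return young;
}

/* Referenced = OR of the physical page and its mapping. Both checks are
 * always called (the call comes before || so it is never short-circuited). */
static bool page_referenced_model(struct page_model *p, bool mappings_first)
{
    bool ref;
    if (mappings_first) {
        ref = mapping_test_and_clear_young(p);
        ref = physical_test_and_clear_young(p) || ref;
    } else {
        ref = physical_test_and_clear_young(p);
        ref = mapping_test_and_clear_young(p) || ref;
    }
    return ref;
}
```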
    • s390: KVM preparation: provide hook to enable pgstes in user pagetable · 402b0862
      Carsten Otte committed
      The SIE instruction on s390 uses the 2nd half of the page table page to
      virtualize the storage keys of a guest. This patch offers the s390_enable_sie
      function, which reorganizes the page tables of a single-threaded process to
      reserve space in the page table:
      s390_enable_sie makes sure that the process is single threaded and then uses
      dup_mm to create a new mm with reorganized page tables. The old mm is freed
      and the process now has a page status extended field after every page table.
      
      Code that wants to exploit pgstes should select CONFIG_PGSTE.
      
      This patch has a small common code hit, namely making dup_mm non-static.
      
      Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
      review feedback. Now we do have the prototype for dup_mm in
      include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() now
      calls task_lock() to prevent a race against ptrace modification of mm_users.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
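      Assuming 256 entries per page table (the figure given in the
      CONFIG_HIGHPTE commit further down), the layout can be sketched in C. The
      pgste for a pte lives 256 entries later in the same table, and adding the
      pgste half doubles the table size. pgtable_bytes() and pgste_of() are
      hypothetical helpers, and entry sizes of 4 bytes (31-bit) and 8 bytes
      (64-bit) are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE 256   /* ptes per s390 page table */

/* Bytes needed for one page table, with or without the pgste half. */
static size_t pgtable_bytes(size_t entry_size, int with_pgste)
{
    size_t entries = PTRS_PER_PTE * (with_pgste ? 2 : 1);
    return entries * entry_size;
}

/* With pgstes, the status entry for pte i sits PTRS_PER_PTE entries after
 * it, i.e. in the second half of the same table. ptep must point into a
 * table that was allocated with 2 * PTRS_PER_PTE entries. */
static uint64_t *pgste_of(uint64_t *ptep)
{
    return ptep + PTRS_PER_PTE;
}
```

      With 8-byte entries, the table including pgstes occupies 4096 bytes, a
      full 4K page.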
    • generic: implement __fls on all 64-bit archs · 56a6b1eb
      Alexander van Heukelum committed
      Implement __fls on all 64-bit archs:
      
      alpha has an implementation of fls64.
      	Added __fls(x) = fls64(x) - 1.
      
      ia64 has fls, but not __fls.
      	Added __fls based on code of fls.
      
      mips and powerpc have __ilog2, which is the same as __fls.
      	Added __fls = __ilog2.
      
      parisc, s390, sh and sparc64:
      	Include generic __fls.
      
      x86_64 already has __fls.
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
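      A portable C sketch of the identities used above. __fls(x) is the 0-based
      index of the most significant set bit, so it equals fls64(x) - 1 (the alpha
      approach) and also floor(log2(x)) (the __ilog2 approach on mips and
      powerpc). fls64_sk(), msb_index_sk() and ilog2_sk() are hypothetical
      stand-ins; the kernel versions are architecture-specific. Because the
      kernel's __fls() is undefined for 0, this sketch asserts x != 0.

```c
#include <assert.h>

/* fls64: 1-based index of the most significant set bit, 0 if x == 0. */
static int fls64_sk(unsigned long long x)
{
    int r = 0;
    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* __fls equivalent via fls64 (the alpha approach). */
static unsigned long msb_index_sk(unsigned long long x)
{
    assert(x != 0);
    return (unsigned long)(fls64_sk(x) - 1);
}

/* __fls equivalent as floor(log2(x)) (the __ilog2 approach). */
static unsigned long ilog2_sk(unsigned long long x)
{
    unsigned long r = 0;
    assert(x != 0);
    while (x >>= 1)
        r++;
    return r;
}
```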
  3. 17 Apr, 2008: 15 commits
  4. 03 Apr, 2008: 1 commit
    • kvm: provide kvm.h for all architecture: fixes headers_install · dd135ebb
      Christian Borntraeger committed
      Currently include/linux/kvm.h is not considered by make headers_install,
      because Kbuild cannot handle "unifdef-$(CONFIG_FOO) += foo.h". This problem
      was introduced by
      
      commit fb56dbb3
      Author: Avi Kivity <avi@qumranet.com>
      Date:   Sun Dec 2 10:50:06 2007 +0200
      
          KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM
      
          Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
          includes, not existing.  Rather than add a zillion <asm/kvm.h>s, export kvm.
          only if the arch actually supports it.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
      
      which makes this a 2.6.25 regression.
      
      One way of solving the issue is to enhance Kbuild, but Avi and David convinced
      me that changing headers_install is not the way to go. This patch changes
      the definition for linux/kvm.h to unifdef-y.
      
      If unifdef-y is used for linux/kvm.h, "make headers_check" will fail on all
      architectures without asm/kvm.h.  Therefore, this patch also provides
      asm/kvm.h on all architectures.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Avi Kivity <avi@qumranet.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 Mar, 2008: 1 commit
  6. 19 Feb, 2008: 1 commit
  7. 10 Feb, 2008: 7 commits
  8. 09 Feb, 2008: 1 commit
    • CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky committed
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of which is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since it's not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
      To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
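      The constructor/destructor rule can be sketched in C for the "struct page *"
      flavour of pgtable_t. Everything here is a hypothetical stand-in: names
      are suffixed _sk, a flag replaces the ptl lock, and a global counter
      replaces NR_PAGETABLE. The sketch shows only that every allocation path
      calls the constructor and every free path calls the destructor, so the
      accounting stays balanced.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct page and the NR_PAGETABLE counter. */
struct page_sk { int ptl_initialised; };
static long nr_pagetable;

typedef struct page_sk *pgtable_sk_t;   /* "struct page *" flavour of pgtable_t */

static void pgtable_page_ctor_sk(struct page_sk *pg)
{
    pg->ptl_initialised = 1;   /* stands in for initialising the ptl lock */
    nr_pagetable++;            /* page table accounting */
}

static void pgtable_page_dtor_sk(struct page_sk *pg)
{
    pg->ptl_initialised = 0;
    nr_pagetable--;
}

/* Allocation path: the constructor runs on every successful allocation. */
static pgtable_sk_t pte_alloc_one_sk(void)
{
    struct page_sk *pg = calloc(1, sizeof(*pg));
    if (pg)
        pgtable_page_ctor_sk(pg);
    return pg;
}

/* Free path: the destructor runs before the memory is released. */
static void pte_free_sk(pgtable_sk_t pt)
{
    pgtable_page_dtor_sk(pt);
    free(pt);
}
```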