1. 25 Jul 2017, 1 commit
  2. 24 Aug 2016, 2 commits
  3. 13 Jul 2016, 1 commit
  4. 13 Jun 2016, 2 commits
    • s390/mm: simplify the TLB flushing code · 64f31d58
      Authored by Martin Schwidefsky
      ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to
      decide between a lazy TLB flush and an immediate TLB flush. The field
      contains two 16-bit counters: the number of CPUs that have the mm
      attached and can create TLB entries for it, and the number of CPUs in
      the middle of a page table update.
      
      The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions
      use the attach counter and a mask check with mm_cpumask(mm) to decide
      between a local flush of the current CPU and a global flush.
      
      For all these functions the decision between lazy vs. immediate and
      local vs. global TLB flush can be based on CPU masks. There are two
      masks: mm->context.cpu_attach_mask with the CPUs that are actively
      using the mm, and mm_cpumask(mm) with the CPUs that have used the mm
      since the last full flush. The decision between a lazy and an
      immediate flush is based on mm->context.cpu_attach_mask; the decision
      between a local and a global flush is based on mm_cpumask(mm). A
      sketch of this decision follows after this entry.
      
      With this patch all checks use the CPU masks. The old counter
      mm->context.attach_count with its two 16-bit values is turned into a
      single counter mm->context.flush_count that keeps track of the number
      of CPUs with incomplete page table updates. The sole user of this
      counter is finish_arch_post_lock_switch(), which waits for the end of
      all page table updates.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      64f31d58
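
      A minimal sketch of the mask-based decision described above, keeping
      the field names from the commit text and treating the flush
      primitives (__ptep_ipte, __ptep_ipte_local) as given; locking and the
      pmd variant are omitted:

          static inline void ptep_flush_lazy_sketch(struct mm_struct *mm,
                                                    unsigned long addr,
                                                    pte_t *ptep)
          {
                  int cpu = smp_processor_id();

                  /* Lazy vs. immediate: if no other CPU has the mm
                   * attached, no foreign TLB holds entries for it and the
                   * flush can be deferred by marking the pte invalid. */
                  if (cpumask_equal(&mm->context.cpu_attach_mask,
                                    cpumask_of(cpu))) {
                          pte_val(*ptep) |= _PAGE_INVALID;
                          return;
                  }
                  /* Local vs. global: mm_cpumask(mm) holds every CPU that
                   * has used the mm since the last full flush. */
                  if (cpumask_equal(mm_cpumask(mm), cpumask_of(cpu)))
                          __ptep_ipte_local(addr, ptep);  /* this CPU only */
                  else
                          __ptep_ipte(addr, ptep);        /* broadcast */
          }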
    • s390/pgtable: introduce and use generic csp inline asm · 4ccccc52
      Authored by Heiko Carstens
      We already have two inline assemblies which make use of the csp
      instruction. Since I need a third instance, let's introduce a generic
      inline assembly which can be used by everyone. A sketch of such a
      helper follows after this entry.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      4ccccc52
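
      The generic helper presumably looks roughly like this, a sketch
      modeled on the pre-existing csp users in arch/s390: operand 1 is the
      even/odd register pair holding the compare and swap values, and the
      low-order bit of the address operand is set as in the kernel's
      existing call sites:

          static inline void csp(unsigned int *ptr, unsigned int old,
                                 unsigned int new)
          {
                  register unsigned long reg2 asm("2") = old; /* even reg */
                  register unsigned long reg3 asm("3") = new; /* odd reg  */
                  unsigned long address = (unsigned long) ptr | 1;

                  asm volatile(
                          "       csp     %0,%3"
                          : "+d" (reg2), "+m" (*ptr)
                          : "d" (reg3), "d" (address)
                          : "cc");
          }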
  5. 21 Apr 2016, 1 commit
    • s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
      Authored by Gerald Schaefer
      There is a race in multi-threaded applications between context switch
      and pagetable upgrade. In switch_mm() a new user_asce is built from
      mm->pgd and mm->context.asce_bits, without holding any locks. A
      concurrent mmap with a pagetable upgrade on another thread in
      crst_table_upgrade() could already have set the new asce_bits, but not
      yet the new mm->pgd. This would result in a corrupt user_asce in
      switch_mm(), and eventually in a kernel panic from a translation
      exception.
      
      Fix this by storing the complete asce instead of just the asce_bits,
      which can then be read atomically from switch_mm(), so that it either
      sees the old value or the new value, but no mixture (see the sketch
      after this entry). Both cases are OK. Having the old value would
      result in a page fault on access to the higher level memory, but the
      fault handler would see the new mm->pgd, if it was a valid access
      after the mmap on the other thread has completed. So as a worst-case
      scenario we would have a page fault loop for the racing thread until
      the next time slice.
      
      Also remove dead code and simplify the upgrade/downgrade path: there
      are no upgrades from 2 levels, and only downgrades from 3 levels for
      compat tasks. There are also no concurrent upgrades, because the
      mmap_sem is held with down_write() in do_mmap, so the flush and table
      checks during upgrade can be removed.
      Reported-by: Michael Munday <munday@ca.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      723cacbd
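
      A sketch of the race-free read side under this scheme: the single
      mm->context.asce word (table origin plus asce bits) replaces the two
      separate fields, so one aligned load in switch_mm() can never observe
      a mixture (lowcore and control register handling simplified):

          static inline void switch_mm_sketch(struct mm_struct *prev,
                                              struct mm_struct *next)
          {
                  if (prev == next)
                          return;
                  /* One atomic load of the complete asce: either the old
                   * or the new value, never asce_bits from one update and
                   * pgd from the other. */
                  S390_lowcore.user_asce = next->context.asce;
                  /* ... reload CR1 / the access registers from user_asce */
          }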
  6. 25 Mar 2015, 1 commit
    • s390: remove 31 bit support · 5a79859a
      Authored by Heiko Carstens
      Remove the 31 bit support in order to reduce maintenance cost and
      effectively remove dead code. For a couple of years there has been no
      distribution left that comes with a 31 bit kernel.
      
      The 31 bit kernel had also been broken for more than a year before
      anybody noticed. In addition I added a removal warning to the kernel,
      shown at ipl for 5 minutes: a960062e ("s390: add 31 bit warning
      message"), which let everybody know about the plan to remove 31 bit
      code. We didn't get any response.
      
      Given that the last 31 bit only machine was introduced in 1999, let's
      remove the code.
      Anybody with 31 bit user space code can still use the compat mode.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      5a79859a
  7. 03 Apr 2014, 1 commit
    • s390/mm,tlb: optimize TLB flushing for zEC12 · 1b948d6c
      Authored by Martin Schwidefsky
      The zEC12 machines introduced the local-clearing control for the IDTE
      and IPTE instructions. If the control is set, only the TLB of the
      local CPU is cleared of entries: either all entries of a single
      address space for IDTE, or the entry for a single page-table entry
      for IPTE. Without the local-clearing control the TLB flush is
      broadcast to all CPUs in the configuration, which is expensive.
      
      The reset of the bit mask of the CPUs that need flushing after a
      non-local IDTE is tricky. As TLB entries for an address space remain
      in the TLB even if the address space is detached, a new bit field is
      required to keep track of attached CPUs vs. CPUs in need of a flush.
      After a non-local flush with IDTE the bit field of attached CPUs is
      copied to the bit field of CPUs in need of a flush (see the sketch
      after this entry). The ordering of operations on cpu_attach_mask,
      attach_count and mm_cpumask(mm) is such that an under-indication in
      mm_cpumask(mm) is prevented, but an over-indication in mm_cpumask(mm)
      is possible.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      1b948d6c
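
      A sketch of the mask bookkeeping after a full flush, assuming the
      MACHINE_HAS_TLB_LC flag this patch introduces for the local-clearing
      facility; because over-indication in mm_cpumask(mm) is harmless, the
      copy does not have to be atomic with the flush itself:

          static inline void __tlb_flush_mm_sketch(struct mm_struct *mm)
          {
                  if (MACHINE_HAS_TLB_LC &&
                      cpumask_equal(mm_cpumask(mm),
                                    cpumask_of(smp_processor_id()))) {
                          __tlb_flush_local();    /* local-clearing flush */
                          return;
                  }
                  __tlb_flush_global();           /* broadcast flush */
                  /* Reset the flush mask: every CPU that still has the mm
                   * attached may create new TLB entries from now on. */
                  cpumask_copy(mm_cpumask(mm), &mm->context.cpu_attach_mask);
          }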
  8. 22 Aug 2013, 1 commit
  9. 05 Mar 2013, 1 commit
    • s390/mm: fix flush_tlb_kernel_range() · f6a70a07
      Authored by Heiko Carstens
      Our flush_tlb_kernel_range() implementation calls __tlb_flush_mm()
      with &init_mm as argument. __tlb_flush_mm(), however, will only flush
      TLBs for the passed-in mm if its mm_cpumask is not empty.
      
      For init_mm the mm_cpumask never has any bits set, which in turn
      means that our flush_tlb_kernel_range() implementation doesn't work
      at all.
      
      This can be easily verified with a vmalloc/vfree loop which allocates
      a page, writes to it and then frees the page again. A crash will
      follow almost instantly.
      
      To fix this, remove the cpumask_empty() check in __tlb_flush_mm()
      (see the sketch after this entry), since there shouldn't be too many
      mms with a zero mm_cpumask, besides init_mm of course.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f6a70a07
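
      The fix amounts to dropping the early exit; a sketch (the name of the
      unconditional flush primitive is simplified):

          static inline void __tlb_flush_mm(struct mm_struct *mm)
          {
                  /* The removed check: init_mm's mm_cpumask is always
                   * empty, so this made flush_tlb_kernel_range() a nop.
                   *
                   *      if (cpumask_empty(mm_cpumask(mm)))
                   *              return;
                   */
                  __tlb_flush_full(mm);   /* flush unconditionally */
          }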
  10. 03 Sep 2012, 1 commit
  11. 24 May 2012, 1 commit
  12. 30 Oct 2011, 1 commit
  13. 24 Jul 2011, 1 commit
    • [S390] kvm guest address space mapping · e5992f2e
      Authored by Martin Schwidefsky
      Add code that allows KVM to control the virtual memory layout that
      is seen by a guest. The guest address space uses a second page table
      that shares the last level pte-tables with the process page table.
      If a page is unmapped from the process page table, it is automatically
      unmapped from the guest page table as well.
      
      The guest address space mapping starts out empty; KVM can map any
      individual 1MB segment from the process virtual memory to any 1MB
      aligned location in the guest virtual memory (a usage sketch follows
      after this entry). If a target segment in the process virtual memory
      does not exist or is unmapped while a guest mapping exists, the
      desired target address is stored as an invalid segment table entry in
      the guest page table.
      The population of the guest page table is fault driven.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      e5992f2e
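
      A usage sketch of the gmap interface this commit introduces; the
      function names come from the patch, but the exact signatures and the
      error convention of gmap_alloc are assumptions:

          static int gmap_usage_sketch(unsigned long process_addr,
                                       unsigned long guest_addr)
          {
                  struct gmap *gmap;
                  int rc;

                  gmap = gmap_alloc(current->mm); /* guest address space */
                  if (!gmap)
                          return -ENOMEM;
                  /* Map one 1 MB aligned segment of process memory at a
                   * 1 MB aligned guest address; population of the guest
                   * page table happens on fault. */
                  rc = gmap_map_segment(gmap, process_addr, guest_addr,
                                        1UL << 20);
                  gmap_free(gmap);
                  return rc;
          }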
  14. 23 May 2011, 2 commits
  15. 24 Aug 2010, 1 commit
    • [S390] fix tlb flushing vs. concurrent /proc accesses · 050eef36
      Authored by Martin Schwidefsky
      The TLB flushing code uses the mm_users field of the mm_struct to
      decide if each page table entry needs to be flushed individually with
      IPTE, or if a global flush for the mm_struct is sufficient after all
      page table updates have been done. The comment for mm_users says "How
      many users with user space?", but the /proc code increases mm_users
      after it has found the process structure by pid, without creating a
      new user process. This makes mm_users useless for the decision
      between the two TLB flushing methods: the current code can be tricked
      into not flushing TLB entries by a concurrent access to /proc files,
      e.g. if a fork is in progress. The solution to this problem is to
      make the TLB flushing logic independent of the mm_users field (see
      the sketch after this entry).
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      050eef36
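
      A sketch of the mm_users-independent check: a private counter of
      attached CPUs (attach_count, the field name used by the later commits
      above) replaces mm_users; the exact condition is an assumption:

          static inline int mm_exclusive_sketch(struct mm_struct *mm)
          {
                  /* /proc accesses inflate mm_users without attaching the
                   * mm, so count attached CPUs instead. */
                  return mm == current->active_mm &&
                         atomic_read(&mm->context.attach_count) <= 1;
          }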
  16. 26 Mar 2009, 1 commit
  17. 02 Aug 2008, 1 commit
  18. 30 Apr 2008, 1 commit
  19. 17 Apr 2008, 2 commits
  20. 10 Feb 2008, 1 commit
  21. 26 Jan 2008, 2 commits
  22. 22 Oct 2007, 2 commits
    • [S390] Cleanup page table definitions. · 3610cce8
      Authored by Martin Schwidefsky
      - De-confuse the defines for the address-space-control-elements
        and the segment/region table entries.
      - Create out of line functions for page table allocation / freeing.
      - Simplify get_shadow_xxx functions.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      3610cce8
    • [S390] tlb flush fix. · ba8a9229
      Authored by Martin Schwidefsky
      The current tlb flushing code for page table entries violates the
      s390 architecture in a small detail. The relevant section from the
      principles of operation (SA22-7832-02 page 3-47):
      
         "A valid table entry must not be changed while it is attached
         to any CPU and may be used for translation by that CPU except to
         (1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY or
         INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page-table
         entry, or (3) make a change by means of a COMPARE AND SWAP AND
         PURGE instruction that purges the TLB."
      
      That means if one thread of a multithreaded application uses a vma
      while another thread does an unmap on it, the page table entries of
      that vma need to be removed with IPTE, IDTE or CSP. In some strange
      and rare situations a cpu could check-stop (die) because an entry has
      been pushed out of the TLB that is still needed to complete a
      (milli-coded) instruction. I've never seen it happen with the current
      code on any of the supported machines, so right now this is a
      theoretical problem. But I want to fix it nevertheless, to avoid
      headaches in the future.
      
      To get this implemented correctly without changing common code, the
      primitives ptep_get_and_clear, ptep_get_and_clear_full and
      ptep_set_wrprotect need to use the IPTE instruction to invalidate the
      pte before the new pte value gets stored. If IPTE is always used for
      the three primitives, three important operations will take a
      performance hit: fork, mprotect and exit_mmap. Time for some
      workarounds:
      
      * 1: ptep_get_and_clear_full is used in unmap_vmas to remove page
      table entries in a batched tlb gather operation. If the mmu_gather
      context passed to unmap_vmas has been started with full_mm_flush==1,
      or if only one cpu is online, or if the only user of the mm_struct is
      the current process, then the fullmm indication in the mmu_gather
      context is set to one. All TLBs for the mm_struct are flushed by the
      tlb_gather_mmu call. No new TLBs can be created while the unmap is in
      progress. In this case ptep_get_and_clear_full clears the ptes with a
      simple store.
      
      * 2: ptep_get_and_clear is used in change_protection to clear the
      ptes from the page tables before they are reentered with the new
      access flags. At the end of the update flush_tlb_range clears the
      remaining TLBs. In general ptep_get_and_clear has to issue IPTE
      for each pte and flush_tlb_range is a nop. But if there is only one
      user of the mm_struct, then ptep_get_and_clear uses simple stores
      to do the update and flush_tlb_range will flush the TLBs.
      
      * 3: Similar to 2, ptep_set_wrprotect is used in copy_page_range
      for a fork to make all ptes of a cow mapping read-only. At the end of
      copy_page_range, dup_mmap will flush the TLBs with a call to
      flush_tlb_mm. Check mm->mm_users, and if there is only one user,
      avoid using IPTE in ptep_set_wrprotect and let flush_tlb_mm clear the
      TLBs (a sketch of this workaround follows after this entry).
      
      Overall for single threaded programs the tlb flush code now performs
      better; for multi threaded programs it is slightly worse. In
      particular, exit_mmap() now does a single IDTE for the mm and then
      just frees every page cache reference and every page table page
      directly, without a delay over the mmu_gather structure.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      ba8a9229
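
      A sketch of workaround 3, the single-user fast path in
      ptep_set_wrprotect, simplified from the primitives described above
      (the name of the IPTE wrapper is assumed):

          static inline void ptep_set_wrprotect_sketch(struct mm_struct *mm,
                                                       unsigned long addr,
                                                       pte_t *ptep)
          {
                  pte_t pte = *ptep;

                  if (!pte_write(pte))
                          return;
                  if (atomic_read(&mm->mm_users) > 1)
                          __ptep_ipte(addr, ptep); /* invalidate + purge */
                  /* Single user: a plain store suffices, flush_tlb_mm
                   * cleans up at the end of copy_page_range. */
                  *ptep = pte_wrprotect(pte);
          }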
  23. 20 Oct 2007, 1 commit
  24. 06 Feb 2007, 1 commit
    • [S390] noexec protection · c1821c2e
      Authored by Gerald Schaefer
      This provides noexec protection on s390 hardware. Our hardware does
      not have any bits left in the pte for a hw noexec bit, so this is a
      different approach using shadow page tables and a special addressing
      mode that allows separate address spaces for code and data.
      
      As a special feature of our "secondary-space" addressing mode, separate
      page tables can be specified for the translation of data addresses
      (storage operands) and instruction addresses. The shadow page table is
      used for the instruction addresses and the standard page table for the
      data addresses.
      The shadow page table is linked to the standard page table by a pointer
      in page->lru.next of the struct page corresponding to the page that
      contains the standard page table (since page->private is not really
      private with the pte_lock and the page table pages are not in the LRU
      list).
      Depending on the software bits of a pte, it is either inserted into
      both page tables or just into the standard (data) page table (see the
      sketch after this entry). Pages of a vma that does not have the
      VM_EXEC bit set get mapped only in the data address space. Any attempt
      to execute code on such a page will cause a page translation
      exception. The standard reaction to this is a SIGSEGV with two
      exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and
      0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel
      to the signal stack frame. Unfortunately, the signal return mechanism
      cannot be modified to use an SA_RESTORER, because the exception
      unwinding code depends on the system call opcode stored behind the
      signal stack frame.
      
      This feature requires that user space is executed in secondary-space
      mode and the kernel in home-space mode, which means that the addressing
      modes need to be switched and that the noexec protection only works
      for user space.
      After switching the addressing modes, we cannot use the mvcp/mvcs
      instructions anymore to copy between kernel and user space. A new
      mvcos instruction has been added to the z9 EC/BC hardware which allows
      copying between arbitrary address spaces, but on older hardware the
      page tables need to be walked manually.
      Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      c1821c2e
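
      A sketch of the insertion decision described above; the helper
      get_shadow_pte is hypothetical, standing in for the page->lru.next
      link between the standard and the shadow page table:

          static inline void set_pte_noexec_sketch(struct vm_area_struct *vma,
                                                   pte_t *ptep, pte_t entry)
          {
                  pte_t *shadow_pte = get_shadow_pte(ptep);

                  *ptep = entry;                  /* data address space */
                  if (shadow_pte && (vma->vm_flags & VM_EXEC))
                          *shadow_pte = entry;    /* instruction space too */
          }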
  25. 28 Sep 2006, 1 commit
    • [S390] Inline assembly cleanup. · 94c12cc7
      Authored by Martin Schwidefsky
      Major cleanup of all s390 inline assemblies. They now have a common
      coding style. Quite a few have been shortened, mainly by using
      register asm variables. Use of the EX_TABLE macro helps as well. The
      atomic ops, bit ops and locking inlines now use the Q-constraint if a
      newer gcc is used. That results in slightly better code (an
      illustrative example follows after this entry).
      
      Thanks to Christian Borntraeger for proof reading the changes.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      94c12cc7
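
      An illustrative example of the resulting style, not taken from the
      patch itself: a single asm statement whose memory operand uses the
      Q-constraint, so a newer gcc can address it with a short displacement
      and no extra index register:

          static inline void set_low_bit(unsigned char *byte)
          {
                  asm volatile(
                          "       oi      %0,0x01" /* OR immediate into storage */
                          : "+Q" (*byte)
                          :
                          : "cc");
          }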
  26. 26 Apr 2006, 1 commit
  27. 17 Apr 2005, 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Authored by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4