  1. 16 May, 2012 2 commits
    • M
      s390: replace TIF_SIE with PF_VCPU · 5e8010cb
      Martin Schwidefsky committed
      Replace the check for TIF_SIE in the fault handler by a check for PF_VCPU.
      With the last user of TIF_SIE gone we can now remove the bit.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • M
      s390: allow absolute memory access for /dev/mem · b2a68c23
      Michael Holzheu committed
      Currently /dev/mem for s390 provides only real memory access. This means
      that the CPU prefix pages are swapped. The prefix swap for real memory
      works as follows:
      
      Each CPU owns a prefix register that points to a page aligned memory
      location "P". If this CPU accesses the address range [0,0x1fff], it is
      translated by the hardware to [P,P+0x1fff]. Accordingly if this CPU
      accesses the address range [P,P+0x1fff], it is translated by the hardware
      to [0,0x1fff].  Therefore, if [P,P+0x1fff] or [0,0x1fff] is read from
      the current /dev/mem device, the incorrectly swapped memory content is
      returned.
      
      With this patch the /dev/mem architecture code is modified to provide
      absolute memory access. This is done via the arch specific functions
      xlate_dev_mem_ptr() and unxlate_dev_mem_ptr(). For swapped pages on
      s390 the function xlate_dev_mem_ptr() now returns a new buffer with a
      copy of the requested absolute memory. In case the buffer was allocated,
      the unxlate_dev_mem_ptr() function frees it after /dev/mem code has
      called copy_to_user().
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
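      The prefix swap described in this commit message can be sketched in
      plain C. This is only an illustration of the address arithmetic, not
      the kernel's xlate_dev_mem_ptr() code. The helper name
      real_to_absolute(), the PREFIX_AREA_SIZE constant, and the assumption
      that the prefix is aligned to the full 8 KB range are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the swapped range [0,0x1fff] named in the commit message. */
#define PREFIX_AREA_SIZE UINT64_C(0x2000)

/*
 * Hypothetical helper: map a real address to an absolute address for a
 * CPU whose prefix register holds `prefix`.
 * Assumption: the prefix is aligned to the size of the swapped range.
 */
static uint64_t real_to_absolute(uint64_t real, uint64_t prefix)
{
    assert((prefix & (PREFIX_AREA_SIZE - 1)) == 0);

    if (real < PREFIX_AREA_SIZE)
        return real + prefix;          /* [0,0x1fff] -> [P,P+0x1fff] */
    if (real >= prefix && real < prefix + PREFIX_AREA_SIZE)
        return real - prefix;          /* [P,P+0x1fff] -> [0,0x1fff] */
    return real;                       /* everything else is unchanged */
}
```

      With a prefix of 0x80000, real address 0x1000 maps to absolute
      0x81000 and real address 0x81000 maps to absolute 0x1000, which is
      why a real-memory read of either range returns swapped content.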
  2. 11 April, 2012 2 commits
    • M
      [S390] fix tlb flushing for page table pages · cd94154c
      Martin Schwidefsky committed
      Git commit 36409f63 "use generic RCU
      page-table freeing code" introduced a tlb flushing bug. Partially revert
      the above git commit and go back to s390 specific page table flush code.
      
      For s390 the TLB can contain three types of entries, "normal" TLB
      page-table entries, TLB combined region-and-segment-table (CRST) entries
      and real-space entries. Linux does not use real-space entries which
      leaves normal TLB entries and CRST entries. The CRST entries are
      intermediate steps in the page-table translation called translation paths.
      For example a 4K page access in a three-level page table setup will
      create two CRST TLB entries and one page-table TLB entry. The advantage
      of that approach is that a page access next to the previous one can reuse
      the CRST entries and needs just a single read from memory to create the
      page-table TLB entry. The disadvantage is that the TLB flushing rules are
      more complicated: before any page-table may be freed the TLB needs to be
      flushed.
      
      In short: the generic RCU page-table freeing code is incorrect for the
      CRST entries, in particular the check for mm_users < 2 is troublesome.
      
      This is applicable to 3.0+ kernels.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • M
      [S390] kernel: Use local_irq_save() for memcpy_real() · b785e0d0
      Michael Holzheu committed
      Currently in the memcpy_real() function interrupts are disabled with
      __arch_local_irq_stnsm(). In order to notify lockdep that interrupts
      are disabled, with this patch local_irq_save() is used instead. The
      function __arch_local_irq_stnsm() is still used for switching to
      real mode.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  3. 29 March, 2012 1 commit
  4. 23 March, 2012 1 commit
  5. 11 March, 2012 1 commit
    • H
      [S390] irq: external interrupt code passing · fde15c3a
      Heiko Carstens committed
      The external interrupt handlers have a parameter called ext_int_code.
      Despite its name this parameter does not only contain the ext_int_code
      but in addition also the "cpu address" (POP) which caused the external
      interrupt.
      To make the code a bit more obvious pass a struct instead so the called
      function can easily distinguish between external interrupt code and
      cpu address. The cpu address field however is named "subcode" since
      some external interrupt sources do not pass a cpu address but a
      different parameter (or none at all).
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
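      A minimal sketch of the idea in this commit message: instead of one
      packed integer, handlers receive a struct with separate fields. The
      struct name ext_code_sketch, the helper split_ext_int(), and the
      assumed layout (subcode in the upper 16 bits, interrupt code in the
      lower 16 bits) are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct passed to the handlers. */
struct ext_code_sketch {
    uint16_t subcode;  /* "cpu address" or another source-specific value */
    uint16_t code;     /* external interrupt code */
};

/*
 * Split a packed 32-bit value into its two halves.
 * Assumption: upper half is the subcode, lower half is the code.
 */
static struct ext_code_sketch split_ext_int(uint32_t raw)
{
    struct ext_code_sketch ec;

    /* Both fields together must cover exactly the packed value. */
    assert(sizeof(ec.subcode) + sizeof(ec.code) == sizeof(raw));

    ec.subcode = (uint16_t)(raw >> 16);
    ec.code = (uint16_t)(raw & 0xffff);
    return ec;
}
```

      A handler that receives such a struct can test ec.code and
      ec.subcode directly instead of shifting and masking on every use.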
  6. 27 February, 2012 1 commit
  7. 25 February, 2012 1 commit
  8. 17 February, 2012 1 commit
  9. 27 December, 2011 4 commits
    • M
      [S390] cleanup trap handling · aa33c8cb
      Martin Schwidefsky committed
      Move the program interruption code and the translation exception identifier
      to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
      first level interrupt handler in entry[64].S store the two values. That
      makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
      and to reduce the number of arguments to a lot of functions. Finally
      un-inline do_trap. Overall this saves 5812 bytes in the .text section of
      the 64 bit kernel.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • C
      [S390] disable MACHINE_IS_VM check for pfault · f32269a0
      Carsten Otte committed
      This patch disables the check for MACHINE_IS_VM when initializing the
      pfault infrastructure. The code checks for successful completion of
      diag 258 anyway, thus it is safe to try initialization on LPAR as well.
      This is needed to use pfault on kvm.
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • M
      [S390] add support for physical memory > 4TB · 14045ebf
      Martin Schwidefsky committed
      The kernel address space of a 64 bit kernel currently uses a three level
      page table and the vmemmap array has a fixed address and a fixed maximum
      size. A three level page table is good enough for systems with less than
      3.8TB of memory, for bigger systems four page table levels need to be
      used. Each page table level costs a bit of performance, so use 3 levels for
      normal systems and 4 levels only for the really big systems.
      To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
      maximum of 64TB of memory.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
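      The sizes in this commit message follow from simple bit arithmetic,
      sketched below. The helper names are hypothetical. The index widths
      are the s390 64 bit translation format as I understand it: a 12 bit
      page offset, an 8 bit page-table index, and 11 bits for each segment
      or region table. The gap between the 4TB that three levels can
      address and the 3.8TB quoted above is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Index widths per translation step (assumed s390 64 bit format). */
enum {
    PAGE_OFFSET_BITS = 12,  /* 4 KB pages */
    PAGE_TABLE_BITS  = 8,   /* 256 entries per page table */
    CRST_BITS        = 11   /* 2048 entries per segment/region table */
};

/* Address bits covered when `levels` table levels are in use. */
static unsigned int address_bits(unsigned int levels)
{
    assert(levels >= 2 && levels <= 5);
    return PAGE_OFFSET_BITS + PAGE_TABLE_BITS + (levels - 1) * CRST_BITS;
}

/* Number of bytes addressable with the given number of address bits. */
static uint64_t bytes_covered(unsigned int bits)
{
    assert(bits < 64);
    return UINT64_C(1) << bits;
}
```

      Three levels give 42 bits (4TB of address space) and four levels
      give 53 bits. A MAX_PHYSMEM_BITS of 46 corresponds to 64TB.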
    • C
      [S390] kvm: fix sleeping function ... at mm/page_alloc.c:2260 · c86cce2a
      Christian Borntraeger committed
      commit cc772456
          [S390] fix list corruption in gmap reverse mapping
      
      added a potential deadlock:
      
      BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
      in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
      3 locks held by qemu-system-s39/1108:
       #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
       #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
       #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
      CPU: 0 Not tainted 3.1.3 #45
      Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
      00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
             00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
             0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
             000000000000000d 000000000000000c 00000004fd597988 0000000000000000
             0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
      Call Trace:
      ([<0000000000100926>] show_trace+0xee/0x144)
       [<0000000000131f3a>] __might_sleep+0x12a/0x158
       [<0000000000217fb4>] __alloc_pages_nodemask+0x224/0xadc
       [<0000000000123086>] gmap_alloc_table+0x46/0x114
       [<000000000012395c>] gmap_map_segment+0x268/0x298
       [<000003e00486b014>] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
       [<000003e004866414>] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
       [<000003e004866554>] kvm_set_memory_region+0x4c/0x6c [kvm]
       [<000003e004867c7a>] kvm_vm_ioctl+0x14a/0x314 [kvm]
       [<0000000000292100>] do_vfs_ioctl+0x94/0x588
       [<0000000000292688>] SyS_ioctl+0x94/0xac
       [<000000000061e124>] sysc_noemu+0x22/0x28
       [<000003fffcd5e7ca>] 0x3fffcd5e7ca
      3 locks held by qemu-system-s39/1108:
       #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
       #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
       #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
      
      Fix this by releasing the lock on the alloc path. This is ok, since the
      gmap table is never freed until we call gmap_free, so the table we are
      walking cannot go away.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
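      The fix pattern described in this commit message, dropping a
      non-sleeping lock around an allocation that may sleep, can be
      sketched in user space. This is an analogy using a pthread mutex,
      not the gmap code. The names table_lock, table_slot, and
      get_or_alloc_table() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static void *table_slot;  /* stand-in for a missing table entry */

/*
 * Must be called with table_lock held and returns with it held.
 * The lock is dropped only around the allocation itself.
 */
static void *get_or_alloc_table(size_t size)
{
    void *fresh;

    assert(size > 0);
    if (table_slot)
        return table_slot;

    pthread_mutex_unlock(&table_lock);   /* allocation may block */
    fresh = calloc(1, size);
    pthread_mutex_lock(&table_lock);

    /* Re-check: another thread may have filled the slot meanwhile. */
    if (!table_slot)
        table_slot = fresh;
    else
        free(fresh);                     /* free(NULL) is a no-op */
    return table_slot;
}
```

      The re-check after retaking the lock is what makes dropping it
      safe. The sketch relies on the same kind of guarantee the commit
      message states: the structure being walked is not freed while the
      lock is released.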
  10. 14 November, 2011 1 commit
  11. 03 November, 2011 3 commits
    • A
      thp: share get_huge_page_tail() · b35a35b5
      Andrea Arcangeli committed
      This avoids duplicating the function in every arch gup_fast.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      s390: gup_huge_pmd() return 0 if pte changes · 0693bc9c
      Andrea Arcangeli committed
      s390 didn't return 0 in that case. If it's rolling back the *nr pointer it
      should also return zero to avoid adding pages to the array at the wrong
      offset.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      s390: gup_huge_pmd() support THP tail recounting · 220a2eb2
      Andrea Arcangeli committed
      Up to this point the code assumed old refcounting for hugepages (pre-thp).
      This updates the code directly to the thp mapcount tail page refcounting.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 01 November, 2011 1 commit
  13. 30 October, 2011 10 commits
  14. 26 September, 2011 1 commit
  15. 20 September, 2011 1 commit
  16. 03 August, 2011 2 commits
  17. 24 July, 2011 1 commit
    • M
      [S390] kvm guest address space mapping · e5992f2e
      Martin Schwidefsky committed
      Add code that allows KVM to control the virtual memory layout that
      is seen by a guest. The guest address space uses a second page table
      that shares the last level pte-tables with the process page table.
      If a page is unmapped from the process page table it is automatically
      unmapped from the guest page table as well.
      
      The guest address space mapping starts out empty. KVM can map any
      individual 1MB segment from the process virtual memory to any 1MB
      aligned location in the guest virtual memory. If a target segment in
      the process virtual memory does not exist or is unmapped while a
      guest mapping exists the desired target address is stored as an
      invalid segment table entry in the guest page table.
      The population of the guest page table is fault driven.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
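      The 1MB segment granularity described in this commit message implies
      an alignment rule for any mapping request. The sketch below shows
      one way such a check could look. The helper names and the
      SEGMENT_SIZE constant are hypothetical and are not claimed to match
      the code added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Segment size from the commit message: 1MB. */
#define SEGMENT_SIZE (UINT64_C(1) << 20)

/*
 * A request maps `len` bytes of process memory at `from` to guest
 * address `to`. All three values must be multiples of the segment size.
 */
static bool segments_aligned(uint64_t from, uint64_t to, uint64_t len)
{
    return ((from | to | len) & (SEGMENT_SIZE - 1)) == 0;
}

/* Number of 1MB segments covered by an aligned length. */
static uint64_t segment_count(uint64_t len)
{
    assert(len % SEGMENT_SIZE == 0);
    return len / SEGMENT_SIZE;
}
```

      OR-ing the three values before masking checks all of them in a
      single test, since any low bit set in any operand survives the OR.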
  18. 01 July, 2011 1 commit
    • P
      perf: Remove the nmi parameter from the swevent and overflow interface · a8b0ca17
      Peter Zijlstra committed
      The nmi parameter indicated if we could do wakeups from the current
      context, if not, we would set some state and self-IPI and let the
      resulting interrupt do the wakeup.
      
      For the various event classes:
      
        - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
          the PMI-tail (ARM etc.)
        - tracepoint: nmi=0; since tracepoint could be from NMI context.
        - software: nmi=[0,1]; some, like the schedule thing cannot
          perform wakeups, and hence need 0.
      
      As one can see, there is very little nmi=1 usage, and the down-side of
      not using it is that on some platforms some software events can have a
      jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
      
      The up-side however is that we can remove the nmi parameter and save a
      bunch of conditionals in fast paths.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 06 June, 2011 1 commit
    • M
      [S390] use generic RCU page-table freeing code · 36409f63
      Martin Schwidefsky committed
      Replace the s390 specific rcu page-table freeing code with the
      generic variant. This requires duplicating the definition of
      struct mmu_table_batch as s390 does not use the generic tlb flush
      code.
      
      While we are at it remove the restriction that page table fragments
      can not be reused after a single fragment has been freed with rcu
      and split out allocation and freeing of page tables with pgstes.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  20. 29 May, 2011 1 commit
    • H
      [S390] mm: fix mmu_gather rework · 3c5cffb6
      Heiko Carstens committed
      Quite a few functions that get called from the tlb gather code require
      preemption to be disabled. So disable preemption inside of the called
      functions instead.
      The only drawback is that rcu_table_freelist_finish() doesn't necessarily get
      called on the cpu(s) that filled the free lists. So we may see a delay until
      we finally see an rcu callback. However over time this shouldn't matter.
      
      So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
      messages.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
  21. 26 May, 2011 3 commits