1. 24 Jul 2011, 6 commits
    • [S390] use gmap address spaces for kvm guest images · 598841ca
      Authored by Carsten Otte
      This patch switches kvm from using (Qemu's) user address space to
      Martin's gmap address space. This way QEMU does not have to use a
      linker script in order to fit large guests at low addresses in its
      address space.
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      598841ca
    • [S390] kvm guest address space mapping · e5992f2e
      Authored by Martin Schwidefsky
      Add code that allows KVM to control the virtual memory layout that
      is seen by a guest. The guest address space uses a second page table
      that shares the last level pte-tables with the process page table.
      If a page is unmapped from the process page table it is automatically
      unmapped from the guest page table as well.
      
      The guest address space mapping starts out empty; KVM can map individual
      1MB segments from the process virtual memory to any 1MB-aligned location
      in the guest virtual memory. If a target segment in the process virtual
      memory does not exist or is unmapped while a guest mapping exists, the
      desired target address is stored as an invalid segment table entry in
      the guest page table.
      The population of the guest page table is fault driven.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      e5992f2e
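
      Below is a minimal sketch of how a caller might use the guest mapping
      facility this patch introduces. The gmap helpers (gmap_alloc,
      gmap_map_segment, gmap_free) do exist in the s390 tree, but the exact
      prototypes and error conventions shown here are recalled from memory and
      should be treated as assumptions, not a verbatim copy of the interface.

      /* Hedged sketch: map one 1MB-aligned chunk of process memory into a
       * guest address space via the gmap interface (prototypes assumed). */
      #include <linux/errno.h>
      #include <linux/mm.h>
      #include <asm/pgtable.h>

      static int map_guest_segment(struct mm_struct *mm,
                                   unsigned long host_addr,  /* 1MB aligned */
                                   unsigned long guest_addr) /* 1MB aligned */
      {
              struct gmap *gmap;
              int rc;

              gmap = gmap_alloc(mm);          /* new, empty guest address space */
              if (!gmap)
                      return -ENOMEM;

              /* Map one 1MB segment; the entry stays invalid until a fault
               * populates it from the process page table. */
              rc = gmap_map_segment(gmap, host_addr, guest_addr, 1UL << 20);
              if (rc)
                      gmap_free(gmap);
              return rc;
      }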
    • [S390] fix s390 assembler code alignments · 144d634a
      Authored by Jan Glauber
      The alignment is missing for various global symbols in s390 assembly code.
      With a recent gcc and an instruction like stgrl this can lead to a
      specification exception if the instruction uses such a misaligned address.
      
      Specify the alignment explicitly and, while at it, define __ALIGN for s390
      and use the ENTRY define to save some lines of code.
      Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      144d634a
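
      A hedged sketch of the shape of this fix: give s390 its own __ALIGN
      definition so that the generic ENTRY() macro from include/linux/linkage.h
      (".globl name; ALIGN; name:") emits aligned global symbols. The exact
      alignment value and fill byte below are assumptions, not quoted from the
      patch.

      /* Hedged sketch of an arch-specific linkage.h override (values assumed). */
      #ifndef _ASM_S390_LINKAGE_H
      #define _ASM_S390_LINKAGE_H

      #include <linux/stringify.h>

      #define __ALIGN .align 4, 0x07          /* assumed: 4-byte align, nop fill */
      #define __ALIGN_STR __stringify(__ALIGN)

      #endif /* _ASM_S390_LINKAGE_H */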
    • [S390] move sie code to entry.S · 603d1a50
      Authored by Martin Schwidefsky
      The entry to / exit from sie has subtle dependencies on the first level
      interrupt handler. Move the sie assembler code to entry64.S and replace
      the SIE_HOOK callback with a test of the new _TIF_SIE bit.
      In addition this patch fixes several problems with the check for the
      _TIF_EXIT_SIE bits. The old code checked the TIF bits before executing
      the interrupt handler and it only modified the instruction address if it
      pointed directly to the sie instruction. In both cases it could miss
      a TIF bit that normally would cause an exit from the guest and would
      re-enter the guest context.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      603d1a50
    • [S390] kvm: handle tprot intercepts · bb25b9ba
      Authored by Christian Borntraeger
      When running a kvm guest we can get intercepts for tprot if the host
      page table is read-only or not populated. This patch implements the
      most common case (Linux memory detection).
      This also allows host copy-on-write for guest memory on newer systems.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      bb25b9ba
    • [S390] irqs: Do not trace arch_local_{*,irq_*} functions · f433c4ae
      Authored by Steven Rostedt
      Do not trace arch_local_save_flags(), arch_local_irq_*() and friends.
      Although they are marked inline, gcc may still make a function out of
      them and add it to the pool of functions that are traced by the function
      tracer. This can cause undesirable results (kernel panic, triple faults,
      etc).
      
      Add the notrace notation to prevent them from ever being traced.
      
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f433c4ae
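
      For context, a hedged sketch of what the fix amounts to: the kernel's
      notrace annotation (a gcc no-instrumentation function attribute defined
      in linux/compiler.h) is added to the irq-flag helpers so the function
      tracer never picks them up, even if gcc emits them out of line. The
      helper body below is a placeholder, not the real s390 inline assembly.

      /* Hedged sketch: notrace on an irq-flags helper (placeholder body). */
      #include <linux/compiler.h>
      #include <linux/types.h>

      static inline notrace unsigned long arch_local_save_flags(void)
      {
              unsigned long flags = 0;  /* the real helper reads the PSW mask
                                           with a "stosm" instruction */
              return flags;
      }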
  2. 06 Jun 2011, 3 commits
    • [S390] fix kvm defines for 31 bit compile · 6c61cfe9
      Authored by Martin Schwidefsky
      KVM is not available for 31 bit, but the KVM defines cause warnings:
      
      arch/s390/include/asm/pgtable.h: In function 'ptep_test_and_clear_user_dirty':
      arch/s390/include/asm/pgtable.h:817: warning: integer constant is too large for 'unsigned long' type
      arch/s390/include/asm/pgtable.h:818: warning: integer constant is too large for 'unsigned long' type
      arch/s390/include/asm/pgtable.h: In function 'ptep_test_and_clear_user_young':
      arch/s390/include/asm/pgtable.h:837: warning: integer constant is too large for 'unsigned long' type
      arch/s390/include/asm/pgtable.h:838: warning: integer constant is too large for 'unsigned long' type
      
      Add 31 bit versions of the KVM defines to remove the warnings.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      6c61cfe9
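
      A hedged illustration of the kind of fix described above: 64-bit-wide
      bit masks need 31-bit counterparts (or must be guarded by the right
      #ifdef) so that a 31-bit build never parses a constant wider than its
      unsigned long. The macro name and values below are made up for the
      example.

      /* Hedged sketch: 31-bit vs 64-bit variants of a wide KVM-related mask.
       * _PGSTE_EXAMPLE_BIT is an illustrative name, not a real define. */
      #ifndef __s390x__                               /* 31-bit kernel */
      #define _PGSTE_EXAMPLE_BIT      0x00000000UL    /* KVM unused on 31 bit */
      #else                                           /* 64-bit kernel */
      #define _PGSTE_EXAMPLE_BIT      0x0002000000000000UL
      #endif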
    • [S390] use generic RCU page-table freeing code · 36409f63
      Authored by Martin Schwidefsky
      Replace the s390-specific RCU page-table freeing code with the
      generic variant. This requires duplicating the definition of
      struct mmu_table_batch, as s390 does not use the generic TLB flush
      code.
      
      While we are at it, remove the restriction that page table fragments
      cannot be reused after a single fragment has been freed with RCU,
      and split out allocation and freeing of page tables with pgstes.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      36409f63
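
      Since s390 keeps its own TLB flush code, the batch structure used by the
      generic RCU table freeing has to be redeclared locally. A hedged sketch
      of its shape (field names follow the generic code as far as recalled and
      should be treated as assumptions):

      /* Hedged sketch: page-table pages are queued here and freed via
       * call_rcu() after a grace period, so lockless page-table walkers
       * never see a table disappear underneath them. */
      #include <linux/rcupdate.h>
      #include <asm/page.h>

      struct mmu_table_batch {
              struct rcu_head rcu;    /* drives the deferred free */
              unsigned int nr;        /* number of tables queued so far */
              void *tables[0];        /* page-table pages to free */
      };

      #define MAX_TABLE_BATCH \
              ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))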
    • [S390] qdio: Split SBAL entry flags · 3ec90878
      Authored by Jan Glauber
      The qdio SBAL entry flag is made up of four different values that are
      independent of one another. Some of the bits are reserved by the
      hardware and should not be changed by qdio. Currently all four values
      are overwritten since the SBAL entry flag is defined as a u32.
      
      Split the SBAL entry flag into four u8s as defined by the hardware
      and don't touch the reserved bits.
      Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      3ec90878
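
      A hedged before/after sketch of the layout change; the field names are
      illustrative only, not the real qdio header identifiers.

      /* Hedged sketch: one packed 32-bit flags word becomes four independent
       * bytes, so software can update its own byte without rewriting the
       * bytes that the hardware reserves for itself. */
      #include <linux/types.h>

      struct sbal_element_old {       /* before: everything in one u32 */
              u32 flags;
              u32 length;
              void *addr;
      };

      struct sbal_element_new {       /* after: four independent bytes */
              u8 eflags;              /* software-owned entry flags */
              u8 res1;                /* hardware reserved: do not touch */
              u8 scount;              /* storage block count */
              u8 sflags;              /* SBAL flags */
              u32 length;
              void *addr;
      };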
  3. 29 May 2011, 2 commits
    • [S390] mm: fix storage key handling · a43a9d93
      Authored by Heiko Carstens
      page_get_storage_key() and page_set_storage_key() expect a page address
      and not its page frame number. This became inconsistent with commit
      2d42552d "[S390] merge page_test_dirty and page_clear_dirty".
      
      The result is that we read/write the storage keys of random pages and do
      not have working dirty bit tracking at all.
      E.g. SetPageUptodate() doesn't clear the dirty bit of requested pages,
      which for example ext4 doesn't like very much and panics after a while.
      
      Unable to handle kernel paging request at virtual user address (null)
      Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in:
      CPU: 1 Not tainted 2.6.39-07551-g139f37f5-dirty #152
      Process flush-94:0 (pid: 1576, task: 000000003eb34538, ksp: 000000003c287b70)
      Krnl PSW : 0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
      Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0700000000000000
                 0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88
                 0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000
                 0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988
      Krnl Code: 0000000000316b04: f0a0000407f4       srp     4(11,%r0),2036,0
                 0000000000316b0a: b9020022           ltgr    %r2,%r2
                 0000000000316b0e: a7740015           brc     7,316b38
                >0000000000316b12: e3d0c0000024       stg     %r13,0(%r12)
                 0000000000316b18: 4120c010           la      %r2,16(%r12)
                 0000000000316b1c: 4130d060           la      %r3,96(%r13)
                 0000000000316b20: e340d0600004       lg      %r4,96(%r13)
                 0000000000316b26: c0e50002b567       brasl   %r14,36d5f4
      Call Trace:
      ([<0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138)
       [<00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c
       [<00000000002daac2>] ext4_da_writepages+0x2da/0x504
       [<00000000002597e8>] writeback_single_inode+0xf8/0x268
       [<0000000000259f06>] writeback_sb_inodes+0xd2/0x18c
       [<000000000025a700>] writeback_inodes_wb+0x80/0x168
       [<000000000025aa92>] wb_writeback+0x2aa/0x324
       [<000000000025abde>] wb_do_writeback+0xd2/0x274
       [<000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4
       [<00000000001737be>] kthread+0xa6/0xb0
       [<000000000056c1da>] kernel_thread_starter+0x6/0xc
       [<000000000056c1d4>] kernel_thread_starter+0x0/0xc
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138
      Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      a43a9d93
    • ns: Wire up the setns system call · 7b21fddd
      Authored by Eric W. Biederman
      32bit and 64bit on x86 are tested and working.  The rest I have looked
      at closely and I can't find any problems.
      
      setns is an easy system call to wire up.  It just takes two ints so I
      don't expect any weird architecture porting problems.
      
      While doing this I have noticed that we have some architectures that are
      very slow to get new system calls.  cris seems to be the slowest where
      the last system calls wired up were preadv and pwritev.  avr32 is weird
      in that recvmmsg was wired up but never declared in unistd.h.  frv is
      behind with perf_event_open being the last syscall wired up.  On h8300
      the last system call wired up was epoll_wait.  On m32r the last system
      call wired up was fallocate.  mn10300 has recvmmsg as the last system
      call wired up.  The rest seem to at least have syncfs wired up, which was
      new in 2.6.39.
      
      v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
      v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
      v4: Moved wiring up of the system call to another patch
      v5: ported to v2.6.39-rc6
      v6: rebased onto parisc-next and net-next to avoid syscall  conflicts.
      v7: ported to Linus's latest post 2.6.39 tree.
      
      >  arch/blackfin/include/asm/unistd.h     |    3 ++-
      >  arch/blackfin/mach-common/entry.S      |    1 +
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      
      Oh - ia64 wiring looks good.
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b21fddd
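
      A hedged user-space sketch of the call being wired up here: setns()
      takes an open /proc/<pid>/ns/<ns> file descriptor plus an nstype and
      moves the calling task into that namespace. SYS_setns and the /proc
      layout below assume a post-2.6.39 kernel with matching headers.

      /* Hedged sketch: join init's network namespace via the raw syscall. */
      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <sched.h>              /* CLONE_NEWNET */
      #include <stdio.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      int main(void)
      {
              int fd = open("/proc/1/ns/net", O_RDONLY);  /* needs privileges */

              if (fd < 0) {
                      perror("open");
                      return 1;
              }
              if (syscall(SYS_setns, fd, CLONE_NEWNET) < 0)   /* just two ints */
                      perror("setns");
              close(fd);
              return 0;
      }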
  4. 27 May 2011, 2 commits
  5. 26 May 2011, 5 commits
  6. 25 May 2011, 2 commits
  7. 23 May 2011, 8 commits
    • [S390] oprofile: add missing irq stats counter · fcdd65b0
      Authored by Heiko Carstens
      Count CPU measurement external interrupts as well.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      fcdd65b0
    • [S390] Ignore sendmmsg system call not wired up warning · bfac1d2d
      Authored by Heiko Carstens
      sendmmsg is reachable via the socket system call. We don't enable a second
      way on s390 to reach the same system call.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      bfac1d2d
    • [S390] refactor page table functions for better pgste support · b2fa47e6
      Authored by Martin Schwidefsky
      Rework the architecture page table functions to access the bits in the
      page table extension array (pgste). There are a number of changes:
      1) Fix missing pgste update if the attach_count for the mm is <= 1.
      2) For every operation that affects the invalid bit in the pte or the
         rcp byte in the pgste the pcl lock needs to be acquired. The function
         pgste_get_lock gets the pcl lock and returns the current pgste value
         for a pte pointer. The function pgste_set_unlock stores the pgste
         and releases the lock. Between these two calls the bits in the pgste
         can be shuffled (see the sketch after this entry).
      3) Define two software bits in the pte, _PAGE_SWR and _PAGE_SWC, to avoid
         calling SetPageDirty and SetPageReferenced from pgtable.h. If the
         host reference backup bit or the host change backup bit has been
         set, the dirty/referenced state is transferred to the pte. The common
         code will pick up the state from the pte.
      4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect.
      5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel
         pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate.
      6) Rename kvm_s390_test_and_clear_page_dirty to
         ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young.
      7) Define mm_exclusive() and mm_has_pgste() helper to improve readability.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      b2fa47e6
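
      A hedged sketch of the pgste_get_lock()/pgste_set_unlock() pattern from
      point 2 above; the work between the two calls is illustrative only.

      /* Hedged sketch: bracket every pte/pgste bit manipulation with the
       * pcl lock helpers named in the commit message. */
      #include <asm/pgtable.h>

      static inline void example_update_pgste(pte_t *ptep)
      {
              pgste_t pgste;

              pgste = pgste_get_lock(ptep);   /* take pcl lock, fetch pgste */
              /* ... move dirty/referenced state between pgste and pte ... */
              pgste_set_unlock(ptep, pgste);  /* store pgste, release lock */
      }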
    • [S390] merge page_test_dirty and page_clear_dirty · 2d42552d
      Authored by Martin Schwidefsky
      The page_clear_dirty primitive always sets the default storage key
      which resets the access control bits and the fetch protection bit.
      That will surprise a KVM guest that sets non-zero access control
      bits or the fetch protection bit. Merge page_test_dirty and
      page_clear_dirty back to a single function and only clear the
      dirty bit from the storage key.
      
      In addition, move the functions page_test_and_clear_dirty and
      page_test_and_clear_young to page.h where they belong. This
      requires changing the parameter from a struct page * to a page
      frame number.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      2d42552d
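
      A hedged sketch of what the merged primitive looks like after this
      change: test the change (dirty) bit in the storage key and clear only
      that bit, so access-control and fetch-protection bits set by a guest
      survive; the parameter is now a page frame number instead of a
      struct page *.

      /* Hedged sketch; 0x02 is the change bit of an s390 storage key. */
      #include <asm/page.h>

      static inline int page_test_and_clear_dirty(unsigned long pfn, int mapped)
      {
              unsigned char skey = page_get_storage_key(pfn << PAGE_SHIFT);

              if (!(skey & 0x02))
                      return 0;                       /* page is clean */
              page_set_storage_key(pfn << PAGE_SHIFT, skey & ~0x02, mapped);
              return 1;                               /* was dirty, now clean */
      }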
    • K
      0f1959f5
    • [S390] pfault: cpu hotplug vs missing completion interrupts · f2db2e6c
      Authored by Heiko Carstens
      On cpu hot remove a PFAULT CANCEL command is sent to the hypervisor
      which in turn will cancel all outstanding pfault requests that have
      been issued on that cpu (the same happens with a SIGP cpu reset).
      
      The result is that we end up with uninterruptible processes where
      the interrupt that would wake up these processes never arrives.
      
      In order to solve this, all processes which wait for a pfault
      completion interrupt get woken up after a cpu hot remove. The worst
      case that could happen is that they fault again and in turn need to
      wait again.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f2db2e6c
    • [S390] percpu: implement arch specific irqsafe_cpu_ops · 4c2241fd
      Authored by Heiko Carstens
      Implement arch-specific irqsafe_cpu ops. The arch-specific ops do not
      disable/enable interrupts since that is an expensive operation; instead
      we disable preemption and perform a compare-and-swap loop.
      Since preemption is disabled on the server distros we care about, the
      preempt_disable()/preempt_enable() pair is a nop.
      In the end this code should be faster than the generic one.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      4c2241fd
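
      A hedged sketch of the technique: instead of disabling interrupts,
      disable preemption and retry a compare-and-swap until it succeeds. The
      helper below is illustrative and sidesteps the kernel's real percpu
      macro plumbing.

      /* Hedged sketch: irq-safe per-cpu add built on a cmpxchg loop. An
       * interrupt that modifies the counter between the load and the
       * cmpxchg simply forces one more trip around the loop. */
      #include <linux/atomic.h>
      #include <linux/percpu.h>
      #include <linux/preempt.h>

      static inline void irqsafe_cpu_add_sketch(unsigned long __percpu *pcp,
                                                unsigned long val)
      {
              unsigned long *ptr, old, new;

              preempt_disable();              /* a nop when preemption is off */
              ptr = this_cpu_ptr(pcp);
              do {
                      old = *ptr;
                      new = old + val;
              } while (cmpxchg(ptr, old, new) != old);
              preempt_enable();
      }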
    • [S390] Remove data execution protection · 043d0708
      Authored by Martin Schwidefsky
      The noexec support on s390 does not rely on a bit in the page table
      entry but utilizes the secondary space mode to distinguish between
      memory accesses for instructions vs. data. The noexec code relies
      on the assumption that the cpu will always use the secondary space
      page table for data accesses while it is running in the secondary
      space mode. Up to the z9-109 class machines this has been the case.
      Unfortunately this is not true anymore with z10 and later machines.
      The load-relative-long instructions lrl, lgrl and lgfrl access the
      memory operand using the same addressing-space mode that has been
      used to fetch the instruction.
      This breaks the noexec mode for all user space binaries compiled
      with -march=z10 or later. The only option is to remove the current
      noexec support.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      043d0708
  8. 19 May 2011, 1 commit
    • module: undo module RONX protection correctly. · 448694a1
      Authored by Jan Glauber
      While debugging I stumbled over two problems in the code that protects module
      pages.
      
      The first issue is that disabling the protection before freeing the init
      section or unloading a module is not symmetric with how the protection
      was enabled. For instance, when pages are set to RO, the page range from
      module_core to module_core + core_ro_size is protected. But if a module
      is unloaded, the page range from module_core to module_core + core_size
      is set back to RW, so pages that were never set to RO are also changed
      to RW. This is not critical, but IMHO it should be symmetric.
      
      The second issue is that while set_memory_rw & set_memory_ro are used
      for the RO/RW changes, only set_memory_nx is involved for NX/X. One
      would expect the inverse function (set_memory_x) to be called when the
      NX protection should be removed, which is not the case here, unless I'm
      missing something.
      Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      448694a1
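
      A hedged sketch of the symmetric undo argued for above: unprotect
      exactly the ranges that were protected, and undo NX with its inverse,
      set_memory_x. The ranges are passed in explicitly here; the real code
      would derive them from struct module's layout fields, and the header
      that declares set_memory_* varies by architecture and kernel version.

      /* Hedged sketch: make the undo mirror the enable, range for range and
       * attribute for attribute. */
      #include <asm/cacheflush.h>     /* set_memory_rw / set_memory_x on x86 */

      static void undo_ro_nx_protection(unsigned long ro_start, int ro_pages,
                                        unsigned long nx_start, int nx_pages)
      {
              /* only the range that was actually made read-only goes back to RW */
              set_memory_rw(ro_start, ro_pages);
              /* set_memory_nx dropped execute, so its inverse, set_memory_x,
               * restores it on the same range */
              set_memory_x(nx_start, nx_pages);
      }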
  9. 17 May 2011, 2 commits
  10. 10 May 2011, 2 commits
  11. 05 Apr 2011, 1 commit
  12. 31 Mar 2011, 1 commit
  13. 24 Mar 2011, 4 commits
  14. 23 Mar 2011, 1 commit