1. 17 4月, 2008 3 次提交
  2. 10 2月, 2008 3 次提交
  3. 09 2月, 2008 1 次提交
    • M
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky 提交于
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd
  4. 05 2月, 2008 3 次提交
  5. 26 1月, 2008 4 次提交
  6. 20 11月, 2007 1 次提交
  7. 22 10月, 2007 2 次提交
  8. 20 10月, 2007 1 次提交
    • S
      pid namespaces: define is_global_init() and is_container_init() · b460cbc5
      Serge E. Hallyn 提交于
      is_init() is an ambiguous name for the pid==1 check.  Split it into
      is_global_init() and is_container_init().
      
      A cgroup init has it's tsk->pid == 1.
      
      A global init also has it's tsk->pid == 1 and it's active pid namespace
      is the init_pid_ns.  But rather than check the active pid namespace,
      compare the task structure with 'init_pid_ns.child_reaper', which is
      initialized during boot to the /sbin/init process and never changes.
      
      Changelog:
      
      	2.6.22-rc4-mm2-pidns1:
      	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
      	  global init (/sbin/init) process. This would improve performance
      	  and remove dependence on the task_pid().
      
      	2.6.21-mm2-pidns2:
      
      	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
      	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
      	  This way, we kill only the cgroup if the cgroup's init has a
      	  bug rather than force a kernel panic.
      
      [akpm@linux-foundation.org: fix comment]
      [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
      [bunk@stusta.de: kernel/pid.c: remove unused exports]
      [sukadev@us.ibm.com: Fix capability.c to work with threaded init]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: NPavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b460cbc5
  9. 17 10月, 2007 2 次提交
    • D
      oom: move prototypes to appropriate header file · 5a3135c2
      David Rientjes 提交于
      Move the OOM killer's extern function prototypes to include/linux/oom.h and
      include it where necessary.
      
      [clg@fr.ibm.com: build fix]
      Cc: Andrea Arcangeli <andrea@suse.de>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a3135c2
    • W
      During VM oom condition, kill all threads in process group · dcca2bde
      Will Schmidt 提交于
      We have had complaints where a threaded application is left in a bad state
      after one of it's threads is killed when we hit a VM: out_of_memory
      condition.
      
      Killing just one of the process threads can leave the application in a bad
      state, whereas killing the entire process group would allow for the
      application to restart, or be otherwise handled, and makes it very obvious
      that something has gone wrong.
      
      This change allows the entire process group to be taken down, rather
      than just the one thread.
      Signed-off-by: NWill Schmidt <will_schmidt@vnet.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcca2bde
  10. 12 10月, 2007 1 次提交
  11. 22 8月, 2007 1 次提交
  12. 27 7月, 2007 1 次提交
  13. 20 7月, 2007 1 次提交
    • N
      mm: fault feedback #2 · 83c54070
      Nick Piggin 提交于
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires requires
      all handle_mm_fault callers to be modified (possibly the modifications
      should go further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NHaavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NAndi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83c54070
  14. 21 5月, 2007 1 次提交
  15. 10 5月, 2007 1 次提交
  16. 09 5月, 2007 1 次提交
    • C
      move die notifier handling to common code · 1eeb66a1
      Christoph Hellwig 提交于
      This patch moves the die notifier handling to common code.  Previous
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally, this should be understood as an appel to
      the other architecture maintainer to implement support for it aswell (aka
      sprinkling a notify_die or two in the proper place)
      
      arm had a notifiy_die that did something totally different, I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used at.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NBryan Wu <bryan.wu@analog.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eeb66a1
  17. 05 5月, 2007 1 次提交
  18. 27 4月, 2007 2 次提交
  19. 06 3月, 2007 1 次提交
    • G
      [S390] Fixed handling of access register mode faults. · 482b05dd
      Gerald Schaefer 提交于
      Replaced check_user_space() + __check_access_register with the new
      check_space(). The old functions made wrong assumptions about kernel
      and user space when the kernel and user address spaces are switched
      (kernel in home space, user in primary/secondary space).
      Secondly the user process can switch to the accress register mode if
      it is running in primary or secondary mode. In addition it can load
      an arbitrary value to the access registers. If any other value than
      0 for primary space or 1 for secondary space is loaded and memory
      is accessed using the base register related to the access register,
      the program should be terminated with a SIGSEGV. To achieve that the
      DUALD pointer in the DUCT and the PSALD pointer in the PASTE need
      to point to an array of 8 invalid access-list entries to get a
      ALEN-translation exception if an invalid alet is used.
      Signed-off-by: NGerald Schaefer <geraldsc@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      482b05dd
  20. 21 2月, 2007 1 次提交
  21. 15 2月, 2007 3 次提交
  22. 12 2月, 2007 2 次提交
  23. 06 2月, 2007 3 次提交
    • H
      4d284cac
    • H
      [S390] Mark kernel text section read-only. · 162e006e
      Heiko Carstens 提交于
      Set read-only flag in the page table entries for the kernel image text
      section. This will catch all instruction caused corruptions withing the
      text section.
      Instruction replacement via kprobes still works, since it bypasses now
      dynamic address translation.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      162e006e
    • G
      [S390] noexec protection · c1821c2e
      Gerald Schaefer 提交于
      This provides a noexec protection on s390 hardware. Our hardware does
      not have any bits left in the pte for a hw noexec bit, so this is a
      different approach using shadow page tables and a special addressing
      mode that allows separate address spaces for code and data.
      
      As a special feature of our "secondary-space" addressing mode, separate
      page tables can be specified for the translation of data addresses
      (storage operands) and instruction addresses. The shadow page table is
      used for the instruction addresses and the standard page table for the
      data addresses.
      The shadow page table is linked to the standard page table by a pointer
      in page->lru.next of the struct page corresponding to the page that
      contains the standard page table (since page->private is not really
      private with the pte_lock and the page table pages are not in the LRU
      list).
      Depending on the software bits of a pte, it is either inserted into
      both page tables or just into the standard (data) page table. Pages of
      a vma that does not have the VM_EXEC bit set get mapped only in the
      data address space. Any try to execute code on such a page will cause a
      page translation exception. The standard reaction to this is a SIGSEGV
      with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn)
      and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the
      kernel to the signal stack frame. Unfortunately, the signal return
      mechanism cannot be modified to use an SA_RESTORER because the
      exception unwinding code depends on the system call opcode stored
      behind the signal stack frame.
      
      This feature requires that user space is executed in secondary-space
      mode and the kernel in home-space mode, which means that the addressing
      modes need to be switched and that the noexec protection only works
      for user space.
      After switching the addressing modes, we cannot use the mvcp/mvcs
      instructions anymore to copy between kernel and user space. A new
      mvcos instruction has been added to the z9 EC/BC hardware which allows
      to copy between arbitrary address spaces, but on older hardware the
      page tables need to be walked manually.
      Signed-off-by: NGerald Schaefer <geraldsc@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      c1821c2e