1. 26 3月, 2009 1 次提交
  2. 18 3月, 2009 2 次提交
    • M
      [S390] make page table upgrade work again · 0fb1d9bc
      Martin Schwidefsky 提交于
      After TASK_SIZE now gives the current size of the address space the
      upgrade of a 64 bit process from 3 to 4 levels of page table  needs
      to use the arch_mmap_check hook to catch large mmap lengths. The
      get_unmapped_area* functions need to check for -ENOMEM from the
      arch_get_unmapped_area*, upgrade the page table and retry.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      0fb1d9bc
    • M
      [S390] make page table walking more robust · f481bfaf
      Martin Schwidefsky 提交于
      Make page table walking on s390 more robust. The current code requires
      that the pgd/pud/pmd/pte loop is only done for address ranges that are
      below the end address of the last vma of the address space. But this
      is not always true, e.g. the generic page table walker does not guarantee
      this. Change TASK_SIZE/TASK_SIZE_OF to reflect the current size of the
      address space. This makes the generic page table walker happy but it
      breaks the upgrade of a 3 level page table to a 4 level page table.
      To make the upgrade work again another fix is required.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      f481bfaf
  3. 07 1月, 2009 1 次提交
    • G
      mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Gary Hade 提交于
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c04fc586
  4. 29 12月, 2008 1 次提交
    • J
      aio: make the lookup_ioctx() lockless · abf137dd
      Jens Axboe 提交于
      The mm->ioctx_list is currently protected by a reader-writer lock,
      so we always grab that lock on the read side for doing ioctx
      lookups. As the workload is extremely reader biased, turn this into
      an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
      the rwlock and use a spinlock for providing update side exclusion.
      
      There's usually only 1 entry on this list, so it doesn't make sense
      to look into fancier data structures.
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      abf137dd
  5. 25 12月, 2008 1 次提交
  6. 28 10月, 2008 1 次提交
    • C
      [S390] pgtables: Fix race in enable_sie vs. page table ops · 250cf776
      Christian Borntraeger 提交于
      The current enable_sie code sets the mm->context.pgstes bit to tell
      dup_mm that the new mm should have extended page tables. This bit is also
      used by the s390 specific page table primitives to decide about the page
      table layout - which means context.pgstes has two meanings. This can cause
      any kind of bugs. For example  - e.g. shrink_zone can call
      ptep_clear_flush_young while enable_sie is running. ptep_clear_flush_young
      will test for context.pgstes. Since enable_sie changed that value of the old
      struct mm without changing the page table layout ptep_clear_flush_young will
      do the wrong thing.
      The solution is to split pgstes into two bits
      - one for the allocation
      - one for the current state
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      250cf776
  7. 20 10月, 2008 1 次提交
    • B
      mm: cleanup to make remove_memory() arch-neutral · 71088785
      Badari Pulavarty 提交于
      There is nothing architecture specific about remove_memory().
      remove_memory() function is common for all architectures which support
      hotplug memory remove.  Instead of duplicating it in every architecture,
      collapse them into arch neutral function.
      
      [akpm@linux-foundation.org: fix the export]
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71088785
  8. 11 10月, 2008 1 次提交
  9. 01 8月, 2008 1 次提交
  10. 27 7月, 2008 1 次提交
  11. 25 7月, 2008 2 次提交
  12. 14 7月, 2008 1 次提交
  13. 10 6月, 2008 1 次提交
  14. 07 6月, 2008 1 次提交
  15. 30 5月, 2008 2 次提交
  16. 15 5月, 2008 1 次提交
  17. 07 5月, 2008 1 次提交
  18. 30 4月, 2008 3 次提交
  19. 27 4月, 2008 1 次提交
    • C
      s390: KVM preparation: provide hook to enable pgstes in user pagetable · 402b0862
      Carsten Otte 提交于
      The SIE instruction on s390 uses the 2nd half of the page table page to
      virtualize the storage keys of a guest. This patch offers the s390_enable_sie
      function, which reorganizes the page tables of a single-threaded process to
      reserve space in the page table:
      s390_enable_sie makes sure that the process is single threaded and then uses
      dup_mm to create a new mm with reorganized page tables. The old mm is freed
      and the process has now a page status extended field after every page table.
      
      Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
      
      This patch has a small common code hit, namely making dup_mm non-static.
      
      Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
      review feedback. Now we do have the prototype for dup_mm in
      include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
      call task_lock() to prevent race against ptrace modification of mm_users.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAvi Kivity <avi@qumranet.com>
      402b0862
  20. 17 4月, 2008 4 次提交
  21. 10 2月, 2008 3 次提交
  22. 09 2月, 2008 1 次提交
    • M
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky 提交于
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd
  23. 05 2月, 2008 3 次提交
  24. 26 1月, 2008 4 次提交
  25. 20 11月, 2007 1 次提交