1. 15 Nov 2006 (1 commit)
    • [PATCH] hugetlb: prepare_hugepage_range check offset too · 68589bc3
      Authored by Hugh Dickins
      (David:)
      
      If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
      because the given file offset is not hugepage aligned - then do_mmap_pgoff
      will go to the unmap_and_free_vma backout path.
      
      But at this stage the vma hasn't been marked as hugepage, and the backout path
      will call unmap_region() on it.  That will eventually call down to the
      non-hugepage version of unmap_page_range().  On ppc64, at least, that will
      cause serious problems if there are any existing hugepage pagetable entries in
      the vicinity - for example if there are any other hugepage mappings under the
      same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
      entries.  I suspect this will also cause bad problems on ia64, though I don't
      have a machine to test it on.
      
      (Hugh:)
      
      prepare_hugepage_range() should check file offset alignment when it checks
      virtual address and length, to stop MAP_FIXED with a bad huge offset from
      unmapping before it fails further down.  PowerPC should apply the same
      prepare_hugepage_range alignment checks as ia64 and all the others do.
      
      Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
      is the check for too small a mapping); but even so, move up setting of
      VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
      hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
      when unwinding from error will go the non-huge way, which may cause bad
      behaviour on architectures (powerpc and ia64) which segregate their huge
      mappings into a separate region of the address space.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      68589bc3
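      For reference, the alignment test this commit adds amounts to the check
      below (a minimal sketch, assuming the generic prepare_hugepage_range()
      gains a pgoff argument; the exact prototype and mask names vary by
      architecture):

      "
              static int prepare_hugepage_range(unsigned long addr,
                                                unsigned long len, pgoff_t pgoff)
              {
                      /* The file offset must be hugepage aligned too, so a bad
                       * MAP_FIXED request fails before anything is unmapped. */
                      if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
                              return -EINVAL;
                      if (addr & ~HPAGE_MASK)
                              return -EINVAL;
                      if (len & ~HPAGE_MASK)
                              return -EINVAL;
                      return 0;
              }
      "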
  2. 01 Nov 2006 (1 commit)
  3. 12 Oct 2006 (1 commit)
    • [PATCH] mm: use symbolic names instead of indices for zone initialisation · 6391af17
      Authored by Mel Gorman
      Arch-independent zone-sizing is using indices instead of symbolic names to
      offset within an array related to zones (max_zone_pfns).  The unintended
      impact is that ZONE_DMA and ZONE_NORMAL are initialised on powerpc instead
      of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set.  As a result,
      the machine fails to boot but will boot with CONFIG_HIGHMEM turned off.
      
      The following patch properly initialises the max_zone_pfns[] array and uses
      symbolic names instead of indices in each architecture using
      arch-independent zone-sizing.  Two users have successfully booted their
      powerpcs with it (one an ibook G4).  It has also been boot tested on x86,
      x86_64, ppc64 and ia64.  Please merge for 2.6.19-rc2.
      
      Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
      first fix.  Additional credit to Johannes Berg and Andreas Schwab for
      reporting the problem and testing on powerpc.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6391af17
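      The shape of the fix, sketched below for a HIGHMEM-capable configuration
      (the pfn variables here are illustrative placeholders; the zone names and
      free_area_init_nodes() are the kernel's):

      "
              unsigned long max_zone_pfns[MAX_NR_ZONES];

              /* Index by zone name rather than by a magic number, so the right
               * zone is sized whether or not CONFIG_HIGHMEM is set. */
              max_zone_pfns[ZONE_DMA] = lowmem_end_pfn;
      #ifdef CONFIG_HIGHMEM
              max_zone_pfns[ZONE_HIGHMEM] = total_memory_pfn;
      #else
              max_zone_pfns[ZONE_NORMAL] = total_memory_pfn;
      #endif
              free_area_init_nodes(max_zone_pfns);
      "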
  4. 03 Oct 2006 (1 commit)
  5. 30 Sep 2006 (2 commits)
    • [PATCH] pidspace: is_init() · f400e198
      Authored by Sukadev Bhattiprolu
      This is an updated version of Eric Biederman's is_init() patch.
      (http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
      replaces a few more instances of ->pid == 1 with is_init().
      
      Further, is_init() checks pid and thus removes dependency on Eric's other
      patches for now.
      
      Eric's original description:
      
      	There are a lot of places in the kernel where we test for init
      	because we give it special properties.  Most significantly, init
      	must not die.  This results in code all over the kernel that
      	tests ->pid == 1.
      
      	Introduce is_init to capture this case.
      
      	With multiple pid spaces, in all of the affected cases we are
      	looking only for the first process on the system, not some other
      	process that has pid == 1.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: <lxc-devel@lists.sourceforge.net>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f400e198
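      The helper itself is tiny; a sketch of what it replaces (the real
      definition lives in the scheduler headers):

      "
              static inline int is_init(struct task_struct *tsk)
              {
                      return tsk->pid == 1;   /* the first process on the system */
              }
      "

      Callers then write is_init(current) instead of open-coding current->pid == 1,
      leaving a single place to adjust once multiple pid spaces exist.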
    • [PATCH] make PROT_WRITE imply PROT_READ · df67b3da
      Authored by Jason Baron
      Make PROT_WRITE imply PROT_READ for a number of architectures which don't
      support write only in hardware.
      
      While looking at this, I noticed that some architectures which do not
      support write only mappings already take the exact same approach.  For
      example, in arch/alpha/mm/fault.c:
      
      "
              if (cause < 0) {
                      if (!(vma->vm_flags & VM_EXEC))
                              goto bad_area;
              } else if (!cause) {
                      /* Allow reads even for write-only mappings */
                      if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                              goto bad_area;
              } else {
                      if (!(vma->vm_flags & VM_WRITE))
                              goto bad_area;
              }
      "
      
      Thus, this patch brings other architectures which do not support write only
      mappings in-line and consistent with the rest.  I've verified the patch on
      ia64, x86_64 and x86.
      
      Additional discussion:
      
      Several architectures, including x86, can not support write-only mappings.
      The pte for x86 reserves a single bit for protection and its two states are
      read only or read/write.  Thus, write only is not supported in h/w.
      
      Currently, if I 'mmap' a page write-only, the first read attempt on that page
      creates a page fault and will SEGV.  That check is enforced in
      arch/blah/mm/fault.c.  However, if I first write to that page it will fault in
      and the pte will be set to read/write.  Thus, any subsequent reads of the page
      will succeed.  It is this inconsistency in behavior that this patch is
      attempting to address.  Furthermore, if the page is swapped out and then
      brought back, the first read will also cause a SEGV.  Thus, any arbitrary read
      of a page can potentially result in a SEGV.
      
      According to the SUSv3 spec, "if the application requests only PROT_WRITE, the
      implementation may also allow read access."  Also, as mentioned, some
      architectures, such as alpha (shown above), already take the approach that I am
      suggesting.
      
      The counter-argument to this, raised by Arjan, is that the kernel is enforcing
      the write-only mapping as best it can given the h/w limitations.  This is
      true; however, Alan Cox and I would argue that the inconsistency in
      behavior - that is, applications sometimes work and sometimes fail - is highly
      undesirable.  If you read through the thread, I think people came to an
      agreement on the last patch I posted, as nobody has objected to it...
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Andi Kleen <ak@muc.de>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Ian Molton <spyro@f2s.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      df67b3da
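      A minimal sketch of the convention the patch spreads across those
      architectures, applied where the mmap protection bits are first
      interpreted (the wrapper function here is illustrative):

      "
              static unsigned long fixup_prot(unsigned long prot)
              {
                      /* Hardware with no write-only PTE state cannot honour a
                       * pure PROT_WRITE request, so grant read access up front
                       * instead of leaving it to chance in the fault handler. */
                      if (prot & PROT_WRITE)
                              prot |= PROT_READ;
                      return prot;
              }
      "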
  6. 27 Sep 2006 (1 commit)
  7. 25 Sep 2006 (1 commit)
  8. 25 Aug 2006 (1 commit)
  9. 24 Aug 2006 (1 commit)
    • [POWERPC] hugepage BUG fix · c9169f87
      Authored by Adam Litke
      On Tue, 2006-08-15 at 08:22 -0700, Dave Hansen wrote:
      > kernel BUG in cache_free_debugcheck at mm/slab.c:2748!
      
      Alright, this one is only triggered when slab debugging is enabled.  The
      slabs are assumed to be aligned on a HUGEPTE_TABLE_SIZE boundary.  The free
      path makes use of this assumption and uses the lowest nibble to pass around
      an index into an array of kmem_cache pointers.  With slab debugging turned
      on, the slab is still aligned, but the "working" object pointer is not.
      This would break the assumption above that a full nibble is available for
      the PGF_CACHENUM_MASK.
      
      The following patch reduces PGF_CACHENUM_MASK to cover only the two least
      significant bits, which is enough to cover the four pgtable cache types
      currently in use.  Then use this constant to mask out the appropriate part
      of the huge pte pointer.
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      c9169f87
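      In outline, the encoding becomes (a sketch assuming the two-bit mask
      described above; the helper names are illustrative):

      "
              #define PGF_CACHENUM_MASK       0x3UL   /* 2 bits: 4 pgtable caches */

              static inline unsigned int hugepte_cachenum(unsigned long pgf)
              {
                      /* Only the low two bits carry the cache index ... */
                      return pgf & PGF_CACHENUM_MASK;
              }

              static inline pte_t *hugepte_table(unsigned long pgf)
              {
                      /* ... the rest is the (possibly debug-offset) pointer. */
                      return (pte_t *)(pgf & ~PGF_CACHENUM_MASK);
              }
      "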
  10. 08 Aug 2006 (2 commits)
  11. 31 Jul 2006 (1 commit)
  12. 25 Jul 2006 (1 commit)
  13. 13 Jul 2006 (1 commit)
  14. 07 Jul 2006 (1 commit)
  15. 01 Jul 2006 (1 commit)
  16. 29 Jun 2006 (1 commit)
  17. 28 Jun 2006 (5 commits)
  18. 27 Jun 2006 (1 commit)
  19. 21 Jun 2006 (1 commit)
  20. 18 Jun 2006 (1 commit)
  21. 15 Jun 2006 (2 commits)
    • [POWERPC] Remove stale 64bit on 32bit kernel code · 227318bb
      Authored by Anton Blanchard
      Remove some stale POWER3/POWER4/970 on 32bit kernel support.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      227318bb
    • powerpc: Use 64k pages without needing cache-inhibited large pages · bf72aeba
      Authored by Paul Mackerras
      Some POWER5+ machines can do 64k hardware pages for normal memory but
      not for cache-inhibited pages.  This patch lets us use 64k hardware
      pages for most user processes on such machines (assuming the kernel
      has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
      start out using 64k pages and get switched to 4k pages if they use any
      non-cacheable mappings.
      
      With this, we use 64k pages for the vmalloc region and 4k pages for
      the imalloc region.  If anything creates a non-cacheable mapping in
      the vmalloc region, the vmalloc region will get switched to 4k pages.
      I don't know of any driver other than the DRM that would do this,
      though, and these machines don't have AGP.
      
      When a region gets switched from 64k pages to 4k pages, we do not have
      to clear out all the 64k HPTEs from the hash table immediately.  We
      use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
      was hashed in as a 64k page or a set of 4k pages.  If hash_page is
      trying to insert a 4k page for a Linux PTE and it sees that it has
      already been inserted as a 64k page, it first invalidates the 64k HPTE
      before inserting the 4k HPTE.  The hash invalidation routines also use
      the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
      set of 4k HPTEs to remove.  With those two changes, we can tolerate a
      mix of 4k and 64k HPTEs in the hash table, and they will all get
      removed when the address space is torn down.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      bf72aeba
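      In pseudo-form, the insert path described above looks roughly like this
      (a simplified sketch; the _PAGE_* flag names are the kernel's, the helper
      functions are illustrative):

      "
              if ((old_pte & _PAGE_HASHPTE) && !(old_pte & _PAGE_COMBO)) {
                      /* The PTE was last hashed in as a single 64k page: kick
                       * that 64k HPTE out before inserting 4k HPTEs. */
                      invalidate_64k_hpte(vaddr, old_pte);
              }
              new_pte = old_pte | _PAGE_COMBO;  /* now tracked as a set of 4k HPTEs */
              insert_4k_hpte(vaddr, new_pte);
      "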
  22. 12 Jun 2006 (1 commit)
  23. 11 Jun 2006 (1 commit)
  24. 09 Jun 2006 (1 commit)
    • [PATCH] powerpc: Fix buglet with MMU hash management · c5cf0e30
      Authored by Benjamin Herrenschmidt
      Our MMU hash management code would not set the "C" bit (changed bit) in
      the hardware PTE when updating a RO PTE into a RW PTE.  That would cause
      the hardware to possibly do a write back to the hash table to set it on
      the first store access, which, in addition to being a performance issue,
      might also hit a bug when running with native hash management (non-HV),
      as our code is specifically optimized for the case where no write back
      happens.
      
      Thus there is a very small theoretical window where a hash PTE can become
      corrupted if that HPTE has just been upgraded to read-write, a store
      access happens on it, and that races with another processor evicting
      that same slot.  Since eviction (caused by an almost full hash) is
      extremely rare, the bug is fortunately very unlikely to happen.
      
      This is fixed by allowing the update of the protection bits in the native
      hash handling to also set (but not clear) the "C" bit and, in order to
      also improve performance in the general case, by always setting that
      bit on newly inserted hash PTEs so that writeback really never happens.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      c5cf0e30
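      The change itself is small; roughly (an illustrative sketch, where
      HPTE_R_C is the changed bit in the second doubleword of a 64-bit hash PTE
      and rflags are the protection flags being installed):

      "
              /* Pre-set the changed ("C") bit when building a new hash PTE so
               * the hardware never needs a write back to set it later; the
               * protection-update path may likewise set it, but never clear it. */
              rflags |= HPTE_R_C;
      "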
  25. 19 May 2006 (1 commit)
    • [PATCH] powerpc: Unify mem= handling · 2babf5c2
      Authored by Michael Ellerman
      We currently do mem= handling in three separate places, and as benh pointed out
      I wrote two of them.  Now that we parse command line parameters earlier we can
      clean this mess up.
      
      Moving the parsing out of prom_init means the device tree might be allocated
      above the memory limit. If that happens we'd have to move it. As it happens
      we already have logic to do that for kdump, so just genericise it.
      
      This also means we might have reserved regions above the memory limit; if we
      do, the bootmem allocator will blow up, so we have to modify
      lmb_enforce_memory_limit() to truncate the reserves as well.
      
      Tested on P5 LPAR, iSeries, F50, 44p. Tested moving device tree on P5 and
      44p and F50.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      2babf5c2
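      For reference, the single early parser the three call sites collapse into
      looks roughly like this (a sketch using the standard early_param() and
      memparse() helpers; the device-tree move and LMB truncation described
      above are driven by the resulting memory_limit):

      "
              static unsigned long memory_limit;

              static int __init early_parse_mem(char *p)
              {
                      if (!p)
                              return 1;
                      memory_limit = memparse(p, &p);  /* accepts 512M, 2G, ... */
                      return 0;
              }
              early_param("mem", early_parse_mem);
      "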
  26. 02 May 2006 (1 commit)
  27. 28 Apr 2006 (1 commit)
    • [PATCH] powerpc: Fix pagetable bloat for hugepages · f10a04c0
      Authored by David Gibson
      At present, ARCH=powerpc kernels can waste considerable space in
      pagetables when making large hugepage mappings.  Hugepage PTEs go in
      PMD pages, but each PMD page maps 256M and so contains only 16
      hugepage PTEs (128 bytes of data), but takes up a 1024 byte
      allocation.  With CONFIG_PPC_64K_PAGES enabled (64k base page size),
      the situation is worse.  Now hugepage PTEs are at the PTE page level
      (also mapping 256M), so we store 16 hugepage PTEs in a 64k allocation.
      
      The PowerPC MMU already means that any 256M region is either all
      hugepage, or all normal pages.  Thus, with some care, we can use a
      different allocation for the hugepage PTE tables and only allocate the
      128 bytes necessary.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      f10a04c0
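      A sketch of the idea: a dedicated slab cache sized for the 128-byte
      hugepage PTE tables (the real patch wires this into powerpc's existing
      pgtable_cache machinery rather than creating a standalone cache, and the
      init function name here is illustrative):

      "
              #define HUGEPTE_TABLE_SIZE  (16 * sizeof(pte_t))  /* 128 bytes */

              static struct kmem_cache *hugepte_cache;

              void __init hugepte_cache_init(void)
              {
                      hugepte_cache = kmem_cache_create("hugepte_cache",
                                                        HUGEPTE_TABLE_SIZE,
                                                        HUGEPTE_TABLE_SIZE,
                                                        0, NULL);
              }
      "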
  28. 22 Apr 2006 (2 commits)
  29. 01 Apr 2006 (1 commit)
  30. 29 Mar 2006 (2 commits)
  31. 28 Mar 2006 (1 commit)