1. 30 Sep 2006 (2 commits)
    • [PATCH] pidspace: is_init() · f400e198
      Sukadev Bhattiprolu authored
      This is an updated version of Eric Biederman's is_init() patch.
      (http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
      replaces a few more instances of ->pid == 1 with is_init().
      
      Further, is_init() checks the pid and thus removes the dependency on
      Eric's other patches for now.
      
      Eric's original description:
      
      	There are a lot of places in the kernel where we test for init
      	because we give it special properties.  Most significantly, init
      	must not die.  This results in code all over the kernel testing
      	->pid == 1.
      
      	Introduce is_init to capture this case.
      
      	With multiple pid spaces, in all of the affected cases we are
      	looking for only the first process on the system, not some other
      	process that has pid == 1.
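      
      A minimal sketch of the helper as described above: for now it simply
      tests the pid, so it carries no dependency on the other pid-namespace
      patches.
      
      	/* True only for the first process on the system. */
      	static inline int is_init(struct task_struct *tsk)
      	{
      		return tsk->pid == 1;
      	}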
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: <lxc-devel@lists.sourceforge.net>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] make PROT_WRITE imply PROT_READ · df67b3da
      Jason Baron authored
      Make PROT_WRITE imply PROT_READ for a number of architectures which don't
      support write-only in hardware.
      
      While looking at this, I noticed that some architectures which do not
      support write only mappings already take the exact same approach.  For
      example, in arch/alpha/mm/fault.c:
      
      "
              if (cause < 0) {
                      if (!(vma->vm_flags & VM_EXEC))
                              goto bad_area;
              } else if (!cause) {
                      /* Allow reads even for write-only mappings */
                      if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                              goto bad_area;
              } else {
                      if (!(vma->vm_flags & VM_WRITE))
                              goto bad_area;
              }
      "
      
      Thus, this patch brings the other architectures which do not support
      write-only mappings in line and consistent with the rest.  I've verified
      the patch on ia64, x86_64 and x86.
      
      Additional discussion:
      
      Several architectures, including x86, cannot support write-only mappings.
      The pte for x86 reserves a single bit for protection, and its two states
      are read-only or read/write.  Thus, write-only is not supported in hardware.
      
      Currently, if I mmap a page write-only, the first read attempt on that page
      creates a page fault and will SEGV.  That check is enforced in
      arch/blah/mm/fault.c.  However, if I first write to that page, it will fault
      in and the pte will be set to read/write.  Thus, any subsequent reads to the
      page will succeed.  It is this inconsistency in behavior that this patch is
      attempting to address.  Furthermore, if the page is swapped out and then
      brought back, the first read will also cause a SEGV.  Thus, any arbitrary
      read of a page can potentially result in a SEGV.
      
      According to the SUSv3 spec, "if the application requests only PROT_WRITE,
      the implementation may also allow read access."  Also, as mentioned, some
      architectures, such as alpha (shown above), already take the approach that
      I am suggesting.
      
      The counter-argument raised by Arjan is that the kernel is enforcing the
      write-only mapping as best it can given the hardware limitations.  This is
      true; however, Alan Cox and I would argue that the inconsistency in
      behavior, namely that applications sometimes work and sometimes fail, is
      highly undesirable.  If you read through the thread, I think people came to
      an agreement on the last patch I posted, as nobody has objected to it.
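      
      As a sketch, the fault-time check that this brings the other architectures
      in line with mirrors the alpha code quoted above; the surrounding handler
      and the is_write variable are illustrative, not the patch's exact code.
      
      	/* In an arch page fault handler, after looking up the vma;
      	 * is_write is assumed to be decoded from the fault cause. */
      	if (is_write) {
      		if (!(vma->vm_flags & VM_WRITE))
      			goto bad_area;
      	} else {
      		/* Allow reads even for write-only mappings, since the
      		 * hardware cannot enforce write-only protection anyway. */
      		if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
      			goto bad_area;
      	}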
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Andi Kleen <ak@muc.de>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Ian Molton <spyro@f2s.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 27 Sep 2006 (3 commits)
  3. 04 Aug 2006 (2 commits)
    • [IA64] fix show_mem for VIRTUAL_MEM_MAP+FLATMEM · e44e41d0
      Bob Picco authored
      contig.c (FLATMEM) requires the same optimization as discontig.c for
      show_mem() when VIRTUAL_MEM_MAP is in use; otherwise FLATMEM hits softlockup
      timeouts.  This was boot tested for the memory configurations SPARSEMEM,
      DISCONTIG+VIRTUAL_MEM_MAP, FLATMEM, FLATMEM+VIRTUAL_MEM_MAP, and
      FLATMEM+VIRTUAL_MEM_MAP with the largest memory gap less than LARGE_GAP, by
      using the boot parameter "mem=".
      
      This was boot tested, and the "echo m >/proc/sysrq-trigger" output
      evaluated, for: FLATMEM, FLATMEM+VIRTUAL_MEM_MAP,
      DISCONTIGMEM+VIRTUAL_MEM_MAP and SPARSEMEM.
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [IA64] align high endpoint of VIRTUAL_MEM_MAP · 921eea1c
      Bob Picco authored
      Ensure that vmem_map's high endpoint is MAX_ORDER aligned; not doing so
      violates the buddy allocator algorithm.  Anyone using mem=XXX on the boot
      line with a value not aligned to MAX_ORDER also needs this patch to satisfy
      the buddy allocator.  vmem_map always starts at pfn 0.  The potentially
      large MAX_ORDER on ia64 (due to hugetlbfs) requires that the end of
      vmem_map be aligned to MAX_ORDER_NR_PAGES.
      
      This was boot tested for: FLATMEM, FLATMEM+VIRTUAL_MEM_MAP,
      DISCONTIGMEM+VIRTUAL_MEM_MAP and SPARSEMEM.
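      
      A one-line sketch of the rounding this implies, assuming end_pfn holds the
      highest page frame to be covered (ALIGN and MAX_ORDER_NR_PAGES as in the
      kernel headers):
      
      	/* vmem_map starts at pfn 0, so only the high end needs rounding
      	 * up; this keeps the largest buddy block inside the map. */
      	unsigned long map_end_pfn = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);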
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  4. 03 Aug 2006 (1 commit)
  5. 05 Jul 2006 (1 commit)
    • [PATCH] Fix copying of pgdat array on each node for ia64 memory hotplug · dd8041f1
      Yasunori Goto authored
      I found a bug in the memory hot-add code for ia64.
      
      IA64 keeps a copy of the pgdat pointer array on each node to reduce
      cross-node memory accesses.  This array is used by the NODE_DATA() macro.
      When a new node is hot-added, this array should be updated and copied onto
      the new node too.
      
      However, I used for_each_online_node() in scatter_node_data() to copy it,
      which meant the array was not copied onto the new node: initialization of
      the new node's structures was only halfway done, so its bit in
      online_node_map could not yet be set.
      
      To copy the arrays onto the new node, I changed the loop to check the value
      of pgdat_list[], which is the source array of the copies.  I tested this
      patch with my memory hot-add emulation on Tiger4.  This patch is for
      2.6.17-git20.
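      
      A sketch of the fixed loop, assuming each node's local data holds a
      pg_data_ptrs copy reachable via LOCAL_DATA_ADDR(), as in the ia64
      discontig code (treat those names as illustrative):
      
      	static void scatter_node_data(void)
      	{
      		pg_data_t **dst;
      		int node;
      
      		/* Walk pgdat_list[] itself rather than online_node_map:
      		 * a node being hot-added already has a pgdat entry here
      		 * even though it is not yet marked online. */
      		for (node = 0; node < MAX_NUMNODES; node++) {
      			if (pgdat_list[node]) {
      				dst = LOCAL_DATA_ADDR(pgdat_list[node])->pg_data_ptrs;
      				memcpy(dst, pgdat_list, sizeof(pgdat_list));
      			}
      		}
      	}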
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 01 Jul 2006 (1 commit)
  7. 28 Jun 2006 (4 commits)
  8. 27 Jun 2006 (1 commit)
  9. 15 May 2006 (1 commit)
  10. 09 May 2006 (1 commit)
    • [IA64] rework memory attribute aliasing · 32e62c63
      Bjorn Helgaas authored
      This closes a couple of holes in our attribute aliasing avoidance scheme:
      
        - The current kernel fails mmaps of some /dev/mem MMIO regions because
          they don't appear in the EFI memory map.  This keeps X from working
          on the Intel Tiger box.
      
        - The current kernel allows UC mmap of the 0-1MB region of
          /sys/.../legacy_mem even when the chipset doesn't support UC
          access.  This causes an MCA when starting X on HP rx7620 and rx8620
          boxes in the default configuration.
      
      There's more detail in the Documentation/ia64/aliasing.txt file this
      adds, but the general idea is that if a region might be covered by
      a granule-sized kernel identity mapping, any access via /dev/mem or
      mmap must use the same attribute as the identity mapping.
      
      Otherwise, we fall back to using an attribute that is supported
      according to the EFI memory map, or to using UC if the EFI memory
      map doesn't mention the region.
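      
      A sketch of that decision logic; the helper names below are purely
      illustrative, not the patch's actual interfaces:
      
      	/* Choose the attribute for a /dev/mem or mmap access. */
      	if (covered_by_kernel_identity_mapping(pfn))
      		attr = identity_mapping_attribute(pfn);	/* must match it */
      	else if (efi_memmap_describes(pfn))
      		attr = efi_supported_attribute(pfn);
      	else
      		attr = UC;	/* unknown to EFI: fall back to uncacheable */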
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  11. 14 Apr 2006 (1 commit)
    • [IA64] Make show_mem() skip holes in a pgdat · ace1d816
      Robin Holt authored
      This patch modifies ia64's show_mem() to walk the vmem_map page tables and
      rapidly skip forward across regions where the page tables are missing.
      This prevents the pfn_valid() check from causing numerous unnecessary
      page faults.
      
      Without this patch on a 512 node 512 cpu system where every node has four
      memory holes, the show_mem() call takes 1 hour 18 minutes.  With this
      patch, it takes less than 3 seconds.
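      
      A sketch of the walk this describes; vmemmap_find_next_valid_pfn() is
      modeled on the helper the ia64 discontig code uses, so treat the exact
      name and signature as an assumption:
      
      	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
      		if (!pfn_valid(pfn)) {
      			/* Skip the whole region whose vmem_map page tables
      			 * are missing instead of faulting on every pfn. */
      			pfn = vmemmap_find_next_valid_pfn(nid, pfn) - 1;
      			continue;
      		}
      		/* ... per-page accounting as show_mem() normally does ... */
      	}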
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  12. 08 Apr 2006 (1 commit)
  13. 31 Mar 2006 (1 commit)
  14. 28 Mar 2006 (4 commits)
  15. 27 Mar 2006 (1 commit)
  16. 23 Mar 2006 (5 commits)
  17. 22 Mar 2006 (2 commits)
    • [PATCH] hugepage: is_aligned_hugepage_range() cleanup · 42b88bef
      David Gibson authored
      Quite a long time back, prepare_hugepage_range() replaced
      is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
      verify if an address range is suitable for a hugepage mapping.
      is_aligned_hugepage_range() stuck around, but only to implement
      prepare_hugepage_range() on archs which didn't implement their own.
      
      Most archs (everything except ia64 and powerpc) used the same
      implementation of is_aligned_hugepage_range().  On powerpc, which
      implements its own prepare_hugepage_range(), the custom version was never
      used.
      
      In addition, "is_aligned_hugepage_range()" was a bad name, because it
      suggests it returns true iff the given range is a good hugepage range,
      whereas in fact it returns 0-or-error (so the sense is reversed).
      
      This patch cleans up by abolishing is_aligned_hugepage_range().  Instead
      prepare_hugepage_range() is defined directly.  Most archs use the default
      version, which simply checks the given region is aligned to the size of a
      hugepage.  ia64 and powerpc define custom versions.  The ia64 one simply
      checks that the range is in the correct address space region in addition to
      being suitably aligned.  The powerpc version (just as previously) checks
      for suitable addresses, and if necessary performs low-level MMU frobbing to
      set up new areas for use by hugepages.
      
      No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
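      
      A sketch of the default version, which only checks that the region is
      aligned to the hugepage size (the exact signature of that era is assumed):
      
      	/* 0 on success, -EINVAL otherwise: the reversed sense that the
      	 * old is_aligned_hugepage_range() name obscured. */
      	static inline int prepare_hugepage_range(unsigned long addr,
      						 unsigned long len)
      	{
      		if (len & ~HPAGE_MASK)
      			return -EINVAL;
      		if (addr & ~HPAGE_MASK)
      			return -EINVAL;
      		return 0;
      	}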
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] remove set_page_count() outside mm/ · 7835e98b
      Nick Piggin authored
      set_page_count usage outside mm/ is limited to setting the refcount to 1.
      Remove set_page_count from outside mm/, and replace those users with
      init_page_count() and set_page_refcounted().
      
      This allows more debug checking, and tighter control over how code is
      allowed to play around with page->_count.
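      
      A sketch of the simpler replacement helper, assuming _count is the page's
      atomic reference count as in kernels of this era (set_page_refcounted() is
      the same idea plus extra debug checks):
      
      	/* Give a freshly allocated page its initial reference. */
      	static inline void init_page_count(struct page *page)
      	{
      		atomic_set(&page->_count, 1);
      	}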
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 17 Jan 2006 (1 commit)
  19. 14 Jan 2006 (1 commit)
    • [IA64] Hole in IA64 TLB flushing from system threads · cfbb1426
      Jack Steiner authored
      I originally thought this was a bug only in the SN code, but I think I
      also see a hole in the generic IA64 TLB code.  (A separate patch was sent
      for the SN problem.)
      
      It looks like there is a bug in the TLB flushing code.  During a context
      switch, kernel threads (kswapd, for example) inherit the mm of the task
      that was previously running on the cpu.  Normally this is ok, because the
      previous context is still loaded in the RR registers.  However, if the
      owner of the mm migrates to another cpu, changes its context number, and
      references a page before kswapd issues a tlb_purge for that same page, the
      purge will be done with a stale context number (and RR registers).
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  20. 06 Jan 2006 (1 commit)
    • [IA64] support for cpu0 removal · ff741906
      Ashok Raj authored
      Here is the BSP removal support for IA64.  It's pretty much the same thing
      that was released a while back, but with your feedback incorporated.
      
      - Removed CONFIG_BSP_REMOVE_WORKAROUND and the associated cmdline param
      - Fixed a compile issue with sn2/zx1 due to an undefined fix_b0_for_bsp
      - Some formatting nits (whitespace etc.)
      
      This has been tested on Tiger, and a while back by Alex on HP systems as
      well.
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  21. 07 Dec 2005 (1 commit)
  22. 09 Nov 2005 (1 commit)
    • [IA64] fix memory less node allocation · 97835245
      Bob Picco authored
      The original memoryless node allocation attempted to use NODEDATA_ALIGN
      for alignment.  The bootmem allocator only allows power-of-two alignments,
      which causes a BUG_ON for some nodes.  For CPU-only nodes, just allocate
      with a PERCPU_PAGE_SIZE alignment.
      
      Some older firmware reports SLIT distances of 0xff, which resulted in
      bestnode not being computed.  This is now handled correctly.
      
      The failed-allocation check was removed because it is redundant: the
      bootmem allocator already makes this check.
      
      This fix has been boot tested on a 4-node machine which has 4 CPU-only
      nodes and 1 memory node.  Thanks to Pete Keilty for reporting this and
      helping me test it.
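      
      A sketch of the allocation for a CPU-only node, assuming the era's bootmem
      interface (__alloc_bootmem_node with an explicit alignment argument):
      
      	/* PERCPU_PAGE_SIZE is a power of two, so it satisfies the
      	 * bootmem allocator, unlike the old NODEDATA_ALIGN value. */
      	ptr = __alloc_bootmem_node(pgdat, size, PERCPU_PAGE_SIZE,
      				   __pa(MAX_DMA_ADDRESS));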
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  23. 04 Nov 2005 (1 commit)
  24. 01 Nov 2005 (1 commit)
  25. 30 Oct 2005 (1 commit)
    • [PATCH] memory hotplug locking: node_size_lock · 208d54e5
      Dave Hansen authored
      pgdat->node_size_lock is basically only needed in one place in the normal
      code: show_mem(), which is the arch-specific sysrq-m printing function.
      
      Strictly speaking, the architectures not doing memory hotplug do not need
      this locking in show_mem().  However, they are all included for
      completeness.  This should also make any future consolidation of all of
      the implementations a little more straightforward.
      
      This lock is also held in the sparsemem code during a memory removal, as
      sections are invalidated.  This is the place where pfn_valid() is made
      false for a memory area that is being removed.  The lock is only required
      when doing pfn_valid() operations on memory for which the caller does not
      already hold a reference on the page, such as in show_mem().
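      
      A sketch of the locking pattern in a show_mem()-style walk, assuming the
      resize-lock wrappers this patch introduces around node_size_lock:
      
      	unsigned long flags;
      
      	pgdat_resize_lock(pgdat, &flags);
      	/* pfn_valid() answers are stable while the lock is held,
      	 * even against a concurrent memory removal. */
      	for (pfn = pgdat->node_start_pfn; pfn < end_pfn; pfn++) {
      		if (!pfn_valid(pfn))
      			continue;
      		/* ... per-page accounting ... */
      	}
      	pgdat_resize_unlock(pgdat, &flags);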
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>