1. 11 Jul, 2008 1 commit
    • slub: Fix use-after-preempt of per-CPU data structure · bdb21928
      Dmitry Adamushko committed
      Vegard Nossum reported a crash in kmem_cache_alloc():
      
      	BUG: unable to handle kernel paging request at da87d000
      	IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
      	*pde = 28180163 *pte = 1a87d160
      	Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      	Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
      	EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
      	EIP is at kmem_cache_alloc+0xc7/0xe0
      	EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
      	ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
      	DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      
      and analyzed it:
      
        "The register %ecx looks innocent but is very important here. The disassembly:
      
             mov    %edx,%ecx
             shr    $0x2,%ecx
             rep stos %eax,%es:(%edi) <-- the fault
      
         So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
         (0x6b6b6b6b >> 2 == 0x1adadada.)
      
         %ecx is the counter for the memset, from here:
      
             memset(object, 0, c->objsize);
      
        i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
        Where did "c" come from? Uh-oh...
      
             c = get_cpu_slab(s, smp_processor_id());
      
        This looks like it has very much to do with CPU hotplug/unplug. Is
        there a race between SLUB/hotplug since the CPU slab is used after it
        has been freed?"
      
      Good analysis.
      
      Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
      can be migrated to another CPU right after local_irq_restore() and
      before memset().  The initial CPU can go offline in the meantime (or
      the migration is a consequence of the CPU going offline), so its
      'kmem_cache_cpu' structure gets freed (slab_cpuup_callback).
      
      At some point the caller continues on another CPU with an
      obsolete pointer...
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
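The arithmetic in the quoted analysis can be reproduced in user space. This is a minimal sketch, assuming only the 0x6b poison byte visible in the register dump; the helper names are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Poison byte seen replicated in %edx in the oops (0x6b6b6b6b). */
#define POISON_FREE_BYTE 0x6bu

/* Replicate the poison byte across a 32-bit word: 0x6b -> 0x6b6b6b6b. */
static uint32_t poison_word(void)
{
    return POISON_FREE_BYTE * 0x01010101u;
}

/* Models "shr $0x2,%ecx": a byte count becomes a count of 4-byte stores. */
static uint32_t rep_stos_count(uint32_t byte_count)
{
    return byte_count >> 2;
}

/* Self-check against the numbers quoted in the analysis. */
static void check_analysis_numbers(void)
{
    assert(poison_word() == 0x6b6b6b6bu);
    assert(rep_stos_count(poison_word()) == 0x1adadadau);
}
```

The %ecx value in the dump (1adad71a) is slightly below 0x1adadada, which is consistent with `rep stos` having already decremented the counter before it faulted.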
  2. 05 Jul, 2008 4 commits
  3. 04 Jul, 2008 2 commits
  4. 24 Jun, 2008 2 commits
    • mm: fix race in COW logic · 945754a1
      Nick Piggin committed
      There is a race in the COW logic.  It contains a shortcut to avoid the
      COW and reuse the page if we have the sole reference on the page,
      however it is possible to have two racing do_wp_page()ers with one
      causing the other to mistakenly believe it is safe to take the shortcut
      when it is not.  This could lead to data corruption.
      
      Process 1 and process 2 each have a wp pte of the same anon page (i.e.
      one forked the other).  The page's mapcount is 2.  Then they both
      attempt to write to it around the same time...
      
        proc1				proc2 thr1			proc2 thr2
        CPU0				CPU1				CPU3
        do_wp_page()			do_wp_page()
      				 trylock_page()
      				  can_share_swap_page()
      				   load page mapcount (==2)
      				  reuse = 0
      				 pte unlock
      				 copy page to new_page
      				 pte lock
      				 page_remove_rmap(page);
         trylock_page()
          can_share_swap_page()
           load page mapcount (==1)
          reuse = 1
         ptep_set_access_flags (allow W)
      
        write private key into page
      								read from page
      				ptep_clear_flush()
      				set_pte_at(pte of new_page)
      
      Fix this by moving the page_remove_rmap of the old page after the pte
      clear and flush.  Potentially the entire branch could be moved down
      here, but in order to stay consistent, I won't (should probably move all
      the *_mm_counter stuff with one patch).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
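The ordering argument in this commit can be illustrated with a deterministic toy model. It is a sketch under heavy assumptions: the struct, field, and function names are hypothetical, there is no real concurrency, and the two "steps" only stand in for page_remove_rmap() and ptep_clear_flush() as named in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: mapcount of the old page, and whether proc2 still maps it. */
struct toy_cow_state {
    int mapcount;
    bool proc2_maps_old_page;
};

/*
 * Replay proc2's two teardown steps in the given order and report whether
 * proc1 could ever observe mapcount == 1 (and so take the reuse shortcut)
 * while proc2's pte still points at the old page.
 */
static bool unsafe_reuse_possible(bool remove_rmap_first)
{
    struct toy_cow_state s = { .mapcount = 2, .proc2_maps_old_page = true };
    bool unsafe = false;

    for (int step = 0; step < 2; step++) {
        bool do_rmap_now = (step == 0) == remove_rmap_first;

        if (do_rmap_now)
            s.mapcount--;                  /* stands in for page_remove_rmap() */
        else
            s.proc2_maps_old_page = false; /* stands in for ptep_clear_flush() */

        /* proc1's do_wp_page() may run after any step. */
        if (s.mapcount == 1 && s.proc2_maps_old_page)
            unsafe = true;
    }
    return unsafe;
}

/* Old order opens a window; the order described in the fix does not. */
static void check_cow_orderings(void)
{
    assert(unsafe_reuse_possible(true));
    assert(!unsafe_reuse_possible(false));
}
```

In the model, dropping the mapcount before clearing the pte leaves a point where the shortcut looks safe although another thread of proc2 can still read the page, which is the window the diagram above shows.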
    • Fix ZERO_PAGE breakage with vmware · 672ca28e
      Linus Torvalds committed
      Commit 89f5b7da ("Reinstate ZERO_PAGE
      optimization in 'get_user_pages()' and fix XIP") broke vmware, as
      reported by Jeff Chua:
      
        "This broke vmware 6.0.4.
         Jun 22 14:53:03.845: vmx| NOT_IMPLEMENTED
         /build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774"
      
      and the reason seems to be that there's an old bug in how we handle
      FOLL_ANON on VM_SHARED areas in get_user_pages(), but since it only
      triggered if the whole page table was missing, nobody had apparently hit
      it before.
      
      The recent changes to 'follow_page()' made the FOLL_ANON logic trigger
      not just for whole missing page tables, but for individual pages as
      well, and exposed this problem.
      
      This fixes it by making the test for when FOLL_ANON is used more
      careful, and also makes the code easier to read and understand by moving
      the logic to a separate inline function.
      Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 22 Jun, 2008 2 commits
  6. 21 Jun, 2008 1 commit
    • Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIP · 89f5b7da
      Linus Torvalds committed
      KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit
      557ed1fa ("remove ZERO_PAGE") removed
      the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
      generally now populate the VM with real empty pages needlessly.
      
      We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
      since fault handling no longer uses ZERO_PAGE for new anonymous pages,
      we now need to handle that special case in follow_page() instead.
      
      In particular, the removal of ZERO_PAGE effectively removed the core
      file writing optimization where we would skip writing pages that had not
      been populated at all, and increased memory pressure a lot by allocating
      all those useless newly zeroed pages.
      
      This reinstates the optimization by making the unmapped PTE case the
      same as for a non-existent page table, which already did this correctly.
      
      While at it, this also fixes the XIP case for follow_page(), where the
      caller could not differentiate between the case of a page that simply
      could not be used (because it had no "struct page" associated with it)
      and a page that just wasn't mapped.
      
      We do that by simply returning an error pointer for pages that could not
      be turned into a "struct page *".  The error is arbitrarily picked to be
      EFAULT, since that was what get_user_pages() already used for the
      equivalent IO-mapped page case.
      
      [ Also removed an impossible test for pte_offset_map_lock() failing:
        that's not how that function works ]
      Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
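The "error pointer" convention mentioned in this message lets one return value distinguish "not mapped" (NULL) from "mapped, but no usable struct page" (an encoded -EFAULT). Below is a user-space sketch of that idea. The toy_* helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers but are re-implemented here as assumptions, and toy_follow_page() is a hypothetical stand-in, not the real follow_page().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095 /* assumption: errno values fit in the top 4095 addresses */

static void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_page { int id; };

enum toy_pte_kind { TOY_NOT_MAPPED, TOY_NO_STRUCT_PAGE, TOY_NORMAL };

/* Hypothetical lookup with the three outcomes described in the message. */
static struct toy_page *toy_follow_page(enum toy_pte_kind kind,
                                        struct toy_page *backing)
{
    switch (kind) {
    case TOY_NOT_MAPPED:
        return NULL;                  /* page simply is not mapped */
    case TOY_NO_STRUCT_PAGE:
        return toy_err_ptr(-EFAULT);  /* mapped, but no struct page */
    default:
        return backing;               /* ordinary page */
    }
}

static void check_follow_page_outcomes(void)
{
    struct toy_page pg = { 1 };

    assert(toy_follow_page(TOY_NOT_MAPPED, &pg) == NULL);
    assert(toy_is_err(toy_follow_page(TOY_NO_STRUCT_PAGE, &pg)));
    assert(toy_ptr_err(toy_follow_page(TOY_NO_STRUCT_PAGE, &pg)) == -EFAULT);
    assert(toy_follow_page(TOY_NORMAL, &pg) == &pg);
}
```

A caller in this model checks toy_is_err() first, then NULL, and only then dereferences the result.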
  7. 13 Jun, 2008 2 commits
  8. 12 Jun, 2008 1 commit
    • nommu: Correct kobjsize() page validity checks. · 5a1603be
      Paul Mundt committed
      This implements a few changes on top of the recent kobjsize() refactoring
      introduced by commit 6cfd53fc.
      
      As Christoph points out:
      
      	virt_to_head_page cannot return NULL. virt_to_page also
      	does not return NULL. pfn_valid() needs to be used to
      	figure out if a page is valid.  Otherwise the page struct
      	reference that was returned may have PageReserved() set
      	to indicate that it is not a valid page.
      
      As discussed further in the thread, virt_addr_valid() is the preferable
      way to validate the object pointer in this case. In addition to fixing
      up the reserved page case, it also has the benefit of encapsulating the
      hack introduced by commit 4016a139 on
      the impacted platforms, allowing us to get rid of the extra checking in
      kobjsize() for the platforms that don't perform this type of bizarre
      memory_end abuse (every nommu platform that isn't blackfin). If blackfin
      decides to get in line with every other platform and use PageReserved
      for the DMA pages in question, kobjsize() will also continue to work
      fine.
      
      It also turns out that compound_order() will give us back 0-order for
      non-head pages, so we can get rid of the PageCompound check and just
      use compound_order() directly. Clean that up while we're at it.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Reviewed-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 10 Jun, 2008 1 commit
  10. 07 Jun, 2008 3 commits
  11. 25 May, 2008 5 commits
    • memory hotplug: fix early allocation handling · cd94b9db
      Heiko Carstens committed
      Trying to add memory via add_memory() from within an initcall function
      results in
      
      bootmem alloc of 163840 bytes failed!
      Kernel panic - not syncing: Out of memory
      
      This is caused by zone_wait_table_init() which uses system_state to decide
      if it should use the bootmem allocator or not.
      
      When initcalls are handled the system_state is still SYSTEM_BOOTING but
      the bootmem allocator doesn't work anymore.  So the allocation will fail.
      
      To fix this, use slab_is_available() as the indicator instead, as we do
      everywhere else.
      
      [akpm@linux-foundation.org: coding-style fix]
      Reviewed-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zonelists: handle a node zonelist with no applicable entries · 7eb54824
      Andy Whitcroft committed
      When booting 2.6.26-rc3 on a multi-node x86_32 numa system we are seeing
      panics when trying node local allocations:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000034c
       IP: [<c1042507>] get_page_from_freelist+0x4a/0x18e
       *pdpt = 00000000013a7001 *pde = 0000000000000000
       Oops: 0000 [#1] SMP
       Modules linked in:
      
       Pid: 0, comm: swapper Not tainted (2.6.26-rc3-00003-g5abc28d #82)
       EIP: 0060:[<c1042507>] EFLAGS: 00010282 CPU: 0
       EIP is at get_page_from_freelist+0x4a/0x18e
       EAX: c1371ed8 EBX: 00000000 ECX: 00000000 EDX: 00000000
       ESI: f7801180 EDI: 00000000 EBP: 00000000 ESP: c1371ec0
        DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
       Process swapper (pid: 0, ti=c1370000 task=c12f5b40 task.ti=c1370000)
       Stack: 00000000 00000000 00000000 00000000 000612d0 000412d0 00000000 000412d0
              f7801180 f7c0101c f7c01018 c10426e4 f7c01018 00000001 00000044 00000000
              00000001 c12f5b40 00000001 00000010 00000000 000412d0 00000286 000412d0
       Call Trace:
        [<c10426e4>] __alloc_pages_internal+0x99/0x378
        [<c10429ca>] __alloc_pages+0x7/0x9
        [<c105e0e8>] kmem_getpages+0x66/0xef
        [<c105ec55>] cache_grow+0x8f/0x123
        [<c105f117>] ____cache_alloc_node+0xb9/0xe4
        [<c105f427>] kmem_cache_alloc_node+0x92/0xd2
        [<c122118c>] setup_cpu_cache+0xaf/0x177
        [<c105e6ca>] kmem_cache_create+0x2c8/0x353
        [<c13853af>] kmem_cache_init+0x1ce/0x3ad
        [<c13755c5>] start_kernel+0x178/0x1ee
      
      This occurs when we are scanning the zonelists looking for a ZONE_NORMAL
      page.  In this system there is only ZONE_DMA and ZONE_NORMAL memory on
      node 0, all other nodes are mapped above 4GB physical.  Here is a dump
      of the zonelists from this system:
      
          zonelists pgdat=c1400000
           0: c14006c0:2 f7c006c0:2 f7e006c0:2 c1400360:1 c1400000:0
           1: c14006c0:2 c1400360:1 c1400000:0
          zonelists pgdat=f7c00000
           0: f7c006c0:2 f7e006c0:2 c14006c0:2 c1400360:1 c1400000:0
           1: f7c006c0:2
          zonelists pgdat=f7e00000
           0: f7e006c0:2 c14006c0:2 f7c006c0:2 c1400360:1 c1400000:0
           1: f7e006c0:2
      
      When performing a node local allocation we call get_page_from_freelist()
      looking for a page.  It in turn calls first_zones_zonelist() which returns
      a preferred_zone.  Where there are no applicable zones this will be NULL.
      However we use this unconditionally, leading to this panic.
      
      Where there are no applicable zones there is no possibility of a successful
      allocation, so simply fail the allocation.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
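The failure mode and the fix described here can be sketched with a toy zonelist. The types and function names below are hypothetical, and the zone indices (0 = DMA, 1 = NORMAL, 2 = HIGHMEM) are an assumption read off the ":0/:1/:2" suffixes in the dump; the real first_zones_zonelist() and get_page_from_freelist() are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

struct toy_zoneref {
    int zone_idx; /* 0 = DMA, 1 = NORMAL, 2 = HIGHMEM (assumed numbering) */
    int node;
};

/* First entry usable for the request, or NULL when nothing is applicable. */
static const struct toy_zoneref *toy_first_zone(const struct toy_zoneref *list,
                                                size_t len,
                                                int highest_zoneidx)
{
    for (size_t i = 0; i < len; i++)
        if (list[i].zone_idx <= highest_zoneidx)
            return &list[i];
    return NULL;
}

/* Returns the node allocated from, or -1 when the allocation must fail. */
static int toy_alloc_from(const struct toy_zoneref *list, size_t len,
                          int highest_zoneidx)
{
    const struct toy_zoneref *preferred =
        toy_first_zone(list, len, highest_zoneidx);

    if (!preferred)
        return -1; /* no applicable zone: fail instead of dereferencing NULL */
    return preferred->node;
}

static void check_node_local_zonelists(void)
{
    /* Node-local list of a high node in the dump: only a ":2" entry. */
    const struct toy_zoneref high_node[] = { { 2, 1 } };
    /* Node-local list of node 0 in the dump: ":2", ":1", ":0". */
    const struct toy_zoneref node0[] = { { 2, 0 }, { 1, 0 }, { 0, 0 } };

    assert(toy_alloc_from(high_node, 1, 1) == -1);
    assert(toy_alloc_from(node0, 3, 1) == 0);
}
```

With a ZONE_NORMAL request (index 1), the high node's local list yields no candidate, so the sketch returns a failure where the unpatched code would have used the NULL result.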
    • mm: fix atomic_t overflow in vm · 80119ef5
      Alan Cox committed
      The atomic_t type is 32bit but a 64bit system can have more than 2^32
      pages of virtual address space available.  Without this we overflow on
      ludicrously large mappings.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
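The overflow is easy to see with plain integers. This sketch uses unsigned truncation as a well-defined stand-in for what happens to a 32-bit counter (the real atomic_t is a signed int, so it would misbehave even earlier); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit counter retains of a page count. */
static uint32_t pages_in_32_bits(uint64_t pages)
{
    return (uint32_t)pages; /* high bits are silently dropped */
}

/* What a 64-bit counter retains of the same page count. */
static int64_t pages_in_64_bits(uint64_t pages)
{
    return (int64_t)pages;
}

static void check_counter_widths(void)
{
    uint64_t huge_mapping = (1ULL << 32) + 5; /* just over 2^32 pages */

    assert(pages_in_32_bits(huge_mapping) == 5u);
    assert(pages_in_64_bits(huge_mapping) == 4294967301LL);
}
```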
    • mm: don't drop a partial page in a zone's memory map size · f7232154
      Johannes Weiner committed
      In a zone's present pages number, account for all pages occupied by the
      memory map, including a partial page.
      Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
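The difference between dropping and counting a partial page is the difference between truncating and rounding-up division. A small sketch follows; the 4096-byte page size and the per-page descriptor sizes used in the check are illustrative assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Truncating division: a trailing partial page of memmap is not counted. */
static size_t memmap_pages_truncated(size_t nr_pages, size_t desc_size)
{
    return nr_pages * desc_size / TOY_PAGE_SIZE;
}

/* Rounding up: a trailing partial page counts as a whole page. */
static size_t memmap_pages_rounded_up(size_t nr_pages, size_t desc_size)
{
    return (nr_pages * desc_size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

static void check_memmap_rounding(void)
{
    /* 1000 descriptors of 56 bytes = 56000 bytes = 13 full pages + a partial one. */
    assert(memmap_pages_truncated(1000, 56) == 13);
    assert(memmap_pages_rounded_up(1000, 56) == 14);
    /* Exact multiples agree: 1024 * 64 bytes = 16 pages. */
    assert(memmap_pages_truncated(1024, 64) == 16);
    assert(memmap_pages_rounded_up(1024, 64) == 16);
}
```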
    • mm: allow pfnmap ->fault()s · 42172d75
      Nick Piggin committed
      Take out an assertion to allow ->fault handlers to service PFNMAP regions.
      This is required to reimplement .nopfn handlers with .fault handlers and
      subsequently remove nopfn.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Jes Sorensen <jes@sgi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 23 May, 2008 1 commit
  13. 21 May, 2008 1 commit
    • mm: bdi: fix race in bdi_class device creation · 19051c50
      Greg Kroah-Hartman committed
      There is a race from when a device is created with device_create() and
      then the drvdata is set with a call to dev_set_drvdata() in which a
      sysfs file could be open, yet the drvdata will be NULL, causing all
      sorts of bad things to happen.
      
      This patch fixes the problem by using the new function,
      device_create_vargs().
      
      Many thanks to Arthur Jones <ajones@riverbed.com> for reporting the bug,
      and testing patches out.
      
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Arthur Jones <ajones@riverbed.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  14. 20 May, 2008 1 commit
  15. 15 May, 2008 6 commits
    • memory_hotplug: always initialize pageblock bitmap · 76cdd58e
      Heiko Carstens committed
      Trying to online a new memory section that was added via memory hotplug
      sometimes results in crashes when the new pages are added via __free_page.
      The reason is that the pageblock bitmap isn't initialized and hence
      contains random stuff.  That means that get_pageblock_migratetype()
      also returns random stuff and therefore
      
      	list_add(&page->lru,
      		&zone->free_area[order].free_list[migratetype]);
      
      in __free_one_page() tries to do a list_add to something that isn't even
      necessarily a list.
      
      This happens since 86051ca5 ("mm: fix
      usemap initialization") which makes sure that the pageblock bitmap is
      initialized only for pages present in a zone.  Unfortunately for hot-added
      memory the zones "grow" after the memmap and the pageblock memmap have
      been initialized, which means that the new pages have an uninitialized
      bitmap.  To solve this the calls to grow_zone_span() and grow_pgdat_span()
      are moved to __add_zone() just before the initialization happens.
      
      The patch also moves the two functions since __add_zone() is the only
      caller and I didn't want to add a forward declaration.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mprotect: prevent alteration of the PAT bits · 1c12c4cf
      Venki Pallipadi committed
      There is a defect in mprotect, which lets the user change the page cache
      type bits by-passing the kernel reserve_memtype and free_memtype
      wrappers.  Fix the problem by not letting mprotect change the PAT bits.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory_hotplug: check for walk_memory_resource() failure in online_pages() · fd8a4221
      Geoff Levand committed
      Add a check to online_pages() to test for failure of
      walk_memory_resource().  This fixes a condition where a failure
      of walk_memory_resource() can lead to online_pages() returning
      success without the requested pages being onlined.
      Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Keith Mannthey <kmannth@us.ibm.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory hotplug: memmap_init_zone called twice · c3723ca3
      Heiko Carstens committed
      __add_zone calls memmap_init_zone twice if memory gets attached to an empty
      zone.  Once via init_currently_empty_zone and once explicitly right after that
      call.
      
      Looks like this is currently not a bug, however the call is superfluous and
      might lead to subtle bugs if memmap_init_zone gets changed.  So make sure it
      is called only once.
      
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix infinite loop in filemap_fault · 3ef0f720
      Miklos Szeredi committed
      filemap_fault will go into an infinite loop if ->readpage() fails
      asynchronously.
      
      AFAICS the bug was introduced by this commit, which removed the wait after the
      final readpage:
      
         commit d00806b1
         Author: Nick Piggin <npiggin@suse.de>
         Date:   Thu Jul 19 01:46:57 2007 -0700
      
             mm: fix fault vs invalidate race for linear mappings
      
      Fix by reintroducing the wait_on_page_locked() after ->readpage() to make sure
      the page is up-to-date before jumping back to the beginning of the function.
      
      I've noticed this while testing nfs exporting on fuse.  The patch
      fixes it.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fix SMP data race in pagetable setup vs walking · 362a61ad
      Nick Piggin committed
      There is a possible data race in the page table walking code. After the split
      ptlock patches, it actually seems to have been introduced to the core code, but
      even before that I think it would have impacted some architectures (powerpc
      and sparc64, at least, walk the page tables without taking locks eg. see
      find_linux_pte()).
      
      The race is as follows:
      The pte page is allocated, zeroed, and its struct page gets its spinlock
      initialized. The mm-wide ptl is then taken, and then the pte page is inserted
      into the pagetables.
      
      At this point, the spinlock is not guaranteed to have ordered the previous
      stores to initialize the pte page with the subsequent store to put it in the
      page tables. So another Linux page table walker might be walking down (without
      any locks, because we have split-leaf-ptls), and find that new pte we've
      inserted. It might try to take the spinlock before the store from the other
      CPU initializes it. And subsequently it might read a pte_t out before stores
      from the other CPU have cleared the memory.
      
      There are also similar races in higher levels of the page tables. They
      obviously don't involve the spinlock, but could see uninitialized memory.
      
      Arch code and hardware pagetable walkers that walk the pagetables without
      locks could see similar uninitialized memory problems, regardless of whether
      split ptes are enabled or not.
      
      I prefer to put the barriers in core code, because that's where the higher
      level logic happens, but the page table accessors are per-arch, and open-coding
      them everywhere I don't think is an option. I'll put the read-side barriers
      in alpha arch code for now (other architectures perform data-dependent loads
      in order).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
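The "initialize, then publish" ordering this message is concerned with can be sketched in user space with C11 atomics. This is an analogy under stated assumptions, not the kernel change: the types and names are hypothetical, a release store stands in for the write-side barrier, and an acquire load is used on the read side, which is stronger than the data-dependency ordering the message says most architectures already provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_PTRS_PER_PTE 8 /* hypothetical, tiny on purpose */

struct toy_pte_page {
    int lock_initialized;
    unsigned long pte[TOY_PTRS_PER_PTE];
};

/* Stand-in for the upper-level page table slot that points at the pte page. */
static _Atomic(struct toy_pte_page *) toy_pmd_slot;

/* Writer: initialize the page fully, then publish the pointer. */
static void toy_publish_pte_page(struct toy_pte_page *page)
{
    memset(page->pte, 0, sizeof(page->pte));
    page->lock_initialized = 1;
    /* Release: the stores above become visible before the pointer does. */
    atomic_store_explicit(&toy_pmd_slot, page, memory_order_release);
}

/* Lockless walker: anything reached through the pointer is initialized. */
static struct toy_pte_page *toy_walk(void)
{
    return atomic_load_explicit(&toy_pmd_slot, memory_order_acquire);
}

static void check_publish_then_walk(void)
{
    static struct toy_pte_page page;

    page.pte[0] = 0xdeadUL; /* pretend the fresh page holds stale data */
    toy_publish_pte_page(&page);

    struct toy_pte_page *seen = toy_walk();
    assert(seen == &page);
    assert(seen->lock_initialized == 1);
    assert(seen->pte[0] == 0);
}
```

Without the release ordering on the store, a walker on another CPU could in principle see the pointer before the zeroing and the lock initialization, which is the race described above.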
  16. 13 May, 2008 2 commits
  17. 09 May, 2008 1 commit
  18. 07 May, 2008 2 commits
    • vfs: splice remove_suid() cleanup · 7f3d4ee1
      Miklos Szeredi committed
      generic_file_splice_write() duplicates remove_suid() just because it
      doesn't hold i_mutex.  But it grabs i_mutex inside splice_from_pipe()
      anyway, so this is rather pointless.
      
      Move locking to generic_file_splice_write() and call remove_suid() and
      __splice_from_pipe() instead.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • x86: fix PAE pmd_bad bootup warning · aeed5fce
      Hugh Dickins committed
      Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.
      
      That came from 9fc34113 x86: debug pmd_bad();
      but we understand now that the typecasting was wrong for PAE in the previous
      version: pagetable pages above 4GB looked bad and stopped Arjan from booting.
      
      And revert that cded932b x86: fix pmd_bad
      and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
      weaken every pmd_bad and pud_bad check to let huge pages slip through - in
      part they check that we _don't_ have a huge page where it's not expected.
      
      Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
      been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
      junk in the upper word is good; and x86_64 should follow x86_32's stricter
      comparison, to stop thinking any subset of required bits is good); but that
      should be a later patch.
      
      Fix Hans' good observation that follow_page() will never find pmd_huge()
      because that would have already failed the pmd_bad test: test pmd_huge in
      between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
      No, once it's a hugepage entry, it can get quite far from a good pmd: for
      example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.
      
      However... though follow_page() contains this and another test for huge
      pages, so it's nice to keep it working on them, where does it actually get
      called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
      call alternative hugetlb processing, as do unmap_vmas() and others.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jeff Chua <jeff.chua.linux@gmail.com>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
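Why the order of the pmd tests matters can be shown with a toy classifier. Everything here is a hypothetical model: the bit values, the "good table" pattern, and the toy_* predicates are invented for illustration and do not match the real x86 definitions of pmd_none(), pmd_huge(), or pmd_bad().

```c
#include <assert.h>

#define TOY_PSE   0x80u /* hypothetical "huge page" bit */
#define TOY_TABLE 0x63u /* hypothetical flag pattern of a good page-table pointer */

static int toy_pmd_none(unsigned v) { return v == 0; }
static int toy_pmd_huge(unsigned v) { return (v & TOY_PSE) != 0; }
static int toy_pmd_bad(unsigned v)  { return v != TOY_TABLE; }

enum toy_outcome { TOY_NOT_MAPPED, TOY_HUGE, TOY_BAD, TOY_WALK_PTES };

/* huge_before_bad selects the order described in the fix. */
static enum toy_outcome toy_classify(unsigned v, int huge_before_bad)
{
    if (toy_pmd_none(v))
        return TOY_NOT_MAPPED;
    if (huge_before_bad && toy_pmd_huge(v))
        return TOY_HUGE;
    if (toy_pmd_bad(v))
        return TOY_BAD;
    if (toy_pmd_huge(v))
        return TOY_HUGE; /* old position: a huge entry never gets this far */
    return TOY_WALK_PTES;
}

static void check_pmd_test_order(void)
{
    unsigned huge_entry = TOY_PSE | 0x01u;

    assert(toy_classify(huge_entry, 0) == TOY_BAD);  /* old order hides it */
    assert(toy_classify(huge_entry, 1) == TOY_HUGE); /* new order finds it */
    assert(toy_classify(TOY_TABLE, 1) == TOY_WALK_PTES);
    assert(toy_classify(0, 1) == TOY_NOT_MAPPED);
}
```

Because a huge entry does not look like a good page-table pointer, testing "bad" first rejects it before the "huge" test can run; moving the huge test between the none and bad tests keeps the strict bad check for everything else.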
  19. 02 May, 2008 2 commits