Commits · 9b41046cd0ee0a57f849d6e1363f7933e363cca9 · openeuler / raspberrypi-kernel

01 Apr, 2006 · 1 commit

[PATCH] Don't pass boot parameters to argv_init[] · 9b41046c

Committed by OGAWA Hirofumi on Mar 31, 2006

The boot cmdline is parsed in parse_early_param() and
parse_args(,unknown_bootoption).

And __setup() is used in obsolete_checksetup().

	start_kernel()
		-> parse_args()
			-> unknown_bootoption()
				-> obsolete_checksetup()

If __setup()'s callback (->setup_func()) returns 1 in
obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
handled.

If ->setup_func() returns 0, obsolete_checksetup() tries other
->setup_func().  If all ->setup_func() that matched a parameter return 0,
the parameter is added to argv_init[].

Then, when running /sbin/init or init=app, argv_init[] is passed to the app.
If the app doesn't ignore those arguments, it will warn and exit.

This patch fixes wrong usages of it, but only the obvious ones.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9b41046c

27 Mar, 2006 · 2 commits

[PATCH] Add API for flushing Anon pages · 03beb076

Committed by James Bottomley on Mar 26, 2006

Currently, get_user_pages() returns fully coherent pages to the kernel for
anything other than anonymous pages.  This is a problem for things like
fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA
to anonymous pages passed in by users.

The fix is to add a new memory management API: flush_anon_page() which
is used in get_user_pages() to make anonymous pages coherent.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

03beb076

BUG_ON() Conversion in mm/memory.c · 5bcb28b1

Committed by Eric Sesterhenn on Mar 26, 2006

This changes if() BUG(); constructs to BUG_ON(), which is
cleaner, contains unlikely() and can be better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

5bcb28b1

26 Mar, 2006 · 1 commit

[PATCH] mm: restore vm_normal_page check · 315ab19a

Committed by Nick Piggin on Mar 25, 2006

Hugh is rightly concerned that the CONFIG_DEBUG_VM coverage has gone too
far in vm_normal_page, considering that we expect production kernels to be
shipped with the option turned off, and that the code has been under some
large changes recently.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

315ab19a

22 Mar, 2006 · 4 commits

[PATCH] hugepage: Fix hugepage logic in free_pgtables() harder · 4866920b

Committed by David Gibson on Mar 22, 2006

Turns out the hugepage logic in free_pgtables() was doubly broken.  The
loop coalescing multiple normal page VMAs into one call to free_pgd_range()
had an off by one error, which could mean it would coalesce one hugepage
VMA into the same bundle (checking 'vma' not 'next' in the loop).  I
transferred this bug into the new is_vm_hugetlb_page() based version.
Here's the fix.

This one didn't bite on powerpc previously for the same reason the
is_hugepage_only_range() problem didn't: powerpc's hugetlb_free_pgd_range()
is identical to free_pgd_range().  It didn't bite on ia64 because the
hugepage region is distant enough from any other region that the separated
PMD_SIZE distance test would always prevent coalescing the two together.

No libhugetlbfs testsuite regressions (ppc64, POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4866920b

[PATCH] hugepage: Fix hugepage logic in free_pgtables() · 9da61aef

Committed by David Gibson on Mar 22, 2006

free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
of the normal free_pgd_range() on hugepage VMAs. However, the test it uses
to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
range at the start of the vma. is_hugepage_only_range() will return true
if the given range has any intersection with a hugepage address region, and
in this case the given region need not be hugepage aligned. So, for
example, this test can return true if called on, say, a 4k VMA immediately
preceding a (nicely aligned) hugepage VMA.

At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64 (the
only other arch with a non-trivial is_hugepage_only_range()) we get away
with it for a different reason; the hugepage area is not contiguous with
the rest of the user address space, and VMAs are not permitted in between,
so the test can't return a false positive there.

Nonetheless this should be fixed. We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of the
VMA using is_vm_hugetlb_page().

This in turn changes behaviour for platforms where is_hugepage_only_range()
returns false always (everything except powerpc and ia64). We address this
by ensuring that hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64. Even so,
it will prevent some otherwise possible coalescing of calls down to
free_pgd_range(). Since this only happens for hugepage VMAs, removing this
small optimization seems unlikely to cause any trouble.

This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9da61aef

[PATCH] mm: more CONFIG_DEBUG_VM · b7ab795b

Committed by Nick Piggin on Mar 22, 2006

Put a few more checks under CONFIG_DEBUG_VM
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b7ab795b

[PATCH] mm: split highorder pages · 8dfcc9ba

Committed by Nick Piggin on Mar 22, 2006

Have an explicit mm call to split higher order pages into individual pages.
Should help to avoid bugs and be more explicit about the code's intention.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8dfcc9ba

17 Mar, 2006 · 1 commit

[PATCH] fix free swap cache latency · 6f5e6b9e

Committed by Hugh Dickins on Mar 16, 2006

Lee Revell reported 28ms latency when a process with lots of swapped memory
exits.

2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all.  We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.

Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6f5e6b9e

18 Feb, 2006 · 1 commit

[PATCH] x86_64: Add boot option to disable randomized mappings and cleanup · a62eaf15

Committed by Andi Kleen on Feb 16, 2006

AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.

Also I moved the variable to a more appropriate place and made
it independent from sysctl.

And marked it __read_mostly, which it is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a62eaf15

02 Feb, 2006 · 1 commit

[PATCH] Direct Migration V9: PageSwapCache checks · b16664e4

Committed by Christoph Lameter on Feb 01, 2006

Check for PageSwapCache after looking up and locking a swap page.

The page migration code may change a swap pte to point to a different page
under lock_page().

If that happens then the vm must retry the lookup operation in the swap space
to find the correct page number.  There are a couple of locations in the VM
where a lock_page() is done on a swap page.  In these locations we need to
check afterwards if the page was migrated.  If the page was migrated then the
old page that was looked up before was freed and no longer has the
PageSwapCache bit set.
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b16664e4

10 Jan, 2006 · 1 commit

[PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem · 1b1dcc1b

Committed by Jes Sorensen on Jan 09, 2006

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1b1dcc1b

09 Jan, 2006 · 1 commit

[PATCH] spufs: The SPU file system, base · 67207b96

Committed by Arnd Bergmann on Nov 15, 2005

This is the current version of the spu file system, used
for driving SPEs on the Cell Broadband Engine.

This release is almost identical to the version for the
2.6.14 kernel posted earlier, which is available as part
of the Cell BE Linux distribution from
http://www.bsc.es/projects/deepcomputing/linuxoncell/.

The first patch provides all the interfaces for running
SPU applications, but does not have any support for
debugging SPU tasks or for scheduling. Both of these
are added in the subsequent patches.

See Documentation/filesystems/spufs.txt on how to use
spufs.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

67207b96

07 Jan, 2006 · 3 commits

[PATCH] mm: pfault optimisation · 41e9b63b

Committed by Nick Piggin on Jan 06, 2006

This atomic operation is superfluous: the pte will be added with the
referenced bit set, and the page will be referenced through this mapping after
the page fault handler returns anyway.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

41e9b63b

[PATCH] mm: rmap optimisation · 9617d95e

Committed by Nick Piggin on Jan 06, 2006

Optimise rmap functions by minimising atomic operations when we know there
will be no concurrent modifications.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9617d95e

[PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store · f6b3ec23

Committed by Badari Pulavarty on Jan 06, 2006

Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
given range of pages & its associated backing store.  Current
implementation supports only shmfs/tmpfs and other filesystems return
-ENOSYS.

"Some app allocates large tmpfs files, then when some task quits and some
client disconnect, some memory can be released.  However the only way to
release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli

Databases want to use this feature to drop a section of their bufferpool
(shared memory segments) - without writing back to disk/swap space.

This feature is also useful for supporting hot-plug memory on UML.

Concerns raised by Andrew Morton:

- "We have no plan for holepunching!  If we _do_ have such a plan (or
  might in the future) then what would the API look like?  I think
  sys_holepunch(fd, start, len), so we should start out with that."

- Using madvise is very weird, because people will ask "why do I need to
  mmap my file before I can stick a hole in it?"

- None of the other madvise operations call into the filesystem in this
  manner.  A broad question is: is this capability an MM operation or a
  filesystem operation?  truncate, for example, is a filesystem operation
  which sometimes has MM side-effects.  madvise is an mm operation and with
  this patch, it gains FS side-effects, only they're really, really
  significant ones."

Comments:

- Andrea suggested the fs operation too but then it's more efficient to
  have it as a mm operation with fs side effects, because they don't
  immediately know fd and physical offset of the range.  It's possible to
  fixup in userland and to use the fs operation but it's more expensive,
  the vmas are already in the kernel and we can use them.

Short term plan & Future Direction:

- We seem to need this interface only for shmfs/tmpfs files in the short
  term.  We have to add hooks into the filesystem for correctness and
  completeness.  This is what this patch does.

- In the future, plan is to support both fs and mmap apis also.  This
  also involves (other) filesystem specific functions to be implemented.

- Current patch doesn't support VM_NONLINEAR - which can be addressed in
  the future.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f6b3ec23

17 Dec, 2005 · 1 commit

Make sure we copy pages inserted with "vm_insert_page()" on fork · 4d7672b4

Committed by Linus Torvalds on Dec 16, 2005

The logic that decides that a fork() might be able to avoid copying a VM
area when it can be re-created by page faults didn't know about the new
vm_insert_page() case.

Also make some things a bit more anal wrt VM_PFNMAP.

Pointed out by Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4d7672b4

13 Dec, 2005 · 1 commit

get_user_pages: don't try to follow PFNMAP pages · 1ff80389

Committed by Linus Torvalds on Dec 12, 2005

Nick Piggin points out that a few drivers play games with VM_IO (why?
who knows..) and thus a pfn-remapped area may not have that bit set even
if remap_pfn_range() set it originally.

So make it explicit in get_user_pages() that we don't follow VM_PFNMAP
pages, since pretty much by definition they do not have a "struct page"
associated with them.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1ff80389

12 Dec, 2005 · 3 commits

Allow arbitrary read-only shared pfn-remapping too · 67121172

Committed by Linus Torvalds on Dec 11, 2005

The VM layer (for historical reasons) turns a read-only shared mmap into
a private-like mapping with the VM_MAYWRITE bit clear.  Thus checking
just VM_SHARED isn't actually sufficient.

So use a trivial helper function for the cases where we wanted to inquire
if a mapping was COW-like or not.

Moo!
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

67121172

Remove (at least temporarily) the "incomplete PFN mapping" support · 7fc7e2ee

Committed by Linus Torvalds on Dec 11, 2005

With the previous commit, we can handle arbitrary shared re-mappings
even without this complexity, and since the only known private mappings
are for strange users of /dev/mem (which never create an incomplete one),
there seems to be no reason to support it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7fc7e2ee

Allow arbitrary shared PFNMAP's · fb155c16

Committed by Linus Torvalds on Dec 11, 2005

A shared mapping doesn't cause COW-pages, so we don't need to worry
about the whole vm_pgoff logic to decide if a PFN-remapped page has
gone through COW or not.

This makes it possible to entirely avoid the special "partial remapping"
logic for the common case.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fb155c16

04 Dec, 2005 · 1 commit

Make vm_insert_page() available to NVidia module · e3c3374f

Committed by Linus Torvalds on Dec 03, 2005

It used to use remap_pfn_range(), which wasn't GPL-only either, and the
new interface is actually simpler and does more checking, so we
shouldn't unnecessarily discourage people from switching over.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e3c3374f

01 Dec, 2005 · 1 commit

VM: add "vm_insert_page()" function · a145dd41

Committed by Linus Torvalds on Nov 30, 2005

This is what a lot of drivers will actually want to use to insert
individual pages into a user VMA.  It doesn't have the old PageReserved
restrictions of remap_pfn_range(), and it doesn't complain about partial
remappings.

The page you insert needs to be a nice clean kernel allocation, so you
can't insert arbitrary page mappings with this, but that's not what
people want.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a145dd41

30 Nov, 2005 · 7 commits

[PATCH] VM: Fix typos in get_locked_pte · 49c91fb0

Committed by Trond Myklebust on Nov 29, 2005

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

49c91fb0

[PATCH] pfnmap: do_no_page BUG_ON again · 325f04db

Committed by Hugh Dickins on Nov 29, 2005

Use copy_user_highpage directly instead of cow_user_page in do_no_page:
in the immediately following page_cache_release, and elsewhere, it is
assuming that new_page is normal. If any VM_PFNMAP driver can get to
do_no_page, it's just a BUG (but not in the case of do_anonymous_page).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

325f04db

[PATCH] pfnmap: remove src_page from do_wp_page · e5bbe4df

Committed by Hugh Dickins on Nov 29, 2005

Clean away do_wp_page's "src_page": cow_user_page makes it unnecessary.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e5bbe4df

cow_user_page: fix page alignment · 5d2a2dbb

Committed by Linus Torvalds on Nov 29, 2005

Hugh Dickins points out that the user virtual address passed to the page
fault handler isn't necessarily page-aligned.

Also, add a comment on why the copy could fail for the user address case.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5d2a2dbb

VM: add common helper function to create the page tables · c9cfcddf

Committed by Linus Torvalds on Nov 29, 2005

This logic was duplicated four times, for no good reason.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c9cfcddf

Support strange discontiguous PFN remappings · 238f58d8

Committed by Linus Torvalds on Nov 29, 2005

These get created by some drivers that don't generally even want a pfn
remapping at all, but would really mostly prefer to just map pages
they've allocated individually instead.

For now, create a helper function that turns such an incomplete PFN
remapping call into a loop that does that explicit mapping.  In the long
run we almost certainly want to export a totally different interface for
that, though.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

238f58d8

[PATCH] Fix missing pfn variables caused by vm changes · eca35133

Committed by Ben Collins on Nov 29, 2005

I imagine this showed up because of "unused var..." when the changes
occurred, because flush_cache_page() is a noop in most places.  This
showed up for me on parisc however, where flush_cache_page() is a real
function.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

eca35133

29 Nov, 2005 · 3 commits

[PATCH] Fix vma argument in get_user_pages() for gate areas · fa2a455b

Committed by Nick Piggin on Nov 29, 2005

The system call gate area handling called vm_normal_page() with the
wrong vma (which was always NULL, and caused an oops).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fa2a455b

[PATCH] Workaround for gcc 2.96 (undefined references) · e0f39591

Committed by Alan Stern on Nov 28, 2005

  LD      .tmp_vmlinux1
mm/built-in.o(.text+0x100d6): In function `copy_page_range':
: undefined reference to `__pud_alloc'
mm/built-in.o(.text+0x1010b): In function `copy_page_range':
: undefined reference to `__pmd_alloc'
mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
: undefined reference to `__pud_alloc'
fs/built-in.o(.text+0xc930): In function `install_arg_page':
: undefined reference to `__pud_alloc'
make: *** [.tmp_vmlinux1] Error 1

Those missing references in mm/memory.c arise from this code in
include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
__PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:

/*
 * The following ifdef needed to get the 4level-fixup.h header to work.
 * Remove it when 4level-fixup.h has been removed.
 */
#if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
{
        return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
                NULL: pud_offset(pgd, address);
}

static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
{
        return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
                NULL: pmd_offset(pud, address);
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */

With my configuration the pgd_none and pud_none routines are inlines
returning a constant 0.  Apparently the old compiler avoids generating
calls to __pud_alloc and __pmd_alloc but still lists them as undefined
references in the module's symbol table.

I don't know which change caused this problem.  I think it was added
somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
several 2.6.14-rc kernels without difficulty.  However I can't point to an
individual culprit.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e0f39591

mm: re-architect the VM_UNPAGED logic · 6aab341e

Committed by Linus Torvalds on Nov 28, 2005

This replaces the (in my opinion horrible) VM_UNPAGED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.

Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way.  It just works.  As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.

Sparc update from David in the next commit.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6aab341e

23 Nov, 2005 · 5 commits

[PATCH] unpaged: ZERO_PAGE in VM_UNPAGED · f57e88a8

Committed by Hugh Dickins on Nov 21, 2005

It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
let's not insert the ZERO_PAGE there - though whether it would matter will
depend on what we decide about ZERO_PAGE refcounting.

But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
area, do_no_page should never be: just BUG_ON.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f57e88a8

[PATCH] unpaged: anon in VM_UNPAGED · ee498ed7

Committed by Hugh Dickins on Nov 21, 2005

copy_one_pte needs to copy the anonymous COWed pages in a VM_UNPAGED area,
zap_pte_range needs to free them, do_wp_page needs to COW them: just like
ordinary pages, not like the unpaged.

But recognizing them is a little subtle: because PageReserved is no longer a
condition for remap_pfn_range, we can now mmap all of /dev/mem (whether the
distro permits, and whether it's advisable on this or that architecture, is
another matter).  So if we can see a PageAnon, it may not be ours to mess with
(or may be ours from elsewhere in the address space).  I suspect there's an
entertaining insoluble self-referential problem here, but the page_is_anon
function does a good practical job, and MAP_PRIVATE PROT_WRITE VM_UNPAGED will
always be an odd choice.

In updating the comment on page_address_in_vma, noticed a potential NULL
dereference, in a path we don't actually take, but fixed it.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ee498ed7

[PATCH] unpaged: COW on VM_UNPAGED · 920fc356

Committed by Hugh Dickins on Nov 21, 2005

Remove the BUG_ON(vma->vm_flags & VM_UNPAGED) from do_wp_page, and let it do
Copy-On-Write without touching the VM_UNPAGED's page counts - but this is
incomplete, because the anonymous page it inserts will itself need to be
handled, here and in other functions - next patch.

We still don't copy the page if the pfn is invalid, because the
copy_user_highpage interface does not allow it.  But that's not been a problem
in the past: can be added in later if the need arises.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

920fc356

[PATCH] unpaged: VM_UNPAGED · 0b14c179

Committed by Hugh Dickins on Nov 21, 2005

Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
drivers set VM_RESERVED on areas which are then populated by nopage. The
PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
zap_pte_range, without changing those drivers not to set it: so their pages
just leak away.

Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
to flag the special areas where the ptes may have no struct page, or if they
have then it's not to be touched. Replace most instances of VM_RESERVED in
core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
sparc64 io_remap_pfn_range.

Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it
needed anywhere? It still governs the mm->reserved_vm statistic, and special
vmas not to be merged, and areas not to be core dumped; but could probably be
eliminated later (the drivers are probably specifying it because in 2.4 it
kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
don't get on).

Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
purpose whatsoever, and should be removed from drivers when we clean up.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0b14c179

[PATCH] unpaged: get_user_pages VM_RESERVED · ed5297a9

Committed by Hugh Dickins on Nov 21, 2005

The PageReserved removal in 2.6.15-rc1 prohibited get_user_pages on the areas
flagged VM_RESERVED in place of PageReserved. That is correct in theory - we
ought not to interfere with struct pages in such a reserved area; but in
practice it broke BTTV for one.

So revert to prohibiting only on VM_IO: if someone gets into trouble with
get_user_pages on VM_RESERVED, it'll just be a "don't do that".

You can argue that videobuf_mmap_mapper shouldn't set VM_RESERVED in the first
place, but now's not the time for breaking drivers without notice.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ed5297a9

14 Nov, 2005 · 1 commit

[PATCH] mm: ZAP_BLOCK causes redundant work · 51c6f666

Committed by Robin Holt on Nov 13, 2005

The address based work estimate for unmapping (for lockbreak) is and always
was horribly inefficient for sparse mappings.  The problem is most simply
explained with an example:

If we find a pgd is clear, we still have to call into unmap_page_range
PGDIR_SIZE / ZAP_BLOCK_SIZE times, each time checking the clear pgd, in
order to progress the working address to the next pgd.

The fundamental way to solve the problem is to keep track of the end
address we've processed and pass it back to the higher layers.

From: Nick Piggin <npiggin@suse.de>

  Modification to completely get away from address based work estimate
  and instead use an abstract count, with a very small cost for empty
  entries as opposed to present pages.

  On 2.6.14-git2, ppc64, and CONFIG_PREEMPT=y, mapping and unmapping 1TB
  of virtual address space takes 1.69s; with the following patch applied,
  this operation can be done 1000 times in less than 0.01s

From: Andrew Morton <akpm@osdl.org>

With CONFIG_HUGETLB_PAGE=n:

mm/memory.c: In function `unmap_vmas':
mm/memory.c:779: warning: division by zero

Due to

			zap_work -= (end - start) /
					(HPAGE_SIZE / PAGE_SIZE);

So make the dummy HPAGE_SIZE non-zero
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

51c6f666

30 Oct, 2005 · 1 commit

[PATCH] .text page fault SMP scalability optimization · 1a44e149

Committed by Andrea Arcangeli on Oct 29, 2005

We had a problem on ppc64 where with more than 4 threads a large system
wouldn't scale well while faulting in the .text (most of the time was spent
in the kernel even though it was a userland compute-intensive app).  The
reason is the useless overwrite of the same pte from all CPUs.

I fixed it this way (verified on an older kernel but the forward port is
almost identical).  This will benefit all archs not just ppc64.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1a44e149