提交 · f549d6c18c0e8e6cf1bf0e7a47acc1daf7e2cec1 · openanolis / cloud-kernel

You need to sign in or sign up before continuing.

05 9月, 2005 37 次提交

[PATCH] Generic VFS fallback for security xattrs · f549d6c1

由 Stephen Smalley 提交于 9月 03, 2005

This patch modifies the VFS setxattr, getxattr, and listxattr code to fall
back to the security module for security xattrs if the filesystem does not
support xattrs natively.  This allows security modules to export the incore
inode security label information to userspace even if the filesystem does
not provide xattr storage, and eliminates the need to individually patch
various pseudo filesystem types to provide such access.  The patch removes
the existing xattr code from devpts and tmpfs as it is then no longer
needed.

The patch restructures the code flow slightly to reduce duplication between
the normal path and the fallback path, but this should only have one
user-visible side effect - a program may get -EACCES rather than
-EOPNOTSUPP if policy denied access but the filesystem didn't support the
operation anyway.  Note that the post_setxattr hook call is not needed in
the fallback case, as the inode_setsecurity hook call handles the incore
inode security state update directly.  In contrast, we do call fsnotify in
both cases.
Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
Acked-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f549d6c1

[PATCH] VM: add page_state info to per-node meminfo · c07e02db

由 Martin Hicks 提交于 9月 03, 2005

Add page_state info to the per-node meminfo file in sysfs.  This is mostly
just for informational purposes.

The lack of this information was brought up recently during a discussion
regarding pagecache clearing, and I put this patch together to test out one
of the suggestions.

It seems like interesting info to have, so I'm submitting the patch.
Signed-off-by: NMartin Hicks <mort@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c07e02db

[PATCH] slab: removes local_irq_save()/local_irq_restore() pair · 00e145b6

由 Manfred Spraul 提交于 9月 03, 2005

Proposed by and based on a patch from Eric Dumazet <dada1@cosmosbay.com>:
This patch removes unnecessary critical section in ksize() function, as
cli/sti are rather expensive on modern CPUS.

It additionally adds a docbook entry for ksize() and further simplifies the
code.
Signed-Off-By: NManfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

00e145b6

[PATCH] mm/slab.c: prefetchw the start of new allocated objects · 34342e86

由 Eric Dumazet 提交于 9月 03, 2005

Mostobjects returned by __cache_alloc() will be written by the caller,
(but not all callers want to write all the object, but just at the
begining) prefetchw() tells the modern CPU to think about the future
writes, ie start some memory transactions in advance.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

34342e86

[PATCH] x86: ptep_clear optimization · a600388d

由 Zachary Amsden 提交于 9月 03, 2005

Add a new accessor for PTEs, which passes the full hint from the mmu_gather
struct; this allows architectures with hardware pagetables to optimize away
atomic PTE operations when destroying an address space. Removing the
locked operation should allow better pipelining of memory access in this
loop. I measured an average savings of 30-35 cycles per zap_pte_range on
the first 500 destructions on Pentium-M, but I believe the optimization
would win more on older processors which still assert the bus lock on xchg
for an exclusive cacheline.

Update: I made some new measurements, and this saves exactly 26 cycles over
ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
full address space destruction.

pte_clear_full is not yet used, but is provided for future optimizations
(in particular, when running inside of a hypervisor that queues page table
updates, the full hint allows us to avoid queueing unnecessary page table
update for an address space in the process of being destroyed.

This is not a huge win, but it does help a bit, and sets the stage for
further hypervisor optimization of the mm layer on all architectures.
Signed-off-by: NZachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a600388d

[PATCH] sab: consolidate kmem_bufctl_t · fa5b08d5

由 Kyle Moffett 提交于 9月 03, 2005

This is used only in slab.c and each architecture gets to define whcih
underlying type is to be used.

Seems a bit silly - move it to slab.c and use the same type for all
architectures: unsigned int.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fa5b08d5

[PATCH] hugetlb: move stale pte check into huge_pte_alloc() · 7bf07f3d

由 Adam Litke 提交于 9月 03, 2005

Initial Post (Wed, 17 Aug 2005)

This patch moves the
	if (! pte_none(*pte))
		hugetlb_clean_stale_pgtable(pte);
logic into huge_pte_alloc() so all of its callers can be immune to the bug
described by Kenneth Chen at http://lkml.org/lkml/2004/6/16/246

> It turns out there is a bug in hugetlb_prefault(): with 3 level page table,
> huge_pte_alloc() might return a pmd that points to a PTE page. It happens
> if the virtual address for hugetlb mmap is recycled from previously used
> normal page mmap. free_pgtables() might not scrub the pmd entry on
> munmap and hugetlb_prefault skips on any pmd presence regardless what type
> it is.

Unless I am missing something, it seems more correct to place the check inside
huge_pte_alloc() to prevent a the same bug wherever a huge pte is allocated.
It also allows checking for this condition when lazily faulting huge pages
later in the series.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7bf07f3d

[PATCH] arm: allow for arch-specific IOREMAP_MAX_ORDER · fd195c49

由 Deepak Saxena 提交于 9月 03, 2005

Version 6 of the ARM architecture introduces the concept of 16MB pages
(supersections) and 36-bit (40-bit actually, but nobody uses this) physical
addresses.  36-bit addressed memory and I/O and ARMv6 can only be mapped
using supersections and the requirement on these is that both virtual and
physical addresses be 16MB aligned.  In trying to add support for ioremap()
of 36-bit I/O, we run into the issue that get_vm_area() allows for a
maximum of 512K alignment via the IOREMAP_MAX_ORDER constant.  To work
around this, we can:

- Allocate a larger VM area than needed (size + (1ul << IOREMAP_MAX_ORDER))
  and then align the pointer ourselves, but this ends up with 512K of
  wasted VM per ioremap().

- Provide a new __get_vm_area_aligned() API and make __get_vm_area() sit
  on top of this. I did this and it works but I don't like the idea
  adding another VM API just for this one case.

- My preferred solution which is to allow the architecture to override
  the IOREMAP_MAX_ORDER constant with it's own version.
Signed-off-by: NDeepak Saxena <dsaxena@plexity.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fd195c49

[PATCH] mm: remove implied vm_ops check · 4944e76d

由 Paolo 'Blaisorblade' Giarrusso 提交于 9月 03, 2005

If !vma->vm-ops we already BUG above, so retesting it is useless.  The
compiler cannot optimize this because BUG is a macro and is not thus marked
noreturn; that should possibly be fixed.
Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4944e76d

[PATCH] shmem_populate: avoid an useless check, and some comments · d44ed4f8

由 Paolo 'Blaisorblade' Giarrusso 提交于 9月 03, 2005

Either shmem_getpage returns a failure, or it found a page, or it was told
it couldn't do any I/O.  So it's useless to check nonblock in the else
branch.  We could add a BUG() there but I preferred to comment the
offending function.

This was taken out from one Ingo Molnar's old patch I'm resurrecting.
Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d44ed4f8

[PATCH] vm: slab.c spelling correction · 0abf40c1

由 Martin Hicks 提交于 9月 03, 2005

Fix a small spelling mistake.  subtile->subtle
Signed-off-by: NMartin Hicks <mort@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0abf40c1

[PATCH] mm: fix madvise vma merging · 836d5ffd

由 Hugh Dickins 提交于 9月 03, 2005

Better late than never, I've at last reviewed the madvise vma merging
going into 2.6.13.  Remove a pointless check and fix two little bugs -
a simple test (with /proc/<pid>/maps hacked to show ReadHints) showed
both mismerges in practice: though being madvise, neither was disastrous.

1. Correct placement of the success label in madvise_behavior: as in
   mprotect_fixup and mlock_fixup, it is necessary to update vm_flags
   when vma_merge succeeds (to handle the exceptional Case 8 noted in
   the comments above vma_merge itself).

2. Correct initial value of prev when starting part way into a vma: as
   in sys_mprotect and do_mlock, it needs to be set to vma in this case
   (vma_merge handles only that minimum of cases shown in its comments).

3. If find_vma_prev sets prev, then the vma it returns is prev->vm_next,
   so it's pointless to make that same assignment again in sys_madvise.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

836d5ffd

[PATCH] VM: zone reclaim atomic ops cleanup · 53e9a615

由 Martin Hicks 提交于 9月 03, 2005

Christoph Lameter and Marcelo Tosatti asked to get rid of the
atomic_inc_and_test() to cleanup the atomic ops in the zone reclaim code.
Signed-off-by: NMartin Hicks <mort@sgi.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

53e9a615

[PATCH] VM: add capabilites check to set_zone_reclaim · bce5f6ba

由 Martin Hicks 提交于 9月 03, 2005

Add a capability check to sys_set_zone_reclaim().  This syscall is not
something that should be available to a user.
Signed-off-by: NMartin Hicks <mort@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

bce5f6ba

[PATCH] mm: remove atomic · 242e5468

由 Nick Piggin 提交于 9月 03, 2005

This bitop does not need to be atomic because it is performed when there will
be no references to the page (ie.  the page is being freed).
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

242e5468

[PATCH] mm: remap ZERO_PAGE mappings · 9a61c349

由 Nick Piggin 提交于 9月 03, 2005

filemap_xip's nopage routine maps the ZERO_PAGE into readonly mappings, if it
has no data page to map there: then if the hole in the file is later filled,
__xip_unmap uses an rmap technique to replace the ZERO_PAGEs mapped for that
offset by the newly allocated file page, so that established mappings will see
the newly written data.

However, on MIPS (alone) there's not one but as many as eight ZERO_PAGEs,
chosen for coloring by user virtual address; and if mremap has meanwhile been
used to move a mapping containing a ZERO_PAGE, it will generally not match the
ZERO_PAGE(address) __xip_unmap is looking for.

To maintain XIP's established mappings correctly on MIPS, we need Nick's fix
to mremap's move_one_page (originally presented as an optimization), to
replace the ZERO_PAGE appropriate to the old address by the ZERO_PAGE
appropriate to the new address.

(But when I first saw this, I was thinking the ZERO_PAGEs themselves would get
corrupted, very bad.  Now I think it's the other way round, that the
established mappings will fail to see the newly written data: incorrect, but
not corrupting everything else.  Whether filemap_xip's technique is generally
safe, I'd hesitate to say in a hurry: it's interesting, but we've never tried
to do that in tmpfs.)
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9a61c349

[PATCH] mm: cleanup rmap · 4d7670e0

由 Nick Piggin 提交于 9月 03, 2005

Thanks to Bill Irwin for pointing this out.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4d7670e0

[PATCH] mm: micro-optimise rmap · 2822c1aa

由 Nick Piggin 提交于 9月 03, 2005

Microoptimise page_add_anon_rmap.  Although these expressions are used only in
the taken branch of the if() statement, the compiler can't reorder them inside
because atomic_inc_and_test is a barrier.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2822c1aa

[PATCH] mm: comment rmap · c3dce2d8

由 Nick Piggin 提交于 9月 03, 2005

Just be clear that VM_RESERVED pages here are a bug, and the test is not there
because they are expected.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c3dce2d8

[PATCH] /proc/<pid>/numa_maps to show on which nodes pages reside · 6e21c8f1

由 Christoph Lameter 提交于 9月 03, 2005

This patch was recently discussed on linux-mm:
http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2

I inherited a large code base from Ray for page migration. There was a
small patch in there that I find to be very useful since it allows the
display of the locality of the pages in use by a process. I reworked that
patch and came up with a /proc/<pid>/numa_maps that gives more information
about the vma's of a process. numa_maps is indexes by the start address
found in /proc/<pid>/maps. F.e. with this patch you can see the page use
of the "getty" process:

margin:/proc/12008 # cat maps
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-200000000002c000 r-xp 00000000 08:04 516 /lib/ld-2.3.3.so
2000000000038000-2000000000040000 rw-p 00028000 08:04 516 /lib/ld-2.3.3.so
2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842 /lib/tls/libc.so.6.1
2000000000260000-2000000000268000 ---p 00208000 08:04 54707842 /lib/tls/libc.so.6.1
2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842 /lib/tls/libc.so.6.1
2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923 /usr/lib/locale/en_US.utf8/LC_CTYPE
2000000000300000-2000000000308000 r--s 00000000 08:04 60071467 /usr/lib/gconv/gconv-modules.cache
2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399 /sbin/mingetty
6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399 /sbin/mingetty
6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0 [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0 [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0 [vdso]

cat numa_maps
2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1

getty uses ld.so. The first vma is the code segment which is used by 43
other processes and the pages are evenly distributed over the 4 nodes.

The second vma is the process specific data portion for ld.so. This is
only one page.

The display format is:

<startaddress> Links to information in /proc/<pid>/map
<memory policy> This can be "default" "interleave={}", "prefer=<node>" or "bind={<zones>}"
MaxRef= <maximum reference to a page in this vma>
Pages= <Nr of pages in use>
Mapped= <Nr of pages with mapcount >
Anon= <nr of anonymous pages>
Nx= <Nr of pages on Node x>

The content of the proc-file is self-evident. If this would be tied into
the sparsemem system then the contents of this file would not be too
useful.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6e21c8f1

[PATCH] rmap: don't test rss · 839b9685

由 Hugh Dickins 提交于 9月 03, 2005

Remove the three get_mm_counter(mm, rss) tests from rmap.c: there was a
time when testing rss was important to avoid a particular race between
dup_mmap and the anonmm rmap; but now it's just a rather silly pseudo-
optimization, made even more obscure by the get_mm_counter macro.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

839b9685

[PATCH] delete from_swap_cache BUG_ONs · 3279ffd9

由 Hugh Dickins 提交于 9月 03, 2005

Three of the four BUG_ONs in delete_from_swap_cache are immediately
repeated in __delete_from_swap_cache: delete those and add the one.  But
perhaps mm/ is altogether overprovisioned with historic BUGs?
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3279ffd9

[PATCH] swap: swap_lock replace list+device · 5d337b91

由 Hugh Dickins 提交于 9月 03, 2005

The idea of a swap_device_lock per device, and a swap_list_lock over them all,
is appealing; but in practice almost every holder of swap_device_lock must
already hold swap_list_lock, which defeats the purpose of the split.

The only exceptions have been swap_duplicate, valid_swaphandles and an
untrodden path in try_to_unuse (plus a few places added in this series).
valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
demand attention.  However, with the hold time in get_swap_pages so much
reduced, I've not yet found a load and set of swap device priorities to show
even swap_duplicate benefitting from the split.  Certainly the split is mere
overhead in the common case of a single swap device.

So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
(generally we seem to prefer an _ in the name, and not hide in a macro).

If someone can show a regression in swap_duplicate, then probably we should
add a hashlock for the swap_map entries alone (shorts being anatomic), so as
to help the case of the single swap device too.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5d337b91

[PATCH] swap: scan_swap_map latency breaks · 048c27fd

由 Hugh Dickins 提交于 9月 03, 2005

The get_swap_page/scan_swap_map latency can be so bad that even those without
preemption configured deserve relief: periodically cond_resched.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

048c27fd

[PATCH] swap: scan_swap_map drop swap_device_lock · 52b7efdb

由 Hugh Dickins 提交于 9月 03, 2005

get_swap_page has often shown up on latency traces, doing lengthy scans while
holding two spinlocks. swap_list_lock is already dropped, now scan_swap_map
drop swap_device_lock before scanning the swap_map.

While scanning for an empty cluster, don't worry that racing tasks may
allocate what was free and free what was allocated; but when allocating an
entry, check it's still free after retaking the lock. Avoid dropping the lock
in the expected common path. No barriers beyond the locks, just let the
cookie crumble; highest_bit limit is volatile, but benign.

Guard against swapoff: must check SWP_WRITEOK before allocating, must raise
SWP_SCANNING reference count while in scan_swap_map, swapoff wait for that to
fall - just use schedule_timeout, we don't want to burden scan_swap_map
itself, and it's very unlikely that anyone can really still be in
scan_swap_map once swapoff gets this far.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

52b7efdb

[PATCH] swap: scan_swap_map restyled · 7dfad418

由 Hugh Dickins 提交于 9月 03, 2005

Rewrite scan_swap_map to allocate in just the same way as before (taking the
next free entry SWAPFILE_CLUSTER-1 times, then restarting at the lowest wholly
empty cluster, falling back to lowest entry if none), but with a view towards
dropping the lock in the next patch.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7dfad418

[PATCH] swap: get_swap_page drop swap_list_lock · fb4f88dc

由 Hugh Dickins 提交于 9月 03, 2005

Rewrite get_swap_page to allocate in just the same sequence as before, but
without holding swap_list_lock across its scan_swap_map. Decrement
nr_swap_pages and update swap_list.next in advance, while still holding
swap_list_lock. Skip full devices by testing highest_bit. Swapoff hold
swap_device_lock as well as swap_list_lock to clear SWP_WRITEOK. Reduces lock
contention when there are parallel swap devices of the same priority.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fb4f88dc

[PATCH] swap: freeing update swap_list.next · 89d09a2c

由 Hugh Dickins 提交于 9月 03, 2005

This makes negligible difference in practice: but swap_list.next should not be
updated to a higher prio in the general helper swap_info_get, but rather in
swap_entry_free; and then only in the case when entry is actually freed.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

89d09a2c

[PATCH] swap: swap unsigned int consistency · 6eb396dc

由 Hugh Dickins 提交于 9月 03, 2005

The swap header's unsigned int last_page determines the range of swap pages,
but swap_info has been using int or unsigned long in some cases: use unsigned
int throughout (except, in several places a local unsigned long is useful to
avoid overflows when adding).
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NJens Axboe <axboe@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6eb396dc

[PATCH] swap: show span of swap extents · 53092a74

由 Hugh Dickins 提交于 9月 03, 2005

The "Adding %dk swap" message shows the number of swap extents, as a guide to
how fragmented the swapfile may be.  But a useful further guide is what total
extent they span across (sometimes scarily large).

And there's no need to keep nr_extents in swap_info: it's unused after the
initial message, so save a little space by keeping it on stack.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

53092a74

[PATCH] swap: swap extent list is ordered · 11d31886

由 Hugh Dickins 提交于 9月 03, 2005

There are several comments that swap's extent_list.prev points to the lowest
extent: that's not so, it's extent_list.next which points to it, as you'd
expect. And a couple of loops in add_swap_extent which go all the way through
the list, when they should just add to the other end.

Fix those up, and let map_swap_page search the list forwards: profiles shows
it to be twice as quick that way - because prefetch works better on how the
structs are typically kmalloc'ed? or because usually more is written to than
read from swap, and swap is allocated ascendingly?
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

11d31886

[PATCH] swap: move destroy_swap_extents calls · 4cd3bb10

由 Hugh Dickins 提交于 9月 03, 2005

sys_swapon's call to destroy_swap_extents on failure is made after the final
swap_list_unlock, which is faintly unsafe: another sys_swapon might already be
setting up that swap_info_struct. Calling it earlier, before taking
swap_list_lock, is safe. sys_swapoff's call to destroy_swap_extents was safe,
but likewise move it earlier, before taking the locks (once try_to_unuse has
completed, nothing can be needing the swap extents).
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4cd3bb10

[PATCH] swap: correct swapfile nr_good_pages · e2244ec2

由 Hugh Dickins 提交于 9月 03, 2005

If a regular swapfile lies on a filesystem whose blocksize is less than
PAGE_SIZE, then setup_swap_extents may have to cut the number of usable swap
pages; but sys_swapon's nr_good_pages was not expecting that. Also,
setup_swap_extents takes no account of badpages listed in the swap header: not
worth doing so, but ensure nr_badpages is 0 for a regular swapfile.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e2244ec2

[PATCH] swap: update swapfile i_sem comment · b0d9bcd4

由 Hugh Dickins 提交于 9月 03, 2005

Update swap extents comment: nowadays we guard with S_SWAPFILE not i_sem.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b0d9bcd4

[PATCH] sparsemem extreme: hotplug preparation · 28ae55c9

由 Dave Hansen 提交于 9月 03, 2005

This splits up sparse_index_alloc() into two pieces.  This is needed
because we'll allocate the memory for the second level in a different place
from where we actually consume it to keep the allocation from happening
underneath a lock
Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
Signed-off-by: NBob Picco <bob.picco@hp.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

28ae55c9

[PATCH] sparsemem extreme implementation · 3e347261

由 Bob Picco 提交于 9月 03, 2005

With cleanups from Dave Hansen <haveblue@us.ibm.com>

SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
mem_sections.  This two level layout scheme is able to achieve smaller
memory requirements for SPARSEMEM with the tradeoff of an additional shift
and load when fetching the memory section.  The current SPARSEMEM
implementation is a one dimensional array of mem_sections which is the
default SPARSEMEM configuration.  The patch attempts isolates the
implementation details of the physical layout of the sparsemem section
array.

SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
memory_present() calls.  This is not always feasible, so architectures
which do not need it may allocate everything statically by using
SPARSEMEM_STATIC.
Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
Signed-off-by: NBob Picco <bob.picco@hp.com>
Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3e347261

[PATCH] SPARSEMEM EXTREME · 802f192e

由 Bob Picco 提交于 9月 03, 2005

A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME.  Architecture
platforms with a very sparse physical address space would likely want to
select this option.  For those architecture platforms that don't select the
option, the code generated is equivalent to SPARSEMEM currently in -mm.
I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.

ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
pointers to mem_sections.  This two level layout scheme is able to achieve
smaller memory requirements for SPARSEMEM with the tradeoff of an
additional shift and load when fetching the memory section.  The current
SPARSEMEM -mm implementation is a one dimensional array of mem_sections
which is the default SPARSEMEM configuration.  The patch attempts isolates
the implementation details of the physical layout of the sparsemem section
array.

ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.

I've boot tested under aim load ia64 configured for ARCH_SPARSEMEM_EXTREME.
 I've also boot tested a 4 way Opteron machine with !ARCH_SPARSEMEM_EXTREME
and tested with aim.
Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
Signed-off-by: NBob Picco <bob.picco@hp.com>
Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

802f192e

30 8月, 2005 1 次提交

[PATCH] Lazy page table copies in fork() · d992895b

由 Nick Piggin 提交于 8月 28, 2005

Defer copying of ptes until fault time when it is possible to reconstruct
the pte from backing store. Idea from Andi Kleen and Nick Piggin.

Thanks to input from Rik van Riel and Linus and to Hugh for correcting
my blundering.

Ray Fucillo <fucillo@intersystems.com> reports:

  "I applied this latest patch to a 2.6.12 kernel and found that it does
   resolve the problem.  Prior to the patch on this machine, I was
   seeing about 23ms spent in fork for ever 100MB of shared memory
   segment.

   After applying the patch, fork is taking about 1ms regardless of the
   shared memory size."
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d992895b

20 8月, 2005 1 次提交

Fix nasty ncpfs symlink handling bug. · cc314eef

由 Linus Torvalds 提交于 8月 19, 2005

This bug could cause oopses and page state corruption, because ncpfs
used the generic page-cache symlink handlign functions. But those
functions only work if the page cache is guaranteed to be "stable", ie a
page that was installed when the symlink walk was started has to still
be installed in the page cache at the end of the walk.

We could have fixed ncpfs to not use the generic helper routines, but it
is in many ways much cleaner to instead improve on the symlink walking
helper routines so that they don't require that absolute stability.

We do this by allowing "follow_link()" to return a error-pointer as a
cookie, which is fed back to the cleanup "put_link()" routine. This
also simplifies NFS symlink handling.
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

cc314eef

06 8月, 2005 1 次提交

[PATCH] Fix hugepage crash on failing mmap() · c7546f8f

由 David Gibson 提交于 8月 05, 2005

This patch fixes a crash in the hugepage code.  unmap_hugepage_area() was
assuming that (due to prefault) PTEs must exist for all the area in
question.  However, this may not be the case, if mmap() encounters an error
before the prefault and calls unmap_region() to clean up any partial
mapping.

Depending on the hugepage configuration, this crash can be triggered by an
unpriveleged user.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c7546f8f

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功