1. 28 Jun 2008 (10 commits)
    • Enable setting of 'offset' and 'size' of a hot-added spare. · 8ed0a521
      Committed by Neil Brown
      offset_store and rdev_size_store allow control of the region of a
      device which is to be used in an md/raid array.
      They only allow these values to be set when an array is being assembled,
      as changing them on an active array could be dangerous.
      However, when adding a spare device to an array, we might need to
      set the offset and size before starting recovery.  So allow
      these values to be set also if "->raid_disk < 0", which indicates that
      the device is still a spare (a sketch of the relaxed guard follows
      this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
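
      A minimal sketch of the relaxed guard, assuming 2.6.26-era md types
      (mdk_rdev_t) and the sysfs store signature used in drivers/md/md.c;
      illustrative, not the verbatim patch:

        static ssize_t
        offset_store(mdk_rdev_t *rdev, const char *buf, size_t len)
        {
                char *e;
                unsigned long long offset = simple_strtoull(buf, &e, 10);

                if (e == buf || (*e && *e != '\n'))
                        return -EINVAL;
                /* Forbid moving an active member, but a spare
                 * (->raid_disk < 0) may still have its offset set. */
                if (rdev->mddev->pers && rdev->raid_disk >= 0)
                        return -EBUSY;
                rdev->data_offset = offset;
                return len;
        }
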
    • Don't try to make md arrays dirty if that is not meaningful. · 1a0fd497
      Committed by Neil Brown
      Array personalities such as 'raid0' and 'linear' have no redundancy,
      and so marking them as 'clean' or 'dirty' is not meaningful.
      So always allow write requests without requiring a superblock update.
      
      Such array types are detected by ->sync_request being NULL.  If it is
      not possible to send a sync request, we don't need a 'dirty' flag,
      because all a dirty flag does is trigger some sync_requests (see the
      sketch after this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
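
      A hedged sketch of the idea, not the actual md.c change: writes skip
      the dirty-marking path entirely when the personality cannot resync
      anyway.

        /* Called on the write path before allowing the request. */
        static void maybe_mark_dirty(mddev_t *mddev)
        {
                /* raid0/linear: no redundancy, hence no ->sync_request,
                 * hence no point in a clean/dirty distinction. */
                if (mddev->pers->sync_request == NULL)
                        return;

                /* ... otherwise transition clean -> dirty and schedule
                 * a superblock write before the data write proceeds ... */
        }
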
    • Close race in md_probe · f48ed538
      Committed by Neil Brown
      There is a possible race in md_probe.  If two threads call md_probe
      for the same device, then one could exit (having checked that
      ->gendisk exists) before the other has called kobject_init_and_add,
      thus returning an incomplete kobj which will cause problems when
      we try to add children to it.
      
      So extend the range of protection of disks_mutex slightly to
      avoid this possibility (see the sketch after this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
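
      A hedged sketch of the widened critical section; names such as
      md_ktype and the gendisk setup details are stand-ins for the real
      md.c code:

        static DEFINE_MUTEX(disks_mutex);

        static struct kobject *md_probe_sketch(dev_t dev, int *part, void *data)
        {
                mddev_t *mddev = mddev_find(dev);
                struct gendisk *disk;

                mutex_lock(&disks_mutex);
                if (mddev->gendisk) {           /* another caller won the race */
                        mutex_unlock(&disks_mutex);
                        mddev_put(mddev);
                        return NULL;
                }
                disk = alloc_disk(1);
                /* ... set up disk, add_disk(disk), mddev->gendisk = disk ... */
                kobject_init_and_add(&mddev->kobj, &md_ktype,
                                     &disk->dev.kobj, "%s", "md");
                /* Unlock only after the kobject is fully initialised, so
                 * nobody can see ->gendisk with a half-built kobj. */
                mutex_unlock(&disks_mutex);
                mddev_put(mddev);
                return NULL;
        }
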
    • Allow setting start point for requested check/repair · 5e96ee65
      Committed by Neil Brown
      This makes it possible to resync just a small part of an array:
      e.g. if a drive reports that it has questionable sectors,
      a 'repair' of just the region covering those sectors will
      cause them to be read and, if there is an error, re-written
      with correct data (a sketch of the kernel-side use follows this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
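
      A hedged sketch of the kernel-side use; the field name resync_min
      and the sysfs attribute names sync_min/sync_action follow later md
      documentation and should be treated as assumptions here.  The
      intended flow from user space is roughly "echo N > sync_min"
      followed by "echo repair > sync_action".

        /* Pick the starting sector for a resync/check/repair pass. */
        static sector_t resync_start_sector(mddev_t *mddev)
        {
                /* Only user-requested passes honour the lower bound. */
                if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
                        return mddev->resync_min;   /* assumed field name */
                return 0;
        }
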
    • Improve setting of "events_cleared" for write-intent bitmaps. · a0da84f3
      Committed by Neil Brown
      When an array is degraded, bits in the write-intent bitmap are not
      cleared, so that if the missing device is re-added, it can be synced
      by updating only those parts of the device that have changed since
      it was removed.
      
      To enable this, an 'events_cleared' value is stored.  It is the event
      counter for the array the last time that any bits were cleared.
      
      Sometimes - if a device disappears from an array while it is 'clean' -
      the events_cleared value gets updated incorrectly (there are subtle
      ordering issues between updating events in the main metadata and the
      bitmap metadata), resulting in the missing device appearing to require
      a full resync when it is re-added.
      
      With this patch, we update events_cleared precisely when we are about
      to clear a bit in the bitmap.  We record events_cleared when we clear
      the bit internally, and copy that to the superblock, which is written
      out before the bit on storage.  This makes it more "obviously correct"
      (see the sketch after this entry).
      
      We also need to update events_cleared when the event_count is going
      backwards (as happens on a dirty->clean transition of a non-degraded
      array).
      
      Thanks to Mike Snitzer for identifying this problem and testing early
      "fixes".
      
      Cc:  "Mike Snitzer" <snitzer@gmail.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
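
      A hedged sketch of the ordering, assuming 2.6.26-era bitmap types
      (struct bitmap, bitmap_super_t); the helper would run just before a
      bit is cleared in the in-memory bitmap:

        static void note_bit_cleared(struct bitmap *bitmap, bitmap_super_t *sb)
        {
                if (bitmap->events_cleared < bitmap->mddev->events) {
                        bitmap->events_cleared = bitmap->mddev->events;
                        /* The superblock page is queued for write before
                         * the page holding the cleared bit, so the disk
                         * never claims a clear that hasn't happened. */
                        sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
                }
                /* ... now clear the in-memory bit itself ... */
        }
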
    • use bio_endio instead of a call to bi_end_io · 0e13fe23
      Committed by Neil Brown
      Turn calls to bi->bi_end_io() into bio_endio().  Apparently bio_endio
      does exactly the same error processing as is hard-coded at these
      places.
      
      bio_endio() avoids recursion (or will soon), so it should be used
      (see the sketch after this entry).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
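
      A hedged sketch of the transformation, using the 2.6.26-era
      two-argument bio_endio(); the error computation is illustrative:

        static void complete_raid_bio(struct bio *bio)
        {
                int error = test_bit(BIO_UPTODATE, &bio->bi_flags) ? 0 : -EIO;

                /* Before: invoke the completion hook by hand.
                 *   bio->bi_end_io(bio, error);
                 * After: let the block layer run the same completion
                 * path, which can also defer to avoid deep recursion. */
                bio_endio(bio, error);
        }
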
    • linear: correct disk numbering error check · 13864515
      Committed by Nikanth Karthikesan
      From: "Nikanth Karthikesan" <knikanth@novell.com>
      
      Correct the error check on the disk number (see the sketch after
      this entry).
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Neil Brown <neilb@suse.de>
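
      The commit message does not show the check itself; a plausible,
      heavily hedged reconstruction is an off-by-one in the bounds test on
      the per-device index:

        static int check_disk_number(int j, int raid_disks)
        {
                /* Before (buggy): 'j == raid_disks' passed the check:
                 *   if (j < 0 || j > raid_disks) return -EINVAL;
                 * After: valid indices are 0 .. raid_disks - 1. */
                if (j < 0 || j >= raid_disks)
                        return -EINVAL;
                return 0;
        }
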
    • Fix error paths if md_probe fails. · 9bbbca3a
      Committed by Neil Brown
      md_probe can fail (e.g. alloc_disk could fail) without
      returning an error (it always returns NULL).
      So when we call mddev_find immediately afterwards, we need
      to check that md_probe actually succeeded.  This means checking
      that mddev->gendisk is non-NULL (see the sketch after this entry).
      
      Cc: <stable@kernel.org>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
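
      A hedged sketch of the check, assuming the usual mddev_find/mddev_put
      pairing from drivers/md/md.c:

        static mddev_t *find_started_mddev(dev_t dev)
        {
                mddev_t *mddev = mddev_find(dev);

                if (!mddev)
                        return NULL;
                if (!mddev->gendisk) {          /* md_probe failed silently */
                        mddev_put(mddev);
                        return NULL;
                }
                return mddev;
        }
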
    • Don't acknowledge that stripe-expand is complete until it really is. · efe31143
      Committed by Neil Brown
      We shouldn't acknowledge that a stripe has been expanded (when
      reshaping a raid5 by adding a device) until the moved data has
      actually been written out.  However, we are currently
      acknowledging (by calling md_done_sync) when the POST_XOR
      is complete and before the write.
      
      So track in s.locked whether there are pending writes, and don't
      call md_done_sync yet if there are (see the sketch after this entry).
      
      Note: we also set R5_LOCKED on devices which we are about to
      read from.  This probably isn't technically necessary, but it is
      usually done when writing a block, and it justifies the use of
      s.locked here.
      
      This bug can lead to a crash if an array is stopped while a reshape
      is in progress.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Neil Brown <neilb@suse.de>
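
      A hedged sketch of the accounting; stripe_head_state, R5_LOCKED and
      STRIPE_EXPAND_READY are the 2.6.26-era raid5 names, but the helper
      itself is illustrative:

        static void finish_expand(struct stripe_head *sh,
                                  struct stripe_head_state *s,
                                  raid5_conf_t *conf)
        {
                int i;

                for (i = 0; i < sh->disks; i++)
                        if (test_bit(R5_LOCKED, &sh->dev[i].flags))
                                s->locked++;    /* I/O still pending */

                /* Only report progress once nothing is in flight. */
                if (s->locked == 0 &&
                    test_bit(STRIPE_EXPAND_READY, &sh->state))
                        md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
        }
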
    • Ensure interrupted recovery completed properly (v1 metadata plus bitmap) · 8c2e870a
      Committed by Neil Brown
      If, while assembling an array, we find a device which is not fully
      in-sync with the array, it is important to set the "fullsync" flag.
      This is an exact analogue of the setting of this flag in the
      hot_add_disk methods.
      
      Currently, only v1.x metadata supports having devices in an array
      which are not fully in-sync (it keeps track of how in-sync they are).
      The 'fullsync' flag only makes a difference when a write-intent bitmap
      is being used.  In this case it tells recovery to ignore the bitmap
      and recover all blocks.
      
      This fix is already in place for raid1, but not raid5/6 or raid10.
      
      So without this fix, a raid10 or raid4/5/6 array with version 1.x
      metadata and a write-intent bitmap, that is stopped in the middle
      of a recovery, will appear to complete the recovery instantly
      after it is reassembled, but the recovery will not be correct
      (a sketch of the assembly-time check follows this entry).
      
      If you might have an array like that, issuing
         echo repair > /sys/block/mdXX/md/sync_action
      
      will make sure recovery completes properly.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Neil Brown <neilb@suse.de>
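
      A hedged sketch of the assembly-time check, mirroring the
      hot_add_disk behaviour described above; shown for raid5, with the
      raid10 case analogous:

        static void check_fullsync(raid5_conf_t *conf, mdk_rdev_t *rdev,
                                   int raid_disk)
        {
                /* A device that remembers a different slot than the one
                 * it is given cannot trust the bitmap: force a full
                 * recovery instead of a bitmap-based one. */
                if (rdev->saved_raid_disk != raid_disk)
                        conf->fullsync = 1;
        }
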
2. 25 Jun 2008 (17 commits)
3. 24 Jun 2008 (13 commits)