Commits · 8af905b4a403ce74b8d907b50bccc453a58834bc · bug2833 / cloud-kernel

11 Dec, 2006 · 40 commits

[PATCH] smc91x: Kill off excessive versatile hooks. · 8af905b4

Committed by Paul Mundt on Dec 11, 2006

This looks like a result of too many auto-merges. The
CONFIG_ARCH_VERSATILE case was handled a total of 6 times.
This kills 5 of them.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

--

 drivers/net/smc91x.h |   90 ---------------------------------------------------
 1 file changed, 90 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>

8af905b4

[PATCH] myri10ge: update driver version to 1.1.0 · 5796df19

Committed by Brice Goglin on Dec 11, 2006

Update driver version to 1.1.0.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

5796df19

[PATCH] myri10ge: fix big_bytes in case of vlan frames · 13348bee

Committed by Brice Goglin on Dec 11, 2006

Fix sizing of big_bytes in the case of vlan frames. The 4
VLAN_HLEN bytes were omitted, leading to sizing the big buffer
4 bytes smaller than it should be.  Due to how rx buffers are
carved from pages, this was harmless for the common (9000, 1500)
byte MTUs, but could lead to data corruption for some MTUs.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

13348bee
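
The sizing issue described in this commit can be illustrated with a small Python model. This is a sketch under stated assumptions, not the driver's code: `ETH_HLEN` (14) and `VLAN_HLEN` (4) are the standard Ethernet header and 802.1Q tag lengths, while the 2-byte `PAD` and the function name `big_bytes` are hypothetical.

```python
ETH_HLEN = 14   # Ethernet header length in bytes
VLAN_HLEN = 4   # 802.1Q VLAN tag length in bytes
PAD = 2         # assumed alignment pad (hypothetical value)


def big_bytes(mtu, include_vlan=True):
    """Bytes needed to receive one full frame at the given MTU."""
    size = mtu + ETH_HLEN + PAD
    if include_vlan:
        size += VLAN_HLEN  # the 4 bytes the earlier sizing left out
    return size


for mtu in (1500, 9000):
    old, new = big_bytes(mtu, include_vlan=False), big_bytes(mtu)
    print(f"MTU {mtu}: without VLAN {old} bytes, with VLAN {new} bytes")
```

In this model the two sizes always differ by exactly `VLAN_HLEN`; whether that difference matters in practice depends on how buffers are carved from pages, as the commit message notes.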

[PATCH] myri10ge: Full vlan frame in small_bytes · de3c4507

Committed by Brice Goglin on Dec 11, 2006

Receive full vlan frames into smalls when running with a jumbo MTU.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

de3c4507

[PATCH] myri10ge: drop contiguous skb routines · 52ea6fb3

Committed by Brice Goglin on Dec 11, 2006

Drop the old routines that used the physically contiguous skb now
that we use the physical pages. And rename myri10ge_page_rx_done()
to myri10ge_rx_done() as it was previously.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

52ea6fb3

[PATCH] myri10ge: switch to page-based skb · c7dab99b

Committed by Brice Goglin on Dec 11, 2006

Switch to physical page skb, by calling the new page-based
allocation routines and using myri10ge_page_rx_done().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

c7dab99b

[PATCH] myri10ge: add page-based skb routines · dd50f336

Committed by Brice Goglin on Dec 11, 2006

Add physical page skb allocation routines and page based rx_done,
to be used by upcoming patches.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

dd50f336

[PATCH] myri10ge: indentation cleanups · 6250223e

Committed by Brice Goglin on Dec 11, 2006

Indentation cleanups to synchronize to our tree which is automatically
indent'ed.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

6250223e

[PATCH] chelsio: working NAPI · 7fe26a60

Committed by Stephen Hemminger on Dec 08, 2006

This driver tries to enable/disable NAPI at runtime, but
does so in an unsafe manner, and the NAPI interrupt handling is
a mess. Replace it with a compile time selected NAPI implementation.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

7fe26a60

[PATCH] MACB: Use __raw register access · 0f0d84e5

Committed by Haavard Skinnemoen on Dec 08, 2006

Since macb is a chip-internal device, use __raw_readl and
__raw_writel instead of readl/writel. This will perform native-endian
accesses, which is the right thing to do on both AVR32 and ARM devices.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

0f0d84e5

[PATCH] MACB: Use struct delayed_work instead of struct work_struct · d836cae4

Committed by Haavard Skinnemoen on Dec 08, 2006

The macb driver calls schedule_delayed_work() and friends, so we need
to use a struct delayed_work along with it. The conversion was
explained by David Howells on lkml Dec 5 2006:

http://lkml.org/lkml/2006/12/5/269
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

d836cae4

[PATCH] ucc_geth: Initialize mdio_lock. · 68dc44af

Committed by Scott Wood on Dec 07, 2006

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

68dc44af

[PATCH] ucc_geth: compilation error fixes · 1083cfe1

Committed by Scott Wood on Dec 07, 2006

Fix compilation failures when building the ucc_geth driver with spinlock
debugging.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

1083cfe1

[MIPS] Export local_flush_data_cache_page for sake of IDE. · 9202f325

Committed by Ralf Baechle on Dec 10, 2006

On a CPU with aliases the IDE core needs to flush caches in the special
IDE variants of insw, insl etc.  If IDE support is built as a module this
will only work if local_flush_data_cache_page is exported to modules.

As per policy export local_flush_data_cache_page as GPL symbol only.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

9202f325

[MIPS] Export pm_power_off · f8bf35a9

Committed by Ralf Baechle on Dec 10, 2006

This is required for ipmi_poweroff.c to work as a module.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

f8bf35a9

[MIPS] Export csum_partial_copy_nocheck. · ae32ffd6

Committed by Ralf Baechle on Dec 10, 2006

ibmtr.c and typhoon.c use it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

ae32ffd6

[MIPS] Move die and die_if_kernel() from system.h to ptrace.h · 2d911e9a

Committed by Ralf Baechle on Dec 10, 2006

This eliminates the need to include ptrace.h into system.h and fixes a
harmless namespace conflict on the PC symbol in bpck.c.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

2d911e9a

[MIPS] Discard .exit.text at linktime. · 86384d54

Committed by Ralf Baechle on Dec 10, 2006

This fixes fairly unobvious breakage of various drivers.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

86384d54

[MIPS] Fix build of several IDE drivers by providing pci_get_legacy_ide_irq · 5b1d221e

Committed by Ralf Baechle on Dec 09, 2006

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

5b1d221e

[CRYPTO] dm-crypt: Select CRYPTO_CBC · 3263263f

Committed by Herbert Xu on Dec 10, 2006

As CBC is the default chaining method for cryptoloop, we should select
it from cryptoloop to ease the transition.  Spotted by Rene Herman.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3263263f

[PATCH] add MODULE_* attributes to bit reversal library · 0258736a

Committed by Cal Peake on Dec 10, 2006

Add MODULE_* attributes to the new bit reversal library. Most notably
MODULE_LICENSE which prevents superfluous kernel tainting.
Signed-off-by: Cal Peake <cp@absolutedigital.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0258736a

Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 · edb16bec

Committed by Linus Torvalds on Dec 10, 2006

* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Fix several kprobes bugs.
  [SPARC64]: Update defconfig.
  [SPARC64]: dma remove extra brackets
  [SPARC{32,64}]: Propagate ptrace_traceme() return value.
  [SPARC64]: Replace kmalloc+memset with kzalloc
  [SPARC]: Check kzalloc() return value in SUN4D irq/iommu init.
  [SPARC]: Replace kmalloc+memset with kzalloc
  [SPARC64]: Run ctrl-alt-del action for sun4v powerdown request.
  [SPARC64]: Unaligned accesses to userspace are hard errors.
  [SPARC64]: Call do_mathemu on illegal instruction traps too.
  [SPARC64]: Update defconfig.
  [SPARC64]: Add irqtrace/stacktrace/lockdep support.

edb16bec

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb · bb7320d1

Committed by Linus Torvalds on Dec 10, 2006

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (132 commits)
  V4L/DVB 4949b: Fix container_of pointer retreival
  V4L/DVB (4949a): Fix INIT_WORK
  V4L/DVB (4949): Cxusb: codingstyle cleanups
  V4L/DVB (4948): Cxusb: Convert tuner functions to use dvb_pll_attach
  V4L/DVB (4947): Cx88: trivial cleanups
  V4L/DVB (4946): Cx88: Move cx88_dvb_bus_ctrl out of the card-specific area
  V4L/DVB (4945): Cx88: consolidate cx22702_config structs
  V4L/DVB (4944): Cx88: Convert DViCO FusionHDTV Hybrid to use dvb_pll_attach
  V4L/DVB (4943): Cx88: cleanup dvb_pll_attach for lgdt3302 tuners
  V4L/DVB (4953): Usbvision minor fixes
  V4L/DVB (4951): Add version.h, since it is required for VIDIOC_QUERYCAP
  V4L/DVB (4940): Or51211: Changed SNR and signal strength calculations
  V4L/DVB (4939): Or51132: Changed SNR and signal strength reporting
  V4L/DVB (4938): Cx88: Convert lgdt3302 tuning function to use dvb_pll_attach
  V4L/DVB (4941): Remove LINUX_VERSION_CODE and fix identations
  V4L/DVB (4942): Whitespace cleanups
  V4L/DVB (4937): Usbvision cleanup and code reorganization
  V4L/DVB (4936): Make MT4049FM5 tuner to set FM Gain to Normal
  V4L/DVB (4935): Added the capability of selecting fm gain by tuner
  V4L/DVB (4934): Usbvision radio requires GainNormal at e register
  ...

bb7320d1

[PATCH] kvm: userspace interface · 6aa8b732

Committed by Avi Kivity on Dec 10, 2006

web site: http://kvm.sourceforge.net

mailing list: kvm-devel@lists.sourceforge.net
  (http://lists.sourceforge.net/lists/listinfo/kvm-devel)

The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture.  The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace.  Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.

Using this driver, one can start multiple virtual machines on a host.

Each virtual machine is a process on the host; a virtual cpu is a thread in
that process.  kill(1), nice(1), top(1) work as expected.  In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode.  Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm).  Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.

The driver supports i386 and x86_64 hosts and guests.  All combinations are
allowed except x86_64 guest on i386 host.  For i386 guests and hosts, both pae
and non-pae paging modes are supported.

SMP hosts and UP guests are supported.  At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.

Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch.  We plan to address this in two ways:

- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables

Currently a virtual desktop is responsive but consumes a lot of CPU.  Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization.  Linux/X is slower, probably due
to X being in a separate process.

In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.

Caveats (akpm: might no longer be true):

- The Windows install currently bluescreens due to a problem with the
  virtual APIC.  We are working on a fix.  A temporary workaround is to
  use an existing image or install through qemu
- Windows 64-bit does not work.  That's also true for qemu, so it's
  probably a problem with the device model.

[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6aa8b732

[PATCH] clocksource: small cleanup · f5f1a24a

Committed by Daniel Walker on Dec 10, 2006

Mostly changing alignment.  Just some general cleanup.

[akpm@osdl.org: build fix]
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f5f1a24a

[PATCH] clocksource: add usage of CONFIG_SYSFS · 2b013700

Committed by Daniel Walker on Dec 10, 2006

Simply adds some ifdefs to remove clocksource sysfs code when CONFIG_SYSFS
isn't turned on.
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2b013700

[PATCH] user of the jiffies rounding patch: Slab · 2b284214

Committed by Arjan van de Ven on Dec 10, 2006

This patch introduces users of the round_jiffies() function in the slab code.

The slab code has a few "run every second" timers for background work; these
are obviously not timing critical as long as they happen roughly at the right
frequency.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2b284214

[PATCH] user of the jiffies rounding code: JBD · 44d306e1

Committed by Arjan van de Ven on Dec 10, 2006

This patch introduces a user of the round_jiffies() function: the "5 second"
ext3/jbd wakeup.

While "every 5 seconds" doesn't sound like a problem, there can be many of these
(and these timers do add up over all the kernel).  The "5 second" wakeup isn't
really timing sensitive; in addition even with rounding it'll still happen
every 5 seconds (with the exception of the very first time, which is likely to
be rounded up to somewhere closer to 6 seconds)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

44d306e1

[PATCH] round_jiffies infrastructure · 4c36a5de

Committed by Arjan van de Ven on Dec 10, 2006

Introduce a round_jiffies() function as well as a round_jiffies_relative()
function.  These functions round a jiffies value to the next whole second.
The primary purpose of this rounding is to cause all "we don't care exactly
when" timers to happen at the same jiffy.

This avoids multiple timers firing within the second for no real reason;
with dynamic ticks these extra timers cause wakeups from deep CPU
sleep states and thus waste power.

The exact wakeup moment is skewed by the cpu number, to prevent all cpus from
waking up at the exact same time (and hitting the same lock/cachelines
there).

[akpm@osdl.org: fix variable type]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4c36a5de
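
The rounding idea described in this commit can be modelled in a few lines of Python. This is a simplified sketch, not the kernel implementation: the tick rate `HZ = 250`, the per-CPU skew of 3 jiffies, and the name `round_jiffies_model` are all assumptions, and the model always rounds up, which the real function may not do in every case.

```python
HZ = 250       # assumed tick rate: jiffies per second
CPU_SKEW = 3   # assumed per-CPU offset in jiffies (hypothetical value)


def round_jiffies_model(j, cpu=0, hz=HZ):
    """Round an absolute jiffies value up to a whole second, skewed per CPU."""
    shifted = j + cpu * CPU_SKEW      # move into this CPU's skewed timeline
    rem = shifted % hz
    if rem:
        shifted += hz - rem           # advance to the next second boundary
    return shifted - cpu * CPU_SKEW   # move back to real jiffies


# Two "don't care exactly when" timers on the same CPU land on one jiffy.
print(round_jiffies_model(1001), round_jiffies_model(1100))
# The same expiry on another CPU fires a few jiffies apart.
print(round_jiffies_model(1001, cpu=1))
```

Because the skew is added before rounding and removed afterwards, the result is never earlier than the requested expiry, and timers on different CPUs are spread out instead of all firing on the same jiffy.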

[PATCH] fdtable: Implement new pagesize-based fdtable allocator · 5466b456

Committed by Vadim Lobanov on Dec 10, 2006

This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries.  The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:

    pagesize / 4
    pagesize / 2
    pagesize      <- memory allocator switch point
    pagesize * 2
    pagesize * 4
    ...etc...

Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.

Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop.  Even together, they will still be, by far, smaller
than the fdarray.  The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5466b456
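
The size ladder listed in this commit (pagesize / 4, pagesize / 2, pagesize, pagesize * 2, ...) can be computed without a loop. The following Python sketch shows one way to do it; the 4096-byte page, the 8-byte pointer, and the function name `fdarray_bytes` are assumptions for illustration and do not come from the patch.

```python
def fdarray_bytes(nr_fds, page_size=4096, ptr_size=8):
    """Smallest size in {page/4, page/2, page, 2*page, ...} holding nr_fds pointers."""
    unit = page_size // 4                   # smallest candidate size in bytes
    needed = max(nr_fds, 1) * ptr_size      # bytes the fd pointers require
    units = -(-needed // unit)              # ceiling division, no loop
    pow2 = 1 << (units - 1).bit_length()    # round up to a power of two
    return unit * pow2


for nr in (64, 129, 257, 513):
    print(nr, fdarray_bytes(nr))
```

The power-of-two rounding is done with `int.bit_length()`, which plays the role of the straight-line bit arithmetic the commit message refers to.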

[PATCH] fdtable: Remove the free_files field · 4fd45812

Committed by Vadim Lobanov on Dec 10, 2006

An fdtable can either be embedded inside a files_struct or standalone (after
being expanded).  When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.

Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded.  We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4fd45812

[PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69

Committed by Vadim Lobanov on Dec 10, 2006

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bbea9f69

[PATCH] fdtable: Delete pointless code in dup_fd() · f3d19c90

Committed by Vadim Lobanov on Dec 10, 2006

The dup_fd() function creates a new files_struct and fdtable embedded inside
that files_struct, and then possibly expands the fdtable using expand_files().

The out_release error path is invoked when expand_files() returns an error
code.  However, when this attempt to expand fails, the fdtable is left in its
original embedded form, so it is pointless to try to free the associated
fdarray and fdsets.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f3d19c90

[PATCH] dio: lock refcount operations · 5eb6c7a2

Committed by Zach Brown on Dec 10, 2006

The wait_for_more_bios() function name was poorly chosen.  While looking to
clean it up I noticed that the dio struct refcounting between the bio
completion and dio submission paths was racy.

The bio submission path was simply freeing the dio struct if
atomic_dec_and_test() indicated that it dropped the final reference.

The aio bio completion path was dereferencing its dio struct pointer *after
dropping its reference* based on the remaining number of references.

These two paths could race and result in the aio bio completion path
dereferencing a freed dio, though this was not observed in the wild.

This moves the refcount under the bio lock so that bio completion can drop
its reference and decide to wake all in one atomic step.

Once testing and waking is locked dio_await_one() can test its sleeping
condition and mark itself uninterruptible under the lock.  It gets simpler
and wait_for_more_bios() disappears.

The addition of the interrupt masking spin lock acquisition in dio_bio_submit()
looks alarming.  This lock acquisition existed in that path before the recent
dio completion patch set.  We shouldn't expect significant performance
regression from returning to the behaviour that existed before the
completion clean up work.

This passed 4k block ext3 O_DIRECT fsx and aio-stress on an SMP machine.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5eb6c7a2
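
The locking change described in this commit, dropping a reference and deciding whether to wake the waiter in one atomic step, can be sketched as a userspace Python toy. The class `Dio` and its fields and methods are hypothetical stand-ins, not the kernel's data structures, and a `threading.Lock` stands in for the bio lock.

```python
import threading


class Dio:
    """Toy stand-in for the dio struct: a refcount guarded by one lock."""

    def __init__(self, nr_bios):
        self.lock = threading.Lock()
        self.refcount = 1 + nr_bios   # submitter's reference plus one per bio
        self.wake_waiter = False
        self.freed = False

    def bio_end_io(self):
        # Drop the reference and decide whether to wake under the same lock,
        # so the dio is never examined after this path's reference is gone.
        with self.lock:
            self.refcount -= 1
            if self.refcount == 1:    # only the submitter's reference remains
                self.wake_waiter = True

    def submitter_put(self):
        # The submitter drops its own reference; the last dropper frees.
        with self.lock:
            self.refcount -= 1
            last = self.refcount == 0
        if last:
            self.freed = True
        return last


dio = Dio(nr_bios=3)
threads = [threading.Thread(target=dio.bio_end_io) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dio.refcount, dio.wake_waiter, dio.submitter_put(), dio.freed)
```

The point of the model is only that the decrement and the test happen inside one critical section; the racy pattern the commit describes would read the count after releasing its reference.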

[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED · 8459d86a

Committed by Zach Brown on Dec 10, 2006

The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
conditionals in sync. direct_io_worker() knew when finished_one_bio() was
going to call aio_complete(). It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio(). direct_io_worker() would return < 0, its callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain. It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio. Instead we return the return
code of the operation and let the aio core call aio_complete(). This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it. XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8459d86a

[PATCH] dio: remove duplicate bio wait code · 20258b2b

Committed by Zach Brown on Dec 10, 2006

Now that we have a single refcount and waiting path we can reuse it in the
async 'should_wait' path.  It continues to rely on the fragile link between
the conditional in dio_complete_aio() which decides to complete the AIO and
the conditional in direct_io_worker() which decides to wait and free.

By waiting before dropping the reference we stop dio_bio_end_aio() from
calling dio_complete_aio() which used to wake up the waiter after seeing the
reference count drop to 0.  We hoist this wake up into dio_bio_end_aio() which
now notices when it's left a single remaining reference that is held by the
waiter.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

20258b2b

[PATCH] dio: formalize bio counters as a dio reference count · 0273201e

Committed by Zach Brown on Dec 10, 2006

Previously we had two confusing counts of bio progress. 'bio_count' was
decremented as bios were processed and freed by the dio core. It was used to
indicate final completion of the dio operation. 'bios_in_flight' reflected
how many bios were between submit_bio() and bio->end_io. It was used by the
sync path to decide when to wake up and finish completing bios and was ignored
by the async path.

This patch collapses the two notions into one notion of a dio reference count.
bios hold a dio reference when they're between submit_bio and bio->end_io.

Since bios_in_flight was only used in the sync path it is now equivalent to
dio->refcount - 1 which accounts for direct_io_worker() holding a reference
for the duration of the operation.

dio_bio_complete() -> finished_one_bio() was called from the sync path after
finding bios on the list that the bio->end_io function had deposited.
finished_one_bio() can not drop the dio reference on behalf of these bios now
because bio->end_io already has. The is_async test in finished_one_bio()
meant that it never actually did anything other than drop the bio_count for
sync callers. So we remove its refcount decrement, don't call it from
dio_bio_complete(), and hoist its call up into the async dio_bio_complete()
caller after an explicit refcount decrement. It is renamed dio_complete_aio()
to reflect the remaining work it actually does.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0273201e

[PATCH] dio: call blk_run_address_space() once per op · 17a7b1d7

Committed by Zach Brown on Dec 10, 2006

We only need to call blk_run_address_space() once after all the bios for the
direct IO op have been submitted.  This removes the chance of calling
blk_run_address_space() after spurious wake ups as the sync path waits for
bios to drain.  It's also one less difference between the sync and async paths.

In the process we remove a redundant dio_bio_submit() that its caller had
already performed.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

17a7b1d7

[PATCH] dio: centralize completion in dio_complete() · 6d544bb4

Committed by Zach Brown on Dec 10, 2006

There have been a lot of bugs recently due to the way direct_io_worker() tries
to decide how to finish direct IO operations.  In the worst examples it has
failed to call aio_complete() at all (hang) or called it too many times
(oops).

This set of patches cleans up the completion phase with the goal of removing
the complexity that led to these bugs.  We end up with one path that
calculates the result of the operation after all of the bios have completed.
We decide when to generate a result of the operation using that path based on
the final release of a refcount on the dio structure.

I tried to progress towards the final state in steps that were relatively easy
to understand.  Each step should compile but I only tested the final result of
having all the patches applied.

I've tested these on low end PC drives with aio-stress, the direct IO tests I
could manage to get running in LTP, orasim, and some home-brew functional
tests.

In http://lkml.org/lkml/2006/9/21/103 IBM reports success with ext2 and ext3
running DIO LTP tests.  They found that XFS bug which has since been addressed
in the patch series.

This patch:

The mechanics which decide the result of a direct IO operation were duplicated
in the sync and async paths.

The async path didn't check page_errors which can manifest as silently
returning success when the final pointer in an operation faults and its
matching file region is filled with zeros.

The sync path and async path differed in whether they passed errors to the
caller's dio->end_io operation.  The async path was passing errors to it which
trips an assertion in XFS, though it is apparently harmless.

This centralizes the completion phase of dio ops in one place.  AIO will now
return EFAULT consistently and all paths fall back to the previously sync
behaviour of passing the number of bytes 'transferred' to the dio->end_io
callback, regardless of errors.

dio_await_completion() doesn't have to propagate EIO from non-uptodate bios
now that it's being propagated through dio_complete() via dio->io_error.  This
lets it return void which simplifies its sole caller.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6d544bb4
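
The centralized completion step described in this commit can be sketched as a single function that every path calls. This Python model is an illustration under assumptions: the name `dio_complete_model`, its parameters, and the order in which `page_errors` is checked before `io_error` are hypothetical and not confirmed by the commit message.

```python
import errno


def dio_complete_model(transferred, page_errors=0, io_error=0, end_io=None):
    """Return the operation result; a recorded error wins over the byte count."""
    if end_io is not None:
        end_io(transferred)       # callback sees bytes moved, regardless of errors
    for err in (page_errors, io_error):   # assumed priority order
        if err:
            return err
    return transferred


seen = []
print(dio_complete_model(4096))
print(dio_complete_model(4096, page_errors=-errno.EFAULT))
print(dio_complete_model(512, io_error=-errno.EIO, end_io=seen.append), seen)
```

With one function deciding the result, the sync and async paths cannot disagree about whether a page fault or an I/O error is reported, which is the inconsistency the commit message describes.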

[PATCH] md: assorted md and raid1 one-liners · 17571284

Committed by NeilBrown on Dec 10, 2006

Fix a few bugs that meant that:
  - superblocks weren't always written at exactly the right time (this
    could show up if the array was not written to - writing to the array
    causes lots of superblock updates and so hides these errors).

  - restarting device recovery after a clean shutdown (version-1 metadata
    only) didn't work as intended (or at all).

1/ Ensure superblock is updated when a new device is added.
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
   The body of this if takes one of two branches depending on whether
   MD_RECOVERY_SYNC is set, so testing it in the clause of the if
   is wrong.
3/ Flag superblock for updating after a resync/recovery finishes.
4/ If we find the need to restart a recovery in the middle (version-1
   metadata only) make sure a full recovery (not just as guided by
   bitmaps) does get done.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

17571284
