提交 · c9b8af00ff71f86ff3d092cc60ca673e1d0eae5b · openeuler / Kernel

12 6月, 2009 40 次提交

lguest: remove obsolete LHREQ_BREAK call · 5dac051b

由 Rusty Russell 提交于 6月 12, 2009

We no longer need an efficient mechanism to force the Guest back into
host userspace, as each device is serviced without bothering the main
Guest process (aka. the Launcher).
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

5dac051b

lguest: use eventfds for device notification · df60aeef

由 Rusty Russell 提交于 6月 12, 2009

Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
an address: the main Launcher process returns with this address, and figures
out what device to run.

A far nicer model is to let processes bind an eventfd to an address: if we
find one, we simply signal the eventfd.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Davide Libenzi <davidel@xmailserver.org>

df60aeef

lguest: improve interrupt handling, speed up stream networking · a32a8813

由 Rusty Russell 提交于 6月 12, 2009

lguest never checked for pending interrupts when enabling interrupts, and
things still worked.  However, it makes a significant difference to TCP
performance, so it's time we fixed it by introducing a pending_irq flag
and checking it on irq_restore and irq_enable.

These two routines are now too big to patch into the 8/10 bytes
patch space, so we drop that code.

Note: The high latency on interrupt delivery had a very curious
effect: once everything else was optimized, networking without GSO was
faster than networking with GSO, since more interrupts were sent and
hence a greater chance of one getting through to the Guest!

Note2: (Almost) Closing the same loophole for iret doesn't have any
measurable effect, so I'm leaving that patch for the moment.

Before:
	1GB tcpblast Guest->Host:		30.7 seconds
	1GB tcpblast Guest->Host (no GSO):	76.0 seconds

After:
	1GB tcpblast Guest->Host:		6.8 seconds
	1GB tcpblast Guest->Host (no GSO):	27.8 seconds
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

a32a8813

virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) · 9fa29b9d

由 Mark McLoughlin 提交于 5月 11, 2009

Add a new feature flag for indirect ring entries. These are ring
entries which point to a table of buffer descriptors.

The idea here is to increase the ring capacity by allowing a larger
effective ring size whereby the ring size dictates the number of
requests that may be outstanding, rather than the size of those
requests.

This should be most effective in the case of block I/O where we can
potentially benefit by concurrently dispatching a large number of
large requests. Even in the simple case of single segment block
requests, this results in a threefold increase in ring capacity.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

9fa29b9d

virtio: teach virtio_has_feature() about transport features · ee006b35

由 Mark McLoughlin 提交于 5月 11, 2009

Drivers don't add transport features to their table, so we
shouldn't check these with virtio_check_driver_offered_feature().

We could perhaps add an ->offered_feature() virtio_config_op,
but that perhaps that would be overkill for a consitency check
like this.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

ee006b35

virtio_pci: optional MSI-X support · 82af8ce8

由 Michael S. Tsirkin 提交于 5月 14, 2009

This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
available.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NAnthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ whitespace, style)

82af8ce8

virtio: find_vqs/del_vqs virtio operations · d2a7ddda

由 Michael S. Tsirkin 提交于 6月 12, 2009

This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)

d2a7ddda

virtio: add names to virtqueue struct, mapping from devices to queues. · 9499f5e7

由 Rusty Russell 提交于 6月 12, 2009

Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

9499f5e7

R
virtio: fix obsolete documentation on probe function · 20f77f56
由 Rusty Russell 提交于 6月 12, 2009
```
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
```
20f77f56

module: trim exception table on init free. · ad6561df

由 Rusty Russell 提交于 6月 12, 2009

It's theoretically possible that there are exception table entries
which point into the (freed) init text of modules.  These could cause
future problems if other modules get loaded into that memory and cause
an exception as we'd see the wrong fixup.  The only case I know of is
kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).

Amerigo fixed this long-standing FIXME in the x86 version, but this
patch is more general.

This implements trim_init_extable(); most archs are simple since they
use the standard lib/extable.c sort code.  Alpha and IA64 use relative
addresses in their fixups, so thier trimming is a slight variation.

Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
yet it defines its own sort_extable() which overrides the one in lib.
It doesn't sort, so we have to mark deleted entries instead of
actually trimming them.
Inspired-by: NAmerigo Wang <amwang@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: linux-alpha@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-ia64@vger.kernel.org

ad6561df

module_param: allow 'bool' module_params to be bool, not just int. · fddd5201

由 Rusty Russell 提交于 6月 12, 2009

Impact: API cleanup

For historical reasons, 'bool' parameters must be an int, not a bool.
But there are around 600 users, so a conversion seems like useless churn.

So we use __same_type() to distinguish, and handle both cases.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

fddd5201

module_param: add __same_type convenience wrapper for __builtin_types_compatible_p · d2c123c2

由 Rusty Russell 提交于 6月 12, 2009

Impact: new API

__builtin_types_compatible_p() is a little awkward to use: it takes two
types rather than types or variables, and it's just damn long.

(typeof(type) == type, so this works on types as well as vars).
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

d2c123c2

module_param: split perm field into flags and perm · 45fcc70c

由 Rusty Russell 提交于 6月 12, 2009

Impact: cleanup

Rather than hack KPARAM_KMALLOCED into the perm field, separate it out.
Since the perm field was 32 bits and only needs 16, we don't add bloat.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

45fcc70c

module_param: invbool should take a 'bool', not an 'int' · 9a71af2c

由 Rusty Russell 提交于 6月 12, 2009

It takes an 'int' for historical reasons, and there are only two
users: simply switch it over to bool.

The other user (uvesafb.c) will get a (harmless-on-x86) warning until
the next patch is applied.

Cc: Brad Douglas <brad@neruo.com>
Cc: Michal Januszewski <spock@gentoo.org>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

9a71af2c

fs/qnx4: sanitize includes · 964f5369

由 Al Viro 提交于 6月 07, 2009

fs-internal parts of qnx4_fs.h taken to fs/qnx4/qnx4.h, includes adjusted,
qnx4_fs.h doesn't need unifdef anymore.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

964f5369

Sanitize qnx4 fsync handling · 79d25767

由 Al Viro 提交于 6月 07, 2009

* have directory operations use mark_buffer_dirty_inode(),
  so that sync_mapping_buffers() would get those.
* make qnx4_write_inode() honour its last argument.
* get rid of insane copies of very ancient "walk the indirect blocks"
  in qnx4/fsync - they never matched the actual fs layout and, fortunately,
  never'd been called.  Again, all this junk is not needed; ->fsync()
  should just do sync_mapping_buffers + sync_inode (and if we implement
  block allocation for qnx4, we'll need to use mark_buffer_dirty_inode()
  for extent blocks)
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

79d25767

New helper - simple_fsync() · d5aacad5

由 Al Viro 提交于 6月 07, 2009

writes associated buffers, then does sync_inode() to write
the inode itself (and to make it clean).  Depends on
->write_inode() honouring the second argument.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d5aacad5

linux/magic.h: move cramfs magic out of cramfs_fs.h · 8688b863

由 Mike Frysinger 提交于 5月 26, 2009

Signed-off-by: NMike Frysinger <vapier@gentoo.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

8688b863

fs: Rearrange inode structure elements to avoid waste due to padding · 28ad0c11

由 Theodore Ts'o 提交于 5月 21, 2009

Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

28ad0c11

fs: Remove i_cindex from struct inode · 9fd5746f

由 Theodore Ts'o 提交于 5月 21, 2009

The only user of the i_cindex element in the inode structure is used
is by the firewire drivers.  As part of an attempt to slim down the
inode structure to save memory --- since a typical Linux system will
have hundreds of thousands if not millions of inodes cached, a
reduction in the size inode has high leverage.

The firewire driver does not need i_cindex in any fast path, so it's
simple enough to calculate when it is needed, instead of wasting space
in the inode structure.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: krh@redhat.com
Cc: stefanr@s5r6.in-berlin.de
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9fd5746f

Trim a bit of crap from fs.h · 62c6943b

由 Al Viro 提交于 5月 07, 2009

do_remount_sb() is fs/internal.h fodder, fsync_no_super() is long gone.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

62c6943b

dcache: extrace and use d_unlinked() · f3da392e

由 Alexey Dobriyan 提交于 5月 04, 2009

d_unlinked() will be used in middle-term to ban checkpointing when opened
but unlinked file is detected, and in long term, to detect such situation
and special case on it.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f3da392e

quota: Introduce writeout_quota_sb() (version 4) · c3f8a40c

由 Jan Kara 提交于 4月 27, 2009

Introduce this function which just writes all the quota structures but
avoids all the syncing and cache pruning work to expose quota structures
to userspace. Use this function from __sync_filesystem when wait == 0.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c3f8a40c

quota: cleanup dquota sync functions (version 4) · 850b201b

由 Christoph Hellwig 提交于 4月 27, 2009

Currently the VFS calls vfs_dq_sync to sync out disk quotas for a given
superblock.  This is a small wrapper around sync_dquots which for the
case of a non-NULL superblock is a small wrapper around quota_sync_sb.

Just make quota_sync_sb global (rename it to sync_quota_sb) and call it
directly.  Also call it directly for those cases in quota.c that have a
superblock and leave sync_dquots purely an iterator over sync_quota_sb and
remove it's superblock argument.

To make this nicer move the check for the lack of a quota_sync method
from the callers into sync_quota_sb.

[folded build fix from Alexander Beregalov <a.beregalov@gmail.com>]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

850b201b

vfs: Rename fsync_super() to sync_filesystem() (version 4) · 60b0680f

由 Jan Kara 提交于 4月 27, 2009

Rename the function so that it better describe what it really does. Also
remove the unnecessary include of buffer_head.h.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

60b0680f

vfs: Move syncing code from super.c to sync.c (version 4) · c15c54f5

由 Jan Kara 提交于 4月 27, 2009

Move sync_filesystems(), __fsync_super(), fsync_super() from
super.c to sync.c where it fits better.

[build fixes folded]
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c15c54f5

vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815

由 Jan Kara 提交于 4月 27, 2009

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

Nice bonus is that we get a complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.

sync_blockdevs() introduced a couple of patches ago is gone now.

[build fixes folded]
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5cee5815

vfs: Make __fsync_super() a static function (version 4) · 429479f0

由 Jan Kara 提交于 4月 27, 2009

__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
unnecessarily duplicated call to sync_blockdev() and prepares ground
for the changes to __fsync_super() in the following patches.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

429479f0

remove s_async_list · 876a9f76

由 Christoph Hellwig 提交于 4月 28, 2009

Remove the unused s_async_list in the superblock, a leftover of the
broken async inode deletion code that leaked into mainline.  Having this
in the middle of the sync/unmount path is not helpful for the following
cleanups.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

876a9f76

fs: introduce mnt_clone_write · 96029c4e

由 npiggin@suse.de 提交于 4月 26, 2009

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
 avg = 462.286
 std = 5.46106

After:
 avg = 453.12
 std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

96029c4e

fs: mnt_want_write speedup · d3ef3d73

由 npiggin@suse.de 提交于 4月 26, 2009

This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
A microbenchmark yes, but it exercises some important paths in the mm.

Before:
 avg = 501.9
 std = 14.7773

After:
 avg = 462.286
 std = 5.46106

(50 runs of each, stddev gives a reasonable confidence, but there is quite
a bit of variation there still)

It does this by removing the complex per-cpu locking and counter-cache and
replaces it with a percpu counter in struct vfsmount. This makes the code
much simpler, and avoids spinlocks (although the msync is still pretty
costly, unfortunately). It results in about 900 bytes smaller code too. It
does increase the size of a vfsmount, however.

It should also give a speedup on large systems if CPUs are frequently operating
on different mounts (because the existing scheme has to operate on an atomic in
the struct vfsmount when switching between mounts). But I'm most interested in
the single threaded path performance for the moment.

[AV: minor cleanup]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d3ef3d73

A
Move junk from proc_fs.h to fs/proc/internal.h · 3174c21b
由 Al Viro 提交于 4月 07, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
3174c21b
A
switch lookup_mnt() · 1c755af4
由 Al Viro 提交于 4月 18, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
1c755af4
A
switch follow_down() · 9393bd07
由 Al Viro 提交于 4月 18, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
9393bd07
A
Switch collect_mounts() to struct path · 589ff870
由 Al Viro 提交于 4月 18, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
589ff870
A
switch follow_up() to struct path · bab77ebf
由 Al Viro 提交于 4月 18, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
bab77ebf
A
switch rqst_exp_parent() · e64c390c
由 Al Viro 提交于 4月 18, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
e64c390c
A
switch rqst_exp_get_by_name() · 91c9fa8f
由 Al Viro 提交于 4月 18, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
91c9fa8f

Cache root in nameidata · 2a737871

由 Al Viro 提交于 4月 07, 2009

New field: nd->root. When pathname resolution wants to know the root,
check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise
copy current->fs->root there. After path_walk() is finished, we check
if we'd got a cached value in nd->root and drop it. Before calling
path_walk() we should either set nd->root.mnt to NULL *or* copy (and
pin down) some path to nd->root. In the latter case we won't be
looking at current->fs->root at all.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2a737871

reiserfs: allow exposing privroot w/ xattrs enabled · 73422811

由 Jeff Mahoney 提交于 5月 10, 2009

This patch adds an -oexpose_privroot option to allow access to the privroot.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

73422811

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功