提交 · fcf634098c00dd9cd247447368495f0b79be12d1 · openeuler / Kernel

01 11月, 2011 1 次提交

由 Christopher Yeoh 提交于 10月 31, 2011

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fcf63409

28 10月, 2011 2 次提交

vfs: add generic_file_llseek_size · 5760495a

由 Andi Kleen 提交于 9月 15, 2011

Add a generic_file_llseek variant to the VFS that allows passing in
the maximum file size of the file system, instead of always
using maxbytes from the superblock.

This can be used to eliminate some cut'n'paste seek code in ext4.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

5760495a

vfs: do (nearly) lockless generic_file_llseek · ef3d0fd2

由 Andi Kleen 提交于 9月 15, 2011

The i_mutex lock use of generic _file_llseek hurts.  Independent processes
accessing the same file synchronize over a single lock, even though
they have no need for synchronization at all.

Under high utilization this can cause llseek to scale very poorly on larger
systems.

This patch does some rethinking of the llseek locking model:

First the 64bit f_pos is not necessarily atomic without locks
on 32bit systems. This can already cause races with read() today.
This was discussed on linux-kernel in the past and deemed acceptable.
The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking.
If there's a race one writer wins, the other loses.

For 32bit the non atomic update races against read()
stay the same. Without a lock they can also happen
against write() now.  The read() race was deemed
acceptable in past discussions, and I think if it's
ok for read it's ok for write too.

=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads
the maximum size too. Reading the maximum size would have the
32bit atomic problem. But luckily we already have a way to read
the maximum size without locking (i_size_read), so we
can just use that instead.

Without i_mutex there is no synchronization with write() anymore,
however since the write() update is atomic on 64bit it just behaves
like another racy SEEK_SET.  On non atomic 32bit it's the same
as SEEK_SET.

=> Don't need a lock, but need to use i_size_read()

SEEK_CUR: This has a read-modify-write race window
on the same file. One could argue that any application
doing unsynchronized seeks on the same file is already broken.
But for the sake of not adding a regression here I'm
using the file->f_lock to synchronize this. Using this
lock is much better than the inode mutex because it doesn't
synchronize between processes.

=> So still need a lock, but can use a f_lock.

This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ef3d0fd2

27 7月, 2011 1 次提交

fs: add missing unlock in default_llseek() · bacb2d81

由 Dan Carpenter 提交于 7月 26, 2011

A recent change in linux-next, 982d8165 "fs: add SEEK_HOLE and
SEEK_DATA flags" added some direct returns on error, but it should
have been a goto out.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bacb2d81

21 7月, 2011 1 次提交

fs: add SEEK_HOLE and SEEK_DATA flags · 982d8165

由 Josef Bacik 提交于 7月 18, 2011

This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp cause more problems than it solves, so lets try
and give userspace an interface that doesn't suck.  We need to match solaris
here, and the definitions are

*o* If /whence/ is SEEK_HOLE, the offset of the start of the
next hole greater than or equal to the supplied offset
is returned. The definition of a hole is provided near
the end of the DESCRIPTION.

*o* If /whence/ is SEEK_DATA, the file pointer is set to the
start of the next non-hole file region greater than or
equal to the supplied offset.

So in the generic case the entire file is data and there is a virtual hole at
the end.  That means we will just return i_size for SEEK_HOLE and will return
the same offset for SEEK_DATA.  This is how Solaris does it so we have to do it
the same way.

Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

982d8165

13 1月, 2011 1 次提交
- A
  fix signedness mess in rw_verify_area() on 64bit architectures · cccb5a1e
  由 Al Viro 提交于 12月 17, 2010
```
... and clean the unsigned-f_pos code, while we are at it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  cccb5a1e
18 11月, 2010 1 次提交

BKL: remove extraneous #include <smp_lock.h> · 451a3c24

由 Arnd Bergmann 提交于 11月 17, 2010

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

451a3c24

30 10月, 2010 1 次提交

readv/writev: do the same MAX_RW_COUNT truncation that read/write does · 435f49a5

由 Linus Torvalds 提交于 10月 29, 2010

We used to protect against overflow, but rather than return an error, do
what read/write does, namely to limit the total size to MAX_RW_COUNT.
This is not only more consistent, but it also means that any broken
low-level read/write routine that still keeps counts in 'int' can't
break.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

435f49a5

26 10月, 2010 1 次提交

vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos · 4a3956c7

由 KAMEZAWA Hiroyuki 提交于 10月 01, 2010

Now, rw_verify_area() checsk f_pos is negative or not.  And if negative,
returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..  has
negative offsets.  And we can't do any access via read/write to the
file(device).

So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4a3956c7

15 10月, 2010 2 次提交

vfs: make no_llseek the default · 776c163b

由 Arnd Bergmann 提交于 7月 07, 2010

All file operations now have an explicit .llseek
operation pointer, so we can change the default
action for future code.

This makes changes the default from default_llseek
to no_llseek, which always returns -ESPIPE if
a user tries to seek on a file without a .llseek
operation.

The name of the default_llseek function remains
unchanged, if anyone thinks we should change it,
please speak up.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org

776c163b

vfs: don't use BKL in default_llseek · ab91261f

由 Arnd Bergmann 提交于 7月 07, 2010

There are currently 191 users of default_llseek.
Nine of these are in device drivers that use the
big kernel lock. None of these ever touch
file->f_pos outside of llseek or file_pos_write.

Consequently, we never rely on the BKL
in the default_llseek function and can
replace that with i_mutex, which is also
used in generic_file_llseek.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

ab91261f

28 7月, 2010 1 次提交

fsnotify: pass a file instead of an inode to open, read, and write · 2a12a9d7

由 Eric Paris 提交于 12月 17, 2009

fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that already was passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.
Signed-off-by: NEric Paris <eparis@redhat.com>

2a12a9d7

28 5月, 2010 1 次提交

vfs: introduce noop_llseek() · ae6afc3f

由 jan Blunck 提交于 5月 26, 2010

This is an implementation of ->llseek useable for the rare special case
when userspace expects the seek to succeed but the (device) file is
actually not able to perform the seek.  In this case you use noop_llseek()
instead of falling back to the default implementation of ->llseek.
Signed-off-by: NJan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ae6afc3f

25 3月, 2010 1 次提交

do_sync_read/write() should set kiocb.ki_nbytes to be consistent · 61964eba

由 David Howells 提交于 3月 24, 2010

do_sync_read/write() should set kiocb.ki_nbytes to be consistent with
do_sync_readv_writev().
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

61964eba

04 11月, 2009 1 次提交

sendfile(): check f_op.splice_write() rather than f_op.sendpage() · cc56f7de

由 Changli Gao 提交于 11月 04, 2009

sendfile(2) was reworked with the splice infrastructure, but it still
checks f_op.sendpage() instead of f_op.splice_write() wrongly. Although
if f_op.sendpage() exists, f_op.splice_write() always exists at the same
time currently, the assumption will be broken in future silently. This
patch also brings a side effect: sendfile(2) can work with any output
file. Some security checks related to f_op are added too.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

cc56f7de

24 9月, 2009 1 次提交

vfs: remove redundant position check in do_sendfile · f9098980

由 Jeff Layton 提交于 9月 18, 2009

As Johannes Weiner pointed out, one of the range checks in do_sendfile
is redundant and is already checked in rw_verify_area.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f9098980

11 5月, 2009 1 次提交

splice: implement default splice_read method · 6818173b

由 Miklos Szeredi 提交于 5月 07, 2009

If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.

This will allow splice and functions using splice, such as the loop
device, to work on all filesystems.  This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

6818173b

05 4月, 2009 1 次提交

Make non-compat preadv/pwritev use native register size · 601cc11d

由 Linus Torvalds 提交于 4月 03, 2009

Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.

This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).

This also changes the order of 'high' and 'low' to be "low first".  Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.

This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code.  On x86-64, we now
generate

        testq   %rcx, %rcx      # pos_l
        js      .L122   #,
        movq    %rcx, -48(%rbp) # pos_l, pos

from the C source

        loff_t pos = pos_from_hilo(pos_h, pos_l);
	...
        if (pos < 0)
                return -EINVAL;

and the 'pos_h' register isn't even touched.  It used to generate code
like

        mov     %r8d, %r8d      # pos_low, pos_low
        salq    $32, %rcx       #, tmp71
        movq    %r8, %rax       # pos_low, pos.386
        orq     %rcx, %rax      # tmp71, pos.386
        js      .L122   #,
        movq    %rax, -48(%rbp) # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)

Note: in all cases the user code wrapper can again be the same. You can
just do

	#define HALF_BITS (sizeof(unsigned long)*4)
	__syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);

or something like that.  That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.

And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone.  Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.

[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]
Acked-by: NGerd Hoffmann <kraxel@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

601cc11d

03 4月, 2009 1 次提交

preadv/pwritev: Add preadv and pwritev system calls. · f3554f4b

由 Gerd Hoffmann 提交于 4月 02, 2009

This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible to the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: On 32bit archs is the (64bit)
offset argument unaligned, which the syscall ABI of several archs doesn't
allow to do.  At least s390 needs a wrapper in glibc to handle this.  As
we'll need a wrappers in glibc anyway I've decided to push problem to
glibc entriely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: The offset argument is
explicitly splitted into two 32bit values.

The patch sports the actual system call implementation and the windup in
the x86 system call tables.  Other archs follow as separate patches.
Signed-off-by: NGerd Hoffmann <kraxel@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3554f4b

14 1月, 2009 5 次提交

H
[CVE-2009-0029] System call wrappers part 20 · 3cdad428
由 Heiko Carstens 提交于 1月 14, 2009
```
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
3cdad428
H
[CVE-2009-0029] System call wrappers part 19 · 003d7ab4
由 Heiko Carstens 提交于 1月 14, 2009
```
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
003d7ab4
H
[CVE-2009-0029] System call wrappers part 16 · 002c8976
由 Heiko Carstens 提交于 1月 14, 2009
```
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
002c8976

[CVE-2009-0029] System call wrapper special cases · 6673e0c3

由 Heiko Carstens 提交于 1月 14, 2009

System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.

So we handle them as special case and add some extra wrappers instead.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

6673e0c3

[CVE-2009-0029] Convert all system calls to return a long · 2ed7c03e

由 Heiko Carstens 提交于 1月 14, 2009

Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

2ed7c03e

06 1月, 2009 1 次提交

vfs: lseek(fd, 0, SEEK_CUR) race condition · 5b6f1eb9

由 Alain Knaff 提交于 11月 10, 2008

This patch fixes a race condition in lseek. While it is expected that
unpredictable behaviour may result while repositioning the offset of a
file descriptor concurrently with reading/writing to the same file
descriptor, this should not happen when merely *reading* the file
descriptor's offset.

Unfortunately, the only portable way in Unix to read a file
descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
concurrently with read/write may mess up the position.

[with fixes from akpm]
Signed-off-by: NAlain Knaff <alain@knaff.lu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5b6f1eb9

23 10月, 2008 1 次提交

[PATCH] generic_file_llseek tidyups · 3a8cff4f

由 Christoph Hellwig 提交于 8月 11, 2008

Add kerneldoc for generic_file_llseek and generic_file_llseek_unlocked,
use sane variable names and unclutter the code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3a8cff4f

03 7月, 2008 1 次提交

Remove BKL from remote_llseek v2 · 9465efc9

由 Andi Kleen 提交于 6月 27, 2008

- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around. I changed the file systems who don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitely in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Open problem: 32bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyways because there's other paths who
modify it without BKL.

Do we need a special lock for the pos/f_version = 0 checks?

Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.

v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
    and factor duplicated code (suggested by hch)

Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

9465efc9

23 4月, 2008 1 次提交

fs: use loff_t type instead of long long · 16abef0e

由 David Sterba 提交于 4月 22, 2008

Use offset type consistently.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

16abef0e

09 2月, 2008 1 次提交

remove the unused exports of sys_open/sys_read · 3287629e

由 Arjan van de Ven 提交于 2月 08, 2008

These exports (which aren't used and which are in fact dangerous to use
because they pretty much form a security hole to use) have been marked
_UNUSED since 2.6.24 with removal in 2.6.25.  This patch is their final
departure from the Linux kernel tree.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3287629e

29 1月, 2008 1 次提交

ext4: export iov_shorten from kernel for ext4's use · 19295529

由 Eric Sandeen 提交于 1月 28, 2008

Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>

19295529

25 1月, 2008 1 次提交

security: call security_file_permission from rw_verify_area · c43e259c

由 James Morris 提交于 1月 12, 2008

All instances of rw_verify_area() are followed by a call to
security_file_permission(), so just call the latter from the former.
Acked-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

c43e259c

15 11月, 2007 1 次提交

mark sys_open/sys_read exports unused · cb51f973

由 Arjan van de Ven 提交于 11月 14, 2007

sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers.  Since 2.0 or so this was deprecated behavior, but
several drivers still were using this.  Since a few years we have a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time....  however with commit c2b1239a the
last user is now gone.

This is a good thing, since using sys_open / sys_read etc for firmware is a
very buggy to dangerous thing to do; these operations put an fd in the
process file descriptor table....  which then can be tampered with from
other threads for example.  For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.

The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb51f973

10 10月, 2007 1 次提交

Cleanup macros for distinguishing mandatory locks · a16877ca

由 Pavel Emelyanov 提交于 10月 01, 2007

The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode).  However, fs/locks.c and some filesystems still perform
the explicit i_mode checking.  Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.

Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.

The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

a16877ca

10 7月, 2007 3 次提交

Remove remnants of sendfile() · d96e6e71

由 Jens Axboe 提交于 6月 11, 2007

There are now zero users of .sendfile() in the kernel, so kill
it from the file_operations structure and in do_sendfile().
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d96e6e71

splice: divorce the splice structure/function definitions from the pipe header · d6b29d7c

由 Jens Axboe 提交于 6月 04, 2007

We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d6b29d7c

sys_sendfile: switch to using ->splice_read, if available · 534f2aaa

由 Jens Axboe 提交于 6月 01, 2007

This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

534f2aaa

09 5月, 2007 2 次提交

use use SEEK_MAX to validate user lseek arguments · 1ae7075b

由 Chris Snook 提交于 5月 08, 2007

Add SEEK_MAX and use it to validate lseek arguments from userspace.
Signed-off-by: NChris Snook <csnook@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ae7075b

use symbolic constants in generic lseek code · 7b8e8924

由 Chris Snook 提交于 5月 08, 2007

Convert magic numbers to SEEK_* values from fs.h
Signed-off-by: NChris Snook <csnook@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7b8e8924

13 2月, 2007 1 次提交

[PATCH] FS: speed up rw_verify_area() · 163da958

由 Eric Dumazet 提交于 2月 12, 2007

oprofile hunting showed a stall in rw_verify_area(), because of triple
indirection and potential cache misses.
(file->f_path.dentry->d_inode->i_flock)

By moving initialization of 'struct inode' pointer before the pos/count
sanity tests, we allow the compiler and processor to perform two loads by
anticipation, reducing stall, without prefetch() hints.  Even x86 arch has
enough registers to not use temporary variables and not increase text size.

I validated this patch running a bench and studied oprofile changes, and
absolute perf of the test program.

Results of my epoll_pipe_bench (source available on request) on a Pentium-M
1.6 GHz machine

Before :
# ./epoll_pipe_bench -l 30 -t 20
Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples
per call
(best value out of 10 runs)

After :
# ./epoll_pipe_bench -l 30 -t 20
Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples
per call
(best value out of 10 runs)

oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 %
for the rw_verify_area() function.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

163da958

12 2月, 2007 1 次提交

[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct · 4b98d11b

由 Alexey Dobriyan 提交于 2月 10, 2007

They are fat: 4x8 bytes in task_struct.
They are uncoditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".

And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b98d11b

openeuler / Kernel 11 个月 前同步成功

openeuler / Kernel
11 个月前同步成功