Commits · 92e1d5be91a0e3ffa5c4697eeb09b2aa22792122 · openanolis / cloud-kernel

13 Feb, 2007 33 commits

[PATCH] mark struct inode_operations const 2 · 92e1d5be

Authored by Arjan van de Ven on Feb 12, 2007

Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

92e1d5be
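
The effect of the const qualifier described above can be sketched in user space. This is a minimal illustration, not kernel code: struct demo_inode_ops, demo_getattr() and demo_call_getattr() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a table of filesystem callbacks. */
struct demo_inode_ops {
    int (*getattr)(int ino);
};

static int demo_getattr(int ino)
{
    return ino * 2;
}

/*
 * Because the table is const, the toolchain may place it in a read-only
 * section, and any assignment to one of its members is rejected at
 * compile time.
 */
static const struct demo_inode_ops demo_ops = {
    .getattr = demo_getattr,
};

int demo_call_getattr(int ino)
{
    assert(demo_ops.getattr != NULL);
    /* demo_ops.getattr = NULL;   <- would not compile: demo_ops is const */
    return demo_ops.getattr(ino);
}
```

Callers only ever read the table through the const object, which is the property the patch relies on.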

[PATCH] mark struct inode_operations const 1 · 754661f1

Authored by Arjan van de Ven on Feb 12, 2007

Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

754661f1

[PATCH] mark struct file_operations const 6 · 00977a59

Authored by Arjan van de Ven on Feb 12, 2007

Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

00977a59

[PATCH] ufs2 write: block allocation update · 54fb996a

Authored by Evgeniy Dushistov on Feb 12, 2007

This patch adds the ability to work with 64-bit metadata by replacing direct
work with 32-bit pointers with inline functions.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

54fb996a
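
The idea of hiding the pointer width behind inline functions can be sketched as follows. This is an assumption-laden illustration rather than the actual UFS code: enum demo_fs_type, demo_read_block_ptr() and demo_block_ptr_size() are hypothetical names, and on-disk byte order is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical filesystem flavours; only the pointer width differs here. */
enum demo_fs_type { DEMO_UFS1, DEMO_UFS2 };

/* Size in bytes of one on-disk block pointer for the given flavour. */
static inline size_t demo_block_ptr_size(enum demo_fs_type type)
{
    return type == DEMO_UFS2 ? sizeof(uint64_t) : sizeof(uint32_t);
}

/*
 * Read one block pointer from raw metadata and widen it to 64 bits, so
 * callers never touch a 32-bit pointer type directly.
 * Byte-order conversion is omitted in this sketch.
 */
static inline uint64_t demo_read_block_ptr(enum demo_fs_type type, const void *p)
{
    assert(p != NULL);
    if (type == DEMO_UFS2) {
        uint64_t v64;
        memcpy(&v64, p, sizeof(v64));
        return v64;
    }
    uint32_t v32;
    memcpy(&v32, p, sizeof(v32));
    return v32;
}
```

With accessors like these, the allocation code can be written once against a 64-bit value and still serve both layouts.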

[PATCH] ufs2 write: inodes write · 3313e292

Authored by Evgeniy Dushistov on Feb 12, 2007

This patch adds a function to the write-inode path to write UFS2 inodes, and
modifies the allocate-inode path to allocate and init additional inode chunks.

Also some cleanups:
- remove unused parameters in some functions
- remove the i_gen field from the ufs_inode_info structure;
there is i_generation in the inode structure with the same purpose.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3313e292

[PATCH] ufs2 write: mount as rw · cbcae39f

Authored by Evgeniy Dushistov on Feb 12, 2007

This series of patches adds UFS2 write support.  UFS2 is the default file
system for recent versions of FreeBSD.

The main differences from UFS1, from the write-support point of view,
are:
1) Not all inodes are allocated when the disk is formatted.
2) All metadata (pointers to data blocks) is 64-bit (in UFS1 it
is 32-bit).

So the patch series consists of:
1) making it possible to mount UFS2 in read-write mode
2) code to write UFS2 inodes and code to initialize inode chunks
3) work with 64-bit metadata

I did simple testing such as creating/deleting/writing/reading/truncating; I
also ran fsx-linux and untarred and built a kernel on UFS1 and UFS2. After that,
FreeBSD fsck did not find any errors in the fs.

This patch makes it possible to mount ufs2 "rw", and updates the UFS2
documentation: it removes the note about the bug (fixed by the "reallocate
blocks on the fly" patch) and adds me to the list of people who want to
receive bug reports.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cbcae39f

[PATCH] eCryptfs: add flush_dcache_page() calls · 0a9ac382

Authored by Michael Halcrow on Feb 12, 2007

Call flush_dcache_page() after modifying a pagecache page by hand.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0a9ac382

[PATCH] eCryptfs: open-code flag checking and manipulation · e2bd99ec

Authored by Michael Halcrow on Feb 12, 2007

Open-code flag checking and manipulation.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Trevor Highland <tshighla@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e2bd99ec

[PATCH] eCryptfs: convert kmap() to kmap_atomic() · 9d8b8ce5

Authored by Michael Halcrow on Feb 12, 2007

Replace kmap() with kmap_atomic().  Reduce the amount of time that mappings
are held.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Trevor Highland <tshighla@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9d8b8ce5

[PATCH] eCryptfs: convert f_op->write() to vfs_write() · 70456600

Authored by Michael Halcrow on Feb 12, 2007

sys_write() takes a local copy of f_pos and writes that back
into the struct file. It does this so that two concurrent write()
callers don't make a mess of f_pos, and of the file contents.

ecryptfs should be calling vfs_write(). That way we also get the fsnotify
notifications, which ecryptfs presently appears to have subverted.

Convert direct calls to f_op->write() into calls to vfs_write().
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

70456600

[PATCH] eCryptfs: Encrypted passthrough · e77a56dd

Authored by Michael Halcrow on Feb 12, 2007

Provide an option to provide a view of the encrypted files such that the
metadata is always in the header of the files, regardless of whether the
metadata is actually in the header or in the extended attribute. This mode of
operation is useful for applications like incremental backup utilities that do
not preserve the extended attributes when directly accessing the lower files.

With this option enabled, the files under the eCryptfs mount point will be
read-only.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e77a56dd

[PATCH] eCryptfs: Generalize metadata read/write · dd2a3b7a

Authored by Michael Halcrow on Feb 12, 2007

Generalize the metadata reading and writing mechanisms, with two targets for
now: metadata in file header and metadata in the user.ecryptfs xattr of the
lower file.

[akpm@osdl.org: printk warning fix]
[bunk@stusta.de: make some needlessly global code static]
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dd2a3b7a

[PATCH] eCryptfs: xattr flags and mount options · 17398957

Authored by Michael Halcrow on Feb 12, 2007

This patch set introduces the ability to store cryptographic metadata in a
lower file extended attribute rather than the lower file header region.

This patch set implements two new mount options:

ecryptfs_xattr_metadata
- When set, newly created files will have their cryptographic
metadata stored in the extended attribute region of the file rather
than the header.

When storing the data in the file header, there is a minimum of 8KB
reserved for the header information for each file, making each file at
least 12KB in size. This can take up a lot of extra disk space if the user
creates a lot of small files. By storing the data in the extended
attribute, each file will occupy a minimum of only 4KB of space.

As the eCryptfs metadata set becomes larger with new features such as
multi-key associations, most popular filesystems will not be able to store
all of the information in the xattr region in some cases due to space
constraints. However, the majority of users will only ever associate one
key per file, so most users will be okay with storing their data in the
xattr region.

This option should be used with caution. I want to emphasize that the
xattr must be maintained under all circumstances, or the file will be
rendered permanently unrecoverable. The last thing I want is for a user to
forget to set an xattr flag in a backup utility, only to later discover
that their backups are worthless.

ecryptfs_encrypted_view
- When set, this option causes eCryptfs to present applications a
view of encrypted files as if the cryptographic metadata were
stored in the file header, whether the metadata is actually stored
in the header or in the extended attributes.

No matter what eCryptfs winds up doing in the lower filesystem, I want
to preserve a baseline format compatibility for the encrypted files. As of
right now, the metadata may be in the file header or in an xattr. There is
no reason why the metadata could not be put in a separate file in future
versions.

Without the compatibility mode, backup utilities would have to know to
back up the metadata file along with the files. The semantics of eCryptfs
have always been that the lower files are self-contained units of encrypted
data, and the only additional information required to decrypt any given
eCryptfs file is the key. That is what has always been emphasized about
eCryptfs lower files, and that is what users expect. Providing the
encrypted view option will provide a way to userspace applications wherein
they can always get to the same old familiar eCryptfs encrypted files,
regardless of what eCryptfs winds up doing with the metadata behind the
scenes.

This patch:

Add extended attribute support to version bit vector, flags to indicate when
xattr or encrypted view modes are enabled, and support for the new mount
options.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17398957
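
The space arithmetic quoted for the ecryptfs_xattr_metadata option (12KB minimum with an in-file header, 4KB minimum with the metadata in an xattr) can be checked with a small sketch. The 4KB data extent is an assumption inferred from those two figures, and all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_EXTENT_SIZE 4096u /* assumed size of the smallest data extent */
#define DEMO_HEADER_MIN  8192u /* minimum in-file header, per the text above */

enum demo_metadata_location { DEMO_META_IN_HEADER, DEMO_META_IN_XATTR };

/* Smallest possible lower-file size for a given metadata placement. */
size_t demo_min_lower_file_size(enum demo_metadata_location where)
{
    size_t header = (where == DEMO_META_IN_HEADER) ? DEMO_HEADER_MIN : 0;
    size_t total = header + DEMO_EXTENT_SIZE;

    assert(total >= DEMO_EXTENT_SIZE);
    return total;
}
```

For a tree of many small files, the per-file difference of 8192 bytes is what the option is meant to save.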

[PATCH] eCryptfs: Public key; packet management · dddfa461

Authored by Michael Halcrow on Feb 12, 2007

Public key support code.  This reads and writes packets in the header that
contain public key encrypted file keys.  It calls the messaging code in the
previous patch to send and receive encryption and decryption request
packets from the userspace daemon.

[akpm@osdl.org: cleab fix]
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dddfa461

[PATCH] eCryptfs: Public key transport mechanism · 88b4a07e

Authored by Michael Halcrow on Feb 12, 2007

This is the transport code for public key functionality in eCryptfs.  It
manages encryption/decryption request queues with a transport mechanism.
Currently, netlink is the only implemented transport.

Each inode has a unique File Encryption Key (FEK).  Under passphrase, a File
Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase
combo on mount.  This FEKEK encrypts each FEK and writes it into the header of
each file using the packet format specified in RFC 2440.  This is all
symmetric key encryption, so it can all be done via the kernel crypto API.

These new patches introduce public key encryption of the FEK.  There is no
asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes
the FEK encryption and decryption out to a userspace daemon.  After
considering our requirements and determining the complexity of using various
transport mechanisms, we settled on netlink for this communication.

eCryptfs stores authentication tokens into the kernel keyring.  These tokens
correlate with individual keys.  For passphrase mode of operation, the
authentication token contains the symmetric FEKEK.  For public key, the
authentication token contains a PKI type and an opaque data blob managed by
individual PKI modules in userspace.

Each user who opens a file under an eCryptfs partition mounted in public key
mode must be running a daemon.  That daemon has the user's credentials and has
access to all of the keys to which the user should have access.  The daemon,
when started, initializes the pluggable PKI modules available on the system
and registers itself with the eCryptfs kernel module.  Userspace utilities
register public key authentication tokens into the user session keyring.
These authentication tokens correlate key signatures with PKI modules and PKI
blobs.  The PKI blobs contain PKI-specific information necessary for the PKI
module to carry out asymmetric key encryption and decryption.

When the eCryptfs module parses the header of an existing file and finds a Tag
1 (Public Key) packet (see RFC 2440), it reads in the public key identifier
(signature).  The asymmetrically encrypted FEK is in the Tag 1 packet;
eCryptfs puts together a decrypt request packet containing the signature and
the encrypted FEK, then it passes it to the daemon registered for the
current->euid via a netlink unicast to the PID of the daemon, which was
registered at the time the daemon was started by the user.

The daemon actually just makes calls to libecryptfs, which implements request
packet parsing and manages PKI modules.  libecryptfs grabs the public key
authentication token for the given signature from the user session keyring.
This auth tok tells libecryptfs which PKI module should receive the request.
libecryptfs then makes a decrypt() call to the PKI module, and it passes along
the PKI blob from the auth tok.  The PKI uses the blob to figure out how it
should decrypt the data passed to it; it performs the decryption and passes
the decrypted data back to libecryptfs.  libecryptfs then puts together a
reply packet with the decrypted FEK and passes that back to the eCryptfs
module.

The eCryptfs module manages these request callouts to userspace code via
message context structs.  The module maintains an array of message context
structs and places the elements of the array on two lists: a free and an
allocated list.  When eCryptfs wants to make a request, it moves a msg ctx
from the free list to the allocated list, sets its state to pending, and fires
off the message to the user's registered daemon.

When eCryptfs receives a netlink message (via the callback), it correlates the
msg ctx struct in the alloc list with the data in the message itself.  The
msg->index contains the offset of the array of msg ctx structs.  It verifies
that the registered daemon PID is the same as the PID of the process that sent
the message.  It also validates a sequence number between the received packet
and the msg ctx.  Then, it copies the contents of the message (the reply
packet) into the msg ctx struct, sets the state in the msg ctx to done, and
wakes up the process that was sleeping while waiting for the reply.

The sleeping process was whatever was performing the sys_open().  This process
originally called ecryptfs_send_message(); it is now in
ecryptfs_wait_for_response().  When it wakes up and sees that the msg ctx
state was set to done, it returns a pointer to the message contents (the reply
packet) and returns.  If all went well, this packet contains the decrypted
FEK, which is then copied into the crypt_stat struct, and life continues as
normal.

The case for creation of a new file is very similar, only instead of a decrypt
request, eCryptfs sends out an encrypt request.

> - We have a great clod of key management code in-kernel.  Why is that
>   not suitable (or growable) for public key management?

eCryptfs uses Howells' keyring to store persistent key data and PKI state
information.  It defers public key cryptographic transformations to userspace
code.  The userspace data manipulation request really is orthogonal to key
management in and of itself.  What eCryptfs basically needs is a secure way to
communicate with a particular daemon for a particular task doing a syscall,
based on the UID.  Nothing running under another UID should be able to access
that channel of communication.

> - Is it appropriate that new infrastructure for public key
> management be private to a particular fs?

The messaging.c file contains a lot of code that, perhaps, could be extracted
into a separate kernel service.  In essence, this would be a sort of
request/reply mechanism that would involve a userspace daemon.  I am not aware
of anything that does quite what eCryptfs does, so I was not aware of any
existing tools to do just what we wanted.

>   What happens if one of these daemons exits without sending a quit
>   message?

There is a stale uid<->pid association in the hash table for that user.  When
the user registers a new daemon, eCryptfs cleans up the old association and
generates a new one.  See ecryptfs_process_helo().

> - _why_ does it use netlink?

Netlink provides the transport mechanism that would minimize the complexity of
the implementation, given that we can have multiple daemons (one per user).  I
explored the possibility of using relayfs, but that would involve having to
introduce control channels and a protocol for creating and tearing down
channels for the daemons.  We do not have to worry about any of that with
netlink.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

88b4a07e
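
The request/reply bookkeeping described in this commit message (message contexts, pending state, PID and sequence-number checks) can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: a state field stands in for the separate free and allocated lists, nothing actually sleeps or wakes up, and every identifier below is hypothetical rather than taken from the eCryptfs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NUM_CTX 4
#define DEMO_MSG_MAX 64

enum demo_state { DEMO_FREE, DEMO_PENDING, DEMO_DONE };

struct demo_msg_ctx {
    enum demo_state state;
    unsigned int seq;
    char reply[DEMO_MSG_MAX];
    size_t reply_len;
};

/* What arrives from the daemon: slot index, sequence number, sender, payload. */
struct demo_reply {
    unsigned int index;
    unsigned int seq;
    int sender_pid;
    const char *data;
    size_t len;
};

static struct demo_msg_ctx demo_ctx[DEMO_NUM_CTX];
static unsigned int demo_next_seq = 1;

/* Claim a free context and mark it pending; returns its index or -1. */
int demo_send_request(void)
{
    for (int i = 0; i < DEMO_NUM_CTX; i++) {
        if (demo_ctx[i].state == DEMO_FREE) {
            demo_ctx[i].state = DEMO_PENDING;
            demo_ctx[i].seq = demo_next_seq++;
            return i;
        }
    }
    return -1;
}

unsigned int demo_ctx_seq(int index)
{
    assert(index >= 0 && index < DEMO_NUM_CTX);
    return demo_ctx[index].seq;
}

int demo_is_done(int index)
{
    assert(index >= 0 && index < DEMO_NUM_CTX);
    return demo_ctx[index].state == DEMO_DONE;
}

/* Validate a reply and store it; returns 0 on success, -1 if rejected. */
int demo_process_reply(const struct demo_reply *r, int registered_pid)
{
    if (r->index >= DEMO_NUM_CTX)
        return -1;
    struct demo_msg_ctx *ctx = &demo_ctx[r->index];
    if (ctx->state != DEMO_PENDING)
        return -1;                 /* nobody is waiting on this slot */
    if (r->sender_pid != registered_pid)
        return -1;                 /* not the registered daemon */
    if (r->seq != ctx->seq)
        return -1;                 /* stale or forged reply */
    if (r->len > DEMO_MSG_MAX)
        return -1;
    memcpy(ctx->reply, r->data, r->len);
    ctx->reply_len = r->len;
    ctx->state = DEMO_DONE;        /* real code would wake the sleeper here */
    return 0;
}
```

A reply is accepted only when the slot is pending and both the sender PID and the sequence number match, mirroring the checks listed in the text.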

[PATCH] include/linux/nfsd/const.h: remove NFS_SUPER_MAGIC · b5d5dfbd

Authored by Adrian Bunk on Feb 12, 2007

NFS_SUPER_MAGIC is already defined in include/linux/magic.h
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b5d5dfbd

[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses · 27459f09

Authored by Chuck Lever on Feb 12, 2007

Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27459f09
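
The pattern of storing a sockaddr_storage and reaching the IPv4 view through an accessor can be sketched in user space as follows. struct demo_rqst and demo_addr_in() are hypothetical stand-ins for the svc_rqst field and the svc_addr_in() accessor named in the commit message; the real kernel definitions may differ.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical request structure: the address field is large enough
 * for any address family, not just IPv4. */
struct demo_rqst {
    struct sockaddr_storage rq_addr;
};

/* Accessor that hands callers the IPv4 view of the stored address,
 * so the cast lives in exactly one place. */
static inline struct sockaddr_in *demo_addr_in(struct demo_rqst *rqstp)
{
    assert(rqstp != NULL);
    return (struct sockaddr_in *)&rqstp->rq_addr;
}
```

Existing IPv4 call sites keep working through the accessor, while the field itself now has room for larger addresses.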

[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing · ad06e4bd

Authored by Chuck Lever on Feb 12, 2007

There are loads of places where the RPC server assumes that the rq_addr field
contains an IPv4 address.  Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ad06e4bd
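
A family-aware address formatter of the kind described here might look like the sketch below. It uses the POSIX inet_ntop() call rather than any kernel helper, and demo_format_addr() is a hypothetical name, not the function the patch adds.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Format the address held in `ss` into `buf`, choosing the IPv4 or IPv6
 * presentation based on the address family. Returns `buf` on success and
 * NULL if inet_ntop() fails (for example, if the buffer is too small).
 */
const char *demo_format_addr(const struct sockaddr_storage *ss,
                             char *buf, size_t len)
{
    assert(ss != NULL && buf != NULL && len > 0);

    switch (ss->ss_family) {
    case AF_INET:
        return inet_ntop(AF_INET,
                         &((const struct sockaddr_in *)ss)->sin_addr,
                         buf, (socklen_t)len);
    case AF_INET6:
        return inet_ntop(AF_INET6,
                         &((const struct sockaddr_in6 *)ss)->sin6_addr,
                         buf, (socklen_t)len);
    default:
        snprintf(buf, len, "unknown family %d", (int)ss->ss_family);
        return buf;
    }
}
```

Centralising the switch means debugging and error messages no longer need to assume an IPv4 layout.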

[PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper · 482fb94e

Authored by Chuck Lever on Feb 12, 2007

Sometimes we need to create an RPC service but not register it with the local
portmapper.  NFSv4 delegation callback, for example.

Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

482fb94e

[PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task · 41487c65

Authored by Eric W. Biederman on Feb 12, 2007

There isn't any real advantage to this change except that it allows the old
functions to be removed, which eases maintenance and puts the code in
a more uniform style.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

41487c65

[PATCH] tty: update the tty layer to work with struct pid · ab521dc0

Authored by Eric W. Biederman on Feb 12, 2007

Of the kernel subsystems that work with pids, the tty layer is probably the
largest consumer.  But it has the nice virtue that the association with a
session only lasts until the session leader exits, which means that no
reference counting is required.  So using struct pid winds up being a simple
optimization to avoid hash table lookups.

In the long term the use of pid_nr also ensures that when we have multiple pid
spaces mixed everything will work correctly.
Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ab521dc0

[PATCH] Minix V3 support · 939b00df

Authored by Andries Brouwer on Feb 12, 2007

This morning I needed to read a Minix V3 filesystem, but unfortunately my
2.6.19 did not support that, and neither did the downloaded 2.6.20rc4.

Fortunately, Google told me that Daniel Aragones had already done the work;
the patch can be found at http://www.terra.es/personal2/danarag/

Unfortunately, looking at the patch was painful to my eyes, so I polished it
a bit before applying.  The resulting kernel boots, and reads the
filesystem it needed to read.
Signed-off-by: Daniel Aragones <danarag@gmail.com>
Signed-off-by: Andries Brouwer <aeb@cwi.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

939b00df

[PATCH] FS: speed up rw_verify_area() · 163da958

Authored by Eric Dumazet on Feb 12, 2007

oprofile hunting showed a stall in rw_verify_area(), because of triple
indirection and potential cache misses.
(file->f_path.dentry->d_inode->i_flock)

By moving initialization of 'struct inode' pointer before the pos/count
sanity tests, we allow the compiler and processor to perform two loads by
anticipation, reducing stall, without prefetch() hints.  Even x86 arch has
enough registers to not use temporary variables and not increase text size.

I validated this patch running a bench and studied oprofile changes, and
absolute perf of the test program.

Results of my epoll_pipe_bench (source available on request) on a Pentium-M
1.6 GHz machine

Before:
# ./epoll_pipe_bench -l 30 -t 20
Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples per call
(best value out of 10 runs)

After:
# ./epoll_pipe_bench -l 30 -t 20
Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples per call
(best value out of 10 runs)

oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 %
for the rw_verify_area() function.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

163da958

[PATCH] warning fix: unsigned->signed · 3991d3bd

Authored by Tomasz Kvarsin on Feb 12, 2007

While compiling my code with -Wconversion using gcc-trunk, I always get a
bunch of warnings from headers; here is a fix for them:

__getblk is always called with an unsigned argument,
but it takes a signed one; the same story with __bread, __breadahead and so on.

Signed-off-by: Tomasz Kvarsin
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3991d3bd

[PATCH] reiserfs: Use ARRAY_SIZE macro when appropriate · 79a81aef

Authored by Ahmed S. Darwish on Feb 12, 2007

Use the ARRAY_SIZE macro already defined in kernel.h
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

79a81aef
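
For readers unfamiliar with the idiom, an element-count macro of this kind can be sketched as below. DEMO_ARRAY_SIZE is a simplified stand-in written for this example; the kernel's own ARRAY_SIZE may carry extra compile-time checks that are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified element-count macro; only valid for true arrays, not pointers. */
#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int demo_levels[] = { 1, 2, 4, 8, 16 };

/* Sum the table without hard-coding its length. */
int demo_sum_levels(void)
{
    int sum = 0;

    for (size_t i = 0; i < DEMO_ARRAY_SIZE(demo_levels); i++)
        sum += demo_levels[i];
    assert(sum > 0);
    return sum;
}
```

Using the macro keeps loop bounds correct when entries are later added to or removed from the table.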

[PATCH] inotify: read return val fix · f9e4acf3

Authored by Nick Piggin on Feb 12, 2007

Fix for inotify read bug (bugzilla.kernel.org #6999)

Problem Description:
When reading from an inotify device with an insufficiently sized buffer,
read(2) will return 0 with no errno set. This is caused by a logically
incorrect action from the user program, and thus should return a more logical
value. My suggestion is to return -EINVAL, as for bind(2).

This patch is based on the proposal from Ryan <wolf0403@hotmail.com>, and
feedback from John McCutchan <john@johnmccutchan.com>.

Return -EINVAL if we have not passed in enough buffer space to read a single
inotify event, rather than 0 which indicates that there is nothing to read.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: "John McCutchan" <john@johnmccutchan.com>
Cc: Ryan <wolf0403@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f9e4acf3
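
The return-value rule described above can be modelled with a small user-space function. This is a sketch of the semantics only, not the kernel implementation: demo_inotify_read() is a hypothetical name, and queued events are represented simply by their sizes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Model of the read semantics: copy whole events while they fit.
 * Returns the number of bytes "copied", 0 when no event is queued,
 * and -EINVAL when the buffer cannot hold even the first event.
 */
long demo_inotify_read(size_t buf_len, const size_t *event_sizes, size_t n_events)
{
    long copied = 0;

    assert(event_sizes != NULL || n_events == 0);
    for (size_t i = 0; i < n_events; i++) {
        if (event_sizes[i] > buf_len - (size_t)copied) {
            if (copied == 0)
                return -EINVAL; /* too small for a single event */
            break;              /* return what already fits */
        }
        copied += (long)event_sizes[i];
    }
    return copied;
}
```

With this rule, a return of 0 keeps its meaning of "nothing to read", and an undersized buffer is reported as a caller error.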

[PATCH] remove sb->s_files and file_list_lock usage in dquot.c · d003fb70

Authored by Christoph Hellwig on Feb 12, 2007

Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref.  This
reduces list search and lock hold time as well as getting rid of one of the
few uses of file_list_lock, which Ingo identified as a scalability problem.

Previously we called dq_op->initialize for every inode hanging off a
writeable file that wasn't initialized before.  Now we're calling it for
every inode that has a non-zero i_writecount, aka a writeable file
descriptor referring to it.

Thanks a lot to Jan Kara for running this patch through his quota test
harness.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d003fb70

[PATCH] move remove_dquot_ref to dquot.c · fb58b731

Authored by Christoph Hellwig on Feb 12, 2007

remove_dquot_ref can move to dquot.c instead of being in inode.c under
#ifdef CONFIG_QUOTA.  Also clean the resulting code up a tiny little bit by
testing sb->dq_op earlier - it's constant over a filesystem's lifetime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fb58b731

[PATCH] Fix d_path for lazy unmounts · eb3dfb0c

Authored by Andreas Gruenbacher on Feb 12, 2007

Here is a bugfix to d_path.

First, when d_path() hits a lazily unmounted mount point, it tries to
prepend the name of the lazily unmounted dentry to the path name. It gets
this wrong, and also overwrites the slash that separates the name from the
following pathname component. This is demonstrated by the attached test
case, which prints "getcwd returned d_path-bugsubdir" with the bug. The
correct result would be "getcwd returned d_path-bug/subdir".

It could be argued that the name of the root dentry should not be part of
the result of d_path in the first place. On the other hand, what the
unconnected namespace was once reachable as may provide some useful hints
to users, and so that seems okay.

Second, it isn't always possible to tell from the __d_path result whether
the specified root and rootmnt (i.e., the chroot) was reached: lazy
unmounts of bind mounts will produce a path that does start with a
non-slash so we can tell from that, but other lazy unmounts will produce a
path that starts with a slash, just like "ordinary" paths.

The attached patch cleans up __d_path() to fix the bug with overlapping
pathname components. It also adds a @fail_deleted argument, which makes it
possible to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock
can then also be moved into __d_path(). The patch also makes sure that
paths will only start with a slash for paths which are connected to the
root and rootmnt.

The @fail_deleted argument could be added to d_path() as well: this would
allow callers to recognize deleted files, without having to resort to the
ambiguous check for the " (deleted)" string at the end of the pathnames.
This is not currently done, but it might be worthwhile.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eb3dfb0c
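
The "d_path-bugsubdir" versus "d_path-bug/subdir" symptom is easiest to see in a toy version of right-to-left path building. The sketch below is not the kernel's __d_path(); demo_prepend() and demo_build_path() are hypothetical helpers that only show how the separator has to be prepended between components instead of being overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `name` in front of *end inside buf; returns -1 if it does not fit. */
static int demo_prepend(char *buf, char **end, const char *name)
{
    size_t len = strlen(name);

    if ((size_t)(*end - buf) < len)
        return -1;
    *end -= len;
    memcpy(*end, name, len);
    return 0;
}

/*
 * Build a path from components given leaf first, working backwards from
 * the end of the buffer. A '/' is placed between components; the outermost
 * component of this detached tree gets no leading slash.
 */
const char *demo_build_path(char *buf, size_t buflen,
                            const char *const *leaf_to_root, size_t n)
{
    assert(buf != NULL && buflen > 0);

    char *end = buf + buflen - 1;
    *end = '\0';
    for (size_t i = 0; i < n; i++) {
        if (demo_prepend(buf, &end, leaf_to_root[i]) != 0)
            return NULL;
        if (i + 1 < n && demo_prepend(buf, &end, "/") != 0)
            return NULL;
    }
    return end;
}
```

Dropping the separator step reproduces the fused "d_path-bugsubdir" form that the commit message reports.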

[PATCH] NTFS: rename incorrect check of NTFS_DEBUG with just DEBUG · 5c3bd438

Authored by Robert P. J. Day on Feb 12, 2007

Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with
just "#ifdef DEBUG".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5c3bd438

[PATCH] register_chrdev_region() don't hand out the LOCAL/EXPERIMENTAL majors · 215122e1

Authored by Andrew Morton on Feb 12, 2007

As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic
chardev major allocation can hand out majors which LANANA has defined as being
for local/experimental use.

Cc: Torben Mathiasen <device@lanana.org>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tomas Klas <tomas.klas@mepatek.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

215122e1
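
One way to picture the fix is a dynamic allocator that simply skips reserved majors while scanning for a free slot. The sketch below is hypothetical throughout: the scan direction, the table size, and the reserved ranges (60-63, 120-127, 240-254, as I understand the LANANA device list) are assumptions that should be checked against the kernel sources and Documentation/devices.txt before being relied on.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_MAJOR 255

static bool demo_major_in_use[DEMO_MAX_MAJOR + 1];

/* Assumed LOCAL/EXPERIMENTAL ranges; verify against the LANANA list. */
static bool demo_is_local_experimental(int major)
{
    assert(major >= 0 && major <= DEMO_MAX_MAJOR);
    return (major >= 60 && major <= 63) ||
           (major >= 120 && major <= 127) ||
           (major >= 240 && major <= 254);
}

/* Hand out the highest free major that is not reserved; -1 if none left. */
int demo_alloc_dynamic_major(void)
{
    for (int major = DEMO_MAX_MAJOR; major > 0; major--) {
        if (demo_major_in_use[major] || demo_is_local_experimental(major))
            continue;
        demo_major_in_use[major] = true;
        return major;
    }
    return -1;
}
```

Under these assumptions, successive allocations jump from 255 straight to 239, leaving the reserved block untouched for local use.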

[PATCH] Make XFS use BH_Unwritten and BH_Delay correctly · 6ab8eb1c

Authored by David Chinner on Feb 12, 2007

Don't hide buffer_unwritten behind buffer_delay() and remove the hack that
clears unexpected buffer_unwritten() states now that it can't happen.
Signed-off-by: Dave Chinner <dgc@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Timothy Shimmin <tes@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6ab8eb1c

[PATCH] Make BH_Unwritten a first class bufferhead flag V2 · 33a266dd

Authored by David Chinner on Feb 12, 2007

Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
bufferhead.  Recently, I found the long standing mmap/unwritten extent
conversion bug, and it was to do with partial page invalidation not clearing
the unwritten flag from bufferheads attached to the page but beyond EOF.  See
here for a full explanation:

http://oss.sgi.com/archives/xfs/2006-12/msg00196.html

The solution I have checked into the XFS dev tree involves duplicating code
from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
and then calling block_invalidatepage() to do the rest.

Christoph suggested that this would be better solved by pushing the unwritten
flag into the common buffer head flags and just adding the call to
discard_buffer():

http://oss.sgi.com/archives/xfs/2006-12/msg00239.html

The following patch makes BH_Unwritten a first class citizen.
Signed-off-by: Dave Chinner <dgc@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

33a266dd

12 Feb, 2007 7 commits

[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct · 4b98d11b

由 Alexey Dobriyan 提交于 2月 10, 2007

They are fat: 4x8 bytes in task_struct.
They are uncoditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".

And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4b98d11b
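
The idea in the task_struct commit above can be sketched in userspace as
follows. This is an assumption-laden model, not the patch itself:
MODEL_XACCT, struct model_task and the model_* helpers are hypothetical names
standing in for the kernel config option and its accessors. The four 8-byte
counters exist only when the feature is enabled, and small inline helpers keep
the call sites free of #ifdef clutter.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MODEL_XACCT
#define MODEL_XACCT 1	/* build with -DMODEL_XACCT=0 to drop the counters */
#endif

static_assert(sizeof(uint64_t) == 8, "each counter is 8 bytes");

struct model_task {
	int pid;
#if MODEL_XACCT
	uint64_t rchar, wchar, syscr, syscw;	/* 4 x 8 bytes */
#endif
};

#if MODEL_XACCT
static inline void model_add_rchar(struct model_task *t, uint64_t bytes)
{
	t->rchar += bytes;
}
static inline void model_inc_syscr(struct model_task *t)
{
	t->syscr++;
}
static inline uint64_t model_rchar(const struct model_task *t)
{
	return t->rchar;
}
#else
/* Feature disabled: the helpers compile away to nothing. */
static inline void model_add_rchar(struct model_task *t, uint64_t bytes)
{
	(void)t;
	(void)bytes;
}
static inline void model_inc_syscr(struct model_task *t)
{
	(void)t;
}
static inline uint64_t model_rchar(const struct model_task *t)
{
	(void)t;
	return 0;
}
#endif

/* A read path updates the counters without any #ifdef at the call site. */
static long model_read(struct model_task *t, long nbytes)
{
	if (nbytes > 0)
		model_add_rchar(t, (uint64_t)nbytes);
	model_inc_syscr(t);
	return nbytes;
}
```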

[PATCH] jbd layer function called instead of fs specific one · 3e4fdaf8

Committed by Dmitriy Monakhov on Feb 10, 2007

jbd function called instead of fs specific one.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3e4fdaf8

[PATCH] Remove unused kernel config option ZISOFS_FS · 730c385b

Committed by Robert P. J. Day on Feb 10, 2007

Remove the kernel config option ZISOFS_FS, since it appears that the actual
option is simply ZISOFS.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

730c385b

[PATCH] buffer: memorder fix · 72ed3d03

Committed by Nick Piggin on Feb 10, 2007

unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.

Mingming later sent the same patch, saying:

  We are running the SDET benchmark and saw a double-free issue for an ext3
  extended attributes block: the kernel complains that the same xattr block is
  already being freed (in ext3_xattr_release_block()).  The problem can also be
  triggered by multiple threads looping untar/rm of a kernel tree.

  The race is caused by a missing memory barrier in unlock_buffer() before the
  lock bit is cleared, resulting in a possible concurrent h_refcount update.
  That causes a reference counter leak, which later leads to the double free
  that we have seen.

  Inside unlock_buffer(), a memory barrier is placed *after* the lock bit is
  cleared, but there is no memory barrier *before* the bit is cleared.  On some
  architectures the h_refcount update instruction and the clear-bit instruction
  can be reordered, leaving the critical section open to re-entry.

  The race goes like this, for example with h_refcount initialized to 1:

  cpu 0                                    cpu 1
  --------------------------------------   --------------------------------------
  lock_buffer() /* test_and_set_bit */
  clear_buffer_locked(bh);
                                           lock_buffer() /* test_and_set_bit */
  h_refcount = h_refcount + 1; /* = 2 */   h_refcount = h_refcount + 1; /* = 2 */
                                           clear_buffer_locked(bh);
  ....                                     ....

  We lose an h_refcount update here.  We need a memory barrier before the
  buffer head lock bit is cleared to force the order of the two writes.  Please
  apply.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

72ed3d03
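
The ordering requirement described in the memorder commit above can be
sketched with C11 atomics. This is a userspace analogue under stated
assumptions, not the kernel's unlock_buffer(): struct model_buf and the
model_* functions are hypothetical. The lock bit is taken with acquire
ordering and cleared with release ordering, so stores made inside the critical
section cannot be reordered past the point where the bit becomes clear.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_LOCK_BIT 1UL

struct model_buf {
	atomic_ulong state;	/* bit 0 is the lock bit */
	int refcount;		/* protected by the lock bit */
};

static void model_init(struct model_buf *b)
{
	atomic_init(&b->state, 0);
	b->refcount = 1;
}

/* Spin until the bit was previously clear; acquire pairs with the release below. */
static void model_lock(struct model_buf *b)
{
	while (atomic_fetch_or_explicit(&b->state, MODEL_LOCK_BIT,
					memory_order_acquire) & MODEL_LOCK_BIT)
		;	/* busy-wait; a real buffer lock would sleep */
}

/*
 * Clear the lock bit with release ordering, so every store made inside the
 * critical section is visible before another CPU can observe the bit clear.
 * Using memory_order_relaxed here would model the bug described above.
 */
static void model_unlock(struct model_buf *b)
{
	unsigned long prev = atomic_fetch_and_explicit(&b->state, ~MODEL_LOCK_BIT,
						       memory_order_release);

	assert(prev & MODEL_LOCK_BIT);	/* unlocking an unlocked buffer is a bug */
	(void)prev;
}

/* Take a reference under the lock, as in the h_refcount example. */
static void model_get(struct model_buf *b)
{
	model_lock(b);
	b->refcount = b->refcount + 1;
	model_unlock(b);
}
```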

[PATCH] extend the set of "__attribute__" shortcut macros · 82ddcb04

Committed by Robert P. J. Day on Feb 10, 2007

Extend the set of "__attribute__" shortcut macros, and remove identical
(and now superfluous) definitions from a couple of source files.

Based on a page at Robert Love's blog:

	http://rlove.org/log/2005102601

Extend the set of shortcut macros defined in compiler-gcc.h with the
following:

#define __packed                       __attribute__((packed))
#define __weak                         __attribute__((weak))
#define __naked                        __attribute__((naked))
#define __noreturn                     __attribute__((noreturn))
#define __pure                         __attribute__((pure))
#define __aligned(x)                   __attribute__((aligned(x)))
#define __printf(a,b)                  __attribute__((format(printf,a,b)))

Once these are in place, it's up to subsystem maintainers to decide if they
want to take advantage of them.  There is already a strong precedent for
using shortcuts like this in the source tree.

The ones that might give people pause are "__aligned" and "__printf", but
shortcuts for both of those are already in use, and in some ways very
confusingly.  Note the two very different definitions for a macro named
"ALIGNED":

  drivers/net/sgiseeq.c:#define ALIGNED(x) ((((unsigned long)(x)) + 0xf) & ~(0xf))
  drivers/scsi/ultrastor.c:#define ALIGNED(x) __attribute__((aligned(x)))

Also:

  include/acpi/platform/acgcc.h:
    #define ACPI_PRINTF_LIKE(c) __attribute__ ((__format__ (__printf__, c, c+1)))

Given the precedent, then, it seems logical to at least standardize on a
consistent set of these macros.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

82ddcb04
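
As a rough illustration of how the shortcut macros from the commit above read
at a use site, here is a userspace sketch for GCC or Clang. The macro
spellings are the ones listed in the commit; everything else (struct wire_hdr,
scratch, wire_total(), fmt_into()) is hypothetical. The #ifndef guards are an
assumption for userspace builds, where a libc header may already define some
of these names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same spellings as the commit; guarded in case a libc header defines them. */
#ifndef __packed
#define __packed	__attribute__((packed))
#endif
#ifndef __aligned
#define __aligned(x)	__attribute__((aligned(x)))
#endif
#ifndef __printf
#define __printf(a, b)	__attribute__((format(printf, a, b)))
#endif
#ifndef __pure
#define __pure		__attribute__((pure))
#endif

/* Hypothetical on-wire header: no padding between the fields. */
struct wire_hdr {
	uint8_t  type;
	uint32_t len;
} __packed;

static_assert(sizeof(struct wire_hdr) == 5, "packed header has no padding");

/* Scratch buffer placed on a 16-byte boundary. */
static char scratch[64] __aligned(16);

/* The result depends only on the argument and the memory it points to. */
static __pure size_t wire_total(const struct wire_hdr *h)
{
	return sizeof(*h) + h->len;
}

/* The compiler checks the format string against the variadic arguments. */
static __printf(3, 4) int fmt_into(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return n;
}
```

With __printf(3, 4) in place, a call such as fmt_into(buf, size, "%d", "text")
should draw a format warning from the compiler when -Wformat is enabled.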

[PATCH] remove ext[34]_inc_count and _dec_count · 731b9a54

Committed by Eric Sandeen on Feb 10, 2007

- Naming is confusing: ext3_inc_count manipulates i_nlink, not i_count
- handle argument passed in is not used
- ext3 and ext4 already call inc_nlink and dec_nlink directly in other places
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

731b9a54

[PATCH] return ENOENT from ext3_link when racing with unlink · 2988a774

Committed by Eric Sandeen on Feb 10, 2007

Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is
0.  Doing otherwise has the potential to corrupt the orphan inode list,
because we'd wind up with an inode with a non-zero link count on the list,
and it will never get properly cleaned up & removed from the orphan list
before it is freed.

[akpm@osdl.org: build fix]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2988a774
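
The check described in the ext3_link commit above can be modelled in a few
lines. This is a simplified userspace sketch, not the ext3/ext4 code: struct
model_inode and model_link() are hypothetical, and locking, journalling and
directory handling are left out. It only shows the rule that a link attempt on
an inode whose link count has already dropped to zero is refused with -ENOENT
instead of raising the count again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_inode {
	unsigned int nlink;	/* link count, playing the role of i_nlink */
};

/*
 * Model of a link operation: refuse to add a name to an inode whose last
 * link is already gone, since such an inode may be on the orphan list.
 */
static int model_link(struct model_inode *inode)
{
	assert(inode != NULL);

	if (inode->nlink == 0)
		return -ENOENT;	/* raced with unlink */

	inode->nlink++;
	return 0;
}
```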

openanolis / cloud-kernel · synced successfully about 1 year ago