提交 · 88b4a07e6610f4c93b08b0bb103318218db1e9f6 · openeuler / raspberrypi-kernel

13 2月, 2007 19 次提交

[PATCH] eCryptfs: Public key transport mechanism · 88b4a07e

由 Michael Halcrow 提交于 2月 12, 2007

This is the transport code for public key functionality in eCryptfs.  It
manages encryption/decryption request queues with a transport mechanism.
Currently, netlink is the only implemented transport.

Each inode has a unique File Encryption Key (FEK).  Under passphrase, a File
Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase
combo on mount.  This FEKEK encrypts each FEK and writes it into the header of
each file using the packet format specified in RFC 2440.  This is all
symmetric key encryption, so it can all be done via the kernel crypto API.

These new patches introduce public key encryption of the FEK.  There is no
asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes
the FEK encryption and decryption out to a userspace daemon.  After
considering our requirements and determining the complexity of using various
transport mechanisms, we settled on netlink for this communication.

eCryptfs stores authentication tokens into the kernel keyring.  These tokens
correlate with individual keys.  For passphrase mode of operation, the
authentication token contains the symmetric FEKEK.  For public key, the
authentication token contains a PKI type and an opaque data blob managed by
individual PKI modules in userspace.

Each user who opens a file under an eCryptfs partition mounted in public key
mode must be running a daemon.  That daemon has the user's credentials and has
access to all of the keys to which the user should have access.  The daemon,
when started, initializes the pluggable PKI modules available on the system
and registers itself with the eCryptfs kernel module.  Userspace utilities
register public key authentication tokens into the user session keyring.
These authentication tokens correlate key signatures with PKI modules and PKI
blobs.  The PKI blobs contain PKI-specific information necessary for the PKI
module to carry out asymmetric key encryption and decryption.

When the eCryptfs module parses the header of an existing file and finds a Tag
1 (Public Key) packet (see RFC 2440), it reads in the public key identifier
(signature).  The asymmetrically encrypted FEK is in the Tag 1 packet;
eCryptfs puts together a decrypt request packet containing the signature and
the encrypted FEK, then it passes it to the daemon registered for the
current->euid via a netlink unicast to the PID of the daemon, which was
registered at the time the daemon was started by the user.

The daemon actually just makes calls to libecryptfs, which implements request
packet parsing and manages PKI modules.  libecryptfs grabs the public key
authentication token for the given signature from the user session keyring.
This auth tok tells libecryptfs which PKI module should receive the request.
libecryptfs then makes a decrypt() call to the PKI module, and it passes along
the PKI block from the auth tok.  The PKI uses the blob to figure out how it
should decrypt the data passed to it; it performs the decryption and passes
the decrypted data back to libecryptfs.  libecryptfs then puts together a
reply packet with the decrypted FEK and passes that back to the eCryptfs
module.

The eCryptfs module manages these request callouts to userspace code via
message context structs.  The module maintains an array of message context
structs and places the elements of the array on two lists: a free and an
allocated list.  When eCryptfs wants to make a request, it moves a msg ctx
from the free list to the allocated list, sets its state to pending, and fires
off the message to the user's registered daemon.

When eCryptfs receives a netlink message (via the callback), it correlates the
msg ctx struct in the alloc list with the data in the message itself.  The
msg->index contains the offset of the array of msg ctx structs.  It verifies
that the registered daemon PID is the same as the PID of the process that sent
the message.  It also validates a sequence number between the received packet
and the msg ctx.  Then, it copies the contents of the message (the reply
packet) into the msg ctx struct, sets the state in the msg ctx to done, and
wakes up the process that was sleeping while waiting for the reply.

The sleeping process was whatever was performing the sys_open().  This process
originally called ecryptfs_send_message(); it is now in
ecryptfs_wait_for_response().  When it wakes up and sees that the msg ctx
state was set to done, it returns a pointer to the message contents (the reply
packet) and returns.  If all went well, this packet contains the decrypted
FEK, which is then copied into the crypt_stat struct, and life continues as
normal.

The case for creation of a new file is very similar, only instead of a decrypt
request, eCryptfs sends out an encrypt request.

> - We have a great clod of key mangement code in-kernel.  Why is that
>   not suitable (or growable) for public key management?

eCryptfs uses Howells' keyring to store persistent key data and PKI state
information.  It defers public key cryptographic transformations to userspace
code.  The userspace data manipulation request really is orthogonal to key
management in and of itself.  What eCryptfs basically needs is a secure way to
communicate with a particular daemon for a particular task doing a syscall,
based on the UID.  Nothing running under another UID should be able to access
that channel of communication.

> - Is it appropriate that new infrastructure for public key
> management be private to a particular fs?

The messaging.c file contains a lot of code that, perhaps, could be extracted
into a separate kernel service.  In essence, this would be a sort of
request/reply mechanism that would involve a userspace daemon.  I am not aware
of anything that does quite what eCryptfs does, so I was not aware of any
existing tools to do just what we wanted.

>   What happens if one of these daemons exits without sending a quit
>   message?

There is a stale uid<->pid association in the hash table for that user.  When
the user registers a new daemon, eCryptfs cleans up the old association and
generates a new one.  See ecryptfs_process_helo().

> - _why_ does it use netlink?

Netlink provides the transport mechanism that would minimize the complexity of
the implementation, given that we can have multiple daemons (one per user).  I
explored the possibility of using relayfs, but that would involve having to
introduce control channels and a protocol for creating and tearing down
channels for the daemons.  We do not have to worry about any of that with
netlink.
Signed-off-by: NMichael Halcrow <mhalcrow@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

88b4a07e

[PATCH] include/linux/nfsd/const.h: remove NFS_SUPER_MAGIC · b5d5dfbd

由 Adrian Bunk 提交于 2月 12, 2007

NFS_SUPER_MAGIC is already defined in include/linux/magic.h
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b5d5dfbd

[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses · 27459f09

由 Chuck Lever 提交于 2月 12, 2007

Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

27459f09

[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing · ad06e4bd

由 Chuck Lever 提交于 2月 12, 2007

There are loads of places where the RPC server assumes that the rq_addr fields
contains an IPv4 address.  Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ad06e4bd

[PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper · 482fb94e

由 Chuck Lever 提交于 2月 12, 2007

Sometimes we need to create an RPC service but not register it with the local
portmapper.  NFSv4 delegation callback, for example.

Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

482fb94e

[PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task · 41487c65

由 Eric W. Biederman 提交于 2月 12, 2007

There isn't any real advantage to this change except that it allows the old
functions to be removed.  Which is easier on maintenance and puts the code in
a more uniform style.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41487c65

[PATCH] tty: update the tty layer to work with struct pid · ab521dc0

由 Eric W. Biederman 提交于 2月 12, 2007

Of kernel subsystems that work with pids the tty layer is probably the largest
consumer.  But it has the nice virtue that the assiation with a session only
lasts until the session leader exits.  Which means that no reference counting
is required.  So using struct pid winds up being a simple optimization to
avoid hash table lookups.

In the long term the use of pid_nr also ensures that when we have multiple pid
spaces mixed everything will work correctly.
Signed-off-by: NEric W. Biederman <eric@maxwell.lnxi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ab521dc0

[PATCH] Minix V3 support · 939b00df

由 Andries Brouwer 提交于 2月 12, 2007

This morning I needed to read a Minix V3 filesystem, but unfortunately my
2.6.19 did not support that, and neither did the downloaded 2.6.20rc4.

Fortunately, google told me that Daniel Aragones had already done the work,
patch found at http://www.terra.es/personal2/danarag/

Unfortunaly, looking at the patch was painful to my eyes, so I polished it
a bit before applying.  The resulting kernel boots, and reads the
filesystem it needed to read.
Signed-off-by: NDaniel Aragones <danarag@gmail.com>
Signed-off-by: NAndries Brouwer <aeb@cwi.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

939b00df

[PATCH] FS: speed up rw_verify_area() · 163da958

由 Eric Dumazet 提交于 2月 12, 2007

oprofile hunting showed a stall in rw_verify_area(), because of triple
indirection and potential cache misses.
(file->f_path.dentry->d_inode->i_flock)

By moving initialization of 'struct inode' pointer before the pos/count
sanity tests, we allow the compiler and processor to perform two loads by
anticipation, reducing stall, without prefetch() hints.  Even x86 arch has
enough registers to not use temporary variables and not increase text size.

I validated this patch running a bench and studied oprofile changes, and
absolute perf of the test program.

Results of my epoll_pipe_bench (source available on request) on a Pentium-M
1.6 GHz machine

Before :
# ./epoll_pipe_bench -l 30 -t 20
Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples
per call
(best value out of 10 runs)

After :
# ./epoll_pipe_bench -l 30 -t 20
Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples
per call
(best value out of 10 runs)

oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 %
for the rw_verify_area() function.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

163da958

[PATCH] warning fix: unsigned->signed · 3991d3bd

由 Tomasz Kvarsin 提交于 2月 12, 2007

While compiling my code with -Wconversion using gcc-trunk, I always get a
bunch of warrning from headers, here is fix for them:

__getblk is alawys called with unsigned argument,
but it takes signed, the same story with __bread,__breadahead and so on.

Signed-off-by: Tomasz Kvarsin
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3991d3bd

[PATCH] reiserfs: Use ARRAY_SIZE macro when appropriate · 79a81aef

由 Ahmed S. Darwish 提交于 2月 12, 2007

Use ARRAY_SIZE macro already defined in kernel.h
Signed-off-by: NAhmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79a81aef

[PATCH] inotify: read return val fix · f9e4acf3

由 Nick Piggin 提交于 2月 12, 2007

Fix for inotify read bug (bugzilla.kernel.org #6999)

Problem Description:
When reading from an inotify device with an insufficient sized buffer, read(2)
will return 0 with no errno set. This is because of an logically incorrect
action from the user program thus should return an more logical value. My
suggestion is return -EINVAL as for bind(2).

This patch is based on the proposal from Ryan <wolf0403@hotmail.com>, and
feedback from John McCutchan <john@johnmccutchan.com>.

Return -EINVAL if we have not passed in enough buffer space to read a single
inotify event, rather than 0 which indicates that there is nothing to read.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Acked-by: N"John McCutchan" <john@johnmccutchan.com>
Cc: Ryan <wolf0403@hotmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f9e4acf3

[PATCH] remove sb->s_files and file_list_lock usage in dquot.c · d003fb70

由 Christoph Hellwig 提交于 2月 12, 2007

Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref.  This
reduces list search and lock hold time aswell as getting rid of one of the
few uses of file_list_lock which Ingo identified as a scalability problem.

Previously we called dq_op->initialize for every inode handing of a
writeable file that wasn't initialized before.  Now we're calling it for
every inode that has a non-zero i_writecount, aka a writeable file
descriptor refering to it.

Thanks a lot to Jan Kara for running this patch through his quota test
harness.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d003fb70

[PATCH] move remove_dquot_ref to dqout.c · fb58b731

由 Christoph Hellwig 提交于 2月 12, 2007

Remove_dquot_ref can move to dqout.c instead of beeing in inode.c under
#ifdef CONFIG_QUOTA.  Also clean the resulting code up a tiny little bit by
testing sb->dq_op earlier - it's constant over a filesystems lifetime.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fb58b731

[PATCH] Fix d_path for lazy unmounts · eb3dfb0c

由 Andreas Gruenbacher 提交于 2月 12, 2007

Here is a bugfix to d_path.

First, when d_path() hits a lazily unmounted mount point, it tries to
prepend the name of the lazily unmounted dentry to the path name. It gets
this wrong, and also overwrites the slash that separates the name from the
following pathname component. This is demonstrated by the attached test
case, which prints "getcwd returned d_path-bugsubdir" with the bug. The
correct result would be "getcwd returned d_path-bug/subdir".

It could be argued that the name of the root dentry should not be part of
the result of d_path in the first place. On the other hand, what the
unconnected namespace was once reachable as may provide some useful hints
to users, and so that seems okay.

Second, it isn't always possible to tell from the __d_path result whether
the specified root and rootmnt (i.e., the chroot) was reached: lazy
unmounts of bind mounts will produce a path that does start with a
non-slash so we can tell from that, but other lazy unmounts will produce a
path that starts with a slash, just like "ordinary" paths.

The attached patch cleans up __d_path() to fix the bug with overlapping
pathname components. It also adds a @fail_deleted argument, which allows
to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock
can then also be moved into __d_path(). The patch also makes sure that
paths will only start with a slash for paths which are connected to the
root and rootmnt.

The @fail_deleted argument could be added to d_path() as well: this would
allow callers to recognize deleted files, without having to resort to the
ambiguous check for the " (deleted)" string at the end of the pathnames.
This is not currently done, but it might be worthwhile.
Signed-off-by: NAndreas Gruenbacher <agruen@suse.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eb3dfb0c

[PATCH] NTFS: rename incorrect check of NTFS_DEBUG with just DEBUG · 5c3bd438

由 Robert P. J. Day 提交于 2月 12, 2007

Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with
just "#ifdef DEBUG".
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Acked-by: NAnton Altaparmakov <aia21@cantab.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5c3bd438

[PATCH] register_chrdev_region() don't hand out the LOCAL/EXPERIMENTAL majors · 215122e1

由 Andrew Morton 提交于 2月 12, 2007

As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic
chardev major allocation can hand out majors which LANANA has defined as being
for local/experimental use.

Cc: Torben Mathiasen <device@lanana.org>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tomas Klas <tomas.klas@mepatek.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

215122e1

[PATCH] Make XFS use BH_Unwritten and BH_Delay correctly · 6ab8eb1c

由 David Chinner 提交于 2月 12, 2007

Don't hide buffer_unwritten behind buffer_delay() and remove the hack that
clears unexpected buffer_unwritten() states now that it can't happen.
Signed-off-by: NDave Chinner <dgc@sgi.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Cc: Timothy Shimmin <tes@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ab8eb1c

[PATCH] Make BH_Unwritten a first class bufferhead flag V2 · 33a266dd

由 David Chinner 提交于 2月 12, 2007

Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
bufferhead.  Recently, I found the long standing mmap/unwritten extent
conversion bug, and it was to do with partial page invalidation not clearing
the unwritten flag from bufferheads attached to the page but beyond EOF.  See
here for a full explaination:

http://oss.sgi.com/archives/xfs/2006-12/msg00196.html

The solution I have checked into the XFS dev tree involves duplicating code
from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
and then calling block_invalidatepage() to do the rest.

Christoph suggested that this would be better solved by pushing the unwritten
flag into the common buffer head flags and just adding the call to
discard_buffer():

http://oss.sgi.com/archives/xfs/2006-12/msg00239.html

The following patch makes BH_Unwritten a first class citizen.
Signed-off-by: NDave Chinner <dgc@sgi.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

33a266dd

12 2月, 2007 21 次提交

[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct · 4b98d11b

由 Alexey Dobriyan 提交于 2月 10, 2007

They are fat: 4x8 bytes in task_struct.
They are uncoditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".

And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b98d11b

[PATCH] jbd layer function called instead of fs specific one · 3e4fdaf8

由 Dmitriy Monakhov 提交于 2月 10, 2007

jbd function called instead of fs specific one.
Signed-off-by: NDmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3e4fdaf8

[PATCH] Remove unused kernel config option ZISOFS_FS · 730c385b

由 Robert P. J. Day 提交于 2月 10, 2007

Remove the kernel config option ZISOFS_FS, since it appears that the actual
option is simply ZISOFS.
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

730c385b

[PATCH] buffer: memorder fix · 72ed3d03

由 Nick Piggin 提交于 2月 10, 2007

unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.

Mingming later sent the same patch, saying:

  We are running SDET benchmark and saw double free issue for ext3 extended
  attributes block, which complains the same xattr block already being freed (in
  ext3_xattr_release_block()).  The problem could also been triggered by
  multiple threads loop untar/rm a kernel tree.

  The race is caused by missing a memory barrier at unlock_buffer() before the
  lock bit being cleared, resulting in possible concurrent h_refcounter update.
  That causes a reference counter leak, then later leads to the double free that
  we have seen.

  Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
  bit is being cleared, however, there is no memory barrier *before* the bit is
  cleared.  On some arch the h_refcount update instruction and the clear bit
  instruction could be reordered, thus leave the critical section re-entered.

  The race is like this: For example, if the h_refcount is initialized as 1,

  cpu 0:                                   cpu1
  --------------------------------------   -----------------------------------
  lock_buffer() /* test_and_set_bit */
  clear_buffer_locked(bh);
                                          lock_buffer() /* test_and_set_bit */
  h_refcount = h_refcount+1; /* = 2*/     h_refcount = h_refcount + 1; /*= 2 */
                                          clear_buffer_locked(bh);
  ....                                    ......

  We lost a h_refcount here. We need a memory barrier before the buffer head lock
  bit being cleared to force the order of the two writes.  Please apply.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

72ed3d03

[PATCH] extend the set of "__attribute__" shortcut macros · 82ddcb04

由 Robert P. J. Day 提交于 2月 10, 2007

Extend the set of "__attribute__" shortcut macros, and remove identical
(and now superfluous) definitions from a couple of source files.

based on a page at robert love's blog:

	http://rlove.org/log/2005102601

extend the set of shortcut macros defined in compiler-gcc.h with the
following:

#define __packed                       __attribute__((packed))
#define __weak                         __attribute__((weak))
#define __naked                        __attribute__((naked))
#define __noreturn                     __attribute__((noreturn))
#define __pure                         __attribute__((pure))
#define __aligned(x)                   __attribute__((aligned(x)))
#define __printf(a,b)                  __attribute__((format(printf,a,b)))

Once these are in place, it's up to subsystem maintainers to decide if they
want to take advantage of them.  there is already a strong precedent for
using shortcuts like this in the source tree.

The ones that might give people pause are "__aligned" and "__printf", but
shortcuts for both of those are already in use, and in some ways very
confusingly.  note the two very different definitions for a macro named
"ALIGNED":

  drivers/net/sgiseeq.c:#define ALIGNED(x) ((((unsigned long)(x)) + 0xf) & ~(0xf))
  drivers/scsi/ultrastor.c:#define ALIGNED(x) __attribute__((aligned(x)))

also:

  include/acpi/platform/acgcc.h:
    #define ACPI_PRINTF_LIKE(c) __attribute__ ((__format__ (__printf__, c, c+1)))

Given the precedent, then, it seems logical to at least standardize on a
consistent set of these macros.
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

82ddcb04

[PATCH] remove ext[34]_inc_count and _dec_count · 731b9a54

由 Eric Sandeen 提交于 2月 10, 2007

- Naming is confusing, ext3_inc_count manipulates i_nlink not i_count
- handle argument passed in is not used
- ext3 and ext4 already call inc_nlink and dec_nlink directly in other places
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

731b9a54

[PATCH] return ENOENT from ext3_link when racing with unlink · 2988a774

由 Eric Sandeen 提交于 2月 10, 2007

Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is
0.  Doing otherwise has the potential to corrupt the orphan inode list,
because we'd wind up with an inode with a non-zero link count on the list,
and it will never get properly cleaned up & removed from the orphan list
before it is freed.

[akpm@osdl.org: build fix]
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2988a774

[PATCH] fix umask when noACL kernel meets extN tuned for ACLs · 2e7842b8

由 Hugh Dickins 提交于 2月 10, 2007

Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or
ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a
kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.

This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.

We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured. We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).

Likewise don't set the XATTR_USER flag when built without XATTR support.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e7842b8

[PATCH] seq_file conversion: coda · 9bbf81e4

由 Alexey Dobriyan 提交于 2月 10, 2007

Compile-tested.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: <jaharkes@cs.cmu.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bbf81e4

[PATCH] ext4: refuse ro to rw remount of fs with orphan inodes · ead6596b

由 Eric Sandeen 提交于 2月 10, 2007

In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.

Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ead6596b

[PATCH] ext3: refuse ro to rw remount of fs with orphan inodes · ea9a05a1

由 Eric Sandeen 提交于 2月 10, 2007

In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.

Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ea9a05a1

[PATCH] proc_misc warning fix · 100bb934

由 Andrew Morton 提交于 2月 10, 2007

fs/proc/proc_misc.c: In function 'proc_misc_init':
fs/proc/proc_misc.c:764: warning: unused variable 'entry'
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

100bb934

[PATCH] msdos partitions: fix logic error in AIX detection · a470e18f

由 Olaf Hering 提交于 2月 10, 2007

Correct the AIX magic check to let 'echo > /dev/sdb' actually work.
Signed-off-by: NOlaf Hering <olh@suse.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a470e18f

[PATCH] relax check for AIX in msdos partition table · 4419d1ac

由 Olaf Hering 提交于 2月 10, 2007

The patch to identify AIX disks and ignore them has caused at least one
machine to fail to find the root partition on 2.6.19. The patch is:

http://lkml.org/lkml/2006/7/31/117

The problem is some disk formatters do not blow away the first 4 bytes
of the disk. If the disk we are installing to used to have AIX on it,
then the first 4 bytes will still have IBMA in EBCDIC.

The install in question was debian etch. Im not sure what the best fix
is, perhaps the AIX detection code could check more than the first 4
bytes.

The whole partition info for primary partitions is in this block:

  dd if=/dev/sdb count=$(( 4 * 16 )) bs=1 skip=$(( 0x1be ))

All other data do not matter, beside the 0x55aa marker at the end of the
first block.
Signed-off-by: NOlaf Hering <olh@suse.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4419d1ac

[PATCH] remove invalidate_inode_pages() · fc0ecff6

由 Andrew Morton 提交于 2月 10, 2007

Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().

Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fc0ecff6

[PATCH] ext2: skip pages past number of blocks in ext2_find_entry · d8adb9ce

由 Eric Sandeen 提交于 2月 10, 2007

This one was pointed out on the MOKB site:
http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html

If a directory's i_size is corrupted, ext2_find_entry() will keep
processing pages until the i_size is reached, even if there are no more
blocks associated with the directory inode.  This patch puts in some
minimal sanity-checking so that we don't keep checking pages (and issuing
errors) if we know there can be no more data to read, based on the block
count of the directory inode.

This is somewhat similar in approach to the ext3 patch I sent earlier this
year.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8adb9ce

[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). · c3762229

由 Robert P. J. Day 提交于 2月 10, 2007

Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: NJoel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c3762229

[PATCH] igrab() should check for I_CLEAR · 4a3b0a49

由 Jan Blunck 提交于 2月 10, 2007

When igrab() is calling __iget() on an inode it should check if
clear_inode() has been called on the inode already.  Otherwise there is a
race window between clear_inode() and destroy_inode() where igrab() calls
__iget() which leads to already free inodes on the inode lists.
Signed-off-by: NVandana Rungta <vandana@novell.com>
Signed-off-by: NJan Blunck <jblunck@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a3b0a49

[PATCH] avoid one conditional branch in touch_atime() · 37756ced

由 Eric Dumazet 提交于 2月 10, 2007

I added IS_NOATIME(inode) macro definition in include/linux/fs.h, true if
the inode superblock is marked readonly or noatime.

This new macro is then used in touch_atime() instead of separatly testing
MS_RDONLY and MS_NOATIME
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

37756ced

[PATCH] convert ramfs to use __set_page_dirty_no_writeback · 46626296

由 Ken Chen 提交于 2月 10, 2007

As pointed out by Hugh, ramfs would also benefit from using the new
set_page_dirty aop method for memory backed file systems.
Signed-off-by: NKen Chen <kenchen@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

46626296

[PATCH] Drop get_zone_counts() · 65e458d4

由 Christoph Lameter 提交于 2月 10, 2007

Values are available via ZVC sums.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

65e458d4