提交 · f429ee3b808118591d1f3cdf3c0d0793911a5677 · Linux-御风守护者 / linux

18 1月, 2012 40 次提交

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit · f429ee3b

由 Linus Torvalds 提交于 1月 17, 2012

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
  audit: no leading space in audit_log_d_path prefix
  audit: treat s_id as an untrusted string
  audit: fix signedness bug in audit_log_execve_info()
  audit: comparison on interprocess fields
  audit: implement all object interfield comparisons
  audit: allow interfield comparison between gid and ogid
  audit: complex interfield comparison helper
  audit: allow interfield comparison in audit rules
  Kernel: Audit Support For The ARM Platform
  audit: do not call audit_getname on error
  audit: only allow tasks to set their loginuid if it is -1
  audit: remove task argument to audit_set_loginuid
  audit: allow audit matching on inode gid
  audit: allow matching on obj_uid
  audit: remove audit_finish_fork as it can't be called
  audit: reject entry,always rules
  audit: inline audit_free to simplify the look of generic code
  audit: drop audit_set_macxattr as it doesn't do anything
  audit: inline checks for not needing to collect aux records
  audit: drop some potentially inadvisable likely notations
  ...

Use evil merge to fix up grammar mistakes in Kconfig file.

Bad speling and horrible grammar (and copious swearing) is to be
expected, but let's keep it to commit messages and comments, rather than
expose it to users in config help texts or printouts.

f429ee3b

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs · 22b4eb5e

由 Linus Torvalds 提交于 1月 17, 2012

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: cleanup xfs_file_aio_write
  xfs: always return with the iolock held from xfs_file_aio_write_checks
  xfs: remove the i_new_size field in struct xfs_inode
  xfs: remove the i_size field in struct xfs_inode
  xfs: replace i_pin_wait with a bit waitqueue
  xfs: replace i_flock with a sleeping bitlock
  xfs: make i_flags an unsigned long
  xfs: remove the if_ext_max field in struct xfs_ifork
  xfs: remove the unused dm_attrs structure
  xfs: cleanup xfs_iomap_eof_align_last_fsb
  xfs: remove xfs_itruncate_data

22b4eb5e

Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d65773b2

由 Linus Torvalds 提交于 1月 17, 2012

* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  btrfs: take allocation of ->tree_root into open_ctree()
  btrfs: let ->s_fs_info point to fs_info, not root...
  btrfs: consolidate failure exits in btrfs_mount() a bit
  btrfs: make free_fs_info() call ->kill_sb() unconditional
  btrfs: merge free_fs_info() calls on fill_super failures
  btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
  btrfs: make open_ctree() return int
  btrfs: sanitizing ->fs_info, part 5
  btrfs: sanitizing ->fs_info, part 4
  btrfs: sanitizing ->fs_info, part 3
  btrfs: sanitizing ->fs_info, part 2
  btrfs: sanitizing ->fs_info, part 1
  btrfs: fix a deadlock in btrfs_scan_one_device()
  btrfs: fix mount/umount race
  btrfs: get ->kill_sb() of its own
  btrfs: preparation to fixing mount/umount race

d65773b2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · f9156c72

由 Linus Torvalds 提交于 1月 17, 2012

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
  Btrfs: use larger system chunks
  Btrfs: add a delalloc mutex to inodes for delalloc reservations
  Btrfs: space leak tracepoints
  Btrfs: protect orphan block rsv with spin_lock
  Btrfs: add allocator tracepoints
  Btrfs: don't call btrfs_throttle in file write
  Btrfs: release space on error in page_mkwrite
  Btrfs: fix btrfsck error 400 when truncating a compressed
  Btrfs: do not use btrfs_end_transaction_throttle everywhere
  Btrfs: add balance progress reporting
  Btrfs: allow for resuming restriper after it was paused
  Btrfs: allow for canceling restriper
  Btrfs: allow for pausing restriper
  Btrfs: add skip_balance mount option
  Btrfs: recover balance on mount
  Btrfs: save balance parameters to disk
  Btrfs: soft profile changing mode (aka soft convert)
  Btrfs: implement online profile changing
  Btrfs: do not reduce profile in do_chunk_alloc()
  Btrfs: virtual address space subset filter
  ...

Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.

f9156c72

Fix compile breakage with kref.h · 67175b85

由 James Bottomley 提交于 1月 17, 2012

This set of build failures just started appearing on parisc:

  In file included from drivers/input/serio/serio_raw.c:12:
  include/linux/kref.h: In function 'kref_get':
  include/linux/kref.h:40: error: 'TAINT_WARN' undeclared (first use in this function)
  include/linux/kref.h:40: error: (Each undeclared identifier is reported only once
  include/linux/kref.h:40: error: for each function it appears in.)
  include/linux/kref.h: In function 'kref_sub':
  include/linux/kref.h:65: error: 'TAINT_WARN' undeclared (first use in this function)

It happens because TAINT_WARN is defined in kernel.h and this particular
compile doesn't seem to include it (no idea why it's just manifesting ..
probably some #include file untangling exposed it).

Fix by adding

  #include <linux/kernel.h>

to linux/kref.h
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

67175b85

proc: clean up and fix /proc/<pid>/mem handling · e268337d

由 Linus Torvalds 提交于 1月 17, 2012

Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open.  That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler.  If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve.  So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
Reported-by: NJüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e268337d

audit: no leading space in audit_log_d_path prefix · c158a35c

由 Kees Cook 提交于 1月 06, 2012

audit_log_d_path() injects an additional space before the prefix,
which serves no purpose and doesn't mix well with other audit_log*()
functions that do not sneak extra characters into the log.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NEric Paris <eparis@redhat.com>

c158a35c

audit: treat s_id as an untrusted string · 41fdc305

由 Kees Cook 提交于 1月 07, 2012

The use of s_id should go through the untrusted string path, just to be
extra careful.
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NMimi Zohar <zohar@us.ibm.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

41fdc305

audit: fix signedness bug in audit_log_execve_info() · 5afb8a3f

由 Xi Wang 提交于 12月 20, 2011

In the loop, a size_t "len" is used to hold the return value of
audit_log_single_execve_arg(), which returns -1 on error.  In that
case the error handling (len <= 0) will be bypassed since "len" is
unsigned, and the loop continues with (p += len) being wrapped.
Change the type of "len" to signed int to fix the error handling.

	size_t len;
	...
	for (...) {
		len = audit_log_single_execve_arg(...);
		if (len <= 0)
			break;
		p += len;
	}
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

5afb8a3f

audit: comparison on interprocess fields · 10d68360

由 Peter Moody 提交于 1月 04, 2012

This allows audit to specify rules in which we compare two fields of a
process.  Such as is the running process uid != to the running process
euid?
Signed-off-by: NPeter Moody <pmoody@google.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

10d68360

audit: implement all object interfield comparisons · 4a6633ed

由 Peter Moody 提交于 12月 13, 2011

This completes the matrix of interfield comparisons between uid/gid
information for the current task and the uid/gid information for inodes.
aka I can audit based on differences between the euid of the process and
the uid of fs objects.
Signed-off-by: NPeter Moody <pmoody@google.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

4a6633ed

audit: allow interfield comparison between gid and ogid · c9fe685f

由 Eric Paris 提交于 1月 03, 2012

Allow audit rules to compare the gid of the running task to the gid of the
inode in question.
Signed-off-by: NEric Paris <eparis@redhat.com>

c9fe685f

audit: complex interfield comparison helper · b34b0393

由 Eric Paris 提交于 1月 03, 2012

Rather than code the same loop over and over implement a helper function which
uses some pointer magic to make it generic enough to be used numerous places
as we implement more audit interfield comparisons
Signed-off-by: NEric Paris <eparis@redhat.com>

b34b0393

audit: allow interfield comparison in audit rules · 02d86a56

由 Eric Paris 提交于 1月 03, 2012

We wish to be able to audit when a uid=500 task accesses a file which is
uid=0. Or vice versa. This patch introduces a new audit filter type
AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields
should be compared. At this point we only define the task->uid vs
inode->uid, but other comparisons can be added.
Signed-off-by: NEric Paris <eparis@redhat.com>

02d86a56

Kernel: Audit Support For The ARM Platform · 29ef73b7

由 Nathaniel Husted 提交于 1月 03, 2012

This patch provides functionality to audit system call events on the
ARM platform. The implementation was based off the structure of the
MIPS platform and information in this
(http://lists.fedoraproject.org/pipermail/arm/2009-October/000382.html)
mailing list thread. The required audit_syscall_exit and
audit_syscall_entry checks were added to ptrace using the standard
registers for system call values (r0 through r3). A thread information
flag was added for auditing (TIF_SYSCALL_AUDIT) and a meta-flag was
added (_TIF_SYSCALL_WORK) to simplify modifications to the syscall
entry/exit. Now, if either the TRACE flag is set or the AUDIT flag is
set, the syscall_trace function will be executed. The prober changes
were made to Kconfig to allow CONFIG_AUDITSYSCALL to be enabled.

Due to platform availability limitations, this patch was only tested
on the Android platform running the modified "android-goldfish-2.6.29"
kernel. A test compile was performed using Code Sourcery's
cross-compilation toolset and the current linux-3.0 stable kernel. The
changes compile without error. I'm hoping, due to the simple modifications,
the patch is "obviously correct".
Signed-off-by: NNathaniel Husted <nhusted@gmail.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

29ef73b7

audit: do not call audit_getname on error · 4043cde8

由 Eric Paris 提交于 1月 03, 2012

Just a code cleanup really.  We don't need to make a function call just for
it to return on error.  This also makes the VFS function even easier to follow
and removes a conditional on a hot path.
Signed-off-by: NEric Paris <eparis@redhat.com>

4043cde8

audit: only allow tasks to set their loginuid if it is -1 · 633b4545

由 Eric Paris 提交于 1月 03, 2012

At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they
log in and it be impossible to ever reset. We had to make it mutable even
after it was once set (with the CAP) because on update and admin might have
to restart sshd. Now sshd would get his loginuid and the next user which
logged in using ssh would not be able to set his loginuid.

Systemd has changed how userspace works and allowed us to make the kernel
work the way it should. With systemd users (even admins) are not supposed
to restart services directly. The system will restart the service for
them. Thus since systemd is going to loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.

If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used! Since we have old systems I make this a
Kconfig option.
Signed-off-by: NEric Paris <eparis@redhat.com>

633b4545

audit: remove task argument to audit_set_loginuid · 0a300be6

由 Eric Paris 提交于 1月 03, 2012

The function always deals with current.  Don't expose an option
pretending one can use it for something.  You can't.
Signed-off-by: NEric Paris <eparis@redhat.com>

0a300be6

audit: allow audit matching on inode gid · 54d3218b

由 Eric Paris 提交于 1月 03, 2012

Much like the ability to filter audit on the uid of an inode collected, we
should be able to filter on the gid of the inode.
Signed-off-by: NEric Paris <eparis@redhat.com>

54d3218b

audit: allow matching on obj_uid · efaffd6e

由 Eric Paris 提交于 1月 03, 2012

Allow syscall exit filter matching based on the uid of the owner of an
inode used in a syscall.  aka:

auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa
Signed-off-by: NEric Paris <eparis@redhat.com>

efaffd6e

audit: remove audit_finish_fork as it can't be called · 6422e78d

由 Eric Paris 提交于 1月 03, 2012

Audit entry,always rules are not allowed and are automatically changed in
exit,always rules in userspace. The kernel refuses to load such rules.

Thus a task in the middle of a syscall (and thus in audit_finish_fork())
can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every
going to actually use the code in audit_finish_fork() since it will
return without doing anything. Thus drop the code.
Signed-off-by: NEric Paris <eparis@redhat.com>

6422e78d

audit: reject entry,always rules · 7ff68e53

由 Eric Paris 提交于 1月 03, 2012

We deprecated entry,always rules a long time ago.  Reject those rules as
invalid.
Signed-off-by: NEric Paris <eparis@redhat.com>

7ff68e53

audit: inline audit_free to simplify the look of generic code · a4ff8dba

由 Eric Paris 提交于 1月 03, 2012

make the conditional a static inline instead of doing it in generic code.
Signed-off-by: NEric Paris <eparis@redhat.com>

a4ff8dba

E
audit: drop audit_set_macxattr as it doesn't do anything · 38cdce53
由 Eric Paris 提交于 1月 03, 2012
```
unused.  deleted.
Signed-off-by: NEric Paris <eparis@redhat.com>
```
38cdce53

audit: inline checks for not needing to collect aux records · 07c49417

由 Eric Paris 提交于 1月 03, 2012

A number of audit hooks make function calls before they determine that
auxilary records do not need to be collected. Do those checks as static
inlines since the most common case is going to be that records are not
needed and we can skip the function call overhead.
Signed-off-by: NEric Paris <eparis@redhat.com>

07c49417

audit: drop some potentially inadvisable likely notations · 56179a6e

由 Eric Paris 提交于 1月 03, 2012

The audit code makes heavy use of likely() and unlikely() macros, but they
don't always make sense.  Drop any that seem questionable and let the
computer do it's thing.
Signed-off-by: NEric Paris <eparis@redhat.com>

56179a6e

audit: remove AUDIT_SETUP_CONTEXT as it isn't used · 997f5b64

由 Eric Paris 提交于 1月 03, 2012

Audit contexts have 3 states.  Disabled, which doesn't collect anything,
build, which collects info but might not emit it, and record, which
collects and emits.  There is a 4th state, setup, which isn't used.  Get
rid of it.
Signed-off-by: NEric Paris <eparis@redhat.com>

997f5b64

audit: inline audit_syscall_entry to reduce burden on archs · b05d8447

由 Eric Paris 提交于 1月 03, 2012

Every arch calls:

if (unlikely(current->audit_context))
	audit_syscall_entry()

which requires knowledge about audit (the existance of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.
Signed-off-by: NEric Paris <eparis@redhat.com>

b05d8447

audit: ia32entry.S sign extend error codes when calling 64 bit code · f031cd25

由 Eric Paris 提交于 1月 03, 2012

In the ia32entry syscall exit audit fastpath we have assembly code which calls
__audit_syscall_exit directly.  This code was, however, zeroes the upper 32
bits of the return code.  It then proceeded to call code which expects longs
to be 64bits long.  In order to handle code which expects longs to be 64bit we
sign extend the return code if that code is an error.  Thus the
__audit_syscall_exit function can correctly handle using the values in
snprintf("%ld").  This fixes the regression introduced in 5cbf1565.

Old record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=4294967283
New record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=-13
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>

f031cd25

Audit: push audit success and retcode into arch ptrace.h · d7e7528b

由 Eric Paris 提交于 1月 03, 2012

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]

d7e7528b

seccomp: audit abnormal end to a process due to seccomp · 85e7bac3

由 Eric Paris 提交于 1月 03, 2012

The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.
Signed-off-by: NEric Paris <eparis@redhat.com>

85e7bac3

audit: check current inode and containing object when filtering on major and minor · 16c174bd

由 Eric Paris 提交于 1月 03, 2012

The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon.  Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now lets
say we add a watch with a filter on 8,1.  If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.

Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents.  We might hope that
this would also be logged, but it isn't.  The rules will check the
major,minor of the device containing /dev/sda1.  In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.

I believe these rules should trigger on either device.  The man page is
devoid of useful information about the intended semantics.  It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...
Signed-off-by: NEric Paris <eparis@redhat.com>

16c174bd

audit: drop the meaningless and format breaking word 'user' · 3035c51e

由 Eric Paris 提交于 1月 03, 2012

userspace audit messages look like so:

type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''

That third field just says 'user'. That's useless and doesn't follow the
key=value pair we are trying to enforce. We already know it came from the
user based on the record type. Kill that word. Die.
Signed-off-by: NEric Paris <eparis@redhat.com>

3035c51e

audit: dynamically allocate audit_names when not enough space is in the names array · 5195d8e2

由 Eric Paris 提交于 1月 03, 2012

This patch does 2 things.  First it reduces the number of audit_names
allocated in every audit context from 20 to 5.  5 should be enough for all
'normal' syscalls (rename being the worst).  Some syscalls can still touch
more the 5 inodes such as mount.  When rpc filesystem is mounted it will
create inodes and those can exceed 5.  To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5.  This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.
Signed-off-by: NEric Paris <eparis@redhat.com>

5195d8e2

audit: make filetype matching consistent with other filters · 5ef30ee5

由 Eric Paris 提交于 1月 03, 2012

Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list. The filetype matching
however had a strange way of doing things. It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel. Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea. As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field. The kernel always checked the first
name collected which for the tested rules was always correct.

This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected. It
also changes the rule validation to reject the old unused rule types.

Noone knew it was there. Noone used it. Why keep around the extra code?
Signed-off-by: NEric Paris <eparis@redhat.com>

5ef30ee5

xfs: cleanup xfs_file_aio_write · d0606464

由 Christoph Hellwig 提交于 12月 18, 2011

With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

d0606464

xfs: always return with the iolock held from xfs_file_aio_write_checks · 5bf1f262

由 Christoph Hellwig 提交于 12月 18, 2011

While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

5bf1f262

xfs: remove the i_new_size field in struct xfs_inode · 2813d682

由 Christoph Hellwig 提交于 12月 18, 2011

Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is always uptodate when
writeback could see a page. Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.

Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk. I hope to fix this properly in
the generic code.

A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that recently was added. I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

2813d682

xfs: remove the i_size field in struct xfs_inode · ce7ae151

由 Christoph Hellwig 提交于 12月 18, 2011

There is no fundamental need to keep an in-memory inode size copy in the XFS
inode. We already have the on-disk value in the dinode, and the separate
in-memory copy that we need for regular files only in the XFS inode.

Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
VFS inode i_size field for regular files. Switch code that was directly
accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
we are limited to regular files direct access of the VFS inode i_size field.

This also allows dropping some fairly complicated code in the write path
which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
that is getting updated inside ->write_end.

Note that we do not bother resetting the VFS i_size when truncating a file
that gets freed to zero as there is no point in doing so because the VFS inode
is no longer in use at this point. Just relax the assert in xfs_ifree to
only check the on-disk size instead.
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

ce7ae151

xfs: replace i_pin_wait with a bit waitqueue · f392e631

由 Christoph Hellwig 提交于 12月 18, 2011

Replace i_pin_wait, which is only used during synchronous inode flushing
with a bit waitqueue.  This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.
Reviewed-by: NAlex Elder <aelder@sgi.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

f392e631

Linux-御风守护者 / linux 与 Fork 源项目一致

Linux-御风守护者 / linux
与 Fork 源项目一致