提交 · db8b631d4bc4eaa9f7e13a6b0a287306cac0cb72 · openeuler / raspberrypi-kernel

17 10月, 2014 2 次提交

Allow mknod and mkfifo on SMB2/SMB3 mounts · db8b631d

由 Steve French 提交于 9月 22, 2014

The "sfu" mount option did not work on SMB2/SMB3 mounts.
With these changes when the "sfu" mount option is passed in
on an smb2/smb2.1/smb3 mount the client can emulate (and
recognize) fifo and device (character and device files).

In addition the "sfu" mount option should not conflict
with "mfsymlinks" (symlink emulation) as we will never
create "sfu" style symlinks, but using "sfu" mount option
will allow us to recognize existing symlinks, created with
Microsoft "Services for Unix" (SFU and SUA).

To enable the "sfu" mount option for SMB2/SMB3 the calling
syntax of the generic cifs/smb2/smb3 sync_read and sync_write
protocol dependent function needed to be changed (we
don't have a file struct in all cases), but this actually
ended up simplifying the code a little.
Signed-off-by: NSteve French <smfrench@gmail.com>

db8b631d

add defines for two new file attributes · 73322979

由 Steve French 提交于 9月 23, 2014

Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>

73322979

15 10月, 2014 17 次提交

mnt: Prevent pivot_root from creating a loop in the mount tree · 0d082601

由 Eric W. Biederman 提交于 10月 08, 2014

Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.

In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another.  Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.

Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.

[Added stable cc.  Fixes CVE-2014-7970.  --Andy]

Cc: stable@vger.kernel.org
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>

0d082601

ceph: fix divide-by-zero in __validate_layout() · 0bc62284

由 Yan, Zheng 提交于 10月 14, 2014

The 'stripe_unit' field is 64 bits, casting it to 32 bits can result zero.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

0bc62284

ceph: fix bool assignments · ab6c2c3e

由 Fabian Frederick 提交于 10月 09, 2014

Fix some coccinelle warnings:
fs/ceph/caps.c:2400:6-10: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2401:6-15: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2402:6-17: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2403:6-22: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2404:6-22: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2405:6-19: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2440:4-20: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2469:3-16: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2490:2-18: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2519:3-7: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2549:3-12: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2575:2-6: WARNING: Assignment of bool to 0/1
fs/ceph/caps.c:2589:3-7: WARNING: Assignment of bool to 0/1
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

ab6c2c3e

ceph: additional debugfs output · 14ed9703

由 John Spray 提交于 9月 12, 2014

MDS session state and client global ID is
useful instrumentation when testing.
Signed-off-by: NJohn Spray <john.spray@redhat.com>

14ed9703

ceph: export ceph_session_state_name function · a687ecaf

由 John Spray 提交于 9月 19, 2014

...so that it can be used from the ceph debugfs
code when dumping session info.
Signed-off-by: NJohn Spray <john.spray@redhat.com>

a687ecaf

ceph: include the initial ACL in create/mkdir/mknod MDS requests · b1ee94aa

由 Yan, Zheng 提交于 9月 16, 2014

Current code set new file/directory's initial ACL in a non-atomic
manner.
Client first sends request to MDS to create new file/directory, then set
the initial ACL after the new file/directory is successfully created.

The fix is include the initial ACL in create/mkdir/mknod MDS requests.
So MDS can handle creating file/directory and setting the initial ACL in
one request.
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

b1ee94aa

ceph: use pagelist to present MDS request data · 25e6bae3

由 Yan, Zheng 提交于 9月 16, 2014

Current code uses page array to present MDS request data. Pages in the
array are allocated/freed by caller of ceph_mdsc_do_request(). If request
is interrupted, the pages can be freed while they are still being used by
the request message.

The fix is use pagelist to present MDS request data. Pagelist is
reference counted.
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

25e6bae3

libceph: reference counting pagelist · e4339d28

由 Yan, Zheng 提交于 9月 16, 2014

this allow pagelist to present data that may be sent multiple times.
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

e4339d28

ceph: fix llistxattr on symlink · 0abb43dc

由 Yan, Zheng 提交于 9月 18, 2014

only regular file and directory have vxattrs.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

0abb43dc

ceph: send client metadata to MDS · dbd0c8bf

由 John Spray 提交于 9月 09, 2014

Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax,
which includes additional client metadata to allow
the MDS to report on clients by user-sensible names
like hostname.
Signed-off-by: NJohn Spray <john.spray@redhat.com>
Reviewed-by: NYan, Zheng <zyan@redhat.com>

dbd0c8bf

ceph: remove redundant code for max file size verification · a4483e8a

由 Chao Yu 提交于 9月 17, 2014

Both ceph_update_writeable_page and ceph_setattr will verify file size
with max size ceph supported.
There are two caller for ceph_update_writeable_page, ceph_write_begin and
ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, we have no
chance to change file size when mmap. Likewise we have already verified the size
in inode_change_ok when we call ceph_setattr.
So let's remove the redundant code for max file size verification.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Reviewed-by: NYan, Zheng <zyan@redhat.com>

a4483e8a

ceph: remove redundant io_iter_advance() · 3b70b388

由 Yan, Zheng 提交于 9月 17, 2014

ceph_sync_read and generic_file_read_iter() have already advanced the
IO iterator.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

3b70b388

ceph: move ceph_find_inode() outside the s_mutex · 6cd3bcad

由 Yan, Zheng 提交于 9月 17, 2014

ceph_find_inode() may wait on freeing inode, using it inside the s_mutex
may cause deadlock. (the freeing inode is waiting for OSD read reply, but
dispatch thread is blocked by the s_mutex)
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

6cd3bcad

ceph: request xattrs if xattr_version is zero · 508b32d8

由 Yan, Zheng 提交于 9月 16, 2014

Following sequence of events can happen.
  - Client releases an inode, queues cap release message.
  - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because MDS didn't receive the cap release
    message and thought client already has up-to-data xattrs.

The fix is force sending a getattr request to MDS if xattrs_version
is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client
does not have xattr.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

508b32d8

ceph: make sure request isn't in any waiting list when kicking request. · 03974e81

由 Yan, Zheng 提交于 9月 11, 2014

we may corrupt waiting list if a request in the waiting list is kicked.
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

03974e81

Y
ceph: protect kick_requests() with mdsc->mutex · 656e4382
由 Yan, Zheng 提交于 9月 11, 2014
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>
```
656e4382

ceph: trim unused inodes before reconnecting to recovering MDS · 5d23371f

由 Yan, Zheng 提交于 9月 10, 2014

So the recovering MDS does not need to fetch these ununsed inodes during
cache rejoin. This may reduce MDS recovery time.
Signed-off-by: NYan, Zheng <zyan@redhat.com>

5d23371f

14 10月, 2014 21 次提交

btrfs: LLVMLinux: Remove VLAIS · 0458a953

由 Vinícius Tinti 提交于 4月 04, 2014

Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99
compliant equivalent.  This patch instead allocates the appropriate amount of
memory using a char array using the SHASH_DESC_ON_STACK macro.

The new code can be compiled with both gcc and clang.
Signed-off-by: NVinícius Tinti <viniciustinti@gmail.com>
Reviewed-by: NJan-Simon Möller <dl9pf@gmx.de>
Reviewed-by: NMark Charlebois <charlebm@gmail.com>
Signed-off-by: NBehan Webster <behanw@converseincode.com>
Acked-by: NChris Mason <clm@fb.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>

0458a953

mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared · 64e45507

由 Peter Feiner 提交于 10月 13, 2014

For VMAs that don't want write notifications, PTEs created for read faults
have their write bit set.  If the read fault happens after VM_SOFTDIRTY is
cleared, then the PTE's softdirty bit will remain clear after subsequent
writes.

Here's a simple code snippet to demonstrate the bug:

  char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
  assert(*m == '\0');     /* new PTE allows write access */
  assert(!soft_dirty(x));
  *m = 'x';               /* should dirty the page */
  assert(soft_dirty(x));  /* fails */

With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared.  Furthermore, to avoid unnecessary faults, write notifications
are disabled when VM_SOFTDIRTY is set.

As a side effect of enabling and disabling write notifications with
care, this patch fixes a bug in mprotect where vm_page_prot bits set by
drivers were zapped on mprotect.  An analogous bug was fixed in mmap by
commit c9d0bf24 ("mm: uncached vma support with writenotify").
Signed-off-by: NPeter Feiner <pfeiner@google.com>
Reported-by: NPeter Feiner <pfeiner@google.com>
Suggested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

64e45507

fs: check bh blocknr earlier when searching lru · 9470dd5d

由 Zach Brown 提交于 10月 13, 2014

It's very common for the buffer heads in the lru to have different block
numbers.  By comparing the blocknr before the bdev and size we can
reduce the cost of searching in the very common case where all the
entries have the same bdev and size.

In quick hot cache cycle counting tests on a single fs workstation this
cut the cost of a miss by about 20%.

A diff of the disassembly shows the reordering of the bdev and blocknr
comparisons.  This is in such a tiny loop that skipping one comparison
is a meaningful portion of the total work being done:

     1628:      83 c1 01                add    $0x1,%ecx
     162b:      83 f9 08                cmp    $0x8,%ecx
     162e:      74 60                   je     1690 <__find_get_block+0xa0>
     1630:      89 c8                   mov    %ecx,%eax
     1632:      65 4c 8b 04 c5 00 00    mov    %gs:0x0(,%rax,8),%r8
     1639:      00 00
     163b:      4d 85 c0                test   %r8,%r8
     163e:      4c 89 c3                mov    %r8,%rbx
     1641:      74 e5                   je     1628 <__find_get_block+0x38>
-    1643:      4d 3b 68 30             cmp    0x30(%r8),%r13
+    1643:      4d 3b 68 18             cmp    0x18(%r8),%r13
     1647:      75 df                   jne    1628 <__find_get_block+0x38>
-    1649:      4d 3b 60 18             cmp    0x18(%r8),%r12
+    1649:      4d 3b 60 30             cmp    0x30(%r8),%r12
     164d:      75 d9                   jne    1628 <__find_get_block+0x38>
     164f:      49 39 50 20             cmp    %rdx,0x20(%r8)
     1653:      75 d3                   jne    1628 <__find_get_block+0x38>
Signed-off-by: NZach Brown <zab@zabbo.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9470dd5d

isofs: replace strnicmp with strncasecmp · a97df427

由 Rasmus Villemoes 提交于 10月 13, 2014

The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a97df427

ocfs2: replace strnicmp with strncasecmp · 2bd63329

由 Rasmus Villemoes 提交于 10月 13, 2014

The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2bd63329

cifs: replace strnicmp with strncasecmp · 87e747cd

由 Rasmus Villemoes 提交于 10月 13, 2014

The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87e747cd

FS/OMFS: block number sanity check during fill_super operation · 76e51210

由 Fabian Frederick 提交于 10月 13, 2014

This patch defines maximum block number to 2^31.  It also converts
bitmap_size and array_size to unsigned int in omfs_get_imap
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Suggested-by: NBob Copeland <me@bobcopeland.com>
Acked-by: NBob Copeland <me@bobcopeland.com>
Tested-by: NBob Copeland <me@bobcopeland.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

76e51210

fs/affs: remove redundant sys_tz declarations · c70b17b6

由 Fabian Frederick 提交于 10月 13, 2014

sys_tz is already declared in include/linux/time.h
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c70b17b6

fs/affs/file.c: fix shadow warnings · 73516ace

由 Fabian Frederick 提交于 10月 13, 2014

Four functions declared variables twice resulting in shadow warnings.

This patch renames internal variables and adds blank line after
declarations.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73516ace

fs/affs/inode.c: remove unused variable · 3bc75993

由 Fabian Frederick 提交于 10月 13, 2014

head is set to AFFS_HEAD(bh) but never used.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3bc75993

fs/affs/super.c: remove unused variable · 1e907f4f

由 Fabian Frederick 提交于 10月 13, 2014

key is set in affs_fill_super but never used.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1e907f4f

coredump: add %i/%I in core_pattern to report the tid of the crashed thread · b03023ec

由 Oleg Nesterov 提交于 10月 13, 2014

format_corename() can only pass the leader's pid to the core handler,
but there is no simple way to figure out which thread originated the
coredump.

As Jan explains, this also means that there is no simple way to create
the backtrace of the crashed process:

As programs are mostly compiled with implicit gcc -fomit-frame-pointer
one needs program's .eh_frame section (equivalently PT_GNU_EH_FRAME
segment) or .debug_frame section.  .debug_frame usually is present only
in separate debug info files usually not even installed on the system.
While .eh_frame is a part of the executable/library (and it is even
always mapped for C++ exceptions unwinding) it no longer has to be
present anywhere on the disk as the program could be upgraded in the
meantime and the running instance has its executable file already
unlinked from disk.

One possibility is to echo 0x3f >/proc/*/coredump_filter and dump all
the file-backed memory including the executable's .eh_frame section.
But that can create huge core files, for example even due to mmapped
data files.

Other possibility would be to read .eh_frame from /proc/PID/mem at the
core_pattern handler time of the core dump.  For the backtrace one needs
to read the register state first which can be done from core_pattern
handler:

    ptrace(PTRACE_SEIZE, tid, 0, PTRACE_O_TRACEEXIT)
    close(0);    // close pipe fd to resume the sleeping dumper
    waitpid();   // should report EXIT
    PTRACE_GETREGS or other requests

The remaining problem is how to get the 'tid' value of the crashed
thread.  It could be read from the first NT_PRSTATUS note of the core
file but that makes the core_pattern handler complicated.

Unfortunately %t is already used so this patch uses %i/%I.

Automatic Bug Reporting Tool (https://github.com/abrt/abrt/wiki/overview)
is experimenting with this.  It is using the elfutils
(https://fedorahosted.org/elfutils/) unwinder for generating the
backtraces.  Apart from not needing matching executables as mentioned
above, another advantage is that we can get the backtrace without saving
the core (which might be quite large) to disk.

[mmilata@redhat.com: final paragraph of changelog]
Signed-off-by: NJan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Mark Wielaard <mjw@redhat.com>
Cc: Martin Milata <mmilata@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b03023ec

fat: remove redundant sys_tz declaration · 877aabd6

由 Fabian Frederick 提交于 10月 13, 2014

sys_tz is already declared extern struct in include/linux/time.h
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

877aabd6

fs/reiserfs/journal.c: fix sparse context imbalance warning · 54cc6cea

由 Fabian Frederick 提交于 10月 13, 2014

Merge conditional unlock/lock in the same condition to avoid sparse
warning:

  fs/reiserfs/journal.c:703:36: warning: context imbalance in 'add_to_chunk' - unexpected unlock
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

54cc6cea

fs/ufs/balloc.c: remove unused variable · 35c0b380

由 Fabian Frederick 提交于 10月 13, 2014

ucg is defined and set in ufs_bitmap_search but never used.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

35c0b380

fs/hfs/hfs_fs.h: remove redundant sys_tz declaration · a792d908

由 Fabian Frederick 提交于 10月 13, 2014

sys_tz is already declared in include/linux/time.h
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a792d908

nilfs2: improve the performance of fdatasync() · b9f66140

由 Andreas Rohner 提交于 10月 13, 2014

Support for fdatasync() has been implemented in NILFS2 for a long time,
but whenever the corresponding inode is dirty the implementation falls
back to a full-flegded sync().  Since every write operation has to
update the modification time of the file, the inode will almost always
be dirty and fdatasync() will fall back to sync() most of the time.  But
this fallback is only necessary for a change of the file size and not
for a change of the various timestamps.

This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between
those two situations.

 * If it is set the file size was changed and a full sync is necessary.
 * If it is not set then only the timestamps were updated and
   fdatasync() can go ahead.

There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with
the exact same semantics.  Unfortunately it cannot be used directly,
because NILFS2 doesn't implement write_inode() and doesn't clear the VFS
flags when inodes are written out.  So the VFS writeback thread can
clear I_DIRTY_DATASYNC at any time without notifying NILFS2.  So
I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in
nilfs_update_inode().
Signed-off-by: NAndreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9f66140

nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() · e2c7617a

由 Andreas Rohner 提交于 10月 13, 2014

Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device. But this depends
on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary. So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.

In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
is introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed. For convenience the function
nilfs_flush_device() is added, which contains the above logic.
Signed-off-by: NAndreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e2c7617a

fs/befs/btree.c: remove typedef befs_btree_node · 0f2a84f4

由 Himangi Saraogi 提交于 10月 13, 2014

The Linux kernel coding style guidelines suggest not using typedefs for
structure types.  This patch gets rid of the typedef for befs_btree_node.

The following Coccinelle semantic patch detects the case.

@tn1@
type td;
@@

typedef struct { ... } td;

@script:python tf@
td << tn1.td;
tdres;
@@

coccinelle.tdres = td;

@@
type tn1.td;
identifier tf.tdres;
@@

-typedef
 struct
+  tdres
   { ... }
-td
 ;

@@
type tn1.td;
identifier tf.tdres;
@@

-td
+ struct tdres
Signed-off-by: NHimangi Saraogi <himangi774@gmail.com>
Acked-by: NJulia Lawall <julia.lawall@lip6.fr>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0f2a84f4

autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode. · ef16cc59

由 NeilBrown 提交于 10月 13, 2014

If rcu-walk mode we don't *have* to return -EISDIR for non-mount-traps
as we will simply drop into REF-walk and handling DCACHE_NEED_AUTOMOUNT
dentrys the slow way.  But it is better if we do when possible.

In 'oz_mode', use the same condition as ref-walk: if not a mountpoint,
then it must be -EISDIR.

In regular mode there are most tests needed.  Most of them can be
performed without taking any spinlocks.  If we find a directory that
isn't obviously empty, and isn't mounted on, we need to call
'simple_empty()' which does take a spinlock.  If this turned out to hurt
performance, some other approach could be found to signal when a
directory is known to be empty.
Signed-off-by: NNeilBrown <neilb@suse.de>
Reviewed-by: NIan Kent <raven@themaw.net>
Tested-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef16cc59

autofs4: avoid taking fs_lock during rcu-walk · 4d885f90

由 NeilBrown 提交于 10月 13, 2014

->fs_lock protects AUTOFS_INF_EXPIRING.  We need to be sure that once
the flag is set, no new references beneath the dentry are taken.  So
rcu-walk currently needs to take fs_lock before checking the flag.  This
hurts performance.

Change the expiry to a two-stage process.  First set AUTOFS_INF_NO_RCU
which forces any path walk into ref-walk mode, then drop the lock and
call synchronize_rcu().  Once that returns we can be sure no rcu-walk is
active beneath the dentry and we can check reference counts again.

Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without taking
the lock as along as we test AUTOFS_INF_NO_RCU too.  If either are set,
we must abort the RCU-walk If neither are set, we know that refcounts
will be tested again after we finish the RCU-walk so we are safe to
continue.

->fs_lock is still taken in d_manage() to check for a non-trap
directory.  That will be resolved in the next patch.
Signed-off-by: NNeilBrown <neilb@suse.de>
Reviewed-by: NIan Kent <raven@themaw.net>
Tested-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4d885f90