提交 · 3457f4147675108aa83f9f33c136f06bb9f8518f · openanolis / cloud-kernel

10 7月, 2017 3 次提交

btrfs: nowait aio: Correct assignment of pos · ff0fa732

由 Goldwyn Rodrigues 提交于 7月 04, 2017

Assigning pos for usage early messes up in append mode, where the pos is
re-assigned in generic_write_checks(). Assign pos later to get the
correct position to write from iocb->ki_pos.

Since check_can_nocow also uses the value of pos, we shift
generic_write_checks() before check_can_nocow(). Checks with IOCB_DIRECT
are present in generic_write_checks(), so checking for IOCB_NOWAIT is
enough.

Also, put locking sequence in the fast path.

This fixes a user visible bug, as reported:

"apparently breaks several shell related features on my system.
In zsh history stopped working, because no new entries are added
anymore.
I fist noticed the issue when I tried to build mplayer. It uses a shell
script to generate a help_mp.h file:
[...]

Here is a simple testcase:

 % echo "foo" >> test
 % echo "foo" >> test
 % cat test
 foo
 %
"

Fixes: edf064e7 ("btrfs: nowait aio support")
CC: Jens Axboe <axboe@kernel.dk>
Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Link: https://lkml.kernel.org/r/20170704042306.GA274@x4Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ff0fa732

afs: Add metadata xattrs · d3e3b7ea

由 David Howells 提交于 7月 06, 2017

Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
available.  The following xattrs are now available:

 - "afs.cell"

   The name of the cell in which the vnode's volume resides.

 - "afs.fid"

   The volume ID, vnode ID and vnode uniquifier of the file as three hex
   numbers separated by colons.

 - "afs.volume"

   The name of the volume in which the vnode resides.

For example:

	# getfattr -d -m ".*" /mnt/scratch
	getfattr: Removing leading '/' from absolute path names
	# file: mnt/scratch
	afs.cell="mycell.myorg.org"
	afs.fid="10000b:1:1"
	afs.volume="scratch"
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d3e3b7ea

afs: Ignore AFS_ACE_READ and AFS_ACE_WRITE for directories · fd249821

由 Marc Dionne 提交于 7月 06, 2017

The AFS_ACE_READ and AFS_ACE_WRITE permission bits should not
be used to make access decisions for the directory itself.  They
are meant to control access for the objects contained in that
directory.

Reading a directory is allowed if the AFS_ACE_LOOKUP bit is set.
This would cause an incorrect access denied error for a directory
with AFS_ACE_LOOKUP but not AFS_ACE_READ.

The AFS_ACE_WRITE bit does not allow operations that modify the
directory.  For a directory with AFS_ACE_WRITE but neither
AFS_ACE_INSERT nor AFS_ACE_DELETE, this would result in trying
operations that would ultimately be denied by the server.
Signed-off-by: NMarc Dionne <marc.dionne@auristor.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd249821

08 7月, 2017 5 次提交

exec: Limit arg stack to at most 75% of _STK_LIM · da029c11

由 Kees Cook 提交于 7月 07, 2017

To avoid pathological stack usage or the need to special-case setuid
execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

da029c11

xfs: don't crash on unexpected holes in dir/attr btrees · cd87d867

由 Darrick J. Wong 提交于 7月 07, 2017

In quite a few places we call xfs_da_read_buf with a mappedbno that we
don't control, then assume that the function passes back either an error
code or a buffer pointer. Unfortunately, if mappedbno == -2 and bno
maps to a hole, we get a return code of zero and a NULL buffer, which
means that we crash if we actually try to use that buffer pointer. This
happens immediately when we set the buffer type for transaction context.

Therefore, check that we have no error code and a non-NULL bp before
trying to use bp. This patch is a follow-up to an incomplete fix in
96a3aefb ("xfs: don't crash if reading a directory results in an
unexpected hole").
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

cd87d867

dentry name snapshots · 49d31c2f

由 Al Viro 提交于 7月 07, 2017

take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified).  In either case the pointer to stable
string is stored into the same structure.

dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().

Intended use:
	struct name_snapshot s;

	take_dentry_name_snapshot(&s, dentry);
	...
	access s.name
	...
	release_dentry_name_snapshot(&s);

Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

49d31c2f

vfs: fix flock compat thinko · b59eea55

由 Linus Torvalds 提交于 7月 07, 2017

Michael Ellerman reported that commit 8c6657cb ("Switch flock
copyin/copyout primitives to copy_{from,to}_user()") broke his
networking on a bunch of PPC machines (64-bit kernel, 32-bit userspace).

The reason is a brown-paper bug by that commit, which had the arguments
to "copy_flock_fields()" in the wrong order, breaking the compat
handling for file locking.  Apparently very few people run 32-bit user
space on x86 any more, so the PPC people got the honor of noticing this
"feature".

Michael also sent a minimal diff that just changed the order of the
arguments in that macro.

This is not that minimal diff.

This not only changes the order of the arguments in the macro, it also
changes them to be pointers (to be consistent with all the other uses of
those pointers), and makes the functions that do all of this also have
the proper "const" attribution on the source pointers in order to make
issues like that (using the source as a destination) be really obvious.
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b59eea55

gfs2: Fix glock rhashtable rcu bug · 961ae1d8

由 Andreas Gruenbacher 提交于 7月 07, 2017

Before commit 88ffbf3e "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu.  This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe.  Bring back the original
code for freeing glocks via call_rcu.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # 4.3+

961ae1d8

07 7月, 2017 11 次提交

xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN · 6eb0b8df

由 Darrick J. Wong 提交于 7月 07, 2017

XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days.  Unfortunately, the constant establishing
this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.

The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.

Since this is an on-disk format constraint, put the define in the XFS
namespace and move everything over to use the new name.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

6eb0b8df

mm: per-cgroup memory reclaim stats · 2262185c

由 Roman Gushchin 提交于 7月 06, 2017

Track the following reclaim counters for every memory cgroup: PGREFILL,
PGSCAN, PGSTEAL, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE and PGLAZYFREED.

These values are exposed using the memory.stats interface of cgroup v2.

The meaning of each value is the same as for global counters, available
using /proc/vmstat.

Also, for consistency, rename mem_cgroup_count_vm_event() to
count_memcg_event_mm().

Link: http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com>
Suggested-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2262185c

mm/hugetlb: add size parameter to huge_pte_offset() · 7868a208

由 Punit Agrawal 提交于 7月 06, 2017

A poisoned or migrated hugepage is stored as a swap entry in the page
tables.  On architectures that support hugepages consisting of
contiguous page table entries (such as on arm64) this leads to ambiguity
in determining the page table entry to return in huge_pte_offset() when
a poisoned entry is encountered.

Let's remove the ambiguity by adding a size parameter to convey
additional information about the requested address.  Also fixup the
definition/usage of huge_pte_offset() throughout the tree.

Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.comSigned-off-by: NPunit Agrawal <punit.agrawal@arm.com>
Acked-by: NSteve Capper <steve.capper@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7868a208

mm: update callers to use HASH_ZERO flag · 3d375d78

由 Pavel Tatashin 提交于 7月 06, 2017

Update dcache, inode, pid, mountpoint, and mount hash tables to use
HASH_ZERO, and remove initialization after allocations.  In case of
places where HASH_EARLY was used such as in __pv_init_lock_hash the
zeroed hash table was already assumed, because memblock zeroes the
memory.

CPU: SPARC M6, Memory: 7T
Before fix:
  Dentry cache hash table entries: 1073741824
  Inode-cache hash table entries: 536870912
  Mount-cache hash table entries: 16777216
  Mountpoint-cache hash table entries: 16777216
  ftrace: allocating 20414 entries in 40 pages
  Total time: 11.798s

After fix:
  Dentry cache hash table entries: 1073741824
  Inode-cache hash table entries: 536870912
  Mount-cache hash table entries: 16777216
  Mountpoint-cache hash table entries: 16777216
  ftrace: allocating 20414 entries in 40 pages
  Total time: 3.198s

CPU: Intel Xeon E5-2630, Memory: 2.2T:
Before fix:
  Dentry cache hash table entries: 536870912
  Inode-cache hash table entries: 268435456
  Mount-cache hash table entries: 8388608
  Mountpoint-cache hash table entries: 8388608
  CPU: Physical Processor ID: 0
  Total time: 3.245s

After fix:
  Dentry cache hash table entries: 536870912
  Inode-cache hash table entries: 268435456
  Mount-cache hash table entries: 8388608
  Mountpoint-cache hash table entries: 8388608
  CPU: Physical Processor ID: 0
  Total time: 3.244s

Link: http://lkml.kernel.org/r/1488432825-92126-4-git-send-email-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: NBabu Moger <babu.moger@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3d375d78

fs/userfaultfd.c: drop dead code · f93ae364

由 Mike Rapoport 提交于 7月 06, 2017

Calculation of start end end in __wake_userfault function are not used
and can be removed.

Link: http://lkml.kernel.org/r/1494930917-3134-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f93ae364

fs/file.c: replace alloc_fdmem() with kvmalloc() alternative · c823bd92

由 Michal Hocko 提交于 7月 06, 2017

There is no real reason to duplicate kvmalloc* helpers so drop
alloc_fdmem and replace it with the appropriate library function.

Link: http://lkml.kernel.org/r/20170531155145.17111-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c823bd92

ocfs2: constify attribute_group structures · b74271e4

由 Arvind Yadav 提交于 7月 06, 2017

attribute_groups are not supposed to change at runtime.  All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group.  So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   4402	   1088	     38	   5528	   1598	fs/ocfs2/stackglue.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   4442	   1024	     38	   5504	   1580	fs/ocfs2/stackglue.o

Link: http://lkml.kernel.org/r/cab4e59b4918db3ed2ec77073a4cb310c4429ef5.1498808026.git.arvind.yadav.cs@gmail.comSigned-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b74271e4

ocfs2: free 'dummy_sc' in sc_fop_release() to prevent memory leak · 25b1c72e

由 piaojun 提交于 7月 06, 2017

'sd->dbg_sock' is malloced in sc_common_open(), but not freed at the end
of sc_fop_release().

Link: http://lkml.kernel.org/r/594FB0A4.2050105@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com>
Reviewed-by: NJoseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

25b1c72e

ocfs2: use magic.h · 62aa81d7

由 Fabian Frederick 提交于 7月 06, 2017

Filesystems generally use SUPER_MAGIC values from magic.h instead of a
local definition.

Link: http://lkml.kernel.org/r/20170521154217.27917-1-fabf@skynet.beSigned-off-by: NFabian Frederick <fabf@skynet.be>
Reviewed-by: NMark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

62aa81d7

ocfs2: fix a static checker warning · 8c4d5a43

由 Gang He 提交于 7月 06, 2017

Fix a static code checker warning:

  fs/ocfs2/inode.c:179 ocfs2_iget() warn: passing zero to 'ERR_PTR'

Fixes: d56a8f32 ("ocfs2: check/fix inode block for online file check")
Link: http://lkml.kernel.org/r/1495516634-1952-1-git-send-email-ghe@suse.comSigned-off-by: NGang He <ghe@suse.com>
Reviewed-by: NJoseph Qi <jiangqi903@gmail.com>
Reviewed-by: NEric Ren <zren@suse.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c4d5a43

ext4: fix spelling mistake: "prellocated" -> "preallocated" · ff950156

由 Colin Ian King 提交于 7月 06, 2017

Trivial fix to spelling mistake in mb_debug debug message
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ff950156

06 7月, 2017 21 次提交

move file_{start,end}_write() out of do_iter_write() · 62473a2d

由 Al Viro 提交于 7月 06, 2017

... and do *not* grab it in vfs_write_iter().

Fixes: "fs: implement vfs_iter_read using do_iter_read"
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

62473a2d

btrfs: minimal conversion to errseq_t writeback error reporting on fsync · 333427a5

由 Jeff Layton 提交于 7月 06, 2017

Just check and advance the errseq_t in the file before returning, and
use an errseq_t based check for writeback errors.

Other internal callers of filemap_* functions are left as-is.
Signed-off-by: NJeff Layton <jlayton@redhat.com>

333427a5

xfs: minimal conversion to errseq_t writeback error reporting · 1b180274

由 Jeff Layton 提交于 7月 06, 2017

Just check and advance the data errseq_t in struct file before
before returning from fsync on normal files. Internal filemap_*
callers are left as-is.
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJeff Layton <jlayton@redhat.com>

1b180274

ext4: use errseq_t based error handling for reporting data writeback errors · 6acec592

由 Jeff Layton 提交于 7月 06, 2017

Add a call to filemap_report_wb_err at the end of ext4_sync_file. This
will ensure that we check and advance the errseq_t in the file, which
allows us to track and report errors on all open fds when they occur.
Signed-off-by: NJeff Layton <jlayton@redhat.com>

6acec592

fs: convert __generic_file_fsync to use errseq_t based reporting · 383aa543

由 Jeff Layton 提交于 7月 06, 2017

Many simple, block-based filesystems use generic_file_fsync as their
fsync operation. Some others (ext* and fat) also call this function
to handle syncing out data.

Switch this code over to use errseq_t based error reporting so that
all of these filesystems get reliable error reporting via fsync,
fdatasync and msync.
Signed-off-by: NJeff Layton <jlayton@redhat.com>

383aa543

block: convert to errseq_t based writeback error tracking · 372cf243

由 Jeff Layton 提交于 7月 06, 2017

This is a very minimal conversion to errseq_t based error tracking
for raw block device access. Just have it use the standard
file_write_and_wait_range call.

Note that there are internal callers that call sync_blockdev
and the like that are not affected by this. They'll continue
to use the AS_EIO/AS_ENOSPC flags for error reporting like
they always have for now.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJeff Layton <jlayton@redhat.com>

372cf243

dax: set errors in mapping when writeback fails · 819ec6b9

由 Jeff Layton 提交于 7月 06, 2017

Jan Kara's description for this patch is much better than mine, so I'm
quoting it verbatim here:

DAX currently doesn't set errors in the mapping when cache flushing
fails in dax_writeback_mapping_range(). Since this function can get
called only from fsync(2) or sync(2), this is actually as good as it can
currently get since we correctly propagate the error up from
dax_writeback_mapping_range() to filemap_fdatawrite()

However, in the future better writeback error handling will enable us to
properly report these errors on fsync(2) even if there are multiple file
descriptors open against the file or if sync(2) gets called before
fsync(2). So convert DAX to using standard error reporting through the
mapping.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-and-tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>

819ec6b9

fs: new infrastructure for writeback error handling and reporting · 5660e13d

由 Jeff Layton 提交于 7月 06, 2017

Most filesystems currently use mapping_set_error and
filemap_check_errors for setting and reporting/clearing writeback errors
at the mapping level. filemap_check_errors is indirectly called from
most of the filemap_fdatawait_* functions and from
filemap_write_and_wait*. These functions are called from all sorts of
contexts to wait on writeback to finish -- e.g. mostly in fsync, but
also in truncate calls, getattr, etc.

The non-fsync callers are problematic. We should be reporting writeback
errors during fsync, but many places spread over the tree clear out
errors before they can be properly reported, or report errors at
nonsensical times.

If I get -EIO on a stat() call, there is no reason for me to assume that
it is because some previous writeback failed. The fact that it also
clears out the error such that a subsequent fsync returns 0 is a bug,
and a nasty one since that's potentially silent data corruption.

This patch adds a small bit of new infrastructure for setting and
reporting errors during address_space writeback. While the above was my
original impetus for adding this, I think it's also the case that
current fsync semantics are just problematic for userland. Most
applications that call fsync do so to ensure that the data they wrote
has hit the backing store.

In the case where there are multiple writers to the file at the same
time, this is really hard to determine. The first one to call fsync will
see any stored error, and the rest get back 0. The processes with open
fds may not be associated with one another in any way. They could even
be in different containers, so ensuring coordination between all fsync
callers is not really an option.

One way to remedy this would be to track what file descriptor was used
to dirty the file, but that's rather cumbersome and would likely be
slow. However, there is a simpler way to improve the semantics here
without incurring too much overhead.

This set adds an errseq_t to struct address_space, and a corresponding
one is added to struct file. Writeback errors are recorded in the
mapping's errseq_t, and the one in struct file is used as the "since"
value.

This changes the semantics of the Linux fsync implementation such that
applications can now use it to determine whether there were any
writeback errors since fsync(fd) was last called (or since the file was
opened in the case of fsync having never been called).

Note that those writeback errors may have occurred when writing data
that was dirtied via an entirely different fd, but that's the case now
with the current mapping_set_error/filemap_check_error infrastructure.
This will at least prevent you from getting a false report of success.

The new behavior is still consistent with the POSIX spec, and is more
reliable for application developers. This patch just adds some basic
infrastructure for doing this, and ensures that the f_wb_err "cursor"
is properly set when a file is opened. Later patches will change the
existing code to use this new infrastructure for reporting errors at
fsync time.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>

5660e13d

jbd2: don't clear and reset errors after waiting on writeback · 76341cab

由 Jeff Layton 提交于 7月 06, 2017

Resetting this flag is almost certainly racy, and will be problematic
with some coming changes.

Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
Have jbd2 call it instead of filemap_fdatawait and don't attempt to
re-set the error flag if it fails.
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>

76341cab

buffer: set errors in mapping at the time that the error occurs · 87354e5d

由 Jeff Layton 提交于 7月 06, 2017

I noticed on xfs that I could still sometimes get back an error on fsync
on a fd that was opened after the error condition had been cleared.

The problem is that the buffer code sets the write_io_error flag and
then later checks that flag to set the error in the mapping. That flag
perisists for quite a while however. If the file is later opened with
O_TRUNC, the buffers will then be invalidated and the mapping's error
set such that a subsequent fsync will return error. I think this is
incorrect, as there was no writeback between the open and fsync.

Add a new mark_buffer_write_io_error operation that sets the flag and
the error in the mapping at the same time. Replace all calls to
set_buffer_write_io_error with mark_buffer_write_io_error, and remove
the places that check this flag in order to set the error in the
mapping.

This sets the error in the mapping earlier, at the time that it's first
detected.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>

87354e5d

fs: check for writeback errors after syncing out buffers in generic_file_fsync · dac257f7

由 Jeff Layton 提交于 7月 06, 2017

ext2 currently does a test+clear of the AS_EIO flag, which is
is problematic for some coming changes.

What we really need to do instead is call filemap_check_errors
in __generic_file_fsync after syncing out the buffers. That
will be sufficient for this case, and help other callers detect
these errors properly as well.

With that, we don't need to twiddle it in ext2.
Suggested-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com>

dac257f7

buffer: use mapping_set_error instead of setting the flag · d945b59d

由 Jeff Layton 提交于 7月 06, 2017

Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

d945b59d

ext4: fix __ext4_new_inode() journal credits calculation · af65207c

由 Tahsin Erdogan 提交于 7月 06, 2017

ea_inode feature allows creating extended attributes that are up to
64k in size. Update __ext4_new_inode() to pick increased credit limits.

To avoid overallocating too many journal credits, update
__ext4_xattr_set_credits() to make a distinction between xattr create
vs update. This helps __ext4_new_inode() because all attributes are
known to be new, so we can save credits that are normally needed to
delete old values.

Also, have fscrypt specify its maximum context size so that we don't
end up allocating credits for 64k size.
Signed-off-by: NTahsin Erdogan <tahsin@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

af65207c

ext4: skip ext4_init_security() and encryption on ea_inodes · ad47f953

由 Tahsin Erdogan 提交于 7月 06, 2017

Extended attribute inodes are internal to ext4. Adding encryption/security
related attributes on them would mean dealing with nested calls into ea code.
Since they have no direct exposure to user mode, just avoid creating ea
entries for them.
Signed-off-by: NTahsin Erdogan <tahsin@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ad47f953

CIFS: fix circular locking dependency · 966681c9

由 Rabin Vincent 提交于 6月 29, 2017

When a CIFS filesystem is mounted with the forcemand option and the
following command is run on it, lockdep warns about a circular locking
dependency between CifsInodeInfo::lock_sem and the inode lock.

 while echo foo > hello; do :; done & while touch -c hello; do :; done

cifs_writev() takes the locks in the wrong order, but note that we can't
only flip the order around because it releases the inode lock before the
call to generic_write_sync() while it holds the lock_sem across that
call.

But, AFAICS, there is no need to hold the CifsInodeInfo::lock_sem across
the generic_write_sync() call either, so we can release both the locks
before generic_write_sync(), and change the order.

 ======================================================
 WARNING: possible circular locking dependency detected
 4.12.0-rc7+ #9 Not tainted
 ------------------------------------------------------
 touch/487 is trying to acquire lock:
  (&cifsi->lock_sem){++++..}, at: cifsFileInfo_put+0x88f/0x16a0

 but task is already holding lock:
  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&sb->s_type->i_mutex_key#11){+.+.+.}:
        __lock_acquire+0x1f74/0x38f0
        lock_acquire+0x1cc/0x600
        down_write+0x74/0x110
        cifs_strict_writev+0x3cb/0x8c0
        __vfs_write+0x4c1/0x930
        vfs_write+0x14c/0x2d0
        SyS_write+0xf7/0x240
        entry_SYSCALL_64_fastpath+0x1f/0xbe

 -> #0 (&cifsi->lock_sem){++++..}:
        check_prevs_add+0xfa0/0x1d10
        __lock_acquire+0x1f74/0x38f0
        lock_acquire+0x1cc/0x600
        down_write+0x74/0x110
        cifsFileInfo_put+0x88f/0x16a0
        cifs_setattr+0x992/0x1680
        notify_change+0x61a/0xa80
        utimes_common+0x3d4/0x870
        do_utimes+0x1c1/0x220
        SyS_utimensat+0x84/0x1a0
        entry_SYSCALL_64_fastpath+0x1f/0xbe

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sb->s_type->i_mutex_key#11);
                                lock(&cifsi->lock_sem);
                                lock(&sb->s_type->i_mutex_key#11);
   lock(&cifsi->lock_sem);

  *** DEADLOCK ***

 2 locks held by touch/487:
  #0:  (sb_writers#10){.+.+.+}, at: mnt_want_write+0x41/0xb0
  #1:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870

 stack backtrace:
 CPU: 0 PID: 487 Comm: touch Not tainted 4.12.0-rc7+ #9
 Call Trace:
  dump_stack+0xdb/0x185
  print_circular_bug+0x45b/0x790
  __lock_acquire+0x1f74/0x38f0
  lock_acquire+0x1cc/0x600
  down_write+0x74/0x110
  cifsFileInfo_put+0x88f/0x16a0
  cifs_setattr+0x992/0x1680
  notify_change+0x61a/0xa80
  utimes_common+0x3d4/0x870
  do_utimes+0x1c1/0x220
  SyS_utimensat+0x84/0x1a0
  entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: 19dfc1f5 ("cifs: fix the race in cifs_writev()")
Signed-off-by: NRabin Vincent <rabinv@axis.com>
Signed-off-by: NSteve French <smfrench@gmail.com>
Acked-by: NPavel Shilovsky <pshilov@microsoft.com>

966681c9

cifs: set oparms.create_options rather than or'ing in CREATE_OPEN_BACKUP_INTENT · 709340a0

由 Colin Ian King 提交于 7月 05, 2017

Currently oparms.create_options is uninitialized and the code is logically
or'ing in CREATE_OPEN_BACKUP_INTENT onto a garbage value of
oparms.create_options from the stack.  Fix this by just setting the value
rather than or'ing in the setting.

Detected by CoverityScan, CID#1447220 ("Unitialized scale value")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>

709340a0

cifs: Do not modify mid entry after submitting I/O in cifs_call_async · 93d2cb6c

由 Long Li 提交于 6月 28, 2017

In cifs_call_async, server may respond as soon as I/O is submitted. Because
mid entry is freed on the return path, it should not be modified after I/O
is submitted.

cifs_save_when_sent modifies the sent timestamp in mid entry, and should not
be called after I/O. Call it before I/O.
Signed-off-by: NLong Li <longli@microsoft.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

93d2cb6c

CIFS: add SFM mapping for 0x01-0x1F · 7e46f090

由 Björn JACKE 提交于 6月 01, 2017

Hi,

attached patch adds more missing mappings for the 0x01-0x1f range. Please
review, if you're fine with it, considere it also for stable.

Björn

>From a97720c26db2ee77d4e798e3d383fcb6a348bd29 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Jacke?= <bjacke@samba.org>
Date: Wed, 31 May 2017 22:48:41 +0200
Subject: [PATCH] cifs: add SFM mapping for 0x01-0x1F

0x1-0x1F has to be mapped to 0xF001-0xF01F
Signed-off-by: NBjoern Jacke <bjacke@samba.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

7e46f090

cifs: hide unused functions · 84908426

由 Arnd Bergmann 提交于 6月 27, 2017

Some functions are only referenced under an #ifdef, causing a harmless
warning:

fs/cifs/smb2ops.c:1374:1: error: 'get_smb2_acl' defined but not used [-Werror=unused-function]

We could mark them __maybe_unused or add another #ifdef, I picked
the second approach here.

Fixes: b3fdda4d1e1b ("cifs: Use smb 2 - 3 and cifsacl mount options getacl functions")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NSteve French <smfrench@gmail.com>

84908426

cifs: Use smb 2 - 3 and cifsacl mount options getacl functions · 2f1afe25

由 Shirish Pargaonkar 提交于 6月 22, 2017

Fill in smb2/3 query acl functions in ops structures and use them.
Signed-off-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

2f1afe25

cifs: prototype declaration and definition for smb 2 - 3 and cifsacl mount options · 42c493c1

由 Shirish Pargaonkar 提交于 6月 22, 2017

Add definition and declaration of function to get cifs acls when
mounting with smb version 2 onwards to 3.

Extend/Alter query info function to allocate and return
security descriptors within the response.

Not yet handling the error case when the size of security descriptors
in response to query exceeds SMB2_MAX_BUFFER_SIZE.
Signed-off-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

42c493c1

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功