提交 · 235628c5c76040b0ec206ea9ab9e017771e0d78e · openeuler / Kernel

19 1月, 2018 5 次提交

gfs2: Add gfs2_max_stuffed_size · 235628c5

由 Andreas Gruenbacher 提交于 11月 14, 2017

Add a small inline function for computing the maximum size of a stuffed
inode instead of open coding that in several places throughout the code.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

235628c5

gfs2: Typo fixes · 9db115a0

由 Andreas Gruenbacher 提交于 11月 18, 2017

Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

9db115a0

gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE) · 4e56a641

由 Andreas Gruenbacher 提交于 12月 14, 2017

Implement the top-level bits of punching a hole into a file.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

4e56a641

gfs2: Turn trunc_dealloc into punch_hole · 10d2cf94

由 Andreas Gruenbacher 提交于 12月 19, 2017

Add an upper bound to the range of blocks to deallocate blocks to
function trunc_dealloc so that this function can be used for truncating
a file as well as for punching a hole into a file.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

10d2cf94

gfs2: Generalize truncate code · 5cf26b1e

由 Andreas Gruenbacher 提交于 12月 11, 2017

Pull the code for computing the range of metapointers to iterate out of
gfs2_metapath_ra (for readahead), sweep_bh_for_rgrps (for deallocating
metapointers within a block), and trunc_dealloc (for walking the
metadata tree).

In sweep_bh_for_rgrps, move the code for looking up the resource group
descriptor of the current resource group out of the inner loop.  The
metatype check moves to trunc_dealloc.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

5cf26b1e

17 1月, 2018 9 次提交

Turn gfs2_block_truncate_page into gfs2_block_zero_range · bdba0d5e

由 Andreas Gruenbacher 提交于 12月 13, 2017

Turn gfs2_block_truncate_page into a function that zeroes a range within
a block rather than only the end of a block.  This will be used for
cleaning the end of the first partial block and the start of the last
partial block when punching a hole in a file.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

bdba0d5e

gfs2: Improve non-recursive delete algorithm · cb7f0903

由 Andreas Gruenbacher 提交于 12月 04, 2017

In rare cases, the current non-recursive delete algorithm doesn't
deallocate empty intermediary indirect blocks. This should have very
little practical effect, but deallocating all blocks correctly should
still be preferable as it is cleaner and easier to validate.

The fix consists of using the first block to deallocate to compute the
start marker of the truncate point instead of the last block that needs
to be kept. With that change, computing which indirect blocks are still
needed becomes relatively easy.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

cb7f0903

gfs2: Fix metadata read-ahead during truncate · c3ce5aa9

由 Andreas Gruenbacher 提交于 12月 08, 2017

The metadata read-ahead algorithm broke when switching from recursive to
non-recursive delete: the current algorithm reads ahead blocks at height
N - 1 while deallocating the blocks at hight N.  However, deallocating
the blocks at height N requires a complete walk of the metadata tree,
not only down to height N - 1.  Consequently, all blocks below height
N - 1 will be accessed without read-ahead.

Fix this by issuing read-aheads as early as possible, after each
metapath lookup.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

c3ce5aa9

gfs2: Clean up {lookup,fillup}_metapath · e8b43fe0

由 Andreas Gruenbacher 提交于 12月 08, 2017

Split out the entire lookup loop from lookup_metapath and
fillup_metapath.  Make both functions return the actual height in
mp->mp_aheight, and return 0 on success.  Handle lookup errors properly
in trunc_dealloc.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

e8b43fe0

gfs2: Remove minor gfs2_journaled_truncate inefficiencies · e7fdf004

由 Andreas Gruenbacher 提交于 12月 12, 2017

First, this function truncates the file in chunks. When the original
file size isn't block aligned, each chunk that is truncated will remain
be misaligned. This is inefficient.

Second, this function doesn't recognize where holes are, so it loops
through them. For each chunk of a hole, it creates a new transaction.
At least avoid creating another transactions whe the current one is
still empty. (An better fix would be to skip large holes, of course.)
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

e7fdf004

A
gfs2: truncate: Remove unnecessary oldsize parameters · 8b5860a3
由 Andreas Gruenbacher 提交于 12月 12, 2017
```
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
```
8b5860a3

gfs2: Clean up trunc_start error path · 80990f40

由 Andreas Gruenbacher 提交于 12月 12, 2017

Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

80990f40

gfs2: Remove pointless BUG_ON · da5eb9cd

由 Andreas Gruenbacher 提交于 12月 12, 2017

The current transaction is being dereferenced before asserting that is
not NULL; that isn't going to help.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

da5eb9cd

gfs2: Add gfs2_blk2rgrpd comment and fix incorrect use · 90bcab99

由 Steven Whitehouse 提交于 12月 22, 2017

Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching.  Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

90bcab99

22 12月, 2017 2 次提交

gfs2: Trim the ordered write list in gfs2_ordered_write() · 1f23bc78

由 Abhi Das 提交于 12月 22, 2017

We iterate through the entire ordered writes list in
gfs2_ordered_write() to write out inodes. It's a good
place to try and shrink the list by throwing out inodes
that don't have any pages.
Signed-off-by: NAbhi Das <adas@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

1f23bc78

GFS2: Reduce code redundancy writing log headers · 588bff95

由 Bob Peterson 提交于 12月 18, 2017

Before this patch, there was a lot of code redundancy between functions
log_write_header (which uses bio) and clean_journal (which uses
buffer_head). This patch reduces the redundancy to simplify the code
and make log header writing more consistent. We want more consistency
and reduced redundancy because we plan to add a bunch of new fields
to improve performance (by eliminating the local statfs and quota files)
improve metadata integrity (by adding new crcs and such) and for better
debugging (by adding new fields to track when and where metadata was
pushed through the journals.) We don't want to duplicate setting these
new fields, nor allow for human error in the process.

This reduction in code redundancy is accomplished by introducing a new
helper function, gfs2_write_log_header which uses bio rather than bh.
That simplifies recovery function clean_journal() to use the new helper
function and iomap rather than redundancy and block_map (and eventually
we can maybe remove block_map). It also reduces our dependency on
buffer_heads.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

588bff95

18 12月, 2017 2 次提交

Revert "exec: avoid RLIMIT_STACK races with prlimit()" · 779f4e1c

由 Kees Cook 提交于 12月 12, 2017

This reverts commit 04e35f44.

SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.
Reported-by: NLaura Abbott <labbott@redhat.com>
Reported-by: NTomáš Trnka <trnka@scm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

779f4e1c

cramfs: fix MTD dependency · b9f5fb18

由 Arnd Bergmann 提交于 11月 10, 2017

With CONFIG_MTD=m and CONFIG_CRAMFS=y, we now get a link failure:

  fs/cramfs/inode.o: In function `cramfs_mount': inode.c:(.text+0x220): undefined reference to `mount_mtd'
  fs/cramfs/inode.o: In function `cramfs_mtd_fill_super':
  inode.c:(.text+0x6d8): undefined reference to `mtd_point'
  inode.c:(.text+0xae4): undefined reference to `mtd_unpoint'

This adds a more specific Kconfig dependency to avoid the broken
configuration.

Alternatively we could make CRAMFS itself depend on "MTD || !MTD" with a
similar result.

Fixes: 99c18ce5 ("cramfs: direct memory access support")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9f5fb18

17 12月, 2017 1 次提交

locking/barriers: Convert users of lockless_dereference() to READ_ONCE() · 3382290e

由 Will Deacon 提交于 10月 24, 2017

[ Note, this is a Git cherry-pick of the following commit:

    506458ef ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

3382290e

16 12月, 2017 3 次提交

Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths" · f6f37321

由 Linus Torvalds 提交于 12月 15, 2017

This reverts commits 5c9d2d5c, c7da82b8, and e7fe7b5c.

We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.

Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was.  Dave
Hansen says:

 "So, the underlying bug here is that we now a get_user_pages_remote()
  and then go ahead and do the p*_access_permitted() checks against the
  current PKRU. This was introduced recently with the addition of the
  new p??_access_permitted() calls.

  We have checks in the VMA path for the "remote" gups and we avoid
  consulting PKRU for them. This got missed in the pkeys selftests
  because I did a ptrace read, but not a *write*. I also didn't
  explicitly test it against something where a COW needed to be done"

It's also not entirely clear that it makes sense to check the protection
key bits at this level at all.  But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.

We'll see.

Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back.  Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f6f37321

nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests · dc4fd9ab

由 Scott Mayhew 提交于 12月 08, 2017

If there were no commit requests, then nfs_commit_inode() should not
wait on the commit or mark the inode dirty, otherwise the following
BUG_ON can be triggered:

[ 1917.130762] kernel BUG at fs/inode.c:578!
[ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
[ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G               ------------ T 3.10.0-768.el7.ppc64 #1
[ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000
[ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80
[ 1917.130816] REGS: c00000004cea3720 TRAP: 0700   Tainted: G               ------------ T  (3.10.0-768.el7.ppc64)
[ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>  CR: 22002822  XER: 20000000
[ 1917.130828] CFAR: c00000000011f594 SOFTE: 1
GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
[ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350
[ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350
[ 1917.130873] Call Trace:
[ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable)
[ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270
[ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0
[ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
[ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
[ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180
[ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150
[ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100
[ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
[ 1917.130919] Instruction dump:
[ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
[ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

dc4fd9ab

nfs: fix a deadlock in nfs client initialization · c156618e

由 Scott Mayhew 提交于 12月 05, 2017

The following deadlock can occur between a process waiting for a client
to initialize in while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:

Process 1                               Process 2
---------                               ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
        &nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
                                        spin_lock(&nn->nfs_client_lock);
                                        list_add_tail(&CLIENTB->cl_share_link,
                                                &nn->nfs_client_list);
                                        spin_unlock(&nn->nfs_client_lock);
                                        mutex_lock(&nfs_clid_init_mutex);
                                        nfs41_walk_client_list(clp, result, cred);
                                        nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)

Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.

This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c156618e

15 12月, 2017 3 次提交

kernel: make groups_sort calling a responsibility group_info allocators · bdcf0a42

由 Thiago Rafael Becker 提交于 12月 14, 2017

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.comSigned-off-by: NThiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NNeilBrown <neilb@suse.com>
Acked-by: N"J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bdcf0a42

exec: avoid gcc-8 warning for get_task_comm · 3756f640

由 Arnd Bergmann 提交于 12月 14, 2017

gcc-8 warns about using strncpy() with the source size as the limit:

  fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]

This is indeed slightly suspicious, as it protects us from source
arguments without NUL-termination, but does not guarantee that the
destination is terminated.

This keeps the strncpy() to ensure we have properly padded target
buffer, but ensures that we use the correct length, by passing the
actual length of the destination buffer as well as adding a build-time
check to ensure it is exactly TASK_COMM_LEN.

There are only 23 callsites which I all reviewed to ensure this is
currently the case.  We could get away with doing only the check or
passing the right length, but it doesn't hurt to do both.

Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
Suggested-by: NKees Cook <keescook@chromium.org>
Acked-by: NKees Cook <keescook@chromium.org>
Acked-by: NIngo Molnar <mingo@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3756f640

autofs: fix careless error in recent commit · 302ec300

由 NeilBrown 提交于 12月 14, 2017

Commit ecc0c469 ("autofs: don't fail mount for transient error") was
meant to replace an 'if' with a 'switch', but instead added the 'switch'
leaving the case in place.

Link: http://lkml.kernel.org/r/87zi6wstmw.fsf@notabene.neil.brown.name
Fixes: ecc0c469 ("autofs: don't fail mount for transient error")
Reported-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NNeilBrown <neilb@suse.com>
Cc: Ian Kent <raven@themaw.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

302ec300

14 12月, 2017 1 次提交

ovl: fix overlay: warning prefix · da2e6b7e

由 Amir Goldstein 提交于 11月 22, 2017

Conform two stray warning messages to the standard overlayfs: prefix.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

da2e6b7e

13 12月, 2017 3 次提交

gfs2: Add a crc field to resource group headers · 850d2d91

由 Andrew Price 提交于 12月 12, 2017

Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This
allows us to check resource group headers' integrity and removes the
requirement to check them against the rindex entries in fsck. If this
field is found to be zero, it should be ignored (or updated with an
accurate value).
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

850d2d91

gfs2: Add rindex fields to rgrp headers · 166725d9

由 Andrew Price 提交于 12月 12, 2017

Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields
are identical to their counterparts in struct gfs2_rindex and are
intended to reduce the use of the rindex. For now the fields are only
written back as the in-memory equivalents in struct gfs2_rgrpd are set
using values from the rindex. However, they are needed at this point so
that userspace can make use of them, allowing a migration away from the
rindex over time.

The new fields take up previously reserved space which was explicitly
zeroed on write so, in clusters with mixed kernels, these fields could
get zeroed after being set and this should not be treated as an error.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

166725d9

gfs2: Add a next-resource-group pointer to resource groups · 65adc273

由 Andrew Price 提交于 12月 12, 2017

Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
rg_skip field has the following meaning:

- If rg_skip is zero, it is considered unset and not useful.
- If rg_skip is non-zero, its value will be the number of blocks between
  this rgrp's address and the next rgrp's address. This can be used as a
  hint by fsck.gfs2 when rebuilding a bad rindex, for example.

This will provide less dependency on the rindex in future, and allow
tools such as fsck.gfs2 to iterate the resource groups without keeping
the rindex around.

The field is updated in gfs2_rgrp_out() so that existing file systems
will have it set. This means that any resource groups that aren't ever
written will not be updated. The final rgrp is a special case as there
is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is
extended).

Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
the rg_skip field can get set back to 0 in cases where nodes with and
without this patch are mixed in a cluster. In some cases, the field may
bounce between being set by one node and then zeroed by another which
may harm performance slightly, e.g. when two nodes create many small
files. In testing this situation is rare but it becomes more likely as
the filesystem fills up and there are fewer resource groups to choose
from. The problem goes away when all nodes are running with this patch.
Dipping into the space currently occupied by the rg_reserved field would
have resulted in the same problem as it is also explicitly zeroed, so
unfortunately there is no other way around it.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

65adc273

12 12月, 2017 1 次提交

ext4: fix crash when a directory's i_size is too small · 9d5afec6

由 Chandan Rajendra 提交于 12月 11, 2017

On a ppc64 machine, when mounting a fuzzed ext2 image (generated by
fsfuzzer) the following call trace is seen,

VFS: brelse: Trying to free free buffer
WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40
.__brelse.part.6+0x20/0x40 (unreliable)
.ext4_find_entry+0x384/0x4f0
.ext4_lookup+0x84/0x250
.lookup_slow+0xdc/0x230
.walk_component+0x268/0x400
.path_lookupat+0xec/0x2d0
.filename_lookup+0x9c/0x1d0
.vfs_statx+0x98/0x140
.SyS_newfstatat+0x48/0x80
system_call+0x58/0x6c

This happens because the directory that ext4_find_entry() looks up has
inode->i_size that is less than the block size of the filesystem. This
causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not
reading any of the directory file's blocks. This renders the entries in
bh_use[] array to continue to have garbage data. buffer_uptodate() on
bh_use[0] can then return a zero value upon which brelse() function is
invoked.

This commit fixes the bug by returning -ENOENT when the directory file
has no associated blocks.
Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org

9d5afec6

11 12月, 2017 7 次提交

ovl: Use PTR_ERR_OR_ZERO() · 7879cb43

由 Vasyl Gomonovych 提交于 11月 28, 2017

Fix ptr_ret.cocci warnings:
fs/overlayfs/overlayfs.h:179:11-17: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: NVasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

7879cb43

ovl: Sync upper dirty data when syncing overlayfs · e8d4bfe3

由 Chengguang Xu 提交于 11月 29, 2017

When executing filesystem sync or umount on overlayfs,
dirty data does not get synced as expected on upper filesystem.
This patch fixes sync filesystem method to keep data consistency
for overlayfs.
Signed-off-by: NChengguang Xu <cgxu@mykernel.net>
Fixes: e593b2bf ("ovl: properly implement sync_filesystem()")
Cc: <stable@vger.kernel.org> #4.11
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

e8d4bfe3

ovl: update ctx->pos on impure dir iteration · b02a16e6

由 Amir Goldstein 提交于 11月 29, 2017

This fixes a regression with readdir of impure dir in overlayfs
that is shared to VM via 9p fs.
Reported-by: NMiguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Fixes: 4edb83bb ("ovl: constant d_ino for non-merge dirs")
Cc: <stable@vger.kernel.org> #4.14
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Tested-by: NMiguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

b02a16e6

ovl: Pass ovl_get_nlink() parameters in right order · 08d8f8a5

由 Vivek Goyal 提交于 11月 27, 2017

Right now we seem to be passing index as "lowerdentry" and origin.dentry
as "upperdentry". IIUC, we should pass these parameters in reversed order
and this looks like a bug.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NAmir Goldstein <amir73il@gmail.com>
Fixes: caf70cb2 ("ovl: cleanup orphan index entries")
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

08d8f8a5

ovl: don't follow redirects if redirect_dir=off · 438c84c2

由 Miklos Szeredi 提交于 12月 11, 2017

Overlayfs is following redirects even when redirects are disabled. If this
is unintentional (probably the majority of cases) then this can be a
problem.  E.g. upper layer comes from untrusted USB drive, and attacker
crafts a redirect to enable read access to otherwise unreadable
directories.

If "redirect_dir=off", then turn off following as well as creation of
redirects.  If "redirect_dir=follow", then turn on following, but turn off
creation of redirects (which is what "redirect_dir=off" does now).

This is a backward incompatible change, so make it dependent on a config
option.
Reported-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

438c84c2

ext4: add missing error check in __ext4_new_inode() · 996fc447

由 Theodore Ts'o 提交于 12月 10, 2017

It's possible for ext4_get_acl() to return an ERR_PTR.  So we need to
add a check for this case in __ext4_new_inode().  Otherwise on an
error we can end up oops the kernel.

This was getting triggered by xfstests generic/388, which is a test
which exercises the shutdown code path.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

996fc447

hpfs: don't bother with the i_version counter or f_version · 98087c05

由 Jeff Layton 提交于 11月 23, 2017

HPFS does not set SB_I_VERSION and does not use the i_version counter
internally.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NMikulas Patocka <mikulas@twibright.com>
Reviewed-by: NMikulas Patocka <mikulas@twibright.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

98087c05

10 12月, 2017 1 次提交

VFS: Handle lazytime in do_mount() · d7ee9469

由 Markus Trippelsdorf 提交于 10月 11, 2017

Since commit e462ec50 ("VFS: Differentiate mount flags (MS_*) from
internal superblock flags") the lazytime mount option doesn't get passed
on anymore.

Fix the issue by handling the option in do_mount().
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d7ee9469

09 12月, 2017 2 次提交

xfs: make iomap_begin functions trim iomaps consistently · b7e0b6ff

由 Darrick J. Wong 提交于 12月 06, 2017

Historically, the XFS iomap_begin function only returned mappings for
exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups.
The current vfs iomap consumers are only set up to deal with trimmed
mappings.  xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is
inconsistent with the current iomap usage.  Remove the flag so that both
iomap_begin functions behave the same way.

FWIW this also fixes a behavioral regression in xattr FIEMAP that was
introduced in 4.8 wherein attr fork extents are no longer trimmed like
they used to be.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

b7e0b6ff

xfs: remove "no-allocation" reservations for file creations · f59cf5c2

由 Christoph Hellwig 提交于 12月 04, 2017

If we create a new file we will need an inode, and usually some metadata
in the parent direction. Aiming for everything to go well despite the
lack of a reservation leads to dirty transactions cancelled under a heavy
create/delete load. This patch removes those nospace transactions, which
will lead to slightly earlier ENOSPC on some workloads, but instead
prevent file system shutdowns due to cancelling dirty transactions for
others.

A customer could observe assertations failures and shutdowns due to
cancelation of dirty transactions during heavy NFS workloads as shown
below:

2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262

2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---

2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs]

2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

f59cf5c2

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功