提交 · c53cd490b1a491ebf1d8e30da97e7231459a4208 · openeuler / raspberrypi-kernel

29 7月, 2017 6 次提交

kernfs: introduce kernfs_node_id · c53cd490

由 Shaohua Li 提交于 7月 12, 2017

inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c53cd490

kernfs: don't set dentry->d_fsdata · 319ba91d

由 Shaohua Li 提交于 7月 12, 2017

When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. Looks there
is no way to do it without race condition. Look at the kernfs code
closely, there is no point to set dentry->d_fsdata. inode->i_private
already points to kernfs_node, and we can get inode from a dentry. So
this patch just delete the d_fsdata usage.
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

319ba91d

kernfs: add an API to get kernfs node from inode number · ba16b284

由 Shaohua Li 提交于 7月 12, 2017

Add an API to get kernfs node from inode number. We will need this to
implement exportfs operations.

This API will be used in blktrace too later, so it should be as fast as
possible. To make the API lock free, kernfs node is freed in RCU
context. And we depend on kernfs_node count/ino number to filter out
stale kernfs nodes.
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ba16b284

kernfs: implement i_generation · 4a3ef68a

由 Shaohua Li 提交于 7月 12, 2017

Set i_generation for kernfs inode. This is required to implement
exportfs operations. The generation is 32-bit, so it's possible the
generation wraps up and we find stale files. To reduce the posssibility,
we don't reuse inode numer immediately. When the inode number allocation
wraps, we increase generation number. In this way generation/inode
number consist of a 64-bit number which is unlikely duplicated. This
does make the idr tree more sparse and waste some memory. Since idr
manages 32-bit keys, idr uses a 6-level radix tree, each level covers 6
bits of the key. In a 100k inode kernfs, the worst case will have around
300k radix tree node. Each node is 576bytes, so the tree will use about
~150M memory. Sounds not too bad, if this really is a problem, we should
find better data structure.
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4a3ef68a

kernfs: use idr instead of ida to manage inode number · 7d35079f

由 Shaohua Li 提交于 7月 12, 2017

kernfs uses ida to manage inode number. The problem is we can't get
kernfs_node from inode number with ida. Switching to use idr, next patch
will add an API to get kernfs_node from inode number.
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7d35079f

NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter · b7dbcc0e

由 Benjamin Coddington 提交于 7月 28, 2017

nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback. However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep. The result is that the task
will sleep for the entire duration of the timeout.

Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timout() instead.

Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b7dbcc0e

27 7月, 2017 3 次提交

NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43

由 NeilBrown 提交于 7月 24, 2017

posix_fallocate() will allocate space in an NFS file by considering
the last byte of every 4K block. If it is before EOF, it will read
the byte and if it is zero, a zero is written out. If it is after EOF,
the zero is unconditionally written.

For the blocks beyond EOF, if NFS believes its cache is valid, it will
expand these writes to write full pages, and then will merge the pages.
This results if (typically) 1MB writes. If NFS believes its cache is
not valid (particularly if NFS_INO_INVALID_DATA or
NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
send the individual 1-byte writes. This results in (typically) 256 times
as many RPC requests, and can be substantially slower.

Currently nfs_revalidate_mapping() is only used when reading a file or
mmapping a file, as these are times when the content needs to be
up-to-date. Writes don't generally need the cache to be up-to-date, but
writes beyond EOF can benefit, particularly in the posix_fallocate()
case.

So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
i.e. when there is a gap between the end of the file and the start of
the write. If the cache is thought to be out of date (as happens after
taking a file lock), this will cause a GETATTR, and the two flags
mentioned above will be cleared. With this, posix_fallocate() on a
newly locked file does not generate excessive tiny writes.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6ba80d43

NFS: invalidate file size when taking a lock. · 442ce049

由 NeilBrown 提交于 7月 24, 2017

Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
for writing"), NFS would revalidate, or invalidate, the file size when
taking a lock.  Since that commit it only invalidates the file content.

If the file size is changed on the server while wait for the lock, the
client will have an incorrect understanding of the file size and could
corrupt data.  This particularly happens when writing beyond the
(supposed) end of file and can be easily be demonstrated with
posix_fallocate().

If an application opens an empty file, waits for a write lock, and then
calls posix_fallocate(), glibc will determine that the underlying
filesystem doesn't support fallocate (assuming version 4.1 or earlier)
and will write out a '0' byte at the end of each 4K page in the region
being fallocated that is after the end of the file.
NFS will (usually) detect that these writes are beyond EOF and will
expand them to cover the whole page, and then will merge the pages.
Consequently, NFS will write out large blocks of zeroes beyond where it
thought EOF was.  If EOF had moved, the pre-existing part of the file
will be over-written.  Locking should have protected against this,
but it doesn't.

This patch restores the use of nfs_zap_caches() which invalidated the
cached attributes.  When posix_fallocate() asks for the file size, the
request will go to the server and get a correct answer.

cc: stable@vger.kernel.org (v4.8+)
Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

442ce049

NFS: Use raw NFS access mask in nfs4_opendata_access() · 1e6f2095

由 Anna Schumaker 提交于 7月 25, 2017

Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
access cache") changed how the access results are stored after an
access() call.  An NFS v4 OPEN might have access bits returned with the
opendata, so we should use the NFS4_ACCESS values when determining the
return value in nfs4_opendata_access().

Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
access cache")
Reported-by: NEryu Guan <eguan@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Tested-by: NTakashi Iwai <tiwai@suse.de>

1e6f2095

26 7月, 2017 1 次提交

xfs: fix multi-AG deadlock in xfs_bunmapi · 5b094d6d

由 Christoph Hellwig 提交于 7月 18, 2017

Just like in the allocator we must avoid touching multiple AGs out of
order when freeing blocks, as freeing still locks the AGF and can cause
the same AB-BA deadlocks as in the allocation path.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NNikolay Borisov <n.borisov.lkml@gmail.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

5b094d6d

25 7月, 2017 1 次提交

xfs: check that dir block entries don't off the end of the buffer · 6215894e

由 Darrick J. Wong 提交于 7月 21, 2017

When we're checking the entries in a directory buffer, make sure that
the entry length doesn't push us off the end of the buffer.  Found via
xfs/388 writing ones to the length fields.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

6215894e

24 7月, 2017 4 次提交

xfs: fix quotacheck dquot id overflow infinite loop · cfaf2d03

由 Brian Foster 提交于 7月 24, 2017

If a dquot has an id of U32_MAX, the next lookup index increment
overflows the uint32_t back to 0. This starts the lookup sequence
over from the beginning, repeats indefinitely and results in a
livelock.

Update xfs_qm_dquot_walk() to explicitly check for the lookup
overflow and exit the loop.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

cfaf2d03

btrfs: round down size diff when shrinking/growing device · 0e4324a4

由 Nikolay Borisov 提交于 7月 21, 2017

Further testing showed that the fix introduced in 7dfb8be1 ("btrfs:
Round down values which are written for total_bytes_size") was
insufficient and it could still lead to discrepancies between the
total_bytes in the super block and the device total bytes. So this patch
also ensures that the difference between old/new sizes when
shrinking/growing is also rounded down. This ensure that we won't be
subtracting/adding a non-sectorsize multiples to the superblock/device
total sizees.

Fixes: 7dfb8be1 ("btrfs: Round down values which are written for total_bytes_size")
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0e4324a4

Btrfs: fix early ENOSPC due to delalloc · 17024ad0

由 Omar Sandoval 提交于 7月 20, 2017

If a lot of metadata is reserved for outstanding delayed allocations, we
rely on shrink_delalloc() to reclaim metadata space in order to fulfill
reservation tickets. However, shrink_delalloc() has a shortcut where if
it determines that space can be overcommitted, it will stop early. This
made sense before the ticketed enospc system, but now it means that
shrink_delalloc() will often not reclaim enough space to fulfill any
tickets, leading to an early ENOSPC. (Reservation tickets don't care
about being able to overcommit, they need every byte accounted for.)

Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
all of the metadata it is supposed to. This fixes early ENOSPCs we were
seeing when doing a btrfs receive to populate a new filesystem, as well
as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.

Fixes: 957780eb ("Btrfs: introduce ticketed enospc infrastructure")
Tested-by: NChristoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cc: stable@vger.kernel.org
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

17024ad0

btrfs: fix lockup in find_free_extent with read-only block groups · 14443937

由 Jeff Mahoney 提交于 7月 19, 2017

If we have a block group that is all of the following:
1) uncached in memory
2) is read-only
3) has a disk cache state that indicates we need to recreate the cache

AND the file system has enough free space fragmentation such that the
request for an extent of a given size can't be honored;

AND have a single CPU core;

AND it's the block group with the highest starting offset such that
there are no opportunities (like reading from disk) for the loop to
yield the CPU;

We can end up with a lockup.

The root cause is simple.  Once we're in the position that we've read in
all of the other block groups directly and none of those block groups
can honor the request, there are no more opportunities to sleep.  We end
up trying to start a caching thread which never gets run if we only have
one core.  This *should* present as a hung task waiting on the caching
thread to make some progress, but it doesn't.  Instead, it degrades into
a busy loop because of the placement of the read-only check.

During the first pass through the loop, block_group->cached will be set
to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
read-only check and short circuit the loop.  We're not yet in
LOOP_CACHING_WAIT, so we skip that loop back before going through the
loop again for other raid groups.

Then we move to LOOP_CACHING_WAIT state.

During the this pass through the loop, ->cached will still be
BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
cache_block_group, do a lot of nothing, and return, and also set
have_caching_bg again.  Then we hit the read-only check and short circuit
the loop.  The same thing happens as before except now we DO trigger
the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
top.  We do this forever.

There are two fixes in this patch since they address the same underlying
bug.

The first is to add a cond_resched to the end of the loop to ensure
that the caching thread always has an opportunity to run.  This will
fix the soft lockup issue, but find_free_extent will still loop doing
nothing until the thread has completed.

The second is to move the read-only check to the top of the loop.  We're
never going to return an allocation within a read-only block group so
we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
would cause the same problem except that BTRFS_CACHE_ERROR is considered
a "done" state and we won't re-set have_caching_bg again.

Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
the testing process.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

14443937

22 7月, 2017 1 次提交

NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid() · 1ebf9801

由 Trond Myklebust 提交于 7月 20, 2017

We must set fl->dsaddr once, and once only, even if there are multiple
processes calling filelayout_check_deviceid() for the same layout
segment.
Reported-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1ebf9801

21 7月, 2017 8 次提交

NFS: Be more careful about mapping file permissions · ecbb903c

由 Trond Myklebust 提交于 7月 11, 2017

When mapping a directory, we want the MAY_WRITE permissions to reflect
whether or not we have permission to modify, add and delete the directory
entries. MAY_EXEC must map to lookup permissions.

On the other hand, for files, we want MAY_WRITE to reflect a permission
to modify and extend the file.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ecbb903c

NFS: Store the raw NFS access mask in the inode's access cache · bd8b2441

由 Trond Myklebust 提交于 7月 11, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bd8b2441

NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask() · eda3e208

由 Trond Myklebust 提交于 7月 11, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eda3e208

NFS: Refactor NFS access to kernel access mask calculation · 15d4b73a

由 Trond Myklebust 提交于 7月 11, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

15d4b73a

nfs: count correct array for mnt3_counts array size · ecc7b435

由 Eryu Guan 提交于 7月 18, 2017

Array size of mnt3_counts should be the size of array
mnt3_procedures, not mnt_procedures, though they're same in size
right now. Found this by code inspection.

Fixes: 1c5876dd ("sunrpc: move p_count out of struct rpc_procinfo")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NEryu Guan <eguan@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ecc7b435

xfs: check _alloc_read_agf buffer pointer before using · 10479e2d

由 Darrick J. Wong 提交于 7月 17, 2017

In some circumstances, _alloc_read_agf can return an error code of zero
but also a null AGF buffer pointer.  Check for this and jump out.

Fixes-coverity-id: 1415250
Fixes-coverity-id: 1415320
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

10479e2d

xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write · 4c1a67bd

由 Darrick J. Wong 提交于 7月 17, 2017

We must initialize the firstfsb parameter to _bmapi_write so that it
doesn't incorrectly treat stack garbage as a restriction on which AGs
it can search for free space.

Fixes-coverity-id: 1402025
Fixes-coverity-id: 1415167
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

4c1a67bd

xfs: check _btree_check_block value · 1e86eabe

由 Darrick J. Wong 提交于 7月 17, 2017

Check the _btree_check_block return value for the firstrec and lastrec
functions, since we have the ability to signal that the repositioning
did not succeed.

Fixes-coverity-id: 114067
Fixes-coverity-id: 114068
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

1e86eabe

20 7月, 2017 9 次提交

ovl: check for bad and whiteout index on lookup · 0e082555

由 Amir Goldstein 提交于 7月 18, 2017

Index should always be of the same file type as origin, except for
the case of a whiteout index.  A whiteout index should only exist
if all lower aliases have been unlinked, which means that finding
a lower origin on lookup whose index is a whiteout should be treated
as a lookup error.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

0e082555

ovl: do not cleanup directory and whiteout index entries · 61b67471

由 Amir Goldstein 提交于 7月 18, 2017

Directory index entries are going to be used for looking up
redirected upper dirs by lower dir fh when decoding an overlay
file handle of a merge dir.

Whiteout index entries are going to be used as an indication that
an exported overlay file handle should be treated as stale (i.e.
after unlink of the overlay inode).

We don't know the verification rules for directory and whiteout
index entries, because they have not been implemented yet, so fail
to mount overlay rw if those entries are found to avoid corrupting
an index that was created by a newer kernel.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

61b67471

ovl: fix xattr get and set with selinux · 1d88f183

由 Miklos Szeredi 提交于 7月 20, 2017

inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr
to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which
depends on dentry->d_inode, but d_inode is null and not initialized yet at
this point resulting in an Oops.

Fix by getting the upperdentry info from the inode directly in this case.
Reported-by: NEryu Guan <eguan@redhat.com>
Fixes: 09d8b586 ("ovl: move __upperdentry to ovl_inode")
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

1d88f183

Revert commit ("pNFS: Don't send COMMITs to the DSes if...") · 21329736

由 Trond Myklebust 提交于 7月 12, 2017

Doing the test without taking any locks is racy, and so really it makes
more sense to do it in the flexfiles code (which is the only case that
cares).
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

21329736

pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit() · 4b75053e

由 Trond Myklebust 提交于 7月 12, 2017

If the layout has expired due to a fencing event, then we should not
attempt to commit to the DS.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4b75053e

NFS: Fix another COMMIT race in pNFS · 41181886

由 Trond Myklebust 提交于 7月 12, 2017

We must make sure that cinfo->ds->ncommitting is in sync with the
commit list, since it is checked as part of pnfs_commit_list().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

41181886

NFS: Fix a COMMIT race in pNFS · e39928f9

由 Trond Myklebust 提交于 7月 12, 2017

We must make sure that cinfo->ds->nwritten is in sync with the
commit list, since it is checked as part of pnfs_scan_commit_lists().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e39928f9

mount: copy the port field into the cloned nfs_server structure. · 89a6814d

由 Steve Dickson 提交于 6月 29, 2017

Doing this copy eliminates the "port=0" entry in
the /proc/mounts entries

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=69241Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

89a6814d

Btrfs: fix dir item validation when replaying xattr deletes · e33bf723

由 Filipe Manana 提交于 7月 18, 2017

We were passing an incorrect slot number to the function that validates
directory items when we are replaying xattr deletes from a log tree. The
correct slot is stored at variable 'i' and not at 'path->slots[0]', so
the call to the validation function was only correct for the first
iteration of the loop, when 'i == path->slots[0]'.
After this fix, the fstest generic/066 passes again.

Fixes: 8ee8c2d6 ("btrfs: Verify dir_item in replay_xattr_deletes")
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e33bf723

19 7月, 2017 3 次提交

jfs: preserve i_mode if __jfs_set_acl() fails · f070e5ac

由 Ernesto A. Fernández 提交于 7月 12, 2017

When changing a file's acl mask, __jfs_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.
Signed-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>

f070e5ac

jfs: Don't clear SGID when inheriting ACLs · 9bcf66c7

由 Jan Kara 提交于 6月 22, 2017

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__jfs_set_acl() into jfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 07393101
CC: stable@vger.kernel.org
CC: jfs-discussion@lists.sourceforge.net
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>

9bcf66c7

hfsplus: Don't clear SGID when inheriting ACLs · 84969465

由 Jan Kara 提交于 6月 21, 2017

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by creating __hfsplus_set_posix_acl() function that does
not call posix_acl_update_mode() and use it when inheriting ACLs. That
prevents SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 07393101
CC: stable@vger.kernel.org
Signed-off-by: NJan Kara <jack@suse.cz>

84969465

18 7月, 2017 4 次提交

isofs: Fix off-by-one in 'session' mount option parsing · 34363c05

由 Jan Kara 提交于 7月 18, 2017

According to ECMA-130 standard maximum valid track number is 99. Since
'session' mount option starts indexing at 0 (and we add 1 to the passed
number), we should refuse value 99. Also the condition in
isofs_get_last_session() unnecessarily repeats the check - remove it.
Reported-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>

34363c05

reiserfs: preserve i_mode if __reiserfs_set_acl() fails · fcea8aed

由 Ernesto A. Fernández 提交于 7月 17, 2017

When changing a file's acl mask, reiserfs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.
Signed-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

fcea8aed

ext2: preserve i_mode if ext2_set_acl() fails · fe26569e

由 Ernesto A. Fernández 提交于 7月 12, 2017

When changing a file's acl mask, ext2_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

[JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"]
Signed-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

fe26569e

f2fs: avoid cpu lockup · 4db08d01

由 Jaegeuk Kim 提交于 7月 14, 2017

Before retrying to flush data or dentry pages, we need to release cpu in order
to prevent watchdog.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

4db08d01