- 12 Jan 2016, 2 commits
Committed by Dave Chinner

When we do dquot readahead in log recovery, we do not use a verifier as the underlying buffer may not have dquots in it, e.g. the allocation operation hasn't yet been replayed. Hence we do not want to fail recovery because we detect an operation to be replayed has not been run yet. This problem was addressed for inodes in commit d8914002 ("xfs: inode buffers may not be valid during recovery readahead") but the problem was not recognised to exist for dquots and their buffers as the dquot readahead did not have a verifier.

The result of not using a verifier is that when the buffer is then next read to replay a dquot modification, the dquot buffer verifier will only be attached to the buffer if *readahead is not complete*. Hence we can read the buffer, replay the dquot changes and then add it to the delwri submission list without it having a verifier attached to it. This then generates warnings in xfs_buf_ioapply(), which catches and warns about this case.

Fix this and make it handle the same readahead verifier error cases as for inode buffers by adding a new readahead verifier that has a write operation as well as a read operation that marks the buffer as not done if any corruption is detected. Also make sure we don't run readahead if the dquot buffer has been marked as cancelled by recovery.

This will result in readahead either succeeding and the buffer having a valid write verifier, or readahead failing and the buffer state requiring the subsequent read to resubmit the IO with the new verifier. In either case, this will result in the buffer always ending up with a valid write verifier on it.

Note: we also need to fix the inode buffer readahead error handling to mark the buffer with EIO. Brian noticed during review that the code I copied from there was wrong, so fix it at the same time. Add comments linking the two functions that handle readahead verifier errors together so we don't forget this behavioural link in future.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
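The readahead behaviour described above can be modelled in a few lines. This is a simplified user-space sketch, not XFS code. `struct buf`, `BUF_DONE`, `ra_read_verify()`, `readahead()` and `read_buf()` are hypothetical stand-ins for `struct xfs_buf`, `XBF_DONE` and the real verifier machinery. The `contents_valid` flag stands in for "the dquot checks pass".

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Hypothetical, heavily simplified buffer model (not struct xfs_buf). */
#define BUF_DONE 0x1u

struct buf {
	unsigned int flags;
	int error;
	bool contents_valid;      /* stands in for "dquot checks pass" */
	const char *verifier;     /* verifier attached for later writeback */
};

/*
 * Readahead read verifier: on corruption, do not fail recovery.
 * Clear DONE and record -EIO so that the next read resubmits the I/O.
 */
static void ra_read_verify(struct buf *bp)
{
	if (!bp->contents_valid) {
		bp->flags &= ~BUF_DONE;
		bp->error = -EIO;
	}
}

/* Readahead attaches the RA verifier, which also has a write op. */
static void readahead(struct buf *bp, bool cancelled)
{
	if (cancelled)
		return;               /* no readahead for cancelled buffers */
	bp->verifier = "dquot_ra";
	bp->flags |= BUF_DONE;
	ra_read_verify(bp);
}

/*
 * A later read reissues the I/O when readahead did not leave DONE set.
 * The real code would rerun the full dquot verifier here; this sketch
 * just attaches it and clears the stale error.
 */
static void read_buf(struct buf *bp)
{
	if (!(bp->flags & BUF_DONE)) {
		bp->error = 0;
		bp->verifier = "dquot";
		bp->flags |= BUF_DONE;
	}
}
```

Either path leaves a verifier attached once `read_buf()` returns, which is the invariant the commit aims for.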
-
Committed by Dave Chinner

When we do inode readahead in log recovery, we can do the readahead before we've replayed the icreate transaction that stamps the buffer with inode cores. The inode readahead verifier catches this and marks the buffer as !done to indicate that it doesn't yet contain valid inodes.

In adding buffer error notification (i.e. setting b_error = -EIO at the same time as we clear the done flag) to such a readahead verifier failure, we can then get subsequent inode recovery failing with this error:

    XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32

This occurs when readahead completion races with icreate item replay such as:

    inode readahead
      find buffer
      lock buffer
      submit RA io
    ....
                                  icreate recovery
                                    xfs_trans_get_buffer
                                    find buffer
                                    lock buffer
                                    <blocks on RA completion>
    .....
    <ra completion>
      fails verifier
      clear XBF_DONE
      set bp->b_error = -EIO
      release and unlock buffer
                                    <icreate gains lock>
                                    icreate initialises buffer
                                    marks buffer as done
                                    adds buffer to delayed write queue
                                    releases buffer

At this point, we have an initialised inode buffer that is up to date but has an -EIO state registered against it. When we finally get to recovering an inode in that buffer:

    inode item recovery
      xfs_trans_read_buffer
      find buffer
      lock buffer
      sees XBF_DONE is set, returns buffer
      sees bp->b_error is set
      fail log recovery!

Essentially, we need xfs_trans_get_buf_map() to clear the error status of the buffer when doing a lookup. This function returns uninitialised buffers, so the buffer returned can not be in an error state and none of the code that uses this function expects b_error to be set on return. Indeed, there is an ASSERT(!bp->b_error); in the transaction case in xfs_trans_get_buf_map() that would have caught this if log recovery used transactions....

This patch firstly changes the inode readahead failure to set -EIO on the buffer, and secondly changes xfs_buf_get_map() to never return a buffer with an error state set so this first change doesn't cause unexpected log recovery failures.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
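The second half of this fix, a "get" lookup that never hands back a stale error, reduces to clearing the error field on lookup. The sketch below is hypothetical: `struct cbuf`, `cache[]` and `get_buf()` are illustrative names, not the `xfs_buf_get_map()` implementation. The reasoning is that a "get" caller initialises the buffer contents itself, so an error left behind by an earlier readahead no longer describes anything meaningful.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical minimal cached-buffer entry; not struct xfs_buf. */
struct cbuf {
	unsigned long blkno;
	int error;      /* may hold a stale -EIO from a failed readahead */
	int done;
};

#define NBUFS 4
static struct cbuf cache[NBUFS];

/*
 * Look up a cached buffer for a caller that will initialise it.
 * Clear any stale error so callers never see b_error-style state.
 */
static struct cbuf *get_buf(unsigned long blkno)
{
	for (size_t i = 0; i < NBUFS; i++) {
		if (cache[i].blkno == blkno) {
			cache[i].error = 0;
			return &cache[i];
		}
	}
	return NULL;
}
```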
-
- 11 Jan 2016, 1 commit
Committed by Eric Sandeen

Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the associated comments were replicated several times across the attribute code, all dealing with what to do if the transaction was or wasn't committed.

And in that replicated code, an ASSERT() test of an uninitialized variable occurs in several locations:

    error = xfs_attr_thing(&args);
    if (!error) {
        error = xfs_bmap_finish(&args.trans, args.flist,
                                &committed);
    }
    if (error) {
        ASSERT(committed);

If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish, never set "committed", and then test it in the ASSERT.

Fix this up by moving the committed state internal to xfs_bmap_finish, and add a new inode argument. If an inode is passed in, it is passed through to __xfs_trans_roll() and joined to the transaction there if the transaction was committed.

xfs_qm_dqalloc() was a little unique in that it called bjoin rather than ijoin, but as Dave points out we can detect the committed state by checking whether (*tpp != tp).

Addresses-Coverity-Id: 102360
Addresses-Coverity-Id: 102361
Addresses-Coverity-Id: 102363
Addresses-Coverity-Id: 102364
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 08 Jan 2016, 2 commits
Committed by Dave Chinner

For large sparse or fragmented files, checking every single entry in the bmapbt on every operation is prohibitively expensive, especially as such checks rarely discover problems during normal operations on high extent count files. Our regression tests don't tend to exercise files with hundreds of thousands to millions of extents, so mostly this isn't noticed.

However, trying to run things like xfs_mdrestore of large filesystem dumps on a debug kernel quickly becomes impossible as the CPU is completely burnt up repeatedly walking the sparse file bmapbt that is generated for every allocation that is made.

Hence, if the file has more than 10,000 extents, just don't bother with walking the tree to check it exhaustively. The btree code has checks that ensure that the newly inserted/removed/modified record is correctly ordered, so the entire tree walk in these cases has limited additional value.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Dave Chinner

This allows us to see page cache driven readahead in action as it passes through XFS. This helps to understand buffered read throughput problems such as readahead I/O sizes being too small for the underlying device to reach max throughput.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 05 Jan 2016, 2 commits
Committed by Brian Foster

XFS now uses CRC verification over a limited section of the log to detect torn writes prior to a crash. This is difficult to test directly due to the timing and hardware requirements to cause a short write. Add a mechanism to inject CRC errors into log records to facilitate testing torn write detection during log recovery.

This mechanism is dangerous and can result in filesystem corruption. Thus, it is only available in DEBUG mode for testing/development purposes. Write a non-zero value to the following sysfs entry to enable error injection:

    /sys/fs/xfs/<dev>/log/log_badcrc_factor

Once enabled, XFS intentionally writes an invalid CRC to a log record at some random point in the future based on the provided frequency. The filesystem immediately shuts down once the record has been written to the physical log to prevent metadata writeback (e.g., AIL insertion) once the log write completes. This helps reasonably simulate a torn write to the log as the affected record must be safe to discard. The next mount after the intentional shutdown requires log recovery and should detect and recover from the torn write.

Note again that this _will_ result in data loss or worse. For testing and development purposes only!

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
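The frequency-based injection described above can be sketched as follows, assuming that a factor of N means "about one record in N" and that 0 disables injection. The helper names, the use of `rand()`, and the bit-flip used to corrupt the CRC are assumptions for illustration. They are not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical: decide whether to corrupt this record's CRC.
 * factor == 0 disables injection; otherwise roughly 1 in "factor".
 */
static bool inject_bad_crc(unsigned int factor)
{
	if (factor == 0)
		return false;
	return (unsigned int)rand() % factor == 0;
}

/*
 * Return the CRC to write. When injecting, flip all bits so the stored
 * CRC cannot match, and request a shutdown once the record is on disk
 * so nothing in it is written back.
 */
static unsigned int crc_to_write(unsigned int crc, unsigned int factor,
				 bool *shutdown_after_write)
{
	if (inject_bad_crc(factor)) {
		*shutdown_after_write = true;
		return crc ^ 0xffffffffu;
	}
	*shutdown_after_write = false;
	return crc;
}
```

The shutdown step matters: it keeps the corrupted record safe to discard, which is what makes it a fair simulation of a torn write.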
-
Committed by Brian Foster

Certain types of storage, such as persistent memory, do not provide sector atomicity for writes. This means that if a crash occurs while XFS is writing log records, only part of those records might make it to the storage. This is problematic because log recovery uses the cycle value packed at the top of each log block to locate the head/tail of the log. This can lead to CRC verification failures during log recovery and an unmountable fs for a filesystem that is otherwise consistent.

Update log recovery to incorporate log record CRC verification as part of the head/tail discovery process. Once the head is located via the traditional algorithm, run a CRC-only pass over the records up to the head of the log. If CRC verification fails, assume that the records are torn as a matter of policy and trim the head block back to the start of the first bad record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
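The trimming policy above can be sketched over an in-memory array of records. The record layout and `trim_torn_head()` are hypothetical. The real code verifies on-disk log record headers and data, and only near the head. The checksum is a standard bitwise CRC32C (reflected polynomial 0x82F63B78), which is the CRC family XFS uses.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xffffffffu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82f63b78u & -(crc & 1u));
	}
	return ~crc;
}

/* Hypothetical in-memory log record: start block, stored CRC, payload. */
struct rec {
	unsigned int blk;
	uint32_t crc;
	char payload[16];
};

/*
 * Walk records (oldest first) up to the head found by the cycle search.
 * At the first CRC mismatch, treat that record and everything after it
 * as torn and return its start block as the new head.
 */
static unsigned int trim_torn_head(const struct rec *recs, size_t n,
				   unsigned int head_blk)
{
	for (size_t i = 0; i < n; i++) {
		if (crc32c(recs[i].payload, sizeof(recs[i].payload)) != recs[i].crc)
			return recs[i].blk;
	}
	return head_blk;
}
```

Trimming at the first bad record, rather than skipping only that record, follows the "assume torn as a matter of policy" rule. Nothing after a torn record can be trusted.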
-
- 04 Jan 2016, 19 commits
Committed by Dave Chinner

Doing a splice read (generic/249) generates a lockdep splat because we recursively lock the inode iolock in this path:

    SyS_sendfile64
    do_sendfile
    do_splice_direct
    splice_direct_to_actor
    do_splice_to
    xfs_file_splice_read        <<<<<< lock here
    default_file_splice_read
    vfs_readv
    do_readv_writev
    do_iter_readv_writev
    xfs_file_read_iter          <<<<<< then here

The issue here is that for DAX inodes we need to avoid the page cache path and hence simply push it into the normal read path. Unfortunately, we can't tell down at xfs_file_read_iter() whether we are being called from the splice path and hence we cannot avoid the locking at this layer. Hence we simply have to drop the inode locking at the higher splice layer for DAX.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Dave Chinner

Commit 1ca19157 ("xfs: Don't use unwritten extents for DAX") enabled the DAX allocation call to dip into the reserve pool in case it was converting unwritten extents rather than allocating blocks. This was a direct copy of the unwritten extent conversion code, but had an unintended side effect of allowing normal data block allocation to use the reserve pool. Hence normal block allocation could deplete the reserve pool and prevent unwritten extent conversion at ENOSPC, hence violating fallocate guarantees on preallocated space.

Fix it by checking whether the incoming map from __xfs_get_blocks() spans an unwritten extent and only use the reserve pool if the allocation covers an unwritten extent.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Markus Elfring

The return type "unsigned long" was used by the suffix_kstrtoint() function even though it may return a negative error code. Improve this implementation detail by using the type "int" instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
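The reason the return type matters: a negative errno returned through `unsigned long` becomes a huge positive value, so a caller's `< 0` check never fires. Below is a user-space analogue of a suffix-aware parser that returns `int`. `parse_size_suffix()` is a hypothetical name, and the k/m/g handling is an assumption for illustration, not a copy of the kernel helper.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal value with an optional k/m/g suffix (powers of 1024).
 * Returns 0 on success or a negative errno; int keeps the sign intact.
 * This sketch rejects negative values.
 */
static int parse_size_suffix(const char *s, int *res)
{
	char *end;
	long val;
	int shift = 0;

	errno = 0;
	val = strtol(s, &end, 10);
	if (end == s)
		return -EINVAL;
	if (errno == ERANGE)
		return -ERANGE;

	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	default: break;
	}
	if (*end != '\0')
		return -EINVAL;
	if (val < 0 || val > (INT_MAX >> shift))
		return -ERANGE;

	*res = (int)(val << shift);
	return 0;
}
```

For example, `parse_size_suffix("4k", &v)` stores 4096 in `v` and returns 0.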
-
Committed by Darrick J. Wong

Create xfs_btree_sblock_verify() to verify short-format btree blocks (i.e. the per-AG btrees with 32-bit block pointers) instead of open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Darrick J. Wong

Because struct xfs_agfl is 36 bytes long and has a 64-bit integer inside it, gcc will quietly round the structure size up to the nearest 64 bits -- in this case, 40 bytes. This results in the XFS_AGFL_SIZE macro returning incorrect results for v5 filesystems on 64-bit machines (118 items instead of 119). As a result, a 32-bit xfs_repair will see garbage in AGFL item 119 and complain. Therefore, tell gcc not to pad the structure so that the AGFL size calculation is correct.

cc: <stable@vger.kernel.org> # 3.10 - 4.4
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
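The padding effect can be reproduced with a mock of the header layout described above: 4 + 4 + 16 + 8 + 4 = 36 bytes. Native fixed-width types replace the on-disk big-endian types, and the struct and macro names are hypothetical. With GCC on x86-64, `sizeof(struct agfl_padded)` is 40 because the `uint64_t` member forces 8-byte alignment. On i386, a `uint64_t` inside a struct is only 4-byte aligned, so the size stays 36. That difference is why 32-bit and 64-bit tools disagree.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock AGFL header, default layout: tail padding rounds size to 8. */
struct agfl_padded {
	uint32_t magicnum;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;
	uint32_t crc;
	uint32_t bno[];          /* free-list entries follow the header */
};

/* Same layout with padding suppressed, as the fix does. */
struct agfl_packed {
	uint32_t magicnum;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;
	uint32_t crc;
	uint32_t bno[];
} __attribute__((packed));

/* 4-byte free-list entries that fit in a sector after the header. */
#define AGFL_ENTRIES(sectsize, hdrsize) \
	(((sectsize) - (hdrsize)) / sizeof(uint32_t))
```

With a 512-byte sector, `AGFL_ENTRIES(512, 40)` gives 118 and `AGFL_ENTRIES(512, 36)` gives 119, matching the counts in the commit message.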
-
Committed by Darrick J. Wong

Use a convenience variable instead of open-coding the inode fork. This isn't really needed for now, but will become important when we add the copy-on-write fork later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Darrick J. Wong

Update the log ticket reservation type printing code to reflect all the types of log tickets, to avoid incorrect debug output and avoid running off the end of the array.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
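The "running off the end of the array" problem is the usual unchecked index into a name table. The sketch below shows a bounds-checked lookup. The table contents, its 0-based indexing and `res_type_name()` are illustrative assumptions. The real XFS table is longer and may be indexed differently.

```c
#include <stddef.h>

/* Hypothetical subset of region-type names, for illustration only. */
static const char *const res_type_names[] = {
	"bformat",
	"bchunk",
	"efi_format",
	"efd_format",
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Map a type to a printable name without indexing past the table. */
static const char *res_type_name(unsigned int type)
{
	if (type >= ARRAY_SIZE(res_type_names))
		return "unknown";
	return res_type_names[type];
}
```

Keeping the table complete fixes the incorrect output, and the bounds check turns any future mismatch into "unknown" instead of a wild read.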
-
Committed by Darrick J. Wong

Since xfs_repair wants to use xfs_alloc_fix_freelist, remove the static designation. xfsprogs already has this; this simply brings the kernel up to date.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Alexander Kuleshov

There are no callers of the xfs_buf_ioend_async() function outside fs/xfs/xfs_buf.c, so make it static.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Masatake YAMATO

Linux's quota subsystem has an ability to handle project quota. This commit just utilizes the ability from the xfs side. dbus-monitor and quota_nld shipped as part of quota-tools can be used for testing. See the patch posting on the XFS list for details on testing.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Eric Sandeen

In my earlier commit c29aad41 ("xfs: pass mp to XFS_WANT_CORRUPTED_GOTO") I added some local mp variables with code which indicates that mp might be NULL. Coverity doesn't like this now, because the updated per-fs XFS_STATS macros dereference mp.

I don't think this is actually a problem; from what I can tell, we cannot get to these functions with a null bma->tp, so my NULL check was probably pointless. Still, it's not super obvious.

So switch this code to get mp from the inode on the xfs_bmalloca structure, with no conditional, because the functions are already using bma->ip directly.

Addresses-Coverity-Id: 1339552
Addresses-Coverity-Id: 1339553
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Eric Sandeen

This adds a name to each buf_ops structure, so that if a verifier fails we can print the type of verifier that failed it. Should be a slight debugging aid, I hope.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Jia He

If there is any non-zero bit in a long bitmap, the function can jump out of the loop and return as soon as possible.

Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
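The early exit described above looks like this in a generic "is the bitmap empty?" helper. `bitmap_empty()` is a hypothetical name for illustration, not necessarily the function the patch touched. Instead of combining every word before testing, it returns at the first non-zero word.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if all words are zero; stop at the first set bit. */
static bool bitmap_empty(const unsigned int *map, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++) {
		if (map[i] != 0)
			return false;   /* no need to scan the remaining words */
	}
	return true;
}
```

The worst case (an empty bitmap) is unchanged. The saving comes when a set bit appears early in a long map.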
-
Committed by Brian Foster

As part of the head/tail discovery process, log recovery locates the head block and then reverse seeks to find the start of the last active record in the log. This is non-trivial as the record itself could have wrapped around the end of the physical log. Log recovery torn write detection potentially needs to walk further behind the last record in the log, as multiple log I/Os can be in-flight at one time during a crash event.

Therefore, refactor the reverse log record header search mechanism into a new helper that supports the ability to seek past an arbitrary number of log records (or until the tail is hit). Update the head/tail search mechanism to call the new helper, but otherwise there is no change in log recovery behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
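The reverse seek with wraparound can be sketched on a toy circular log. Here `hdr[b]` marks blocks that start a record header. `seek_logrec_hdr()` and its parameters are hypothetical and simplify the real helper, which reads log blocks from disk and checks header magic numbers.

```c
#include <stdbool.h>

/*
 * Walk backwards from head_blk (exclusive) through a circular log of
 * log_size blocks, wrapping from block 0 to the last block. Stop after
 * finding "count" record headers or on reaching tail_blk. Returns the
 * number of headers found; *rhead_blk holds the last one found.
 */
static int seek_logrec_hdr(const bool *hdr, int log_size, int head_blk,
			   int tail_blk, int count, int *rhead_blk)
{
	int found = 0;
	int blk = head_blk;

	for (int i = 0; i < log_size && found < count; i++) {
		blk = (blk == 0) ? log_size - 1 : blk - 1;   /* wrap */
		if (hdr[blk]) {
			*rhead_blk = blk;
			found++;
		}
		if (blk == tail_blk)
			break;          /* never walk past the tail */
	}
	return found;
}
```

For example, with headers at blocks 1, 3 and 6 of an 8-block log, head 2 and tail 5, asking for two records returns 2 and sets `*rhead_blk` to 6. Block 3 is never reached because it lies outside the active region.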
-
Committed by Brian Foster

Log recovery torn write detection uses CRC verification over a range of the active log to identify torn writes. Since the generic log recovery pass code implements a superset of the functionality required for CRC verification, it can be easily modified to support a CRC verification only pass.

Create a new CRC pass type and update the log record processing helper to skip everything beyond CRC verification when in this mode. This pass will be invoked in subsequent patches to implement torn write detection.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
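How a CRC-only pass type short-circuits record processing can be sketched with an enum and a single per-record helper. The enum values, `struct rec_result` and `process_record()` are hypothetical names. The sketch only shows the control flow the commit describes: every pass checks the CRC first, and the CRC pass stops there.

```c
#include <stdbool.h>

/* Hypothetical pass types; PASS_CRC is the new verification-only pass. */
enum recovery_pass { PASS_CRC, PASS_1, PASS_2 };

struct rec_result {
	bool crc_ok;
	bool processed;   /* record unpacked and items handled */
};

/* Verify the CRC, then do pass-specific work only for non-CRC passes. */
static struct rec_result process_record(enum recovery_pass pass,
					bool crc_matches)
{
	struct rec_result r = { crc_matches, false };

	if (!crc_matches || pass == PASS_CRC)
		return r;
	r.processed = true;   /* unpack + pass 1/2 item handling goes here */
	return r;
}
```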
-
Committed by Brian Foster

Each log recovery pass walks from the tail block to the head block and processes records appropriately based on the associated log pass type. There are various failure conditions that can occur through this sequence, such as I/O errors, CRC errors, etc. Log torn write detection will perform CRC verification near the head of the log to detect torn writes and trim torn records from the log appropriately.

As it is, xlog_do_recovery_pass() only returns an error code in the event of CRC failure, which isn't enough information to trim the head of the log. Update xlog_do_recovery_pass() to optionally return the start block of the associated record when an error occurs. This patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Brian Foster

Log record CRC verification currently occurs during active log recovery, immediately before a log record is unpacked. Therefore, the CRC calculation code is buried within the data unpack function. CRC verification pass support only needs to go so far as to check the CRC, but this is not easily allowed as the code is currently organized.

Since we now have a new log record processing helper, pull the record CRC verification code out from the unpack helper and open-code it at the top of the new process helper. This facilitates the ability to modify how records are processed based on the type of the current pass. This patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Brian Foster

xlog_do_recovery_pass() duplicates a couple of function calls related to processing log records because the function must handle wrapping around the end of the log if the head is behind the tail. This is implemented as separate loops.

CRC verification pass support will modify how records are processed in both of these loops. Rather than continue to duplicate code, factor the calls that process a log record into a new helper and call that helper from both loops. This patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Brian Foster

XFS log records have separate fields for the record size and the iclog size used to write the record. mkfs.xfs zeroes the log and writes an unmount record to generate a clean log for the subsequent mount. The userspace record logging code has a bug where the iclog size (h_size) field of the log record is hardcoded to 32k, even if a log stripe unit is specified. The log record length is correctly extended to the stripe unit. Since the kernel log recovery code uses the h_size field to determine the log buffer size, this means that the kernel can attempt to read/process records larger than the buffer size and overrun the buffer.

This has historically not been a problem because the kernel doesn't actually run through log recovery in the clean unmount case. Instead, the kernel detects that a single unmount record exists between the head and tail and pushes the tail forward such that the log is viewed as clean (head == tail). Once CRC verification is enabled, however, all records at the head of the log are verified for CRC errors and thus we are susceptible to overrun problems if the iclog field is not correct.

While the core problem must be fixed in userspace, this is historical behavior that must be detected in the kernel to avoid severe side effects such as memory corruption and crashes. Update the log buffer size calculation code to detect this condition, warn the user and resize the log buffer based on the log stripe unit. Return a corruption error in cases where this does not look like a clean filesystem (i.e., the log record header indicates more than one operation).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
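The detection logic above can be sketched as a pure function of the header fields. `log_rec_bufsize()` and its parameters are hypothetical, and the warning text is illustrative. The sketch maps the corruption error to `EUCLEAN` when available, as XFS does for `EFSCORRUPTED`.

```c
#include <errno.h>
#include <stdio.h>

#ifdef EUCLEAN
#define EFSCORRUPTED EUCLEAN   /* XFS uses EUCLEAN for EFSCORRUPTED */
#else
#define EFSCORRUPTED EIO       /* fallback for non-Linux builds */
#endif

/*
 * Pick the buffer size for reading log records. h_size: iclog size from
 * the header; h_len: record length; logbsize: log stripe unit size.
 * Returns the size to use, or a negative error.
 */
static int log_rec_bufsize(int h_size, int h_len, int num_logops,
			   int logbsize)
{
	if (h_len <= h_size)
		return h_size;
	/* Known mkfs bug: clean log, single unmount record, h_size = 32k. */
	if (h_len <= logbsize && num_logops == 1) {
		fprintf(stderr,
			"invalid iclog size (%d bytes), using lsunit (%d bytes)\n",
			h_size, logbsize);
		return logbsize;
	}
	return -EFSCORRUPTED;   /* not a clean log: refuse to guess */
}
```

Requiring exactly one operation limits the workaround to the clean-unmount case produced by mkfs. Anything else is treated as corruption rather than silently resized.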
-
- 30 Dec 2015, 3 commits
Committed by xuejiufei

We have found a BUG on res->migration_pending when migrating lock resources. The situation is as follows.

    dlm_mark_lockres_migrating
      res->migration_pending = 1;
      __dlm_lockres_reserve_ast
      dlm_lockres_release_ast returns with res->migration_pending
        remaining because other threads reserve asts
      wait dlm_migration_can_proceed returns 1
                        >>>>>>> o2hb found that target goes down
                                and removes target from domain_map
      dlm_migration_can_proceed returns 1
      dlm_mark_lockres_migrating returns -ESHUTDOWN with
        res->migration_pending still remaining.

When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON with res->migration_pending. So clear migration_pending when the target is down.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Junxiao Bi

Commit 4f656367 ("Move locks API users to locks_lock_inode_wait()") moved the flock/posix lock identification code to locks_lock_inode_wait(), but failed to set fl_flags to FL_FLOCK, which caused the following kernel panic on 4.4.0_rc5.

    kernel BUG at fs/locks.c:1895!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
    CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G O 4.4.0-rc5-next-20151217 #1
    Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
    task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
    RIP: locks_lock_inode_wait+0x2e/0x160
    Call Trace:
     ocfs2_do_flock+0x91/0x160 [ocfs2]
     ocfs2_flock+0x76/0xd0 [ocfs2]
     SyS_flock+0x10f/0x1a0
     entry_SYSCALL_64_fastpath+0x12/0x71
    Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
    RIP locks_lock_inode_wait+0x2e/0x160
     RSP <ffff880028b5bce8>
    ---[ end trace dfca74ec9b5b274c ]---

Fixes: 4f656367 ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joseph Qi

When resizing, ocfs2 first extends the last group descriptor (gd). If a backup superblock belongs in that gd, it calculates the new backup super and updates the corresponding values. But it currently doesn't consider the case where the backup super has already been set up. In that case, it still sets the bit in the gd bitmap and then decreases bg_free_bits_count, which leads to a corrupted gd and triggers the BUG in ocfs2_block_group_set_bits:

    BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);

So check whether the backup super has already been done before doing the updates.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Dec 2015, 1 commit
Committed by Colin Ian King

Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit 774636e1 ("proc: convert to kstrto*()/kstrto*_from_user()") removed the setting of ret after the get_proc_task call and incorrectly left it as -ESRCH. Instead, return 0 when successful.

Example breakage:

    echo 0 > /proc/self/coredump_filter
    bash: echo: write error: No such process

Fixes: 774636e1 ("proc: convert to kstrto*()/kstrto*_from_user()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
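The bug pattern is a return variable pre-loaded with an error for the early-exit path and never reset on success. The sketch below models the fixed shape in user space. `struct task` and `write_coredump_filter()` are hypothetical simplifications of the procfs handler. It assumes, as procfs write handlers conventionally do, that success returns the number of bytes consumed.

```c
#include <errno.h>
#include <stddef.h>

struct task {
	unsigned long coredump_filter;
};

/*
 * ret starts as -ESRCH for the "task not found" exit. The fix is the
 * reset after the lookup succeeds; without it every write fails.
 */
static long write_coredump_filter(struct task *task, unsigned long val,
				  size_t count)
{
	long ret = -ESRCH;

	if (!task)
		return ret;
	ret = 0;                        /* the line the regression lost */
	task->coredump_filter = val;
	return ret < 0 ? ret : (long)count;
}
```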
-
- 17 Dec 2015, 1 commit
Committed by Jeff Layton

We do need to serialize layout stateid morphing operations, but we currently hold the ls_mutex across a layout recall which is pretty ugly. It's also unnecessary -- once we've bumped the seqid and copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL vs. anything else. Just drop the mutex once the copy is done.

This was causing a "workqueue leaked lock or atomic" warning and an occasional deadlock. There's more work to be done here but this fixes the immediate regression.

Fixes: cc8a5532 ("nfsd: serialize layout stateid morphing operations")
Cc: stable@vger.kernel.org
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 16 Dec 2015, 2 commits
Committed by Chris Mason

prepare_pages() may end up calling prepare_uptodate_page() twice if our write only spans a single page. But if the first call returns an error, our page will be unlocked and it's not safe to call it again.

This bug goes all the way back to 2011, and it's not something commonly hit.

While we're here, add a more explicit check for the page being truncated away. The bare lock_page() alone is protected only by good thoughts and i_mutex, which we're sure to regret eventually.

Reported-by: Dave Jones <dsj@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Chris Mason

Dave Jones found a warning from kasan in setup_cluster_bitmap():

    ==================================================================
    BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at addr ffff88039bef6828
    Read of size 8 by task nfsd/1009
    page:ffffea000e6fbd80 count:0 mapcount:0 mapping: (null) index:0x0
    flags: 0x8000000000000000()
    page dumped because: kasan: bad access detected
    CPU: 1 PID: 1009 Comm: nfsd Tainted: G W 4.4.0-rc3-backup-debug+ #1
     ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e
     0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0
     ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280
    Call Trace:
     [<ffffffffa680a43e>] dump_stack+0x4b/0x6d
     [<ffffffffa62638d1>] kasan_report_error+0x501/0x520
     [<ffffffffa61121c0>] ? debug_show_all_locks+0x1e0/0x1e0
     [<ffffffffa6263948>] kasan_report+0x58/0x60
     [<ffffffffa6814b00>] ? rb_last+0x10/0x40
     [<ffffffffa66f8af4>] ? setup_cluster_bitmap+0xc4/0x5a0
     [<ffffffffa6262ead>] __asan_load8+0x5d/0x70
     [<ffffffffa66f8af4>] setup_cluster_bitmap+0xc4/0x5a0
     [<ffffffffa66f675a>] ? setup_cluster_no_bitmap+0x6a/0x400
     [<ffffffffa66fcd16>] btrfs_find_space_cluster+0x4b6/0x640
     [<ffffffffa66fc860>] ? btrfs_alloc_from_cluster+0x4e0/0x4e0
     [<ffffffffa66fc36e>] ? btrfs_return_cluster_to_free_space+0x9e/0xb0
     [<ffffffffa702dc37>] ? _raw_spin_unlock+0x27/0x40
     [<ffffffffa666a1a1>] find_free_extent+0xba1/0x1520

Andrey noticed this was because we were doing list_first_entry on a list that might be empty. Rework the tests a bit so we don't do that.

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Dave Jones <dsj@fb.com>
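Why `list_first_entry()` on an empty list reads out of bounds: with an empty circular list, `head->next` points back at the head itself, so `container_of()` computes an "entry" around the bare `list_head`. In this report that head lived on the stack. The sketch below reimplements a minimal kernel-style list in user space to show the guarded lookup. The kernel itself provides `list_first_entry_or_null()` for this case.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct entry {
	unsigned long offset;
	struct list_head list;
};

/* Check emptiness first; never compute an entry from the head itself. */
static struct entry *first_entry_or_null(struct list_head *h)
{
	if (list_empty(h))
		return NULL;
	return container_of(h->next, struct entry, list);
}
```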
-
- 14 Dec 2015, 1 commit
Committed by Peter Zijlstra

Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/

His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible; however, my previous patch made this possible by unconditionally checking signal_pending().

We cannot use current->state as was done previously, because it can already have been changed by the instruction after the store to that variable. We must instead pass the initial state along and use that.

Fixes: 68985633 ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Dec 2015, 2 commits
Committed by Junxiao Bi

Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an issue: the SGID bit of a subdirectory was no longer inherited from its parent directory. This is because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but is later overwritten by "mode", which doesn't have SGID set.

Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins

Commit 42cb14b1 ("mm: migrate dirty page without clear_page_dirty_for_io etc") simplified the migration of a PageDirty pagecache page: one stat needs moving from zone to zone and that's about all. It's convenient and safest for it to shift the PageDirty bit from old page to new, just before updating the zone stats: before copying data and marking the new PageUptodate.

This is all done while both pages are isolated and locked, just as before; and just as before, there's a moment when the new page is visible in the radix_tree, but not yet PageUptodate. What's new is that it may now be briefly visible as PageDirty before it is PageUptodate.

When I scoured the tree to see if this could cause a problem anywhere, the only places I found were in two similar functions __r4w_get_page(): which look up a page with find_get_page() (not using page lock), then claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.

I'm not sure whether that was right before, but now it might be wrong (on rare occasions): only claim the page is uptodate if PageUptodate. Or perhaps the page in question could never be migratable anyway?

Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Dec, 2015: 3 commits
-
-
Committed by Holger Hoffstätte
When an inconsistent space cache is detected during loading, we log a warning that users frequently mistake for an instruction to invalidate the cache manually, even though this is not required. Fix the message to indicate that the cache will be rebuilt automatically. Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Acked-by: Filipe Manana <fdmanana@suse.com>
-
Committed by Filipe Manana
If we failed to allocate a new data chunk, we were jumping to the error path without releasing the transaction handle we had obtained before. Fix this by always releasing it before doing the jump. Fixes: 2c9fe835 ("btrfs: Fix lost-data-profile caused by balance bg") Signed-off-by: Filipe Manana <fdmanana@suse.com>
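The fix follows a common kernel error-handling rule: release what you acquired before jumping to a shared error label. The sketch below is hypothetical; `demo_start_transaction()`, `demo_end_transaction()`, `demo_force_chunk_alloc()` and the `demo_open_handles` leak counter are invented stand-ins, not the btrfs transaction API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical transaction handle; the counter makes leaks visible. */
struct demo_trans {
	int id;
};
static int demo_open_handles;

static struct demo_trans *demo_start_transaction(void)
{
	struct demo_trans *t = malloc(sizeof(*t));

	if (t)
		demo_open_handles++;
	return t;
}

static void demo_end_transaction(struct demo_trans *t)
{
	demo_open_handles--;
	free(t);
}

/* Stand-in for forcing a data chunk allocation that may fail. */
static int demo_force_chunk_alloc(struct demo_trans *t, int fail)
{
	(void)t;
	return fail ? -ENOSPC : 0;
}

static int demo_balance(int fail_alloc)
{
	struct demo_trans *trans;
	int ret;

	trans = demo_start_transaction();
	if (!trans)
		return -ENOMEM;

	ret = demo_force_chunk_alloc(trans, fail_alloc);
	/* Release the handle on every path, before any jump to "error". */
	demo_end_transaction(trans);
	if (ret < 0)
		goto error;

	return 0;
error:
	return ret;
}
```

Ending the transaction unconditionally, right after the call that may fail, keeps the success and failure paths from diverging in what they must clean up.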
-
Committed by Filipe Manana
As of my previous change titled "Btrfs: fix scrub preventing unused block groups from being deleted", the following warning in extent-tree.c:btrfs_delete_unused_bgs() can be hit when we mount a filesystem with "-o discard":

void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
{
	(...)
	if (trimming) {
		WARN_ON(!list_empty(&block_group->bg_list));
		spin_lock(&trans->transaction->deleted_bgs_lock);
		list_move(&block_group->bg_list,
			  &trans->transaction->deleted_bgs);
		spin_unlock(&trans->transaction->deleted_bgs_lock);
		btrfs_get_block_group(block_group);
	}
	(...)

This happens because scrub can now add the block group back to the list of unused block groups (fs_info->unused_bgs). This is dangerous because we are moving the block group from the unused block groups list to the list of deleted block groups without holding the lock that protects the source list (fs_info->unused_bgs_lock). The following sequence illustrates how this happens:

CPU 1: cleaner_kthread() -> btrfs_delete_unused_bgs()
CPU 1: sees bg X in list fs_info->unused_bgs
CPU 1: deletes bg X from list fs_info->unused_bgs
CPU 2: scrub_enumerate_chunks()
CPU 2: searches the device tree using its commit root
CPU 2: finds the device extent for block group X
CPU 2: gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group())
CPU 2: sets bg X to RO (again)
CPU 2: scrub_chunk(bg X)
CPU 2: sets bg X back to RW mode
CPU 2: adds bg X to the list fs_info->unused_bgs again, since it's still unused and currently not in that list
CPU 1: sets bg X to RO mode
CPU 1: btrfs_remove_chunk(bg X)
CPU 1: discard is enabled and bg X is in the fs_info->unused_bgs list again, so the warning is triggered
CPU 1: we move it from that list into the transaction's deleted_bgs list, but another task can be manipulating the first list (fs_info->unused_bgs) at the same time

Fix this by using the same lock (fs_info->unused_bgs_lock) to protect both the list of unused block groups and the list of deleted block groups. This makes it safe, and there is little concern about extra lock contention, as this lock is seldom used and only the cleaner kthread adds elements to the list of deleted block groups. The warning goes away too: it guarded a case that was previously impossible (and would have been better as a BUG_ON/ASSERT), but that case is no longer impossible. Reproduced with fstest btrfs/073 (using MOUNT_OPTIONS="-o discard"). Signed-off-by: Filipe Manana <fdmanana@suse.com>
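The locking rule in this fix, one lock guarding both the source and the destination list of a move, can be sketched in user space with a pthread mutex and a minimal circular list modelled loosely on the kernel's `list_head`. All names here (`struct demo_fs`, `mark_unused()`, `move_to_deleted()`) are hypothetical; the kernel code uses `fs_info->unused_bgs_lock`, which is a spinlock rather than a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static bool list_empty(const struct list_node *h) { return h->next == h; }

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_move_tail(struct list_node *n, struct list_node *h)
{
	list_del_init(n);
	list_add_tail(n, h);
}

/* Hypothetical state: both lists are guarded by the same mutex. */
struct demo_fs {
	pthread_mutex_t unused_bgs_lock;
	struct list_node unused_bgs;
	struct list_node deleted_bgs;
};

struct demo_bg {
	struct list_node bg_list;
};

/* Scrub-like path: re-add a still-unused block group if it is unlisted. */
static void mark_unused(struct demo_fs *fs, struct demo_bg *bg)
{
	pthread_mutex_lock(&fs->unused_bgs_lock);
	if (list_empty(&bg->bg_list))
		list_add_tail(&bg->bg_list, &fs->unused_bgs);
	pthread_mutex_unlock(&fs->unused_bgs_lock);
}

/* Cleaner-like path: the move unlinks from whichever list holds the
 * node, so it takes the lock that also protects the source list. */
static void move_to_deleted(struct demo_fs *fs, struct demo_bg *bg)
{
	pthread_mutex_lock(&fs->unused_bgs_lock);
	list_move_tail(&bg->bg_list, &fs->deleted_bgs);
	pthread_mutex_unlock(&fs->unused_bgs_lock);
}
```

Because a list move writes to the neighbours in the source list, protecting only the destination list (as the old per-transaction lock did) leaves the unlink step racing with other users of the source list.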
-
- 09 Dec, 2015: 1 commit
-
-
Committed by Al Viro
Sure, it's better to bail out of a past-the-EOF read and return 0 than to return a bogus negative value for it. But we'd better make sure we are bailing out with 0 and not -ENOMEM... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
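One common way such a bug arises is an error code preset for an allocation failure leaking into an unrelated early exit. The sketch below is hypothetical (an in-memory `demo_read()`, not the code this commit touches) and shows the past-EOF bail-out explicitly setting the return value to 0 instead of inheriting the preset `-ENOMEM`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical read from an in-memory "file" of "size" bytes. */
static ssize_t demo_read(const char *data, size_t size, char *buf,
			 size_t count, size_t pos)
{
	ssize_t ret = -ENOMEM;	/* preset for the allocation-failure path */
	char *bounce = malloc(count ? count : 1);

	if (!bounce)
		goto out;

	if (pos >= size) {
		ret = 0;	/* past EOF: 0 bytes read, not -ENOMEM */
		goto out_free;
	}
	if (count > size - pos)
		count = size - pos;
	memcpy(bounce, data + pos, count);
	memcpy(buf, bounce, count);
	ret = (ssize_t)count;
out_free:
	free(bounce);
out:
	return ret;
}
```

Without the `ret = 0;` line, the past-EOF branch would return the `-ENOMEM` that was only meant for a failed `malloc()`.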
-