- 04 4月, 2014 17 次提交
-
-
由 Jan Kara 提交于
The flag was never set, delete it. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Goldwyn Rodrigues 提交于
This is a part of the nocontrold feature which was incorporated sometime back. This is required for backward compatibility of the tools, specifically the scenario where the tools with recovery callback is used with a kernel not using the recovery callbacks (older kernel + newer tools). The tools look for this file to understand if the kernel supports DLM recovery callbacks. For kernels which support recovery callbacks but will miss this patch, ocfs2 will continue to use the older API and would still be able to mount the filesystem. [akpm@linux-foundation.org: simplify] [sfr@canb.auug.org.au: VERIFY_OCTAL_PERMISSIONS fix up] Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
There is a race window in dlm_do_recovery() between dlm_remaster_locks() and dlm_reset_recovery() when the recovery master nearly finish the recovery process for a dead node. After the master sends FINALIZE_RECO message in dlm_remaster_locks(), another node may become the recovery master for another dead node, and then send the BEGIN_RECO message to all the nodes included the old master, in the handler of this message dlm_begin_reco_handler() of old master, dlm->reco.dead_node and dlm->reco.new_master will be set to the second dead node and the new master, then in dlm_reset_recovery(), these two variables will be reset to default value. This will cause new recovery master can not finish the recovery process and hung, at last the whole cluster will hung for recovery. old recovery master: new recovery master: dlm_remaster_locks() become recovery master for another dead node. dlm_send_begin_reco_message() dlm_begin_reco_handler() { if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) { return -EAGAIN; } dlm_set_reco_master(dlm, br->node_idx); dlm_set_reco_dead_node(dlm, br->dead_node); } dlm_reset_recovery() { dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM); dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM); } will hang in dlm_remaster_locks() for request dlm locks info Before send FINALIZE_RECO message, recovery master should set DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done, this can break the race windows as the BEGIN_RECO messages will not be handled before DLM_RECO_STATE_FINALIZE flag is cleared. A similar race may happen between new recovery master and normal node which is in dlm_finalize_reco_handler(), also fix it. Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: NWengang Wang <wen.gang.wang@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
This issue was introduced by commit 800deef3 ("ocfs2: use list_for_each_entry where benefical") in 2007 where it replaced list_for_each with list_for_each_entry. The variable "lock" will point to invalid data if "tmpq" list is empty and a panic will be triggered due to this. Sunil advised reverting it back, but the old version was also not right. At the end of the outer for loop, that list_for_each_entry will also set "lock" to an invalid data, then in the next loop, if the "tmpq" list is empty, "lock" will be an stale invalid data and cause the panic. So reverting the list_for_each back and reset "lock" to NULL to fix this issue. Another concern is that this seemes can not happen because the "tmpq" list should not be empty. Let me describe how. old lock resource owner(node 1): migratation target(node 2): image there's lockres with a EX lock from node 2 in granted list, a NR lock from node x with convert_type EX in converting list. dlm_empty_lockres() { dlm_pick_migration_target() { pick node 2 as target as its lock is the first one in granted list. } dlm_migrate_lockres() { dlm_mark_lockres_migrating() { res->state |= DLM_LOCK_RES_BLOCK_DIRTY; wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res)); //after the above code, we can not dirty lockres any more, // so dlm_thread shuffle list will not run downconvert lock from EX to NR upconvert lock from NR to EX <<< migration may schedule out here, then <<< node 2 send down convert request to convert type from EX to <<< NR, then send up convert request to convert type from NR to <<< EX, at this time, lockres granted list is empty, and two locks <<< in the converting list, node x up convert lock followed by <<< node 2 up convert lock. // will set lockres RES_MIGRATING flag, the following // lock/unlock can not run dlm_lockres_release_ast(dlm, res); } dlm_send_one_lockres() dlm_process_recovery_data() for (i=0; i<mres->num_locks; i++) if (ml->node == dlm->node_num) for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) { list_for_each_entry(lock, tmpq, list) if (lock) break; <<< lock is invalid as grant list is empty. } if (lock->ml.node != ml->node) BUG() >>> crash here } I see the above locks status from a vmcore of our internal bug. Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NWengang Wang <wen.gang.wang@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Darrick J. Wong 提交于
Currently, ocfs2_sync_file grabs i_mutex and forces the current journal transaction to complete. This isn't terribly efficient, since sync_file really only needs to wait for the last transaction involving that inode to complete, and this doesn't require i_mutex. Therefore, implement the necessary bits to track the newest tid associated with an inode, and teach sync_file to wait for that instead of waiting for everything in the journal to commit. Furthermore, only issue the flush request to the drive if jbd2 hasn't already done so. This also eliminates the deadlock between ocfs2_file_aio_write() and ocfs2_sync_file(). aio_write takes i_mutex then calls ocfs2_aiodio_wait() to wait for unaligned dio writes to finish. However, if that dio completion involves calling fsync, then we can get into trouble when some ocfs2_sync_file tries to take i_mutex. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NMark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 joyce.xue 提交于
Variable uuid_net_key in ocfs2_initialize_super() is not used. Clean it up. Signed-off-by: Njoyce.xue <xuejiufei@huawei.com> Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Acked-by: NMark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wengang Wang 提交于
There is a problem that waitqueue_active() may check stale data thus miss a wakeup of threads waiting on ip_unaligned_aio. The valid value of ip_unaligned_aio is only 0 and 1 so we can change it to be of type mutex thus the above prolem is avoid. Another benifit is that mutex which works as FIFO is fairer than wake_up_all(). Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zongxun Wang 提交于
When mounting an ocfs2 volume, it will firstly generate a file /sys/kernel/debug/o2dlm/<uuid>/dlm_state, and then launch the dlm thread. So the following situation will cause a null pointer dereference. dlm_debug_init -> access file dlm_state which will call dlm_state_print -> dlm_launch_thread Move dlm_debug_init after dlm_launch_thread and dlm_launch_recovery_thread can fix this issue. Signed-off-by: NZongxun Wang <wangzongxun@huawei.com> Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
Move code moving event structure to access_list from copy_event_to_user() to fanotify_read() where it is more logical (so that we can immediately see in the main loop that we either move the event to a different list or free it). Also move special error handling for permission events from copy_event_to_user() to the main loop to have it in one place with error handling for normal events. This makes copy_event_to_user() really only copy the event to user without any side effects. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
Swap the error / "read ok" branches in the main loop of fanotify_read(). We will grow the "read ok" part in the next patch and this makes the indentation easier. Also it is more common to have error conditions inside an 'if' instead of the fast path. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
access_mutex is used only to guard operations on access_list. There's no need for sleeping within this lock so just make a spinlock out of it. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
Currently, fanotify creates new structure to track the fact that permission event has been reported to userspace and someone is waiting for a response to it. As event structures are now completely in the hands of each notification framework, we can use the event structure for this tracking instead of allocating a new structure. Since this makes the event structures for normal events and permission events even more different and the structures have different lifetime rules, we split them into two separate structures (where permission event structure contains the structure for a normal event). This makes normal events 8 bytes smaller and the code a tad bit cleaner. [akpm@linux-foundation.org: fix build] Signed-off-by: NJan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
The prepare_for_access_response() function checks whether group->fanotify_data.bypass_perm is set. However this test can never be true because prepare_for_access_response() is called only from fanotify_read() which means fanotify group is alive with an active fd while bypass_perm is set from fanotify_release() when all file descriptors pointing to the group are closed and the group is going away. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fabian Frederick 提交于
nameidata was replaced by flags in commit 00cd8dd3 ("stop passing nameidata to ->lookup()"). Signed-off-by: NFabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fabian Frederick 提交于
cifs_init_inodecache is only called by __init init_cifs. Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
After commit 839a8e86 ("writeback: replace custom worker pool implementation with unbound workqueue") when device is removed while we are writing to it we crash in bdi_writeback_workfn() -> set_worker_desc() because bdi->dev is NULL. This can happen because even though bdi_unregister() cancels all pending flushing work, nothing really prevents new ones from being queued from balance_dirty_pages() or other places. Fix the problem by clearing BDI_registered bit in bdi_unregister() and checking it before scheduling of any flushing work. Fixes: 839a8e86Reviewed-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJan Kara <jack@suse.cz> Cc: Derek Basehore <dbasehore@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Derek Basehore 提交于
bdi_wakeup_thread_delayed() used the mod_delayed_work() function to schedule work to writeback dirty inodes. The problem with this is that it can delay work that is scheduled for immediate execution, such as the work from sync_inodes_sb(). This can happen since mod_delayed_work() can now steal work from a work_queue. This fixes the problem by using queue_delayed_work() instead. This is a regression caused by commit 839a8e86 ("writeback: replace custom worker pool implementation with unbound workqueue"). The reason that this causes a problem is that laptop-mode will change the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default. In the case that bdi_wakeup_thread_delayed() races with sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung task. Even if dirty_writeback_centisecs is not long enough to cause a hung task, we still don't want to delay sync for that long. We fix the problem by using queue_delayed_work() when we want to schedule writeback sometime in future. This function doesn't change the timer if it is already armed. For the same reason, we also change bdi_writeback_workfn() to immediately queue the work again in the case that the work_list is not empty. The same problem can happen if the sync work is run on the rescue worker. [jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()] Signed-off-by: NDerek Basehore <dbasehore@chromium.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zento.linux.org.uk> Reviewed-by: NTejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Derek Basehore <dbasehore@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: Benson Leung <bleung@chromium.org> Cc: Sonny Rao <sonnyrao@chromium.org> Cc: Luigi Semenzato <semenzato@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 3月, 2014 5 次提交
-
-
由 Theodore Ts'o 提交于
Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: NJohn Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
If the dest_mnt is not shared, propagate_mnt() does nothing - there's no mounts to propagate to and thus no copies to create. Might as well don't bother calling it in that case. Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
preparation to switching mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* switch allocation to alloc_large_system_hash() * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=) * switch mountpoint_hashtable from list_head to hlist_head Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 3月, 2014 1 次提交
-
-
由 Sasha Levin 提交于
Commit c74a3bdd ("ocfs2: add clustername to cluster connection") is trying to strlcpy a string which was explicitly passed as NULL in the very same patch, triggering a NULL ptr deref. BUG: unable to handle kernel NULL pointer dereference at (null) IP: strlcpy (lib/string.c:388 lib/string.c:151) CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274 RIP: strlcpy (lib/string.c:388 lib/string.c:151) Call Trace: ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350) ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396) user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679) dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503) vfs_mkdir (fs/namei.c:3467) SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472) tracesys (arch/x86/kernel/entry_64.S:749) akpm: this patch probably disables the feature. A temporary thing to avoid triviel oopses. Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 3月, 2014 1 次提交
-
-
由 Jan Kara 提交于
Currently we allocated anon_inode_inode in anon_inodefs_mount. This is somewhat fragile as if that function ever gets called again, it will overwrite anon_inode_inode pointer. So move the initialization of anon_inode_inode to anon_inode_init(). Signed-off-by: NJan Kara <jack@suse.cz> [ Further simplified on suggestion from Dave Jones ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 3月, 2014 3 次提交
-
-
由 Greg Kroah-Hartman 提交于
This reverts commit d1ba277e. As reported by Stephen, this patch breaks linux-next as a ppc patch suddenly (after 2 years) started using this old api call. So revert it for now, it will go away in 3.15-rc2 when we can change the PPC call to the new api. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Tejun Heo <tj@kernel.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Linus Torvalds 提交于
The previous commit removed the register_filesystem() call and the associated error handling, but left the label for the error path that no longer exists. Remove that too. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
anon_inodefs filesystem is a kernel internal filesystem userspace shouldn't mess with. Remove registration of it so userspace cannot even try to mount it (which would fail anyway because the filesystem is MS_NOUSER). This fixes an oops triggered by trinity when it tried mounting anon_inodefs which overwrote anon_inode_inode pointer while other CPU has been in anon_inode_getfile() between ihold() and d_instantiate(). Thus effectively creating dentry pointing to an inode without holding a reference to it. Reported-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 3月, 2014 1 次提交
-
-
由 Cody P Schafer 提交于
bin_attributes created/updated in create_files() (such as those listed via (struct device).attribute_groups) were not placed under the specified group, and instead appeared in the base kobj directory. Fix this by making bin_attributes use creating code similar to normal attributes. A quick grep shows that no one is using bin_attrs in a named attribute group yet, so we can do this without breaking anything in usespace. Note that I do not add is_visible() support to bin_attributes, though that could be done as well. This is a copy of the patch already merged in Greg's tree. Signed-off-by: NCody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 23 3月, 2014 4 次提交
-
-
由 Al Viro 提交于
We can get false negative from __lookup_mnt() if an unrelated vfsmount gets moved. In that case legitimize_mnt() is guaranteed to fail, and we will fall back to non-RCU walk... unless we end up running into a hard error on a filesystem object we wouldn't have reached if not for that false negative. IOW, delaying that check until the end of pathname resolution is wrong - we should recheck right after we attempt to cross the mountpoint. We don't need to recheck unless we see d_mountpoint() being true - in that case even if we have just raced with mount/umount, we can simply go on as if we'd come at the moment when the sucker wasn't a mountpoint; if we run into a hard error as the result, it was a legitimate outcome. __lookup_mnt() returning NULL is different in that respect, since it might've happened due to operation on completely unrelated mountpoint. Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
In all callchains leading to prepend_name(), the value left in *buflen is eventually discarded unused if prepend_name() has returned a negative. So we are free to do what prepend() does, and subtract from *buflen *before* checking for underflow (which turns into checking the sign of subtraction result, of course). Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Biggers 提交于
Commit bd2a31d5 ("get rid of fget_light()") introduced the __fdget_pos() function, which returns the resulting file pointer and fdput flags combined in an 'unsigned long'. However, it also changed the behavior to return files with FMODE_PATH set, which shouldn't happen because read(), write(), lseek(), etc. aren't allowed on such files. This commit restores the old behavior. This regression actually had no effect on read() and write() since FMODE_READ and FMODE_WRITE are not set on file descriptors opened with O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH to fail with ESPIPE rather than EBADF. Signed-off-by: NEric Biggers <ebiggers3@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Biggers 提交于
Commit 9c225f26 ("vfs: atomic f_pos accesses as per POSIX") changed several system calls to use fdget_pos() instead of fdget(), but missed sys_llseek(). Fix it. Signed-off-by: NEric Biggers <ebiggers3@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 3月, 2014 1 次提交
-
-
由 Frederic Weisbecker 提交于
The architectures that override cputime_t (s390, ppc) don't provide any version of nsecs_to_cputime(). Indeed this cputime_t implementation by backend only happens when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y under which the core code doesn't make any use of nsecs_to_cputime(). At least for now. We are going to make a broader use of it so lets provide a default version with a per usecs granularity. It should be good enough for most usecases. Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 12 3月, 2014 1 次提交
-
-
由 Grant Likely 提交于
The same data is now available in sysfs, so we can remove the code that exports it in /proc and replace it with a symlink to the sysfs version. Tested on versatile qemu model and mpc5200 eval board. More testing would be appreciated. v5: Fixed up conflicts with mainline changes Signed-off-by: NGrant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
-
- 11 3月, 2014 2 次提交
-
-
由 Sergei Antonov 提交于
Adds support for HFSX 'HasFolderCount' flag and a corresponding 'folderCount' field in folder records. (For reference see HFS_FOLDERCOUNT and kHFSHasFolderCountBit/kHFSHasFolderCountMask in Apple's source code.) Ignoring subfolder count leads to fs errors found by Mac: ... Checking catalog hierarchy. HasFolderCount flag needs to be set (id = 105) (It should be 0x10 instead of 0) Incorrect folder count in a directory (id = 2) (It should be 7 instead of 6) ... Steps to reproduce: Format with "newfs_hfs -s /dev/diskXXX". Mount in Linux. Create a new directory in root. Unmount. Run "fsck_hfs /dev/diskXXX". The patch handles directory creation, deletion, and rename. Signed-off-by: NSergei Antonov <saproj@gmail.com> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Artem Fetishev 提交于
The expected logic of proc_map_files_get_link() is either to return 0 and initialize 'path' or return an error and leave 'path' uninitialized. By the time dname_to_vma_addr() returns 0 the corresponding vma may have already be gone. In this case the path is not initialized but the return value is still 0. This results in 'general protection fault' inside d_path(). Steps to reproduce: CONFIG_CHECKPOINT_RESTORE=y fd = open(...); while (1) { mmap(fd, ...); munmap(fd, ...); } ls -la /proc/$PID/map_files Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991Signed-off-by: NArtem Fetishev <artem_fetishev@epam.com> Signed-off-by: NAleksandr Terekhov <aleksandr_terekhov@epam.com> Reported-by: <wiebittewas@gmail.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NCyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 3月, 2014 3 次提交
-
-
由 Al Viro 提交于
instead of returning the flags by reference, we can just have the low-level primitive return those in lower bits of unsigned long, with struct file * derived from the rest. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Linus Torvalds 提交于
Our write() system call has always been atomic in the sense that you get the expected thread-safe contiguous write, but we haven't actually guaranteed that concurrent writes are serialized wrt f_pos accesses, so threads (or processes) that share a file descriptor and use "write()" concurrently would quite likely overwrite each others data. This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says: "2.9.7 Thread Interactions with Regular File Operations All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: [...]" and one of the effects is the file position update. This unprotected file position behavior is not new behavior, and nobody has ever cared. Until now. Yongzhi Pan reported unexpected behavior to Michael Kerrisk that was due to this. This resolves the issue with a f_pos-specific lock that is taken by read/write/lseek on file descriptors that may be shared across threads or processes. Reported-by: NYongzhi Pan <panyongzhi@gmail.com> Reported-by: NMichael Kerrisk <mtk.manpages@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 3月, 2014 1 次提交
-
-
由 Tejun Heo 提交于
While implementing atomic_write_len, 4d3773c4 ("kernfs: implement kernfs_ops->atomic_write_len") moved data copy from userland inside kernfs_get_active() and kernfs_open_file->mutex so that kernfs_ops->atomic_write_len can be accessed before copying buffer from userland; unfortunately, this could lead to locking order inversion involving mmap_sem if copy_from_user() takes a page fault. ====================================================== [ INFO: possible circular locking dependency detected ] 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G W ------------------------------------------------------- trinity-c236/10658 is trying to acquire lock: (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++++}: [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0 [<mm/memory.c:4188>] might_fault+0x7e/0xb0 [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190 [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0 [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2 -> #0 (&of->mutex#2){+.+.+.}: [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&of->mutex#2); lock(&mm->mmap_sem); lock(&of->mutex#2); *** DEADLOCK *** 1 lock held by trinity-c236/10658: #0: (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0 stack backtrace: CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000 0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8 ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8 Call Trace: [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160 [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560 [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0 [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0 [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510 [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120 [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50 [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430 [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0 [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0 [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2 Fix it by caching atomic_write_len in kernfs_open_file during open so that it can be determined without accessing kernfs_ops in kernfs_fop_write(). This restores the structure of kernfs_fop_write() before 4d3773c4 with updated @len determination logic. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NSasha Levin <sasha.levin@oracle.com> References: http://lkml.kernel.org/g/53113485.2090407@oracle.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-