- 26 Mar, 2016 7 commits
-
-
Committed by Anton Protopopov
A negative value rc was compared to the positive value ENOENT in the finish_read() function.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
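The sign mismatch described in this commit can be illustrated with a minimal userspace sketch. The helper names `is_missing_object()` and `is_missing_object_buggy()` are hypothetical and do not appear in the ceph code; the sketch only assumes the usual kernel convention that failures are reported as negative errno values.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: kernel-style calls report failure as a negative
 * errno value, so the comparison has to use -ENOENT. */
static bool is_missing_object(int rc)
{
    return rc == -ENOENT;
}

/* The buggy form: a negative rc can never equal the positive ENOENT,
 * so this check silently never fires. */
static bool is_missing_object_buggy(int rc)
{
    return rc == ENOENT;
}
```

With `rc` set to `-ENOENT`, only the first helper reports a match.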
-
Committed by Deepa Dinamani
The CURRENT_TIME macro is not appropriate for filesystems because it does not use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
This patch makes ceph_writepages_start() try to use a single OSD request to write all dirty pages within a stripe unit. When a nonconsecutive dirty page is found, ceph_writepages_start() tries to start a new write operation in the existing OSD request. If that succeeds, it uses the new operation to write back the dirty page.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
ceph_osdc_start_request() never returns -EOLDSNAP.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
When the rbytes mount option is enabled, a directory's size is its recursive size. The recursive size is not updated instantly, which can cause the directory size to change between successive stat(1) calls.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Ilya Dryomov
It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while the osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp the subs code to allow us to specify for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to the SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). The SUBSCRIBE2 feature bit is now required; it has been supported since pre-argonaut (2010).

Move the "got mdsmap" call to the end of ceph_mdsc_handle_map(): calling it before we validate the epoch and successfully install the new map can mess up mon_client sub state.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 19 Mar, 2016 1 commit
-
-
Committed by Rabin Vincent
Running the following command:

    busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null

with any tracing enabled very quickly leads to various NULL pointer dereferences and VM BUG_ON()s, such as these:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40
    Call Trace:
     [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
     [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
     [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
     [<ffffffff81196869>] do_sendfile+0x199/0x380
     [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
     [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d

    page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
    kernel BUG at include/linux/mm.h:367!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40
    Call Trace:
     [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
     [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
     [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
     [<ffffffff81196869>] do_sendfile+0x199/0x380
     [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
     [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89

(busybox's cat uses sendfile(2), unlike the coreutils version)

This is because tracing_splice_read_pipe() can call splice_to_pipe() with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and we fill the page pointers and the other fields of the pipe_buffers with garbage.

All other callers of splice_to_pipe() avoid calling it when nr_pages == 0, and we could make tracing_splice_read_pipe() do that too, but it seems reasonable to have splice_to_pipe() handle this condition gracefully.
Cc: stable@vger.kernel.org
Signed-off-by: Rabin Vincent <rabin@rab.in>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
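The underflow described above comes from decrementing an unsigned page counter that starts at zero. The following is a rough userspace sketch of that pattern and of an early-return guard, not the kernel's splice_to_pipe(); the function `fill_buffers()` and its `slots` parameter are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fill loop that counts down an unsigned
 * page counter. Without the guard, entering the loop with
 * nr_pages == 0 would make --nr_pages wrap around to UINT_MAX and the
 * loop would keep filling slots with data that was never provided. */
static size_t fill_buffers(unsigned int nr_pages, size_t slots)
{
    size_t filled = 0;

    if (nr_pages == 0)      /* handle the empty request gracefully */
        return 0;

    while (filled < slots) {
        filled++;
        if (!--nr_pages)    /* safe: nr_pages >= 1 whenever we get here */
            break;
    }
    return filled;
}
```

A request for zero pages returns immediately, while a non-empty request fills at most `slots` entries.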
-
- 18 Mar, 2016 11 commits
-
-
Committed by Kees Cook
Some callers of strtobool() were passing a pointer to unterminated strings. In preparation for adding multi-character processing to kstrtobool(), update the callers to not pass single-character pointers, and switch to using the new kstrtobool_from_user() helper where possible.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Steve French <sfrench@samba.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
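Why a pointer to a single character becomes a problem once the parser looks past the first byte can be sketched in userspace. Both `parse_bool()` and `parse_bool_char()` below are hypothetical, simplified helpers; they are not the kernel's kstrtobool() and accept a smaller set of inputs.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical multi-character parser: accepts "y", "1", "on" and
 * "n", "0", "off". It reads past s[0], so s must be NUL-terminated. */
static int parse_bool(const char *s, bool *res)
{
    if (!strcmp(s, "y") || !strcmp(s, "1") || !strcmp(s, "on")) {
        *res = true;
        return 0;
    }
    if (!strcmp(s, "n") || !strcmp(s, "0") || !strcmp(s, "off")) {
        *res = false;
        return 0;
    }
    return -1;
}

/* A caller holding only one character copies it into a terminated
 * buffer instead of passing &c directly to the parser. */
static int parse_bool_char(char c, bool *res)
{
    char buf[2] = { c, '\0' };

    return parse_bool(buf, res);
}
```

Passing `&c` straight to `parse_bool()` would read whatever byte happens to follow `c` in memory, which is the hazard the commit removes from the callers.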
-
Committed by Matthew Wilcox
Even though this is a 'can't happen' situation, use the new radix_tree_iter_retry() pattern to eliminate a goto.
[akpm@linux-foundation.org: fix btrfs build]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Dave Young
On an i686 machine with PAE enabled, the contiguous physical area can be large, which can truncate variables in the calculation below in read_vmcore() and mmap_vmcore():

    tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);

The types involved on i686 are:

    m->offset: unsigned long long int
    m->size:   unsigned long long int
    *fpos:     loff_t (long long int)
    buflen:    size_t (unsigned int)

So casting (m->offset + m->size - *fpos) to size_t truncates the value modulo 4GB. Suppose (m->offset + m->size - *fpos) is truncated to 0 while buflen > 0; then we get tsz = 0, which is of course not the expected result. Similarly, we could get other truncated values smaller than buflen, so the size passed down is no longer correct.

If (m->offset + m->size - *fpos) is above 4GB, read_vmcore() or mmap_vmcore() uses the min_t result computed from the truncated value. fpos then advances by the wrong amount, and we hit the bugs below:

1) read_vmcore will refuse to continue so makedumpfile fails.
2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range().

Use unsigned long long in min_t instead so that the variables are not truncated.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Cc: Minfei Huang <mhuang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
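The truncation can be reproduced in userspace by modelling the 32-bit size_t of i686 with `uint32_t`. This is only a sketch: the names `chunk_size_truncating()` and `chunk_size_wide()` are hypothetical, and the kernel's min_t() macro is replaced with plain comparisons.

```c
#include <stdint.h>

/* Buggy form: casting the 64-bit remainder to a 32-bit size type
 * (size_t on i686) drops the upper bits before the comparison. */
static uint64_t chunk_size_truncating(uint64_t remaining, uint32_t buflen)
{
    uint32_t r = (uint32_t)remaining;

    return r < buflen ? r : buflen;
}

/* Fixed form: compare in 64 bits. The result never exceeds buflen,
 * so narrowing it afterwards is lossless. */
static uint64_t chunk_size_wide(uint64_t remaining, uint32_t buflen)
{
    return remaining < buflen ? remaining : buflen;
}
```

With exactly 4GB remaining and a 4096-byte buffer, the truncating form yields 0 while the wide form yields 4096.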
-
Committed by Minfei Huang
After executing "cat /proc/$pid/wchan", the shell prompt does not start on a new line, which is inelegant. Make the shell prompt start on a new line.
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Eric Engestrom
`proc_timers_operations` is only used when CONFIG_CHECKPOINT_RESTORE is enabled.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Stultz
This patch provides a /proc/PID/timerslack_ns interface which exposes a task's timerslack value in nanoseconds and allows it to be changed.

This allows power/performance management software to set timer slack for other threads according to its policy for the thread (such as when the thread is designated foreground vs. background activity).

If the value written is non-zero, slack is set to that value. Otherwise it is set to the default for the thread.

This interface checks that the calling task has permission to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure arbitrary apps do not change the timer slack for other apps.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Stultz
This patchset introduces a /proc/<pid>/timerslack_ns interface which would allow controlling processes to set the timerslack value on other processes in order to save power by avoiding wakeups (something Android currently does via out-of-tree patches).

The first patch tries to fix the internal timer_slack_ns usage, which was defined as a long, limiting the slack range to ~4 seconds on 32bit systems. It converts it to a u64, which provides the same basically unlimited slack (500 years) on both 32bit and 64bit machines.

The second patch introduces the /proc/<pid>/timerslack_ns interface which allows the full 64bit slack range for a task to be read or set on both 32bit and 64bit machines.

With these two patches, on a 32bit machine, after setting the slack on bash to 10 seconds:

    $ time sleep 1

    real    0m10.747s
    user    0m0.001s
    sys     0m0.005s

The first patch is a little ugly, since I had to chase the slack delta arguments through a number of functions converting them to u64s. Let me know if it makes sense to break that up more or not.

Other than that things are fairly straightforward.

This patch (of 2):

The timer_slack_ns value in the task struct is currently an unsigned long. This means that on 32bit systems, the maximum slack is just over 4 seconds. However, on 64bit machines, it is much larger (~500 years).

This disparity could complicate application development, so this patch converts the timer_slack_ns value (as well as the default_slack) to a u64. This means both 32bit and 64bit systems have the same effective internal slack range.

Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specifies the interface as an unsigned long, so we preserve that limitation on 32bit systems, where SET_TIMERSLACK can only set the slack to an unsigned long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is actually larger than what can be stored in an unsigned long.

This patch also modifies hrtimer functions which specified the slack delta as an unsigned long.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
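The clamping behaviour described for GET_TIMERSLACK on 32bit systems can be sketched as follows. The function name `get_timerslack_32()` is hypothetical, and `uint32_t` stands in for a 32-bit unsigned long so that the example behaves the same on any host; `UINT32_MAX` plays the role of ULONG_MAX.

```c
#include <stdint.h>

/* Models the legacy getter on a 32-bit system: the slack is stored
 * internally as a 64-bit nanosecond count, but the old interface can
 * only report a 32-bit unsigned long, so larger values are clamped. */
static uint32_t get_timerslack_32(uint64_t slack_ns)
{
    if (slack_ns > UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)slack_ns;
}
```

A 10 second slack (10,000,000,000 ns) exceeds the 32-bit range of roughly 4.29 seconds, so the legacy getter would report the maximum value instead of a wrapped one.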
-
Committed by Joonsoo Kim
The success of CMA allocation largely depends on the success of migration, and the key factor in migration is the page reference count. Until now, the page reference count has been manipulated by calling atomic functions directly, so we cannot track who manipulates it and where. That makes it hard to find the actual reason for a CMA allocation failure. CMA allocation should be guaranteed to succeed, so finding the offending place is really important.

In this patch, call sites where the page reference count is manipulated are converted to use newly introduced wrapper functions. This is a preparation step for adding a tracepoint to each page reference manipulation function. With this facility, we can easily find the reason for a CMA allocation failure. There is no functional change in this patch.

In addition, this patch also converts reference read sites. This will help a second step that renames page._count to something else and prevents later attempts to access it directly (suggested by Andrew).
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Igor Redko
Add a new field, VIRTIO_BALLOON_S_AVAIL, to the virtio_balloon memory statistics protocol, corresponding to 'Available' in /proc/meminfo.

It indicates to the hypervisor how far the balloon can be inflated without pushing the guest system to swap. This metric would be very useful in VM orchestration software to improve memory management of different VMs under overcommit.

This patch (of 2):

Factor out calculation of the available memory counter into a separate exportable function, in order to be able to use it in other parts of the kernel.

In particular, it appears to be a relevant metric to report to the hypervisor via the virtio-balloon statistics interface (in a followup patch).
Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Naoya Horiguchi
Currently /proc/kpageflags returns just KPF_COMPOUND_TAIL for slab tail pages, which is inconvenient when trying to understand how slab pages are distributed (userspace always has to work out which kind of tail page it is looking at). This patch sets KPF_SLAB for such pages.

With this patch:

    $ grep Slab /proc/meminfo ; tools/vm/page-types -b slab
    Slab:              64880 kB
                 flags      page-count       MB  symbolic-flags                              long-symbolic-flags
    0x0000000000000080           16220       63  _______S__________________________________  slab
                 total           16220       63

16220 pages equals 64880 kB, so the returned result is consistent with the global counter.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Naoya Horiguchi
Currently /proc/kpageflags returns nothing for "tail" buddy pages, which is inconvenient when trying to understand how free pages are distributed. This patch sets KPF_BUDDY for such pages.

With this patch:

    $ grep MemFree /proc/meminfo ; tools/vm/page-types -b buddy
    MemFree:         3134992 kB
                 flags      page-count       MB  symbolic-flags                              long-symbolic-flags
    0x0000000000000400          779272     3044  __________B_______________________________  buddy
    0x0000000000000c00            4385       17  __________BM______________________________  buddy,mmap
                 total          783657     3061

783657 pages is 3134628 kB (roughly consistent with the global counter), so it's OK.
[akpm@linux-foundation.org: update comment, per Naoya]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
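The flag words and page counts quoted in the two kpageflags commits can be checked with a small sketch. The bit numbers match the values shown in the quoted output (0x80 for slab, 0x400 for buddy, 0xc00 for buddy plus mmap); the helpers `has_flag()` and `pages_to_kb()` are hypothetical, and the 4 kB page size is an assumption inferred from the figures in the log.

```c
#include <stdint.h>

/* Bit numbers consistent with the flag words quoted in the log. */
#define KPF_SLAB   7
#define KPF_BUDDY  10
#define KPF_MMAP   11

/* Returns 1 if the given bit is set in a kpageflags word, else 0. */
static int has_flag(uint64_t flags, unsigned int bit)
{
    return (int)((flags >> bit) & 1);
}

/* Assumes 4 kB pages, which is what the quoted numbers imply. */
static uint64_t pages_to_kb(uint64_t pages)
{
    return pages * 4;
}
```

Under that assumption, 16220 slab pages give 64880 kB and 783657 buddy pages give 3134628 kB, matching the totals compared against /proc/meminfo in the commit messages.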
-
- 17 Mar, 2016 1 commit
-
-
Committed by Dmitry V. Levin
Explicitly check the show_devname method's return code and bail out in case of an error. This fixes a regression introduced by commit 9d4d6574.
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 16 Mar, 2016 20 commits
-
-
Committed by Ian Kent
Use the standard pr_xxx() log macros directly for log prints instead of the AUTOFS_XXX() macros.
Signed-off-by: Ian Kent <ikent@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
Common kernel coding practice is to include the newline of log prints within the log text rather than hiding it away in a macro. To avoid introducing inconsistencies as changes are made, change the log macros to not include the newline.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
Use the pr_*() print functions in the AUTOFS_*() macros instead of printks, and include the module name in log message macros. Also use the AUTOFS_*() macros everywhere instead of raw printks.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
Fix some white space format errors.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
The return value from an ioctl when an invalid ioctl is passed in should be EINVAL, not ENOSYS.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
The need for this is questionable, but checkpatch.pl complains about the line length and it's a straightforward change.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
Refactor autofs4_get_set_timeout() to eliminate a coding style error.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ian Kent
Try to make the coding style completely consistent throughout the autofs module and in line with kernel coding style recommendations.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Stanislav Kinsburskiy
This is required for CRIU (Checkpoint/Restore In Userspace) to migrate a mount point when the write end in user space is closed.

Below is a brief description of the problem.

To migrate a non-catatonic autofs mount point, one has to restore the control pipe between the kernel and the autofs master process.

One of the autofs masters is systemd, which closes the pipe's write end after passing it to the kernel with the mount call.

To be able to restore the systemd control pipe, one has to know which read pipe end in systemd corresponds to the write pipe end in the kernel. The pipe "fd" in the mount options is not enough because it was closed and probably replaced by some other descriptor.

Thus, some other attribute is required to find the read pipe end. The best attribute to use is the inode number, because it is unique for the whole system and can't be reused while the autofs mount exists.

This attribute can also be used to recognize a situation where an autofs mount has no master (no process with the specified "pgrp" or no file descriptor with the "pipe_ino" specified in the autofs mount options).
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Now that migration doesn't clear page->mem_cgroup of live pages anymore, it's safe to make lock_page_memcg() and the memcg stat functions take pages, and spare the callers from memcg objects.
[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
These patches tag the page cache radix tree eviction entries with the memcg an evicted page belonged to, thus making per-cgroup LRU reclaim work properly and be as adaptive to new cache workingsets as global reclaim already is.

This should have been part of the original thrash detection patch series, but was deferred due to the complexity of those patches.

This patch (of 5):

So far the only sites that needed to exclude charge migration to stabilize page->mem_cgroup have been per-cgroup page statistics, hence the name mem_cgroup_begin_page_stat(). But per-cgroup thrash detection will add another site that needs to ensure page->mem_cgroup lifetime.

Rename these locking functions to the more generic lock_page_memcg() and unlock_page_memcg(). Since charge migration is a cgroup1 feature only, we might be able to delete it at some point, and these now easy-to-identify locking sites along with it.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrew Morton
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Jun Piao
In dlm_send_join_cancels(), node is defined with type unsigned int but initialized with -1, which leads to variable overflow. Although this won't cause any runtime problem, the code looks a little uncoordinated.
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
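What happens when an unsigned int is initialized with -1 can be shown in a few lines. The function names below are hypothetical, and the explanation of why no runtime problem results rests on an assumption not stated in the log: that the value is only ever used as `node + 1` to start a scan.

```c
#include <limits.h>

/* Initializing an unsigned int with -1 converts the value modulo
 * 2^N, so the variable actually holds UINT_MAX. */
static unsigned int first_node_wrapping(void)
{
    unsigned int node = -1;

    return node;
}

/* Assumed usage: the next scan position is node + 1, which wraps
 * back to 0 and therefore still starts at the first node. */
static unsigned int next_start(unsigned int node)
{
    return node + 1;
}
```

The arithmetic is well defined for unsigned types, which is why the original code worked, but declaring the variable as a signed int states the intent directly.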
-
Committed by Jiufei Xue
When o2hb detects that a node is down, it first adds the dead node to the recovery map and creates ocfs2rec, which will replay the journal for the dead node. The o2hb thread then calls dlm_do_local_recovery_cleanup() to delete the locks of the dead node. After the locks of the dead node are gone, locks for other nodes can be granted, and those nodes may modify the metadata without the journal of the dead node having been replayed. The details are as follows, for nodes N1, N2 and N3 (master):

1. N1 modifies the extent tree of an inode and commits the dirty metadata to the journal, then goes down.
2. The o2hb thread on N3 detects that N1 has gone down, sets the recovery map and deletes the lock of N1.
3. dlm_thread on N3 flushes the ast for the lock of N2.
4. N2 has not detected the death of N1, so its recovery map is empty. It reads the inode from disk without replaying the journal of N1 and modifies the extent tree of the inode that N1 had modified.
5. ocfs2rec recovers the journal of N1. The modification made by N2 is lost.

The modifications of N1 and N2 are not serialized, which leads to a read-only file system. We can set the recovery_waiting flag on the lock resource after deleting the lock of the dead node, to prevent other nodes from getting the lock before dlm recovery. After dlm recovery, the recovery map on N2 is not empty, so ocfs2_inode_lock_full_nested() will wait for ocfs2 recovery.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by xuejiufei
If the master migrates a lock resource to a node while that node happens to be purging it, a new lock resource will be created and inserted into the hash list. If the master then goes down, the lock resource being purged is recovered, so two lock resources with different owners exist. So return an error to the master if the lock resource is in the DROPPING state; the master will retry migrating the lock resource.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by xuejiufei
If the master goes down after returning in-progress for a deref message, the lock resource on the non-master node cannot be purged. Clear the DROPPING_REF flag and recover it.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by xuejiufei
The master returns in-progress to the non-master node when it cannot clear the refmap bit right away. The non-master node will not purge the lock resource until it receives the deref done message.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by xuejiufei
This patch series fixes the ordering problem between setting and clearing the refmap bit, described below for Node 1 and Node 2 (master):

    Node 1: dlmlock
    Node 1: dlm_do_master_request
    Node 2: dlm_master_request_handler -> dlm_lockres_set_refmap_bit
    Node 1: dlmlock succeed
    Node 1: dlmunlock succeed
    Node 1: dlm_purge_lockres
    Node 2: dlm_deref_handler -> find lock resource is in DLM_LOCK_RES_SETREF_INPROG state, so dispatch a deref work
    Node 1: dlm_purge_lockres succeed.
    Node 1: call dlmlock again
    Node 1: dlm_do_master_request
    Node 2: dlm_master_request_handler -> dlm_lockres_set_refmap_bit
    Node 2: deref work trigger, call dlm_lockres_clear_refmap_bit to clear Node 1 from refmap
    Node 2: dlm_purge_lockres succeed
    Node 1: dlm_send_remote_lock_request
    Node 2: return DLM_IVLOCKID because the lockres is not exist
    Node 1: BUG if the lockres is $RECOVERY

This patch series adds a new message to preserve the order of set and clear. Other nodes can purge the lock resource only after the refmap bit on the master is cleared.

This patch adds the DEREF_DONE message and the corresponding handler. A node can purge the lock resource after receiving this message. Since a new message is added, the minor number of the dlm protocol version is increased.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joseph Qi
Referring to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo for O2NET_MAX_PAYLOAD_BYTES. Since DLM_MIG_LOCKRES_RESERVED is not actually used at present, it won't cause any problem, but we had better correct it for future use.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by jiangyiwen
Commit a75e9cca ("ocfs2: use spinlock irqsave for downconvert lock") missed one place in ocfs2_osb_dump(), so a deadlock scenario still exists:

    ocfs2_wake_downconvert_thread
    ocfs2_rw_unlock
    ocfs2_dio_end_io
    dio_complete
    .....
    bio_endio
    req_bio_endio
    ....
    scsi_io_completion
    blk_done_softirq
    __do_softirq
    do_softirq
    irq_exit
    do_IRQ
    ocfs2_osb_dump
    cat /sys/kernel/debug/ocfs2/${uuid}/fs_state

This patch replaces spin_lock() with spin_lock_irqsave() there as well to resolve the situation.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-