- 04 7月, 2013 40 次提交
-
-
由 Vyacheslav Dubeyko 提交于
Currently, NILFS2 returns 0 as free inodes count (f_ffree) and current used inodes count as total file nodes in file system (f_files): df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/loop0 2 2 0 100% /mnt/nilfs2 This patch implements real calculation of free inodes count. First of all, it is calculated total file nodes in file system as (desc_blocks_count * groups_per_desc_block * entries_per_group). Then, it is calculated free inodes count as difference the total file nodes and used inodes count. As a result, we have such output for NILFS2: df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/loop0 4194304 2114701 2079603 51% /mnt/nilfs2 Reported-by: NClemens Eisserer <linuxhippy@gmail.com> Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Calling kthread_run with a single name parameter causes it to be handled as a format string. Many callers are passing potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents. Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Fix various weird constructions of strncpy(dst, src, strlen(src)). Length limits should be about the space available in the destination, not repurposed as a method to either always include or always exclude a trailing NULL byte. Either the NULL should always be copied (using strlcpy), or it should not be copied (using something like memcpy). Readable code should not depend on the weird behavior of strncpy when it hits the length limit. Better to avoid the anti-pattern entirely. [akpm@linux-foundation.org: revert getdelays.c part due to missing bsd/string.h] Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [staging] Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [acpi] Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jiang Liu 提交于
The global variable num_physpages is scheduled to be removed, so use totalram_pages instead of num_physpages at runtime. Signed-off-by: NJiang Liu <jiang.liu@huawei.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Now that the LRU to add a page to is decided at LRU-add time, remove the misleading lru parameter from __pagevec_lru_add. A consequence of this is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers are misleading as the caller no longer has direct control over what LRU the page is added to. Unused helpers are removed by this patch and existing users of pagevec_lru_add_file() are converted to use lru_cache_add_file() directly and use the per-cpu pagevecs instead of creating their own pagevec. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com> Cc: Andrew Perepechko <anserper@ya.ru> Cc: Robin Dong <sanbai@taobao.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Bernd Schubert <bernd.schubert@fastmail.fm> Cc: David Howells <dhowells@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
This patch introduces mmap_vmcore(). Don't permit writable nor executable mapping even with mprotect() because this mmap() is aimed at reading crash dump memory. Non-writable mapping is also requirement of remap_pfn_range() when mapping linear pages on non-consecutive physical pages; see is_cow_mapping(). Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by remap_vmalloc_range_pertial at the same time for a single vma. do_munmap() can correctly clean partially remapped vma with two functions in abnormal case. See zap_pte_range(), vm_normal_page() and their comments for details. On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This limitation comes from the fact that the third argument of remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long. [akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach] Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Tested-by: NMaxim Uvarov <muvarov@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
The previous patches newly added holes before each chunk of memory and the holes need to be count in vmcore file size. There are two ways to count file size in such a way: 1) suppose m is a poitner to the last vmcore object in vmcore_list. Then file size is (m->offset + m->size), or 2) calculate sum of size of buffers for ELF header, program headers, ELF note segments and objects in vmcore_list. Although 1) is more direct and simpler than 2), 2) seems better in that it reflects internal object structure of /proc/vmcore. Thus, this patch changes get_vmcore_size_elf{64, 32} so that it calculates size in the way of 2). As a result, both get_vmcore_size_elf{64, 32} have the same definition. Merge them as get_vmcore_size. Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
Now ELF note segment has been copied in the buffer on vmalloc memory. To allow user process to remap the ELF note segment buffer with remap_vmalloc_page, the corresponding VM area object has to have VM_USERMAP flag set. [akpm@linux-foundation.org: use the conventional comment layout] Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
The reasons why we don't allocate ELF note segment in the 1st kernel (old memory) on page boundary is to keep backward compatibility for old kernels, and that if doing so, we waste not a little memory due to round-up operation to fit the memory to page boundary since most of the buffers are in per-cpu area. ELF notes are per-cpu, so total size of ELF note segments depends on number of CPUs. The current maximum number of CPUs on x86_64 is 5192, and there's already system with 4192 CPUs in SGI, where total size amounts to 1MB. This can be larger in the near future or possibly even now on another architecture that has larger size of note per a single cpu. Thus, to avoid the case where memory allocation for large block fails, we allocate vmcore objects on vmalloc memory. This patch adds elfnotes_buf and elfnotes_sz variables to keep pointer to the ELF note segment buffer and its size. There's no longer the vmcore object that corresponds to the ELF note segment in vmcore_list. Accordingly, read_vmcore() has new case for ELF note segment and set_vmcore_list_offsets_elf{64,32}() and other helper functions starts calculating offset from sum of size of ELF headers and size of ELF note segment. [akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks] Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list Treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list. Formally, for each range [start, end], we set up the corresponding vmcore object in vmcore_list to [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)]. This change affects layout of /proc/vmcore. The gaps generated by the rearrangement are newly made visible to applications as holes. Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and [end, roundup(end, PAGE_SIZE)]. Suppose variable m points at a vmcore object in vmcore_list, and variable phdr points at the program header of PT_LOAD type the variable m corresponds to. Then, pictorially: m->offset +---------------+ | hole | phdr->p_offset = +---------------+ m->offset + (paddr - start) | |\ | kernel memory | phdr->p_memsz | |/ +---------------+ | hole | m->offset + m->size +---------------+ where m->offset and m->offset + m->size are always page-size aligned. Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
Allocate ELF headers on page-size boundary using __get_free_pages() instead of kmalloc(). Later patch will merge PT_NOTE entries into a single unique one and decrease the buffer size actually used. Keep original buffer size in variable elfcorebuf_sz_orig to kfree the buffer later and actually used buffer size with rounded up to page-size boundary in variable elfcorebuf_sz separately. The size of part of the ELF buffer exported from /proc/vmcore is elfcorebuf_sz. The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz, elfcorebuf_sz_orig], is filled with 0. Use size of the ELF headers as an initial offset value in set_vmcore_list_offsets_elf{64,32} and process_ptload_program_headers_elf{64,32} in order to indicate that the offset includes the holes towards the page boundary. As a result, both set_vmcore_list_offsets_elf{64,32} have the same definition. Merge them as set_vmcore_list_offsets. [akpm@linux-foundation.org: add free_elfcorebuf(), cleanups] Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
Rewrite part of read_vmcore() that reads objects in vmcore_list in the same way as part reading ELF headers, by which some duplicated and redundant codes are removed. Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
VM page reclaim uses dirty and writeback page states to determine if flushers are cleaning pages too slowly and that page reclaim should stall waiting on flushers to catch up. Page state in NFS is a bit more complex and a clean page can be unreclaimable due to being unstable which is effectively "dirty" from the perspective of the VM from reclaim context. Similarly, if the inode is currently being committed then it's similar to being under writeback. This patch adds a is_dirty_writeback() handled for NFS that checks if a pages backing inode is being committed and should be accounted as writeback and if a page has private state indicating that it is effectively dirty. Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Zlatko Calusic <zcalusic@bitsync.net> Cc: dormando <dormando@rydia.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Page reclaim keeps track of dirty and under writeback pages and uses it to determine if wait_iff_congested() should stall or if kswapd should begin writing back pages. This fails to account for buffer pages that can be under writeback but not PageWriteback which is the case for filesystems like ext3 ordered mode. Furthermore, PageDirty buffer pages can have all the buffers clean and writepage does no IO so it should not be accounted as congested. This patch adds an address_space operation that filesystems may optionally use to check if a page is really dirty or really under writeback. An implementation is provided for for buffer_heads is added and used for block operations and ext3 in ordered mode. By default the page flags are obeyed. Credit goes to Jan Kara for identifying that the page flags alone are not sufficient for ext3 and sanity checking a number of ideas on how the problem could be addressed. Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Zlatko Calusic <zcalusic@bitsync.net> Cc: dormando <dormando@rydia.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Libin 提交于
(*->vm_end - *->vm_start) >> PAGE_SHIFT operation is implemented as a inline funcion vma_pages() in linux/mm.h, so using it. Signed-off-by: NLibin <huawei.libin@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
In order to reuse bits from pagemap entries gracefully, we leave the entries as is but on pagemap open emit a warning in dmesg, that bits 55-60 are about to change in a couple of releases. Next, if a user issues soft-dirty clear command via the clear_refs file (it was disabled before v3.9) we assume that he's aware of the new pagemap format, note that fact and report the bits in pagemap in the new manner. The "migration strategy" looks like this then: 1. existing users are not affected -- they don't touch soft-dirty feature, thus see old bits in pagemap, but are warned and have time to fix themselves 2. those who use soft-dirty know about new pagemap format 3. some time soon we get rid of any signs of page-shift in pagemap as well as this trick with clear-soft-dirty affecting pagemap format. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
The soft-dirty is a bit on a PTE which helps to track which pages a task writes to. In order to do this tracking one should 1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs) 2. Wait some time. 3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries) To do this tracking, the writable bit is cleared from PTEs when the soft-dirty bit is. Thus, after this, when the task tries to modify a page at some virtual address the #PF occurs and the kernel sets the soft-dirty bit on the respective PTE. Note, that although all the task's address space is marked as r/o after the soft-dirty bits clear, the #PF-s that occur after that are processed fast. This is so, since the pages are still mapped to physical memory, and thus all the kernel does is finds this fact out and puts back writable, dirty and soft-dirty bits on the PTE. Another thing to note, is that when mremap moves PTEs they are marked with soft-dirty as well, since from the user perspective mremap modifies the virtual memory at mremap's new address. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
These bits are always constant (== PAGE_SHIFT) and just occupy space in the entry. Moreover, in next patch we will need to report one more bit in the pagemap, but all bits are already busy on it. That said, describe the pagemap entry that has 6 more free zero bits. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
In the next patch the clear-refs-type will be required in clear_refs_pte_range funciton, so prepare the walk->private to carry this info. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
This is the implementation of the soft-dirty bit concept that should help keep track of changes in user memory, which in turn is very-very required by the checkpoint-restore project (http://criu.org). To create a dump of an application(s) we save all the information about it to files, and the biggest part of such dump is the contents of tasks' memory. However, there are usage scenarios where it's not required to get _all_ the task memory while creating a dump. For example, when doing periodical dumps, it's only required to take full memory dump only at the first step and then take incremental changes of memory. Another example is live migration. We copy all the memory to the destination node without stopping all tasks, then stop them, check for what pages has changed, dump it and the rest of the state, then copy it to the destination node. This decreases freeze time significantly. That said, some help from kernel to watch how processes modify the contents of their memory is required. The proposal is to track changes with the help of new soft-dirty bit this way: 1. First do "echo 4 > /proc/$pid/clear_refs". At that point kernel clears the soft dirty _and_ the writable bits from all ptes of process $pid. From now on every write to any page will result in #pf and the subsequent call to pte_mkdirty/pmd_mkdirty, which in turn will set the soft dirty flag. 2. Then read the /proc/$pid/pagemap2 and check the soft-dirty bit reported there (the 55'th one). If set, the respective pte was written to since last call to clear refs. The soft-dirty bit is the _PAGE_BIT_HIDDEN one. Although it's used by kmemcheck, the latter one marks kernel pages with it, while the former bit is put on user pages so they do not conflict to each other. This patch: A new clear-refs type will be added in the next patch, so prepare code for that. [akpm@linux-foundation.org: don't assume that sizeof(enum clear_refs_types) == sizeof(int)] Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xue jiufei 提交于
There may exist NULL pointer dereference in config_item_name() when one volume (say Volume A) unmounts while another (say Volume B) mounting. Volume A Volume B already Mounted. Unmounting, call o2hb_heartbeat_group_drop_item() -> config_item_put(item) set reg(A)->item.ci_name to NULL in function config_item_cleanup(). begin mounting, call o2hb_region_pin() and tranverse all regions. When reading reg(A)->item.ci_name, it causes NULL pointer dereference. call o2hb_region_release() and del reg(A) from list. So we should skip accessing regions that is going to release when tranverse o2hb_all_regions. Signed-off-by: NYiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Njoyce <xuejiufei@huawei.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Jie Liu <jeff.liu@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jie Liu 提交于
Adjust switch..case syntax at o2net_state_change to meet the kernel coding standard. s/printk/pr_info/. [akpm@linux-foundation.org: revert pr_foo() change] Signed-off-by: NJie Liu <jeff.liu@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Gurudas Pai <gurudas.pai@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Tao Ma <tm@tao.ma> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jie Liu 提交于
Fix a comment typo in o2quo_hb_still_up() Signed-off-by: NJie Liu <jeff.liu@oracle.com> Cc: Gurudas Pai <gurudas.pai@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Tao Ma <tm@tao.ma> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jie Liu 提交于
s/o2hb_global_hearbeat_mode_set/o2hb_global_heartbeat_mode_set/ to make the signature of those routines in a consistent manner with others for heartbeating. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Acked-by: NSunil Mushran <sunil.mushran@gmail.com> Cc: Gurudas Pai <gurudas.pai@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Cc: Tao Ma <tm@tao.ma> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Noboru Iwamatsu 提交于
Under heavy I/O load, writing the disk heartbeat can be forced to wait for minutes, and this causes the node to be fenced. This patch tries to use WRITE_SYNC in submitting the heartbeat bio, so that writing the heartbeat will have a priority over other requests. Signed-off-by: NNoboru Iwamatsu <n_iwamatsu@jp.fujitsu.com> Acked-by: NTao Ma <tm@tao.ma> Acked-by: NSunil Mushran <sunil.mushran@gmail.com> Cc: Srinivas Eeeda <srinivas.eeda@oracle.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Tested-by: NGurudas Pai <gurudas.pai@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
Inlined xattr shared free space of inode block with inlined data or data extent record, so the size of the later two should be adjusted when inlined xattr is enabled. See ocfs2_xattr_ibody_init(). But this isn't done well when reflink. For inode with inlined data, its max inlined data size is adjusted in ocfs2_duplicate_inline_data(), no problem. But for inode with data extent record, its record count isn't adjusted. Fix it, or data extent record and inlined xattr may overwrite each other, then cause data corruption or xattr failure. One panic caused by this bug in our test environment is the following: kernel BUG at fs/ocfs2/xattr.c:1435! invalid opcode: 0000 [#1] SMP Pid: 10871, comm: multi_reflink_t Not tainted 2.6.39-300.17.1.el5uek #1 RIP: ocfs2_xa_offset_pointer+0x17/0x20 [ocfs2] RSP: e02b:ffff88007a587948 EFLAGS: 00010283 RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00000000000051e4 RDX: ffff880057092060 RSI: 0000000000000f80 RDI: ffff88007a587a68 RBP: ffff88007a587948 R08: 00000000000062f4 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010 R13: ffff88007a587a68 R14: 0000000000000001 R15: ffff88007a587c68 FS: 00007fccff7f06e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000015cf000 CR3: 000000007aa76000 CR4: 0000000000000660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process multi_reflink_t Call Trace: ocfs2_xa_reuse_entry+0x60/0x280 [ocfs2] ocfs2_xa_prepare_entry+0x17e/0x2a0 [ocfs2] ocfs2_xa_set+0xcc/0x250 [ocfs2] ocfs2_xattr_ibody_set+0x98/0x230 [ocfs2] __ocfs2_xattr_set_handle+0x4f/0x700 [ocfs2] ocfs2_xattr_set+0x6c6/0x890 [ocfs2] ocfs2_xattr_user_set+0x46/0x50 [ocfs2] generic_setxattr+0x70/0x90 __vfs_setxattr_noperm+0x80/0x1a0 vfs_setxattr+0xa9/0xb0 setxattr+0xc3/0x120 sys_fsetxattr+0xa8/0xd0 system_call_fastpath+0x16/0x1b Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Younger Liu 提交于
While deleting a file with ocfs2_unlink(), there is a bug in this function. This bug will result in filesystem read-only. After calling ocfs2_orphan_add(), the file which will be deleted is added into orphan dir. If ocfs2_delete_entry() fails, the file still exists in the parent dir. And this scenario introduces a conflict of metadata. If a file is added into orphan dir, when we put inode of the file with iput(), the inode i_flags is setted (~OCFS2_VALID_FL) in ocfs2_remove_inode(), and then write back to disk. But as previously mentioned, the file still exists in the parent dir. On other nodes, the file can be still accessed. When first read the file with ocfs2_read_blocks() from disk, It will check and avalidate inode using ocfs2_validate_inode_block(). So File system will be readonly because the inode is invalid. In other words, the inode i_flags has been set (~OCFS2_VALID_FL). [akpm@linux-foundation.org: cleanups] [jeff.liu@oracle.com: s/inode_is_unlinkable/ocfs2_inode_is_unlinkable/] Signed-off-by: NYounger Liu <younger.liu@huawei.com> Signed-off-by: NJensen <shencanquan@huawei.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Younger Liu <younger.liu@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jie Liu 提交于
In ocfs2_relink_block_group(), we roll back all those changes if notify intent to modify buffers for metadata update failed even if the relevant buffer has not yet been modified/got dirty at that point, that are not quite right because of: - None buffer has been modified/dirty if failed to call ocfs2_journal_access_gd() against the previous block group buffer - Only the previous block group buffer has got dirty if failed to call ocfs2_journal_access_gd() against the block group buffer - There is no need to roll back the change for file entry buffer at all Those problems will not cause anything wrong but unnecessary. This patch fix them and kill the useless bg_ptr variable as well. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Cc: Younger Liu <younger.liu@huawei.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Younger Liu 提交于
While adding a file into orphan dir in ocfs2_orphan_add(), it calls __ocfs2_add_entry() before ocfs2_journal_access_di(). If ocfs2_journal_access_di() failed, the file is added into orphan dir, and orphan dir dinode updated, but file dinode has not been updated. Accordingly, the data is not consistent between file dinode and orphan dir. So, need to call ocfs2_journal_access_di() before __ocfs2_add_entry(), and if ocfs2_journal_access_di() failed, orphan_fe and orphan_dir_inode->i_nlink need rollback. This bug was added by 3939fda4 ("Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir."). Signed-off-by: NYounger Liu <younger.liu@huawei.com> Acked-by: NJeff Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xue jiufei 提交于
dlmlock_master() returns DLM_RECOVERING/DLM_MIGRATING/ DLM_FORWAR after adding lock to blocked list if lockres has the state DLM_LOCK_RES_RECOVERING/DLM_LOCK_RES_MIGRATING/ DLM_LOCK_RES_IN_PROGRESS. so it will retry in dlmlock(). And this may cause dlm_thread fall into an infinite loop Thread1 dlm_thread calls dlm_lock->dlmlock_master, if lockresA is in state DLM_LOCK_RES_RECOVERING, calls __dlm_wait_on_lockres() and waits until others threads clear this state; If cannot grant this lock, adding lock to blocked list, and return DLM_RECOVERING; Grant this lock and move it to grant list; After a while, retry and calls list_add_tail(), adding lock to blocked list again. Granted and blocked list of this lockres will become the following conditions: lock_res->granted.next = dlm_lock->list_head; lock_res->blocked.next = dlm_lock->list_head; dlm_lock->list_head.next = dlm_lock_resource->blocked; When dlm_thread traverses the granted list, it will fall into an endless loop, checking dlm_lock.list_head, dlm_lock->list_head.next (i.e.lock_res->blocked), lock_res->blocked.next(i.e.dlm_lock.list_head again) ..... Signed-off-by: Njoyce <xuejiufei@huawei.com> Reviewed-by: Njensen <shencanquan@huawei.com> Cc: Jeff Liu <jeff.liu@oracle.com> Acked-by: NSunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
Free space checking will be done in ocfs2_xattr_ibody_init(). So remove here. [akpm@linux-foundation.org: remove unused local] Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Younger Liu 提交于
There is a memory leak in sc_kref_release(). When free struct o2net_sock_container (sc), we should release sc->sc_page. Signed-off-by: NYounger Liu <younger.liu@huawei.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Goldwyn Rodrigues 提交于
While adding extends to a file, the credits are calculated incorrectly and if the requested clusters is more than one (or more because we used a conservative limit) then we run out of journal credits and we hit an assert in journalling code. The function parameter bits_wanted variable was not used at all. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
In ocfs2_remove_btree_range, when calling ocfs2_lock_refcount_tree and ocfs2_prepare_refcount_change_for_del failed, it goes to out and then tries to call mutex_unlock without mutex_lock before. And when calling ocfs2_reserve_blocks_for_rec_trunc failed, it should free ref_tree before return. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Goldwyn Rodrigues 提交于
Code cleanup: needs_checkpoint is assigned to but never used. Delete the variable. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Cc: Jeff Liu <jeff.liu@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xue jiufei 提交于
dlm_begin_reco_handler() returns without putting dlm when dlm recovery state is DLM_RECO_STATE_FINALIZE. Signed-off-by: Njoyce <xuejiufei@huawei.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
If we use le32_add_cpu to set ocfs2_dinode i_flags, it may lead to the corresponding flag corrupted. So we should change it to bitwise and/or operation. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: shencanquan <shencanquan@huawei.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
In dlm_request_all_locks, ret is type enum. But o2net_send_message returns a type int value. Then it will never run into the following error branch. So we should change the ret type from enum to int. Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: NSunil Mushran <sunil.mushran@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joseph Qi 提交于
Below 3 functions have already been declared in dlmcommon.h, so we have no need to declare them again in dlmrecovery.c: dlm_complete_recovery_thread dlm_launch_recovery_thread dlm_kick_recovery_thread Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: NSunil Mushran <sunil.mushran@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-