- 16 11月, 2017 24 次提交
-
-
由 Jan Kara 提交于
Use find_get_pages_range_tag() in afs_writepages_region() as we are interested only in pages from given range. Remove unnecessary code after this conversion. Link: http://lkml.kernel.org/r/20171009151359.31984-16-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
Use new function for looking up pages since nr_pages argument from pagevec_lookup_range_tag() is going away. Link: http://lkml.kernel.org/r/20171009151359.31984-14-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We want only pages from given range in nilfs_lookup_dirty_data_buffers(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-10-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We want only pages from given range in gfs2_write_cache_jdata(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-9-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
__get_first_dirty_index() wants to lookup only the first dirty page after given index. There's no point in using pagevec_lookup_tag() for that. Just use find_get_pages_tag() directly. Link: http://lkml.kernel.org/r/20171009151359.31984-8-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChao Yu <yuchao0@huawei.com> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
In several places we want to iterate over all tagged pages in a mapping. However the code was apparently copied from places that iterate only over a limited range and thus it checks for index <= end, optimizes the case where we are coming close to range end which is all pointless when end == ULONG_MAX. So just remove this dead code. [akpm@linux-foundation.org: fix warnings] Link: http://lkml.kernel.org/r/20171009151359.31984-7-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We want only pages from given range in f2fs_write_cache_pages(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-6-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChao Yu <yuchao0@huawei.com> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We want only pages from given range in ext4_writepages(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-5-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We want only pages from given range in ceph_writepages_start(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-4-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
We want only pages from given range in btree_write_cache_pages() and extent_write_cache_pages(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-3-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: David Sterba <dsterba@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jérôme Glisse 提交于
This patch only affects users of mmu_notifier->invalidate_range callback which are device drivers related to ATS/PASID, CAPI, IOMMUv2, SVM ... and it is an optimization for those users. Everyone else is unaffected by it. When clearing a pte/pmd we are given a choice to notify the event under the page table lock (notify version of *_clear_flush helpers do call the mmu_notifier_invalidate_range). But that notification is not necessary in all cases. This patch removes almost all cases where it is useless to have a call to mmu_notifier_invalidate_range before mmu_notifier_invalidate_range_end. It also adds documentation in all those cases explaining why. Below is a more in depth analysis of why this is fine to do this: For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when device use thing like ATS/PASID to get the IOMMU to walk the CPU page table to access a process virtual address space). There is only 2 cases when you need to notify those secondary TLB while holding page table lock when clearing a pte/pmd: A) page backing address is free before mmu_notifier_invalidate_range_end B) a page table entry is updated to point to a new page (COW, write fault on zero page, __replace_page(), ...) Case A is obvious you do not want to take the risk for the device to write to a page that might now be used by something completely different. Case B is more subtle. For correctness it requires the following sequence to happen: - take page table lock - clear page table entry and notify (pmd/pte_huge_clear_flush_notify()) - set page table entry to point to new page If clearing the page table entry is not followed by a notify before setting the new pte/pmd value then you can break memory model like C11 or C++11 for the device. Consider the following scenario (device use a feature similar to ATS/ PASID): Two address addrA and addrB such that |addrA - addrB| >= PAGE_SIZE we assume they are write protected for COW (other case of B apply too). [Time N] ----------------------------------------------------------------- CPU-thread-0 {try to write to addrA} CPU-thread-1 {try to write to addrB} CPU-thread-2 {} CPU-thread-3 {} DEV-thread-0 {read addrA and populate device TLB} DEV-thread-2 {read addrB and populate device TLB} [Time N+1] --------------------------------------------------------------- CPU-thread-0 {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}} CPU-thread-1 {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}} CPU-thread-2 {} CPU-thread-3 {} DEV-thread-0 {} DEV-thread-2 {} [Time N+2] --------------------------------------------------------------- CPU-thread-0 {COW_step1: {update page table point to new page for addrA}} CPU-thread-1 {COW_step1: {update page table point to new page for addrB}} CPU-thread-2 {} CPU-thread-3 {} DEV-thread-0 {} DEV-thread-2 {} [Time N+3] --------------------------------------------------------------- CPU-thread-0 {preempted} CPU-thread-1 {preempted} CPU-thread-2 {write to addrA which is a write to new page} CPU-thread-3 {} DEV-thread-0 {} DEV-thread-2 {} [Time N+3] --------------------------------------------------------------- CPU-thread-0 {preempted} CPU-thread-1 {preempted} CPU-thread-2 {} CPU-thread-3 {write to addrB which is a write to new page} DEV-thread-0 {} DEV-thread-2 {} [Time N+4] --------------------------------------------------------------- CPU-thread-0 {preempted} CPU-thread-1 {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}} CPU-thread-2 {} CPU-thread-3 {} DEV-thread-0 {} DEV-thread-2 {} [Time N+5] --------------------------------------------------------------- CPU-thread-0 {preempted} CPU-thread-1 {} CPU-thread-2 {} CPU-thread-3 {} DEV-thread-0 {read addrA from old page} DEV-thread-2 {read addrB from new page} So here because at time N+2 the clear page table entry was not pair with a notification to invalidate the secondary TLB, the device see the new value for addrB before seing the new value for addrA. This break total memory ordering for the device. When changing a pte to write protect or to point to a new write protected page with same content (KSM) it is ok to delay invalidate_range callback to mmu_notifier_invalidate_range_end() outside the page table lock. This is true even if the thread doing page table update is preempted right after releasing page table lock before calling mmu_notifier_invalidate_range_end Thanks to Andrea for thinking of a problematic scenario for COW. [jglisse@redhat.com: v2] Link: http://lkml.kernel.org/r/20171017031003.7481-2-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170901173011.10745-1-jglisse@redhat.comSigned-off-by: NJérôme Glisse <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alistair Popple <alistair@popple.id.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
There is no need to have a local return code set with -EINVAL when both the conditions following it return error codes appropriately. Just remove the redundant one. Link: http://lkml.kernel.org/r/20170929145444.17611-1-khandual@linux.vnet.ibm.comSigned-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Add sparse-checked slab_flags_t for struct kmem_cache::flags (SLAB_POISON, etc). SLAB is bloated temporarily by switching to "unsigned long", but only temporarily. Link: http://lkml.kernel.org/r/20171021100225.GA22428@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Guozhonghua 提交于
Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4F3CDE3A9@H3CMLB14-EX.srv.huawei-3com.comSigned-off-by: Nguozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Changwei Ge 提交于
When dlm_add_migration_mle returns -EEXIST, previously input mle will not be initialized. So we can't use its associated dlm object. And we truly don't need this mle for already launched migration progress, since oldmle has taken this role. Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED7AA61@H3CMLB14-EX.srv.huawei-3com.comSigned-off-by: NChangwei Ge <ge.changwei@h3c.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 alex chen 提交于
The subsystem.su_mutex is required while accessing the item->ci_parent, otherwise, NULL pointer dereference to the item->ci_parent will be triggered in the following situation: add node delete node sys_write vfs_write configfs_write_file o2nm_node_store o2nm_node_local_write do_rmdir vfs_rmdir configfs_rmdir mutex_lock(&subsys->su_mutex); unlink_obj item->ci_group = NULL; item->ci_parent = NULL; to_o2nm_cluster_from_node node->nd_item.ci_parent->ci_parent BUG since of NULL pointer dereference to nd_item.ci_parent Moreover, the o2nm_cluster also should be protected by the subsystem.su_mutex. [alex.chen@huawei.com: v2] Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.comSigned-off-by: NAlex Chen <alex.chen@huawei.com> Reviewed-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 alex chen 提交于
ip_alloc_sem should be taken in ocfs2_get_block() when reading file in DIRECT mode to prevent concurrent access to extent tree with ocfs2_dio_end_io_write(), which may cause BUGON in the following situation: read file 'A' end_io of writing file 'A' vfs_read __vfs_read ocfs2_file_read_iter generic_file_read_iter ocfs2_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO get_more_blocks ocfs2_get_block ocfs2_extent_map_get_blocks ocfs2_get_clusters ocfs2_get_clusters_nocache() ocfs2_search_extent_list return the index of record which contains the v_cluster, that is v_cluster > rec[i]->e_cpos. ocfs2_dio_end_io ocfs2_dio_end_io_write down_write(&oi->ip_alloc_sem); ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ... --> modify the rec[i]->e_cpos, resulting in v_cluster < rec[i]->e_cpos. BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos)) [alex.chen@huawei.com: v3] Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Fixes: c15471f7 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: NAlex Chen <alex.chen@huawei.com> Reviewed-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Reviewed-by: NGang He <ghe@suse.com> Acked-by: NChangwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 alex chen 提交于
we should wait dio requests to finish before inode lock in ocfs2_setattr(), otherwise the following deadlock will happen: process 1 process 2 process 3 truncate file 'A' end_io of writing file 'A' receiving the bast messages ocfs2_setattr ocfs2_inode_lock_tracker ocfs2_inode_lock_full inode_dio_wait __inode_dio_wait -->waiting for all dio requests finish dlm_proxy_ast_handler dlm_do_local_bast ocfs2_blocking_ast ocfs2_generic_handle_bast set OCFS2_LOCK_BLOCKED flag dio_end_io dio_bio_end_aio dio_complete ocfs2_dio_end_io ocfs2_dio_end_io_write ocfs2_inode_lock __ocfs2_cluster_lock ocfs2_wait_for_mask -->waiting for OCFS2_LOCK_BLOCKED flag to be cleared, that is waiting for 'process 1' unlocking the inode lock inode_dio_end -->here dec the i_dio_count, but will never be called, so a deadlock happened. Link: http://lkml.kernel.org/r/59F81636.70508@huawei.comSigned-off-by: NAlex Chen <alex.chen@huawei.com> Reviewed-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Acked-by: NChangwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 piaojun 提交于
Link: http://lkml.kernel.org/r/59C5D7D6.9050106@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NAlex Chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Changwei Ge 提交于
When a node dies, other live nodes have to choose a new master for an existed lock resource mastered by the dead node. As for ocfs2/dlm implementation, this is done by function - dlm_move_lockres_to_recovery_list which marks those lock rsources as DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM changes lock resource's master later. So without invoking dlm_move_lockres_to_recovery_list, no master will be choosed after dlm recovery accomplishment since no lock resource can be found through ::resource list. What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock resources mastered a dead node, it will break up synchronization among nodes. So invoke dlm_move_lockres_to_recovery_list again. Fixs: 'commit ee8f7fcb ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")' Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.comSigned-off-by: NChangwei Ge <ge.changwei@h3c.com> Reported-by: NVitaly Mayatskih <v.mayatskih@gmail.com> Tested-by: NVitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 piaojun 提交于
Link: http://lkml.kernel.org/r/59E064BB.8000005@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 piaojun 提交于
destroy_workqueue() will do flushing work for us. Link: http://lkml.kernel.org/r/59E06476.3090502@huawei.comSigned-off-by: NJun Piao <piaojun@huawei.com> Reviewed-by: NJoseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Guozhonghua 提交于
Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4D0743232@H3CMLB12-EX.srv.huawei-3com.comSigned-off-by: Nguozhonghua <guozhonghua@h3c.com> Acked-by: NChangwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 11月, 2017 1 次提交
-
-
由 Chao Yu 提交于
Commit 6184fc0b ("quota: Propagate error from ->acquire_dquot()") missed to handle error from dquot_initialize in dquot_file_open, fix it. Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 12 11月, 2017 1 次提交
-
-
由 Arnd Bergmann 提交于
__getnstimeofday() is a rather odd interface, with a number of quirks: - The caller may come from NMI context, but the implementation is not NMI safe, one way to get there from NMI is NMI handler: something bad panic() kmsg_dump() pstore_dump() pstore_record_init() __getnstimeofday() - The calling conventions are different from any other timekeeping functions, to deal with returning an error code during suspended timekeeping. Address the above issues by using a completely different method to get the time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior when timekeeping is suspended: it returns the time at which it got suspended. As Thomas Gleixner explained, this is safe, as ktime_get_real_fast_ns() does not call into the clocksource driver that might be suspended. The result can easily be transformed into a timespec structure. Since ktime_get_real_fast_ns() was not exported to modules, add the export. The pstore behavior for the suspended case changes slightly, as it now stores the timestamp at which timekeeping was suspended instead of storing a zero timestamp. This change is not addressing y2038-safety, that's subject to a more complex follow up patch. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NKees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Colin Cross <ccross@android.com> Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de
-
- 11 11月, 2017 2 次提交
-
-
由 Greg Edwards 提交于
guard_bio_eod() needs to look at the partition capacity, not just the capacity of the whole device, when determining if truncation is necessary. [ 60.268688] attempt to access beyond end of device [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506 [ 60.268693] buffer_io_error: 2 callbacks suppressed [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read Fixes: 74d46992 ("block: replace bi_bdev with a gendisk pointer and partitions index") Cc: stable@vger.kernel.org # v4.13 Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGreg Edwards <gedwards@ddn.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
The contexts from which a SCSI device can be quiesced or resumed are: * Writing into /sys/class/scsi_device/*/device/state. * SCSI parallel (SPI) domain validation. * The SCSI device power management methods. See also scsi_bus_pm_ops. It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why while the hibernation image is being written or restored that SCSI devices are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests to be submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after each resume the fio job was still running: for ((i=0; i<10; i++)); do ( cd /sys/block/md0/md && while true; do [ "$(<sync_action)" = "idle" ] && echo check > sync_action sleep 1 done ) & pids=($!) for d in /sys/class/block/sd*[a-z]; do bdev=${d#/sys/class/block/} hcil=$(readlink "$d/device") hcil=${hcil#../../../} echo 4 > "$d/queue/nr_requests" echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth" fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \ --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \ --iodepth_batch=1 --thread --loops=$((2**31)) & pids+=($!) done sleep 1 echo "$(date) Hibernating ..." >>hibernate-test-log.txt systemctl hibernate sleep 10 kill "${pids[@]}" echo idle > /sys/block/md0/md/sync_action wait echo "$(date) Done." >>hibernate-test-log.txt done Reported-by: NOleksandr Natalenko <oleksandr@natalenko.name> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348). Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Tested-by: NMartin Steigerwald <martin@lichtvoll.de> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 11月, 2017 12 次提交
-
-
由 Darrick J. Wong 提交于
When mounting fails, we must force-reclaim inodes (and disable delayed reclaim) /after/ the realtime and quota control have let go of the realtime and quota inodes. Without this, we corrupt the timer list and cause other weird problems. Found by xfs/376 fuzzing u3.bmbt[0].lastoff on an rmap filesystem to force a bogus post-eof extent reclaim that causes the fs to go down. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Make sure we don't list a block twice in the agfl by copying the contents of the AGFL to an array, sorting it, and looking for duplicates. We can easily check that the number of agfl entries we see actually matches the flcount, so do that too. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Use the uint* types instead of the u_int* types. This will (hopefully) pair with an xfsprogs cleanup. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
And also rename fill to nr_entries to match the rest of the code. Reported-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Reported-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Reported-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Reported-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Fix to check the correct value, and remove a duplicate handling of the uneven record number split algorith, Reported-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Tim Hansen 提交于
kmem_cache_destroy already checks for null values. Signed-off-by: NTim Hansen <devtimhansen@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Darrick J. Wong 提交于
It turns out that we only started zeroing a new da btree node's block header on v5 filesystems. Prior to that, we just wouldn't set anything at all, which means that the pad field never got set and would retain whatever happened to be in memory. Therefore, we can only check the pad for zeroness on v5 filesystems. shared/006 on a v4 filesystem exposes this scrub bug. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
The btree scrubber has some custom code to retrieve and check a btree block via xfs_btree_lookup_get_block. This function will either return an error code (verifiers failed) or a *pblock will be untouched (bad pointer). Since we previously set *pblock to NULL, we need to check *pblock, not pblock, to trigger the early bailout. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-