- 18 March 2020, 2 commits
-
-
Submitted by Xiaoguang Wang
While a transaction is going to commit, it first sets its state to T_LOCKED and waits for all outstanding handles to complete. The committing transaction stays in the locked state as long as it has outstanding handles; the whole fs is locked as well, and all later fs modification operations get stuck in wait_transaction_locked(). It's hard to tell why handles are that slow, so here we add a new static tracepoint to track such slow handles and show io wait time and sched wait time. The output looks like below:

fsstress-20347 [024] .... 1570.305454: jbd2_slow_handle_stats: dev 254,17 tid 15853 type 4 line_no 3101 interval 126 sync 0 requested_blocks 24 dirtied_blocks 0 trans_wait 122 space_wait 0 sched_wait 0 io_wait 126

"trans_wait 122" means that the current committing transaction has been locked for 122ms because this handle did not complete quickly. From "io_wait 126" we can see that io is the major reason. In this patch we also add a per-fs control file used to determine whether a handle is considered slow:

/proc/fs/jbd2/vdb1-8/stall_thresh

The default value is 100ms; users can set a new threshold by echoing a new value to this file. Later I also plan to add a per-fs proc file to record this info. Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
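A minimal sketch of how such a threshold check could be wired up, assuming a hypothetical handle timestamp field (h_start), a hypothetical journal field (j_stall_thresh) backing the stall_thresh proc file, and a generic tracepoint call; the real patch's fields and tracepoint signature are not shown in this log:

/* Hypothetical sketch only: h_start, j_stall_thresh and the tracepoint
 * arguments are illustrative, not the actual patch. */
static void maybe_trace_slow_handle(journal_t *journal, handle_t *handle)
{
	u64 elapsed_ms = ktime_to_ms(ktime_sub(ktime_get(), handle->h_start));

	/* j_stall_thresh mirrors /proc/fs/jbd2/<dev>/stall_thresh (ms) */
	if (elapsed_ms > journal->j_stall_thresh)
		trace_jbd2_slow_handle_stats(journal, handle, elapsed_ms);
}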
-
Submitted by Xiaoguang Wang
If one process context is stuck in wait_on_buffer(), lock_buffer(), lock_page(), wait_on_page_writeback() or wait_on_bit_io(), it's hard to tell the true reason, for example whether the page is under io or the page has just been locked too long by another process context. Normally an io request has multiple bios, and every bio contains multiple pages which hold the data to be read from or written to the device, so here we record page info or bio info in task_struct while a process calls lock_page(), lock_buffer(), wait_on_page_writeback(), wait_on_buffer() or wait_on_bit_io(). We add a new proc interface:

[lege@localhost linux]$ cat /proc/4516/wait_res
1 ffffd0969f95d3c0 4295369599 4295381596

The above info means that thread 4516 is waiting on a page whose address is ffffd0969f95d3c0 and has waited for 11997ms. The first field denotes the page address the process is waiting on, the second field denotes the moment the wait started, and the third denotes the current moment. In practice, if we find a process waiting on one page for too long, we can get the page's address by reading /proc/$pid/wait_page and search for this page address in all block devices' /sys/kernel/debug/block/${devname}/rq_hang; if the search hits one, we can get the request and know why this io request hangs that long. Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
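A hedged illustration of the bookkeeping idea only; the task_struct fields below are invented for the sketch and are not the ones the patch actually adds:

/* Hypothetical sketch: wait_res_addr / wait_res_since are assumed fields. */
static inline void task_record_wait_res(void *waited_on)
{
	current->wait_res_addr  = waited_on;   /* page or buffer_head address */
	current->wait_res_since = jiffies;     /* wait start, for /proc reporting */
}

static inline void task_clear_wait_res(void)
{
	current->wait_res_addr = NULL;
}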
-
- 17 January 2020, 2 commits
-
-
Submitted by zhangyi (F)
commit 597599268e3b91cac71faf48743f4783dec682fc upstream. We do not unmap and clear the dirty flag when forgetting a buffer that has no journal or does not belong to any transaction, so the invalid dirty data may still be written to the disk later. It's fine if the corresponding block is never used before the next mount, and it's also fine that we invoke clean_bdev_aliases() related functions to unmap the block device mapping when re-allocating such a freed block as a data block. But this logic is somewhat fragile and risky and may lead to data corruption if we forget to clean bdev aliases. So it's better to discard dirty data at forget time. We have already handled all the cases of forgetting a journalled buffer; this patch deals with the remaining two cases:

- the buffer is not journalled yet,
- the buffer is journalled but does not belong to any transaction.

We invoke __bforget() instead of __brelse() when forgetting an un-journalled buffer in jbd2_journal_forget(). After this patch we can remove all clean_bdev_aliases() related calls in ext4. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Joseph Qi
Fix the following build warning on arch i386:

ld: fs/jbd2/journal.o: in function `jbd2_seq_stats_show':
journal.c:(.text+0x137d): undefined reference to `__udivdi3'

Fixes: 7e2e7b9a ("alinux: jbd2: add new "stats" proc file") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
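For context, the usual fix for this class of warning is to avoid open-coded 64-bit division on 32-bit targets. A small hedged sketch (the helper name avg_per_commit is illustrative, not taken from the patch):

#include <linux/math64.h>

/* On 32-bit x86 a plain u64 division is lowered to __udivdi3(), which the
 * kernel does not provide, so div_u64() is used instead. */
static u64 avg_per_commit(u64 total, u32 commits)
{
	if (!commits)
		return 0;
	return div_u64(total, commits);   /* instead of: total / commits */
}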
-
- 15 January 2020, 3 commits
-
-
Submitted by Xiaoguang Wang
When jbd2 tries to get write access to a buffer that is under writeback with the BH_Shadow flag set, jbd2 will wait until this buffer has been written to disk, but sometimes that wait can be very long, especially when disk capacity is almost full. Here we add a proc entry "force-copy"; if its value is not zero, jbd2 will always do the metadata buffer copy-out, so we can eliminate the unnecessary waiting and reduce long tail latency for buffered writes. I constructed the test case below:

$cat offline.fio
; fio-rand-RW.job for fiotest
[global]
name=fio-rand-RW
filename=fio-rand-RW
rw=randrw
rwmixread=60
rwmixwrite=40
bs=4K
direct=0
numjobs=4
time_based=1
runtime=900
[file1]
size=60G
ioengine=sync
iodepth=16

$cat online.fio
; fio-seq-write.job for fiotest
[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=256K
direct=0
numjobs=1
time_based=1
runtime=60
[file1]
rate=50m
size=10G
ioengine=sync
iodepth=16

With this patch:

$cat /proc/fs/jbd2/sda5-8/force_copy
0

The online fio job almost always sees a long tail latency:

Jobs: 1 (f=1), 0B/s-0B/s: [W(1)][100.0%][w=50.0MiB/s][w=200 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=17855: Thu Nov 15 09:45:57 2018
  write: IOPS=200, BW=50.0MiB/s (52.4MB/s)(3000MiB/60001msec)
    clat (usec): min=135, max=4086.6k, avg=867.21, stdev=50338.22
     lat (usec): min=139, max=4086.6k, avg=871.16, stdev=50338.22
    clat percentiles (usec):
     | 1.00th=[ 141], 5.00th=[ 143], 10.00th=[ 145],
     | 20.00th=[ 147], 30.00th=[ 147], 40.00th=[ 149],
     | 50.00th=[ 149], 60.00th=[ 151], 70.00th=[ 153],
     | 80.00th=[ 155], 90.00th=[ 159], 95.00th=[ 163],
     | 99.00th=[ 255], 99.50th=[ 273], 99.90th=[ 429],
     | 99.95th=[ 441], 99.99th=[3640656]

$cat /proc/fs/jbd2/sda5-8/force_copy
1

The online fio latency is much better:

Jobs: 1 (f=1), 0B/s-0B/s: [W(1)][100.0%][w=50.0MiB/s][w=200 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=8084: Thu Nov 15 09:31:15 2018
  write: IOPS=200, BW=50.0MiB/s (52.4MB/s)(3000MiB/60001msec)
    clat (usec): min=137, max=545, avg=151.35, stdev=16.22
     lat (usec): min=140, max=548, avg=155.31, stdev=16.65
    clat percentiles (usec):
     | 1.00th=[ 143], 5.00th=[ 145], 10.00th=[ 145], 20.00th=[ 147],
     | 30.00th=[ 147], 40.00th=[ 147], 50.00th=[ 149], 60.00th=[ 149],
     | 70.00th=[ 151], 80.00th=[ 155], 90.00th=[ 157], 95.00th=[ 161],
     | 99.00th=[ 239], 99.50th=[ 269], 99.90th=[ 420], 99.95th=[ 429],
     | 99.99th=[ 537]

As to the cost: because we will always need to copy the metadata buffer, this consumes a little cpu time and some memory (at most 32MB for a 128MB journal size). Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
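A rough sketch of the decision the knob influences, assuming a hypothetical journal field j_force_copy behind /proc/fs/jbd2/<dev>/force_copy; the real patch's plumbing is not shown in this log:

/* Hypothetical sketch: j_force_copy is an assumed field, not the real one. */
static bool jbd2_should_wait_on_shadow(journal_t *journal,
				       struct buffer_head *bh, bool force_copy)
{
	/* With the knob set (or an explicit force_copy request) we never wait
	 * for BH_Shadow writeback; a frozen copy is made instead. */
	if (force_copy || journal->j_force_copy)
		return false;
	return buffer_shadow(bh);
}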
-
Submitted by Xiaoguang Wang
/proc/fs/jbd2/${device}/info only shows whole-lifetime average statistics for jbd2, but it cannot show jbd2 info over a specified time interval, and sometimes that capability is very useful for troubleshooting. For example, we cannot see how rs_locked and rs_flushing grow over a specified time interval, but these two metrics can explain some of an app's behaviour. Here we add a new "stats" proc file similar to /proc/diskstats, so we can implement a simple tool, jbd2_stat, which displays detailed jbd2 info over a specified time interval. Like below (time interval 5s):

[lege@localhost ~]$ cat /proc/fs/jbd2/vdb1-8/stats
51 30 8192 0 1 241616 0 0 22 0 47158 891 942 1000 1000

[lege@localhost ~]$ gcc -o jbd2_stat jbd2_stat.c ; ./jbd2_stat
Device    tid     trans   handles  locked   flushing  logging
vdb1-8    1861    158     359      13.00    0.00      2.00

Device    tid     trans   handles  locked   flushing  logging
vdb1-8    1974    113     389      26.00    0.00      5.00

Device    tid     trans   handles  locked   flushing  logging
vdb1-8    2188    214     308      10.00    0.00      7.00

Device    tid     trans   handles  locked   flushing  logging
vdb1-8    2344    156     332      19.00    0.00      4.00

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
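A hedged sketch of a tiny jbd2_stat-style sampler, assuming only that the stats file is a single line of whitespace-separated counters (field meanings and the pretty-printing of the real tool are not assumed here): sample twice and print per-field deltas over the interval.

#include <stdio.h>
#include <unistd.h>

#define NFIELDS 16

static int read_stats(const char *path, unsigned long long v[NFIELDS])
{
	FILE *f = fopen(path, "r");
	int n = 0;

	if (!f)
		return -1;
	while (n < NFIELDS && fscanf(f, "%llu", &v[n]) == 1)
		n++;
	fclose(f);
	return n;
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/proc/fs/jbd2/vdb1-8/stats";
	unsigned long long a[NFIELDS] = {0}, b[NFIELDS] = {0};
	int n = read_stats(path, a);

	if (n <= 0)
		return 1;
	sleep(5);                         /* sampling interval */
	if (read_stats(path, b) != n)
		return 1;
	for (int i = 0; i < n; i++)       /* raw per-field increments */
		printf("%llu ", b[i] - a[i]);
	printf("\n");
	return 0;
}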
-
Submitted by Joseph Qi
This is trying to do jbd2 checkpointing in a dedicated kernel thread, so that checkpointing is not subject to io throttle control. Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: zhangliguang <zhangliguang@linux.alibaba.com> Reviewed-by: Baoyou Xie <baoyou.xie@linux.alibaba.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
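A hedged sketch of how such a dedicated thread could be started; the thread function, the j_checkpoint_task field and the thread name are invented for illustration and are not the actual patch:

#include <linux/kthread.h>

static int jbd2_start_checkpoint_thread(journal_t *journal)
{
	struct task_struct *t;

	/* jbd2_checkpoint_thread() and j_checkpoint_task are assumed names. */
	t = kthread_run(jbd2_checkpoint_thread, journal,
			"jbd2-ckpt/%s", journal->j_devname);
	if (IS_ERR(t))
		return PTR_ERR(t);
	journal->j_checkpoint_task = t;
	return 0;
}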
-
- 27 December 2019, 1 commit
-
-
Submitted by Xiaoguang Wang
commit 53cf978457325d8fb2cdecd7981b31a8229e446e upstream. This issue was found when I tried to put checkpoint work in a separate thread; the deadlock below happened:

Thread1                                            | Thread2
__jbd2_log_wait_for_space                          |
jbd2_log_do_checkpoint (hold j_checkpoint_mutex)   |
  if (jh->b_transaction != NULL)                   |
  ...                                              |
  jbd2_log_start_commit(journal, tid);             | jbd2_update_log_tail
                                                   |   will lock j_checkpoint_mutex,
                                                   |   but will be blocked here.
  jbd2_log_wait_commit(journal, tid);              |
  wait_event(journal->j_wait_done_commit,          |
   !tid_gt(tid, journal->j_commit_sequence));      |
  ...                                              | wake_up(j_wait_done_commit)
}

Then the deadlock occurs: Thread1 will never be woken up. To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint() when we are going to wait for a transaction commit. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 28 July 2019, 1 commit
-
-
Submitted by Ross Zwisler
commit 6ba0e7dc64a5adcda2fbe65adc466891795d639e upstream. Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages, we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait for all the pages under writeback to be written out. The easiest way to cause this type of workload is to just dd from /dev/zero to a file until it fills the entire filesystem. This can cause journal_finish_inode_data_buffers() to wait for the duration of the entire dd operation. We can improve this situation by scoping each of the inode dirty ranges associated with a given transaction. We do this via the jbd2_inode structure so that the scoping is contained within jbd2 and so that it follows the lifetime and locking rules for that structure. This allows us to limit the writeback and wait in journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() respectively to the dirty range for a given struct jbd2_inode, keeping us from waiting forever if the inode in question is still being appended to. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 May 2019, 2 commits
-
-
Submitted by Chengguang Xu
commit 0d52154bb0a700abb459a2cbce0a30fc2549b67e upstream. When we fail to create the jbd2_inode_cache cache, we destroy the previously created jbd2_handle_cache cache twice. Fix this by moving each cache's initialization/destruction into its own separate function. Signed-off-by: Chengguang Xu <cgxu519@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Jiufei Xue
commit 742b06b5628f2cd23cb51a034cb54dc33c6162c5 upstream. We hit a BUG at fs/buffer.c:3057 if we detach the nbd device before unmounting the ext4 filesystem. The typical chain of events leading to the BUG:

jbd2_write_superblock
  submit_bh
    submit_bh_wbc
      BUG_ON(!buffer_mapped(bh));

The block device was removed and all the pages were invalidated; JBD2 was trying to write the journal superblock to a block device that is no longer present. Fix this by checking the journal superblock's buffer head prior to submitting. Reported-by: Eric Ren <renzhen@linux.alibaba.com> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
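A simplified sketch of the kind of guard described, assumed to live in jbd2_write_superblock(); this is not necessarily the exact upstream hunk:

	/* If the buffer lost its mapping because the underlying device was
	 * invalidated, bail out instead of submitting the write. */
	lock_buffer(bh);
	if (!buffer_mapped(bh)) {
		unlock_buffer(bh);
		return -EIO;
	}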
-
- 06 April 2019, 2 commits
-
-
Submitted by Theodore Ts'o
[ Upstream commit 538bcaa6261b77e71d37f5596c33127c1a3ec3f7 ] The jbd2 superblock is lockless now, so there is probably a race condition between writing it to disk and modifying its contents, which may lead to a checksum error. The following race is one case that we have captured:

jbd2                                      fsstress
jbd2_journal_commit_transaction
  jbd2_journal_update_sb_log_tail
    jbd2_write_superblock
      jbd2_superblock_csum_set
                                          jbd2_journal_revoke
                                            jbd2_journal_set_features(revoke)
                                              modify superblock
      submit_bh(checksum incorrect)

Fix this by locking the buffer head before modifying it. We always write the jbd2 superblock after we modify it, so this just means calling lock_buffer() a little earlier. This checksum corruption problem can be reproduced by xfstests generic/475. Reported-by: zhangyi (F) <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
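An illustrative sketch of the ordering this establishes, inside a helper such as jbd2_journal_update_sb_log_tail() where sb is journal->j_superblock (simplified, not the exact diff):

	/* Lock the superblock buffer before modifying fields, so the checksum
	 * computed at write-out time covers stable contents. */
	lock_buffer(journal->j_sb_buffer);
	sb->s_sequence = cpu_to_be32(tail_tid);    /* example field updates */
	sb->s_start    = cpu_to_be32(tail_block);
	jbd2_write_superblock(journal, write_op);  /* checksums and submits */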
-
Submitted by luojiajun
[ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ] In jbd2_journal_commit_transaction(), if we are in abort mode, we may flush the buffer without setting the descriptor block checksum by jumping to start_journal_io. Then, when the fs is mounted, jbd2_descriptor_block_csum_verify() fails:

[  271.379811] EXT4-fs (vdd): shut down requested (2)
[  271.381827] Aborting journal on device vdd-8.
[  271.597136] JBD2: Invalid checksum recovering block 22199 in log
[  271.598023] JBD2: recovery failed
[  271.598484] EXT4-fs (vdd): error loading journal

Fix this problem by still setting the descriptor block checksum if the descriptor buffer is not NULL. This checksum problem can be reproduced by xfstests generic/388. Signed-off-by: luojiajun <luojiajun3@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
- 24 March 2019, 2 commits
-
-
Submitted by zhangyi (F)
commit 01215d3edb0f384ddeaa5e4a22c1ae5ff634149f upstream. The jh pointer may be used uninitialized in the two cases below, and the compiler complains about it when the JBUFFER_TRACE macro is enabled; fix them.

In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
 #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
                                      ^
fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
  struct journal_head *jh;
                       ^
In file included from fs/jbd2/transaction.c:19:0:
fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
                                      ^
fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
  struct journal_head *jh;
                       ^

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by zhangyi (F)
commit 904cdbd41d749a476863a0ca41f6f396774f26e4 upstream. We have captured a data corruption problem on ext4 while truncating an extent index block. Imagine that we are revoking a buffer which has been journaled by the committing transaction; the buffer's jbddirty flag will not be cleared in jbd2_journal_forget(), so the commit code will set the buffer's dirty flag again after refiling the buffer:

fsx                                kjournald2
                                   jbd2_journal_commit_transaction
jbd2_journal_revoke                  commit phase 1~5...
 jbd2_journal_forget
   belongs to older transaction
                                     commit phase 6
   jbddirty not clear                __jbd2_journal_refile_buffer
                                      __jbd2_journal_unfile_buffer
                                       test_clear_buffer_jbddirty
                                     mark_buffer_dirty

Finally, if the freed extent index block is allocated again as a data block by some other file, it may corrupt the file data when writing cached pages later, such as at unmount time. (In general, clean_bdev_aliases() related helpers should be invoked after re-allocation to prevent the above corruption, but unfortunately we missed it when zeroing out the head of extra extent blocks in ext4_ext_handle_unwritten_extents()). This patch marks the buffer as freed and sets j_next_transaction to the new transaction when it already belongs to the committing transaction in jbd2_journal_forget(), so that the commit code knows it should clear the dirty bits when it is done with the buffer. This problem can be reproduced easily by xfstests generic/455 with seeds (3246 3247 3248 3249). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 14 November 2018, 1 commit
-
-
Submitted by Jan Kara
commit ccd3c4373eacb044eb3832966299d13d2631f66f upstream. The code cleaning transaction's lists of checkpoint buffers has a bug where it increases bh refcount only after releasing journal->j_list_lock. Thus the following race is possible:

CPU0                                          CPU1
jbd2_log_do_checkpoint()
                                              jbd2_journal_try_to_free_buffers()
                                                __journal_try_to_free_buffer(bh)
  ...
  while (transaction->t_checkpoint_io_list)
  ...
    if (buffer_locked(bh)) {

<-- IO completes now, buffer gets unlocked -->

      spin_unlock(&journal->j_list_lock);
                                                spin_lock(&journal->j_list_lock);
                                                __jbd2_journal_remove_checkpoint(jh);
                                                spin_unlock(&journal->j_list_lock);
                                                try_to_free_buffers(page);
      get_bh(bh) <-- accesses freed bh

Fix the problem by grabbing bh reference before unlocking journal->j_list_lock. Fixes: dc6e8d66 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()") Fixes: be1158cc ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()") Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com CC: stable@vger.kernel.org Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
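A simplified sketch of the fixed ordering in the checkpoint loop (illustrative, not the exact hunk):

	if (buffer_locked(bh)) {
		get_bh(bh);                         /* take the ref while j_list_lock
						     * is still held, so the bh cannot
						     * be freed before we wait on it */
		spin_unlock(&journal->j_list_lock);
		wait_on_buffer(bh);
		__brelse(bh);
		goto retry;                         /* re-scan the checkpoint list */
	}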
-
- 30 July 2018, 1 commit
-
-
Submitted by Arnd Bergmann
jbd2 is one of the few callers of current_kernel_time64(), which is a wrapper around ktime_get_coarse_real_ts64(). This calls the latter directly for consistency with the rest of the kernel that is moving to the ktime_get_ family of time accessors. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
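A small hedged sketch of the substitution being described (the wrapper function sample_commit_time is illustrative, not from the patch):

#include <linux/timekeeping.h>

/* Coarse wall-clock timestamp, fetched via the ktime_get_ accessor directly
 * rather than through the current_kernel_time64() wrapper. */
static void sample_commit_time(struct timespec64 *now)
{
	ktime_get_coarse_real_ts64(now);   /* was: *now = current_kernel_time64(); */
}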
-
- 17 June 2018, 1 commit
-
-
Submitted by Theodore Ts'o
The b_modified flag in a block's journal head should not be set until after we're sure that jbd2_journal_dirty_metadata() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead to a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 13 June 2018, 1 commit
-
-
Submitted by Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. 
@@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
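For readers skimming the long script above, the shape of the conversion is simply the following (illustrative call sites, not taken from a specific file):

#include <linux/slab.h>

/* 2-factor product: kmalloc(a * b, gfp) becomes kmalloc_array(a, b, gfp). */
static int *alloc_counters(size_t nr_items)
{
	/* was: kmalloc(nr_items * sizeof(int), GFP_KERNEL); */
	return kmalloc_array(nr_items, sizeof(int), GFP_KERNEL);
}

/* 3-factor product: the size expression is wrapped in array3_size() instead. */
static u8 *alloc_grid(size_t rows, size_t cols, size_t cell)
{
	/* was: kmalloc(rows * cols * cell, GFP_KERNEL); */
	return kmalloc(array3_size(rows, cols, cell), GFP_KERNEL);
}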
-
- 21 May 2018, 2 commits
-
-
Submitted by Wang Long
The kmem_cache_destroy() function already checks for null pointers, so we can remove the check at the call site. This patch also sets jbd2_handle_cache and jbd2_inode_cache to be NULL after freeing them in jbd2_journal_destroy_handle_cache(). Signed-off-by: Wang Long <wanglong19@meituan.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
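A sketch of what the resulting destroy helper plausibly looks like after this change (reconstructed from the description, not copied from the patch):

static void jbd2_journal_destroy_handle_cache(void)
{
	/* kmem_cache_destroy() tolerates NULL, so no caller-side checks. */
	kmem_cache_destroy(jbd2_handle_cache);
	jbd2_handle_cache = NULL;
	kmem_cache_destroy(jbd2_inode_cache);
	jbd2_inode_cache = NULL;
}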
-
Submitted by Wang Shilong
See the following dmesg output with jbd2 debugging enabled:

...(start_this_handle, 313): New handle 00000000c88d6ceb going live.
...(start_this_handle, 383): Handle 00000000c88d6ceb given 53 credits (total 53, free 32681)
...(do_get_write_access, 838): journal_head 0000000002856fc0, force_copy 0
...(jbd2_journal_cancel_revoke, 421): journal_head 0000000002856fc0, cancelling revoke

We get an extra line with every message, which is a waste of buffer. We can fix it by removing the "\n" in the callers or by removing it in __jbd2_debug(); I checked that every jbd2_debug() call passes '\n' explicitly. To avoid more lines, let's remove it inside __jbd2_debug(). Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
-
- 18 April 2018, 1 commit
-
-
Submitted by Theodore Ts'o
If ext4 tries to start a reserved handle via jbd2_journal_start_reserved(), and the journal has been aborted, this can result in a NULL pointer dereference. This is because the fields h_journal and h_transaction in the handle structure share the same memory, via a union, so jbd2_journal_start_reserved() will clear h_journal before calling start_this_handle(). If this function fails due to an aborted handle, h_journal will still be NULL, and the call to jbd2_journal_free_reserved() will pass a NULL journal to sub_reserve_credits(). This can be reproduced by running "kvm-xfstests -c dioread_nolock generic/475". Cc: stable@kernel.org # 3.11 Fixes: 8f7d89f3 ("jbd2: transaction reservation support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
-
- 20 February 2018, 1 commit
-
-
Submitted by Theodore Ts'o
This updates the jbd2 superblock unnecessarily, and on an abort we shouldn't truncate the log. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 19 February 2018, 2 commits
-
-
Submitted by Theodore Ts'o
Previously the jbd2 layer assumed that a file system check would be required after a journal abort. In the case of the deliberate file system shutdown, this should not be necessary. Allow the jbd2 layer to distinguish between these two cases by using the ESHUTDOWN errno. Also add proper locking to __journal_abort_soft(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
Submitted by Theodore Ts'o
There were two error messages emitted by jbd2, one for a bad checksum for a jbd2 descriptor block, and one for a bad checksum for a jbd2 data block. Change the data block checksum error so that the two can be disambiguated. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 10 January 2018, 1 commit
-
-
Submitted by Tobin C. Harding
Sphinx emits various (26) warnings when building make target 'htmldocs'. Currently struct definitions contain duplicate documentation, some as kernel-docs and some as standard c89 comments. We can reduce duplication while cleaning up the kernel-docs. Move all kernel-docs to right above each struct member. Use the set of all existing comments (kernel-doc and c89). Add documentation for missing struct members and function arguments. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 18 December 2017, 1 commit
-
-
Submitted by Theodore Ts'o
A number of ext4 source files were skipped because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation in the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have an LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 03 November 2017, 1 commit
-
-
Submitted by Jan Kara
We return the IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to prepare blocks for writing and the inode has some uncommitted metadata changes. In the fault handler ext4_dax_fault() we then detect this case (through the VM_FAULT_NEEDDSYNC return value) and call the helper dax_finish_sync_fault() to flush metadata changes and insert the page table entry. Note that this will also dirty the corresponding radix tree entry, which is what we want - fsync(2) will still provide data integrity guarantees for applications not using userspace flushing. And applications using userspace flushing can avoid calling fsync(2) and thus avoid the performance overhead. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 19 October 2017, 1 commit
-
-
Submitted by Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de>
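A sketch of what this conversion typically looks like for jbd2's commit timer, reconstructed from the general timer_setup()/from_timer() pattern; the helper name journal_init_commit_timer is invented and the exact hunk is not shown in this log:

/* Callback now receives the timer and recovers its container. */
static void commit_timeout(struct timer_list *t)
{
	journal_t *journal = from_timer(journal, t, j_commit_timer);

	wake_up_process(journal->j_task);
}

static void journal_init_commit_timer(journal_t *journal)
{
	timer_setup(&journal->j_commit_timer, commit_timeout, 0);
	/* was: setup_timer(&journal->j_commit_timer, commit_timeout,
	 *                  (unsigned long)current); */
}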
-
- 06 July 2017, 1 commit
-
-
Submitted by Jeff Layton
Resetting this flag is almost certainly racy, and will be problematic with some coming changes. Make filemap_fdatawait_keep_errors return int, but not clear the flag(s). Have jbd2 call it instead of filemap_fdatawait and don't attempt to re-set the error flag if it fails. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
-
- 20 June 2017, 1 commit
-
-
Submitted by Ingo Molnar
Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly name it as a wait-queue entry. Propagate it to a couple of usage sites where the wait-bit-queue internals are exposed. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 22 May 2017, 1 commit
-
-
Submitted by Tahsin Erdogan
When a transaction starts, start_this_handle() saves the current PF_MEMALLOC_NOFS value so that it can be restored at journal stop time. Journal restart is a special case that calls start_this_handle() without stopping the transaction. start_this_handle() isn't aware that the original value is already stored, so it overwrites it with the current value. For instance, a call sequence like below leaves the PF_MEMALLOC_NOFS flag set at the end:

jbd2_journal_start()
  jbd2__journal_restart()
jbd2_journal_stop()

Make jbd2__journal_restart() restore the original value before calling start_this_handle(). Fixes: 81378da6 ("jbd2: mark the transaction context with the scope GFP_NOFS context") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
-
- 16 May 2017, 2 commits
-
-
Submitted by Mauro Carvalho Chehab
kernel-doc will try to interpret a foo() string unless it is properly escaped. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
-
Submitted by Mauro Carvalho Chehab
The kernel-doc script expects a function's documentation to be placed just before the function; otherwise it will be ignored. So, move the kernel-doc markup to the right place. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
-
- 04 May 2017, 3 commits
-
-
Submitted by Jan Kara
Currently jbd2_write_superblock() silently adds REQ_SYNC to the flags with which the journal superblock is written. Make this explicit by making the flags passed down to jbd2_write_superblock() contain REQ_SYNC. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Submitted by Michal Hocko
kjournald2 is central to the transaction commit processing. As such, any potential allocation from this kernel thread has to be GFP_NOFS. Make sure to mark the whole kernel thread GFP_NOFS with memalloc_nofs_save. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
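The scoped-NOFS API referenced here and in the next commit, in its usual form (a generic sketch with an invented function name, not the jbd2-specific hunk):

#include <linux/sched/mm.h>

/* Everything allocated inside the saved scope implicitly behaves as
 * GFP_NOFS, so the code cannot recurse back into the filesystem. */
static void do_fs_critical_work(void)
{
	unsigned int flags = memalloc_nofs_save();

	/* ... allocations here cannot trigger fs reclaim ... */

	memalloc_nofs_restore(flags);
}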
-
Submitted by Michal Hocko
Now that we have the memalloc_nofs_{save,restore} api, we can mark the whole transaction context as implicitly GFP_NOFS. All allocations will automatically inherit GFP_NOFS this way. This means that we do not have to mark any of those requests with GFP_NOFS, and moreover all the ext4_kv[mz]alloc(GFP_NOFS) calls are also safe now because even the hardcoded GFP_KERNEL allocations deep inside vmalloc will be NOFS now. [akpm@linux-foundation.org: tweak comments] Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 April 2017, 2 commits
-
-
Submitted by Jan Kara
Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed the REQ_SYNC flag from the WRITE_FUA implementation. Since JBD2 strips the REQ_FUA and REQ_FLUSH flags from submitted IO when the filesystem is mounted with the nobarrier mount option, journal superblock writes ended up being async writes after this patch, and that caused a heavy performance regression for the dbench4 benchmark with a high number of processes. In my test setup with an HP RAID array with non-volatile write cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%. Fix the problem by making sure journal superblock writes are always treated as synchronous, since they generally block progress of the journalling machinery and thus the whole filesystem. Fixes: b685d3d6 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Submitted by Jan Kara
I've hit a lockdep splat with the generic/270 test complaining that:

3216.fsstress.b/3533 is trying to acquire lock:
 (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150

but task is already holding lock:
 (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850

The underlying problem is that jbd2_journal_force_commit_nested() (called from ext4_should_retry_alloc()) may get called while a transaction handle is started. In such a case it takes care not to wait for commit of the running transaction (which would deadlock) but only for a commit of a transaction that is already committing (which is safe as that doesn't wait for any filesystem locks). In fact there are also other callers of jbd2_log_wait_commit() that take care to pass the tid of a transaction that is already committing, and for those cases the lockdep instrumentation is too restrictive and leads to false positive reports. Fix the problem by calling jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the transaction isn't already committing. Fixes: 1eaa566d Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 19 April 2017, 1 commit
-
-
Submitted by Paul E. McKenney
A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <rientjes@google.com>
-