提交 · bda5239738fac6ad217a820f5d76b1af896fbb92 · openeuler / Kernel

09 5月, 2019 1 次提交

f2fs: remove new blank line of f2fs kernel message · bda52397

由 Chao Yu 提交于 4月 15, 2019

Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

bda52397

17 4月, 2019 1 次提交

f2fs: add tracepoint for f2fs_file_write_iter() · 126ce721

由 Chao Yu 提交于 4月 02, 2019

This patch adds tracepoint for f2fs_file_write_iter().
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

126ce721

06 4月, 2019 1 次提交

f2fs: Fix use of number of devices · 0916878d

由 Damien Le Moal 提交于 3月 16, 2019

For a single device mount using a zoned block device, the zone
information for the device is stored in the sbi->devs single entry
array and sbi->s_ndevs is set to 1. This differs from a single device
mount using a regular block device which does not allocate sbi->devs
and sets sbi->s_ndevs to 0.

However, sbi->s_devs == 0 condition is used throughout the code to
differentiate a single device mount from a multi-device mount where
sbi->s_ndevs is always larger than 1. This results in problems with
single zoned block device volumes as these are treated as multi-device
mounts but do not have the start_blk and end_blk information set. One
of the problem observed is skipping of zone discard issuing resulting in
write commands being issued to full zones or unaligned to a zone write
pointer.

Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
regular block device mount) and sbi->s_ndevs == 1 (single zoned block
device mount) in the same manner. This is done by introducing the
helper function f2fs_is_multi_device() and using this helper in place
of direct tests of sbi->s_ndevs value, improving code readability.

Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
Cc: <stable@vger.kernel.org>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0916878d

15 3月, 2019 1 次提交

f2fs: set pin_file under CAP_SYS_ADMIN · aff7b628

由 Jaegeuk Kim 提交于 3月 13, 2019

Android uses pin_file for uncrypt during OTA, and that should be managed by
CAP_SYS_ADMIN only.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

aff7b628

13 3月, 2019 2 次提交

f2fs: trace f2fs_ioc_shutdown · 559e87c4

由 Chao Yu 提交于 2月 26, 2019

This patch supports to trace f2fs_ioc_shutdown.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

559e87c4

f2fs: fix to dirty inode for i_mode recovery · ca597bdd

由 Chao Yu 提交于 2月 23, 2019

As Seulbae Kim reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=202637

We didn't recover permission field correctly after sudden power-cut,
the reason is in setattr we didn't add inode into global dirty list
once i_mode is changed, so latter checkpoint triggered by fsync will
not flush last i_mode into disk, result in this problem, fix it.
Reported-by: NSeulbae Kim <seulbae@gatech.edu>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

ca597bdd

06 3月, 2019 1 次提交

f2fs: fix potential data inconsistence of checkpoint · c42d28ce

由 Chao Yu 提交于 2月 02, 2019

Previously, we changed lock from cp_rwsem to node_change, it solved
the deadlock issue which was caused by below race condition:

Thread A			Thread B
- f2fs_setattr
 - f2fs_lock_op  -- read_lock
 - dquot_transfer
  - __dquot_transfer
   - dquot_acquire
    - commit_dqblk
     - f2fs_quota_write
      - f2fs_write_begin
       - f2fs_write_failed
				- write_checkpoint
				 - block_operations
				  - f2fs_lock_all  -- write_lock
        - f2fs_truncate_blocks
         - f2fs_lock_op  -- read_lock

But it breaks the sematics of cp_rwsem, in other callers like:
- f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
- f2fs_direct_IO -> f2fs_write_failed

We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
bitmap update, which can cause further data corruption.

So this patch reverts previous fix implementation, and try to fix
deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
only for quota file, and keep the preallocated data/node in the tail of
quota file, we can expecte that the preallocated space can be used to
store quota info latter soon.

Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: NGao Xiang <gaoxiang25@huawei.com>
Signed-off-by: NSheng Yong <shengyong1@huawei.com>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

c42d28ce

16 2月, 2019 1 次提交

f2fs: add quick mode of checkpoint=disable for QA · db610a64

由 Jaegeuk Kim 提交于 1月 24, 2019

This mode returns mount() quickly with EAGAIN. We can trigger this by
shutdown(F2FS_GOING_DOWN_NEED_FSCK).
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

db610a64

24 1月, 2019 1 次提交

f2fs: use IS_ENCRYPTED() to check encryption status · 62230e0d

由 Chandan Rajendra 提交于 12月 12, 2018

This commit removes the f2fs specific f2fs_encrypted_inode() and makes
use of the generic IS_ENCRYPTED() macro to check for the encryption
status of an inode.
Acked-by: NChao Yu <yuchao0@huawei.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>

62230e0d

09 1月, 2019 2 次提交

f2fs: export FS_NOCOW_FL flag to user · 5d539245

由 Jaegeuk Kim 提交于 1月 03, 2019

This exports pin_file status to user.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

5d539245

f2fs: wait on atomic writes to count F2FS_CP_WB_DATA · 31867b23

由 Jaegeuk Kim 提交于 12月 28, 2018

Otherwise, we can get wrong counts incurring checkpoint hang.

IO_W (CP:  -24, Data:   24, Flush: (   0    0    1), Discard: (   0    0))

Thread A                        Thread B
- f2fs_write_data_pages
 -  __write_data_page
  - f2fs_submit_page_write
   - inc_page_count(F2FS_WB_DATA)
     type is F2FS_WB_DATA due to file is non-atomic one
- f2fs_ioc_start_atomic_write
 - set_inode_flag(FI_ATOMIC_FILE)
                                - f2fs_write_end_io
                                 - dec_page_count(F2FS_WB_CP_DATA)
                                   type is F2FS_WB_DATA due to file becomes
                                   atomic one

Cc: <stable@vger.kernel.org>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

31867b23

27 12月, 2018 2 次提交

f2fs: check PageWriteback flag for ordered case · bae0ee7a

由 Chao Yu 提交于 12月 25, 2018

For all ordered cases in f2fs_wait_on_page_writeback(), we need to
check PageWriteback status, so let's clean up to relocate the check
into f2fs_wait_on_page_writeback().
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

bae0ee7a

f2fs: fix to dirty inode synchronously · b32e0190

由 Chao Yu 提交于 12月 18, 2018

If user change inode's i_flags via ioctl, let's add it into global
dirty list, so that checkpoint can guarantee its persistence before
fsync, it can make checkpoint keeping strong consistency.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b32e0190

14 12月, 2018 1 次提交

f2fs: add an ioctl() to explicitly trigger fsck later · 0cd6d9b0

由 Jaegeuk Kim 提交于 11月 28, 2018

This adds an option in ioctl(F2FS_IOC_SHUTDOWN) in order to trigger fsck by
setting a NEED_FSCK flag.

Generally, shutdown is used for the test to validate filesystem consistency, and
setting NEED_FSCK flag can be used for Android to trigger fsck.f2fs at boot time
explicitly so that we could measure the elapsed time as well as force filesystem
check.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0cd6d9b0

27 11月, 2018 6 次提交

f2fs: fix m_may_create to make OPU DIO write correctly · f4f0b677

由 Jia Zhu 提交于 11月 20, 2018

Previously, we added a parameter @map.m_may_create to trigger OPU
allocation and call f2fs_balance_fs() correctly.

But in get_more_blocks(), @create has been overwritten by below code.
So the function f2fs_map_blocks() will not allocate new block address
but directly go out. Meanwile,there are several functions calling
f2fs_map_blocks() directly and @map.m_may_create not initialized.
CODE:
create = dio->op == REQ_OP_WRITE;
	if (dio->flags & DIO_SKIP_HOLES) {
		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
						i_blkbits))
			create = 0;
	}

This patch fixes it.
Signed-off-by: NJia Zhu <zhujia13@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

f4f0b677

f2fs: fix out-place-update DIO write · f9d6d059

由 Chao Yu 提交于 11月 13, 2018

In get_more_blocks(), we may override @create as below code:

	create = dio->op == REQ_OP_WRITE;
	if (dio->flags & DIO_SKIP_HOLES) {
		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
						i_blkbits))
			create = 0;
	}

But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create
is 1, so in LFS mode, dio overwrite under LFS mode can easily run out
of free segments, result in below panic.

 Call Trace:
  allocate_segment_by_default+0xa8/0x270 [f2fs]
  f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs]
  __allocate_data_block+0x306/0x480 [f2fs]
  f2fs_map_blocks+0x6f6/0x920 [f2fs]
  __get_data_block+0x4f/0xb0 [f2fs]
  get_data_block_dio_write+0x50/0x60 [f2fs]
  do_blockdev_direct_IO+0xcd5/0x21e0
  __blockdev_direct_IO+0x3a/0x3c
  f2fs_direct_IO+0x1ff/0x4a0 [f2fs]
  generic_file_direct_write+0xd9/0x160
  __generic_file_write_iter+0xbb/0x1e0
  f2fs_file_write_iter+0xaf/0x220 [f2fs]
  __vfs_write+0xd0/0x130
  vfs_write+0xb2/0x1b0
  SyS_pwrite64+0x69/0xa0
  ? vtime_user_exit+0x29/0x70
  do_syscall_64+0x6e/0x160
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8

So this patch introduces a parameter map.m_may_create to indicate that
f2fs_map_blocks() is called from write or read path, which can give the
right hint to let f2fs_map_blocks() trigger OPU allocation and call
f2fs_balanc_fs() correctly.

BTW, it disables physical address preallocation for direct IO in
f2fs_preallocate_blocks, which is redundant to OPU allocation of
f2fs_map_blocks.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

f9d6d059

f2fs: move dir data flush to write checkpoint process · b61ac5b7

由 Yunlei He 提交于 11月 06, 2018

This patch move dir data flush to write checkpoint process, by
doing this, it may reduce some time for dir fsync.

pre:
	-f2fs_do_sync_file enter
		-file_write_and_wait_range  <- flush & wait
		-write_checkpoint
			-do_checkpoint	    <- wait all
	-f2fs_do_sync_file exit

now:
	-f2fs_do_sync_file enter
		-write_checkpoint
			-block_operations   <- flush dir & no wait
			-do_checkpoint	    <- wait all
	-f2fs_do_sync_file exit
Signed-off-by: NYunlei He <heyunlei@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b61ac5b7

f2fs: change segment to section in f2fs_ioc_gc_range · 67b0e42b

由 Yunlong Song 提交于 10月 30, 2018

f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves
blocks of section each time, so fix it from segment to section.
Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

67b0e42b

f2fs: clean up f2fs_sb_has_##feature_name · 7beb01f7

由 Chao Yu 提交于 10月 24, 2018

In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.

Just do cleanup, no logic change.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7beb01f7

f2fs: introduce __is_large_section() for cleanup · 2c70c5e3

由 Chao Yu 提交于 10月 24, 2018

Introduce a wrapper __is_large_section() to clean up codes.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2c70c5e3

23 10月, 2018 2 次提交

f2fs: fix to keep project quota consistent · 78130819

由 Chao Yu 提交于 9月 25, 2018

This patch does below changes to keep consistence of project quota data
in sudden power-cut case:
- update inode.i_projid and project quota atomically under lock_op() in
f2fs_ioc_setproject()
- recover inode.i_projid and project quota in recover_inode()
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

78130819

f2fs: guarantee journalled quota data by checkpoint · af033b2a

由 Chao Yu 提交于 9月 20, 2018

For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

af033b2a

17 10月, 2018 2 次提交

f2fs: do not update REQ_TIME in case of error conditions · c75f2feb

由 Sahitya Tummala 提交于 10月 05, 2018

The REQ_TIME should be updated only in case of success cases
as followed at all other places in the file system.
Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

c75f2feb

f2fs: checkpoint disabling · 4354994f

由 Daniel Rosenberg 提交于 8月 20, 2018

Note that, it requires "f2fs: return correct errno in f2fs_gc".

This adds a lightweight non-persistent snapshotting scheme to f2fs.

To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.
Signed-off-by: NDaniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

4354994f

01 10月, 2018 2 次提交

f2fs: allow out-place-update for direct IO in LFS mode · f847c699

由 Chao Yu 提交于 9月 27, 2018

Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't
allow triggering any in-place-update writes, so we fallback direct
write to buffered write, result in bad performance of large size
write.

This patch adds to support triggering out-place-update for direct IO
to enhance its performance.

Note that it needs to exclude direct read IO during direct write,
since new data writing to new block address will no be valid until
write finished.

storage: zram

time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync"

Before:
real	0m13.061s
user	0m0.327s
sys	0m12.486s

After:
real	0m6.448s
user	0m0.228s
sys	0m6.212s
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

f847c699

f2fs: refactor ->page_mkwrite() flow · 39a86958

由 Chao Yu 提交于 9月 27, 2018

Thread A				Thread B
- f2fs_vm_page_mkwrite
					- f2fs_setattr
					 - down_write(i_mmap_sem)
					 - truncate_setsize
					 - f2fs_truncate
					 - up_write(i_mmap_sem)
 - f2fs_reserve_block
 reserve NEW_ADDR
 - skip dirty page due to truncation

1. we don't need to rserve new block address for a truncated page.
2. dn.data_blkaddr is used out of node page lock coverage.

Refactor ->page_mkwrite() flow to fix above issues:
- use __do_map_lock() to avoid racing checkpoint()
- lock data page in prior to dnode page
- cover f2fs_reserve_block with i_mmap_sem lock
- wait page writeback before zeroing page
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

39a86958

13 9月, 2018 1 次提交

f2fs: add SPDX license identifiers · 7c1a000d

由 Chao Yu 提交于 9月 12, 2018

Remove the verbose license text from f2fs files and replace them with
SPDX tags.  This does not change the license of any of the code.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7c1a000d

12 9月, 2018 1 次提交

f2fs: fix setattr project check upon fssetxattr ioctl · c8e92757

由 Wang Shilong 提交于 9月 11, 2018

Currently, project quota could be changed by fssetxattr
ioctl, and existed permission check inode_owner_or_capable()
is obviously not enough, just think that common users could
change project id of file, that could make users to
break project quota easily.

This patch try to follow same regular of xfs project
quota:

"Project Quota ID state is only allowed to change from
within the init namespace. Enforce that restriction only
if we are trying to change the quota ID state.
Everything else is allowed in user namespaces."

Besides that, check and set project id'state should
be an atomic operation, protect whole operation with
inode lock.
Signed-off-by: NWang Shilong <wshilong@ddn.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

c8e92757

06 9月, 2018 2 次提交

f2fs: avoid wrong decrypted data from disk · 0ded69f6

由 Jaegeuk Kim 提交于 8月 22, 2018

1. Create a file in an encrypted directory
2. Do GC & drop caches
3. Read stale data before its bio for metapage was not issued yet
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0ded69f6

f2fs: fix to avoid NULL pointer dereference on se->discard_map · 7d20c8ab

由 Chao Yu 提交于 9月 04, 2018

https://bugzilla.kernel.org/show_bug.cgi?id=200951

These is a NULL pointer dereference issue reported in bugzilla:

Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.

The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
 sda1: FAT  /boot
 sda2: F2FS /
 sda3: F2FS /home
 sda4: F2FS

The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
 mounting with "discard" option, but the device does not support discard

The USB host is USB3.0 and UASP capable. It is the one on RK3399.

Given this everything works fine, except there is no TRIM support.

In order to enable TRIM a new UDEV rule is added [1]:
 /etc/udev/rules.d/10-sata-bridge-trim.rules:
 ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
 Unable to handle kernel NULL pointer dereference

Also tested on a x86_64 system: works fine even with TRIM enabled.
 same disc
 same bridge
 different usb host controller
 different cpu architecture
 not root filesystem

Regards,
  Vicenç.

[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280

 Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
 [000000000000003e] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
 CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
 Hardware name: Sapphire-RK3399 Board (DT)
 pstate: 00000005 (nzcv daif -PAN -UAO)
 pc : update_sit_entry+0x304/0x4b0
 lr : update_sit_entry+0x108/0x4b0
 sp : ffff00000ca13bd0
 x29: ffff00000ca13bd0 x28: 000000000000003e
 x27: 0000000000000020 x26: 0000000000080000
 x25: 0000000000000048 x24: ffff8000ebb85cf8
 x23: 0000000000000253 x22: 00000000ffffffff
 x21: 00000000000535f2 x20: 00000000ffffffdf
 x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
 x17: 0000000007ce6926 x16: 000000001c83ffa8
 x15: 0000000000000000 x14: ffff8000f602df90
 x13: 0000000000000006 x12: 0000000000000040
 x11: 0000000000000228 x10: 0000000000000000
 x9 : 0000000000000000 x8 : 0000000000000000
 x7 : 00000000000535f2 x6 : ffff8000ebff3440
 x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
 x3 : 00000000ffffffff x2 : 0000000000000020
 x1 : 0000000000000000 x0 : ffff8000eb9e5800
 Process nvim (pid: 957, stack limit = 0x0000000063a78320)
 Call trace:
  update_sit_entry+0x304/0x4b0
  f2fs_invalidate_blocks+0x98/0x140
  truncate_node+0x90/0x400
  f2fs_remove_inode_page+0xe8/0x340
  f2fs_evict_inode+0x2b0/0x408
  evict+0xe0/0x1e0
  iput+0x160/0x260
  do_unlinkat+0x214/0x298
  __arm64_sys_unlinkat+0x3c/0x68
  el0_svc_handler+0x94/0x118
  el0_svc+0x8/0xc
 Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
 ---[ end trace a0f21a307118c477 ]---

The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.

So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Tested-by: NVicente Bergas <vicencb@gmail.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7d20c8ab

21 8月, 2018 1 次提交

f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc · 6f8d4455

由 Jaegeuk Kim 提交于 7月 25, 2018

The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of
fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.

If it hits the miximum retrials in GC, let's give a chance to release
gc_mutex for a short time in order not to go into live lock in the worst
case.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

6f8d4455

15 8月, 2018 1 次提交

f2fs: rework fault injection handling to avoid a warning · 7fa750a1

由 Arnd Bergmann 提交于 8月 13, 2018

When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
unused label:

fs/f2fs/segment.c: In function '__submit_discard_cmd':
fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]

This could be fixed by adding another #ifdef around it, but the more
reliable way of doing this seems to be to remove the other #ifdefs
where that is easily possible.

By defining time_to_inject() as a trivial stub, most of the checks for
CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
formatting of the code.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7fa750a1

14 8月, 2018 2 次提交

f2fs: fix avoid race between truncate and background GC · a33c1502

由 Chao Yu 提交于 8月 05, 2018

Thread A				Background GC
- f2fs_setattr isize to 0
 - truncate_setsize
					- gc_data_segment
					 - f2fs_get_read_data_page page #0
					  - set_page_dirty
					  - set_cold_data
 - f2fs_truncate

- f2fs_setattr isize to 4k
- read 4k <--- hit data in cached page #0

Above race condition can cause read out invalid data in a truncated
page, fix it by i_gc_rwsem[WRITE] lock.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

a33c1502

f2fs: avoid race between zero_range and background GC · c7079853

由 Chao Yu 提交于 8月 05, 2018

Thread A				Background GC
- f2fs_zero_range
 - truncate_pagecache_range
					- gc_data_segment
					 - get_read_data_page
					  - move_data_page
					   - set_page_dirty
					   - set_cold_data
 - f2fs_do_zero_range
  - dn->data_blkaddr = NEW_ADDR;
  - f2fs_set_data_blkaddr

Actually, we don't need to set dirty & checked flag on the page, since
all valid data in the page should be zeroed by zero_range().

Use i_gc_rwsem[WRITE] to avoid such race condition.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

c7079853

11 8月, 2018 2 次提交

f2fs: fix to reset i_gc_failures correctly · 30933364

由 Chao Yu 提交于 7月 28, 2018

Let's reset i_gc_failures to zero when we unset pinned state for file.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

30933364

f2fs: fix to avoid broken of dnode block list · 50fa53ec

由 Chao Yu 提交于 8月 02, 2018

f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.

By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.

Sheng Yong helps to do the test with this patch:

Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8

1 / PERSIST / Index

Base:
	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333

After:
	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333

Patched/Original:
	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189

It looks like atomic write will suffer performance regression.

I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.

BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:

- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
 - writeback data: P1, P2, P3;
 - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
 - preflush + fua;
- power-cut

If we don't wait dnode writeback for atomic_write:

	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18

Patched/Original:
	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294

SQLite's performance recovers.

Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."

Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

50fa53ec

02 8月, 2018 4 次提交

f2fs: don't allow any writes on aborted atomic writes · 455e3a58

由 Jaegeuk Kim 提交于 7月 27, 2018

In order to prevent abusing atomic writes by abnormal users, we've added a
threshold, 20% over memory footprint, which disallows further atomic writes.
Previously, however, SQLite doesn't know the files became normal, so that
it could write stale data and commit on revoked normal database file.

Once f2fs detects such the abnormal behavior, this patch tries to avoid further
writes in write_begin().
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

455e3a58

f2fs: clean up ioctl interface naming · 059c0648

由 Chao Yu 提交于 7月 17, 2018

Romve redundant prefix 'f2fs_' in the middle of f2fs_ioc_f2fs_write_checkpoint().
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

059c0648

f2fs: clean up with f2fs_encrypted_inode() · 5b72d5e0

由 Chao Yu 提交于 7月 17, 2018

Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

5b72d5e0

f2fs: fix to propagate error from __get_meta_page() · 7735730d

由 Chao Yu 提交于 7月 17, 2018

If caller of __get_meta_page() can handle error, let's propagate error
from __get_meta_page().
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7735730d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功