Commits · bad97817dece759dd6c0b24f862b7d0ed588edda · OpenHarmony / kernel_linux

01 Oct, 2013 1 commit

nilfs2: fix issue with race condition of competition between segments for dirty blocks · 7f42ec39

Committed by Vyacheslav Dubeyko on Sep 30, 2013

Many NILFS2 users have reported strange file system corruption
(for example):

   NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
   NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

But such error messages are the consequence of a file system issue that
takes place much earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
and Anton Eliasson <devel@antoneliasson.se> reported another issue not
long ago.  These reports describe a crash of the segctor thread:

  BUG: unable to handle kernel paging request at 0000000000004c83
  IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

  Call Trace:
   nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
   nilfs_segctor_construct+0x17b/0x290 [nilfs2]
   nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
   kthread+0xc0/0xd0
   ret_from_fork+0x7c/0xb0

These two issues have one cause.  The same cause can raise a third issue
too: the segctor thread hangs while consuming 100% CPU.

REPRODUCING PATH:

One possible way of reproducing the issue was described by
Jerome Poulin <jeromepoulin@gmail.com>:

1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
14. sysrq+W
15. sysrq+E  wait for everything to terminate
16. sysrq+SUSB

A simplified way of reproducing the issue is to start a kernel compilation
task and "apt-get update" in parallel.

REPRODUCIBILITY:

The issue does not reproduce reliably [60% - 80%].  It is very important to
have a proper environment for reproducing the issue.  The critical
conditions for successful reproduction:

(1) There should be a big file modified by means of mmap().

(2) The count of dirty blocks of this file should, from time to time
    during processing, be greater than several segments in size (for
    example, two or three).

(3) There should be intensive background activity of file modification
    in another thread.

INVESTIGATION:

First of all, it is possible to see that the reason for the crash is an
invalid page address:

  NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
  NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

Moreover, the value of b_page (0x1a82) is 6786.  This value looks like a
segment number.  And the b_blocknr and b_size values look like block
numbers.  So, the buffer_head pointer points to an improper address.

Detailed investigation of the issue revealed the following picture:

  [-----------------------------SEGMENT 6783-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

  [-----------------------------SEGMENT 6784-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
  NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
  NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
  NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
  [----------] ditto
  NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

  [-----------------------------SEGMENT 6785-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
  NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
  NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
  [----------] ditto
  NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

  NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
  NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
  NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
  NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

  NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

  BUG: unable to handle kernel paging request at 0000000000001a82
  IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]

Usually, for every segment we collect dirty files in a list.  Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of a nilfs_segbuf_submit_bh() call.  Finally, the
complete write phase takes place after nilfs_end_bio_write() is called on
the block layer.  Buffers/pages are marked as not dirty in the final phase
and processed files are removed from the list of dirty files.

It is possible to see that we had three prepare_write and submit_bio
phases before the segbuf_wait and complete_write phase.  Moreover, segments
compete with each other for dirty blocks because on every iteration
of segment processing dirty buffer_heads are added to several lists of
payload_buffers:

  [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
  [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

The next pointer is the same but the prev pointer has changed.  It means
that the buffer_head has a next pointer from one list but a prev pointer
from another.  Such modification can be made several times.  And, finally,
it can result in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.

FIX:
This patch adds:

(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
    for every processed dirty block;

(2) checking of BH_Async_Write flag in
    nilfs_lookup_dirty_data_buffers() and
    nilfs_lookup_dirty_node_buffers();

(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
    nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
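
The competition described above, and the effect of the flag, can be sketched in user space.  This is a minimal illustration under stated assumptions, not the kernel code: struct list_node, struct fake_bh, collect_buffer() and complete_write() are hypothetical stand-ins, and the sketch sets the flag at collection time for brevity, whereas the patch sets it in nilfs_segctor_prepare_write().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical stand-in for a buffer_head: a list link plus one state flag. */
struct fake_bh {
    struct list_node assoc;   /* plays the role of b_assoc_buffers */
    bool async_write;         /* plays the role of BH_Async_Write */
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Append without checking whether the node already sits on another list. */
static void list_add_tail_unchecked(struct list_node *node,
                                    struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink a node and leave it pointing at itself. */
static void list_del_node(struct list_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node;
    node->prev = node;
}

/*
 * Guarded collection: skip a buffer that an earlier segment already claimed.
 * Returns true if the buffer was added to this segment's payload list.
 */
static bool collect_buffer(struct fake_bh *bh, struct list_node *payload)
{
    if (bh->async_write)
        return false;          /* already queued for write elsewhere */
    bh->async_write = true;
    list_add_tail_unchecked(&bh->assoc, payload);
    return true;
}

/* Write finished (or aborted): unlink the buffer and clear the flag. */
static void complete_write(struct fake_bh *bh)
{
    list_del_node(&bh->assoc);
    bh->async_write = false;
}
```

Without the check in collect_buffer(), a second segment would relink the same node: the first list would still point at it while its own next and prev pointers would belong to the second list, which is the mixed state seen in the trace.
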
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Reported-by: Anton Eliasson <devel@antoneliasson.se>
Cc: Paul Fertser <fercerpav@gmail.com>
Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
Cc: Elmer Zhang <freeboy6716@gmail.com>
Cc: Kenneth Langga <klangga@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04 Jul, 2013 1 commit

nilfs2: use atomic64_t type for inodes_count and blocks_count fields in nilfs_root struct · e5f7f848

Committed by Vyacheslav Dubeyko on Jul 03, 2013

The cp_inodes_count and cp_blocks_count are represented as __le64 type in
on-disk structure (struct nilfs_checkpoint).  But analogous fields in
in-core structure (struct nilfs_root) are represented by atomic_t type.

This patch replaces the atomic_t type with atomic64_t in the representation
of the inodes_count and blocks_count fields in struct nilfs_root.
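
The width mismatch can be illustrated with C11 atomics in user space.  This is a sketch under stated assumptions: the struct and function names are hypothetical, and unsigned types are used so that the narrowing conversion is well defined, whereas the kernel's atomic_t wraps a signed int.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical in-core counters: a 32-bit one and a 64-bit one. */
struct counters_narrow {
    _Atomic uint32_t blocks_count;
};

struct counters_wide {
    _Atomic uint64_t blocks_count;
};

/* Load a 64-bit on-disk value into the 32-bit counter and read it back. */
static uint64_t roundtrip_narrow(uint64_t on_disk)
{
    struct counters_narrow c;

    atomic_init(&c.blocks_count, (uint32_t)on_disk); /* upper bits are lost */
    return atomic_load(&c.blocks_count);
}

/* Load the same value into the 64-bit counter and read it back. */
static uint64_t roundtrip_wide(uint64_t on_disk)
{
    struct counters_wide c;

    atomic_init(&c.blocks_count, on_disk);
    return atomic_load(&c.blocks_count);
}
```

Any on-disk count at or above 2^32 survives only the wide round trip; the narrow one keeps the low 32 bits.
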
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Joern Engel <joern@logfs.org>
Cc: Clemens Eisserer <linuxhippy@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Jul, 2012 1 commit

nilfs2: Convert to new freezing mechanism · 2c22b337

Committed by Jan Kara on Jun 12, 2012

We change nilfs_page_mkwrite() to provide proper freeze protection for
writeable page faults (we must wait for frozen filesystem even if the
page is fully mapped).

We remove all vfs_check_frozen() checks since they are now handled by
the generic code.

CC: linux-nilfs@vger.kernel.org
CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21 Jun, 2012 1 commit

nilfs2: ensure proper cache clearing for gc-inodes · fbb24a3a

Committed by Ryusuke Konishi on Jun 20, 2012

A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.

Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes.  Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.

For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number.  They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.

However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset.  Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.

This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.

  nilfs_palloc_freev: entry number 307234 already freed.
  ...
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>	[2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22 Nov, 2011 1 commit

freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e

Committed by Tejun Heo on Nov 21, 2011

There is no reason to export two functions for entering the
refrigerator.  Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or remove any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
  indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
  __refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>

11 Jun, 2011 1 commit

nilfs2: fix problem in setting checkpoint interval · 071d73cf

Committed by Ryusuke Konishi on Jun 10, 2011

Checkpoint generation interval of nilfs goes wrong after user has
changed the interval parameter with nilfs-tune tool.

 segctord starting. Construction interval = 5 seconds,
 CP frequency < 30 seconds
 segctord starting. Construction interval = 0 seconds,
 CP frequency < 30 seconds

This turned out to be caused by a trivial bug in initialization code
of log writer.  This will fix it.
Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

10 May, 2011 7 commits

nilfs2: use mark_buffer_dirty to mark btnode or meta data dirty · 5fc7b141

Committed by Ryusuke Konishi on May 05, 2011

This replaces nilfs_mdt_mark_buffer_dirty and nilfs_btnode_mark_dirty
macros with mark_buffer_dirty and gets rid of nilfs_mark_buffer_dirty,
an own mark buffer dirty function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: always set back pointer to host inode in mapping->host · aa405b1f

Committed by Ryusuke Konishi on May 05, 2011

In the current nilfs, page cache for btree nodes and meta data files
do not set a valid back pointer to the host inode in mapping->host.

This will change it so that every address space in nilfs uses
mapping->host to hold its host inode.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: use list_first_entry · 0cc12838

Committed by Ryusuke Konishi on May 05, 2011

This uses list_first_entry macro instead of list_entry if it's used to
get the first entry.
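
The equivalence of the two forms can be shown with a user-space re-creation of the list macros.  This is a simplified sketch: the macros below mirror the kernel ones in spirit but omit their type checking, and struct item with its helper functions is hypothetical.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Simplified user-space versions of the kernel macros (no type checks). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

/* Hypothetical element type embedding a list link. */
struct item {
    int id;
    struct list_head link;
};

/* Old style: the caller spells out head->next by hand. */
static int first_id_old(struct list_head *head)
{
    return list_entry(head->next, struct item, link)->id;
}

/* New style: the intent "first entry" is stated by the macro name. */
static int first_id_new(struct list_head *head)
{
    return list_first_entry(head, struct item, link)->id;
}
```

Both helpers assume a non-empty list; neither macro checks for emptiness.
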
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: zero fill unused portion of super root block · 56eb5538

Committed by Ryusuke Konishi on Apr 30, 2011

The super root block is newly-allocated each time it is written back
to disk, so unused portion of the block should be cleared.
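
The idea can be sketched in user space as follows.  This is an illustration only: fill_block() is a hypothetical helper, not the function the patch touches.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `used` bytes of payload into a freshly allocated block of
 * `block_size` bytes and clear everything after it, so that no stale
 * memory contents reach the disk.
 */
static void fill_block(unsigned char *block, size_t block_size,
                       const void *payload, size_t used)
{
    if (used > block_size)
        used = block_size;    /* never write past the block */
    memcpy(block, payload, used);
    memset(block + used, 0, block_size - used);
}
```
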
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: super root size should change depending on inode size · 6c6de1aa

Committed by Ryusuke Konishi on Apr 30, 2011

The size of super root structure depends on inode size, so
NILFS_SR_BYTES macro should be a function of the inode size.  This
fixes the issue.

Even though a different size value will be written for a possible
future filesystem with extended inode, fortunately this does not
break disk format compatibility.
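
The change from a constant to a size computed from the inode size can be sketched as below.  All names and numbers here are assumptions made for illustration (a 16-byte header, three embedded inodes, a 128-byte default inode); none of them is taken from the commit message.

```c
#include <stddef.h>

/* Assumed for illustration only: header size and number of inodes. */
#define SKETCH_SR_HEADER_BYTES 16u
#define SKETCH_SR_NR_INODES    3u

/* Before: a constant that silently assumes one particular inode size. */
#define SKETCH_SR_BYTES_FIXED \
    (SKETCH_SR_HEADER_BYTES + SKETCH_SR_NR_INODES * 128u)

/* After: the size is computed from the inode size actually in use. */
#define SKETCH_SR_BYTES(inode_size) \
    (SKETCH_SR_HEADER_BYTES + SKETCH_SR_NR_INODES * (size_t)(inode_size))
```

With the assumed 128-byte inode both forms agree; with a larger inode only the parameterized form follows the real layout.
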
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: get rid of private page allocator · 1cb2d38c

Committed by Ryusuke Konishi on Apr 04, 2011

Previously, nilfs was cloning pages for mmapped region to freeze their
data and ensure consistency of checksum during writeback cycles.  A
private page allocator was used for this page cloning.  But, we no
longer need to do that since clear_page_dirty_for_io function sets up
pte so that vm_ops->page_mkwrite function is called right before the
mmapped pages are modified and nilfs_page_mkwrite function can safely
wait for the pages to be written back to disk.

So, this stops making a copy of mmapped pages during writeback, and
eliminates the private page allocation and deallocation functions from
nilfs.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: merge list_del()/list_add_tail() to list_move_tail() · eaae0f37

Committed by Nicolas Kaiser on Mar 19, 2011

Merge list_del() + list_add_tail() to list_move_tail().
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

09 Mar, 2011 7 commits

nilfs2: get rid of nilfs_sb_info structure · e3154e97

Committed by Ryusuke Konishi on Mar 09, 2011

This directly uses sb->s_fs_info to keep a nilfs filesystem object and
fully removes the intermediate nilfs_sb_info structure.  With this
change, the hierarchy of on-memory structures of nilfs will be
simplified as follows:

Before:
  super_block
       -> nilfs_sb_info
             -> the_nilfs
                   -> cptree --+-> nilfs_root (current file system)
                               +-> nilfs_root (snapshot A)
                               +-> nilfs_root (snapshot B)
                               :
             -> nilfs_sc_info (log writer structure)
After:
  super_block
       -> the_nilfs
             -> cptree --+-> nilfs_root (current file system)
                         +-> nilfs_root (snapshot A)
                         +-> nilfs_root (snapshot B)
                         :
             -> nilfs_sc_info (log writer structure)

We didn't design it this way from the beginning because the
initial shape also differed from the above.  The early hierarchy was
composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a
shared nilfs object.  In kernel 2.6.37, it was changed to the
current shape in order to unify super block instances into one per
device, and this cleanup became applicable as a result.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: use sb instance instead of nilfs_sb_info struct · f7545144

Committed by Ryusuke Konishi on Mar 09, 2011

This replaces sbi uses with direct reference to sb instance.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: get rid of sc_sbi back pointer · d96bbfa2

Committed by Ryusuke Konishi on Mar 09, 2011

Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct
from log writer object (nilfs_sc_info).
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: move log writer onto nilfs object · 3fd3fe5a

Committed by Ryusuke Konishi on Mar 09, 2011

Log writer is held by the nilfs_sb_info structure. This moves it into
nilfs object and replaces all uses of NILFS_SC() accessor.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: move s_inode_lock and s_dirty_files into nilfs object · 693dd321

Committed by Ryusuke Konishi on Mar 09, 2011

Moves s_inode_lock spinlock and s_dirty_files list to nilfs object
from nilfs_sb_info structure.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: move parameters on nilfs_sb_info into nilfs object · 574e6c31

Committed by Ryusuke Konishi on Mar 09, 2011

This moves four parameter variables on nilfs_sb_info s_resuid,
s_resgid, s_interval and s_watermark to the nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: move mount options to nilfs object · 3b2ce58b

Committed by Ryusuke Konishi on Mar 09, 2011

This moves mount_opt local variable to nilfs object from nilfs_sb_info
struct.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

02 Mar, 2011 1 commit

nilfs2: fix regression that i-flag is not set on changeless checkpoints · 72746ac6

Committed by Ryusuke Konishi on Feb 28, 2011

According to the report from Jiro SEKIBA titled "regression in
2.6.37?"  (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.

This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.

This patch fixes the regression and brings that application behavior
back to normal.
Reported-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jiro SEKIBA <jir@unicus.jp>
Cc: stable <stable@kernel.org>  [2.6.37]

10 Jan, 2011 3 commits

nilfs2: unfold nilfs_dat_inode function · 365e215c

Committed by Ryusuke Konishi on Dec 27, 2010

nilfs_dat_inode function was a wrapper to switch between normal dat
inode and gcdat, a clone of the dat inode for garbage collection.

This function got obsolete when the gcdat inode was removed, and now
we can access the dat inode directly from a nilfs object. So, we will
unfold the wrapper and remove it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: mark buffer heads as delayed until the data is written to disk · 27e6c7a3

Committed by Ryusuke Konishi on Dec 26, 2010

Nilfs does not allocate new blocks on disk until they are actually
written to. To implement fiemap, we need to deal with such blocks.

To allow successive fiemap patch to distinguish mapped but unallocated
regions, this marks buffer heads of those new blocks as delayed and
clears the flag after the blocks are written to disk.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: call nilfs_error inside bmap routines · e828949e

Committed by Ryusuke Konishi on Nov 19, 2010

Some functions using nilfs bmap routines can wrongly return invalid
argument error (i.e. -EINVAL) that bmap returns as an internal code
for btree corruption.

This fixes the issue by catching and converting the internal EINVAL to
EIO and calling nilfs_error function inside bmap routines.
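
The conversion can be sketched in user space as below.  This is a minimal illustration under stated assumptions: convert_error() and report_corruption() are hypothetical names, and the reporting function merely stands in for the role nilfs_error plays in the patch.

```c
#include <errno.h>
#include <stdio.h>

/* Number of corruption reports issued so far (for demonstration). */
static int report_count;

/* Hypothetical stand-in for the file system's error reporting hook. */
static void report_corruption(const char *where)
{
    fprintf(stderr, "broken bmap detected in %s\n", where);
    report_count++;
}

/*
 * Translate the internal "corruption" code into an I/O error before it
 * reaches callers, and report it once at the point of detection.  Other
 * error codes pass through unchanged.
 */
static int convert_error(int err, const char *where)
{
    if (err == -EINVAL) {
        report_corruption(where);
        return -EIO;
    }
    return err;
}
```

Callers then never see -EINVAL for corruption, so a genuine invalid-argument error from elsewhere stays distinguishable.
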
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

27 Oct, 2010 1 commit

mm: add account_page_writeback() · f629d1c9

Committed by Michael Rubin on Oct 26, 2010

To help developers and applications gain visibility into writeback
behaviour this patch adds two counters to /proc/vmstat.

  # grep nr_dirtied /proc/vmstat
  nr_dirtied 3747
  # grep nr_written /proc/vmstat
  nr_written 3618

These entries allow user apps to understand writeback behaviour over time
and learn how it is impacting their performance.  Currently there is no
way to inspect dirty and writeback speed over time.  This is not possible
with nr_dirty/nr_writeback.

These entries are necessary to give visibility into writeback behaviour.
We have /proc/diskstats which lets us understand the io in the block
layer.  We have blktrace for more in depth understanding.  We have
e2fsprogs and debugfs to give insight into the file system's behaviour,
but we don't offer our users the ability to understand what writeback is
doing.  There is no way to know how active it is over the whole system, if
it's falling behind or to quantify its efforts.  With these values
exported users can easily see how much data applications are sending
through writeback and also at what rates writeback is processing this
data.  Comparing the rates of change between the two allows developers to
see when writeback is not able to keep up with incoming traffic and the
rate of dirty memory being sent to the IO back end.  This allows folks to
understand their io workloads and track kernel issues.  Non kernel
engineers at Google often use these counters to solve puzzling performance
problems.

Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written

Patch #5 adds writeback thresholds to /proc/vmstat

Currently these values are in debugfs. But they should be promoted to
/proc since they are useful for developers who are writing databases
and file servers and are not debugging the kernel.

The output is as below:

 # grep threshold /proc/vmstat
 nr_pages_dirty_threshold 409111
 nr_pages_dirty_background_threshold 818223

This patch:

This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting.  Not using these
routines means that some code will lose track of the accounting and we get
bugs.

Modify nilfs2 to use the interface.
Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23 Oct, 2010 9 commits

nilfs2: eliminate sparse warning - "context imbalance" · 6b81e14e

Committed by Jiro SEKIBA on Oct 14, 2010

Insert sparse annotations to fix the following sparse warning.

fs/nilfs2/segment.c:2681:3: warning: context imbalance in 'nilfs_segctor_kill_thread' - unexpected unlock

nilfs_segctor_kill_thread is only called inside sc_state_lock lock.
sparse doesn't detect the context and warns "unexpected unlock".
__acquires/__releases pretend to lock/unlock the sc_state_lock for sparse.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: add bdev freeze/thaw support · 5beb6e0b

Committed by Ryusuke Konishi on Sep 20, 2010

Nilfs hasn't supported the freeze/thaw feature because it didn't work
due to the peculiar design that multiple super block instances could
be allocated for a device. This limitation was removed by the patch
"nilfs2: do not allocate multiple super block instances for a device".

So now this adds the freeze/thaw support to nilfs.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: get rid of back pointer to writable sb instance · 090fd5b1

Committed by Ryusuke Konishi on Sep 05, 2010

Nilfs object holds a back pointer to a writable super block instance
in nilfs->ns_writer, and this became eliminable since sb is now made
per device and all inodes have a valid pointer to it.

This deletes the ns_writer pointer and a reader/writer semaphore
protecting it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: get rid of GCDAT inode · c1c1d709

Committed by Ryusuke Konishi on Aug 29, 2010

This applies prepared rollback function and redirect function of
metadata file to DAT file, and eliminates GCDAT inode.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: add routines to redirect access to buffers of DAT file · b1f6a4f2

Committed by Ryusuke Konishi on Aug 31, 2010

During garbage collection (GC), DAT file, which converts virtual block
number to real block number, may return disk block number that is not
yet written to the device.

To avoid access to unwritten blocks, the current implementation stores
changes to the caches of GCDAT during GC and atomically commits the
changes into the DAT file after they are written to the device.

This patch, instead, adds a function that makes a copy of specified
buffer and stores it in nilfs_shadow_map, and a function to get the
backup copy as needed (nilfs_mdt_freeze_buffer and
nilfs_mdt_get_frozen_buffer respectively).

Before DAT changes block number in an entry block, it makes a copy and
redirects access to the buffer so that address conversion function
(i.e. nilfs_dat_translate) refers to the old address saved in the
copy.

This patch gives requisites for such redirection.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: move inode count and block count into root object · b7c06342

Committed by Ryusuke Konishi on Aug 14, 2010

This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root
object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: use root object to get ifile · e912a5b6

Committed by Ryusuke Konishi on Aug 14, 2010

This rewrites functions using ifile so that they get ifile from
nilfs_root object, and will remove sbi->s_ifile. Some functions that
don't know the root object are extended to receive it from caller.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: remove own inode hash used for GC · 263d90ce

Committed by Ryusuke Konishi on Aug 20, 2010

This uses the inode hash function that vfs provides instead of nilfs's
own hash table for caching gc inodes.  This finally removes the private
inode hash from nilfs.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: keep zero value in i_cno except for gc-inodes · 6c43f410

Committed by Ryusuke Konishi on Aug 20, 2010

On-memory inode structures of nilfs have a member "i_cno" which stores
a checkpoint number related to the inode.  For gc-inodes, this field
indicates version of data each gc-inode caches for GC.  Log writer
temporarily uses "i_cno" to transfer the latest checkpoint number.

This stops the latter use and lets only gc-inodes use it.

The purpose of this patch is to allow the successive change to use
"i_cno" for inode lookup.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

23 Jul, 2010 4 commits

nilfs2: do not update log cursor for small change · 32502047

Committed by Ryusuke Konishi on Jun 29, 2010

Super blocks of nilfs are periodically overwritten in order to record
the recent log position. This shortens recovery time after unclean
unmount, but the current implementation performs the update even for a
few blocks of change. If the filesystem gets small changes slowly and
continually, super blocks may be updated excessively.

This moderates the issue by skipping update of log cursor if it does
not cross a segment boundary.
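
A boundary test of this kind can be sketched as below.  This is an assumption-laden illustration: it supposes fixed-size segments numbered from block 0, and crosses_segment_boundary() is a hypothetical helper rather than the check the patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the log cursor moved into a different segment.
 * Precondition: blocks_per_segment is non-zero.
 */
static bool crosses_segment_boundary(uint64_t old_blocknr,
                                     uint64_t new_blocknr,
                                     uint64_t blocks_per_segment)
{
    return old_blocknr / blocks_per_segment !=
           new_blocknr / blocks_per_segment;
}
```

A caller would rewrite the super block only when this returns true, so a trickle of small changes inside one segment no longer triggers an update each time.
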
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: sync super blocks in turns · b2ac86e1

Committed by Jiro SEKIBA on Jun 28, 2010

This will sync super blocks in turns instead of syncing duplicate
super blocks at the same time.  This will help in searching for a valid
super root when the super block is written to disk before the log is
written, which happens when barrier-less block devices are unmounted
uncleanly.  In that situation, the old super block likely points to a
valid log.

This patch introduces the ns_sbwcount member to the nilfs object and adds
the nilfs_sb_will_flip() function; ns_sbwcount counts how many times super
blocks have been written back to the disk.  And, nilfs_sb_will_flip()
decides whether flipping is required or not based on the count in
ns_sbwcount, in order to sync super blocks asymmetrically.

The following functions are also changed:

 - nilfs_prepare_super(): flips super blocks according to the
   argument.  The argument is calculated by nilfs_sb_will_flip()
   function.

 - nilfs_cleanup_super(): sets "clean" flag to both super blocks if
   they point to the same checkpoint.

To update the information in both super blocks, the caller of
nilfs_commit_super must set the information on both super blocks.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: introduce nilfs_prepare_super · d26493b6

Committed by Jiro SEKIBA on Jun 28, 2010

This function checks validity of super block pointers.
If first super block is invalid, it will swap the super blocks.
The function should be called before any super block information updates.
Caller must obtain nilfs->ns_sem.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: get rid of macros for segment summary information · 4762077c

Committed by Ryusuke Konishi on May 23, 2010

This removes macros to test segment summary flags and redefines a few
relevant macros with inline functions.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

10 May, 2010 2 commits

nilfs2: make nilfs_sc_*_ops static · 4e819509

Committed by Ryusuke Konishi on Apr 23, 2010

This kills the following sparse warnings:

fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static?
fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static?
fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static?
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info · fdce895e

Committed by Li Hong on Apr 10, 2010

In nilfs_segctor_thread(), timer is a local variable allocated on stack. Its
address can't be set to sci->sc_timer and passed in several procedures.

It works now by chance, just because other procedures are called by
nilfs_segctor_thread() directly or indirectly and the stack hasn't been
deallocated yet.
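
The lifetime difference between the two layouts can be sketched as below.  The types are hypothetical user-space stand-ins (struct fake_timer, struct sc_info_before, struct sc_info_after), not the kernel's timer or nilfs_sc_info definitions.

```c
/* Hypothetical stand-in for a kernel timer. */
struct fake_timer {
    int armed;
    unsigned long expires;
};

/*
 * Before: the object only points at a timer.  If that timer is a local
 * variable of some function, the pointer dangles once the function
 * returns.
 */
struct sc_info_before {
    struct fake_timer *timer;
};

/* After: the timer is embedded and lives exactly as long as the object. */
struct sc_info_after {
    struct fake_timer timer;
};

/* Arm the embedded timer; valid for as long as *sci itself is valid. */
static void arm_timer(struct sc_info_after *sci, unsigned long expires)
{
    sci->timer.expires = expires;
    sci->timer.armed = 1;
}
```

With the embedded layout any function holding the object can use the timer without depending on a particular stack frame still being alive.
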
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

OpenHarmony / kernel_linux last synced more than 4 years ago
