1. 14 Sep, 2009 3 commits
    • R
      nilfs2: use semaphore to protect pointer to a writable FS-instance · 027d6404
      Ryusuke Konishi committed
      This will get rid of the nilfs_get_writer() and nilfs_put_writer() pair used to
      retain a writable FS-instance for a period.
      
      The pair of functions implemented a kind of recursive lock on top of a
      mutex, but they became overkill after commit 201913ed.  Furthermore,
      they caused the following lockdep warning because the mutex could be
      released by a task that did not lock it:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at:
       [<c1359ff5>] mutex_unlock+0x8/0xa
       but there are no more locks to release!
      
       other info that might help us debug this:
       no locks held by kswapd0/422.
      
       stack backtrace:
       Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51
       Call Trace:
        [<c1358f97>] ? printk+0xf/0x18
        [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7
        [<c11578de>] ? prop_put_global+0x3/0x35
        [<c1050195>] lock_release+0xed/0x1dc
        [<c1359ff5>] ? mutex_unlock+0x8/0xa
        [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119
        [<c1359ff5>] mutex_unlock+0x8/0xa
        [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2]
        [<c1092653>] shrink_page_list+0x379/0x68d
        [<c109171b>] ? isolate_pages_global+0xb4/0x18c
        [<c1092bd2>] shrink_list+0x26b/0x54b
        [<c10930be>] shrink_zone+0x20c/0x2a2
        [<c10936b7>] kswapd+0x407/0x591
        [<c1091667>] ? isolate_pages_global+0x0/0x18c
        [<c1040603>] ? autoremove_wake_function+0x0/0x33
        [<c10932b0>] ? kswapd+0x0/0x591
        [<c104033b>] kthread+0x69/0x6e
        [<c10402d2>] ? kthread+0x0/0x6e
        [<c1003e33>] kernel_thread_helper+0x7/0x1a
      
      This patch replaces the custom lock with a reader/writer semaphore,
      which eliminates this warning.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      027d6404
    • H
      nilfs2: fix format string compile warning (ino_t) · b5696e5e
      Heiko Carstens committed
      Unlike on most other architectures, ino_t is an unsigned int on s390.
      So add an explicit cast to avoid this compile warning:
      
      fs/nilfs2/recovery.c: In function 'recover_dsync_blocks':
      fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      b5696e5e
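The explicit cast described in the ino_t fix above can be sketched in user space as follows. This is a minimal illustration, not the code from fs/nilfs2/recovery.c; the helper name `format_ino` and the `"ino="` prefix are assumptions made for the example.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an inode number portably. ino_t may be an unsigned int (as the
 * commit says it is on s390) or an unsigned long, so the value is cast
 * explicitly to the type that "%lu" expects.
 * format_ino() is a hypothetical helper, not a nilfs2 function.
 */
int format_ino(char *buf, size_t len, ino_t ino)
{
	return snprintf(buf, len, "ino=%lu", (unsigned long)ino);
}
```

With the cast in place, the format string and the argument type agree regardless of how the architecture defines ino_t.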
    • R
      nilfs2: fix ignored error code in __nilfs_read_inode() · 1b2f5a64
      Ryusuke Konishi committed
      The __nilfs_read_inode function is ignoring the error code returned
      from nilfs_read_inode_common(), and wrongly delivers a success code
      (zero) when it escapes from the function in erroneous cases.
      
      This adds the missing error handling.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      1b2f5a64
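The class of bug described in the __nilfs_read_inode() fix above (an error path that returns zero) can be sketched as follows. The struct and function names here are hypothetical stand-ins, not the nilfs2 code; the sketch only contrasts dropping an error code with propagating it.

```c
#include <errno.h>

/* Hypothetical stand-in for an on-disk inode; not a nilfs2 structure. */
struct raw_inode {
	int valid;
};

/* Hypothetical helper that fails with -EIO on a bad inode. */
int read_inode_common(const struct raw_inode *raw)
{
	return raw->valid ? 0 : -EIO;
}

/* Buggy pattern: the error is detected but a success code is returned. */
int read_inode_buggy(const struct raw_inode *raw)
{
	if (read_inode_common(raw))
		goto failed;
	return 0;
failed:
	/* cleanup would go here */
	return 0;	/* the error code is lost */
}

/* Fixed pattern: keep the error code and return it from the exit path. */
int read_inode_fixed(const struct raw_inode *raw)
{
	int err = read_inode_common(raw);

	if (err)
		goto failed;
	return 0;
failed:
	/* cleanup would go here */
	return err;
}
```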
  2. 31 Aug, 2009 1 commit
    • R
      nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key · b1f1b8ce
      Ryusuke Konishi committed
      This will fix the following preempt count underflow reported from
      users with the title "[NILFS users] segctord problem" (Message-ID:
      <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID:
      <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):
      
       WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
       Hardware name: HP Compaq 6530b (KR980UT#ABC)
       Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
       Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
       Call Trace:
        [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
        [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
        [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
        [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
        [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
        [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
        [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
        [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
        [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
        [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
        [<ffffffff803c06e0>] ? __up_write+0xe0/0x150
        [<ffffffff80262959>] ? up_write+0x9/0x10
        [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
        [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
        [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
        [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
        [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
        [<ffffffff80252633>] ? add_timer+0x13/0x20
        [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
        [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
        [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
        [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
        [<ffffffff8025e556>] kthread+0x56/0x90
        [<ffffffff8020cdea>] child_rip+0xa/0x20
        [<ffffffff8025e500>] ? kthread+0x0/0x90
        [<ffffffff8020cde0>] ? child_rip+0x0/0x20
      
      This problem was caused by a missing radix_tree_preload() call in
      the retry path of the nilfs_btnode_prepare_change_key() function.
      Reported-by: Eric A <eric225125@yahoo.com>
      Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Jerome Poulin <jeromepoulin@gmail.com>
      Cc: stable@kernel.org
      b1f1b8ce
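Why a missing preload in a retry path underflows the preempt count can be illustrated with a small user-space simulation. In the kernel, a successful radix_tree_preload() returns with preemption disabled and radix_tree_preload_end() re-enables it, so the two must be paired on every attempt. The function below is a hypothetical model of that pairing, not the nilfs2 code: a plain integer stands in for the preempt count, and the insert is simulated.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simulate a preload/insert/retry loop.
 * collisions:       how many times the simulated insert reports -EEXIST
 *                   before it succeeds.
 * preload_on_retry: false models a preload done once before the loop,
 *                   true models a preload repeated on every attempt.
 * Returns the final value of the simulated preempt count (0 = balanced).
 */
int simulate_change_key(int collisions, bool preload_on_retry)
{
	int preempt_count = 0;	/* stand-in for the kernel preempt count */
	int err;

	if (!preload_on_retry)
		preempt_count++;	/* preload once, outside the retry loop */
retry:
	if (preload_on_retry)
		preempt_count++;	/* preload on every attempt */

	err = (collisions > 0) ? -EEXIST : 0;	/* simulated insert */
	preempt_count--;	/* preload_end after each insert attempt */

	if (err == -EEXIST) {
		collisions--;	/* the conflicting entry gets invalidated */
		goto retry;
	}
	return preempt_count;
}
```

In this model, each retry without a fresh preload drives the counter one step below zero, which corresponds to the sub_preempt_count() warning in the report.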
  3. 19 Aug, 2009 1 commit
    • R
      nilfs2: fix oopses with doubly mounted snapshots · a9245860
      Ryusuke Konishi committed
      This will fix kernel oopses like the following:
      
       # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
       # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
       # umount /test1
       # umount /test2
      
      BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
      in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
      1 lock held by umount.nilfs2/3886:
       #0:  (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c
      irq event stamp: 1219
      hardirqs last  enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119
      hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119
      softirqs last  enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad
      softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a
      Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
      Call Trace:
       [<c1023549>] __might_sleep+0x107/0x10e
       [<c13603c0>] do_page_fault+0x246/0x397
       [<c136017a>] ? do_page_fault+0x0/0x397
       [<c135e753>] error_code+0x6b/0x70
       [<c136017a>] ? do_page_fault+0x0/0x397
       [<c104f805>] ? __lock_acquire+0x91/0x12fd
       [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
       [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
       [<c1050b2b>] lock_acquire+0xba/0xdd
       [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<c135d4fe>] down_write+0x2a/0x46
       [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<c104ea2c>] ? mark_held_locks+0x43/0x5b
       [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133
       [<c104ece4>] ? trace_hardirqs_on+0xb/0xd
       [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2]
       [<c10b3352>] generic_shutdown_super+0x49/0xb8
       [<c10b33de>] kill_block_super+0x1d/0x31
       [<c10e6599>] ? vfs_quota_off+0x0/0x12
       [<c10b398f>] deactivate_super+0x57/0x6c
       [<c10c4bc3>] mntput_no_expire+0x8c/0xb4
       [<c10c5094>] sys_umount+0x27f/0x2a4
       [<c10c50c6>] sys_oldumount+0xd/0xf
       [<c10031a4>] sysenter_do_call+0x12/0x38
       ...
      
      This turns out to be a bug introduced by an -rc1 patch ("nilfs2: simplify
      remaining sget() use").
      
      In that patch, a new "put resource" function, nilfs_put_sbinfo(),
      was introduced to delay freeing the nilfs_sb_info struct.
      
      But nilfs_put_sbinfo() mistakenly used the atomic_dec_and_test()
      function to check the reference count, and this caused the nilfs_sb_info
      to be freed when a user mounted a snapshot twice.
      
      This bug also suggests there was an unseen memory leak in usual
      mount/umount operations for nilfs.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      a9245860
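The intended "free on the last put" semantics of a reference-counted put function can be sketched with C11 atomics. The commit message above does not show the diff, so this is only an illustration of how a dec-and-test primitive is normally used; `struct sb_info`, `dec_and_test()` and `put_sb_info()` are hypothetical names, and a flag stands in for kfree().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted superblock info object. */
struct sb_info {
	atomic_int count;
	bool freed;	/* stands in for the object having been kfree()d */
};

/*
 * Decrement and report whether the counter reached zero, mirroring the
 * contract of the kernel's atomic_dec_and_test().
 */
bool dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the value held before the subtraction */
	return atomic_fetch_sub(v, 1) == 1;
}

/* Release one reference; free only when the last reference is dropped. */
void put_sb_info(struct sb_info *sbi)
{
	if (dec_and_test(&sbi->count))
		sbi->freed = true;
}
```

With two holders (for example, the same snapshot mounted twice), the first put must leave the object alive and only the second may free it. Getting the test wrong in either direction produces the two symptoms the message mentions: an early free with several holders, or a leak with a single holder.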
  4. 18 Aug, 2009 1 commit
  5. 02 Aug, 2009 1 commit
  6. 01 Aug, 2009 1 commit
  7. 14 Jul, 2009 1 commit
  8. 13 Jul, 2009 1 commit
  9. 05 Jul, 2009 5 commits
  10. 24 Jun, 2009 1 commit
  11. 12 Jun, 2009 10 commits
  12. 10 Jun, 2009 14 commits
    • R
      nilfs2: support contiguous lookup of blocks · c3a7abf0
      Ryusuke Konishi committed
      Although the get_block() callback function can return an extent of
      contiguous blocks with bh->b_size, the nilfs_get_block() function did
      not support this feature.
      
      This adds a contiguous lookup feature to the block mapping code of
      nilfs, and allows the nilfs_get_block() function to return the extent
      information by applying the feature.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      c3a7abf0
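The idea behind a contiguous lookup can be sketched as a scan over a block mapping: starting from one logical block, count how many following blocks map to consecutive disk addresses, up to a caller-supplied limit. The function and its array-based mapping are assumptions for illustration; the actual nilfs bmap code walks btree or direct mappings instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many logical blocks, starting at index `start`, map to
 * physically consecutive block numbers. `ptrs[i]` is the disk block that
 * logical block i maps to (a hypothetical flat mapping), `n` is the number
 * of mapped blocks, and `maxblocks` caps the extent length.
 */
size_t count_contiguous(const uint64_t *ptrs, size_t n,
			size_t start, size_t maxblocks)
{
	size_t cnt;

	if (start >= n || maxblocks == 0)
		return 0;

	for (cnt = 1; cnt < maxblocks && start + cnt < n; cnt++) {
		if (ptrs[start + cnt] != ptrs[start] + cnt)
			break;	/* the run of consecutive blocks ends here */
	}
	return cnt;
}
```

A get_block-style caller would then report the extent size in bytes (the block count shifted by the block-size bits) through bh->b_size, which is the mechanism the commit message refers to.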
    • R
      nilfs2: add sync_page method to page caches of meta data · fa032744
      Ryusuke Konishi committed
      This applies the block_sync_page() function to the sync_page method of
      page caches for meta data files, gc page caches, and btree node
      buffers.  This is a companion patch of ("nilfs2: enable sync_page
      method") which applied the function for data pages.
      
      This allows lock_page() for those meta data to unplug pending bio
      requests.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      fa032744
    • R
      nilfs2: use device's backing_dev_info for btree node caches · a53b4751
      Ryusuke Konishi committed
      Previously, default_backing_dev_info was used for the mapping of btree
      node caches.  This uses the device-dependent backing_dev_info to allow
      detailed control of the device for the btree node pages.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      a53b4751
    • R
      nilfs2: return EBUSY against delete request on snapshot · 30c25be7
      Ryusuke Konishi committed
      This helps userland programs like the rmcp command to distinguish
      error codes returned against a checkpoint removal request.
      
      Previously -EPERM was returned, which could not be distinguished from
      real permission errors.  This also allows removal of the latest
      checkpoint because the deletion leads to the creation of a new
      checkpoint, and thus it's harmless for the filesystem.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      30c25be7
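From the userland side, the benefit of the EBUSY change above is that a tool can branch on errno after a failed removal request. The sketch below is a hypothetical helper, not code from rmcp or nilfs-utils; the message strings are assumptions chosen for the example.

```c
#include <errno.h>

/*
 * Map the errno value seen after a checkpoint removal request to a
 * human-readable explanation. explain_rmcp_errno() is a hypothetical
 * helper; the strings are illustrative only.
 */
const char *explain_rmcp_errno(int err)
{
	switch (err) {
	case 0:
		return "checkpoint removed";
	case EBUSY:
		return "checkpoint is a snapshot";
	case EPERM:
		return "permission denied";
	default:
		return "unexpected error";
	}
}
```

Before the change, both the snapshot case and a real permission problem would have arrived as EPERM, so the two middle branches could not be told apart.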
    • R
      nilfs2: enable sync_page method · e85dc1d5
      Ryusuke Konishi committed
      This adds a missing sync_page method which unplugs bio requests when
      waiting for page locks. This will improve read performance of nilfs.
      
      Here is a measurement result using dd command.
      
      Without this patch:
      
       # mount -t nilfs2 /dev/sde1 /test
       # dd if=/test/aaa of=/dev/null bs=512k
       1024+0 records in
       1024+0 records out
       536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s
      
      With this patch:
      
       # mount -t nilfs2 /dev/sde1 /test
       # dd if=/test/aaa of=/dev/null bs=512k
       1024+0 records in
       1024+0 records out
       536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      e85dc1d5
    • R
      nilfs2: set bio unplug flag for the last bio in segment · 30bda0b8
      Ryusuke Konishi committed
      This sets the BIO_RW_UNPLUG flag on the last bio of each segment during
      write.  The last bio should be unplugged immediately because the
      caller waits for the completion after the submission.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      30bda0b8
    • R
      nilfs2: allow future expansion of metadata read out via get info ioctl · 003ff182
      Ryusuke Konishi committed
      Nilfs has some ioctl commands to read out metadata from meta data
      files:
      
       - NILFS_IOCTL_GET_CPINFO for checkpoint file,
       - NILFS_IOCTL_GET_SUINFO for segment usage file, and
       - NILFS_IOCTL_GET_VINFO for Disk Address Translation (DAT) file,
         respectively.
      
      Every routine on these metadata files is implemented so that it allows
      future expansion of on-disk format.  But, the above ioctl commands do
      not support expansion even though nilfs_argv structure can handle
      arbitrary size for data exchanged via ioctl.
      
      This allows future expansion of the following structures which give
      basic format of the "get information" ioctls:
      
       - struct nilfs_cpinfo
       - struct nilfs_suinfo
       - struct nilfs_vinfo
      
      So, this introduces forward compatibility of such ioctl commands.
      
      In this patch, a sanity check in nilfs_ioctl_get_info() function is
      changed to accept larger data structure [1], and metadata read
      routines are rewritten so that they become compatible for larger
      structures; the routines will just ignore the remaining fields which
      the current version of nilfs doesn't know.
      
      [1] The ioctl function already has another upper limit (PAGE_SIZE
          against a structure, which appears in nilfs_ioctl_wrap_copy
          function), and this will not cause security problem.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      003ff182
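The forward-compatibility scheme described above (accept a larger per-entry size from the caller and fill only the fields the current version knows) can be sketched as follows. `struct known_info`, `fill_info()` and the field names are hypothetical; the real structures are nilfs_cpinfo, nilfs_suinfo and nilfs_vinfo, whose layouts are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "current version" of an info record. */
struct known_info {
	unsigned long long cno;
	unsigned long long nblocks;
};

/*
 * Fill `nmembs` records into a caller buffer whose per-record size is
 * `stride`. A stride larger than the known structure is accepted, and the
 * bytes beyond the known fields are left untouched. A smaller stride is
 * rejected because the known fields would not fit.
 */
int fill_info(void *buf, size_t stride, size_t nmembs,
	      const struct known_info *src)
{
	size_t i;

	if (stride < sizeof(struct known_info))
		return -EINVAL;

	for (i = 0; i < nmembs; i++)
		memcpy((char *)buf + i * stride, &src[i], sizeof(src[i]));
	return 0;
}
```

Stepping through the buffer by the caller's stride rather than by sizeof(struct known_info) is what lets a newer userland with a larger structure talk to an older kernel.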
    • H
      NILFS2: Pagecache usage optimization on NILFS2 · 258ef67e
      Hisashi Hifumi committed
      Hi,
      
      I introduced "is_partially_uptodate" aops for NILFS2.
      
      A page can have multiple buffers and even if a page is not uptodate, some buffers
      can be uptodate on pagesize != blocksize environment.
      This aops checks that all buffers which correspond to a part of a file
      that we want to read are uptodate. If so, we do not have to issue actual
      read IO to HDD even if a page is not uptodate because the portion we
      want to read is uptodate.
      "block_is_partially_uptodate" function is already used by ext2/3/4.
      With the following patch random read/write mixed workloads or random read after
      random write workloads can be optimized and we can get performance improvement.
      
      I did a performance test using the sysbench.
      
      1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --file-rw-ratio=1 run
      
      -2.6.30-rc5
      
      Test execution summary:
          total time:                          151.2907s
          total number of events:              200000
          total time taken by event execution: 2409.8387
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0120s
               max:                            0.9306s
               approx.  95 percentile:         0.0439s
      
      Threads fairness:
          events (avg/stddev):           12500.0000/238.52
          execution time (avg/stddev):   150.6149/0.01
      
      -2.6.30-rc5-patched
      
      Test execution summary:
          total time:                          140.8828s
          total number of events:              200000
          total time taken by event execution: 2240.8577
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0112s
               max:                            0.8750s
               approx.  95 percentile:         0.0418s
      
      Threads fairness:
          events (avg/stddev):           12500.0000/218.43
          execution time (avg/stddev):   140.0536/0.01
      
      arch: ia64
      pagesize: 16k
      
      Thanks.
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      258ef67e
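The check that an "is_partially_uptodate" operation performs can be sketched in user space: given per-block uptodate flags for one page, decide whether every block overlapping the requested byte range is uptodate. This is a simplified model under stated assumptions (a flat array of flags, a hypothetical function name), not the kernel's block_is_partially_uptodate().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if every block that overlaps the byte range
 * [from, from + count) within one page is uptodate.
 * uptodate[i] is the state of block i; the page holds `nblocks` blocks of
 * `blocksize` bytes each. All names are hypothetical.
 */
bool range_is_uptodate(const bool *uptodate, size_t nblocks,
		       size_t blocksize, size_t from, size_t count)
{
	size_t first, last, i;

	if (count == 0 || from + count > nblocks * blocksize)
		return false;

	first = from / blocksize;
	last = (from + count - 1) / blocksize;

	for (i = first; i <= last; i++) {
		if (!uptodate[i])
			return false;	/* a needed block still requires read IO */
	}
	return true;
}
```

For example, with a 16k page made of four 4k blocks where only the third block is stale, a read of the first 8k can be served without IO, while a read that touches the third block cannot.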
    • R
      nilfs2: remove nilfs_btree_operations from btree mapping · 7cde31d7
      Ryusuke Konishi committed
      This will remove indirect function calls using the
      nilfs_btree_operations table.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      7cde31d7
    • R
      nilfs2: remove nilfs_direct_operations from direct mapping · 355c6b61
      Ryusuke Konishi committed
      This will remove indirect function calls using the
      nilfs_direct_operations table.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      355c6b61
    • R
      nilfs2: remove bmap pointer operations · d4b96157
      Ryusuke Konishi committed
      Previously, the bmap code of nilfs used three types of function
      tables.  The overuse of indirect function calls hurt source
      readability and caused many indirect jumps, which could confuse
      the branch prediction of processors.
      
      This eliminates one type of the function tables,
      nilfs_bmap_ptr_operations, which was used to dispatch low level
      pointer operations of the nilfs bmap.
      
      This adds a new integer variable "b_ptr_type" to nilfs_bmap struct,
      and uses the value to select the pointer operations.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      d4b96157
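The refactoring pattern described above, replacing a table of function pointers with an integer type field and a direct dispatch, can be sketched like this. Only the field name b_ptr_type comes from the commit message; the enum values, the other struct fields and the allocation behaviour are hypothetical and chosen just to make the example self-contained.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical pointer types; the real nilfs constants are not shown here. */
enum {
	PTR_TYPE_PHYSICAL,
	PTR_TYPE_VIRTUAL
};

/* Simplified stand-in for the bmap structure. */
struct bmap {
	int b_ptr_type;		/* selects the pointer operations */
	uint64_t next_physical;	/* hypothetical allocator state */
	uint64_t next_virtual;	/* hypothetical allocator state */
};

/*
 * Allocate a block pointer by switching on b_ptr_type instead of calling
 * through a function table.
 */
int bmap_alloc_ptr(struct bmap *bmap, uint64_t *ptr)
{
	switch (bmap->b_ptr_type) {
	case PTR_TYPE_PHYSICAL:
		*ptr = bmap->next_physical++;
		return 0;
	case PTR_TYPE_VIRTUAL:
		*ptr = bmap->next_virtual++;
		return 0;
	default:
		return -EINVAL;
	}
}
```

A switch on a small integer keeps every call target visible at the call site, which addresses both concerns the message raises: readability and the cost of indirect jumps.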
    • R
      nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct · 3033342a
      Ryusuke Konishi committed
      This will cut off 16 bytes from the nilfs_bmap struct which is
      embedded in the on-memory inode of nilfs.
      
      The b_high field was never used, and the b_low field stores a constant
      value which can be determined by whether the inode uses btree for
      block mapping or not.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      3033342a
    • R
      nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function · e473c1f2
      Ryusuke Konishi committed
      This indirect function is set to NULL only for gc cache inodes, but
      the gc cache inodes never call this function.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      e473c1f2
    • R
      nilfs2: move get block functions in bmap.c into btree codes · f198dbb9
      Ryusuke Konishi committed
      Two get-block functions for btree nodes, nilfs_bmap_get_block() and
      nilfs_bmap_get_new_block(), are called only from the btree code.
      This relocation will increase opportunities for compiler optimization.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      f198dbb9