1. 22 Jan, 2013 2 commits
  2. 15 Jan, 2013 3 commits
    • f2fs: fix the debugfs entry creation path · 4589d25d
      Namjae Jeon committed
      The "status" debugfs entry is maintained for the entire F2FS filesystem,
      irrespective of the number of partitions.
      So, we can move its initialization to the init part of f2fs and its destruction
      to the exit part. After this change, the entry creation code is no longer
      executed for each individual partition mount.
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: add global mutex_lock to protect f2fs_stat_list · 66af62ce
      majianpeng committed
      There is a race condition between unmounting f2fs and reading f2fs/status, which
      results in an oops.

      For example:
      Thread A			Thread B
      umount f2fs 			cat f2fs/status
      
      f2fs_destroy_stats() {		stat_show() {
      				 list_for_each_entry_safe(&f2fs_stat_list)
       list_del(&si->stat_list);
       mutex_lock(&si->stat_lock);
       si->sbi = NULL;
       mutex_unlock(&si->stat_lock);
       kfree(sbi->stat_info);
      } 				 mutex_lock(&si->stat_lock) <- si is gone.
      				 ...
      				}
      
      Solution with a global lock: f2fs_stat_mutex:
      Thread A			Thread B
      umount f2fs 			cat f2fs/status
      
      f2fs_destroy_stats() {		stat_show() {
       mutex_lock(&f2fs_stat_mutex);
       list_del(&si->stat_list);
       mutex_unlock(&f2fs_stat_mutex);
       kfree(sbi->stat_info);		 mutex_lock(&f2fs_stat_mutex);
      }				 list_for_each_entry_safe(&f2fs_stat_list)
      				 ...
      				 mutex_unlock(&f2fs_stat_mutex);
      				}
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      [jaegeuk.kim@samsung.com: fix typos, description, and remove the existing lock]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
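The global-lock approach described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `struct stat_info`, `stat_list`, `stat_mutex`, and the three functions are hypothetical stand-ins (a pthread mutex and a hand-rolled singly linked list), not the kernel's `f2fs_stat_list` code. The point it shows is that an entry is unlinked under the same lock that readers hold while walking the list, so freeing it afterwards cannot race with a reader.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-mount statistics record. */
struct stat_info {
    int value;
    struct stat_info *next;
};

/* One list and one lock shared by all mounts (plays the role of
 * f2fs_stat_list and f2fs_stat_mutex in the commit message). */
static struct stat_info *stat_list;
static pthread_mutex_t stat_mutex = PTHREAD_MUTEX_INITIALIZER;

/* "mount": allocate a record and link it in under the global lock. */
struct stat_info *stat_register(int value)
{
    struct stat_info *si = malloc(sizeof(*si));

    if (!si)
        return NULL;
    si->value = value;
    pthread_mutex_lock(&stat_mutex);
    si->next = stat_list;
    stat_list = si;
    pthread_mutex_unlock(&stat_mutex);
    return si;
}

/* "umount": unlink under the global lock, free only after unlocking. */
void stat_unregister(struct stat_info *si)
{
    struct stat_info **pp;

    assert(si != NULL);
    pthread_mutex_lock(&stat_mutex);
    for (pp = &stat_list; *pp; pp = &(*pp)->next) {
        if (*pp == si) {
            *pp = si->next;
            break;
        }
    }
    pthread_mutex_unlock(&stat_mutex);
    free(si); /* no reader can reach si through the list any more */
}

/* "cat status": walk the whole list while holding the global lock. */
int stat_sum(void)
{
    int sum = 0;

    pthread_mutex_lock(&stat_mutex);
    for (struct stat_info *si = stat_list; si; si = si->next)
        sum += si->value;
    pthread_mutex_unlock(&stat_mutex);
    return sum;
}
```

A reader that calls `stat_sum()` either sees an entry while it is still linked, or does not see it at all; it never holds a pointer to an entry that `stat_unregister()` has already freed.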
    • f2fs: remove the blk_plug usage in f2fs_write_data_pages · fa9150a8
      Namjae Jeon committed
      Let's consider the usage of blk_plug in f2fs_write_data_pages().
      We can come up with the two issues: lock contention and task awareness.
      
      1. Merging bios prior to grabbing the "queue lock"
       f2fs merges consecutive IOs at the file system level before
       submitting any bios, which is similar to the back merge done by the
       plugging mechanism in attempt_plug_merge(). Neither of them needs to
       acquire the queue lock.
      
      2. Merging policy with respect to tasks
       The f2fs merges IOs as much as possible regardless of tasks, while
       blk-plugging is conducted on a basis of tasks. As we can understand
       there are trade-offs, f2fs tries to maximize the write performance with
       well-merged bios.
      
      As a result, if f2fs produced many consecutive but separate bios in
      writepages(), it would be good to use blk-plugging, since f2fs would be
      able to avoid queue lock contention in the block layer by merging them.
      But f2fs merges IOs and submits one bio, which means that there is
      little chance for attempt_plug_merge() to merge bios.

      However, f2fs has so far been using blk_plug by triggering generic_writepages()
      in f2fs_write_data_pages().
      So, for overall code consistency, I'd like to remove blk_plug there.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  3. 14 Jan, 2013 2 commits
  4. 11 Jan, 2013 2 commits
    • f2fs: move f2fs_balance_fs to punch_hole · 9eaeba70
      Jaegeuk Kim committed
      The f2fs_fallocate() has two operations: punch_hole and expand_size.
      
      Only in the case of punch_hole, dirty node pages can be produced, so let's
      trigger f2fs_balance_fs() in this case only.
      Furthermore, let's trigger it at every data truncation routine.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: add f2fs_balance_fs in several interfaces · 7d82db83
      Jaegeuk Kim committed
      The f2fs_balance_fs() is to check the number of free sections and decide whether
      it needs to conduct cleaning or not. If there are not enough free sections, the
      cleaning job should be started.
      
      In order to control an amount of free sections even under high utilization, f2fs
      should call f2fs_balance_fs at all the VFS interfaces that are able to produce
      dirty pages.
      This patch adds the function calls in the missing interfaces as follows.
      
      1. f2fs_setxattr()
      f2fs_setxattr() produces dirty node pages, so we should call
      f2fs_balance_fs() here as well, as is done in other VFS interfaces such as
      f2fs_lookup(), f2fs_mkdir(), and so on.
      
      2. f2fs_sync_file()
      We should guarantee that free sections are available for syncing metadata
      during fsync.
      Previously, there was no space check before triggering checkpoint and
      sync_node_pages.
      Therefore, if a bunch of fsync calls are triggered under 100% FS utilization,
      f2fs can run out of free sections, resulting in BUG_ON().
      
      3. f2fs_sync_fs()
      Before calling write_checkpoint(), we should guarantee that a minimum number
      of free sections is available.
      
      4. f2fs_write_inode()
      f2fs_write_inode() is also able to produce dirty node pages.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  5. 10 Jan, 2013 1 commit
    • f2fs: revisit the f2fs_gc flow · 408e9375
      Jaegeuk Kim committed
      I'd like to revisit the f2fs_gc flow and rewrite as follows.
      
      1. In practice, the nGC parameter of f2fs_gc is meaningless. So, let's
        remove it.
      2. Background GC marks victim blocks as dirty one at a time.
      3. Foreground GC should do cleaning job until acquiring enough free
        sections. Afterwards, it needs to do checkpoint.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
  6. 04 Jan, 2013 7 commits
  7. 03 Jan, 2013 2 commits
    • mempolicy: remove arg from mpol_parse_str, mpol_to_str · a7a88b23
      Hugh Dickins committed
      Remove the unused argument (formerly no_context) from mpol_parse_str()
      and from mpol_to_str().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: prevent missed events on EPOLL_CTL_MOD · 128dd175
      Eric Wong committed
      EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
      ensure events are not missed.  Since the modifications to the interest
      mask are not protected by the same lock as ep_poll_callback, we need to
      ensure the change is visible to other CPUs calling ep_poll_callback.
      
      We also need to ensure f_op->poll() has an up-to-date view of past
      events which occurred before we modified the interest mask.  So this
      barrier also pairs with the barrier in wq_has_sleeper().
      
      This should guarantee either ep_poll_callback or f_op->poll() (or both)
      will notice the readiness of a recently-ready/modified item.
      
      This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
      http://thread.gmane.org/gmane.linux.kernel/1408782/

      Signed-off-by: Eric Wong <normalperson@yhbt.net>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
      Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
      Cc: netdev@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
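From userspace, the guarantee this commit protects can be seen with a small Linux-only C sketch: data that is already readable when EPOLL_CTL_MOD adds EPOLLIN to the interest mask should be reported by the next epoll_wait(). The function name `epoll_mod_sees_pending_data()` is hypothetical, and this single-threaded example does not reproduce the multi-CPU race itself; it only shows the behaviour the barrier is meant to preserve.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers a pipe's read end with an empty interest mask, makes it
 * readable, then switches the mask to EPOLLIN with EPOLL_CTL_MOD.
 * Returns the epoll_wait() result, or -1 if the setup failed.
 */
int epoll_mod_sees_pending_data(void)
{
    int fds[2], epfd, n = -1;
    struct epoll_event ev = { 0 }, got;

    if (pipe(fds) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    ev.events = 0; /* no interest in EPOLLIN yet */
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0)
        goto done;

    /* The event happens before the interest mask is modified. */
    if (write(fds[1], "x", 1) != 1)
        goto done;

    ev.events = EPOLLIN;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fds[0], &ev) < 0)
        goto done;

    /* Wait up to one second; the pending byte should be reported. */
    n = epoll_wait(epfd, &got, 1, 1000);
done:
    close(epfd);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The race described above arises only when another CPU runs ep_poll_callback concurrently with the mask update, which this sketch deliberately leaves out.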
  8. 02 Jan, 2013 4 commits
    • GFS2: Reset rd_last_alloc when it reaches the end of the rgrp · 13d2eb01
      Bob Peterson committed
      In function rg_mblk_search, it's searching for multiple blocks in
      a given state (e.g. "free"). If there's an active block reservation
      its goal is the next free block of that. If the resource group
      contains the dinode's goal block, that's used for the search. But
      if neither is the case, it uses the rgrp's last allocated block.
      That way, consecutive allocations appear after one another on media.
      The problem comes in when you hit the end of the rgrp; it would never
      start over and search from the beginning. This became a problem,
      since if you deleted all the files and data from the rgrp, it would
      never start over and find free blocks. So it had to keep searching
      further out on the media to allocate blocks. This patch resets the
      rd_last_alloc after it does an unsuccessful search at the end of
      the rgrp.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
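The effect of the reset can be shown with a toy allocator. This is a sketch under assumptions, not GFS2 code: `struct rgrp_stub`, its fields, and `rgrp_alloc()` are hypothetical, and a plain boolean array stands in for the resource group bitmap. The search starts at the last allocated block; when it runs off the end without success, the starting point is reset so the next search begins at the start of the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a resource group. */
struct rgrp_stub {
    bool *used;        /* used[b] is true when block b is allocated */
    size_t nblocks;    /* number of blocks in the group             */
    size_t last_alloc; /* where the next search starts              */
};

/* Allocates one block; returns its index, or -1 if the search failed. */
long rgrp_alloc(struct rgrp_stub *rg)
{
    assert(rg != NULL && rg->used != NULL);

    for (size_t b = rg->last_alloc; b < rg->nblocks; b++) {
        if (!rg->used[b]) {
            rg->used[b] = true;
            rg->last_alloc = b;
            return (long)b;
        }
    }
    /* Unsuccessful search up to the end of the group: start over next
     * time, so blocks freed near the beginning can be found again. */
    rg->last_alloc = 0;
    return -1;
}
```

Without the reset line, a group whose early blocks were freed after the search position reached the end would keep reporting failure, which matches the symptom described in the commit message.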
    • GFS2: Stop looking for free blocks at end of rgrp · 15bd50ad
      Bob Peterson committed
      This patch adds a return code check after calling function
      gfs2_rbm_from_block while determining the free extent size.
      That way, when the end of an rgrp is reached, it won't try
      to process unaligned blocks after the end.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix race in gfs2_rs_alloc · f1213cac
      Abhijith Das committed
      QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible
      to come out of the function with a valid ip->i_res allocation but it gets
      freed before use resulting in a NULL ptr dereference.
      
      This patch moves the initial short-circuit check for non-NULL ip->i_res
      inside the mutex lock. With this patch, I was able to successfully run the
      reproducer test multiple times.
      
      Resolves: rhbz#878476
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
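The pattern of doing the "already allocated?" check while holding the lock can be sketched in userspace C. The names `struct inode_stub` and `rs_alloc()` are hypothetical, and a pthread mutex stands in for whatever lock the kernel code uses; this is not the GFS2 implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode with a lazily allocated reservation. */
struct inode_stub {
    pthread_mutex_t lock;
    int *res; /* plays the role of ip->i_res */
};

/* Ensures ip->res is allocated; returns 0 on success, -1 on failure. */
int rs_alloc(struct inode_stub *ip)
{
    int ret = 0;

    assert(ip != NULL);
    pthread_mutex_lock(&ip->lock);
    /* The check happens under the lock, so it cannot observe a pointer
     * that another lock holder is in the middle of replacing or freeing. */
    if (ip->res == NULL) {
        ip->res = calloc(1, sizeof(*ip->res));
        if (ip->res == NULL)
            ret = -1;
    }
    pthread_mutex_unlock(&ip->lock);
    return ret;
}
```

This only helps if the code that frees `res` takes the same lock, which is an assumption of the sketch.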
    • GFS2: Initialize hex string to '0' · ec148752
      Nathan Straz committed
      When generating the DLM lock name, a value of 0 would skip
      the loop and leave the string unchanged.  This left locks with
      a value of 0 unlabeled.  Initializing the string to '0' fixes this.
      Signed-off-by: Nathan Straz <nstraz@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
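The zero-value case is easy to reproduce with a right-to-left hex writer. The sketch below is illustrative only: `reverse_hex()`, `make_lock_name()`, and the 24-character space-padded field are assumptions made for the example and are not taken from the GFS2 source. The loop writes one digit per nibble and never runs for a value of 0, so the last character is set to '0' up front.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 24 /* hypothetical field width, wider than 16 hex digits */

/* Writes value in hex, right to left, ending at the character *last. */
void reverse_hex(char *last, unsigned long long value)
{
    *last = '0'; /* covers value == 0, where the loop body never runs */
    while (value) {
        *last-- = "0123456789abcdef"[value & 0x0f];
        value >>= 4;
    }
}

/* Builds a space-padded, NUL-terminated name with the number right-aligned. */
void make_lock_name(char name[NAME_LEN + 1], unsigned long long number)
{
    assert(name != NULL);
    memset(name, ' ', NAME_LEN);
    name[NAME_LEN] = '\0';
    reverse_hex(name + NAME_LEN - 1, number);
}
```

Without the initial assignment, `make_lock_name(name, 0)` would leave the field entirely blank, which corresponds to the unlabeled locks mentioned in the commit message.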
  9. 28 Dec, 2012 11 commits
  10. 27 Dec, 2012 2 commits
    • ext4: avoid hang when mounting non-journal filesystems with orphan list · 0e9a9a1a
      Theodore Ts'o committed
      When trying to mount a file system which does not contain a journal,
      but which does have an orphan list containing an inode which needs to
      be truncated, the mount call will hang forever in
      ext4_orphan_cleanup() because ext4_orphan_del() will return
      immediately without removing the inode from the orphan list, leading
      to an uninterruptible loop in kernel code which will busy out one of
      the CPUs on the system.
      
      This can be trivially reproduced by trying to mount the file system
      found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
      source tree.  If a malicious user were to put this on a USB stick, and
      mount it on a Linux desktop which has automatic mounts enabled, this
      could be considered a potential denial of service attack.  (Not a big
      deal in practice, but professional paranoids worry about such things,
      and have even been known to allocate CVE numbers for such problems.)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
    • ext4: lock i_mutex when truncating orphan inodes · 721e3eba
      Theodore Ts'o committed
      Commit c278531d added a warning when ext4_flush_unwritten_io() is
      called without i_mutex being taken.  It had previously not been taken
      during orphan cleanup since races weren't possible at that point in
      the mount process, but as a result of c278531d, we will now see
      a kernel WARN_ON in this case.  Take the i_mutex in
      ext4_orphan_cleanup() to suppress this warning.
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
  11. 26 Dec, 2012 4 commits
    • f2fs: Don't assign e_id in f2fs_acl_from_disk · 48c6d121
      Eric W. Biederman committed
      With user namespaces enabled building f2fs fails with:
      
       CC      fs/f2fs/acl.o
      fs/f2fs/acl.c: In function ‘f2fs_acl_from_disk’:
      fs/f2fs/acl.c:85:21: error: ‘struct posix_acl_entry’ has no member named ‘e_id’
      make[2]: *** [fs/f2fs/acl.o] Error 1
      make[2]: Target `__build' not remade because of errors.
      
      e_id is a backwards compatibility field only used for file systems
      that haven't been converted to use kuids and kgids.  When the posix
      acl tag field is neither ACL_USER nor ACL_GROUP assigning e_id is
      unnecessary.  Remove the assignment so f2fs will build with user
      namespaces enabled.
      
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Amit Sahrawat <a.sahrawat@samsung.com>
      Acked-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • f2fs: do f2fs_balance_fs in front of dir operations · 1efef832
      Jaegeuk Kim committed
      In order to conserve free sections to deal with the worst-case scenarios, f2fs
      should be able to freeze all the directory operations especially when there are
      not enough free sections. The f2fs_balance_fs() is for this use.
      
      When FS utilization approaches 100%, directory operations can frequently fail
      with -ENOSPC, and such failures occasionally produce some dirty node pages.

      Previously, f2fs_balance_fs() could not be triggered in such a case, since
      it was triggered only if the directory operation succeeded.

      So, this patch triggers f2fs_balance_fs() first, before handling directory
      operations.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: should recover orphan and fsync data · 30f0c758
      Jaegeuk Kim committed
      The recovery routine should always run, regardless of whether the file system
      was unmounted normally.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix handling errors got by f2fs_write_inode · 398b1ac5
      Jaegeuk Kim committed
      Ruslan reported that f2fs hangs with an infinite loop in f2fs_sync_file():
      
      	while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0)
      		f2fs_write_inode(inode, NULL);
      
      The cause turned out to be that the cold flag was not set even though this
      inode is a normal file. Therefore, sync_node_pages() skips writing node blocks,
      since it only writes cold node blocks.

      The cold flag is stored in the node_footer of the node block, and whenever a new
      node page is allocated, it is set according to its file type, file or directory.

      But after a sudden power-off, when recovering the inode page, f2fs doesn't
      recover its cold flag.

      So, let's assign the cold flag in the right places.
      
      One more thing:
      If f2fs_write_inode() returns an error for whatever reason, there will
      be no dirty node pages, so sync_node_pages() returns zero.
      (i.e., zero means nothing was written.)
      Reported-by: Ruslan N. Marchenko <me@ruff.mobi>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>