1. 11 12月, 2012 17 次提交
    • J
      f2fs: fix endian conversion bugs reported by sparse · 25ca923b
      Jaegeuk Kim 提交于
      This patch should resolve the bugs reported by the sparse tool.
      Initial reports were written by "kbuild test robot" managed by fengguang.wu.
      
      In my local machines, I've tested also by running:
      > make C=2 CF="-D__CHECK_ENDIAN__"
      
      Accordingly, I've found lots of warnings and bugs related to the endian
      conversion. And I've fixed all at this moment.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      25ca923b
    • S
      f2fs: remove unneeded version.h header file from f2fs.h · cf0e3a64
      Sachin Kamat 提交于
      Including <linux/version.h> is not necessary.
      Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org>
      cf0e3a64
    • J
      f2fs: update Kconfig and Makefile · a14d5393
      Jaegeuk Kim 提交于
      This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files
      in the fs directory.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      a14d5393
    • G
      f2fs: move proc files to debugfs · 902829aa
      Greg Kroah-Hartman 提交于
      This moves all of the f2fs debugging files into debugfs. The files are
      located in /sys/kernel/debug/f2fs/
      
      Note, I think we are generating all of the same information in each of
      the files for every unique f2fs filesystem in the machine.  This copies
      the functionality that was present in the proc files, but this should be
      fixed up in the future.
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [jaegeuk.kim@samsung.com: merged 3 debugfs entries into a *status* entry]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      902829aa
    • J
      f2fs: add recovery routines for roll-forward · d624c96f
      Jaegeuk Kim 提交于
      This adds roll-forward routines to recover fsynced data.
      
      - F2FS uses basically roll-back model with checkpointing.
      
      - In order to implement fsync(), there are two approaches as follows.
      
      1. A roll-back model with checkpointing at every fsync()
       : This is a naive method, but suffers from very low performance.
      
      2. A roll-forward model
       : F2FS adopts this model where all the fsynced data should be recovered, which
         were written after checkpointing was done. In order to figure out the data,
         F2FS keeps a "fsync" mark in direct node blocks. In addition, F2FS remains
         the location of next node block in each direct node block for reconstructing
         the chain of node blocks during the recovery.
      
      - In order to enhance the performance, F2FS keeps a "dentry" mark also in direct
        node blocks. If this is set during the recovery, F2FS replays adding a dentry.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      d624c96f
    • J
      f2fs: add garbage collection functions · 7bc09003
      Jaegeuk Kim 提交于
      This adds on-demand and background cleaning functions.
      
      - The basic background cleaning policy is trying to do cleaning jobs as much as
        possible whenever the system is idle. Once the background cleaning is done,
        the cleaner sleeps an amount of time not to interfere with VFS calls. The time
        is dynamically adjusted according to the status of whole segments, which is
        decreased when the following conditions are satisfied.
      
        . GC is not conducted currently, and
        . IO subsystem is idle by checking the number of requets in bdev's request
           list, and
        . There are enough dirty segments.
      
        Otherwise, the time is increased incrementally until to the maximum time.
        Note that, min and max times are 10 secs and 30 secs by default.
      
      - F2FS adopts a default victim selection policy where background cleaning uses
        a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
      
      - The method of moving data during the cleaning is slightly different between
        background and on-demand cleaning schemes. In the case of background cleaning,
        F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
        will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
        move the data right away.
      
      - In order to identify valid blocks in a victim segment, F2FS scans the bitmap
        of the segment managed as an SIT entry.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      7bc09003
    • J
      f2fs: add xattr and acl functionalities · af48b85b
      Jaegeuk Kim 提交于
      This implements xattr and acl functionalities.
      
      - F2FS uses a node page to contain use extended attributes.
      Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      af48b85b
    • J
      f2fs: add core directory operations · 6b4ea016
      Jaegeuk Kim 提交于
      this adds core functions to find, add, delete, and link dentries.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      6b4ea016
    • J
      f2fs: add inode operations for special inodes · 57397d86
      Jaegeuk Kim 提交于
      This adds inode operations for directory, symlink, and special inodes.
      Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      57397d86
    • J
      f2fs: add core inode operations · 19f99cee
      Jaegeuk Kim 提交于
      This adds core functions to get, read, write, and evict an inode.
      Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      19f99cee
    • J
      f2fs: add address space operations for data · eb47b800
      Jaegeuk Kim 提交于
      This adds address space operations for data.
      
      - F2FS supports readpages(), writepages(), and direct_IO().
      
      - Because of out-of-place writes, f2fs_direct_IO() does not write data in place.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      eb47b800
    • J
      f2fs: add file operations · fbfa2cc5
      Jaegeuk Kim 提交于
      This adds memory operations and file/file_inode operations.
      
      - F2FS supports fallocate(), mmap(), fsync(), and basic ioctl().
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      fbfa2cc5
    • J
      f2fs: add segment operations · 351df4b2
      Jaegeuk Kim 提交于
      This adds specific functions not only to manage dirty/free segments, SIT pages,
      a cache for SIT entries, and summary entries, but also to allocate free blocks
      and write three types of pages: data, node, and meta.
      
      - F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
        and dirty segments respectively.
      
      - The key information of an SIT entry consists of a segment number, the number
        of valid blocks in the segment, a bitmap to identify there-in valid or invalid
        blocks.
      
      - An SIT page is composed of a certain range of SIT entries, which is maintained
        by the address space of meta_inode.
      
      - To cache SIT entries, a simple array is used. The index for the array is the
        segment number.
      
      - A summary entry for data contains the parent node information. A summary entry
        for node contains its node offset from the inode.
      
      - F2FS manages information about six active logs and those summary entries in
        memory. Whenever one of them is changed, its summary entries are flushed to
        its SIT page maintained by the address space of meta_inode.
      
      - This patch adds a default block allocation function which supports heap-based
        allocation policy.
      
      - This patch adds core functions to write data, node, and meta pages. Since LFS
        basically produces a series of sequential writes, F2FS merges sequential bios
        with a single one as much as possible to reduce the IO scheduling overhead.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      351df4b2
    • J
      f2fs: add node operations · e05df3b1
      Jaegeuk Kim 提交于
      This adds specific functions to manage NAT pages, a cache for NAT entries, free
      nids, direct/indirect node blocks for indexing data, and address space for node
      pages.
      
      - The key information of an NAT entry consists of a node id and a block address.
      
      - An NAT page is composed of block addresses covered by a certain range of NAT
        entries, which is maintained by the address space of meta_inode.
      
      - A radix tree structure is used to cache NAT entries. The index for the tree
        is a node id.
      
      - When there is no free nid, F2FS should scan NAT entries to find new one. In
        order to avoid scanning frequently, F2FS manages a list containing a number of
        free nids in memory. Only when free nids in the list are exhausted, scanning
        process, build_free_nids(), is triggered.
      
      - F2FS has direct and indirect node blocks for indexing data. This patch adds
        fuctions related to the node block management such as getting, allocating, and
        truncating node blocks to index data.
      
      - In order to cache node blocks in memory, F2FS has a node_inode with an address
        space for node pages. This patch also adds the address space operations for
        node_inode.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      e05df3b1
    • J
      f2fs: add checkpoint operations · 127e670a
      Jaegeuk Kim 提交于
      This adds functions required by the checkpoint operations.
      
      Basically, f2fs adopts a roll-back model with checkpoint blocks written in the
      CP area. The checkpoint procedure includes as follows.
      
      - write_checkpoint()
      1. block_operations() freezes VFS calls.
      2. submit cached bios.
      3. flush_nat_entries() writes NAT pages updated by dirty NAT entries.
      4. flush_sit_entries() writes SIT pages updated by dirty SIT entries.
      5. do_checkpoint() writes,
        - checkpoint block (#0)
        - orphan inode blocks
        - summary blocks made by active logs
        - checkpoint block (copy of #0)
      6. unblock_opeations()
      
      In order to provide an address space for meta pages, f2fs_sb_info has a special
      inode, namely meta_inode. This patch also adds the address space operations for
      meta_inode.
      Signed-off-by: NChul Lee <chur.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      127e670a
    • J
      f2fs: add super block operations · aff063e2
      Jaegeuk Kim 提交于
      This adds the implementation of superblock operations for f2fs, which includes
      - init_f2fs_fs/exit_f2fs_fs
      - f2fs_mount
      - super_operations of f2fs
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      aff063e2
    • J
      f2fs: add superblock and major in-memory structure · 39a53e0c
      Jaegeuk Kim 提交于
      This adds the following major in-memory structures in f2fs.
      
      - f2fs_sb_info:
        contains f2fs-specific information, two special inode pointers for node and
        meta address spaces, and orphan inode management.
      
      - f2fs_inode_info:
        contains vfs_inode and other fs-specific information.
      
      - f2fs_nm_info:
        contains node manager information such as NAT entry cache, free nid list,
        and NAT page management.
      
      - f2fs_node_info:
        represents a node as node id, inode number, block address, and its version.
      
      - f2fs_sm_info:
        contains segment manager information such as SIT entry cache, free segment
        map, current active logs, dirty segment management, and segment utilization.
        The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
        curseg_info.
      
      In addition, add F2FS_SUPER_MAGIC in magic.h.
      Signed-off-by: NChul Lee <chur.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      39a53e0c
  2. 09 12月, 2012 1 次提交
    • L
      vfs: fix O_DIRECT read past end of block device · 684c9aae
      Linus Torvalds 提交于
      The direct-IO write path already had the i_size checks in mm/filemap.c,
      but it turns out the read path did not, and removing the block size
      checks in fs/block_dev.c (commit bbec0270: "blkdev_max_block: make
      private to fs/buffer.c") removed the magic "shrink IO to past the end of
      the device" code there.
      
      Fix it by truncating the IO to the size of the block device, like the
      write path already does.
      
      NOTE! I suspect the write path would be *much* better off doing it this
      way in fs/block_dev.c, rather than hidden deep in mm/filemap.c.  The
      mm/filemap.c code is extremely hard to follow, and has various
      conditionals on the target being a block device (ie the flag passed in
      to 'generic_write_checks()', along with a conditional update of the
      inode timestamp etc).
      
      It is also quite possible that we should treat this whole block device
      size as a "s_maxbytes" issue, and try to make the logic even more
      generic.  However, in the meantime this is the fairly minimal targeted
      fix.
      
      Noted by Milan Broz thanks to a regression test for the cryptsetup
      reencrypt tool.
      Reported-and-tested-by: NMilan Broz <mbroz@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      684c9aae
  3. 06 12月, 2012 1 次提交
  4. 05 12月, 2012 1 次提交
    • L
      vfs: avoid "attempt to access beyond end of device" warnings · 57302e0d
      Linus Torvalds 提交于
      The block device access simplification that avoided accessing the (racy)
      block size information (commit bbec0270: "blkdev_max_block: make
      private to fs/buffer.c") no longer checks the maximum block size in the
      block mapping path.
      
      That was _almost_ as simple as just removing the code entirely, because
      the readers and writers all check the size of the device anyway, so
      under normal circumstances it "just worked".
      
      However, the block size may be such that the end of the device may
      straddle one single buffer_head.  At which point we may still want to
      access the end of the device, but the buffer we use to access it
      partially extends past the end.
      
      The 'bd_set_size()' function intentionally sets the block size to avoid
      this, but mounting the device - or setting the block size by hand to
      some other value - can modify that block size.
      
      So instead, teach 'submit_bh()' about the special case of the buffer
      head straddling the end of the device, and turning such an access into a
      smaller IO access, avoiding the problem.
      
      This, btw, also means that unlike before, we can now access the whole
      device regardless of device block size setting.  So now, even if the
      device size is only 512-byte aligned, we can read and write even the
      last sector even when having a much bigger block size for accessing the
      rest of the device.
      
      So with this, we could now get rid of the 'bd_set_size()' block size
      code entirely - resulting in faster IO for the common case - but that
      would be a separate patch.
      Reported-and-tested-by: NRomain Francoise <romain@orebokech.com>
      Reporeted-and-tested-by: NMeelis Roos <mroos@linux.ee>
      Reported-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57302e0d
  5. 30 11月, 2012 9 次提交
  6. 29 11月, 2012 1 次提交
  7. 28 11月, 2012 1 次提交
  8. 27 11月, 2012 2 次提交
    • J
      writeback: put unused inodes to LRU after writeback completion · 4eff96dd
      Jan Kara 提交于
      Commit 169ebd90 ("writeback: Avoid iput() from flusher thread")
      removed iget-iput pair from inode writeback.  As a side effect, inodes
      that are dirty during iput_final() call won't be ever added to inode LRU
      (iput_final() doesn't add dirty inodes to LRU and later when the inode
      is cleaned there's noone to add the inode there).  Thus inodes are
      effectively unreclaimable until someone looks them up again.
      
      The practical effect of this bug is limited by the fact that inodes are
      pinned by a dentry for long enough that the inode gets cleaned.  But
      still the bug can have nasty consequences leading up to OOM conditions
      under certain circumstances.  Following can easily reproduce the
      problem:
      
        for (( i = 0; i < 1000; i++ )); do
          mkdir $i
          for (( j = 0; j < 1000; j++ )); do
            touch $i/$j
            echo 2 > /proc/sys/vm/drop_caches
          done
        done
      
      then one needs to run 'sync; ls -lR' to make inodes reclaimable again.
      
      We fix the issue by inserting unused clean inodes into the LRU after
      writeback finishes in inode_sync_complete().
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reported-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>		[3.5+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4eff96dd
    • S
      proc: check vma->vm_file before dereferencing · 05f56484
      Stanislav Kinsbursky 提交于
      Commit 7b540d06 ("proc_map_files_readdir(): don't bother with
      grabbing files") switched proc_map_files_readdir() to use @f_mode
      directly instead of grabbing @file reference, but same time the test for
      @vm_file presence was lost leading to nil dereference.  The patch brings
      the test back.
      
      The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped
      (which is set to 'n' by default) so the bug doesn't affect regular
      kernels.
      
      The regression is 3.7-rc1 only as far as I can tell.
      
      [gorcunov@openvz.org: provided changelog]
      Signed-off-by: NStanislav Kinsbursky <skinsbursky@parallels.com>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05f56484
  9. 23 11月, 2012 1 次提交
    • J
      jbd: Fix lock ordering bug in journal_unmap_buffer() · 25389bb2
      Jan Kara 提交于
      Commit 09e05d48 introduced a wait for transaction commit into
      journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
      in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
      we forgot to drop buffer lock before waiting for transaction commit and thus
      deadlock is possible when kjournald wants to lock the buffer.
      
      Fix the problem by dropping the buffer lock before waiting for transaction
      commit. Since we are still holding page lock (and that is OK), buffer cannot
      disappear under us.
      
      CC: stable@vger.kernel.org # Wherever commit 09e05d48 was taken
      Signed-off-by: NJan Kara <jack@suse.cz>
      25389bb2
  10. 20 11月, 2012 5 次提交
  11. 19 11月, 2012 1 次提交