1. 28 Feb, 2010 1 commit
  2. 05 Jan, 2010 1 commit
    • exofs: simple_write_end does not mark_inode_dirty · efd124b9
      Committed by Boaz Harrosh
      exofs uses simple_write_end() for its .write_end handler. But
      that is not enough, because simple_write_end() does not call
      mark_inode_dirty() when it extends i_size. So even if we do
      call mark_inode_dirty() at the beginning of write-out, with a very
      long IO and a saturated system we might get .write_inode()
      called while still extend-writing to the file, and miss out on the
      last i_size updates.
      
      So override .write_end, call simple_write_end(), and afterwards, if
      i_size was changed, call mark_inode_dirty().
      
      It stands to reason that since simple_write_end() is the one extending
      i_size, it should also call mark_inode_dirty(). But it looks like all
      users of simple_write_end() are memory-bound pseudo filesystems, which
      could not care less about mark_inode_dirty(). I might submit a
      warning-comment patch to simple_write_end() in the future.
      
      CC: Stable <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
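The override described in this commit can be sketched in user space. This is a minimal model under stated assumptions, not the kernel code: `struct toy_inode` and every `toy_*` function are hypothetical stand-ins for `struct inode`, `mark_inode_dirty()`, `simple_write_end()` and the exofs handler, keeping only the fields the argument needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct inode: only the two
 * fields this sketch needs. */
struct toy_inode {
	long long i_size;	/* current file size in bytes */
	bool dirty;		/* set once the inode needs writing back */
};

/* Models mark_inode_dirty(): flag the inode for a later write-back. */
static void toy_mark_inode_dirty(struct toy_inode *inode)
{
	inode->dirty = true;
}

/* Models the behaviour the commit attributes to simple_write_end():
 * it extends i_size when the write ends past EOF, but never marks the
 * inode dirty. Returns the number of bytes copied. */
static int toy_simple_write_end(struct toy_inode *inode, long long pos,
				unsigned copied)
{
	long long last_pos = pos + copied;

	if (last_pos > inode->i_size)
		inode->i_size = last_pos;
	return (int)copied;
}

/* The override: delegate to the generic helper, then dirty the inode
 * only if i_size actually moved. */
static int toy_exofs_write_end(struct toy_inode *inode, long long pos,
			       unsigned copied)
{
	long long old_size = inode->i_size;
	int ret = toy_simple_write_end(inode, pos, copied);

	if (old_size != inode->i_size)
		toy_mark_inode_dirty(inode);
	return ret;
}
```

A write that stays inside the current file size leaves the inode clean. Only an extending write sets the dirty flag, so the new i_size is picked up by the next write-back.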
  3. 10 Dec, 2009 5 commits
    • exofs: Multi-device mirror support · 04dc1e88
      Committed by Boaz Harrosh
      This patch changes the on-disk format; it is accompanied by a parallel
      patch to mkfs.exofs that enables multi-device capabilities.
      
      After this patch, old exofs will refuse to mount a newly formatted FS
      and new exofs will refuse an old format. This is done by moving the
      magic field offset inside the FSCB. A new FSCB *version* field was
      added. In the future, exofs will refuse to mount a mismatched FSCB
      version. To upgrade or downgrade an exofs filesystem, one must use the
      mkfs.exofs --upgrade option before mounting.
      
      Introduced a new object that contains a *device-table*. This object
      contains the default *data-map* and a linear array of device
      information, which identifies the devices used in the filesystem. This
      object is only written to offline, by mkfs.exofs. This is why it is kept
      separate from the FSCB, since the latter is written to while mounted.
      
      The same partition number and the same object number are used on all
      devices; only the device varies.
      
      * Define the new format, then load the device table at mount time and
        make sure everything is supported.
      
      * Change the I/O engine to support Mirror IO, i.e. write the same data
        to multiple devices, read from a random device to spread the
        read-load from multiple clients (TODO: stripe read)
      
      Implementation notes:
       A few points introduced in the previous patch should be mentioned here:
      
      * Special care was taken so that absolutely all operations that have any
        chance of failing are done before any osd-request is executed. This
        limits the need for data-consistency recovery to real IO errors
        only.
      
      * Each IO state has a kref. It starts at 1, and every osd-request
        executed increments the kref; finally, when all are executed, the
        first ref is dropped. At IO-done, each request completion decrements
        the kref, and the last one to return executes the internal _last_io()
        routine. _last_io() will call the registered io_state_done. In sync
        mode a caller does not supply a done method, indicating a synchronous
        request; the caller is put to sleep and a special io_state_done is
        registered that will awaken the caller. Even in sync mode, though,
        all operations are executed in parallel.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
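The reference-counting scheme from the implementation notes can be illustrated with a small user-space model. This is a sketch under stated assumptions: it uses a C11 atomic counter in place of the kernel's kref, and all `toy_*` names are hypothetical. It shows the counting discipline only, not the exofs io engine.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_io_state;
typedef void (*toy_done_fn)(struct toy_io_state *ios);

/* Hypothetical stand-in for an IO state with an embedded refcount. */
struct toy_io_state {
	atomic_int kref;	/* starts at 1: the submitter's own reference */
	toy_done_fn done;	/* run once, by whoever drops the last reference */
	int done_calls;		/* bookkeeping for this sketch only */
};

static void toy_ios_init(struct toy_io_state *ios, toy_done_fn done)
{
	atomic_init(&ios->kref, 1);
	ios->done = done;
	ios->done_calls = 0;
}

/* Drop one reference; the last one out runs the completion, which is
 * the role the notes give to _last_io(). */
static void toy_ios_put(struct toy_io_state *ios)
{
	if (atomic_fetch_sub(&ios->kref, 1) == 1)
		ios->done(ios);
}

/* Take a reference on behalf of one request about to be executed. */
static void toy_ios_request_start(struct toy_io_state *ios)
{
	atomic_fetch_add(&ios->kref, 1);
}

/* Called from each request's completion path. */
static void toy_ios_request_done(struct toy_io_state *ios)
{
	toy_ios_put(ios);
}

/* Called by the submitter once every request has been executed:
 * drops the initial reference taken at init time. */
static void toy_ios_submit_finished(struct toy_io_state *ios)
{
	toy_ios_put(ios);
}

/* Example completion callback: just counts how often it ran. */
static void toy_count_done(struct toy_io_state *ios)
{
	ios->done_calls++;
}
```

Because the submitter holds its own reference until all requests are issued, an early completion cannot trigger the done callback while later requests are still being set up.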
    • exofs: Move all operations to an io_engine · 06886a5a
      Committed by Boaz Harrosh
      In anticipation of multi-device operations, we separate osd operations
      into an abstract I/O API. Currently only one device is used, but later,
      when adding more devices, we will drive all devices in parallel according
      to a "data_map" that describes how data is arranged on multiple devices.
      The file system level operates, like before, as if there is one object
      (inode-number) and an i_size. The io engine will split this to the same
      object-number but on multiple devices.
      
      At first we introduce the Mirror (raid 1) layout. But ultimately
      we intend to fully implement the pNFS-Objects data-map, including
      raid 0,4,5,6 over mirrored devices, over multiple device-groups. And
      more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12
      
      * Define an io_state based API for accessing osd storage devices
        in an abstract way.
        Usage:
      	First a caller allocates an io state with:
      		exofs_get_io_state(struct exofs_sb_info *sbi,
      				   struct exofs_io_state** ios);
      
      	Then calls one of:
      		exofs_sbi_create(struct exofs_io_state *ios);
      		exofs_sbi_remove(struct exofs_io_state *ios);
      		exofs_sbi_write(struct exofs_io_state *ios);
      		exofs_sbi_read(struct exofs_io_state *ios);
      		exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len);
      
      	And when done
      		exofs_put_io_state(struct exofs_io_state *ios);
      
      * Convert all source files to use this new API
      * Convert from bio_alloc to bio_kmalloc
      * In io engine we make use of the now fixed osd_req_decode_sense
      
      There are no functional changes or on-disk additions after this patch.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
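The allocate, operate, release sequence listed under "Usage" in this commit can be sketched as follows. This is a user-space model under stated assumptions: the `toy_*` types and functions are hypothetical and only mirror the shape of the prototypes quoted in the commit message; the real exofs structures and their behaviour are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the superblock info and the io state. */
struct toy_sb_info {
	int live_states;		/* io states currently allocated */
};

struct toy_io_state {
	struct toy_sb_info *sbi;
	unsigned long long length;	/* bytes the caller wants to transfer */
};

/* Step 1: allocate an io state bound to a superblock. */
static int toy_get_io_state(struct toy_sb_info *sbi,
			    struct toy_io_state **pios)
{
	struct toy_io_state *ios = calloc(1, sizeof(*ios));

	if (!ios) {
		*pios = NULL;
		return -ENOMEM;
	}
	ios->sbi = sbi;
	sbi->live_states++;
	*pios = ios;
	return 0;
}

/* Step 2: one of the operations. This toy version always succeeds;
 * a real engine would issue requests to the storage devices here. */
static int toy_sbi_write(struct toy_io_state *ios)
{
	(void)ios;
	return 0;
}

/* Step 3: release the io state when done. */
static void toy_put_io_state(struct toy_io_state *ios)
{
	if (!ios)
		return;
	ios->sbi->live_states--;
	free(ios);
}

/* A caller following the get / operate / put sequence. */
static int toy_write_object(struct toy_sb_info *sbi,
			    unsigned long long length)
{
	struct toy_io_state *ios;
	int ret = toy_get_io_state(sbi, &ios);

	if (ret)
		return ret;
	ios->length = length;
	ret = toy_sbi_write(ios);
	toy_put_io_state(ios);	/* released on success and on failure */
	return ret;
}
```

Keeping the put on the single exit path means an io state is never leaked, whatever the operation returns.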
    • exofs: refactor exofs_i_info initialization into common helper · 9cfdc7aa
      Committed by Boaz Harrosh
      There are two places that initialize inodes: exofs_iget() and
      exofs_new_inode().
      
      As more members of exofs_i_info that need initialization are
      added (soon), this code will grow.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: dbg-print less · fe33cc1e
      Committed by Boaz Harrosh
      Inner-loop printing is converted to EXOFS_DBG2, which is #defined
      to nothing.
      
      It is now almost bearable to just leave debug on. Every operation
      is printed once, with the most relevant info (I hope).
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: More sane debug print · 58311c43
      Committed by Boaz Harrosh
      Debug prints should be somewhat useful without actually
      reading the source code.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
  4. 21 Jun, 2009 2 commits
  5. 10 Jun, 2009 2 commits
  6. 01 Apr, 2009 4 commits
    • exofs: super_operations and file_system_type · ba9e5e98
      Committed by Boaz Harrosh
      This patch ties all operation vectors into a file system superblock
      and registers the exofs file_system_type at module load time.
      
      * The file system control block (AKA on-disk superblock) resides in
        an object with a special ID (defined in common.h).
        Information included in the file system control block is used to
        fill the in-memory superblock structure at mount time. This object
        is created by mkexofs.c before the file system is used. It contains
        information such as:
      	- The file system's magic number
      	- The next inode number to be allocated
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: dir_inode and directory operations · e6af00f1
      Committed by Boaz Harrosh
      Implementation of directory and inode operations.
      
      * A directory is treated as a file, and essentially contains a list
        of <file name, inode #> pairs for files that are found in that
        directory. The object IDs correspond to the files' inode numbers
        and are allocated using a 64bit incrementing global counter.
      * Each file's control block (AKA on-disk inode) is stored in its
        object's attributes. This applies to both regular files and other
        types (directories, device files, symlinks, etc.).
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: address_space_operations · beaec07b
      Committed by Boaz Harrosh
      OK, now we start to read from and write to osd-objects. We try to
      collect as many contiguous pages as possible in a single write/read.
      The first page index is the object's offset.
      
      TODO:
         On 64-bit, a single bio can carry at most 128 pages.
         Add support for chaining multiple bios.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
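The idea of collecting contiguous pages into a single read or write can be sketched in user space. This is a minimal model under stated assumptions: `toy_collect_runs()`, `struct toy_io_run` and the 4096-byte page size are hypothetical choices for illustration, and the 128-page cap is taken from the TODO note above rather than from the actual code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE		4096UL	/* assumed page size */
#define TOY_MAX_PAGES_PER_IO	128	/* per-bio limit quoted in the TODO */

/* One IO covering a run of pages with consecutive indexes. */
struct toy_io_run {
	unsigned long first_index;	/* page index of the first page */
	size_t nr_pages;		/* number of consecutive pages */
};

/* Split a sorted array of page indexes into runs of consecutive pages,
 * starting a new run at every gap or when the per-IO cap is reached.
 * Returns the number of runs written, at most max_runs. */
static size_t toy_collect_runs(const unsigned long *idx, size_t n,
			       struct toy_io_run *runs, size_t max_runs)
{
	size_t nr_runs = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_io_run *cur = nr_runs ? &runs[nr_runs - 1] : NULL;

		if (cur && idx[i] == cur->first_index + cur->nr_pages &&
		    cur->nr_pages < TOY_MAX_PAGES_PER_IO) {
			cur->nr_pages++;	/* extends the current run */
			continue;
		}
		if (nr_runs == max_runs)
			break;			/* caller's array is full */
		runs[nr_runs].first_index = idx[i];
		runs[nr_runs].nr_pages = 1;
		nr_runs++;
	}
	return nr_runs;
}

/* Byte offset of a run inside the object: the first page index scaled
 * by the page size. */
static unsigned long long toy_run_offset(const struct toy_io_run *run)
{
	return (unsigned long long)run->first_index * TOY_PAGE_SIZE;
}
```

In this model a gap in the page indexes or a full run both start a new IO, which is where chaining multiple bios would come in.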
    • exofs: file and file_inode operations · e8062719
      Committed by Boaz Harrosh
      Implementation of the file_operations and inode_operations for
      regular data files.
      
      Most file_operations are generic vfs implementations except:
      - exofs_truncate will truncate the OSD object as well
      - Generic file_fsync is not good for non-bd devices, so open code it
      - The default for .flush in Linux is to do nothing, so call exofs_fsync
        on the file.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>