1. 10 Dec, 2009 7 commits
    • exofs: Move all operations to an io_engine · 06886a5a
      Committed by Boaz Harrosh
      In anticipation of multi-device operations, we separate osd operations
      into an abstract I/O API. Currently only one device is used but later
      when adding more devices, we will drive all devices in parallel according
      to a "data_map" that describes how data is arranged on multiple devices.
      The file system level operates, like before, as if there is one object
      (inode-number) and an i_size. The io engine will split this to the same
      object-number but on multiple devices.
      
      At first we introduce the Mirror (raid 1) layout. The final goal is to
      fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over
      mirrored devices, over multiple device-groups, and more.
      See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12
      
      * Define an io_state based API for accessing osd storage devices
        in an abstract way.
        Usage:
      	First a caller allocates an io state with:
      		exofs_get_io_state(struct exofs_sb_info *sbi,
      				   struct exofs_io_state** ios);
      
      	Then calls one of:
      		exofs_sbi_create(struct exofs_io_state *ios);
      		exofs_sbi_remove(struct exofs_io_state *ios);
      		exofs_sbi_write(struct exofs_io_state *ios);
      		exofs_sbi_read(struct exofs_io_state *ios);
      		exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len);
      
      	And when done
      		exofs_put_io_state(struct exofs_io_state *ios);
      
      * Convert all source files to use this new API
      * Convert from bio_alloc to bio_kmalloc
      * In io engine we make use of the now fixed osd_req_decode_sense
      
      There are no functional changes or on disk additions after this patch.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
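The get / operate / put sequence listed in the commit message above can be sketched in user space as follows. This is only an illustration: the struct bodies, the `int`/`void` return types, the stub implementations, and the `demo_write` caller are assumptions made for the sketch and are not taken from the exofs source. Only the function names and parameter lists come from the commit message.

```c
#include <errno.h>
#include <stdlib.h>

/* User-space stand-ins; the real kernel structures are not shown in the commit. */
struct exofs_sb_info { int live_states; };
struct exofs_io_state { struct exofs_sb_info *sbi; int done; };

/* Assumed to return 0 on success or a negative errno. */
int exofs_get_io_state(struct exofs_sb_info *sbi, struct exofs_io_state **ios)
{
    struct exofs_io_state *s = calloc(1, sizeof(*s));

    if (!s) {
        *ios = NULL;
        return -ENOMEM;
    }
    s->sbi = sbi;
    sbi->live_states++;          /* track outstanding io states */
    *ios = s;
    return 0;
}

/* Stub: pretends the write completed; the real function drives the osd device(s). */
int exofs_sbi_write(struct exofs_io_state *ios)
{
    ios->done = 1;
    return 0;
}

/* Releases an io state obtained from exofs_get_io_state(). */
void exofs_put_io_state(struct exofs_io_state *ios)
{
    if (!ios)
        return;
    ios->sbi->live_states--;
    free(ios);
}

/* Hypothetical caller showing the allocate -> operate -> release order. */
int demo_write(struct exofs_sb_info *sbi)
{
    struct exofs_io_state *ios;
    int ret = exofs_get_io_state(sbi, &ios);

    if (ret)
        return ret;
    ret = exofs_sbi_write(ios);
    exofs_put_io_state(ios);     /* always release, even if the operation failed */
    return ret;
}
```

The point of the pattern is that every successful `exofs_get_io_state()` is paired with exactly one `exofs_put_io_state()`, regardless of whether the operation in between succeeded.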
    • exofs: move osd.c to ios.c · 8ce9bdd1
      Committed by Boaz Harrosh
      If I do a "git mv" together with a massive code change
      and commit in one patch, git loses the rename and
      records a delete/new instead. This is bad because I want
      a rename recorded so later rebased/cherry-picked patches
      to the old name will work. Also the --follow is lost.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: statfs blocks is sectors not FS blocks · cae012d8
      Committed by Boaz Harrosh
      Even though exofs has a 4k block size, statfs blocks
      is in sectors (512 bytes).
      
      Also, if the target returns 0 for capacity, make it ULLONG_MAX;
      df does not like zero-size filesystems.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
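A minimal user-space sketch of the conversion described in the statfs commit above. It assumes the target reports capacity in bytes; the function name `demo_statfs_blocks` is hypothetical and not taken from the exofs source.

```c
#include <limits.h>

/*
 * Convert a capacity in bytes to a count of 512-byte sectors.
 * A reported capacity of 0 is replaced with ULLONG_MAX first, so that
 * tools such as df do not see a zero-size filesystem.
 */
unsigned long long demo_statfs_blocks(unsigned long long capacity_bytes)
{
    if (capacity_bytes == 0)
        capacity_bytes = ULLONG_MAX;
    return capacity_bytes >> 9;  /* 512 == 1 << 9 */
}
```

For example, one 4k file-system block corresponds to 8 sectors under this scheme.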
    • exofs: Prints on mount and unmount · 19fe294f
      Committed by Boaz Harrosh
      It is important to print in the logs when a filesystem was
      mounted and eventually unmounted.
      
      Print the osd-device's osd_name and pid the FS was
      mounted/unmounted on.
      
      TODO: How to also print the namespace path the filesystem was
            mounted on?
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: refactor exofs_i_info initialization into common helper · 9cfdc7aa
      Committed by Boaz Harrosh
      There are two places that initialize inodes: exofs_iget() and
      exofs_new_inode()
      
      As more members of exofs_i_info that need initialization are
      added this code will grow. (soon)
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: dbg-print less · fe33cc1e
      Committed by Boaz Harrosh
      Inner-loop printing is converted to EXOFS_DBG2, which is #defined
      to nothing.
      
      It is now almost bearable to just leave debug on. Every operation
      is printed once, with most relevant info (I hope).
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: More sane debug print · 58311c43
      Committed by Boaz Harrosh
      debug prints should be somewhat useful without actually
      reading the source code
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
  2. 24 Sep, 2009 1 commit
    • exofs: remove BKL from super operations · 1ba50bbe
      Committed by Boaz Harrosh
      The two places inside exofs that were taking the BKL were:
      exofs_put_super() - .put_super
      and
      exofs_sync_fs() - which is .sync_fs and is also called from
                        .write_super.
      
      Now exofs_sync_fs() is protected from itself by also taking
      the sb_lock.
      
      exofs_put_super() directly calls exofs_sync_fs() so there is no
      danger between these two either.
      
      In any case, there is absolutely nothing dangerous being done
      inside exofs_sync_fs().
      
      Unless there is some subtle race with the actual lifetime of
      the super_block in regard to .put_super and some other parts
      of the VFS. Which is highly unlikely.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 13 Jul, 2009 1 commit
  4. 21 Jun, 2009 3 commits
  5. 12 Jun, 2009 4 commits
    • exofs: add ->sync_fs · 80e09fb9
      Committed by Christoph Hellwig
      Add a ->sync_fs method for data integrity syncs, and reimplement
      ->write_super on top of it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ->write_super lock_super pushdown · ebc1ac16
      Committed by Christoph Hellwig
      Push down lock_super into ->write_super instances and remove it from the
      caller.
      
      The following filesystems don't need ->s_lock in ->write_super and are skipped:
      
       * bfs, nilfs2 - no other uses of s_lock and have internal locks in
      	->write_super
       * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
       * reiserfs - no other uses of s_lock and has reiserfs_write_lock (BKL) in
      	->write_super
       * xfs - no other uses of s_lock and uses internal lock (buffer lock on
      	superblock buffer) to serialize ->write_super.  Also xfs_fs_write_super
      	is superfluous and will go away in the next merge window
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • push BKL down into ->put_super · 6cfd0148
      Committed by Christoph Hellwig
      Move BKL into ->put_super from the only caller.  A couple of
      filesystems had trivial enough ->put_super (only kfree and NULLing of
      s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
      hugetlbfs, omfs, qnx4, shmem, all others got the full treatment.  Most
      of them probably don't need it, but I'd rather sort that out individually.
      Preferably after all the other BKL pushdowns in that area.
      
      [AV: original used to move lock_super() down as well; these changes are
      removed since we don't do lock_super() at all in generic_shutdown_super()
      now]
      [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove ->write_super call in generic_shutdown_super · 8c85e125
      Committed by Christoph Hellwig
      We just did a full fs writeout using sync_filesystem before, and if
      that's not enough for the filesystem it can perform its own writeout
      in ->put_super, which many filesystems already do.
      
      Move a call to foofs_write_super into every foofs_put_super for now to
      guarantee identical behaviour until it's cleaned up by the individual
      filesystem maintainers.
      
      Exceptions:
      
       - affs already has identical copy & pasted code at the beginning of
         affs_put_super so no need to do it twice.
       - xfs does the right thing without it and I have changes pending for
         the xfs tree touching this area so I don't really need conflicts
         here.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 10 Jun, 2009 3 commits
  7. 11 May, 2009 1 commit
    • block: add rq->resid_len · c3a4d78c
      Committed by Tejun Heo
      rq->data_len served two purposes - the length of data buffer on issue
      and the residual count on completion.  This duality creates some
      headaches.
      
      First of all, block layer and low level drivers can't really determine
      what rq->data_len contains while a request is executing.  It could be
      the total request length or it could be anything else one of the
      lower layers is using to keep track of residual count.  This
      complicates things because blk_rq_bytes() and thus
      [__]blk_end_request_all() relies on rq->data_len for PC commands.
      Drivers which want to report residual count should first cache the
      total request length, update rq->data_len and then complete the
      request with the cached data length.
      
      Secondly, it makes requests default to reporting full residual count,
      ie. reporting that no data transfer occurred.  The residual count is
      an exception not the norm; however, the driver should clear
      rq->data_len to zero to signify the normal cases while leaving it
      alone means no data transfer occurred at all.  This reverse default
      behavior complicates code unnecessarily and renders block PC on some
      drivers (ide-tape/floppy) unusable.
      
      This patch adds rq->resid_len which is used only for residual count.
      
      While at it, remove now unnecessary blk_rq_bytes() caching in
      ide_pc_intr() as rq->data_len is not changed anymore.
      
      Boaz	: spotted missing conversion in osd
      Sergei	: spotted too early conversion to blk_rq_bytes() in ide-tape
      
      [ Impact: cleanup residual count handling, report 0 resid by default ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Darrick J. Wong <djwong@us.ibm.com>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
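The separation of issue length and residual count described in the commit above can be illustrated with a small user-space model. The struct and function names here (`mock_request`, `mock_complete`, `mock_bytes_transferred`) are hypothetical and only mirror the two fields discussed; this is not the kernel's `struct request`.

```c
/* Hypothetical stand-in for the two request fields discussed above. */
struct mock_request {
    unsigned int data_len;   /* length of the data buffer on issue; left untouched */
    unsigned int resid_len;  /* residual count on completion; 0 by default */
};

/* A driver that transferred less than requested reports only the shortfall. */
void mock_complete(struct mock_request *rq, unsigned int resid)
{
    rq->resid_len = resid;   /* data_len is not modified */
}

/* Bytes actually transferred: issued length minus residual. */
unsigned int mock_bytes_transferred(const struct mock_request *rq)
{
    return rq->data_len - rq->resid_len;
}
```

With a dedicated residual field, a driver that never touches `resid_len` reports a full transfer by default, which is the "report 0 resid by default" behaviour named in the Impact line.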
  8. 01 Apr, 2009 8 commits
    • exofs: Documentation · 214c8adb
      Committed by Boaz Harrosh
      Added some documentation in exofs.txt, as well as a BUGS file.
      
      For further reading, operation instructions, example scripts
      and up-to-date information and code, please see:
      http://open-osd.org
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: export_operations · 8cf74b39
      Committed by Boaz Harrosh
      Implement export_operations and set them in the superblock.
      It is now possible to export exofs via NFS.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: super_operations and file_system_type · ba9e5e98
      Committed by Boaz Harrosh
      This patch ties all operation vectors into a file system superblock
      and registers the exofs file_system_type at module's load time.
      
      * The file system control block (AKA on-disk superblock) resides in
        an object with a special ID (defined in common.h).
        Information included in the file system control block is used to
        fill the in-memory superblock structure at mount time. This object
        is created by mkexofs.c before the file system is used. It contains
        information such as:
      	- The file system's magic number
      	- The next inode number to be allocated
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: dir_inode and directory operations · e6af00f1
      Committed by Boaz Harrosh
      Implementation of directory and inode operations.
      
      * A directory is treated as a file, and essentially contains a list
        of <file name, inode #> pairs for files that are found in that
        directory. The object IDs correspond to the files' inode numbers
        and are allocated using a 64bit incrementing global counter.
      * Each file's control block (AKA on-disk inode) is stored in its
        object's attributes. This applies to both regular files and other
        types (directories, device files, symlinks, etc.).
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
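The directory layout described in the commit above (a list of <file name, inode #> pairs, with inode numbers drawn from a 64-bit incrementing counter) can be modelled roughly as below. All names and the fixed 32-byte name field are assumptions for illustration; the actual on-disk directory entry format of exofs is not given in the commit message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory directory entry: one <file name, inode #> pair. */
struct demo_dirent {
    char     name[32];
    uint64_t ino;            /* also serves as the object ID of the file */
};

/* Hypothetical file-system state holding the next inode number to allocate. */
struct demo_fs {
    uint64_t next_ino;
};

/* Hand out inode numbers from a 64-bit incrementing counter. */
uint64_t demo_alloc_ino(struct demo_fs *fs)
{
    return fs->next_ino++;
}

/* Fill a directory entry for a new file, allocating its inode number. */
void demo_add_entry(struct demo_fs *fs, struct demo_dirent *de, const char *name)
{
    strncpy(de->name, name, sizeof(de->name) - 1);
    de->name[sizeof(de->name) - 1] = '\0';   /* always NUL-terminate */
    de->ino = demo_alloc_ino(fs);
}
```

Because the inode number doubles as the object ID, looking up a name in a directory is enough to locate the object that holds the file's data and attributes.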
    • exofs: address_space_operations · beaec07b
      Committed by Boaz Harrosh
      Now we start to read from and write to osd-objects. We try to collect
      as many contiguous pages as possible in a single write/read.
      The first page index is the object's offset.
      
      TODO:
         On 64-bit, a single bio can carry at most 128 pages.
         Add support for chaining multiple bios.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: symlink_inode and fast_symlink_inode operations · 982980d7
      Committed by Boaz Harrosh
      Generic implementation of symlink ops.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: file and file_inode operations · e8062719
      Committed by Boaz Harrosh
      Implementation of the file_operations and inode_operations for
      regular data files.
      
      Most file_operations are generic vfs implementations except:
      - exofs_truncate will truncate the OSD object as well
      - The generic file_fsync is not suitable for non-block devices, so open-code it
      - The default for .flush in Linux is to do nothing, so call exofs_fsync
        on the file.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • exofs: Kbuild, Headers and osd utils · b14f8ab2
      Committed by Boaz Harrosh
      This patch includes osd infrastructure that will be used later by
      the file system.
      
      Also the declarations of constants, on disk structures,
      and prototypes.
      
      And the Kbuild+Kconfig files needed to build the exofs module.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>