1. 28 Feb 2010, 1 commit
  2. 10 Dec 2009, 3 commits
      exofs: Multi-device mirror support · 04dc1e88
      Committed by Boaz Harrosh
      This patch changes the on-disk format. It is accompanied by a parallel
      patch to mkfs.exofs that enables multi-device capabilities.
      
      After this patch, old exofs will refuse to mount a newly formatted FS,
      and new exofs will refuse an old format. This is done by moving the
      magic field offset inside the FSCB. A new FSCB *version* field was
      added. From now on, exofs will refuse to mount an FSCB with a
      mismatched version. To upgrade or downgrade an exofs, one must run
      mkfs.exofs with the --upgrade option before mounting.
      
      Introduce a new object that contains a *device-table*. This object
      contains the default *data-map* and a linear array of device
      information, which identifies the devices used in the filesystem. This
      object is only written offline, by mkfs.exofs. This is why it is kept
      separate from the FSCB, since the latter is written to while mounted.
      
      The same partition number and the same object number are used on all
      devices; only the device varies.
      
      * Define the new format, then load the device table at mount time and
        make sure everything is supported.
      
      * Change the I/O engine to support Mirror IO, i.e. write the same data
        to multiple devices, and read from a random device to spread the
        read load from multiple clients (TODO: stripe read).
      
      Implementation notes:
       A few points introduced in the previous patch should be mentioned here:
      
      * Special care was taken so that absolutely all operations that have
        any chance of failing are done before any osd-request is executed.
        This limits the need for data consistency recovery to real IO
        errors only.
      
      * Each IO state has a kref. It starts at 1; every osd-request executed
        increments the kref, and when all have been executed the first ref
        is dropped. At IO-done, each request completion decrements the kref,
        and the last one to return executes the internal _last_io() routine.
        _last_io() calls the registered io_state_done. In sync mode the
        caller does not supply a done method, indicating a synchronous
        request; the caller is put to sleep and a special io_state_done is
        registered that will awaken the caller. Even in sync mode, all
        operations are executed in parallel.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      exofs: Move all operations to an io_engine · 06886a5a
      Committed by Boaz Harrosh
      In anticipation of multi-device operations, we separate osd operations
      into an abstract I/O API. Currently only one device is used, but later,
      when more devices are added, we will drive all devices in parallel
      according to a "data_map" that describes how data is arranged on
      multiple devices. The file system level operates, as before, as if
      there is one object (inode-number) and an i_size. The io engine will
      split this into the same object-number on multiple devices.
      
      At first we introduce the Mirror (raid 1) layout. Ultimately, we intend
      to fully implement the pNFS-Objects data-map, including raid 0,4,5,6
      over mirrored devices, over multiple device-groups, and more.
      See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12
      
      * Define an io_state based API for accessing osd storage devices
        in an abstract way.
        Usage:
      	First a caller allocates an io state with:
      		exofs_get_io_state(struct exofs_sb_info *sbi,
      				   struct exofs_io_state** ios);
      
      	Then calls one of:
      		exofs_sbi_create(struct exofs_io_state *ios);
      		exofs_sbi_remove(struct exofs_io_state *ios);
      		exofs_sbi_write(struct exofs_io_state *ios);
      		exofs_sbi_read(struct exofs_io_state *ios);
      		exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len);
      
      	And when done
      		exofs_put_io_state(struct exofs_io_state *ios);
      
      * Convert all source files to use this new API
      * Convert from bio_alloc to bio_kmalloc
      * In io engine we make use of the now fixed osd_req_decode_sense
      
      There are no functional changes or on disk additions after this patch.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      exofs: move osd.c to ios.c · 8ce9bdd1
      Committed by Boaz Harrosh
      If I do a "git mv" together with a massive code change
      and commit in one patch, git loses the rename and
      records a delete/new instead. This is bad because I want
      a rename recorded so that later rebased/cherry-picked patches
      to the old name will work. Also, --follow is lost.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
  3. 21 Jun 2009, 1 commit
  4. 10 Jun 2009, 1 commit
  5. 11 May 2009, 1 commit
      block: add rq->resid_len · c3a4d78c
      Committed by Tejun Heo
      rq->data_len served two purposes: the length of the data buffer on
      issue and the residual count on completion.  This duality creates
      some headaches.
      
      First of all, the block layer and low level drivers can't really
      determine what rq->data_len contains while a request is executing.  It
      could be the total request length, or it could be anything else one of
      the lower layers is using to keep track of the residual count.  This
      complicates things because blk_rq_bytes(), and thus
      [__]blk_end_request_all(), rely on rq->data_len for PC commands.
      Drivers which want to report a residual count must first cache the
      total request length, update rq->data_len and then complete the
      request with the cached data length.
      
      Secondly, it makes requests default to reporting a full residual
      count, i.e. reporting that no data transfer occurred.  A residual
      count is the exception, not the norm; yet the driver must clear
      rq->data_len to zero to signify the normal case, while leaving it
      alone means no data transfer occurred at all.  This reversed default
      complicates code unnecessarily and renders block PC unusable on some
      drivers (ide-tape/floppy).
      
      This patch adds rq->resid_len which is used only for residual count.
      
      While at it, remove the now unnecessary blk_rq_bytes() caching in
      ide_pc_intr(), as rq->data_len is no longer changed.
      
      Boaz	: spotted missing conversion in osd
      Sergei	: spotted too early conversion to blk_rq_bytes() in ide-tape
      
      [ Impact: cleanup residual count handling, report 0 resid by default ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Cc: Darrick J. Wong <djwong@us.ibm.com>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  6. 01 Apr 2009, 1 commit