1. 23 Mar, 2013 4 commits
    • drbd: use the cached meta_dev_idx · 68e41a43
      Committed by Lars Ellenberg
      Now we have the cached meta_dev_idx member,
      we can get rid of a few rcu_read_lock() sections and rcu_dereference().
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drbd: prepare for new striped layout of activity log · 3a4d4eb3
      Committed by Lars Ellenberg
      Introduce two new on-disk meta data fields: al_stripes and al_stripe_size_4k
      The intended use case is activity log on RAID 0 or similar.
      Logically consecutive transactions will advance their on-disk position
      by al_stripe_size_4k 4kB (transaction sized) blocks.
      
      Right now, these are still asserted to be the backward compatible
      values al_stripes = 1, al_stripe_size_4k = 8 (which amounts to 32kB).
      
      Also introduce a caching member for meta_dev_idx in the in-core
      structure: even though it is initially passed in in the rcu-protected
      disk_conf structure, it cannot change without a detach/attach cycle.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
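
      The striping described in this commit can be pictured with a small sketch.
      It assumes a round-robin interleave across stripes; the helper name
      al_tr_to_block_4k() and the exact formula are illustrative assumptions,
      not necessarily what DRBD implements. Only the field names al_stripes and
      al_stripe_size_4k come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a logical transaction number to the index of a
 * 4kB block inside the activity log area.
 * Assumption: consecutive transactions go to consecutive stripes, so their
 * on-disk position advances by al_stripe_size_4k blocks each time.
 */
static uint32_t al_tr_to_block_4k(uint32_t tr_number,
                                  uint32_t al_stripes,
                                  uint32_t al_stripe_size_4k)
{
    assert(al_stripes > 0 && al_stripe_size_4k > 0);

    /* Total log size in 4kB blocks; the log is used as a ring buffer. */
    uint32_t al_size_4k = al_stripes * al_stripe_size_4k;
    uint32_t t = tr_number % al_size_4k;

    /* Pick the stripe first, then the slot within that stripe. */
    return (t % al_stripes) * al_stripe_size_4k + t / al_stripes;
}
```

      With the backward compatible values (al_stripes = 1, al_stripe_size_4k = 8)
      this degenerates to a plain ring of eight 4kB blocks, that is 32kB.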
    • drbd: cleanup ondisk meta data layout calculations and defines · ae8bf312
      Committed by Lars Ellenberg
      Add a comment about our meta data layout variants,
      and rename a few defines (e.g. MD_RESERVED_SECT -> MD_128MB_SECT)
      to make it clear that they are short hand for fixed constants,
      and not arbitrarily to be redefined as one may see fit.
      
      Properly pad struct meta_data_on_disk to 4kB,
      and initialize to zero not only the first 512 Byte,
      but all of it in drbd_md_sync().
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drbd: cleanup bogus assert message · 9114d795
      Committed by Lars Ellenberg
      This fixes ASSERT( mdev->state.disk == D_FAILED ) in drivers/block/drbd/drbd_main.c
      
      When we detach from local disk, we let the local refcount hit zero twice.
      
      First, we transition to D_FAILED, so we won't give out new references
      to incoming requests; we still may give out *internal* references, though.
      Once the refcount hits zero [1] while in D_FAILED, we queue a transition
      to D_DISKLESS to our worker.  We need to queue it, because we may be in
      atomic context when putting the reference.
      Once the transition to D_DISKLESS actually happened [2] from worker context,
      we don't give out new internal references either.
      
      Between hitting zero the first time [1] and actually transitioning to
      D_DISKLESS [2], there may be a few very short-lived internal get/put pairs,
      so we may hit zero more than once while being in D_FAILED, or even see a
      race where an internal get_ldev() happened while D_FAILED, but the
      corresponding put_ldev() happens just after the transition to D_DISKLESS.
      
      That's why we have the additional test_and_set_bit(GO_DISKLESS,);
      and that's why the assert was placed wrong.
      Since there was exactly one code path left to drbd_go_diskless(),
      and that checks already for D_FAILED, drop that assert,
      and fold in the drbd_queue_work().
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
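
      The "hit zero more than once" behavior described in this commit can be
      modeled in user space. The sketch below is an assumption-laden toy, not
      DRBD code: struct dev_model, model_get_ldev() and model_put_ldev() are
      hypothetical names, a C11 atomic_flag stands in for the GO_DISKLESS bit,
      and a counter stands in for queuing work to the worker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified state set; D_OTHER is a placeholder for any attached state. */
enum disk_state { D_OTHER, D_FAILED, D_DISKLESS };

struct dev_model {
    atomic_int local_cnt;      /* local reference count */
    enum disk_state disk;      /* current disk state */
    atomic_flag go_diskless;   /* stands in for the GO_DISKLESS bit */
    int queued_transitions;    /* how often the transition was queued */
};

static void model_init(struct dev_model *d)
{
    atomic_init(&d->local_cnt, 0);
    d->disk = D_OTHER;
    atomic_flag_clear(&d->go_diskless);
    d->queued_transitions = 0;
}

static void model_get_ldev(struct dev_model *d)
{
    atomic_fetch_add(&d->local_cnt, 1);
}

/* Returns true if this put queued the transition to D_DISKLESS. */
static bool model_put_ldev(struct dev_model *d)
{
    int remaining = atomic_fetch_sub(&d->local_cnt, 1) - 1;

    assert(remaining >= 0);
    /* Queue only while D_FAILED, and only once, however often we hit zero. */
    if (remaining == 0 && d->disk == D_FAILED &&
        !atomic_flag_test_and_set(&d->go_diskless)) {
        d->queued_transitions++;
        return true;
    }
    return false;
}
```

      In this model a late put that arrives after the state has already become
      D_DISKLESS simply does nothing, which is why asserting D_FAILED at that
      point would be wrong.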
  2. 28 Feb, 2013 7 commits
  3. 27 Feb, 2013 3 commits
    • libceph: update osd request/reply encoding · 1b83bef2
      Committed by Sage Weil
      Use the new version of the encoding for osd requests and replies.  In the
      process, update the way we are tracking request ops and reply lengths and
      results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
      appropriately.
      
      The main changes are:
       - we keep pointers into the request memory for fields we need to update
         each time the request is sent out over the wire
       - we keep information about the result in an array in the request struct
         where the users can easily get at it.
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
    • rbd: pass length, not op for osd completions · c47f9371
      Committed by Alex Elder
      The only thing type-specific osd completion functions do with their
      osd op parameter is (in some cases) extract the number of bytes
      transferred from it.  In the other cases, the xferred bytes field
      is not used, and total message data transfer byte count (which may
      well be zero) is used.
      
      Just set the object request transfer count in the main osd request
      callback function and provide that to the other routines.  There is
      then no longer any need to pass the op pointer to the type-specific
      completion routines, so drop those parameters.
      
      Stop doing anything with the total message data length.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • rbd: move rbd_osd_trivial_callback() · 39bf2c5d
      Committed by Alex Elder
      This function is slightly out of place, probably the result
      of an errant automatic merge or something.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  4. 26 Feb, 2013 5 commits
  5. 23 Feb, 2013 1 commit
  6. 22 Feb, 2013 6 commits
    • loopdev: ignore negative offset when calculate loop device size · b7a1da69
      Committed by Guo Chao
      A negative offset may make the loop device size larger than the backing
      file size.
      
       $ fallocate -l 1M a
       $ losetup --offset 0xffffffffffff0000 /dev/loop0 a
       $ blockdev --getsize64 /dev/loop0
       1114112
       $ ls -l a
       -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a
       $ cat /dev/loop0
       cat: /dev/loop0: Input/output error
      
      It makes no sense to do that. Only apply offset when it's positive.
      
      Fix a typo in the comment by the way.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
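
      The size calculation with the "only apply the offset when it's positive"
      rule can be sketched in user space as follows. The function name
      loop_size_sectors() is hypothetical, plain integers replace the kernel's
      file and inode objects, and the sizelimit clamp is an assumption added to
      make the sketch complete.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the loop device size calculation.
 * Returns the device size in 512-byte sectors.
 */
static int64_t loop_size_sectors(int64_t file_size, int64_t offset,
                                 int64_t sizelimit)
{
    assert(file_size >= 0);

    int64_t loopsize = file_size;

    /* Only a positive offset shrinks the device; a negative one is ignored. */
    if (offset > 0)
        loopsize -= offset;

    /* An offset beyond the end of the file leaves nothing to expose. */
    if (loopsize < 0)
        return 0;

    /* Assumption: a positive sizelimit caps the size further. */
    if (sizelimit > 0 && sizelimit < loopsize)
        loopsize = sizelimit;

    return loopsize >> 9;
}
```

      The offset 0xffffffffffff0000 from the reproduction above is -65536 when
      read as a signed 64-bit value. Subtracting it from the 1048576-byte file
      yields the bogus 1114112 bytes shown by blockdev; with the check in place
      the sketch reports 2048 sectors, the true file size.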
    • loopdev: remove an user triggerable oops · b1a66504
      Committed by Guo Chao
      When loopdev is built as a module and we pass an invalid parameter,
      loop_init() will return directly without deregistering the misc device,
      which will cause an oops when the loop module is inserted the next time,
      because we left some garbage in the misc device list.
      
      Test case:
      sudo modprobe loop max_part=1024
      (failed due to invalid parameter)
      sudo modprobe loop
      (oops)
      
      Clean up nicely to avoid such oops.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
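
      The cleanup pattern this commit describes, undoing an earlier registration
      when a later validation step fails, might look roughly like the sketch
      below. Everything here is a stand-in: fake_misc_register(),
      fake_misc_deregister(), sketch_loop_init() and the limit of 256 are
      hypothetical and only model the control flow.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the misc device list: counts live registrations. */
static int misc_registered;

static int fake_misc_register(void)
{
    misc_registered++;
    return 0;
}

static void fake_misc_deregister(void)
{
    assert(misc_registered > 0);
    misc_registered--;
}

/* Hypothetical init routine: register first, validate the parameter second. */
static int sketch_loop_init(int max_part)
{
    int err = fake_misc_register();

    if (err < 0)
        return err;

    /* Assumed limit, for illustration only. */
    if (max_part < 0 || max_part > 256) {
        err = -EINVAL;
        goto misc_out;
    }
    return 0;

misc_out:
    /* Undo the registration so a later init starts from a clean list. */
    fake_misc_deregister();
    return err;
}
```

      A failed call leaves misc_registered at zero, so a second attempt starts
      from a clean state instead of tripping over a stale entry.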
    • loopdev: move common code into loop_figure_size() · 7b0576a3
      Committed by Guo Chao
      Update the block device size in accordance with the gendisk size, and let
      userspace know about the change, in loop_figure_size(). This is a cleanup
      to remove code common to loop_figure_size()'s two callers.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loopdev: update block device size in loop_set_status() · 541c742a
      Committed by Guo Chao
      The loop device driver sometimes fails to impose the size limit on the
      device. Keep issuing the following two commands:
      
      losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file
      blockdev --getsize64 /dev/loop0
      
      blockdev reports the file size instead of sizelimit several times out of 100.
      
      The problems are:
      
      	- losetup sets up the device with two ioctls:
      		  LOOP_SET_FD and LOOP_SET_STATUS64.
      
      	- LOOP_SET_STATUS64 only updates the size of the gendisk.
      
      The block device size is updated lazily when the device comes into use. If
      udev rushes in between the two ioctls, it will bring in a block device whose
      size is the backing file size. If the device is not released after the
      LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size.
      
      Update the block device size in the LOOP_SET_STATUS64 ioctl.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Reported-by: M. Hindess <hindessm@uk.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loopdev: fix a deadlock · 5370019d
      Committed by Guo Chao
      bd_mutex and lo_ctl_mutex can be held in different order.
      
      Path #1:
      
      blkdev_open
       blkdev_get
        __blkdev_get (hold bd_mutex)
         lo_open (hold lo_ctl_mutex)
      
      Path #2:
      
      blkdev_ioctl
       lo_ioctl (hold lo_ctl_mutex)
        lo_set_capacity (hold bd_mutex)
      
      Lockdep does not report it, because path #2 actually holds a subclass of
      lo_ctl_mutex.  This subclass seems to have crept into the code by mistake.  The
      patch author actually just mentioned it in the changelog, see commit
      f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see:
      
      	http://marc.info/?l=linux-kernel&m=123806169129727&w=2
      
      Path #2 holds bd_mutex to call bd_set_size(). I've protected it
      with i_mutex in a previous patch, so drop bd_mutex at this site.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drivers/block/swim3.c: fix null pointer dereference · 7414d4f6
      Committed by Cong Ding
      The use of pointer fs should be after the null check.
      Signed-off-by: Cong Ding <dinggnu@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 20 Feb, 2013 9 commits
    • libceph: drop return value from page vector copy routines · 903bb32e
      Committed by Alex Elder
      The return values provided for ceph_copy_to_page_vector() and
      ceph_copy_from_page_vector() serve no purpose, so get rid of them.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: ignore result of ceph_copy_from_page_vector() · 23ed6e13
      Committed by Alex Elder
      The result of ceph_copy_from_page_vector() is simply the length
      argument it is provided.
      
      This is called by rbd_obj_method_sync(), which returns the result if
      it's non-negative.  But we always either ignore or overwrite that
      return value.  So explicitly ignore what's returned by the copy
      function, and have rbd_obj_method_sync() always return either a
      negative errno or 0.
      
      We also return the result of ceph_copy_from_page_vector() in
      rbd_obj_read_sync().  There we still want to return the number of
      bytes transferred, but we can use the value we already have in hand
      rather than what ceph_copy_from_page_vector() provides.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: prevent bytes transferred overflow · 1ceae7ef
      Committed by Alex Elder
      In rbd_obj_read_sync(), verify the number of bytes transferred won't
      exceed what can be represented by a size_t before using it to
      indicate the number of bytes to copy to the result buffer.
      
      (The real motivation for this is to prepare for the next patch.)
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: allow STAT osd operations · fbfab539
      Committed by Alex Elder
      Add support for CEPH_OSD_OP_STAT operations in the osd client
      and in rbd.
      
      This operation sends no data to the osd; everything required is
      encoded in the identity of the target object.
      
      The result will be ENOENT if the object doesn't exist.  If it does
      exist and no other error occurs the server returns the size and last
      modification time of the target object as output data (in little
      endian format).  The size is a 64-bit unsigned integer and the time is
      a ceph_timespec structure (two unsigned 32-bit integers, representing
      a seconds and a nanoseconds value).
      
      This resolves:
          http://tracker.ceph.com/issues/4007
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
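
      Decoding the STAT output data described in this commit could look like the
      following sketch. It assumes the reply is exactly the 64-bit size followed
      directly by the two 32-bit time fields, packed with no padding; struct
      stat_reply and decode_stat_reply() are hypothetical names, not part of
      libceph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the STAT output data. */
struct stat_reply {
    uint64_t size;        /* object size in bytes */
    uint32_t mtime_sec;   /* seconds part of the modification time */
    uint32_t mtime_nsec;  /* nanoseconds part of the modification time */
};

/* Read little-endian integers byte by byte, independent of host endianness. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the buffer is too short (assumed 16 bytes). */
static int decode_stat_reply(const uint8_t *buf, size_t len,
                             struct stat_reply *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 16)
        return -1;
    out->size = get_le64(buf);
    out->mtime_sec = get_le32(buf + 8);
    out->mtime_nsec = get_le32(buf + 12);
    return 0;
}
```

      Reading the fields byte by byte avoids unaligned accesses and keeps the
      sketch correct on big-endian hosts as well.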
    • rbd: add parentheses to object request iterator macros · ef06f4d3
      Committed by Alex Elder
      The for_each_obj_request*() macros should parenthesize their uses of
      the ireq parameter.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
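
      Why a macro should parenthesize a pointer parameter can be shown with a
      toy iterator. The types and the macro name for_each_obj_req_sketch() below
      are hypothetical and much simpler than the rbd ones; they only illustrate
      the expansion problem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified request types. */
struct obj_req { int id; };
struct img_req { struct obj_req reqs[4]; int count; };

/*
 * Parenthesized parameters: safe when the caller passes an expression.
 * Written as ireq->count instead, the argument "base + which" would expand
 * to "base + which->count", which does not even compile.
 */
#define for_each_obj_req_sketch(ireq, i) \
    for ((i) = 0; (i) < (ireq)->count; (i)++)

/* Sum the ids of one image request, selected by pointer arithmetic. */
static int sum_ids(const struct img_req *base, int which)
{
    int i, sum = 0;

    assert(base != NULL);
    for_each_obj_req_sketch(base + which, i)
        sum += (base + which)->reqs[i].id;
    return sum;
}
```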
    • xen-blkback: use balloon pages for persistent grants · 087ffecd
      Committed by Roger Pau Monne
      With the current persistent grants implementation we are not freeing the
      persistent grants after we disconnect the device. Since grant map
      operations change the mfn of the allocated page, and we can no longer
      pass it to __free_page without setting the mfn to a sane value, use
      balloon grant pages instead, as the gntdev device does.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Cc: stable@vger.kernel.org
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen-blkfront: drop the use of llist_for_each_entry_safe · f84adf49
      Committed by Konrad Rzeszutek Wilk
      Replace llist_for_each_entry_safe with a while loop.
      
      llist_for_each_entry_safe can trigger a bug in GCC 4.1, so it's best
      to remove it and use a while loop and do the deletion manually.
      
      Specifically this bug can be triggered by hot-unplugging a disk, either
      by doing xm block-detach or by save/restore cycle.
      
      BUG: unable to handle kernel paging request at fffffffffffffff0
      IP: [<ffffffffa0047223>] blkif_free+0x63/0x130 [xen_blkfront]
      The crash call trace is:
      	...
      bad_area_nosemaphore+0x13/0x20
      do_page_fault+0x25e/0x4b0
      page_fault+0x25/0x30
      ? blkif_free+0x63/0x130 [xen_blkfront]
      blkfront_resume+0x46/0xa0 [xen_blkfront]
      xenbus_dev_resume+0x6c/0x140
      pm_op+0x192/0x1b0
      device_resume+0x82/0x1e0
      dpm_resume+0xc9/0x1a0
      dpm_resume_end+0x15/0x30
      do_suspend+0x117/0x1e0
      
      When drilling down to the assembler code, on newer GCC it does
      .L29:
              cmpq    $-16, %r12      #, persistent_gnt check
              je      .L30    	#, out of the loop
      .L25:
      	... code in the loop
              testq   %r13, %r13      # n
              je      .L29    	#, back to the top of the loop
              cmpq    $-16, %r12      #, persistent_gnt check
              movq    16(%r12), %r13  # <variable>.node.next, n
              jne     .L25    	#,	back to the top of the loop
      .L30:
      
      While on GCC 4.1, it is:
      L78:
      	... code in the loop
      	testq   %r13, %r13      # n
              je      .L78    #,	back to the top of the loop
              movq    16(%rbx), %r13  # <variable>.node.next, n
              jmp     .L78    #,	back to the top of the loop
      
      Which basically means that the loop exit condition, instead of
      being:
      
      	&(pos)->member != NULL;
      
      is:
      	;
      
      which makes the loop unbounded.
      
      Since xen-blkfront is the only user of the llist_for_each_entry_safe
      macro remove it from llist.h.
      
      Orabug: 16263164
      CC: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
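
      The replacement this commit describes, a plain while loop that deletes
      entries by hand, can be illustrated on an ordinary singly linked list. The
      sketch uses user-space malloc()/free() and hypothetical names (struct
      gnt_node, push(), free_all()); it is not the xen-blkfront code and does
      not use the kernel llist API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node standing in for a persistent grant entry. */
struct gnt_node {
    struct gnt_node *next;
    int gref;
};

/* Prepend a node; returns the new head, or the old head if malloc fails. */
static struct gnt_node *push(struct gnt_node *head, int gref)
{
    struct gnt_node *n = malloc(sizeof(*n));

    if (n == NULL)
        return head;
    n->gref = gref;
    n->next = head;
    return n;
}

/* Free every node with an explicit loop; returns how many were freed. */
static int free_all(struct gnt_node **head)
{
    assert(head != NULL);

    int freed = 0;
    struct gnt_node *node = *head;

    *head = NULL;
    while (node != NULL) {
        /* Read the successor before the current node is freed. */
        struct gnt_node *next = node->next;

        free(node);
        node = next;
        freed++;
    }
    return freed;
}
```

      The exit condition here is a direct NULL test on the node pointer itself,
      so it does not rely on comparing the address of an embedded member.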
    • xen/blkback: Don't trust the handle from the frontend. · 01c681d4
      Committed by Konrad Rzeszutek Wilk
      The 'handle' is the device that the request is from. For the lifetime
      of the ring we copy it from a request to a response so that the frontend
      is not surprised by it. But we do not need it: when we start processing
      I/Os we have our own 'struct phys_req', which has only the most essential
      information about the request. In fact, 'vbd_translate' ends up
      overwriting preq.dev with a value from the backend.
      
      This assignment of preq.dev with the 'handle' value is superfluous,
      so let's not do it.
      
      Cc: stable@vger.kernel.org
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen-blkback: do not leak mode property · 9d092603
      Committed by Jan Beulich
      "be->mode" is obtained from xenbus_read(), which does a kmalloc() for
      the message body. The short string is never released, so do it along
      with freeing "be" itself, and make sure the string isn't kept when
      backend_changed() doesn't complete successfully (which made it
      desirable to slightly re-structure that function, so that the error
      cleanup can be done in one place).
      Reported-by: Olaf Hering <olaf@aepfle.de>
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  8. 19 Feb, 2013 2 commits
  9. 15 Feb, 2013 1 commit
  10. 14 Feb, 2013 2 commits