1. 18 Dec, 2014 2 commits
  2. 30 Oct, 2014 1 commit
    • libceph: use memalloc flags for net IO · 89baaa57
      Mike Christie committed
      This patch has ceph's lib code use the memalloc flags.
      
      If the VM layer needs to write data out to free up memory to handle new
      allocation requests, the block layer must be able to make forward progress.
      To handle that requirement we use structs like mempools to reserve memory for
      objects like bios and requests.
      
      The problem is when we send/receive block layer requests over the network
      layer, net skb allocations can fail and the system can lock up.
      To solve this, the memalloc related flags were added. NBD, iSCSI
      and NFS use these flags to tell the network/vm layer that it should
      use memory reserves to fulfill allocation requests for structs like
      skbs.
      
      I am running ceph in a bunch of VMs on my laptop, so this patch was
      not tested very harshly.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
  3. 15 Oct, 2014 3 commits
  4. 09 Aug, 2014 1 commit
    • libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly · 5f740d7e
      Ilya Dryomov committed
      Determining ->last_piece based on the value of ->page_offset + length
      is incorrect because length here is the length of the entire message.
      ->last_piece is set to false even if the page array data item length is <=
      PAGE_SIZE, which results in an invalid length being passed to
      ceph_tcp_{send,recv}page() and causes various asserts to fire.
      
          # cat pages-cursor-init.sh
          #!/bin/bash
          rbd create --size 10 --image-format 2 foo
          FOO_DEV=$(rbd map foo)
          dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
          rbd snap create foo@snap
          rbd snap protect foo@snap
          rbd clone foo@snap bar
          # rbd_resize calls librbd rbd_resize(), size is in bytes
          ./rbd_resize bar $(((4 << 20) + 512))
          rbd resize --size 10 bar
          BAR_DEV=$(rbd map bar)
          # trigger a 512-byte copyup -- 512-byte page array data item
          dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5
      
      The problem exists only in ceph_msg_data_pages_cursor_init();
      ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
      unnecessary.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
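The difference between the two last_piece checks described in this commit can be sketched in userspace C. This is a minimal model, not the kernel code: the struct, the function names, and the 4 KiB page size are assumptions made for illustration. Only the two last_piece expressions follow the commit description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical, trimmed-down stand-in for the pages cursor. */
struct pages_cursor_sketch {
    size_t resid;             /* bytes left in this data item */
    unsigned int page_offset; /* offset into the first page */
    bool last_piece;          /* does the current piece finish the item? */
};

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Old check: uses the length of the entire message. */
static void cursor_init_buggy(struct pages_cursor_sketch *c, size_t item_length,
                              unsigned int alignment, size_t msg_length)
{
    c->resid = min_size(msg_length, item_length);
    c->page_offset = alignment % SKETCH_PAGE_SIZE;
    c->last_piece = (size_t)c->page_offset + msg_length <= SKETCH_PAGE_SIZE;
}

/* Fixed check: uses the bytes that remain in this data item. */
static void cursor_init_fixed(struct pages_cursor_sketch *c, size_t item_length,
                              unsigned int alignment, size_t msg_length)
{
    c->resid = min_size(msg_length, item_length);
    c->page_offset = alignment % SKETCH_PAGE_SIZE;
    c->last_piece = c->page_offset + c->resid <= SKETCH_PAGE_SIZE;
}
```

With a 512-byte data item inside a message that is several megabytes long, the old check reports that more pieces follow, while the fixed check marks the single piece as the last one.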
  5. 08 Jul, 2014 2 commits
  6. 17 May, 2014 1 commit
  7. 12 Apr, 2014 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller committed
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      a read of freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 05 Apr, 2014 1 commit
  9. 08 Feb, 2014 1 commit
  10. 26 Jan, 2014 1 commit
    • libceph: add ceph_kv{malloc,free}() and switch to them · eeb0bed5
      Ilya Dryomov committed
      Encapsulate kmalloc vs vmalloc memory allocation and freeing logic into
      two helpers, ceph_kvmalloc() and ceph_kvfree(), and switch to them.
      
      ceph_kvmalloc() kmalloc()'s a maximum of 8 pages; anything bigger is
      vmalloc()'ed with __GFP_HIGHMEM set.  This changes the existing
      behaviour:
      
      - for buffers (ceph_buffer_new()), from trying to kmalloc() everything
        and using vmalloc() just as a fallback
      
      - for messages (ceph_msg_new()), from going to vmalloc() for anything
        bigger than a page
      
      - for messages (ceph_msg_new()), from disallowing vmalloc() to use high
        memory
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
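The size policy described in this commit can be modelled in a few lines of userspace C. This is a rough sketch under stated assumptions: the enum, the function names, and the 4 KiB page size are hypothetical, and the code only reports which allocator would be tried first instead of allocating anything.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u       /* assumption: 4 KiB pages */
#define SKETCH_KMALLOC_MAX_PAGES 8u  /* "a maximum of 8 pages" */

/* Hypothetical labels for the allocator that is tried first. */
enum alloc_path_sketch { PATH_KMALLOC, PATH_VMALLOC };

/* New policy: kmalloc() up to 8 pages, vmalloc() above that. */
static enum alloc_path_sketch kvmalloc_first_choice(size_t size)
{
    if (size <= (size_t)SKETCH_KMALLOC_MAX_PAGES * SKETCH_PAGE_SIZE)
        return PATH_KMALLOC;
    return PATH_VMALLOC;
}

/* Old message policy: vmalloc() for anything bigger than one page. */
static enum alloc_path_sketch old_msg_first_choice(size_t size)
{
    return size > SKETCH_PAGE_SIZE ? PATH_VMALLOC : PATH_KMALLOC;
}
```

Under this model, a message front of just over one page moves from vmalloc() to kmalloc(), which is the behaviour change the second bullet above describes.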
  11. 14 Jan, 2014 2 commits
  12. 01 Jan, 2014 3 commits
  13. 24 Nov, 2013 1 commit
    • ceph: Convert to immutable biovecs · f38a5181
      Kent Overstreet committed
      Now that we've got a mechanism for immutable biovecs -
      bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
      respect it instead of using the bvec array directly.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Sage Weil <sage@inktank.com>
      Cc: ceph-devel@vger.kernel.org
  14. 10 Aug, 2013 1 commit
  15. 25 Jul, 2013 1 commit
  16. 03 May, 2013 2 commits
  17. 02 May, 2013 16 commits
    • libceph: fix two messenger bugs · a51b272e
      Alex Elder committed
      This patch makes four small changes in the ceph messenger.
      
      While getting copyup functionality working I found two bugs in the
      messenger.  Existing paths through the code did not trigger these
      problems, but they're fixed here:
          - In ceph_msg_data_pagelist_cursor_init(), the cursor's
            last_piece field was being checked against the length
            supplied.  This was OK until this commit:
                ccba6d98 libceph: implement multiple data items in a message
            That commit changed the cursor init routines to allow lengths
            to be supplied that exceeded the size of the current data item.
            Because of this, we have to use the assigned cursor resid field
            rather than the provided length in determining whether the
            cursor points to the last piece of a data item.
          - In ceph_msg_data_add_pages(), a BUG_ON() was erroneously
            catching attempts to add page data to a message if the message
            already had data assigned to it. That was OK until that same
            commit, at which point it was fine for messages to have
            multiple data items. It slipped through because that BUG_ON()
            call was present twice in that function. (You can never be too
            careful.)
      
      In addition two other minor things are changed:
          - In ceph_msg_data_cursor_init(), the local variable "data" was
            getting assigned twice.
          - In ceph_msg_data_advance(), it was assumed that the
            type-specific advance routine would set new_piece to true
            after it advanced past the last piece. That may have been
            fine, but since we check for that case we might as well set it
            explicitly in ceph_msg_data_advance().
      
      This resolves:
          http://tracker.ceph.com/issues/4762

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: add, don't set data for a message · 90af3602
      Alex Elder committed
      Change the names of the functions that put data on a pagelist to
      reflect that we're adding to whatever's already there rather than
      just setting it to the one thing.  Currently only one data item is
      ever added to a message, but that's about to change.
      
      This resolves:
          http://tracker.ceph.com/issues/2770

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: implement multiple data items in a message · ca8b3a69
      Alex Elder committed
      This patch adds support to the messenger for more than one data item
      in its data list.
      
      A message data cursor has two more fields to support this:
          - a count of the number of bytes left to be consumed across
            all data items in the list, "total_resid"
          - a pointer to the head of the list (for validation only)
      
      The cursor initialization routine has been split into two parts: the
      outer one, which initializes the cursor for traversing the entire
      list of data items; and the inner one, which initializes the cursor
      to start processing a single data item.
      
      When a message cursor is first initialized, the outer initialization
      routine sets total_resid to the length provided.  The data pointer
      is initialized to the first data item on the list.  From there, the
      inner initialization routine finishes by setting up to process the
      data item the cursor points to.
      
      Advancing the cursor consumes bytes in total_resid.  If the resid
      field reaches zero, it means the current data item is fully
      consumed.  If total_resid indicates there is more data, the cursor
      is advanced to point to the next data item, and then the inner
      initialization routine prepares for using that.  (A check is made at
      this point to make sure we don't wrap around the front of the list.)
      
      The type-specific init routines are modified so they can be given a
      length that's larger than what the data item can support.  The resid
      field is initialized to the smaller of the provided length and the
      length of the entire data item.
      
      When total_resid reaches zero, we're done.
      
      This resolves:
          http://tracker.ceph.com/issues/3761

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
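The outer/inner initialization and the advance step described in this commit can be sketched as follows. This is a simplified userspace model, not the messenger code: it assumes the data items sit in a plain array rather than a linked list, that every item is non-empty, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical data item: only its length matters for this sketch. */
struct item_sketch {
    size_t length;
};

struct cursor_sketch {
    const struct item_sketch *items; /* assumption: array, not a list */
    size_t item_count;
    size_t index;       /* item the cursor currently points to */
    size_t resid;       /* bytes left in the current item */
    size_t total_resid; /* bytes left across all items */
};

/* Inner init: prepare to process the item the cursor points to. */
static void cursor_item_init(struct cursor_sketch *c)
{
    size_t item_len = c->items[c->index].length;

    /* resid is the smaller of the provided length and the item length */
    c->resid = c->total_resid < item_len ? c->total_resid : item_len;
}

/* Outer init: set total_resid and start at the first item. */
static void cursor_init(struct cursor_sketch *c, const struct item_sketch *items,
                        size_t item_count, size_t length)
{
    assert(item_count > 0);
    c->items = items;
    c->item_count = item_count;
    c->index = 0;
    c->total_resid = length;
    cursor_item_init(c);
}

/* Consume bytes; returns true if the cursor moved to the next item. */
static bool cursor_advance(struct cursor_sketch *c, size_t bytes)
{
    assert(bytes <= c->resid);
    c->resid -= bytes;
    c->total_resid -= bytes;
    if (c->resid == 0 && c->total_resid > 0) {
        assert(c->index + 1 < c->item_count); /* don't run off the end */
        c->index++;
        cursor_item_init(c);
        return true;
    }
    return false; /* still inside the item, or total_resid hit zero */
}
```

For two items of 512 and 4096 bytes and a length of 4608, consuming the first 512 bytes moves the cursor to the second item with resid 4096, and consuming those leaves total_resid at zero.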
    • libceph: replace message data pointer with list · 5240d9f9
      Alex Elder committed
      In place of the message data pointer, use a list head which links
      through message data items.  For now we only support a single entry
      on that list.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: have cursor point to data · 8ae4f4f5
      Alex Elder committed
      Rather than having a ceph message data item point to the cursor it's
      associated with, have the cursor point to a data item.  This will
      allow a message cursor to be used for more than one data item.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: move cursor into message · 36153ec9
      Alex Elder committed
      A message will only be processing a single data item at a time, so
      there's no need for each data item to have its own cursor.
      
      Move the cursor embedded in the message data structure into the
      message itself.  To minimize the impact, keep the data->cursor
      field, but make it be a pointer to the cursor in the message.
      
      Move the definition of ceph_msg_data above ceph_msg_data_cursor so
      the cursor can point to the data without a forward definition rather
      than vice-versa.
      
      This and the upcoming patches are part of:
          http://tracker.ceph.com/issues/3761

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: record bio length · c851c495
      Alex Elder committed
      The bio is the only data item type that doesn't record its full
      length.  Fix that.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: skip message if too big to receive · f759ebb9
      Alex Elder committed
      We know the length of our message buffers.  If we get a message
      that's too long, just dump it and ignore it.  If skip was set
      then con->in_msg won't be valid, so be careful not to dereference
      a null pointer in the process.
      
      This resolves:
          http://tracker.ceph.com/issues/4664

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: fix possible CONFIG_BLOCK build problem · ea96571f
      Alex Elder committed
      This patch:
          15a0d7b libceph: record message data length
      did not enclose some bio-specific code inside CONFIG_BLOCK as
      it should have.  Fix that.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: provide data length when preparing message · 98fa5dd8
      Alex Elder committed
      In prepare_message_data(), the length used to initialize the cursor
      is taken from the header of the message provided.  I'm working
      toward not using the header data length field to determine length in
      outbound messages, and this is a step in that direction.  For
      inbound messages this will be set to be the actual number of bytes
      that are arriving (which may be less than the total size of the data
      buffer available).
      
      This resolves:
          http://tracker.ceph.com/issues/4589

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: record message data length · a1930804
      Alex Elder committed
      Keep track of the length of the data portion for a message in a
      separate field in the ceph_msg structure.  This information has
      been maintained in wire byte order in the message header, but
      that's going to change soon.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: account for alignment in pages cursor · 56fc5659
      Alex Elder committed
      When a cursor for a page array data message is initialized it needs
      to determine the initial value for cursor->last_piece.  Currently it
      just checks if length is less than a page, but that's not correct.
      The data in the first page in the array will be offset by a page
      offset based on the alignment recorded for the data.  (All pages
      thereafter will be aligned at the base of the page, so there's
      no need to account for this except for the first page.)
      
      Because this was wrong, there was a case where the length of a piece
      would be calculated as all of the residual bytes in the message and
      that plus the page offset could exceed the length of a page.
      
      So fix this case.  Make sure the sum won't wrap.
      
      This resolves a third issue described in:
          http://tracker.ceph.com/issues/4598

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: page offset must be less than page size · 5df521b1
      Alex Elder committed
      Currently ceph_msg_data_pages_advance() allows the page offset value
      to be PAGE_SIZE, apparently assuming ceph_msg_data_pages_next() will
      treat it as 0.  But that doesn't happen, and the result led to a
      helpful assertion failure.
      
      Change ceph_msg_data_pages_advance() to truncate the offset to 0
      before returning if it reaches PAGE_SIZE.
      
      Make a few other minor adjustments in this area (comments and a
      better assertion) while modifying it.
      
      This resolves a second issue described in:
          http://tracker.ceph.com/issues/4598

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
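The invariant this commit establishes, that the page offset stays strictly below the page size after every advance, can be illustrated with a small userspace sketch. The struct, the function name, and the 4 KiB page size are assumptions for illustration, and the model tracks only the page index and offset, not the remaining byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical position within a page array. */
struct page_pos_sketch {
    size_t page_index;
    size_t page_offset; /* invariant: always < SKETCH_PAGE_SIZE */
};

/* Advance within the current page; returns true on moving to the next page. */
static bool page_pos_advance(struct page_pos_sketch *p, size_t bytes)
{
    assert(p->page_offset < SKETCH_PAGE_SIZE);
    assert(p->page_offset + bytes <= SKETCH_PAGE_SIZE);

    /* Wrap the offset to 0 when it reaches the page size. */
    p->page_offset = (p->page_offset + bytes) % SKETCH_PAGE_SIZE;
    if (bytes == 0 || p->page_offset != 0)
        return false; /* still inside the current page */

    p->page_index++; /* the offset is already 0 for the next page */
    return true;
}
```

Starting at offset 4000 and advancing by 96 bytes lands on offset 0 of the next page rather than offset 4096 of the current one.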
    • libceph: fix broken data length assertions · 1190bf06
      Alex Elder committed
      It's OK for the result of a read to come back with fewer bytes than
      were requested.  So don't trigger a BUG() in that case when
      initializing the data cursor.
      
      This resolves the first problem described in:
          http://tracker.ceph.com/issues/4598

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: make message data be a pointer · 6644ed7b
      Alex Elder committed
      Begin the transition from a single message data item to a list of
      them by replacing the "data" structure in a message with a pointer
      to a ceph_msg_data structure.
      
      A null pointer will indicate the message has no data; replace the
      use of ceph_msg_has_data() with a simple check for a null pointer.
      
      Create functions ceph_msg_data_create() and ceph_msg_data_destroy()
      to dynamically allocate and free a data item structure of a given type.
      
      When a message has its data item "set," allocate one of these to
      hold the data description, and free it when the last reference to
      the message is dropped.
      
      This partially resolves:
          http://tracker.ceph.com/issues/4429

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: use only ceph_msg_data_advance() · 8ea299bc
      Alex Elder committed
      The *_msg_pos_next() functions do little more than call
      ceph_msg_data_advance().  Replace those wrapper functions with
      a simple call to ceph_msg_data_advance().
      
      This cleanup is related to:
          http://tracker.ceph.com/issues/4428

      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>