1. 20 Apr, 2015 1 commit
    • libceph: don't overwrite specific con error msgs · 67c64eb7
      Committed by Ilya Dryomov
      - specific con->error_msg messages (e.g. "protocol version mismatch")
        end up getting overwritten by a catch-all "socket error on read
        / write", introduced in commit 3a140a0d ("libceph: report socket
        read/write error message")
      - "bad message sequence # for incoming message" loses to "bad crc" due
        to the fact that -EBADMSG is used for both
      
      Fix it, and tidy up con->error_msg assignments and pr_errs while at it.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
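The first problem described above can be illustrated with a small user-space sketch. It assumes the idea is simply that a catch-all message is recorded only when no specific message has been set; `struct toy_conn` and `toy_set_generic_error()` are hypothetical names for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection state; only the field we need. */
struct toy_conn {
    const char *error_msg;   /* NULL until some path records a reason */
};

/* Record the catch-all text only if no specific message was set earlier. */
void toy_set_generic_error(struct toy_conn *con, const char *generic)
{
    assert(con != NULL && generic != NULL);
    if (!con->error_msg)
        con->error_msg = generic;
}
```

With this shape, a message such as "protocol version mismatch" set by an earlier check survives a later generic socket error.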
  2. 08 Apr, 2015 1 commit
    • Revert "libceph: use memalloc flags for net IO" · 6d7fdb0a
      Committed by Ilya Dryomov
      This reverts commit 89baaa57.
      
      Dirty page throttling should be sufficient for us in the general case
      so there is no need to use __GFP_MEMALLOC - it would be needed only in
      the swap-over-rbd case, which we currently don't support.  (It would
      probably take approximately the commit that is being reverted to add
      that support, but we would also need the "swap" option to distinguish
      from the general case and make sure swap ceph_client-s aren't shared
      with anything else.)  See ceph-devel threads [1] and [2] for the
      details of why enabling pfmemalloc reserves for all cases is a bad
      thing.
      
      On top of potential system lockups related to drained emergency
      reserves, this turned out to cause ceph lockups in case peers are on
      the same host and communicating via loopback due to sk_filter()
      dropping pfmemalloc skbs on the receiving side because the receiving
      loopback socket is not tagged with SOCK_MEMALLOC.
      
      [1] "SOCK_MEMALLOC vs loopback"
          http://www.spinics.net/lists/ceph-devel/msg22998.html
      [2] "[PATCH] libceph: don't set memalloc flags in loopback case"
          http://www.spinics.net/lists/ceph-devel/msg23392.html
      
      Conflicts:
      	net/ceph/messenger.c [ context: tcp_nodelay option ]
      
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Sage Weil <sage@redhat.com>
      Cc: stable@vger.kernel.org # 3.18+, needs backporting
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Acked-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: Mel Gorman <mgorman@suse.de>
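The loopback lockup described above can be modelled with a toy receive-side check. This is a user-space sketch under the assumption that the filter rejects a buffer allocated from emergency reserves unless the receiving socket is tagged for them; `toy_sock`, `toy_skb`, and `toy_sk_filter()` are hypothetical names, not kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock { bool memalloc; };    /* models the SOCK_MEMALLOC tag */
struct toy_skb  { bool pfmemalloc; };  /* models a buffer taken from reserves */

/* Drop reserve-backed buffers on sockets that are not tagged for them. */
int toy_sk_filter(const struct toy_sock *sk, const struct toy_skb *skb)
{
    assert(sk != NULL && skb != NULL);
    if (skb->pfmemalloc && !sk->memalloc)
        return -ENOMEM;   /* buffer is dropped, the peer never sees the data */
    return 0;
}
```

In the loopback case the sending side is tagged but the receiving socket is not, so every reserve-backed buffer takes the drop path.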
  3. 19 Feb, 2015 1 commit
  4. 18 Dec, 2014 2 commits
  5. 30 Oct, 2014 1 commit
    • libceph: use memalloc flags for net IO · 89baaa57
      Committed by Mike Christie
      This patch has ceph's lib code use the memalloc flags.
      
      If the VM layer needs to write data out to free up memory to handle new
      allocation requests, the block layer must be able to make forward progress.
      To handle that requirement we use structs like mempools to reserve memory for
      objects like bios and requests.
      
      The problem is when we send/receive block layer requests over the network
      layer, net skb allocations can fail and the system can lock up.
      To solve this, the memalloc related flags were added. NBD, iSCSI
      and NFS use these flags to tell the network/vm layer that it should
      use memory reserves to fulfill allocation requests for structs like
      skbs.
      
      I am running ceph in a bunch of VMs on my laptop, so this patch was
      not tested very harshly.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
  6. 15 Oct, 2014 3 commits
  7. 09 Aug, 2014 1 commit
    • libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly · 5f740d7e
      Committed by Ilya Dryomov
      Determining ->last_piece based on the value of ->page_offset + length
      is incorrect because length here is the length of the entire message.
      ->last_piece is set to false even if the page array data item length
      is <= PAGE_SIZE, which results in an invalid length being passed to
      ceph_tcp_{send,recv}page() and causes various asserts to fire.
      
          # cat pages-cursor-init.sh
          #!/bin/bash
          rbd create --size 10 --image-format 2 foo
          FOO_DEV=$(rbd map foo)
          dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
          rbd snap create foo@snap
          rbd snap protect foo@snap
          rbd clone foo@snap bar
          # rbd_resize calls librbd rbd_resize(), size is in bytes
          ./rbd_resize bar $(((4 << 20) + 512))
          rbd resize --size 10 bar
          BAR_DEV=$(rbd map bar)
          # trigger a 512-byte copyup -- 512-byte page array data item
          dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5
      
      The problem exists only in ceph_msg_data_pages_cursor_init();
      ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
      unnecessary.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
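A minimal user-space sketch of the distinction drawn above: the last-piece decision is based on the bytes remaining in the data item, not on the length of the whole message. The struct, the function name, and the 4096-byte page size are assumptions for illustration and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */

struct toy_pages_cursor {
    size_t resid;        /* bytes left in this data item */
    size_t page_offset;  /* offset of the data within the first page */
    bool   last_piece;
};

/* msg_len may exceed item_len when the message carries several data items. */
void toy_pages_cursor_init(struct toy_pages_cursor *c, size_t item_len,
                           size_t alignment, size_t msg_len)
{
    assert(c != NULL);
    c->resid = msg_len < item_len ? msg_len : item_len;
    c->page_offset = alignment % TOY_PAGE_SIZE;
    /* Using msg_len here instead of c->resid is the mistake described above. */
    c->last_piece = c->page_offset + c->resid <= TOY_PAGE_SIZE;
}
```

For the 512-byte copyup in the reproducer script, the item fits in one page, so `last_piece` should be true even though the message is over 4 MiB long.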
  8. 08 Jul, 2014 2 commits
  9. 17 May, 2014 1 commit
  10. 12 Apr, 2014 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      Committed by David S. Miller
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access potentially
      touches freed memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass arbitrary values such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
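The shape of the fix can be sketched in user space as follows. This is a toy model with hypothetical names (`toy_sock`, `toy_drain`, `toy_queue_and_notify`); it only shows that the notification callback takes the socket alone, so the producer never reads anything from a packet it has already handed over.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
    int pending;                              /* packets waiting in the queue */
    int drained;                              /* packets consumed so far */
    void (*data_ready)(struct toy_sock *sk);  /* note: no length argument */
};

/* Example consumer: takes everything queued; packets may be freed here. */
void toy_drain(struct toy_sock *sk)
{
    sk->drained += sk->pending;
    sk->pending = 0;
}

/* Producer: once queued, the packet belongs to the consumer. */
void toy_queue_and_notify(struct toy_sock *sk)
{
    assert(sk != NULL && sk->data_ready != NULL);
    sk->pending++;
    sk->data_ready(sk);   /* nothing from the packet is dereferenced here */
}
```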
  11. 05 Apr, 2014 1 commit
  12. 08 Feb, 2014 1 commit
  13. 26 Jan, 2014 1 commit
    • libceph: add ceph_kv{malloc,free}() and switch to them · eeb0bed5
      Committed by Ilya Dryomov
      Encapsulate kmalloc vs vmalloc memory allocation and freeing logic into
      two helpers, ceph_kvmalloc() and ceph_kvfree(), and switch to them.
      
      ceph_kvmalloc() kmalloc()'s a maximum of 8 pages, anything bigger is
      vmalloc()'ed with __GFP_HIGHMEM set.  This changes the existing
      behaviour:
      
      - for buffers (ceph_buffer_new()), from trying to kmalloc() everything
        and using vmalloc() just as a fallback
      
      - for messages (ceph_msg_new()), from going to vmalloc() for anything
        bigger than a page
      
      - for messages (ceph_msg_new()), from disallowing vmalloc() to use high
        memory
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
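The size policy stated above (up to 8 pages from kmalloc(), anything bigger from vmalloc()) can be sketched as a pure decision function. The names and the 4096-byte page size are assumptions for illustration; the sketch only models the choice and does not allocate anything.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE   4096u                  /* assumed page size */
#define TOY_KMALLOC_MAX (8u * TOY_PAGE_SIZE)   /* "a maximum of 8 pages" */

enum toy_alloc_kind { TOY_KMALLOC, TOY_VMALLOC };

/* Decide which allocator a request of the given size would go to. */
enum toy_alloc_kind toy_kv_choose(size_t size)
{
    assert(size > 0);
    return size <= TOY_KMALLOC_MAX ? TOY_KMALLOC : TOY_VMALLOC;
}
```

Under these assumptions, a 32 KiB request stays on the kmalloc() side and a request one byte larger goes to vmalloc().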
  14. 14 Jan, 2014 2 commits
  15. 01 Jan, 2014 3 commits
  16. 24 Nov, 2013 1 commit
    • ceph: Convert to immutable biovecs · f38a5181
      Committed by Kent Overstreet
      Now that we've got a mechanism for immutable biovecs -
      bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
      respect it instead of using the bvec array directly.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Sage Weil <sage@inktank.com>
      Cc: ceph-devel@vger.kernel.org
  17. 10 Aug, 2013 1 commit
  18. 25 Jul, 2013 1 commit
  19. 03 May, 2013 2 commits
  20. 02 May, 2013 13 commits
    • libceph: fix two messenger bugs · a51b272e
      Committed by Alex Elder
      This patch makes four small changes in the ceph messenger.
      
      While getting copyup functionality working I found two bugs in the
      messenger.  Existing paths through the code did not trigger these
      problems, but they're fixed here:
          - In ceph_msg_data_pagelist_cursor_init(), the cursor's
            last_piece field was being checked against the length
            supplied.  This was OK until this commit:
                ccba6d98 libceph: implement multiple data items in a message
            That commit changed
            the cursor init routines to allow lengths to be supplied that
            exceeded the size of the current data item. Because of this,
            we have to use the assigned cursor resid field rather than the
            provided length in determining whether the cursor points to
            the last piece of a data item.
          - In ceph_msg_data_add_pages(), a BUG_ON() was erroneously
            catching attempts to add page data to a message if the message
            already had data assigned to it. That was OK until that same
            commit, at which point it was fine for messages to have
            multiple data items. It slipped through because that BUG_ON()
            call was present twice in that function. (You can never be too
            careful.)
      
      In addition two other minor things are changed:
          - In ceph_msg_data_cursor_init(), the local variable "data" was
            getting assigned twice.
          - In ceph_msg_data_advance(), it was assumed that the
            type-specific advance routine would set new_piece to true
            after it advanced past the last piece. That may have been
            fine, but since we check for that case we might as well set it
            explicitly in ceph_msg_data_advance().
      
      This resolves:
          http://tracker.ceph.com/issues/4762
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: add, don't set data for a message · 90af3602
      Committed by Alex Elder
      Change the names of the functions that put data on a pagelist to
      reflect that we're adding to whatever's already there rather than
      just setting it to the one thing.  Currently only one data item is
      ever added to a message, but that's about to change.
      
      This resolves:
          http://tracker.ceph.com/issues/2770
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: implement multiple data items in a message · ca8b3a69
      Committed by Alex Elder
      This patch adds support to the messenger for more than one data item
      in its data list.
      
      A message data cursor has two more fields to support this:
          - a count of the number of bytes left to be consumed across
            all data items in the list, "total_resid"
          - a pointer to the head of the list (for validation only)
      
      The cursor initialization routine has been split into two parts: the
      outer one, which initializes the cursor for traversing the entire
      list of data items; and the inner one, which initializes the cursor
      to start processing a single data item.
      
      When a message cursor is first initialized, the outer initialization
      routine sets total_resid to the length provided.  The data pointer
      is initialized to the first data item on the list.  From there, the
      inner initialization routine finishes by setting up to process the
      data item the cursor points to.
      
      Advancing the cursor consumes bytes in total_resid.  If the resid
      field reaches zero, it means the current data item is fully
      consumed.  If total_resid indicates there is more data, the cursor
      is advanced to point to the next data item, and then the inner
      initialization routine prepares for using that.  (A check is made at
      this point to make sure we don't wrap around the front of the list.)
      
      The type-specific init routines are modified so they can be given a
      length that's larger than what the data item can support.  The resid
      field is initialized to the smaller of the provided length and the
      length of the entire data item.
      
      When total_resid reaches zero, we're done.
      
      This resolves:
          http://tracker.ceph.com/issues/3761
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
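The outer/inner initialization and the advance step described above can be sketched in user space. This is a simplified model that assumes an array of data items instead of a linked list; all names (`toy_item`, `toy_cursor`, and the three functions) are hypothetical and the code does not mirror the messenger's actual structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_item { size_t length; };

struct toy_cursor {
    const struct toy_item *items;
    size_t nitems;
    size_t idx;          /* index of the current data item */
    size_t total_resid;  /* bytes left across all data items */
    size_t resid;        /* bytes left in the current data item */
};

/* Inner init: prepare to process the item the cursor points to. */
void toy_item_init(struct toy_cursor *c)
{
    size_t len = c->items[c->idx].length;
    c->resid = c->total_resid < len ? c->total_resid : len;
}

/* Outer init: set up traversal of the whole list. */
void toy_cursor_init(struct toy_cursor *c, const struct toy_item *items,
                     size_t nitems, size_t length)
{
    assert(c != NULL && items != NULL && nitems > 0);
    c->items = items;
    c->nitems = nitems;
    c->idx = 0;
    c->total_resid = length;
    toy_item_init(c);
}

/* Consume bytes; returns 1 when the cursor moved on to a new data item. */
int toy_cursor_advance(struct toy_cursor *c, size_t bytes)
{
    assert(bytes <= c->resid);
    c->resid -= bytes;
    c->total_resid -= bytes;
    if (c->resid == 0 && c->total_resid > 0) {
        assert(c->idx + 1 < c->nitems);  /* don't run off the end of the list */
        c->idx++;
        toy_item_init(c);
        return 1;
    }
    return 0;  /* either more of this item remains, or everything is consumed */
}
```

For example, with items of 100 and 50 bytes and a total length of 120, the cursor consumes the first item fully and then only 20 bytes of the second.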
    • libceph: replace message data pointer with list · 5240d9f9
      Committed by Alex Elder
      In place of the message data pointer, use a list head which links
      through message data items.  For now we only support a single entry
      on that list.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: have cursor point to data · 8ae4f4f5
      Committed by Alex Elder
      Rather than having a ceph message data item point to the cursor it's
      associated with, have the cursor point to a data item.  This will
      allow a message cursor to be used for more than one data item.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: move cursor into message · 36153ec9
      Committed by Alex Elder
      A message will only be processing a single data item at a time, so
      there's no need for each data item to have its own cursor.
      
      Move the cursor embedded in the message data structure into the
      message itself.  To minimize the impact, keep the data->cursor
      field, but make it be a pointer to the cursor in the message.
      
      Move the definition of ceph_msg_data above ceph_msg_data_cursor so
      the cursor can point to the data without a forward definition rather
      than vice-versa.
      
      This and the upcoming patches are part of:
          http://tracker.ceph.com/issues/3761
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: record bio length · c851c495
      Committed by Alex Elder
      The bio is the only data item type that doesn't record its full
      length.  Fix that.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: skip message if too big to receive · f759ebb9
      Committed by Alex Elder
      We know the length of our message buffers.  If we get a message
      that's too long, just dump it and ignore it.  If skip was set
      then con->in_msg won't be valid, so be careful not to dereference
      a null pointer in the process.
      
      This resolves:
          http://tracker.ceph.com/issues/4664
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
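A small user-space sketch of the behaviour described above: an incoming message larger than the buffer is skipped, and later code must tolerate the absence of an in-progress message. `toy_msg`, `toy_con`, and both functions are hypothetical names; the sketch assumes a single capacity field stands in for the known buffer length.

```c
#include <assert.h>
#include <stddef.h>

struct toy_msg { size_t data_capacity; };  /* known size of our buffer */
struct toy_con { struct toy_msg *in_msg; };

/* Returns 1 if the incoming message must be skipped, 0 if it was accepted. */
int toy_alloc_incoming(struct toy_con *con, struct toy_msg *buf,
                       size_t data_len)
{
    assert(con != NULL);
    if (buf == NULL || data_len > buf->data_capacity) {
        con->in_msg = NULL;   /* skipping: there is no valid in_msg */
        return 1;
    }
    con->in_msg = buf;
    return 0;
}

/* Callers check for NULL instead of dereferencing in_msg blindly. */
size_t toy_in_msg_capacity(const struct toy_con *con)
{
    return con->in_msg ? con->in_msg->data_capacity : 0;
}
```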
    • libceph: fix possible CONFIG_BLOCK build problem · ea96571f
      Committed by Alex Elder
      This patch:
          15a0d7b libceph: record message data length
      did not enclose some bio-specific code inside CONFIG_BLOCK as
      it should have.  Fix that.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: provide data length when preparing message · 98fa5dd8
      Committed by Alex Elder
      In prepare_message_data(), the length used to initialize the cursor
      is taken from the header of the message provided.  I'm working
      toward not using the header data length field to determine length in
      outbound messages, and this is a step in that direction.  For
      inbound messages this will be set to be the actual number of bytes
      that are arriving (which may be less than the total size of the data
      buffer available).
      
      This resolves:
          http://tracker.ceph.com/issues/4589
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: record message data length · a1930804
      Committed by Alex Elder
      Keep track of the length of the data portion for a message in a
      separate field in the ceph_msg structure.  This information has
      been maintained in wire byte order in the message header, but
      that's going to change soon.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: account for alignment in pages cursor · 56fc5659
      Committed by Alex Elder
      When a cursor for a page array data message is initialized it needs
      to determine the initial value for cursor->last_piece.  Currently it
      just checks if length is less than a page, but that's not correct.
      The data in the first page in the array will be offset by a page
      offset based on the alignment recorded for the data.  (All pages
      thereafter will be aligned at the base of the page, so there's
      no need to account for this except for the first page.)
      
      Because this was wrong, there was a case where the length of a piece
      would be calculated as all of the residual bytes in the message and
      that plus the page offset could exceed the length of a page.
      
      So fix this case.  Make sure the sum won't wrap.
      
      This resolves a third issue described in:
          http://tracker.ceph.com/issues/4598
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
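The effect of the first-page offset can be worked through with a short user-space sketch. The helper names and the 4096-byte page size are assumptions for illustration; the overflow check models the "make sure the sum won't wrap" remark and is not the kernel's exact assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* True when the remaining bytes, placed at page_offset, fit in this page. */
bool toy_is_last_piece(size_t page_offset, size_t resid)
{
    assert(page_offset < TOY_PAGE_SIZE);
    assert(resid <= SIZE_MAX - page_offset);   /* the sum must not wrap */
    return page_offset + resid <= TOY_PAGE_SIZE;
}

/* Length of the piece to send or receive from the current page. */
size_t toy_piece_len(size_t page_offset, size_t resid)
{
    size_t room = TOY_PAGE_SIZE - page_offset;
    return toy_is_last_piece(page_offset, resid) ? resid : room;
}
```

With 4000 bytes remaining, the data fits in one page at offset 0 but not at offset 512, where only 3584 bytes fit and a second piece is needed.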
    • libceph: page offset must be less than page size · 5df521b1
      Committed by Alex Elder
      Currently ceph_msg_data_pages_advance() allows the page offset value
      to be PAGE_SIZE, apparently assuming ceph_msg_data_pages_next() will
      treat it as 0.  But that doesn't happen, and the result led to a
      helpful assertion failure.
      
      Change ceph_msg_data_pages_advance() to truncate the offset to 0
      before returning if it reaches PAGE_SIZE.
      
      Make a few other minor adjustments in this area (comments and a
      better assertion) while modifying it.
      
      This resolves a second issue described in:
          http://tracker.ceph.com/issues/4598
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
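The wrap-to-zero behaviour described above can be sketched in user space as follows. `toy_pages_pos` and `toy_pages_advance()` are hypothetical names and the 4096-byte page size is an assumption; the point is only that the stored offset stays strictly below the page size after an advance.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */

struct toy_pages_pos {
    size_t page_index;   /* which page of the array we are in */
    size_t page_offset;  /* always kept below TOY_PAGE_SIZE */
};

/* Advance within the current page; an offset that reaches the page size
 * is stored as 0 and the position moves to the next page. */
void toy_pages_advance(struct toy_pages_pos *p, size_t bytes)
{
    assert(p != NULL);
    assert(p->page_offset + bytes <= TOY_PAGE_SIZE);
    p->page_offset = (p->page_offset + bytes) % TOY_PAGE_SIZE;
    if (bytes > 0 && p->page_offset == 0)
        p->page_index++;
}
```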