1. 15 12月, 2016 2 次提交
    • I
      libceph: always signal completion when done · c297eb42
      Ilya Dryomov 提交于
      r_safe_completion is currently, and has always been, signaled only if
      on-disk ack was requested.  It's there for fsync and syncfs, which wait
      for in-flight writes to flush - all data write requests set ONDISK.
      
      However, the pool perm check code introduced in 4.2 sends a write
      request with only ACK set.  An unfortunately timed syncfs can then hang
      forever: r_safe_completion won't be signaled because only an unsafe
      reply was requested.
      
      We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
      that is somewhat incomplete and yet another special case.  Instead,
      rename this completion to r_done_completion and always signal it when
      the OSD client is done with the request, whether unsafe, safe, or
      error.  This is a bit cleaner and helps with the cancellation code.
      Reported-by: NYan, Zheng <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c297eb42
    • Y
      ceph: avoid creating orphan object when checking pool permission · 80e80fbb
      Yan, Zheng 提交于
      Pool permission check needs to write to the first object. But for
      snapshot, head of the first object may have already been deleted.
      Skip the check for snapshot inode to avoid creating orphan object.
      
      Link: http://tracker.ceph.com/issues/18211Signed-off-by: NYan, Zheng <zyan@redhat.com>
      80e80fbb
  2. 13 12月, 2016 29 次提交
    • Y
      ceph: properly set issue_seq for cap release · dc24de82
      Yan, Zheng 提交于
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      dc24de82
    • J
      ceph: add flags parameter to send_cap_msg · 1e4ef0c6
      Jeff Layton 提交于
      Add a flags parameter to send_cap_msg, so we can request expedited
      service from the MDS when we know we'll be waiting on the result.
      
      Set that flag in the case of try_flush_caps. The callers of that
      function generally wait synchronously on the result, so it's beneficial
      to ask the server to expedite it.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NYan, Zheng <zyan@redhat.com>
      1e4ef0c6
    • J
      ceph: update cap message struct version to 10 · 43b29673
      Jeff Layton 提交于
      The userland ceph has MClientCaps at struct version 10. This brings the
      kernel up the same version.
      
      For now, all of the the new stuff is set to default values including
      the flags field, which will be conditionally set in a later patch.
      
      Note that we don't need to set the change_attr and btime to anything
      since we aren't currently setting the feature flag. The MDS should
      ignore those values.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NYan, Zheng <zyan@redhat.com>
      43b29673
    • J
      ceph: define new argument structure for send_cap_msg · 0ff8bfb3
      Jeff Layton 提交于
      When we get to this many arguments, it's hard to work with positional
      parameters. send_cap_msg is already at 25 arguments, with more needed.
      
      Define a new args structure and pass a pointer to it to send_cap_msg.
      Eventually it might make sense to embed one of these inside
      ceph_cap_snap instead of tracking individual fields.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NYan, Zheng <zyan@redhat.com>
      0ff8bfb3
    • J
      ceph: move xattr initialzation before the encoding past the ceph_mds_caps · 9670079f
      Jeff Layton 提交于
      Just for clarity. This part is inside the header, so it makes sense to
      group it with the rest of the stuff in the header.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NYan, Zheng <zyan@redhat.com>
      9670079f
    • J
      ceph: fix minor typo in unsafe_request_wait · 4945a084
      Jeff Layton 提交于
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NYan, Zheng <zyan@redhat.com>
      4945a084
    • Y
      ceph: record truncate size/seq for snap data writeback · 5f743e45
      Yan, Zheng 提交于
      Dirty snapshot data needs to be flushed unconditionally. If they
      were created before truncation, writeback should use old truncate
      size/seq.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      5f743e45
    • Y
      ceph: check availability of mds cluster on mount · e9e427f0
      Yan, Zheng 提交于
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      e9e427f0
    • Y
      ceph: fix splice read for no Fc capability case · 7ce469a5
      Yan, Zheng 提交于
      When iov_iter type is ITER_PIPE, copy_page_to_iter() increases
      the page's reference and add the page to a pipe_buffer. It also
      set the pipe_buffer's ops to page_cache_pipe_buf_ops. The comfirm
      callback in page_cache_pipe_buf_ops expects the page is from page
      cache and uptodate, otherwise it return error.
      
      For ceph_sync_read() case, pages are not from page cache. So we
      can't call copy_page_to_iter() when iov_iter type is ITER_PIPE.
      The fix is using iov_iter_get_pages_alloc() to allocate pages
      for the pipe. (the code is similar to default_file_splice_read)
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      7ce469a5
    • Y
      ceph: try getting buffer capability for readahead/fadvise · 2b1ac852
      Yan, Zheng 提交于
      For readahead/fadvise cases, caller of ceph_readpages does not
      hold buffer capability. Pages can be added to page cache while
      there is no buffer capability. This can cause data integrity
      issue.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      2b1ac852
    • N
      ceph: fix scheduler warning due to nested blocking · 5c341ee3
      Nikolay Borisov 提交于
      try_get_cap_refs can be used as a condition in a wait_event* calls.
      This is all fine until it has to call __ceph_do_pending_vmtruncate,
      which in turn acquires the i_truncate_mutex. This leads to a situation
      in which a task's state is !TASK_RUNNING and at the same time it's
      trying to acquire a sleeping primitive. In essence a nested sleeping
      primitives are being used. This causes the following warning:
      
      WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110
       ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
      CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G           O    4.4.20-clouder2 #6
      Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
       0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
       ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
       0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
      Call Trace:
       [<ffffffff812f4409>] dump_stack+0x67/0x9e
       [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0
       [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50
       [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
       [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110
       [<ffffffff8107767f>] __might_sleep+0x9f/0xb0
       [<ffffffff81612d30>] mutex_lock+0x20/0x40
       [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
       [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph]
       [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph]
       [<ffffffff81094370>] ? wait_woken+0xb0/0xb0
       [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph]
       [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260
       [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200
       [<ffffffff811b46ce>] ? iput+0x9e/0x230
       [<ffffffff81077632>] ? __might_sleep+0x52/0xb0
       [<ffffffff81156147>] ? __might_fault+0x37/0x40
       [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170
       [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0
       [<ffffffff81199369>] vfs_write+0xa9/0x190
       [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70
       [<ffffffff8119a056>] SyS_write+0x46/0xa0
      
      This happens since wait_event_interruptible can interfere with the
      mutex locking code, since they both fiddle with the task state.
      
      Fix the issue by using the newly-added nested blocking infrastructure
      in 61ada528 ("sched/wait: Provide infrastructure to deal with
      nested blocking")
      
      Link: https://lwn.net/Articles/628628/Signed-off-by: NNikolay Borisov <kernel@kyup.com>
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      5c341ee3
    • Z
      ceph: fix printing wrong return variable in ceph_direct_read_write() · a380a031
      Zhi Zhang 提交于
      Fix printing wrong return variable for invalidate_inode_pages2_range in
      ceph_direct_read_write().
      Signed-off-by: NZhi Zhang <zhang.david2011@gmail.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      a380a031
    • T
      crush: include mapper.h in mapper.c · f6c0d1a3
      Tobias Klauser 提交于
      Include linux/crush/mapper.h in crush/mapper.c to get the prototypes of
      crush_find_rule and crush_do_rule which are defined there. This fixes
      the following GCC warnings when building with 'W=1':
      
        net/ceph/crush/mapper.c:40:5: warning: no previous prototype for ‘crush_find_rule’ [-Wmissing-prototypes]
        net/ceph/crush/mapper.c:793:5: warning: no previous prototype for ‘crush_do_rule’ [-Wmissing-prototypes]
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      [idryomov@gmail.com: corresponding !__KERNEL__ include]
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      f6c0d1a3
    • I
      rbd: silence bogus -Wmaybe-uninitialized warning · d4c2269b
      Ilya Dryomov 提交于
      drivers/block/rbd.c: In function ‘rbd_watch_cb’:
      drivers/block/rbd.c:3690:5: error: ‘struct_v’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      drivers/block/rbd.c:3759:5: note: ‘struct_v’ was declared here
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      d4c2269b
    • I
      libceph: no need to drop con->mutex for ->get_authorizer() · b3bbd3f2
      Ilya Dryomov 提交于
      ->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and
      ->check_message_signature() shouldn't be doing anything with or on the
      connection (like closing it or sending messages).
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      b3bbd3f2
    • I
      libceph: drop len argument of *verify_authorizer_reply() · 0dde5848
      Ilya Dryomov 提交于
      The length of the reply is protocol-dependent - for cephx it's
      ceph_x_authorize_reply.  Nothing sensible can be passed from the
      messenger layer anyway.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      0dde5848
    • I
      libceph: verify authorize reply on connect · 5c056fdc
      Ilya Dryomov 提交于
      After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
      the client gets back a ceph_x_authorize_reply, which it is supposed to
      verify to ensure the authenticity and protect against replay attacks.
      The code for doing this is there (ceph_x_verify_authorizer_reply(),
      ceph_auth_verify_authorizer_reply() + plumbing), but it is never
      invoked by the the messenger.
      
      AFAICT this goes back to 2009, when ceph authentication protocols
      support was added to the kernel client in 4e7a5dcd ("ceph:
      negotiate authentication protocol; implement AUTH_NONE protocol").
      
      The second param of ceph_connection_operations::verify_authorizer_reply
      is unused all the way down.  Pass 0 to facilitate backporting, and kill
      it in the next commit.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      5c056fdc
    • I
      libceph: no need for GFP_NOFS in ceph_monc_init() · 5418d0a2
      Ilya Dryomov 提交于
      It's called during inital setup, when everything should be allocated
      with GFP_KERNEL.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      5418d0a2
    • I
      libceph: stop allocating a new cipher on every crypto request · 7af3ea18
      Ilya Dryomov 提交于
      This is useless and more importantly not allowed on the writeback path,
      because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
      can recurse back into the filesystem:
      
          kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
          Workqueue: ceph-msgr ceph_con_workfn [libceph]
           ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
           ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
           00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
          Call Trace:
           [<ffffffff951eb4e1>] ? schedule+0x31/0x80
           [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10
           [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130
           [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30
           [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
           [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270
           [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
           [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120
           [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
           [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190
           [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0
           [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320
           [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350
           [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0
           [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60
           [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560
           [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0
           [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
           [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580
           [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph]
           [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
           [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60
           [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
           [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130
           [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180
           [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0
           [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150
           [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120
           [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
           [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
           [<ffffffff950d40a0>] ? release_sock+0x40/0x90
           [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0
           [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph]
           [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph]
           [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph]
           [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph]
           [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
           [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
           [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph]
           [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph]
           [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph]
           [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
           [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410
           [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480
           [<ffffffff94c95170>] ? process_one_work+0x410/0x410
           [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0
           [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40
           [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190
      
      Allocating the cipher along with the key fixes the issue - as long the
      key doesn't change, a single cipher context can be used concurrently in
      multiple requests.
      
      We still can't take that GFP_KERNEL allocation though.  Both
      ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
      GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.
      Reported-by: NLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      7af3ea18
    • I
      libceph: uninline ceph_crypto_key_destroy() · 6db2304a
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      6db2304a
    • I
      2b1e1a7c
    • I
      libceph: switch ceph_x_decrypt() to ceph_crypt() · e15fd0a1
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      e15fd0a1
    • I
      libceph: switch ceph_x_encrypt() to ceph_crypt() · d03857c6
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      d03857c6
    • I
      libceph: tweak calcu_signature() a little · 4eb4517c
      Ilya Dryomov 提交于
      - replace an ad-hoc array with a struct
      - rename to calc_signature() for consistency
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      4eb4517c
    • I
      libceph: rename and align ceph_x_authorizer::reply_buf · 7882a26d
      Ilya Dryomov 提交于
      It's going to be used as a temporary buffer for in-place en/decryption
      with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
      Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      7882a26d
    • I
      libceph: introduce ceph_crypt() for in-place en/decryption · a45f795c
      Ilya Dryomov 提交于
      Starting with 4.9, kernel stacks may be vmalloced and therefore not
      guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
      option is enabled by default on x86.  This makes it invalid to use
      on-stack buffers with the crypto scatterlist API, as sg_set_buf()
      expects a logical address and won't work with vmalloced addresses.
      
      There isn't a different (e.g. kvec-based) crypto API we could switch
      net/ceph/crypto.c to and the current scatterlist.h API isn't getting
      updated to accommodate this use case.  Allocating a new header and
      padding for each operation is a non-starter, so do the en/decryption
      in-place on a single pre-assembled (header + data + padding) heap
      buffer.  This is explicitly supported by the crypto API:
      
          "... the caller may provide the same scatter/gather list for the
           plaintext and cipher text. After the completion of the cipher
           operation, the plaintext data is replaced with the ciphertext data
           in case of an encryption and vice versa for a decryption."
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      a45f795c
    • I
      libceph: introduce ceph_x_encrypt_offset() · 55d9cc83
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      55d9cc83
    • I
      libceph: old_key in process_one_ticket() is redundant · 462e6504
      Ilya Dryomov 提交于
      Since commit 0a990e70 ("ceph: clean up service ticket decoding"),
      th->session_key isn't assigned until everything is decoded.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      462e6504
    • I
      libceph: ceph_x_encrypt_buflen() takes in_len · 36721ece
      Ilya Dryomov 提交于
      Pass what's going to be encrypted - that's msg_b, not ticket_blob.
      ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
      the maxlen calculation, but makes it a bit clearer.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      36721ece
  3. 12 12月, 2016 2 次提交
  4. 11 12月, 2016 4 次提交
  5. 10 12月, 2016 3 次提交