1. 16 12月, 2016 3 次提交
  2. 15 12月, 2016 8 次提交
  3. 13 12月, 2016 16 次提交
    • T
      crush: include mapper.h in mapper.c · f6c0d1a3
      Tobias Klauser 提交于
      Include linux/crush/mapper.h in crush/mapper.c to get the prototypes of
      crush_find_rule and crush_do_rule which are defined there. This fixes
      the following GCC warnings when building with 'W=1':
      
        net/ceph/crush/mapper.c:40:5: warning: no previous prototype for ‘crush_find_rule’ [-Wmissing-prototypes]
        net/ceph/crush/mapper.c:793:5: warning: no previous prototype for ‘crush_do_rule’ [-Wmissing-prototypes]
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      [idryomov@gmail.com: corresponding !__KERNEL__ include]
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      f6c0d1a3
    • I
      libceph: no need to drop con->mutex for ->get_authorizer() · b3bbd3f2
      Ilya Dryomov 提交于
      ->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and
      ->check_message_signature() shouldn't be doing anything with or on the
      connection (like closing it or sending messages).
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      b3bbd3f2
    • I
      libceph: drop len argument of *verify_authorizer_reply() · 0dde5848
      Ilya Dryomov 提交于
      The length of the reply is protocol-dependent - for cephx it's
      ceph_x_authorize_reply.  Nothing sensible can be passed from the
      messenger layer anyway.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      0dde5848
    • I
      libceph: verify authorize reply on connect · 5c056fdc
      Ilya Dryomov 提交于
      After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
      the client gets back a ceph_x_authorize_reply, which it is supposed to
      verify to ensure the authenticity and protect against replay attacks.
      The code for doing this is there (ceph_x_verify_authorizer_reply(),
      ceph_auth_verify_authorizer_reply() + plumbing), but it is never
      invoked by the the messenger.
      
      AFAICT this goes back to 2009, when ceph authentication protocols
      support was added to the kernel client in 4e7a5dcd ("ceph:
      negotiate authentication protocol; implement AUTH_NONE protocol").
      
      The second param of ceph_connection_operations::verify_authorizer_reply
      is unused all the way down.  Pass 0 to facilitate backporting, and kill
      it in the next commit.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      5c056fdc
    • I
      libceph: no need for GFP_NOFS in ceph_monc_init() · 5418d0a2
      Ilya Dryomov 提交于
      It's called during inital setup, when everything should be allocated
      with GFP_KERNEL.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      5418d0a2
    • I
      libceph: stop allocating a new cipher on every crypto request · 7af3ea18
      Ilya Dryomov 提交于
      This is useless and more importantly not allowed on the writeback path,
      because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
      can recurse back into the filesystem:
      
          kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
          Workqueue: ceph-msgr ceph_con_workfn [libceph]
           ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
           ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
           00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
          Call Trace:
           [<ffffffff951eb4e1>] ? schedule+0x31/0x80
           [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10
           [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130
           [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30
           [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
           [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270
           [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
           [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120
           [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
           [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190
           [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0
           [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320
           [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350
           [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0
           [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60
           [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560
           [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0
           [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
           [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580
           [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph]
           [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
           [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60
           [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
           [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130
           [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180
           [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0
           [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150
           [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120
           [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
           [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
           [<ffffffff950d40a0>] ? release_sock+0x40/0x90
           [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0
           [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph]
           [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph]
           [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph]
           [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph]
           [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
           [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
           [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph]
           [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph]
           [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph]
           [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
           [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410
           [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480
           [<ffffffff94c95170>] ? process_one_work+0x410/0x410
           [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0
           [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40
           [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190
      
      Allocating the cipher along with the key fixes the issue - as long the
      key doesn't change, a single cipher context can be used concurrently in
      multiple requests.
      
      We still can't take that GFP_KERNEL allocation though.  Both
      ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
      GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.
      Reported-by: NLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      7af3ea18
    • I
      libceph: uninline ceph_crypto_key_destroy() · 6db2304a
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      6db2304a
    • I
      2b1e1a7c
    • I
      libceph: switch ceph_x_decrypt() to ceph_crypt() · e15fd0a1
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      e15fd0a1
    • I
      libceph: switch ceph_x_encrypt() to ceph_crypt() · d03857c6
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      d03857c6
    • I
      libceph: tweak calcu_signature() a little · 4eb4517c
      Ilya Dryomov 提交于
      - replace an ad-hoc array with a struct
      - rename to calc_signature() for consistency
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      4eb4517c
    • I
      libceph: rename and align ceph_x_authorizer::reply_buf · 7882a26d
      Ilya Dryomov 提交于
      It's going to be used as a temporary buffer for in-place en/decryption
      with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
      Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      7882a26d
    • I
      libceph: introduce ceph_crypt() for in-place en/decryption · a45f795c
      Ilya Dryomov 提交于
      Starting with 4.9, kernel stacks may be vmalloced and therefore not
      guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
      option is enabled by default on x86.  This makes it invalid to use
      on-stack buffers with the crypto scatterlist API, as sg_set_buf()
      expects a logical address and won't work with vmalloced addresses.
      
      There isn't a different (e.g. kvec-based) crypto API we could switch
      net/ceph/crypto.c to and the current scatterlist.h API isn't getting
      updated to accommodate this use case.  Allocating a new header and
      padding for each operation is a non-starter, so do the en/decryption
      in-place on a single pre-assembled (header + data + padding) heap
      buffer.  This is explicitly supported by the crypto API:
      
          "... the caller may provide the same scatter/gather list for the
           plaintext and cipher text. After the completion of the cipher
           operation, the plaintext data is replaced with the ciphertext data
           in case of an encryption and vice versa for a decryption."
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      a45f795c
    • I
      libceph: introduce ceph_x_encrypt_offset() · 55d9cc83
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      55d9cc83
    • I
      libceph: old_key in process_one_ticket() is redundant · 462e6504
      Ilya Dryomov 提交于
      Since commit 0a990e70 ("ceph: clean up service ticket decoding"),
      th->session_key isn't assigned until everything is decoded.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      462e6504
    • I
      libceph: ceph_x_encrypt_buflen() takes in_len · 36721ece
      Ilya Dryomov 提交于
      Pass what's going to be encrypted - that's msg_b, not ticket_blob.
      ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
      the maxlen calculation, but makes it a bit clearer.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      36721ece
  4. 11 12月, 2016 8 次提交
  5. 10 12月, 2016 5 次提交
    • N
      SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2
      NeilBrown 提交于
      There are two problems with refcounting of auth_gss messages.
      
      First, the reference on the pipe->pipe list (taken by a call
      to rpc_queue_upcall()) is not counted.  It seems to be
      assumed that a message in pipe->pipe will always also be in
      pipe->in_downcall, where it is correctly reference counted.
      
      However there is no guaranty of this.  I have a report of a
      NULL dereferences in rpc_pipe_read() which suggests a msg
      that has been freed is still on the pipe->pipe list.
      
      One way I imagine this might happen is:
      - message is queued for uid=U and auth->service=S1
      - rpc.gssd reads this message and starts processing.
        This removes the message from pipe->pipe
      - message is queued for uid=U and auth->service=S2
      - rpc.gssd replies to the first message. gss_pipe_downcall()
        calls __gss_find_upcall(pipe, U, NULL) and it finds the
        *second* message, as new messages are placed at the head
        of ->in_downcall, and the service type is not checked.
      - This second message is removed from ->in_downcall and freed
        by gss_release_msg() (even though it is still on pipe->pipe)
      - rpc.gssd tries to read another message, and dereferences a pointer
        to this message that has just been freed.
      
      I fix this by incrementing the reference count before calling
      rpc_queue_upcall(), and decrementing it if that fails, or normally in
      gss_pipe_destroy_msg().
      
      It seems strange that the reply doesn't target the message more
      precisely, but I don't know all the details.  In any case, I think the
      reference counting irregularity became a measureable bug when the
      extra arg was added to __gss_find_upcall(), hence the Fixes: line
      below.
      
      The second problem is that if rpc_queue_upcall() fails, the new
      message is not freed. gss_alloc_msg() set the ->count to 1,
      gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
      then the pointer is discarded so the memory never gets freed.
      
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      1cded9d2
    • E
      net: skb_condense() can also deal with empty skbs · 3174fed9
      Eric Dumazet 提交于
      It seems attackers can also send UDP packets with no payload at all.
      
      skb_condense() can still be a win in this case.
      
      It will be possible to replace the custom code in tcp_add_backlog()
      to get full benefit from skb_condense()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3174fed9
    • E
      udp: udp_rmem_release() should touch sk_rmem_alloc later · 02ab0d13
      Eric Dumazet 提交于
      In flood situations, keeping sk_rmem_alloc at a high value
      prevents producers from touching the socket.
      
      It makes sense to lower sk_rmem_alloc only at the end
      of udp_rmem_release() after the thread draining receive
      queue in udp_recvmsg() finished the writes to sk_forward_alloc.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02ab0d13
    • E
      udp: add batching to udp_rmem_release() · 6b229cf7
      Eric Dumazet 提交于
      If udp_recvmsg() constantly releases sk_rmem_alloc
      for every read packet, it gives opportunity for
      producers to immediately grab spinlocks and desperatly
      try adding another packet, causing false sharing.
      
      We can add a simple heuristic to give the signal
      by batches of ~25 % of the queue capacity.
      
      This patch considerably increases performance under
      flood by about 50 %, since the thread draining the queue
      is no longer slowed by false sharing.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b229cf7
    • E
      udp: copy skb->truesize in the first cache line · c84d9490
      Eric Dumazet 提交于
      In UDP RX handler, we currently clear skb->dev before skb
      is added to receive queue, because device pointer is no longer
      available once we exit from RCU section.
      
      Since this first cache line is always hot, lets reuse this space
      to store skb->truesize and thus avoid a cache line miss at
      udp_recvmsg()/udp_skb_destructor time while receive queue
      spinlock is held.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c84d9490