1. 26 5月, 2016 1 次提交
  2. 26 4月, 2016 1 次提交
    • I
      libceph: make authorizer destruction independent of ceph_auth_client · 6c1ea260
      Ilya Dryomov 提交于
      Starting the kernel client with cephx disabled and then enabling cephx
      and restarting userspace daemons can result in a crash:
      
          [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000
          [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130
          [262671.584334] PGD 0
          [262671.635847] Oops: 0000 [#1] SMP
          [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
          [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014
          [262672.268937] Workqueue: ceph-msgr con_work [libceph]
          [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000
          [262672.428330] RIP: 0010:[<ffffffff811cd04a>]  [<ffffffff811cd04a>] kfree+0x5a/0x130
          [262672.535880] RSP: 0018:ffff880149aeba58  EFLAGS: 00010286
          [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018
          [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012
          [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000
          [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40
          [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840
          [262673.131722] FS:  0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000
          [262673.245377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0
          [262673.417556] Stack:
          [262673.472943]  ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04
          [262673.583767]  ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00
          [262673.694546]  ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800
          [262673.805230] Call Trace:
          [262673.859116]  [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph]
          [262673.968705]  [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph]
          [262674.078852]  [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph]
          [262674.134249]  [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph]
          [262674.189124]  [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph]
          [262674.243749]  [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph]
          [262674.297485]  [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph]
          [262674.350813]  [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph]
          [262674.403312]  [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph]
          [262674.454712]  [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690
          [262674.505096]  [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph]
          [262674.555104]  [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0
          [262674.604072]  [<ffffffff810901ea>] worker_thread+0x11a/0x470
          [262674.652187]  [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310
          [262674.699022]  [<ffffffff810957a2>] kthread+0xd2/0xf0
          [262674.744494]  [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
          [262674.789543]  [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70
          [262674.834094]  [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
      
      What happens is the following:
      
          (1) new MON session is established
          (2) old "none" ac is destroyed
          (3) new "cephx" ac is constructed
          ...
          (4) old OSD session (w/ "none" authorizer) is put
                ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)
      
      osd->o_auth.authorizer in the "none" case is just a bare pointer into
      ac, which contains a single static copy for all services.  By the time
      we get to (4), "none" ac, freed in (2), is long gone.  On top of that,
      a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
      so we end up trying to destroy a "none" authorizer with a "cephx"
      destructor operating on invalid memory!
      
      To fix this, decouple authorizer destruction from ac and do away with
      a single static "none" authorizer by making a copy for each OSD or MDS
      session.  Authorizers themselves are independent of ac and so there is
      no reason for destroy_authorizer() to be an ac op.  Make it an op on
      the authorizer itself by turning ceph_authorizer into a real struct.
      
      Fixes: http://tracker.ceph.com/issues/15447Reported-by: NAlan Zhang <alan.zhang@linux.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      6c1ea260
  3. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  4. 26 3月, 2016 17 次提交
  5. 25 2月, 2016 3 次提交
  6. 16 2月, 2016 1 次提交
    • D
      mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d
      Dave Hansen 提交于
      We will soon modify the vanilla get_user_pages() so it can no
      longer be used on mm/tasks other than 'current/current->mm',
      which is by far the most common way it is called.  For now,
      we allow the old-style calls, but warn when they are used.
      (implemented in previous patch)
      
      This patch switches all callers of:
      
      	get_user_pages()
      	get_user_pages_unlocked()
      	get_user_pages_locked()
      
      to stop passing tsk/mm so they will no longer see the warnings.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4edcf0d
  7. 05 2月, 2016 5 次提交
  8. 27 1月, 2016 2 次提交
  9. 22 1月, 2016 8 次提交
    • I
      libceph: remove outdated comment · 7e01726a
      Ilya Dryomov 提交于
      MClientMount{,Ack} are long gone.  The receipt of bare monmap doesn't
      actually indicate a mount success as we are yet to authenticate at that
      point in time.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      7e01726a
    • I
      libceph: kill off ceph_x_ticket_handler::validity · f6cdb292
      Ilya Dryomov 提交于
      With it gone, no need to preserve ceph_timespec in process_one_ticket()
      either.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      f6cdb292
    • I
      libceph: invalidate AUTH in addition to a service ticket · 187d131d
      Ilya Dryomov 提交于
      If we fault due to authentication, we invalidate the service ticket we
      have and request a new one - the idea being that if a service rejected
      our authorizer, it must have expired, despite mon_client's attempts at
      periodic renewal.  (The other possibility is that our ticket is too new
      and the service hasn't gotten it yet, in which case invalidating isn't
      necessary but doesn't hurt.)
      
      Invalidating just the service ticket is not enough, though.  If we
      assume a failure on mon_client's part to renew a service ticket, we
      have to assume the same for the AUTH ticket.  If our AUTH ticket is
      bad, we won't get any service tickets no matter how hard we try, so
      invalidate AUTH ticket along with the service ticket.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      187d131d
    • I
      libceph: fix authorizer invalidation, take 2 · 6abe097d
      Ilya Dryomov 提交于
      Back in 2013, commit 4b8e8b5d ("libceph: fix authorizer
      invalidation") tried to fix authorizer invalidation issues by clearing
      validity field.  However, nothing ever consults this field, so it
      doesn't force us to request any new secrets in any way and therefore we
      never get out of the exponential backoff mode:
      
          [  129.973812] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  130.706785] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  131.710088] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  133.708321] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  137.706598] libceph: osd2 192.168.122.1:6810 connect authorization failure
          ...
      
      AFAICT this was the case at the time 4b8e8b5d was merged, too.
      
      Using timespec solely as a bool isn't nice, so introduce a new have_key
      flag, specifically for this purpose.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      6abe097d
    • I
      libceph: clear messenger auth_retry flag if we fault · f6330cc1
      Ilya Dryomov 提交于
      Commit 20e55c4c ("libceph: clear messenger auth_retry flag when we
      authenticate") got us only half way there.  We clear the flag if the
      second attempt succeeds, but it also needs to be cleared if that
      attempt fails, to allow for the exponential backoff to kick in.
      Otherwise, if ->should_authenticate() thinks our keys are valid, we
      will busy loop, incrementing auth_retry to no avail:
      
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 1
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 2
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 3
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 4
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 5
          ...
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      f6330cc1
    • I
      libceph: fix ceph_msg_revoke() · 67645d76
      Ilya Dryomov 提交于
      There are a number of problems with revoking a "was sending" message:
      
      (1) We never make any attempt to revoke data - only kvecs contibute to
      con->out_skip.  However, once the header (envelope) is written to the
      socket, our peer learns data_len and sets itself to expect at least
      data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
      is called while the messenger is sending message's data portion,
      anything we send after that call is counted by the OSD towards the now
      revoked message's data portion.  The effects vary, the most common one
      is the eventual hang - higher layers get stuck waiting for the reply to
      the message that was sent out after ceph_msg_revoke() returned and
      treated by the OSD as a bunch of data bytes.  This is what Matt ran
      into.
      
      (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
      is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
      while the messenger is sending the header, we will get a connection
      reset, either due to a bad tag (0 is not a valid tag) or a bad header
      CRC, which kind of defeats the purpose of revoke.  Currently the kernel
      client refuses to work with header CRCs disabled, but that will likely
      change in the future, making this even worse.
      
      (3) con->out_skip is not reset on connection reset, leading to one or
      more spurious connection resets if we happen to get a real one between
      con->out_skip is set in ceph_msg_revoke() and before it's cleared in
      write_partial_skip().
      
      Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
      zero the tag or the header, i.e. send out tag+header regardless of when
      ceph_msg_revoke() is called.  That way the header is always correct, no
      unnecessary resets are induced and revoke stands ready for disabled
      CRCs.  Since ceph_msg_revoke() rips out con->out_msg, introduce a new
      "message out temp" and copy the header into it before sending.
      
      Cc: stable@vger.kernel.org # 4.0+
      Reported-by: NMatt Conner <matt.conner@keepertech.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Tested-by: NMatt Conner <matt.conner@keepertech.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      67645d76
    • G
      libceph: use list_for_each_entry_safe · 10bcee14
      Geliang Tang 提交于
      Use list_for_each_entry_safe() instead of list_for_each_safe() to
      simplify the code.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      [idryomov@gmail.com: nuke call to list_splice_init() as well]
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      10bcee14
    • G
      libceph: use list_next_entry instead of list_entry_next · 17ddc49b
      Geliang Tang 提交于
      list_next_entry has been defined in list.h, so I replace list_entry_next
      with it.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      17ddc49b
  10. 03 11月, 2015 1 次提交
    • I
      libceph: clear msg->con in ceph_msg_release() only · 583d0fef
      Ilya Dryomov 提交于
      The following bit in ceph_msg_revoke_incoming() is unsafe:
      
          struct ceph_connection *con = msg->con;
          if (!con)
                  return;
          mutex_lock(&con->mutex);
          <more msg->con use>
      
      There is nothing preventing con from getting destroyed right after
      msg->con test.  One easy way to reproduce this is to disable message
      signing only on the server side and try to map an image.  The system
      will go into a
      
          libceph: read_partial_message ffff880073f0ab68 signature check failed
          libceph: osd0 192.168.255.155:6801 bad crc/signature
          libceph: read_partial_message ffff880073f0ab68 signature check failed
          libceph: osd0 192.168.255.155:6801 bad crc/signature
      
      loop which has to be interrupted with Ctrl-C.  Hit Ctrl-C and you are
      likely to end up with a random GP fault if the reset handler executes
      "within" ceph_msg_revoke_incoming():
      
                           <yet another reply w/o a signature>
                                         ...
                <Ctrl-C>
          rbd_obj_request_end
            ceph_osdc_cancel_request
              __unregister_request
                ceph_osdc_put_request
                  ceph_msg_revoke_incoming
                                         ...
                                      osd_reset
                                        __kick_osd_requests
                                          __reset_osd
                                            remove_osd
                                              ceph_con_close
                                                reset_connection
                                                  <clear con->in_msg->con>
                                                  <put con ref>
                                                    put_osd
                                                      <free osd/con>
                    <msg->con use> <-- !!!
      
      If ceph_msg_revoke_incoming() executes "before" the reset handler,
      osd/con will be leaked because ceph_msg_revoke_incoming() clears
      con->in_msg but doesn't put con ref, while reset_connection() only puts
      con ref if con->in_msg != NULL.
      
      The current msg->con scheme was introduced by commits 38941f80
      ("libceph: have messages point to their connection") and 92ce034b
      ("libceph: have messages take a connection reference"), which defined
      when messages get associated with a connection and when that
      association goes away.  Part of the problem is that this association is
      supposed to go away in much too many places; closing this race entirely
      requires either a rework of the existing or an addition of a new layer
      of synchronization.
      
      In lieu of that, we can make it *much* less likely to hit by
      disassociating messages only on their destruction and resend through
      a different connection.  This makes the code simpler and is probably
      a good thing to do regardless - this patch adds a msg_con_set() helper
      which is is called from only three places: ceph_con_send() and
      ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear
      it.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      583d0fef