1. 18 May 2013, 1 commit
    • ceph: ceph_pagelist_append might sleep while atomic · 39be95e9
      Jim Schutt authored
      Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
      while holding a lock, but it's spoiled because ceph_pagelist_addpage()
      always calls kmap(), which might sleep.  Here's the result:
      
      [13439.295457] ceph: mds0 reconnect start
      [13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
      [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
          . . .
      [13439.376225] Call Trace:
      [13439.378757]  [<ffffffff81076f4c>] __might_sleep+0xfc/0x110
      [13439.384353]  [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph]
      [13439.391491]  [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph]
      [13439.398035]  [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50
      [13439.403775]  [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20
      [13439.409277]  [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph]
      [13439.415622]  [<ffffffff81196748>] ? igrab+0x28/0x70
      [13439.420610]  [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph]
      [13439.427584]  [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph]
      [13439.434499]  [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
      [13439.441646]  [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph]
      [13439.448363]  [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
      [13439.455250]  [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph]
      [13439.461534]  [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
      [13439.468432]  [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10
      [13439.473934]  [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph]
      [13439.480464]  [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph]
      [13439.486492]  [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph]
      [13439.493190]  [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph]
          . . .
      [13439.587132] ceph: mds0 reconnect success
      [13490.720032] ceph: mds0 caps stale
      [13501.235257] ceph: mds0 recovery completed
      [13501.300419] ceph: mds0 caps renewed
      
      Fix it up by encoding locks into a buffer first, and when the number
      of encoded locks is stable, copy that into a ceph_pagelist.
      
      [elder@inktank.com: abbreviated the stack info a bit.]
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Alex Elder <elder@inktank.com>
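
      The fix follows a classic kernel pattern: do all work that can sleep
      (allocation, kmap) with no spinlock held, and only non-sleeping work
      under the lock.  Below is a minimal sketch of that pattern, not the
      actual patch; count_inode_locks() and encode_locks_to_buffer() are
      hypothetical helpers standing in for the real encoding code.

          static int example_encode_locks(struct inode *inode,
                                          struct ceph_pagelist *pagelist)
          {
              struct ceph_filelock *flocks;
              int num_locks, err;

              /* count and allocate with no spinlock held: kmalloc may sleep */
              lock_flocks();
              num_locks = count_inode_locks(inode);            /* hypothetical */
              unlock_flocks();

              flocks = kmalloc(num_locks * sizeof(*flocks), GFP_NOFS);
              if (!flocks)
                  return -ENOMEM;

              /* encode into plain memory under the lock: nothing here sleeps */
              lock_flocks();
              err = encode_locks_to_buffer(inode, flocks, num_locks); /* hypothetical */
              unlock_flocks();

              /*
               * Per the description above, the real change loops back and
               * retries if locks came or went between the two passes, so the
               * copy only happens once the number of encoded locks is stable.
               */

              /* no locks held: ceph_pagelist_append() may kmap() and sleep */
              if (!err)
                  err = ceph_pagelist_append(pagelist, flocks,
                                             num_locks * sizeof(*flocks));
              kfree(flocks);
              return err;
          }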
  2. 02 May 2013, 4 commits
  3. 23 Feb 2013, 1 commit
  4. 20 Feb 2013, 1 commit
  5. 12 Feb 2013, 1 commit
    • ceph: Translate between uid and gids in cap messages and kuids and kgids · 05cb11c1
      Eric W. Biederman authored
      - Make the uid and gid arguments of send_cap_msg() used to compose
        ceph_mds_caps messages of type kuid_t and kgid_t.
      
      - Pass inode->i_uid and inode->i_gid in __send_cap to send_cap_msg()
        through variables of type kuid_t and kgid_t.
      
      - Modify struct ceph_cap_snap to store uids and gids in types kuid_t
        and kgid_t.  This allows capturing inode->i_uid and inode->i_gid in
        ceph_queue_cap_snap() without loss and passing them to
        __ceph_flush_snaps() where they are removed from struct
        ceph_cap_snap and passed to send_cap_msg().
      
      - In handle_cap_grant, translate the uids and gids in the initial user
        namespace stored in struct ceph_mds_cap into kuids and kgids
        before setting inode->i_uid and inode->i_gid.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
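
      As a minimal sketch of the two directions of that translation (the
      helper names are hypothetical, not the patch's; the wire format is
      assumed to carry plain 32-bit ids in the initial user namespace):

          #include <linux/uidgid.h>

          /* outgoing: kernel kuid_t/kgid_t -> 32-bit wire uid/gid */
          static __le32 example_wire_uid(kuid_t uid)
          {
              return cpu_to_le32(from_kuid(&init_user_ns, uid));
          }

          /* incoming: 32-bit wire uid/gid -> kuid_t/kgid_t before storing */
          static void example_set_owner(struct inode *inode, u32 uid, u32 gid)
          {
              inode->i_uid = make_kuid(&init_user_ns, uid);
              inode->i_gid = make_kgid(&init_user_ns, gid);
          }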
  6. 03 Aug 2012, 1 commit
    • ceph: simplify+fix atomic_open · 5ef50c3b
      Sage Weil authored
      The initial ->atomic_open op was carried over from the old intent code,
      which was incomplete and didn't really work.  Replace it with a fresh
      method.  In particular:
      
       * always attempt to do an atomic open+lookup, both for the create case
         and for lookups of existing files.
       * fix symlink handling by returning 1 to the VFS so that we can follow
         the link to its destination. This fixes a longstanding ceph bug (#2392).
      Signed-off-by: Sage Weil <sage@inktank.com>
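
      A rough sketch of what a handler under this ->atomic_open() API looks
      like (illustrative only; example_lookup_open() is a hypothetical
      stand-in for the combined lookup+open request, and error handling is
      pared down):

          static int example_atomic_open(struct inode *dir, struct dentry *dentry,
                                         struct file *file, unsigned flags,
                                         umode_t mode, int *opened)
          {
              int err;

              /* one combined lookup+open, for create and plain lookup alike */
              err = example_lookup_open(dir, dentry, file, flags, mode);
              if (err)
                  return err;

              /*
               * A symlink cannot be opened here.  finish_no_open() returns 1,
               * telling the VFS the open was not completed, so it can follow
               * the link to its destination itself.
               */
              if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode))
                  return finish_no_open(file, NULL);

              return finish_open(file, dentry, generic_file_open, opened);
          }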
  7. 31 Jul 2012, 1 commit
    • ceph: define snap counts as u32 everywhere · aa711ee3
      Alex Elder authored
      There are two structures in which a count of snapshots are
      maintained:
      
          struct ceph_snap_context {
              ...
              u32 num_snaps;
              ...
          }
      and
          struct ceph_snap_realm {
              ...
              u32 num_prior_parent_snaps;   /* had prior to parent_since */
              ...
              u32 num_snaps;
              ...
          }
      
      These fields never take on negative values (e.g., to hold special
      meaning), and so are really inherently unsigned.  Furthermore they
      take their value from over-the-wire or on-disk formatted 32-bit
      values.
      
      So change their definition to have type u32, and change some spots
      elsewhere in the code to account for this change.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
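
      A small illustration of why u32 is the natural type (a sketch, not
      the patch: bounds checking is omitted and the function name is made
      up); libceph's ceph_decode_32()/ceph_decode_64() already return
      unsigned values taken straight off the wire:

          static void example_decode_snaps(void **p, struct ceph_snap_context *snapc)
          {
              u32 i;

              snapc->num_snaps = ceph_decode_32(p);   /* unsigned on the wire */
              for (i = 0; i < snapc->num_snaps; i++)
                  snapc->snaps[i] = ceph_decode_64(p);
          }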
  8. 14 Jul 2012, 5 commits
  9. 22 Mar 2012, 2 commits
  10. 13 Jan 2012, 1 commit
  11. 04 Jan 2012, 1 commit
  12. 08 Dec 2011, 1 commit
    • ceph: use i_ceph_lock instead of i_lock · be655596
      Sage Weil authored
      We have been using i_lock to protect all kinds of data structures in the
      ceph_inode_info struct, including lists of inodes that we need to iterate
      over while avoiding races with inode destruction.  That requires grabbing
      a reference to the inode while the list lock is held, but igrab() now
      takes i_lock to check the inode flags.
      
      Changing the list lock ordering would be a painful process.
      
      However, using a ceph-specific i_ceph_lock in the ceph inode instead of
      i_lock is a simple mechanical change and avoids the ordering constraints
      imposed by igrab().
      Reported-by: Amon Ott <a.ott@m-privacy.de>
      Signed-off-by: Sage Weil <sage@newdream.net>
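
      A sketch of the pattern this enables (illustrative, not the patch):
      with i_ceph_lock and i_lock now distinct, igrab(), which takes
      i_lock internally, is safe to call while the ceph-private lock is
      held.

          static struct inode *example_grab_inode(struct ceph_inode_info *ci)
          {
              struct inode *inode;

              spin_lock(&ci->i_ceph_lock);     /* protects ceph-private state */
              inode = igrab(&ci->vfs_inode);   /* takes i_lock; no ordering clash */
              spin_unlock(&ci->i_ceph_lock);
              return inode;                    /* NULL if inode is being torn down */
          }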
  13. 06 Nov 2011, 1 commit
  14. 04 Nov 2011, 1 commit
  15. 26 Oct 2011, 2 commits
  16. 27 Jul 2011, 6 commits
  17. 21 Jul 2011, 1 commit
    • fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik authored
      Btrfs needs to be able to control how filemap_write_and_wait_range() is
      called in fsync to make it less of a painful operation, so push the
      taking of i_mutex and the call to filemap_write_and_wait() down into the
      ->fsync() handlers.  It seems some file systems, like ext3 and ocfs2,
      can drop taking i_mutex altogether.  For correctness' sake I just pushed
      everything down in all cases to make sure that we keep the current
      behavior the same for everybody, and then each individual fs maintainer
      can make up their mind about what to do from there.
      Thanks,
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
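
      After this change a handler has roughly the following shape (a sketch
      against the new prototype, not any particular filesystem's code):

          static int example_fsync(struct file *file, loff_t start, loff_t end,
                                   int datasync)
          {
              struct inode *inode = file->f_mapping->host;
              int ret;

              /* the writeback the VFS used to do before calling ->fsync() */
              ret = filemap_write_and_wait_range(file->f_mapping, start, end);
              if (ret)
                  return ret;

              /* the i_mutex the VFS used to take for us */
              mutex_lock(&inode->i_mutex);
              /* flush filesystem-specific metadata here */
              mutex_unlock(&inode->i_mutex);
              return 0;
          }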
  18. 20 Jul 2011, 1 commit
  19. 12 May 2011, 1 commit
  20. 05 May 2011, 1 commit
  21. 22 Mar 2011, 2 commits
  22. 04 Mar 2011, 1 commit
  23. 20 Feb 2011, 1 commit
  24. 13 Jan 2011, 1 commit
    • ceph: add dir_layout to inode · 6c0f3af7
      Sage Weil authored
      Add a ceph_dir_layout to the inode, and calculate dentry hash values based
      on the parent directory's specified dir_hash function.  This is needed
      because the old default Linux dcache hash function is extremely weak and
      leads to a poor distribution of files among dir fragments.
      Signed-off-by: Sage Weil <sage@newdream.net>
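
      A sketch of what such a per-directory hash selection can look like
      (close in spirit to, but not necessarily identical with, the patch;
      ceph_str_hash() and CEPH_STR_HASH_LINUX are real libceph definitions,
      the function name is made up):

          static unsigned example_dentry_hash(struct inode *dir, struct dentry *dn)
          {
              struct ceph_inode_info *ci = ceph_inode(dir);

              switch (ci->i_dir_layout.dl_dir_hash) {
              case 0:                       /* unset: keep the dcache hash */
              case CEPH_STR_HASH_LINUX:
                  return dn->d_name.hash;
              default:
                  return ceph_str_hash(ci->i_dir_layout.dl_dir_hash,
                                       dn->d_name.name, dn->d_name.len);
              }
          }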
  25. 07 Jan 2011, 1 commit