1. 05 Apr, 2014 1 commit
    • Y
      ceph: use fl->fl_file as owner identifier of flock and posix lock · eb13e832
      Yan, Zheng committed
      flock and posix lock should use fl->fl_file instead of process ID
      as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
      is usually equal to fl->fl_file, but it also can be a customized
      value). The process ID of who holds the lock is just for F_GETLK
      fcntl(2).
      
      The fix is to rename the 'pid' fields of struct ceph_mds_request_args
      and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
      to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages.
      We also set the most significant bit of the 'owner' field. MDS can
      use that bit to distinguish between old and new clients.
      
      The MDS counterpart of this patch modifies the flock code to not
      take the 'pid_namespace' into consideration when checking for
      conflicting locks.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
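The owner encoding described in this commit can be sketched in user-space C. This is a minimal illustration only, not the kernel code: the 64-bit width of the 'owner' field, the macro name, and both helper names are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed for illustration: 'owner' is 64 bits wide and the flag is bit 63. */
#define OWNER_NEW_CLIENT_BIT (UINT64_C(1) << 63)

/* Client side: tag an opaque owner token (e.g. derived from fl->fl_file). */
static uint64_t encode_lock_owner(uint64_t owner_token)
{
    return owner_token | OWNER_NEW_CLIENT_BIT;
}

/* Server side: did the sender set the most significant bit? */
static int owner_from_new_client(uint64_t owner)
{
    return (owner & OWNER_NEW_CLIENT_BIT) != 0;
}
```

An old client that still sends a process ID in this field would normally leave the top bit clear, which is what would let the MDS tell the two apart.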
  2. 18 Feb, 2014 1 commit
  3. 01 Jan, 2014 2 commits
  4. 14 Dec, 2013 1 commit
  5. 07 Sep, 2013 1 commit
  6. 04 Jul, 2013 1 commit
    • S
      ceph: avoid accessing invalid memory · 54464296
      Sasha Levin committed
      When mounting ceph with a dev name that starts with a slash, ceph
      would attempt to access the character before that slash. Since we
      don't actually own that byte of memory, we would trigger an
      invalid access:
      
      [   43.499934] BUG: unable to handle kernel paging request at ffff880fa3a97fff
      [   43.500984] IP: [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300
      [   43.501491] PGD 743b067 PUD 10283c4067 PMD 10282a6067 PTE 8000000fa3a97060
      [   43.502301] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [   43.503006] Dumping ftrace buffer:
      [   43.503596]    (ftrace buffer empty)
      [   43.504046] CPU: 0 PID: 10879 Comm: mount Tainted: G        W    3.10.0-sasha #1129
      [   43.504851] task: ffff880fa625b000 ti: ffff880fa3412000 task.ti: ffff880fa3412000
      [   43.505608] RIP: 0010:[<ffffffff818f3884>]  [<ffffffff818f3884>] parse_mount_options$
      [   43.506552] RSP: 0018:ffff880fa3413d08  EFLAGS: 00010286
      [   43.507133] RAX: ffff880fa3a98000 RBX: ffff880fa3a98000 RCX: 0000000000000000
      [   43.507893] RDX: ffff880fa3a98001 RSI: 000000000000002f RDI: ffff880fa3a98000
      [   43.508610] RBP: ffff880fa3413d58 R08: 0000000000001f99 R09: ffff880fa3fe64c0
      [   43.509426] R10: ffff880fa3413d98 R11: ffff880fa38710d8 R12: ffff880fa3413da0
      [   43.509792] R13: ffff880fa3a97fff R14: 0000000000000000 R15: ffff880fa3413d90
      [   43.509792] FS:  00007fa9c48757e0(0000) GS:ffff880fd2600000(0000) knlGS:000000000000$
      [   43.509792] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   43.509792] CR2: ffff880fa3a97fff CR3: 0000000fa3bb9000 CR4: 00000000000006b0
      [   43.509792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   43.509792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   43.509792] Stack:
      [   43.509792]  0000e5180000000e ffffffff85ca1900 ffff880fa38710d8 ffff880fa3413d98
      [   43.509792]  0000000000000120 0000000000000000 ffff880fa3a98000 0000000000000000
      [   43.509792]  ffffffff85cf32a0 0000000000000000 ffff880fa3413dc8 ffffffff818f3c72
      [   43.509792] Call Trace:
      [   43.509792]  [<ffffffff818f3c72>] ceph_mount+0xa2/0x390
      [   43.509792]  [<ffffffff81226314>] ? pcpu_alloc+0x334/0x3c0
      [   43.509792]  [<ffffffff81282f8d>] mount_fs+0x8d/0x1a0
      [   43.509792]  [<ffffffff812263d0>] ? __alloc_percpu+0x10/0x20
      [   43.509792]  [<ffffffff8129f799>] vfs_kern_mount+0x79/0x100
      [   43.509792]  [<ffffffff812a224d>] do_new_mount+0xcd/0x1c0
      [   43.509792]  [<ffffffff812a2e8d>] do_mount+0x15d/0x210
      [   43.509792]  [<ffffffff81220e55>] ? strndup_user+0x45/0x60
      [   43.509792]  [<ffffffff812a2fdd>] SyS_mount+0x9d/0xe0
      [   43.509792]  [<ffffffff83fd816c>] tracesys+0xdd/0xe2
      [   43.509792] Code: 4c 8b 5d c0 74 0a 48 8d 50 01 49 89 14 24 eb 17 31 c0 48 83 c9 ff $
      [   43.509792] RIP  [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300
      [   43.509792]  RSP <ffff880fa3413d08>
      [   43.509792] CR2: ffff880fa3a97fff
      [   43.509792] ---[ end trace 22469cd81e93af51 ]---
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Reviewed-by: Sage Weil <sage@inktan.com>
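The out-of-bounds read described above comes from stepping back one character from the first '/' without checking that anything precedes it. A minimal user-space sketch of a guarded lookup follows; the function name is hypothetical and this is not the actual parse_mount_options() code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the ':' that separates the host list from the path,
 * or NULL if there is no such separator. Never reads before dev_name.
 */
static const char *find_host_separator(const char *dev_name)
{
    const char *end = strchr(dev_name, '/');

    if (!end)
        end = dev_name + strlen(dev_name);
    if (end == dev_name)    /* leading slash or empty string: nothing to back up to */
        return NULL;
    end--;                  /* back up to the expected ':' */
    return *end == ':' ? end : NULL;
}
```

With a device name such as "/foo", the check on `end == dev_name` rejects the input before the decrement, so the byte before the buffer is never touched.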
  7. 02 May, 2013 1 commit
  8. 04 Mar, 2013 1 commit
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman committed
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems, trivially
      making things safer at no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives, allowing simple, safe,
      well-understood workarounds for known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases, the most significant being autofs, which lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep,
      which means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
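The naming scheme in this commit amounts to prefixing the requested filesystem type with "fs-" before asking for a module. A small user-space sketch of that string construction follows; the helper name and its buffer-based interface are assumptions for illustration, and the kernel itself passes a format string to request_module() rather than using a helper like this.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format the module alias that would be requested for a filesystem type.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int fs_module_alias(char *buf, size_t size, const char *fstype)
{
    int n = snprintf(buf, size, "fs-%s", fstype);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For the autofs case mentioned above, the type "autofs" would yield the alias "fs-autofs", which the module autofs4 can then declare as one of its aliases.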
  9. 23 Feb, 2013 1 commit
  10. 13 Dec, 2012 2 commits
  11. 03 Oct, 2012 1 commit
  12. 02 Oct, 2012 1 commit
    • A
      ceph: let path portion of mount "device" be optional · c98f533c
      Alex Elder committed
      A recent change to /sbin/mountall causes any trailing '/' character
      in the "device" (or fs_spec) field in /etc/fstab to be stripped.  As
      a result, an entry for a ceph mount that intends to mount the root
      of the name space ends up with no path portion, and the ceph mount
      option processing code rejects this.
      
      That is, an entry in /etc/fstab like:
          cephserver:port:/ /mnt ceph defaults 0 0
      provides to the ceph code just "cephserver:port:" as the "device,"
      and that gets rejected.
      
      Although this is a bug in /sbin/mountall, we can have the ceph mount
      code support an empty/nonexistent path, interpreting it to mean the
      root of the name space.
      
      RFC 5952 offers recommendations for how to express IPv6 addresses,
      and recommends the usage found in RFC 3986 (which specifies the
      format for URI's) for representing both IPv4 and IPv6 addresses that
      include port numbers.  (See in particular the definition of
      "authority" found in the Appendix of RFC 3986.)
      
      According to those standards, no host specification will ever
      contain a '/' character.  As a result, it is sufficient to scan a
      provided "device" from an /etc/fstab entry for the first '/'
      character, and if it's found, treat that as the beginning of the
      path.  If no '/' character is present, we can treat the entire
      string as the monitor host specification(s), and assume the path
      to be the root of the name space.  We'll still require a ':' to
      separate the host portion from the (possibly empty) path portion.
      
      This means that we can more formally define how ceph will interpret
      the "device" it's provided when processing a mount request:
      
          "device" will look like:
              <server_spec>[,<server_spec>...]:[<path>]
          where
              <server_spec> is <ip>[:<port>]
              <path> is optional, but if present must begin with '/'
      
      This addresses http://tracker.newdream.net/issues/2919
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Dan Mick <dan.mick@inktank.com>
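The "device" grammar defined in this commit can be illustrated with a short user-space parser. This is a sketch under stated assumptions, not the ceph mount code: the struct and function names are hypothetical, and the individual <server_spec> entries are not validated.

```c
#include <stddef.h>
#include <string.h>

struct device_parts {
    size_t hosts_len;   /* length of the <server_spec> list, excluding the ':' */
    const char *path;   /* points into the input, or at "/" when omitted */
};

/* Returns 0 on success, -1 if the ':' before the path is missing. */
static int split_device(const char *dev, struct device_parts *out)
{
    const char *slash = strchr(dev, '/');             /* first '/' starts the path */
    const char *end = slash ? slash : dev + strlen(dev);

    if (end == dev || end[-1] != ':')                 /* ':' separator is required */
        return -1;
    out->hosts_len = (size_t)(end - 1 - dev);
    out->path = slash ? slash : "/";                  /* empty path means the root */
    return 0;
}
```

Under this sketch, "cephserver:port:" and "cephserver:port:/" both yield the same host list and the root path, which is the behavior the commit message asks for.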
  13. 31 Jul, 2012 1 commit
  14. 14 Jul, 2012 1 commit
  15. 22 Mar, 2012 3 commits
  16. 21 Mar, 2012 1 commit
  17. 13 Jan, 2012 1 commit
  18. 12 Jan, 2012 1 commit
    • A
      ceph: always initialize the dentry in open_root_dentry() · d46cfba5
      Alex Elder committed
      When open_root_dentry() gets a dentry via d_obtain_alias() it does
      not get initialized.  If the dentry obtained came from the cache,
      this is OK.  But if not, the result is an improperly initialized
      dentry.
      
      To fix this, call ceph_init_dentry() regardless of which path
      produced the dentry.  That function returns immediately for a dentry
      that is already initialized, so it is safe to use either way.
      
      (Credit to Sage, who suggested this fix.)
      Signed-off-by: Alex Elder <aelder@sgi.com>
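The fix relies on the initializer being safe to call more than once. A toy user-space sketch of that idempotent pattern follows; the struct, its field, and the function are hypothetical stand-ins and do not reproduce ceph_init_dentry().

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dentry with filesystem-private data. */
struct toy_dentry {
    void *fsdata;       /* NULL until initialized */
};

/* Returns 0 on success or if already initialized, -1 on allocation failure. */
static int toy_init_dentry(struct toy_dentry *d)
{
    if (d->fsdata)                  /* already initialized: nothing to do */
        return 0;
    d->fsdata = calloc(1, 16);      /* placeholder private data */
    return d->fsdata ? 0 : -1;
}
```

Because the second call is a no-op, a caller does not need to know whether the dentry came from the cache or was freshly allocated.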
  19. 10 Jan, 2012 1 commit
  20. 07 Jan, 2012 1 commit
  21. 03 Dec, 2011 1 commit
  22. 12 Nov, 2011 1 commit
    • S
      ceph: initialize root dentry · 774ac21d
      Sage Weil committed
      Set up d_fsdata on the root dentry.  This fixes a NULL pointer dereference
      in ceph_d_prune on umount.  It also means we can eventually strip out all
      of the conditional checks on d_fsdata because it is now set unconditionally
      (prior to setting up the d_ops).
      
      Fix the ceph_d_prune debug print while we're here.
      Signed-off-by: Sage Weil <sage@newdream.net>
  23. 06 Nov, 2011 1 commit
  24. 26 Oct, 2011 3 commits
  25. 23 Aug, 2011 1 commit
  26. 27 Jul, 2011 2 commits
  27. 30 Mar, 2011 1 commit
  28. 22 Mar, 2011 2 commits
  29. 20 Jan, 2011 1 commit
  30. 13 Jan, 2011 2 commits
    • T
      ceph: fsc->*_wq's aren't used in memory reclaim path · 01e6acc4
      Tejun Heo committed
      fsc->*_wq's aren't depended upon during memory reclaim.  Convert to
      alloc_workqueue() w/o WQ_MEM_RECLAIM.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Sage Weil <sage@newdream.net>
      Cc: ceph-devel@vger.kernel.org
      Signed-off-by: Sage Weil <sage@newdream.net>
    • S
      ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS · 14303d20
      Sage Weil committed
      This implements the DIRLAYOUTHASH protocol feature, which passes the dir
      layout over the wire from the MDS.  This gives the client knowledge
      of the correct hash function to use for mapping dentries among dir
      fragments.
      
      Note that if this feature is _not_ present on the client but is on the
      MDS, the client may misdirect requests.  A misdirected request has to
      be forwarded, which degrades performance.  It may also result in
      inaccurate NFS filehandle generation, which will prevent fh resolution
      when the inode is not present in the client cache and the parent
      directories have been fragmented.
      Signed-off-by: Sage Weil <sage@newdream.net>
  31. 29 Oct, 2010 1 commit