- 22 Jan, 2016: 5 commits
-
Committed by Yan, Zheng
A cap message from the MDS can update i_size, and in that case we don't hold i_mutex. So it's unsafe to access inode->i_size directly, even while holding i_mutex. Signed-off-by: Yan, Zheng <zyan@redhat.com>
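For illustration only, a userspace analogue of the rule above, assuming C11 atomics as a stand-in for the kernel's i_size_read()/i_size_write() accessors (in the kernel those accessors exist because a plain 64-bit load can tear on 32-bit SMP builds). The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical inode whose size may be updated by a cap-message handler
 * that does not hold the inode mutex. */
struct inode_sketch {
	_Atomic int64_t i_size;
};

/* Always go through an accessor; a relaxed atomic load cannot tear. */
static int64_t i_size_read_sketch(struct inode_sketch *inode)
{
	return atomic_load_explicit(&inode->i_size, memory_order_relaxed);
}

static void i_size_write_sketch(struct inode_sketch *inode, int64_t size)
{
	atomic_store_explicit(&inode->i_size, size, memory_order_relaxed);
}
```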
-
Committed by Yan, Zheng
When receiving -EOLDSNAP from the OSD, we need to re-send the corresponding write request. Due to locking issues, we can't send a new request inside another OSD request's completion callback, so we use a worker to re-send the request for AIO writes. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
The basic idea of AIO support is simple: just call kiocb::ki_complete() in the OSD request's completion callback. But there are several special cases. When an IO spans multiple objects, we need to wait until all OSD requests are complete, then call kiocb::ki_complete(). Error handling in this case is tricky too. For simplicity, AIO that both spans multiple objects and extends i_size is not allowed. Another special case is checking EOF for reads (another client can write to the file and extend i_size concurrently). For simplicity, the direct-IO/AIO code path does not do the check itself; it falls back to a normal sync read instead. Signed-off-by: Yan, Zheng <zyan@redhat.com>
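A minimal sketch of the multi-object completion rule described above, under stated assumptions: the AIO is split into a known number of OSD requests, each completion callback reports a byte count or a negative errno, the first error wins, and only the last callback to arrive runs the completion step (a flag standing in for kiocb::ki_complete()). All names are hypothetical; this is not the ceph implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical AIO that was split into several OSD requests. */
struct aio_sketch {
	atomic_int pending;   /* OSD requests still in flight */
	atomic_long result;   /* total bytes so far, or the first error */
	bool completed;       /* set once the "ki_complete()" step has run */
	long final_result;
};

static void aio_sketch_init(struct aio_sketch *aio, int nr_requests)
{
	atomic_init(&aio->pending, nr_requests);
	atomic_init(&aio->result, 0);
	aio->completed = false;
	aio->final_result = 0;
}

/* Called from each OSD request's completion callback. */
static void aio_sketch_osd_done(struct aio_sketch *aio, long rc)
{
	long cur = atomic_load(&aio->result);

	/* Once result is negative (an error was recorded), leave it alone.
	 * Otherwise store the error, or add this request's byte count. */
	while (cur >= 0 &&
	       !atomic_compare_exchange_weak(&aio->result, &cur,
					     rc < 0 ? rc : cur + rc))
		;

	/* The last completion to arrive finishes the whole AIO. */
	if (atomic_fetch_sub(&aio->pending, 1) == 1) {
		aio->final_result = atomic_load(&aio->result);
		aio->completed = true;
	}
}
```

Recording only the first error keeps the reported result stable regardless of the order in which the remaining OSD replies arrive.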
-
Committed by Minfei Huang
The variable pagep still gets an invalid page pointer even when ceph_update_writeable_page() fails. To fix this issue, assign the page to pagep only when ceph_update_writeable_page() does not fail. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
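The ordering rule can be sketched in userspace as follows. `update_writeable_page_sketch()` is a hypothetical stand-in for ceph_update_writeable_page() that, as an assumption mirroring the behaviour noted in this log, unlocks the page itself on error; the caller publishes the page through `pagep` only after it succeeds and never unlocks it a second time.

```c
#include <errno.h>

struct page_sketch { int locked; };

/* Hypothetical: returns 0 on success; on error it unlocks the page itself. */
static int update_writeable_page_sketch(struct page_sketch *page, int fail)
{
	if (fail) {
		page->locked = 0;
		return -EIO;
	}
	return 0;
}

static int write_begin_sketch(struct page_sketch *page, int fail,
			      struct page_sketch **pagep)
{
	int r;

	page->locked = 1;
	r = update_writeable_page_sketch(page, fail);
	if (r < 0)
		return r;      /* *pagep is left untouched; page already unlocked */
	*pagep = page;         /* publish the page only after success */
	return 0;
}
```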
-
Committed by Yan, Zheng
ceph_update_writeable_page() unlocks the page on errors, so page_mkwrite() should not unlock the page again. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 07 Nov, 2015: 1 commit
-
Committed by Michal Hocko
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
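A userspace sketch of what such a helper does. To my knowledge the mainline helper is `mapping_gfp_constraint()`, which ANDs `mapping_gfp_mask(mapping)` with the caller's mask; the flag values and names below are made up for illustration and do not match the real GFP bits.

```c
typedef unsigned int gfp_sketch_t;

/* Illustrative flag bits only; the real GFP values differ. */
#define SK_GFP_IO     0x1u
#define SK_GFP_FS     0x2u
#define SK_GFP_WAIT   0x4u
#define SK_GFP_KERNEL (SK_GFP_IO | SK_GFP_FS | SK_GFP_WAIT)

struct mapping_sketch { gfp_sketch_t gfp_mask; };

/* Restrict a generic mask by what the mapping allows, in one named place
 * instead of open-coding "mapping_gfp_mask(m) & gfp" at every call site. */
static gfp_sketch_t mapping_gfp_constraint_sketch(const struct mapping_sketch *m,
						  gfp_sketch_t gfp)
{
	return m->gfp_mask & gfp;
}
```

For example, a mapping that forbids filesystem recursion (no `SK_GFP_FS`) turns a request for `SK_GFP_KERNEL` into one without that bit.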
-
- 03 Nov, 2015: 7 commits
-
Committed by Ilya Dryomov
We can use msg->con instead: at the point where we sign an outgoing message or check the signature on an incoming one, msg->con is always set. We wouldn't know how to sign a message without an associated session (i.e. msg->con == NULL), and being able to sign a message using an explicitly provided authorizer is of no use. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
If we get an unsafe reply for a request that created/modified an inode, add the unsafe request to a list in the newly created/modified inode, so we can make fsync() wait for these unsafe requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Previously we added the request to i_unsafe_dirops when registering the request, so ceph_fsync() also waited for incomplete requests. This is unnecessary: ceph_fsync() only needs to wait for unsafe requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
ceph_check_caps() invalidates the page cache when the inode is not used by any open file. This behaviour is not friendly to workloads that repeatedly read files. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Zhu, Caifeng
Both ceph_sync_direct_write and ceph_sync_read iterate over iovec elements one by one, sending one OSD request for each iovec. This is sub-optimal. We can combine several iovecs into one page vector and send a single OSD request for the whole page vector. Signed-off-by: Zhu, Caifeng <zhucaifeng@unissoft-nj.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
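A sketch of the bookkeeping involved, assuming 4 KiB pages and that the data from all iovecs is packed back to back into one page vector that starts at the file offset's position within its first page. The helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/uio.h>

#define SK_PAGE_SIZE 4096UL

/* Total bytes described by an iovec array: one OSD request covers all of it. */
static size_t iov_total_len(const struct iovec *iov, int nr)
{
	size_t len = 0;

	for (int i = 0; i < nr; i++)
		len += iov[i].iov_len;
	return len;
}

/* Pages needed to hold `len` bytes starting at file offset `pos` when the
 * data is packed into a single page vector. */
static size_t pages_spanned(unsigned long long pos, size_t len)
{
	size_t first = (size_t)(pos % SK_PAGE_SIZE);

	return (first + len + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
}
```

With two iovecs of 100 and 5000 bytes, one request covers 5100 bytes: two pages at a page-aligned offset, three if it starts at offset 4000.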
-
Committed by Arnd Bergmann
create_request_message() computes the maximum length of a message, but uses the wrong type for the time stamp: sizeof(struct timespec) may be 8 or 16 depending on the architecture, while sizeof(struct ceph_timespec) is always 8, and that is what gets put into the message. Found while auditing the uses of timespec for y2038 problems. Fixes: b8e69066 ("ceph: include time stamp in every MDS request") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yan, Zheng <zyan@redhat.com>
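A small sketch of why the wire size, not the host size, belongs in the length computation. `struct ceph_timespec_sketch` mirrors the two 32-bit fields of ceph_timespec (the kernel declares them `__le32` and packs the struct); the encoder below is a hypothetical illustration that always writes 8 little-endian bytes, whatever `sizeof(struct timespec)` is on the host.

```c
#include <stdint.h>
#include <time.h>

/* Two 32-bit fields: 8 bytes on the wire on every architecture. */
struct ceph_timespec_sketch {
	uint32_t tv_sec;
	uint32_t tv_nsec;
};

_Static_assert(sizeof(struct ceph_timespec_sketch) == 8,
	       "wire time stamp is 8 bytes");

/* Hypothetical encoder: host timespec (8 or 16 bytes) -> 8 wire bytes. */
static void encode_ts_sketch(uint8_t out[8], const struct timespec *ts)
{
	uint32_t sec = (uint32_t)ts->tv_sec;
	uint32_t nsec = (uint32_t)ts->tv_nsec;

	for (int i = 0; i < 4; i++) {
		out[i] = (uint8_t)(sec >> (8 * i));
		out[4 + i] = (uint8_t)(nsec >> (8 * i));
	}
}
```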
-
Committed by Geliang Tang
Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 23 Oct, 2015: 1 commit
-
Committed by Benjamin Coddington
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct locks API function, use the check within locks_lock_inode_wait(). This allows for some later cleanup. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
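The idea of moving the type check into one dispatcher can be sketched like this. The flag values and the per-type helpers are hypothetical stand-ins; where this sketch returns -EINVAL for a lock that is neither kind, the mainline helper, as I recall, treats that case as a bug instead.

```c
#include <errno.h>

#define SK_FL_POSIX 0x1u
#define SK_FL_FLOCK 0x2u

struct file_lock_sketch { unsigned int fl_flags; };

/* Hypothetical per-type helpers; they just report which path ran. */
static int posix_lock_sketch(struct file_lock_sketch *fl) { (void)fl; return 1; }
static int flock_lock_sketch(struct file_lock_sketch *fl) { (void)fl; return 2; }

/* Callers no longer test FL_POSIX/FL_FLOCK themselves; the dispatcher does. */
static int lock_inode_wait_sketch(struct file_lock_sketch *fl)
{
	switch (fl->fl_flags & (SK_FL_POSIX | SK_FL_FLOCK)) {
	case SK_FL_POSIX:
		return posix_lock_sketch(fl);
	case SK_FL_FLOCK:
		return flock_lock_sketch(fl);
	default:
		return -EINVAL; /* neither (or both) lock types set */
	}
}
```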
-
- 11 Sep, 2015: 1 commit
-
Committed by Kirill A. Shutemov
With two exceptions (drm/qxl and drm/radeon), all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 Sep, 2015: 9 commits
-
Committed by Yan, Zheng
When readahead encounters file holes, the OSD reply returns -ENOENT and finish_read() skips adding pages to the page cache, so readahead does not work for file holes. The fix is to add zero pages to the page cache when -ENOENT is returned. Signed-off-by: Yan, Zheng <zyan@redhat.com>
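A minimal sketch of the fix, assuming a 4 KiB page and a hypothetical completion handler that receives either the byte count read from the OSD or a negative errno. -ENOENT is treated as "this range is a hole" and the page is zero-filled so it can still go into the page cache. Zeroing the tail of a short read is an extra assumption of this sketch, not something the commit message states.

```c
#include <errno.h>
#include <string.h>

#define SK_PAGE_SIZE 4096

/* Returns SK_PAGE_SIZE when the page is ready for the page cache,
 * or a negative errno for a real failure (page left untouched). */
static int finish_read_sketch(char *page, int rc)
{
	if (rc == -ENOENT) {
		/* The object does not exist: the range is a file hole. */
		memset(page, 0, SK_PAGE_SIZE);
		return SK_PAGE_SIZE;
	}
	if (rc < 0)
		return rc;
	/* Short read: zero the tail so no stale bytes are exposed. */
	if (rc < SK_PAGE_SIZE)
		memset(page + rc, 0, SK_PAGE_SIZE - rc);
	return SK_PAGE_SIZE;
}
```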
-
Committed by Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Jianpeng Ma
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Jianpeng Ma
The parent inode is only needed when creating a new inode. For ceph_open, the target inode already exists. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Jianpeng Ma
err != 0 is already handled, so skip this. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Brad Hubbard
Signed-off-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
During MDS failovers, an MClientSnap message may cause the kclient to move some inodes from the root directory's snaprealm to mdsdir's snaprealm and queue snapshots for these inodes. For an FS that has never created any snapshots, both the root directory's snaprealm and mdsdir's snaprealm share the same snapshot context (both are ceph_empty_snapc). This confuses ceph_put_wrbuffer_cap_refs(), making it unable to distinguish snapshot buffers from head buffers. The fix is to not use ceph_empty_snapc as a snaprealm's cached context. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
After a forced umount, ceph_writepages_start() skips flushing dirty pages. To make sure the inode's reference count gets dropped to zero, we need to invalidate dirty pages. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
This patch makes try_get_cap_refs() and __do_request() check whether the file system was force-umounted, and return -EIO if it was. This patch also adds a helper function to drop dirty caps and wake up blocked operations. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
- 05 Sep, 2015: 1 commit
-
Committed by Kees Cook
Many file systems that implement the show_options hook fail to correctly escape their output, which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky:

    $ BASE="ovl"
    $ MNT="$BASE/mnt"
    $ LOW="$BASE/lower"
    $ UP="$BASE/upper"
    $ WORK="$BASE/work/ 0 0
    none /proc fuse.pwn user_id=1000"
    $ mkdir -p "$LOW" "$UP" "$WORK"
    $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
    $ cat /proc/mounts
    none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
    none /proc fuse.pwn user_id=1000 0 0
    $ fusermount -u /proc
    $ cat /proc/mounts
    cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
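A userspace sketch of the escaping idea, writing into a plain buffer instead of a seq_file. To my recollection, seq_show_option() emits ",name[=value]" and escapes `,= \t\n\\` in the name and `, \t\n\\` in the value as a backslash plus three octal digits; the function names below are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Append `s` to the NUL-terminated `out`, escaping every character listed
 * in `esc` as a backslash followed by three octal digits. */
static void escape_sketch(char *out, size_t cap, const char *s, const char *esc)
{
	size_t n = strlen(out);

	for (; *s && n + 5 < cap; s++) {
		if (strchr(esc, *s))
			n += (size_t)snprintf(out + n, cap - n, "\\%03o",
					      (unsigned int)(unsigned char)*s);
		else
			out[n++] = *s;
	}
	out[n] = '\0';
}

/* Emit ",name[=value]" in the style of seq_show_option(). */
static void show_option_sketch(char *out, size_t cap,
			       const char *name, const char *value)
{
	size_t n = strlen(out);

	if (n + 2 >= cap)
		return;
	out[n] = ',';
	out[n + 1] = '\0';
	escape_sketch(out, cap, name, ",= \t\n\\");
	if (value) {
		n = strlen(out);
		if (n + 2 >= cap)
			return;
		out[n] = '=';
		out[n + 1] = '\0';
		escape_sketch(out, cap, value, ", \t\n\\");
	}
}
```

With escaping, the spoofed workdir value from the example stays on one line of /proc/mounts: the space becomes `\040` and the newline `\012`.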
-
- 31 Jul, 2015: 2 commits
-
Committed by Yan, Zheng
commit e548e9b9 makes the kclient re-send a cap flush only once during MDS failover. If the kclient sends a cap flush after the MDS enters the reconnect stage but before the MDS recovers, the kclient will skip re-sending the same cap flush when the MDS recovers. This causes problems for newly created inodes: the MDS handles cap flushes before replaying unsafe requests, so it's possible that the MDS finds the corresponding inode missing when handling the cap flush. The fix is to revert to the old behaviour: always re-send when the MDS recovers. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
POSIX locks should be in the ctx->flc_posix list. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 25 Jun, 2015: 13 commits
-
Committed by Yan, Zheng
Before a page gets locked, someone else can write data to the page and increase i_size. So we should re-check i_size after the pages are locked. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Previously our dcache readdir code relied on child dentries in the directory dentry's d_subdir list being sorted by dentry offset in descending order. When adding dentries to the dcache, if a dentry already existed, our readdir code moved it to the head of the directory dentry's d_subdir list. This design relies on dcache internals. Al Viro suggested using ncpfs's approach: keeping an array of pointers to dentries in the page cache of the directory inode. The validity of those pointers is represented by the directory inode's complete and ordered flags. When a dentry gets pruned, we clear the directory inode's complete flag in the d_prune() callback. Before moving a dentry to another directory, we clear the ordered flag for both the old and the new directory. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
GFP_NOFS memory allocation is required in the page writeback path, but there is no need to use GFP_NOFS in the syscall path and the readpage path. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
If flushing caps were revoked, we should re-send the cap flush in the client reconnect stage. This guarantees that the MDS processes the cap flush message before issuing the flushing caps to another client. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Using this information, the MDS can trim its completed caps flush list (which is used to detect duplicated cap flushes). Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
So we know the TID of the oldest pending caps flush. A later patch will send this information to the MDS, so that the MDS can trim its completed caps flush list. Tracking pending caps flushes globally also simplifies the syncfs code. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Previously we did not track the accurate TID for flushing caps. When the MDS failed over, we had no choice but to re-send all flushing caps with a new TID. This can cause problems because the MDS may have already flushed some caps and issued the same caps to another client. The re-sent cap flush has a new TID, which makes the MDS unable to detect whether it has already processed the cap flush. This patch adds code to track pending caps flushes accurately. When re-sending a cap flush is needed, we use its original flush TID. Signed-off-by: Yan, Zheng <zyan@redhat.com>
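A minimal sketch of "track pending flushes and re-send with the original TID", assuming a small fixed-size table kept in send order and TIDs starting at 1 (0 means "none"). Because entries stay in ascending TID order, the oldest pending TID is simply the first entry. All names are hypothetical; the real client keeps these in kernel lists per inode and per session.

```c
#include <stdint.h>

#define SK_MAX_FLUSHES 16

/* Hypothetical record of one in-flight cap flush. */
struct cap_flush_sketch {
	uint64_t tid;       /* assigned once, when the flush is first sent */
	unsigned int caps;  /* bitmask of caps being flushed */
};

struct flush_tracker_sketch {
	uint64_t last_tid;
	struct cap_flush_sketch pending[SK_MAX_FLUSHES];
	int nr_pending;
};

/* Send a new flush: allocate a fresh TID and remember it (0 if full). */
static uint64_t flush_send(struct flush_tracker_sketch *t, unsigned int caps)
{
	if (t->nr_pending == SK_MAX_FLUSHES)
		return 0;
	struct cap_flush_sketch *cf = &t->pending[t->nr_pending++];
	cf->tid = ++t->last_tid;
	cf->caps = caps;
	return cf->tid;
}

/* The MDS acked `tid`: drop that entry, keeping the rest in order. */
static void flush_ack(struct flush_tracker_sketch *t, uint64_t tid)
{
	int j = 0;

	for (int i = 0; i < t->nr_pending; i++)
		if (t->pending[i].tid != tid)
			t->pending[j++] = t->pending[i];
	t->nr_pending = j;
}

/* After MDS failover: re-send every pending flush with its ORIGINAL tid,
 * so the MDS can recognize a flush it has already processed. */
static int flush_resend(const struct flush_tracker_sketch *t, uint64_t *tids_out)
{
	for (int i = 0; i < t->nr_pending; i++)
		tids_out[i] = t->pending[i].tid;
	return t->nr_pending;
}

/* Oldest pending TID, the value the MDS can use to trim its list. */
static uint64_t flush_oldest_tid(const struct flush_tracker_sketch *t)
{
	return t->nr_pending ? t->pending[0].tid : 0;
}
```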
-
Committed by Yan, Zheng
fsync() on a directory should flush dirty caps and wait for any uncommitted directory operations to commit, but ceph_dir_fsync() only waits for uncommitted directory operations. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
The current ceph_fsync() only flushes dirty caps and waits for them to be flushed. It doesn't wait for caps that are already being flushed. This patch makes ceph_fsync() wait for pending flushing caps too. Besides, this patch also makes caps_are_flushed() properly handle TID wrapping. Signed-off-by: Yan, Zheng <zyan@redhat.com>
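A sketch of one common way to compare TIDs so the check survives counter wraparound (serial-number arithmetic on the signed difference). This is an assumption about the technique, not a copy of caps_are_flushed(); the check below treats "flushed" as "no pending flush has a TID at or before `flush_tid`". The cast from `uint64_t` to `int64_t` relies on two's-complement conversion, which GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if tid `a` was issued before tid `b`, even across a wrap. */
static bool tid_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Hypothetical check: the caller's flush is complete once every pending
 * flush is strictly newer than `flush_tid`. */
static bool caps_flushed_sketch(const uint64_t *pending, int nr,
				uint64_t flush_tid)
{
	for (int i = 0; i < nr; i++)
		if (!tid_before(flush_tid, pending[i]))
			return false;  /* something at/before flush_tid remains */
	return true;
}
```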
-
Committed by Yan, Zheng
When copying files to cephfs, file data may stay in the page cache after the corresponding file is closed. Cached data uses the Fc capability. If we include the Fc capability in cap_wanted, the MDS will treat files with cached data as open files and journal them in an EOpen event when trimming a log segment. Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
-
Committed by Ilya Dryomov
No need to bifurcate the wait now that we've got ceph_timeout_jiffies(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Yan, Zheng <zyan@redhat.com>
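Why one helper removes the branch: as far as I know, ceph_timeout_jiffies() maps a zero ("no timeout") mount option to MAX_SCHEDULE_TIMEOUT (which is LONG_MAX in the kernel), so callers can always make a single timed wait. A userspace sketch with hypothetical names:

```c
#include <limits.h>

/* Sentinel meaning "wait forever", mirroring MAX_SCHEDULE_TIMEOUT. */
#define SK_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* 0 means "no timeout configured"; anything else is used as-is. */
static long timeout_jiffies_sketch(unsigned long timeout)
{
	return timeout ? (long)timeout : SK_MAX_SCHEDULE_TIMEOUT;
}
```

A caller can then pass `timeout_jiffies_sketch(opt)` straight to a single timed wait instead of choosing between a timed and an untimed variant.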
-