- 20 6月, 2017 8 次提交
-
-
由 Goldwyn Rodrigues 提交于
Return EAGAIN if any of the following checks fail + i_rwsem is not lockable + NODATACOW or PREALLOC is not set + Cannot nocow at the desired location + Writing beyond end of file which is not allocated Acked-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable immediately. IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin if it needs allocation either due to file extension, writing to a hole, or COW or waiting for other DIOs to finish. Return -EAGAIN if we don't have extent list in memory. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
Return EAGAIN if any of the following checks fail for direct I/O: + i_rwsem is lockable + Writing beyond end of file (will trigger allocation) + Blocks are not allocated at the write location Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
A new bio operation flag REQ_NOWAIT is introduced to identify bio's orignating from iocb with IOCB_NOWAIT. This flag indicates to return immediately if a request cannot be made instead of retrying. Stacked devices such as md (the ones with make_request_fn hooks) currently are not supported because it may block for housekeeping. For example, an md can have a part of the device suspended. For this reason, only request based devices are supported. In the future, this feature will be expanded to stacked devices by teaching them how to handle the REQ_NOWAIT flags. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps. This is used by XFS in the XFS patch. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
RWF_NOWAIT informs kernel to bail out if an AIO request will block for reasons such as file allocations, or a writeback triggered, or would block while allocating requests while performing direct I/O. RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags. FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable of returning -EAGAIN if the AIO call will block. This must be set by supporting filesystems in the ->open() call. Filesystems xfs, btrfs and ext4 would be supported in the following patches. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will carry the RWF_* flags. We cannot use aio_flags because they are not checked for validity which may break existing applications. Note, the only place RWF_HIPRI comes in effect is dio_await_one(). All the rest of the locations, aio code return -EIOCBQUEUED before the checks for RWF_HIPRI. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
Also added RWF_SUPPORTED to encompass all flags. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 6月, 2017 1 次提交
-
-
由 NeilBrown 提交于
"flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 6月, 2017 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 6月, 2017 10 次提交
-
-
由 Al Viro 提交于
As it is, short copy in write() to append-only file will fail to truncate the excessive allocated blocks. As the matter of fact, all checks in ufs_truncate_blocks() are either redundant or wrong for that caller. As for the only other caller (ufs_evict_inode()), we only need the file type checks there. Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and it really needs splitting into "new" and "extend" cases, but that's for later Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication, which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2). For a nodesize of 16kB, this overflow happens at 16k items. Usually, num_items is a small constant passed to btrfs_start_transaction(), but we also use btrfs_calc_trans_metadata_size() for metadata reservations for extent items in btrfs_delalloc_{reserve,release}_metadata(). In drop_outstanding_extents(), num_items is calculated as inode->reserved_extents - inode->outstanding_extents. The difference between these two counters is usually small, but if many delalloc extents are reserved and then the outstanding extents are merged in btrfs_merge_extent_hook(), the difference can become large enough to overflow in btrfs_calc_trans_metadata_size(). The overflow manifests itself as a leak of a multiple of 4 GB in delalloc_block_rsv and the metadata bytes_may_use counter. This in turn can cause early ENOSPC errors. Additionally, these WARN_ONs in extent-tree.c will be hit when unmounting: WARN_ON(fs_info->delalloc_block_rsv.size > 0); WARN_ON(fs_info->delalloc_block_rsv.reserved > 0); WARN_ON(space_info->bytes_pinned > 0 || space_info->bytes_reserved > 0 || space_info->bytes_may_use > 0); Fix it by casting nodesize to a u64 so that btrfs_calc_trans_metadata_size() does a full 64-bit multiplication. While we're here, do the same in btrfs_calc_trunc_metadata_size(); this can't overflow with any existing uses, but it's better to be safe here than have another hard-to-debug problem later on. Cc: stable@vger.kernel.org Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Liu Bo 提交于
Before this, we use 'filled' mode here, ie. if all range has been filled with EXTENT_DEFRAG bits, get to clear it, but if the defrag range joins the adjacent delalloc range, then we'll have EXTENT_DEFRAG bits in extent_state until releasing this inode's pages, and that prevents extent_data from being freed. This clears the bit if any was found within the ordered extent. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Su Yue 提交于
In verify_dir_item, it wants to printk name_len of dir_item but printk data_len acutally. Fix it by calling btrfs_dir_name_len instead of btrfs_dir_data_len. Signed-off-by: NSu Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 09 6月, 2017 5 次提交
-
-
由 Christoph Hellwig 提交于
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Once we move the block layer to its own status code we'll still want to propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO to take ret into account when returning the errno. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Only read bio->bi_error once in the common path. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 6月, 2017 11 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Christoph Hellwig 提交于
For some file systems we still memcpy into it, but in various places this already allows us to use the proper uuid helpers. More to come.. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM) Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Amir Goldstein 提交于
Use the common helper uuid_is_null() and remove the xfs specific helper uuid_is_nil(). The common helper does not check for the NULL pointer value as xfs helper did, but xfs code never calls the helper with a pointer that can be NULL. Conform comments and warning strings to use the term 'null uuid' instead of 'nil uuid', because this is the terminology used by lib/uuid.c and its users. It is also the terminology used in userspace by libuuid and xfsprogs. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> [hch: remove now unused uuid.[ch]] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Christoph Hellwig 提交于
Opencode uuid_getnodeuniq in the only caller, and directly decode the uuid_t representation instead of using a structure cast for it. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
These helper are used to compare and copy two uuid_t type objects. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> [hch: also provide the respective guid_ versions] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Christoph Hellwig 提交于
Our "little endian" UUID really is a Wintel GUID, so rename it and its helpers such (guid_t). The big endian UUID is the only true one, so give it the name uuid_t. The uuid_le and uuid_be names are retained for now, but will hopefully go away soon. The exception to that are the _cmp helpers that will be replaced by better primitives ASAP and thus don't get the new names. Also the _to_bin helpers are named to match the better named uuid_parse routine in userspace. Also remove the existing typedef in XFS that's now been superceeded by the generic type name. Signed-off-by: NChristoph Hellwig <hch@lst.de> [andy: also update the UUID_LE/UUID_BE macros including fallout] Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Use the generic Linux definition to implement our UUID type, this will allow using more generic infrastructure in the future. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Amir Goldstein 提交于
uuid_t definition is about to change. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
This essentially is a partial revert of commit ff548773 ("afs: Move UUID struct to linux/uuid.h") and moves struct uuid_v1 back into fs/afs as struct afs_uuid. It however keeps it as big endian structure so that we can use the normal uuid generation helpers when casting to/from struct afs_uuid. The V1 uuid intrepretation in struct form isn't really useful to the rest of the kernel, and not really compatible to it either, so move it back to AFS instead of polluting the global uuid.h. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NDavid Howells <dhowells@redhat.com>
-
由 Richard Narron 提交于
This fixes a problem with reading files larger than 2GB from a UFS-2 file system: https://bugzilla.kernel.org/show_bug.cgi?id=195721 The incorrect UFS s_maxsize limit became a problem as of commit c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") which started using s_maxbytes to avoid a page index overflow in do_generic_file_read(). That caused files to be truncated on UFS-2 file systems because the default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it. Here I simply increase the default to a common value used by other file systems. Signed-off-by: NRichard Narron <comet.berkeley@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will B <will.brokenbourgh2877@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737fSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2017 1 次提交
-
-
由 Jan Kara 提交于
nfs_initialise_sb() and nfs_clone_super() are declared as extern even though they are used only in fs/nfs/super.c. Mark them as static. Also remove explicit 'inline' directive from nfs_initialise_sb() and leave it upto compiler to decide whether inlining is worth it. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 03 6月, 2017 1 次提交
-
-
由 Ross Zwisler 提交于
We currently have two related PMD vs PTE races in the DAX code. These can both be easily triggered by having two threads reading and writing simultaneously to the same private mapping, with the key being that private mapping reads can be handled with PMDs but private mapping writes are always handled with PTEs so that we can COW. Here is the first race: CPU 0 CPU 1 (private mapping write) __handle_mm_fault() create_huge_pmd() - FALLBACK handle_pte_fault() passes check for pmd_devmap() (private mapping read) __handle_mm_fault() create_huge_pmd() dax_iomap_pmd_fault() inserts PMD dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD installed in our page tables at this spot. Here's the second race: CPU 0 CPU 1 (private mapping read) __handle_mm_fault() passes check for pmd_none() create_huge_pmd() dax_iomap_pmd_fault() inserts PMD (private mapping write) __handle_mm_fault() create_huge_pmd() - FALLBACK (private mapping read) __handle_mm_fault() passes check for pmd_none() create_huge_pmd() handle_pte_fault() dax_iomap_pte_fault() inserts PTE dax_iomap_pmd_fault() inserts PMD, but we already have a PTE at this spot. The core of the issue is that while there is isolation between faults to the same range in the DAX fault handlers via our DAX entry locking, there is no isolation between faults in the code in mm/memory.c. This means for instance that this code in __handle_mm_fault() can run: if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) { ret = create_huge_pmd(&vmf); But by the time we actually get to run the fault handler called by create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE fault has installed a normal PMD here as a parent. This is the cause of the 2nd race. The first race is similar - there is the following check in handle_pte_fault(): } else { /* See comment in pte_alloc_one_map() */ if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd)) return 0; So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we will bail and retry the fault. This is correct, but there is nothing preventing the PMD from being installed after this check but before we actually get to the DAX PTE fault handlers. In my testing these races result in the following types of errors: BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1 BUG: non-zero nr_ptes on freeing mm: 15 Fix this issue by having the DAX fault handlers verify that it is safe to continue their fault after they have taken an entry lock to block other racing faults. [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries] Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Reported-by: NPawel Lebioda <pawel.lebioda@intel.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Pawel Lebioda <pawel.lebioda@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Eryu Guan <eguan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 6月, 2017 1 次提交
-
-
由 Bart Van Assche 提交于
Since using scsi_req() is only allowed against request queues for which struct scsi_request is the first member of their private request data, refuse to submit SCSI commands against a queue for which this is not the case. References: commit 82ed4db4 ("block: split scsi_request out of struct request") Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NJ. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Omar Sandoval <osandov@fb.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 6月, 2017 1 次提交
-
-
由 Jeff Mahoney 提交于
If we have to recover relocation during mount, we'll ultimately have to evict the orphan inode. That goes through the reservation dance, where priority_reclaim_metadata_space and flush_space expect fs_info->fs_root to be valid. That's the next thing to be set up during mount, so we crash, almost always in flush_space trying to join the transaction but priority_reclaim_metadata_space is possible as well. This call path has been problematic in the past WRT whether ->fs_root is valid yet. Commit 957780eb (Btrfs: introduce ticketed enospc infrastructure) added new users that are called in the direct path instead of the async path that had already been worked around. The thing is that we don't actually need the fs_root, specifically, for anything. We either use it to determine whether the root is the chunk_root for use in choosing an allocation profile or as a root to pass btrfs_join_transaction before immediately committing it. Anything that isn't the chunk root works in the former case and any root works in the latter. A simple fix is to use a root we know will always be there: the extent_root. Cc: <stable@vger.kernel.org> # v4.8+ Fixes: 957780eb (Btrfs: introduce ticketed enospc infrastructure) Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-