- 12 12月, 2009 2 次提交
-
-
由 Mathieu Desnoyers 提交于
Setting fops and private data outside of the mutex at debugfs file creation introduces a race where the files can be opened with the wrong file operations and private data. It is easy to trigger with a process waiting on file creation notification. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: stable <stable@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Stefan Richter 提交于
Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: NDavid P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 10 12月, 2009 11 次提交
-
-
由 Boaz Harrosh 提交于
This patch changes on-disk format, it is accompanied with a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a new formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB *version* field was added. In the future, exofs will refuse to mount unmatched FSCB version. To up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option before mounting. Introduced, a new object that contains a *device-table*. This object contains the default *data-map* and a linear array of devices information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the later is written to while mounted. Same partition number, same object number is used on all devices only the device varies. * define the new format, then load the device table on mount time make sure every thing is supported. * Change I/O engine to now support Mirror IO, .i.e write same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in previous patch should be mentioned here: * Special care was made so absolutlly all operation that have any chance of failing are done before any osd-request is executed. This is to minimize the need for a data consistency recovery, to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. On sync mode a caller does not supply a done method, indicating a synchronous request, the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Though also in sync mode all operations are executed in parallel. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
In anticipation for multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple device. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info *sbi, struct exofs_io_state** ios); Then calles one of: exofs_sbi_create(struct exofs_io_state *ios); exofs_sbi_remove(struct exofs_io_state *ios); exofs_sbi_write(struct exofs_io_state *ios); exofs_sbi_read(struct exofs_io_state *ios); exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state *ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
If I do a "git mv" together with a massive code change and commit in one patch, git looses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
Iner-loops printing is converted to EXOFS_DBG2 which is #defined to nothing. It is now almost bareable to just leave debug-on. Every operation is printed once, with most relevant info (I hope). Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
debug prints should be somewhat useful without actually reading the source code Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Theodore Ts'o 提交于
Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 08 12月, 2009 1 次提交
-
-
由 Roel Kluin 提交于
Return the PTR_ERR of the correct pointer. This fixes the debugging code. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 07 12月, 2009 1 次提交
-
-
由 Akira Fujita 提交于
This patch fixes three problems in the handling of the EXT4_IOC_MOVE_EXT ioctl: 1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for original and donor files, but they allow the illegal write access to donor file, since donor file is overwritten by original file data. To fix this problem, change access mode checks of original (r->r/w) and donor (r->w) files. 2. Disallow the use of donor files that have a setuid or setgid bits. 3. Call mnt_want_write() and mnt_drop_write() before and after ext4_move_extents() calling to get write access to a mount. Signed-off-by: NAkira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 12月, 2009 7 次提交
-
-
由 Jan Kara 提交于
We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid This means that we may end-up with transferring all quotas. Add we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in case of QUOTA_INIT_BLOCKS. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Currently all quota block reservation macros contains hard-coded "2" aka MAXQUOTAS value. This is no good because in some places it is not obvious to understand what does this digit represent. Let's introduce new macro with self descriptive name. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Acked-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
This fixes a leak of blocks in an inode prealloc list if device failures cause ext4_mb_mark_diskspace_used() to fail. Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Acked-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Josef Bacik 提交于
There is a potential race when a transaction is committing right when the file system is being umounting. This could reduce in a race because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super before the commit code calls a callback so the mballoc code can release freed blocks in the transaction, resulting in a panic trying to access the freed s_group_info. The fix is to wait for the transaction to finish committing before we shutdown the multiblock allocator. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
When ext4_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate the pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 08 12月, 2009 1 次提交
-
-
由 Theodore Ts'o 提交于
Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled, will cause ext4 to be used for either ext2 or ext3 file system mounts when ext2 or ext3 is not enabled in the configuration. This allows minimalist kernel fanatics to drop to file system drivers from their compiled kernel with out losing functionality. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 12月, 2009 1 次提交
-
-
由 Roel Kluin 提交于
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 05 12月, 2009 2 次提交
-
-
由 Adam Buchbinder 提交于
"Definition" is misspelled "defintion" in several comments; this patch fixes them. No code changes. Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Adam Buchbinder 提交于
"Journaled" is misspelled "journlaled" in an output string; this patch fixed it. No changes in functionality. Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 04 12月, 2009 4 次提交
-
-
由 Giuseppe Scrivano 提交于
Signed-off-by: NGiuseppe Scrivano <gscrivano@gnu.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 André Goddard Rosa 提交于
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Alberto Bertogli 提交于
Signed-off-by: NAlberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Uwe Kleine-König 提交于
This patch was generated by git grep -E -i -l 'offest' | xargs -r perl -p -i -e 's/offest/offset/' Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 03 12月, 2009 10 次提交
-
-
由 Wu Fengguang 提交于
- no one is calling wb_writeback and write_cache_pages with wbc.nonblocking=1 any more - lumpy pageout will want to do nonblocking writeback without the congestion wait So remove the congestion checks as suggested by Chris. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Alex Elder <aelder@sgi.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
It will lower the flush priority for NFS, and maybe more in future. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
This is dead code because no bdi flush thread will be started for !bdi_cap_writeback_dirty bdi. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Steven Whitehouse 提交于
This patch fixes some ref counting issues. Firstly by moving the point at which we drop the ref count after a dlm lock operation has completed we ensure that we never call gfs2_glock_hold() on a lock with a zero ref count. Secondly, by using atomic_dec_and_lock() in gfs2_glock_put() we ensure that at no time will a glock with zero ref count appear on the lru_list. That means that we can remove the check for this in our shrinker (which was racy). Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Wu Fengguang 提交于
No one is calling wb_writeback and write_cache_pages with wbc.nonblocking=1 any more. And lumpy pageout will want to do nonblocking writeback without the congestion wait. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Benjamin Marzinski 提交于
When a gfs2 filesystem is grown, it needs to rebuild the rindex list to be able to use the new space. gfs2 does this when the rindex is marked not uptodate, which happens when the rindex glock is dropped. However, on a single node setup, there is never any reason to drop the rindex glock, so gfs2 never invalidates the the rindex. This patch makes gfs2 automatically drop the rindex glock after filesystem grows, so it can refresh the rindex list. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
There are two spare field in the header common to all GFS2 metadata. One is just the right size to fit a journal id in it, and this patch updates the journal code so that each time a metadata block is modified, we tag it with the journal id of the node which is performing the modification. The reason for this is that it should make it much easier to debug issues which arise if we can tell which node was the last to modify a particular metadata block. Since the field is updated before the block is written into the journal, each journal should only contain metadata which is tagged with its own journal id. The one exception to this is the journal header block, which might have a different node's id in it, if that journal was recovered by another node in the cluster. Thus each journal will contain a record of which nodes recovered it, via the journal header. The other field in the metadata header could potentially be used to hold information about what kind of operation was performed, but for the time being we just zero it on each transaction so that if we use it for that in future, we'll know that the information (where it exists) is reliable. I did consider using the other field to hold the journal sequence number, however since in GFS2's journaling we write the modified data into the journal and not the original data, this gives no information as to what action caused the modification, so I think we can probably come up with a better use for those 64 bits in the future. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
In some cases we already have the rindex lock when we enter this function. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This function only had one caller left, and that caller only called it for leaf blocks, hence one branch of the "if" was never taken. In addition the call to get_left had already verified the metadata type, so the function can be reduced to a single line of code in its caller. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Since the default is barriers on, this only displays the nobarrier option when that is active. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-