Commits · 8274de77b7072d983fe4b452b981b3e520f12698 · openeuler / raspberrypi-kernel

Dec 23, 2013 · 30 commits

f2fs: add a new mount option: inline_data · 8274de77

Committed by Huajun Li on Nov 10, 2013

Add a mount option: inline_data. If the mount option is set,
data of newly created small files can be stored in their inode.
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: add flags and helpers to support inline data · 1001b347

Committed by Huajun Li on Nov 10, 2013

Add new inode flags F2FS_INLINE_DATA and FI_INLINE_DATA to indicate
whether the inode has inline data.

Inline data makes use of the inode block's data indices region to store
small files. Currently there are 923 data indices in an inode block. Since
inline xattr has made use of the last 50 indices to save its data, there
are 873 indices left which can be used for inline data. When
FI_INLINE_DATA is set, the layout of the inode block's indices region
is as follows:

+-----------------+
|                 | Reserved. reserve_new_block() will make use of
| i_addr[0]       | i_addr[0] when we need to reserve a new data block
|                 | to convert inline data into a regular one.
|-----------------|
|                 | Used by inline data. A file whose size is less than
| i_addr[1~872]   | 3488 bytes (~3.4 KB) and that doesn't reserve extra
|                 | blocks via fallocate() can be stored here.
|-----------------|
|                 |
| i_addr[873~922] | Reserved for inline xattr
|                 |
+-----------------+
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

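The size limit in the diagram follows from simple arithmetic on the index counts. A minimal sketch, assuming each i_addr[] entry is a 32-bit block address; the macro and function names below are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t block_t;              /* one i_addr[] entry: 4 bytes */

#define ADDRS_PER_INODE    923         /* data indices in an inode block */
#define INLINE_XATTR_ADDRS 50          /* last 50 indices: inline xattr */
#define RESERVED_ADDRS     1           /* i_addr[0]: kept for reserve_new_block() */

/* First and last i_addr[] index usable for inline data. */
#define INLINE_FIRST_ADDR  RESERVED_ADDRS
#define INLINE_LAST_ADDR   (ADDRS_PER_INODE - INLINE_XATTR_ADDRS - 1)

/* Bytes of file data that fit in the inline region. */
static size_t max_inline_data(void)
{
	size_t nr_addrs = INLINE_LAST_ADDR - INLINE_FIRST_ADDR + 1;

	return nr_addrs * sizeof(block_t);
}
```

With 873 free indices minus the reserved i_addr[0], 872 entries of 4 bytes each give the 3488-byte figure.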

f2fs: send REQ_META or REQ_PRIO when reading meta area · 03232305

Committed by Changman Lee on Nov 24, 2013

Let's send REQ_META or REQ_PRIO when reading the meta area, such as
NAT/SIT.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: add detailed information of bio types in the tracepoints · a709f4a2

Committed by Jaegeuk Kim on Nov 24, 2013

This patch adds more detailed bio type information to the tracepoints,
so REQ_META and REQ_PRIO can now be seen too.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: add a new function: f2fs_reserve_block() · b600965c

Committed by Huajun Li on Nov 10, 2013

Add the function f2fs_reserve_block() to easily reserve new blocks, and
use it to clean up more code.
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: avoid lock debugging overhead · 0daaad97

Committed by Jaegeuk Kim on Nov 24, 2013

If CONFIG_F2FS_CHECK_FS is unset, we don't need to add any debugging overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

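One common way to make debugging code cost nothing when a config option is off is to compile it out with the preprocessor. The sketch below shows only that general pattern; the macro name f2fs_debug_check() and the checks_run counter are hypothetical, and the commit message does not say exactly which checks it guards:

```c
#include <assert.h>
#include <stdio.h>

/* Build with -DCONFIG_F2FS_CHECK_FS to turn the checks on. */

static int checks_run;  /* how many checks were actually evaluated */

#ifdef CONFIG_F2FS_CHECK_FS
#define f2fs_debug_check(cond)						\
	do {								\
		checks_run++;						\
		if (!(cond))						\
			fprintf(stderr, "check failed: %s\n", #cond);	\
	} while (0)
#else
/* Config unset: expands to an empty statement, so the condition is
 * never evaluated and adds no run-time cost. */
#define f2fs_debug_check(cond) do { } while (0)
#endif
```

Because the disabled form never evaluates its argument, side effects inside the condition also disappear, which is why such conditions should be kept side-effect free.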

f2fs: read contiguous sit entry pages by merging for mount performance · 74de593a

Committed by Chao Yu on Nov 22, 2013

Previously we read SIT entry pages one by one, which lost the chance to
read contiguous pages together. So we now read pages as contiguously as
possible for better mount performance.

change log:
 o merge checks and use 'continue' or 'break' instead of 'goto', as Gu Zheng
   suggested.
 o add mark_page_accessed() before releasing pages to delay VM reclaim.
 o remove '*order' to simplify the function, as Jaegeuk Kim suggested.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix a bug in the block address calculation]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

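Reading contiguous pages together can be pictured as grouping block addresses into runs of consecutive blocks, each of which could then be submitted as one read. A simplified user-space sketch, assuming the addresses are already in read order; group_contiguous() and struct read_run are hypothetical names, not f2fs code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t block_t;

struct read_run {
	block_t start;  /* first block of the run */
	size_t len;     /* number of consecutive blocks */
};

/*
 * Split @blks (in read order) into runs of consecutive block addresses.
 * Each run could be read with one request instead of one per page.
 * @runs must have room for @n entries; returns the number of runs.
 */
static size_t group_contiguous(const block_t *blks, size_t n,
			       struct read_run *runs)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		if (nr > 0 &&
		    (uint64_t)blks[i] == (uint64_t)runs[nr - 1].start + runs[nr - 1].len) {
			runs[nr - 1].len++;  /* continues the current run */
			continue;
		}
		runs[nr].start = blks[i];    /* starts a new run */
		runs[nr].len = 1;
		nr++;
	}
	return nr;
}
```

For example, blocks {100, 101, 102, 200, 201, 300} become three runs, so three reads replace six.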

f2fs: adds a tracepoint for f2fs_submit_read_bio · d4d288bc

Committed by Chao Yu on Nov 24, 2013

This patch adds a tracepoint for f2fs_submit_read_bio.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_bio]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: adds a tracepoint for submit_read_page · 87b8872d

Committed by Chao Yu on Nov 20, 2013

This patch adds a tracepoint for submit_read_page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_page]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: simplify IS_DATASEG and IS_NODESEG macro · 61ae45c8

Committed by Changman Lee on Nov 21, 2013

Comparing against each segment type to tell node segments from data
segments is not efficient.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: remove unnecessary white spaces]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

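A sketch of this kind of simplification, assuming the six current-segment types are numbered with the three data types first and the three node types after them (names modeled on f2fs's CURSEG_* constants; the *_OLD macros are hypothetical stand-ins for the per-type comparisons). With that ordering, one comparison replaces three:

```c
#include <assert.h>

/* Data types first, then node types (assumed ordering). */
enum {
	CURSEG_HOT_DATA = 0,
	CURSEG_WARM_DATA,
	CURSEG_COLD_DATA,
	CURSEG_HOT_NODE,
	CURSEG_WARM_NODE,
	CURSEG_COLD_NODE,
};

/* Before: compare against every type. */
#define IS_DATASEG_OLD(t) ((t) == CURSEG_HOT_DATA || (t) == CURSEG_WARM_DATA || \
			   (t) == CURSEG_COLD_DATA)
#define IS_NODESEG_OLD(t) ((t) == CURSEG_HOT_NODE || (t) == CURSEG_WARM_NODE || \
			   (t) == CURSEG_COLD_NODE)

/* After: a single range check that relies on the enum ordering. */
#define IS_DATASEG(t) ((t) <= CURSEG_COLD_DATA)
#define IS_NODESEG(t) ((t) >= CURSEG_HOT_NODE)
```

The range form is only equivalent for values inside the enum; any larger value would be classified as a node segment.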

f2fs: merge read IOs at ra_nat_pages() · 7107e0a9

Committed by Jaegeuk Kim on Nov 21, 2013

Change log from v1:
  o add mark_page_accessed() so that the NAT pages are not reclaimed.

This patch changes the policy of submitting read bios in ra_nat_pages().

Previously, f2fs submitted small read bios under block plugging.
With this patch, f2fs itself merges the read bios first and then submits
one large bio, which can reduce bio handling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: add a new function to support for merging contiguous read · 924b720b

Committed by Chao Yu on Nov 20, 2013

For better read performance, add a new function that merges contiguous
reads, like the existing one for writes.

v1-->v2:
 o add declarations here as Gu Zheng suggested.
 o use the new structure f2fs_bio_info introduced by Jaegeuk Kim.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Acked-by: Gu Zheng <guz.fnst@cn.fujitsu.com>


f2fs: move the list_head initialization into the lock protection region · ce3b7d80

Committed by Gu Zheng on Nov 19, 2013

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: simplify write_orphan_inodes for better readable · 502c6e0b

Committed by Gu Zheng on Nov 19, 2013

Simplify write_orphan_inodes for better readability. Because we hold the
orphan_inode_mutex, it's safe to use list_for_each_entry instead of
list_for_each_safe.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: convert inc/dec_valid_node_count to inc/dec one count · ef86d709

Committed by Gu Zheng on Nov 19, 2013

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: convert dev_valid_block_count to void · da19b0dc

Committed by Gu Zheng on Nov 19, 2013

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: convert remove_inode_page to void · 58e674d6

Committed by Gu Zheng on Nov 19, 2013

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: introduce a bio array for per-page write bios · 1ff7bd3b

Committed by Jaegeuk Kim on Nov 19, 2013

f2fs has three bio types, NODE, DATA, and META, and manages some data
structures for each bio type.

The code is a little bit messy, so this patch introduces a bio array
that groups the individual data structures as follows.

struct f2fs_bio_info {
	struct bio *bio;		/* bios to merge */
	sector_t last_block_in_bio;	/* last block number */
	struct mutex io_mutex;		/* mutex for bio */
};

struct f2fs_sb_info {
	...
	struct f2fs_bio_info write_io[NR_PAGE_TYPE];	/* for write bios */
	...
};

The code changes from this new data structure are trivial.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

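Grouping the fields means every per-type operation becomes an index into one array. Below is a user-space sketch of that shape, assuming a "bio" can be reduced to a block count plus the last block number; the helpers queue_write(), submit_pending(), and flush_type() are hypothetical and only model the merge-or-submit decision:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uint64_t sector_t;

enum page_type { DATA, NODE, META, NR_PAGE_TYPE };

struct f2fs_bio_info {
	int has_bio;                /* stands in for "bio != NULL" */
	unsigned int nr_blocks;     /* blocks merged into the pending bio */
	sector_t last_block_in_bio; /* last block number */
	pthread_mutex_t io_mutex;   /* mutex for bio */
	unsigned int submitted;     /* bios sent so far (for the demo) */
};

struct sb_info {
	struct f2fs_bio_info write_io[NR_PAGE_TYPE];
};

static void init_bio_info(struct sb_info *sbi)
{
	for (int i = 0; i < NR_PAGE_TYPE; i++) {
		struct f2fs_bio_info *io = &sbi->write_io[i];

		io->has_bio = 0;
		io->nr_blocks = 0;
		io->last_block_in_bio = 0;
		io->submitted = 0;
		pthread_mutex_init(&io->io_mutex, NULL);
	}
}

/* Caller holds io->io_mutex. */
static void submit_pending(struct f2fs_bio_info *io)
{
	if (io->has_bio) {
		io->submitted++;
		io->has_bio = 0;
		io->nr_blocks = 0;
	}
}

/* Merge @blk into the pending bio of @type if it directly follows the
 * last block; otherwise submit that bio and start a new one. */
static void queue_write(struct sb_info *sbi, enum page_type type, sector_t blk)
{
	struct f2fs_bio_info *io = &sbi->write_io[type];

	pthread_mutex_lock(&io->io_mutex);
	if (io->has_bio && blk != io->last_block_in_bio + 1)
		submit_pending(io);
	io->has_bio = 1;
	io->nr_blocks++;
	io->last_block_in_bio = blk;
	pthread_mutex_unlock(&io->io_mutex);
}

static void flush_type(struct sb_info *sbi, enum page_type type)
{
	struct f2fs_bio_info *io = &sbi->write_io[type];

	pthread_mutex_lock(&io->io_mutex);
	submit_pending(io);
	pthread_mutex_unlock(&io->io_mutex);
}
```

Because each type has its own lock and state, a NODE write in between does not interrupt a run of contiguous DATA writes.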

f2fs: disable the extent cache ops on high fragmented files · c11abd1a

Committed by Jaegeuk Kim on Nov 19, 2013

f2fs manages an extent cache to look up a run of consecutive data blocks
very quickly.

However, it performs unnecessary cache operations if the file is highly
fragmented and has no valid extent cache.

In such a case, we don't need to maintain the extent cache and can simply
disable it.

Nevertheless, this patch gives the extent cache one more chance to be
enabled.

For example,
1. create a file
2. write data sequentially, which produces a large valid extent cache
3. update some data, resulting in a fragmented extent
4. if the fragmented extent is too small, drop the extent cache
5. close the file

6. open the file again
7. give another chance to make a new extent cache
8. write data sequentially again, which creates another big extent cache.
...
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

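The steps above can be modeled with a single-extent cache per file. This is a simplified sketch, not the f2fs implementation: the threshold MIN_EXTENT_LEN = 16 and all function names are assumptions, and on an overwrite it keeps whichever side of the split extent is larger:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_EXTENT_LEN 16  /* hypothetical "too small" threshold */

/* One cached mapping: file offsets [fofs, fofs + len) -> blocks [blk, ...). */
struct extent { uint32_t fofs, blk, len; };

struct file_state {
	struct extent ext;
	bool no_extent;        /* cache disabled until the next open */
};

/* Steps 6/7: reopening gives the cache another chance. */
static void on_open(struct file_state *f)
{
	f->no_extent = false;
}

static bool lookup(const struct file_state *f, uint32_t fofs, uint32_t *blk)
{
	const struct extent *e = &f->ext;

	if (f->no_extent || fofs < e->fofs || fofs - e->fofs >= e->len)
		return false;
	*blk = e->blk + (fofs - e->fofs);
	return true;
}

/* Steps 2/8: sequential writes grow the extent. */
static void append_block(struct file_state *f, uint32_t fofs, uint32_t blk)
{
	struct extent *e = &f->ext;

	if (f->no_extent)
		return;
	if (e->len == 0)
		*e = (struct extent){ fofs, blk, 1 };
	else if (fofs == e->fofs + e->len && blk == e->blk + e->len)
		e->len++;
}

/* Steps 3/4: block @fofs was rewritten elsewhere, so split the extent,
 * keep the larger piece, and drop the cache if that piece is too small. */
static void update_block(struct file_state *f, uint32_t fofs)
{
	struct extent *e = &f->ext;
	uint32_t front, back;

	if (f->no_extent || fofs < e->fofs || fofs - e->fofs >= e->len)
		return;
	front = fofs - e->fofs;
	back = e->len - front - 1;
	if (front >= back) {
		e->len = front;
	} else {
		e->blk += front + 1;
		e->fofs = fofs + 1;
		e->len = back;
	}
	if (e->len < MIN_EXTENT_LEN) {
		e->len = 0;
		f->no_extent = true;
	}
}
```

While no_extent is set, lookups and updates return immediately, which is the saving the commit describes for highly fragmented files.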

f2fs: use sbi->write_mutex for write bios · 971767ca

Committed by Jaegeuk Kim on Nov 18, 2013

This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
There is no reason to use the semaphore when f2fs submits read and write IOs.
Instead, let's use a write mutex and protect sbi->bio[] with that lock.

Change log from v1:
 o split write_mutex, as suggested by Chao Yu

Chao described,
"All DATA/NODE/META bio buffers in superblock is protected by
'sbi->write_mutex', but each bio buffer area is independent, So we
should split write_mutex to three for DATA/NODE/META."
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: clean up the do_submit_bio flow · 7d5e5109

Committed by Jaegeuk Kim on Nov 18, 2013

This patch introduces PAGE_TYPE_OF_BIO() and cleans up do_submit_bio() with it.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: use f2fs_put_page to release page for uniform style · 75c3c8bc

Committed by Chao Yu on Nov 16, 2013

We should use f2fs_put_page to release pages, for a uniform style across
the f2fs code.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: add a tracepoint for f2fs_issue_discard · 1661d07c

Committed by Jaegeuk Kim on Nov 12, 2013

This patch adds a tracepoint for f2fs_issue_discard.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: introduce f2fs_issue_discard() to clean up · 37208879

Committed by Jaegeuk Kim on Nov 12, 2013

Change log from v1:
 o fix the dropped upper 32 bits reported by Dan Carpenter

This patch adds f2fs_issue_discard() to clean up blkdev_issue_discard() flows.

Dan Carpenter reported:
"block_t is a 32 bit type and sector_t is a 64 bit type.  The upper 32
bits of the sector_t are not used because the shift will wrap."
Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

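The wrap Dan Carpenter describes is a classic C pitfall: shifting a 32-bit value is done in 32-bit arithmetic even when the result is stored in a 64-bit variable. A small sketch, assuming 4 KiB blocks and 512-byte sectors (a shift of 3); LOG_SECTORS_PER_BLOCK and the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;   /* 32-bit block address */
typedef uint64_t sector_t;  /* 64-bit sector number */

#define LOG_SECTORS_PER_BLOCK 3  /* 4096-byte blocks / 512-byte sectors */

/* Buggy: @blk is shifted as a 32-bit value, so the high bits are lost
 * before the result is widened to sector_t. */
static sector_t blk_to_sector_buggy(block_t blk)
{
	return blk << LOG_SECTORS_PER_BLOCK;
}

/* Fixed: widen to 64 bits first, then shift. */
static sector_t blk_to_sector(block_t blk)
{
	return (sector_t)blk << LOG_SECTORS_PER_BLOCK;
}
```

Small block numbers give the same result either way; the difference only shows once the shifted value needs more than 32 bits, e.g. block 0x20000000 maps to sector 0x100000000 but the buggy form yields 0.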

f2fs: add a sysfs entry to control max_discards · 7ac8c3b0

Committed by Jaegeuk Kim on Nov 12, 2013

If frequent small discards are issued to the device, performance can
degrade significantly.
So, this patch adds a sysfs entry to control the number of discards to be
issued during a checkpoint procedure.

By default, f2fs does not issue any small discards, which means max_discards
is zero.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: add key functions for small discards · b2955550

Committed by Jaegeuk Kim on Nov 12, 2013

This patch adds key functions to activate the small discard feature.

Note that this procedure is conducted only during a checkpoint.

In flush_sit_entries(), when a new dirty sit entry is flushed, f2fs calls
add_discard_addrs(), which searches for candidates to be discarded.
A candidate must be marked *invalidated* now, while the previous checkpoint
still recognizes it as *valid*.

At the end of a checkpoint procedure, f2fs issues discards based on the
discard entry list.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

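The candidate rule (invalid now, but valid at the previous checkpoint) amounts to ckpt_map & ~cur_map per block. A simplified sketch that turns those bits into (start, length) ranges and stops after a limit, in the spirit of the max_discards entry described above. It assumes LSB-first bit order and a tiny 64-block segment for readability; all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define BLKS_PER_SEG 64  /* tiny segment for the example */

typedef uint32_t block_t;

/* Hypothetical result record, shaped like a discard entry. */
struct discard_range {
	block_t blkaddr;  /* first block to discard */
	int len;          /* number of consecutive blocks */
};

static int test_bit_lsb(const uint8_t *map, int nr)
{
	return (map[nr >> 3] >> (nr & 7)) & 1;
}

/* Candidate: valid at the last checkpoint, invalid now. */
static int is_candidate(const uint8_t *ckpt, const uint8_t *cur, int nr)
{
	return test_bit_lsb(ckpt, nr) && !test_bit_lsb(cur, nr);
}

/*
 * Collect runs of candidate blocks in one segment starting at @seg_start,
 * stopping after @max_discards runs. Returns the number of runs stored.
 */
static int add_discard_ranges(block_t seg_start, const uint8_t *ckpt,
			      const uint8_t *cur, struct discard_range *out,
			      int max_discards)
{
	int n = 0, i = 0;

	while (n < max_discards) {
		while (i < BLKS_PER_SEG && !is_candidate(ckpt, cur, i))
			i++;                    /* skip non-candidates */
		if (i >= BLKS_PER_SEG)
			break;
		int start = i;
		while (i < BLKS_PER_SEG && is_candidate(ckpt, cur, i))
			i++;                    /* extend the run */
		out[n].blkaddr = seg_start + (block_t)start;
		out[n].len = i - start;
		n++;
	}
	return n;
}
```

With a limit of zero nothing is collected, matching the default of issuing no small discards.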

f2fs: add a slab cache entry for small discards · 7fd9e544

Committed by Jaegeuk Kim on Nov 15, 2013

This patch adds a slab cache entry for small discards.

Each entry consists of:

struct discard_entry {
	struct list_head list;	/* list head */
	block_t blkaddr;	/* block address to be discarded */
	int len;		/* # of consecutive blocks of the discard */
};
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


f2fs: improve searching speed of __next_free_blkoff · e81c93cf

Committed by Changman Lee on Nov 15, 2013

Finding a zero bit in the result of ORing ckpt_valid_map and cur_valid_map
is faster than finding a zero bit in each bitmap separately.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: adjust changed function name]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

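The idea is that one pass over the merged map replaces searching each bitmap in turn. A sketch of a search for the first bit that is clear in both maps, assuming LSB-first numbering within unsigned long words; next_free_blkoff() is a hypothetical name modeled on __next_free_blkoff, not its actual code:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the first bit >= @start that is clear in both @ckpt and @cur,
 * i.e. clear in (ckpt | cur), or @nbits if there is none.
 */
static unsigned long next_free_blkoff(const unsigned long *ckpt,
				      const unsigned long *cur,
				      unsigned long nbits, unsigned long start)
{
	unsigned long i = start;

	while (i < nbits) {
		unsigned long w = i / BITS_PER_LONG;
		unsigned long used = ckpt[w] | cur[w];  /* merge both maps */

		if (used == ~0UL) {                     /* word fully used */
			i = (w + 1) * BITS_PER_LONG;
			continue;
		}
		if (!((used >> (i % BITS_PER_LONG)) & 1UL))
			return i;
		i++;
	}
	return nbits;
}
```

A block is free only if neither the checkpointed nor the current map marks it, so ORing first lets fully used words be skipped in one step.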

f2fs: introduce __find_rev_next(_zero)_bit · 9a7f143a

Committed by Changman Lee on Nov 15, 2013

f2fs_set_bit numbers the bits within a byte in reverse order (bit 0 is the
MSB). For bitmaps built with it, we can use __find_rev_next_bit or
__find_rev_next_zero_bit.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: change the function names]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

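Why separate search helpers are needed: an f2fs-style set-bit puts bit 0 in the most significant bit of a byte, so a generic LSB-first search would report wrong positions. A bit-by-bit sketch with helpers modeled on f2fs_set_bit; the two search functions are simplified stand-ins for __find_rev_next_bit and __find_rev_next_zero_bit, and a real implementation would scan more than one bit at a time:

```c
#include <assert.h>
#include <stdint.h>

/* f2fs-style numbering: bit 0 is the most significant bit of byte 0. */
static void f2fs_set_bit(unsigned int nr, uint8_t *addr)
{
	addr[nr >> 3] |= (uint8_t)(1u << (7 - (nr & 7)));
}

static int f2fs_test_bit(unsigned int nr, const uint8_t *addr)
{
	return (addr[nr >> 3] >> (7 - (nr & 7))) & 1;
}

/* Next set bit at or after @offset in the same MSB-first order, or @size. */
static unsigned int find_rev_next_bit(const uint8_t *addr, unsigned int size,
				      unsigned int offset)
{
	while (offset < size && !f2fs_test_bit(offset, addr))
		offset++;
	return offset;
}

/* Next clear bit at or after @offset, or @size. */
static unsigned int find_rev_next_zero_bit(const uint8_t *addr,
					   unsigned int size,
					   unsigned int offset)
{
	while (offset < size && f2fs_test_bit(offset, addr))
		offset++;
	return offset;
}
```

For example, setting bit 3 stores 0x10 in byte 0, whereas an LSB-first convention would store 0x08.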

aio: clean up and fix aio_setup_ring page mapping · 3dc9acb6

Committed by Linus Torvalds on Dec 20, 2013

Since commit 36bc08cc ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them.  And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward.  Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7 ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.
Tested-and-acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Dec 22, 2013 · 2 commits

aio/migratepages: make aio migrate pages sane · 8e321fef

Committed by Benjamin LaHaise on Dec 21, 2013

The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>


aio: fix kioctx leak introduced by "aio: Fix a trinity splat" · 1881686f

Committed by Benjamin LaHaise on Dec 21, 2013

e34ecee2 reworked the percpu reference
counting to correct a bug trinity found.  Unfortunately, the change led
to kioctxes being leaked because there was no final reference count to
put.  Add that reference count back in to fix things.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org


Dec 21, 2013 · 1 commit

pstore: Don't allow high traffic options on fragile devices · df36ac1b

Committed by Luck, Tony on Dec 18, 2013

Some pstore backing devices use on-board flash as persistent
storage. These have a limited number of write cycles, so it
is a poor idea to use them for high-frequency operations.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Dec 17, 2013 · 7 commits

xfs: abort metadata writeback on permanent errors · ac8809f9

Committed by Dave Chinner on Dec 12, 2013

If we are doing async writeback of metadata, we can get write errors
but have nobody to report them to. At the moment, we simply attempt
to reissue the write from io completion in the hope that it's a
transient error.

When it's not a transient error, the buffer is stuck forever in
this loop, and we cannot break out of it. Eventually, unmount will
hang because the AIL cannot be emptied and everything goes downhill
from then.

To solve this problem, only retry the write IO once before aborting
it. We don't throw the buffer away because some transient errors can
last minutes (e.g.  FC path failover) or even hours (thin
provisioned devices that have run out of backing space) before they
go away. Hence we really want to keep trying until we can't try any
more.

Because the buffer was not cleaned, however, it does not get removed
from the AIL and hence the next pass across the AIL will start IO on
it again. As such, we still get the "retry forever" semantics that
we currently have, but we allow other access to the buffer in the
meantime. Meanwhile, the filesystem can continue to modify the
buffer and relog it, so the IO errors won't hang the log or the
filesystem.

Now when we are pushing the AIL, we can see all these "permanent IO
error" buffers and we can issue a warning about failures before we
retry the IO. We can also catch these buffers when unmounting and
issue a corruption warning, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

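The retry policy above can be modeled as a small state machine. This is a sketch under simplifying assumptions, not XFS code: struct meta_buf and its fields are hypothetical, and "aborting" the IO is modeled simply as not resubmitting it from completion while leaving the buffer dirty:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified buffer state (not struct xfs_buf). */
struct meta_buf {
	bool dirty;         /* not yet written: stays in the AIL */
	bool write_fail;    /* a write failed and was already retried once */
	int  ios;           /* write IOs issued so far */
};

/* IO completion for an async metadata write. */
static void write_done(struct meta_buf *bp, bool error)
{
	if (!error) {
		bp->dirty = false;          /* clean: can leave the AIL */
		bp->write_fail = false;
		return;
	}
	if (!bp->write_fail) {
		/* First failure: maybe transient, so retry once right away. */
		bp->write_fail = true;
		bp->ios++;
		return;
	}
	/* Failed again: stop retrying from completion. The buffer stays
	 * dirty, so it remains in the AIL and is not thrown away. */
}

/* AIL push: report known failures, then issue the write again. */
static bool ail_push(struct meta_buf *bp)
{
	bool warn = bp->write_fail;

	if (bp->dirty)
		bp->ios++;
	return warn;
}
```

Each AIL pass costs one write attempt instead of an endless loop in IO completion, and a later successful write clears the failure state.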

xfs: swalloc doesn't align allocations properly · 33177f05

Committed by Dave Chinner on Dec 12, 2013

When swalloc is specified as a mount option, allocations are
supposed to be aligned to the stripe width rather than the stripe
unit of the underlying filesystem. However, it does not do this.

What the implementation does is round up the allocation size to a
stripe width, hence ensuring that all allocations span a full stripe
width. It does not, however, ensure that that allocation is aligned
to a stripe width, and hence the allocations can span multiple
underlying stripes and so still see RMW cycles for things like
direct IO on MD RAID.

So, if the swalloc mount option is set, change the allocation
alignment in xfs_bmap_btalloc() to use the stripe width rather than
the stripe unit.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

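Why rounding only the length is not enough can be shown with a little arithmetic. A sketch, assuming a stripe width of 256 blocks and an allocation that starts on a 64-block stripe-unit boundary; roundup_u64() and stripes_touched() are illustrative helpers, not XFS functions:

```c
#include <assert.h>
#include <stdint.h>

/* Round @x up to a multiple of @align (@align > 0). */
static uint64_t roundup_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* How many stripe-width units the block range [start, start + len) touches. */
static uint64_t stripes_touched(uint64_t start, uint64_t len, uint64_t swidth)
{
	return (start + len - 1) / swidth - start / swidth + 1;
}
```

A 256-block allocation starting at block 64 spans two stripes, so both are written partially; aligning the start to the stripe width (block 256) makes it cover exactly one full stripe.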

xfs: remove xfsbdstrat error · 83a0adc3

Committed by Christoph Hellwig on Dec 17, 2013

The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem.  Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.

Remove xfsbdstrat and opencode the error handling in the callers.  All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


xfs: align initial file allocations correctly · 6e708bcf

Committed by Dave Chinner on Nov 22, 2013

The function xfs_bmap_isaeof() is used to indicate that an
allocation is occurring at or past the end of file, and as such
should be aligned to the underlying storage geometry if possible.

Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the
behaviour of this function for empty files - it turned off
allocation alignment for this case accidentally. Hence large initial
allocations from direct IO are not getting correctly aligned to the
underlying geometry, and that is causing write performance to drop in
alignment sensitive configurations.

Fix it by considering allocation into empty files as requiring
aligned allocation again.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit f9b395a8)


xfs: fix infinite loop by detaching the group/project hints from user dquot · 718cc6f8

Committed by Jie Liu on Nov 26, 2013

xfs_quota(8) hangs when trying to turn group/project quota off
before the user quota is off. This can be reproduced 100% of the time by:
  # mount -ouquota,gquota /dev/sda7 /xfs
  # mkdir /xfs/test
  # xfs_quota -xc 'off -g' /xfs <-- hangs up
  # echo w > /proc/sysrq-trigger
  # dmesg

  SysRq : Show Blocked State
  task                        PC stack   pid father
  xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
  [snip]
  Call Trace:
  [<ffffffff81aaa21d>] schedule+0xad/0xc0
  [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
  [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
  [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
  [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
  [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
  [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
  [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
  [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
  [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
  [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
  [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
  [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
  [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
  [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
  [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
  [<ffffffff812fba4b>] ? iput+0x5b/0x90
  [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
  [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off first and then turn off the other
kinds of quota (if enabled), since the group/project dquot refcount drops
to zero once the user quota is off. Otherwise, the refcount of those
dquots stays non-zero because the user dquots may refer to them as
hint(s). Hence, the above operation causes an infinite loop in
xfs_qm_dquot_walk() while trying to purge the dquot cache.

This problem has been around since Linux 3.4; it was introduced by:
  [ b84a3a96 xfs: remove the per-filesystem list of dquots ]

Originally, we released the group dquot pointers that the user dquots
may be carrying around as a hint via xfs_qm_detach_gdquots().
However, with the above change, that work is no longer done before
purging the group/project dquot cache.

To solve this problem, this patch introduces a special routine,
xfs_qm_dqpurge_hints(), which releases the group/project dquot
pointers that the user dquots may be carrying around as a hint, and then
proceeds to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit df8052e7)


xfs: fix assertion failure at xfs_setattr_nonsize · 5c227278

Committed by Jie Liu on Nov 26, 2013

For a CRC-enabled v5 superblock, changing a file's ownership can easily
trigger an ASSERT failure in xfs_setattr_nonsize() if both group and
project quota are enabled, e.g.:

[  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
[  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
[  305.383939] Call Trace:
[  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
[  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
[  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
[  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
[  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
[  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
[  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]

This fix adjusts the assertion to check whether the superblock supports
both quota inodes.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 5a01dd54)


xfs: fix false assertion at xfs_qm_vop_create_dqattach · 30d161c9

Committed by Jie Liu on Nov 26, 2013

After the previous fix, there is still another ASSERT failure when turning
off any type of quota while fsstress is running at the same time.

Backtrace in this case:

[   50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[   50.867924] ------------[ cut here ]------------
... <snip>
[   50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[   50.867999] invalid opcode: 0000 [#1] SMP
[   50.869407] Call Trace:
[   50.869446]  [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[   50.869512]  [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[   50.869564]  [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[   50.869615]  [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[   50.869655]  [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[   50.869689]  [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[   50.869723]  [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[   50.869757]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[   50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[   50.870003] RIP  [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[   50.870050]  RSP <ffff88002941fd60>
[   50.879251] ---[ end trace c93a2b342341c65b ]---

We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach();
however, the assertion itself is not right IMHO.  While performing quota off, we
first clear the XFS_*QUOTA_ACTIVE bit(s) in struct xfs_mount without taking
any special locks (see xfs_qm_scall_quotaoff()).  Hence there is no guarantee
that the desired quota is still active.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 37eb9706)
