Commits · 8b06bc592ebc5a31e8d0b9c2ab17c6e78dde1f86 · openeuler / raspberrypi-kernel

26 Mar, 2010 3 commits

ocfs2: Grow discontig block groups in one transaction. · 8b06bc59

Committed by Joel Becker on Mar 26, 2010

Rather than extending the transaction every time we add an extent to a
discontiguous block group, we grab enough credits to fill the extent
list up front.  This means we can free the bits in the same transaction
if we end up not getting enough space.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Set suballoc_loc on allocated metadata. · 2b6cb576

Committed by Joel Becker on Mar 26, 2010

Get the suballoc_loc from ocfs2_claim_new_inode() or
ocfs2_claim_metadata().  Store it on the appropriate field of the block
we just allocated.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Return allocated metadata blknos on the ocfs2_suballoc_result. · ba206635

Committed by Joel Becker on Mar 26, 2010

Rather than calculating the resulting block number, return it on the
ocfs2_suballoc_result structure.  This way we can calculate block
numbers for discontiguous block groups.

Cluster groups keep doing it the old way.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

06 May, 2010 1 commit

ocfs2: ocfs2_claim_*() don't need an ocfs2_super argument. · 1ed9b777

Committed by Joel Becker on May 06, 2010

They all take an ocfs2_alloc_context, which has the allocation inode.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

26 Mar, 2010 3 commits

ocfs2: Trim suballocations if they cross discontiguous regions · 13e434cf

Committed by Joel Becker on Mar 26, 2010

A discontiguous block group can find a range of free bits that straddle
more than one region of its space. Callers can't handle that, so we
trim the returned bits until they fit within one region.

Only cluster allocations ask for min_bits>1. Discontiguous block groups
are only for block allocations. So min_bits doesn't matter here.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
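
The trimming rule described in this commit can be illustrated with a small Python sketch. This is not the kernel code: it assumes each region of a discontiguous group is given as a hypothetical `(first_bit, num_bits)` pair, and the name `trim_to_region` is made up for the example.

```python
from typing import List, Tuple


def trim_to_region(regions: List[Tuple[int, int]], bit_off: int, bits: int) -> int:
    """Return how many of `bits` starting at `bit_off` fit inside one region.

    `regions` is a hypothetical list of (first_bit, num_bits) pairs that
    stands in for the extent list of a discontiguous block group.
    """
    for first_bit, num_bits in regions:
        region_end = first_bit + num_bits
        if first_bit <= bit_off < region_end:
            # Keep only the bits that stay inside the region holding bit_off.
            return min(bits, region_end - bit_off)
    raise ValueError("bit_off is outside every region")


# A free range of 5 bits starting at bit 6 straddles the first two regions,
# so it is trimmed to the part that lies in the first region.
regions = [(0, 8), (8, 4), (12, 20)]
print(trim_to_region(regions, 6, 5))
```

Because block allocations ask for a single bit at a time, trimming never drops the result below what the caller needs, which matches the commit's remark that min_bits does not matter here.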

ocfs2: ocfs2_claim_suballoc_bits() doesn't need an osb argument. · aa8f8e93

Committed by Joel Becker on Mar 26, 2010

It's contained on ac->ac_inode->i_sb anyway.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Add suballoc_loc to metadata blocks. · 9cbc0123

Committed by Joel Becker on Mar 26, 2010

We need a suballoc_loc field on any suballocated block.  Define them.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

13 Apr, 2010 3 commits

ocfs2: Pass suballocation results back via a structure. · 7d1fe093

Committed by Joel Becker on Apr 13, 2010

We're going to be adding more info to a suballocator allocation.  Rather
than growing every function in the chain, let's pass a result structure
around.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: Allocate discontiguous block groups. · 798db35f

Committed by Joel Becker on Apr 13, 2010

If we cannot get a contiguous region for a block group, allocate a
discontiguous one when the filesystem supports it.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: Define data structures for discontiguous block groups. · 4cbe4249

Committed by Joel Becker on Apr 13, 2010

Defines the OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG feature bit and modifies
struct ocfs2_group_desc for the feature.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

06 May, 2010 18 commits

ocfs2/dlm: Increase o2dlm lockres hash size · 0467ae95

Committed by Sunil Mushran on May 05, 2010

Lockres hash size of 16KB is far too small for large filesystems (where we
have hundreds of thousands of lock resources stored in the table).
This patch increases it to 128KB.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
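
To see why the table size matters, here is a rough back-of-the-envelope calculation in Python. It assumes each hash bucket is a single 8-byte list-head pointer (typical on 64-bit systems) and that lock resources spread evenly over the buckets; both are assumptions for illustration, not figures from the commit.

```python
def average_chain_length(table_bytes: int, lock_resources: int,
                         bucket_bytes: int = 8) -> float:
    """Average number of lock resources per bucket under an even spread.

    bucket_bytes=8 is an assumed bucket size (one pointer on a 64-bit
    system), not a value taken from the patch.
    """
    buckets = table_bytes // bucket_bytes
    return lock_resources / buckets


# Compare the old and new table sizes for a hypothetical 300,000 lock resources.
for size_kb in (16, 128):
    print(size_kb, "KB:", average_chain_length(size_kb * 1024, 300_000))
```

Under these assumptions an eightfold larger table shortens the average chain walked per lookup by the same factor.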

ocfs2: Make ocfs2_extend_trans() really extend. · c901fb00

Committed by Tao Ma on Apr 26, 2010

In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
blocks. But if jbd2_journal_extend() fails, it will only restart
with the new number of blocks. This tends to be awkward since
in most cases we want additional reserved blocks. It makes our code
harder to maintain since the caller can't be sure all the original
blocks will not be accessed and dirtied again. There are 15 callers
of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
h_buffer_credits before they call ocfs2_extend_trans(). This makes
ocfs2_extend_trans() really extend atop the original block count.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
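
The change in semantics can be modelled with a toy Python sketch. The `Handle` class and both function names are hypothetical stand-ins; the real code works on a jbd2 handle, and `extend_ok` here simply simulates whether jbd2_journal_extend() succeeds.

```python
class Handle:
    """Toy stand-in for a journal handle that only tracks its credits."""

    def __init__(self, credits: int) -> None:
        self.credits = credits


def extend_trans_old(handle: Handle, nblocks: int, extend_ok: bool) -> None:
    """Old behaviour as described: a failed extend restarts with nblocks only."""
    if extend_ok:
        handle.credits += nblocks
    else:
        handle.credits = nblocks  # the original credits are lost on restart


def extend_trans_new(handle: Handle, nblocks: int, extend_ok: bool) -> None:
    """New behaviour as described: the result always sits atop the original count."""
    old_credits = handle.credits
    if extend_ok:
        handle.credits += nblocks
    else:
        handle.credits = old_credits + nblocks  # restart with old + new


old, new = Handle(10), Handle(10)
extend_trans_old(old, 4, extend_ok=False)
extend_trans_new(new, 4, extend_ok=False)
print(old.credits, new.credits)
```

With the new behaviour a caller ends up with the same credit count whether the extend succeeded or a restart was needed, which is why callers no longer have to add h_buffer_credits themselves.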

ocfs2/trivial: Code cleanup for allocation reservation. · 3e4218df

Committed by Tao Ma on Apr 06, 2010

Two tiny cleanups for allocation reservation.
1. Remove some extra code in ocfs2_local_alloc_find_clear_bits.
2. Remove an unneeded variable in ocfs2_find_resv_lhs.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: make ocfs2_adjust_resv_from_alloc simple. · b065556a

Committed by Tao Ma on Apr 08, 2010

When we allocate some bits from the reservation, we always
allocate from r_start (see ocfs2_resmap_resv_bits).
So there should be no reason to check between r_start
and start. And I don't think we will change this behaviour
later by allocating from some bits after r_start.  Why not make
ocfs2_adjust_resv_from_alloc simple for now?

The only chance we have to adjust the reservation is when we haven't
reached the end. With this patch, the function is more readable.

Note:
btw, this patch also fixes an original bug in the function
which I haven't found before.
	if (end < ocfs2_resv_end(resv))
		rhs = end - ocfs2_resv_end(resv);
This code is of course buggy. ;)
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
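
The two quoted lines subtract the larger value from the smaller one. A short Python sketch shows what that does if the variables are 32-bit unsigned integers, which is an assumption here (the commit message does not show the declarations). The function names are illustrative, and `intended_rhs` is only one plausible reading of what the code meant to compute.

```python
U32_MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic (assumed width)


def buggy_rhs(end: int, resv_end: int) -> int:
    """Mirror the quoted snippet: the subtraction runs only when end < resv_end."""
    rhs = 0
    if end < resv_end:
        # end - resv_end is negative here, so an unsigned type wraps around.
        rhs = (end - resv_end) & U32_MASK
    return rhs


def intended_rhs(end: int, resv_end: int) -> int:
    """One plausible intent: the number of bits left to the right of `end`."""
    rhs = 0
    if end < resv_end:
        rhs = resv_end - end
    return rhs


print(buggy_rhs(10, 15), intended_rhs(10, 15))
```

Under that assumption, every time the branch is taken the result wraps to a value close to 2^32 instead of a small remaining length.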

ocfs2: Make nointr a default mount option · 4b37fcb7

Committed by Sunil Mushran on Apr 13, 2010

OCFS2 has never really supported intr. This patch acknowledges this reality
and makes nointr the default mount option. In a later patch, we intend to
support intr.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE · 5c80d4c9

Committed by Sunil Mushran on Apr 13, 2010

o2dlm join and leave messages are more than informational as they are
required for debugging locking issues. This patch changes them from
KERN_INFO to KERN_NOTICE.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

o2net: log socket state changes · 23fd9abd

Committed by Srinivas Eeda on Mar 31, 2010

This patch logs socket state changes that lead to socket shutdown.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: print node # when tcp fails · a5196ec5

Committed by Wengang Wang on Mar 30, 2010

Print the node number of a peer node if sending it a message failed.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Add dir_resv_level mount option · 83f92318

Committed by Mark Fasheh on Apr 05, 2010

The default behavior for directory reservations stays the same, but we add a
mount option so people can tweak the size of directory reservations
according to their workloads.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: change default reservation window sizes · b07f8f24

Committed by Mark Fasheh on Apr 05, 2010

The default reservation size of 4 (32-bit windows) is a bit too ambitious.
Scale it back to 16 bits (resv_level=2). I have been testing various sizes
on a 4-node cluster which runs a mixed workload that is heavily threaded.
With a 256MB local alloc, I get *roughly* the following levels of average file
fragmentation:

resv_level=0	70%
resv_level=1	21%
resv_level=2	23%
resv_level=3	24%
resv_level=4	60%
resv_level=5	did not test
resv_level=6	60%

resv_level=2 seemed like a good compromise between not letting windows be
too small, but not so big that heavier workloads will immediately suffer
without tuning.

This patch also changes the behavior of directory reservations - they now
track file reservations.  The previous compromise of giving directory
windows only 8 bits wound up fragmenting more at some window sizes because
file allocations had smaller unused windows to poach from.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: increase the default size of local alloc windows · 6b82021b

Committed by Mark Fasheh on Apr 05, 2010

I have observed that the current size of 8M gives us pretty poor
fragmentation on multi-threaded workloads which do lots of writes.

Generally, I can increase the size of local alloc windows and observe a
marked decrease in fragmentation, even up to and beyond window sizes of 512
megabytes. This makes sense for a couple of reasons - larger local alloc means
more room for reservation windows. On multi-node workloads the larger local
alloc helps as well because we don't have to do window slides as often.

Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no
longer used and the comment above it was out of date.

To test fragmentation, I used a workload which launched 4 threads that did
4k writes into a series of about 140 alternating files.

With resv_level=2, and a 4k/4k file system I observed the following average
fragmentation for various localalloc= parameters:

localalloc=	avg. fragmentation
	8		48
	32		16
	64		10
	120		7

On larger cluster sizes, the difference is more dramatic.

The new default size tops out at 256M, which we'll only get for cluster
sizes of 32K and above.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: clean up localalloc mount option size parsing · 73c8a800

Committed by Mark Fasheh on Apr 05, 2010

This patch pulls the local alloc sizing code into localalloc.c and provides
a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged
except that I correctly calculate the maximum local alloc size. The old code
in ocfs2_parse_options() calculated the max size as:

ocfs2_local_alloc_size(sb) * 8

which is correct, in bits. Unfortunately though the option passed in is in
megabytes. Ultimately, this bug made no real difference - the shrink code
would catch a too-large size and bring it down to something reasonable.
Still, it's less than efficient as-is.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
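
The unit mismatch can be made concrete with a small Python sketch. It assumes that each bit of the local alloc bitmap covers one cluster; the function name, the bitmap size and the cluster sizes below are hypothetical values chosen only to show the conversion.

```python
def max_local_alloc_megabytes(bitmap_bytes: int, cluster_size: int) -> int:
    """Convert the bitmap capacity from bits into megabytes.

    Assumes one bit per cluster. `bitmap_bytes` plays the role of
    ocfs2_local_alloc_size(sb); the values passed below are made up.
    """
    max_bits = bitmap_bytes * 8               # what the old code computed
    return (max_bits * cluster_size) >> 20    # the unit the mount option uses


# The same bit count maps to very different megabyte limits
# depending on the cluster size.
for cluster_size in (4096, 32768):
    print(cluster_size, max_local_alloc_megabytes(256, cluster_size))
```

Comparing a value in megabytes against the raw bit count therefore only gives the right limit by coincidence, which is the mismatch this commit corrects.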

ocfs2: remove ocfs2_local_alloc_in_range() · a57c8fd2

Committed by Mark Fasheh on Mar 16, 2010

Inodes are always allocated from the global bitmap now so we don't need this
any more. Also, the existing implementation bounces reservations around
needlessly.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: allocate btree internal block groups from the global bitmap · 33d5d380

Committed by Mark Fasheh on Feb 24, 2010

Otherwise, the need for a very large contiguous allocation tends to
wreak havoc on many inode allocation reservations on the local alloc, thus
ruining any chances for contiguousness.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: use allocation reservations for directory data · e3b4a97d

Committed by Mark Fasheh on Dec 07, 2009

Use the reservations system for unindexed dir tree allocations. We don't
bother with the indexed tree as reads from it are mostly random anyway.
Directory reservations are marked separately, to allow the reservations code
a chance to optimize their window sizes. This patch allocates only 8 bits
for directory windows as they generally are not expected to grow as quickly
as file data. Future improvements to dir window sizing can trivially be
made.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: use allocation reservations during file write · 4fe370af

Committed by Mark Fasheh on Dec 07, 2009

Add a per-inode reservations structure and pass it through to the
reservations code.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: allocation reservations · d02f00cc

Committed by Mark Fasheh on Dec 07, 2009

This patch improves Ocfs2 allocation policy by allowing an inode to
reserve a portion of the local alloc bitmap for itself. The reserved
portion (allocation window) is advisory in that other allocation
windows might steal it if the local alloc bitmap becomes
full. Otherwise, the reservations are honored and guaranteed to be
free. When the local alloc window is moved to a different portion of
the bitmap, existing reservations are discarded.

Reservation windows are represented internally by a red-black
tree. Within that tree, each node represents the reservation window of
one inode. An LRU of active reservations is also maintained. When new
data is written, we allocate it from the inode's window. When all bits
in a window are exhausted, we allocate a new one as close to the
previous one as possible. Should we not find free space, an existing
reservation is pulled off the LRU and cannibalized.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
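
The policy described above can be sketched as a toy model in Python. This is a simplification under stated assumptions, not the kernel implementation: a sorted list stands in for the red-black tree, an `OrderedDict` doubles as the LRU, every window has the same fixed size, and "as close to the previous one as possible" is approximated by searching forward from a goal bit. All names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class ReservationMap:
    """Toy model of per-inode reservation windows over a bitmap of `size` bits."""

    def __init__(self, size: int, window_bits: int) -> None:
        if window_bits > size:
            raise ValueError("window does not fit in the bitmap")
        self.size = size
        self.window_bits = window_bits
        # inode -> (start, length); insertion order doubles as the LRU,
        # with the least recently used reservation first.
        self.windows = OrderedDict()

    def _free_start(self, goal: int) -> Optional[int]:
        """Find a free run of window_bits, first from `goal`, then from bit 0."""
        taken = sorted(self.windows.values())
        for base in (goal, 0):
            pos = base
            for start, length in taken:
                if start + length <= pos:
                    continue          # window lies entirely before pos
                if start - pos >= self.window_bits:
                    break             # the gap before this window is big enough
                pos = start + length  # skip past this window
            if pos + self.window_bits <= self.size:
                return pos
        return None

    def touch(self, inode: str) -> None:
        """Mark an inode's reservation as recently used."""
        self.windows.move_to_end(inode)

    def reserve(self, inode: str, goal: int = 0) -> int:
        """Give `inode` a new window, cannibalizing the LRU one if needed."""
        self.windows.pop(inode, None)  # drop the inode's exhausted window
        start = self._free_start(goal)
        if start is None:
            # No free space: take over the least recently used reservation.
            _victim, (start, _length) = self.windows.popitem(last=False)
        self.windows[inode] = (start, self.window_bits)
        return start


resmap = ReservationMap(size=16, window_bits=8)
print(resmap.reserve("inode-a"), resmap.reserve("inode-b"))
print(resmap.reserve("inode-c"))  # bitmap is full, so the oldest window is taken
```

The sketch leaves out the advisory aspect, window resizing and the discarding of reservations when the local alloc window moves.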

ocfs2: Make ocfs2_journal_dirty() void. · ec20cec7

Committed by Joel Becker on Mar 19, 2010

jbd[2]_journal_dirty_metadata() only returns 0.  It's been returning 0
since before the kernel moved to git.  There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning.  All over ocfs2, we have blocks of code checking this can't
fail status.  In the past few years, we've tried to avoid adding these
checks, because they are pointless.  But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function.  All error
checking is removed from other files.  We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday.  They
won't.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

24 Mar, 2010 1 commit

ocfs2: Clear undo bits when local alloc is freed · b4414eea

Committed by Mark Fasheh on Mar 11, 2010

When the local alloc file changes windows, unused bits are freed back to the
global bitmap. By definition, those bits can not be in use by any file. Also,
the local alloc will never have been able to allocate those bits if they
were part of a previous truncate. Therefore it makes sense that we should
clear unused local alloc bits in the undo buffer so that they can be used
immediately.

[ Modified to call it ocfs2_release_clusters() -- Joel ]
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

20 Mar, 2010 2 commits

ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. · b2317968

Committed by Tao Ma on Mar 19, 2010

You can't store a pointer that you haven't filled in yet and expect it
to work.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Fix the update of name_offset when removing xattrs · dfe4d3d6

Committed by Tao Ma on Mar 19, 2010

When replacing an xattr's value, in some cases we wipe its name/value
first and then re-add it. The wipe is done by
ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or
block. We currently adjust name_offset for all the entries which have
(offset < name_offset). This does not adjust the entry we're replacing.
Since we are replacing the entry, we don't adjust the total entry count.
When we calculate a new namevalue location, we trust the entry's
now-wrong offset in ocfs2_xa_get_free_start().  The solution is to
also adjust the name_offset for the replaced entry, allowing
ocfs2_xa_get_free_start() to calculate the new namevalue location
correctly.

The following script can trigger a kernel panic easily.

echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE
mount -t ocfs2 $DEVICE $MNT_DIR
FILE=$MNT_DIR/$RANDOM
for((i=0;i<76;i++))
do
string_76="a$string_76"
done
string_78="aa$string_76"
string_82="aaaa$string_78"

touch $FILE
setfattr -n 'user.test1234567890' -v $string_76 $FILE
setfattr -n 'user.test1234567890' -v $string_78 $FILE
setfattr -n 'user.test1234567890' -v $string_82 $FILE
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

19 Mar, 2010 1 commit

ocfs2: Always try for maximum bits with new local alloc windows · b22b63eb

Committed by Mark Fasheh on Mar 11, 2010

What we were doing before was to ask for the current window size as the
maximum allocation. This had the effect of limiting the amount of allocation
we could get for the local alloc during times when the window size was
shrunk due to fragmentation. In some cases, that could actually *increase*
fragmentation by artificially limiting the number of bits we can accept. So
while we still want to ask for a minimum number of bits equal to window
size, there is no reason why we should limit the number of bits the local
alloc should accept. Hence always allow the maximum number of local alloc
bits.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

18 Mar, 2010 4 commits

ocfs2: set i_mode on disk during acl operations · fcefd25a

Committed by Mark Fasheh on Mar 15, 2010

ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, acls were sometimes not
getting propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Update i_blocks in reflink operations. · 6527f8f8

Committed by Tao Ma on Mar 10, 2010

In reflink, we need to update i_blocks for the target inode.
Reported-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Change bg_chain check for ocfs2_validate_gd_parent. · 78c37eb0

Committed by Tao Ma on Mar 03, 2010

In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually in resize, we have
the chance of bg_chain == cl_next_free_rec. So add an
additional condition check for it.

I also rename parameter "clean_error" to "resize", since the
old one is not clear enough to indicate that we should only
meet with this case in resize.

btw, the corresponding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

[PATCH] Skip check for mandatory locks when unlocking · ee860b6a

Committed by Sachin Prabhu on Mar 10, 2010

ocfs2_lock() will skip locks on a file which has mode set to 02666. This
is a problem in cases where the mode of the file is changed after a
process has obtained a lock on the file.

ocfs2_lock() should skip the check for mandatory locks when unlocking a
file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
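
The rule this commit describes can be sketched in Python. A mode such as 02666 has the setgid bit set and the group-execute bit clear, which is the conventional marker for mandatory locking. The function names and the lock-type constants below are illustrative assumptions, not the ocfs2 code.

```python
import stat

# Illustrative lock-type constants for this sketch only.
F_WRLCK = 1
F_UNLCK = 2


def is_mandatory_lock_candidate(mode: int) -> bool:
    """True when setgid is set and group-execute is clear, as in mode 02666."""
    return (mode & (stat.S_ISGID | stat.S_IXGRP)) == stat.S_ISGID


def should_refuse_lock(mode: int, lock_type: int) -> bool:
    """Refuse lock requests on mandatory-lock files, but never refuse an unlock."""
    if lock_type == F_UNLCK:
        # An unlock must always go through, even if the mode changed
        # after the lock was taken.
        return False
    return is_mandatory_lock_candidate(mode)


print(should_refuse_lock(0o2666, F_WRLCK), should_refuse_lock(0o2666, F_UNLCK))
```

Letting unlock requests through means a process that locked the file before its mode changed can still release the lock.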

15 Mar, 2010 4 commits

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · a3d3203e

Committed by Linus Torvalds on Mar 14, 2010

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits)
  ACPI: processor: push file static MADT pointer into internal map_madt_entry()
  ACPI: processor: refactor internal map_lsapic_id()
  ACPI: processor: refactor internal map_x2apic_id()
  ACPI: processor: refactor internal map_lapic_id()
  ACPI: processor: driver doesn't need to evaluate _PDC
  ACPI: processor: remove early _PDC optin quirks
  ACPI: processor: add internal processor_physically_present()
  ACPI: processor: move acpi_get_cpuid into processor_core.c
  ACPI: processor: export acpi_get_cpuid()
  ACPI: processor: mv processor_pdc.c processor_core.c
  ACPI: processor: mv processor_core.c processor_driver.c
  ACPI: plan to delete "acpi=ht" boot option
  ACPI: remove "acpi=ht" DMI blacklist
  PNPACPI: add bus number support
  PNPACPI: add window support
  resource: add window support
  resource: add bus number support
  resource: expand IORESOURCE_TYPE_BITS to make room for bus resource type
  acpiphp: Execute ACPI _REG method for hotadded devices
  ACPI video: Be more liberal in validating _BQC behaviour
  ...

init dynamic bin_attribute structures · f937331b

Committed by Wolfram Sang on Mar 15, 2010

Commit 6992f533 ("sysfs: Use one lockdep
class per sysfs attribute.") introduced this requirement.  First, at25
was fixed manually.  Then, other occurrences were found with coccinelle
and the following semantic patch.  Results were reviewed and fixed up:

    @ init @
    identifier struct_name, bin;
    @@

    	struct struct_name {
    		...
    		struct bin_attribute bin;
    		...
    	};

    @ main extends init @
    expression E;
    statement S;
    identifier name, err;
    @@

    (
    	struct struct_name *name;
    |
    -	struct struct_name *name = NULL;
    +	struct struct_name *name;
    )
    	...
    (
    	sysfs_bin_attr_init(&name->bin);
    |
    +	sysfs_bin_attr_init(&name->bin);
    	if (sysfs_create_bin_file(E, &name->bin))
    		S
    |
    +	sysfs_bin_attr_init(&name->bin);
    	err = sysfs_create_bin_file(E, &name->bin);
    )
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branches 'battery-2.6.34', 'bugzilla-10805', 'bugzilla-14668',... · ec28dcc6

Committed by Len Brown on Mar 14, 2010

Merge branches 'battery-2.6.34', 'bugzilla-10805', 'bugzilla-14668', 'bugzilla-531916-power-state', 'ht-warn-2.6.34', 'pnp', 'processor-rename', 'sony-2.6.34', 'suse-bugzilla-531547', 'tz-check', 'video' and 'misc-2.6.34' into release

ACPI: processor: push file static MADT pointer into internal map_madt_entry() · 149fe9c2

Committed by Alex Chiang on Feb 22, 2010

There's no real need for a pointer to the MADT to be global. The only
function that uses it is map_madt_entry.

This allows us to remove some more ugly #ifdefs.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>