提交 · b4d693fcc5fe99ed211addb5c6a0f8398f0b266e · openeuler / Kernel

10 9月, 2010 19 次提交

ocfs2: Cache system inodes of other slots. · b4d693fc

由 Tao Ma 提交于 8月 16, 2010

Durring orphan scan, if we are slot 0, and we are replaying
orphan_dir:0001, the general process is that for every file
in this dir:
1. we will iget orphan_dir:0001, since there is no inode for it.
   we will have to create an inode and read it from the disk.
2. do the normal work, such as delete_inode and remove it from
   the dir if it is allowed.
3. call iput orphan_dir:0001 when we are done. In this case,
   since we have no dcache for this inode, i_count will
   reach 0, and VFS will have to call clear_inode and in
   ocfs2_clear_inode we will checkpoint the inode which will let
   ocfs2_cmt and journald begin to work.
4. We loop back to 1 for the next file.

So you see, actually for every deleted file, we have to read the
orphan dir from the disk and checkpoint the journal. It is very
time consuming and cause a lot of journal checkpoint I/O.
A better solution is that we can have another reference for these
inodes in ocfs2_super. So if there is no other race among
nodes(which will let dlmglue to checkpoint the inode), for step 3,
clear_inode won't be called and for step 1, we may only need to
read the inode for the 1st time. This is a big win for us.

So this patch will try to cache system inodes of other slots so
that we will have one more reference for these inodes and avoid
the extra inode read and journal checkpoint.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

b4d693fc

libfs: Fix shift bug in generic_check_addressable() · a33f13ef

由 Joel Becker 提交于 8月 16, 2010

generic_check_addressable() erroneously shifts pages down by a block
factor when it should be shifting up.  To prevent overflow, we shift
blocks down to pages.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

a33f13ef

OCFS2: Allow huge (> 16 TiB) volumes to mount · 3bdb8efd

由 Patrick J. LoPresti 提交于 7月 22, 2010

The OCFS2 developers have already done all of the hard work to allow
volumes larger than 16 TiB.  But there is still a "sanity check" in
fs/ocfs2/super.c that prevents the mounting of such volumes, even when
the cluster size and journal options would allow it.

This patch replaces that sanity check with a more sophisticated one to
mount a huge volume provided that (a) it is addressable by the raw
word/address size of the system (borrowing a test from ext4); (b) the
volume is using JBD2; and (c) the JBD2_FEATURE_INCOMPAT_64BIT flag is
set on the journal.

I factored out the sanity check into its own function.  I also moved it
from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier,
and the journal will not have been initialized yet.

This patch is one of a pair, and it depends on the other ("JBD2: Allow
feature checks before journal recovery").

I have tested this patch on small volumes, huge volumes, and huge
volumes without 64-bit block support in the journal.  All of them appear
to work or to fail gracefully, as appropriate.
Signed-off-by: NPatrick LoPresti <lopresti@gmail.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

3bdb8efd

JBD2: Allow feature checks before journal recovery · 1113e1b5

由 Patrick J. LoPresti 提交于 7月 22, 2010

Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to
confirm that its journal supports 64-bit offsets.  In particular, we
need to check the journal's feature bits before recovering the journal.

This is not possible with JBD2 at present, because the journal
superblock (where the feature bits reside) is not loaded from disk until
the journal is recovered.

This patch loads the journal superblock in
jbd2_journal_check_used_features() if it has not already been loaded,
allowing us to check the feature bits before journal recovery.
Signed-off-by: NPatrick LoPresti <lopresti@gmail.com>
Cc: linux-ext4@vger.kernel.org
Acked-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

1113e1b5

ext3/ext4: Factor out disk addressability check · 30ca22c7

由 Patrick J. LoPresti 提交于 7月 22, 2010

As part of adding support for OCFS2 to mount huge volumes, we need to
check that the sector_t and page cache of the system are capable of
addressing the entire volume.

An identical check already appears in ext3 and ext4.  This patch moves
the addressability check into its own function in fs/libfs.c and
modifies ext3 and ext4 to invoke it.

[Edited to -EINVAL instead of BUG_ON() for bad blocksize_bits -- Joel]
Signed-off-by: NPatrick LoPresti <lopresti@gmail.com>
Cc: linux-ext4@vger.kernel.org
Acked-by: NAndreas Dilger <adilger@dilger.ca>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

30ca22c7

J

Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into merge-2 · 729963a1
由 Joel Becker 提交于 9月 10, 2010

729963a1
T
ocfs2: Remove obsolete comments before ocfs2_start_trans. · 17ae5211
由 Tao Ma 提交于 8月 02, 2010
```
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
```
17ae5211
T
ocfs2: Remove unused old_id in ocfs2_commit_cache. · f9c57ada
由 Tao Ma 提交于 8月 02, 2010
```
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
```
f9c57ada

ocfs2: Remove ocfs2_sync_inode() · 4c38881f

由 Jan Kara 提交于 8月 05, 2010

ocfs2_sync_inode() is used only from ocfs2_sync_file(). But all data has
already been written before calling ocfs2_sync_file() and ocfs2 doesn't use
inode's private_list for tracking metadata buffers thus sync_mapping_buffers()
is superfluous as well.
Signed-off-by: NJan Kara <jack@suse.cz>
Acked-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

4c38881f

Reorganize data elements to reduce struct sizes · 83fd9c7f

由 Goldwyn Rodrigues 提交于 6月 10, 2010

Thanks for the comments. I have incorportated them all.

CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled.
Statistics now look like -
ocfs2_write_ctxt: 2144 - 2136 = 8
ocfs2_inode_info: 1960 - 1848 = 112
ocfs2_journal: 168 - 160 = 8
ocfs2_lock_res: 336 - 304 = 32
ocfs2_refcount_tree: 512 - 472 = 40
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

83fd9c7f

ocfs2: Remove obscure error handling in direct_write. · 95fa859a

由 Tao Ma 提交于 6月 09, 2010

In ocfs2, actually we don't allow any direct write pass i_size,
see the function ocfs2_prepare_inode_for_write. So we don't
need the bogus simple_setsize.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

95fa859a

ocfs2: Add some trace log for orphan scan. · 3c3f20c9

由 Tao Ma 提交于 6月 01, 2010

Now orphan scan worker has no trace log, so it is
very hard to tell whether it is finished or blocked.
So add 2 mlog trace log so that we can tell whether
the current orphan scan worker is blocked or not.
It does help when I analyzed a orphan scan bug.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

3c3f20c9

Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8. · ddee5cdb

由 Tristan Ye 提交于 5月 22, 2010

The reason why we need this ioctl is to offer the none-privileged
end-user a possibility to get filesys info gathering.

We use OCFS2_IOC_INFO to manipulate the new ioctl, userspace passes a
structure to kernel containing an array of request pointers and request
count, such as,

* From userspace:

struct ocfs2_info_blocksize oib = {
        .ib_req = {
                .ir_magic = OCFS2_INFO_MAGIC,
                .ir_code = OCFS2_INFO_BLOCKSIZE,
                ...
        }
        ...
}

struct ocfs2_info_clustersize oic = {
        ...
}

uint64_t reqs[2] = {(unsigned long)&oib,
                    (unsigned long)&oic};

struct ocfs2_info info = {
        .oi_requests = reqs,
        .oi_count = 2,
}

ret = ioctl(fd, OCFS2_IOC_INFO, &info);

* In kernel:

Get the request pointers from *info*, then handle each request one bye one.

Idea here is to make the spearated request small enough to guarantee
a better backward&forward compatibility since a small piece of request
would be less likely to be broken if filesys on raw disk get changed.

Currently, the following 7 requests are supported per the requirement from
userspace tool o2info, and I believe it will grow over time:-)

        OCFS2_INFO_CLUSTERSIZE
        OCFS2_INFO_BLOCKSIZE
        OCFS2_INFO_MAXSLOTS
        OCFS2_INFO_LABEL
        OCFS2_INFO_UUID
        OCFS2_INFO_FS_FEATURES
        OCFS2_INFO_JOURNAL_SIZE

This ioctl is only specific to OCFS2.
Signed-off-by: NTristan Ye <tristan.ye@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

ddee5cdb

Merge master.kernel.org:/home/rmk/linux-2.6-arm · 152831be

由 Linus Torvalds 提交于 9月 09, 2010

* master.kernel.org:/home/rmk/linux-2.6-arm: (30 commits)
  ARM: Update mach-types
  ARM: Partially revert "Auto calculate ZRELADDR and provide option for exceptions"
  ARM: Ensure PTE modifications via dma_alloc_coherent are visible
  ARM: 6359/1: ep93xx: move clock initialization earlier
  Revert "[ARM] pxa: remove now unnecessary dma_needs_bounce()"
  ARM: 6352/1: perf: fix event validation
  ARM: 6344/1: Mark CPU_32v6K as depended on CPU_V7
  ARM: 6343/1: wire up fanotify and prlimit64 syscalls on ARM
  ARM: 6330/1: perf: reword comments relating to perf_event_do_pending
  ARM: pxa168fb: fix section mismatch
  ARM: pxa: Make id const in pwm_probe()
  ARM: pxa: fix CI_HSYNC and CI_VSYNC MFP defines for pxa300
  ARM: pxa: remove __init from cpufreq_driver->init()
  ARM: imx: set cache line size to 64 bytes for i.MX5
  mx5/clock: fix clear bit fields issue in _clk_ccgr_disable function
  mxc/tzic: add base address when accessing TZIC registers
  ARM: mach-shmobile: ap4evb: fix write protect for SDHI1
  ARM: mach-shmobile: ap4evb: modify FSI2 ID
  ARM: mach-shmobile: do not enable the PLLC2 clock on init
  ARM: mach-shmobile: Clock framework comment fix
  ...

152831be

R
ARM: Update mach-types · a14d0404
由 Russell King 提交于 9月 09, 2010
```
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
```
a14d0404

ARM: Partially revert "Auto calculate ZRELADDR and provide option for exceptions" · 9e84ed63

由 Russell King 提交于 9月 09, 2010

Partially revert e69edc79, which introduced automatic zreladdr
support.  The change in the way the manual definition is defined
seems to be error and conflict prone.  Go back to the original way
we were handling this for the time being, while keeping the automatic
zreladdr facility.

Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

9e84ed63

R

Merge branch 'origin' · de9ea203
由 Russell King 提交于 9月 09, 2010

de9ea203

lglock: make lg_lock_global() actually lock globally · a73f8844

由 Jonathan Corbet 提交于 9月 08, 2010

lg_lock_global() currently only acquires spinlocks for online CPUs, but
it's meant to lock all possible CPUs.  Lglock-protected resources may be
associated with removed CPUs - and, indeed, that could happen with the
per-superblock open files lists.

At Nick's suggestion, change for_each_online_cpu() to
for_each_possible_cpu() to protect accesses to those resources.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: NNick Piggin <npiggin@kernel.dk>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a73f8844

mm: Move vma_stack_continue into mm.h · 39aa3cb3

由 Stefan Bader 提交于 8月 31, 2010

So it can be used by all that need to check for that.
Signed-off-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

39aa3cb3

09 9月, 2010 13 次提交

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 26a94e81

由 Linus Torvalds 提交于 9月 09, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/nes: Fix hang with modified FIN handling on A0 cards
  RDMA/nes: Change state to closing after FIN
  RDMA/nes: Fix double CLOSE event indication crash
  RDMA/nes: Write correct register write to set TX pause param
  RDMA/cxgb3: Don't exceed the max HW CQ depth

26a94e81

Merge branch 'fixes' of git://oss.oracle.com/git/tma/linux-2.6 · cad46744

由 Linus Torvalds 提交于 9月 09, 2010

* 'fixes' of git://oss.oracle.com/git/tma/linux-2.6:
  ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan
  ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions
  ocfs2: allow return of new inode block location before allocation of the inode
  ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding
  ocfs2: split out inode alloc code from ocfs2_mknod_locked
  Ocfs2: Fix a regression bug from mainline commit(6b933c8e).
  ocfs2: Fix deadlock when allocating page
  ocfs2: properly set and use inode group alloc hint
  ocfs2: Use the right group in nfs sync check.
  ocfs2: Flush drive's caches on fdatasync
  ocfs2: make __ocfs2_page_mkwrite handle file end properly.
  ocfs2: Fix incorrect checksum validation error
  ocfs2: Fix metaecc error messages

cad46744

R

Merge branches 'cxgb3' and 'nes' into for-linus · 17859d07
由 Roland Dreier 提交于 9月 08, 2010

17859d07

RDMA/nes: Fix hang with modified FIN handling on A0 cards · 29da03b9

由 Faisal Latif 提交于 9月 01, 2010

Changing state to CLOSING when FIN is received causes A0 cards to
hang.  Fix this by checking for A0 cards in FIN handling.
Signed-off-by: NFaisal Latif <faisal.latif@intel.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

29da03b9

RDMA/nes: Change state to closing after FIN · 67d70721

由 Faisal Latif 提交于 8月 14, 2010

When the driver receives an AE for FIN received, it closes the
connection without changing the state of the connection in the
hardware to closing.  By changing the state to closing, hardware will
do a normal close sequence.
Signed-off-by: NFaisal Latif <faisal.latif@intel.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

67d70721

RDMA/nes: Fix double CLOSE event indication crash · dae58728

由 Faisal Latif 提交于 8月 14, 2010

During a stress testing in a large cluster, multiple close event are
detected and BUG() is hit in the iWARP core. The cause is that the
active node gave up while waiting for an MPA response from the peer
and tried to close the connection by sending RST. The passive node
driver receives the RST but is waiting for MPA response from the user.
When the MPA accept is received, the driver offloads the connection
and sends a CLOSE event. The driver gets an AE indicating RESET
received and also sends a CLOSE event, hitting a BUG().

Fix this by correcting RESET handling and sending CLOSE events.
Signed-off-by: NFaisal Latif <faisal.latif@intel.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

dae58728

RDMA/nes: Write correct register write to set TX pause param · 70c9db0f

由 Chien Tung 提交于 9月 07, 2010

Setting TX pause param writes to the wrong register location causing
the adapter to hang. Correct the define used to write the reigster.

Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=2116Reported-by: NShiri Franchi <shirif@voltaire.com>
Signed-off-by: NChien Tung <chien.tin.tung@intel.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

70c9db0f

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · cc491e27

由 Linus Torvalds 提交于 9月 08, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - fix device removal on unload
  Input: bcm5974 - adjust major/minor to scale
  Input: MT - initialize slots to unused
  Input: use PIT_TICK_RATE in vt beep ioctl
  Input: wacom - fix mousewheel handling for old wacom tablets

cc491e27

Merge branch 'semaphore-for-linus' of... · 2c20130f

由 Linus Torvalds 提交于 9月 08, 2010

Merge branch 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  semaphore: Add DEFINE_SEMAPHORE

2c20130f

Merge branch 'x86-fixes-for-linus' of... · 1faa6ec8

由 Linus Torvalds 提交于 9月 08, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev

1faa6ec8

Merge branch 'core-fixes-for-linus' of... · 79637a41

由 Linus Torvalds 提交于 9月 08, 2010

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  gcc-4.6: kernel/*: Fix unused but set warnings
  mutex: Fix annotations to include it in kernel-locking docbook
  pid: make setpgid() system call use RCU read-side critical section
  MAINTAINERS: Add RCU's public git tree

79637a41

Merge branch 'perf-fixes-for-linus' of... · 899edae6

由 Linus Torvalds 提交于 9月 08, 2010

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Try to handle unknown nmis with an enabled PMU
  perf, x86: Fix handle_irq return values
  perf, x86: Fix accidentally ack'ing a second event on intel perf counter
  oprofile, x86: fix init_sysfs() function stub
  lockup_detector: Sync touch_*_watchdog back to old semantics
  tracing: Fix a race in function profile
  oprofile, x86: fix init_sysfs error handling
  perf_events: Fix time tracking for events with pid != -1 and cpu != -1
  perf: Initialize callchains roots's childen hits
  oprofile: fix crash when accessing freed task structs

899edae6

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · c8c727db

由 Linus Torvalds 提交于 9月 08, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix lock annotations
  fuse: flush background queue on connection close

c8c727db

08 9月, 2010 8 次提交

ARM: Ensure PTE modifications via dma_alloc_coherent are visible · 2be23c47

由 Russell King 提交于 9月 08, 2010

Dave Hylands reports:
| We've observed a problem with dma_alloc_writecombine when the system
| is under heavy load (heavy bus traffic).  We've managed to reduce the
| problem to the following snippet, which is run from a kthread in a
| continuous loop:
|
|   void *virtAddr;
|   dma_addr_t physAddr;
|   unsigned int numBytes = 256;
|
|   for (;;) {
|       virtAddr = dma_alloc_writecombine(NULL,
|             numBytes, &physAddr, GFP_KERNEL);
|       if (virtAddr == NULL) {
|          printk(KERN_ERR "Running out of memory\n");
|          break;
|       }
|
|       /* access DMA memory allocated */
|       tmp = virtAddr;
|       *tmp = 0x77;
|
|       /* free DMA memory */
|       dma_free_writecombine(NULL,
|             numBytes, virtAddr, physAddr);
|
|         ...sleep here...
|     }
|
| By itself, the code will run forever with no issues. However, as we
| increase our bus traffic (typically using DMA) then the *tmp = 0x77
| line will eventually cause a page fault. If we add a small delay (a
| few microseconds) before the *tmp = 0x77, then we don't see a page
| fault, even under heavy load.

A dsb() is required after modifying the PTE entries to ensure that they
will always be visible.  Add this dsb().
Reported-by: NDave Hylands <dhylands@gmail.com>
Tested-by: NDave Hylands <dhylands@gmail.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

2be23c47

semaphore: Add DEFINE_SEMAPHORE · febc88c5

由 Thomas Gleixner 提交于 9月 07, 2010

The full cleanup of init_MUTEX[_LOCKED] and DECLARE_MUTEX has not been
done. Some of the users are real semaphores and we should name them as
such instead of confusing everyone with "MUTEX".

Provide the infrastructure to get finally rid of init_MUTEX[_LOCKED]
and DECLARE_MUTEX.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
LKML-Reference: <20100907125054.795929962@linutronix.de>

febc88c5

ARM: 6359/1: ep93xx: move clock initialization earlier · a387f0f5

由 Mika Westerberg 提交于 9月 03, 2010

Commit 7cfe2494 ("ARM: AMBA: Add pclk support to AMBA bus
infrastructure") changed AMBA bus to handle the PCLK automatically.
However, in EP93xx clock initialization is arch_initcall which is done
later than AMBA device identification. This causes
amba_get_enable_pclk() to fail resulting device where UARTs are not
functional.

So change ep93xx_clock_init() to be postcore_initcall.
Signed-off-by: NMika Westerberg <mika.westerberg@iki.fi>
Acked-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

a387f0f5

Revert "[ARM] pxa: remove now unnecessary dma_needs_bounce()" · 0485e18b

由 Russell King 提交于 9月 05, 2010

This reverts commit 4fa5518c, which causes a compilation regression for
IXP4xx platforms.
Reported-by: NRichard Cochran <richardcochran@gmail.com>
Acked-by: NEric Miao <eric.y.miao@gmail.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

0485e18b

ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan · 97b8f4a9

由 Mark Fasheh 提交于 8月 13, 2010

ocfs2_create_inode_in_orphan() is used by reflink to create the newly
reflinked inode simultaneously in the orphan dir. This allows us to easily
handle partially-reflinked files during recovery cleanup.

We have a problem though - the orphan dir stringifies inode # to determine
a unique name under which the orphan entry dirent can be created. Since
ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir
before it can allocate the inode, we currently call into the orphan code:

       /*
        * We give the orphan dir the root blkno to fake an orphan name,
        * and allocate enough space for our insertion.
        */
       status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,
                                         osb->root_blkno,
                                         orphan_name, &orphan_insert);

Using osb->root_blkno might work fine on unindexed directories, but the
orphan dir can have an index.  When it has that index, the above code fails
to allocate the proper index entry.  Later, when we try to remove the file
from the orphan dir (using the actual inode #), the reflink operation will
fail.

To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the
newly split out orphan and inode alloc code to figure out what the inode
block number will be (once allocated) and then prepare the orphan dir from
that data.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NTao Ma <tao.ma@oracle.com>

97b8f4a9

ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions · dd43bcde

由 Mark Fasheh 提交于 8月 13, 2010

We do this because ocfs2_create_inode_in_orphan() wants to order locking of
the orphan dir with respect to locking of the inode allocator *before*
making any changes to the directory.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NTao Ma <tao.ma@oracle.com>

dd43bcde

ocfs2: allow return of new inode block location before allocation of the inode · e49e2767

由 Mark Fasheh 提交于 8月 13, 2010

This allows code which needs to know the eventual block number of an inode
but can't allocate it yet due to transaction or lock ordering. For example,
ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation
of the orphan dir because it can't yet know where the actual inode is placed
- that code is actually in ocfs2_mknod_locked. This is a problem when the
orphan dirs are indexed as the junk inode number will create an index entry
which goes unused (and fails the later removal from the orphan dir).  Now
with these interfaces, ocfs2_create_inode_in_orphan() can run the block
group search (and get back the inode block number) *before* any actual
allocation occurs.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NTao Ma <tao.ma@oracle.com>

e49e2767

ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding · d5134982

由 Mark Fasheh 提交于 8月 13, 2010

ocfs2_search_chain() makes the same updates as
ocfs2_alloc_dinode_update_counts to the alloc inode. Instead of open coding
the bitmap update, use our helper function.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NTao Ma <tao.ma@oracle.com>

d5134982

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功