提交 · c628937803c652132d21f383736375e2feee4bfb · openeuler / raspberrypi-kernel

13 12月, 2012 1 次提交

pstore/ram: Fix bounds checks for mem_size, record_size, console_size and ftrace_size · c6289378

由 Arve Hjønnevåg 提交于 12月 11, 2012

The bounds check in ramoops_init_prz was incorrect and ramoops_init_przs
had no check. Additionally, ramoops_init_przs allows record_size to be 0,
but ramoops_pstore_write_buf would always crash in this case.
Signed-off-by: NArve Hjønnevåg <arve@android.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>

c6289378

18 11月, 2012 1 次提交

pstore/ram: Fix undefined usage of rounddown_pow_of_two(0) · b042e474

由 Maxime Bizon 提交于 10月 22, 2012

record_size / console_size / ftrace_size can be 0 (this is how you disable
the feature), but rounddown_pow_of_two(0) is undefined. As suggested by
Kees Cook, use !is_power_of_2() as a condition to call
rounddown_pow_of_two and avoid its undefined behavior on the value 0. This
issue has been present since commit 1894a253 (ramoops: Move to
fs/pstore/ram.c).

Cc: stable@vger.kernel.org
Signed-off-by: NMaxime Bizon <mbizon@freebox.fr>
Signed-off-by: NFlorian Fainelli <ffainelli@freebox.fr>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>

b042e474

17 11月, 2012 1 次提交

pstore/ram: Fixup section annotations · 53f21a8e

由 Hannes Reinecke 提交于 10月 17, 2012

The compiler complained about missing section annotations.
Fix it.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>

53f21a8e

15 11月, 2012 1 次提交

pstore: Fix NULL pointer dereference in console writes · 70a6f46d

由 Colin Ian King 提交于 11月 14, 2012

Passing a NULL id causes a NULL pointer deference in writers such as
erst_writer and efi_pstore_write because they expect to update this id.
Pass a dummy id instead.

This avoids a cascade of oopses caused when the initial
pstore_console_write passes a null which in turn causes writes to the
console causing further oopses in subsequent pstore_console_write calls.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>

70a6f46d

09 11月, 2012 12 次提交

fanotify: fix missing break · 848561d3

由 Eric Paris 提交于 11月 08, 2012

Anders Blomdell noted in 2010 that Fanotify lost events and provided a
test case.  Eric Paris confirmed it was a bug and posted a fix to the
list

  https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE

but never applied it.  Repeated attempts over time to actually get him
to apply it have never had a reply from anyone who has raised it

So apply it anyway
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Reported-by: NAnders Blomdell <anders.blomdell@control.lth.se>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

848561d3

revert "epoll: support for disabling items, and a self-test app" · a80a6b85

由 Andrew Morton 提交于 11月 08, 2012

Revert commit 03a7beb5 ("epoll: support for disabling items, and a
self-test app") pending resolution of the issues identified by Michael
Kerrisk, copied below.

We'll revisit this for 3.8.

: I've taken a look at this patch as it currently stands in 3.7-rc1, and
: done a bit of testing. (By the way, the test program
: tools/testing/selftests/epoll/test_epoll.c does not compile...)
:
: There are one or two places where the behavior seems a little strange,
: so I have a question or two at the end of this mail. But other than
: that, I want to check my understanding so that the interface can be
: correctly documented.
:
: Just to go though my understanding, the problem is the following
: scenario in a multithreaded application:
:
: 1. Multiple threads are performing epoll_wait() operations,
:    and maintaining a user-space cache that contains information
:    corresponding to each file descriptor being monitored by
:    epoll_wait().
:
: 2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
:    a file descriptor from the epoll interest list, and
:    delete the corresponding record from the user-space cache.
:
: 3. The problem with (2) is that some other thread may have
:    previously done an epoll_wait() that retrieved information
:    about the fd in question, and may be in the middle of using
:    information in the cache that relates to that fd. Thus,
:    there is a potential race.
:
: 4. The race can't solved purely in user space, because doing
:    so would require applying a mutex across the epoll_wait()
:    call, which would of course blow thread concurrency.
:
: Right?
:
: Your solution is the EPOLL_CTL_DISABLE operation. I want to
: confirm my understanding about how to use this flag, since
: the description that has accompanied the patches so far
: has been a bit sparse
:
: 0. In the scenario you're concerned about, deleting a file
:    descriptor means (safely) doing the following:
:    (a) Deleting the file descriptor from the epoll interest list
:        using EPOLL_CTL_DEL
:    (b) Deleting the corresponding record in the user-space cache
:
: 1. It's only meaningful to use this EPOLL_CTL_DISABLE in
:    conjunction with EPOLLONESHOT.
:
: 2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
:    conjunction is a logical error.
:
: 3. The correct way to code multithreaded applications using
:    EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:
:
:    a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
:       should EPOLLONESHOT.
:
:    b. When a thread wants to delete a file descriptor, it
:       should do the following:
:
:       [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
:       [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
:           was zero, then the file descriptor can be safely
:           deleted by the thread that made this call.
:       [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
:           then the descriptor is in use. In this case, the calling
:           thread should set a flag in the user-space cache to
:           indicate that the thread that is using the descriptor
:           should perform the deletion operation.
:
: Is all of the above correct?
:
: The implementation depends on checking on whether
: (events & ~EP_PRIVATE_BITS) == 0
: This replies on the fact that EPOLL_CTL_AD and EPOLL_CTL_MOD always
: set EPOLLHUP and EPOLLERR in the 'events' mask, and EPOLLONESHOT
: causes those flags (as well as all others in ~EP_PRIVATE_BITS) to be
: cleared.
:
: A corollary to the previous paragraph is that using EPOLL_CTL_DISABLE
: is only useful in conjunction with EPOLLONESHOT. However, as things
: stand, one can use EPOLL_CTL_DISABLE on a file descriptor that does
: not have EPOLLONESHOT set in 'events' This results in the following
: (slightly surprising) behavior:
:
: (a) The first call to epoll_ctl(EPOLL_CTL_DISABLE) returns 0
:     (the indicator that the file descriptor can be safely deleted).
: (b) The next call to epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY.
:
: This doesn't seem particularly useful, and in fact is probably an
: indication that the user made a logic error: they should only be using
: epoll_ctl(EPOLL_CTL_DISABLE) on a file descriptor for which
: EPOLLONESHOT was set in 'events'. If that is correct, then would it
: not make sense to return an error to user space for this case?

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paton J. Lewis" <palewis@adobe.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a80a6b85

xfs: fix reading of wrapped log data · 6ce377af

由 Dave Chinner 提交于 11月 02, 2012

Commit 44396476 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.

Cc: <stable@vger.kernel.org> # 3.0.x, 3.2.x, 3.4.x 3.6.x
Reported-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

6ce377af

xfs: fix buffer shudown reference count mismatch · 03b1293e

由 Dave Chinner 提交于 11月 02, 2012

When we shut down the filesystem, we have to unpin and free all the
buffers currently active in the CIL. To do this we unpin and remove
them in one operation as a result of a failed iclogbuf write. For
buffers, we do this removal via a simultated IO completion of after
marking the buffer stale.

At the time we do this, we have two references to the buffer - the
active LRU reference and the buf log item.  The LRU reference is
removed by marking the buffer stale, and the active CIL reference is
by the xfs_buf_iodone() callback that is run by
xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
callback).

However, ioend processing requires one more reference - that of the
IO that it is completing. We don't have this reference, so we free
the buffer prematurely and use it after it is freed. For buffers
marked with XBF_ASYNC, this leads to assert failures in
xfs_buf_rele() on debug kernels because the b_hold count is zero.

Fix this by making sure we take the necessary IO reference before
starting IO completion processing on the stale buffer, and set the
XBF_ASYNC flag to ensure that IO completion processing removes all
the active references from the buffer to ensure it is fully torn
down.

Cc: <stable@vger.kernel.org>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

03b1293e

xfs: don't vmap inode cluster buffers during free · 4b62acfe

由 Dave Chinner 提交于 11月 02, 2012

Inode buffers do not need to be mapped as inodes are read or written
directly from/to the pages underlying the buffer. This fixes a
regression introduced by commit 611c9946 ("xfs: make XBF_MAPPED the
default behaviour").
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

4b62acfe

xfs: invalidate allocbt blocks moved to the free list · ca250b1b

由 Dave Chinner 提交于 11月 02, 2012

When we free a block from the alloc btree tree, we move it to the
freelist held in the AGFL and mark it busy in the busy extent tree.
This typically happens when we merge btree blocks.

Once the transaction is committed and checkpointed, the block can
remain on the free list for an indefinite amount of time.  Now, this
isn't the end of the world at this point - if the free list is
shortened, the buffer is invalidated in the transaction that moves
it back to free space. If the buffer is allocated as metadata from
the free list, then all the modifications getted logged, and we have
no issues, either. And if it gets allocated as userdata direct from
the freelist, it gets invalidated and so will never get written.

However, during the time it sits on the free list, pressure on the
log can cause the AIL to be pushed and the buffer that covers the
block gets pushed for write. IOWs, we end up writing a freed
metadata block to disk. Again, this isn't the end of the world
because we know from the above we are only writing to free space.

The problem, however, is for validation callbacks. If the block was
on old btree root block, then the level of the block is going to be
higher than the current tree root, and so will fail validation.
There may be other inconsistencies in the block as well, and
currently we don't care because the block is in free space. Shutting
down the filesystem because a freed block doesn't pass write
validation, OTOH, is rather unfriendly.

So, make sure we always invalidate buffers as they move from the
free space trees to the free list so that we guarantee they never
get written to disk while on the free list.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NPhil White <pwhite@sgi.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

ca250b1b

xfs: silence uninitialised f.file warning. · 1e7acbb7

由 Dave Chinner 提交于 10月 25, 2012

Uninitialised variable build warning introduced by 2903ff01 ("switch
simple cases of fget_light to fdget"), gcc is not smart enough to
work out that the variable is not used uninitialised, and the commit
removed the initialisation at declaration that the old variable had.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

1e7acbb7

xfs: growfs: don't read garbage for new secondary superblocks · eaef8543

由 Dave Chinner 提交于 10月 09, 2012

When updating new secondary superblocks in a growfs operation, the
superblock buffer is read from the newly grown region of the
underlying device. This is not guaranteed to be zero, so violates
the underlying assumption that the unused parts of superblocks are
zero filled. Get a new buffer for these secondary superblocks to
ensure that the unused regions are zero filled correctly.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

eaef8543

xfs: move allocation stack switch up to xfs_bmapi_allocate · 1f3c785c

由 Dave Chinner 提交于 10月 05, 2012

Switching stacks are xfs_alloc_vextent can cause deadlocks when we
run out of worker threads on the allocation workqueue. This can
occur because xfs_bmap_btalloc can make multiple calls to
xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
return with the AGF locked in the current allocation transaction.

If we then need to make another allocation, and all the allocation
worker contexts are exhausted because the are blocked waiting for
the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
work completed to release the AGF.  Hence allocation effectively
deadlocks.

To avoid this, move the stack switch one layer up to
xfs_bmapi_allocate() so that all of the allocation attempts in a
single switched stack transaction occur in a single worker context.
This avoids the problem of an allocation being blocked waiting for
a worker thread whilst holding the AGF.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

1f3c785c

xfs: introduce XFS_BMAPI_STACK_SWITCH · 326c0355

由 Dave Chinner 提交于 10月 05, 2012

Certain allocation paths through xfs_bmapi_write() are in situations
where we have limited stack available. These are almost always in
the buffered IO writeback path when convertion delayed allocation
extents to real extents.

The current stack switch occurs for userdata allocations, which
means we also do stack switches for preallocation, direct IO and
unwritten extent conversion, even those these call chains have never
been implicated in a stack overrun.

Hence, let's target just the single stack overun offended for stack
switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
the caller can pass xfs_bmapi_write() to indicate it should switch
stacks if it needs to do allocation.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

326c0355

xfs: zero allocation_args on the kernel stack · 408cc4e9

由 Mark Tinguely 提交于 9月 20, 2012

Zero the kernel stack space that makes up the xfs_alloc_arg structures.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

408cc4e9

xfs: only update the last_sync_lsn when a transaction completes · 7e9620f2

由 Dave Chinner 提交于 10月 08, 2012

The log write code stamps each iclog with the current tail LSN in
the iclog header so that recovery knows where to find the tail of
thelog once it has found the head. Normally this is taken from the
first item on the AIL - the log item that corresponds to the oldest
active item in the log.

The problem is that when the AIL is empty, the tail lsn is dervied
from the the l_last_sync_lsn, which is the LSN of the last iclog to
be written to the log. In most cases this doesn't happen, because
the AIL is rarely empty on an active filesystem. However, when it
does, it opens up an interesting case when the transaction being
committed to the iclog spans multiple iclogs.

That is, the first iclog is stamped with the l_last_sync_lsn, and IO
is issued. Then the next iclog is setup, the changes copied into the
iclog (takes some time), and then the l_last_sync_lsn is stamped
into the header and IO is issued. This is still the same
transaction, so the tail lsn of both iclogs must be the same for log
recovery to find the entire transaction to be able to replay it.

The problem arises in that the iclog buffer IO completion updates
the l_last_sync_lsn with it's own LSN. Therefore, If the first iclog
completes it's IO before the second iclog is filled and has the tail
lsn stamped in it, it will stamp the LSN of the first iclog into
it's tail lsn field. If the system fails at this point, log recovery
will not see a complete transaction, so the transaction will no be
replayed.

The fix is simple - the l_last_sync_lsn is updated when a iclog
buffer IO completes, and this is incorrect. The l_last_sync_lsn
shoul dbe updated when a transaction is completed by a iclog buffer
IO. That is, only iclog buffers that have transaction commit
callbacks attached to them should update the l_last_sync_lsn. This
means that the last_sync_lsn will only move forward when a commit
record it written, not in the middle of a large transaction that is
rolling through multiple iclog buffers.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

7e9620f2

07 11月, 2012 7 次提交

GFS2: Test bufdata with buffer locked and gfs2_log_lock held · 96e5d1d3

由 Benjamin Marzinski 提交于 11月 07, 2012

In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

96e5d1d3

GFS2: Don't call file_accessed() with a shared glock · 3d162688

由 Benjamin Marzinski 提交于 11月 06, 2012

file_accessed() was being called by gfs2_mmap() with a shared glock. If it
needed to update the atime, it was crashing because it dirtied the inode in
gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
checked if the caller was already holding a glock, but it didn't make sure that
the glock was in the exclusive state. Now, instead of calling file_accessed()
while holding the shared lock in gfs2_mmap(), file_accessed() is called after
grabbing and releasing the glock to update the inode. If file_accessed() needs
to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().

gfs2_dirty_inode() now also checks to make sure that if the calling process has
already locked the glock, it has an exclusive lock.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

3d162688

GFS2: Fix FITRIM argument handling · 076f0faa

由 Lukas Czerner 提交于 10月 16, 2012

Currently implementation in gfs2 uses FITRIM arguments as it were in
file system blocks units which is wrong. The FITRIM arguments
(fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are
actually in bytes.

Moreover, check for start argument beyond the end of file system, len
argument being smaller than file system block and minlen argument being
bigger than biggest resource group were missing.

This commit converts the code to convert FITRIM argument to file system
blocks and also adds appropriate checks mentioned above.

All the problems were recognised by xfstests 251 and 260.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

076f0faa

GFS2: Require user to provide argument for FITRIM · 3a238ade

由 Lukas Czerner 提交于 10月 16, 2012

When the fstrim_range argument is not provided by user in FITRIM ioctl
we should just return EFAULT and not promoting bad behaviour by filling
the structure in kernel. Let the user deal with it.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

3a238ade

GFS2: Clean up some unused assignments · 73738a77

由 Andrew Price 提交于 10月 12, 2012

Cleans up two cases where variables were assigned values but then never
used again.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

73738a77

GFS2: Fix possible null pointer deref in gfs2_rs_alloc · cd0ed19f

由 Andrew Price 提交于 10月 12, 2012

Despite the return value from kmem_cache_zalloc() being checked, the
error wasn't being returned until after a possible null pointer
dereference. This patch returns the error immediately, allowing the
removal of the error variable.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

cd0ed19f

GFS2: Fix an unchecked error from gfs2_rs_alloc · aaaf68c5

由 Andrew Price 提交于 10月 12, 2012

Check the return value of gfs2_rs_alloc(ip) and avoid a possible null
pointer dereference.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

aaaf68c5

05 11月, 2012 1 次提交

cifs: Do not lookup hashed negative dentry in cifs_atomic_open · 3798f47a

由 Sachin Prabhu 提交于 11月 05, 2012

We do not need to lookup a hashed negative directory since we have
already revalidated it before and have found it to be fine.

This also prevents a crash in cifs_lookup() when it attempts to rehash
the already hashed negative lookup dentry.

The patch has been tested using the reproducer at
https://bugzilla.redhat.com/show_bug.cgi?id=867344#c28

Cc: <stable@kernel.org> # 3.6.x
Reported-by: NVit Zahradka <vit.zahradka@tiscali.cz>
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>

3798f47a

03 11月, 2012 2 次提交

cifs: fix potential buffer overrun in cifs.idmap handling code · 36960e44

由 Jeff Layton 提交于 11月 03, 2012

The userspace cifs.idmap program generally works with the wbclient libs
to generate binary SIDs in userspace. That program defines the struct
that holds these values as having a max of 15 subauthorities. The kernel
idmapping code however limits that value to 5.

When the kernel copies those values around though, it doesn't sanity
check the num_subauths value handed back from userspace or from the
server. It's possible therefore for userspace to hand us back a bogus
num_subauths value (or one that's valid, but greater than 5) that could
cause the kernel to walk off the end of the cifs_sid->sub_auths array.

Fix this by defining a new routine for copying sids and using that in
all of the places that copy it. If we end up with a sid that's longer
than expected then this approach will just lop off the "extra" subauths,
but that's basically what the code does today already. Better approaches
might be to fix this code to reject SIDs with >5 subauths, or fix it
to handle the subauths array dynamically.

At the same time, change the kernel to check the length of the data
returned by userspace. If it's shorter than struct cifs_sid, reject it
and return -EIO. If that happens we'll end up with fields that are
basically uninitialized.

Long term, it might make sense to redefine cifs_sid using a flexarray at
the end, to allow for variable-length subauth lists, and teach the code
to handle the case where the subauths array being passed in from
userspace is shorter than 5 elements.

Note too, that I don't consider this a security issue since you'd need
a compromised cifs.idmap program. If you have that, you can do all sorts
of nefarious stuff. Still, this is probably reasonable for stable.

Cc: stable@kernel.org
Reviewed-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>

36960e44

NFS4: nfs4_opendata_access should return errno · 998f40b5

由 Weston Andros Adamson 提交于 11月 02, 2012

Return errno - not an NFS4ERR_. This worked because NFS4ERR_ACCESS == EACCES.
Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

998f40b5

02 11月, 2012 1 次提交
- T
  NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly · f9b1ef5f
  由 Trond Myklebust 提交于 10月 29, 2012
```
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
```
  f9b1ef5f
01 11月, 2012 8 次提交

NFS: add nfs_sb_deactive_async to avoid deadlock · 324d003b

由 Weston Andros Adamson 提交于 10月 30, 2012

Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue
context.  This avoids a deadlock where rpc_shutdown_client loops forever
in a workqueue kworker context, trying to kill all RPC tasks associated with
the client, while one or more of these tasks have already been assigned to the
same kworker (and will never run rpc_exit_task).

This approach is needed because RPC tasks that have already been assigned
to a kworker by queue_work cannot be canceled, as explained in the comment
for workqueue.c:insert_wq_barrier.
Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
[Trond: add module_get/put.]
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

324d003b

nfs: Show original device name verbatim in /proc/*/mount{s,info} · 97a54868

由 Ben Hutchings 提交于 10月 21, 2012

Since commit c7f404b4 ('vfs: new superblock methods to override
/proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
device name reported back to userland.

nfs_path() always generates a trailing slash when the given dentry is
the root of an NFS mount, but userland may expect the original device
name to be returned verbatim (as it used to be).  Make this
canonicalisation optional and change the callers accordingly.

[jrnieder@gmail.com: use flag instead of bool argument]
Reported-and-tested-by: NChris Hiestand <chiestand@salk.edu>
Reference: http://bugs.debian.org/669314Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Cc: <stable@vger.kernel.org> # v2.6.39+
Signed-off-by: NJonathan Nieder <jrnieder@gmail.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

97a54868

nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts · acce94e6

由 Scott Mayhew 提交于 10月 16, 2012

In very busy v3 environment, rpc.mountd can respond to the NULL
procedure but not the MNT procedure in a timely manner causing
the MNT procedure to time out. The problem is the mount system
call returns EIO which causes the mount to fail, instead of
ETIMEDOUT, which would cause the mount to be retried.

This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to
the rpc_call_sync() call in nfs_mount() which causes
ETIMEDOUT to be returned on timed out connections.
Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

acce94e6

nfs: Check whether a layout pointer is NULL before free it · 7175fe90

由 Yanchuan Nian 提交于 10月 31, 2012

The new layout pointer in pnfs_find_alloc_layout() may be NULL because of
out of memory. we must do some check work, otherwise pnfs_free_layout_hdr()
will go wrong because it can not deal with a NULL pointer.
Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

7175fe90

NFS: fix bug in legacy DNS resolver. · 8d96b106

由 NeilBrown 提交于 10月 31, 2012

The DNS resolver's use of the sunrpc cache involves a 'ttl' number
(relative) rather that a timeout (absolute).  This confused me when
I wrote
  commit c5b29f88
     "sunrpc: use seconds since boot in expiry cache"

and I managed to break it.  The effect is that any TTL is interpreted
as 0, and nothing useful gets into the cache.

This patch removes the use of get_expiry() - which really expects an
expiry time - and uses get_uint() instead, treating the int correctly
as a ttl.

This fixes a regression that has been present since 2.6.37, causing
certain NFS accesses in certain environments to incorrectly fail.
Reported-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

8d96b106

NFSv4: nfs4_locku_done must release the sequence id · 2b1bc308

由 Trond Myklebust 提交于 10月 29, 2012

If the state recovery machinery is triggered by the call to
nfs4_async_handle_error() then we can deadlock.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

2b1bc308

NFSv4.1: We must release the sequence id when we fail to get a session slot · 2240a9e2

由 Trond Myklebust 提交于 10月 29, 2012

If we do not release the sequence id in cases where we fail to get a
session slot, then we can deadlock if we hit a recovery scenario.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

2240a9e2

NFS: Wait for session recovery to finish before returning · 399f11c3

由 Bryan Schumaker 提交于 10月 30, 2012

Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception.  This works for most cases, but causes
a hang on the following test case:

	Client				Server
	------				------
	Open file over NFS v4.1
	Write to file
					Expire client
	Try to lock file

The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery.  However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled.  The
simplest solution is to wait for session recovery to run before retrying
the lock.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

399f11c3

31 10月, 2012 1 次提交

Return the right error value when dup[23]() newfd argument is too large · 08f05c49

由 Al Viro 提交于 10月 31, 2012

Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE
case changed incorrectly after 3.6.

The culprit is commit f33ff992 ("take rlimit check to callers of
expand_files()") which when it moved the "return -EMFILE" out to the
caller, didn't notice that the dup3() had special code to turn the
EMFILE return into EBADF.

The replace_fd() helper that got added later then inherited the bug too.
Reported-by: NJack Lin <linliangjie@huawei.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
[ Noted more bugs, wrote proper changelog, fixed up typos - Linus ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08f05c49

29 10月, 2012 3 次提交

ceph: fix dentry reference leak in encode_fh() · 52eb5a90

由 David Zafman 提交于 10月 18, 2012

Call to d_find_alias() needs a corresponding dput()

This fixes http://tracker.newdream.net/issues/3271Signed-off-by: NDavid Zafman <david.zafman@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

52eb5a90

ext4: fix unjournaled inode bitmap modification · ffb5387e

由 Eric Sandeen 提交于 10月 28, 2012

commit 119c0d44 changed
ext4_new_inode() such that the inode bitmap was being modified
outside a transaction, which could lead to corruption, and was
discovered when journal_checksum found a bad checksum in the
journal during log replay.

Nix ran into this when using the journal_async_commit mount
option, which enables journal checksumming.  The ensuing
journal replay failures due to the bad checksums led to
filesystem corruption reported as the now infamous
"Apparent serious progressive ext4 data corruption bug"

[ Changed by tytso to only call ext4_journal_get_write_access() only
  when we're fairly certain that we're going to allocate the inode. ]

I've tested this by mounting with journal_checksum and
running fsstress then dropping power; I've also tested by
hacking DM to create snapshots w/o first quiescing, which
allows me to test journal replay repeatedly w/o actually
power-cycling the box.  Without the patch I hit a journal
checksum error every time.  With this fix it survives
many iterations.
Reported-by: NNix <nix@esperi.org.uk>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

ffb5387e

Lock splice_read and splice_write functions · 1a25b1c4

由 Mikulas Patocka 提交于 10月 15, 2012

Functions generic_file_splice_read and generic_file_splice_write access
the pagecache directly. For block devices these functions must be locked
so that block size is not changed while they are in progress.

This patch is an additional fix for commit b87570f5 ("Fix a crash
when block device is read and block size is changed at the same time")
that locked aio_read, aio_write and mmap against block size change.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1a25b1c4

27 10月, 2012 1 次提交

VFS: don't do protected {sym,hard}links by default · 561ec64a

由 Linus Torvalds 提交于 10月 26, 2012

In commit 800179c9 ("This adds symlink and hardlink restrictions to
the Linux VFS"), the new link protections were enabled by default, in
the hope that no actual application would care, despite it being
technically against legacy UNIX (and documented POSIX) behavior.

However, it does turn out to break some applications.  It's rare, and
it's unfortunate, but it's unacceptable to break existing systems, so
we'll have to default to legacy behavior.

In particular, it has broken the way AFD distributes files, see

  http://www.dwd.de/AFD/

along with some legacy scripts.

Distributions can end up setting this at initrd time or in system
scripts: if you have security problems due to link attacks during your
early boot sequence, you have bigger problems than some kernel sysctl
setting. Do:

	echo 1 > /proc/sys/fs/protected_symlinks
	echo 1 > /proc/sys/fs/protected_hardlinks

to re-enable the link protections.

Alternatively, we may at some point introduce a kernel config option
that sets these kinds of "more secure but not traditional" behavioural
options automatically.
Reported-by: NNick Bowler <nbowler@elliptictech.com>
Reported-by: NHolger Kiehl <Holger.Kiehl@dwd.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # v3.6
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

561ec64a