Commits · bd9b51e79cb0b8bc00a7e0076a4a8963ca4a797c · openeuler / Kernel

11 Dec, 2014 · 5 commits

make default ->i_fop have ->open() fail with ENXIO · bd9b51e7

Committed by Al Viro on Nov 18, 2014

As it is, default ->i_fop has NULL ->open() (along with all other methods).
The only case where it matters is reopening (via procfs symlink) a file that
didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned
to something sane (default would fail on read/write/ioctl/etc.).

	Unfortunately, such case exists - alloc_file() users, especially
anon_get_file() ones.  There we have tons of opened files of very different
kinds sharing the same inode.  As a result, an attempt to reopen those via
procfs succeeds and you get a descriptor you can't do anything with.

	Moreover, in case of sockets we set ->i_fop that will only be used
on such reopen attempts - and put a failing ->open() into it to make sure
those do not succeed.

	It would be simpler to put such ->open() into default ->i_fop and leave
it unchanged both for anon inode (as we do anyway) and for socket ones.  Result:
	* everything going through do_dentry_open() works as it used to
	* sock_no_open() kludge is gone
	* attempts to reopen anon-inode files fail as they really ought to
	* ditto for aio_private_file()
	* ditto for perfmon - this one actually tried to imitate sock_no_open()
trick, but failed to set ->i_fop, so in the current tree reopens succeed and
yield a completely useless descriptor.  The intent clearly had been to fail with
-ENXIO on such reopens; now it actually does.
	* everything else that used alloc_file() keeps working - it has ->i_fop
set for its inodes anyway
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
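The effect of this change can be sketched as a small userspace C model. It is not kernel code: the structs are trimmed-down stand-ins, and `no_open`, `no_open_fops`, `old_default_fops` and `model_dentry_open()` are illustrative names chosen here. It assumes, as the message describes, that a reopen takes `->f_op` from `->i_fop` and that a NULL `->open()` lets the open succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (illustration only). */
struct inode;
struct file;

struct file_operations {
	int (*open)(struct inode *inode, struct file *file);
};

struct inode {
	const struct file_operations *i_fop;
};

struct file {
	const struct file_operations *f_op;
};

/* Old default: every method NULL, including ->open(). */
static const struct file_operations old_default_fops = { .open = NULL };

/* New default: ->open() exists and refuses with -ENXIO. */
static int no_open(struct inode *inode, struct file *file)
{
	(void)inode;
	(void)file;
	return -ENXIO;
}

static const struct file_operations no_open_fops = { .open = no_open };

/*
 * Model of a reopen via a procfs symlink: ->f_op is taken from ->i_fop,
 * and a missing ->open() means the open simply succeeds.
 */
static int model_dentry_open(struct inode *inode, struct file *f)
{
	f->f_op = inode->i_fop;
	if (f->f_op && f->f_op->open)
		return f->f_op->open(inode, f);
	return 0;
}
```

With the old default, a reopen of an anon-inode file "succeeds" and yields a descriptor that every other method rejects; with the new default it fails up front with -ENXIO, which is what the socket-specific ->open() used to arrange by hand.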

make nameidata completely opaque outside of fs/namei.c · 1f55a6ec
Committed by Al Viro on Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

Merge branch 'nsfs' into for-next · 707c5960
Committed by Al Viro on Dec 10, 2014


kill proc_ns completely · 3d3d35b1

Committed by Al Viro on Nov 01, 2014

procfs inodes need only the ns_ops part; nsfs inodes don't need it at all
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b

Committed by Al Viro on Nov 01, 2014

New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.

As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
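The stash-and-reuse scheme described above (a non-counting dentry pointer kept in ns_common, cleared by ->d_prune(), reused by ns_get_path() when present) can be modeled in a few lines. This is a single-threaded sketch that assumes a plain integer reference count; the `model_*` helpers are hypothetical, and the concurrency the real code must handle is deliberately ignored.

```c
#include <assert.h>
#include <stdlib.h>

struct dentry;

/* Trimmed-down ns_common: only the stash slot matters here. */
struct ns_common {
	struct dentry *stashed;	/* non-counting reference, cleared on prune */
};

struct dentry {
	int count;		/* reference count held by users */
	struct ns_common *ns;
};

/* Model of ->d_prune(): forget the stashed pointer before the dentry dies. */
static void model_prune_dentry(struct dentry *d)
{
	if (d->ns->stashed == d)
		d->ns->stashed = NULL;
}

static void model_dput(struct dentry *d)
{
	if (--d->count == 0) {
		model_prune_dentry(d);
		free(d);
	}
}

/* Reuse the stashed dentry if present, else create and stash a new one. */
static struct dentry *model_ns_get_path(struct ns_common *ns)
{
	struct dentry *d = ns->stashed;

	if (d) {
		d->count++;	/* caller gets a counted reference */
		return d;
	}
	d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	d->count = 1;
	d->ns = ns;
	ns->stashed = d;	/* the stash itself holds no count */
	return d;
}
```

Because the stash does not pin the dentry, the last dput() frees it and the prune hook is what keeps ns_common from pointing at freed memory.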


09 Dec, 2014 · 5 commits

Merge branch 'iov_iter' into for-next · ba00410b
Committed by Al Viro on Dec 08, 2014

copy_from_iter_nocache() · aa583096
Committed by Al Viro on Nov 27, 2014
```
BTW, do we want memcpy_nocache()?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

new helper: iov_iter_kvec() · abb78f87
Committed by Al Viro on Nov 24, 2014
```
initialization of kvec-backed iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

csum_and_copy_..._iter() · a604ec7e
Committed by Al Viro on Nov 24, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

iov_iter.c: handle ITER_KVEC directly · a280455f
Committed by Al Viro on Nov 27, 2014
```
... without bothering with copy_..._user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

05 Dec, 2014 · 8 commits

bury struct proc_ns in fs/proc · f77c8014

Committed by Al Viro on Nov 01, 2014

a) make get_proc_ns() return a pointer to struct ns_common
b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that
is_mnt_ns_file() could get away with fewer dereferences.

That way struct proc_ns becomes invisible outside of fs/proc/*.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

copy address of proc_ns_ops into ns_common · 33c42940
Committed by Al Viro on Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

new helpers: ns_alloc_inum/ns_free_inum · 6344c433

Committed by Al Viro on Nov 01, 2014

take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


make proc_ns_operations work with struct ns_common * instead of void * · 64964528

Committed by Al Viro on Nov 01, 2014

We can do that now.  And kill ->inum(), while we are at it - all instances
are identical.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
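Passing `struct ns_common *` instead of `void *` works because each namespace type embeds the common object, so a callback can recover its own type with container_of(). The sketch below is a hypothetical userspace reduction: `struct net` has only two fields, the operations table has only `->put()`, and `netns_put()` and `model_ns_inum()` are illustrative names rather than the kernel's exact code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The common object embedded in every namespace type. */
struct ns_common {
	unsigned int inum;
};

/* Hypothetical, trimmed-down network namespace. */
struct net {
	int count;
	struct ns_common ns;
};

/* Operations take struct ns_common * rather than void *. */
struct proc_ns_operations {
	const char *name;
	void (*put)(struct ns_common *ns);
};

static void netns_put(struct ns_common *ns)
{
	/* Recover the enclosing struct net from the embedded member. */
	struct net *net = container_of(ns, struct net, ns);

	net->count--;
}

static const struct proc_ns_operations netns_operations = {
	.name = "net",
	.put  = netns_put,
};

/* With ->inum() gone, generic code reads the number from ns_common. */
static unsigned int model_ns_inum(const struct ns_common *ns)
{
	return ns->inum;
}
```

This also illustrates why ->inum() could be dropped: once the number lives in the embedded object, every namespace type would only have returned that same field.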

switch the rest of proc_ns_operations to working with &...->ns · 3c041184
Committed by Al Viro on Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns · ff24870f
Committed by Al Viro on Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns · 58be2825
Committed by Al Viro on Nov 01, 2014
```
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

common object embedded into various struct ....ns · 435d5f4b

Committed by Al Viro on Oct 31, 2014

for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


28 Nov, 2014 · 9 commits

iov_iter.c: convert copy_to_iter() to iterate_and_advance · 3d4d3e48
Committed by Al Viro on Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
iov_iter.c: convert copy_from_iter() to iterate_and_advance · 0dbca9a4
Committed by Al Viro on Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() · d271524a

Committed by Al Viro on Nov 27, 2014

Just have copy_page_{to,from}_iter() fall back to kmap_atomic +
copy_{to,from}_iter() + kunmap_atomic() in the ITER_BVEC case.  As
a matter of fact, that's what we want to do for any iov_iter
kind that isn't blocking - e.g. ITER_KVEC will also go that way
once we recognize it at the level of iov_iter.c primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

iov_iter.c: convert iov_iter_zero() to iterate_and_advance · 8442fa46
Committed by Al Viro on Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds · 1b17f1f2
Committed by Al Viro on Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds · e5393fae
Committed by Al Viro on Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds · e0f2dc40
Committed by Al Viro on Nov 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

iov_iter.c: iterate_and_advance · 7ce2a91e

Committed by Al Viro on Nov 27, 2014

same as iterate_all_kinds, but iterator is moved to the position past
the last byte we'd handled.

iov_iter_advance() converted to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
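To illustrate the "iterate, then leave the iterator past the last byte handled" behavior, here is a function-based model of a copy into an iterator, roughly what copy_to_iter() does once it is built on iterate_and_advance. Only iovec segments are handled, `struct model_iter` is a hypothetical stand-in for struct iov_iter, and plain memcpy() replaces the user-copy primitives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical, iovec-only stand-in for struct iov_iter. */
struct model_iter {
	const struct iovec *iov;	/* current segment */
	size_t nr_segs;			/* segments left, including current */
	size_t iov_offset;		/* bytes already consumed in *iov */
	size_t count;			/* bytes left in the whole iterator */
};

/*
 * Copy up to `bytes` from `src` into the iterator's ranges and move the
 * iterator past the last byte written.
 */
static size_t model_copy_to_iter(const void *src, size_t bytes,
				 struct model_iter *i)
{
	const char *from = src;
	size_t done = 0;

	if (bytes > i->count)
		bytes = i->count;
	while (done < bytes) {
		size_t room = i->iov->iov_len - i->iov_offset;
		size_t n = bytes - done < room ? bytes - done : room;

		memcpy((char *)i->iov->iov_base + i->iov_offset, from + done, n);
		done += n;
		i->iov_offset += n;
		if (i->iov_offset == i->iov->iov_len) {	/* segment used up */
			i->iov++;
			i->nr_segs--;
			i->iov_offset = 0;
		}
	}
	i->count -= done;
	return done;
}
```

A second call continues exactly where the first stopped, which is the point of advancing inside the same walk instead of calling a separate advance afterwards.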


iov_iter.c: macros for iterating over iov_iter · 04a31165

Committed by Al Viro on Nov 27, 2014

iterate_all_kinds(iter, size, ident, step_iovec, step_bvec)
iterates through the ranges covered by iter (up to size bytes total),
repeating step_iovec or step_bvec for each of those.  ident is
declared in expansion of that thing, either as struct iovec or
struct bvec, and it contains the range we are currently looking
at.  step_bvec should be a void expression, step_iovec - a size_t
one, with non-zero meaning "stop here, that many bytes from this
range left".  In the end, the amount actually handled is stored
in size.

iov_iter_copy_from_user_atomic() and iov_iter_alignment() converted
to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
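The contract described above (a per-range step that returns the bytes left in that range, with the handled total written back into size) can be modeled for the iovec case alone. The macro and function names below are hypothetical and the bvec branch is omitted; the alignment helper follows the idea of iov_iter_alignment() by OR-ing together every base address and length. The addresses in the usage are fake values that are only combined, never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>	/* struct iovec */

/*
 * iovec-only model of iterate_all_kinds(): walk at most `size` bytes of
 * `nr` segments, exposing each (possibly truncated) range as `ident`.
 * `step` is a size_t expression; non-zero means "stop here, that many
 * bytes of this range are left". On exit `size` holds the bytes handled.
 */
#define model_iterate_iovec(iov, nr, size, ident, step)			\
	do {								\
		size_t m_left = (size), m_done = 0;			\
		for (size_t m_i = 0; m_i < (nr) && m_left; m_i++) {	\
			struct iovec ident = (iov)[m_i];		\
			if (ident.iov_len > m_left)			\
				ident.iov_len = m_left;			\
			size_t m_rest = (step);				\
			m_done += ident.iov_len - m_rest;		\
			m_left -= ident.iov_len - m_rest;		\
			if (m_rest)					\
				break;					\
		}							\
		(size) = m_done;					\
	} while (0)

/* Like iov_iter_alignment(): OR together every base address and length. */
static unsigned long model_alignment(const struct iovec *iov, size_t nr,
				     size_t size)
{
	unsigned long res = 0;

	model_iterate_iovec(iov, nr, size, v,
		(res |= (unsigned long)(uintptr_t)v.iov_base | v.iov_len, 0));
	return res;
}
```

A step that always yields 0 visits every range; one that returns a non-zero remainder stops the walk early, and the size variable then reports only what was consumed.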


24 Nov, 2014 · 11 commits

Linux 3.18-rc6 · 5d01410f
Committed by Linus Torvalds on Nov 23, 2014


uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME · 82975bc6

Committed by Andy Lutomirski on Nov 21, 2014

x86 calls do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


sched: Provide update_curr callbacks for stop/idle scheduling classes · 90e362f4

Committed by Thomas Gleixner on Nov 23, 2014

Chris bisected a NULL pointer dereference in task_sched_runtime() to
commit 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.

Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine.  He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.

What's interesting is that the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.

While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit in
question, the author obviously forgot that there are other code paths
which invoke task_sched_runtime():

do_task_stat()
 thread_group_cputime_adjusted()
   thread_group_cputime()
     task_cputime()
       task_sched_runtime()
        if (task_current(rq, p) && task_on_rq_queued(p)) {
          update_rq_clock(rq);
          p->sched_class->update_curr(rq);
        }

If the stats are read for a stomp machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL pointer
of stop_task->update_curr.  Ooops.

Chris's observation that this happens faster when he runs the fork bomb
makes sense as the fork bomb will kick migration threads more often so
the probability to hit the issue will increase.

Add the missing update_curr callbacks to the scheduler classes stop_task
and idle_task.  While idle tasks cannot be monitored via /proc we have
other means to hit the idle case.

Fixes: 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason <clm@fb.com>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
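A reduced model of the crash and the fix: the runtime path calls ->update_curr() without a NULL check, so every scheduling class has to supply one, even if it does nothing. The types are simplified stand-ins, `model_task_sched_runtime()` is a hypothetical name, and the counters in `struct rq` exist only to make the calls observable.

```c
#include <assert.h>
#include <stddef.h>

/* Counters stand in for real runqueue state so the calls are observable. */
struct rq {
	int clock_updates;
	int curr_updates;
};

/* Trimmed-down scheduling class: only the hook relevant here. */
struct sched_class {
	const char *name;
	void (*update_curr)(struct rq *rq);
};

static void update_curr_fair(struct rq *rq)
{
	rq->curr_updates++;
}

/* The fix in spirit: stop/idle classes get a hook that does nothing. */
static void update_curr_stop(struct rq *rq)
{
	(void)rq;
}

static const struct sched_class fair_class = { "fair", update_curr_fair };
static const struct sched_class stop_class = { "stop", update_curr_stop };

/*
 * Model of the task_sched_runtime() step: for a task that is current and
 * queued, the hook is called with no NULL check, so a class without one
 * would crash here.
 */
static void model_task_sched_runtime(struct rq *rq,
				     const struct sched_class *cls,
				     int current_and_queued)
{
	if (current_and_queued) {
		rq->clock_updates++;	/* stands in for update_rq_clock() */
		cls->update_curr(rq);
	}
}
```

An empty callback is preferred here over a NULL check at the call site, since it keeps the hot path branch-free and makes the "every class implements it" rule explicit.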


Merge branch 'x86-traps' (trap handling from Andy Lutomirski) · 00c89b2f

Committed by Linus Torvalds on Nov 23, 2014

Merge x86-64 iret fixes from Andy Lutomirski:
 "This addresses the following issues:

   - an unrecoverable double-fault triggerable with modify_ldt.
   - invalid stack usage in espfix64 failed IRET recovery from IST
     context.
   - invalid stack usage in non-espfix64 failed IRET recovery from IST
     context.

  It also makes a good but IMO scary change: non-espfix64 failed IRET
  will now report the correct error.  Hopefully nothing depended on the
  old incorrect behavior, but maybe Wine will get confused in some
  obscure corner case"

* emailed patches from Andy Lutomirski <luto@amacapital.net>:
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C


x86_64, traps: Rework bad_iret · b645af2d

Committed by Andy Lutomirski on Nov 22, 2014

It's possible for iretq to userspace to fail.  This can happen because
of a bad CS, SS, or RIP.

Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace.  To make this work, there's
an extra fixup to fudge the gs base into a usable state.

This is suboptimal because it loses the original exception.  It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with.  For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.

This patch throws out bad_iret entirely.  As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C.  It should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


x86_64, traps: Stop using IST for #SS · 6f442be2

Committed by Andy Lutomirski on Nov 22, 2014

On a 32-bit kernel, this has no effect, since there are no IST stacks.

On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code.  The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.

This fixes a bug in which the espfix64 code mishandles a stack segment
violation.

This saves 4k of memory per CPU and a tiny bit of code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C · af726f21

Committed by Andy Lutomirski on Nov 22, 2014

There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly.  Move it to C.

This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.

Fixes: 3891a04a
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Merge tag 'armsoc-for-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 27946315

Committed by Linus Torvalds on Nov 23, 2014

Pull ARM SoC fixes from Olof Johansson:
 "A collection of fixes this week:

   - A set of clock fixes for shmobile platforms
   - A fix for tegra that moves serial port labels to be per board.
     We're choosing to merge this for 3.18 because the labels will start
     being parsed in 3.19, and without this change serial port numbers
     that used to be stable since the dawn of time will change numbers.
   - A few other DT tweaks for Tegra.
   - A fix for multi_v7_defconfig that makes it stop spewing cpufreq
     errors on Arndale (Exynos)"

* tag 'armsoc-for-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: multi_v7_defconfig: fix failure setting CPU voltage by enabling dependent I2C controller
  ARM: tegra: roth: Fix SD card VDD_IO regulator
  ARM: tegra: Remove eMMC vmmc property for roth/tn7
  ARM: dts: tegra: move serial aliases to per-board
  ARM: tegra: Add serial port labels to Tegra124 DT
  ARM: shmobile: kzm9g legacy: Set i2c clks_per_count to 2
  ARM: shmobile: r8a7740 dtsi: Correct IIC0 parent clock
  ARM: shmobile: r8a7790: Fix SD3CKCR address to device tree
  ARM: shmobile: r8a7740 legacy: Correct IIC0 parent clock
  ARM: shmobile: r8a7740 legacy: Add missing INTCA clock for irqpin module
  ARM: shmobile: r8a7790: Fix SD3CKCR address
  ARM: dts: sun6i: Re-parent ahb1_mux to pll6 as required by dma controller


Merge branch 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu · 9f2e0f63

Committed by Linus Torvalds on Nov 23, 2014

Pull percpu fix from Tejun Heo:
 "This contains one patch to fix a race condition which can lead to
  percpu_ref using a percpu pointer which is corrupted with a set DEAD
  bit.  The bug was introduced while separating out the ATOMIC mode flag
  from the DEAD flag.  The fix is pretty straightforward.

  I just committed the patch to the percpu tree but am sending out the
  pull request early as I'll be on vacation for a week.  The patch
  should be fairly safe and while the latency will be higher I'll be
  checking emails"

* 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu-ref: fix DEAD flag contamination of percpu pointer


Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · d038a63a

Committed by Linus Torvalds on Nov 23, 2014

Pull btrfs deadlock fix from Chris Mason:
 "This has a fix for a long standing deadlock that we've been trying to
  nail down for a while.  It ended up being a bad interaction with the
  fair reader/writer locks and the order btrfs reacquires locks in the
  btree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix lockups from btrfs_clear_path_blocking


percpu-ref: fix DEAD flag contamination of percpu pointer · 4aab3b5b

Committed by Tejun Heo on Nov 22, 2014

While decoupling ATOMIC and DEAD flags, f47ad457 ("percpu_ref:
decouple switching to percpu mode and reinit") updated
__ref_is_percpu() so that it only tests ATOMIC flag to determine
whether the ref is in percpu mode or not; however, while DEAD implies
ATOMIC, the two flags are set separately during percpu_ref_kill() and
if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
ATOMIC.  Because __ref_is_percpu() returns @ref->percpu_count_ptr
value verbatim as the percpu pointer after testing ATOMIC, the pointer
may now be contaminated with the DEAD flag.

This can be fixed by clearing the flag bits before returning the
pointer which was the fix proposed by Shaohua; however, as DEAD
implies ATOMIC, we can just test for both flags at once and avoid the
explicit masking.

Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
are clear before returning @ref->percpu_count_ptr as the percpu
pointer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-Reviewed-by: Shaohua Li <shli@kernel.org>
Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
Fixes: f47ad457 ("percpu_ref: decouple switching to percpu mode and reinit")
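The race can be reproduced deterministically by constructing the intermediate state in which DEAD is already set but ATOMIC is not. The flag values, `struct model_ref` and the `ref_is_percpu_*` helpers below are assumptions for illustration; the point is only the difference between testing ATOMIC alone and testing ATOMIC|DEAD before handing out the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits kept in the low bits of the pointer (values assumed here). */
#define MODEL_PCPU_REF_ATOMIC		1UL
#define MODEL_PCPU_REF_DEAD		2UL
#define MODEL_PCPU_REF_ATOMIC_DEAD	(MODEL_PCPU_REF_ATOMIC | MODEL_PCPU_REF_DEAD)

struct model_ref {
	uintptr_t percpu_count_ptr;	/* pointer value | flag bits */
};

/* Buggy variant: tests only ATOMIC, so a DEAD bit set first leaks out. */
static bool ref_is_percpu_buggy(const struct model_ref *ref, uintptr_t *out)
{
	uintptr_t v = ref->percpu_count_ptr;	/* read the word once */

	if (v & MODEL_PCPU_REF_ATOMIC)
		return false;
	*out = v;
	return true;
}

/* Fixed variant: both flags must be clear, so v is a clean pointer. */
static bool ref_is_percpu_fixed(const struct model_ref *ref, uintptr_t *out)
{
	uintptr_t v = ref->percpu_count_ptr;	/* read the word once */

	if (v & MODEL_PCPU_REF_ATOMIC_DEAD)
		return false;
	*out = v;
	return true;
}
```

Since DEAD implies ATOMIC, rejecting either bit costs nothing in the normal case and removes the need for an explicit mask before returning the pointer.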


23 Nov, 2014 · 2 commits

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb954139

Committed by Linus Torvalds on Nov 22, 2014

Pull timer fix from Thomas Gleixner:
 "A single bugfix for an init order problem in the sun4i subarch
  clockevents code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevent: sun4i: Fix race condition in the probe code


Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ecde0064

Committed by Linus Torvalds on Nov 22, 2014

Pull vfs fixes from Al Viro:
 "Assorted fixes, most in overlayfs land"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: ovl_dir_fsync() cleanup
  ovl: update MAINTAINERS
  ovl: pass dentry into ovl_dir_read_merged()
  ovl: use lockless_dereference() for upperdentry
  ovl: allow filenames with comma
  ovl: fix race in private xattr checks
  ovl: fix remove/copy-up race
  ovl: rename filesystem type to "overlay"
  isofs: avoid unused function warning
  vfs: fix reference leak in d_prune_aliases()

