提交 · 810bb172671aec17cf85cc748120cf73c17af372 · openeuler / Kernel

13 10月, 2014 1 次提交

take dname_external() into fs/dcache.c · 810bb172

由 Al Viro 提交于 10月 12, 2014

never used outside and it's too low-level for legitimate uses outside
of fs/dcache.c anyway
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

810bb172

09 10月, 2014 5 次提交

vfs: move getname() from callers to do_mount() · 5e6123f3

由 Seunghun Lee 提交于 9月 14, 2014

It would make more sense to pass char __user * instead of
char * in callers of do_mount() and do getname() inside do_mount().
Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NSeunghun Lee <waydi1@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5e6123f3

Add copy_to_iter(), copy_from_iter() and iov_iter_zero() · c35e0248

由 Matthew Wilcox 提交于 8月 01, 2014

For DAX, we want to be able to copy between iovecs and kernel addresses
that don't necessarily have a struct page.  This is a fairly simple
rearrangement for bvec iters to kmap the pages outside and pass them in,
but for user iovecs it gets more complicated because we might try various
different ways to kmap the memory.  Duplicating the existing logic works
out best in this case.

We need to be able to write zeroes to an iovec for reads from unwritten
ranges in a file.  This is performed by the new iov_iter_zero() function,
again patterned after the existing code that handles iovec iterators.

[AV: and export the buggers...]
Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c35e0248

A
constify file_inode() · 1fa97e8b
由 Al Viro 提交于 5月 07, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
1fa97e8b

vfs: Make d_invalidate return void · 5542aa2f

由 Eric W. Biederman 提交于 2月 13, 2014

Now that d_invalidate can no longer fail, stop returning a useless
return code.  For the few callers that checked the return code update
remove the handling of d_invalidate failure.
Reviewed-by: NMiklos Szeredi <miklos@szeredi.hu>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5542aa2f

vfs: Merge check_submounts_and_drop and d_invalidate · 1ffe46d1

由 Eric W. Biederman 提交于 2月 13, 2014

Now that d_invalidate is the only caller of check_submounts_and_drop,
expand check_submounts_and_drop inline in d_invalidate.
Reviewed-by: NMiklos Szeredi <miklos@szeredi.hu>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1ffe46d1

27 9月, 2014 1 次提交

fuse: honour max_read and max_write in direct_io mode · 2c80929c

由 Miklos Szeredi 提交于 9月 24, 2014

The third argument of fuse_get_user_pages() "nbytesp" refers to the number of
bytes a caller asked to pack into fuse request. This value may be lesser
than capacity of fuse request or iov_iter.  So fuse_get_user_pages() must
ensure that *nbytesp won't grow.

Now, when helper iov_iter_get_pages() performs all hard work of extracting
pages from iov_iter, it can be done by passing properly calculated
"maxsize" to the helper.

The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need
this capability, so pass LONG_MAX as the maxsize argument here.

Fixes: c9c37e2e ("fuse: switch to iov_iter_get_pages()")
Reported-by: NWerner Baumann <werner.baumann@onlinehome.de>
Tested-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2c80929c

25 9月, 2014 4 次提交

i2c: move acpi code back into the core · 17f4a5c4

由 Wolfram Sang 提交于 9月 22, 2014

Commit 5d98e61d ("I2C/ACPI: Add i2c ACPI operation region support")
renamed the i2c-core module. This may cause regressions for
distributions, so put the ACPI code back into the core.
Reported-by: NJean Delvare <jdelvare@suse.de>
Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
Tested-by: NLan Tianyu <tianyu.lan@intel.com>
Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>

17f4a5c4

cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags · 2ad654bc

由 Zefan Li 提交于 9月 25, 2014

When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 950592f7 ("cpusets: update tasks' page/slab spread flags in time")
Cc: <stable@vger.kernel.org> # 2.6.31+
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NZefan Li <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

2ad654bc

sched: add macros to define bitops for task atomic flags · e0e5070b

由 Zefan Li 提交于 9月 25, 2014

This will simplify code when we add new flags.

v3:
- Kees pointed out that no_new_privs should never be cleared, so we
shouldn't define task_clear_no_new_privs(). we define 3 macros instead
of a single one.

v2:
- updated scripts/tags.sh, suggested by Peter

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NZefan Li <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

e0e5070b

sched: fix confusing PFA_NO_NEW_PRIVS constant · a2b86f77

由 Zefan Li 提交于 9月 25, 2014

Commit 1d4457f9 ("sched: move no_new_privs into new atomic flags")
defined PFA_NO_NEW_PRIVS as hexadecimal value, but it is confusing
because it is used as bit number. Redefine it as decimal bit number.

Note this changes the bit position of PFA_NOW_NEW_PRIVS from 1 to 0.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: NKees Cook <keescook@chromium.org>
[ lizf: slightly modified subject and changelog ]
Signed-off-by: NZefan Li <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

a2b86f77

24 9月, 2014 2 次提交

blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe · 0a30288d

由 Tejun Heo 提交于 9月 23, 2014

blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measureable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take more than ten seconds
when scsi-mq is used.

This will be properly fixed by implementing a mechanism to keep
q->mq_usage_counter in atomic mode till genhd registration; however,
that involves rather big updates to percpu_ref which is difficult to
apply late in the devel cycle (v3.17-rc6 at the moment).  As a
stop-gap measure till the proper fix can be implemented in the next
cycle, this patch introduces __percpu_ref_kill_expedited() and makes
blk_mq_freeze_queue() use it.  This is heavy-handed but should work
for testing the experimental SCSI blk-mq implementation.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NChristoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fd ("blk-mq: use percpu_ref for mq usage count")
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Tested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

0a30288d

crypto: ccp - Check for CCP before registering crypto algs · c9f21cb6

由 Tom Lendacky 提交于 9月 05, 2014

If the ccp is built as a built-in module, then ccp-crypto (whether
built as a module or a built-in module) will be able to load and
it will register its crypto algorithms.  If the system does not have
a CCP this will result in -ENODEV being returned whenever a command
is attempted to be queued by the registered crypto algorithms.

Add an API, ccp_present(), that checks for the presence of a CCP
on the system.  The ccp-crypto module can use this to determine if it
should register it's crypto alogorithms.

Cc: stable@vger.kernel.org
Reported-by: NScot Doyle <lkml14@scotdoyle.com>
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Tested-by: NScot Doyle <lkml14@scotdoyle.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

c9f21cb6

23 9月, 2014 1 次提交

net: sched: shrink struct qdisc_skb_cb to 28 bytes · 25711786

由 Eric Dumazet 提交于 9月 18, 2014

We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb->cb[] size.

Commit e0f31d84 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: e0f31d84 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25711786

22 9月, 2014 2 次提交

[media] vb2: fix VBI/poll regression · 58d75f4b

由 Hans Verkuil 提交于 9月 20, 2014

The recent conversion of saa7134 to vb2 unconvered a poll() bug that
broke the teletext applications alevt and mtt. These applications
expect that calling poll() without having called VIDIOC_STREAMON will
cause poll() to return POLLERR. That did not happen in vb2.

This patch fixes that behavior. It also fixes what should happen when
poll() is called when STREAMON is called but no buffers have been
queued. In that case poll() will also return POLLERR, but only for
capture queues since output queues will always return POLLOUT
anyway in that situation.

This brings the vb2 behavior in line with the old videobuf behavior.
Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

58d75f4b

[media] videobuf2-core.h: fix comment · 44e8e69d

由 Hans Verkuil 提交于 8月 04, 2014

The comment for start_streaming that tells the developer with which vb2 state
buffers should be returned to vb2 gave the wrong state. Very confusing.
Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

44e8e69d

21 9月, 2014 2 次提交

ACPI / hotplug: Generate online uevents for ACPI containers · 8ab17fc9

由 Rafael J. Wysocki 提交于 9月 21, 2014

Commit 46394fd0 (ACPI / hotplug: Move container-specific code out of
the core) removed the generation of "online" uevents for containers,
because "add" uevents are now generated for them automatically when
container system devices are registered. However, there are user
space tools that need to be notified when the container and all of
its children have been enumerated, which doesn't happen any more.

For this reason, add a mechanism allowing "online" uevents to be
generated for ACPI containers after enumerating the container along
with all of its children.

Fixes: 46394fd0 (ACPI / hotplug: Move container-specific code out of the core)
Reported-and-tested-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8ab17fc9

sched: Fix end_of_stack() and location of stack canary for architectures using CONFIG_STACK_GROWSUP · 6a40281a

由 Chuck Ebbert 提交于 9月 20, 2014

Aaron Tomlin recently posted patches [1] to enable checking the
stack canary on every task switch. Looking at the canary code, I
realized that every arch (except ia64, which adds some space for
register spill above the stack) shares a definition of
end_of_stack() that makes it the first long after the
threadinfo.

For stacks that grow down, this low address is correct because
the stack starts at the end of the thread area and grows toward
lower addresses. However, for stacks that grow up, toward higher
addresses, this is wrong. (The stack actually grows away from
the canary.) On these archs end_of_stack() should return the
address of the last long, at the highest possible address for the stack.

[1] http://lkml.org/lkml/2014/9/12/293Signed-off-by: NChuck Ebbert <cebbert.lkml@gmail.com>
Link: http://lkml.kernel.org/r/20140920101751.6c5166b6@asSigned-off-by: NIngo Molnar <mingo@kernel.org>
Tested-by: James Hogan <james.hogan@imgtec.com> [metag]
Acked-by: NJames Hogan <james.hogan@imgtec.com>
Acked-by: NAaron Tomlin <atomlin@redhat.com>

6a40281a

20 9月, 2014 2 次提交

genetlink: add function genl_has_listeners() · 0d566379

由 Nicolas Dichtel 提交于 9月 18, 2014

This function is the counterpart of the function netlink_has_listeners().
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d566379

IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get · 87773dd5

由 Shawn Bohrer 提交于 9月 03, 2014

In debugging an application that receives -ENOMEM from ib_reg_mr(), I
found that ib_umem_get() can fail because the pinned_vm count has
wrapped causing it to always be larger than the lock limit even with
RLIMIT_MEMLOCK set to RLIM_INFINITY.

The wrapping of pinned_vm occurs because the process that calls
ib_reg_mr() will have its mm->pinned_vm count incremented.  Later a
different process with a different mm_struct than the one that
allocated the ib_umem struct ends up releasing it which results in
decrementing the new processes mm->pinned_vm count past zero and
wrapping.

I'm not entirely sure what circumstances cause a different process to
release the ib_umem than the one that allocated it but the kernel
stack trace of the freeing process from my situation looks like the
following:

    Call Trace:
     [<ffffffff814d64b1>] dump_stack+0x19/0x1b
     [<ffffffffa0b522a5>] ib_umem_release+0x1f5/0x200 [ib_core]
     [<ffffffffa0b90681>] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
     [<ffffffffa0b4d93c>] ib_destroy_qp+0x12c/0x170 [ib_core]
     [<ffffffffa0cc7129>] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
     [<ffffffff81141cba>] __fput+0xba/0x240
     [<ffffffff81141e4e>] ____fput+0xe/0x10
     [<ffffffff81060894>] task_work_run+0xc4/0xe0
     [<ffffffff810029e5>] do_notify_resume+0x95/0xa0
     [<ffffffff814e3dd0>] int_signal+0x12/0x17

The following patch fixes the issue by storing the pid struct of the
process that calls ib_umem_get() so that ib_umem_release and/or
ib_umem_account() can properly decrement the pinned_vm count of the
correct mm_struct.
Signed-off-by: NShawn Bohrer <sbohrer@rgmadvisors.com>
Reviewed-by: NShachar Raindel <raindel@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

87773dd5

19 9月, 2014 2 次提交

[SCSI] fix regression that accidentally disabled block-based tcq · e8be1cf5

由 Christoph Hellwig 提交于 9月 12, 2014

The scsi blk-mq support accidentally flipped a conditional, which lead to
never enabling block based tcq when using the legacy request path.

Fixes: d285203c scsi: add support for a blk-mq based I/O path.
Reported-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

e8be1cf5

vgaswitcheroo: add vga_switcheroo_fini_domain_pm_ops · 766a53d0

由 Alex Deucher 提交于 9月 12, 2014

Drivers should call this on unload to unregister pmops.

Bug:
https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by: NBen Skeggs <bskeggs@redhat.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPali Rohár <pali.rohar@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>

766a53d0

17 9月, 2014 1 次提交

vgaarb: Drop obsolete #ifndef · ce6eacb0

由 Bruno Prémont 提交于 8月 24, 2014

Commit 20cde694 ("x86, ia64: Move EFI_FB vga_default_device()
initialization to pci_vga_fixup()") moved boot video device detection from
efifb to x86 and ia64 pci/fixup.c.

Remove the left-over #ifndef check that will always match since the
corresponding arch-specific define is gone with above patch.
Signed-off-by: NBruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: Matthew Garrett <matthew.garrett@nebula.com>

ce6eacb0

16 9月, 2014 3 次提交

xfrm: Generate queueing routes only from route lookup functions · b8c203b2

由 Steffen Klassert 提交于 9月 16, 2014

Currently we genarate a queueing route if we have matching policies
but can not resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unwanted.

We fix this by generating queueing routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe1 ("xfrm: Add a state resolution packet queue")
Reported-by: NKonstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

b8c203b2

xfrm: Generate blackhole routes only from route lookup functions · f92ee619

由 Steffen Klassert 提交于 9月 16, 2014

Currently we genarate a blackhole route route whenever we have
matching policies but can not resolve the states. Here we assume
that dst_output() is called to kill the balckholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: NKonstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

f92ee619

ACPIPHP / radeon / nouveau: Remove acpi_bus_no_hotplug() · f91ce35e

由 Bjorn Helgaas 提交于 9月 10, 2014

Revert parts of f244d8b6 ("ACPIPHP / radeon / nouveau: Fix VGA
switcheroo problem related to hotplug").

A previous commit 5493b31f0b55 ("PCI: Add pci_ignore_hotplug() to ignore
hotplug events for a device") added equivalent functionality implemented in
a different way for both acpiphp and pciehp.
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NDave Airlie <airlied@redhat.com>
Acked-by: NRajat Jain <rajatxjain@gmail.com>

f91ce35e

15 9月, 2014 1 次提交

vfs: avoid non-forwarding large load after small store in path lookup · 9226b5b4

由 Linus Torvalds 提交于 9月 14, 2014

The performance regression that Josef Bacik reported in the pathname
lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made
me look at performance stability of the dcache code, just to verify that
the problem was actually fixed.  That turned up a few other problems in
this area.

There are a few cases where we exit RCU lookup mode and go to the slow
serializing case when we shouldn't, Al has fixed those and they'll come
in with the next VFS pull.

But my performance verification also shows that link_path_walk() turns
out to have a very unfortunate 32-bit store of the length and hash of
the name we look up, followed by a 64-bit read of the combined hash_len
field.  That screws up the processor store to load forwarding, causing
an unnecessary hickup in this critical routine.

It's caused by the ugly calling convention for the "hash_name()"
function, and easily fixed by just making hash_name() fill in the whole
'struct qstr' rather than passing it a pointer to just the hash value.

With that, the profile for this function looks much smoother.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9226b5b4

14 9月, 2014 1 次提交

Make hash_64() use a 64-bit multiply when appropriate · 23d0db76

由 Linus Torvalds 提交于 9月 13, 2014

The hash_64() function historically does the multiply by the
GOLDEN_RATIO_PRIME_64 number with explicit shifts and adds, because
unlike the 32-bit case, gcc seems unable to turn the constant multiply
into the more appropriate shift and adds when required.

However, that means that we generate those shifts and adds even when the
architecture has a fast multiplier, and could just do it better in
hardware.

Use the now-cleaned-up CONFIG_ARCH_HAS_FAST_MULTIPLIER (together with
"is it a 64-bit architecture") to decide whether to use an integer
multiply or the explicit sequence of shift/add instructions.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

23d0db76

13 9月, 2014 3 次提交

ipv6: clean up anycast when an interface is destroyed · 381f4dca

由 Sabrina Dubroca 提交于 9月 10, 2014

If we try to rmmod the driver for an interface while sockets with
setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up
and we get stuck on:

  unregister_netdevice: waiting for ens3 to become free. Usage count = 1

If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no
problem.

We need to perform a cleanup similar to the one for multicast in
addrconf_ifdown(how == 1).
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

381f4dca

jiffies: Fix timeval conversion to jiffies · d78c9300

由 Andrew Hunter 提交于 9月 04, 2014

timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:

setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);

would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.)  Doing
this repeatedly would cause unbounded growth in val.  So fix the math.

Here's what was wrong with the conversion: we essentially computed
(eliding seconds)

jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)

by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:

jiffies = (usec * x) >> USEC_JIFFIE_SC

and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)

In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.

We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies.  This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.

Tested: the following program:

int main() {
  struct itimerval zero = {{0, 0}, {0, 0}};
  /* Initially set to 10 ms. */
  struct itimerval initial = zero;
  initial.it_interval.tv_usec = 10000;
  setitimer(ITIMER_PROF, &initial, NULL);
  /* Save and restore several times. */
  for (size_t i = 0; i < 10; ++i) {
    struct itimerval prev;
    setitimer(ITIMER_PROF, &zero, &prev);
    /* on old kernels, this goes up by TICK_USEC every iteration */
    printf("previous value: %ld %ld %ld %ld\n",
           prev.it_interval.tv_sec, prev.it_interval.tv_usec,
           prev.it_value.tv_sec, prev.it_value.tv_usec);
    setitimer(ITIMER_PROF, &prev, NULL);
  }
    return 0;
}

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: NPaul Turner <pjt@google.com>
Reported-by: NAaron Jacobs <jacobsa@google.com>
Signed-off-by: NAndrew Hunter <ahh@google.com>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

d78c9300

workqueue: apply __WQ_ORDERED to create_singlethread_workqueue() · e09c2c29

由 Tejun Heo 提交于 9月 13, 2014

create_singlethread_workqueue() is a compat interface for single
threaded workqueue which maps to ordered workqueue w/ rescuer in the
current implementation.  create_singlethread_workqueue() currently
implemented by invoking alloc_workqueue() w/ appropriate parameters.

8719dcea ("workqueue: reject adjusting max_active or applying
attrs to ordered workqueues") introduced __WQ_ORDERED to protect
ordered workqueues against dynamic attribute changes which can break
ordering guarantees but forgot to apply it to
create_singlethread_workqueue().  This in itself is okay as nobody
currently uses dynamic attribute change on workqueues created with
create_singlethread_workqueue().

However, 4c16bd32 ("workqueue: implement NUMA affinity for unbound
workqueues") broke singlethreaded guarantee for ordered workqueues
through allocating a separate pool_workqueue on each NUMA node by
default.  A later change 8a2b7538 ("workqueue: fix ordered
workqueues in NUMA setups") fixed it by allocating only one global
pool_workqueue if __WQ_ORDERED is set.

Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
became critical breaking its single threadedness and ordering
guarantee.

Let's make create_singlethread_workqueue() wrap
alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
can implicitly track future ordered_workqueue changes.

v2: I missed that __WQ_ORDERED now protects against pwq splitting
    across NUMA nodes and incorrectly described the patch as a
    nice-to-have fix to protect against future dynamic attribute
    usages.  Oleg pointed out that this is actually a critical
    breakage due to 8a2b7538 ("workqueue: fix ordered workqueues
    in NUMA setups").
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NMike Anderson <mike.anderson@us.ibm.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues")

e09c2c29

12 9月, 2014 1 次提交

xen/arm: introduce XENFEAT_grant_map_identity · 5ebc77de

由 Stefano Stabellini 提交于 9月 10, 2014

The flag tells us that the hypervisor maps a grant page to guest
physical address == machine address of the page in addition to the
normal grant mapping address. It is needed to properly issue cache
maintenance operation at the completion of a DMA operation involving a
foreign grant.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: NDenis Schneider <v1ne2go@gmail.com>

5ebc77de

11 9月, 2014 3 次提交

shm: add memfd.h to UAPI export list · b01d0720

由 David Drysdale 提交于 9月 09, 2014

The new header file memfd.h from commit 9183df25 ("shm: add
memfd_create() syscall") should be exported.
Signed-off-by: NDavid Drysdale <drysdale@google.com>
Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b01d0720

net/mlx4: Set vlan stripping policy by the right command · 09e05c3f

由 Matan Barak 提交于 9月 10, 2014

Changing the vlan stripping policy of the QP isn't supported by older
firmware versions for the INIT2RTR command. Nevertheless, we've used it.

Fix that by doing this policy change using INIT2RTR only if the firmware
supports it, otherwise, we call UPDATE_QP command to do the task.

Fixes: 7677fc96 ('net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode')
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09e05c3f

PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device · b440bde7

由 Bjorn Helgaas 提交于 9月 10, 2014

Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
normally generates a hot-remove event that unbinds the driver.

Some drivers expect to remain bound to a device even while they power it
off and back on again. This can be dangerous, because if the device is
removed or replaced while it is powered off, the driver doesn't know that
anything changed. But some drivers accept that risk.

Add pci_ignore_hotplug() for use by drivers that know their device cannot
be removed. Using pci_ignore_hotplug() tells the PCI core that hot-plug
events for the device should be ignored.

The radeon and nouveau drivers use this to switch between a low-power,
integrated GPU and a higher-power, higher-performance discrete GPU. They
power off the unused GPU, but they want to remain bound to it.

This is a reimplementation of f244d8b6 ("ACPIPHP / radeon / nouveau:
Fix VGA switcheroo problem related to hotplug") but extends it to work with
both acpiphp and pciehp.

This fixes a problem where systems with dual GPUs using the radeon drivers
become unusable, freezing every few seconds (see bugzillas below). The
resume of the radeon device may also fail, e.g.,

This fixes problems on dual GPU systems where the radeon driver becomes
unusable because of problems while suspending the device, as in bug 79701:

[drm] radeon: finishing device.
radeon 0000:01:00.0: Userspace still has active objects !
radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free
...
WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
trying to unbind memory from uninitialized GART !

or while resuming it, as in bug 77261:

radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
radeon 0000:01:00.0: GPU lockup ...
radeon 0000:01:00.0: GPU pci config reset
pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
*ERROR* radeon: dpm resume failed
radeon 0000:01:00.0: Wait for MC idle timedout !

Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701Reported-by: NShawn Starr <shawn.starr@rogers.com>
Reported-by: NJose P. <lbdkmjdf@sharklasers.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Acked-by: NRajat Jain <rajatxjain@gmail.com>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NDave Airlie <airlied@redhat.com>
CC: stable@vger.kernel.org # v3.15+

b440bde7

09 9月, 2014 1 次提交

Input: add INPUT_PROP_POINTING_STICK property · 7611392f

由 Hans de Goede 提交于 9月 08, 2014

It is useful for userspace to know that there not dealing with a regular
mouse but rather with a pointing stick (e.g. a trackpoint) so that
userspace can e.g. automatically enable middle button scrollwheel
emulation.

It is impossible to tell the difference from the evdev info without
resorting to putting a list of device / driver names in userspace, this is
undesirable.

Add a property which allows userspace to see if a device is a pointing
stick, and set it on all the pointing stick drivers.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Acked-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: NPeter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

7611392f

06 9月, 2014 2 次提交

net: treewide: Fix typo found in DocBook/networking.xml · e793c0f7

由 Masanari Iida 提交于 9月 04, 2014

This patch fix spelling typo found in DocBook/networking.xml.
It is because the neworking.xml is generated from comments
in the source, I have to fix typo in comments within the source.
Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e793c0f7

net-timestamp: only report sw timestamp if reporting bit is set · c199105d

由 Willem de Bruijn 提交于 9月 03, 2014

The timestamping API has separate bits for generating and reporting
timestamps. A software timestamp should only be reported for a packet
when the packet has the relevant generation flag (SKBTX_..) set
and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.

The second check was accidentally removed. Reinstitute the original
behavior.

Tested:
  Without this patch, Documentation/networking/txtimestamp reports
  timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
  After the patch, it only reports them when the flag is set.

Fixes: f24b9be5 ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c199105d

05 9月, 2014 2 次提交

crypto: drbg - backport "fix maximum value checks on 32 bit systems" · fb38ab4c

由 Herbert Xu 提交于 9月 05, 2014

This is a backport of commit b9347aff.
This backport is needed as without it the code will crash on 32-bit
systems.
    
The maximum values for additional input string or generated blocks is
larger than 1<<32. To ensure a sensible value on 32 bit systems, return
SIZE_MAX on 32 bit systems. This value is lower than the maximum
allowed values defined in SP800-90A. The standard allow lower maximum
values, but not larger values.

SIZE_MAX - 1 is used for drbg_max_addtl to allow
drbg_healthcheck_sanity to check the enforcement of the variable
without wrapping.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

fb38ab4c

usb: usbip: fix usbip.h path in userspace tool · 6fa9e1be

由 Piotr Król 提交于 9月 05, 2014

Fixes: 588b48ca ("usbip: move usbip userspace code out of staging")
which introduced build failure by not changing uapi/usbip.h include path
according to new location.
Signed-off-by: NPiotr Król <piotr.krol@3mdeb.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6fa9e1be

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功